Rust command line utility to merge "genotyped" VCF files that have the exact same positions/ref/alt

magnusmanske/merge_gt_vcf

merge_gt_vcf

A tool to merge VCF files that have exactly the same number of data rows, with identical chromosomes, positions, and ref/alt alleles. It reads a list of "genotyped" VCF file names from STDIN and, by default, writes plain-text VCF to STDOUT: the first VCF file in the list, with the sample columns from all the other VCF files appended.
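The column-appending idea can be illustrated with a minimal sketch. This is not the tool's actual code; `merge_row` and `FIXED_COLUMNS` are hypothetical names, and the sketch assumes tab-separated rows with the nine standard fixed VCF columns (CHROM through FORMAT) before the sample columns.

```rust
/// Number of fixed VCF columns before the per-sample columns:
/// CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT.
const FIXED_COLUMNS: usize = 9;

/// Hypothetical helper (an assumption, not part of merge_gt_vcf):
/// keeps the row from the first file whole and appends only the
/// sample columns of the corresponding row from every other file.
fn merge_row(first: &str, others: &[&str]) -> String {
    let is_eol = |c: char| c == '\r' || c == '\n';
    let mut merged = String::from(first.trim_end_matches(is_eol));
    for row in others {
        // Skip the fixed columns; everything after them is sample data.
        for sample in row.trim_end_matches(is_eol).split('\t').skip(FIXED_COLUMNS) {
            merged.push('\t');
            merged.push_str(sample);
        }
    }
    merged
}

fn main() {
    // Two made-up rows describing the same site in two files.
    let a = "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0/1";
    let b = "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t1/1\t0/0";
    println!("{}", merge_row(a, &[b]));
}
```

Because the `#CHROM` header line has the same column layout as the data rows, the same helper would also append the sample names in this sketch.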

Caution: This only works if all VCF files have exactly the same CHROM, POS, ID, REF, and ALT columns. The --check option verifies this, at a slight performance cost. The command will fail if the number of data rows is not exactly the same in all files.
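As a rough illustration of what such a consistency check involves, the sketch below compares the first five tab-separated columns of two rows. It is an assumption about the general technique, not the tool's implementation; `same_site` and `KEY_COLUMNS` are hypothetical names.

```rust
/// CHROM, POS, ID, REF, and ALT are the first five VCF columns.
const KEY_COLUMNS: usize = 5;

/// Hypothetical helper (an assumption, not part of merge_gt_vcf):
/// returns true if both rows have all five key columns and they match.
fn same_site(a: &str, b: &str) -> bool {
    let key = |row: &str| -> Vec<String> {
        row.split('\t')
            .take(KEY_COLUMNS)
            .map(|field| field.to_string())
            .collect()
    };
    let (ka, kb) = (key(a), key(b));
    // Rows with fewer than five columns are treated as a mismatch.
    ka.len() == KEY_COLUMNS && ka == kb
}

fn main() {
    // Made-up rows: `b` matches `a` at the site level, `c` has a different ALT.
    let a = "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0/1";
    let b = "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t1/1";
    let c = "1\t100\trs1\tA\tT\t.\tPASS\t.\tGT\t1/1";
    assert!(same_site(a, b));
    assert!(!same_site(a, c));
}
```

Only the key columns are compared here, so rows may differ freely in QUAL, FILTER, INFO, FORMAT, and the sample data.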

Installation

  1. Install Rust.
  2. git clone https://github.com/magnusmanske/merge_gt_vcf.git
  3. cd merge_gt_vcf && RUSTFLAGS="-C target-cpu=native" cargo build --release

Usage

# Get help
merge_gt_vcf --help

# For plain-text VCF:
merge_gt_vcf < FILE_WITH_VCF_FILENAMES > OUTPUT.vcf

# For bgzipped VCF:
merge_gt_vcf --bgzip < FILE_WITH_VCF_FILENAMES > OUTPUT.vcf.gz
# or, if bgzip (part of HTSlib) is installed:
merge_gt_vcf < FILE_WITH_VCF_FILENAMES | bgzip > OUTPUT.vcf.gz

# For plain-text VCF with CHROM/POS/ID/REF/ALT checks:
merge_gt_vcf --check < FILE_WITH_VCF_FILENAMES > OUTPUT.vcf

# For plain-text VCF with few VCF input files and/or few CPU cores:
merge_gt_vcf --serial < FILE_WITH_VCF_FILENAMES > OUTPUT.vcf
