Scripts to remap the Genome in a Bottle NA12878 validation variant calls to build 38 (GRCh38/hg38) of the human genome.
These convert the VCF calls and assessment-region BED files from build 37 to build 38 coordinates. We take multiple remapping approaches for testing purposes: CrossMap liftover with UCSC chain files, and NCBI remap.
Remapped truth sets and validation files:
- Genome in a Bottle regions for GRCh37 that map to build 38: GiaB_v2_19-37_prep_regions.bed
- CrossMap liftover to hg38 with UCSC chain files, regions and VCF file: GiaB_v2_19-38_crossmap-regions.bed, GiaB_v2_19-38_crossmap.vcf.gz
- NCBI remap to hg38, regions and VCF file: GiaB_v2_19-38_remap-regions.bed, GiaB_v2_19-38_remap.vcf.gz
- Validation VCFs and statistics from rtg vcfeval: giab-hg38-validation-results.tar.gz
- Indels that differ but overlap between Platinum Genomes and Genome in a Bottle: giab-platinum-indel-diffs.tar.gz
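The validation statistics come from rtg vcfeval. vcfeval summarizes a comparison as precision, sensitivity and F-measure. To help with reading those numbers, here is a small sketch of how the three metrics follow from the true-positive, false-positive and false-negative counts. `ValidationStats` and `validation_stats` are hypothetical names. They are not part of this repository or of RTG Tools, and the example counts are made up.

```python
from typing import NamedTuple


class ValidationStats(NamedTuple):
    precision: float
    sensitivity: float
    f_measure: float


def validation_stats(tp_baseline: int, tp_call: int, fp: int, fn: int) -> ValidationStats:
    """Compute precision, sensitivity and F-measure from comparison counts.

    True positives are counted twice, once against the truth (baseline)
    records and once against the call records, because the same variant
    can be written as a different number of VCF records in each set.
    Empty denominators yield 0.0 rather than raising ZeroDivisionError.
    """
    precision = tp_call / (tp_call + fp) if tp_call + fp else 0.0
    sensitivity = tp_baseline / (tp_baseline + fn) if tp_baseline + fn else 0.0
    total = precision + sensitivity
    # F-measure is the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / total if total else 0.0
    return ValidationStats(precision, sensitivity, f_measure)


# Example with made-up counts (not results from this repository):
print(validation_stats(tp_baseline=90, tp_call=90, fp=10, fn=10))
```

Sensitivity uses the baseline-side count and precision the call-side count, so the two metrics can diverge even when the raw true-positive totals look similar.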
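One way to find indels that overlap but differ between two call sets, such as Platinum Genomes and Genome in a Bottle, is sketched below. This rests on our own assumptions: both sets are already normalized the same way (for example left-aligned, which vt can do), and records are plain tuples. It is not necessarily how giab-platinum-indel-diffs.tar.gz was produced. `Variant` and `overlapping_differences` are hypothetical names. The pairwise loop is for illustration; on whole-genome data an interval tool such as bedtools intersect is the practical route.

```python
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based VCF POS
    ref: str
    alt: str

    def span(self) -> Tuple[int, int]:
        """Half-open, 0-based interval of reference bases covered by REF."""
        start = self.pos - 1
        return start, start + len(self.ref)


def is_indel(v: Variant) -> bool:
    return len(v.ref) != len(v.alt)


def overlapping_differences(a: List[Variant], b: List[Variant]) -> List[Tuple[Variant, Variant]]:
    """Pairs (from a, from b) of indels whose REF spans overlap but whose records differ."""
    shared = set(a) & set(b)  # identical records in both sets are concordant
    a_only = [v for v in a if is_indel(v) and v not in shared]
    b_only = [v for v in b if is_indel(v) and v not in shared]
    pairs = []
    for va in a_only:
        a_start, a_end = va.span()
        for vb in b_only:
            b_start, b_end = vb.span()
            if va.chrom == vb.chrom and a_start < b_end and b_start < a_end:
                pairs.append((va, vb))
    return pairs
```

For example, a 2 bp deletion `Variant("chr1", 100, "ATT", "A")` in one set and `Variant("chr1", 101, "TTG", "T")` in the other share reference bases, so they are reported as a pair. An identical insertion present in both sets is not reported.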
Download the inputs with:

    cd inputs && bash get_inputs.sh

Run the remapping with:

    bash run.sh
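CrossMap, one of the two remapping approaches, lifts coordinates through UCSC chain files. The sketch below shows the chain-file idea for single positions. It simplifies under stated assumptions: plus-strand chains only, no splitting of intervals that cross alignment gaps, and unaligned positions reported as None. It is not CrossMap's implementation. `parse_chain` and `lift_position` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# (source_start, dest_chrom, dest_start, size) for one ungapped alignment block
Block = Tuple[int, str, int, int]


def parse_chain(text: str) -> Dict[str, List[Block]]:
    """Collect ungapped alignment blocks per source chromosome from chain-format text."""
    blocks: Dict[str, List[Block]] = {}
    t_name = q_name = ""
    t_pos = q_pos = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            if fields[4] != "+" or fields[9] != "+":
                raise ValueError("this sketch handles plus-strand chains only")
            t_name, t_pos = fields[2], int(fields[5])
            q_name, q_pos = fields[7], int(fields[10])
            continue
        size = int(fields[0])
        blocks.setdefault(t_name, []).append((t_pos, q_name, q_pos, size))
        t_pos += size
        q_pos += size
        if len(fields) == 3:
            # Gap before the next block: dt bases in the source, dq in the destination.
            t_pos += int(fields[1])
            q_pos += int(fields[2])
    return blocks


def lift_position(blocks: Dict[str, List[Block]], chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a 0-based source position, or return None if it lies outside every block."""
    for t_start, q_name, q_start, size in blocks.get(chrom, []):
        if t_start <= pos < t_start + size:
            return q_name, q_start + (pos - t_start)
    return None
```

In a liftOver chain such as hg19ToHg38, the "t" (target) fields describe the source build and the "q" (query) fields the destination build. VCF POS is 1-based, so to lift a record at POS `p` you would call `lift_position(blocks, chrom, p - 1)` and add 1 to the returned position.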
The scripts depend on external tools to do the actual work:
- Python with pyfaidx
- Perl with XML::XPath
- CrossMap (can be installed with conda)
- GNU parallel
- bedtools
- bcftools
- vt
- GATK MIT-licensed scripts
- vcf2bed.py from vcflib
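Before running run.sh, it can help to confirm these tools are installed. The hypothetical pre-flight script below is not part of the repository. It checks executables on PATH with `shutil.which`, the pyfaidx module with `importlib.util.find_spec`, and the Perl XML::XPath module with `perl -MXML::XPath -e 1`. The executable names are assumptions. In particular, CrossMap may be installed as either `CrossMap.py` or `CrossMap` depending on the release. The GATK scripts are not checked.

```python
import importlib.util
import shutil
import subprocess
from typing import Dict, List

# Tool label -> candidate executable names (assumed; adjust to your install).
TOOLS: Dict[str, List[str]] = {
    "CrossMap": ["CrossMap.py", "CrossMap"],
    "GNU parallel": ["parallel"],
    "bedtools": ["bedtools"],
    "bcftools": ["bcftools"],
    "vt": ["vt"],
    "vcf2bed.py": ["vcf2bed.py"],
}


def perl_module_available(module: str) -> bool:
    """Return True if `perl -M<module> -e 1` exits successfully."""
    if shutil.which("perl") is None:
        return False
    result = subprocess.run(["perl", f"-M{module}", "-e", "1"], capture_output=True)
    return result.returncode == 0


def check_dependencies() -> Dict[str, bool]:
    """Report which dependencies appear to be installed (GATK scripts are not checked)."""
    status = {label: any(shutil.which(name) for name in names) for label, names in TOOLS.items()}
    status["pyfaidx (Python module)"] = importlib.util.find_spec("pyfaidx") is not None
    status["XML::XPath (Perl module)"] = perl_module_available("XML::XPath")
    return status


if __name__ == "__main__":
    for label, ok in check_dependencies().items():
        print(("ok      " if ok else "MISSING ") + label)
```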
The easiest way to install the Python dependencies is with Miniconda. Once Miniconda is installed, run:

    conda install -c bcbio crossmap pyfaidx
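A remapping pipeline commonly needs reference access so it can confirm that each lifted-over REF allele matches the build 38 sequence. We assume, without the source stating it, that this is roughly what pyfaidx is used for here. The sketch below performs that check. `ref_matches` is a hypothetical helper. A `pyfaidx.Fasta` object can be passed directly, since it supports `chrom in fasta` and slicing such as `fasta[chrom][start:end]`, whose `str()` gives the bases. For testing, a plain dict of sequence strings works too.

```python
from typing import Any


def ref_matches(reference: Any, chrom: str, pos: int, ref: str) -> bool:
    """Check a VCF REF allele against the reference bases at 1-based position `pos`.

    `reference` only needs `chrom in reference` and `reference[chrom][start:end]`,
    so a pyfaidx.Fasta works, as does a plain dict of sequence strings.
    """
    if pos < 1 or chrom not in reference:
        return False
    start = pos - 1  # VCF POS is 1-based; slicing is 0-based and end-exclusive
    observed = str(reference[chrom][start:start + len(ref)])
    # Compare case-insensitively so soft-masked (lowercase) bases still match.
    return observed.upper() == ref.upper()


# With pyfaidx installed (the file name is a placeholder):
#   from pyfaidx import Fasta
#   hg38 = Fasta("hg38.fa")
#   ref_matches(hg38, chrom, pos, ref)  # for each record in a remapped VCF
```

A record whose REF runs past the end of the chromosome yields a shorter slice and is reported as a mismatch rather than raising an error.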