Remap the Genome in a Bottle NA12878 validation variant calls to human genome build 38

Genome in a Bottle NA12878 Human Genome 38 remapped validation set

Scripts to remap the Genome in a Bottle NA12878 validation variant calls to build 38 (GRCh38/hg38) of the human genome.

These convert the VCF calls and assessment region BED files from build 37 to build 38 coordinates. We take multiple remapping approaches for testing purposes; the results below cover a CrossMap liftover with UCSC chain files and NCBI Remap.
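As a rough illustration of the chain-file approach, the sketch below prints the CrossMap commands that would lift a build 37 BED file and VCF to build 38. It is a dry run only and is not taken from run.sh: every file name passed in is a placeholder, and the executable name is an assumption (it is `CrossMap.py` in many installs, but may differ by version).

```shell
#!/usr/bin/env bash
# Dry-run sketch: print CrossMap commands for lifting build 37 files to build 38.
# Every file name used here is a placeholder, not a path taken from run.sh.

# Name of the CrossMap executable (assumption; adjust to match your installation).
CROSSMAP="${CROSSMAP:-CrossMap.py}"

crossmap_cmds() {
  local chain="$1" bed="$2" vcf="$3" ref38="$4" prefix="$5"
  # BED liftover: chain file, input regions, output regions.
  echo "${CROSSMAP} bed ${chain} ${bed} ${prefix}-regions.bed"
  # VCF liftover additionally needs the target (build 38) reference FASTA.
  echo "${CROSSMAP} vcf ${chain} ${vcf} ${ref38} ${prefix}.vcf"
}

# Example with hypothetical inputs; hg19ToHg38.over.chain.gz is the UCSC chain file name.
crossmap_cmds hg19ToHg38.over.chain.gz giab37-regions.bed giab37.vcf.gz hg38.fa giab38-crossmap
```

Printing the commands rather than running them makes it easy to review the argument order before committing to a long liftover.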

Results

  • Validation results

  • Remapped truth sets and validation files

    • Genome in a Bottle regions for GRCh37 that map to build 38: GiaB_v2_19-37_prep_regions.bed

    • CrossMap hg38 liftover with UCSC chain files, regions and VCF file: GiaB_v2_19-38_crossmap-regions.bed, GiaB_v2_19-38_crossmap.vcf.gz

    • NCBI remap hg38 regions and VCF file: GiaB_v2_19-38_remap-regions.bed, GiaB_v2_19-38_remap.vcf.gz

    • Validation VCFs and statistics using rtg vcfeval: giab-hg38-validation-results.tar.gz

    • Indels that are different but overlapping between Platinum Genomes and Genome in a Bottle: giab-platinum-indel-diffs.tar.gz
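To show how a remapped truth set might be used with rtg vcfeval, here is a minimal dry-run sketch that assembles a comparison command. The calls file, SDF directory and output directory are hypothetical names, and the exact options used to produce the results above are not known from this README; `--evaluation-regions` in particular assumes a reasonably recent RTG Tools release.

```shell
#!/usr/bin/env bash
# Dry-run sketch: build an rtg vcfeval command against a remapped truth set.
# rtg vcfeval expects bgzipped, tabix-indexed VCFs and a reference in SDF format,
# which can be created with: rtg format -o hg38.sdf hg38.fa

vcfeval_cmd() {
  local truth="$1" calls="$2" regions="$3" sdf="$4" outdir="$5"
  echo "rtg vcfeval --baseline ${truth} --calls ${calls}" \
       "--evaluation-regions ${regions} --template ${sdf} --output ${outdir}"
}

# Truth set and regions are the NCBI Remap files listed above;
# my-calls.vcf.gz, hg38.sdf and vcfeval-out are hypothetical.
vcfeval_cmd GiaB_v2_19-38_remap.vcf.gz my-calls.vcf.gz \
  GiaB_v2_19-38_remap-regions.bed hg38.sdf vcfeval-out
```

Restricting the evaluation to the remapped regions keeps calls outside the confidently assessed areas from being counted as false positives.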

Usage

Download the inputs with:

cd inputs && bash get_inputs.sh

Run the remapping with:

bash run.sh
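After a run, one quick sanity check is to compare how many bases the assessment regions cover before and after remapping. The sketch below is not part of the repository's scripts; the function names are illustrative, and it assumes plain tab-separated BED files.

```shell
#!/usr/bin/env bash
# Sketch: measure how much of the assessment regions survives remapping.

bed_total_bases() {
  # BED intervals are 0-based and half-open, so each one spans (end - start) bases.
  awk -F'\t' '!/^(#|track|browser)/ && NF >= 3 { total += $3 - $2 }
              END { printf "%.0f\n", total }' "$1"
}

retained_fraction() {
  # Fraction of bases in the first BED file that remain in the second.
  local before after
  before="$(bed_total_bases "$1")"
  after="$(bed_total_bases "$2")"
  awk -v b="$before" -v a="$after" \
    'BEGIN { if (b == 0) print "NA"; else printf "%.4f\n", a / b }'
}

# Usage, assuming both files are in the working directory:
#   retained_fraction GiaB_v2_19-37_prep_regions.bed GiaB_v2_19-38_crossmap-regions.bed
```

A value well below 1 would suggest that a sizeable share of the regions failed to map and is worth a closer look.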

Requirements

This depends on external tools to do the actual work, including the Python packages CrossMap and pyfaidx installed below.

The easiest way to install the Python dependencies is with Miniconda. Then do:

conda install -c bcbio crossmap pyfaidx
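To confirm the install worked, a small check like the following can report which executables are still missing from the PATH. The tool names passed in are assumptions: CrossMap is commonly exposed as `CrossMap.py` and pyfaidx as `faidx`, but this can vary by version.

```shell
#!/usr/bin/env bash
# Sketch: list any required executables that are not on the PATH.

missing_tools() {
  local tool
  for tool in "$@"; do
    # command -v succeeds only when the tool can be resolved by the shell.
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Executable names are assumptions; prints nothing when everything is found.
missing_tools CrossMap.py faidx
```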

Contributors