popitsch/wtchg-rg


RG (ReliableGenome)

ReliableGenome (RG) is a method for partitioning genomes into high and low concordance regions with respect to a set of surveyed variant calling (VC) pipelines. RG integrates variant call sets created by multiple pipelines from an arbitrary number of input datasets and interpolates the expected concordance for genomic regions without data, resulting in a genome-wide concordance score. Ultimately, genomic regions of high/low concordance are calculated from this genome-wide signal.
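To make the idea of a windowed concordance score more concrete, here is a toy sketch in Python. It is not RG's published scoring formula: it simply assumes that a site is "concordant" when every caller reports it, weights concordant and discordant sites with wc and wd (the defaults mirror the values listed under "RG genomic partitions"), and averages the weights per fixed-size window. The function name, the input layout and the toy call sets are all hypothetical, and the interpolation step for regions without data is left out.

```python
from collections import defaultdict


def window_scores(calls_by_caller, window=1000, wc=1.0, wd=-3.0):
    """Toy per-window concordance score (assumption, not RG's actual formula).

    calls_by_caller maps a caller label to a set of (chrom, pos) tuples.
    A site counts as concordant only if every caller reports it.
    """
    callers = list(calls_by_caller)
    all_sites = set().union(*calls_by_caller.values())

    totals = defaultdict(float)
    counts = defaultdict(int)
    for chrom, pos in all_sites:
        concordant = all((chrom, pos) in calls_by_caller[c] for c in callers)
        key = (chrom, pos // window)  # window index along the chromosome
        totals[key] += wc if concordant else wd
        counts[key] += 1

    # Mean weight per window; windows without any call are simply absent.
    return {key: totals[key] / counts[key] for key in totals}


# Made-up call sets for three callers on one dataset.
calls = {
    "GATK": {("1", 100), ("1", 200), ("1", 1500)},
    "PLAT": {("1", 100), ("1", 200), ("1", 1500)},
    "SAMT": {("1", 100), ("1", 1500)},
}
scores = window_scores(calls)
```

In this toy input, the first window mixes one concordant and one discordant site, so its score is pulled below zero by the larger penalty wd, while the second window contains only a concordant site.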

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Read more about RG in this paper or have a look at this poster that was presented at the NGS16 conference.

RG genomic partitions

  • A genomic partition calculated from 219 deep WGS alignments can be downloaded here (.tbi). The partition contains 2,209,778 concordance intervals on chromosomes 1, ..., 22, X and Y.
    • Reference sequence: hs37d5 (GRCh37 + decoy)
    • Read mapper: bwa + stampy
    • Variant callers: GATK HaplotypeCaller, platypus, samtools
    • Parameters: wc = 1; wd = -3; tc = td = 0.5; window size x = 1000
  • The same partition with LCR and HD regions removed (RG-LCR-HD(.tbi)), see paper.
  • The same partition with LCR100 and HD regions removed (RG-LCR100-HD(.tbi)), see paper.
  • The same partition with UM75 regions removed (RG-UM75(.tbi)), see paper.
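Once a partition has been loaded into memory, a typical task is to find the interval that covers a given position. The Python sketch below shows one way to do this with a binary search. The record layout (chromosome, 0-based half-open start/end, label), the label values and the toy intervals are assumptions for illustration only; they are not taken from the downloadable partition files, whose actual column layout should be checked before adapting this.

```python
from bisect import bisect_right

# Hypothetical toy intervals: (chrom, start, end, label), 0-based, half-open.
INTERVALS = [
    ("1", 0, 1000, "high"),
    ("1", 1000, 2500, "low"),
    ("1", 2500, 4000, "high"),
    ("2", 0, 3000, "high"),
]


def build_index(intervals):
    """Group non-overlapping intervals per chromosome, sorted by start."""
    grouped = {}
    for chrom, start, end, label in intervals:
        grouped.setdefault(chrom, []).append((start, end, label))
    index = {}
    for chrom, recs in grouped.items():
        recs.sort()
        index[chrom] = ([r[0] for r in recs], recs)  # (starts, records)
    return index


def lookup(index, chrom, pos):
    """Return the label of the interval covering pos, or None if uncovered."""
    if chrom not in index:
        return None
    starts, recs = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    if i >= 0 and recs[i][0] <= pos < recs[i][1]:
        return recs[i][2]
    return None


index = build_index(INTERVALS)
```

For the full partition with more than two million intervals, a tabix-aware reader would avoid loading everything into memory; the in-memory index here is only meant to show the lookup logic.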

Building and Running RG

Prerequisites

  • maven3
  • jdk 1.7+

Building

  • Clone the source code from GitHub
  • Run the bash script build.sh
  • If everything worked, you will find a standalone executable JAR in the bin directory and a library JAR (containing only the classes from the RG source tree) in the target directory.

Running RG

You can run RG via java -jar bin/wtchg-rg-1.0.jar which will print basic usage information. Use java -Xmx12g -jar ... to run RG with more dedicated heap space (recommended).

RG JOIN

To join VCF files from different variant callers, run: java -jar bin/wtchg-rg-1.0.jar CalcreliabilitySignals join to get detailed usage information.

Usage example: java -Xmx12g -jar bin/wtchg-rg-1.0.jar join -d <GATK.VCF> -dl GATK -d <PLAT.VCF> -dl PLAT -d <SAMT.VCF> -dl SAMT -o <SNV.out.vcf> -oi <INDEL.out.vcf> -dontCheckSort -dropAllFiltered -indelMergeWin 5
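When many call sets have to be joined, it can be convenient to assemble the argument list in a script rather than typing it by hand. The following Python sketch builds the join command shown above as a list suitable for subprocess.run. It only uses the flags that appear in the usage example; the function name, its parameters and the file names are hypothetical, and the command is not executed here.

```python
def build_join_command(jar, inputs, snv_out, indel_out,
                       heap="12g", indel_merge_win=5):
    """Assemble the 'join' command line as an argument list.

    inputs is a list of (label, vcf_path) pairs, one per variant caller.
    """
    cmd = ["java", f"-Xmx{heap}", "-jar", jar, "join"]
    for label, vcf in inputs:
        # Each input VCF (-d) is followed by its dataset label (-dl).
        cmd += ["-d", vcf, "-dl", label]
    cmd += [
        "-o", snv_out,
        "-oi", indel_out,
        "-dontCheckSort",
        "-dropAllFiltered",
        "-indelMergeWin", str(indel_merge_win),
    ]
    return cmd


# Hypothetical file names, for illustration only.
join_cmd = build_join_command(
    "bin/wtchg-rg-1.0.jar",
    [("GATK", "gatk.vcf.gz"), ("PLAT", "plat.vcf.gz"), ("SAMT", "samt.vcf.gz")],
    snv_out="snv.out.vcf",
    indel_out="indel.out.vcf",
)
```

The resulting list can be passed to subprocess.run(join_cmd, check=True) once the JAR has been built.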

RG CALC

To calculate the genome-wide concordance score signal, run: java -jar bin/wtchg-rg-1.0.jar CalcreliabilitySignals calc to get detailed usage information.

Usage example: java -Xmx12g -jar bin/wtchg-rg-1.0.jar calc -o <OUTDIR> -w 1000 -scoringSchema 1,-3 -thresholds 0.5,0.5 -dontCheckSort -v


Test data

Test VCF files that are ready to JOIN can be found in data/public/vcf/. Usage example (please modify the paths to the VCF and JAR files as required):

java -Xmx12g -jar wtchg-rg-1.0.jar calc -o results -w 1000 -scoringSchema 1,-3 -thresholds 0.5,0.5 -createWigs -dontCheckSort -v -d vcf/AW_CRS_1631.DP+MDI.vcf.gz -d vcf/AW_CRS_1632.DP+MDI.vcf.gz -d vcf/AW_CRS_1806.DP+MDI.vcf.gz -d vcf/AW_CRS_1807.DP+MDI.vcf.gz -d vcf/AW_CRS_4103.DP+MDI.vcf.gz -d vcf/AW_CRS_4917.DP+MDI.vcf.gz -d vcf/AW_SC_4654.DP+MDI.vcf.gz -d vcf/AW_SC_4655.DP+MDI.vcf.gz -d vcf/AW_SC_4659.DP+MDI.vcf.gz

Please note that the "-createWigs" switch creates WIG files containing the genome-wide (interpolated) score signal and a signal showing the number of contributing datasets per position ("power signal"). The produced WIG files are too large to load into a genome browser directly and should be converted, e.g., to the BigWig format, using the following command line: wigToBigWig <WIG> <CHRSIZES> <BIGWIG> (a chromosome sizes file is provided here for convenience).
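Because the test-data command repeats the same file name pattern nine times, it can also be generated from the list of sample IDs. The Python sketch below does this and prints the command as a single shell line. The sample IDs, the file suffix and all flags are copied from the usage example above; the function name, the vcf_dir and heap parameters, and the assumption that the files sit in a local vcf directory are illustrative only.

```python
import shlex

# Sample IDs taken from the test-data usage example.
SAMPLE_IDS = [
    "AW_CRS_1631", "AW_CRS_1632", "AW_CRS_1806",
    "AW_CRS_1807", "AW_CRS_4103", "AW_CRS_4917",
    "AW_SC_4654", "AW_SC_4655", "AW_SC_4659",
]


def build_calc_command(jar, outdir, sample_ids, vcf_dir="vcf", heap="12g"):
    """Assemble the 'calc' command for the test data as an argument list."""
    cmd = [
        "java", f"-Xmx{heap}", "-jar", jar, "calc",
        "-o", outdir,
        "-w", "1000",
        "-scoringSchema", "1,-3",
        "-thresholds", "0.5,0.5",
        "-createWigs", "-dontCheckSort", "-v",
    ]
    for sample_id in sample_ids:
        # One -d option per input VCF, following the test-data naming pattern.
        cmd += ["-d", f"{vcf_dir}/{sample_id}.DP+MDI.vcf.gz"]
    return cmd


calc_cmd = build_calc_command("wtchg-rg-1.0.jar", "results", SAMPLE_IDS)
print(shlex.join(calc_cmd))  # shell-quoted, ready to copy and paste
```

Adjust the jar path and vcf_dir to your checkout before running the printed command or passing calc_cmd to subprocess.run.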


Citing RG

Please cite our paper when using RG:

Niko Popitsch, WGS500 Consortium, Anna Schuh, and Jenny C. Taylor. ReliableGenome: Annotation of Genomic Regions with High/Low Variant Calling Concordance. Bioinformatics, 2016. doi:10.1093/bioinformatics/btw587

Contact

If you want to get in touch, please write to niko@well.ox.ac.uk.
