A de novo variant caller that uses information from a mother, father and child trio with a Bayesian inference method.

denovo-variant-caller-java

Calls de novo variants using information from a mother, father and child trio.

Uses a Bayes net that encodes the inheritance relationships within the trio to evaluate de novo calls.
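The inheritance relationship a Bayes net like this encodes can be illustrated with a transmission probability, P(child genotype | mother genotype, father genotype). The sketch below is a minimal illustration under stated assumptions, not the project's implementation: each parent passes on either of its two alleles with probability 1/2, and a transmitted allele mutates to each of the other three bases with probability mu/3, where mu plays the role of --denovo_mut_rate. The class and method names are hypothetical.

```java
/**
 * Hypothetical sketch of a trio transmission model, P(child | mother, father).
 * Assumptions (not taken from the project): each parent transmits one of its two
 * alleles with probability 1/2, and a transmitted allele mutates to each of the
 * three other bases with probability mu / 3.
 */
public class TrioTransmission {

  /** Probability that a parent with genotype {p0, p1} transmits allele x. */
  static double transmit(char p0, char p1, char x, double mu) {
    double total = 0.0;
    for (char parentAllele : new char[] {p0, p1}) {
      // The allele is kept with probability 1 - mu, or mutated into x with mu / 3.
      double reachX = (parentAllele == x) ? 1.0 - mu : mu / 3.0;
      total += 0.5 * reachX; // each parental allele is chosen with probability 1/2
    }
    return total;
  }

  /** P(child = unordered genotype {c0, c1} | mother {m0, m1}, father {f0, f1}). */
  static double childGivenParents(String child, String mom, String dad, double mu) {
    char c0 = child.charAt(0);
    char c1 = child.charAt(1);
    // Ordering 1: mother transmits c0, father transmits c1.
    double momGivesC0 =
        transmit(mom.charAt(0), mom.charAt(1), c0, mu) * transmit(dad.charAt(0), dad.charAt(1), c1, mu);
    if (c0 == c1) {
      return momGivesC0; // a homozygous child has only one ordering
    }
    // Ordering 2: mother transmits c1, father transmits c0.
    double momGivesC1 =
        transmit(mom.charAt(0), mom.charAt(1), c1, mu) * transmit(dad.charAt(0), dad.charAt(1), c0, mu);
    return momGivesC0 + momGivesC1;
  }

  public static void main(String[] args) {
    double mu = 1e-8; // the documented --denovo_mut_rate default
    System.out.println(childGivenParents("AC", "AA", "CC", mu)); // Mendelian transmission
    System.out.println(childGivenParents("AC", "AA", "AA", mu)); // needs a de novo event
  }
}
```

In a model like this, a child genotype the parents can only explain through a mutation (AC from two AA parents) receives a probability on the order of mu, which is what allows it to be weighed against ordinary Mendelian transmission.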

NOTE: Currently under development; usage should be considered experimental. It will also run much faster once https://github.com/googlegenomics/denovo-variant-caller-java/issues/23 is completed.

Documentation

Documentation for the project can be found in denovo.pdf.

Getting started

This Java program discovers de novo variants using Bayesian variant calling.

There are three modes for de novo calling:

  • Variants based - Examines variant calls and filters them based on Mendelian inheritance rules.

  • Reads based - Examines reads at candidate positions and filters them using Bayesian evidence weighting. This mode has a lower false positive rate but is more expensive to compute. Note that it requires a preselected list of candidate positions, passed with --input_calls_file, which can be obtained from the variants-based mode.

  • Full - A utility mode that runs both the variants and the reads modes, piping the output of the variants mode into the reads mode. For example:

    java -jar target/denovo-variant-caller-0.1.jar --caller full \
    --dataset_id 3049512673186936334 \
    --dad_callset_name NA12891 \
    --mom_callset_name NA12892 \
    --child_callset_name NA12878 \
    --chromosome chr1 \
    --start_position 75884300 \
    --end_position 75884400 \
    --log_level debug \
    --num_threads 2 \
    --output_file NA12878_full.calls
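
The Mendelian rule that the variants-based mode filters on can be sketched as a simple consistency check: a child's genotype is consistent if one allele can come from the mother and the other from the father. This is a hypothetical illustration that represents genotypes as two-character base strings such as "AC"; the project's own data model and filtering code may differ.

```java
/**
 * Hypothetical sketch of a Mendelian consistency check on trio genotype calls.
 * Genotypes are two-character base strings (e.g. "AC"); this is not the
 * project's own data model.
 */
public class MendelianFilter {

  /** True if one child allele can come from mom and the other from dad. */
  static boolean isMendelianConsistent(String child, String mom, String dad) {
    char c0 = child.charAt(0);
    char c1 = child.charAt(1);
    return (carries(mom, c0) && carries(dad, c1)) || (carries(mom, c1) && carries(dad, c0));
  }

  /** True if the genotype contains the given allele. */
  private static boolean carries(String genotype, char allele) {
    return genotype.indexOf(allele) >= 0;
  }

  public static void main(String[] args) {
    // Illustrative genotypes only.
    System.out.println(isMendelianConsistent("AC", "AA", "CC")); // true
    System.out.println(isMendelianConsistent("AC", "AA", "AA")); // false: candidate de novo
  }
}
```

Sites that fail a check like this are candidate de novo positions, which is the kind of list the reads-based mode expects as input.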
    

Additional Options

To speed up execution, increase the number of threads with the --num_threads option.

To restrict the search to one or more chromosomes, use the --chromosome flag (specify it once per chromosome).

See below for all options:

Usage: DenovoMain [flags...]
 --caller [VARIANT | READ | FULL]       : The caller mode
 --child_callset_name <name>            : Child's callset name e.g. NA12879
 --chromosome <name>                    : specify the chromosomes to search
                                          (specify multiple times for multiple
                                          chromosomes)
 --dad_callset_name <name>              : Dad's callset name e.g. NA12877
 --dataset_id <id>                      : Dataset id
 --denovo_mut_rate <rate>               : Specify the denovo mutation rate
                                          (default 1e-8)
 --end_position <position>              : end position (usually set
                                          automatically)
 --inference_method [MAP | BAYES | LRT] : Inference method (map | bayes | lrt)
 --input_calls_file <file>              : File of candidate calls to read from
 --log_file <file>                      : specify the log file
 --log_level [ERROR | INFO | DEBUG]     : specify the logging level
 --lrt_threshold <sig_level>            : likelihood ratio test significance
                                          level (default 1; the higher, the
                                          stricter)
 --max_variant_results <num>            : max variants returned per request
                                          (default 10000)
 --mom_callset_name <name>              : Mom's callset name e.g. NA12878
 --num_threads <num>                    : Specify the number of threads
                                          (default 1 ; 1 to 50 suggested)
 --output_dir <dir>                     : Directory to write results
 --output_file <file>                   : File to write results
 --seq_err_rate <rate>                  : Specify the sequence error rate
                                          (default 1e-2)
 --start_position <position>            : start position (usually 1)
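
The reads-based mode's Bayesian evidence weighting depends on a sequencing error rate (--seq_err_rate). One common way to turn a pileup of read bases into genotype likelihoods is sketched below, assuming a symmetric per-base error model and independent reads. This model and the names ReadLikelihood, baseGivenAllele and logLikelihood are illustrative assumptions, not the project's code.

```java
import java.util.List;

/**
 * Hypothetical sketch of read-based genotype likelihoods.
 * Assumptions (not taken from the project): a read base matches the sampled
 * allele with probability 1 - e and equals each of the three other bases with
 * probability e / 3, where e plays the role of --seq_err_rate; reads are independent.
 */
public class ReadLikelihood {

  /** P(observed base | true allele) under the symmetric error model. */
  static double baseGivenAllele(char base, char allele, double e) {
    return base == allele ? 1.0 - e : e / 3.0;
  }

  /** log P(bases | genotype {a0, a1}); each read samples either allele with probability 1/2. */
  static double logLikelihood(List<Character> bases, String genotype, double e) {
    double logL = 0.0;
    for (char b : bases) {
      double p = 0.5 * baseGivenAllele(b, genotype.charAt(0), e)
          + 0.5 * baseGivenAllele(b, genotype.charAt(1), e);
      logL += Math.log(p); // sum of logs avoids underflow on deep pileups
    }
    return logL;
  }

  public static void main(String[] args) {
    double e = 1e-2; // the documented --seq_err_rate default
    List<Character> pileup = List.of('A', 'A', 'C', 'A', 'C', 'C');
    for (String g : new String[] {"AA", "AC", "CC"}) {
      System.out.printf("%s: %.3f%n", g, logLikelihood(pileup, g, e));
    }
  }
}
```

A MAP-style caller would combine such likelihoods with genotype priors and keep the genotype with the largest posterior; this sketch covers only the likelihood term.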

Building Documentation

The documentation in this repository relies on the LaTeX Maven plugin.

To build the documentation, you first need pdflatex available on your system. Try MacTeX on macOS or TeX Live on Windows.

    cd denovo-variant-caller-java
    mvn latex:latex
    cp target/denovo.pdf denovo.pdf

Todos / Next Steps

  • The caller currently calls SNPs and ignores indels. Indel support could be added, but it requires careful treatment of structural variation.
  • Parameters in the Bayes net are fixed rather than learned. Baseline mutation rates could be learned for the trio under study.
  • Additional supervised classifiers could be added to the set of callers. It should be sufficient to subclass DenovoCaller and have the new caller initialized by the DenovoCallers static factory.
  • A correct estimate of the caller's precision and recall requires a gold standard dataset of de novo mutations. Unfortunately, no such dataset exists. It could be closely approximated using blood-derived DNA samples from multiple trios of siblings.
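
The extension point mentioned in the list (subclass DenovoCaller and have the DenovoCallers static factory construct it) follows a familiar subclass-plus-factory pattern. The sketch below shows that pattern in isolation. Both DenovoCaller and DenovoCallers here are simplified, hypothetical stand-ins whose signatures are invented for illustration and do not match the classes in src; ChildOnlyAlleleCaller and the CHILD_ONLY_ALLELE mode name are likewise hypothetical.

```java
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the "subclass DenovoCaller, register it in the
 * DenovoCallers factory" pattern. These nested types are simplified stand-ins,
 * not the project's real classes or signatures.
 */
public class CallerFactorySketch {

  /** Stand-in for the project's abstract caller. */
  abstract static class DenovoCaller {
    /** Returns true if the trio genotypes at a site look like a de novo event. */
    abstract boolean isDenovo(String child, String mom, String dad);
  }

  /** Hypothetical new caller: a trivial rule standing in for a trained classifier. */
  static final class ChildOnlyAlleleCaller extends DenovoCaller {
    @Override
    boolean isDenovo(String child, String mom, String dad) {
      // Flags a site if the child carries an allele seen in neither parent.
      for (char allele : child.toCharArray()) {
        if (mom.indexOf(allele) < 0 && dad.indexOf(allele) < 0) {
          return true;
        }
      }
      return false;
    }
  }

  /** Stand-in for the static factory that maps a caller mode to a new instance. */
  static final class DenovoCallers {
    private static final Map<String, Supplier<DenovoCaller>> REGISTRY =
        Map.of("CHILD_ONLY_ALLELE", ChildOnlyAlleleCaller::new);

    static DenovoCaller create(String mode) {
      Supplier<DenovoCaller> supplier = REGISTRY.get(mode);
      if (supplier == null) {
        throw new IllegalArgumentException("Unknown caller mode: " + mode);
      }
      return supplier.get();
    }
  }

  public static void main(String[] args) {
    DenovoCaller caller = DenovoCallers.create("CHILD_ONLY_ALLELE");
    System.out.println(caller.isDenovo("AT", "AA", "AC")); // true: T is in neither parent
  }
}
```

Registering constructors in a single map keeps the factory as the one place that knows which caller modes exist, so adding a caller touches only the new subclass and one registry entry.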

The mailing list

The Google Genomics Discuss mailing list is a good way to sync up with other people who use genomics-tools, including the core developers. You can subscribe by sending an email to google-genomics-discuss+subscribe@googlegroups.com or just post using the web forum page.