DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data. DeepVariant relies on Nucleus, a library of Python and C++ code for reading and writing data in common genomics file formats (such as SAM and VCF), designed for painless integration with the TensorFlow machine learning framework.

Why Use DeepVariant?

  • High accuracy - In 2016, DeepVariant won the PrecisionFDA Truth Challenge award for best SNP performance. DeepVariant maintains high accuracy across data from different sequencing technologies, prep methods, and species.
  • Flexibility - Out-of-the-box use for PCR-positive samples and low-quality sequencing runs, and easy adjustments for different sequencing technologies and non-human species.
  • Ease of use - No filtering is needed beyond setting your preferred minimum quality threshold.
  • Cost effectiveness - With an optimized setup on Google Cloud, it costs ~$2-3 to call a whole genome and $0.20 to call an exome with preemptible instances.
  • Speed - On a 64-core CPU-only machine, DeepVariant completes a 50x WGS in 5 hours and an exome in 16 minutes (1). Multiple options for acceleration exist, taking the WGS pipeline to as fast as 40 minutes (see external solutions).
  • Usage options - DeepVariant can be run via Docker or binaries, either on on-premises hardware or in the cloud, with support for hardware accelerators like GPUs and TPUs.

(1): Time estimates do not include mapping.
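The "minimum quality threshold" mentioned under ease of use can be illustrated with a small filter over VCF text. This is a minimal sketch, not part of DeepVariant: it assumes the threshold is applied to the QUAL column (the sixth tab-separated field of a VCF record), and the function name `filter_vcf_lines`, the example records, and the cutoff of 20 are all hypothetical.

```python
def filter_vcf_lines(lines, min_qual):
    """Yield header lines and records whose QUAL is at least min_qual."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL is the sixth tab-separated VCF column
        # A "." means QUAL is missing; such records are dropped here.
        if qual != "." and float(qual) >= min_qual:
            yield line


# Hypothetical records, made up for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr20\t1000\t.\tA\tG\t45.2\tPASS\t.\n",
    "chr20\t2000\t.\tC\tT\t3.1\tPASS\t.\n",
]
kept = list(filter_vcf_lines(example, min_qual=20.0))
```

Here `kept` holds the two header lines plus the record at position 1000. A threshold on a per-sample field such as GQ would need to parse the FORMAT and sample columns instead.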

DeepVariant Setup

Prerequisites

  • Unix-like operating system (cannot run on Windows)
  • Python 2.7
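The two requirements listed above can be checked before installation. The sketch below is an illustration only and is not shipped with DeepVariant; the helper name `check_prerequisites` is hypothetical, and it treats any system other than Windows as Unix-like, which is an assumption.

```python
import platform
import sys


def check_prerequisites(system_name, python_version):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    # platform.system() reports "Windows" on Windows hosts.
    if system_name == "Windows":
        problems.append("a Unix-like operating system is required")
    # Compare only the major and minor version numbers.
    if tuple(python_version[:2]) != (2, 7):
        problems.append("Python 2.7 is required")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites(platform.system(), sys.version_info):
        print("Prerequisite not met: " + problem)
```

The function takes its inputs as arguments so it can be exercised with values other than those of the running interpreter.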

Official Solutions

Below are the official solutions provided by the Genomics team in Google Brain.

Name | Description
Docker | This is the recommended method.
Build from source | DeepVariant comes with scripts to build it on Ubuntu 14 and 16, with Ubuntu 16 recommended. To build and run on other Unix-based systems, you will need to modify these scripts.
Prebuilt Binaries | Available at gs://deepvariant/. These are compiled to use SSE4 and AVX instructions, so you will need a CPU (such as Intel Sandy Bridge) that supports them. You can check the /proc/cpuinfo file on your computer, which lists these features under "flags".
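The /proc/cpuinfo check described for the prebuilt binaries can be scripted. This is a minimal sketch under stated assumptions: it takes "SSE4" to mean both the `sse4_1` and `sse4_2` flags and "AVX" to mean the `avx` flag, as Linux names them on x86 machines, and the helper name `missing_cpu_flags` is hypothetical.

```python
import os

# Assumed flag names; the README only says "SSE4 and AVX".
REQUIRED_FLAGS = ("sse4_1", "sse4_2", "avx")


def missing_cpu_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in required if flag not in present]
    return list(required)  # no 'flags' line found at all


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as handle:
        missing = missing_cpu_flags(handle.read())
    print("Missing flags: %s" % (", ".join(missing) or "none"))
```

Only the first "flags" line is inspected, on the assumption that all cores in the machine report the same feature set.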

External Solutions

The following pipelines are not created or maintained by the Genomics team in Google Brain. Please contact the relevant teams if you have any questions or concerns.

Name | Description
Running DeepVariant on Google Cloud Platform | Docker-based pipelines optimized for cost and speed. Code can be found here.
DeepVariant-on-spark from ATGENOMIX | A germline short variant calling pipeline that runs DeepVariant on Apache Spark at scale with support for multi-GPU clusters (e.g. NVIDIA DGX-1).
Parabricks | An accelerated DeepVariant pipeline with multi-GPU support that runs our WGS pipeline in just 40 minutes, at a cost of $2-$3 per sample. This provides a 7.5x speedup over a 64-core CPU-only machine at lower cost.
DNAnexus DeepVariant App | Offers parallelized execution with a graphical interface (requires platform account).
Nextflow Pipeline | Offers parallel processing of multiple BAMs and Docker support.
DNAstack Pipeline | Cost-optimized DeepVariant pipeline (requires platform account).
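The 7.5x speedup quoted for Parabricks follows from the timings given in this README: 5 hours for a 50x WGS on a 64-core CPU-only machine versus 40 minutes for the accelerated pipeline. A short worked check, using only those two figures (the variable names are illustrative):

```python
cpu_minutes = 5 * 60          # 50x WGS on a 64-core CPU-only machine
accelerated_minutes = 40      # accelerated WGS pipeline
# float() keeps the division exact under Python 2.7 as well.
speedup = cpu_minutes / float(accelerated_minutes)
print("speedup: %.1fx" % speedup)  # prints "speedup: 7.5x"
```

Neither timing includes mapping, so the ratio applies to the variant-calling stage only.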

Run DeepVariant

Additional References

Contribution Guidelines

Please open a pull request if you wish to contribute to DeepVariant. Note that we have not set up the infrastructure to merge pull requests externally. If you agree, we will test and submit the changes internally and mention your contributions in our release notes. We apologize for any inconvenience.

If you have any difficulty using DeepVariant, feel free to open an issue. If you have general questions not specific to DeepVariant, we recommend that you post on a community discussion forum such as BioStars.


BSD-3-Clause license


DeepVariant happily makes use of many open source packages. We would like to specifically call out a few key ones:

We thank all of the developers and contributors to these packages for their work.


This is not an official Google product.
