Python and C++ code for reading and writing genomics data.
Latest commit 3cc9412 (Jun 12, 2019), by the Genomics team in Google Brain and Copybara-Service: Make more explicit the different instructions for installing Nucleus for use with Python 2 or Python 3. PiperOrigin-RevId: 252851636


Nucleus is a library of Python and C++ code designed to make it easy to read, write, and analyze data in common genomics file formats like SAM and VCF. Nucleus also integrates with the TensorFlow machine learning framework: anywhere a genomics file is consumed or produced, a TensorFlow TFRecord file may be used instead.
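As a rough illustration of the read-and-analyze workflow described above, here is a minimal sketch. It assumes that a Nucleus reader can be iterated to yield variant records carrying a `reference_name` attribute; the helper `count_variants_by_contig`, the stand-in records, and the file path in the comment are hypothetical and not part of Nucleus.

```python
from collections import Counter
from types import SimpleNamespace


def count_variants_by_contig(variants):
    """Count records per contig for any iterable of variant-like objects.

    Each object only needs a `reference_name` attribute.
    """
    return Counter(v.reference_name for v in variants)


# Stand-in records so the sketch runs without Nucleus installed.
demo_variants = [
    SimpleNamespace(reference_name="chr1"),
    SimpleNamespace(reference_name="chr1"),
    SimpleNamespace(reference_name="chr2"),
]
demo_counts = count_variants_by_contig(demo_variants)

# With Nucleus installed, the same function should accept a reader directly
# (assumed usage; the path is a placeholder):
#
#   from nucleus.io import vcf
#   with vcf.VcfReader("/path/to/variants.vcf") as reader:
#       print(count_variants_by_contig(reader))
```

Keeping the analysis function independent of the reader means it can be exercised on plain Python objects before pointing it at real files.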

New Tutorial!

Please check out our new tutorial on using Nucleus and TensorFlow for DNA sequencing error correction. It's a Python notebook that shows how Nucleus integrates information from multiple file types (BAM, VCF and FASTA) and turns it into a form usable by TensorFlow.




Nucleus currently works only on modern Linux systems. If you are using Python 3, install it by running

pip install --user google-nucleus

If you are using Python 2, instead run

pip install --user google-nucleus==0.3.2
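
If you script your environment setup, the choice between the two commands above can be made programmatically. The following is only a sketch: the helper names `nucleus_requirement` and `pip_command` are hypothetical, and the package name and pinned version are taken from the commands above.

```python
import sys

# Last release usable with Python 2, per the install instructions above.
PY2_PINNED_VERSION = "0.3.2"


def nucleus_requirement(python_major):
    """Return the pip requirement string for the given Python major version."""
    if python_major >= 3:
        return "google-nucleus"
    return "google-nucleus==" + PY2_PINNED_VERSION


def pip_command(python_major=None):
    """Build the pip command as an argument list (it is not executed here)."""
    if python_major is None:
        python_major = sys.version_info[0]
    return ["pip", "install", "--user", nucleus_requirement(python_major)]


if __name__ == "__main__":
    # Print the command that matches the running interpreter.
    print(" ".join(pip_command()))
```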


Building from source

For Ubuntu 14, Ubuntu 16 and Debian 9 systems, building from source is easy: simply run the install script.

For all other systems, you will first need to install CLIF by following its installation instructions before running the install script.

Note that the install script depends extensively on apt-get, so it is unlikely to run on non-Debian-based systems without substantial modifications.

Nucleus depends on TensorFlow. By default, the install script will install a CPU-only version of a stable TensorFlow release (currently 1.11). If that isn't what you want, several other options can be enabled with a simple edit to the script.

Running the install script will build all of Nucleus's programs and libraries. You can find the generated binaries under bazel-bin/nucleus. If, in addition to building Nucleus, you would like to run its tests, execute

bazel test -c opt $COPT_FLAGS nucleus/...


This is Nucleus 0.4.1. Nucleus follows semantic versioning.
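
Because releases follow semantic versioning, version strings can be compared numerically as (major, minor, patch) triples. The sketch below is illustrative only; `parse_version` and `change_kind` are hypothetical helpers, and the release list is copied from the notes in this README. Keep in mind that under semantic versioning a 0.y.z series is considered initial development, so interfaces may still change between releases.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def change_kind(old, new):
    """Classify the step from `old` to `new` as major, minor, patch, or none."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    if new_v[2] != old_v[2]:
        return "patch"
    return "none"


# Releases mentioned in this README.
releases = ["0.2.0", "0.2.1", "0.2.2", "0.2.3", "0.3.0", "0.4.0", "0.4.1"]

# Compare as integer tuples, not strings: "0.10.0" sorts before "0.9.0" as text.
latest = max(releases, key=parse_version)
```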

New in 0.4.1:

  • Pip package is slightly more robust.

New in 0.4.0:

  • The Nucleus pip package now works with Python 3.

New in 0.3.0:

  • Reading of VCF, SAM, and most other genomics files is now twice as fast.
  • Read range and end calculations are now done in C++ for speed.
  • VcfReader can now read "headerless" VCF files.
  • variant_utils.major_allele_frequency is now 5x faster.
  • Memory leaks fixed in TFRecordReader/Writer and gfile_cc.
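
To illustrate what a "headerless" VCF file means in the list above: the data lines of a VCF file use a fixed set of leading tab-separated columns, so they can be interpreted even when no `#` header lines are present. The following pure-Python sketch is not how VcfReader is implemented; the function names are hypothetical and only the eight fixed VCF columns are handled.

```python
# The eight fixed columns that begin every VCF data line.
VCF_FIXED_COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


def parse_vcf_data_line(line):
    """Parse one tab-separated VCF data line into a dict of fixed columns."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_FIXED_COLUMNS, fields))
    record["POS"] = int(record["POS"])        # 1-based position in VCF text
    record["ALT"] = record["ALT"].split(",")  # comma-separated alternates
    return record


def parse_headerless_vcf(lines):
    """Parse data lines, skipping blanks and any '#' header lines if present."""
    return [
        parse_vcf_data_line(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]


example_lines = ["chr1\t100\t.\tA\tT,G\t50\tPASS\t.\n"]
example_records = parse_headerless_vcf(example_lines)
```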

New in 0.2.3:

  • Nucleus no longer depends on any specific version of TensorFlow's Python code. This should make it easier to use Nucleus with, for example, TensorFlow 2.0.
  • Added BCF support to VcfWriter.
  • Fixed memory leaks in VcfWriter::Write.
  • Added print_tfrecord example program.

New in 0.2.2:

  • Faster SAM file querying and read overlap calculations.
  • Writing protocol buffers to files uses less memory.
  • Smaller pip package.
  • nucleus/util:io_utils refactored into nucleus/io:tfrecord and nucleus/io:sharded_file_utils.
  • Alleles coming from VCF files are now always normalized as uppercase.
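
The allele normalization mentioned in the last item can be pictured with a small sketch. This is not Nucleus's own code; `normalize_alleles` is a hypothetical helper that only shows the effect on mixed-case input.

```python
def normalize_alleles(alleles):
    """Return the alleles with every base uppercased."""
    return [allele.upper() for allele in alleles]


# Soft-masked (lowercase) bases come out uppercase.
normalized = normalize_alleles(["a", "Ct", "G"])
```

Downstream code can then compare alleles with plain string equality, without first handling case differences.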

New in 0.2.1:

  • Upgrades htslib dependency from 1.6 to 1.9.
  • Minor VCF parsing fixes.
  • Added new example program, apply_genotyping_prior.
  • Slightly more robust pip package.

New in 0.2.0:

  • Support for reading and writing BedGraph files.
  • Support for reading and writing GFF files.
  • Support for reading and writing CRAM files.
  • Support for writing SAM/BAM files.
  • Support for reading unindexed FASTA files.
  • Iteration support for indexed FASTA files.
  • Ability to read VCF files from memory.
  • Python API documentation.
  • Python 3 compatibility.
  • Added universal file converter example program.


Nucleus is licensed under the terms of the Apache 2 license.


The Genomics team in Google Brain actively supports Nucleus and is always interested in improving its quality. If you run into an issue, please report the problem on our issue tracker. Be sure to add enough detail to your report that we can reproduce the problem and fix it. We encourage including links to snippets of BAM/VCF/etc. files that provoke the bug, if possible. Depending on the severity of the issue, we may patch Nucleus immediately with the fix or roll it into the next release.


Interested in contributing? See CONTRIBUTING.


Nucleus grew out of the DeepVariant project.


This is not an official Google product.
