Out-of-Distribution Detection for Genomics Sequences

This directory contains an implementation of the generative model and the classification model for the bacteria genomic dataset used in the paper: Ren J, Liu PJ, Fertig E, Snoek J, Poplin R, DePristo MA, Dillon JV, Lakshminarayanan B. Likelihood Ratios for Out-of-Distribution Detection. arXiv preprint arXiv:1906.02845.

Setup

Create and activate a Python 3 virtual environment, then install the requirements:

virtualenv -p python3 .
source ./bin/activate

pip install -r genomics_ood/requirements.txt


This directory contains two Python scripts:

- generative.py builds an autoregressive generative model for DNA sequences using an LSTM.
- classifier.py builds a classifier for DNA sequences using convolutional networks (ConvNets).
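An autoregressive model scores a sequence by factorizing its likelihood position by position, log p(x) = Σ_d log p(x_d | x_<d). The plain-Python sketch below illustrates only that factorization. It swaps the LSTM for a much simpler order-k Markov model that conditions on the previous k nucleotides; fit_markov, log_likelihood, and the smoothing parameter alpha are hypothetical names for illustration, not part of this repository, and the sketch assumes sequences contain only A, C, G, and T.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def fit_markov(sequences, k=2, alpha=1.0):
    """Fit a hypothetical order-k Markov stand-in for the LSTM.

    Returns cond_prob(context, ch) = p(x_d = ch | previous k characters),
    estimated from counts with add-alpha smoothing.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for d, ch in enumerate(seq):
            # Context is the (up to) k characters preceding position d.
            counts[seq[max(0, d - k):d]][ch] += 1.0

    def cond_prob(context, ch):
        ctx = context[-k:] if k > 0 else ""
        row = counts.get(ctx, {})
        total = sum(row.values()) + alpha * len(ALPHABET)
        return (row.get(ch, 0.0) + alpha) / total

    return cond_prob


def log_likelihood(seq, cond_prob):
    """Autoregressive log-likelihood: sum over d of log p(x_d | x_<d)."""
    return sum(math.log(cond_prob(seq[:d], ch)) for d, ch in enumerate(seq))


cond_prob = fit_markov(["ACGTACGT", "AACCGGTT"], k=2)
print(log_likelihood("ACGTACGT", cond_prob))
```

The LSTM conditions on the full prefix x_<d instead of a fixed window, but the way per-position log-probabilities are summed into a sequence log-likelihood is the same.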

To test the models on the toy dataset in test_data, point DATA_DIR at that directory and run:

python -m genomics_ood.generative \
--hidden_lstm_size=30 \
--val_freq=100 \
--num_steps=1000 \
--in_tr_data_dir=$DATA_DIR/before_2011_in_tr \
--in_val_data_dir=$DATA_DIR/between_2011-2016_in_val \
--ood_val_data_dir=$DATA_DIR/between_2011-2016_ood_val

python -m genomics_ood.classifier \
--num_motifs=30 \
--val_freq=100 \
--num_steps=1000 \
--in_tr_data_dir=$DATA_DIR/before_2011_in_tr \
--in_val_data_dir=$DATA_DIR/between_2011-2016_in_val \
--ood_val_data_dir=$DATA_DIR/between_2011-2016_ood_val \
--label_dict_file=$DATA_DIR/label_dict.json
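A ConvNet classifier for DNA typically one-hot encodes the sequence and scans it with learned convolutional filters, often read as motifs. The --num_motifs flag suggests the number of such filters, though that reading is an assumption. The plain-Python sketch below shows one filter's contribution: a 1D convolution over the one-hot sequence followed by global max pooling. one_hot, motif_scores, and max_pooled_feature are hypothetical helpers; the real script's layers and pooling may differ.

```python
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-dim one-hot vectors ordered A, C, G, T."""
    return [[1.0 if ch == base else 0.0 for base in ALPHABET] for ch in seq]


def motif_scores(seq, motif_filter):
    """Valid (no padding) 1D convolution of one filter over the one-hot sequence.

    motif_filter is a list of `width` rows, each holding 4 weights (A, C, G, T).
    """
    x = one_hot(seq)
    width = len(motif_filter)
    scores = []
    for start in range(len(x) - width + 1):
        score = 0.0
        for offset, weights in enumerate(motif_filter):
            score += sum(w * v for w, v in zip(weights, x[start + offset]))
        scores.append(score)
    return scores


def max_pooled_feature(seq, motif_filter):
    """Global max pooling: how strongly the motif matches anywhere in seq.

    Assumes len(seq) >= len(motif_filter).
    """
    return max(motif_scores(seq, motif_filter))


# A filter that exactly rewards the motif "GAT".
gat_filter = one_hot("GAT")
print(max_pooled_feature("CCGATCC", gat_filter))  # 3.0 (full match)
print(max_pooled_feature("CCCCCCC", gat_filter))  # 0.0 (no matching base)
```

A classifier built this way would compute one such feature per filter and pass the resulting vector to a dense layer with a softmax over the in-distribution classes.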

Real Bacteria Dataset

The real bacteria dataset, with 10 in-distribution classes, 60 validation out-of-distribution (OOD) classes, and 60 test OOD classes, can be downloaded from Google Drive.

To run the models on the real dataset, set DATA_DIR to the directory holding the downloaded data (DATA_DIR=//) and specify OUT_DIR.

Likelihood Ratios

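To compute likelihood ratios, we train two generative models using the generative model script (genomics_ood.generative). The full model is trained with both the L2 regularization weight and the mutation rate set to 0.0. The background model is trained with L2 regularization weight 0.0001 and mutation rate 0.1, so it is fit to randomly perturbed versions of the training sequences. The likelihood ratio of a sequence is its log-likelihood under the full model minus its log-likelihood under the background model.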
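The plain-Python sketch below illustrates how the two settings could fit together: mutate perturbs training sequences for the background model, and likelihood_ratio combines the two models' log-likelihoods. Both names are hypothetical. The mutation scheme is an assumption: each position is independently selected with probability mutation_rate and replaced by a different nucleotide chosen uniformly at random. The L2 regularization weights are a training-time setting not shown here, and the log-likelihood values passed to likelihood_ratio are placeholders for the trained models' outputs.

```python
import random

ALPHABET = "ACGT"


def mutate(seq, mutation_rate, rng):
    """Perturb a DNA sequence for training the background model.

    Assumption: each position is independently selected with probability
    mutation_rate and replaced by one of the other nucleotides, chosen
    uniformly at random.
    """
    out = []
    for ch in seq:
        if rng.random() < mutation_rate:
            out.append(rng.choice([c for c in ALPHABET if c != ch]))
        else:
            out.append(ch)
    return "".join(out)


def likelihood_ratio(log_p_full, log_p_background):
    """LLR(x) = log p_full(x) - log p_background(x).

    In this sketch a higher ratio is read as more in-distribution.
    """
    return log_p_full - log_p_background


rng = random.Random(0)
seq = "ACGTACGTAC"
full_training_seq = mutate(seq, 0.0, rng)        # full model: mutation rate 0.0, unchanged
background_training_seq = mutate(seq, 0.1, rng)  # background model: mutation rate 0.1
# Placeholder log-likelihoods standing in for the two trained models' outputs.
print(likelihood_ratio(-10.0, -12.5))  # 2.5
```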