jmniederle/VADA

VADA

Code for the paper: "VADA: a Data-Driven Simulator for Nanopore Sequencing"

Create conda environment using:

conda create --name vada_env --file vada_requirements.txt

Demo

To see an example of loss computation and sampling, and to generate an example plot, run:

python VADA_demo.py 

Data

The data used to train VADA is publicly available; to download it, follow the instructions on GitHub Repo. Note: the download is ~30 GB.

To train on data where the reference DNA sequence has not yet been aligned with the nanopore observations, first align them using the Tombo package.

To read a sequence of nanopore observations, use the read_fast5() function from src/utils/read.py. To preprocess that sequence, use split_and_process_nano_read_kmer() in datasets/data_util.py, with the arguments specified as follows:

  • nano_read: the ReadData object (output of read_fast5())
  • split_len: the length of subsequences to split the nanopore sequence into
  • kmer_one_hot_enc: a k-mer one-hot encoder object, e.g. the one returned by get_kmer_one_hot_encoder() from datasets/data_util.py
  • normalize: whether to normalize the sequences

Training

The model was trained using the configuration in configs/config_VADA.json.
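Since the configuration is a JSON file, it can be inspected with the standard library before training. The sketch below is only illustrative: load_config is a hypothetical helper, and the keys batch_size and learning_rate are placeholders written to a temporary stand-in file, not the actual contents of configs/config_VADA.json.

```python
import json
import tempfile
from pathlib import Path


def load_config(path):
    """Read a JSON configuration file into a plain dict."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


# Stand-in file with hypothetical keys; the real config's keys may differ.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config_example.json"
    cfg_path.write_text(
        json.dumps({"batch_size": 32, "learning_rate": 0.001}),
        encoding="utf-8",
    )
    config = load_config(cfg_path)

# List the top-level settings found in the file.
for key in sorted(config):
    print(key, "=", config[key])
```

Pointing load_config at configs/config_VADA.json from the repository root should list the settings used for training in the same way.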
