adVNTR - A tool for genotyping VNTRs

adVNTR is a tool for genotyping Variable Number Tandem Repeats (VNTRs) from sequence data. It works with both NGS short reads (Illumina HiSeq) and SMRT reads (PacBio). It finds diploid repeat unit (RU) counts for VNTRs and identifies possible mutations in the VNTR sequences.

Software Requirements

The following libraries are required:
- python2.7
- python-pip
- python-tk
- libz-dev
- samtools

You can install these requirements on Ubuntu Linux by running:

sudo apt-get install python2.7 python-pip python-tk libz-dev samtools

The following Python 2.7 packages are required:
- biopython
- pysam version 0.9.1.4 or above
- cython
- networkx version 1.11
- scipy
- joblib

You can install the required Python libraries by running:

pip install -r requirements.txt

In addition, ncbi-blast version 2.2.29 or above is required.
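As a quick sanity check before running the tool, the external programs listed above can be looked up on PATH. The sketch below is an illustration and not part of adVNTR: the helper name `missing_tools` is hypothetical, it assumes Python 3 (for `shutil.which`), and it assumes that the ncbi-blast installation provides an executable named `blastn`. It does not verify version numbers.

```python
import shutil

# Executable names are assumptions: samtools is named in the requirements,
# and "blastn" is assumed to be the command provided by ncbi-blast.
REQUIRED_TOOLS = ("samtools", "blastn")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required external tools were found.")
```

An empty result only means the executables were found; the minimum versions stated above still need to be checked separately.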

Data Requirements

To run adVNTR on trained VNTR models:
- Download vntr_data.zip and extract it inside the project directory.

Alternatively, you can add a model for a custom VNTR. See :ref:`add-custom-vntr-label` for more information.
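The extraction step for the trained models can also be scripted. This is a minimal sketch, assuming vntr_data.zip has already been downloaded; the function name `extract_vntr_data` is hypothetical, and nothing is assumed about the internal layout of the archive.

```python
import os
import zipfile


def extract_vntr_data(zip_path, project_dir):
    """Extract the archive at `zip_path` into `project_dir`.

    Returns the list of entry names stored in the archive.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(project_dir)
        return archive.namelist()


if __name__ == "__main__":
    # Assumes the script is run from the project directory and that
    # vntr_data.zip was downloaded there beforehand.
    if os.path.exists("vntr_data.zip"):
        names = extract_vntr_data("vntr_data.zip", ".")
        print("Extracted %d entries" % len(names))
    else:
        print("vntr_data.zip not found in the current directory")
```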

Execution:

Use the following command to see the help for running the tool:

python advntr.py --help

The program outputs the RU count genotypes for all VNTRs in the vntr_data directory. To genotype a single VNTR, specify its ID with the --vntr_id <id> option.
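When only a few VNTRs are of interest, one run per ID can be prepared from a script. The sketch below only builds argument lists and does not execute anything; `build_advntr_command` is a hypothetical helper, the file name and IDs are placeholders, and the --alignment_file and --working_directory options are the ones used in the demos of this README.

```python
def build_advntr_command(alignment_file, working_directory, vntr_id=None):
    """Build the argument list for one advntr.py run on an alignment file."""
    command = [
        "python", "advntr.py",
        "--alignment_file", alignment_file,
        "--working_directory", working_directory,
    ]
    if vntr_id is not None:
        # Restrict genotyping to a single VNTR
        command += ["--vntr_id", str(vntr_id)]
    return command


if __name__ == "__main__":
    # Placeholder IDs and file name, for illustration only
    for vntr_id in (1, 2):
        print(" ".join(build_advntr_command("sample.bam", "./log_dir/", vntr_id)))
```

The resulting lists can be passed to `subprocess.call` unchanged, which avoids shell quoting issues with unusual file names.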

Demo 1: input in BAM format

--alignment_file specifies the alignment file containing mapped and unmapped reads:

python advntr.py --alignment_file aligned_illumina_reads.bam --working_directory ./log_dir/

With --pacbio, adVNTR assumes the alignment file contains PacBio sequencing data:

python advntr.py --alignment_file aligned_pacbio_reads.bam --working_directory ./log_dir/ --pacbio

Use --frameshift to find possible frameshifts in the VNTR:

python advntr.py --alignment_file aligned_illumina_reads.bam --working_directory ./log_dir/ --frameshift

Demo 2: input in FASTA format

Use the following command to genotype the RU count from a FASTA file:

python advntr.py --fasta unaligned_illumina_reads.fasta --working_directory ./log_dir/
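Since the two demos differ only in the input option, a wrapper script can pick the option from the file extension. This is a sketch under stated assumptions: `input_arguments` and `build_command` are hypothetical helpers, only the `.bam` and `.fasta` extensions appear in the demos, and treating `.fa` as FASTA is an added assumption.

```python
import os

# Extension-to-option mapping. Only ".bam" and ".fasta" appear in the demos;
# ".fa" is an added assumption.
INPUT_OPTIONS = {
    ".bam": "--alignment_file",
    ".fasta": "--fasta",
    ".fa": "--fasta",
}


def input_arguments(path):
    """Return the advntr.py input option followed by the reads file path."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in INPUT_OPTIONS:
        raise ValueError("Unsupported input file type: %r" % path)
    return [INPUT_OPTIONS[extension], path]


def build_command(path, working_directory="./log_dir/"):
    """Build the full argument list for one run on `path`."""
    return (["python", "advntr.py"]
            + input_arguments(path)
            + ["--working_directory", working_directory])


if __name__ == "__main__":
    print(" ".join(build_command("unaligned_illumina_reads.fasta")))
```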

Citation:

Bakhtiari, M., Shleizer-Burko, S., Gymrek, M., Bansal, V. and Bafna, V., 2017. Targeted Genotyping of Variable Number Tandem Repeats with adVNTR. bioRxiv, p.221754.
