adVNTR is a tool for genotyping Variable Number Tandem Repeats (VNTRs) from sequence data. It works with both NGS short reads (Illumina HiSeq) and SMRT reads (PacBio). It finds diploid repeat counts for VNTRs and identifies possible mutations in the VNTR sequences.
- The following libraries are required:
  - python2.7
  - python-pip
  - python-tk
  - libz-dev
  - samtools
You can install these requirements on Ubuntu Linux by running:
sudo apt-get install python2.7 python-pip python-tk libz-dev samtools
- The following python2.7 packages are required:
  - biopython
  - pysam (version 0.9.1.4 or above)
  - cython
  - networkx (version 1.11)
  - scipy
  - joblib
You can install the required Python libraries by running:
pip install -r requirements.txt
- In addition, ncbi-blast version 2.2.29 or above is required.
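When checking an existing environment against the minimum versions listed above, comparing version strings as plain text is unreliable (for example, "0.10" sorts before "0.9"). The following is a minimal sketch of a numeric comparison. The helper names `parse_version` and `meets_minimum` are hypothetical and not part of adVNTR. The installed version string is assumed to come from wherever you can obtain it, such as `pysam.__version__`.

```python
def parse_version(text):
    """Turn a dotted version such as '0.9.1.4' into a tuple of ints.

    Any non-digit suffix on a component (e.g. the '+' in '2.2.31+')
    is ignored; a component with no leading digits counts as 0.
    """
    parts = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (hypothetical helper)."""
    a = parse_version(installed)
    b = parse_version(minimum)
    # Pad the shorter tuple with zeros so '1.11' compares equal to '1.11.0'.
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # Minimum versions taken from the requirements listed above.
    print(meets_minimum("0.9.1.4", "0.9.1.4"))  # pysam at the minimum
    print(meets_minimum("2.2.28", "2.2.29"))    # ncbi-blast below the minimum
```

Note that networkx is listed at exactly version 1.11 rather than as a minimum, so an equality check on the parsed tuples would be the closer match for that package.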
- To run adVNTR on trained VNTR models:
- Download vntr_data.zip and extract it inside the project directory.
Alternatively, you can add a model for a custom VNTR. See :ref:`add-custom-vntr-label` for more information.
Use the following command to see the help for running the tool:
python advntr.py --help
The program outputs the repeat unit (RU) count genotypes for all VNTRs in the vntr_data directory. To genotype a single VNTR, specify its ID with the --vntr_id <id> option.
- --alignment_file specifies the alignment file containing mapped and unmapped reads:
python advntr.py --alignment_file aligned_illumina_reads.bam --working_directory ./log_dir/
- With --pacbio, adVNTR assumes the alignment file contains PacBio sequencing data:
python advntr.py --alignment_file aligned_pacbio_reads.bam --working_directory ./log_dir/ --pacbio
- Use --frameshift to find possible frameshifts in the VNTR:
python advntr.py --alignment_file aligned_illumina_reads.bam --working_directory ./log_dir/ --frameshift
- Use the following command to genotype the RU count from a FASTA file:
python advntr.py --fasta unaligned_illumina_reads.fasta --working_directory ./log_dir/
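The invocations above differ only in their input file and a few switches, so they can be assembled programmatically when many samples need to be processed. The following is a minimal sketch, not part of adVNTR itself. The function `build_advntr_command` is a hypothetical helper, and it uses only the options named in this document. Whether --pacbio, --frameshift and --vntr_id may be combined with each other or with --fasta is not stated here, so treat any combination beyond the examples above as an assumption.

```python
def build_advntr_command(working_directory, alignment_file=None, fasta=None,
                         pacbio=False, frameshift=False, vntr_id=None):
    """Return the argument list for one adVNTR run (hypothetical helper)."""
    # Exactly one input source is expected: an alignment file or a FASTA file.
    if (alignment_file is None) == (fasta is None):
        raise ValueError("provide exactly one of alignment_file or fasta")

    command = ["python", "advntr.py"]
    if alignment_file is not None:
        command += ["--alignment_file", alignment_file]
    else:
        command += ["--fasta", fasta]
    command += ["--working_directory", working_directory]

    # Optional switches, appended only when requested.
    if pacbio:
        command.append("--pacbio")
    if frameshift:
        command.append("--frameshift")
    if vntr_id is not None:
        command += ["--vntr_id", str(vntr_id)]
    return command


if __name__ == "__main__":
    # Print the commands rather than running them. The file names are the
    # example names used in this document.
    for bam in ["aligned_illumina_reads.bam"]:
        print(" ".join(build_advntr_command("./log_dir/", alignment_file=bam)))
```

Returning a list rather than a single string means the result can be handed directly to `subprocess.check_call()` from the project directory without shell quoting concerns.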
Bakhtiari, M., Shleizer-Burko, S., Gymrek, M., Bansal, V. and Bafna, V., 2017. Targeted Genotyping of Variable Number Tandem Repeats with adVNTR. bioRxiv, p.221754.