MIRUReader

Description

Identifies 24-locus MIRU-VNTR profiles for Mycobacterium tuberculosis complex (MTBC) directly from long reads generated by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) platforms. It also works on assembled genomes.

Requirements

  • Linux
  • primersearch from EMBOSS
    • install from the official website, or
    • install via conda: conda install -c bioconda emboss
    • Ensure the primersearch command is on your PATH, so that it can be run directly by typing primersearch on the command line
  • pandas
    • install via conda (conda install pandas) or via PyPI (pip install pandas)
  • statistics
    • install via PyPI: pip install statistics (needed only on Python 2; the module is part of the standard library from Python 3.4 onward)

Installation

git clone https://github.com/phglab/MIRUReader.git

Change log

13/09/2019

  • Added a check that primersearch is executable before MIRUReader runs
  • Updated the README documentation

04/07/2019

  • Updated the output format for the '--details' option.

14/06/2019

  • FASTQ input is now converted to FASTA automatically.

Usage example

For a single sample:

python /your/path/to/MIRUReader.py -r sample.fasta -p sampleID > miru.txt

For multiple samples:

  1. Create a mapping file (mappingFile.txt) that looks like:

    sample_001.fasta sample_001
    sample_002.fasta sample_002
    ...

  2. Then run the program:

while read -r -a line; do python /your/path/to/MIRUReader.py -r "${line[0]}" -p "${line[1]}"; done < mappingFile.txt > miru.multiple.txt
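
The same loop can be driven from Python, which may be convenient when the mapping file is produced by another script. The following is a minimal sketch, not part of MIRUReader: the function names build_commands and run_all are hypothetical, and it assumes each mapping line holds exactly two whitespace-separated fields (reads file, then sample ID), as in the example above.

```python
import subprocess


def build_commands(mapping_lines, script_path="/your/path/to/MIRUReader.py"):
    """Turn 'reads_file sample_id' lines into MIRUReader command lists.

    Hypothetical helper: it only uses the -r and -p options shown above.
    """
    commands = []
    for raw in mapping_lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError("expected 'reads_file sample_id', got: %r" % raw)
        reads, prefix = fields
        commands.append(["python", script_path, "-r", reads, "-p", prefix])
    return commands


def run_all(mapping_path, output_path, script_path="/your/path/to/MIRUReader.py"):
    """Run every command and collect the results in one output file."""
    with open(mapping_path) as mapping, open(output_path, "w") as out:
        for command in build_commands(mapping, script_path):
            # check_call raises CalledProcessError if a run fails
            subprocess.check_call(command, stdout=out)
```

Calling run_all("mappingFile.txt", "miru.multiple.txt") would then mirror the shell loop, stopping at the first sample that fails.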

Output example

sample_prefix   0154    0424    0577    0580    0802    0960    1644    1955    2059    2163b   2165    2347    2401    2461    2531    2687    2996    3007    3171    3192    3690    4052    4156    4348
sample_001      2       4       4       2       3       3       3       2       2       5       4       4       4       2       5       1       6       3       3       5       3       7       2       3

Notes:

  • The program is compatible with both Python 2 and Python 3.
  • Accepted read file formats are '.fastq', '.fastq.gz', '.fasta', and '.fasta.gz'.
  • The program output is tab-delimited plain text, which can be pasted into or opened in an Excel spreadsheet.
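
Because the output is tab-delimited, it can also be loaded programmatically. The sketch below is one possible way to do so with the Python 3 standard library; read_miru_profiles is a hypothetical helper, not part of MIRUReader. It assumes the header line starts with sample_prefix, as in the example above, and it tolerates a repeated header in case a multi-sample file contains one per sample (an assumption, not confirmed behavior). Values are kept as strings, since calls may not always be plain integers.

```python
import csv
import io


def read_miru_profiles(text):
    """Parse tab-delimited MIRU output into {sample_id: {locus: value}}."""
    header = None
    profiles = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        row = [field.strip() for field in row]
        if not row or not row[0]:
            continue  # skip blank lines
        if row[0] == "sample_prefix":
            header = row[1:]  # locus names; later header lines just refresh this
            continue
        if header is None:
            raise ValueError("data row found before the header line")
        profiles[row[0]] = dict(zip(header, row[1:]))
    return profiles
```

For a file on disk, pass its contents, for example read_miru_profiles(open("miru.txt").read()).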

Full usage

Main options        Description
-r READS            Input reads file in FASTQ or FASTA format, gzipped or uncompressed.
-p PREFIX           Sample ID, required for naming the output file.
--table TABLE       Allele calling table; the default is MIRU_table. A user-defined table in the same fixed format can be supplied, but custom allele calling tables for other VNTRs have not been tested.
--primers PRIMERS   Primer sequences; the default is MIRU_primers. A user-defined file in the same fixed format can be supplied.

Optional options    Description
--amplicons         Use the output from primersearch ("prefix.18.primersearch.out") and summarize the MIRU profile directly.
--details           For further inspection. Displays the repeat count details for each locus, with the total mismatch errors in the primer sequence alignment.
--nofasta           Delete the FASTA file generated when the input reads are in FASTQ format.
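
For readers curious about what a FASTQ-to-FASTA conversion involves, here is a minimal sketch. It is not MIRUReader's own implementation; open_text and fastq_to_fasta are hypothetical names, and the code assumes plain four-line FASTQ records (header, sequence, '+', quality) and Python 3.

```python
import gzip


def open_text(path):
    """Open a plain or gzipped file for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Copy records from a FASTQ handle to a FASTA handle; return the count."""
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        if not header.startswith("@"):
            raise ValueError("not a FASTQ header: %r" % header)
        sequence = fastq_handle.readline()
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, ignored
        fasta_handle.write(">" + header[1:].rstrip("\n") + "\n")
        fasta_handle.write(sequence.rstrip("\n") + "\n")
        count += 1
    return count
```

A typical call would pair open_text("sample.fastq.gz") with an output file opened for writing.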

FAQ

  1. Why are there two MIRU allele calling tables (MIRU_table and MIRU_table_0580)?

MIRU locus 0580 uses a different numbering system for determining repeat numbers than the other 23 MIRU loci in MTBC isolates. It therefore has its own table (MIRU_table_0580), while the other 23 loci share MIRU_table.

Troubleshooting

  1. If the error message OSError: primersearch is not found. appears, make sure the primersearch executable is in a directory listed in your PATH (check with echo $PATH) and can be called directly.
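
A quick way to check this from Python is sketched below. The helper diagnose_tool is hypothetical and is not necessarily how MIRUReader performs its own check. It relies on shutil.which, which is available in Python 3.3 and later.

```python
import os
import shutil


def diagnose_tool(name="primersearch"):
    """Report where an executable resolves on PATH, if anywhere."""
    location = shutil.which(name)  # None when the tool is not found
    path_dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    return {"tool": name, "found_at": location, "path_dirs": path_dirs}
```

If found_at is None, compare the directory that holds primersearch (for example, the bin directory of the conda environment where EMBOSS was installed) against the entries in path_dirs.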
