STMLST

STMLST is an effective approach and automatic bioinformatics tool for serotype identification of multiple microbial organisms.

STMLST identifies the serotypes of microbial organisms from the associations between key alleles, sequence types and serotypes.
STMLST first constructs an association database that collects the key alleles, sequence types and serotypes of microbial organisms.
STMLST then introduces a sigmoid scoring strategy to evaluate the candidate microbial organisms and sequence types.
Finally, STMLST infers the corresponding serotypes from the mapping between sequence types and serotypes in the association database, which completes the serotype identification.
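
The three steps above can be sketched in a few lines of Python. This is only an illustration, not the code in stmlst.py: the logistic formula, the `midpoint` and `steepness` parameters, the function names and the two dictionaries are all assumptions. The allele profile and ST 2041 are taken from the example later in this README, and the serotype counts are assumed values chosen so that the fractions match that example.

```python
import math

# Hypothetical miniature association database (assumption, for illustration).
# Key: (scheme, sorted allele profile) -> sequence type.
PROFILE_TO_ST = {
    ("senterica_achtman_2",
     (("aroC", "42"), ("dnaN", "169"), ("hemD", "48"), ("hisD", "16"),
      ("purE", "12"), ("sucA", "23"), ("thrA", "4"))): "2041",
}
# Sequence type -> how often each serotype label was recorded (assumed counts).
ST_TO_SEROTYPE_COUNTS = {"2041": {"unknown": 3, "Abaetetuba": 10, "other": 1}}


def sigmoid_score(identity, midpoint=95.0, steepness=1.0):
    """Map a percent identity (0..100) to a score in (0, 1) with a logistic curve."""
    return 1.0 / (1.0 + math.exp(-steepness * (identity - midpoint)))


def scheme_score(locus_identities):
    """Average the per-locus sigmoid scores of one scheme and scale to 0..100."""
    if not locus_identities:
        return 0.0
    scores = [sigmoid_score(value) for value in locus_identities.values()]
    return 100.0 * sum(scores) / len(scores)


def infer_serotypes(scheme, alleles):
    """Look up the sequence type of an allele profile, then its serotype fractions."""
    key = (scheme, tuple(sorted(alleles.items())))
    st = PROFILE_TO_ST.get(key)
    if st is None:
        return None, {}
    counts = ST_TO_SEROTYPE_COUNTS.get(st, {})
    total = sum(counts.values())
    probs = {name: n / total for name, n in counts.items()} if total else {}
    return st, probs
```

With this sketch, a scheme whose loci all match at high identity scores higher than one with a weaker match, and the serotype fractions come directly from the sequence type lookup.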

Install

Download program

1. Download the program: git clone https://github.com/lyotvincent/STMLST.git
2. Install the external tools:
2.1. Install miniconda from https://docs.conda.io/en/latest/miniconda.html or anaconda from https://www.anaconda.com/products/individual
2.2. Create a python3 environment (because QUAST depends on python3.7): conda create -n env_name python=3
2.3. Install the external tools by running this command in the conda environment: conda install any2fasta blast
2.supplement. To combine the serotype identification result with that of seqsero, install it by running: conda install seqsero2
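
After installation, a quick check that the external tools are visible in the active conda environment can save a failed run later. The sketch below assumes the executables are named `any2fasta`, `blastn` and `makeblastdb` (the last two are shipped by the conda `blast` package); which of them STMLST actually calls is an assumption, and `missing_tools` is a hypothetical helper, not part of STMLST.

```python
import shutil

# Executable names assumed to be provided by `conda install any2fasta blast`.
REQUIRED_TOOLS = ["any2fasta", "blastn", "makeblastdb"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```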

Download database

1. cd /PATH/TO/stmlst/db
2. Run python download_publist.py to download the data used by STMLST from PubMLST to the local folder.
3. Build the blastdb and the "key alleles-sequence types-serotypes" association database using python make_db.py.
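
The database steps above can also be run from a single Python script. This is a minimal sketch that assumes both scripts take no arguments and are run from inside the db folder, as in the steps above; `build_commands` and `build_database` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path

# Scripts named in the steps above, in the order they must run.
DB_SCRIPTS = ["download_publist.py", "make_db.py"]


def build_commands(python=sys.executable):
    """Return one command (as an argument list) per database script."""
    return [[python, script] for script in DB_SCRIPTS]


def build_database(db_dir):
    """Run the download and build scripts inside db_dir, stopping on the first error."""
    db_dir = Path(db_dir)
    for cmd in build_commands():
        subprocess.run(cmd, cwd=db_dir, check=True)
```

Calling `build_database("/PATH/TO/stmlst/db")` would then download the data and build the databases in one go; `check=True` makes a failed download stop the run before make_db.py starts.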

How to use STMLST

Simple usage: python stmlst.py -f XXX.fastq

Run python stmlst.py -h for help.

Parameters in the pipeline:

help:
  -h, --help            show this help message and exit
  -f FILE_NAME, --file_name FILE_NAME
                        input file
  -n NUM_THREADS, --num_threads NUM_THREADS
                        number of threads
  --min_id MIN_ID       Percent identity <Real, 0..100> DNA identity of full
                        allelle to consider 'similar' [~]
  --min_cov MIN_COV     DNA cov to report partial allele at all [?]
  --specified_scheme SPECIFIED_SCHEME
                        specified a scheme
  -s, --seqsero         fill null serotype with seqsero
  -v, --version         show program's version number and exit
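
When STMLST is called from another script, it can help to assemble the command line from the options listed above. The following is a sketch only: `stmlst_command` is a hypothetical helper, and the option values in the usage note are placeholders, not recommended settings.

```python
def stmlst_command(file_name, num_threads=None, min_id=None, min_cov=None,
                   specified_scheme=None, seqsero=False, script="stmlst.py"):
    """Build the argument list for one stmlst.py run from the documented options."""
    cmd = ["python", script, "-f", str(file_name)]
    if num_threads is not None:
        cmd += ["-n", str(num_threads)]
    if min_id is not None:
        cmd += ["--min_id", str(min_id)]
    if min_cov is not None:
        cmd += ["--min_cov", str(min_cov)]
    if specified_scheme is not None:
        cmd += ["--specified_scheme", str(specified_scheme)]
    if seqsero:
        cmd.append("-s")
    return cmd
```

For example, `stmlst_command("XXX.fastq", num_threads=4, seqsero=True)` yields a list that can be passed to `subprocess.run`.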

An example of running stmlst:

python stmlst/bin/stmlst.py -f SRR5986253.contigs.fa

[INFO] highest probability organism: ['senterica_achtman_2', 100.0, {'dnaN': '169', 'hemD': '48', 'thrA': '4', 'hisD': '16', 'purE': '12', 'aroC': '42', 'sucA': '23'}]
[INFO] serotype identification result table:
|ST|aroC|dnaN|hemD|hisD|purE|sucA|thrA|serotype|
|----|----|----|----|----|----|----|----|----|
|2041|42|169|48|16|12|23|4|unknown:0.21428571428571427;Abaetetuba:0.7142857142857143;other:0.07142857142857142|

The first row of the result indicates that “senterica” (scheme senterica_achtman_2, score 100.0) is the most likely organism to which the input data belongs.
The fields of the result table are given in the third row of the result: the first item is the sequence type (ST), the last item is the serotype, and the remaining items are the allele loci aroC, dnaN, hemD, hisD, purE, sucA, and thrA.
The text and numbers in the fifth row of the result correspond to the fields in the third row. “2041” is the serial number of the sequence type. “42, 169, 48, 16, 12, 23, 4” are the serial numbers of the alleles found at each allele locus. “unknown:0.21428571428571427;Abaetetuba:0.7142857142857143;other:0.07142857142857142” means that the input data has a probability of 0.7142857142857143 of belonging to serotype “Abaetetuba”; the remaining probability is split between the “unknown” (0.21428571428571427) and “other” (0.07142857142857142) categories.
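
To use the result in downstream scripts, the table and the serotype field can be parsed as follows. This is a minimal sketch that assumes the output keeps the exact layout shown in the example above (a header row, a separator row, then data rows, with serotypes written as name:probability pairs joined by semicolons); the function names are hypothetical.

```python
def parse_result_table(lines):
    """Turn the pipe-delimited result table into a list of {field: value} dicts."""
    header = [cell.strip() for cell in lines[0].strip().strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # lines[1] is the |----| separator row
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def parse_serotype_field(field):
    """Split 'name:prob;name:prob' into a {name: probability} dict."""
    probs = {}
    for item in field.split(";"):
        name, _, value = item.rpartition(":")
        probs[name] = float(value)
    return probs


table = [
    "|ST|aroC|dnaN|hemD|hisD|purE|sucA|thrA|serotype|",
    "|----|----|----|----|----|----|----|----|----|",
    "|2041|42|169|48|16|12|23|4|unknown:0.21428571428571427;"
    "Abaetetuba:0.7142857142857143;other:0.07142857142857142|",
]
row = parse_result_table(table)[0]
probs = parse_serotype_field(row["serotype"])
best = max(probs, key=probs.get)  # label with the highest probability
```

Here `row["ST"]` holds the sequence type and `best` holds the most probable serotype label for that row.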

Test Data and test records

test/md_v2/test_on_s_set.xlsx contains the data used in Section 3.1 of our paper. It consists of NGS data of single species.
test/md_v2/test_on_n_set.xlsx contains the data used in Section 3.2 of our paper. It consists of NGS data of single species.
test/md_v2/test_on_nanopore_sequencing_data.xlsx contains the data used in Section 3.3 of our paper. It consists of nanopore sequencing data of single species.
test/md_v2/test_on_multiple_microbial_organisms_set.xlsx contains the data used in Section 3.4 of our paper. It consists of NGS data of multiple microbial organisms.
