A tool easily taken advantage of for in silico serogrouping of Pseudomonas aeruginosa isolates
Pseudomonas aeruginosa serotyper (PAst) is available as a web-service from CGE - PAst. The web-service works great, but is not the most efficient for 100s of genomes. I also looked into the original implementation available at PAst. I tried my best to start tweaking it to meet my needs, but unfortunately my Perl skills have dulled over the years! And it ended up being easier and quicker to just rewrite it in Python.
pasty
is a tool to identify the serogroup of Pseudomonas aeruginosa isolates. Using an
input assembly (uncompressed or gzip-compressed), the sequences are blasted against a set of O-antigens.
The serogroup is then predicted based on these results.
The serogroup logic is based on Table 1 of the original PAst publication (citation below).
pasty
Is available from Bioconda
mamba create -n pasty -c conda-forge -c bioconda pasty
conda activate pasty
pasty --help
Usage: pasty [OPTIONS]
A tool easily taken advantage of for in silico serogrouping of Pseudomonas aeruginosa isolates
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────╮
│ --version Show the version and exit. │
│ * --assembly TEXT Input assembly in FASTA format (gzip is OK) [required] │
│ --db TEXT Input database in uncompressed FASTA format [default: db/OSAdb.fasta] │
│ --prefix TEXT Prefix to use for output files [default: basename of input] │
│ --outdir TEXT Directory to save output files [default: ./] |
│ --min_pident INTEGER Minimum percent identity to count a hit [default: 95] │
│ --min_coverage INTEGER Minimum percent coverage to count a hit [default: 95] │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯
An assembly in FASTA format (compressed with gzip, or uncompressed) to predict the serogroup on.
A FASTA file with nucleotide sequences of each O-antigen and WyzB. In most cases using the default value will be all that is needed.
The prefix to use for the output files. If a prefix is not given, the basename of the input assembly will be used.
The directory to save result files. If the directory does not exist is will be created.
The minimum percent identity for a blast hit to be considered for serogrouping. The is compared
against the pident
column of the blast output.
The minimum coverage of a O-antigen to be considered for serogrouping. This looks at the percent
of the O-antigen that was aligned to. This calculation does include mismatches and gaps,
but those should be expected to be minimal if --min_pident
is set to the default (95%) or
similar.
Filename | Description |
---|---|
{PREFIX}.tsv |
A tab-delimited file with the predicted serogroup |
{PREFIX}.blasn.tsv |
A tab-delimited file of all blast hits |
{PREFIX}.details.tsv |
A tab-delimited file with details for each serogroup |
This file will contain the final predicted serogroup based on highest coverage. Here's what to expect the output to look like:
sample serogroup coverage fragments comment
GCF_003000695 O2 100.00 1
Column Name | Description |
---|---|
sample | Name of the sample processed |
serogroup | The predicted serogroup |
coverage | The percent of the O-antigen that was aligned to |
fragments | The number of blast hits included in the prediction (fewer the better) |
comment | A small comment when no serogroup can be predicted |
The blast results include sequences which won't display very nicely here. But, don't fret,
it is the standard -outfmt 6
, so tab-delimited and easy to parse.
This file provides the the coverage and number of fragments for each of the serogroups. This can be useful for a deeper review, and it should look like the following:
sample serogroup coverage fragments
GCF_003000695 O1 12.52 2
GCF_003000695 O2 100.00 1
GCF_003000695 O3 1.43 1
GCF_003000695 O4 13.86 2
GCF_003000695 O6 14.78 2
GCF_003000695 O7 11.82 2
GCF_003000695 O9 11.42 1
GCF_003000695 O10 12.97 2
GCF_003000695 O11 15.87 2
GCF_003000695 O12 0.00 0
GCF_003000695 O13 16.22 2
GCF_003000695 WzyB 0.00 0
Column Name | Description |
---|---|
sample | Name of the sample processed |
serogroup | The predicted serogroup |
coverage | The percent of the O-antigen that was aligned to |
fragments | The number of blast hits included in the prediction (fewer the better) |
The original implementation was called PAst, and given this is a Python version I originally
went with PAst-py. But decided, "nah, let's go with pasty". Hopefully pasty
will be
your patsy when it comes to serogrouping your P. aeruginosa isolates.
The original PAst did not include a license, so I selected Apache 2.0 as the license to match other CGE tools.
If you use this tool please cite the following:
Serogroup Method
Thrane SW, Taylor VL, Lund O, Lam JS, Jelsbak L Application of Whole-Genome Sequencing Data for O-Specific Antigen Analysis and In Silico Serotyping of Pseudomonas aeruginosa Isolates. Journal of Clinical Microbiology, 54(7), 1782–1788. (2016)
pasty
Petit III RA pasty: A tool easily taken advantage of for in silico serogrouping of Pseudomonas aeruginosa isolates (GitHub)
BLAST+
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009)