Kleborate is a tool to screen Klebsiella genome assemblies for:
- MLST sequence type
- species (e.g. K. pneumoniae, K. quasipneumoniae, K. variicola, etc.)
- ICEKp associated virulence loci: yersiniabactin (ybt), colibactin (clb)
- virulence plasmid associated loci: salmochelin (iro), aerobactin (iuc), hypermucoidy (rmpA, rmpA2)
- antimicrobial resistance genes, including quinolone resistance SNPs and colistin resistance truncations
- K (capsule) and O antigen (LPS) serotype prediction, via wzi alleles and Kaptive
A manuscript describing the Kleborate software in full is currently in preparation. (Note that the BLAST logic has been checked in the light of this article describing a common misconception regarding the BLAST parameter -max_target_seqs.)
In the meantime, if you use Kleborate, please cite the component schemes that you report:
Yersiniabactin and colibactin (ICEKp): Lam, MMC. et al. Genetic diversity, mobilisation and spread of the yersiniabactin-encoding mobile element ICEKp in Klebsiella pneumoniae populations. Microbial Genomics (2018).
Kaptive for capsule (K) serotyping: Wyres, KL. et al. Identification of Klebsiella capsule synthesis loci from whole genome data. Microbial Genomics (2016).
Kaptive for O antigen (LPS) serotyping: Wick, RR. et al. Kaptive Web: user-friendly capsule and lipopolysaccharide serotype prediction for Klebsiella genomes. Journal of Clinical Microbiology (2018).
Table of Contents
- Basic usage
- Full usage
- Screening details
- Example output
- Typing from Illumina reads
- Contact us
Klebsiella pneumoniae (Kp) is a commensal bacterium that causes opportunistic infections, with a handful of hypervirulent lineages recognised as true human pathogens. Evidence is now mounting that other Kp strains carrying acquired siderophores (yersiniabactin, salmochelin and aerobactin) and/or the genotoxin colibactin are also highly pathogenic and can cause invasive disease.
Our goal is to help identify emerging pathogenic Kp lineages, and to make it easy for people who use genomic surveillance to monitor antibiotic resistance to also look out for the convergence of antibiotic resistance and virulence. To facilitate that, in this repo we share code for genotyping virulence and resistance genes in K. pneumoniae. A table of pre-computed results for 2500 public Klebs genomes is also provided in the data directory.
- Python (either 2.7 or 3)
- setuptools (required to install Kleborate)
- To install: pip install setuptools
- BLAST+ command line tools (
- Version 2.2.30 or later is needed, as earlier versions have a bug with the
- Mash is required to use the --species option
As input, Kleborate takes Klebsiella genome assemblies (either completed or draft). If you have unassembled reads, try assembling them with our Unicycler assembler (which works great on Illumina or hybrid Illumina + Nanopore/PacBio reads).
Kleborate can be installed to your system for easy usage:
git clone --recursive https://github.com/katholt/Kleborate.git
cd Kleborate
python setup.py install
kleborate -h
Alternatively, you can clone and run Kleborate without installation directly from its source directory:
git clone --recursive https://github.com/katholt/Kleborate.git
Kleborate/kleborate-runner.py -h
See examples below to test out your installation on some public genome data.
Screen some genomes for MLST and virulence loci:
kleborate -o results.txt -a *.fasta
Also screen for resistance genes:
kleborate --resistance -o results.txt -a *.fasta
Turn on all of Kleborate's optional screens (resistance genes, species check and both K and O loci):
kleborate --all -o results.txt -a *.fasta
Screen everything in a set of gzipped assemblies:
kleborate --all -o results.txt -a *.fasta.gz
usage: kleborate -a ASSEMBLIES [ASSEMBLIES ...] [-r] [-s] [--kaptive_k] [--kaptive_o] [-k] [--all]
                 [-o OUTFILE] [--kaptive_k_outfile KAPTIVE_K_OUTFILE]
                 [--kaptive_o_outfile KAPTIVE_O_OUTFILE] [-h] [--version]

Kleborate: a tool for characterising virulence and resistance in Klebsiella

Required arguments:
  -a ASSEMBLIES [ASSEMBLIES ...], --assemblies ASSEMBLIES [ASSEMBLIES ...]
                        FASTA file(s) for assemblies, can be gzipped (.gz)

Screening options:
  -r, --resistance      Turn on resistance genes screening (default: no resistance gene screening)
  -s, --species         Turn on Klebsiella species identification (requires Mash, default: no species identification)
  --kaptive_k           Turn on Kaptive screening of K loci (default: do not run Kaptive for K loci)
  --kaptive_o           Turn on Kaptive screening of O loci (default: do not run Kaptive for O loci)
  -k, --kaptive         Equivalent to --kaptive_k --kaptive_o
  --all                 Equivalent to --resistance --species --kaptive

Output options:
  -o OUTFILE, --outfile OUTFILE
                        File for detailed output (default: Kleborate_results.txt)
  --kaptive_k_outfile KAPTIVE_K_OUTFILE
                        File for full Kaptive K locus output (default: do not save Kaptive K locus results to separate file)
  --kaptive_o_outfile KAPTIVE_O_OUTFILE
                        File for full Kaptive O locus output (default: do not save Kaptive O locus results to separate file)

Help:
  -h, --help            Show this help message and exit
  --version             Show program's version number and exit
Multilocus sequence typing (MLST) of Klebsiella follows the schemes described at the Klebsiella pneumoniae BIGSdb hosted at the Pasteur Institute. The alleles and schemes are stored in the data directory of this repository.
Some notes on Kleborate's MLST calls:
- Kleborate makes an effort to report the closest matching ST / clonal group if a precise match is not found.
- Imprecise allele matches are indicated with a
- Imprecise ST calls are indicated with -nLV, where n indicates the number of loci that disagree with the ST reported. So 258-1LV indicates a single-locus variant (SLV) of ST258, i.e. 6/7 loci match ST258.
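The -nLV naming can be illustrated with a minimal Python sketch. This is not Kleborate's own code: the function name `call_st`, the dictionary layout and the toy allele numbers are all assumptions made for illustration, and the real tool may apply extra rules (for example a limit on how many loci may disagree).

```python
def call_st(profiles, alleles):
    """Return the closest ST name, with an '-nLV' suffix if n loci disagree.

    profiles: dict mapping ST name -> tuple of allele numbers (hypothetical layout)
    alleles:  tuple of allele numbers called from the assembly, in the same locus order
    """
    best_st, best_mismatches = None, None
    for st, profile in profiles.items():
        # Count the loci whose allele differs from this profile
        mismatches = sum(1 for a, b in zip(profile, alleles) if a != b)
        if best_mismatches is None or mismatches < best_mismatches:
            best_st, best_mismatches = st, mismatches
    if best_mismatches == 0:
        return best_st
    return "{}-{}LV".format(best_st, best_mismatches)


# Toy 7-locus profiles: the allele numbers are made up, not real scheme data
toy_profiles = {"258": (1, 2, 3, 4, 5, 6, 7), "11": (1, 2, 3, 4, 5, 6, 8)}

print(call_st(toy_profiles, (1, 2, 3, 4, 5, 6, 7)))  # exact match
print(call_st(toy_profiles, (1, 2, 3, 4, 5, 9, 7)))  # one locus differs from "258"
```

In this sketch ties between equally close profiles are resolved by dictionary order, which is an arbitrary choice.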
Kleborate examines four key virulence loci in Klebsiella: the siderophores yersiniabactin (ybt), aerobactin (iuc) and salmochelin (iro), and the genotoxin colibactin (clb).
- For each of these loci, Kleborate will call a sequence type using the same logic as the MLST described above.
- If the locus is not detected, Kleborate reports the ST as 0 and the lineage as -.
- Kleborate will also report the lineage associated with the virulence sequence types, as outlined below and detailed in the corresponding papers (for yersiniabactin, we also report the predicted ICEKp structure based on the ybt lineage assignment).
Yersiniabactin and colibactin (primarily mobilised by ICEKp)
We recently explored the diversity of the Kp integrative conjugative element (ICEKp), which mobilises the yersiniabactin locus ybt, using genomic analysis of a diverse set of 2498 Klebsiella genomes (see this paper). Overall, we found ybt in about a third of all Kp genomes and clb in about 14%. We identified 17 distinct lineages of ybt (see figure) embedded within 14 structural variants of ICEKp that can integrate at any of four tRNA-Asn sites in the chromosome. Three of the 17 ybt lineages were associated with three lineages of colibactin, with which they are co-located in the same ICE structure designated ICEKp10. One ICE structure (ICEKp1) carries the salmochelin synthesis locus iro and rmpA hypermucoidy gene in addition to ybt (lineage 2). Additionally, we identified a lineage of ybt that is plasmid-encoded, representing a new mechanism for ybt dispersal in Kp populations. Based on this analysis, we developed an MLST-style approach for assigning yersiniabactin sequence types (YbST) and colibactin sequence types (CbST), which is implemented in Kleborate. Annotated reference sequences for each ICEKp variant are included in the data directory of this repository.
Aerobactin and salmochelin (primarily mobilised by virulence plasmids)
We further explored the genetic diversity of the aerobactin (iuc) and salmochelin (iro) loci among a dataset of 2733 Klebsiella genomes (see this preprint). We identified five iro and six iuc lineages (see figure), each of which was associated with a specific location within Kp genomes. The most common lineages were iuc1 and iro1, which are found together on the virulence plasmid KpVP-1 (typified by pK2044 or pLVPK common to the hypervirulent clones ST23, ST86, etc). iuc2 and iro2 lineages were associated with the alternative virulence plasmid KpVP-2 (typified by Kp52.145 plasmid II from the K2 ST66 lab strain known as Kp52.145 or B5055). iuc5 and iro5 originate from E. coli and are carried (often together) on E. coli plasmids that can transfer to Kp. The lineages iuc2A, iuc3 and iro4 were associated with other novel plasmids that have not previously been described in Kp. In addition, we found the salmochelin locus present in ICEKp1 constitutes its own lineage iro3, and the aerobactin locus present in the chromosome of ST67 Kp subsp rhinoscleromatis strains constitutes its own lineage iuc4. Based on this analysis, we developed an MLST-style approach for assigning aerobactin sequence types (AbST) and salmochelin sequence types (SmST), which is implemented in Kleborate.
Please note that the aerobactin iuc and salmochelin iro lineage names have been updated between Kleborate version 0.2.0 and 0.3.0 to match the nomenclature used in the preprint. The AbST and SmST allele numbers are unchanged. Lineage name re-assignments are:
| v0.2.0 | v0.3.0 | location (see preprint for details) |
|--------|--------|-------------------------------------|
| iuc 2  | iuc 1  | KpVP-1 (e.g. pLVPK) |
| iuc 3B | iuc 2  | KpVP-2 |
| iuc 3A | iuc 2A | other plasmids |
| iuc 4  | iuc 3  | other plasmids |
| iuc 5  | iuc 4  | rhinoscleromatis chromosome |
| iuc 1  | iuc 5  | E. coli variant |
| iro 3  | iro 1  | KpVP-1 (e.g. pLVPK) |
| iro 4  | iro 2  | KpVP-2 |
| iro 5  | iro 3  | ICEKp1 |
| iro 2  | iro 4  | Enterobacter variant |
| iro 1  | iro 5  | E. coli variant |
Kleborate screens for alleles of the rmpA and rmpA2 genes, which result in a hypermucoid phenotype by upregulating capsule production.
- The two genes share ~83% nucleotide identity so are easily distinguished, and are reported in separate columns.
- Alleles for each gene are sourced from the BIGSdb. For rmpA, we have also mapped these alleles to the various known locations for rmpA in Klebsiella (i.e. major virulence plasmids KpVP-1 and KpVP-2; other virulence plasmids simply designated as VP; ICEKp1 and the chromosome in rhinoscleromatis).
- Unique (non-overlapping) nucleotide BLAST hits with >95% identity and >50% coverage are reported. Note multiple hits to the same gene are reported if found (e.g. the NTUH-K2044 genome carries rmpA in the virulence plasmid and also in ICEKp1, which is reported in the rmpA column as rmpA_11(ICEKp1),rmpA_2(KpVP-1)).
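The hit-filtering rule for rmpA/rmpA2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Hit` record, the function `select_hits`, the coordinates and the rule for choosing between overlapping hits (keep the one with higher identity, then coverage) are hypothetical and are not taken from Kleborate's source.

```python
from collections import namedtuple

# Hypothetical record for one nucleotide BLAST hit (coordinates on the assembly contig)
Hit = namedtuple("Hit", "allele contig start end identity coverage")


def select_hits(hits, min_identity=95.0, min_coverage=50.0):
    """Keep hits above both thresholds that do not overlap an already-kept hit."""
    passing = [h for h in hits
               if h.identity > min_identity and h.coverage > min_coverage]
    # Assumption: when hits overlap, prefer the better-scoring one
    passing.sort(key=lambda h: (h.identity, h.coverage), reverse=True)
    kept = []
    for h in passing:
        overlaps = any(k.contig == h.contig and h.start <= k.end and k.start <= h.end
                       for k in kept)
        if not overlaps:
            kept.append(h)
    return kept


# Made-up hits: two rmpA copies at different locations, one overlapping hit, one weak hit
example_hits = [
    Hit("rmpA_2", "plasmid", 100, 730, 100.0, 100.0),
    Hit("rmpA_11", "chromosome", 5000, 5630, 100.0, 100.0),
    Hit("rmpA_9", "plasmid", 100, 730, 97.0, 100.0),   # overlaps rmpA_2, dropped
    Hit("rmpA_4", "contig_3", 10, 640, 90.0, 100.0),   # identity too low, dropped
]
print(",".join(h.allele for h in select_hits(example_hits)))
```

Two copies of the same gene at different locations both survive the filter, which mirrors how multiple rmpA hits are reported in a single column.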
Resistance gene detection
By using the --resistance option, Kleborate will screen for resistance genes against the ARG-Annot database of acquired resistance genes (SRST2 version), which includes allelic variants. It attempts to report the best matching variant for each locus in the genome:
- Imprecise allele matches are indicated with
- If the length of match is less than the length of the reported allele (i.e. a partial match), this is indicated with
- Note that narrow spectrum beta-lactamases AmpH and SHV are core genes in K. pneumoniae and so should be detected in most genomes.
- These genes include: SHV (K. pneumoniae), LEN (K. variicola), OKP (K. quasipneumoniae) and AmpH (all of the above species)
- See this paper for more information.
- Note that oqxAB are also core genes in K. pneumoniae, but have been removed from this version of the ARG-Annot DB as they don't actually confer resistance to fluoroquinolones.
The --resistance option also turns on screening for resistance-conferring mutations:
- Fluoroquinolone resistance SNPs: GyrA 83 & 87 and ParC 80 & 84.
- Colistin resistance due to truncation or loss of MgrB or PmrB (less than 90% gene coverage counts as a truncation/loss).
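The 90% coverage rule for MgrB/PmrB can be written as a one-line check. The sketch below is illustrative only: the function name `is_truncated_or_lost` and its arguments are hypothetical, and how Kleborate actually measures gene coverage is not described here.

```python
def is_truncated_or_lost(aligned_length, reference_length, min_coverage=90.0):
    """Return True if the gene covers less than min_coverage percent of the reference.

    aligned_length:   length of the reference gene covered by the best hit (0 if no hit)
    reference_length: full length of the reference gene
    """
    coverage = 100.0 * aligned_length / reference_length
    return coverage < min_coverage


print(is_truncated_or_lost(100, 100))  # full-length gene
print(is_truncated_or_lost(60, 100))   # 60% coverage counts as truncation/loss
print(is_truncated_or_lost(0, 100))    # gene absent counts as truncation/loss
```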
- AGly (aminoglycosides)
- Bla (beta-lactamases)
- Bla_broad (broad spectrum beta-lactamases)
- Bla_broad_inhR (broad spectrum beta-lactamases with resistance to beta-lactamase inhibitors)
- Bla_Carb (carbapenemase)
- Bla_ESBL (extended spectrum beta-lactamases)
- Bla_ESBL_inhR (extended spectrum beta-lactamases with resistance to beta-lactamase inhibitors)
- Fcyn (fosfomycin)
- Flq (fluoroquinolones)
- Gly (glycopeptides)
- MLS (macrolides)
- Ntmdz (nitroimidazole, e.g. metronidazole)
- Phe (phenicols)
- Rif (rifampin)
- Sul (sulfonamides)
- Tet (tetracyclines)
- Tmt (trimethoprim)
Scores and counts
Kleborate outputs a simple categorical virulence score, and if resistance screening is enabled, an antimicrobial resistance score as well. These scores provide a rough categorisation of the strains to facilitate monitoring resistance-virulence convergence:
- The virulence score ranges from 0 to 5:
  - 0 = no virulence loci
  - 1 = yersiniabactin only
  - 2 = yersiniabactin and colibactin, or colibactin only
  - 3 = aerobactin and/or salmochelin only (without yersiniabactin or colibactin)
  - 4 = aerobactin and/or salmochelin with yersiniabactin (without colibactin)
  - 5 = yersiniabactin, colibactin and aerobactin and/or salmochelin
- The resistance score ranges from 0 to 3:
  - 0 = no ESBL, no carbapenemase (regardless of colistin resistance)
  - 1 = ESBL, no carbapenemase (regardless of colistin resistance)
  - 2 = Carbapenemase without colistin resistance (regardless of ESBL)
  - 3 = Carbapenemase with colistin resistance (regardless of ESBL)
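The two scoring schemes above translate naturally into a pair of small functions. The following Python sketch is not Kleborate's implementation; the function names and boolean arguments are hypothetical. One case is not covered by the list above (aerobactin/salmochelin with colibactin but without yersiniabactin), and the sketch scores it as 5 purely as an assumption.

```python
def virulence_score(ybt, clb, iuc, iro):
    """Categorical virulence score (0-5) from presence/absence of four loci."""
    if iuc or iro:
        if clb:
            # Assumption: clb plus iuc/iro scores 5 even if ybt is absent
            return 5
        if ybt:
            return 4
        return 3
    if clb:
        return 2
    if ybt:
        return 1
    return 0


def resistance_score(esbl, carbapenemase, colistin_resistance):
    """Categorical resistance score (0-3)."""
    if carbapenemase:
        return 3 if colistin_resistance else 2
    if esbl:
        return 1
    return 0


# Yersiniabactin only, with an ESBL and a carbapenemase but no colistin resistance
print(virulence_score(ybt=True, clb=False, iuc=False, iro=False),
      resistance_score(esbl=True, carbapenemase=True, colistin_resistance=False))
```

Note that colistin resistance only changes the resistance score when a carbapenemase is also present, as stated in the list.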
When resistance screening is enabled, Kleborate also quantifies how many resistance genes are present and how many resistance classes have at least one gene. Since a resistance class can have multiple genes (as is often the case for the intrinsic genes in the Bla class), the gene count is typically higher than the class count.
By using the --species option, Kleborate will attempt to identify the species of Klebsiella. It does this by comparing the assembly using Mash to a curated set of Klebsiella assemblies from NCBI and reporting the species of the closest match. Kleborate considers a Mash distance of ≤ 0.01 to be a strong species match. A distance of > 0.01 and ≤ 0.03 is a weak match and might indicate that your sample is a novel lineage or a hybrid between multiple Klebsiella species.
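The distance thresholds can be expressed as a short Python sketch. The function name `species_match_strength` is hypothetical, and the label returned for distances above 0.03 is an assumption, since the text above only defines the strong and weak categories.

```python
def species_match_strength(mash_distance):
    """Classify the Mash distance to the closest reference assembly."""
    if mash_distance <= 0.01:
        return "strong"
    if mash_distance <= 0.03:
        return "weak"
    # Assumption: anything more distant is treated as no confident match
    return "no match"


for distance in (0.002, 0.02, 0.08):
    print(distance, species_match_strength(distance))
```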
Here is an annotated tree of the reference assemblies, made by mashtree:
Kleborate is designed for the well-studied group of species at the top right of the tree which includes the 'big three': pneumoniae, quasipneumoniae (two subspecies) and variicola. K. quasivariicola is more recently characterised and described here: Long 2017. The Kp5 group does not yet have a species name and was described in this paper: Blin 2017. More distant Klebsiella species (oxytoca, michiganensis, grimontii and aerogenes) are also included, but the virulence profiles of these are less well characterised and deserve further attention.
Kleborate will also call other species in Enterobacteriaceae, as different species sometimes end up in Klebsiella collections. These names are again assigned based on the clades in a mashtree, but were not as carefully curated as the Klebsiella species (so take them with a grain of salt).
Basic capsule prediction with wzi allele typing
By default, Kleborate will report the closest match amongst the wzi alleles in the BIGSdb. This is a marker of capsule locus (KL) type, which is highly predictive of capsule (K) serotype. Although there is not a 1-1 relationship between wzi allele and KL/K type, there is a strong correlation (see Wyres et al, MGen 2016). The wzi allele can provide a handy way of spotting the virulence-associated types (wzi1=K1, wzi2=K2, wzi5=K5); or spotting capsule switching within clones, e.g. you can tell which ST258 lineage you have from the wzi type (wzi154: the main lineage II; wzi29: recombinant lineage I; others: probably other recombinant lineages).
Capsule (K) and O antigen (LPS) serotype prediction using Kaptive
You can optionally turn on capsule typing using the dedicated capsule typing tool Kaptive:
- --kaptive_k turns on Kaptive screening of the K locus
- --kaptive_o turns on Kaptive screening of the O locus
- --kaptive turns on both (is equivalent to --kaptive_k --kaptive_o)
This will significantly increase the runtime of Kleborate, but provide much more detailed information about the K and/or O loci and their genes.
Run these commands to download some well-known Klebsiella genomes and run Kleborate with all optional screens enabled:
wget -O NTUH-K2044.fasta.gz ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/009/885/GCA_000009885.1_ASM988v1/GCA_000009885.1_ASM988v1_genomic.fna.gz
wget -O SGH10.fasta.gz ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/813/595/GCA_002813595.1_ASM281359v1/GCA_002813595.1_ASM281359v1_genomic.fna.gz
wget -O Klebs_HS11286.fasta.gz ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/240/185/GCA_000240185.2_ASM24018v2/GCA_000240185.2_ASM24018v2_genomic.fna.gz
wget -O MGH78578.fasta.gz ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/016/305/GCA_000016305.1_ASM1630v1/GCA_000016305.1_ASM1630v1_genomic.fna.gz
kleborate --all -o results.txt -a *.fasta.gz
Concise results (stdout)
These are the concise Kleborate results that it prints to the terminal:
|Klebs_HS11286||Klebsiella pneumoniae||ST11||1||2||ybt 9; ICEKp3||15||-||0||-||0||-||0||-||-||wzi74||KL103||Very high||O2v1||Very high||StrB;StrA*;AadA2*;RmtB;Aac3-IId*?||-||-||GyrA-83I;ParC-80I||-||-||-||-||-||SulII||TetG||DfrA12?||AmpH*||KPC-2||CTX-M-14;CTX-M-14||-||SHV-11||TEM-30*;TEM-30*;TEM-30*|
|NTUH-K2044||Klebsiella pneumoniae||ST23||4||0||ybt 2; ICEKp1||326||-||0||iuc 1||1||iro 3||18-1LV||rmpA_11 (ICEKp1),rmpA_2 (KpVP-1)||rmpA2_3||wzi1||KL1||Perfect||O1v2||Very high||-||-||-||-||-||-||-||-||-||-||-||-||AmpH;SHV-190*||-||-||-||-||-|
|SGH10||Klebsiella pneumoniae||ST23||5||0||ybt 1; ICEKp10||53||clb 2||29||iuc 1||1||iro 1||2||rmpA_2 (KpVP-1)||rmpA2_6*||wzi1||KL1||Very high||O1v2||Very high||-||-||-||-||-||-||-||-||-||-||-||-||AmpH;SHV-190*||-||-||-||-||-|
Full results (file)
Here are the full Kleborate results, written to results.txt:
|Klebs_HS11286||Klebsiella pneumoniae||strong||7||5333942||5333942||ST11||1||2||9||17||ybt 9; ICEKp3||15||-||0||-||0||-||0||-||-||wzi74||KL103||*||Very high||96.69%||O2v1||none||Very high||97.72%||ST11||3||3||1||1||1||1||4||14||11||14||5||9||22||19||10||5||11||11||-||-||-||-||-||-||-||-||-||-||-||-||-||-||-||StrB;StrA*;AadA2*;RmtB;Aac3-IId*?||-||-||GyrA-83I;ParC-80I||-||-||-||-||-||SulII||TetG||DfrA12?||AmpH*||KPC-2||CTX-M-14;CTX-M-14||-||SHV-11||TEM-30*;TEM-30*;TEM-30*|
|NTUH-K2044||Klebsiella pneumoniae||strong||2||5248520||5248520||ST23||4||0||0||0||ybt 2; ICEKp1||326||-||0||iuc 1||1||iro 3||18-1LV||rmpA_11 (ICEKp1),rmpA_2 (KpVP-1)||rmpA2_3||wzi1||KL1||none||Perfect||100.00%||O1v2||none||Very high||99.13%||ST23||2||1||1||1||9||4||12||9||7||9||6||5||1||1||6||7||7||6||-||-||-||-||-||-||-||-||-||-||-||-||-||-||-||-||-||-||-||-||-||-||-||-||-||-||-||AmpH;SHV-190*||-||-||-||-||-|
|SGH10||Klebsiella pneumoniae||strong||2||5485114||5485114||ST23||5||0||0||0||ybt 1; ICEKp10||53||clb 2||29||iuc 1||1||iro 1||2||rmpA_2 (KpVP-1)||rmpA2_6*||wzi1||KL1||none||Very high||100.00%||O1v2||none||Very high||99.11%||ST23||2||1||1||1||9||4||12||2||2||2||2||2||6||124||2||2||2||2||2||2||2||2||2||2||2||3||2||2||2||2||2||2||2||-||-||-||-||-||-||-||-||-||-||-||-||AmpH;SHV-190*||-||-||-||-||-|
Typing from Illumina reads
MLST assignment can also be achieved directly from reads using SRST2:
- Download the YbST, CbST, AbST, SmST allele sequences and profile tables from the data directory in this repository.
- Install SRST2 if you don't already have it (git clone https://github.com/katholt/srst2).
- Run SRST2, setting --mlst_db and --mlst_definitions to point to the YbST or CbST allele sequences and profile tables.
Note that currently you can only run SRST2 with one MLST scheme at a time, so in order to type MLST, YbST and CbST you will need to run three separate commands:
srst2 --input_pe reads_1.fastq.gz reads_2.fastq.gz --output YbST --log --mlst_db ybt_alleles.fasta --mlst_definitions YbST_profiles.txt
srst2 --input_pe reads_1.fastq.gz reads_2.fastq.gz --output CbST --log --mlst_db clb_alleles.fasta --mlst_definitions CbST_profiles.txt
srst2 --input_pe reads_1.fastq.gz reads_2.fastq.gz --output Klebs --log --mlst_db Klebsiella_pneumoniae.fasta --mlst_definitions kpnuemoniae.txt
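Because each scheme needs its own SRST2 run, it can be convenient to generate the commands from a small table of schemes. The Python sketch below only builds the argument lists, using the flags and file names from the commands above; the helper name `srst2_commands` and the `schemes` dictionary are hypothetical conveniences, not part of SRST2 or Kleborate.

```python
def srst2_commands(reads_1, reads_2, schemes):
    """Build one srst2 argument list per MLST scheme.

    schemes: dict mapping output prefix -> (allele FASTA, profile table)
    """
    commands = []
    for output, (alleles, profiles) in schemes.items():
        commands.append([
            "srst2", "--input_pe", reads_1, reads_2,
            "--output", output, "--log",
            "--mlst_db", alleles, "--mlst_definitions", profiles,
        ])
    return commands


# File names as used in the commands above
schemes = {
    "YbST": ("ybt_alleles.fasta", "YbST_profiles.txt"),
    "CbST": ("clb_alleles.fasta", "CbST_profiles.txt"),
    "Klebs": ("Klebsiella_pneumoniae.fasta", "kpnuemoniae.txt"),
}

for command in srst2_commands("reads_1.fastq.gz", "reads_2.fastq.gz", schemes):
    print(" ".join(command))
```

Each argument list could then be passed to `subprocess.run` once SRST2 is installed and on the PATH.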
Kleborate is under active development with many other Klebs genomic analysis tools and projects in progress.
Please get in touch via the GitHub issues tracker if you have any issues, questions or ideas.
For more on our lab, including other software, see http://holtlab.net
Stop! Kleborate and listen
ICEKp is back with my brand-new invention
If there was a problem, Klebs'll solve it
Check out the hook while Klebs evolves it