fix and polish readme
oschwengers committed Jun 25, 2019
1 parent 1b5749f commit b52cec3
Showing 1 changed file with 35 additions and 36 deletions.
71 changes: 35 additions & 36 deletions README.md
@@ -1,5 +1,4 @@
# Platon: Plasmid contig classification and characterization for short read draft assemblies.
Author: Oliver Schwengers (oliver.schwengers@computational.bio.uni-giessen.de)


## Contents
@@ -13,32 +12,27 @@ Author: Oliver Schwengers (oliver.schwengers@computational.bio.uni-giessen.de)
- [Citation](#citation)



## Description
Platon classifies contigs from bacterial WGS short read assemblies as plasmid or
chromosome contigs, i.e. they either originate from a plasmid or a chromosome, respectively.
To this end, Platon takes advantage of pre-computed protein distribution statistics
and computes mean protein scores (**MPS**) for each contig and finally tests
them against certain thresholds. Contigs below a sensitivity threshold get
classified as chromosome, contigs above a specificity threshold get classified
as plasmid. Contigs whose protein score lies between these thresholds get
chromosome contigs. To this end, Platon computes mean protein scores (**MPS**)
based on pre-computed protein distribution statistics and tests them against
specific thresholds. Contigs whose MPS does not reach the thresholds are
comprehensively characterized and finally classified following a heuristic approach.

In detail Platon conducts three analysis steps. First, it predicts open reading
frames and searches the coding sequences against a database of marker genes.
These are based on the NCBI RefSeq PCLA clusters, for which we automatically
pre-computed individual protein scores capturing how likely a given protein is
to be found on each kind of replicon, i.e. on a plasmid or a
chromosome. Platon then calculates the **MPS** for each contig and either
classifies them as chromosome if the **MPS** is below a sensitivity cutoff
(corresponding to 95 % sensitivity) or as plasmid if the **MPS** is above a
specificity cutoff, corresponding to 99.99 % specificity.
These thresholds have been calculated by Monte Carlo simulations of artificial
contigs created from closed RefSeq chromosome and plasmid sequences. In a second
Platon conducts three analysis steps. First, it predicts open reading
frames and searches the coding sequences against a custom and pre-computed database
comprising marker protein sequences and probability scores. These scores express
the empirical probability of finding a given protein on each kind of replicon,
based on complete NCBI RefSeq genomes and plasmids.
Platon then calculates the MPS for each contig and either classifies them
as chromosome if the MPS is below a sensitivity cutoff (95% sensitivity) or as
plasmid if the MPS is above a specificity cutoff (99.99% specificity).
These thresholds have been calculated by Monte Carlo simulations of artificial
contigs created from complete RefSeq chromosome and plasmid sequences. In a second
step contigs passing the sensitivity filter get comprehensively characterized.
Hereby, Platon tries to circularize the contig sequences, searches for rRNA,
replication, mobilization and conjugation genes as well as incompatibility group
DNA probes and finally performs a BLAST search against a plasmid database.
DNA probes and finally performs a BLAST search against the NCBI plasmid database.
In a third step, Platon finally classifies all remaining contigs based on a heuristic
approach, i.e. a decision tree of simple rules exploiting all information at hand.
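
The first analysis step described above can be sketched roughly as follows. This is only an illustration, not Platon's actual code: the function names, the cutoff values and the handling of contigs without scored proteins are assumptions made for this example; the real thresholds come from the Monte Carlo simulations mentioned above.

```python
from statistics import mean

# Hypothetical cutoffs, chosen only for illustration.
# The real values are derived by Platon's authors and are not stated here.
SENSITIVITY_CUTOFF = -1.0
SPECIFICITY_CUTOFF = 1.0


def mean_protein_score(protein_scores):
    """Return the mean of the per-protein scores of one contig.

    Returns None for a contig without any scored protein
    (an assumption of this sketch).
    """
    return mean(protein_scores) if protein_scores else None


def classify_by_mps(protein_scores,
                    sensitivity_cutoff=SENSITIVITY_CUTOFF,
                    specificity_cutoff=SPECIFICITY_CUTOFF):
    """Classify a contig by its MPS, or mark it for further characterization."""
    mps = mean_protein_score(protein_scores)
    if mps is None:
        return "undecided"
    if mps < sensitivity_cutoff:
        return "chromosome"
    if mps > specificity_cutoff:
        return "plasmid"
    # Between both cutoffs: left to the characterization and heuristic steps.
    return "undecided"
```

In this sketch, contigs returned as `undecided` correspond to those that would go on to the second and third analysis steps.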

@@ -47,11 +41,11 @@ approach, i.e. a decision tree of simple rules exploiting all information at han

### Input
Platon accepts draft assemblies in fasta format. If contigs have been assembled with
SPAdes, Platon is able to extract the coverage information from the contigs names.
SPAdes, Platon is able to extract the coverage information from the contig names.

### Output
Contigs classified as plasmid sequences are printed as tab separated values to
`STDOUT` comprising the following columns:
For each contig classified as plasmid sequence the following columns are printed
to `STDOUT` as tab separated values:
- Contig ID
- Length
- Coverage
@@ -65,7 +59,7 @@ Contigs classified as plasmid sequences are printed as tab separated values to
- \# rRNA Genes
- \# Plasmid Database Hits

Additionally, Platon writes the following files into the output directory:
In addition, Platon writes the following files into the output directory:
- `<prefix>`.plasmid.fasta: contigs classified as plasmids or of plasmid origin
- `<prefix>`.chromosome.fasta: contigs classified as being of chromosomal origin
- `<prefix>`.tsv: dense information as printed to STDOUT (see above)
@@ -74,9 +68,9 @@ All files are prefixed (`<prefix>`) as the input genome fasta file.
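
As a rough illustration of how the tab separated output could be consumed downstream, the following sketch reads only the first three columns listed above (Contig ID, Length, Coverage). The function name is hypothetical, and whether the output contains a header line or non-numeric coverage values is an assumption; such lines are simply skipped here.

```python
import csv
import io


def read_plasmid_rows(handle):
    """Yield (contig_id, length, coverage) tuples from tab separated lines.

    Lines with fewer than three columns, or with non-numeric length or
    coverage (e.g. a possible header line), are skipped.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue
        try:
            length = int(row[1])
            coverage = float(row[2])
        except ValueError:
            continue
        yield row[0], length, coverage


# Usage with made-up example data; real output has more columns.
example = "ID\tLength\tCoverage\nNODE_1\t5000\t12.5\n"
rows = list(read_plasmid_rows(io.StringIO(example)))
```

The same function could be pointed at the `<prefix>`.tsv file by passing an open file handle instead of the in-memory example.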


## Installation
Platon can be installed/used in 3 different ways.
Platon can be installed/used in 2 different ways.

In all cases, a custom database must be downloaded from:
In all cases, the custom database must be downloaded from:
https://s3.computational.bio.uni-giessen.de/swift/v1/platon/db.tar.gz

### GitHub
@@ -86,10 +80,10 @@ https://s3.computational.bio.uni-giessen.de/swift/v1/platon/db.tar.gz
Example:
```
$ git clone git@github.com:oschwengers/platon.git
$ wget http://www.bi.cs.titech.ac.jp/ghostz/releases/ghostz-1.0.2.tar.gz
$ wget https://s3.computational.bio.uni-giessen.de/swift/v1/platon/db.tar.gz
$ tar -xzf db.tar.gz
$ rm db.tar.gz
$ platon/bin/platon --db ./db ...
$ platon/bin/platon --db ./db genome.fasta
```

Info: Just move the extracted database directory into the platon directory.
@@ -101,7 +95,7 @@ $ wget https://s3.computational.bio.uni-giessen.de/swift/v1/platon/db.tar.gz
$ tar -xzf db.tar.gz
$ rm db.tar.gz
$ mv db $PLATON_HOME
$ platon/bin/platon ...
$ platon/bin/platon genome.fasta
```

### Pip
@@ -115,7 +109,7 @@ $ pip3 install cb-platon
$ wget https://s3.computational.bio.uni-giessen.de/swift/v1/platon/db.tar.gz
$ tar -xzf db.tar.gz
$ rm db.tar.gz
$ platon --db ./db ...
$ platon --db ./db genome.fasta
```

3rd party dependencies on Ubuntu (3.):
@@ -127,6 +121,9 @@ $ cd ghostz-1.0.2/
$ make
$ sudo cp ghostz /usr/bin/
```
If there are any issues compiling ghostz, please make sure you have everything
correctly set up, e.g. `$ sudo apt install build-essential`.


## Usage
Usage:
@@ -151,24 +148,27 @@ optional arguments:
--version show program's version number and exit
```


## Examples
Simple:
```
$ platon ecoli.fasta
$ platon genome.fasta
```

Expert: writing results to `results` directory with verbose output using 8 threads:
```
$ platon --output ./results --verbose --threads 8 ecoli.fasta
$ platon --db ~/db --output results/ --verbose --threads 8 genome.fasta
```


## Database
Platon depends on a custom database based on NCBI RefSeq nonredundant proteins
(NRP), PCLA clusters, RefSeq Plasmid database, PlasmidFinder db as well as custom
HMM models. These databases (RefSeq release 90) can be downloaded here:
(zipped 1.8 GB, unzipped 2.5 GB)
https://s3.computational.bio.uni-giessen.de/swift/v1/platon/db.tar.gz


## Dependencies
Platon was developed and tested on Python 3.5.
It depends on BioPython (>=1.71).
@@ -180,12 +180,11 @@ Additionally, it depends on the following 3rd party executables:
- MUMmer (4.0.0-beta2) <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC395750/> <https://github.com/gmarcais/mummer>
- INFERNAL (1.1.2) <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3810854> <http://eddylab.org/infernal>
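
Before running Platon, it can help to check that the 3rd party executables are reachable on the `PATH`. The helper below is a small sketch and not part of Platon; the function name is hypothetical, and apart from `ghostz` (used in the installation example above) the executable names of the other tools are not given here, so they must be supplied by the user.

```python
import shutil


def missing_tools(names):
    """Return the subset of executable names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


# Usage: "ghostz" is taken from the installation example above;
# add the executable names of the other dependencies as installed on your system.
not_found = missing_tools(["ghostz"])
if not_found:
    print("Missing executables:", ", ".join(not_found))
```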

Platon has been tested against aforementioned software versions.


## Citation
A manuscript is in preparation... stay tuned!

In the meantime, please cite our work as:
> Schwengers O., Barth P., Falgenhauer L., Chakraborty T., Goesmann A. (2019) PLATON: Plasmid contig classification and characterization for short read draft assemblies. GitHub https://github.com/oschwengers/platon
PLATON: Plasmid contig classification and characterization for short read draft assemblies. Oliver Schwengers, Patrick Barth, Linda Falgenhauer, Trinad Chakraborty, Alexander Goesmann. GitHub https://github.com/oschwengers/platon
As PLATON takes advantage of PlasmidFinder's incompatibility database, please also cite:
> Carattoli A., Zankari E., Garcia-Fernandez A., Voldby Larsen M., Lund O., Villa L., Aarestrup F.M., Hasman H. (2014) PlasmidFinder and pMLST: in silico detection and typing of plasmids. Antimicrobial Agents and Chemotherapy, https://doi.org/10.1128/AAC.02412-14
