plaSquid: Plasmid Sequences Identification in metagenomic assemblies.

Description

plaSquid is a Nextflow pipeline for plasmid detection and classification in genomic and metagenomic data. It accepts genomic or metagenomic assemblies in FASTA format (.fasta) as input and uses two complementary approaches to detect plasmid sequences: alignment with minimap2 against a plasmid database (minidist) and HMM-based search for plasmid-specific genes (repsearch).
plaSquid also classifies plasmids into replicon types and MOB groups by comparing replication initiator proteins (RIPs) or relaxases against custom HMMs.
plaSquid can extract plasmid RIP or MOB sequences for further analysis of these proteins.
plaSquid summarises the results of both approaches in a single output table and writes the plasmid contigs to a single multi-FASTA file ("Result.fasta") for further analysis.

Installation

git clone https://github.com/mgimenez720/plaSquid/
cd plaSquid/

Nextflow must be installed to run plaSquid; see the Nextflow documentation for installation instructions.

plaSquid can be run using docker or conda.

If you want to generate a permanent conda environment (recommended) you can try:

conda env create -f environments/plaSquid.yml

If you want to generate a permanent docker image you can try:

docker pull mgimenez720/plasquid:latest
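Before installing, it can help to confirm that the prerequisites named above are on your PATH. The following is a minimal sketch, not part of plaSquid: the helper names `have` and `check_plasquid_prereqs` are hypothetical, and the check only looks for the `nextflow`, `conda` and `docker` executables without verifying their versions.

```shell
# Return success if the named command is found on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: plaSquid needs nextflow plus either conda or docker.
check_plasquid_prereqs() {
    if ! have nextflow; then
        echo "missing: nextflow" >&2
        return 1
    fi
    if ! have conda && ! have docker; then
        echo "missing: conda or docker" >&2
        return 1
    fi
    echo "prerequisites found"
}
```

Calling `check_plasquid_prereqs` prints "prerequisites found" when both conditions hold, and otherwise reports the first missing tool on standard error.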

Dependencies

All dependencies are provided within the containers available. Manual installation is discouraged.

hmmer 3.3.1, infernal 1.1.3, minimap2 2.17, prodigal 2.6.3, R packages: dplyr 1.0.4, tidyverse 1.3.0, seqinr 4.2.5, biostrings 2.58.0.

Usage

nextflow run main.nf --contigs {testdata/test.fasta} --outdir {plaSquid_result} 

arguments:

--contigs       Path to input assemblies.
--mmi           Path to Minimap2 indexed (.mmi) or fasta (.fasta/.fna) plsdb database.
--outdir        Path to output directory where results are written.
--help          Print help message and exit

subworkflows:

--minidist      Run mapping of contigs against plsdb database.
--repsearch     Run search and classification of RIP and MOB (Rel) genes.
--ripextract    Extract replication initiator proteins sequences.
--mobextract    Extract relaxases sequences.

profiles:

-profile conda  Installs dependencies using a conda environment.
-profile docker Installs dependencies within a docker image.
-profile server Runs using 15 CPUs and 50 GB.
-profile test   Tests dependencies and normal functioning.
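The arguments, subworkflow flags and profiles listed above can be combined on one command line. The wrapper below is a sketch under assumptions: `run_plasquid` and the `DRY_RUN` variable are hypothetical names introduced here, and it assumes that a subworkflow flag such as --minidist can simply be appended to the basic command shown under Usage. With DRY_RUN=1 it only prints the command instead of running it.

```shell
# Hypothetical wrapper around the documented plaSquid command line.
# usage: run_plasquid <contigs> <outdir> <profile> [extra flags...]
run_plasquid() {
    contigs=$1
    outdir=$2
    profile=$3
    shift 3
    # Rebuild the positional parameters as the full nextflow command;
    # any remaining arguments (e.g. --minidist) are passed through unchanged.
    set -- nextflow run main.nf --contigs "$contigs" --outdir "$outdir" \
        -profile "$profile" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 run_plasquid testdata/test.fasta plaSquid_result conda --minidist` prints the command that would run only the minidist subworkflow inside the conda environment.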


Authors:

Matías Giménez
Ignacio Ferrés
Gregorio Iraola


Microbial Genomics Laboratory
Institut Pasteur Montevideo (Uruguay)

Output

plaSquid outputs a fasta file "Result.fasta" containing the plasmid contigs detected, along with a table "Result.tsv" with the following fields:

"Contig": contig id for plaSquid
"name": contig name in the assembly file
"Sim-dist": S value obtained by Minidist workflow
"plsdb_match": plasmid matched at plsdb database
"Match_length": length of the plasmid matched at plsdb
"RIP_domain": RIP-domain found in that contig
"MOB_group": MOB group classification of relaxase found in that contig
"Rep_type": Rep-type classification of the contig detected
"Contig_length": size of the contig detected
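The fields above can be pulled out of the table with standard tools. The function below is a sketch rather than part of plaSquid: `plasquid_column` is a hypothetical name, and it assumes that "Result.tsv" is tab-separated with the field names on the first line. Because it is unclear whether the names and values are written with surrounding double quotes, the sketch strips any it finds.

```shell
# Print the values of one named column from a plaSquid result table.
# usage: plasquid_column <table.tsv> <field name>
plasquid_column() {
    awk -F '\t' -v col="$2" '
        NR == 1 {
            # Locate the requested field in the header line.
            for (i = 1; i <= NF; i++) {
                name = $i
                gsub(/"/, "", name)
                if (name == col) idx = i
            }
            if (!idx) {
                print "column not found: " col > "/dev/stderr"
                exit 1
            }
            next
        }
        {
            value = $idx
            gsub(/"/, "", value)
            print value
        }
    ' "$1"
}
```

For example, `plasquid_column plaSquid_result/Result.tsv MOB_group | sort | uniq -c` would count how many contigs fall into each MOB group, assuming the output directory from the Usage example.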

Citation

Preprint available at https://doi.org/10.1101/2022.08.04.502827

If you are going to use plaSquid results for further analysis, consider citing the following tools:

MOBscan (https://castillo.dicom.unican.es/mobscan_about/)
RepliconFinder (https://journals.asm.org/doi/pdf/10.1128/aac.02412-14)
PLSDB (https://ccb-microbe.cs.uni-saarland.de/plsdb/)
GTDB-tk (https://github.com/Ecogenomics/GTDBTk)

Note

This is a beta version; please report any bugs or malfunctions you find.
