Nextflow pipeline for plasmid detection and classification from metagenomic data

mgimenez720/plaSquid

plaSquid: Plasmid Sequences Identification in metagenomic assemblies.

Description

  • plaSquid is a Nextflow pipeline for plasmid detection and classification in genomic and metagenomic data. It accepts genomic or metagenomic assemblies as input (.fasta) and uses two complementary approaches to detect plasmid sequences: alignment with minimap2 against a plasmid database (minidist) and an HMM-based search for plasmid-specific genes (repsearch).

  • plaSquid also classifies plasmids into replicon types and MOB groups by comparing RIPs or Relaxases against custom HMMs.

  • plaSquid can extract the RIP or MOB sequences of detected plasmids so that these proteins can be analyzed further.

  • plaSquid summarises the information gathered by the two approaches in a single output table and writes the plasmidic contigs to a single multifasta file ("Result.fasta") for further analysis.

Pipeline overview

Installation

git clone https://github.com/mgimenez720/plaSquid/
cd plaSquid/

You need Nextflow installed in order to run plaSquid; see the Nextflow documentation for installation instructions.

plaSquid can be run using Docker or Conda.

If you want to generate a permanent conda environment (recommended) you can try:

conda env create -f environments/plaSquid.yml

If you want to generate a permanent docker image you can try:

docker pull mgimenez720/plasquid:latest

Dependencies

All dependencies are provided within the containers available. Manual installation is discouraged.

hmmer 3.3.1, infernal 1.1.3, minimap2 2.17, prodigal 2.6.3, R packages: dplyr 1.0.4, tidyverse 1.3.0, seqinr 4.2.5, biostrings 2.58.0.

Usage

nextflow run main.nf --contigs {testdata/test.fasta} --outdir {plaSquid_result} 

arguments:

--contigs       Path to input assemblies.
--mmi           Path to Minimap2 indexed (.mmi) or fasta (.fasta/.fna) plsdb database.
--outdir        Path to output directory where results are written.
--help          Print help message and exit

subworkflows:

--minidist      Run mapping of contigs against plsdb database.
--repsearch     Run search and classification of RIP and MOB (Rel) genes.
--ripextract    Extract replication initiator proteins sequences.
--mobextract    Extract relaxases sequences.

profiles:

-profile conda  Installs dependencies using a conda environment
-profile docker Installs dependencies within a docker image
-profile server Runs using 15 CPUs and 50 GB
-profile test   Tests dependencies and normal functioning


Authors:

Matías Giménez
Ignacio Ferrés
Gregorio Iraola


Microbial Genomics Laboratory
Institut Pasteur Montevideo (Uruguay)
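
As an illustration of how the arguments, subworkflow switches and profiles listed above might be combined, here is a minimal sketch of a wrapper function. The names `run_plasquid` and `DRY_RUN` are hypothetical and not part of plaSquid. The sketch assumes the subworkflow flags act as plain switches that can be appended to the documented command; check `--help` to confirm how they behave together.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented plaSquid command line.
# Usage: run_plasquid <contigs.fasta> <outdir> [profile] [extra flags...]
run_plasquid() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_plasquid <contigs> <outdir> [profile] [flags...]" >&2
        return 1
    fi
    local contigs="$1" outdir="$2" profile="${3:-conda}"
    # Drop the positional arguments; whatever is left is passed through as-is.
    shift $(( $# < 3 ? $# : 3 ))
    local cmd=(nextflow run main.nf --contigs "$contigs" --outdir "$outdir"
               -profile "$profile" "$@")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be executed.
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Print (without running) a command that uses the docker profile and
# passes the --repsearch switch.
DRY_RUN=1 run_plasquid testdata/test.fasta plaSquid_result docker --repsearch
```

Running with `DRY_RUN=1` first is a cheap way to inspect the assembled command before launching the pipeline from the plaSquid directory.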

Output

  • plaSquid outputs consist of a fasta file "Result.fasta" containing the plasmid contigs detected, along with a table "Result.tsv" with the following fields:

"Contig": contig id for plaSquid
"name": contig name in the assembly file
"Sim-dist": S value obtained by the minidist workflow
"plsdb_match": plasmid matched in the PLSDB database
"Match_length": length of the plasmid matched in PLSDB
"RIP_domain": RIP domain found in that contig
"MOB_group": MOB group classification of the relaxase found in that contig
"Rep_type": Rep-type classification of the contig detected
"Contig_length": size of the contig detected
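
The table can be filtered with standard command-line tools. The sketch below lists the contigs that carry a MOB group assignment. It rests on assumptions the README does not confirm: that "Result.tsv" is tab-separated, has one header row using the field names above, and marks missing values with an empty cell or `NA`. The function name `mob_contigs` is hypothetical.

```shell
#!/usr/bin/env bash
# Print name, MOB group and contig length for contigs with a MOB assignment.
# Columns are located by header name, so their order does not matter.
mob_contigs() {
    awk -F'\t' '
        # Strip optional double quotes around fields.
        { for (i = 1; i <= NF; i++) gsub(/"/, "", $i) }
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("name" in col) || !("MOB_group" in col) || !("Contig_length" in col)) {
                print "expected columns not found in header" > "/dev/stderr"
                exit 1
            }
            next
        }
        # Assumption: missing values are written as empty cells or "NA".
        $col["MOB_group"] != "" && $col["MOB_group"] != "NA" {
            print $col["name"] "\t" $col["MOB_group"] "\t" $col["Contig_length"]
        }
    ' "$1"
}

# Example call (the path is an assumption about where the table is written):
# mob_contigs plaSquid_result/Result.tsv
```

The same pattern works for "RIP_domain" or "Rep_type" by changing the column name in the filter.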

Citation

A preprint is available at https://doi.org/10.1101/2022.08.04.502827

If you are going to use plaSquid results for further analysis, consider citing the following tools:

MOBscan (https://castillo.dicom.unican.es/mobscan_about/)
RepliconFinder (https://journals.asm.org/doi/pdf/10.1128/aac.02412-14)
PLSDB (https://ccb-microbe.cs.uni-saarland.de/plsdb/)
GTDB-tk (https://github.com/Ecogenomics/GTDBTk)

Note

This is a beta version; please report any bugs or malfunctions you detect.
