metashot/prok-classify is a workflow for assigning objective taxonomic classifications to bacterial and archaeal genomes using GTDB-Tk and the Genome Taxonomy Database (GTDB).
- Input: prokaryotic genomes in FASTA format;
- Taxonomic classification using GTDB-Tk version 2.1.1 (requires GTDB reference data R207_v2);
- Filtering of genomes by domain (Bacteria and Archaea).
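Domain assignment can be read from the first rank of each GTDB-Tk classification string (for example `d__Bacteria;p__...`). The function below is a minimal sketch of reading that rank in Bash, assuming the semicolon-separated, rank-prefixed format that GTDB-Tk writes. `gtdb_domain` is a hypothetical helper for illustration, not code taken from the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the domain rank (d__) of a GTDB-Tk classification,
# e.g. "d__Bacteria;p__Pseudomonadota;..." -> "Bacteria".
# Assumes the semicolon-separated, rank-prefixed GTDB-Tk format.
gtdb_domain() {
  local classification=$1
  local first=${classification%%;*}          # text before the first ';'
  case $first in
    d__*) printf '%s\n' "${first#d__}" ;;    # strip the "d__" prefix
    *)    printf 'Unclassified\n' ;;         # e.g. "Unclassified Bacteria" or empty
  esac
}
```

For example, `gtdb_domain 'd__Archaea;p__Thermoproteota'` prints `Archaea`.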
- Install Docker (or Singularity) and Nextflow (see Dependencies);
- Download and extract the GTDB-Tk reference data (see https://ecogenomics.github.io/GTDBTk/installing/index.html#gtdb-tk-reference-data):

      wget https://data.gtdb.ecogenomic.org/releases/release207/207.0/auxillary_files/gtdbtk_r207_v2_data.tar.gz
      tar -xvzf gtdbtk_r207_v2_data.tar.gz

- Start running the analysis:

      nextflow run metashot/prok-classify \
        --genomes "data/*.fa" \
        --gtdbtk_db ./release207_v2 \
        --outdir results
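Before launching, it can help to confirm that the dependencies, the reference data and the input genomes are in place. The Bash sketch below wraps the quick-start command with such checks. The function names (`have_deps`, `have_gtdb_db`, `have_genomes`, `run_prok_classify`) are hypothetical, and the checks are assumptions about a sensible setup rather than requirements stated by the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight checks around the quick-start command.

# Succeed if nextflow and at least one container engine are on PATH.
have_deps() {
  command -v nextflow >/dev/null 2>&1 || { echo "nextflow not found" >&2; return 1; }
  if command -v docker >/dev/null 2>&1 || command -v singularity >/dev/null 2>&1; then
    return 0
  fi
  echo "neither docker nor singularity found" >&2
  return 1
}

# Succeed if the GTDB-Tk reference directory exists and is not empty.
have_gtdb_db() {
  local db=$1
  [ -d "$db" ] && [ -n "$(ls -A "$db" 2>/dev/null)" ]
}

# Succeed if the glob pattern matches at least one file (bash builtin compgen).
have_genomes() {
  local pattern=$1
  compgen -G "$pattern" >/dev/null
}

# Run the checks, then launch the same command as in the quick start.
run_prok_classify() {
  local genomes=$1 db=$2 outdir=$3
  have_deps || return 1
  have_gtdb_db "$db" || { echo "GTDB-Tk data not found in $db" >&2; return 1; }
  have_genomes "$genomes" || { echo "no genomes match $genomes" >&2; return 1; }
  nextflow run metashot/prok-classify \
    --genomes "$genomes" \
    --gtdbtk_db "$db" \
    --outdir "$outdir"
}
```

Usage: `run_prok_classify "data/*.fa" ./release207_v2 results`. The pattern is quoted, as in the original command, so that it reaches Nextflow unexpanded.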
See the file nextflow.config for the complete list of parameters.
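Nextflow's `-params-file` option accepts a YAML or JSON file of parameters, which avoids retyping them for every run. The sketch below writes such a file containing only the three parameters from the quick start; `write_params` is a hypothetical helper, and any other parameter would have to be checked against nextflow.config.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a YAML params file for `nextflow run -params-file`.
# Only the parameters used in the quick start are written.
write_params() {
  local file=$1 genomes=$2 db=$3 outdir=$4
  # Unquoted EOF expands the variables; no glob expansion happens in a heredoc.
  cat > "$file" <<EOF
genomes: "$genomes"
gtdbtk_db: "$db"
outdir: "$outdir"
EOF
}
```

For example, `write_params params.yaml 'data/*.fa' ./release207_v2 results` followed by `nextflow run metashot/prok-classify -params-file params.yaml`.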
The files and directories listed below will be created in the results directory after the pipeline has finished.
- bacteria_summary.tsv: the GTDB-Tk summary for bacterial genomes (see the GTDB-Tk documentation);
- archaea_summary.tsv: the GTDB-Tk summary for archaeal genomes (see the GTDB-Tk documentation);
- bacteria_genomes: genomes classified as bacteria by GTDB-Tk;
- archaea_genomes: genomes classified as archaea by GTDB-Tk;
- gtdbtk: main GTDB-Tk output files (see the GTDB-Tk documentation).
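For a quick overview of the results, the summary files can be tabulated with standard tools. The sketch below counts genomes per phylum; it assumes the usual GTDB-Tk summary layout (tab-separated, one header row, `user_genome` in column 1 and `classification` in column 2). `count_phyla` is a hypothetical helper, not part of the pipeline's output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count genomes per GTDB phylum across summary files.
# Assumes tab-separated GTDB-Tk summaries with one header row per file and the
# classification string ("d__...;p__...;...") in column 2.
count_phyla() {
  awk -F'\t' 'FNR > 1 {
      phylum = "unassigned"                 # used when no non-empty p__ rank exists
      n = split($2, ranks, ";")
      for (i = 1; i <= n; i++)
        if (ranks[i] ~ /^p__./) { phylum = substr(ranks[i], 4); break }
      counts[phylum]++
    }
    END { for (p in counts) print p "\t" counts[p] }' "$@" | sort
}
```

Usage: `count_phyla results/bacteria_summary.tsv results/archaea_summary.tsv`. Using `FNR` rather than `NR` skips the header of every file, so both summaries can be passed at once.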
See System requirements for the complete list of system requirement options.