06. Assembly

Krista Ternus edited this page Nov 29, 2020 · 8 revisions

Assembly

Workflow Overview

The tools in this workflow perform de novo metagenome assemblies of trimmed Illumina paired-end reads with metaSPAdes (in SPAdes version 3.14.0) and MEGAHIT version 1.1.2. The SPAdes container can also be used to perform de novo assemblies of isolate DNA with SPAdes and plasmidSPAdes, or of RNA transcripts with rnaSPAdes. QUAST version 5.0.2 is used to evaluate the assemblies, and MultiQC version 1.4 provides aggregated visualizations of the QUAST reports. This workflow has been tested to run offline in an air-gapped system following the execution of the Read Filtering Workflow.

Required Files

If you have not already, you will need to activate your metscale environment and perform the Offline Setup for the assembly workflow:

[user@localhost ~]$ conda activate metscale 

(metscale)[user@localhost ~]$ cd metscale/workflows

(metscale)[user@localhost workflows]$ python download_offline_files.py --workflow assembly  

Singularity Images

In the metscale/container_images/ directory, you should see the following Singularity images, which were created when the Offline Setup was run with the assembly or all flag:

File Name File Size
spades_3.14.0--h2d02072_0.sif 104 MB
megahit_1.1.2--py35_0.sif 48 MB
quast_5.0.2--py27pl526ha92aebf_0.sif 810 MB
multiqc_1.4--py35_0.sif 453 MB

If you are missing any of these files, you should re-run the appropriate setup command, as per instructions in the Offline Setup.
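The check described above can be scripted. The following is a minimal sketch, not part of the metscale code base: the function name `missing_images` is hypothetical, and the image names are copied from the table above.

```python
from pathlib import Path

# Image file names copied from the table above.
EXPECTED_IMAGES = [
    "spades_3.14.0--h2d02072_0.sif",
    "megahit_1.1.2--py35_0.sif",
    "quast_5.0.2--py27pl526ha92aebf_0.sif",
    "multiqc_1.4--py35_0.sif",
]


def missing_images(image_dir, expected=EXPECTED_IMAGES):
    """Return the expected image names that are not regular files in image_dir."""
    image_dir = Path(image_dir)
    return [name for name in expected if not (image_dir / name).is_file()]


if __name__ == "__main__":
    # Assumes the script is started from the directory that contains metscale/.
    for name in missing_images("metscale/container_images"):
        print("missing:", name)
```

An empty result means all four images are present; any name that is printed indicates that the Offline Setup should be re-run.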

Input Files

The assembly workflow uses the Illumina paired-end filtered reads (outputs from the Read Filtering Workflow) as its inputs. These files should be located in the metscale/workflows/data directory:

File Name File Size
SRR606249_subset10_1_reads_trim2_1.fq.gz 365 MB
SRR606249_subset10_1_reads_trim2_2.fq.gz 359 MB
SRR606249_subset10_1_reads_trim30_1.fq.gz 313 MB
SRR606249_subset10_1_reads_trim30_2.fq.gz 300 MB

These example reference assembly files should also be located in the metscale/workflows/data/ directory for reference-based assembly evaluation with MetaQUAST:

File Name File Size MD5 Checksum
GCF_000008565.1_ASM856v1_genomic.fna.gz 924 KB a556db886f11a8af3783d63319140e74
Shakya_Refs/ 60 MB none, 64 reference files in directory
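To compare a downloaded reference against the MD5 checksum listed above, something like the following sketch could be used. The helper names `md5_of_file` and `checksum_matches` are hypothetical; only Python's standard `hashlib` module is assumed.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_md5):
    """Return True if the file's MD5 digest equals expected_md5 (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()


# Example call (the path is relative to the directory that contains metscale/):
# checksum_matches(
#     "metscale/workflows/data/GCF_000008565.1_ASM856v1_genomic.fna.gz",
#     "a556db886f11a8af3783d63319140e74",
# )
```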

If these files look good to go, then you may proceed to run the example dataset through the assembly workflow rules.

Workflow Execution

Workflows are executed according to the sample names and workflow parameters, as specified in the config file. For more information about config files, see the Getting Started wiki page.

After the config file is ready, be sure to specify the Singularity bind path from the metscale/workflows directory before running the assembly workflow.

cd metscale/workflows
export SINGULARITY_BINDPATH="data:/tmp"  

You can then execute the workflows through snakemake using the following command:

snakemake --use-singularity {rules} {other options}
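If the workflow is launched from a script rather than typed by hand, the command template above could be assembled as in this sketch. The function names `snakemake_command` and `run_workflow` are hypothetical, and the default working directory assumes the script is started from the directory that contains metscale/.

```python
import os
import subprocess


def snakemake_command(rules, other_options=()):
    """Build the argument list for: snakemake --use-singularity {rules} {other options}."""
    if not rules:
        raise ValueError("at least one rule name is required")
    return ["snakemake", "--use-singularity", *rules, *other_options]


def run_workflow(rules, other_options=(), workdir="metscale/workflows"):
    """Run snakemake from workdir with the bind path used in this wiki."""
    env = dict(os.environ, SINGULARITY_BINDPATH="data:/tmp")
    command = snakemake_command(rules, other_options)
    # check=True raises CalledProcessError if snakemake exits with an error.
    return subprocess.run(command, cwd=workdir, env=env, check=True)
```

For example, `run_workflow(["assembly_all_workflow"], ["-n"])` would request a Snakemake dry run (`-n`) of both metagenome assemblers without executing them.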

The following rules are available for execution in the assembly workflow (yellow stars indicate terminal rules):

The assembly rules and their parameters are listed under "workflows" in the metscale/workflows/config/default_workflowconfig.settings config file.

Sample Type Rule Description
Metagenome assembly_metaspades_workflow metaSPAdes assembles filtered reads
Metagenome assembly_megahit_workflow MEGAHIT assembles filtered reads
Metagenome assembly_all_workflow metaSPAdes and MEGAHIT both independently assemble filtered reads
Metagenome assembly_quast_workflow QUAST evaluates the metagenomic assemblies without a reference
Metagenome assembly_multiqc_workflow MultiQC aggregates all QUAST reports from MEGAHIT or metaSPAdes assemblies into a single report
Metagenome assembly_metaquast_workflow MetaQUAST evaluates the metagenomic assemblies against a single reference or multiple references
RNA Transcripts assembly_rnaspades_workflow rnaSPAdes assembles filtered reads from RNA transcripts*
RNA Transcripts assembly_rnaspades_metaquast_workflow MetaQUAST evaluates the RNA transcript assemblies*
RNA Transcripts assembly_rnaspades_multiqc_workflow MultiQC aggregates all MetaQUAST reports from rnaSPAdes into a single report*
Isolate assembly_spades_workflow SPAdes assembles filtered reads from isolates**
Isolate assembly_quast_reference_with_spades_workflow QUAST evaluates the SPAdes assembly against a reference**
Bacterial Isolate with Plasmids assembly_plasmidspades_workflow plasmidSPAdes assembles filtered reads from plasmids of isolates**
Bacterial Isolate with Plasmids assembly_quast_reference_with_plasmidspades_workflow QUAST evaluates the plasmidSPAdes assembly against a reference**

*The assembly_rnaspades_workflow, assembly_rnaspades_metaquast_workflow, and assembly_rnaspades_multiqc_workflow rules are intended to be run on RNA transcript sequences.

**The assembly_spades_workflow, assembly_plasmidspades_workflow, assembly_quast_reference_with_spades_workflow, and assembly_quast_reference_with_plasmidspades_workflow rules are intended to be run with isolate sequences.

This wiki runs those rules on the Shakya subset 10 test dataset for explanatory purposes only. In practice, isolate genome sequences should be assembled with SPAdes and plasmidSPAdes, and transcriptomes or metatranscriptomes should be assembled with rnaSPAdes. Complex metagenomes like the Shakya subset 10 test dataset should be assembled with MEGAHIT or metaSPAdes.

The metagenome assembly rules can be run independently, or several rules can be run together by listing them back to back in one command:

snakemake --use-singularity assembly_all_workflow assembly_metaquast_workflow assembly_multiqc_workflow

The following command will run only the metaSPAdes assembler:

snakemake --use-singularity assembly_metaspades_workflow

The following command will run only the MEGAHIT assembler:

snakemake --use-singularity assembly_megahit_workflow

Both metagenome assemblers can be run together with the assembly_all_workflow rule:

snakemake --use-singularity assembly_all_workflow 

To assemble RNA transcript sequences with rnaSPAdes, run the following rule:

snakemake --use-singularity assembly_rnaspades_workflow

To evaluate the metagenome assemblies produced by MEGAHIT or metaSPAdes, run QUAST with the assembly_quast_workflow rule:

snakemake --use-singularity assembly_quast_workflow

The assembly_multiqc_workflow rule concatenates all of the metagenome QUAST reports into a single report with MultiQC. This rule can also be used independently to execute the entire metagenomic assembly workflow and reference-independent assembly evaluations:

snakemake --use-singularity assembly_multiqc_workflow

To evaluate the metagenomic assemblies using MetaQUAST with a single reference, specify the reference file in default_workflowparams.settings under the assembly section. The default file is downloaded during the Offline Setup for the assembly workflow as an example, but this filename should be updated to match the expected reference for your sample:

"metaquast_ref" :  "GCF_000008565.1_ASM856v1_genomic.fna.gz",

If multiple references are known for a metagenomic sample, they can all be used in a MetaQUAST evaluation by creating a sub-directory within /workflows/data that contains all of the reference genome assembly files. During the offline download, a directory called Shakya_Refs is downloaded that includes all of the expected reference genome assemblies for that sample. The name of this directory of reference genomes can be specified in the assembly section of default_workflowparams.settings, instead of a single reference file:

"metaquast_ref" :  "Shakya_Refs",

Once you have updated the parameter section (if applicable), the following command will run MetaQUAST on the MEGAHIT and/or metaSPAdes assemblies against your indicated reference file(s):

snakemake --use-singularity assembly_metaquast_workflow
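The parameter change described above could also be scripted. This is only a sketch under a loud assumption: it treats default_workflowparams.settings as JSON with a top-level "assembly" object that holds "metaquast_ref". The real file layout may differ, so check it before relying on this. The function name `set_assembly_param` is hypothetical.

```python
import json


def set_assembly_param(settings_text, key, value):
    """Return settings JSON text with settings["assembly"][key] set to value.

    ASSUMPTION: the settings file is JSON and the assembly parameters live in
    a top-level "assembly" object. Adjust the lookup if the file nests them
    differently.
    """
    settings = json.loads(settings_text)
    settings.setdefault("assembly", {})[key] = value
    return json.dumps(settings, indent=4)


# Example: switch from the single default reference to the Shakya_Refs directory.
example = '{"assembly": {"metaquast_ref": "GCF_000008565.1_ASM856v1_genomic.fna.gz"}}'
updated = set_assembly_param(example, "metaquast_ref", "Shakya_Refs")
```

The function works on text rather than on a file path, so the original settings file stays untouched until the result has been inspected and written back deliberately.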

The following command will execute MetaQUAST with your indicated reference file(s) with rnaSPAdes assemblies:

snakemake --use-singularity assembly_rnaspades_metaquast_workflow

The assembly_rnaspades_multiqc_workflow rule combines and visualizes the MetaQUAST reports for rnaSPAdes:

snakemake --use-singularity assembly_rnaspades_multiqc_workflow

The following command will run the plasmidSPAdes and SPAdes assemblers on isolates:

snakemake --use-singularity assembly_spades_workflow assembly_plasmidspades_workflow

To evaluate the isolate and/or plasmid assemblies using QUAST, the reference assembly file needs to be specified in default_workflowparams.settings under the assembly section. The default file is downloaded during the Offline Setup for the assembly workflow, but it should be updated to match the expected reference for your sample:

"quast_spades_ref" : "GCF_000008565.1_ASM856v1_genomic.fna.gz",

and

"quast_plasmidspades_ref" : "GCF_000008565.1_ASM856v1_genomic.fna.gz",

Once you have updated the parameter section (if applicable), the following command will execute QUAST with your specified reference assembly file:

snakemake --use-singularity assembly_quast_reference_with_spades_workflow assembly_quast_reference_with_plasmidspades_workflow

Additional options for snakemake can be found in the snakemake documentation.

To specify your own parameters for this or any of the workflows prior to execution, see Workflow Architecture for more information.

Output

After successful execution of the assembly workflow, outputs will be found in the metscale/workflows/data/ directory. You should expect to see the following files for each pair of trimmed reads:

Tool Output File Name Description
metaSPAdes {sample}_1_reads_trim{quality_threshold}.metaspades.contigs.fa The final metaSPAdes assembled contigs from metagenomes, which is the output file used by downstream analysis tools
metaSPAdes {sample}_1_reads_trim{quality_threshold}.metaspades/ Directory with additional outputs from the metaSPAdes assembler
MEGAHIT {sample}_1_reads_trim{quality_threshold}.megahit.contigs.fa The final MEGAHIT assembled contigs from metagenomes, which is the output file used by downstream analysis tools
MEGAHIT {sample}_1_reads_trim{quality_threshold}.megahit/ Directory with additional outputs from the MEGAHIT assembler
QUAST (without references) {sample}_1_reads_trim{quality_threshold}.{assembler}_quast/ Directory with QUAST outputs for MEGAHIT and/or metaSPAdes
QUAST (without references) {sample}_1_reads_trim{quality_threshold}.{assembler}_quast/report.html QUAST HTML report for MEGAHIT and/or metaSPAdes
MultiQC {sample}_1_reads.{assembler}_multiqc_report.html MultiQC HTML report, including multiple QUAST reports
MultiQC {sample}_1_reads.{assembler}_multiqc_report_data/ MultiQC directory with additional QUAST data and statistics
MetaQUAST with metaSPAdes and/or MEGAHIT {sample}_1_reads_trim{quality_threshold}.{assembler}_metaquast/ MetaQUAST directory with MetaQUAST HTML report, additional data, and statistics for metaSPAdes and/or MEGAHIT
MetaQUAST with rnaSPAdes {sample}_1_reads_trim{quality_threshold}.rnaspades_metaquast_report_data/ MetaQUAST directory with MetaQUAST HTML report, additional data, and statistics for rnaSPAdes assembly
SPAdes {sample}_1_reads_trim{quality_threshold}_k{k_values}.spades.contigs.fa The final SPAdes assembled contigs from isolates, which is the output file used by downstream analysis tools
SPAdes {sample}_1_reads_trim{quality_threshold}_k{k_values}.spades/ Directory with additional outputs from the SPAdes assembler
plasmidSPAdes {sample}_1_reads_trim{quality_threshold}.plasmidspades.contigs.fa The final plasmidSPAdes assembled contigs from isolates, which is the output file used by downstream analysis tools
plasmidSPAdes {sample}_1_reads_trim{quality_threshold}.plasmidspades/ Directory with additional outputs from the plasmidSPAdes assembler
QUAST (with reference) {sample}_1_reads_trim{quality_threshold}.{assembler}-quast/ Directory with QUAST outputs for SPAdes and/or plasmidSPAdes
QUAST (with reference) {sample}_1_reads_trim{quality_threshold}.{assembler}-quast/report.html QUAST HTML report for SPAdes and/or plasmidSPAdes
rnaSPAdes {sample}_1_reads_trim{quality_threshold}.rnaspades.transcripts.fasta The final rnaSPAdes assembled contigs from transcriptome, which is the output file used by downstream analysis tools
rnaSPAdes {sample}_1_reads_trim{quality_threshold}.rnaspades/ Directory with additional outputs from the rnaSPAdes assembler

The above files are the major outputs of the assembly workflow, and the *contigs.fa files are used as inputs into the Comparison and/or Functional Inference workflow pages.
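The naming pattern for the metagenome contig files in the table above can be expanded for a given sample, as in this sketch. The function name `expected_contig_files` is hypothetical; the default quality thresholds (2 and 30) and assembler names are the ones used for the example dataset in this wiki.

```python
def expected_contig_files(sample, thresholds=(2, 30), assemblers=("metaspades", "megahit")):
    """List the final contig file names expected for one sample.

    Follows the pattern {sample}_1_reads_trim{quality_threshold}.{assembler}.contigs.fa
    from the output table.
    """
    return [
        f"{sample}_1_reads_trim{threshold}.{assembler}.contigs.fa"
        for assembler in assemblers
        for threshold in thresholds
    ]


if __name__ == "__main__":
    for name in expected_contig_files("SRR606249_subset10"):
        print(name)
```

Comparing this list with the contents of metscale/workflows/data/ is one quick way to confirm that every assembly finished.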

Additional Information

Command Line Equivalents

To better understand how the workflows are operating, it may be helpful to see commands that could be used to generate equivalent outputs with the individual tools. Note that the file names in the below examples may not be exact replicates of the file naming conventions in the current workflows, but the commands are equivalent.

The metaSPAdes assembly of reads filtered with a quality score threshold of 2 is equivalent to running this command:

metaspades.py -m 240 -1 {sample}_1_reads_trim2_1.fq.gz -2 {sample}_1_reads_trim2_2.fq.gz -o {sample}_1_reads_trim2.metaspades
metaspades.py -m 240 -1 SRR606249_subset10_1_reads_trim2_1.fq.gz -2 SRR606249_subset10_1_reads_trim2_2.fq.gz -o SRR606249_subset10_1_reads_trim2.metaspades

The metaSPAdes assembly of reads filtered with a quality score threshold of 30 is equivalent to running this command:

metaspades.py -m 240 -1 {sample}_1_reads_trim30_1.fq.gz -2 {sample}_1_reads_trim30_2.fq.gz -o {sample}_1_reads_trim30.metaspades
metaspades.py -m 240 -1 SRR606249_subset10_1_reads_trim30_1.fq.gz -2 SRR606249_subset10_1_reads_trim30_2.fq.gz -o SRR606249_subset10_1_reads_trim30.metaspades

The QUAST evaluations of the metaSPAdes assemblies are equivalent to running these commands:

quast.py {sample}_1_reads_trim2.metaspades.contigs.fa -o {sample}_1_reads_trim2.metaspades_quast
quast.py {sample}_1_reads_trim30.metaspades.contigs.fa -o {sample}_1_reads_trim30.metaspades_quast
quast.py SRR606249_subset10_1_reads_trim2.metaspades.contigs.fa -o SRR606249_subset10_1_reads_trim2.metaspades_quast
quast.py SRR606249_subset10_1_reads_trim30.metaspades.contigs.fa -o SRR606249_subset10_1_reads_trim30.metaspades_quast

The MultiQC aggregation of the metaSPAdes QUAST reports is equivalent to running this command:

multiqc {sample}_1_reads_trim2.metaspades_quast/report.tsv {sample}_1_reads_trim30.metaspades_quast/report.tsv -n {sample}_1_reads_metaspades_multiqc_report -o {sample}_1_reads_metaspades_multiqc_report
multiqc SRR606249_subset10_1_reads_trim2.metaspades_quast/report.tsv SRR606249_subset10_1_reads_trim30.metaspades_quast/report.tsv -n SRR606249_subset10_1_reads_metaspades_multiqc_report -o SRR606249_subset10_1_reads_metaspades_multiqc_report

The MetaQUAST evaluations of the metaSPAdes assemblies are equivalent to running these commands:

metaquast.py {sample}_1_reads_trim2.metaspades.contigs.fa -R {reference} --fragmented --gene-finding -o {sample}_1_reads_trim2.metaspades_quast  
metaquast.py {sample}_1_reads_trim30.metaspades.contigs.fa -R {reference} --fragmented --gene-finding -o {sample}_1_reads_trim30.metaspades_quast
metaquast.py SRR606249_subset10_1_reads_trim2.metaspades.contigs.fa -R Shakya_Refs/ --fragmented --gene-finding -o SRR606249_subset10_1_reads_trim2.metaspades_quast  
metaquast.py SRR606249_subset10_1_reads_trim30.metaspades.contigs.fa -R Shakya_Refs/ --fragmented --gene-finding -o SRR606249_subset10_1_reads_trim30.metaspades_quast

The MEGAHIT assembly of reads filtered with a quality score threshold of 2 is equivalent to running this command:

megahit -1 {sample}_1_reads_trim2_1.fq.gz -2 {sample}_1_reads_trim2_2.fq.gz --out-prefix={sample}_1_reads_trim2.megahit -o {sample}_1_reads_trim2.megahit
megahit -1 SRR606249_subset10_1_reads_trim2_1.fq.gz -2 SRR606249_subset10_1_reads_trim2_2.fq.gz --out-prefix=SRR606249_subset10_1_reads_trim2.megahit -o SRR606249_subset10_1_reads_trim2.megahit

The MEGAHIT assembly of reads filtered with a quality score threshold of 30 is equivalent to running this command:

megahit -1 {sample}_1_reads_trim30_1.fq.gz -2 {sample}_1_reads_trim30_2.fq.gz --out-prefix={sample}_1_reads_trim30.megahit -o {sample}_1_reads_trim30.megahit
megahit -1 SRR606249_subset10_1_reads_trim30_1.fq.gz -2 SRR606249_subset10_1_reads_trim30_2.fq.gz --out-prefix=SRR606249_subset10_1_reads_trim30.megahit -o SRR606249_subset10_1_reads_trim30.megahit

The QUAST evaluations of the MEGAHIT assemblies are equivalent to running these commands:

quast.py {sample}_1_reads_trim2.megahit.contigs.fa -o {sample}_1_reads_trim2.megahit_quast  
quast.py {sample}_1_reads_trim30.megahit.contigs.fa -o {sample}_1_reads_trim30.megahit_quast
quast.py SRR606249_subset10_1_reads_trim2.megahit.contigs.fa -o SRR606249_subset10_1_reads_trim2.megahit_quast  
quast.py SRR606249_subset10_1_reads_trim30.megahit.contigs.fa -o SRR606249_subset10_1_reads_trim30.megahit_quast

The MultiQC aggregation of the MEGAHIT QUAST reports is equivalent to running this command:

multiqc {sample}_1_reads_trim2.megahit_quast/report.tsv {sample}_1_reads_trim30.megahit_quast/report.tsv -n {sample}_1_reads_megahit_multiqc_report -o {sample}_1_reads_megahit_multiqc_report
multiqc SRR606249_subset10_1_reads_trim2.megahit_quast/report.tsv SRR606249_subset10_1_reads_trim30.megahit_quast/report.tsv -n SRR606249_subset10_1_reads_megahit_multiqc_report -o SRR606249_subset10_1_reads_megahit_multiqc_report

The MetaQUAST evaluations of the MEGAHIT assemblies are equivalent to running these commands:

metaquast.py {sample}_1_reads_trim2.megahit.contigs.fa -R {reference} --fragmented --gene-finding -o {sample}_1_reads_trim2.megahit_quast  
metaquast.py {sample}_1_reads_trim30.megahit.contigs.fa -R {reference} --fragmented --gene-finding -o {sample}_1_reads_trim30.megahit_quast
metaquast.py SRR606249_subset10_1_reads_trim2.megahit.contigs.fa -R Shakya_Refs/ --fragmented --gene-finding -o SRR606249_subset10_1_reads_trim2.megahit_quast  
metaquast.py SRR606249_subset10_1_reads_trim30.megahit.contigs.fa -R Shakya_Refs/ --fragmented --gene-finding -o SRR606249_subset10_1_reads_trim30.megahit_quast

The SPAdes assembly of reads filtered with a quality score threshold of 2 is equivalent to running this command:

spades.py -k {k_values} -1 {sample}_1_reads_trim2_1.fq.gz -2 {sample}_1_reads_trim2_2.fq.gz -o {sample}_1_reads_trim2_k{k_values}.spades
spades.py -k 21,33,55 -1 SRR606249_subset10_1_reads_trim2_1.fq.gz -2 SRR606249_subset10_1_reads_trim2_2.fq.gz -o SRR606249_subset10_1_reads_trim2_k21_33_55.spades

The SPAdes assembly of reads filtered with a quality score threshold of 30 is equivalent to running this command:

spades.py -k {k_values} -1 {sample}_1_reads_trim30_1.fq.gz -2 {sample}_1_reads_trim30_2.fq.gz -o {sample}_1_reads_trim30_k{k_values}.spades
spades.py -k 21,33,55 -1 SRR606249_subset10_1_reads_trim30_1.fq.gz -2 SRR606249_subset10_1_reads_trim30_2.fq.gz -o SRR606249_subset10_1_reads_trim30_k21_33_55.spades
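In the examples above, the comma-separated `-k` values (21,33,55) appear in the output name with underscores (k21_33_55). The sketch below reproduces that mapping and also applies the SPAdes requirement that k-mer sizes be odd, less than 128, and listed in ascending order. The function names `parse_k_values` and `spades_output_dir` are hypothetical.

```python
def parse_k_values(k_values):
    """Parse a comma-separated k-mer list such as "21,33,55" into integers.

    Raises ValueError if a value is even, is 128 or larger, or if the list
    is not strictly ascending.
    """
    ks = [int(k) for k in k_values.split(",")]
    for k in ks:
        if k % 2 == 0 or k >= 128:
            raise ValueError(f"k-mer size {k} must be odd and less than 128")
    if any(a >= b for a, b in zip(ks, ks[1:])):
        raise ValueError("k-mer sizes must be listed in ascending order")
    return ks


def spades_output_dir(sample, threshold, k_values):
    """Build the SPAdes output directory name used in the examples above."""
    suffix = "_".join(str(k) for k in parse_k_values(k_values))
    return f"{sample}_1_reads_trim{threshold}_k{suffix}.spades"
```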

The QUAST evaluations of the SPAdes assemblies with a reference assembly are equivalent to running these commands:

quast.py {sample}_1_reads_trim2.spades.contigs.fa -R {reference_assembly} -o {sample}_1_reads_trim2.spades_quast  
quast.py {sample}_1_reads_trim30.spades.contigs.fa -R {reference_assembly} -o {sample}_1_reads_trim30.spades_quast
quast.py SRR606249_subset10_1_reads_trim2.spades.contigs.fa -R GCF_000008565.1_ASM856v1_genomic.fna.gz -o SRR606249_subset10_1_reads_trim2.spades_quast  
quast.py SRR606249_subset10_1_reads_trim30.spades.contigs.fa -R GCF_000008565.1_ASM856v1_genomic.fna.gz -o SRR606249_subset10_1_reads_trim30.spades_quast

The plasmidSPAdes assembly of reads filtered with a quality score threshold of 2 is equivalent to running this command:

plasmidspades.py -1 {sample}_1_reads_trim2_1.fq.gz -2 {sample}_1_reads_trim2_2.fq.gz -o {sample}_1_reads_trim2.plasmidspades
plasmidspades.py -1 SRR606249_subset10_1_reads_trim2_1.fq.gz -2 SRR606249_subset10_1_reads_trim2_2.fq.gz -o SRR606249_subset10_1_reads_trim2.plasmidspades

The plasmidSPAdes assembly of reads filtered with a quality score threshold of 30 is equivalent to running this command:

plasmidspades.py -1 {sample}_1_reads_trim30_1.fq.gz -2 {sample}_1_reads_trim30_2.fq.gz -o {sample}_1_reads_trim30.plasmidspades
plasmidspades.py -1 SRR606249_subset10_1_reads_trim30_1.fq.gz -2 SRR606249_subset10_1_reads_trim30_2.fq.gz -o SRR606249_subset10_1_reads_trim30.plasmidspades

The QUAST evaluations of the plasmidSPAdes assemblies with a reference assembly are equivalent to running these commands:

quast.py {sample}_1_reads_trim2.plasmidspades.contigs.fa -R {reference_assembly} -o {sample}_1_reads_trim2.plasmidspades-quast
quast.py {sample}_1_reads_trim30.plasmidspades.contigs.fa -R {reference_assembly} -o {sample}_1_reads_trim30.plasmidspades-quast
quast.py SRR606249_subset10_1_reads_trim2.plasmidspades.contigs.fa -R GCF_000008565.1_ASM856v1_genomic.fna.gz -o SRR606249_subset10_1_reads_trim2.plasmidspades-quast
quast.py SRR606249_subset10_1_reads_trim30.plasmidspades.contigs.fa -R GCF_000008565.1_ASM856v1_genomic.fna.gz -o SRR606249_subset10_1_reads_trim30.plasmidspades-quast

The rnaSPAdes assembly of reads filtered with a quality score threshold of 2 is equivalent to running this command:

rnaspades.py -m 240 -1 {sample}_1_reads_trim2_1.fq.gz -2 {sample}_1_reads_trim2_2.fq.gz -o {sample}_1_reads_trim2.rnaspades
rnaspades.py -m 240 -1 SRR606249_subset10_1_reads_trim2_1.fq.gz -2 SRR606249_subset10_1_reads_trim2_2.fq.gz -o SRR606249_subset10_1_reads_trim2.rnaspades

The rnaSPAdes assembly of reads filtered with a quality score threshold of 30 is equivalent to running this command:

rnaspades.py -m 240 -1 {sample}_1_reads_trim30_1.fq.gz -2 {sample}_1_reads_trim30_2.fq.gz -o {sample}_1_reads_trim30.rnaspades
rnaspades.py -m 240 -1 SRR606249_subset10_1_reads_trim30_1.fq.gz -2 SRR606249_subset10_1_reads_trim30_2.fq.gz -o SRR606249_subset10_1_reads_trim30.rnaspades

The MetaQUAST evaluations of the rnaSPAdes assemblies are equivalent to running these commands:

metaquast.py {sample}_1_reads_trim2.rnaspades.transcripts.fasta -R {reference} --fragmented --gene-finding -o {sample}_1_reads_trim2.rnaspades_quast
metaquast.py {sample}_1_reads_trim30.rnaspades.transcripts.fasta -R {reference} --fragmented --gene-finding -o {sample}_1_reads_trim30.rnaspades_quast
metaquast.py SRR606249_subset10_1_reads_trim2.rnaspades.transcripts.fasta -R Shakya_Refs/ --fragmented --gene-finding -o SRR606249_subset10_1_reads_trim2.rnaspades_quast
metaquast.py SRR606249_subset10_1_reads_trim30.rnaspades.transcripts.fasta -R Shakya_Refs/ --fragmented --gene-finding -o SRR606249_subset10_1_reads_trim30.rnaspades_quast

The MultiQC aggregation of the rnaSPAdes MetaQUAST reports is equivalent to running this command:

multiqc {sample}_1_reads_trim2.rnaspades_quast/report.tsv {sample}_1_reads_trim30.rnaspades_quast/report.tsv -n {sample}_1_reads_rnaspades_multiqc_report -o {sample}_1_reads_rnaspades_multiqc_report
multiqc SRR606249_subset10_1_reads_trim2.rnaspades_quast/report.tsv SRR606249_subset10_1_reads_trim30.rnaspades_quast/report.tsv -n SRR606249_subset10_1_reads_rnaspades_multiqc_report -o SRR606249_subset10_1_reads_rnaspades_multiqc_report

Expected Output Files for the Example Dataset

Below is a more detailed description of the output files expected in the metscale/workflows/data/ directory after the assembly workflow has been successfully run.

Using the filtered reads generated by the Read Filtering Workflow:

File Name File Size
SRR606249_subset10_1_reads_trim2_1.fq.gz 365 MB
SRR606249_subset10_1_reads_trim2_2.fq.gz 359 MB
SRR606249_subset10_1_reads_trim30_1.fq.gz 313 MB
SRR606249_subset10_1_reads_trim30_2.fq.gz 300 MB

The following files are produced by SPAdes after assembling the filtered reads from isolates with the assembly_spades_workflow rule:

File Name File Size
SRR606249_subset10_1_reads_trim2_k21_33_55.spades.contigs.fa 150 MB
SRR606249_subset10_1_reads_trim2_k21_33_55.spades/ 779 MB
SRR606249_subset10_1_reads_trim30_k21_33_55.spades.contigs.fa 139 MB
SRR606249_subset10_1_reads_trim30_k21_33_55.spades/ 726 MB

The following files are produced by plasmidSPAdes after assembling plasmids from the filtered reads of isolates with the assembly_plasmidspades_workflow rule:

File Name File Size
SRR606249_subset10_1_reads_trim2.plasmidspades.contigs.fa 22 MB
SRR606249_subset10_1_reads_trim2.plasmidspades/ 109 MB
SRR606249_subset10_1_reads_trim30.plasmidspades.contigs.fa 20 MB
SRR606249_subset10_1_reads_trim30.plasmidspades/ 101 MB

The following files are produced by metaSPAdes after assembling the filtered reads with the assembly_metaspades_workflow or assembly_all_workflow rule*:

File Name File Size
SRR606249_subset10_1_reads_trim2.metaspades.contigs.fa 153 MB
SRR606249_subset10_1_reads_trim2.metaspades/ 978 MB
SRR606249_subset10_1_reads_trim30.metaspades.contigs.fa 142 MB
SRR606249_subset10_1_reads_trim30.metaspades/ 909 MB

The following files are produced by MEGAHIT after assembling the filtered reads with the assembly_megahit_workflow or assembly_all_workflow rule*:

File Name File Size
SRR606249_subset10_1_reads_trim2.megahit.contigs.fa 127 MB
SRR606249_subset10_1_reads_trim2.megahit/ 108 KB
SRR606249_subset10_1_reads_trim30.megahit.contigs.fa 115 MB
SRR606249_subset10_1_reads_trim30.megahit/ 104 KB

*Additional files generated by the metaSPAdes and MEGAHIT assemblers are saved in the sub-directories listed above.

The following files are produced by QUAST after evaluating the assemblies with the assembly_quast_workflow rule:

File Name File Size
SRR606249_subset10_1_reads_trim2.metaspades_quast/ 744 KB
SRR606249_subset10_1_reads_trim2.metaspades_quast/report.html 568 KB
SRR606249_subset10_1_reads_trim30.metaspades_quast/ 736 KB
SRR606249_subset10_1_reads_trim30.metaspades_quast/report.html 556 KB
SRR606249_subset10_1_reads_trim2.megahit_quast/ 748 KB
SRR606249_subset10_1_reads_trim2.megahit_quast/report.html 574 KB
SRR606249_subset10_1_reads_trim30.megahit_quast/ 732 KB
SRR606249_subset10_1_reads_trim30.megahit_quast/report.html 557 KB

The following files are produced by MultiQC after aggregating the QUAST reports with the assembly_multiqc_workflow rule:

File Name File Size
SRR606249_subset10_1_reads.megahit_multiqc_report_data/ 60 KB
SRR606249_subset10_1_reads.megahit_multiqc_report.html 1.1 MB
SRR606249_subset10_1_reads.metaspades_multiqc_report_data/ 60 KB
SRR606249_subset10_1_reads.metaspades_multiqc_report.html 1.1 MB

The table below summarizes statistics from the QUAST evaluations of the SRR606249_subset10_1_reads assemblies in the final MultiQC report.

Sample Name N50 (Kbp) N75 (Kbp) L50 (K) L75 (K) Largest contig (Kbp) Length (Mbp)
SRR606249_subset10_1_reads_trim2.metaspades.contigs 2.5 1.0 7.2 26,791.0 264.3 115.8
SRR606249_subset10_1_reads_trim30.metaspades.contigs 2.5 1.0 8.1 27,066.0 172.8 104.1
SRR606249_subset10_1_reads_trim2.megahit.contigs 2.9 1.1 6.1 23,014.0 264.4 109.7
SRR606249_subset10_1_reads_trim30.megahit.contigs 2.4 1.0 7.0 23,574.0 212.1 97.2

The statistics from the MEGAHIT and metaSPAdes assemblies for this sample are similar, although this does not assess potential differences in the taxonomic or functional content of the assembled contigs.
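For readers unfamiliar with the N50, N75, L50, and L75 columns: Nx is the contig length at which contigs of that length or longer cover at least x% of the total assembly length, and Lx is the number of contigs needed to reach that coverage. The following sketch illustrates the definition on a toy list of contig lengths. It is not how QUAST is implemented, and the function name `nx_and_lx` is hypothetical.

```python
def nx_and_lx(contig_lengths, fraction=0.5):
    """Return (Nx, Lx) for the given contig lengths.

    fraction=0.5 gives (N50, L50); fraction=0.75 gives (N75, L75).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    threshold = sum(lengths) * fraction
    running_total = 0
    for count, length in enumerate(lengths, start=1):
        running_total += length
        if running_total >= threshold:
            # `length` is Nx; `count` contigs were needed to reach the threshold.
            return length, count


# Toy example: total length 30, so N50 needs a cumulative length of at least 15.
toy_lengths = [10, 8, 6, 4, 2]
n50, l50 = nx_and_lx(toy_lengths)        # (8, 2)
n75, l75 = nx_and_lx(toy_lengths, 0.75)  # (6, 3)
```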

The following files are produced by QUAST after evaluating the assemblies with a reference for the assembly_quast_reference_with_spades_workflow and assembly_quast_reference_with_plasmidspades_workflow rules:

File Name File Size
SRR606249_subset10_1_reads_trim2.spades_quast/ 1.1 MB
SRR606249_subset10_1_reads_trim2.spades_quast/report.html 856 KB
SRR606249_subset10_1_reads_trim30.spades_quast/ 1.1 MB
SRR606249_subset10_1_reads_trim30.spades_quast/report.html 813 KB
SRR606249_subset10_1_reads_trim2.plasmidspades-quast/ 664 KB
SRR606249_subset10_1_reads_trim2.plasmidspades-quast/report.html 443 KB
SRR606249_subset10_1_reads_trim30.plasmidspades-quast/ 676 KB
SRR606249_subset10_1_reads_trim30.plasmidspades-quast/report.html 455 KB

The following files are produced by MetaQUAST after evaluating the assemblies with reference(s) for metaSPAdes and/or MEGAHIT with the assembly_metaquast_workflow rule:

File Name File Size
SRR606249_subset10_1_reads_trim2.metaspades_metaquast/ 1.1 MB
SRR606249_subset10_1_reads_trim2.metaspades_metaquast/report.html 927 KB
SRR606249_subset10_1_reads_trim30.metaspades_metaquast/ 1.1 MB
SRR606249_subset10_1_reads_trim30.metaspades_metaquast/report.html 925 KB
SRR606249_subset10_1_reads_trim2.megahit_metaquast/ 1.1 MB
SRR606249_subset10_1_reads_trim2.megahit_metaquast/report.html 926 KB
SRR606249_subset10_1_reads_trim30.megahit_metaquast/ 1.1 MB
SRR606249_subset10_1_reads_trim30.megahit_metaquast/report.html 924 KB

The following files are produced by rnaSPAdes after assembling the transcript sequence reads with the assembly_rnaspades_workflow rule:

File Name File Size
SRR606249_subset10_1_reads_trim2.rnaspades.transcripts.fasta 171 MB
SRR606249_subset10_1_reads_trim2.rnaspades/ 673 MB
SRR606249_subset10_1_reads_trim30.rnaspades.transcripts.fasta 156 MB
SRR606249_subset10_1_reads_trim30.rnaspades/ 623 MB

The following files are produced by MetaQUAST after evaluating the rnaSPAdes assemblies with reference(s) for the assembly_rnaspades_metaquast_workflow rule:

File Name File Size
SRR606249_subset10_1_reads_trim2.rnaspades_metaquast_report_data/ 1.1 MB
SRR606249_subset10_1_reads_trim30.rnaspades_metaquast_report_data/ 1.1 MB

The following files are produced by MultiQC after aggregating the rnaSPAdes MetaQUAST reports with the assembly_rnaspades_multiqc_workflow rule:

File Name File Size
SRR606249_subset10_1_reads_trim2.rnaspades_multiqc_report_data/ 64 KB
SRR606249_subset10_1_reads_trim2.rnaspades_multiqc_report.html 1.1 MB
SRR606249_subset10_1_reads_trim30.rnaspades_multiqc_report_data/ 64 KB
SRR606249_subset10_1_reads_trim30.rnaspades_multiqc_report.html 1.1 MB