10. Post Processing

Krista Ternus edited this page Jan 24, 2021 · 6 revisions

Post Processing

Overview

This page describes the post-processing and interpretation of results after running the MetScale workflows.

Organizing Final Output Files

After all analyses are complete, a final post-processing command organizes the datasets by sample name. The post_processing_move_samples_dir_workflow snakemake rule creates sub-directories in the data/ directory and moves all files associated with each sample into that sample's sub-directory.

This can be executed with the following command:

snakemake --use-singularity --configfile=config/my_custom_config.json post_processing_move_samples_dir_workflow

Before executing this command, we recommend that users make sure they have completely finished all analyses. A workflow cannot be re-executed unless the data files are first moved back into the metagenomics/workflows/data/ directory.
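If a workflow does need to be re-run after the files have been organized, the move can be reversed by hand. The following is a minimal sketch of that step, not part of MetScale itself: the function name restore_sample_files is hypothetical, and it assumes each sample's files sit directly inside a single sub-directory of the data directory.

```python
import shutil
from pathlib import Path


def restore_sample_files(data_dir, sample_dir_name):
    """Move every file in data_dir/sample_dir_name back up into data_dir.

    Hypothetical helper (not shipped with MetScale). Returns the list of
    restored paths and never overwrites a file that already exists.
    """
    data_dir = Path(data_dir)
    sample_dir = data_dir / sample_dir_name
    restored = []
    for path in sorted(sample_dir.iterdir()):
        if not path.is_file():
            continue  # leave nested directories where they are
        target = data_dir / path.name
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        shutil.move(str(path), str(target))
        restored.append(target)
    return restored
```

For example, restore_sample_files("metagenomics/workflows/data", "<samples_name>_finished") would return the files for one sample to the data directory, with the placeholder replaced by the actual sub-directory name.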

Aside from the organizational benefits of moving the files for each sample into their own directory, we are also exploring options for generating final reports from all files located within a single directory.

The following rules are available for execution in the post processing workflow (yellow stars indicate the terminal rules):

Generating Final Report

Once all your data have been moved with the post_processing_move_samples_dir_workflow command, you can generate a report of all your results. This creates a summary-report.html file, which includes detailed summaries of all your workflow analyses. If you have run any taxonomic analyses from the taxonomic_classifier workflows, the report will also include a graphical summary comparing the results across the tools.

First, you need to run the setup from the /workflow/post_processing/ directory. This only needs to be done once. If you accidentally run the setup again, you will see an on-screen error message stating that execution halted. This prevents any setup files from being overwritten. You will still be able to process the data and generate the final report.

The setup can be executed with the following command:

cd post_processing/
python setup_post_processing.py --input <path_to_data_directory> --post <path_to_post_processing_directory>

Depending on the setup of your directories, those paths may look like this:

python setup_post_processing.py --input ~/metscale/workflows/data/ --post ~/metscale/workflows/post_processing/

You can then generate the final report from the metscale/workflows directory with the following commands:

cd ..
snakemake --use-singularity --configfile=config/my_custom_config.json post_processing_create_final_report_workflow

Once this has run successfully, you should see a summary-report.html file in your sample's sub-directory (e.g. /data/<samples_name>_finished/). Some features of the final report require the tool output files to be in the same folder as the summary report HTML file when you view the results. If you have run any taxonomic analyses, you will also see an abundance_graph.png file, which summarizes the relative abundances of the taxa identified across all the taxonomic classifier tools.
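To confirm that the report was produced for every sample, the sample sub-directories can be scanned for the expected file. This is only a sketch: find_missing_reports is a hypothetical helper, and the _finished directory suffix is an assumption based on the example path above.

```python
from pathlib import Path


def find_missing_reports(data_dir, suffix="_finished",
                         report_name="summary-report.html"):
    """Return the names of sample directories that lack the summary report.

    Hypothetical helper; assumes sample directories are named <sample><suffix>.
    """
    data_dir = Path(data_dir)
    missing = []
    for sample_dir in sorted(data_dir.glob(f"*{suffix}")):
        # Only directories count as sample folders; skip stray files.
        if sample_dir.is_dir() and not (sample_dir / report_name).is_file():
            missing.append(sample_dir.name)
    return missing
```

An empty list means every matching sample directory contains a summary-report.html file.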

For each tool, a color indicates the signal strength of each species listed on the Y axis, according to the following scheme:

| Color | Species Signal | KrakenUniq kmers | Kraken2 reads | Bracken reads | Kaiju reads | Sourmash f match | Mash identity |
| Red | Very Strong | >10,000 | >100,000 | >100,000 | >100,000 | >0.60 | >0.95 |
| Orange | Strong | 5,000-10,000 | 30,000-100,000 | 30,000-100,000 | 30,000-100,000 | 0.20-0.60 | 0.90-0.95 |
| Yellow | Moderately Strong | 2,000-5,000 | 10,000-30,000 | 10,000-30,000 | 10,000-30,000 | 0.15-0.20 | 0.85-0.90 |
| Green | Moderate | 1,000-2,000 | 1,000-10,000 | 1,000-10,000 | 1,000-10,000 | 0.10-0.15 | 0.80-0.85 |
| Blue | Weak | 500-1,000 | 100-1,000 | 100-1,000 | 100-1,000 | 0.05-0.10 | 0.75-0.80 |
| Grey | Very Weak | 0-500 | 0-100 | 0-100 | 0-100 | 0-0.05 | 0-0.75 |
| White | No Species Signal | 0 | 0 | 0 | 0 | 0 | 0 |
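The color scheme above can be expressed as a small lookup, which may help when interpreting raw tool output outside the report. This is an illustrative sketch, not the code the report uses: the names THRESHOLDS and signal_color are hypothetical. The table does not say which band a value exactly on a boundary belongs to, so this sketch assumes it falls into the weaker band (consistent with the strict ">" shown for Red).

```python
# Lower bounds for each band, strongest first, transcribed from the table.
READ_BANDS = [(100000, "Red"), (30000, "Orange"), (10000, "Yellow"),
              (1000, "Green"), (100, "Blue")]

THRESHOLDS = {
    "KrakenUniq": [(10000, "Red"), (5000, "Orange"), (2000, "Yellow"),
                   (1000, "Green"), (500, "Blue")],   # unique k-mers
    "Kraken2": READ_BANDS,                            # reads
    "Bracken": READ_BANDS,                            # reads
    "Kaiju": READ_BANDS,                              # reads
    "Sourmash": [(0.60, "Red"), (0.20, "Orange"), (0.15, "Yellow"),
                 (0.10, "Green"), (0.05, "Blue")],    # f match
    "Mash": [(0.95, "Red"), (0.90, "Orange"), (0.85, "Yellow"),
             (0.80, "Green"), (0.75, "Blue")],        # identity
}


def signal_color(tool, value):
    """Map a tool's value for one species to the report's color name."""
    if value <= 0:
        return "White"  # no species signal
    for lower_bound, color in THRESHOLDS[tool]:
        # Assumption: a value exactly on a boundary goes to the weaker band.
        if value > lower_bound:
            return color
    return "Grey"  # above zero but below the weakest named band
```

For instance, signal_color("Kraken2", 50000) gives "Orange" and signal_color("Mash", 0.5) gives "Grey" under these assumptions.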