
Swapping in SEPP for read placement with q2 picrust2

Gavin Douglas edited this page Jun 11, 2021 · 21 revisions

Note that SEPP can now be run directly as part of the full PICRUSt2 pipeline, which largely makes this tutorial obsolete. Although it is outdated, it still demonstrates how to run PICRUSt2 with a custom-made input tree of query and reference sequences.

If you want to use the PICRUSt2 QIIME 2 plugin, but you do not have at least 16-20 GB of RAM, the best alternative is to swap in a different program for the read placement step. This tutorial demonstrates how ASVs placed with q2-fragment-insertion can be used as input to the q2-picrust2 plugin.

Before running this tutorial we recommend that you look through the standalone tutorial for a better description of the tool (click Tutorial on the right side-bar). Note that many of the features available in the standalone version are not yet implemented in the QIIME 2 plugin. You should also read through the key limitations of metagenome inference.

q2-picrust2 Running Example

You can see a description of the PICRUSt2 command by running:

qiime picrust2 custom-tree-pipeline --help

The required inputs are --i-table and --i-tree, which must be QIIME 2 artifacts of types FeatureTable[Frequency] and Phylogeny[Rooted], respectively. The feature table must contain the abundances of ASVs (i.e. a BIOM table), and the tree must contain the ASVs placed into the PICRUSt2 reference tree.

We will be using test files from the PICRUSt2 tutorial for this example. You can download these files with the commands below:

mkdir q2-picrust2_test

cd q2-picrust2_test

wget http://kronos.pharmacology.dal.ca/public_files/picrust/picrust2_tutorial_files/mammal_biom.qza

wget http://kronos.pharmacology.dal.ca/public_files/picrust/picrust2_tutorial_files/mammal_seqs.qza

wget http://kronos.pharmacology.dal.ca/public_files/picrust/picrust2_tutorial_files/mammal_metadata.tsv

These files correspond to the ASV count table, ASV sequences, and metadata for the samples. There are 11 samples and 371 ASVs in total. These samples were collected from the mammalian stool of Arctic wolves, coyotes, beavers, and porcupines, as previously described.
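Before moving on, it can help to confirm that all three downloads completed. The following is only a minimal sketch: the function name check_inputs is hypothetical and not part of QIIME 2 or PICRUSt2, and it only checks that each file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical helper (not part of QIIME 2 or PICRUSt2): report any expected
# tutorial file that is missing or empty.
# Usage: check_inputs [DIR]   (DIR defaults to the current directory)
check_inputs() {
    local dir="${1:-.}"
    local missing=0
    local f
    for f in mammal_biom.qza mammal_seqs.qza mammal_metadata.tsv; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$dir/$f" ]; then
            echo "Missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 if all files were found, 1 otherwise
    return "$missing"
}
```

Calling check_inputs from inside q2-picrust2_test should print nothing and return 0 if the downloads succeeded. Running qiime tools peek on either .qza file is another quick check, since it reports the artifact's type.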

To place the ASVs, we will first run q2-fragment-insertion against the PICRUSt2 reference multiple-sequence alignment and phylogeny. This sequence placement approach differs from the default PICRUSt2 pipeline, but is easier to integrate with the QIIME 2 framework. You must place your ASVs into the PICRUSt2 reference files: if you place them into the default SEPP reference files you will get downstream errors. The reference file used here is the same one you would use with your own data. Download the reference file and place the study sequences into the tree:

wget http://kronos.pharmacology.dal.ca/public_files/picrust/picrust2_tutorial_files/picrust2_default_sepp_ref.qza \
     -O picrust2_default_sepp_ref.qza 
 
qiime fragment-insertion sepp --i-representative-sequences mammal_seqs.qza \
                              --p-threads 1 \
                              --i-reference-database picrust2_default_sepp_ref.qza \
                              --output-dir tutorial_placed_out

Note that in QIIME 2 versions prior to 2019.10 the q2-fragment-insertion options are different. For these versions you will need to specify the reference tree and multiple-sequence alignment separately.
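If you script this step across several environments, one way to decide which form of the command applies is to compare the QIIME 2 release string against 2019.10. The sketch below is illustrative only: the function name uses_reference_database is hypothetical, and it assumes the release string has the form YEAR.MONTH with an optional patch component (for example as reported by qiime info).

```shell
# Hypothetical helper: succeed (exit status 0) if a QIIME 2 release string
# such as "2019.10" or "2020.2" is 2019.10 or newer, i.e. a release where the
# single --i-reference-database input shown above applies.
uses_reference_database() {
    local version="$1"
    local year="${version%%.*}"   # text before the first dot, e.g. 2019
    local rest="${version#*.}"    # text after the first dot, e.g. 10 or 10.0
    local month="${rest%%.*}"     # drop any patch component
    # Compare year first, then month. A plain decimal comparison would
    # wrongly rank 2019.7 above 2019.10.
    if [ "$year" -gt 2019 ]; then
        return 0
    elif [ "$year" -eq 2019 ] && [ "$month" -ge 10 ]; then
        return 0
    fi
    return 1
}
```

For example, uses_reference_database 2019.7 returns a non-zero status, which signals that the reference tree and alignment need to be passed separately; check qiime fragment-insertion sepp --help in that environment for the exact option names.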

Now that we have our placed ASVs we can run qiime picrust2 custom-tree-pipeline. The main options currently available in this plugin allow you to change the number of threads (--p-threads), the hidden-state prediction (HSP) method (--p-hsp-method), and the maximum NSTI value (--p-max-nsti). Note that there are many more options available for the standalone PICRUSt2 version, which you could run outside of QIIME 2.

The --p-max-nsti option specifies how distantly placed a sequence needs to be in the reference phylogeny before it is excluded. The default cut-off is 2. In human datasets used for testing PICRUSt2 the only ASVs above this default cut-off were 18S sequences erroneously in 16S datasets, which suggests this cut-off is highly lenient. For environmental datasets a higher proportion of ASVs may be thrown out based on this default cut-off.

You can run the full PICRUSt2 pipeline with the command below (it should take ~17 min and 5 GB of RAM, and will be much faster if you can set more threads). In this case pic (phylogenetic independent contrast) is the specified hidden-state prediction method because it is the fastest. In practice, however, we recommend that users choose the mp method (~40 min on 1 thread for this example).

qiime picrust2 custom-tree-pipeline --i-table mammal_biom.qza \
                                    --i-tree tutorial_placed_out/tree.qza \
                                    --output-dir q2-picrust2_output \
                                    --p-threads 1 \
                                    --p-hsp-method pic \
                                    --p-max-nsti 2 \
                                    --verbose
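To switch to the recommended mp method or to use more threads, the same call can be parameterized. The wrapper below is a sketch under stated assumptions: the function name run_custom_tree_pipeline and the DRY_RUN variable are hypothetical conveniences, and the output directory is suffixed with the HSP method so that a second run does not collide with the directory created by the command above. All qiime options are the ones already used in this tutorial.

```shell
# Hypothetical wrapper around the pipeline call shown above.
# Usage: run_custom_tree_pipeline [HSP_METHOD] [THREADS]
# Set DRY_RUN=1 to print the command instead of executing it.
run_custom_tree_pipeline() {
    local hsp_method="${1:-mp}"
    local threads="${2:-1}"
    local runner=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        runner="echo"   # print the command rather than run it
    fi
    # $runner is intentionally unquoted: when empty it expands to nothing
    $runner qiime picrust2 custom-tree-pipeline \
        --i-table mammal_biom.qza \
        --i-tree tutorial_placed_out/tree.qza \
        --output-dir "q2-picrust2_output_${hsp_method}" \
        --p-threads "$threads" \
        --p-hsp-method "$hsp_method" \
        --p-max-nsti 2 \
        --verbose
}
```

Running DRY_RUN=1 run_custom_tree_pipeline mp 4 prints the full command for inspection; dropping DRY_RUN=1 executes it.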

The output artifacts of this command are the red boxes in the flowchart here. MetaCyc pathway coverages are also output; these could help advanced users interpret pathway completeness, but will not be used by most users.

The output files (in q2-picrust2_output) are:

  • ec_metagenome.qza - EC metagenome predictions (rows are EC numbers and columns are samples).
  • ko_metagenome.qza - KO metagenome predictions (rows are KOs and columns are samples).
  • pathway_abundance.qza - MetaCyc pathway abundance predictions (rows are pathways and columns are samples).

The artifacts are all of type FeatureTable[Frequency], which means they can be used with QIIME 2 plugins that process and analyze this data type. Please see the main q2-picrust2 tutorial for examples of how to get quick summaries of the data and how to export the output to a table.
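As a rough sketch of the export step mentioned above, each artifact can be unpacked with qiime tools export and the resulting BIOM file converted to tab-separated text with biom convert. The function name export_picrust2_tables, the DRY_RUN variable, and the *_exported directory names are assumptions made for illustration; the main q2-picrust2 tutorial remains the reference for this step.

```shell
# Hypothetical helper: export the three output artifacts to TSV tables.
# Usage: export_picrust2_tables [PIPELINE_OUTPUT_DIR]
# Set DRY_RUN=1 to print the commands instead of executing them.
export_picrust2_tables() {
    local indir="${1:-q2-picrust2_output}"
    local runner=""
    local name
    if [ "${DRY_RUN:-0}" = "1" ]; then
        runner="echo"
    fi
    for name in ec_metagenome ko_metagenome pathway_abundance; do
        # A FeatureTable[Frequency] artifact is exported as feature-table.biom
        $runner qiime tools export \
            --input-path "$indir/$name.qza" \
            --output-path "${name}_exported" || return 1
        # Convert the BIOM file to a plain tab-separated table
        $runner biom convert \
            -i "${name}_exported/feature-table.biom" \
            -o "${name}_exported/$name.tsv" \
            --to-tsv || return 1
    done
}
```

After a real run, each *_exported directory should contain both the original feature-table.biom and a .tsv file named after the artifact.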

At this point many users are interested in identifying significantly different functions between sample groups. You should be aware that results based on differential abundance testing can vary substantially between shotgun metagenomics sequencing data and amplicon-based metagenome predictions based on the same samples. This is especially true for community-wide pathway predictions. Please check out this post and our pre-print for more details.