Source: https://docs.qiime2.org/2019.7/tutorials/gneiss/

In this tutorial you will learn how to perform differential abundance analysis using balances in gneiss.


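Compositionality refers to the issue of dealing with proportions. Sequencing yields only relative abundances: the total number of reads in a sample reflects sequencing depth rather than microbial load, so a change in one microbe shifts the proportions of all the others.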

While we cannot exactly solve the problem of identifying differentially abundant species, we can relax this problem and ask which partitions of microbes are changing.
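To make the issue concrete, here is a toy illustration with made-up numbers (it is not part of the gneiss workflow). Only microbe A doubles in absolute terms, yet the proportions of B and C drop even though nothing happened to them. The ratio between B and C, however, stays the same.

```python
# Hypothetical absolute abundances of three microbes in one sample,
# before and after only microbe "A" doubles.
before = {"A": 100.0, "B": 50.0, "C": 50.0}
after = {"A": 200.0, "B": 50.0, "C": 50.0}


def to_proportions(counts):
    """Close the composition: divide each value by the sample total."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


p_before = to_proportions(before)
p_after = to_proportions(after)

for name in before:
    print(f"{name}: {p_before[name]:.3f} -> {p_after[name]:.3f}")

# B and C appear to decrease, but the ratio between them is unchanged.
print("B/C ratio:", p_before["B"] / p_before["C"], "->", p_after["B"] / p_after["C"])
```

Ratios between microbes, unlike individual proportions, do not depend on the sample total, which is why balances are built from log ratios.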

# Creating balances

In the Chronic Fatigue Syndrome dataset published in Giloteaux et al. (2016), there are 87 individuals: 48 patients and 39 healthy controls. The data used in this tutorial were sequenced on an Illumina MiSeq using the Earth Microbiome Project hypervariable region 4 (V4) 16S rRNA sequencing protocol.

In [1]:
!wget \
  -O "sample-metadata.tsv" \
  "https://data.qiime2.org/2019.7/tutorials/gneiss/sample-metadata.tsv"

--2019-09-05 10:57:37--  https://data.qiime2.org/2019.7/tutorials/gneiss/sample-metadata.tsv
Resolving data.qiime2.org (data.qiime2.org)... 52.35.38.247
Connecting to data.qiime2.org (data.qiime2.org)|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/tutorials/gneiss/sample-metadata.tsv [following]
--2019-09-05 10:57:37--  https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/tutorials/gneiss/sample-metadata.tsv
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 54.231.176.192
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|54.231.176.192|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10540 (10K) [text/plain]
Saving to: `sample-metadata.tsv'


2019-09-05 10:57:38 (25.1 MB/s) - `sample-metadata.tsv' saved [10540/10540]



In [2]:
!wget \
  -O "table.qza" \
  "https://data.qiime2.org/2019.7/tutorials/gneiss/table.qza"

--2019-09-05 10:57:44--  https://data.qiime2.org/2019.7/tutorials/gneiss/table.qza
Resolving data.qiime2.org (data.qiime2.org)... 52.35.38.247
Connecting to data.qiime2.org (data.qiime2.org)|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/tutorials/gneiss/table.qza [following]
--2019-09-05 10:57:44--  https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/tutorials/gneiss/table.qza
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.144.8
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.144.8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 35979 (35K) [binary/octet-stream]
Saving to: `table.qza'


2019-09-05 10:57:45 (238 KB/s) - `table.qza' saved [35979/35979]



In [3]:
!wget \
  -O "taxa.qza" \
  "https://data.qiime2.org/2019.7/tutorials/gneiss/taxa.qza"

--2019-09-05 10:57:52--  https://data.qiime2.org/2019.7/tutorials/gneiss/taxa.qza
Resolving data.qiime2.org (data.qiime2.org)... 52.35.38.247
Connecting to data.qiime2.org (data.qiime2.org)|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/tutorials/gneiss/taxa.qza [following]
--2019-09-05 10:57:53--  https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/tutorials/gneiss/taxa.qza
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.232.200
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.232.200|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 96166 (94K) [binary/octet-stream]
Saving to: `taxa.qza'


2019-09-05 10:57:54 (303 KB/s) - `taxa.qza' saved [96166/96166]



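First, we will define the partitions of microbes for which we want to construct balances. In gneiss these partitions are encoded as a bifurcating tree: each internal node splits the features below it into two groups, and the log ratio between those two groups is that node's balance.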

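Note that the differential abundance techniques that we will be running use log ratio transforms. Because the logarithm of zero is undefined, zero counts have to be dealt with, typically by adding a pseudocount, before these transforms can be applied.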
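The zero problem can be sketched in plain Python. This is an illustration only: the counts are hypothetical, `log_ratio` is a helper written for this example, and the pseudocount of 1 is an arbitrary choice rather than a documented gneiss default.

```python
import math

# Hypothetical counts for two microbes across three samples; one count is zero.
numerator_counts = [12, 0, 30]
denominator_counts = [4, 8, 10]

PSEUDOCOUNT = 1  # illustrative choice; the appropriate value is a modeling decision


def log_ratio(num, den, pseudocount=PSEUDOCOUNT):
    """Natural-log ratio after adding a pseudocount so zeros do not produce log(0)."""
    return math.log((num + pseudocount) / (den + pseudocount))


ratios = [log_ratio(n, d) for n, d in zip(numerator_counts, denominator_counts)]
print([round(r, 3) for r in ratios])
```

Without the pseudocount, the second sample would call `math.log(0.0)`, which raises `ValueError: math domain error`.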

# Option 1: Correlation-clustering

This should be your default option.
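As we understand the plugin's description, correlation-clustering builds the tree by Ward hierarchical clustering based on the degree of proportionality between features. The sketch below shows one common way to measure proportionality, the variance of the log ratio between two features across samples, on a hypothetical table. It only finds the first pair an agglomerative method would merge; it is not the plugin's implementation.

```python
import math
from itertools import combinations
from statistics import pvariance

# Hypothetical counts: keys are features, lists are samples (no zeros, so logs are defined).
table = {
    "f1": [10, 20, 40, 80],
    "f2": [5, 10, 20, 40],  # exactly proportional to f1
    "f3": [30, 25, 20, 15],
    "f4": [60, 50, 45, 30],
}


def log_ratio_variance(x, y):
    """Variance of log(x_i / y_i) across samples; 0 means perfectly proportional."""
    return pvariance([math.log(a / b) for a, b in zip(x, y)])


distances = {
    (a, b): log_ratio_variance(table[a], table[b])
    for a, b in combinations(table, 2)
}

# The first merge of an agglomerative clustering joins the closest (most proportional) pair.
closest_pair = min(distances, key=distances.get)
print("most proportional pair:", closest_pair, "variance:", distances[closest_pair])
```

In a full analysis, the matrix of these variances would be handed to a linkage routine (for example `scipy.cluster.hierarchy.linkage` with `method='ward'` on the condensed distances) to build the whole tree.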

In [4]:
!qiime gneiss correlation-clustering \
  --i-table table.qza \
  --o-clustering hierarchy.qza

Saved Hierarchy to: hierarchy.qza


# Option 2: Gradient-clustering

An alternative to correlation-clustering is to create a tree based on a numeric metadata category.
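Gradient-clustering orders features by where along the metadata gradient they tend to be observed. A minimal sketch, assuming each feature is summarized by the abundance-weighted mean of the gradient (here a hypothetical `ages` list) across samples; this is our reading of the method, not the plugin's code.

```python
# Hypothetical relative abundances (each sample sums to 1) and a numeric gradient such as Age.
ages = [25, 40, 55, 70]
abundances = {
    "f1": [0.6, 0.4, 0.2, 0.1],
    "f2": [0.3, 0.3, 0.3, 0.3],
    "f3": [0.1, 0.3, 0.5, 0.6],
}


def mean_gradient(weights, gradient):
    """Abundance-weighted mean of the gradient over the samples a feature occurs in."""
    return sum(w * g for w, g in zip(weights, gradient)) / sum(weights)


means = {name: mean_gradient(w, ages) for name, w in abundances.items()}

# Sorting features by this value orders them along the gradient; clustering on
# pairwise differences |mean_i - mean_j| then yields a tree that follows the gradient.
order = sorted(means, key=means.get)
print(means)
print(order)
```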


In [5]:
!qiime gneiss gradient-clustering \
  --i-table table.qza \
  --m-gradient-file sample-metadata.tsv \
  --m-gradient-column Age \
  --o-clustering gradient-hierarchy.qza

Saved Hierarchy to: gradient-hierarchy.qza


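An important consideration for downstream analyses is the problem of overfitting. A tree from gradient-clustering is built to highlight compositional differences along the chosen metadata column, so testing that same column on its balances can produce false positives. Use gradient-clustering with caution.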

# Building linear models using balances

Now that we have a tree that defines our partitions, we can perform the isometric log ratio (ILR) transform.
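For a single internal node, a balance compares the geometric mean of the features on one side of the split with that of the other side, scaled so that balances from nodes of different sizes are comparable: b = sqrt(r*s/(r+s)) * ln(g(numerator) / g(denominator)), where r and s count the features on each side. The sketch below uses a hypothetical four-feature sample; in practice, which side counts as the numerator is fixed by the tree.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(numerator, denominator):
    """ILR balance for one internal node:
    sqrt(r*s/(r+s)) * ln(g(numerator) / g(denominator)),
    where r and s are the numbers of features on each side of the split.
    """
    r, s = len(numerator), len(denominator)
    return math.sqrt(r * s / (r + s)) * math.log(
        geometric_mean(numerator) / geometric_mean(denominator)
    )


# Hypothetical sample with four features, already free of zeros.
sample = {"a": 8.0, "b": 2.0, "c": 4.0, "d": 1.0}

# A node splitting {a, b} (numerator) from {c, d} (denominator).
b_node = balance([sample["a"], sample["b"]], [sample["c"], sample["d"]])
print(round(b_node, 4))
```

Multiplying every count by the same factor leaves the balance unchanged (up to floating-point rounding), so `balance([80.0, 20.0], [40.0, 10.0])` gives the same value; this is why balances do not depend on sequencing depth.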

In [6]:
!qiime gneiss ilr-hierarchical \
  --i-table table.qza \
  --i-tree hierarchy.qza \
  --o-balances balances.qza

Saved FeatureTable[Balance] to: balances.qza


Now that we have a log ratio (balance) for each internal node of our tree, we can run linear regression on the balances.

Remember that ANOVA is a special case of linear regression: every problem that can be solved by ANOVA can be reformulated as a linear regression.
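In the simplest case, a comparison of two groups, the equivalence is easy to check by hand. Regressing a balance on a 0/1 group indicator gives an intercept equal to the control mean and a slope equal to the difference between group means, the same quantities a two-group ANOVA compares. The values below are hypothetical, and the closed-form OLS handles a single predictor only; conceptually, ols-regression fits a model like this, with more covariates, for each balance.

```python
from statistics import mean

# Hypothetical y0 balance values for two groups.
control = [0.8, 1.1, 0.9, 1.2]
patient = [0.2, 0.5, 0.4, 0.3]

# Dummy-code the group: 0 = control, 1 = patient.
x = [0] * len(control) + [1] * len(patient)
y = control + patient

# Ordinary least squares for y = b0 + b1 * x (closed form for one predictor).
x_bar, y_bar = mean(x), mean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

print("intercept (control mean):", round(b0, 3))
print("slope (patient - control):", round(b1, 3))
```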

In [8]:
!qiime gneiss ols-regression \
  --p-formula "Subject+Sex+Age+BMI+sCD14ugml+LBPugml+LPSpgml" \
  --i-table balances.qza \
  --i-tree hierarchy.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization regression_summary.qzv


ols-regression is deprecated and will be removed in a future version of this plugin.
Saved Visualization to: regression_summary.qzv


In [9]:
from qiime2 import Visualization
# Render the saved regression summary inline in the notebook
Visualization.load('regression_summary.qzv')

As noted in the legend, the numerators for each balance are highlighted in light red, while the denominators are highlighted in dark red.

Next, we take a closer look at balance y0. Specifically, we'll plot a boxplot of its values and identify taxa that could explain the differences between the control and patient groups.

In [10]:
!qiime gneiss balance-taxonomy \
  --i-table table.qza \
  --i-tree hierarchy.qza \
  --i-taxonomy taxa.qza \
  --p-taxa-level 2 \
  --p-balance-name 'y0' \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-column Subject \
  --o-visualization y0_taxa_summary.qzv


balance-taxonomy is deprecated and will be removed in a future version of this plugin.
Saved Visualization to: y0_taxa_summary.qzv


In [11]:
Visualization.load('y0_taxa_summary.qzv')
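As we understand its description, balance-taxonomy summarizes the taxa that make up the numerator and the denominator of the chosen balance. A toy version of that tally, with hypothetical feature IDs and Greengenes-style lineages. Note that `level` here is a 0-based position (1 = phylum) chosen for this example and may not follow the same convention as `--p-taxa-level`.

```python
from collections import Counter

# Hypothetical lineages for features on each side of a balance.
lineages = {
    "otu1": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "otu2": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "otu3": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
    "otu4": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
    "otu5": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
}
numerator = ["otu1", "otu2", "otu3"]
denominator = ["otu4", "otu5"]


def rank_label(lineage, level):
    """Return the taxon name at a 0-based rank position (0 = kingdom, 1 = phylum, ...)."""
    ranks = [rank.strip() for rank in lineage.split(";")]
    return ranks[level] if level < len(ranks) else "unassigned"


def tally(features, level):
    """Count how many features fall into each taxon at the chosen rank."""
    return Counter(rank_label(lineages[f], level) for f in features)


print("numerator:", tally(numerator, 1))
print("denominator:", tally(denominator, 1))
```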

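In this particular case, the y0 balance (log ratio) is lower in the patient group than in the control group.

Remember that compositional data alone cannot reveal absolute changes of microbes in a given sample. A lower balance means the numerator taxa decreased relative to the denominator taxa, which could come from a drop in the numerator, a rise in the denominator, or both.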
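A toy illustration of why the direction of change is ambiguous, using hypothetical abundances and one feature on each side of the balance. Halving the numerator and doubling the denominator lead to identical proportions and an identical balance, so the two scenarios cannot be told apart from sequencing data alone.

```python
import math


def balance_two(num, den):
    """Balance between one numerator and one denominator feature: sqrt(1/2) * ln(num/den)."""
    return math.sqrt(0.5) * math.log(num / den)


# Hypothetical absolute abundances: a control sample and two patient scenarios.
scenarios = {
    "control": {"num": 100.0, "den": 100.0},
    "numerator halved": {"num": 50.0, "den": 100.0},  # numerator taxa decrease
    "denominator doubled": {"num": 100.0, "den": 200.0},  # denominator taxa increase
}

balances = {label: balance_two(s["num"], s["den"]) for label, s in scenarios.items()}

for label, s in scenarios.items():
    total = s["num"] + s["den"]
    print(label, "proportions:", round(s["num"] / total, 3), round(s["den"] / total, 3),
          "balance:", round(balances[label], 4))
```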