## [Generate a tree for phylogenetic diversity analyses](https://docs.qiime2.org/2023.5/tutorials/moving-pictures-usage/#:~:text=Generate%20a%20tree%20for%20phylogenetic%20diversity%20analyses)

From the moving pictures tutorial:
> QIIME supports several phylogenetic diversity metrics, including Faith’s Phylogenetic Diversity and weighted and unweighted UniFrac. In addition to counts of features per sample (i.e., the data in the FeatureTable[Frequency] QIIME 2 artifact), these metrics require a rooted phylogenetic tree relating the features to one another. This information will be stored in a Phylogeny[Rooted] QIIME 2 artifact. To generate a phylogenetic tree we will use align-to-tree-mafft-fasttree pipeline from the q2-phylogeny plugin. 
First, the pipeline uses the mafft program to perform a multiple sequence alignment of the sequences in our FeatureData[Sequence] to create a FeatureData[AlignedSequence] QIIME 2 artifact. Next, the pipeline masks (or filters) the alignment to remove positions that are highly variable. These positions are generally considered to add noise to a resulting phylogenetic tree. Following that, the pipeline applies FastTree to generate a phylogenetic tree from the masked alignment. The FastTree program creates an unrooted tree, so in the final step in this section midpoint rooting is applied to place the root of the tree at the midpoint of the longest tip-to-tip distance in the unrooted tree.

In [25]:
!qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences ../output/blank-mock/rep-seqs-bm.qza \
  --output-dir ../output/blank-mock/phylogeny-tree-bm

[32mSaved FeatureData[AlignedSequence] to: ../output/blank-mock/phylogeny-tree-bm/alignment.qza[0m
[32mSaved FeatureData[AlignedSequence] to: ../output/blank-mock/phylogeny-tree-bm/masked_alignment.qza[0m
[32mSaved Phylogeny[Unrooted] to: ../output/blank-mock/phylogeny-tree-bm/tree.qza[0m
[32mSaved Phylogeny[Rooted] to: ../output/blank-mock/phylogeny-tree-bm/rooted_tree.qza[0m
[0m

## [Alpha & Beta diversity](https://docs.qiime2.org/2023.5/tutorials/moving-pictures-usage/#:~:text=Alpha%20and%20beta%20diversity%20analysis)

![alpha-vs-beta](https://commercio.nyc3.digitaloceanspaces.com/goldbio-2018/pages/2.%20Alpha%20vs%20Beta%20diversity.png)

QIIME 2’s diversity analyses are available through the `q2-diversity` plugin, which supports computing alpha and beta diversity metrics, applying related statistical tests, and generating interactive visualizations. We’ll first apply the `core-metrics-phylogenetic` method, which rarefies a `FeatureTable Frequency` to a user-specified depth, computes several alpha and beta diversity metrics, and generates principle coordinates analysis (PCoA) plots using Emperor for each of the beta diversity metrics. 

The metrics computed by default are:
##### Alpha diversity- 
Shannon’s diversity index (a quantitative measure of community richness- 

Observed Features (a qualitative measure of community richne- s)

Faith’s Phylogenetic Dive**rsity (a qualitative measure of community richness that incorporates phylogenetic relationships between the fea**t- res)

Evenness (or Pielou’s Evenness; a measure of community eve##### nness)

Beta - iversity

Jaccard distance (a qualitative measure of community dis- imilarity)

Bray-Curtis distance (a quantitative measure of community d- ssimilarity)

unweighted U**niFrac distance (a qualitative measure of community dissimilarity that incorporates phylogenetic relationships betwe**e-  the features)

weighted** UniFrac distance (a quantitative measure of community dissimilarity that incorporates phylogenetic relationships bet**ween the features) 


An important parameter that needs to be prii`ded to this scri ` i`s --p-sampling-dep`th, which is the even sampling (i.e. rarefaction) depth. Because most diversity metrics are sensitive to different sampling depths across different samples, this script will randomly subsample the counts from each sample to the value provided for this parameter. For am--ple, if you provide `--p-sampling-depth 5`00, this step will subsample the counts in each sample without replacement so that each sample in the resulting table has a total count of 500. If the total count for any sample(s) are smaller than this value, those samples will be dropped from the diversity analysis. Choosing this value is tricky. We recommend making your choice by reviewing the infora`tion prs`ented in th`e table.q`zv file that was created above. Choose a value that is as high as possible (so you retain more sequences per sampl)while excluding as few w samples as possiblefe

### Sampling Depth

In [26]:
## open interactive table visualization
Visualization.load('../output/blank-mock/table-bm.qzv')

## navigate to the interactive sample detail tab
# move the sampling depth slider as high as you can before excluding any samples 
# we want the sampling depth to be high, while retaining all 22 samples
# this looks like a sampling depth of 10,6727 (09AUG2023, SST) 
# but maybe we exclude the mock community and bring it up to 163971? (15AUG2023, SST)

What value would you choose to pass for --p-sampling-depth? 
- **104,183**
How many samples will be excluded from your analysis based on this choice? 
- **none, all 22 samples are retained**
How many total reads will you be analyzing in the core-metrics-phylogenetic command?
- **2,292,026**

This represents **42.33%** of the features present across the 22 samples
The mock community has the fewest reads at **104,183** and is our 'limiting factor' to increase sample depth.
Why does the blank have so many features! That is not good... 

### Rooted Tree

In [27]:
!qiime diversity core-metrics-phylogenetic \
  --i-phylogeny ../output/blank-mock/phylogeny-tree-bm/rooted_tree.qza \
  --i-table ../output/blank-mock/table-bm.qza \
  --p-sampling-depth 104183 \
  --m-metadata-file ../rawdata/sample-metadata-verbose.tsv \
  --output-dir ../output/blank-mock/diversity-core-bm

[32mSaved FeatureTable[Frequency] to: ../output/blank-mock/diversity-core-bm/rarefied_table.qza[0m
[32mSaved SampleData[AlphaDiversity] to: ../output/blank-mock/diversity-core-bm/faith_pd_vector.qza[0m
[32mSaved SampleData[AlphaDiversity] to: ../output/blank-mock/diversity-core-bm/observed_features_vector.qza[0m
[32mSaved SampleData[AlphaDiversity] to: ../output/blank-mock/diversity-core-bm/shannon_vector.qza[0m
[32mSaved SampleData[AlphaDiversity] to: ../output/blank-mock/diversity-core-bm/evenness_vector.qza[0m
[32mSaved DistanceMatrix to: ../output/blank-mock/diversity-core-bm/unweighted_unifrac_distance_matrix.qza[0m
[32mSaved DistanceMatrix to: ../output/blank-mock/diversity-core-bm/weighted_unifrac_distance_matrix.qza[0m
[32mSaved DistanceMatrix to: ../output/blank-mock/diversity-core-bm/jaccard_distance_matrix.qza[0m
[32mSaved DistanceMatrix to: ../output/blank-mock/diversity-core-bm/bray_curtis_distance_matrix.qza[0m
[32mSaved PCoAResults to: ../output/blank-

### Unweighted Unifrac Emperor Plot

In [28]:
Visualization.load('../output/blank-mock/diversity-core-bm/unweighted_unifrac_emperor.qzv')

### Weighted Unifrac Emperor Plot

In [29]:
Visualization.load('../output/blank-mock/diversity-core-bm/weighted_unifrac_emperor.qzv')

### Jaccard Emperor PLot

In [31]:
Visualization.load('../output/blank-mock/diversity-core-bm/jaccard_emperor.qzv')

### Bray-Curtis Emperor Plot

In [30]:
Visualization.load('../output/blank-mock/diversity-core-bm/bray_curtis_emperor.qzv')

## Alpha Diversity
After computing diversity metrics, we can begin to explore the microbial composition of the samples in the context of the sample metadata. This information is present in the sample metadata file `../rawdata/sample-metadata-verbose.tsv`.

We’ll first test for associations between categorical metadata columns and alpha diversity data.
We’ll do that here for the Faith Phylogenetic Diversity (a measure of community richness):

In [32]:
!qiime diversity alpha-group-significance \
  --i-alpha-diversity ../output/blank-mock/diversity-core-bm/faith_pd_vector.qza \
  --m-metadata-file ../rawdata/sample-metadata-verbose.tsv \
  --o-visualization ../output/blank-mock/faith-pd-group-significance-bm.qzv

[32mSaved Visualization to: ../output/blank-mock/faith-pd-group-significance-bm.qzv[0m
[0m

### Microbial Community Richness: Faith Phylogenetic Diversity

##### 
Which categorical sample metadata columns are most strongly associated with the differences in microbial community richness
Temperature Treatment

##### Are these differences statistically significat?n

Pae
- no significantly different groups

Temp
- ambient (n=12) vs. hot (n=8), pvalue (0.02)
  
PaeTemp
- peak-ambient (n=4) vs. peak-hot (n=4), pvalue (0.04)
- env-ambient (n=4) vs. peak-hot (n=4), pvalue (0.04)

Colony
- no significantly different groups

Tank
- H1 (n=5) vs KB (n=4), pvalue (0.05)
- A2 (n=4) vs H1 (n=5), pvalue (0.05)t?

In [33]:
Visualization.load('../output/blank-mock/faith-pd-group-significance-bm.qzv')

In [34]:
!qiime diversity alpha-group-significance \
  --i-alpha-diversity ../output/blank-mock/diversity-core-bm/evenness_vector.qza \
  --m-metadata-file ../rawdata/sample-metadata-verbose.tsv \
  --o-visualization ../output/blank-mock/evenness-group-significance-bm.qzv

[32mSaved Visualization to: ../output/blank-mock/evenness-group-significance-bm.qzv[0m
[0m

### Microbial Community Evenness, Alpha diversity , Pielou's Evenness

##### Which categorical sample metadata columns are most strongly associated with the differences in microbial community evenness? 
Nearly everything but Colony.... 
Surprised by one Tank result: H1 vs H2, I wouldn't have expected those to be different.

##### Are these differences statistically significant?
Pae
- control (n=8) vs peak (n=8), pvalue (0.05)
- env (n=4) vs. peak (n=8), pvalue (0.006)

Temp
- ambient (n=12) vs hot (n=8), pvalue (0.02)

PaeTemp
- control-ambient (n=4) vs peak-ambient (n=4), pvalue (0.02)
- control-ambient (n=4) vs peak-hot (n=4), pvalue (0.02)
- env-ambient (n=4) vs peak-ambient (n=4), pvalue (0.02)
- env-ambient (n=4) vs peak-hot (n=4), pvalue (0.02)
  
Colony
- no significantly different groups

Tank
- A1 (n=4) vs H1 (n=5), pvalue (0.02)
- A1 (n=4) vs KB (n=4), pvalue (0.04)
- A2 (n=4) vs H1 (n=5), pvalue (0.02)
- H1 (n=5) vs H2 (n=3), pvalue (0.05)
- H1 (n=5) vs KB (n=4), pvalue (0.01)

In [35]:
Visualization.load('../output/blank-mock/evenness-group-significance-bm.qzv')

## Beta Diversity (There is a problem with the metadata I can't figure out here)

Next we’ll analyze sample composition in the context of categorical metadata using PERMANOVA (first described in Anderson (2001)) using the beta-group-significance command. The following commands will test whether distances between samples within a group, such as samples from the same body site (e.g., gut), are more similar to each other then they are to samples from the other groups (e.g., tongue, left palm, and right palm). If you call this command with the --p-pairwise parameter, as we’ll do here, it will also perform pairwise tests that will allow you to determine which specific pairs of groups (e.g., tongue and gut) differ from one another, if any. This command can be slow to run, especially when passing --p-pairwise, since it is based on permutation tests. So, unlike the previous commands, we’ll run beta-group-significance on specific columns of metadata that we’re interested in exploring, rather than all metadata columns to which it is applicable. Here we’ll apply this to our unweighted UniFrac distances, using two sample metadata columns, as follows.

In [36]:
!qiime diversity beta-group-significance \
  --i-distance-matrix ../output/blank-mock/diversity-core-bm/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file ../rawdata/sample-metadata-slim.tsv \
  --o-visualization ../output/blank-mock/diversity-core-bm/unweighted-unifrac-body-site-significance.qzv \
  --p-pairwise

!qiime diversity beta-group-significance \
  --i-distance-matrix ../output/blank-mock/iversity-core-bm/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file ../rawdata/sample-metadata-slim.tsv \
  --o-visualization ../output/blank-mock/diversity-core-bm/unweighted-unifrac-body-site-significance.qzv \
  --p-pairwise

Usage: [94mqiime diversity beta-group-significance[0m [OPTIONS]

  Determine whether groups of samples are significantly different from one
  another using a permutation-based statistical test.

[1mInputs[0m:
  [94m[4m--i-distance-matrix[0m ARTIFACT
    [32mDistanceMatrix[0m     Matrix of distances between pairs of samples.
                                                                    [35m[required][0m
[1mParameters[0m:
  [94m[4m--m-metadata-file[0m METADATA
  [94m[4m--m-metadata-column[0m COLUMN  [32mMetadataColumn[Categorical][0m
                       Categorical sample metadata column.          [35m[required][0m
  [94m--p-method[0m TEXT [32mChoices('permanova', 'anosim', 'permdisp')[0m
                       The group significance test to be applied.
                                                        [35m[default: 'permanova'][0m
  [94m--p-pairwise[0m / [94m--p-no-pairwise[0m
                       Perform pairwise tests between all pairs

## [Atacama Soil Microbiome: Questions to Guide Data Analysis](https://docs.qiime2.org/2023.5/tutorials/atacama-soils/#paired-end-read-analysis-commands:~:text=Questions%20to%20guide%20data%20analysis)
What sample metadata or combinations of sample metadata are most strongly associated with the differences in microbial composition of the samples? Are these associations stronger with unweighted UniFrac or with Bray-Curtis? Based on what you know about these metrics, what does that difference suggest? For exploring associations between continuous metadata and sample composition, the commands qiime metadata distance-matrix in combination with qiime diversity mantel and qiime diversity bioenv will be useful. These were not covered in the Moving Pictures tutorial, but you can learn about them by running them with the `--help` parameter.

## Alpha rarefaction plotting
In this section we’ll explore alpha diversity as a function of sampling depth using the qiime diversity alpha-rarefaction visualizer. This visualizer computes one or more alpha diversity metrics at multiple sampling depths, in steps between 1 (optionally controlled with --p-min-depth) and the value provided as --p-max-depth. At each sampling depth step, 10 rarefied tables will be generated, and the diversity metrics will be computed for all samples in the tables. The number of iterations (rarefied tables computed at each sampling depth) can be controlled with --p-iterations. Average diversity values will be plotted for each sample at each even sampling depth, and samples can be grouped based on metadata in the resulting visualization if sample metadata is provided with the --m-metadata-file parameter.

The value that you provide for --p-max-depth should be determined by reviewing the “Frequency per sample” information presented in the table.qzv file that was created above. In general, choosing a value that is somewhere around the median frequency seems to work well, but you may want to increase that value if the lines in the resulting rarefaction plot don’t appear to be leveling out, or decrease that value if you seem to be losing many of your samples due to low total frequencies closer to the minimum sampling depth than the maximum sampling depth.

In [38]:
!qiime diversity alpha-rarefaction \
  --i-table ../output/blank-mock/table-bm.qza \
  --i-phylogeny ../output/blank-mock/phylogeny-tree-bm/rooted_tree.qza \
  --p-max-depth 104183 \
  --m-metadata-file ../rawdata/sample-metadata-verbose.tsv \
  --o-visualization ../output/blank-mock/alpha-rarefaction-bm.qzv

[32mSaved Visualization to: ../output/blank-mock/alpha-rarefaction-bm.qzv[0m
[0m

In [39]:
Visualization.load('../output/blank-mock/alpha-rarefaction-bm.qzv')

## [Taxonomic Analysis](https://docs.qiime2.org/2023.5/tutorials/moving-pictures-usage/#taxonomic-analysis:~:text=and%20bottom%20plots.)

In the next sections we’ll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our FeatureData[Sequence] QIIME 2 artifact. We’ll do that using a pre-trained Naive Bayes classifier and the q2-feature-classifier plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We’ll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy.

Taxonomic classifiers perform best when they are trained based on your specific sample preparation and sequencing parameters, including the primers that were used for amplification and the length of your sequence reads. Therefore in general you should follow the instructions in Training feature classifiers with q2-feature-classifier to train your own taxonomic classifiers. We provide some common classifiers on our data resources page, including Silva-based 16S classifiers, though in the future we may stop providing these in favor of having users train their own classifiers which will be most relevant to their sequence data.

In [47]:
!cd ../output/blank-mock/taxonomy ; wget \
  -O 'gg-13-8-99-515-806-nb-classifier.qza' \
  'https://docs.qiime2.org/2021.11/data/tutorials/moving-pictures-usage/gg-13-8-99-515-806-nb-classifier.qza'

--2023-08-16 12:05:51--  https://docs.qiime2.org/2021.11/data/tutorials/moving-pictures-usage/gg-13-8-99-515-806-nb-classifier.qza
Resolving docs.qiime2.org (docs.qiime2.org)... 104.21.84.49, 172.67.186.144, 2606:4700:3035::ac43:ba90, ...
Connecting to docs.qiime2.org (docs.qiime2.org)|104.21.84.49|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28289645 (27M) [application/octet-stream]
Saving to: ‘gg-13-8-99-515-806-nb-classifier.qza’


2023-08-16 12:05:51 (96.0 MB/s) - ‘gg-13-8-99-515-806-nb-classifier.qza’ saved [28289645/28289645]



In [48]:
!qiime feature-classifier classify-sklearn \
  --i-classifier ../output/blank-mock/taxonomy/gg-13-8-99-515-806-nb-classifier.qza \
  --i-reads ../output/blank-mock/rep-seqs-bm.qza \
  --o-classification ../output/blank-mock/taxonomy-bm.qza
!qiime metadata tabulate \
  --m-input-file ../output/blank-mock/taxonomy-bm.qza \
  --o-visualization ../output/blank-mock/taxonomy-bm.qzv

[32mSaved FeatureData[Taxonomy] to: ../output/blank-mock/taxonomy-bm.qza[0m
[0m[32mSaved Visualization to: ../output/blank-mock/taxonomy-bm.qzv[0m
[0m

In [51]:
Visualization.load('../output/blank-mock/taxonomy-bm.qzv')

In [53]:
!qiime taxa barplot \
  --i-table ../output/blank-mock/table-bm.qza \
  --i-taxonomy ../output/blank-mock/taxonomy-bm.qza \
  --m-metadata-file ../rawdata/sample-metadata-verbose.tsv \
  --o-visualization ../output/blank-mock/taxa-bar-plots-bm.qzv

[32mSaved Visualization to: ../output/blank-mock/taxa-bar-plots-bm.qzv[0m
[0m

In [55]:
Visualization.load('../output/blank-mock/taxa-bar-plots-bm.qzv')

## [Differential Abundance Testing](https://docs.qiime2.org/2023.5/tutorials/moving-pictures-usage/#taxonomic-analysis:~:text=the%20later%20timepoints%3F-,Differential%20abundance%20testing,-with%20ANCOM%C2%B6)

ANCOM can be applied to identify features that are differentially abundant (i.e. present in different abundances) across sample groups. As with any bioinformatics method, you should be aware of the assumptions and limitations of ANCOM before using it. We recommend reviewing the ANCOM paper before using this method.

Differential abundance testing in microbiome analysis is an active area of research. There are two QIIME 2 plugins that can be used for this: q2-gneiss and q2-composition. This section uses q2-composition, but there is another tutorial which uses gneiss on a different dataset if you are interested in learning more.


### ANCOM
**Analysis of Composition of Microbiomes**
ANCOM is implemented in the q2-composition plugin. ANCOM assumes that few (less than about 25%) of the features are changing between groups. If you expect that more features are changing between your groups, you should not use ANCOM as it will be more error-prone (an increase in both Type I and Type II errors is possible). 
Here I do not filter any samples and use everything so I can see what is in the mock and blank samples. So we'll use the full feature table located at `../output/blank-mock/table-bm.qza`

ANCOM operates on a FeatureTable[Composition] QIIME 2 artifact, which is based on frequencies of features on a per-sample basis, but cannot tolerate frequencies of zero. To build the composition artifact, a FeatureTable[Frequency] artifact must be provided to add-pseudocount (an imputation method), which will produce the FeatureTable[Composition] artifact.)


In [56]:
!qiime composition add-pseudocount \
  --i-table ../output/blank-mock/table-bm.qza \
  --o-composition-table ../output/blank-mock/comp-table-bm.qza

[32mSaved FeatureTable[Composition] to: ../output/blank-mock/comp-table-bm.qza[0m
[0m

We can then run ANCOM on the `PaeTemp` column to determine what features differ in abundance across the coral samples of the treatments.

In [58]:
!qiime composition ancom \
  --i-table ../output/blank-mock/comp-table-bm.qza \
  --m-metadata-file ../rawdata/sample-metadata-verbose.tsv \
  --m-metadata-column PaeTemp \
  --o-visualization ../output/blank-mock/ancom-paetemp-bm.qzv

[32mSaved Visualization to: ../output/blank-mock/ancom-paetemp-bm.qzv[0m
[0m

In [3]:
Visualization.load('../output/blank-mock/ancom-paetemp-bm.qzv')

In [4]:
!qiime composition ancom \
  --i-table ../output/blank-mock/comp-table-bm.qza \
  --m-metadata-file ../rawdata/sample-metadata-verbose.tsv \
  --m-metadata-column Temp \
  --o-visualization ../output/blank-mock/ancom-temp-bm.qzv

[32mSaved Visualization to: ../output/blank-mock/ancom-temp-bm.qzv[0m
[0m

In [4]:
Visualization.load('../output/blank-mock/ancom-temp-bm.qzv')

#### Taxonomy to Species (Level 7)

We’re also often interested in performing a differential abundance test at a specific taxonomic level. To do this, we can collapse the features in our FeatureTable[Frequency] at the taxonomic level of interest, and then re-run the above steps. In this tutorial, we collapse our feature table at the genus level (i.e. level 6 of the Greengenes taxonomy).

Level:
   1. Kingdom
   2. Phylum
   3. Class
   4. Order
   5. Family
   6. Genus
   7. Species



In [7]:
!qiime taxa collapse \
  --i-table ../output/blank-mock/table-bm.qza \
  --i-taxonomy ../output/blank-mock/taxonomy-bm.qza \
  --p-level 7 \
  --o-collapsed-table ../output/blank-mock/table-bm-l7.qza
!qiime composition add-pseudocount \
  --i-table ../output/blank-mock/table-bm-l7.qza \
  --o-composition-table ../output/blank-mock/comp-table-bm-l7.qza
!qiime composition ancom \
  --i-table ../output/blank-mock/comp-table-bm-l7.qza \
  --m-metadata-file ../rawdata/sample-metadata-verbose.tsv \
  --m-metadata-column Temp \
  --o-visualization ../output/blank-mock/ancom-temp-bm-l7.qzv

[32mSaved FeatureTable[Frequency] to: ../output/blank-mock/table-bm-l7.qza[0m
[0m[32mSaved FeatureTable[Composition] to: ../output/blank-mock/comp-table-bm-l7.qza[0m
[0m[32mSaved Visualization to: ../output/blank-mock/ancom-temp-bm-l7.qzv[0m
[0m

In [5]:
Visualization.load('../output/blank-mock/ancom-temp-bm-l7.qzv')

### [Gneiss](https://docs.qiime2.org/2023.5/tutorials/gneiss/#:~:text=Differential%20abundance%20analysis%20with%20gneiss,-%C2%B6)

In this tutorial you will learn how to perform differential abundance analysis using balances in gneiss. The main problem that we will focus on is how to identify differentially abundant taxa in a compositionally coherent way.

Compositionality refers to the issue of dealing with proportions. To account for differences in sequencing depth, microbial abundances are typically interpreted as proportions (e.g. relative abundance). Because of this, it becomes challenging to infer exactly which microbes are changing – since proportions add to one, the change of a single microbe will also change the proportions of the remaining microbes.
Rather than focusing on individual taxa, we can focus on the ratio between taxa (or groups of taxa), since these ratios are consist between the true abundances and the observed proportions of the species observed. We typically log transform these ratios for improved visualization (‘log ratios’). The concept of calculating balances (or ratios) for multiple species can be extended to trees as shown in the following example.
![balances taxon trees](https://docs.qiime2.org/2023.5/_images/gneiss-balances.jpg)
On the left, we define a tree, where each of the tips corresponds to a taxon, and underneath are the proportions of each taxon in the first sample. The internal nodes (i.e. balances) define the log ratio between the taxa underneath. On the right is the same tree, and underneath are the proportions of each taxa in a different sample. Only one of the taxa abundances changes. As we have observed before, the proportions of all of the taxa will change, but looking at the balances, only the balances containing the purple taxa will change. In this case, balance b3
 won’t change, since it only considers the ratio between the red and taxa. By looking at balances instead proportions, we can eliminate some of the variance by restricting observations to only focus on the taxa within a given balance.

The outstanding question here is, how do we construct a balance tree to control for the variation, and identify interesting differentially abundant partitions of taxa? In gneiss, there are three main ways that this can be done:
1. 
Correlation clustering. If we don’t have relevant prior information about how to cluster together organisms, we can group together organisms based on how often they co-occur with each other. This is available in the correlation-clustering command and creates tree input for ilr-hierarchical.2. 

Gradient clustering. Use a metadata category to cluster taxa found in similar sample types. For example, if we want to evaluate if pH is a driving factor, we can cluster according to the pH that the taxa are observed in, and observe whether the ratios of low-pH organisms to high-pH organisms change as the pH changes. This is available in the gradient-clustering command and creates tree input for ilr-hierarchica3. l.

Phylogenetic analysis. A phylogenetic tree (e.g. rooted-tree.qza) created outside of gneiss can also be used. In this case you can use your phylogenetic tree as input for ilr-phyl

Once we have a tree, we can calculate balances...espectively.

After the balances are calculated, standard statistical procedures such as ANOVA and linear regression ca

First, we will define partitions of microbes for which we want to construct balances. Again, there are multiple possible ways to construct a tree (i.e. hierarchy) which defines the partition of microbes (balances) for which we want to construct balances. We will show examples of both correlation-clustering and gradient-clustering on this dataset.

Note that the differential abundance techniques that we will be running will utilize log ratio transforms. Since it is not possible to take the logarithm of zero, both clustering methods below include a default pseudocount parameter. This replaces all zeroes in the table with a 1, so that we can apply logarithms on this transformed table.

The input table is the raw count table (FeatureTable[Frequency]).n be performed.ogenetic.

#### Correlation Clustering
This option should be your default option. We will employ unsupervised clustering via Ward’s hierarchical clustering to obtain Principal Balances. In essence, this will define the partitions of microbes that commonly co-occur with each other using Ward hierarchical clustering, which is defined by the following metric:

$$
d(x,y) = V[ln \frac{x}{y}]
$$

Where **x**  and** ** 
 represent the proportions of two microbes across all of the samples. If two microbes are highly correlated, then this quantity will shrink close to zero. Ward hierarchical cluster will then use this distance metric to iteratively cluster together groups of microbes that are correlated with each other. In the end, the tree that we obtain will highlight the high level structure and identify any blocks within in the data.

In [9]:
!qiime gneiss correlation-clustering \
  --i-table ../output/blank-mock/table-bm.qza \
  --o-clustering ../output/blank-mock/hierarchy-bm.qza

[32mSaved Hierarchy to: ../output/blank-mock/hierarchy-bm.qza[0m
[0m

In [10]:
!qiime gneiss dendrogram-heatmap \
  --i-table ../output/blank-mock/table-bm.qza \
  --i-tree ../output/blank-mock/hierarchy-bm.qza \
  --m-metadata-file ../rawdata/sample-metadata-verbose.tsv \
  --m-metadata-column PaeTemp \
  --p-color-map seismic \
  --o-visualization ../output/blank-mock/heatmap-cc-paetemp.qzv


dendrogram-heatmap is deprecated and will be removed in a future version of this plugin.[0m
[32mSaved Visualization to: ../output/blank-mock/heatmap-cc-paetemp.qzv[0m
[0m

##### PaeTemp Heatmap

In [6]:
Visualization.load('../output/blank-mock/heatmap-cc-paetemp.qzv')

##### Temp Heatmap

In [12]:
!qiime gneiss dendrogram-heatmap \
  --i-table ../output/blank-mock/table-bm.qza \
  --i-tree ../output/blank-mock/hierarchy-bm.qza \
  --m-metadata-file ../rawdata/sample-metadata-verbose.tsv \
  --m-metadata-column Temp \
  --p-color-map seismic \
  --o-visualization ../output/blank-mock/heatmap-cc-temp.qzv


dendrogram-heatmap is deprecated and will be removed in a future version of this plugin.[0m
[32mSaved Visualization to: ../output/blank-mock/heatmap-cc-temp.qzv[0m
[0m

In [7]:
Visualization.load('../output/blank-mock/heatmap-cc-temp.qzv')

#### Gradient Clustering

In [14]:
!qiime gneiss gradient-clustering \
  --i-table ../output/blank-mock/table-bm.qza \
  --m-gradient-file ../rawdata/sample-metadata-verbose.tsv \
  --m-gradient-column PaeTemp \
  --o-clustering ../output/blank-mock/gradient-hierarchy-bm.qza

Usage: [94mqiime gneiss gradient-clustering[0m [OPTIONS]

  Build a bifurcating tree that represents a hierarchical clustering of
  features.  The hiearchical clustering uses Ward hierarchical clustering
  based on the mean difference of gradients that each feature is observed in.
  This method is primarily used to sort the table to reveal the underlying
  block-like structures.

[1mInputs[0m:
  [94m[4m--i-table[0m ARTIFACT [32mFeatureTable[Frequency | RelativeFrequency |[0m
    [32mComposition][0m       The feature table containing the samples in which the
                       columns will be clustered.                   [35m[required][0m
[1mParameters[0m:
  [94m[4m--m-gradient-file[0m METADATA
  [94m[4m--m-gradient-column[0m COLUMN  [32mMetadataColumn[Numeric][0m
                       Contains gradient values to sort the features and
                       samples.                                     [35m[required][0m
  [94m--p-ignore-missing-samples[0m 

#### Phylogenetic Analysis

In [17]:
!qiime gneiss --help

Usage: [94mqiime gneiss[0m [OPTIONS] COMMAND [ARGS]...

  Description: This is a QIIME 2 plugin supporting statistical models on
  feature tables and metadata using balances.

  Plugin website: https://biocore.github.io/gneiss/

  Getting user support: Please post to the QIIME 2 forum for help with this
  plugin: https://forum.qiime2.org

[1mOptions[0m:
  [94m--version[0m            Show the version and exit.
  [94m--example-data[0m PATH  Write example data and exit.
  [94m--citations[0m          Show citations and exit.
  [94m--help[0m               Show this message and exit.

[1mCommands[0m:
  [94massign-ids[0m                     Assigns ids on internal nodes in the tree,
                                 and makes sure that they are consistent with
                                 the table columns.
  [94mcorrelation-clustering[0m         Hierarchical clustering using feature
                                 correlation.
  [94mdendrogram-heatmap[0m             D

In [20]:
!qiime gneiss ilr-phylogenetic \
  --i-table ../output/blank-mock/table-bm.qza \
  --i-tree ../output/blank-mock/phylogeny-tree-bm/rooted_tree.qza \
  --o-balances ../output/blank-mock/balances-rootedtree-bm.qza \
  --o-hierarchy ../output/blank-mock/hierarchy-rootedtree-bm.qza
 

[32mSaved FeatureTable[Balance] to: ../output/blank-mock/balances-rootedtree-bm.qza[0m
[32mSaved Hierarchy to: ../output/blank-mock/hierarchy-rootedtree-bm.qza[0m
[0m

In [21]:
!qiime gneiss dendrogram-heatmap \
  --i-table ../output/blank-mock/table-bm.qza \
  --i-tree ../output/blank-mock/hierarchy-rootedtree-bm.qza \
  --m-metadata-file ../rawdata/sample-metadata-verbose.tsv \
  --m-metadata-column Temp \
  --p-color-map seismic \
  --o-visualization ../output/blank-mock/heatmap-rootedtree-temp.qzv


dendrogram-heatmap is deprecated and will be removed in a future version of this plugin.[0m
[32mSaved Visualization to: ../output/blank-mock/heatmap-rootedtree-temp.qzv[0m
[0m

In [22]:
Visualization.load('../output/blank-mock/heatmap-rootedtree-temp.qzv')