In [1]:
%matplotlib inline

from biom import load_table
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from qiime2 import Artifact, Metadata
from qiime2.plugins import diversity, empress
import scipy.stats as ss
import seaborn as sns

First, we need to import the input files as QIIME 2 artifacts of the relevant types.

```bash
qiime tools import \
    --type FeatureTable[Frequency] \
    --input-path figs1/input/57013_reference-hit.biom \
    --output-path figs1/output/table.qza
    
qiime tools import \
    --type Phylogeny[Rooted] \
    --input-path figs1/input/57013_insertion_tree.relabelled.tre \
    --output-path figs1/output/tree.qza
    
qiime tools import \
    --type FeatureData[Sequence] \
    --input-path figs1/input/57013_reference-hit.seqs.fa \
    --output-path figs1/output/sequences.qza
```

Next, we classify the sequences taxonomically so that the assignments can later be imported as additional feature metadata.

```bash
wget https://data.qiime2.org/2020.8/common/gg-13-8-99-nb-classifier.qza
    
qiime feature-classifier classify-sklearn \
    --i-reads figs1/output/sequences.qza \
    --i-classifier gg-13-8-99-nb-classifier.qza \
    --p-n-jobs -1 \
    --o-classification figs1/output/taxonomy.qza 
```

Now we can perform differential abundance analysis with Songbird and apply the ILR transform to the resulting differentials with gneiss.

```bash
qiime songbird multinomial \
    --i-table figs1/output/table.qza \
    --p-formula 'C(country) + C(diet)' \
    --m-metadata-file figs1/input/11212_prep_3370_qiime_20170918-134804.txt \
    --output-dir figs1/output/multinomial
    
qiime gneiss ilr-phylogenetic-differential \
    --i-differential figs1/output/multinomial/differentials.qza \
    --i-tree figs1/output/tree.qza \
    --output-dir figs1/output/ilr
```
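
To make the ILR step more concrete: a balance at an internal node contrasts the average log-scale value of the tips under one child with the average under the other child, scaled by a factor that depends on the number of tips on each side. The sketch below is a minimal illustration in plain Python, not the gneiss implementation. The function name `balance`, the choice of which child counts as the numerator, and the toy differentials are all assumptions made for illustration.

```python
import math


def balance(left, right):
    """Balance between two groups of log-scale differentials.

    Uses the standard ILR balance form
        sqrt(r * s / (r + s)) * (mean(left) - mean(right)),
    where r and s are the number of tips in each group.
    Which group is treated as the numerator is an assumption here.
    """
    r, s = len(left), len(right)
    if r == 0 or s == 0:
        raise ValueError("both groups must contain at least one tip")
    scale = math.sqrt(r * s / (r + s))
    return scale * (sum(left) / r - sum(right) / s)


# Made-up differentials for illustration only (not from the study data)
left_tips = [1.0, 2.0]
right_tips = [-1.0, 0.0]

b = balance(left_tips, right_tips)
print(round(b, 3))
```

Because the balance is a difference of means, adding the same constant to every differential leaves it unchanged, which is why the arbitrary reference frame of the differentials does not affect the result.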

Now that we have the ILR-transformed differentials, we can view them and sort them by magnitude.

```bash
qiime metadata tabulate \
    --m-input-file figs1/output/ilr/ilr_differential.qza \
    --o-visualization figs1/output/ilr/ilr_differential.qzv
    
qiime tools view figs1/output/ilr/ilr_differential.qzv
```

Picking meaningful clades can be tricky: a clade that is too close to the tips of the tree can be noisy.
The ids of the clades follow a level ordering, so ids with smaller numbers are closer to the root of the tree, whereas ids with larger numbers are closer to the tips. The rule of thumb is to pull out clades that are closer to the root of the tree, since they are likely to contain a large aggregate of microbes.
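
The level ordering described above can be sketched as a breadth-first walk that numbers internal nodes as it meets them. This is only an illustration on a toy tree written as nested tuples. The helper `level_order_ids` and the exact `y<number>` naming scheme are assumptions, and the real tree would be read from `tree.qza`.

```python
from collections import deque


def level_order_ids(tree):
    """Number internal nodes breadth-first and return {node_id: depth}.

    Internal nodes are tuples of children; tips are plain strings.
    """
    ids = {}
    queue = deque([(tree, 0)])
    while queue:
        node, depth = queue.popleft()
        if isinstance(node, tuple):  # internal node: assign the next id
            ids[f"y{len(ids)}"] = depth
            for child in node:
                queue.append((child, depth + 1))
    return ids


# Toy tree for illustration only: tips a-f, five internal nodes
toy_tree = ((("a", "b"), "c"), ("d", ("e", "f")))

depths = level_order_ids(toy_tree)
print(depths)
```

In this numbering the depth never decreases as the id grows, which is the property the rule of thumb relies on.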

Looking at the three largest (most positive) and the three most negative balances that differentiate diet, there are a few candidates, namely `y231,y169,y56` and `y10,y4,y15`. We can do two things with them: visualize these clades directly on the tree, and visualize them as an ordination.
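
Picking the extremes can also be done programmatically rather than by sorting the table by eye. The sketch below assumes the balances for one covariate have already been loaded into a plain dictionary. The helper `top_and_bottom`, the clade ids, and the values are hypothetical and do not come from the study data.

```python
def top_and_bottom(balances, k=3):
    """Return the k most positive and k most negative clade ids."""
    ranked = sorted(balances.items(), key=lambda kv: kv[1], reverse=True)
    top = [name for name, _ in ranked[:k]]
    # Take the last k entries and reverse them so the most negative comes first
    bottom = [name for name, _ in ranked[-k:][::-1]]
    return top, bottom


# Made-up balances for illustration only
toy_balances = {
    "y1": 0.8, "y2": -1.2, "y3": 2.5, "y4": 0.1,
    "y5": -0.4, "y6": 1.7, "y7": -2.0, "y8": 0.3,
}

top, bottom = top_and_bottom(toy_balances)
print(",".join(top + bottom))
```

The comma-separated string printed at the end has the same shape as the value passed to `--p-clades`.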

```bash
qiime gneiss ilr-phylogenetic-ordination \
    --i-table figs1/output/table.qza \
    --i-tree figs1/output/tree.qza \
    --p-clades y231,y169,y56,y10,y4,y15 \
    --output-dir figs1/output/ordination 
```

```bash
qiime empress community-plot \
    --i-tree figs1/output/tree.qza \
    --i-feature-table figs1/output/table.qza \
    --i-pcoa figs1/output/ordination/ordination.qza \
    --m-feature-metadata-file figs1/output/taxonomy.qza \
    --m-feature-metadata-file figs1/output/ordination/clade_metadata.qza \
    --m-sample-metadata-file figs1/input/11212_prep_3370_qiime_20170918-134804.txt \
    --p-filter-missing-features \
    --o-visualization figs1/output/ilr-ordination.qzv
```