Unpause the last notebook

# 6 Composition with ANCOM

## 6.1 ANCOM by ASV

Beta-diversity plots are useful for placing our samples in space, but to find out whether two groups really differ we need further statistical tests.

To look for these statistical differences we are going to use ANCOM (Analysis of Composition of Microbiomes).

For example, we want to know:
- which ASVs are the most different in *gut* between the two *subjects*

This procedure works in multiple steps:
1. Filter the feature table so that it only contains the **gut** samples
2. Add a pseudocount to the values of the feature table
3. Run ANCOM across the two **subjects**

Let's filter the feature table so that only the gut samples remain:

In [1]:
qiime feature-table filter-samples \
    --i-table          table.qza \
    --m-metadata-file  sample-metadata.tsv \
    --p-where          "[body-site]='gut'" \
    --o-filtered-table gut-table.qza

Saved FeatureTable[Frequency] to: gut-table.qza
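To make the filtering step more concrete, here is a minimal Python sketch of the idea behind it: keep only the samples whose metadata matches a condition. The tiny metadata and table below are hypothetical, and the `filter_samples` helper is an illustration written for this notebook, not the code QIIME 2 runs (the real command handles many more details).

```python
import csv
import io

# Hypothetical miniature metadata (tab-separated), not the tutorial's real file.
metadata_tsv = (
    "sample-id\tbody-site\tsubject\n"
    "S1\tgut\tsubject-1\n"
    "S2\ttongue\tsubject-1\n"
    "S3\tgut\tsubject-2\n"
)

# Hypothetical feature table: feature -> {sample: count}.
table = {
    "asv-a": {"S1": 10, "S2": 0, "S3": 4},
    "asv-b": {"S1": 0, "S2": 7, "S3": 2},
}


def filter_samples(table, metadata_text, column, value):
    """Keep only the samples whose metadata `column` equals `value`."""
    rows = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    keep = {row["sample-id"] for row in rows if row[column] == value}
    return {
        feature: {s: c for s, c in counts.items() if s in keep}
        for feature, counts in table.items()
    }


gut_table = filter_samples(table, metadata_tsv, "body-site", "gut")
print(gut_table)
```

Only the samples labelled `gut` remain in the result; the tongue sample is dropped from every feature.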


Then, add pseudocounts to the table, because ANCOM works with log-ratios and cannot handle zeros.

In [2]:
qiime composition add-pseudocount \
    --i-table             gut-table.qza \
    --o-composition-table comp-gut-table.qza

Saved FeatureTable[Composition] to: comp-gut-table.qza
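The following sketch shows why the pseudocount is needed. It assumes, as far as we understand the default behaviour of `add-pseudocount`, that the same value (1) is added to every cell of the table. The counts and the `add_pseudocount` helper are hypothetical and only illustrate the idea.

```python
import math

# Hypothetical counts for three features in two samples.
counts = {
    "asv-a": [0, 12],
    "asv-b": [5, 0],
    "asv-c": [30, 7],
}


def add_pseudocount(table, pseudocount=1):
    """Add the same pseudocount to every cell so that no zeros remain."""
    return {f: [v + pseudocount for v in values] for f, values in table.items()}


composition = add_pseudocount(counts)
print(composition)

# Every value is now strictly positive, so its logarithm is defined.
logs = {f: [round(math.log(v), 3) for v in values]
        for f, values in composition.items()}
print(logs)
```

Without the pseudocount, `math.log(0)` would raise an error, which is the same problem ANCOM would face when computing log-ratios.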


Finally, run ANCOM, specifying that we want to compare the groups defined by the `subject` column:

In [3]:
qiime composition ancom \
    --i-table           comp-gut-table.qza \
    --m-metadata-file   sample-metadata.tsv \
    --m-metadata-column subject \
    --o-visualization   ancom-subject.qzv

Saved Visualization to: ancom-subject.qzv


- [ancom-subject.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Fancom-subject.qzv)
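The visualization reports a CLR (centered log-ratio) value for each feature. As a rough guide to reading it, here is a minimal sketch of the CLR transform on two hypothetical, already pseudocounted samples. The numbers and the `clr` helper are made up for illustration, and the sketch does not reproduce the ANCOM test itself.

```python
import math


def clr(sample):
    """Centered log-ratio: log of each value minus the mean of the logs."""
    logs = [math.log(v) for v in sample]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical pseudocounted counts (same feature order in both samples).
subject_1 = [1, 200, 50]
subject_2 = [150, 2, 50]

clr_1 = clr(subject_1)
clr_2 = clr(subject_2)

# Per-feature difference in CLR between the two samples.
diff = [round(b - a, 2) for a, b in zip(clr_1, clr_2)]
print(diff)
```

A large positive or negative difference means the feature is much more abundant, relative to the rest of the community, in one sample than in the other. A difference close to zero means its relative abundance is similar in both.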

### Exercise

- Which ASVs are the most different between the two subjects in the gut?

- `868528ca947bc57b69ffdf83e6b73bae`, with a CLR of -7.1
- `4b5eeb300368260019c1fbc7a3c718fc`, with a CLR of 8.16

- What are the taxonomies of these ASVs? (Pssst: the taxonomy table is [here](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Ftaxonomy.qzv))

Both ASVs have exactly the same taxonomy: `k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__`

Both belong to the same genus, but DADA2 found enough evidence to treat them as two distinct ASVs. The taxonomic classifier could not assign a species name, probably because the species does not have a name (yet).

It is very likely that each subject carries a different species of the genus *Bacteroides*.

## 6.2 ANCOM by taxa

Instead of finding the most characteristic ASVs of each group, we may be more interested in which taxonomic groups (for example at phylum, order or genus level) differ the most between groups.

In this case the procedure is similar, with one extra step:
1. Filter the body site (already done: gut-table.qza)
2. Collapse by taxon level
3. Apply correction
4. Run ANCOM

In [4]:
# This is already done
# qiime feature-table filter-samples \
#    --i-table          table.qza \
#    --m-metadata-file  sample-metadata.tsv \
#    --p-where          "[body-site]='gut'" \
#    --o-filtered-table gut-table.qza

In this case, we will need the taxonomy.qza we generated in the previous notebook.

We are collapsing ASVs to the 6th level (genus).

In [5]:
qiime taxa collapse \
  --i-table           gut-table.qza \
  --i-taxonomy        taxonomy.qza \
  --p-level           6 \
  --o-collapsed-table gut-table-l6.qza

Saved FeatureTable[Frequency] to: gut-table-l6.qza
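Conceptually, collapsing at a given level means truncating each taxonomy string to that many ranks and summing the counts of all the ASVs that end up with the same label. The sketch below illustrates this with hypothetical ASV names and counts. The `collapse` helper is an assumption made for illustration and is not the implementation used by `qiime taxa collapse`.

```python
# Hypothetical taxonomy assignments and counts (two samples per ASV).
taxonomy = {
    "asv-a": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; "
             "f__Bacteroidaceae; g__Bacteroides; s__",
    "asv-b": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; "
             "f__Bacteroidaceae; g__Bacteroides; s__",
    "asv-c": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; "
             "f__Porphyromonadaceae; g__Parabacteroides; s__",
}
counts = {"asv-a": [10, 0], "asv-b": [3, 8], "asv-c": [1, 5]}


def collapse(counts, taxonomy, level):
    """Sum the counts of all features that share the first `level` ranks."""
    collapsed = {}
    for feature, values in counts.items():
        ranks = [r.strip() for r in taxonomy[feature].split(";")]
        key = ";".join(ranks[:level])
        previous = collapsed.get(key, [0] * len(values))
        collapsed[key] = [p + v for p, v in zip(previous, values)]
    return collapsed


genus_table = collapse(counts, taxonomy, 6)
for taxon, values in genus_table.items():
    print(taxon, values)
```

The two *Bacteroides* ASVs are merged into a single row, so the collapsed table has one row per genus instead of one row per ASV.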


Add pseudocounts:

In [6]:
qiime composition add-pseudocount \
  --i-table             gut-table-l6.qza \
  --o-composition-table comp-gut-table-l6.qza

Saved FeatureTable[Composition] to: comp-gut-table-l6.qza


And finally, run ANCOM:

In [7]:
qiime composition ancom \
  --i-table           comp-gut-table-l6.qza \
  --m-metadata-file   sample-metadata.tsv \
  --m-metadata-column subject \
  --o-visualization   l6-ancom-subject.qzv

Saved Visualization to: l6-ancom-subject.qzv


- [l6-ancom-subject.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Fl6-ancom-subject.qzv)

### Exercise

- What genera are differentially abundant between the two subjects?

This time only one: `k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__Parabacteroides`. It has a CLR of 5.11.

- What happens if instead of comparing by subject at level 6, we repeat the analyses by `reported-antibiotic-usage` at level 3?

In [8]:
qiime taxa collapse \
  --i-table           gut-table.qza \
  --i-taxonomy        taxonomy.qza \
  --p-level           3 \
  --o-collapsed-table gut-table-l3.qza

Saved FeatureTable[Frequency] to: gut-table-l3.qza


In [9]:
qiime composition add-pseudocount \
  --i-table             gut-table-l3.qza \
  --o-composition-table comp-gut-table-l3.qza

Saved FeatureTable[Composition] to: comp-gut-table-l3.qza


In [10]:
qiime composition ancom \
  --i-table           comp-gut-table-l3.qza \
  --m-metadata-file   sample-metadata.tsv \
  --m-metadata-column reported-antibiotic-usage \
  --o-visualization   l3-ancom-antibiotic.qzv

Saved Visualization to: l3-ancom-antibiotic.qzv


This time, there is only one overrepresented class (level 3): `k__Bacteria;p__Firmicutes;c__Erysipelotrichi`

It has a CLR of 3.34. According to the table at the bottom of the visualization, it goes from 13 to 73 reads per sample in the samples without antibiotics to just 1 in all the samples with antibiotics.

Since `add-pseudocount` adds 1 to every value of the table by default, a value of 1 corresponds to an original count of 0. This suggests that this class was present in the samples without antibiotic usage and was not detected at all in the samples with reported antibiotic usage.