# 3. Diversity

(Remember to unpause the previous notebook)

So far, we have:
- separated and cleaned our samples, and
- determined which representative sequences are in our samples and how many of each.

This notebook estimates how diverse our samples are.

<img src="../assets/img/qiime_map.svg"  width="1200" height="600">

We start with three files from the previous notebook:
- `sample-metadata.tsv`: experimental design data from our samples
- `rep-seqs.qza`: the actual sequences of our OTU/ASVs
- `table.qza`: the frequency of each OTU/ASV in each sample.

## 4. Diversity analysis

We want to know:
- the diversity within each sample (alpha diversity), and
- how different each pair of samples is (beta diversity).

But before doing these analyses, we need to make all the sequences comparable.

## 4.1 Alignment and tree construction

To do so we are going to:
- align all the sequences
- compute their phylogenetic tree

Thankfully, we can do both in one go with `qiime phylogeny`:

In [None]:
qiime phylogeny --help

`qiime phylogeny` offers three ways to align the sequences and build the tree:
- MAFFT + FastTree (fastest; the one we use here)
- MAFFT + IQ-TREE
- MAFFT + RAxML (most precise)

Let's check the help to see which inputs the command needs:

In [None]:
qiime phylogeny align-to-tree-mafft-fasttree --help

In [None]:
qiime phylogeny align-to-tree-mafft-fasttree \
    --i-sequences        rep-seqs.qza \
    --o-alignment        phylo-aligned-seqs.qza \
    --o-masked-alignment phylo-masked-aligned-seqs.qza \
    --o-tree             phylo-unrooted-tree.qza \
    --o-rooted-tree      phylo-rooted-tree.qza

All the output files are artifacts (`.qza`), not visualizations, so there is nothing to see yet :(

## 4.2 Core metrics

The alpha and beta diversities are also computed with a single command, `qiime diversity core-metrics-phylogenetic`.

Briefly:
- alpha: diversity within each individual sample
- beta: difference in composition between two samples

In [None]:
qiime diversity core-metrics-phylogenetic --help

Remember [table.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Ftable.qzv)

Here we set the `--p-sampling-depth` parameter to 1103. This value was chosen based on the number of sequences in the L3S313 sample because it’s close to the number of sequences in the next few samples that have higher sequence counts, and because it is considerably higher (relatively) than the number of sequences in the samples that have fewer sequences. 
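To get a feel for the trade-off behind `--p-sampling-depth`, the sketch below counts how many samples survive a given depth and how many reads are actually used. It is plain Python meant to be run outside this bash notebook. The sample names and read counts are made up for illustration (they are not the tutorial values), and `depth_summary` is a hypothetical helper, not part of QIIME 2.

```python
# Hypothetical per-sample read counts (NOT the real tutorial values).
sample_totals = {"S1": 9820, "S2": 4112, "S3": 1250, "S4": 1103, "S5": 640, "S6": 97}


def depth_summary(totals, depth):
    """Summarize which samples are kept or dropped when rarefying to `depth`."""
    # Samples with fewer reads than the depth are discarded entirely.
    kept = sorted(s for s, n in totals.items() if n >= depth)
    dropped = sorted(s for s, n in totals.items() if n < depth)
    # Every kept sample is subsampled down to exactly `depth` reads.
    reads_used = depth * len(kept)
    fraction_used = reads_used / sum(totals.values())
    return kept, dropped, reads_used, fraction_used


for depth in (500, 1103, 4000):
    kept, dropped, reads_used, fraction = depth_summary(sample_totals, depth)
    print(f"depth={depth}: keep {len(kept)} samples, drop {dropped}, "
          f"use {reads_used} reads ({fraction:.1%} of all reads)")
```

A low depth keeps more samples but throws away most reads of the deep samples; a high depth keeps more reads per sample but drops the shallow samples.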

In [None]:
qiime diversity core-metrics-phylogenetic \
  --i-phylogeny      phylo-rooted-tree.qza \
  --i-table          table.qza \
  --p-sampling-depth 1103 \
  --m-metadata-file  sample-metadata.tsv \
  --output-dir       metrics

The results appear in the `metrics` folder (the value we passed to `--output-dir`).

The visualizable ones:

- [metrics/bray_curtis_emperor.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Fcore-metrics-results%2Fbray_curtis_emperor.qzv)
- [metrics/jaccard_emperor.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Fcore-metrics-results%2Fjaccard_emperor.qzv)
- [metrics/unweighted_unifrac_emperor.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Fcore-metrics-results%2Funweighted_unifrac_emperor.qzv)
- [metrics/weighted_unifrac_emperor.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Fcore-metrics-results%2Fweighted_unifrac_emperor.qzv)

For each distance metric, we get an Emperor plot showing how the samples cluster:
- Bray-Curtis
- Jaccard
- Unweighted UniFrac
- Weighted UniFrac
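As a rough illustration of what the first two metrics measure, here is a minimal Python sketch (to be run outside this bash notebook) that computes Bray-Curtis and Jaccard between two samples using their textbook definitions. The feature counts are invented, and the function names are hypothetical helpers rather than QIIME 2 calls. UniFrac also needs the phylogenetic tree, so it is not covered here.

```python
def bray_curtis(a, b):
    """Abundance-based dissimilarity: 0 = identical counts, 1 = nothing shared."""
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def jaccard(a, b):
    """Presence/absence dissimilarity: fraction of observed features not shared."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    if not union:
        raise ValueError("both samples are empty")
    return 1 - len(present_a & present_b) / len(union)


# Hypothetical counts of the same four features in two samples.
sample_a = [40, 30, 0, 0]
sample_b = [4, 3, 20, 0]

print("Bray-Curtis:", round(bray_curtis(sample_a, sample_b), 3))
print("Jaccard:    ", round(jaccard(sample_a, sample_b), 3))
```

Bray-Curtis reacts to how many reads each feature has, while Jaccard only asks whether a feature is present, so the two can disagree for the same pair of samples.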

## 4.3 Alpha diversity

We look at two alpha diversity measures:
- Faith's Phylogenetic Diversity (richness)
- Evenness

We compare both across sample groups and visualize the results:

In [None]:
# Richness ~ faith phylogenetic diversity
qiime diversity alpha-group-significance \
  --i-alpha-diversity metrics/faith_pd_vector.qza \
  --m-metadata-file   sample-metadata.tsv \
  --o-visualization   metrics/faith-pd-group-significance.qzv

In [None]:
# evenness
qiime diversity alpha-group-significance \
  --i-alpha-diversity metrics/evenness_vector.qza \
  --m-metadata-file   sample-metadata.tsv \
  --o-visualization   metrics/evenness-group-significance.qzv

- [metrics/faith-pd-group-significance.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Fcore-metrics-results%2Ffaith-pd-group-significance.qzv)
- [metrics/evenness-group-significance.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Fcore-metrics-results%2Fevenness-group-significance.qzv)
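To make "evenness" concrete, here is a minimal Python sketch (to be run outside this bash notebook) of Pielou's evenness, which is a common definition: Shannon entropy divided by its maximum possible value for the observed number of features. That this matches the values in `evenness_vector.qza` is an assumption, and the counts below are invented.

```python
import math


def shannon(counts):
    """Shannon entropy (natural log) over the non-zero feature counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Shannon entropy divided by its maximum, ln(number of observed features)."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return float("nan")  # evenness is undefined with fewer than two features
    return shannon(counts) / math.log(observed)


# Two hypothetical samples with the same richness (4 features) and 100 reads each.
balanced = [25, 25, 25, 25]
dominated = [97, 1, 1, 1]

print("balanced: ", round(pielou_evenness(balanced), 3))
print("dominated:", round(pielou_evenness(dominated), 3))
```

Both samples have the same richness, but the second one is dominated by a single feature and therefore scores much lower. Because the measure is a ratio, the base of the logarithm does not change the result.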

### Exercise

**Note**: the Kruskal-Wallis statistical test checks whether two or more groups of samples come from the same distribution:
- H high -> p low -> the groups are different
- H low -> p high -> the groups are **not** shown to be different (which is not the same as being equal)
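The H statistic in the note above can be sketched in a few lines of Python (to be run outside this bash notebook). This is a simplified version that ranks all values together and omits the correction for ties; the group values are invented, and the helper names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_h(groups):
    """Kruskal-Wallis H statistic (without the tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    position = 0
    for g in groups:
        rank_sum = sum(ranks[position:position + len(g)])
        weighted += rank_sum ** 2 / len(g)
        position += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical diversity values for three groups of samples.
separated = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
overlapping = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

print("H (separated groups):  ", round(kruskal_h(separated), 3))
print("H (overlapping groups):", round(kruskal_h(overlapping), 3))
```

Groups whose values do not overlap produce a much larger H than groups whose values are interleaved, which is why a high H goes together with a low p-value.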

- According to the Kruskal-Wallis test of all the samples, do they have the same or different **richness**?
  - By body site?
  - By subject?
  - By antibiotic usage?

- According to the Kruskal-Wallis test of all the samples, do they have the same or different **evenness**?
  - By body site?
  - By subject?
  - By antibiotic usage?

## 4.4 Beta diversity

We are going to analyze the sample composition using PERMANOVA tests. The purpose is to compare the distances between groups of samples (for example, body sites) and tell whether the groups differ.

We should expect to see that the left and right hands are similar, that they are far away from the gut, and that the tongue sits in between.
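The idea behind PERMANOVA can be sketched in plain Python (to be run outside this bash notebook): compute a pseudo-F statistic from the distance matrix, then shuffle the group labels many times to see how often chance alone gives a value at least as large. This is a simplified sketch of the general technique, not the code QIIME 2 runs; the distances, labels, and function names are invented for illustration.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F from a square distance matrix and one label per sample."""
    n = len(labels)
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, []).append(index)
    # Sums of squared distances, in total and within each group.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = sum(
        sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
        for members in groups.values()
    )
    ss_between = ss_total - ss_within
    a = len(groups)
    # Assumes ss_within > 0, i.e. samples within a group are not all identical.
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical samples placed on a line; distances are absolute differences.
positions = [0, 1, 2, 3, 10, 11, 12, 13]
labels = ["gut"] * 4 + ["palm"] * 4
dist = [[abs(p - q) for q in positions] for p in positions]

f_stat, p_value = permanova(dist, labels)
print("pseudo-F:", f_stat, "p-value:", p_value)
```

With only a handful of samples per group the p-value cannot get very small, because few distinct label arrangements exist. The same applies to pairwise comparisons between small groups.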

We run two tests, one by body site and one by subject:

In [None]:
qiime diversity beta-group-significance \
  --i-distance-matrix metrics/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file   sample-metadata.tsv \
  --m-metadata-column body-site \
  --o-visualization   metrics/unweighted-unifrac-body-site-significance.qzv \
  --p-pairwise

In [None]:
qiime diversity beta-group-significance \
  --i-distance-matrix metrics/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file   sample-metadata.tsv \
  --m-metadata-column subject \
  --o-visualization   metrics/unweighted-unifrac-subject-group-significance.qzv \
  --p-pairwise

- [metrics/unweighted-unifrac-body-site-significance.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Fcore-metrics-results%2Funweighted-unifrac-body-site-significance.qzv)
- [metrics/unweighted-unifrac-subject-group-significance.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Fcore-metrics-results%2Funweighted-unifrac-subject-group-significance.qzv)

### Exercise

- Are there differences between body sites?

- Are there differences between subjects?

Instead of using statistical tests and p-values to see the differences, we can use the `emperor` plugin to plot the samples in 3D space.

In [None]:
qiime emperor plot \
  --i-pcoa          metrics/unweighted_unifrac_pcoa_results.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-custom-axes   days-since-experiment-start \
  --o-visualization metrics/unweighted-unifrac-emperor-days-since-experiment-start.qzv

In [None]:
qiime emperor plot \
  --i-pcoa          metrics/bray_curtis_pcoa_results.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-custom-axes   days-since-experiment-start \
  --o-visualization metrics/bray-curtis-emperor-days-since-experiment-start.qzv

- [metrics/unweighted-unifrac-emperor-days-since-experiment-start.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Fcore-metrics-results%2Funweighted-unifrac-emperor-days-since-experiment-start.qzv)
- [metrics/bray-curtis-emperor-days-since-experiment-start.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Fcore-metrics-results%2Fbray-curtis-emperor-days-since-experiment-start.qzv)

### Exercise

- Does each body site follow its own progression?

- Are the right and left hands similar?

- Do you notice something weird?

## 4.5 Alpha rarefaction

Sometimes you need to know whether every sample has been sequenced deeply enough.

Rarefaction consists of repeating the same analysis at several sequencing depths.

In our case, we want to know whether we have captured all the richness in every sample.

To do so, we compute the alpha diversity with, say, 500 reads per sample, then 1000, then 1500, and so on, and plot the results.

We will know that we have sequenced enough if the diversity stops changing, i.e., the plot has plateaued.
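The procedure described above can be sketched in plain Python (to be run outside this bash notebook): subsample a sample's reads at increasing depths, count the distinct features each time, and average over a few repetitions. The counts, depths, and the `iterations` and `seed` parameters are assumptions made for illustration, not QIIME 2 settings.

```python
import random


def observed_features(counts, depth, rng):
    """Subsample `depth` reads without replacement and count distinct features."""
    reads = [feature for feature, n in enumerate(counts) for _ in range(n)]
    if depth > len(reads):
        return None  # the sample is too shallow for this depth
    return len(set(rng.sample(reads, depth)))


def rarefaction_curve(counts, depths, iterations=10, seed=0):
    """Mean number of observed features at each depth (None if too shallow)."""
    rng = random.Random(seed)
    curve = {}
    for depth in depths:
        results = [observed_features(counts, depth, rng) for _ in range(iterations)]
        curve[depth] = None if results[0] is None else sum(results) / iterations
    return curve


# Hypothetical sample: 3 abundant features plus 20 rare ones (1,100 reads in total).
sample = [600, 300, 100] + [5] * 20

for depth, mean_observed in rarefaction_curve(sample, [50, 100, 250, 500, 1000, 2000]).items():
    print(depth, mean_observed)
```

The curve rises quickly while rare features are still being discovered and flattens once nearly all of them have been seen. Depths beyond the sample's total read count return `None`, which mirrors how a shallow sample drops out of the plot at higher depths.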

In [None]:
qiime diversity alpha-rarefaction \
    --i-table         table.qza \
    --i-phylogeny     phylo-rooted-tree.qza \
    --p-max-depth     12000 \
    --m-metadata-file sample-metadata.tsv \
    --o-visualization metrics/alpha-rarefaction.qzv

- [metrics/alpha-rarefaction.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2023.2%2Fdata%2Ftutorials%2Fmoving-pictures%2Falpha-rarefaction.qzv)

Do we reach a plateau in all samples?

What changes if we rerun the command with a different `--p-max-depth` (it is 12,000 above)?

## End of notebook

In [None]:
pause

Click the stop button before continuing to the next notebook.