# Amplicon Sequence Data Analysis with QIIME 2

Adding `!` before a command tells the notebook to run it as a bash command rather than as Python.

To use sequencing data in QIIME 2, we first need to turn the FASTQ files containing our data into QIIME 2 artifacts.

What the QIIME 2 pipeline will do:
![our workflow](https://github.com/Gibbons-Lab/isb_course_2023/raw/main/docs/16S/assets/steps.png)

### About the Data
I downloaded the FASTQ data files generated by [Mr. DNA Lab](https://www.mrdnalab.com/) (Molecular Research) from Dropbox.
I unzipped the folders and uploaded the `sample-metadata.tsv` file and the `demux` folder into the `coral-pae-temp/analysis/microbiome/data` directory.

Here I am working with the FASTQ data files located in `coral-pae-temp/analysis/microbiome/data/demux`. In the `demux` folder are two `fastq.gz` files for each of the 22 samples, one for the forward read and one for the reverse read. 

The `fastq.gz` file name includes the sample identifier and should look like `4.Ea_S1_L001_R1_001.fastq.gz`. 
The underscore-separated fields in this file name are:

1.  the sample identifier,

2.  the barcode sequence or a barcode identifier,

3.  the lane number,

4.  the direction of the read (i.e. R1 or R2, because these are paired-end reads), and

5.  the set number.
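As a minimal sketch of the naming convention above, the five fields can be pulled out of a file name with plain Python. The `FastqName` type and `parse_fastq_name` helper are hypothetical names used only for illustration; they are not part of QIIME 2.

```python
from typing import NamedTuple


class FastqName(NamedTuple):
    sample_id: str
    barcode: str
    lane: str
    direction: str
    set_number: str


def parse_fastq_name(filename: str) -> FastqName:
    """Split a name like 4.Ea_S1_L001_R1_001.fastq.gz into its five fields."""
    suffix = ".fastq.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"not a fastq.gz file: {filename}")
    stem = filename[: -len(suffix)]
    # Split from the right so an underscore inside the sample ID is preserved.
    parts = stem.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated fields: {filename}")
    return FastqName(*parts)


name = parse_fastq_name("4.Ea_S1_L001_R1_001.fastq.gz")
print(name.sample_id, name.direction)
```

Splitting from the right is a design choice for this sketch: the last four fields have a fixed shape, so anything left over is treated as the sample identifier.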
   

The `fastq.gz` files are **demultiplexed** (aka **demuxed**) sequences that still contain the forward and reverse primers.

-   The raw data is **demultiplexed**

-   An R1 and an R2 `fastq.gz` file has been generated for each individual sample

-   All forward reads are binned into the R1 `fastq.gz` files

-   All reverse reads are binned into the R2 `fastq.gz` files

-   Other than demultiplexing, you can consider the raw data on BaseSpace as untouched (**the forward and reverse primer sequences have not been removed**)
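One way to confirm that primers are still present is to test whether a read begins with the primer, treating degenerate IUPAC bases as character classes. This is only a sketch: the forward primer is the `LinkerPrimerSequence` value from the sample metadata, the example reads are made up, and it assumes the primer sits at the very start of the read.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Forward primer as listed in the LinkerPrimerSequence metadata column.
FORWARD_PRIMER = "GTGYCAGCMGCCGCGGTAA"


def primer_regex(primer: str) -> "re.Pattern[str]":
    """Build a regex that matches every sequence the degenerate primer allows."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def starts_with_primer(read: str, primer: str = FORWARD_PRIMER) -> bool:
    """Return True if the read begins with the primer (assumed at position 0)."""
    return primer_regex(primer).match(read.upper()) is not None


# Hypothetical reads, for illustration only.
print(starts_with_primer("GTGCCAGCAGCCGCGGTAATACGGAGG"))
print(starts_with_primer("TACGGAGGGTGCAAGCGTTAATCGGAA"))
```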

## Python 3 API: import QIIME 2 plugins

In [1]:
from qiime2 import Visualization
from qiime2 import Artifact

In [2]:
#pip install empress
#!qiime dev refresh-cache
#!qiime empress --help

### Treatment
Keep only the treatment samples (remove the environmental control, blank, and mock samples).

In [3]:
import pandas as pd

In [4]:
# read in tsv sample metadata as a csv
df = pd.read_csv('../data/sample-metadata-verbose.tsv', delimiter='\t')
# make a list of SampleID values to remove
remove = ['1.Ea', '2.Ea', '3.Eb', '4.Ea', 'blank.', 'mock.']
# remove those rows 
df = df[~df['#SampleID'].isin(remove)]
df

| | #SampleID | BarcodeSequence | LinkerPrimerSequence | BarcodeName | ReversePrimer | ProjectName | Description | Pae | Temp | PaeTemp | Colony | Tank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.CA2a | ATCATAGGCT | GTGYCAGCMGCCGCGGTAA | 60bp_UDPi5_0073 | GGACTACNVGGGTWTCTAAT | 060823STillcus515F | 1.CA2a | control | ambient | control-ambient | 1 | A2 |
| 1 | 1.CH2a | TGTTAGAAGG | GTGYCAGCMGCCGCGGTAA | 60bp_UDPi5_0074 | GGACTACNVGGGTWTCTAAT | 060823STillcus515F | 1.CH2a | control | hot | control-hot | 1 | H2 |
| 3 | 1.PA2a | ACGGCCGTCA | GTGYCAGCMGCCGCGGTAA | 60bp_UDPi5_0076 | GGACTACNVGGGTWTCTAAT | 060823STillcus515F | 1.PA2a | peak | ambient | peak-ambient | 1 | A2 |
| 4 | 1.PH1a | CGTTGCTTAC | GTGYCAGCMGCCGCGGTAA | 60bp_UDPi5_0077 | GGACTACNVGGGTWTCTAAT | 060823STillcus515F | 1.PH1a | peak | hot | peak-hot | 1 | H1 |
| 5 | 2.CA2a | TGACTACATA | GTGYCAGCMGCCGCGGTAA | 60bp_UDPi5_0078 | GGACTACNVGGGTWTCTAAT | 060823STillcus515F | 2.CA2a | control | ambient | control-ambient | 2 | A2 |
| 6 | 2.CH1b | CGGCCTCGTT | GTGYCAGCMGCCGCGGTAA | 60bp_UDPi5_0079 | GGACTACNVGGGTWTCTAAT | 060823STillcus515F | 2.CH1b | control | hot | control-hot | 2 | H1 |
| 8 | 2.PA1b | TCGTCTGACT | GTGYCAGCMGCCGCGGTAA | 60bp_UDPi5_0081 | GGACTACNVGGGTWTCTAAT | 060823STillcus515F | 2.PA1b | peak | ambient | peak-ambient | 2 | A1 |
| 9 | 2.PH2a | CTCATAGCGA | GTGYCAGCMGCCGCGGTAA | 60bp_UDPi5_0082 | GGACTACNVGGGTWTCTAAT | 060823STillcus515F | 2.PH2a | peak | hot | peak-hot | 2 | H2 |
| 10 | 3.CA1b | AGACACATTA | GTGYCAGCMGCCGCGGTAA | 60bp_UDPi5_0083 | GGACTACNVGGGTWTCTAAT | 060823STillcus515F | 3.CA1b | control | ambient | control-ambient | 3 | A1 |
| 11 | 3.CH2a | GCGCGATGTT | GTGYCAGCMGCCGCGGTAA | 60bp_UDPi5_0084 | GGACTACNVGGGTWTCTAAT | 060823STillcus515F | 3.CH2a | control | hot | control-hot | 3 | H2 |


In [5]:
# save the new df as a tsv
df.to_csv('../data/sample-metadata-treatment.tsv', sep='\t', index=False)

In [6]:
!qiime feature-table filter-samples \
  --i-table ../output/filtered/table-no-mock-no-hits-taxon-filtered.qza \
  --m-metadata-file ../data/sample-metadata-treatment.tsv \
  --o-filtered-table ../output/filtered/table-treatment.qza

Saved FeatureTable[Frequency] to: ../output/filtered/table-treatment.qza

In [7]:
!qiime feature-table summarize \
  --i-table ../output/filtered/table-treatment.qza \
  --o-visualization ../output/filtered/table-treatment.qzv \
  --m-sample-metadata-file ../data/sample-metadata-treatment.tsv

Saved Visualization to: ../output/filtered/table-treatment.qzv

In [8]:
Visualization.load('../output/filtered/table-treatment.qzv')

## Diversity & Phylogenetics

[Generate a tree for phylogenetic diversity analyses](https://docs.qiime2.org/2023.5/tutorials/moving-pictures-usage/#:~:text=Generate%20a%20tree%20for%20phylogenetic%20diversity%20analyses)

From the moving pictures tutorial:
> QIIME supports several phylogenetic diversity metrics, including Faith’s Phylogenetic Diversity and weighted and unweighted UniFrac. In addition to counts of features per sample (i.e., the data in the FeatureTable[Frequency] QIIME 2 artifact), these metrics require a rooted phylogenetic tree relating the features to one another. This information will be stored in a Phylogeny[Rooted] QIIME 2 artifact. To generate a phylogenetic tree we will use align-to-tree-mafft-fasttree pipeline from the q2-phylogeny plugin. 
First, the pipeline uses the mafft program to perform a multiple sequence alignment of the sequences in our FeatureData[Sequence] to create a FeatureData[AlignedSequence] QIIME 2 artifact. Next, the pipeline masks (or filters) the alignment to remove positions that are highly variable. These positions are generally considered to add noise to a resulting phylogenetic tree. Following that, the pipeline applies FastTree to generate a phylogenetic tree from the masked alignment. The FastTree program creates an unrooted tree, so in the final step in this section midpoint rooting is applied to place the root of the tree at the midpoint of the longest tip-to-tip distance in the unrooted tree.
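
To make the role of the rooted tree concrete, here is a minimal sketch of Faith's Phylogenetic Diversity on a toy tree: the sum of branch lengths on the paths from a sample's observed features up to the root. The tree, its branch lengths, and the `faith_pd` helper are all hypothetical; QIIME 2 computes this metric from the `Phylogeny[Rooted]` artifact with its own implementation.

```python
# Toy rooted tree stored as child -> (parent, branch length).
# "root" has no entry, which is what stops the upward walk.
PARENT_OF = {
    "n1": ("root", 0.5),
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "C": ("root", 3.0),
}


def faith_pd(observed_tips, parent_of=PARENT_OF):
    """Sum each branch once along the paths from observed tips to the root."""
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk toward the root, stopping early at branches already counted.
        while node in parent_of and node not in visited:
            visited.add(node)
            parent, length = parent_of[node]
            total += length
            node = parent
    return total


print(faith_pd({"A"}))
print(faith_pd({"A", "B"}))
print(faith_pd({"A", "C"}))
```

Two closely related features (`A` and `B`) share the branch above `n1`, so adding `B` contributes less new diversity than adding the distant feature `C` would if its branch were the same length.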

In [9]:
!qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences ../output/dada2/representative_sequences.qza \
  --output-dir ../output/tree

Usage: qiime phylogeny align-to-tree-mafft-fasttree [OPTIONS]

  This pipeline will start by creating a sequence alignment using MAFFT, after
  which any alignment columns that are phylogenetically uninformative or
  ambiguously aligned will be removed (masked). The resulting masked alignment
  will be used to infer a phylogenetic tree and then subsequently rooted at
  its midpoint. Output files from each step of the pipeline will be saved.
  This includes both the unmasked and masked MAFFT alignment from q2-alignment
  methods, and both the rooted and unrooted phylogenies from q2-phylogeny
  methods.

Inputs:
  --i-sequences ARTIFACT FeatureData[Sequence]
                          The sequences to be used for creating a fasttree
                          based rooted phylogenetic tree.           [required]
Parameters:
  --p-n-threads VALUE Int % Range(1, None) | Str % Choices('auto')
                          Th

In [10]:
!qiime empress tree-plot \
    --i-tree ../output/tree/rooted_tree.qza \
    --o-visualization ../output/tree/empress.qzv

Saved Visualization to: ../output/tree/empress.qzv

In [11]:
Visualization.load("../output/tree/empress.qzv")

[Diversity](https://docs.qiime2.org/2023.5/tutorials/moving-pictures-usage/#:~:text=Alpha%20and%20beta%20diversity%20analysis)

QIIME 2’s diversity analyses are available through the `q2-diversity` plugin, which supports computing alpha and beta diversity metrics, applying related statistical tests, and generating interactive visualizations. We’ll first apply the `core-metrics-phylogenetic` method, which rarefies a `FeatureTable[Frequency]` to a user-specified depth, computes several alpha and beta diversity metrics, and generates principal coordinates analysis (PCoA) plots using Emperor for each of the beta diversity metrics.

An important metric to consider when studying microbial ecology is __diversity__. Diversity comes in two flavors: ⍺ (alpha) and β (beta).

Alpha diversity is pretty simple - how diverse is a single sample? You might consider measures like richness and evenness.

![alpha diversity](https://gibbons-lab.github.io/isb_course_2023/16S/assets/alpha_diversity.png)
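Richness and evenness can be sketched for a single sample in a few lines of plain Python. The counts below are made up, the function names are hypothetical, and base 2 for Shannon's index is an assumption of this sketch rather than a statement about QIIME 2's settings.

```python
import math


def observed_features(counts):
    """Richness: how many features have a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts, base=2):
    """Shannon's index: -sum(p * log(p)) over features that are present."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in props)


def pielou_evenness(counts):
    """Shannon's index divided by its maximum, log(number of features)."""
    richness = observed_features(counts)
    if richness < 2:
        return float("nan")  # evenness is undefined for fewer than 2 features
    return shannon(counts, base=math.e) / math.log(richness)


even_sample = [25, 25, 25, 25]    # four features, equally abundant
skewed_sample = [97, 1, 1, 1]     # same richness, one dominant feature

for sample in (even_sample, skewed_sample):
    print(observed_features(sample), shannon(sample), pielou_evenness(sample))
```

Both samples have the same richness, but the skewed one scores much lower on Shannon's index and on evenness.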

Beta diversity instead looks at how different two samples are from each other - what taxa are shared, and how their abundances differ.

![beta diversity](https://gibbons-lab.github.io/isb_course_2023/16S/assets/beta_diversity.png)
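The two non-phylogenetic beta diversity measures can be sketched the same way. Jaccard only asks which features are shared (qualitative), while Bray-Curtis also uses their abundances (quantitative). The two count vectors are hypothetical and are assumed to be aligned so that position `i` is the same feature in both samples.

```python
def jaccard_distance(a, b):
    """1 - (shared features / features present in either sample)."""
    present_a = {i for i, c in enumerate(a) if c > 0}
    present_b = {i for i, c in enumerate(b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # two empty samples are treated as identical here
    return 1 - len(present_a & present_b) / len(union)


def bray_curtis(a, b):
    """Sum of absolute count differences divided by the total count."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator


sample_1 = [10, 0, 5]
sample_2 = [10, 5, 0]

print(jaccard_distance(sample_1, sample_2))
print(bray_curtis(sample_1, sample_2))
```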


The metrics computed by default are:
##### Alpha diversity- 
Shannon’s diversity index (a quantitative measure of community richness- 

Observed Features (a qualitative measure of community richne- s)

Faith’s Phylogenetic Dive**rsity (a qualitative measure of community richness that incorporates phylogenetic relationships between the fea**t- res)

Evenness (or Pielou’s Evenness; a measure of community eve##### nness)

Beta - iversity

Jaccard distance (a qualitative measure of community dis- imilarity)

Bray-Curtis distance (a quantitative measure of community d- ssimilarity)

unweighted U**niFrac distance (a qualitative measure of community dissimilarity that incorporates phylogenetic relationships betwe**e-  the features)

weighted** UniFrac distance (a quantitative measure of community dissimilarity that incorporates phylogenetic relationships bet**ween the features) 

### Sampling Depth

An important parameter that needs to be provided to this command is `--p-sampling-depth`, which is the even sampling (i.e. rarefaction) depth. Because most diversity metrics are sensitive to different sampling depths across different samples, this command will randomly subsample the counts from each sample to the value provided for this parameter. For example, if you provide `--p-sampling-depth 500`, this step will subsample the counts in each sample without replacement so that each sample in the resulting table has a total count of 500. If the total count for any sample is smaller than this value, that sample will be dropped from the diversity analysis. Choosing this value is tricky. We recommend making your choice by reviewing the information presented in the `table.qzv` file that was created above. Choose a value that is as high as possible (so you retain more sequences per sample) while excluding as few samples as possible.
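The subsampling described above can be sketched in plain Python: expand each sample's counts into individual reads, draw `depth` of them without replacement, and drop any sample that has fewer reads than `depth`. The `rarefy` helper and the toy table are hypothetical, and this is not how QIIME 2 implements the step internally.

```python
import random
from collections import Counter


def rarefy(table, depth, seed=None):
    """Subsample every sample to `depth` reads without replacement."""
    rng = random.Random(seed)
    rarefied = {}
    for sample, counts in table.items():
        if sum(counts.values()) < depth:
            continue  # too shallow: this sample is dropped
        # One list entry per read, labelled with its feature.
        pool = [feature for feature, n in counts.items() for _ in range(n)]
        rarefied[sample] = dict(Counter(rng.sample(pool, depth)))
    return rarefied


toy_table = {
    "deep": {"f1": 600, "f2": 300, "f3": 100},   # 1000 reads
    "shallow": {"f1": 200, "f2": 100},           # 300 reads
}

result = rarefy(toy_table, depth=500, seed=1)
print(sorted(result))
print(sum(result["deep"].values()))
```

Because the draw is random, the per-feature counts change with the seed, but every retained sample always ends up with exactly `depth` reads.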

Navigate to the interactive sample detail tab
<br>
Move the sampling depth slider as high as you can before excluding any samples 
<br>
We want the sampling depth to be high, while retaining all 22 samples
<br>
This looks like a sampling depth of 10,6727 (09AUG2023, SST) 
<br>
But maybe we exclude the mock community and bring it up to 163971? (15AUG2023, SST)
<br>
... Redoing this with only the 16 samples that were exposed to the treatment (so we're excluding the mock, blank, and environmental baseline samples)
<br>
In this table, the lowest frequency is 166920.

What value would you choose to pass for `--p-sampling-depth`?
- **160000**

How many samples will be excluded from your analysis based on this choice?
- **none, all 16 treatment samples are retained**

How many total sequences will you be analyzing in the `core-metrics-phylogenetic` command?
- **Retained 2,560,000 (66.13%) features in 16 (100.00%) samples at the specified sampling depth.**
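
The retained count follows directly from the depth: every retained sample contributes exactly `depth` sequences, so 16 samples at 160000 give 2,560,000. A small sketch of that arithmetic is below. Only the lowest total (166920), the sample count (16), and the chosen depth come from this notebook; the other per-sample totals are placeholder values, and `retained_at_depth` is a hypothetical helper.

```python
def retained_at_depth(sample_totals, depth):
    """Return (samples kept, sequences kept) for a given sampling depth."""
    kept = [s for s, total in sample_totals.items() if total >= depth]
    return len(kept), len(kept) * depth


# Placeholder totals for 15 samples, plus the real lowest total of 166920.
totals = {f"sample_{i}": 200_000 for i in range(15)}
totals["lowest"] = 166_920

print(retained_at_depth(totals, 160_000))
print(retained_at_depth(totals, 170_000))
```

Raising the depth above the lowest sample total keeps more sequences per sample but starts dropping samples, which is the trade-off behind choosing 160000 here.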

To account for variations in sampling depth, we'll provide QIIME 2 with a cutoff at which to rarefy all our samples. Since rarefaction randomly selects sequences, your results might look a little different. We'll also pass in our metadata file so we can keep track of which samples come from each group.

## Alpha rarefaction plotting
In this section we’ll explore alpha diversity as a function of sampling depth using the `qiime diversity alpha-rarefaction` visualizer. This visualizer computes one or more alpha diversity metrics at multiple sampling depths, in steps between 1 (optionally controlled with `--p-min-depth`) and the value provided as `--p-max-depth`. At each sampling depth step, 10 rarefied tables will be generated, and the diversity metrics will be computed for all samples in the tables. The number of iterations (rarefied tables computed at each sampling depth) can be controlled with `--p-iterations`. Average diversity values will be plotted for each sample at each even sampling depth, and samples can be grouped based on metadata in the resulting visualization if sample metadata is provided with the `--m-metadata-file` parameter.
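
The idea behind the rarefaction curve can be sketched for one sample: pick a series of depths, rarefy several times at each depth, and average the metric. This sketch uses observed features as the metric and assumes evenly spaced depths; the number of steps, the toy counts, and the helper names are assumptions made for illustration.

```python
import random
from statistics import mean


def depth_steps(min_depth, max_depth, steps):
    """Evenly spaced integer depths from min_depth to max_depth inclusive."""
    span = max_depth - min_depth
    return [min_depth + (span * i) // (steps - 1) for i in range(steps)]


def rarefaction_curve(counts, depths, iterations=10, seed=0):
    """Mean observed features over `iterations` random subsamples per depth."""
    rng = random.Random(seed)
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    curve = {}
    for depth in depths:
        if depth > len(pool):
            continue  # the sample has too few reads for this depth
        curve[depth] = mean(
            len(set(rng.sample(pool, depth))) for _ in range(iterations)
        )
    return curve


toy_counts = {"a": 50, "b": 30, "c": 15, "d": 5}   # 100 reads in total
depths = depth_steps(10, 100, 10)
curve = rarefaction_curve(toy_counts, depths)
for depth, value in curve.items():
    print(depth, value)
```

A curve that has flattened by the chosen sampling depth suggests that deeper sequencing would reveal few additional features.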

In [12]:
!qiime diversity alpha-rarefaction \
  --i-table ../output/filtered/table-treatment.qza \
  --i-phylogeny ../output/tree/rooted_tree.qza \
  --p-min-depth 10 \
  --p-max-depth  160000 \
  --m-metadata-file ../data/sample-metadata-treatment.tsv \
  --o-visualization ../output/alpha-treatment/alpha-rarefaction.qzv

Saved Visualization to: ../output/alpha-treatment/alpha-rarefaction.qzv

In [13]:
Visualization.load('../output/alpha-treatment/alpha-rarefaction.qzv')

In [14]:
!qiime diversity core-metrics-phylogenetic \
  --i-phylogeny ../output/tree/rooted_tree.qza \
  --i-table ../output/filtered/table-treatment.qza \
  --p-sampling-depth 160000 \
  --m-metadata-file ../data/sample-metadata-treatment.tsv \
  --no-recycle \
  --verbose \
  --o-rarefied-table ../output/diversity-treatment/rarefied_table.qza \
  --o-shannon-vector ../output/diversity-treatment/shannon_vector.qza \
  --o-evenness-vector ../output/diversity-treatment/evenness_vector.qza \
  --o-jaccard-distance-matrix ../output/diversity-treatment/jaccard_distance_matrix.qza \
  --o-bray-curtis-distance-matrix ../output/diversity-treatment/bray_curtis_distance_matrix.qza \
  --o-jaccard-pcoa-results ../output/diversity-treatment/jaccard_pcoa_results.qza \
  --o-bray-curtis-pcoa-results ../output/diversity-treatment/bray_curtis_pcoa_results.qza \
  --o-jaccard-emperor ../output/diversity-treatment/jaccard_emperor.qzv \
  --o-bray-curtis-emperor ../output/diversity-treatment/bray_curtis_emperor.qzv \
  --o-weighted-unifrac-distance-matrix ../output/diversity-treatment/weighted_unifrac_distance_matrix.qza \
  --o-unweighted-unifrac-emperor ../output/diversity-treatment/unweighted_unifrac_emperor.qzv \
  --o-faith-pd-vector ../output/diversity-treatment/faith_pd_vector.qza \
  --o-observed-features-vector ../output/diversity-treatment/observed_features_vector.qza \
  --o-unweighted-unifrac-pcoa-results ../output/diversity-treatment/unweighted_unifrac_pcoa_results.qza \
  --o-weighted-unifrac-pcoa-results ../output/diversity-treatment/weighted_unifrac_pcoa_results.qza \
  --o-unweighted-unifrac-distance-matrix ../output/diversity-treatment/unweighted_unifrac_distance_matrix.qza \
  --o-weighted-unifrac-emperor ../output/diversity-treatment/weighted_unifrac_emperor.qzv

  warn(
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command:

faithpd -i /tmp/qiime2/stanja/data/c10eca03-55e7-4350-9e4c-5b42a38eca0e/data/feature-table.biom -t /tmp/qiime2/stanja/data/085b6921-9b93-43d7-83e2-cf3cab988dca/data/tree.nwk -o /tmp/q2-AlphaDiversityFormat-jjv109kp

Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command:

ssu -i /tmp/qiime2/stanja/data/c10eca03-55e7-4350-9e4c-5b42a38eca0e/data/feature-table.biom -t /tmp/qiime2/stanja/data/085b6921-9b93-43d7-83e2-cf3cab988dca/data/tree.nwk -m unweighted -o /tmp/q2-LSMatFormat-7t61lv50

Running external command line application. This may print messages to stdout and/or stderr.
T

### Unweighted Unifrac Emperor Plot

In [15]:
Visualization.load('../output/diversity-treatment/unweighted_unifrac_emperor.qzv')

### Weighted Unifrac Emperor Plot

In [16]:
Visualization.load('../output/diversity-treatment/weighted_unifrac_emperor.qzv')

### Jaccard Emperor Plot

In [17]:
Visualization.load('../output/diversity-treatment/jaccard_emperor.qzv')

### Bray-Curtis Emperor Plot

In [18]:
Visualization.load('../output/diversity-treatment/bray_curtis_emperor.qzv')

## Alpha Diversity
After computing diversity metrics, we can begin to explore the microbial composition of the samples in the context of the sample metadata. This information is present in the sample metadata file `../data/sample-metadata-treatment.tsv`.

We’ll first test for associations between categorical metadata columns and alpha diversity data.
We’ll do that here for Faith’s Phylogenetic Diversity (a measure of community richness):

In [19]:
!qiime diversity alpha-group-significance \
  --i-alpha-diversity ../output/diversity-treatment/faith_pd_vector.qza \
  --m-metadata-file ../data/sample-metadata-treatment.tsv \
  --o-visualization ../output/alpha-treatment/faith-pd-group-significance.qzv

Saved Visualization to: ../output/alpha-treatment/faith-pd-group-significance.qzv

### Microbial Community Richness: Faith Phylogenetic Diversity

##### Which categorical sample metadata columns are most strongly associated with the differences in microbial community richness?
##### Are these differences statistically significant?

Pae
- no significantly different groups

Temp
- no significantly different groups
  
PaeTemp
- no significantly different groups

Colony
- no significantly different groups

Tank
- no significantly different groups

In [20]:
Visualization.load('../output/alpha-treatment/faith-pd-group-significance.qzv')

In [21]:
!qiime diversity alpha-group-significance \
  --i-alpha-diversity ../output/diversity-treatment/evenness_vector.qza \
  --m-metadata-file ../data/sample-metadata-treatment.tsv \
  --o-visualization ../output/alpha-treatment/evenness-group-significance.qzv

Saved Visualization to: ../output/alpha-treatment/evenness-group-significance.qzv

### Microbial Community Evenness: Pielou's Evenness

##### Which categorical sample metadata columns are most strongly associated with the differences in microbial community evenness? 
Surprised by Tank result: H1 vs H2, I wouldn't have expected those to be different.

##### Are these differences statistically significant?
Pae
- no significantly different groups
- control (n=8) vs peak (n=8), pvalue (0.07)

Temp
- no significantly different groups
- control (n=8) vs peak (n=8), pvalue (0.07)

PaeTemp
- control-ambient (n=4) vs peak-ambient (n=4), pvalue (0.04)
- control-ambient (n=4) vs peak-hot (n=4), pvalue (0.02)
  
Colony
- no categorical data?

Tank
- Kruskal-Wallis pvalue (0.02)
- A1 (n=4) vs H1 (n=5), pvalue (0.05)
- A2 (n=4) vs H1 (n=5), pvalue (0.01)
- H1 (n=5) vs H2 (n=3), pvalue (0.05)
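
The Kruskal-Wallis statistic reported in these visualizations is rank based: all values are ranked together, and the test asks whether the rank sums differ between groups more than expected by chance. A minimal sketch of the H statistic is below. It omits the tie correction and the p-value, the helper names are hypothetical, and the example values are made up rather than taken from the evenness vector.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_g^2 / n_g) - 3 (N + 1), no tie correction."""
    values = [v for group in groups.values() for v in group]
    ranks = average_ranks(values)
    n_total = len(values)
    weighted = 0.0
    start = 0
    for group in groups.values():
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical values for two groups that do not overlap at all.
example = {"group_x": [1, 2, 3], "group_y": [4, 5, 6]}
print(kruskal_wallis_h(example))
```

With groups this small, only a limited number of rank arrangements are possible, which is worth keeping in mind when reading pairwise p-values near 0.05.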

In [22]:
Visualization.load('../output/alpha-treatment/evenness-group-significance.qzv')

Let's use the Shannon vector in the output directory to create a visualization of alpha diversity across samples.

In [23]:
!qiime diversity alpha-group-significance \
    --i-alpha-diversity ../output/diversity-treatment/shannon_vector.qza \
    --m-metadata-file ../data/sample-metadata-treatment.tsv \
    --o-visualization ../output/alpha-treatment/alpha_groups.qzv

Saved Visualization to: ../output/alpha-treatment/alpha_groups.qzv

In [24]:
Visualization.load("../output/alpha-treatment/alpha_groups.qzv")

## Beta Diversity

Next we’ll analyze sample composition in the context of categorical metadata using PERMANOVA (first described in Anderson (2001)) with the `beta-group-significance` command. The following commands will test whether distances between samples within a group, such as samples from the same body site (e.g., gut), are more similar to each other than they are to samples from the other groups (e.g., tongue, left palm, and right palm). If you call this command with the `--p-pairwise` parameter, as we’ll do here, it will also perform pairwise tests that will allow you to determine which specific pairs of groups (e.g., tongue and gut) differ from one another, if any. This command can be slow to run, especially when passing `--p-pairwise`, since it is based on permutation tests. So, unlike the previous commands, we’ll run `beta-group-significance` on specific columns of metadata that we’re interested in exploring, rather than all metadata columns to which it is applicable. Here we’ll apply this to our unweighted UniFrac distances, using the `Pae`, `Temp`, and `PaeTemp` sample metadata columns.

Let's look at beta diversity and see how the samples separate. For this we'll use weighted UniFrac.
<br>

We can check for 'significant' separation between samples using PERMANOVA, which is available through the diversity plugin in QIIME 2.
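
The logic of a one-factor PERMANOVA can be sketched in plain Python: compute a pseudo-F statistic from the distance matrix (between-group variation relative to within-group variation), then shuffle the group labels many times to see how often a value at least that large arises by chance. The distance matrix, labels, and function names below are hypothetical, and this is an illustration of the idea rather than the implementation QIIME 2 calls.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """Pseudo-F from squared distances, following Anderson (2001)."""
    n = len(labels)
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, []).append(index)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = sum(
        sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
        for members in groups.values()
    )
    ss_among = ss_total - ss_within
    n_groups = len(groups)
    return (ss_among / (n_groups - 1)) / (ss_within / (n - n_groups))


def permanova(dist, labels, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed labelling counts as one permutation.
    return observed, (hits + 1) / (permutations + 1)


# Six hypothetical samples placed on a line; two clearly separated groups.
positions = [0, 1, 2, 10, 11, 12]
labels = ["control"] * 3 + ["peak"] * 3
dist = [[abs(a - b) for b in positions] for a in positions]

observed_f, p_value = permanova(dist, labels)
print(observed_f, p_value)
```

With only three samples per group there are few distinct ways to relabel the data, so the smallest achievable p-value is limited no matter how well the groups separate.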

### Adonis Test

In [43]:
!qiime diversity adonis \
    --i-distance-matrix ../output/diversity-treatment/weighted_unifrac_distance_matrix.qza \
    --m-metadata-file ../data/sample-metadata-treatment.tsv \
    --p-formula Pae \
    --p-n-jobs 2 \
    --o-visualization ../output/beta-treatment/pae_permanova.qzv

Saved Visualization to: ../output/beta-treatment/pae_permanova.qzv

In [44]:
Visualization.load("../output/beta-treatment/pae_permanova.qzv")

In [45]:
!qiime diversity adonis \
    --i-distance-matrix ../output/diversity-treatment/weighted_unifrac_distance_matrix.qza \
    --m-metadata-file ../data/sample-metadata-treatment.tsv \
    --p-formula Temp \
    --p-n-jobs 2 \
    --o-visualization ../output/beta-treatment/temp_permanova.qzv

Saved Visualization to: ../output/beta-treatment/temp_permanova.qzv

In [46]:
Visualization.load("../output/beta-treatment/temp_permanova.qzv")

In [29]:
!qiime diversity adonis \
    --i-distance-matrix ../output/diversity-treatment/weighted_unifrac_distance_matrix.qza \
    --m-metadata-file ../data/sample-metadata-treatment.tsv \
    --p-formula Pae+Temp \
    --p-n-jobs 2 \
    --o-visualization ../output/beta-treatment/paetemp_permanova.qzv

Saved Visualization to: ../output/beta-treatment/paetemp_permanova.qzv

In [30]:
Visualization.load("../output/beta-treatment/paetemp_permanova.qzv")

### Beta-Group Significance

In [31]:
!qiime diversity beta-group-significance \
  --i-distance-matrix ../output/diversity-treatment/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file ../data/sample-metadata-treatment.tsv \
  --m-metadata-column Pae \
  --o-visualization ../output/beta-treatment/unweighted-unifrac-pae-significance.qzv \
  --p-pairwise

!qiime diversity beta-group-significance \
  --i-distance-matrix ../output/diversity-treatment/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file ../data/sample-metadata-treatment.tsv \
  --m-metadata-column Temp \
  --o-visualization ../output/beta-treatment/unweighted-unifrac-temp-significance.qzv \
  --p-pairwise

!qiime diversity beta-group-significance \
  --i-distance-matrix ../output/diversity-treatment/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file ../data/sample-metadata-treatment.tsv \
  --m-metadata-column PaeTemp \
  --o-visualization ../output/beta-treatment/unweighted-unifrac-paetemp-significance.qzv \
  --p-pairwise

Saved Visualization to: ../output/beta-treatment/unweighted-unifrac-pae-significance.qzv
Saved Visualization to: ../output/beta-treatment/unweighted-unifrac-temp-significance.qzv
Saved Visualization to: ../output/beta-treatment/unweighted-unifrac-paetemp-significance.qzv

In [32]:
Visualization.load("../output/beta-treatment/unweighted-unifrac-pae-significance.qzv")

In [33]:
Visualization.load("../output/beta-treatment/unweighted-unifrac-temp-significance.qzv")

In [34]:
Visualization.load("../output/beta-treatment/unweighted-unifrac-paetemp-significance.qzv")

## PERMDISP

In [54]:
!qiime diversity beta-group-significance \
  --i-distance-matrix ../output/diversity-treatment/weighted_unifrac_distance_matrix.qza \
  --m-metadata-file ../data/sample-metadata-treatment.tsv \
  --m-metadata-column Pae \
  --o-visualization ../output/diversity-treatment/weighted-unifrac-pae-permdisp.qzv \
  --p-method permdisp \
  --p-pairwise

Saved Visualization to: ../output/diversity-treatment/weighted-unifrac-pae-permdisp.qzv

In [55]:
Visualization.load("../output/diversity-treatment/weighted-unifrac-pae-permdisp.qzv")

In [52]:
!qiime diversity beta-group-significance \
  --i-distance-matrix ../output/diversity-treatment/weighted_unifrac_distance_matrix.qza \
  --m-metadata-file ../data/sample-metadata-treatment.tsv \
  --m-metadata-column Temp \
  --o-visualization ../output/diversity-treatment/weighted-unifrac-temp-permdisp.qzv \
  --p-method permdisp \
  --p-pairwise

Saved Visualization to: ../output/diversity-treatment/weighted-unifrac-temp-permdisp.qzv

In [53]:
Visualization.load("../output/diversity-treatment/weighted-unifrac-temp-permdisp.qzv")

In [49]:
!qiime diversity beta-group-significance \
  --i-distance-matrix ../output/diversity-treatment/weighted_unifrac_distance_matrix.qza \
  --m-metadata-file ../data/sample-metadata-treatment.tsv \
  --m-metadata-column PaeTemp \
  --o-visualization ../output/diversity-treatment/weighted-unifrac-paetemp-permdisp.qzv \
  --p-method permdisp \
  --p-pairwise

Saved Visualization to: ../output/diversity-treatment/weighted-unifrac-paetemp-permdisp.qzv

In [50]:
Visualization.load("../output/diversity-treatment/weighted-unifrac-paetemp-permdisp.qzv")

## [Questions to Guide Data Analysis](https://docs.qiime2.org/2023.5/tutorials/atacama-soils/#paired-end-read-analysis-commands:~:text=Questions%20to%20guide%20data%20analysis)
What sample metadata or combinations of sample metadata are most strongly associated with the differences in microbial composition of the samples? Are these associations stronger with unweighted UniFrac or with Bray-Curtis? Based on what you know about these metrics, what does that difference suggest? For exploring associations between continuous metadata and sample composition, the commands `qiime metadata distance-matrix` in combination with `qiime diversity mantel` and `qiime diversity bioenv` will be useful. These were not covered in the Moving Pictures tutorial, but you can learn about them by running them with the `--help` parameter.

Next Steps:
<br>
- I want to know which taxa are uniquely present in each treatment, and which ones stay the same
 - ANCOM, composition plugin (apply ANalysis of Composition of Microbiomes (ANCOM) to identify features that are differentially abundant across groups)
<br>
- Use [PICRUSt2](https://library.qiime2.org/plugins/q2-picrust2/13/) for functional analysis
