# Alpha Diversity Analysis

We will first test for associations between our categorical metadata columns and alpha diversity. Alpha diversity asks about the distribution of features within each sample, and once calculated for all samples can be used to test whether the per‐sample diversity differs across different conditions (e.g., samples obtained at different ages). The comparison makes no assumptions about the features that are shared between samples; two samples can have the same alpha diversity and not share any features.

## Import Libraries

In [50]:
import os
import pandas as pd
import qiime2 as q2
from qiime2 import Visualization
import numpy as np

%matplotlib inline

In [2]:
data_dir = 'data'
data_dir_phyl = 'data/phylogeny'
data_dir_div = 'data/alpha_diversity'

## Alpha Rarefaction

To perform rarefaction, we first need to decide which sampling depth is best suited for our dataset. For this, we will analyse how sampling depth impacts within-sample diversity estimates (= alpha diversity) with the alpha-rarefaction action. This action generates interactive alpha rarefaction curves for sequencing depths between min_depth and max_depth and computes 10 (default) rarefied tables with corresponding alpha diversity metrics at each sampling depth step.
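
To make the idea behind these curves concrete, the following is a minimal sketch of rarefaction in plain NumPy; it is not the QIIME 2 implementation. It subsamples one sample's counts without replacement at several depths, repeats this 10 times per depth, and averages the number of observed features. The helper names `rarefy` and `rarefaction_curve` and the `sample_counts` vector are hypothetical and chosen only for illustration.

```python
import numpy as np


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample's feature counts."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    # Each feature is treated as one "colour" in a multivariate hypergeometric draw.
    return rng.multivariate_hypergeometric(counts, depth)


def rarefaction_curve(counts, depths, iterations=10, seed=0):
    """Mean number of observed features at each depth over `iterations` subsamples."""
    rng = np.random.default_rng(seed)
    curve = {}
    for depth in depths:
        richness = [np.count_nonzero(rarefy(counts, depth, rng)) for _ in range(iterations)]
        curve[depth] = float(np.mean(richness))
    return curve


# Hypothetical sample: 4 abundant and 6 rare features, 5000 reads in total.
sample_counts = [2000, 1500, 1000, 470, 10, 8, 5, 4, 2, 1]
curve = rarefaction_curve(sample_counts, depths=[10, 100, 1000, 5000])
for depth, richness in curve.items():
    print(f"depth={depth:>5}  mean observed features={richness:.1f}")
```

A curve that flattens well before a candidate depth suggests that sequencing deeper would add few new features for that sample, which is the kind of evidence used to pick the sampling depth.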

In [57]:
! qiime diversity alpha-rarefaction \
    --i-table $data_dir/closed_reference_cluster/cr90_feature_table.qza \
    --p-max-depth 10000 \
    --m-metadata-file $data_dir/pundemic_metadata.tsv \
    --o-visualization $data_dir_div/alpha_rarefaction.qzv

#   --i-phylogeny $data_dir_phyl/fasttree_tree_rooted.qza \

^C

Aborted!


In [54]:
Visualization.load(f"{data_dir_div}/alpha_rarefaction.qzv")

## Diversity Analysis
The core-metrics action rarefies the feature table to the chosen sampling depth and applies a collection of non-phylogenetic diversity metrics to it. For alpha diversity, three metrics are important:
- Shannon entropy: a quantitative measure of community richness that also reflects how evenly the features are distributed
- Pielou evenness: a measure of community evenness
- Observed features: a qualitative measure of community richness (called "observed OTUs" in older QIIME 2 releases)
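
As a rough illustration of what these three metrics measure, here is a minimal NumPy sketch. It is not the code QIIME 2 runs; the function names and the two count vectors are hypothetical. The base-2 logarithm is an assumption, although it appears consistent with the values in the metrics table further down in this notebook, where `pielou_evenness` matches `shannon_entropy / log2(observed_features)`.

```python
import numpy as np


def observed_features(counts):
    """Number of features with a non-zero count (richness)."""
    return int(np.count_nonzero(counts))


def shannon_entropy(counts, base=2):
    """Shannon entropy of the relative abundances (base 2 is an assumption)."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))


def pielou_evenness(counts):
    """Shannon entropy divided by its maximum possible value for the observed richness."""
    s = observed_features(counts)
    if s < 2:
        return float("nan")  # undefined when fewer than two features are present
    return shannon_entropy(counts) / float(np.log2(s))


# Two hypothetical samples that share no features but have identical alpha diversity.
sample_a = [10, 10, 10, 10, 0, 0, 0, 0]
sample_b = [0, 0, 0, 0, 10, 10, 10, 10]
for name, counts in [("a", sample_a), ("b", sample_b)]:
    print(name, observed_features(counts), shannon_entropy(counts), pielou_evenness(counts))

# Consistency check using the first row of this notebook's metrics table
# (shannon_entropy=1.267043, observed_features=11, pielou_evenness=0.366257).
ratio = 1.267043 / np.log2(11)
print(ratio)
```

The two toy samples also illustrate the point made in the introduction: alpha diversity says nothing about which features two samples have in common.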

In [79]:
! qiime diversity core-metrics \
  --i-table $data_dir/closed_reference_cluster/cr90_feature_table.qza \
  --m-metadata-file $data_dir/pundemic_metadata.tsv \
  --p-sampling-depth 3000 \
  --p-n-jobs 8 \
  --output-dir $data_dir_div/core_metrics_results

Saved FeatureTable[Frequency] to: data/alpha_diversity/core_metrics_results/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: data/alpha_diversity/core_metrics_results/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: data/alpha_diversity/core_metrics_results/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: data/alpha_diversity/core_metrics_results/evenness_vector.qza
Saved DistanceMatrix to: data/alpha_diversity/core_metrics_results/jaccard_distance_matrix.qza
Saved DistanceMatrix to: data/alpha_diversity/core_metrics_results/bray_curtis_distance_matrix.qza
Saved PCoAResults to: data/alpha_diversity/core_metrics_results/jaccard_pcoa_results.qza
Saved PCoAResults to: data/alpha_diversity/core_metrics_results/bray_curtis_pcoa_results.qza
Saved Visualization to: data/alpha_diversity/core_metrics_results/jaccard_emperor.qzv
Saved Visualization to: data/alpha_diversity/core_me

In [13]:
metadata = pd.read_csv(f"{data_dir}/pundemic_metadata.tsv", sep='\t')

## Pairwise difference comparisons between time points
When microbial data are collected at different time points, it is useful to examine dynamic changes in the microbial communities (longitudinal analysis). The pairwise difference test determines whether the value of a specific metric changed significantly between paired samples from the same individual (e.g., pre- and post-treatment).
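
The core of a paired comparison can be sketched with pandas and SciPy as follows. This is only an illustration under stated assumptions: the toy values are hypothetical, the column names mirror this notebook's metadata, and the Wilcoxon signed-rank test is used here as one suitable paired test rather than as a description of what the QIIME 2 action computes internally.

```python
import pandas as pd
from scipy.stats import wilcoxon

# Hypothetical toy data using the column names from this notebook's metadata.
toy = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "time_point": ["pre-treatment", "post-treatment"] * 5,
    "shannon_entropy": [1.0, 1.4, 2.0, 2.1, 1.5, 1.2, 0.8, 1.5, 2.2, 2.7],
})

# One row per patient, one column per state; patients missing a state get NaN.
paired = toy.pivot(index="patient_id", columns="time_point", values="shannon_entropy")
paired = paired.dropna()

# Per-patient change from pre- to post-treatment.
diff = paired["post-treatment"] - paired["pre-treatment"]
print(diff)

# Signed-rank test of whether the paired differences are centred on zero.
statistic, p_value = wilcoxon(diff)
print(f"W={statistic}, p={p_value:.3f}")
```

Pivoting on the individual identifier is what makes the comparison paired: each difference is taken within one patient, so between-patient variation does not enter the test.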

In [55]:
metadata['subgroup_response'] = metadata['disease_subgroup'] + "_" + metadata['blinded_clinical_response']
metadata.age = metadata.age.replace('Unknown', np.nan)
metadata.sex = metadata.sex.replace('Unknown', np.nan)
metadata.to_csv(f'{data_dir}/pundemic_metadata_subgroup_response_all.tsv', sep='\t', index=False)

In [15]:
! qiime tools export \
  --input-path $data_dir_div/core_metrics_results/shannon_vector.qza \
  --output-path $data_dir_div/shannon

Exported data/alpha_diversity/core_metrics_results/shannon_vector.qza as AlphaDiversityDirectoryFormat to directory data/alpha_diversity/shannon

In [16]:
! qiime tools export \
  --input-path $data_dir_div/core_metrics_results/evenness_vector.qza \
  --output-path $data_dir_div/evenness

Exported data/alpha_diversity/core_metrics_results/evenness_vector.qza as AlphaDiversityDirectoryFormat to directory data/alpha_diversity/evenness

In [17]:
! qiime tools export \
  --input-path $data_dir_div/core_metrics_results/observed_features_vector.qza \
  --output-path $data_dir_div/observed_features

Exported data/alpha_diversity/core_metrics_results/observed_features_vector.qza as AlphaDiversityDirectoryFormat to directory data/alpha_diversity/observed_features

In [21]:
shannon = pd.read_csv(f"{data_dir_div}/shannon/alpha-diversity.tsv", sep='\t')
evenness = pd.read_csv(f"{data_dir_div}/evenness/alpha-diversity.tsv", sep='\t')
observed_features = pd.read_csv(f"{data_dir_div}/observed_features/alpha-diversity.tsv", sep='\t')

In [24]:
shannon.rename(columns = {shannon.columns[0]: "id"}, inplace = True)
evenness.rename(columns = {evenness.columns[0]: "id"}, inplace = True)
observed_features.rename(columns = {observed_features.columns[0]: "id"}, inplace = True)

In [25]:
metrics = pd.merge(shannon, evenness, on = "id")
metrics = pd.merge(metrics, observed_features, on = "id")
metrics.head()

Unnamed: 0,id,shannon_entropy,pielou_evenness,observed_features
0,SRR10505051,1.267043,0.366257,11
1,SRR10505052,0.971329,0.217814,22
2,SRR10505053,0.312109,0.073473,19
3,SRR10505056,2.111094,0.460439,24
4,SRR10505057,1.607049,0.434286,13


In [26]:
metadata = pd.merge(metadata, metrics, on = "id")
metadata.head()

Unnamed: 0,id,patient_id,age,sex,ethnicity,continent,country,region,city,group,disease_subgroup,blinded_clinical_response,puns_per_hour_pre_treatment,puns_per_hour_post_treatment,time_point,subgroup_response,shannon_entropy,pielou_evenness,observed_features
0,SRR10505051,1048,36,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,NR,9.0,8.0,post-treatment,Placebo_NR,1.267043,0.366257,11
1,SRR10505052,1048,36,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,NR,9.0,8.0,pre-treatment,Placebo_NR,0.971329,0.217814,22
2,SRR10505053,1045,29,male,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,Res,6.0,0.0,pre-treatment,Placebo_Res,0.312109,0.073473,19
3,SRR10505056,1044,34,male,Indian Subcontinental,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,,4.0,,post-treatment,,2.111094,0.460439,24
4,SRR10505057,1043,35,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,NR,9.0,6.0,post-treatment,FMT_NR,1.607049,0.434286,13


Only one patient in our metadata had a placebo response, and that patient's post-treatment sample was filtered out at the sampling depth of 3000 in the rarefaction step. With that group included, QIIME 2 reports:

State post-treatment is not represented by any members of Placebo_Res group in metadata. Consider using a different group_column or state value.

So we need to remove the rows where subgroup_response is "Placebo_Res", along with the rows where subgroup_response is missing.
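
One way to spot such groups before running the test is to cross-tabulate group against state. The sketch below uses a handful of hypothetical rows modelled on the metadata columns of this notebook; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical rows mimicking the notebook's metadata after rarefaction.
toy = pd.DataFrame({
    "patient_id": [1048, 1048, 1045, 1043, 1043],
    "time_point": ["post-treatment", "pre-treatment", "pre-treatment",
                   "post-treatment", "pre-treatment"],
    "subgroup_response": ["Placebo_NR", "Placebo_NR", "Placebo_Res", "FMT_NR", "FMT_NR"],
})

# Count samples per group and state; a zero marks a state with no members.
coverage = pd.crosstab(toy["subgroup_response"], toy["time_point"])
print(coverage)

# Groups lacking at least one state cannot be compared pairwise.
incomplete = coverage[(coverage == 0).any(axis=1)].index.tolist()
print(incomplete)

# Keep only rows from groups in which every state is represented.
filtered = toy[~toy["subgroup_response"].isin(incomplete)]
print(filtered)
```

Deriving the list of incomplete groups from the data avoids hard-coding a group name, which helps if the sampling depth changes and a different set of samples is dropped.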

In [27]:
metadata_filtered = metadata[metadata.subgroup_response.notna()]
metadata_filtered = metadata_filtered[metadata_filtered.subgroup_response != "Placebo_Res"]
metadata_filtered.to_csv(f'{data_dir}/pundemic_metadata_subgroup_response.tsv', sep='\t', index=False)
metadata_filtered

Unnamed: 0,id,patient_id,age,sex,ethnicity,continent,country,region,city,group,disease_subgroup,blinded_clinical_response,puns_per_hour_pre_treatment,puns_per_hour_post_treatment,time_point,subgroup_response,shannon_entropy,pielou_evenness,observed_features
0,SRR10505051,1048,36,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,NR,9.0,8.0,post-treatment,Placebo_NR,1.267043,0.366257,11
1,SRR10505052,1048,36,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,NR,9.0,8.0,pre-treatment,Placebo_NR,0.971329,0.217814,22
4,SRR10505057,1043,35,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,NR,9.0,6.0,post-treatment,FMT_NR,1.607049,0.434286,13
5,SRR10505058,1043,35,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,NR,9.0,6.0,pre-treatment,FMT_NR,1.976768,0.425674,25
7,SRR10505060,1042,40,male,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,NR,9.0,8.0,pre-treatment,Placebo_NR,1.008220,0.241784,18
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
77,SRR10505141,1001,57,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,Res,7.0,2.0,pre-treatment,FMT_Res,3.011957,0.685733,21
78,SRR10505142,1001,57,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,Res,7.0,2.0,post-treatment,FMT_Res,2.499077,0.578232,20
86,SRR10505153,2225,34,male,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,NR,6.0,5.0,pre-treatment,FMT_NR,0.198583,0.070737,7
87,SRR10505154,1024,Unknown,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,Res,8.0,0.0,post-treatment,FMT_Res,2.079506,0.466316,22


### Shannon Entropy

In [31]:
! qiime longitudinal pairwise-differences \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response.tsv \
  --p-metric shannon_entropy \
  --p-group-column subgroup_response \
  --p-state-column time_point \
  --p-state-1 pre-treatment \
  --p-state-2 post-treatment \
  --p-individual-id-column patient_id \
  --p-replicate-handling random \
  --o-visualization $data_dir_div/shannon-pairwise-differences.qzv

Saved Visualization to: data/alpha_diversity/shannon-pairwise-differences.qzv

In [32]:
Visualization.load(f"{data_dir_div}/shannon-pairwise-differences.qzv")

### Pielou Evenness

In [33]:
! qiime longitudinal pairwise-differences \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response.tsv \
  --p-metric pielou_evenness \
  --p-group-column subgroup_response \
  --p-state-column time_point \
  --p-state-1 pre-treatment \
  --p-state-2 post-treatment \
  --p-individual-id-column patient_id \
  --p-replicate-handling random \
  --o-visualization $data_dir_div/evenness-pairwise-differences.qzv

Saved Visualization to: data/alpha_diversity/evenness-pairwise-differences.qzv

In [35]:
Visualization.load(f"{data_dir_div}/evenness-pairwise-differences.qzv")

### Observed Features

In [36]:
! qiime longitudinal pairwise-differences \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response.tsv \
  --p-metric observed_features \
  --p-group-column subgroup_response \
  --p-state-column time_point \
  --p-state-1 pre-treatment \
  --p-state-2 post-treatment \
  --p-individual-id-column patient_id \
  --p-replicate-handling random \
  --o-visualization $data_dir_div/observed_features-pairwise-differences.qzv

Saved Visualization to: data/alpha_diversity/observed_features-pairwise-differences.qzv

In [37]:
Visualization.load(f"{data_dir_div}/observed_features-pairwise-differences.qzv")

The "Pairwise group comparison tests" show no significant differences between the groups. The pairwise difference tests lead to the same conclusion: there are no significant differences between the pre- and post-treatment samples.

## Diversity differences between categorical metadata 
The SampleData[AlphaDiversity] artifacts produced from the rarefied table in the step above contain univariate, continuous values and can be tested using common non-parametric statistical tests (e.g., the Kruskal-Wallis test) with the following command:

In [56]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity $data_dir_div/core_metrics_results/shannon_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response_all.tsv \
  --o-visualization $data_dir_div/core_metrics_results/shannon_vector.qzv

Saved Visualization to: data/alpha_diversity/core_metrics_results/shannon_vector.qzv

In [57]:
Visualization.load(f"{data_dir_div}/core_metrics_results/shannon_vector.qzv")

In [58]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity $data_dir_div/core_metrics_results/evenness_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response_all.tsv \
  --o-visualization $data_dir_div/core_metrics_results/evenness_vector.qzv

Saved Visualization to: data/alpha_diversity/core_metrics_results/evenness_vector.qzv

In [59]:
Visualization.load(f"{data_dir_div}/core_metrics_results/evenness_vector.qzv")

In [60]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity $data_dir_div/core_metrics_results/observed_features_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response_all.tsv \
  --o-visualization $data_dir_div/core_metrics_results/observed_features_vector.qzv

Saved Visualization to: data/alpha_diversity/core_metrics_results/observed_features_vector.qzv

In [62]:
Visualization.load(f"{data_dir_div}/core_metrics_results/observed_features_vector.qzv")
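
For readers who want to see what a Kruskal-Wallis comparison of alpha diversity across groups looks like outside QIIME 2, here is a minimal sketch with `scipy.stats.kruskal`. The group labels and values are hypothetical and are not results from this study.

```python
import pandas as pd
from scipy.stats import kruskal

# Hypothetical alpha diversity values for three groups (not data from this study).
toy = pd.DataFrame({
    "group": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "shannon_entropy": [1.1, 1.3, 0.9, 1.2, 2.0, 2.4, 1.9, 2.2, 1.0, 2.1, 1.5, 1.8],
})

# One array of values per group, the form expected by scipy.stats.kruskal.
samples = [values.to_numpy() for _, values in toy.groupby("group")["shannon_entropy"]]

# Rank-based test of the null hypothesis that all groups share the same median.
h_statistic, p_value = kruskal(*samples)
print(f"H={h_statistic:.3f}, p={p_value:.3f}")
```

Because the test works on ranks, it makes no normality assumption, but it does assume that every value comes from an independent sample.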

One important confounding factor here is that we are analyzing samples from all time points simultaneously and may thereby lose meaningful signals at a particular time point. Importantly, having more than one time point per subject also violates the assumption of the Kruskal-Wallis test that all samples are independent. More appropriate methods that account for repeated measurements from the same subjects are demonstrated in the longitudinal pairwise difference analysis section above.
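
The independence problem can be checked directly in the metadata. The sketch below, using hypothetical rows with this notebook's column names, counts samples per patient and shows that restricting to one time point leaves at most one sample per patient. This is one possible way to satisfy the independence assumption, not a step taken in this notebook.

```python
import pandas as pd

# Hypothetical rows using the notebook's metadata column names.
toy = pd.DataFrame({
    "id": ["S1", "S2", "S3", "S4", "S5"],
    "patient_id": [1048, 1048, 1045, 1043, 1043],
    "time_point": ["post-treatment", "pre-treatment", "pre-treatment",
                   "post-treatment", "pre-treatment"],
})

# Samples contributed by each patient; counts above 1 indicate repeated measures.
samples_per_patient = toy["patient_id"].value_counts()
repeated = sorted(samples_per_patient[samples_per_patient > 1].index.tolist())
print(repeated)

# Restricting to a single time point leaves at most one sample per patient.
pre_only = toy[toy["time_point"] == "pre-treatment"]
print(pre_only["patient_id"].is_unique)
```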

## Alpha Correlation of numeric data (puns_per_hour)

In [42]:
! qiime diversity alpha-correlation \
  --i-alpha-diversity $data_dir_div/core_metrics_results/shannon_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response_all.tsv \
  --o-visualization $data_dir_div/shannon_alpha_correlation.qzv

Saved Visualization to: data/alpha_diversity/shannon_alpha_correlation.qzv

In [43]:
Visualization.load(f"{data_dir_div}/shannon_alpha_correlation.qzv")

In [44]:
! qiime diversity alpha-correlation \
  --i-alpha-diversity $data_dir_div/core_metrics_results/evenness_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response_all.tsv \
  --o-visualization $data_dir_div/evenness_alpha_correlation.qzv

Saved Visualization to: data/alpha_diversity/evenness_alpha_correlation.qzv

In [45]:
Visualization.load(f"{data_dir_div}/evenness_alpha_correlation.qzv")

In [46]:
! qiime diversity alpha-correlation \
  --i-alpha-diversity $data_dir_div/core_metrics_results/observed_features_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response_all.tsv \
  --o-visualization $data_dir_div/observed_features_alpha_correlation.qzv

Saved Visualization to: data/alpha_diversity/observed_features_alpha_correlation.qzv

In [47]:
Visualization.load(f"{data_dir_div}/observed_features_alpha_correlation.qzv")