# Alpha Diversity Analysis

We will first test for associations between our categorical metadata columns and alpha diversity. Alpha diversity asks about the distribution of features within each sample, and once calculated for all samples can be used to test whether the per‐sample diversity differs across different conditions (e.g., samples obtained at different ages). The comparison makes no assumptions about the features that are shared between samples; two samples can have the same alpha diversity and not share any features.
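As a minimal illustration of the last point, the sketch below computes Shannon entropy for two samples that share no features. The feature IDs and counts are invented for illustration and do not come from our dataset; base-2 logarithms are an assumption.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (base 2) of a collection of feature counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Hypothetical samples with disjoint feature sets (IDs and counts are made up)
sample_a = {"feat_1": 50, "feat_2": 30, "feat_3": 20}
sample_b = {"feat_7": 20, "feat_8": 50, "feat_9": 30}

shared = set(sample_a) & set(sample_b)
h_a = shannon_entropy(sample_a.values())
h_b = shannon_entropy(sample_b.values())

print("shared features:", shared)
print("Shannon entropy A:", h_a)
print("Shannon entropy B:", h_b)
```

Both samples have the same abundance profile and therefore the same entropy, even though the set of shared features is empty.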

In [1]:
# Import libraries
import os
import pandas as pd
import qiime2 as q2
from qiime2 import Visualization
import numpy as np

%matplotlib inline

In [2]:
data_dir = 'data'
data_dir_phyl = 'data/phylogeny'  # used when attempting phylogenetic-based analysis
data_dir_div = 'data/alpha_diversity'

## Alpha Rarefaction

To perform rarefaction, we first need to decide which sampling depth is best suited for our dataset. For this, we will analyse how sampling depth impacts within-sample diversity estimates (= alpha diversity) with the alpha-rarefaction action. This action generates interactive alpha rarefaction curves for sequencing depths between min_depth and max_depth and computes 10 (default) rarefied tables with corresponding alpha diversity metrics at each sampling depth step.
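To make the idea concrete, here is a rough sketch of what rarefying a single sample at several depths looks like. It is not the QIIME 2 implementation; the `rarefy` helper, the feature IDs, and the counts are hypothetical, and only the number of observed features is tracked.

```python
import random
from collections import Counter

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a {feature: count} dict.

    Returns None when the sample has fewer than `depth` reads, mirroring the
    fact that such samples are dropped at that sampling depth.
    """
    reads = [feat for feat, n in counts.items() for _ in range(n)]
    if len(reads) < depth:
        return None
    return Counter(rng.sample(reads, depth))

rng = random.Random(42)
# Hypothetical sample with 1000 reads spread over four features
sample = {"feat_1": 600, "feat_2": 300, "feat_3": 90, "feat_4": 10}

for depth in (10, 100, 1000):
    # Average the observed-feature count over 10 independent rarefied draws
    richness = [len(rarefy(sample, depth, rng)) for _ in range(10)]
    print(depth, sum(richness) / len(richness))
```

Plotting the averaged metric against depth gives one rarefaction curve per sample; a suitable sampling depth is one where the curves have flattened while still retaining most samples.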

In [3]:
# Generates alpha rarefaction curves
! qiime diversity alpha-rarefaction \
    --i-table $data_dir/closed_reference_cluster/table-filtered.qza \
    --p-max-depth 10000 \
    --m-metadata-file $data_dir/pundemic_metadata.tsv \
    --o-visualization $data_dir_div/alpha_rarefaction.qzv

#   --i-phylogeny $data_dir_phyl/fasttree_tree_rooted.qza \

Saved Visualization to: data/alpha_diversity/alpha_rarefaction.qzv

In [4]:
Visualization.load(f"{data_dir_div}/alpha_rarefaction.qzv")

## Diversity Analysis
The core-metrics action applies a collection of non-phylogenetic diversity metrics to a feature table. For alpha diversity, three metrics are important:
- Shannon entropy: a quantitative measure of community diversity that accounts for both richness and evenness
- Pielou evenness: a measure of community evenness
- Observed features: a qualitative measure of community richness (called "observed OTUs" in older QIIME 2 releases)
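The three metrics are related: Pielou evenness is Shannon entropy divided by its maximum possible value, the logarithm of the number of observed features. The sketch below checks this against the first two rows of the merged metrics table shown later in this notebook. The base-2 logarithm is an assumption about how the Shannon values were computed, and `pielou_evenness` is a helper written for this illustration.

```python
import math

def pielou_evenness(shannon_entropy, observed_features):
    """Pielou evenness: Shannon entropy divided by log2 of the feature count."""
    return shannon_entropy / math.log2(observed_features)

# (sample id, shannon_entropy, observed_features, reported pielou_evenness),
# copied from the first two rows of the merged metrics table in this notebook
rows = [
    ("SRR10505051", 1.317129, 10, 0.396495),
    ("SRR10505052", 0.919479, 17, 0.224951),
]

for sample_id, shannon, n_features, reported in rows:
    computed = pielou_evenness(shannon, n_features)
    print(sample_id, round(computed, 6), reported)
```

The recomputed values agree with the reported ones up to rounding, which is consistent with the base-2 assumption.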

In [5]:
! qiime diversity core-metrics \
  --i-table $data_dir/closed_reference_cluster/table-filtered.qza \
  --m-metadata-file $data_dir/pundemic_metadata.tsv \
  --p-sampling-depth 3000 \
  --p-n-jobs 8 \
  --output-dir $data_dir_div/core_metrics_results

Saved FeatureTable[Frequency] to: data/alpha_diversity/core_metrics_results/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: data/alpha_diversity/core_metrics_results/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: data/alpha_diversity/core_metrics_results/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: data/alpha_diversity/core_metrics_results/evenness_vector.qza
Saved DistanceMatrix to: data/alpha_diversity/core_metrics_results/jaccard_distance_matrix.qza
Saved DistanceMatrix to: data/alpha_diversity/core_metrics_results/bray_curtis_distance_matrix.qza
Saved PCoAResults to: data/alpha_diversity/core_metrics_results/jaccard_pcoa_results.qza
Saved PCoAResults to: data/alpha_diversity/core_metrics_results/bray_curtis_pcoa_results.qza
Saved Visualization to: data/alpha_diversity/core_metrics_results/jaccard_emperor.qzv
Saved Visualization to: data/alpha_diversity/core_me

In [6]:
metadata = pd.read_csv(f"{data_dir}/pundemic_metadata.tsv", sep='\t')

## Pairwise difference comparisons between time points
When microbial data are collected at different time points, it is useful to examine dynamic changes in the microbial communities (longitudinal analysis). The pairwise difference test determines whether the value of a specific metric changed significantly between paired samples from the same individual (e.g., pre- and post-treatment).
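The core of such a paired comparison can be sketched with pandas: reshape the table so that each individual has one row and one column per state, then take the difference. The values below are invented, only the column names mirror our metadata, and the sign convention (post minus pre) is an assumption rather than a statement about the QIIME 2 internals.

```python
import pandas as pd

# Hypothetical long-format table: one row per sample, two samples per patient
df = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 3, 3],
    "time_point": ["pre-treatment", "post-treatment"] * 3,
    "shannon_entropy": [1.0, 1.5, 2.0, 1.8, 0.5, 0.9],
})

# One row per patient, one column per state
wide = df.pivot(index="patient_id", columns="time_point", values="shannon_entropy")

# Per-patient change between the two states (state 2 minus state 1)
wide["difference"] = wide["post-treatment"] - wide["pre-treatment"]
print(wide)
```

A paired test (for example a Wilcoxon signed-rank test) could then be applied to the `difference` column; patients missing one of the two states end up with a missing difference and would have to be excluded.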

In [7]:
metadata['subgroup_response'] = metadata['disease_subgroup'] + "_" + metadata['blinded_clinical_response']

# Replace 'Unknown' with NaN so that it does not appear as a separate box in the plots
metadata.age = metadata.age.replace('Unknown', np.nan)
metadata.sex = metadata.sex.replace('Unknown', np.nan)

# Differentiate healed and healthy and check for significance between healthy, healed, disease
metadata['disease_status'] = metadata['group']
metadata.loc[(metadata.time_point == 'post-treatment') & (metadata.blinded_clinical_response == 'Res'), 'disease_status'] = 'Healed'

metadata.to_csv(f'{data_dir}/pundemic_metadata_subgroup_response_all.tsv', sep='\t', index=False)

In [8]:
! qiime tools export \
  --input-path $data_dir_div/core_metrics_results/shannon_vector.qza \
  --output-path $data_dir_div/shannon

Exported data/alpha_diversity/core_metrics_results/shannon_vector.qza as AlphaDiversityDirectoryFormat to directory data/alpha_diversity/shannon

In [9]:
! qiime tools export \
  --input-path $data_dir_div/core_metrics_results/evenness_vector.qza \
  --output-path $data_dir_div/evenness

Exported data/alpha_diversity/core_metrics_results/evenness_vector.qza as AlphaDiversityDirectoryFormat to directory data/alpha_diversity/evenness

In [10]:
! qiime tools export \
  --input-path $data_dir_div/core_metrics_results/observed_features_vector.qza \
  --output-path $data_dir_div/observed_features

Exported data/alpha_diversity/core_metrics_results/observed_features_vector.qza as AlphaDiversityDirectoryFormat to directory data/alpha_diversity/observed_features

In [11]:
shannon = pd.read_csv(f"{data_dir_div}/shannon/alpha-diversity.tsv", sep='\t')
evenness = pd.read_csv(f"{data_dir_div}/evenness/alpha-diversity.tsv", sep='\t')
observed_features = pd.read_csv(f"{data_dir_div}/observed_features/alpha-diversity.tsv", sep='\t')

In [12]:
shannon.rename(columns = {shannon.columns[0]: "id"}, inplace = True)
evenness.rename(columns = {evenness.columns[0]: "id"}, inplace = True)
observed_features.rename(columns = {observed_features.columns[0]: "id"}, inplace = True)

In [13]:
metrics = pd.merge(shannon, evenness, on = "id")
metrics = pd.merge(metrics, observed_features, on = "id")
metrics.head()

Unnamed: 0,id,shannon_entropy,pielou_evenness,observed_features
0,SRR10505051,1.317129,0.396495,10
1,SRR10505052,0.919479,0.224951,17
2,SRR10505053,0.231316,0.06251,13
3,SRR10505056,1.89284,0.453927,18
4,SRR10505057,1.562509,0.492917,9


In [14]:
metadata = pd.merge(metadata, metrics, on = "id")
metadata.head()

Unnamed: 0,id,patient_id,age,sex,ethnicity,continent,country,region,city,group,disease_subgroup,blinded_clinical_response,puns_per_hour_pre_treatment,puns_per_hour_post_treatment,time_point,subgroup_response,disease_status,shannon_entropy,pielou_evenness,observed_features
0,SRR10505051,1048,36,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,NR,9.0,8.0,post-treatment,Placebo_NR,Puns,1.317129,0.396495,10
1,SRR10505052,1048,36,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,NR,9.0,8.0,pre-treatment,Placebo_NR,Puns,0.919479,0.224951,17
2,SRR10505053,1045,29,male,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,Res,6.0,0.0,pre-treatment,Placebo_Res,Puns,0.231316,0.06251,13
3,SRR10505056,1044,34,male,Indian Subcontinental,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,,4.0,,post-treatment,,Puns,1.89284,0.453927,18
4,SRR10505057,1043,35,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,NR,9.0,6.0,post-treatment,FMT_NR,Puns,1.562509,0.492917,9


In [16]:
# Account for sex bias
metadata['shannon_sex_bias'] = metadata['shannon_entropy']

mean_female = np.mean(metadata.loc[metadata.sex == 'female', 'shannon_sex_bias'])
std_female = np.std(metadata.loc[metadata.sex == 'female', 'shannon_sex_bias'])

mean_male = np.mean(metadata.loc[metadata.sex == 'male', 'shannon_sex_bias'])
std_male = np.std(metadata.loc[metadata.sex == 'male', 'shannon_sex_bias'])

# Normalization: z-scores computed separately for males and females
metadata.loc[metadata.sex == 'female', 'shannon_sex_bias'] = (metadata.loc[metadata.sex == 'female', 'shannon_sex_bias'] - mean_female)/std_female
metadata.loc[metadata.sex == 'male', 'shannon_sex_bias'] = (metadata.loc[metadata.sex == 'male', 'shannon_sex_bias'] - mean_male)/std_male

metadata.head()

Unnamed: 0,id,patient_id,age,sex,ethnicity,continent,country,region,city,group,...,blinded_clinical_response,puns_per_hour_pre_treatment,puns_per_hour_post_treatment,time_point,subgroup_response,disease_status,shannon_entropy,pielou_evenness,observed_features,shannon_sex_bias
0,SRR10505051,1048,36,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,...,NR,9.0,8.0,post-treatment,Placebo_NR,Puns,1.317129,0.396495,10,0.012471
1,SRR10505052,1048,36,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,...,NR,9.0,8.0,pre-treatment,Placebo_NR,Puns,0.919479,0.224951,17,-0.55128
2,SRR10505053,1045,29,male,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,...,Res,6.0,0.0,pre-treatment,Placebo_Res,Puns,0.231316,0.06251,13,-1.31191
3,SRR10505056,1044,34,male,Indian Subcontinental,Europe,Switzerland,Zurich,Zurich,Puns,...,,4.0,,post-treatment,,Puns,1.89284,0.453927,18,1.183356
4,SRR10505057,1043,35,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,...,NR,9.0,6.0,post-treatment,FMT_NR,Puns,1.562509,0.492917,9,0.360348
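The same per-sex z-scoring can be written more compactly with a pandas `groupby`. This is only a sketch on invented values; the column name `shannon_sex_z` is hypothetical, and `ddof=0` is chosen to match the default of `np.std` used above.

```python
import pandas as pd

# Hypothetical values; only the column names `sex` and `shannon_entropy`
# mirror the notebook
df = pd.DataFrame({
    "sex": ["female", "female", "female", "male", "male"],
    "shannon_entropy": [1.0, 2.0, 3.0, 0.5, 1.5],
})

grouped = df.groupby("sex")["shannon_entropy"]
group_mean = grouped.transform("mean")
# Population standard deviation (ddof=0), as np.std computes by default
group_std = grouped.transform(lambda s: s.std(ddof=0))

df["shannon_sex_z"] = (df["shannon_entropy"] - group_mean) / group_std
print(df)
```

One thing to keep in mind with the cell above: rows whose sex is missing are not touched by either assignment, so they keep their raw Shannon value in `shannon_sex_bias` and are not on the same scale as the z-scored rows.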


#### Note

Only one patient in our metadata had a placebo response, and that patient's post-treatment sample was filtered out by the sampling-depth cutoff of 3000 in the rarefaction step. The pairwise test therefore reports:

The state "post-treatment" is not represented by any members of Placebo_Res group in metadata.

We therefore keep only the rows where subgroup_response != "Placebo_Res".
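A quick way to spot such groups before running the test is to cross-tabulate group against state and look for zero counts. The sketch below uses an invented table with the same column names as our metadata.

```python
import pandas as pd

# Hypothetical metadata excerpt; only the column names mirror the notebook
df = pd.DataFrame({
    "subgroup_response": ["FMT_NR", "FMT_NR", "Placebo_Res",
                          "Placebo_NR", "Placebo_NR"],
    "time_point": ["pre-treatment", "post-treatment", "pre-treatment",
                   "pre-treatment", "post-treatment"],
})

# Number of samples per (group, state) combination
counts = pd.crosstab(df["subgroup_response"], df["time_point"])
print(counts)

# Groups in which at least one state has no samples
incomplete = counts[(counts == 0).any(axis=1)].index.tolist()
print(incomplete)
```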

In [17]:
metadata_filtered = metadata[metadata.subgroup_response.notna()]
metadata_filtered = metadata_filtered[metadata_filtered.subgroup_response != "Placebo_Res"]
metadata_filtered.to_csv(f'{data_dir}/pundemic_metadata_subgroup_response.tsv', sep='\t', index=False)
metadata_filtered

Unnamed: 0,id,patient_id,age,sex,ethnicity,continent,country,region,city,group,...,blinded_clinical_response,puns_per_hour_pre_treatment,puns_per_hour_post_treatment,time_point,subgroup_response,disease_status,shannon_entropy,pielou_evenness,observed_features,shannon_sex_bias
0,SRR10505051,1048,36.0,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,...,NR,9.0,8.0,post-treatment,Placebo_NR,Puns,1.317129,0.396495,10,0.012471
1,SRR10505052,1048,36.0,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,...,NR,9.0,8.0,pre-treatment,Placebo_NR,Puns,0.919479,0.224951,17,-0.55128
4,SRR10505057,1043,35.0,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,...,NR,9.0,6.0,post-treatment,FMT_NR,Puns,1.562509,0.492917,9,0.360348
5,SRR10505058,1043,35.0,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,...,NR,9.0,6.0,pre-treatment,FMT_NR,Puns,1.983531,0.444795,22,0.957233
7,SRR10505061,2212,27.0,male,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,...,NR,8.0,4.0,pre-treatment,FMT_NR,Puns,1.050672,0.374257,7,-0.081407
8,SRR10505062,1041,39.0,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,...,Res,6.0,0.0,post-treatment,FMT_Res,Healed,1.285828,0.387073,10,-0.031904
9,SRR10505063,1041,39.0,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,...,Res,6.0,0.0,pre-treatment,FMT_Res,Puns,1.778664,0.3932,23,0.666792
10,SRR10505064,1040,52.0,male,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,...,NR,7.0,7.0,pre-treatment,Placebo_NR,Puns,0.496077,0.213649,5,-0.914293
11,SRR10505065,1040,52.0,male,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,...,NR,7.0,7.0,post-treatment,Placebo_NR,Puns,2.124552,0.441938,28,1.531339
12,SRR10505066,1038,35.0,male,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,...,NR,6.0,2.0,pre-treatment,FMT_NR,Puns,1.38149,0.314524,21,0.415413


### Shannon Entropy

In [18]:
# Compares Shannon entropy differences for pre and post
! qiime longitudinal pairwise-differences \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response.tsv \
  --p-metric shannon_entropy \
  --p-group-column subgroup_response \
  --p-state-column time_point \
  --p-state-1 pre-treatment \
  --p-state-2 post-treatment \
  --p-individual-id-column patient_id \
  --p-replicate-handling random \
  --o-visualization $data_dir_div/shannon-pairwise-differences.qzv

Saved Visualization to: data/alpha_diversity/shannon-pairwise-differences.qzv

In [19]:
Visualization.load(f"{data_dir_div}/shannon-pairwise-differences.qzv")

In [20]:
# Account for sex bias
! qiime longitudinal pairwise-differences \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response.tsv \
  --p-metric shannon_sex_bias \
  --p-group-column subgroup_response \
  --p-state-column time_point \
  --p-state-1 pre-treatment \
  --p-state-2 post-treatment \
  --p-individual-id-column patient_id \
  --p-replicate-handling random \
  --o-visualization $data_dir_div/shannon-sex_bias-pairwise-differences.qzv

Saved Visualization to: data/alpha_diversity/shannon-sex_bias-pairwise-differences.qzv

In [21]:
Visualization.load(f"{data_dir_div}/shannon-sex_bias-pairwise-differences.qzv")

### Pielou Evenness

In [22]:
! qiime longitudinal pairwise-differences \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response.tsv \
  --p-metric pielou_evenness \
  --p-group-column subgroup_response \
  --p-state-column time_point \
  --p-state-1 pre-treatment \
  --p-state-2 post-treatment \
  --p-individual-id-column patient_id \
  --p-replicate-handling random \
  --o-visualization $data_dir_div/evenness-pairwise-differences.qzv

Saved Visualization to: data/alpha_diversity/evenness-pairwise-differences.qzv

In [23]:
Visualization.load(f"{data_dir_div}/evenness-pairwise-differences.qzv")

### Observed Features

In [24]:
! qiime longitudinal pairwise-differences \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response.tsv \
  --p-metric observed_features \
  --p-group-column subgroup_response \
  --p-state-column time_point \
  --p-state-1 pre-treatment \
  --p-state-2 post-treatment \
  --p-individual-id-column patient_id \
  --p-replicate-handling random \
  --o-visualization $data_dir_div/observed_features-pairwise-differences.qzv

Saved Visualization to: data/alpha_diversity/observed_features-pairwise-differences.qzv

In [25]:
Visualization.load(f"{data_dir_div}/observed_features-pairwise-differences.qzv")

#### Preliminary result:

There are no significant differences between the groups ("Pairwise group comparison tests"). The pairwise difference tests lead to the same conclusion: there are no significant differences between the pre- and post-treatment samples.

## Diversity differences between categorical metadata
The SampleData[AlphaDiversity] artifacts produced in the step above (computed from the rarefied table) contain univariate, continuous values and can be tested with common non-parametric statistical tests (e.g., the Kruskal-Wallis test).
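For intuition, the Kruskal-Wallis H statistic can be computed by hand: pool all values, rank them (ties get the average rank), and compare the rank sums of the groups. The sketch below is a plain-Python illustration on invented Shannon values, not the code QIIME 2 runs; in practice `scipy.stats.kruskal` would be the usual choice and also returns a p-value.

```python
from collections import Counter

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction.

    Not defined when every pooled value is identical (division by zero).
    """
    values = sorted(v for g in groups for v in g)
    n_total = len(values)

    # Average 1-based rank for each distinct value
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    # Correction factor for tied values
    ties = Counter(values)
    correction = 1 - sum(t**3 - t for t in ties.values()) / (n_total**3 - n_total)
    return h / correction

# Hypothetical Shannon entropy values for three groups
healthy = [2.1, 2.4, 2.6, 2.2]
healed = [1.9, 2.0, 2.3]
puns = [0.9, 1.3, 1.1, 1.5]

print(kruskal_wallis_h(healthy, healed, puns))
```

Under the null hypothesis, H approximately follows a chi-square distribution with (number of groups minus 1) degrees of freedom.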

In [26]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity $data_dir_div/core_metrics_results/shannon_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata.tsv \
  --o-visualization $data_dir_div/core_metrics_results/no_healed_shannon_vector.qzv

Saved Visualization to: data/alpha_diversity/core_metrics_results/no_healed_shannon_vector.qzv

In [27]:
Visualization.load(f"{data_dir_div}/core_metrics_results/no_healed_shannon_vector.qzv")

### Differentiating between Healthy, Healed and Puns

In [28]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity $data_dir_div/core_metrics_results/shannon_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response_all.tsv \
  --o-visualization $data_dir_div/core_metrics_results/shannon_vector.qzv

Saved Visualization to: data/alpha_diversity/core_metrics_results/shannon_vector.qzv

In [29]:
Visualization.load(f"{data_dir_div}/core_metrics_results/shannon_vector.qzv")

In [30]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity $data_dir_div/core_metrics_results/evenness_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response_all.tsv \
  --o-visualization $data_dir_div/core_metrics_results/evenness_vector.qzv

Saved Visualization to: data/alpha_diversity/core_metrics_results/evenness_vector.qzv

In [34]:
Visualization.load(f"{data_dir_div}/core_metrics_results/evenness_vector.qzv")

In [32]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity $data_dir_div/core_metrics_results/observed_features_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response_all.tsv \
  --o-visualization $data_dir_div/core_metrics_results/observed_features_vector.qzv

Saved Visualization to: data/alpha_diversity/core_metrics_results/observed_features_vector.qzv

In [35]:
Visualization.load(f"{data_dir_div}/core_metrics_results/observed_features_vector.qzv")

One important confounding factor here is that we are simultaneously analyzing our samples across all time points and in doing so potentially losing meaningful signals at a particular time point. Importantly, having more than one time point per subject also violates the assumption of the Kruskal-Wallis test that all samples are independent. More appropriate methods that take into account repeated measurements from the same subjects are demonstrated in the longitudinal pairwise-differences section above.

## Alpha Correlation of numeric data (puns_per_hour)

In [36]:
! qiime diversity alpha-correlation \
  --i-alpha-diversity $data_dir_div/core_metrics_results/shannon_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response_all.tsv \
  --o-visualization $data_dir_div/shannon_alpha_correlation.qzv

Saved Visualization to: data/alpha_diversity/shannon_alpha_correlation.qzv

In [39]:
Visualization.load(f"{data_dir_div}/shannon_alpha_correlation.qzv")

In [37]:
! qiime diversity alpha-correlation \
  --i-alpha-diversity $data_dir_div/core_metrics_results/evenness_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response_all.tsv \
  --o-visualization $data_dir_div/evenness_alpha_correlation.qzv

Saved Visualization to: data/alpha_diversity/evenness_alpha_correlation.qzv

In [40]:
Visualization.load(f"{data_dir_div}/evenness_alpha_correlation.qzv")

In [38]:
! qiime diversity alpha-correlation \
  --i-alpha-diversity $data_dir_div/core_metrics_results/observed_features_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response_all.tsv \
  --o-visualization $data_dir_div/observed_features_alpha_correlation.qzv

Saved Visualization to: data/alpha_diversity/observed_features_alpha_correlation.qzv

In [41]:
Visualization.load(f"{data_dir_div}/observed_features_alpha_correlation.qzv")

# Conclusion

- Significant difference between healthy and diseased people, BUT healed and healthy are seen as one group
    - add metadata columns: healed, healthy, diseased and do significance test
    - we saw a significant difference between healthy and puns but not between healed and healthy
    - then pairwise testing: no significant difference

- We noticed a significant difference between sexes. This could lead to a bias in the analysis.
    - new metadata column with Shannon entropy z-normalized by sex
    - then pairwise testing: no significant difference
    