# Alpha Diversity Analysis

## Import Libraries

In [77]:
import os
import pandas as pd
import qiime2 as q2
from qiime2 import Visualization

%matplotlib inline

In [78]:
data_dir = 'data'
data_dir_phyl = 'data/phylogeny'
data_dir_div = 'data/alpha_diversity'

## Alpha Rarefaction

To rarefy, we first need to choose a sampling depth suited to our dataset. To do so, we analyse how sampling depth affects within-sample (alpha) diversity estimates using the alpha-rarefaction action. This action generates interactive alpha rarefaction curves for sampling depths between min_depth and max_depth, and at each depth step it computes 10 rarefied tables (the default number of iterations) with the corresponding alpha diversity metrics.

In [57]:
! qiime diversity alpha-rarefaction \
    --i-table $data_dir/closed_reference_cluster/cr90_feature_table.qza \
    --p-max-depth 10000 \
    --m-metadata-file $data_dir/pundemic_metadata.tsv \
    --o-visualization $data_dir_div/alpha_rarefaction.qzv

#   --i-phylogeny $data_dir_phyl/fasttree_tree_rooted.qza \

^C

Aborted!


In [54]:
Visualization.load(f"{data_dir_div}/alpha_rarefaction.qzv")
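
The idea behind the rarefaction curves can be sketched in plain Python, independent of QIIME 2. This is a minimal illustration, not the plugin's implementation: the feature counts are made up, and `rarefy` and `observed_features` are hypothetical helper names. Reads are drawn without replacement down to a fixed depth, and a sample with fewer reads than the depth is dropped.

```python
import random
from collections import Counter

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a {feature: count} dict.

    Returns None when the sample has fewer reads than `depth`, which mirrors
    a sample being dropped from a rarefied table.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the counts into one entry per read, then draw without replacement.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, depth))

def observed_features(counts):
    """Number of features with at least one read."""
    return sum(1 for n in counts.values() if n > 0)

rng = random.Random(0)
# Made-up sample with 1000 reads spread over five features.
sample = {"otu_a": 500, "otu_b": 300, "otu_c": 150, "otu_d": 45, "otu_e": 5}

for depth in (10, 100, 1000):
    rarefied = rarefy(sample, depth, rng)
    print(depth, observed_features(rarefied))
```

At low depths rare features are often missed, so the curve rises with depth and flattens once most features have been seen. A depth on the flat part of the curve that still retains most samples is usually a reasonable choice.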

## Diversity Analysis

In [79]:
! qiime diversity core-metrics \
  --i-table $data_dir/closed_reference_cluster/cr90_feature_table.qza \
  --m-metadata-file $data_dir/pundemic_metadata.tsv \
  --p-sampling-depth 3000 \
  --p-n-jobs 8 \
  --output-dir $data_dir_div/core_metrics_results

Saved FeatureTable[Frequency] to: data/alpha_diversity/core_metrics_results/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: data/alpha_diversity/core_metrics_results/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: data/alpha_diversity/core_metrics_results/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: data/alpha_diversity/core_metrics_results/evenness_vector.qza
Saved DistanceMatrix to: data/alpha_diversity/core_metrics_results/jaccard_distance_matrix.qza
Saved DistanceMatrix to: data/alpha_diversity/core_metrics_results/bray_curtis_distance_matrix.qza
Saved PCoAResults to: data/alpha_diversity/core_metrics_results/jaccard_pcoa_results.qza
Saved PCoAResults to: data/alpha_diversity/core_metrics_results/bray_curtis_pcoa_results.qza
Saved Visualization to: data/alpha_diversity/core_metrics_results/jaccard_emperor.qzv
Saved Visualization to: data/alpha_diversity/core_me

In [80]:
metadata = pd.read_csv(f"{data_dir}/pundemic_metadata.tsv", sep='\t')

## Pairwise difference comparisons
Pairwise difference tests determine whether the value of a specific metric changed significantly between two states of the same individual, using paired samples (e.g., pre- and post-treatment).

In [81]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity $data_dir_div/core_metrics_results/shannon_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata.tsv \
  --o-visualization $data_dir_div/core_metrics_results/shannon_vector.qzv

Saved Visualization to: data/alpha_diversity/core_metrics_results/shannon_vector.qzv

In [82]:
Visualization.load(f"{data_dir_div}/core_metrics_results/shannon_vector.qzv")
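
For reference, the Shannon entropy stored in the vector above can be sketched as follows. This is a minimal illustration that assumes base-2 logarithms; the counts are made up, and `shannon_entropy` is a hypothetical helper rather than the QIIME 2 implementation.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (base 2) of a list of per-feature read counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total  # relative abundance of this feature
            entropy -= p * math.log2(p)
    return entropy

# Four equally abundant features: entropy is log2(4).
print(shannon_entropy([10, 10, 10, 10]))
# One dominant feature lowers the entropy even though richness is unchanged.
print(shannon_entropy([37, 1, 1, 1]))
```

The metric grows with both the number of features and the evenness of their abundances, which is why it is reported alongside richness and evenness vectors in the core metrics output.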

In [83]:
! qiime tools export \
  --input-path $data_dir_div/core_metrics_results/shannon_vector.qza \
  --output-path $data_dir_div/shannon_vector

Exported data/alpha_diversity/core_metrics_results/shannon_vector.qza as AlphaDiversityDirectoryFormat to directory data/alpha_diversity/shannon_vector

In [84]:
shannon = pd.read_csv(f"{data_dir_div}/shannon_vector/alpha-diversity.tsv", sep='\t')

In [85]:
shannon.rename(columns = {shannon.columns[0]: "id"}, inplace = True)

In [86]:
metadata = pd.merge(metadata, shannon, on = "id")
metadata.head()

Unnamed: 0,id,patient_id,age,sex,ethnicity,continent,country,region,city,group,disease_subgroup,blinded_clinical_response,puns_per_hour_pre_treatment,puns_per_hour_post_treatment,time_point,shannon_entropy
0,SRR10505051,1048,36,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,NR,9.0,8.0,post-treatment,1.267043
1,SRR10505052,1048,36,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,NR,9.0,8.0,pre-treatment,0.971329
2,SRR10505053,1045,29,male,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,Res,6.0,0.0,pre-treatment,0.312109
3,SRR10505056,1044,34,male,Indian Subcontinental,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,,4.0,,post-treatment,2.111094
4,SRR10505057,1043,35,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,NR,9.0,6.0,post-treatment,1.607049


In [87]:
metadata['subgroup_response'] = metadata['disease_subgroup'] + "_" + metadata['blinded_clinical_response']
metadata[metadata.patient_id == "1045"]

Unnamed: 0,id,patient_id,age,sex,ethnicity,continent,country,region,city,group,disease_subgroup,blinded_clinical_response,puns_per_hour_pre_treatment,puns_per_hour_post_treatment,time_point,shannon_entropy,subgroup_response
2,SRR10505053,1045,29,male,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,Res,6.0,0.0,pre-treatment,0.312109,Placebo_Res


Only one patient in our metadata responded to placebo, and that patient's post-treatment sample was dropped during rarefaction at the sampling depth of 3000. With the full metadata, pairwise-differences therefore reports:

State post-treatment is not represented by any members of Placebo_Res group in metadata. Consider using a different group_column or state value.

So we need to remove the Placebo_Res rows, i.e. keep only rows where subgroup_response != "Placebo_Res".
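
The condition behind this error can be checked before calling QIIME 2. The sketch below is plain Python over a few rows taken from the table above; `groups_missing_state` is a hypothetical helper name, and the check is an assumption about what the message means (every group must contain every state), not the plugin's own code.

```python
from collections import defaultdict

def groups_missing_state(rows, group_key, state_key, states):
    """Map each group to the states it lacks; groups with all states are omitted."""
    seen = defaultdict(set)
    for row in rows:
        group = row.get(group_key)
        if group is None:  # rows without a group label are ignored
            continue
        seen[group].add(row[state_key])
    missing = {}
    for group, present in seen.items():
        absent = sorted(set(states) - present)
        if absent:
            missing[group] = absent
    return missing

# Rows copied from the merged metadata shown above.
rows = [
    {"id": "SRR10505051", "subgroup_response": "Placebo_NR", "time_point": "post-treatment"},
    {"id": "SRR10505052", "subgroup_response": "Placebo_NR", "time_point": "pre-treatment"},
    {"id": "SRR10505053", "subgroup_response": "Placebo_Res", "time_point": "pre-treatment"},
]

print(groups_missing_state(rows, "subgroup_response", "time_point",
                           ["pre-treatment", "post-treatment"]))
# {'Placebo_Res': ['post-treatment']}
```

Any group listed in the result needs to be removed, or the grouping column changed, before the paired comparison can run.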

In [89]:
metadata_filtered = metadata[metadata.subgroup_response.notna()]
metadata_filtered = metadata_filtered[metadata_filtered.subgroup_response != "Placebo_Res"]
metadata_filtered.to_csv(f'{data_dir}/pundemic_metadata_subgroup_response.tsv', sep='\t', index=False)
metadata_filtered

Unnamed: 0,id,patient_id,age,sex,ethnicity,continent,country,region,city,group,disease_subgroup,blinded_clinical_response,puns_per_hour_pre_treatment,puns_per_hour_post_treatment,time_point,shannon_entropy,subgroup_response
0,SRR10505051,1048,36,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,NR,9.0,8.0,post-treatment,1.267043,Placebo_NR
1,SRR10505052,1048,36,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,NR,9.0,8.0,pre-treatment,0.971329,Placebo_NR
4,SRR10505057,1043,35,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,NR,9.0,6.0,post-treatment,1.607049,FMT_NR
5,SRR10505058,1043,35,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,NR,9.0,6.0,pre-treatment,1.976768,FMT_NR
7,SRR10505060,1042,40,male,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,Placebo,NR,9.0,8.0,pre-treatment,1.008220,Placebo_NR
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
77,SRR10505141,1001,57,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,Res,7.0,2.0,pre-treatment,3.011957,FMT_Res
78,SRR10505142,1001,57,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,Res,7.0,2.0,post-treatment,2.499077,FMT_Res
86,SRR10505153,2225,34,male,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,NR,6.0,5.0,pre-treatment,0.198583,FMT_NR
87,SRR10505154,1024,Unknown,female,Caucasian,Europe,Switzerland,Zurich,Zurich,Puns,FMT,Res,8.0,0.0,post-treatment,2.079506,FMT_Res


In [None]:
! qiime longitudinal pairwise-differences --help

In [90]:
! qiime longitudinal pairwise-differences \
  --m-metadata-file $data_dir/pundemic_metadata_subgroup_response.tsv \
  --p-metric shannon_entropy \
  --p-group-column subgroup_response \
  --p-state-column time_point \
  --p-state-1 pre-treatment \
  --p-state-2 post-treatment \
  --p-individual-id-column patient_id \
  --p-replicate-handling random \
  --o-visualization $data_dir_div/pairwise-differences.qzv

Saved Visualization to: data/alpha_diversity/pairwise-differences.qzv

In [91]:
Visualization.load(f"{data_dir_div}/pairwise-differences.qzv")
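
The quantity being tested above is the per-patient change in Shannon entropy between the two states. A minimal sketch of that pairing step, using values from the merged metadata table, is given below. It assumes the difference is defined as state 2 minus state 1, and `paired_differences` is a hypothetical helper; the significance tests themselves are left to QIIME 2.

```python
def paired_differences(rows, id_key, state_key, metric_key, state_1, state_2):
    """Return {individual: metric at state_2 - metric at state_1}.

    Individuals that lack either state are skipped, since no pair can be formed.
    """
    by_individual = {}
    for row in rows:
        by_individual.setdefault(row[id_key], {})[row[state_key]] = row[metric_key]
    differences = {}
    for individual, states in by_individual.items():
        if state_1 in states and state_2 in states:
            differences[individual] = states[state_2] - states[state_1]
    return differences

# Shannon entropy values copied from the merged metadata shown earlier.
rows = [
    {"patient_id": "1048", "time_point": "post-treatment", "shannon_entropy": 1.267043},
    {"patient_id": "1048", "time_point": "pre-treatment", "shannon_entropy": 0.971329},
    {"patient_id": "1043", "time_point": "post-treatment", "shannon_entropy": 1.607049},
    {"patient_id": "1043", "time_point": "pre-treatment", "shannon_entropy": 1.976768},
    {"patient_id": "1045", "time_point": "pre-treatment", "shannon_entropy": 0.312109},
]

diffs = paired_differences(rows, "patient_id", "time_point", "shannon_entropy",
                           "pre-treatment", "post-treatment")
for patient, diff in diffs.items():
    print(patient, round(diff, 6))
```

Patient 1045 does not appear in the result because only the pre-treatment sample is available, which is the same reason the Placebo_Res group had to be removed.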

In [15]:
#! qiime diversity alpha \
#  --i-table $data_dir/closed_reference_cluster/cr90_feature_table.qza \
#  --p-metric observed_features \
#  --o-alpha-diversity $data_dir_div/alpha_diversity.qza

Saved SampleData[AlphaDiversity] to: data/alpha_diversity/alpha_diversity.qza

Test the associations between categorical metadata columns and observed features (a measure of community richness, computed above with --p-metric observed_features) using the qiime diversity alpha-group-significance method, which applies the Kruskal-Wallis test (a non-parametric alternative to one-way ANOVA):

In [16]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity $data_dir_div/alpha_diversity.qza \
  --m-metadata-file $data_dir/pundemic_metadata.tsv \
  --o-visualization $data_dir_div/alpha_diversity.qzv

Saved Visualization to: data/alpha_diversity/alpha_diversity.qzv

In [21]:
Visualization.load(f"{data_dir_div}/alpha_diversity.qzv")
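
To make the Kruskal-Wallis test less of a black box, here is a minimal pure-Python sketch of its H statistic with the usual tie correction. The group values are made up, and `average_ranks` and `kruskal_wallis_h` are hypothetical helper names. The p-value, which comes from a chi-squared distribution with (number of groups - 1) degrees of freedom, is not computed here.

```python
from collections import Counter

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction; it is undefined when all pooled values are identical.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))

# Three made-up groups with no overlap in values.
print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 3))  # 7.2
```

Because the statistic depends only on ranks, it makes no normality assumption about the diversity values, which is why it suits small or skewed groups.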

Next, we will test whether numeric sample metadata columns are correlated with microbial community richness by using the qiime diversity alpha-correlation method (implementation of Spearman correlation).

In [9]:
# Inspect column dtypes to see which columns still need converting to numeric
metadata.dtypes

id                               object
patient_id                       object
age                              object
sex                              object
ethnicity                        object
continent                        object
country                          object
region                           object
city                             object
group                            object
disease_subgroup                 object
blinded_clinical_response        object
puns_per_hour_pre_treatment     float64
puns_per_hour_post_treatment    float64
time_point                       object
dtype: object

In [6]:
# Coerce non-numeric entries (e.g. 'Unknown') to NaN so the columns become float64
metadata['age'] = pd.to_numeric(metadata['age'], errors = 'coerce')
metadata['patient_id'] = pd.to_numeric(metadata['patient_id'], errors = 'coerce')
metadata.dtypes

id                               object
patient_id                      float64
age                             float64
sex                              object
ethnicity                        object
continent                        object
country                          object
region                           object
city                             object
group                            object
disease_subgroup                 object
blinded_clinical_response        object
puns_per_hour_pre_treatment     float64
puns_per_hour_post_treatment    float64
time_point                       object
dtype: object

In [7]:
metadata.to_csv(f'{data_dir}/pundemic_metadata_numeric.tsv', sep='\t', index=False)
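
The effect of errors='coerce' is easiest to see on a small example. The sketch below uses a toy Series that mimics the age column, including the 'Unknown' entry that appears in the metadata; the Series itself is illustrative and is not read from the actual file.

```python
import pandas as pd

# Toy stand-in for the age column, which is read as strings (dtype object).
age = pd.Series(["36", "29", "Unknown"], name="age")

# Values that cannot be parsed as numbers become NaN instead of raising.
age_numeric = pd.to_numeric(age, errors="coerce")

print(age_numeric.dtype)         # float64
print(age_numeric.isna().sum())  # 1
```

Note that the coerced entries are silently lost, so it is worth counting the NaN values after the conversion to confirm that only the expected placeholders were affected.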

In [10]:
! qiime diversity alpha-correlation \
  --i-alpha-diversity $data_dir_div/alpha_diversity.qza \
  --m-metadata-file $data_dir/pundemic_metadata_numeric.tsv \
  --o-visualization $data_dir_div/alpha_correlation.qzv

Saved Visualization to: data/alpha_diversity/alpha_correlation.qzv

In [3]:
Visualization.load(f"{data_dir_div}/alpha_correlation.qzv")
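
As a rough illustration of the correlation used above, Spearman's rho is the Pearson correlation of the ranks of the two variables. The sketch below is plain Python with made-up numbers; `average_ranks` and `spearman_rho` are hypothetical helper names, and no p-value is computed.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sd_x * sd_y)

# Made-up example: a numeric metadata column against a richness value.
metadata_values = [1, 2, 3, 4, 5]
richness = [2, 1, 4, 3, 5]
print(round(spearman_rho(metadata_values, richness), 3))  # 0.8
```

Since only the ordering of values matters, the result is unaffected by monotonic transformations of either variable, such as taking logarithms.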