# Alpha Diversity Analysis

## Import Libraries

In [1]:
import os
import pandas as pd
import qiime2 as q2
from qiime2 import Visualization

%matplotlib inline

In [2]:
data_dir = 'data'
data_dir_phyl = 'data/phylogeny'
data_dir_div = 'data/alpha_diversity'

## Alpha Rarefaction

Before rarefying, we need to choose a sampling depth that suits our dataset. To do so, we analyse how sampling depth affects within-sample (alpha) diversity estimates using the alpha-rarefaction action. This action generates interactive alpha rarefaction curves for sampling depths between min_depth and max_depth. At each depth step it computes 10 rarefied tables (the default number of iterations) together with the corresponding alpha diversity metrics.

In [8]:
! qiime diversity alpha-rarefaction \
    --i-table $data_dir/closed_reference_cluster/cr90_feature_table.qza \
    --p-max-depth 10000 \
    --m-metadata-file $data_dir/pundemic_metadata.tsv \
    --o-visualization $data_dir_div/alpha_rarefaction.qzv

# Optional, once a rooted tree is available (adds phylogenetic metrics to the curves):
#   --i-phylogeny $data_dir_phyl/fasttree_tree_rooted.qza \

Saved Visualization to: data/alpha_diversity/alpha_rarefaction.qzv

In [9]:
Visualization.load(f"{data_dir_div}/alpha_rarefaction.qzv")
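
To illustrate what happens at a single depth step of the rarefaction curve, here is a minimal sketch in plain Python. It is not the QIIME 2 implementation: the helper names `rarefy` and `observed_features` and the toy sample are assumptions made for this example. Reads are subsampled without replacement to a fixed depth, and the number of features that survive is counted.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample a {feature: count} dict to `depth` reads without replacement.

    Returns None when the sample holds fewer than `depth` reads, since such
    a sample cannot be rarefied to that depth.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the counts into one entry per read, then draw `depth` reads.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    rarefied = {}
    for feature in rng.sample(reads, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


def observed_features(counts):
    """Richness: number of features with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical sample with two abundant and two rare features (806 reads).
sample = {"otu_a": 500, "otu_b": 300, "otu_c": 5, "otu_d": 1}

for depth in (10, 100, 800):
    print(depth, observed_features(rarefy(sample, depth)))
```

Rare features tend to disappear at shallow depths, which is why the curves are inspected for the depth at which richness levels off while still retaining most samples.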

## Diversity Analysis

In [13]:
# If an error occurs, delete the old core_metrics_results directory first so that a new one can be created.
# This code snippet would be needed if we had a phylogenetic tree.
#! qiime diversity core-metrics-phylogenetic \
#  --i-table $data_dir/closed_reference_cluster/cr90_feature_table.qza \
#  --i-phylogeny $data_dir_phyl/fasttree_tree_rooted.qza \
#  --m-metadata-file $data_dir/pundemic_metadata.tsv \
#  --p-sampling-depth 3000 \
#  --output-dir $data_dir_div/core_metrics_results

In [15]:
! qiime diversity alpha \
  --i-table $data_dir/closed_reference_cluster/cr90_feature_table.qza \
  --p-metric observed_features \
  --o-alpha-diversity $data_dir_div/alpha_diversity.qza

Saved SampleData[AlphaDiversity] to: data/alpha_diversity/alpha_diversity.qza

Test the associations between categorical metadata columns and the observed features metric (a measure of community richness) computed above, using the qiime diversity alpha-group-significance action. It applies the Kruskal-Wallis test, a non-parametric analogue of one-way ANOVA:

In [16]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity $data_dir_div/alpha_diversity.qza \
  --m-metadata-file $data_dir/pundemic_metadata.tsv \
  --o-visualization $data_dir_div/alpha_diversity.qzv

Saved Visualization to: data/alpha_diversity/alpha_diversity.qzv

In [17]:
Visualization.load(f"{data_dir_div}/alpha_diversity.qzv")
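
To make the Kruskal-Wallis test behind alpha-group-significance more concrete, here is a minimal from-scratch sketch of its H statistic. The function names and the toy richness values are assumptions made for illustration, and both the tie correction and the p-value are omitted. In practice a library routine such as `scipy.stats.kruskal` would be used instead.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_h(groups):
    """Kruskal-Wallis H statistic (without tie correction)."""
    pooled = [value for group in groups for value in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical observed-feature counts for three metadata groups.
richness_by_group = [[110, 125, 130], [150, 160, 171], [190, 205, 220]]
print(kruskal_h(richness_by_group))
```

Because the statistic depends only on ranks, it makes no normality assumption about richness values. A larger H indicates that the groups' rank distributions differ more strongly.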

Next, we will test whether numeric sample metadata columns are correlated with microbial community richness using the qiime diversity alpha-correlation action, which computes the Spearman rank correlation by default:

In [8]:
# Load the metadata and inspect the column dtypes before converting columns to numeric
metadata = pd.read_csv(f"{data_dir}/pundemic_metadata.tsv", sep='\t')
metadata.dtypes

id                               object
patient_id                       object
age                              object
sex                              object
ethnicity                        object
continent                        object
country                          object
region                           object
city                             object
group                            object
disease_subgroup                 object
blinded_clinical_response        object
puns_per_hour_pre_treatment     float64
puns_per_hour_post_treatment    float64
time_point                       object
dtype: object

In [9]:
# Convert to numeric; values that cannot be parsed become NaN
metadata['age'] = pd.to_numeric(metadata['age'], errors='coerce')
metadata['patient_id'] = pd.to_numeric(metadata['patient_id'], errors='coerce')
metadata.dtypes

id                               object
patient_id                      float64
age                             float64
sex                              object
ethnicity                        object
continent                        object
country                          object
region                           object
city                             object
group                            object
disease_subgroup                 object
blinded_clinical_response        object
puns_per_hour_pre_treatment     float64
puns_per_hour_post_treatment    float64
time_point                       object
dtype: object
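
Because `errors='coerce'` silently turns unparseable entries into NaN, it can be worth checking which values were lost in the conversion. The sketch below does this on a made-up column. The toy values are an assumption, since the actual contents of the age column are not shown here.

```python
import pandas as pd

# Hypothetical raw column containing one non-numeric entry.
raw = pd.Series(["34", "51", "not collected", "27"], name="age")

converted = pd.to_numeric(raw, errors="coerce")

# Entries that were present before conversion but are NaN afterwards.
lost = raw[converted.isna() & raw.notna()]
n_lost = len(lost)

print(f"{n_lost} value(s) could not be converted:")
print(lost)
```

The same check could be applied to `metadata['age']` and `metadata['patient_id']` before overwriting them, by keeping the original column in a temporary variable.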

In [10]:
metadata.to_csv(f'{data_dir}/pundemic_metadata_numeric.tsv', sep='\t', index=False)

In [11]:
! qiime diversity alpha-correlation \
  --i-alpha-diversity $data_dir_div/core_metrics_results/faith_pd_vector.qza \
  --m-metadata-file $data_dir/pundemic_metadata_numeric.tsv \
  --o-visualization  $data_dir_div/core_metrics_results/faith_pd_group_significance_numeric.qzv

QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Saved Visualization to: data/diversity/core_metrics_results/faith_pd_group_significance_numeric.qzv

In [12]:
Visualization.load(f"{data_dir_div}/core_metrics_results/faith_pd_group_significance_numeric.qzv")
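
As a rough illustration of the Spearman correlation reported in this visualization, the sketch below computes it with pandas as the Pearson correlation of the ranks. The data frame and its values are assumptions made for the example and are not taken from the study metadata.

```python
import pandas as pd

# Hypothetical numeric metadata column and alpha diversity values.
df = pd.DataFrame({
    "age": [23, 35, 41, 52, 67],
    "alpha_diversity": [120, 150, 149, 180, 210],
})

# Spearman's rho is the Pearson correlation computed on the ranks.
rho = df["age"].rank().corr(df["alpha_diversity"].rank())
print(f"Spearman rho = {rho:.2f}")
```

Working on ranks makes the measure sensitive to any monotonic relationship, not only a linear one, which is why it is a common default for metadata columns of unknown distribution.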