Code to analyze beta diversity

In [9]:
# importing all required packages & notebook extensions at the start of the notebook
import os
import pandas as pd
import qiime2 as q2
from skbio import OrdinationResults
from qiime2 import Visualization
from seaborn import scatterplot

%matplotlib inline

In [8]:
#all variables
Data_raw='Data/raw'
Data_classified='Data/classified'
Data_diversity='Data/diversity'

The following two commands were run on Euler because JupyterHub did not provide enough memory.

In [None]:
# Core diversity metrics with the ITS metadata, rarefied to 3000 reads per sample
! qiime diversity core-metrics \
  --i-table $Data_classified/table-filtered.qza \
  --m-metadata-file $Data_raw/20250913_metadata_ITS.tsv \
  --p-sampling-depth 3000 \
  --output-dir $Data_diversity/core-metrics-results

This was submitted as a job.

In [None]:
# Core diversity metrics with the personal, environmental & sensory metadata, rarefied to 3000 reads per sample
! qiime diversity core-metrics \
  --i-table $Data_classified/table-filtered.qza \
  --m-metadata-file $Data_raw/20250914_metadata_personal_environmental_sensory_details.tsv \
  --p-sampling-depth 3000 \
  --output-dir $Data_diversity/core-metrics-results-p-e-s
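
The `--p-sampling-depth 3000` setting means that every sample is rarefied, i.e. randomly subsampled without replacement, to 3000 reads before the distances are computed, and that samples with fewer reads are dropped. The sketch below illustrates that idea in plain Python. It is not QIIME 2's implementation; the function name `rarefy`, the feature IDs and the counts are invented for illustration.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a {feature: count} mapping to exactly `depth` reads.

    Returns None when the sample holds fewer than `depth` reads,
    mirroring how such samples are excluded from the analysis.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the counts into one entry per read, then draw without replacement.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(reads, depth)
    return dict(Counter(drawn))


# Hypothetical toy samples (feature IDs and counts are made up).
deep_sample = {"ASV_1": 2500, "ASV_2": 1200, "ASV_3": 300}
shallow_sample = {"ASV_1": 900, "ASV_2": 100}

rarefied = rarefy(deep_sample, depth=3000)
dropped = rarefy(shallow_sample, depth=3000)
print(sum(rarefied.values()), dropped)
```

A sample with 4000 reads is reduced to 3000, while a sample with only 1000 reads is removed, so the choice of depth trades retained samples against retained reads.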

## Creating the necessary files  
Before the kmerizer results could be created, the personal, environmental & sensory metadata had to be adjusted so that its first column contains the sample ID.

In [21]:
!awk -F'\t' 'NR==FNR { if (FNR>1) { pid2sid[$8]=$1 } next } NR!=FNR { if (FNR==1) { print "sample ID\t"$0; next } if ($1 in pid2sid) { print pid2sid[$1] "\t" $0 } }' Data/raw/20250913_metadata_ITS.tsv Data/raw/20250914_metadata_personal_environmental_sensory_details.tsv > Data/diversity/merged_output.tsv
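
The awk one-liner builds a lookup from column 8 of the ITS metadata to its sample ID (column 1) and then prepends that sample ID to every row of the details table whose first column matches. For readers less familiar with awk, the same logic can be sketched in pandas. The function name `add_sample_id`, the column name `person_id` and the toy tables are assumptions made for this example; only the column positions come from the command above.

```python
import pandas as pd


def add_sample_id(its, details):
    """Prepend the sample ID from `its` to the matching rows of `details`."""
    # Column 8 of the ITS table holds the key that appears in column 1 of
    # the details table; column 1 of the ITS table is the sample ID.
    key_to_sample = dict(zip(its.iloc[:, 7], its.iloc[:, 0]))
    # Keep only detail rows with a known key, as the awk filter does.
    matched = details[details.iloc[:, 0].isin(list(key_to_sample))].copy()
    matched.insert(0, "sample ID", matched.iloc[:, 0].map(key_to_sample))
    return matched


# Hypothetical toy tables: eight columns in the ITS metadata, key in the last one.
its = pd.DataFrame({
    "id": ["S1", "S2", "S3"],
    **{f"col{i}": ["x"] * 3 for i in range(2, 8)},
    "person_id": ["P1", "P2", "P9"],
})
details = pd.DataFrame({
    "person_id": ["P1", "P2", "P3"],
    "background": ["sterile", "non sterile", "sterile"],
})

merged = add_sample_id(its, details)
print(merged)
```

When reading the real files, passing `dtype=str` to `pd.read_csv` would keep the keys as text, which is how awk compares them.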


The following script was then submitted as a job on Euler because JupyterHub did not provide enough memory.

In [None]:
#!/bin/bash
#SBATCH --job-name=beta
#SBATCH --time=04:00:00
#SBATCH --mem-per-cpu=32GB
#SBATCH --cpus-per-task=4
#SBATCH --output=beta.log
source /cluster/home/nschwager/miniconda3/etc/profile.d/conda.sh
source ~/.bashrc
conda activate qiime2-amplicon-2025.10
qiime kmerizer core-metrics \
  --i-table /cluster/scratch/nschwager/Input/table-filtered.qza \
  --i-sequences /cluster/scratch/nschwager/In/rep-seqs-filtered.qza \
  --m-metadata-file /cluster/scratch/nschwager/Ein/merged.tsv \
  --p-sampling-depth 3000 \
  --output-dir /cluster/scratch/nschwager/Input/kmerizer-results-p-e-s

## Analysis of Metadata ITS

In [4]:
Visualization.load(f"{Data_diversity}/kmerizer-results/scatterplot.qzv")

- Hand swabs and sourdough communities contain different sets of fungi at different relative abundances.
- There appears to be no difference between the right and left hand.
- Samples appear to cluster by plate (P1-P4 vs. P5-P7), by DNA extraction plate (DNA55-DNA58 vs. DNA59-DNA61) and, most clearly, by project (highschool vs. highschool_hs).

In [6]:
Visualization.load(f"{Data_diversity}/core-metrics-results/bray_curtis_emperor.qzv")

**Comparison of the project**  
- The projects highschool and highschool_hs differ significantly in community composition (Bray-Curtis: p- and q-value of 0.001, pseudo-F of 302.679368).  
- The Jaccard metric gives a similar result (p- and q-value of 0.001, pseudo-F of 136.490833), which indicates that a high proportion of features is not shared between the two projects.
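
To make the reported statistics easier to read, the sketch below shows how a PERMANOVA pseudo-F and its permutation p-value can be computed from a distance matrix. It is a plain-Python illustration under simplifying assumptions, not the code that `qiime diversity beta-group-significance` runs; the function names and the toy distances are invented.

```python
import random


def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: the same quantity computed per group.
    ss_within = 0.0
    for label in labels:
        members = [i for i in range(n) if groups[i] == label]
        pairs = sum(dist[i][j] ** 2 for a, i in enumerate(members) for j in members[a + 1:])
        ss_within += pairs / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def permutation_p(dist, groups, permutations=999, seed=0):
    """Share of label shufflings with a pseudo-F at least as large as observed."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)


# Hypothetical toy data: two tight groups that are far apart from each other.
toy_dist = [
    [0, 1, 3, 3],
    [1, 0, 3, 3],
    [3, 3, 0, 1],
    [3, 3, 1, 0],
]
toy_groups = ["hand", "hand", "dough", "dough"]

toy_f = pseudo_f(toy_dist, toy_groups)
toy_p = permutation_p(toy_dist, toy_groups)
print(toy_f, toy_p)
```

With 999 permutations the smallest attainable p-value is 1/(999+1) = 0.001, so a reported p-value of 0.001 means that no shuffled labelling reached the observed pseudo-F.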

Bray-Curtis

In [7]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $Data_diversity/kmerizer-results/bray_curtis_distance_matrix.qza \
    --m-metadata-file $Data_raw/20250913_metadata_ITS.tsv \
    --m-metadata-column project \
    --p-pairwise \
    --o-visualization $Data_diversity/kmerizer-results/bray_curtis-project-significance.qzv

Saved Visualization to: Data/diversity/kmerizer-results/bray_curtis-project-significance.qzv

In [8]:
Visualization.load(f"{Data_diversity}/kmerizer-results/bray_curtis-project-significance.qzv")

Jaccard

In [9]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $Data_diversity/kmerizer-results/jaccard_distance_matrix.qza \
    --m-metadata-file $Data_raw/20250913_metadata_ITS.tsv \
    --m-metadata-column project \
    --p-pairwise \
    --o-visualization $Data_diversity/kmerizer-results/jaccard-project-significance.qzv

Saved Visualization to: Data/diversity/kmerizer-results/jaccard-project-significance.qzv

In [10]:
Visualization.load(f"{Data_diversity}/kmerizer-results/jaccard-project-significance.qzv")

**Comparison of sample_type**  
- The pairwise PERMANOVA results for the Bray-Curtis metric show a significant difference between hand swabs and sourdough (p-value of 0.001, q-value of 0.002, pseudo-F of 309.080532).  
- The pairwise PERMANOVA results for the Jaccard metric point in the same direction: the comparison of sourdough and hand swabs has a p-value of 0.001, a q-value of 0.001429 and a pseudo-F of 145.174588.
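
The q-values are p-values adjusted for the number of pairwise comparisons. Assuming a Benjamini-Hochberg false discovery rate correction (an assumption about the adjustment used, not something stated in the outputs above), the calculation can be sketched as follows. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p-values from three pairwise comparisons.
example_p = [0.01, 0.04, 0.03]
example_q = benjamini_hochberg(example_p)
print(example_q)
```

Because each p-value is scaled by the number of tests divided by its rank, a q-value can exceed its p-value when several groups are compared, as in the sample_type results.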

Bray-Curtis

In [11]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $Data_diversity/kmerizer-results/bray_curtis_distance_matrix.qza \
    --m-metadata-file $Data_raw/20250913_metadata_ITS.tsv \
    --m-metadata-column sample_type \
    --p-pairwise \
    --o-visualization $Data_diversity/kmerizer-results/bray_curtis-sample_type-significance.qzv

Saved Visualization to: Data/diversity/kmerizer-results/bray_curtis-sample_type-significance.qzv

In [12]:
Visualization.load(f"{Data_diversity}/kmerizer-results/bray_curtis-sample_type-significance.qzv")

Jaccard

In [13]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $Data_diversity/kmerizer-results/jaccard_distance_matrix.qza \
    --m-metadata-file $Data_raw/20250913_metadata_ITS.tsv \
    --m-metadata-column sample_type \
    --p-pairwise \
    --o-visualization $Data_diversity/kmerizer-results/jaccard-sample_type-significance.qzv

Saved Visualization to: Data/diversity/kmerizer-results/jaccard-sample_type-significance.qzv

In [14]:
Visualization.load(f"{Data_diversity}/kmerizer-results/jaccard-sample_type-significance.qzv")

**Comparison of hand**  
- As already suggested by the Emperor visualization, there is no statistically significant difference in fungal composition between the right and left hand.  
- Bray-Curtis: p-value: 0.766, q-value: 0.766, pseudo-F: 0.746564  
- Jaccard: p- and q-value: 0.297, pseudo-F: 1.024582

Bray-Curtis

In [15]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $Data_diversity/kmerizer-results/bray_curtis_distance_matrix.qza \
    --m-metadata-file $Data_raw/20250913_metadata_ITS.tsv \
    --m-metadata-column hand \
    --p-pairwise \
    --o-visualization $Data_diversity/kmerizer-results/bray_curtis-hand-significance.qzv

Visualization.load(f"{Data_diversity}/kmerizer-results/bray_curtis-hand-significance.qzv")

Saved Visualization to: Data/diversity/kmerizer-results/bray_curtis-hand-significance.qzv

Jaccard

In [16]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $Data_diversity/kmerizer-results/jaccard_distance_matrix.qza \
    --m-metadata-file $Data_raw/20250913_metadata_ITS.tsv \
    --m-metadata-column hand \
    --p-pairwise \
    --o-visualization $Data_diversity/kmerizer-results/jaccard-hand-significance.qzv

Visualization.load(f"{Data_diversity}/kmerizer-results/jaccard-hand-significance.qzv")

Saved Visualization to: Data/diversity/kmerizer-results/jaccard-hand-significance.qzv

## Analysis of Metadata personal, environmental & sensory details

In [5]:
Visualization.load(f"{Data_diversity}/kmerizer-results-p-e-s/scatterplot.qzv")

In [17]:
Visualization.load(f"{Data_diversity}/core-metrics-results-p-e-s/bray_curtis_emperor.qzv")

**Comparison of background**  
- The difference between sterile and non-sterile background is not significant for Bray-Curtis and only marginally significant for Jaccard at a 0.05 threshold.  
- Bray-Curtis: p- and q-value: 0.072, pseudo-F: 1.525025  
- Jaccard: p- and q-value: 0.047, pseudo-F: 1.218399

Bray-Curtis

In [25]:
# Filter distance matrix to only include samples in metadata p-e-s
!qiime diversity filter-distance-matrix \
    --i-distance-matrix $Data_diversity/kmerizer-results-p-e-s/bray_curtis_distance_matrix.qza \
    --m-metadata-file $Data_diversity/merged_output.tsv \
    --o-filtered-distance-matrix $Data_diversity/kmerizer-results-p-e-s/bray_curtis_filtered.qza

!qiime diversity beta-group-significance \
    --i-distance-matrix $Data_diversity/kmerizer-results-p-e-s/bray_curtis_filtered.qza \
    --m-metadata-file $Data_diversity/merged_output.tsv \
    --m-metadata-column background \
    --p-pairwise \
    --o-visualization $Data_diversity/kmerizer-results-p-e-s/bray_curtis-background-significance.qzv

Saved DistanceMatrix to: Data/diversity/kmerizer-results-p-e-s/bray_curtis_filtered.qza
Saved Visualization to: Data/diversity/kmerizer-results-p-e-s/bray_curtis-background-significance.qzv

In [26]:
Visualization.load(f"{Data_diversity}/kmerizer-results-p-e-s/bray_curtis-background-significance.qzv")

Jaccard

In [23]:
# Filter distance matrix to only include samples in metadata p-e-s
!qiime diversity filter-distance-matrix \
    --i-distance-matrix $Data_diversity/kmerizer-results-p-e-s/jaccard_distance_matrix.qza \
    --m-metadata-file $Data_diversity/merged_output.tsv \
    --o-filtered-distance-matrix $Data_diversity/kmerizer-results-p-e-s/jaccard_filtered.qza

!qiime diversity beta-group-significance \
    --i-distance-matrix $Data_diversity/kmerizer-results-p-e-s/jaccard_filtered.qza \
    --m-metadata-file $Data_diversity/merged_output.tsv \
    --m-metadata-column background \
    --p-pairwise \
    --o-visualization $Data_diversity/kmerizer-results-p-e-s/jaccard-background-significance.qzv

Saved DistanceMatrix to: Data/diversity/kmerizer-results-p-e-s/jaccard_filtered.qza
Saved Visualization to: Data/diversity/kmerizer-results-p-e-s/jaccard-background-significance.qzv

In [24]:
Visualization.load(f"{Data_diversity}/kmerizer-results-p-e-s/jaccard-background-significance.qzv")

## Multivariate PERMANOVA test

Before the test could be run, samples with missing metadata values had to be removed from both the metadata and the distance matrix.
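
The point of this step is that the metadata and the distance matrix must cover exactly the same samples. A minimal pandas sketch of that idea is shown here, assuming missing values are stored as NaN. The function name `filter_to_complete`, the sample IDs and all values are hypothetical; `latitude` and `longitude` are the columns used in the formula further down.

```python
import pandas as pd


def filter_to_complete(meta, dm, columns):
    """Keep only samples with values in `columns`, in metadata and distance matrix."""
    complete = meta.dropna(subset=columns)
    # Retain IDs that are present in both tables, in metadata order.
    keep = [sid for sid in complete.index if sid in dm.index]
    return complete.loc[keep], dm.loc[keep, keep]


# Hypothetical toy metadata with two incomplete samples.
ids = ["S1", "S2", "S3", "S4"]
toy_meta = pd.DataFrame(
    {"latitude": [47.4, None, 46.9, 47.0], "longitude": [8.5, 8.6, None, 7.4]},
    index=ids,
)
# Hypothetical symmetric distance matrix over the same four samples.
toy_dm = pd.DataFrame(
    [[0.0, 0.2, 0.5, 0.6],
     [0.2, 0.0, 0.4, 0.7],
     [0.5, 0.4, 0.0, 0.3],
     [0.6, 0.7, 0.3, 0.0]],
    index=ids,
    columns=ids,
)

meta_kept, dm_kept = filter_to_complete(toy_meta, toy_dm, ["latitude", "longitude"])
print(list(meta_kept.index), dm_kept.shape)
```

Selecting rows by a condition such as missing values, rather than by row number, keeps the filter valid if the metadata file is later reordered or extended.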

In [49]:
meta = pd.read_csv("Data/diversity/merged_output.tsv", sep="\t")

# Removing rows without ethical agreement.
# The numbers are 1-based data rows (header not counted).
rows_to_drop = [6, 15, 18, 29, 31, 35, 36, 37, 42, 44, 45, 47, 50, 51, 52]

# Shift to the 0-based index that pandas assigns when reading the file
meta_clean = meta.drop(index=[i-1 for i in rows_to_drop])

meta_clean.to_csv("Data/diversity/merged_output_ethical_agreement.tsv", sep="\t", index=False)

**latitude & longitude**  
- For the Jaccard metric, beta diversity varies significantly along longitude (p=0.001).  
- The Bray-Curtis metric does not confirm this (p=0.091).
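
The two metrics can disagree because Jaccard only considers which features are present, whereas Bray-Curtis weights them by abundance. The following sketch computes both for a pair of invented count vectors; the function names and numbers are illustrative and are not taken from the kmerizer output.

```python
def bray_curtis(a, b):
    """Abundance-weighted dissimilarity between two count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


def jaccard(a, b):
    """Presence/absence dissimilarity: 1 - shared features / all observed features."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    return 1 - len(present_a & present_b) / len(present_a | present_b)


# Hypothetical samples: same dominant features, two extra rare ones in sample_b.
sample_a = [10, 0, 5, 0]
sample_b = [10, 1, 5, 1]

bc = bray_curtis(sample_a, sample_b)
jc = jaccard(sample_a, sample_b)
print(bc, jc)
```

Here the rare features barely move Bray-Curtis (0.0625) but halve the shared fraction for Jaccard (0.5), so an effect that is driven by rare features can be significant for Jaccard only.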

Jaccard

In [50]:
!qiime diversity filter-distance-matrix \
    --i-distance-matrix $Data_diversity/kmerizer-results-p-e-s/jaccard_distance_matrix.qza \
    --m-metadata-file $Data_diversity/merged_output_ethical_agreement.tsv \
    --o-filtered-distance-matrix $Data_diversity/kmerizer-results-p-e-s/jaccard_ethical_agreement.qza

Saved DistanceMatrix to: Data/diversity/kmerizer-results-p-e-s/jaccard_ethical_agreement.qza

In [51]:
! qiime diversity adonis \
    --i-distance-matrix $Data_diversity/kmerizer-results-p-e-s/jaccard_ethical_agreement.qza \
    --m-metadata-file $Data_diversity/merged_output_ethical_agreement.tsv \
    --p-formula "latitude*longitude" \
    --o-visualization $Data_diversity/kmerizer-results-p-e-s/jaccard_multi_place.qzv

Saved Visualization to: Data/diversity/kmerizer-results-p-e-s/jaccard_multi_place.qzv

In [52]:
Visualization.load(f"{Data_diversity}/kmerizer-results-p-e-s/jaccard_multi_place.qzv")

Bray-Curtis

In [53]:
!qiime diversity filter-distance-matrix \
    --i-distance-matrix $Data_diversity/kmerizer-results-p-e-s/bray_curtis_distance_matrix.qza \
    --m-metadata-file $Data_diversity/merged_output_ethical_agreement.tsv \
    --o-filtered-distance-matrix $Data_diversity/kmerizer-results-p-e-s/bray_curtis_ethical_agreement.qza

Saved DistanceMatrix to: Data/diversity/kmerizer-results-p-e-s/bray_curtis_ethical_agreement.qza

In [54]:
! qiime diversity adonis \
    --i-distance-matrix $Data_diversity/kmerizer-results-p-e-s/bray_curtis_ethical_agreement.qza \
    --m-metadata-file $Data_diversity/merged_output_ethical_agreement.tsv \
    --p-formula "latitude*longitude" \
    --o-visualization $Data_diversity/kmerizer-results-p-e-s/bray_curtis_multi_place.qzv

Saved Visualization to: Data/diversity/kmerizer-results-p-e-s/bray_curtis_multi_place.qzv

In [55]:
Visualization.load(f"{Data_diversity}/kmerizer-results-p-e-s/bray_curtis_multi_place.qzv")

**plants, pH & TTA**  
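- None of these variables, alone or in interaction, shows a significant effect on beta diversity (the p-value is 1 for all terms). Because the formula includes every interaction between the seven variables, the model may be over-parameterized for this number of samples, so the p-values should be interpreted with caution.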
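
In R formula syntax, which the `--p-formula` option follows, `a*b` is shorthand for `a + b + a:b`, so chaining `*` adds every possible interaction. The sketch below lists the resulting terms for the two formulas used in this notebook; the helper name `expand_star_formula` is made up for illustration.

```python
from itertools import combinations


def expand_star_formula(variables):
    """List the terms that `a*b*...` expands to in R formula syntax."""
    terms = []
    for order in range(1, len(variables) + 1):
        for combo in combinations(variables, order):
            terms.append(":".join(combo))
    return terms


place_terms = expand_star_formula(["latitude", "longitude"])
plant_terms = expand_star_formula(
    ["plants", "day7_pH", "day14_pH", "day21_pH", "day7_TTA", "day14_TTA", "day21_TTA"]
)
print(place_terms)
print(len(plant_terms))
```

Two variables give three terms, while seven variables give 127. Joining the variables with `+` instead of `*` would restrict the model to the seven main effects.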

In [45]:
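meta = pd.read_csv("Data/diversity/merged_output_ethical_agreement.tsv", sep="\t")
# Removing rows without values for pH.
# The numbers are 1-based data rows of the ethical-agreement file (header not counted).
rows_to_drop = [4, 9, 13, 24, 28, 29, 33, 36]

# Shift to the 0-based index that pandas assigns when reading the file
meta_clean = meta.drop(index=[i-1 for i in rows_to_drop])

meta_clean.to_csv("Data/diversity/merged_output_plants_pH.tsv", sep="\t", index=False)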

In [46]:
!qiime diversity filter-distance-matrix \
    --i-distance-matrix $Data_diversity/kmerizer-results-p-e-s/jaccard_distance_matrix.qza \
    --m-metadata-file $Data_diversity/merged_output_plants_pH.tsv \
    --o-filtered-distance-matrix $Data_diversity/kmerizer-results-p-e-s/jaccard_filtered_plants_pH.qza

Saved DistanceMatrix to: Data/diversity/kmerizer-results-p-e-s/jaccard_filtered_plants_pH.qza

In [47]:
! qiime diversity adonis \
    --i-distance-matrix $Data_diversity/kmerizer-results-p-e-s/jaccard_filtered_plants_pH.qza \
    --m-metadata-file $Data_diversity/merged_output_plants_pH.tsv \
    --p-formula "plants*day7_pH*day14_pH*day21_pH*day7_TTA*day14_TTA*day21_TTA" \
    --o-visualization $Data_diversity/kmerizer-results-p-e-s/jaccard_multi_plants.qzv

Saved Visualization to: Data/diversity/kmerizer-results-p-e-s/jaccard_multi_plants.qzv

In [48]:
Visualization.load(f"{Data_diversity}/kmerizer-results-p-e-s/jaccard_multi_plants.qzv")