Code to analyze beta diversity

In [1]:
# importing all required packages & notebook extensions at the start of the notebook
import os
import pandas as pd
import qiime2 as q2
from skbio import OrdinationResults
from qiime2 import Visualization
from seaborn import scatterplot
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [2]:
# paths to the data folders
Data_raw='Data/raw'
Data_classified='Data/classified'
Data_diversity='Data/diversity'

<div style="background-color: skyblue; padding: 10px;">
    Titles
    </div>
<div style="background-color: aliceblue; padding: 10px;">
    Results

## Creating the necessary files  
<div style="background-color: skyblue; padding: 10px;">


Adjust merged_output.tsv so that it can be analyzed without problems: spaces and slashes in the column titles are replaced with underscores

In [3]:
df = pd.read_csv(f'{Data_raw}/merged_output.tsv', sep='\t')
# Keep first column name, modify the rest
new_columns = [df.columns[0]] + [col.replace(' ', '_').replace('/', '_') for col in df.columns[1:]]
df.columns = new_columns
df.to_csv(f'{Data_raw}/merged_output_usable.tsv', sep='\t', index=False)
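
Replacing characters can make two different headers collide (for example `a b` and `a/b` would both become `a_b`). The first column is left untouched because QIIME 2 reads it as the sample ID column. A minimal sketch of the same renaming with a collision check, using made-up column names (the helper `sanitize_columns` is hypothetical and not part of this notebook):

```python
def sanitize_columns(columns):
    """Replace spaces and slashes in all but the first column name."""
    cleaned = [columns[0]] + [
        col.replace(" ", "_").replace("/", "_") for col in columns[1:]
    ]
    # Report names that appear more than once after cleaning.
    duplicates = sorted({name for name in cleaned if cleaned.count(name) > 1})
    if duplicates:
        raise ValueError(f"column names collide after renaming: {duplicates}")
    return cleaned


# Hypothetical header, only for illustration.
example = ["sample id", "sample type", "yeast/sourdough experience", "day"]
renamed = sanitize_columns(example)
print(renamed)
```

Running the check once before writing `merged_output_usable.tsv` would catch silent column clashes early.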

## Creating the files on all metadata
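
The `core-metrics` calls in this notebook use `--p-sampling-depth 3000`: every sample is subsampled to 3000 reads, and samples with fewer reads are dropped. A small sketch of how one might check how many samples survive a given depth, with made-up read counts (the function name and the numbers are illustrative assumptions, not taken from the data):

```python
def samples_retained(read_counts, depth):
    """Return the IDs of samples with at least `depth` reads."""
    return sorted(sid for sid, n in read_counts.items() if n >= depth)


# Made-up per-sample read totals.
read_counts = {"S1": 12500, "S2": 2999, "S3": 3000, "S4": 480}
kept = samples_retained(read_counts, depth=3000)
print(f"{len(kept)} of {len(read_counts)} samples kept: {kept}")
```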

In [5]:
! qiime diversity core-metrics \
  --i-table $Data_classified/table-filtered.qza \
  --m-metadata-file $Data_raw/merged_output.tsv \
  --p-sampling-depth 3000 \
  --output-dir $Data_diversity/core-metrics-results-merged

Saved FeatureTable[Frequency] to: Data/diversity/core-metrics-results-merged/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: Data/diversity/core-metrics-results-merged/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: Data/diversity/core-metrics-results-merged/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: Data/diversity/core-metrics-results-merged/evenness_vector.qza
Saved DistanceMatrix to: Data/diversity/core-metrics-results-merged/jaccard_distance_matrix.qza
Saved DistanceMatrix to: Data/diversity/core-metrics-results-merged/bray_curtis_distance_matrix.qza
Saved PCoAResults to: Data/diversity/core-metrics-results-merged/jaccard_pcoa_results.qza
Saved PCoAResults to: Data/diversity/core-metrics-results-merged/bray_curtis_pcoa_results.qza
Saved Visualization to: Data/diversity/core-metrics-results-merged/jaccard_emperor.qzv
Saved Visualization

The following code was then submitted as a job on Euler, because JupyterHub did not have enough memory.
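
The job script is not included in this notebook. As a rough sketch only, assuming the job ran the same `kmerizer core-metrics` call that is used for the subsets in this notebook, but on the unfiltered table and sequences, the command could be assembled like this (the input and output paths are assumptions based on file names that appear elsewhere in the notebook):

```python
import shlex

Data_raw = "Data/raw"
Data_classified = "Data/classified"
Data_diversity = "Data/diversity"


def kmerizer_command(table, sequences, metadata, depth, output_dir):
    """Build the argument list for a `qiime kmerizer core-metrics` call."""
    return [
        "qiime", "kmerizer", "core-metrics",
        "--i-table", table,
        "--i-sequences", sequences,
        "--m-metadata-file", metadata,
        "--p-sampling-depth", str(depth),
        "--output-dir", output_dir,
    ]


cmd = kmerizer_command(
    table=f"{Data_classified}/table-filtered.qza",
    sequences=f"{Data_classified}/rep-seqs-filtered.qza",
    metadata=f"{Data_raw}/merged_output.tsv",
    depth=3000,
    output_dir=f"{Data_diversity}/kmerizer-results-merged",  # assumed output folder
)
# Print the command so it can be pasted into a job script.
print(shlex.join(cmd))
```

The same helper would cover the hand-only job by swapping in `table-filtered-hand_only.qza` and `rep-seqs-filtered-hand_only.qza`.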

## Creating files on just sourdough data
First filtering the corresponding tables

In [6]:
!qiime feature-table filter-samples \
  --i-table $Data_classified/table-filtered.qza \
  --m-metadata-file $Data_raw/merged_output.tsv  \
  --p-where "sample_type='sourdough'" \
  --o-filtered-table $Data_classified/table-filtered-sourdough_only.qza

Saved FeatureTable[Frequency] to: Data/classified/table-filtered-sourdough_only.qza

In [7]:
!qiime feature-table filter-seqs \
  --i-data $Data_classified/rep-seqs-filtered.qza \
  --i-table $Data_classified/table-filtered-sourdough_only.qza \
  --o-filtered-data $Data_classified/rep-seqs-filtered-sourdough_only.qza

Saved FeatureData[Sequence] to: Data/classified/rep-seqs-filtered-sourdough_only.qza

Creating the metrics files

In [8]:
! qiime diversity core-metrics \
  --i-table $Data_classified/table-filtered-sourdough_only.qza \
  --m-metadata-file $Data_raw/merged_output.tsv \
  --p-sampling-depth 3000 \
  --output-dir $Data_diversity/core-metrics-results-merged-sourdough-only

Saved FeatureTable[Frequency] to: Data/diversity/core-metrics-results-merged-sourdough-only/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: Data/diversity/core-metrics-results-merged-sourdough-only/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: Data/diversity/core-metrics-results-merged-sourdough-only/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: Data/diversity/core-metrics-results-merged-sourdough-only/evenness_vector.qza
Saved DistanceMatrix to: Data/diversity/core-metrics-results-merged-sourdough-only/jaccard_distance_matrix.qza
Saved DistanceMatrix to: Data/diversity/core-metrics-results-merged-sourdough-only/bray_curtis_distance_matrix.qza
Saved PCoAResults to: Data/diversity/core-metrics-results-merged-sourdough-only/jaccard_pcoa_results.qza
Saved PCoAResults to: Data/diversity/core-metrics-results-merged-sourdough-only/bray_curtis_pcoa_results.qza


In [9]:
!qiime kmerizer core-metrics \
  --i-table $Data_classified/table-filtered-sourdough_only.qza \
  --i-sequences $Data_classified/rep-seqs-filtered-sourdough_only.qza \
  --m-metadata-file $Data_raw/merged_output.tsv \
  --p-sampling-depth 3000 \
  --output-dir $Data_diversity/kmerizer-results-merged-sourdough-only

Saved FeatureTable[Frequency] to: Data/diversity/kmerizer-results-merged-sourdough-only/rarefied_table.qza
Saved FeatureTable[Frequency] to: Data/diversity/kmerizer-results-merged-sourdough-only/kmer_table.qza
Saved SampleData[AlphaDiversity] to: Data/diversity/kmerizer-results-merged-sourdough-only/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: Data/diversity/kmerizer-results-merged-sourdough-only/shannon_vector.qza
Saved DistanceMatrix to: Data/diversity/kmerizer-results-merged-sourdough-only/jaccard_distance_matrix.qza
Saved DistanceMatrix to: Data/diversity/kmerizer-results-merged-sourdough-only/bray_curtis_distance_matrix.qza
Saved PCoAResults to: Data/diversity/kmerizer-results-merged-sourdough-only/jaccard_pcoa_results.qza
Saved PCoAResults to: Data/diversity/kmerizer-results-merged-sourdough-only/bray_curtis_pcoa_results.qza
Saved Visualization to: Data/diversi

## Creating files on just hand data
Filtering first

In [10]:
!qiime feature-table filter-samples \
  --i-table $Data_classified/table-filtered.qza \
  --m-metadata-file $Data_raw/merged_output.tsv  \
  --p-where "sample_type='hand_swabs'" \
  --o-filtered-table $Data_classified/table-filtered-hand_only.qza

Saved FeatureTable[Frequency] to: Data/classified/table-filtered-hand_only.qza

In [11]:
!qiime feature-table filter-seqs \
  --i-data $Data_classified/rep-seqs-filtered.qza \
  --i-table $Data_classified/table-filtered-hand_only.qza \
  --o-filtered-data $Data_classified/rep-seqs-filtered-hand_only.qza

Saved FeatureData[Sequence] to: Data/classified/rep-seqs-filtered-hand_only.qza

Creating the metric files

In [12]:
! qiime diversity core-metrics \
  --i-table $Data_classified/table-filtered-hand_only.qza \
  --m-metadata-file $Data_raw/merged_output.tsv \
  --p-sampling-depth 3000 \
  --output-dir $Data_diversity/core-metrics-results-merged-hand-only

Saved FeatureTable[Frequency] to: Data/diversity/core-metrics-results-merged-hand-only/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: Data/diversity/core-metrics-results-merged-hand-only/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: Data/diversity/core-metrics-results-merged-hand-only/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: Data/diversity/core-metrics-results-merged-hand-only/evenness_vector.qza
Saved DistanceMatrix to: Data/diversity/core-metrics-results-merged-hand-only/jaccard_distance_matrix.qza
Saved DistanceMatrix to: Data/diversity/core-metrics-results-merged-hand-only/bray_curtis_distance_matrix.qza
Saved PCoAResults to: Data/diversity/core-metrics-results-merged-hand-only/jaccard_pcoa_results.qza
Saved PCoAResults to: Data/diversity/core-metrics-results-merged-hand-only/bray_curtis_pcoa_results.qza
Saved Visualization to: Data/diversi

The following job was submitted on Euler, because JupyterHub did not have enough memory.

## Filter distance matrices for people who filled out the survey
<div style="background-color: red; padding: 10px;">
is this needed?

In [None]:
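df = pd.read_csv(f"{Data_raw}/merged_output_usable.tsv", sep="\t")

# keep only rows where both survey timestamps are present
columns_to_check = ["start_time", "completion_time"]
df_filtered = df.dropna(subset=columns_to_check)

# make sure the output folder exists before writing
os.makedirs(f"{Data_diversity}/filtered-metadata", exist_ok=True)
df_filtered.to_csv(f"{Data_diversity}/filtered-metadata/meta_survey.tsv", sep="\t", index=False)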

In [13]:
!qiime diversity filter-distance-matrix \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged-sourdough-only/bray_curtis_distance_matrix.qza \
    --m-metadata-file $Data_diversity/filtered-metadata/meta_survey.tsv  \
    --o-filtered-distance-matrix $Data_diversity/kmerizer-results-merged-sourdough-only/bray_curtis_distance_matrix_survey.qza

Saved DistanceMatrix to: Data/diversity/kmerizer-results-merged-sourdough-only/bray_curtis_distance_matrix_survey.qza

In [14]:
!qiime diversity filter-distance-matrix \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged-sourdough-only/jaccard_distance_matrix.qza \
    --m-metadata-file $Data_diversity/filtered-metadata/meta_survey.tsv  \
    --o-filtered-distance-matrix $Data_diversity/kmerizer-results-merged-sourdough-only/jaccard_distance_matrix_survey.qza

Saved DistanceMatrix to: Data/diversity/kmerizer-results-merged-sourdough-only/jaccard_distance_matrix_survey.qza

## Analysis of whole Metadata ITS
<div style="background-color: skyblue; padding: 10px;">


### Initial plots

In [15]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged/scatterplot.qzv")

<div style="background-color: aliceblue; padding: 10px;">
    
- Hand swabs and sourdough communities contain different sets of fungi at different relative abundances
- With day on the y-axis and Bray-Curtis on the x-axis, the sourdoughs appear to become more similar to the hand swabs over time
- There appears to be no difference between right and left hand
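
The first observation combines two things that the two metrics separate: Jaccard only looks at which features are present, while Bray-Curtis also weighs their counts. A toy calculation with made-up counts (the features stand in for whatever the distance matrix was built on, for example k-mers in the kmerizer results):

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def jaccard(u, v):
    """Jaccard distance on presence/absence of each feature."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    union = present_u | present_v
    return 1 - len(present_u & present_v) / len(union)


# Made-up counts for four features in three samples.
hand = [10, 5, 0, 0]
dough_same_members = [1, 14, 0, 0]   # same features as hand, different abundances
dough_other_members = [0, 0, 7, 8]   # no features shared with hand

print(jaccard(hand, dough_same_members), bray_curtis(hand, dough_same_members))
print(jaccard(hand, dough_other_members), bray_curtis(hand, dough_other_members))
```

In the first pair Jaccard sees no difference although the abundances clearly differ; in the second pair both metrics reach their maximum.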

In [16]:
Visualization.load(f"{Data_diversity}/core-metrics-results-merged/bray_curtis_emperor.qzv")

## Comparison of the hand & sourdough environment

### Filtering just on hand & sourdough

In [44]:
df = pd.read_csv("Data/raw/merged_output_usable.tsv", sep="\t")

major_types = ["hand_swabs", "sourdough"]

df_filtered = df[df["sample_type"].isin(major_types)]

df_filtered.to_csv("Data/diversity/filtered-metadata/meta_handdough.tsv",sep="\t", index=False)

In [45]:
!qiime diversity filter-distance-matrix \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged/bray_curtis_distance_matrix.qza \
    --m-metadata-file $Data_diversity/filtered-metadata/meta_handdough.tsv  \
    --o-filtered-distance-matrix $Data_diversity/kmerizer-results-merged/bray_curtis_distance_matrix_handdough.qza

Saved DistanceMatrix to: Data/diversity/kmerizer-results-merged/bray_curtis_distance_matrix_handdough.qza

In [46]:
!qiime diversity filter-distance-matrix \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged/jaccard_distance_matrix.qza \
    --m-metadata-file $Data_diversity/filtered-metadata/meta_handdough.tsv  \
    --o-filtered-distance-matrix $Data_diversity/kmerizer-results-merged/jaccard_distance_matrix_handdough.qza

Saved DistanceMatrix to: Data/diversity/kmerizer-results-merged/jaccard_distance_matrix_handdough.qza

### Checking difference between hand & sourdough

Bray-Curtis

In [51]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged/bray_curtis_distance_matrix_handdough.qza \
    --m-metadata-file $Data_diversity/filtered-metadata/meta_handdough.tsv \
    --m-metadata-column sample_type \
    --p-pairwise \
    --o-visualization $Data_diversity/kmerizer-results-merged/bray_curtis-sample_type-significance.qzv

Saved Visualization to: Data/diversity/kmerizer-results-merged/bray_curtis-sample_type-significance.qzv

In [52]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged/bray_curtis-sample_type-significance.qzv")

Jaccard

In [53]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged/jaccard_distance_matrix_handdough.qza \
    --m-metadata-file $Data_diversity/filtered-metadata/meta_handdough.tsv \
    --m-metadata-column sample_type \
    --p-pairwise \
    --o-visualization $Data_diversity/kmerizer-results-merged/jaccard-sample_type-significance.qzv

Saved Visualization to: Data/diversity/kmerizer-results-merged/jaccard-sample_type-significance.qzv

In [54]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged/jaccard-sample_type-significance.qzv")

<div style="background-color: aliceblue; padding: 10px;">

- The sample types (hand swabs vs. sourdough) differ significantly in both abundance and composition (Bray-Curtis: p = 0.001, q = 0.001, pseudo-F = 253.777534; Jaccard: p = 0.001, q = 0.001, pseudo-F = 100.950748)
- The significant Jaccard result indicates that a high proportion of features is not shared between hand and sourdough overall
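
For context, `beta-group-significance` runs PERMANOVA by default, and the pseudo-F compares between-group to within-group variation in the distance matrix. A simplified pure-Python sketch on a made-up 4-sample matrix (this is not the QIIME 2 or scikit-bio implementation, and the function names are hypothetical):

```python
import random
from itertools import combinations


def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F for a symmetric distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    # Total sum of squares from all pairwise distances.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares, one term per group.
    ss_within = 0.0
    for label in labels:
        members = [i for i in range(n) if groups[i] == label]
        ss_within += sum(
            dist[i][j] ** 2 for i, j in combinations(members, 2)
        ) / len(members)
    ss_between = ss_total - ss_within
    a = len(labels)
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permutation_p(dist, groups, permutations=999, seed=0):
    """Share of label shufflings with a pseudo-F at least as large as observed."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)


# Made-up distances: small within groups (0.1), large between groups (0.9).
dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
groups = ["hand_swabs", "hand_swabs", "sourdough", "sourdough"]
f_value = pseudo_f(dist, groups)
p_value = permutation_p(dist, groups)
print(f_value, p_value)
```

With 999 permutations, which we understand to be the default, the smallest possible p-value is 1/1000 = 0.001, so a reported p of 0.001 means that no permutation reached the observed pseudo-F.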

### Comparison of hand & sourdough over time

In [62]:
df = pd.read_csv("Data/raw/merged_output_usable.tsv", sep="\t")

major_types = ["hand_swabs", "sourdough"]
df = df[df["sample_type"].isin(major_types)]

df = df[df["day"].isin([7.0, 21.0])]

df["person_day"] = df["person-id"].astype(str) + "_" + df["day"].astype(str)

df.to_csv("Data/diversity/filtered-metadata/meta_handdough_pairwise.tsv", sep="\t", index=False)

Replicates were handled as random, because hand swabs were taken from both the left and the right hand and, as shown further below, left and right hands do not differ significantly in composition or abundance
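
To make the pairing explicit: `person_day` ties each hand swab to the sourdough of the same person on the same day, and because both hands were swabbed there are two hand replicates per ID. The sketch below imitates what `--p-replicate-handling random` is understood to do, namely keep one replicate per ID and state at random. It uses made-up sample IDs and is not the q2-longitudinal implementation:

```python
import random

# Made-up metadata rows: (sample_id, person_id, day, sample_type).
rows = [
    ("H1L", "P01", 7.0, "hand_swabs"),
    ("H1R", "P01", 7.0, "hand_swabs"),
    ("D1", "P01", 7.0, "sourdough"),
    ("H2L", "P01", 21.0, "hand_swabs"),
    ("H2R", "P01", 21.0, "hand_swabs"),
    ("D2", "P01", 21.0, "sourdough"),
]


def pick_one_replicate(rows, seed=0):
    """Keep one randomly chosen sample per (person_day, sample_type)."""
    rng = random.Random(seed)
    by_key = {}
    for sample_id, person, day, sample_type in rows:
        person_day = f"{person}_{day}"  # same construction as in the notebook
        by_key.setdefault((person_day, sample_type), []).append(sample_id)
    return {key: rng.choice(ids) for key, ids in by_key.items()}


chosen = pick_one_replicate(rows)
for (person_day, sample_type), sample_id in sorted(chosen.items()):
    print(person_day, sample_type, sample_id)
```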

In [73]:
!qiime longitudinal pairwise-distances \
  --i-distance-matrix $Data_diversity/kmerizer-results-merged/bray_curtis_distance_matrix_handdough.qza \
  --m-metadata-file $Data_diversity/filtered-metadata/meta_handdough_pairwise.tsv \
  --p-state-column sample_type \
  --p-state-1 hand_swabs \
  --p-state-2 sourdough \
  --p-group-column day \
  --p-individual-id-column person_day \
  --p-replicate-handling random \
  --o-visualization $Data_diversity/kmerizer-results-merged/bray-curtis-hand-assimilation-over-time.qzv

Saved Visualization to: Data/diversity/kmerizer-results-merged/bray-curtis-hand-assimilation-over-time.qzv

In [75]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged/bray-curtis-hand-assimilation-over-time.qzv")

In [76]:
!qiime longitudinal pairwise-distances \
  --i-distance-matrix $Data_diversity/kmerizer-results-merged/jaccard_distance_matrix_handdough.qza \
  --m-metadata-file $Data_diversity/filtered-metadata/meta_handdough_pairwise.tsv \
  --p-state-column sample_type \
  --p-state-1 hand_swabs \
  --p-state-2 sourdough \
  --p-group-column day \
  --p-individual-id-column person_day \
  --p-replicate-handling random \
  --o-visualization $Data_diversity/kmerizer-results-merged/jaccard-hand-assimilation-over-time.qzv

Saved Visualization to: Data/diversity/kmerizer-results-merged/jaccard-hand-assimilation-over-time.qzv

In [77]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged/jaccard-hand-assimilation-over-time.qzv")

<div style="background-color: aliceblue; padding: 10px;">

- The pairwise distances do not indicate that the hand and sourdough microbiomes converge between day 7 and day 21

**Only sterile sourdough**

In [91]:
df = pd.read_csv("Data/raw/merged_output_usable.tsv", sep="\t")

major_types = ["hand_swabs", "sourdough"]
df = df[df["sample_type"].isin(major_types)]

df = df[(df["sample_type"] == "hand_swabs") |
        ((df["sample_type"] == "sourdough") & (df["background"] == "sterile"))]


df = df[df["day"].isin([7.0, 21.0])]

df["person_day"] = df["person-id"].astype(str) + "_" + df["day"].astype(str)

df.to_csv(
    "Data/diversity/filtered-metadata/meta_handdough_pairwise_sterile.tsv",
    sep="\t",
    index=False
)

In [84]:
!qiime longitudinal pairwise-distances \
  --i-distance-matrix $Data_diversity/kmerizer-results-merged/bray_curtis_distance_matrix_handdough.qza \
  --m-metadata-file $Data_diversity/filtered-metadata/meta_handdough_pairwise_sterile.tsv \
  --p-state-column sample_type \
  --p-state-1 hand_swabs \
  --p-state-2 sourdough \
  --p-group-column day \
  --p-individual-id-column person_day \
  --p-replicate-handling random \
  --o-visualization $Data_diversity/kmerizer-results-merged/bray-curtis-hand-assimilation-over-time-sterile.qzv

Saved Visualization to: Data/diversity/kmerizer-results-merged/bray-curtis-hand-assimilation-over-time-sterile.qzv

In [85]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged/bray-curtis-hand-assimilation-over-time-sterile.qzv")

In [86]:
!qiime longitudinal pairwise-distances \
  --i-distance-matrix $Data_diversity/kmerizer-results-merged/jaccard_distance_matrix_handdough.qza \
  --m-metadata-file $Data_diversity/filtered-metadata/meta_handdough_pairwise_sterile.tsv \
  --p-state-column sample_type \
  --p-state-1 hand_swabs \
  --p-state-2 sourdough \
  --p-group-column day \
  --p-individual-id-column person_day \
  --p-replicate-handling random \
  --o-visualization $Data_diversity/kmerizer-results-merged/jaccard-hand-assimilation-over-time-sterile.qzv

Saved Visualization to: Data/diversity/kmerizer-results-merged/jaccard-hand-assimilation-over-time-sterile.qzv

In [87]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged/jaccard-hand-assimilation-over-time-sterile.qzv")

**Non-sterile background**

In [92]:
df = pd.read_csv("Data/raw/merged_output_usable.tsv", sep="\t")

major_types = ["hand_swabs", "sourdough"]
df = df[df["sample_type"].isin(major_types)]

df = df[(df["sample_type"] == "hand_swabs") |
        ((df["sample_type"] == "sourdough") & (df["background"] == "non-sterile"))]


df = df[df["day"].isin([7.0, 21.0])]

df["person_day"] = df["person-id"].astype(str) + "_" + df["day"].astype(str)

df.to_csv(
    "Data/diversity/filtered-metadata/meta_handdough_pairwise_non-sterile.tsv",
    sep="\t",
    index=False
)

In [94]:
!qiime longitudinal pairwise-distances \
  --i-distance-matrix $Data_diversity/kmerizer-results-merged/bray_curtis_distance_matrix_handdough.qza \
  --m-metadata-file $Data_diversity/filtered-metadata/meta_handdough_pairwise_non-sterile.tsv \
  --p-state-column sample_type \
  --p-state-1 hand_swabs \
  --p-state-2 sourdough \
  --p-group-column day \
  --p-individual-id-column person_day \
  --p-replicate-handling random \
  --o-visualization $Data_diversity/kmerizer-results-merged/bray-curtis-hand-assimilation-over-time-non-sterile.qzv

Saved Visualization to: Data/diversity/kmerizer-results-merged/bray-curtis-hand-assimilation-over-time-non-sterile.qzv

In [95]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged/bray-curtis-hand-assimilation-over-time-non-sterile.qzv")

In [96]:
!qiime longitudinal pairwise-distances \
  --i-distance-matrix $Data_diversity/kmerizer-results-merged/jaccard_distance_matrix_handdough.qza \
  --m-metadata-file $Data_diversity/filtered-metadata/meta_handdough_pairwise_non-sterile.tsv \
  --p-state-column sample_type \
  --p-state-1 hand_swabs \
  --p-state-2 sourdough \
  --p-group-column day \
  --p-individual-id-column person_day \
  --p-replicate-handling random \
  --o-visualization $Data_diversity/kmerizer-results-merged/jaccard-hand-assimilation-over-time-non-sterile.qzv

Saved Visualization to: Data/diversity/kmerizer-results-merged/jaccard-hand-assimilation-over-time-non-sterile.qzv

In [97]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged/jaccard-hand-assimilation-over-time-non-sterile.qzv")

<div style="background-color: aliceblue; padding: 10px;">

- There is also no convergence when only one background (sterile or non-sterile) is considered

## Analysis of filtered only Sourdough metadata
<div style="background-color: skyblue; padding: 10px;">


### Initial plots

In [33]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged-sourdough-only/scatterplot.qzv")

<div style="background-color: aliceblue; padding: 10px;">

- The background seems to matter less and less over time (xField: Bray-Curtis 1, yField: day), while it seems to explain most of the difference in composition at the beginning
- If a null value for the day 21 aromas means that there were no aromas, then a lower pH is associated with more aromas

In [34]:
Visualization.load(f"{Data_diversity}/core-metrics-results-merged-sourdough-only/bray_curtis_emperor.qzv")

<div style="background-color: aliceblue; padding: 10px;">

- Strong separation between the sterile and non-sterile backgrounds
- Some aromas on day 28 seem to appear only on either the sterile or the non-sterile side (e.g. banana)

## Checking that the plates have no effect

Bray-Curtis

In [99]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged-sourdough-only/bray_curtis_distance_matrix.qza \
    --m-metadata-file $Data_raw/merged_output.tsv \
    --m-metadata-column plate \
    --p-pairwise \
    --o-visualization $Data_diversity/kmerizer-results-merged-sourdough-only/bray-curtis-plate.qzv

  import pkg_resources
^C
R[write to console]: 

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/q2cli/util.py", line 275, in get_plugin_manager
    return qiime2.sdk.PluginManager.reuse_existing()
  File "/opt/conda/lib/python3.10/site-packages/qiime2/sdk/plugin_manager.py", line 58, in reuse_existing
    raise UninitializedPluginManagerError
qiime2.sdk.plugin_manager.UninitializedPluginManagerError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/q2cli/click/type.py", line 117, in _convert_input
    result, error = q2cli.util._load_input(value)
  File "/opt/conda/lib/python3.10/site-packages/q2cli/util.py", line 355, in _load_input
    _ = get_plugin_manager()
  File "/opt/conda/lib/python3.10/site-packages/q2cli/util.py", line 287, in get_plugin_manager
    return qiime2.sdk.PluginManager()
  File "/opt/conda/lib/python3.10/site-packages/qiime2/s

In [100]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged-sourdough-only/bray-curtis-plate.qzv")

Jaccard

In [37]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged-sourdough-only/jaccard_distance_matrix.qza \
    --m-metadata-file $Data_raw/merged_output.tsv \
    --m-metadata-column plate \
    --p-pairwise \
    --o-visualization $Data_diversity/kmerizer-results-merged-sourdough-only/jaccard-plate.qzv

Saved Visualization to: Data/diversity/kmerizer-results-merged-sourdough-only/jaccard-plate.qzv

In [38]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged-sourdough-only/jaccard-plate.qzv")

<div style="background-color: aliceblue; padding: 10px;">
    
- No influence of the plate on sourdough composition, which is the desired outcome

### Effect of Background on Sourdough

In [102]:
!qiime longitudinal pairwise-distances \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged-sourdough-only/bray_curtis_distance_matrix.qza \
    --m-metadata-file $Data_raw/merged_output.tsv \
    --p-state-column day \
    --p-group-column background \
    --p-state-1 7.0 \
    --p-state-2 21.0 \
    --p-individual-id-column person-id \
    --o-visualization $Data_diversity/kmerizer-results-merged-sourdough-only/bray-curtis-background-difference-over-time.qzv

Saved Visualization to: Data/diversity/kmerizer-results-merged-sourdough-only/bray-curtis-background-difference-over-time.qzv

In [101]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged-sourdough-only/bray-curtis-background-difference-over-time.qzv")

Trying to plot the distance as a scatterplot to see the distribution of change

In [41]:
effect_background = pd.read_csv(f'{Data_diversity}/kmerizer-results-merged-sourdough-only/pairs-effect-of-background-sourdough.tsv', index_col=0, sep='\t')
effect_background.shape

FileNotFoundError: [Errno 2] No such file or directory: 'Data/diversity/kmerizer-results-merged-sourdough-only/pairs-effect-of-background-sourdough.tsv'

In [None]:
sns.set(rc={'figure.figsize':(3, 4)}, style='white')

with sns.plotting_context("notebook", font_scale=1):
    # seaborn's scatter plot
    ax = sns.scatterplot(
        effect_background, 
        x='Group', 
        y='Distance',
        alpha=0.8,
        color='skyblue')
    
    
    # matplotlib's customization
    ax.set_xlabel('Background')
    ax.set_ylabel('Distance from day 7 to day 21')
    ax.set_ylim((0,1))
    ax.set_xlim(-0.5, 1.5) 

ax.tick_params(axis='y', which='major', left=True)

ax.set_title('Change of Sourdoughs over time, depending on Background (Bray-Curtis)', fontsize=12);

In [None]:
!qiime longitudinal pairwise-distances \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged-sourdough-only/jaccard_distance_matrix.qza \
    --m-metadata-file $Data_raw/merged_output.tsv \
    --p-state-column day \
    --p-group-column background \
    --p-state-1 7.0 \
    --p-state-2 21.0 \
    --p-individual-id-column person-id \
    --o-visualization $Data_diversity/kmerizer-results-merged-sourdough-only/jaccard-background-difference-over-time.qzv

In [None]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged-sourdough-only/jaccard-background-difference-over-time.qzv")

<div style="background-color: aliceblue; padding: 10px;">
    
Depending on the background, the change of the sourdoughs over time differs highly significantly in fungal abundance (significant Bray-Curtis result), but not in composition (non-significant Jaccard result).
Within the sterile background, too, some samples hardly changed while others changed a lot.
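
The comparison above rests on one distance per sourdough (day 7 versus day 21), grouped by background. A small sketch of that bookkeeping with made-up distances, useful for looking at the spread within each group (the numbers and the helper name are illustrative only):

```python
from statistics import median


def paired_distances(distances, metadata, state_1=7.0, state_2=21.0):
    """Collect the day-7-to-day-21 distance of each individual per group."""
    by_person = {}
    for sample_id, (person, day, background) in metadata.items():
        by_person.setdefault(person, {"group": background})[day] = sample_id
    result = {}
    for person, info in by_person.items():
        # Only individuals sampled at both time points form a pair.
        if state_1 in info and state_2 in info:
            pair = frozenset((info[state_1], info[state_2]))
            result.setdefault(info["group"], []).append(distances[pair])
    return result


# Made-up metadata: sample_id -> (person-id, day, background).
metadata = {
    "A7": ("P01", 7.0, "sterile"), "A21": ("P01", 21.0, "sterile"),
    "B7": ("P02", 7.0, "sterile"), "B21": ("P02", 21.0, "sterile"),
    "C7": ("P03", 7.0, "non-sterile"), "C21": ("P03", 21.0, "non-sterile"),
    "D7": ("P04", 7.0, "non-sterile"),  # no day-21 sample, so no pair
}
# Made-up distances between the two time points of each individual.
distances = {
    frozenset(("A7", "A21")): 0.15,
    frozenset(("B7", "B21")): 0.85,
    frozenset(("C7", "C21")): 0.30,
}
per_group = paired_distances(distances, metadata)
for group, values in sorted(per_group.items()):
    print(group, sorted(values), median(values))
```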

**Effect of latitude & longitude**

Bray-Curtis

In [None]:
! qiime diversity adonis \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged-sourdough-only/bray_curtis_distance_matrix_survey.qza \
    --m-metadata-file $Data_diversity/filtered-metadata/meta_survey.tsv  \
    --p-formula "latitude*longitude" \
    --o-visualization $Data_diversity/kmerizer-results-merged-sourdough-only/bray_curtis_location.qzv

In [None]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged-sourdough-only/bray_curtis_location.qzv")

Jaccard

In [None]:
! qiime diversity adonis \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged-sourdough-only/jaccard_distance_matrix_survey.qza \
    --m-metadata-file $Data_diversity/filtered-metadata/meta_survey.tsv  \
    --p-formula "latitude*longitude" \
    --o-visualization $Data_diversity/kmerizer-results-merged-sourdough-only/jaccard_location.qzv

In [None]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged-sourdough-only/jaccard_location.qzv")

<div style="background-color: aliceblue; padding: 10px;">

- Geographical location does not explain differences in the abundance and composition of the sourdoughs

**Effect of pets**

Replace NaNs with 0, because adonis cannot handle NaNs and the analysis is already limited to people who filled out the survey

In [None]:
df = pd.read_csv("Data/raw/merged_output_usable.tsv", sep="\t")

columns_to_check = ["guinea_pig", "cat", "dog", "turtle", "fish"]

df[columns_to_check] = df[columns_to_check].fillna(0.0)
df.to_csv("Data/diversity/filtered-metadata/meta_pets.tsv", sep="\t", index=False)

Bray-Curtis (the distance matrix does not need to be filtered again; the survey-filtered matrix from before is reused)

In [None]:
! qiime diversity adonis \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged-sourdough-only/bray_curtis_distance_matrix_survey.qza \
    --m-metadata-file $Data_diversity/filtered-metadata/meta_pets.tsv  \
    --p-formula "guinea_pig*cat*dog*turtle*fish" \
    --o-visualization $Data_diversity/kmerizer-results-merged-sourdough-only/bray_curtis_pets.qzv

In [None]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged-sourdough-only/bray_curtis_pets.qzv")

Jaccard (the distance matrix does not need to be filtered again; the survey-filtered matrix from before is reused)

In [None]:
! qiime diversity adonis \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged-sourdough-only/jaccard_distance_matrix_survey.qza \
    --m-metadata-file $Data_diversity/filtered-metadata/meta_pets.tsv  \
    --p-formula "guinea_pig*cat*dog*turtle*fish" \
    --o-visualization $Data_diversity/kmerizer-results-merged-sourdough-only/jaccard_pets.qzv

In [None]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged-sourdough-only/jaccard_pets.qzv")

<div style="background-color: aliceblue; padding: 10px;">

Pets do not have an influence on fungal composition or abundance
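
In the formulas passed to `adonis`, `*` follows R formula syntax and expands to all main effects plus all interactions, so the five-pet formula requests 31 terms. A short sketch that enumerates them (the helper `expand_star_formula` is hypothetical and only handles plain `a*b*c` formulas):

```python
from itertools import combinations


def expand_star_formula(formula):
    """List main effects and interactions implied by an `a*b*c` formula."""
    variables = [name.strip() for name in formula.split("*")]
    terms = []
    for order in range(1, len(variables) + 1):
        for combo in combinations(variables, order):
            terms.append(":".join(combo))
    return terms


print(expand_star_formula("latitude*longitude"))
pet_terms = expand_star_formula("guinea_pig*cat*dog*turtle*fish")
print(len(pet_terms))
```

With this many terms and pet columns that are mostly 0, many interactions may not be estimable; a purely additive formula such as `guinea_pig+cat+dog+turtle+fish` might be a simpler alternative to consider.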

## Analysis of filtered only Hand metadata
<div style="background-color: skyblue; padding: 10px;">


In [None]:
Visualization.load(f"{Data_diversity}/core-metrics-results-merged-hand-only/bray_curtis_emperor.qzv")

In [None]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged-hand-only/scatterplot.qzv")

<div style="background-color: aliceblue; padding: 10px;">  
    
- No visible effect of the sourdough background on hand fungal composition
- There is some clustering based on the day 21 aromas
- There does not seem to be a correlation between yeast / sourdough baking experience and hand fungal composition

**Comparison of left and right hand**

Bray-Curtis

In [24]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged-hand-only/bray_curtis_distance_matrix.qza \
    --m-metadata-file $Data_raw/20250913_metadata_ITS.tsv \
    --m-metadata-column hand \
    --p-pairwise \
    --o-visualization $Data_diversity/kmerizer-results-merged-hand-only/bray_curtis-hand-significance.qzv

Saved Visualization to: Data/diversity/kmerizer-results-merged-hand-only/bray_curtis-hand-significance.qzv


In [25]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged-hand-only/bray_curtis-hand-significance.qzv")

Jaccard

In [27]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $Data_diversity/kmerizer-results-merged-hand-only/jaccard_distance_matrix.qza \
    --m-metadata-file $Data_raw/20250913_metadata_ITS.tsv \
    --m-metadata-column hand \
    --p-pairwise \
    --o-visualization $Data_diversity/kmerizer-results-merged-hand-only/jaccard-hand-significance.qzv

Saved Visualization to: Data/diversity/kmerizer-results-merged-hand-only/jaccard-hand-significance.qzv

In [28]:
Visualization.load(f"{Data_diversity}/kmerizer-results-merged-hand-only/jaccard-hand-significance.qzv")

<div style="background-color: aliceblue; padding: 10px;">

- There is no significant difference in abundance or composition between right and left hand
- Bray-Curtis: p = q = 0.889, pseudo-F = 0.715672
- Jaccard: p = q = 0.581, pseudo-F = 0.964955
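
The p- and q-values coincide because left versus right is the only pairwise comparison, so a multiple-testing correction has nothing to adjust. A sketch of a Benjamini-Hochberg adjustment, assuming that this is the correction behind the reported q-values:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


print(benjamini_hochberg([0.889]))             # single comparison
print(benjamini_hochberg([0.01, 0.04, 0.03]))  # made-up p-values
```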