### Input Files for Analysis

- **Raw Data Processing**  
  Initial processing of all raw sequencing data was performed, and only the samples selected for downstream analysis were retained.  
  The filtered datasets are stored in the `output` directory.

- **Feature Table (`./output/filtered_yoo_0h_table.qza`)**  
  This file contains the feature table produced by demultiplexing the reads and denoising them with the DADA2 plugin in QIIME 2.  
  It includes only the 0-hour incubation samples that were selected for downstream analyses.

All data processing and analyses were conducted using the QIIME 2 platform ([https://qiime2.org](https://qiime2.org)).

## Database Information

Due to file size limitations, the database could not be uploaded to GitHub.  
Instead, the **SILVA database** was downloaded and used locally for analysis.

The specific database file used in this study is:

**`./DB/silva-138-99-515-806-nb-classifier.qza`**

In [None]:
# To choose a sampling depth for rarefaction, the feature table (QZA) was summarized as a QZV visualization and inspected in QIIME 2.
# Based on the per-sample sequencing depth distribution, a depth of 2,425 reads was selected for rarefaction in the subsequent diversity analyses.

!qiime feature-table summarize \
    --i-table output/filtered_yoo_0h_table.qza \
    --o-visualization output/filtered_yoo_0h_table.qzv

# Open the summary to inspect per-sample depths (chosen rarefaction depth: 2,425 reads)
!qiime tools view output/filtered_yoo_0h_table.qzv

Saved Visualization to: output/filtered_yoo_0h_table.qzv

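To make the depth choice concrete, the sketch below counts how many samples would survive rarefaction at a candidate depth. The function name and the per-sample read counts are hypothetical placeholders, not values from this dataset; in practice the counts would be read off the summary visualization.

```python
def samples_retained(read_counts, sampling_depth):
    """Return the IDs of samples with at least `sampling_depth` reads."""
    return sorted(s for s, n in read_counts.items() if n >= sampling_depth)


# Hypothetical per-sample read counts (illustrative only, not from this study)
example_counts = {"S1": 2425, "S2": 10310, "S3": 1800, "S4": 5120}

kept = samples_retained(example_counts, 2425)
print(f"{len(kept)}/{len(example_counts)} samples kept at depth 2425: {kept}")
```

Samples below the chosen depth are dropped from the rarefied table, so a lower depth keeps more samples but fewer reads per sample.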

In [None]:
# Taxonomy: classify the representative sequences with the pre-trained SILVA 138 naive Bayes classifier

!qiime feature-classifier classify-sklearn \
  --i-classifier './DB/silva-138-99-515-806-nb-classifier.qza' \
  --i-reads output/filtered_yoo_0h_rep_seqs.qza \
  --o-classification output/taxonomy_yoo_0h.qza

Saved FeatureData[Taxonomy] to: output/taxonomy_yoo_0h.qza

In [4]:
# Build the phylogenetic tree: MAFFT alignment, masking, FastTree inference, midpoint rooting
!qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences output/filtered_yoo_0h_rep_seqs.qza \
  --o-alignment output/aligned_rep_seqs_yoo_0h.qza \
  --o-masked-alignment output/masked_aligned_rep_seqs_yoo_0h.qza \
  --o-tree output/unrooted-tree_0h.qza \
  --o-rooted-tree output/rooted-tree_0h.qza

Saved FeatureData[AlignedSequence] to: output/aligned_rep_seqs_yoo_0h.qza
Saved FeatureData[AlignedSequence] to: output/masked_aligned_rep_seqs_yoo_0h.qza
Saved Phylogeny[Unrooted] to: output/unrooted-tree_0h.qza
Saved Phylogeny[Rooted] to: output/rooted-tree_0h.qza

In [6]:
# Compute the core diversity metrics on the table rarefied to 2,425 reads per sample
!qiime diversity core-metrics-phylogenetic \
  --i-phylogeny output/rooted-tree_0h.qza \
  --i-table output/filtered_yoo_0h_table.qza \
  --p-sampling-depth 2425 \
  --m-metadata-file data/Merge_metadata_250526_F_s18.txt \
  --output-dir core-metrics-results-enterotype_0h

Saved FeatureTable[Frequency] to: core-metrics-results-enterotype_0h/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results-enterotype_0h/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results-enterotype_0h/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results-enterotype_0h/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results-enterotype_0h/evenness_vector.qza
Saved DistanceMatrix to: core-metrics-results-enterotype_0h/unweighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: core-metrics-results-enterotype_0h/weighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: core-metrics-results-enterotype_0h/jaccard_distance_matrix.qza
Saved DistanceMatrix to: core-metrics-results-enterotype_0h/bray_curtis_distance_matrix.qza
Saved PCoAResults to: core-metrics-results-enterotype_0h/unweighted_unifra

In [8]:
# Collapse the rarefied feature table to taxonomic level 6
!qiime taxa collapse \
  --i-table core-metrics-results-enterotype_0h/rarefied_table.qza \
  --i-taxonomy output/taxonomy_yoo_0h.qza \
  --p-level 6 \
  --o-collapsed-table level6_output/yoo_0h_l6.qza

Saved FeatureTable[Frequency] to: level6_output/yoo_0h_l6.qza

### File Generation for Analysis

- **QZA and BIOM File Generation**  
  From the feature table, we generated the `.qza` and `.biom` files needed for downstream analyses.  
  These files serve as the basis for taxonomic classification and diversity analysis.

- **Final Input for Enterotype Analysis (`rarefied_yoo_0h_l6.tsv`)**  
  The file `rarefied_yoo_0h_l6.tsv`, located in the `level6_output/` directory, was used as the final input for enterotype analysis.  
  This file was generated after rarefaction and taxonomic summarization at level 6.
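
The level-6 summarization mentioned above can be pictured with a small sketch: features whose lineages share the same first six ranks are summed into one row. This is a simplified illustration of the idea, not the q2-taxa implementation; the function name, feature IDs, lineages, and counts are invented for the example.

```python
from collections import defaultdict


def collapse_to_level(feature_counts, taxonomy, level):
    """Sum feature counts that share the same first `level` taxonomy ranks."""
    collapsed = defaultdict(int)
    for feature_id, count in feature_counts.items():
        ranks = [r.strip() for r in taxonomy[feature_id].split(";")]
        # Pad short lineages with '__' so every key has exactly `level` ranks
        ranks = (ranks + ["__"] * level)[:level]
        collapsed[";".join(ranks)] += count
    return dict(collapsed)


# Hypothetical lineages and counts for a single sample (illustrative only)
taxonomy = {
    "asv1": "d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; "
            "f__Bacteroidaceae; g__Bacteroides; s__Bacteroides_vulgatus",
    "asv2": "d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; "
            "f__Bacteroidaceae; g__Bacteroides; s__uncultured_bacterium",
    "asv3": "d__Bacteria; p__Firmicutes; c__Clostridia; o__Oscillospirales; "
            "f__Ruminococcaceae",
}
counts = {"asv1": 10, "asv2": 5, "asv3": 7}

level6 = collapse_to_level(counts, taxonomy, level=6)
for lineage, total in level6.items():
    print(total, lineage)
```

In this toy input, the two *Bacteroides* features merge into one row, while the feature annotated only to family level keeps its own row with a placeholder in the sixth rank.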


In [9]:
import os
import zipfile

In [None]:
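def extract_qza_files(folder_path):
    """Unzip every .qza archive under folder_path into a sibling '<name>_extracted' directory."""
    # Collect all .qza files, including those in subdirectories
    qza_files = []
    for root, _, files in os.walk(folder_path):
        for file in files:
            if file.endswith(".qza"):
                qza_files.append(os.path.join(root, file))

    for qza_file in qza_files:
        # e.g. level6_output/yoo_0h_l6.qza -> level6_output/yoo_0h_l6_extracted/
        output_dir = f"{os.path.splitext(qza_file)[0]}_extracted"
        os.makedirs(output_dir, exist_ok=True)

        try:
            # A .qza artifact is a zip archive, so zipfile can unpack it directly
            with zipfile.ZipFile(qza_file, 'r') as zip_ref:
                zip_ref.extractall(output_dir)
            # print(f"Successfully extracted: {qza_file} -> {output_dir}")

        except zipfile.BadZipFile:
            print(f"Error extracting {qza_file}: Not a valid .zip file")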


In [11]:
extract_qza_files('level6_output')
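
After extraction, the BIOM table sits several directories deep rather than directly in `level6_output/`. The sketch below shows one way to pull it straight out of the archive. It assumes the usual QIIME 2 archive layout for a feature table, `<uuid>/data/feature-table.biom`; the helper names are hypothetical and are not part of QIIME 2.

```python
import zipfile


def find_biom_members(qza_path):
    """List archive members that look like '<uuid>/data/feature-table.biom'."""
    with zipfile.ZipFile(qza_path) as archive:
        return [name for name in archive.namelist()
                if name.endswith("data/feature-table.biom")]


def extract_biom(qza_path, destination):
    """Copy the first feature-table.biom found in the archive to `destination`."""
    members = find_biom_members(qza_path)
    if not members:
        raise FileNotFoundError(f"No feature-table.biom inside {qza_path}")
    with zipfile.ZipFile(qza_path) as archive, open(destination, "wb") as out:
        out.write(archive.read(members[0]))
    return destination
```

For example, `extract_biom('level6_output/yoo_0h_l6.qza', 'level6_output/feature-table.biom')` would write the table to the path that the `biom convert` step reads from.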

In [None]:
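# Convert the BIOM table to TSV.
# Note: extract_qza_files() unpacks the table to level6_output/yoo_0h_l6_extracted/<uuid>/data/feature-table.biom;
# this command assumes that file has been copied to level6_output/feature-table.biom.
!biom convert -i level6_output/feature-table.biom -o level6_output/rarefied_yoo_0h_l6.tsv --to-tsv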
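
As a quick check of the converted file, the sketch below loads a `biom convert --to-tsv` output with the standard library only. It assumes the usual layout of such files: optional comment lines, then a header row beginning with `#OTU ID` followed by the sample IDs. The function name `read_biom_tsv` is hypothetical.

```python
import csv


def read_biom_tsv(path):
    """Return (sample_ids, {taxon: [counts...]}) from a `biom convert --to-tsv` file."""
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    # Skip leading comment lines until the header row, which starts with '#OTU ID'
    header_idx = next(i for i, row in enumerate(rows) if row[0] == "#OTU ID")
    sample_ids = rows[header_idx][1:]
    # Each remaining row is one taxon followed by its per-sample counts
    table = {row[0]: [float(v) for v in row[1:]] for row in rows[header_idx + 1:]}
    return sample_ids, table
```

Calling `read_biom_tsv('level6_output/rarefied_yoo_0h_l6.tsv')` should then return the sample IDs and one list of counts per level-6 taxon. Because the table was rarefied, each sample's counts would be expected to sum to 2,425.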