# Type 2 Diabetes Gut Microbiome Analysis 🧬

## 1. Project Overview
This project investigates differences in gut microbiome composition between **Type 2 Diabetes (T2D) patients** and **healthy controls** using shotgun metagenomic sequencing.

- **Dataset:** PRJNA422434 (BGI Study).
- **Total Samples:** 120 Samples (Balanced Design: 60 Cases vs 60 Controls).
- **Goal:** Identify taxonomic and functional markers associated with T2D.

## 2. Computational Infrastructure
The analysis is performed on a high-performance cloud instance provided by **Oracle Cloud Infrastructure (OCI)**.

- **Instance Type:** VM.Standard.E5.Flex (AMD EPYC processor).
- **Resources:** 14 vCPUs / 84 GB RAM.
- **Storage:** 700 GB Block Storage.
- **OS:** Ubuntu 20.04 LTS.
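As a rough sizing check, these specs work out to 6 GB of RAM per vCPU (84 / 14) and about 5.8 GB of block storage per sample (700 / 120), which must hold raw reads plus intermediate files. The sketch below turns the specs into an estimate of how many per-sample jobs can run at once. `resource_budget`, `threads_per_job` and `ram_per_job_gb` are hypothetical names, and the per-job values (4 threads, 20 GB) are placeholders rather than measurements from this project.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    ram_per_vcpu_gb: float
    storage_per_sample_gb: float
    max_parallel_jobs: int  # 0 means a single job does not fit on the VM


def resource_budget(vcpus: int, ram_gb: float, storage_gb: float,
                    n_samples: int, threads_per_job: int,
                    ram_per_job_gb: float) -> Budget:
    """Back-of-envelope capacity plan for a single VM.

    threads_per_job and ram_per_job_gb are hypothetical per-sample
    requirements; replace them with measured values for the real tools.
    """
    # A job can only start if both enough threads and enough RAM are free,
    # so the tighter of the two limits wins
    by_cpu = vcpus // threads_per_job
    by_ram = int(ram_gb // ram_per_job_gb)
    return Budget(
        ram_per_vcpu_gb=ram_gb / vcpus,
        storage_per_sample_gb=storage_gb / n_samples,
        max_parallel_jobs=min(by_cpu, by_ram),
    )


# VM specs from the list above; the per-job numbers are assumptions
b = resource_budget(vcpus=14, ram_gb=84, storage_gb=700, n_samples=120,
                    threads_per_job=4, ram_per_job_gb=20)
print(b)
```

With these placeholder values CPU is the limiting resource (14 // 4 = 3 jobs) rather than RAM (84 // 20 = 4 jobs); with measured requirements, the same calculation could inform how many jobs Snakemake is allowed to run concurrently.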

```python
import pandas as pd

# Load the run-level file report for PRJNA422434 from the ENA Portal API
# (limit=0 returns all records; library_strategy is requested so that
# only shotgun/WGS runs are kept)
url = (
    "https://www.ebi.ac.uk/ena/portal/api/filereport"
    "?accession=PRJNA422434&result=read_run"
    "&fields=run_accession,sample_alias,library_strategy"
    "&format=tsv&download=true&limit=0"
)
df = pd.read_csv(url, sep="\t")

# Keep WGS (shotgun metagenomic) runs only
df = df[df["library_strategy"] == "WGS"]

# Split groups by sample alias: aliases containing "T2D" are cases and all
# remaining aliases are treated as controls (check the alias naming if the
# project also contains other groups); take the first 60 of each
is_t2d = df["sample_alias"].str.contains("T2D", case=False, na=False)
cases = df[is_t2d].head(60)
controls = df[~is_t2d].head(60)

print(f"Selected {len(cases)} T2D runs and {len(controls)} control runs.")

# Combine both groups; the run accessions are the sample list for Snakemake
full_list = pd.concat([cases, controls])
# full_list["run_accession"].to_csv("samples.txt", index=False, header=False)  # Uncomment to save
full_list.head()
```
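Selecting the first 60 matches per group does not by itself guarantee the balanced 60 vs 60 design, and one biological sample can have several runs. The sketch below wraps the same selection logic in a hypothetical `build_sample_sheet` helper that adds a `group` label column and fails loudly if either group is short or if two runs share a `sample_alias`. The accessions and aliases in the toy table are made up so the function can be exercised offline.

```python
import pandas as pd


def build_sample_sheet(df: pd.DataFrame, n_per_group: int,
                       case_pattern: str = "T2D") -> pd.DataFrame:
    """Select n_per_group case and control runs and label them.

    Assumes, like the selection cell, that any sample_alias containing
    case_pattern is a T2D case and every other alias is a control.
    """
    is_case = df["sample_alias"].str.contains(case_pattern, case=False, na=False)
    cases = df[is_case].head(n_per_group).assign(group="T2D")
    controls = df[~is_case].head(n_per_group).assign(group="Control")

    # Fail early instead of silently producing an unbalanced design
    if len(cases) != n_per_group or len(controls) != n_per_group:
        raise ValueError(
            f"Expected {n_per_group} per group, got {len(cases)} cases "
            f"and {len(controls)} controls"
        )
    sheet = pd.concat([cases, controls], ignore_index=True)

    # Several runs from one biological sample would inflate the group size
    dup = sheet["sample_alias"].duplicated(keep=False)
    if dup.any():
        raise ValueError(f"{dup.sum()} runs share a sample_alias with another run")
    return sheet[["run_accession", "sample_alias", "group"]]


# Toy table with hypothetical accessions, only to exercise the function offline
toy = pd.DataFrame({
    "run_accession": ["SRR0001", "SRR0002", "SRR0003", "SRR0004"],
    "sample_alias": ["T2D_01", "CTRL_01", "t2d_02", "CTRL_02"],
})
sheet = build_sample_sheet(toy, n_per_group=2)
print(sheet)
```

Keeping the `group` label next to each accession means the case/control assignment travels with the sample list into downstream statistics, instead of being re-derived from alias strings later.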

```python
# Display the Snakemake workflow file
!cat Snakefile
```
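A Snakefile that consumes `samples.txt` typically reads it into a list and expands output patterns over it, for example with Snakemake's `expand()`. The pure-Python sketch below mimics that step without Snakemake, assuming one run accession per line (the format the commented-out `to_csv` call would write). The `profiles/{sample}.tsv` pattern and the helper names are hypothetical and are not taken from this project's Snakefile.

```python
import tempfile
from pathlib import Path
from typing import List


def read_samples(path: Path) -> List[str]:
    """Read one run accession per line, skipping blank lines."""
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def expand_targets(pattern: str, samples: List[str]) -> List[str]:
    """Plain-Python stand-in for expand() with a single {sample} wildcard."""
    return [pattern.format(sample=s) for s in samples]


# Write a tiny samples.txt with hypothetical accessions to a temp dir
with tempfile.TemporaryDirectory() as tmp:
    samples_file = Path(tmp) / "samples.txt"
    samples_file.write_text("SRR0001\nSRR0002\n\n")
    samples = read_samples(samples_file)

targets = expand_targets("profiles/{sample}.tsv", samples)
print(targets)
```

Requesting every per-sample output as a final target is what makes the workflow process all 120 runs; adding or removing a line in `samples.txt` changes the set of jobs without touching the rules.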