# Quality Control & Trimming Analysis 🧹

## 1. Overview
In this step, we processed raw sequencing data to ensure high-quality input for downstream analysis.
- **Tool:** `fastp` (All-in-one FASTQ preprocessor).
- **Strategy:**
  - Adapter trimming.
  - Quality filtering (Phred score > Q20).
  - Length filtering (discard reads < 50bp).
  - **Storage Optimization:** Raw FASTQ files were marked with Snakemake's `temp()` flag, so each file was deleted automatically as soon as every rule that consumes it had finished, saving disk space.
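
The strategy above could map onto a `fastp` call roughly as sketched below. This is an illustration under assumptions, not the pipeline's actual rule: it assumes paired-end input, and the helper name `build_fastp_command`, the file paths, and the thread count are hypothetical. Only the Q20 and 50 bp thresholds come from the list above. Note that `--qualified_quality_phred` only defines which bases count as qualified; fastp then discards reads in which too many bases fall below that value.

```python
from typing import List


def build_fastp_command(
    r1_in: str,
    r2_in: str,
    r1_out: str,
    r2_out: str,
    report_prefix: str,
    min_quality: int = 20,   # Phred threshold from the strategy list
    min_length: int = 50,    # reads shorter than this are discarded
    threads: int = 4,        # hypothetical value, not taken from the pipeline
) -> List[str]:
    """Assemble a paired-end fastp command line as a list of arguments."""
    return [
        "fastp",
        "--in1", r1_in,
        "--in2", r2_in,
        "--out1", r1_out,
        "--out2", r2_out,
        "--detect_adapter_for_pe",  # adapter auto-detection for paired-end reads
        "--qualified_quality_phred", str(min_quality),
        "--length_required", str(min_length),
        "--thread", str(threads),
        "--json", f"{report_prefix}.fastp.json",
        "--html", f"{report_prefix}.fastp.html",
    ]


# Example with placeholder paths; nothing is executed here.
cmd = build_fastp_command(
    "raw/S01_R1.fastq.gz", "raw/S01_R2.fastq.gz",
    "trimmed/S01_R1.fastq.gz", "trimmed/S01_R2.fastq.gz",
    "qc/S01",
)
print(" ".join(cmd))
```

Building the command as a list keeps it easy to inspect and can be passed directly to `subprocess.run(cmd, check=True)` if `fastp` is installed.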

## 2. MultiQC Report
The following report aggregates QC metrics for all **120 samples** (60 Case / 60 Control).

In [None]:
from IPython.display import IFrame

# Display the MultiQC report as an embedded HTML page
# This allows for interactive scrolling and zooming directly within the notebook
IFrame(src='qc/multiqc_report.html', width='100%', height=800)

In [None]:
import json
import pandas as pd
from IPython.display import display  # explicit import so the cell also runs outside a notebook

# Load the MultiQC data from the generated JSON file
with open('qc/multiqc_data/multiqc_data.json') as f:
    data = json.load(f)

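# Extract the general statistics (e.g. filtering rates) for each sample
stats = {}
try:
    # 'report_general_stats_data' holds the general statistics reported by MultiQC.
    # Note: the JSON layout depends on the MultiQC version; this assumes a list of
    # per-module dicts that map sample name -> metrics.
    report_data = data['report_general_stats_data']

    # Iterate through the first dataset (fastp, if it is the only tool that was run)
    for sample_name, metrics in report_data[0].items():
        clean_name = sample_name.split('_')[0]  # Keep only the text before the first underscore
        if clean_name in stats:
            # Two sample names collapsed to the same ID; the later one overwrites the earlier
            print(f"Warning: duplicate sample ID after cleanup: {clean_name}")
        stats[clean_name] = metrics

except (KeyError, IndexError, AttributeError, TypeError) as e:
    print(f"Error extracting data: {e}")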

# Create a DataFrame and display the first 5 samples
df_stats = pd.DataFrame.from_dict(stats, orient='index')
print("=== Summary Statistics (First 5 Samples) ===")
display(df_stats.head())
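
As a possible follow-up to the summary table, the sketch below flags samples whose read survival after filtering looks low. It works on a dict shaped like `stats` above. The metric name `pct_surviving`, the 80% cutoff, and the helper name `flag_samples_below` are assumptions for illustration; the actual column names depend on the MultiQC version and should be checked against `df_stats.columns` first.

```python
from typing import List, Mapping


def flag_samples_below(
    stats: Mapping[str, Mapping[str, float]],
    metric: str,
    threshold: float,
) -> List[str]:
    """Return sorted sample IDs whose `metric` is missing or below `threshold`."""
    flagged = []
    for sample, metrics in stats.items():
        value = metrics.get(metric)
        # Treat a missing metric as a problem worth inspecting
        if value is None or value < threshold:
            flagged.append(sample)
    return sorted(flagged)


# Toy input shaped like the `stats` dict; the values are made up.
example_stats = {
    "S01": {"pct_surviving": 97.2},
    "S02": {"pct_surviving": 61.5},
    "S03": {},  # metric missing
}
print(flag_samples_below(example_stats, "pct_surviving", 80.0))
# → ['S02', 'S03']
```

Applied to the real data, the call would be `flag_samples_below(stats, "pct_surviving", 80.0)`, with the metric name and cutoff adjusted to the report at hand.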