# MAJOR STEP 2: Data Trimming & Post-QC Verification

**Goal:** Clean the raw reads from adapter contamination and verify the success of the cleaning process.

**Why:** The analysis in `01_PreQC_Analysis.ipynb` revealed significant adapter contamination in our 96 samples. We cannot proceed to Mapping (Phase 5) with contaminated data. This notebook documents the "Treatment" (Trimming) and "Verification" (Post-QC) steps.

## 1: Handoff to "The Factory" (Trimming)

**The Handoff (Done):**
To remove the adapter contamination, we added `rule fastp_trimming` to our main `Snakefile` so that trimming runs automatically for every sample. We then executed this high-throughput job from the terminal.
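
The rule itself lives in the `Snakefile` and is not reproduced in this notebook. As a rough illustration of what such a rule typically runs per sample, here is a minimal sketch that builds a `fastp` command line. It assumes paired-end reads; the sample name, the `_R1`/`_R2` naming pattern, and the `data/raw` and `results/qc/fastp` directories are hypothetical placeholders, not values taken from our pipeline.

```python
from pathlib import Path


def build_fastp_command(sample, raw_dir, out_dir, report_dir, threads=4):
    """Return the fastp argument list for one paired-end sample.

    The <sample>_R1 / <sample>_R2 naming pattern and every directory
    passed in are hypothetical placeholders for illustration only.
    """
    raw_dir, out_dir, report_dir = Path(raw_dir), Path(out_dir), Path(report_dir)
    return [
        "fastp",
        # Raw input reads (mate 1 and mate 2)
        "-i", (raw_dir / f"{sample}_R1.fastq.gz").as_posix(),
        "-I", (raw_dir / f"{sample}_R2.fastq.gz").as_posix(),
        # Trimmed output reads
        "-o", (out_dir / f"{sample}_R1.trimmed.fastq.gz").as_posix(),
        "-O", (out_dir / f"{sample}_R2.trimmed.fastq.gz").as_posix(),
        # Let fastp infer adapters from the overlap of paired reads
        "--detect_adapter_for_pe",
        # Per-sample reports that MultiQC can aggregate later
        "--json", (report_dir / f"{sample}.fastp.json").as_posix(),
        "--html", (report_dir / f"{sample}.fastp.html").as_posix(),
        "--thread", str(threads),
    ]


cmd = build_fastp_command(
    "sample01", "data/raw", "results/trimmed_reads", "results/qc/fastp"
)
print(" ".join(cmd))
```

Returning the command as a list rather than a single string means it can be passed directly to `subprocess.run` without shell quoting problems, should we ever want to test one sample outside Snakemake.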

**Result:** 192 trimmed `.fastq.gz` files (two per sample for our 96 samples) were created in `results/trimmed_reads/`.
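
A raw file count can be correct while individual samples are still missing a mate. The following sketch checks that every sample has both reads. It assumes file names contain `_R1` or `_R2` directly after the sample name (a hypothetical convention; adjust the pattern to the names the `Snakefile` actually produces).

```python
import re
from collections import defaultdict

# Assumed naming: <sample>_R1[...].fastq.gz and <sample>_R2[...].fastq.gz
MATE_PATTERN = re.compile(
    r"^(?P<sample>.+)_R(?P<mate>[12])(?:[._].*)?\.fastq\.gz$"
)


def find_unpaired_samples(filenames):
    """Return the sorted sample names that lack either the R1 or the R2 file."""
    mates_seen = defaultdict(set)
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match:  # silently skip files that do not follow the pattern
            mates_seen[match["sample"]].add(match["mate"])
    return sorted(
        sample for sample, mates in mates_seen.items() if mates != {"1", "2"}
    )


example = ["s1_R1.fastq.gz", "s1_R2.fastq.gz", "s2_R1.fastq.gz"]
print(find_unpaired_samples(example))  # ['s2']
```

In the notebook this could be called as `find_unpaired_samples(os.listdir("../results/trimmed_reads/"))`; an empty list means every detected sample has both mates.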

In [None]:
# --- Verification Code (Rule 1) ---
# We are in 'notebooks/', so we check '../results/'

import os

print("--- Verifying Trimming (Phase 3) Outputs ---")

EXPECTED_COUNT = 192  # trimmed .fastq.gz files expected from Phase 3

# 1. Check the target directory
dir_path = "../results/trimmed_reads/"
exists = os.path.isdir(dir_path)
print(f"Directory {dir_path} exists: {exists}")

# 2. Count the files
# (os.listdir is more portable than !ls in a notebook)
if exists:
    # Count only .fastq.gz files so that logs or reports stored in the
    # same directory cannot distort the total
    files = sorted(f for f in os.listdir(dir_path) if f.endswith(".fastq.gz"))
    file_count = len(files)
    print(f".fastq.gz file count in directory: {file_count}")

    if file_count == EXPECTED_COUNT:
        print(f"VERIFICATION SUCCESS: {EXPECTED_COUNT} trimmed files found.")
    else:
        print(f"VERIFICATION FAILED: Expected {EXPECTED_COUNT}, but found {file_count}.")

    # 3. Show a sample of the files
    print("\nSample of files found:")
    for f in files[:5]:  # Show first 5
        print(f"  - {f}")
else:
    print("VERIFICATION FAILED: Directory not found.")

## 2: Verification (Post-QC)

**The Handoff (Done):**
To verify the cleaning, we also added `rule multiqc_fastp` to the `Snakefile` to aggregate all the new `fastp` reports into a single summary. We executed this aggregation rule from the terminal.

**Result:** A new master report was created: `results/qc/multiqc_report_fastp.html`.

In [None]:
# --- Verification Code (Rule 1) ---
# Imports are repeated here so this cell also runs on its own
import os

from IPython.display import IFrame, display

print("--- Verifying Post-QC (Phase 4) Output ---")

file_path = "../results/qc/multiqc_report_fastp.html"
exists = os.path.isfile(file_path)
print(f"Report file {file_path} exists: {exists}")

if exists:
    print("\nVERIFICATION SUCCESS. Displaying the report (Showroom):")
    # This displays the HTML report directly inside our notebook.
    # Note: the IFrame only renders if the Jupyter server can serve this
    # path; if it stays blank, open the HTML file in a browser instead.
    display(IFrame(src=file_path, width=900, height=600))
else:
    print("VERIFICATION FAILED: Report file not found.")

## Conclusion 

**Status:** Success.
The code cells above confirm that all 192 files from Phase 3 and the master report from Phase 4 were successfully created by the "Factory".

**The Scientific Conclusion:**
The `multiqc_report_fastp.html` report (displayed above) confirms that adapter contamination **has been successfully removed**.
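
To back the visual inspection with numbers, the per-sample `fastp` JSON reports can be read directly. The sketch below assumes each report exposes `summary.before_filtering.total_reads` and `adapter_cutting.adapter_trimmed_reads`, and that the reports sit in a directory such as `../results/qc/fastp` (a hypothetical path; use whatever location `rule fastp_trimming` writes to). It reports how many reads `fastp` trimmed, not how much adapter sequence remains, so it complements the MultiQC report rather than replacing it.

```python
import json
from pathlib import Path


def summarize_adapter_trimming(report_dir):
    """Map each fastp JSON report to the fraction of input reads trimmed.

    The JSON keys used here are an assumption about fastp's report layout;
    verify them against one real report before relying on the numbers.
    """
    report_dir = Path(report_dir)
    fractions = {}
    if not report_dir.is_dir():
        return fractions  # nothing to summarize
    for report in sorted(report_dir.glob("*.json")):
        with open(report) as handle:
            data = json.load(handle)
        total = data["summary"]["before_filtering"]["total_reads"]
        # 'adapter_cutting' may be absent if adapter trimming was disabled
        trimmed = data.get("adapter_cutting", {}).get("adapter_trimmed_reads", 0)
        fractions[report.name] = trimmed / total if total else 0.0
    return fractions


# Hypothetical report directory, relative to 'notebooks/'
for name, fraction in summarize_adapter_trimming("../results/qc/fastp").items():
    print(f"{name}: {fraction:.1%} of reads had adapters trimmed")
```

A sample whose fraction is far from the rest of the batch is a good candidate for a closer look in the MultiQC report before mapping.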

**Next Step (The Handoff):**
Data Cleaning (Phases 1-4) is complete and documented. We are ready for **Phase 5: Mapping**.