# Project 2: M. tuberculosis Genome Assembly
## 02 - Genome Assembly with SPAdes

* **Author:** Youssef
* **Date:** 25-Oct-2025
* **Sample IDs:** `DRR749571` (Control), `DRR749572` (Resistant)

### Objective
This notebook performs *de novo* genome assembly of the trimmed reads produced in the previous step (01_QC). We use `SPAdes`, an assembler that builds de Bruijn graphs from the reads at several k-mer sizes in succession and uses them to produce a single assembly.

Each sample is assembled *separately* so that the control and resistant assemblies can be compared later.
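
To make the de Bruijn idea concrete, here is a toy graph built in plain Python for one hypothetical 9 bp sequence containing a short repeat (`ATG` occurs twice). It is a teaching sketch only, not how SPAdes is implemented: SPAdes also handles reverse complements, sequencing errors, coverage and graph simplification. The function names are illustrative. With k = 3 the repeat collapses into a branching node, so the path through the graph is ambiguous; with k = 5 every k-mer spans the repeat and the graph becomes one unbranched path.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Toy de Bruijn graph: nodes are (k-1)-mers; every k-mer in a read adds an
    edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Nodes with more than one outgoing edge, i.e. points where the path is ambiguous."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Hypothetical toy "genome" with a short repeat: ATG occurs twice.
reads = ["ATGCATGGC"]
for k in (3, 5):
    print(f"k={k}: branching nodes = {branching_nodes(de_bruijn_graph(reads, k))}")
```

This is the trade-off a multi-k strategy exploits: a larger k resolves repeats but needs reads longer than k and enough coverage, while a smaller k keeps low-coverage regions connected.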

### Tool
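* `SPAdes` (v4.0.0): a *de novo* assembler designed for small genomes such as bacteria.
* **Key Parameters:**
    * `--careful`: Tries to reduce the number of mismatches and short indels, and runs the MismatchCorrector post-processing step. The SPAdes authors recommend it only for small genomes, which fits *M. tuberculosis*.
    * `-k 21,33,55,77,99,127`: The k-mer sizes used in successive iterations. Smaller k keeps low-coverage regions connected; larger k helps resolve repeats. All values must be odd, below 128, and in ascending order, and a k value is only useful if it is shorter than the trimmed reads.
    * `-t 4 -m 20`: 4 threads and a 20 GB memory limit (SPAdes terminates if it exceeds the `-m` limit).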
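
Before committing to a long run, the `-k` list can be sanity-checked against the SPAdes rules listed above. The sketch below is a hypothetical helper, not part of SPAdes: it flags even values, values of 128 or more, a list that is not strictly ascending, and (optionally) k values longer than the reads. `max_read_length` only samples the first reads of a gzipped FASTQ and assumes the standard 4-line record layout, so treat its result as an estimate.

```python
import gzip
from itertools import islice

# The -k list passed to spades.py in this notebook.
K_SIZES = [21, 33, 55, 77, 99, 127]


def check_kmer_sizes(ks, max_read_len=None):
    """Return a list of problems with a SPAdes -k list (an empty list means it looks valid)."""
    problems = []
    if list(ks) != sorted(set(ks)):
        problems.append("k values should be in strictly ascending order without duplicates")
    for k in ks:
        if k % 2 == 0:
            problems.append(f"k={k} is even; SPAdes requires odd k")
        if k >= 128:
            problems.append(f"k={k} is not below 128")
        if max_read_len is not None and k > max_read_len:
            problems.append(f"k={k} is longer than the longest read ({max_read_len} bp)")
    return problems


def max_read_length(fastq_gz, n_reads=10_000):
    """Longest sequence among the first n_reads records of a gzipped FASTQ
    (assumes the standard 4-line FASTQ record layout)."""
    with gzip.open(fastq_gz, "rt") as fh:
        seq_lines = islice(fh, 1, 4 * n_reads, 4)  # line 2 of every record
        return max((len(line.rstrip()) for line in seq_lines), default=0)


print(check_kmer_sizes(K_SIZES) or "k-mer list looks valid")
# With the real trimmed reads, for example:
# longest = max_read_length("../analysis/02_fastp_trimmed/DRR749571.trimmed_1.fastq.gz")
# print(check_kmer_sizes(K_SIZES, max_read_len=longest) or "k-mer list looks valid")
```

If trimming shortened the reads well below 127 bp, the largest k values are worth reconsidering.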

In [None]:
print("--- 1. Creating directories for SPAdes assembly ---")
# We create one output directory for each sample's assembly
!mkdir -p ../analysis/03_spades_assembly/DRR749571_assembly
!mkdir -p ../analysis/03_spades_assembly/DRR749572_assembly

print("Directories created:")
!ls -lR ../analysis/
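
The same setup can be done in pure Python, without shell magics, and extended to confirm that the trimmed reads from 01_QC are actually present before the long SPAdes run. This is a sketch assuming the directory layout and file names used in this notebook; `prepare_sample` is a hypothetical helper. If `../analysis` is not found, the notebook is probably running from a different working directory, so nothing is created in that case.

```python
from pathlib import Path

ANALYSIS = Path("../analysis")  # same relative layout as the shell cells
SAMPLES = ["DRR749571", "DRR749572"]


def prepare_sample(sample, analysis=ANALYSIS):
    """Create the sample's SPAdes output directory (like `mkdir -p`) and
    return the trimmed input files that are missing."""
    (analysis / "03_spades_assembly" / f"{sample}_assembly").mkdir(parents=True, exist_ok=True)
    reads = [analysis / "02_fastp_trimmed" / f"{sample}.trimmed_{mate}.fastq.gz" for mate in (1, 2)]
    return [str(path) for path in reads if not path.is_file()]


if ANALYSIS.is_dir():
    for sample in SAMPLES:
        missing = prepare_sample(sample)
        print(sample, "ready" if not missing else f"missing inputs: {missing}")
else:
    print(f"{ANALYSIS.resolve()} not found; check the notebook's working directory")
```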

In [None]:
print("--- 2. Starting SPAdes Assembly for DRR749571 (Control) ---")
# This step is slow: expect roughly 30-60 minutes or more.
# The Jupyter server runs inside a 'screen' session, so the job keeps running if the connection drops.

!spades.py \
  -1 ../analysis/02_fastp_trimmed/DRR749571.trimmed_1.fastq.gz \
  -2 ../analysis/02_fastp_trimmed/DRR749571.trimmed_2.fastq.gz \
  -o ../analysis/03_spades_assembly/DRR749571_assembly \
  -k 21,33,55,77,99,127 \
  --careful \
  -t 4 \
  -m 20

print("--- SPAdes Assembly for DRR749571 COMPLETE ---")