# Project 2: M. tuberculosis Genome Assembly

## 00 - Environment Setup and Data Download

* **Author:** Youssef Mimoune
* **Date:** 19/10
* **Project:** `PRJDB37589` (PAS resistance in *M. tuberculosis*)
* **Sample IDs:** `DRR749571` (control), `DRR749572` (resistant strain)

### Objective
This notebook sets up the project structure, verifies that the required bioinformatics tools are installed, downloads the raw sequence data from the DDBJ/SRA database, and converts it to FASTQ.

In [None]:
# We use '!' to run shell commands from inside Jupyter
print("--- 1. Verifying Bioinformatics Tools ---")

!fastqc --version
!multiqc --version
!fastp --version
!spades.py --version
!quast.py --version
!prokka --version
!prefetch -V  # -V (capital V) for sra-tools version
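
The version commands above print output but do not stop the notebook if a tool is missing. As a minimal sketch, the same check can be done from Python with `shutil.which`, which only tests whether an executable is on the `PATH` (it does not check versions). The names `REQUIRED_TOOLS` and `find_missing_tools` are illustrative and not part of the original notebook.

```python
import shutil

# Executables used in this project (same list as the cell above, plus fasterq-dump)
REQUIRED_TOOLS = [
    "fastqc", "multiqc", "fastp", "spades.py",
    "quast.py", "prokka", "prefetch", "fasterq-dump",
]

def find_missing_tools(tools):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = find_missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```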

In [None]:
print("\n--- 2. Verifying Project Structure ---")
!ls -lR ../

In [None]:
print("\n--- 3. Downloading SRA Data: DRR749571 (Control) ---")
# -O data/00_raw_sra  -->  Output directory
# The 'data' directory is listed in .gitignore, so it is not tracked by git.
# 'mkdir -p' creates the '00_raw_sra' sub-directory (and 'data' itself if it is missing).
!mkdir -p data/00_raw_sra

!prefetch DRR749571 -O data/00_raw_sra
print("--- Download Complete ---")

In [None]:
print("\n--- 4. Downloading SRA Data: DRR749572 (Resistant Strain) ---")
!prefetch DRR749572 -O data/00_raw_sra
print("--- Download Complete ---")
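
The "Download Complete" message is printed even if `prefetch` fails, so it can help to confirm that the files are really there. The sketch below assumes the layout used later in this notebook, `data/00_raw_sra/<accession>/<accession>.sra`. The helper name `missing_sra_files` is hypothetical.

```python
from pathlib import Path

def missing_sra_files(accessions, sra_dir="data/00_raw_sra"):
    """Return the accessions whose .sra file is absent or empty."""
    missing = []
    for acc in accessions:
        # Assumed layout: <sra_dir>/<accession>/<accession>.sra
        sra_path = Path(sra_dir) / acc / f"{acc}.sra"
        if not sra_path.is_file() or sra_path.stat().st_size == 0:
            missing.append(acc)
    return missing

not_found = missing_sra_files(["DRR749571", "DRR749572"])
print("Missing or empty:", ", ".join(not_found) if not_found else "none")
```

If an accession is reported as missing, check the `prefetch` output above before moving on to the conversion step.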

In [None]:
print("\n--- 5. Checking all downloaded files ---")
!ls -lhR data/00_raw_sra

## 6. Convert SRA to FASTQ

We now use `fasterq-dump` (part of sra-tools) to convert the compressed `.sra` files into paired-end `.fastq` files.
- We use `--split-files` to write the forward and reverse reads to two separate files (R1 and R2, named `<accession>_1.fastq` and `<accession>_2.fastq`).
- We use `-O data/01_raw_fastq` to place the results in our raw FASTQ directory.
- We use `-p` to show progress.
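
To make the options above concrete, here is a small sketch that builds the same command as a Python list and lists the output files we expect from it. The helper names `fasterq_dump_command` and `expected_fastq_files` are illustrative, and the `_1.fastq` / `_2.fastq` naming is the usual `fasterq-dump` behaviour for paired-end data, assumed here rather than verified on these samples.

```python
from pathlib import Path

def fasterq_dump_command(accession, sra_dir="data/00_raw_sra",
                         out_dir="data/01_raw_fastq"):
    """Build the fasterq-dump command for one accession as an argument list."""
    sra_path = Path(sra_dir) / accession / f"{accession}.sra"
    return ["fasterq-dump", "--split-files", "-p",
            "-O", Path(out_dir).as_posix(), sra_path.as_posix()]

def expected_fastq_files(accession, out_dir="data/01_raw_fastq"):
    """Return the R1/R2 paths we expect for a paired-end run (assumed naming)."""
    return [(Path(out_dir) / f"{accession}_{mate}.fastq").as_posix()
            for mate in (1, 2)]

print(" ".join(fasterq_dump_command("DRR749571")))
print(expected_fastq_files("DRR749571"))
```

The argument list could be passed to `subprocess.run(..., check=True)` instead of using `!`, which would raise an error if the conversion fails.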

In [None]:
print("--- Starting SRA to FASTQ conversion ---")
# Note: sra-tools can be slow. This might take a few minutes.

# Sample 1: DRR749571 (Control)
!fasterq-dump --split-files -p -O data/01_raw_fastq data/00_raw_sra/DRR749571/DRR749571.sra

# Sample 2: DRR749572 (Resistant)
!fasterq-dump --split-files -p -O data/01_raw_fastq data/00_raw_sra/DRR749572/DRR749572.sra

print("--- Conversion Complete ---")

In [None]:
print("\n--- 7. Final Verification of FASTQ Files ---")
!ls -lh data/01_raw_fastq
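
Listing the files shows their sizes but not whether R1 and R2 are consistent. As a rough extra check, we can count the reads in each file and compare the two mates. This sketch assumes plain, uncompressed 4-line FASTQ records and the `<accession>_1.fastq` / `<accession>_2.fastq` naming. The helpers `count_fastq_reads` and `paired_read_counts` are hypothetical names.

```python
from pathlib import Path

def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming 4 lines per record."""
    with open(path, "rb") as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4

def paired_read_counts(accession, fastq_dir="data/01_raw_fastq"):
    """Return (R1 count, R2 count), or None if either file is missing."""
    r1 = Path(fastq_dir) / f"{accession}_1.fastq"
    r2 = Path(fastq_dir) / f"{accession}_2.fastq"
    if not (r1.is_file() and r2.is_file()):
        return None
    return count_fastq_reads(r1), count_fastq_reads(r2)

for acc in ["DRR749571", "DRR749572"]:
    counts = paired_read_counts(acc)
    if counts is None:
        print(f"{acc}: FASTQ files not found")
    else:
        status = "OK" if counts[0] == counts[1] else "MISMATCH"
        print(f"{acc}: R1={counts[0]} R2={counts[1]} [{status}]")
```

For properly paired data the two counts should be equal. A mismatch suggests the conversion was interrupted or the run is not strictly paired.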