# Project 2: M. tuberculosis Genome Assembly
## 05 - Comparative Analysis: Unbiased Mutation Discovery

* **Author:** Youssef Mimoune
* **Date:** 26-Oct-2025

### Objective
This notebook performs an **unbiased (discovery-driven)** comparison to find all protein-level differences between our two strains, without assuming in advance which gene is responsible for resistance.

We want to answer the question: "What are *all* the protein differences between the control strain (DRR749571) and the PAS-resistant strain (DRR749572)?"

### Methodology
We will use the Linux `diff` tool to compare the two complete proteome files (`.faa`) generated by Prokka. This will show every amino acid substitution or insertion/deletion (indel) across all ~4000 predicted proteins.
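
The two annotations in this project use different locus-tag prefixes (`IBGOOOGP_` and `JCLEOGND_`), so a line-by-line `diff` reports every header as changed, and one indel shifts the line wrapping of the rest of that record. As a minimal sketch of a complementary check, the proteomes can be compared as sets of complete sequences, so that only proteins with no identical counterpart in the other strain are listed. The helper names `read_fasta` and `unique_sequences` are hypothetical, and the sequences are made-up toy data standing in for the `.faa` files.

```python
from io import StringIO


def read_fasta(handle):
    """Return {header: sequence}, joining wrapped sequence lines."""
    records, header, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def unique_sequences(a, b):
    """Headers in `a` whose full sequence has no identical match in `b`."""
    seqs_b = set(b.values())
    return sorted(h for h, s in a.items() if s not in seqs_b)


# Toy data (made up): protein A is identical but wrapped differently,
# protein B differs by one residue.
control = read_fasta(StringIO(
    ">CTRL_00001 protein A\nMKT\nAYI\n>CTRL_00002 protein B\nMSEQ\n"))
resistant = read_fasta(StringIO(
    ">RES_00001 protein A\nMKTAYI\n>RES_00002 protein B\nMSEK\n"))

only_control = unique_sequences(control, resistant)
only_resistant = unique_sequences(resistant, control)
print(only_control)
print(only_resistant)
```

With the real files, each dictionary would be built by passing `open(path)` for the corresponding `.faa` file to `read_fasta`. The two lists then give the candidate proteins to inspect in detail.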

In [None]:
print("--- 1. Creating directory for differential analysis ---")
!mkdir -p ../analysis/06_diff_results

print("Directory created.")
!ls -l ../analysis/

In [None]:
print("--- 1. Testing Hypothesis 1: Checking gene 'folC' ---")
print("--- 'folC' name not found in .faa. Searching in GFF for 'folate' instead... ---")

print("\n--- Control Sample (DRR749571) ---")
# Search the .gff file for the function "folate"
!grep "folate" ../analysis/05_prokka_annotation/DRR749571_annotation/DRR749571_control.gff

print("\n--- Resistant Sample (DRR749572) ---")
!grep "folate" ../analysis/05_prokka_annotation/DRR749572_annotation/DRR749572_resistant.gff
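
The `grep` output above is a set of raw GFF lines from which the locus tag has to be read by eye. A small sketch of the same keyword search in Python, returning the locus tag, gene name and product directly, is shown here. It assumes the standard nine-column, tab-separated GFF3 layout with `key=value` attributes; the function name `find_features` and the example lines are hypothetical, not taken from the project files.

```python
def find_features(gff_lines, keyword):
    """Return (locus_tag, gene, product) for CDS rows mentioning keyword."""
    hits = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # contig sequences follow this marker; no more features
        if line.startswith("#"):
            continue  # skip header and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if keyword.lower() in cols[8].lower():
            hits.append((attrs.get("locus_tag"),
                         attrs.get("gene"),
                         attrs.get("product")))
    return hits


# Made-up GFF lines for illustration only.
gff = [
    "##gff-version 3\n",
    "contig_1\tProdigal:2.6\tCDS\t100\t1500\t.\t+\t0\t"
    "ID=DEMO_00001;gene=folC;locus_tag=DEMO_00001;product=Dihydrofolate synthase\n",
    "contig_1\tProdigal:2.6\tCDS\t1600\t2400\t.\t-\t0\t"
    "ID=DEMO_00002;locus_tag=DEMO_00002;product=hypothetical protein\n",
    "##FASTA\n",
    ">contig_1\n",
    "ATGC\n",
]

hits = find_features(gff, "folate")
print(hits)
```

Unlike the `grep` call, this match is case-insensitive, so it may return more rows than the shell command on the same file.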

In [None]:
print("--- 1. Testing Hypothesis 1: Extracting 'folC' using its Locus Tag ---")
print("\n--- Control Sample (DRR749571) 'folC' Protein Sequence: ---")

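# Search the .faa for the specific locus tag found in the GFF.
# Note: -A 10 prints the header plus the next 10 lines, so the output may run into the following record or cut off a long protein.
!grep -A 10 "IBGOOOGP_02999" ../analysis/05_prokka_annotation/DRR749571_annotation/DRR749571_control.faa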
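
To pull exactly one complete record regardless of its length, the FASTA file can be read record by record instead of by a fixed line count. The following is a minimal sketch; `fetch_record` is a hypothetical helper name and the sequences are made-up toy data.

```python
from io import StringIO


def fetch_record(handle, locus_tag):
    """Return (header, sequence) for the record whose ID is locus_tag, else None."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                break  # next record reached; the target is complete
            fields = line[1:].split(None, 1)
            # The record ID is the first whitespace-delimited token.
            if fields and fields[0] == locus_tag:
                header = line[1:]
        elif header is not None and line:
            chunks.append(line)
    if header is None:
        return None
    return header, "".join(chunks)


# Toy FASTA text (made up) standing in for a .faa file.
faa_text = (">DEMO_00001 first protein\nMKTAY\nIAKQR\n"
            ">DEMO_00002 second protein\nMSEQ\n")

first = fetch_record(StringIO(faa_text), "DEMO_00001")
second = fetch_record(StringIO(faa_text), "DEMO_00002")
missing = fetch_record(StringIO(faa_text), "DEMO_99999")
print(first)
print(second)
print(missing)
```

For the real data, the handle would be `open(...)` on the `.faa` path used in the cell above, with the locus tag taken from the GFF search.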

In [None]:
print("\n--- Resistant Sample (DRR749572) 'folC' Protein Sequence: ---")

# We use the specific Locus Tag for *this* sample's annotation
!grep -A 10 "JCLEOGND_02856" ../analysis/05_prokka_annotation/DRR749572_annotation/DRR749572_resistant.faa
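
Comparing the two printed sequences by eye is error-prone. As a sketch of how a hypothesis could be checked programmatically, the function below lists amino acid substitutions between two proteins of equal length in the usual `T3A` style (reference residue, 1-based position, new residue). The name `list_substitutions` and the example sequences are hypothetical. Sequences of different lengths suggest an indel and would need a proper alignment, which this sketch does not attempt.

```python
def list_substitutions(ref, alt):
    """Return substitutions such as 'T3A' between two equal-length proteins."""
    if len(ref) != len(alt):
        raise ValueError(
            "lengths differ; an indel is likely, so align the sequences first")
    return [f"{r}{i}{a}"
            for i, (r, a) in enumerate(zip(ref, alt), start=1)
            if r != a]


# Made-up sequences for illustration only.
control_seq = "MKTAYIAK"
resistant_seq = "MKAAYIVK"

changes = list_substitutions(control_seq, resistant_seq)
identical = list_substitutions(control_seq, control_seq)
print(changes)
print(identical)
```

An empty list means the two proteins are identical, which corresponds to a failed hypothesis for that gene.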

In [None]:
print("--- 2. Hypothesis 1 FAILED. Testing Hypothesis 2: Checking gene 'thyA' ---")
print("--- Searching GFF for 'thyA' Locus Tags... ---")

print("\n--- Control Sample (DRR749571) ---")
!grep "thyA" ../analysis/05_prokka_annotation/DRR749571_annotation/DRR749571_control.gff

print("\n--- Resistant Sample (DRR749572) ---")
!grep "thyA" ../analysis/05_prokka_annotation/DRR749572_annotation/DRR749572_resistant.gff

In [None]:
print("--- 3. Extracting 'thyA' (Hypothesis 2) using its Locus Tag ---")
print("\n--- Control Sample (DRR749571) 'thyA' Protein Sequence: ---")

# We are searching for the specific Locus Tag for thyA
!grep -A 10 "IBGOOOGP_02304" ../analysis/05_prokka_annotation/DRR749571_annotation/DRR749571_control.faa

In [None]:
print("\n--- Resistant Sample (DRR749572) 'thyA' Protein Sequence: ---")

!grep -A 10 "JCLEOGND_02130" ../analysis/05_prokka_annotation/DRR749572_annotation/DRR749572_resistant.faa

In [None]:
print("--- 3. H1 & H2 FAILED. Testing Hypothesis 3: Checking gene 'ribD' ---")
print("--- Searching GFF for 'ribD' Locus Tags... ---")

print("\n--- Control Sample (DRR749571) ---")
!grep "ribD" ../analysis/05_prokka_annotation/DRR749571_annotation/DRR749571_control.gff

print("\n--- Resistant Sample (DRR749572) ---")
!grep "ribD" ../analysis/05_prokka_annotation/DRR749572_annotation/DRR749572_resistant.gff

In [None]:
print("--- 4. Extracting 'ribD' (Hypothesis 3) using its Locus Tag ---")
print("\n--- Control Sample (DRR749571) 'ribD' Protein Sequence: ---")

# We are searching for the specific Locus Tag for ribD
!grep -A 10 "IBGOOOGP_00101" ../analysis/05_prokka_annotation/DRR749571_annotation/DRR749571_control.faa

In [None]:
print("\n--- Resistant Sample (DRR749572) 'ribD' Protein Sequence: ---")

!grep -A 10 "JCLEOGND_00159" ../analysis/05_prokka_annotation/DRR749572_annotation/DRR749572_resistant.faa