## Title & Introduction

- Brief project description
- BioProject ID and patient selection
- Gene of interest (e.g., KRAS)
- References (link to the paper)

## Get Conda environment ready

* Make sure your Conda environment is active
```bash
conda activate ctdna-analysis
```
The Conda environment file is available [here]().

1. **Data acquisition**
   - FASTQ download from NCBI SRA
   - Verification of sample metadata and timepoints

   ```bash
   sudo apt update
   sudo apt install -y sra-toolkit ncbi-entrez-direct
   ```

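```python
import pandas as pd

# Load the SRA run table exported for the BioProject
df = pd.read_csv("PRJNA714799_runinfo.csv")

# Count distinct values per column to see which fields vary across runs
unique_counts = df.nunique().sort_values(ascending=False)

print(unique_counts)
```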
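
Once the run table is loaded, the runs for the selected patient can be turned into download commands. The sketch below is one possible approach, not the project's confirmed method. It uses only the standard library and an inline stand-in for the run table. The accessions, sample names, output directory, and the helper name `build_download_commands` are hypothetical. `Run` and `SampleName` are assumed to be columns in the exported CSV and should be checked against the real file.

```python
import csv
import io
import shlex

# Inline stand-in for the run table; accessions and sample names are made up.
RUNINFO_CSV = """Run,SampleName,LibraryLayout
SRR0000001,patientA_pre,PAIRED
SRR0000002,patientA_post,PAIRED
SRR0000003,patientB_pre,PAIRED
"""


def build_download_commands(runinfo_text, sample_prefix, outdir="data/fastq"):
    """Return one fasterq-dump command per run whose SampleName starts with sample_prefix."""
    commands = []
    for row in csv.DictReader(io.StringIO(runinfo_text)):
        if row["SampleName"].startswith(sample_prefix):
            # --split-files writes paired reads to separate FASTQ files
            args = ["fasterq-dump", "--split-files", "--outdir", outdir, row["Run"]]
            commands.append(shlex.join(args))
    return commands


commands = build_download_commands(RUNINFO_CSV, "patientA")
for cmd in commands:
    print(cmd)  # dry run: print the commands instead of executing them
```

Printing the commands first makes it easy to confirm the pre- and post-treatment runs before downloading anything.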

2. **Quality control**
   - Read quality assessment
   - Adapter and quality trimming (if necessary)

   Show FastQC summary plots

Markdown: explain why QC is important
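
To make the QC step concrete, the sketch below decodes Phred+33 quality strings and flags reads whose mean quality falls below a threshold. It is a minimal illustration and not a replacement for FastQC. It assumes well-formed four-line FASTQ records. The two records, the threshold of 30, and the function names are made up for the example.

```python
import io

# Two made-up FASTQ records; real input would come from the downloaded files.
FASTQ_TEXT = """@read1
ACGT
+
IIII
@read2
ACGT
+
##II
"""


def mean_read_qualities(handle, offset=33):
    """Return (read_id, mean Phred score) for each 4-line FASTQ record."""
    results = []
    while True:
        header = handle.readline().strip()
        if not header:
            break
        handle.readline()  # sequence line (not needed for quality)
        handle.readline()  # '+' separator line
        quals = handle.readline().strip()
        # Phred+33 encoding: score = ASCII code of the character minus 33
        scores = [ord(ch) - offset for ch in quals]
        results.append((header[1:], sum(scores) / len(scores)))
    return results


def low_quality_reads(qualities, min_mean_q=30):
    """Return the IDs of reads whose mean quality is below min_mean_q."""
    return [read_id for read_id, mean_q in qualities if mean_q < min_mean_q]


qualities = mean_read_qualities(io.StringIO(FASTQ_TEXT))
flagged = low_quality_reads(qualities)
print(qualities)
print(flagged)
```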

3. **Alignment**
   - Alignment to GRCh38 reference genome
   - Sorting, indexing, and duplicate handling

   Example using BWA MEM:

   Repeat for post-treatment sample

Markdown: describe why alignment and sorting are needed
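
The alignment step for both timepoints could be scripted along the following lines. This is a sketch under stated assumptions: the sample labels, FASTQ file names, reference file name `GRCh38.fa`, and the helper `alignment_commands` are hypothetical, and the reference is assumed to have been indexed with `bwa index` beforehand. The commands are only built and printed, not executed, and duplicate handling is left out.

```python
import shlex


def alignment_commands(sample, r1, r2, reference="GRCh38.fa", threads=4):
    """Return shell command strings that align, sort, and index one paired-end sample."""
    # Read group with a literal backslash-t, which bwa mem expands to a tab
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    bam = f"{sample}.sorted.bam"
    align = shlex.join(
        ["bwa", "mem", "-t", str(threads), "-R", read_group, reference, r1, r2]
    )
    # samtools sort reads the alignments from stdin ("-") and writes a sorted BAM
    sort = shlex.join(["samtools", "sort", "-@", str(threads), "-o", bam, "-"])
    index = shlex.join(["samtools", "index", bam])
    return [f"{align} | {sort}", index]


# Same recipe for the pre- and post-treatment samples (file names are placeholders)
plan = {
    sample: alignment_commands(sample, f"{sample}_1.fastq", f"{sample}_2.fastq")
    for sample in ("pre", "post")
}
for sample, cmds in plan.items():
    for cmd in cmds:
        print(cmd)
```

Sorting by coordinate is what allows `samtools index` to build an index, which downstream tools need in order to fetch reads for a specific region such as KRAS.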

4. **Targeted variant calling**
   - Variant calling optimized for low-allele-fraction ctDNA
   - Focus on KRAS coding regions
   - Filtering for clinically relevant variants

   Goal: detect mutations in KRAS

Example using bcftools:

Or GATK Mutect2 if preferred (HaplotypeCaller is designed for germline calling and is not tuned for low allele fractions)

Markdown: explain how variants are called and filtered
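
Whichever caller is used, filtering for low-allele-fraction variants usually comes down to allele depths. The sketch below computes the variant allele fraction (VAF) as alt reads divided by ref plus alt reads, then applies simple depth, alt-read, and VAF thresholds. The thresholds, the input records, and the function names are illustrative assumptions, not values from the study.

```python
def vaf_from_depths(ref_depth, alt_depth):
    """Variant allele fraction: alt reads / (ref reads + alt reads)."""
    total = ref_depth + alt_depth
    return alt_depth / total if total else 0.0


def filter_calls(calls, min_depth=100, min_alt_reads=5, min_vaf=0.005):
    """Keep calls that pass the depth, alt-read, and VAF thresholds (all illustrative)."""
    kept = []
    for call in calls:
        depth = call["ref_depth"] + call["alt_depth"]
        vaf = vaf_from_depths(call["ref_depth"], call["alt_depth"])
        if depth >= min_depth and call["alt_depth"] >= min_alt_reads and vaf >= min_vaf:
            kept.append({**call, "vaf": vaf})
    return kept


# Made-up allele depths for three candidate variants
calls = [
    {"id": "var1", "ref_depth": 1980, "alt_depth": 20},  # passes all thresholds
    {"id": "var2", "ref_depth": 998, "alt_depth": 2},    # too few alt reads
    {"id": "var3", "ref_depth": 40, "alt_depth": 10},    # total depth too low
]

passing = filter_calls(calls)
for call in passing:
    print(call["id"], round(call["vaf"], 4))
```

Requiring a minimum number of supporting reads in addition to a minimum VAF helps separate real low-frequency variants from sequencing errors at deeply covered sites.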

5. **Variant annotation**
   - Functional and clinical annotation
   - Interpretation in the context of anti-EGFR resistance

   Goal: annotate variants with SnpEff

   Markdown: explain the clinical significance of detected mutations
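
SnpEff writes its annotations into the `ANN` INFO field as pipe-separated values, with multiple annotations separated by commas. The sketch below splits one `ANN` value into named fields and pulls the codon number out of the protein change. The example string, its placeholder IDs, and the set of codons to flag are illustrative assumptions. The real codons of interest should come from the paper and the clinical literature.

```python
import re

# Field order of the ANN annotation format
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank", "HGVS.c",
    "HGVS.p", "cDNA.pos/cDNA.length", "CDS.pos/CDS.length",
    "AA.pos/AA.length", "Distance", "ERRORS/WARNINGS/INFO",
]


def parse_ann(ann_value):
    """Split an ANN value into one dict per annotation."""
    records = []
    for entry in ann_value.split(","):
        parts = [part.strip() for part in entry.split("|")]
        records.append(dict(zip(ANN_FIELDS, parts)))
    return records


def protein_codon(hgvs_p):
    """Return the codon number from a string such as 'p.Gly12Asp', or None."""
    match = re.match(r"p\.[A-Za-z]{3}(\d+)", hgvs_p)
    return int(match.group(1)) if match else None


# Illustrative ANN value; the gene ID, transcript ID, and rank are placeholders.
example_ann = (
    "T|missense_variant|MODERATE|KRAS|GENE_ID|transcript|TRANSCRIPT_ID|"
    "protein_coding|2/6|c.35G>A|p.Gly12Asp|||||"
)
codons_of_interest = {12, 13}  # assumption for the example only

records = parse_ann(example_ann)
flagged = [
    rec for rec in records
    if rec["Gene_Name"] == "KRAS" and protein_codon(rec["HGVS.p"]) in codons_of_interest
]
for rec in flagged:
    print(rec["Gene_Name"], rec["HGVS.p"], rec["Annotation_Impact"])
```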


6. **Comparative analysis**
   - Pre-treatment vs post-treatment variant comparison
   - Allele frequency changes over time

   Results visualization

Use matplotlib / seaborn for:

- Variant allele frequency bar plots
- QC read quality summaries
- Pre vs post-treatment comparison

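```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load per-variant allele frequencies (one row per mutation and timepoint)
df = pd.read_csv("variant_summary.csv")

# Grouped bar plot: VAF per mutation, split by timepoint
sns.barplot(data=df, x="Mutation", y="VAF", hue="Timepoint")
plt.tight_layout()
plt.savefig("../results/figures/variants.png")
```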
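
Alongside the plot, the allele frequency change per mutation can be tabulated directly. The sketch below is a minimal standard-library version that uses the same `Mutation`, `Timepoint`, and `VAF` columns as the plotting code. The timepoint labels `pre` and `post`, the inline values, and the function name `vaf_changes` are assumptions for illustration.

```python
import csv
import io

# Inline stand-in for variant_summary.csv; the values are made up.
SUMMARY_CSV = """Mutation,Timepoint,VAF
mut_A,pre,0.00
mut_A,post,0.04
mut_B,pre,0.01
mut_B,post,0.03
mut_C,post,0.02
"""


def vaf_changes(summary_text):
    """Return {mutation: post VAF minus pre VAF} for mutations seen at both timepoints."""
    vafs = {}
    for row in csv.DictReader(io.StringIO(summary_text)):
        vafs.setdefault(row["Mutation"], {})[row["Timepoint"]] = float(row["VAF"])
    return {
        mutation: round(by_time["post"] - by_time["pre"], 6)
        for mutation, by_time in vafs.items()
        if "pre" in by_time and "post" in by_time
    }


changes = vaf_changes(SUMMARY_CSV)
for mutation, delta in changes.items():
    print(mutation, delta)
```

Mutations observed at only one timepoint are skipped here. Depending on the question, it may be more informative to treat a missing timepoint as a VAF of zero, provided coverage at that site was adequate.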

7. **Visualization & interpretation**
   - Clear plots summarizing variant dynamics
   - Biological interpretation grounded in the literature

- Highlight clinically relevant mutations
- Relate findings to the paper
- Optional: add vision board for presentation/marketing

---

## Key Findings (Summary)
- Identification of KRAS variants consistent with known resistance mechanisms to anti-EGFR therapy
- Evidence of variant emergence or allele frequency shifts during treatment
- Results align with observations reported in the original study

> Detailed results and figures are available in the Jupyter notebook.