# W-ASAP Workshop: Exploring SARS-CoV-2 Wastewater Data

**Welcome!** This notebook will help you explore real wastewater surveillance data from Switzerland.

In the next 30 minutes, you'll learn how to:
- Explore which variants are circulating
- Download real sequence alignments
- Query data programmatically to answer your own questions

Let's get started!

## Setup: Import Required Libraries

We'll use standard Python libraries that are pre-installed in Google Colab:
- `requests`: for making API calls
- `pandas`: for data manipulation
- `matplotlib`: for visualization

In [1]:
# Import libraries
import requests
import pandas as pd
import matplotlib.pyplot as plt
import json
from datetime import datetime

# Configure matplotlib for better-looking plots
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

print("✓ Setup complete! Ready to explore wastewater data.")

✓ Setup complete! Ready to explore wastewater data.


---
## Section 1: GenSpectrum/Swiss-Wastewater Interface

### 🔍 Explore which variants are circulating

Before diving into code, let's explore the data visually using the GenSpectrum interface.

**👉 Open this link:** [https://genspectrum.org/swiss-wastewater/covid](https://genspectrum.org/swiss-wastewater/covid)

**What you can do:**
- Browse the variant landscape over time
- Check mutation frequencies at specific genomic positions
- Validate primers against current sequences
- Diagnose why dPCR assays might be underperforming

*Take a few minutes to explore. We'll work through two exercises below.*

---

### 📝 Exercise 1: Manual Mode - Finding Nucleotide Mutations

**Your Task:**  
Which nucleotide mutations were observed for SARS-CoV-2 in **Zürich** during **November 2025** with a mean proportion between **15% and 30%**?  
Which of them lies at the **lowest genomic reference position**?

**Instructions:**
1. Open the [GenSpectrum Swiss Wastewater interface](https://genspectrum.org/swiss-wastewater/covid)
2. Set the following filters:
   - **Location:** Zürich (ZH)
   - **Date Range:** November 1-30, 2025
   - **Analysis Mode:** Manual
   - **Sequence Type:** Nucleotide
3. Look for mutations with mean proportions between 15-30%
4. Identify the mutation at the lowest genomic position

<details>
<summary><b>👁️ Click to reveal answer</b></summary>

### Answer: **A405G**

The mutation **A405G** (position 405) has a mean proportion within the target range.

**🔗 Direct link to the query:**  
[View in GenSpectrum](https://genspectrum.org/swiss-wastewater/covid?locationName=Z%C3%BCrich+%28ZH%29&samplingDate=2025-11-01--2025-11-30&granularity=day&analysisMode=manual&sequenceType=nucleotide&)

**📊 Visualization:**  
![Answer 1 - Nucleotide mutations in Zürich](https://raw.githubusercontent.com/gordonkoehn/wasap-workshop/main/answers/answer01.png)

</details>

---

### 📝 Exercise 2: Variant Explorer - Tracking BA.3.2*

**Your Task:**  
Is **BA.3.2*** present in **Basel** during **November-December 2025**?  
What are the signature mutations and their prevalence?

**Instructions:**
1. Switch to **Variant Explorer** mode
2. Set the following filters:
   - **Location:** Basel (BS)
   - **Date Range:** November 1 - December 31, 2025
   - **Analysis Mode:** Variant
   - **Sequence Type:** Nucleotide
   - **Variant:** BA.3.2*
   - **Min Jaccard Index:** 0.5 (to be more lenient with signature mutations)
3. Look for signature mutations of this variant

<details>
<summary><b>👁️ Click to reveal answer - Nucleotide Mutations</b></summary>

### Answer: Yes, BA.3.2* is present!

**Key signature mutations:**
- **T22054A**
- **C23625A**

**Prevalence:** ~14% on November 23rd

**🔗 Direct link to the query:**  
[View in GenSpectrum](https://genspectrum.org/swiss-wastewater/covid?locationName=Basel+%28BS%29&samplingDate=2025-11-01--2025-12-31&granularity=day&analysisMode=variant&sequenceType=nucleotide&variant=BA.3.2*&minProportion=0.8&minCount=15&minJaccard=0.5&)

**📊 Visualization:**  
![Answer 2 - BA.3.2* nucleotide mutations in Basel](https://raw.githubusercontent.com/gordonkoehn/wasap-workshop/main/answers/answer02.png)

</details>

---

#### 🎯 Bonus: Link to Clinical Sequences

**Try this:**  
Click on the mutation **T22054A** (it should be underlined/clickable) to link out to CovSpectrum.

**Question:** In which clinical sequences is this mutation found? What lineages of BA.3.2* contain it?

<details>
<summary><b>👁️ Click to reveal bonus answer</b></summary>

### Bonus Answer:

**🔗 Link:** [CovSpectrum T22054A](https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?nucMutations=T22054A)

**Found in lineage:** BA.3.2.2 (2.86%)

This shows you how wastewater data connects to clinical sequencing databases!

</details>

---

#### 🧬 Now Try With Amino Acids

**Your Task:**  
Repeat the same query as above, but switch **Sequence Type** to **Amino Acid**.  
Which amino acid changes have been seen?

**Hint:** You can filter by gene in the plot.

<details>
<summary><b>👁️ Click to reveal answer - Amino Acid Mutations</b></summary>

### Answer:

**Key amino acid changes:**
- **ORF1a:I3944L**
- **S:K795T**
- **S:A852K**

**🔗 Direct link to the query:**  
[View in GenSpectrum](https://genspectrum.org/swiss-wastewater/covid?locationName=Basel+%28BS%29&samplingDate=2025-11-01--2025-12-31&granularity=day&analysisMode=variant&sequenceType=amino+acid&variant=BA.3.2*&minProportion=0.8&minCount=15&minJaccard=0.5&)

**📊 Visualization:**  
![Answer 4 - BA.3.2* amino acid mutations in Basel](https://raw.githubusercontent.com/gordonkoehn/wasap-workshop/main/answers/answer04.png)

**Note:** These amino acid changes in the Spike protein (S:) could affect antibody binding or viral fitness!

</details>

---

### ✅ Section 1 Complete!

**What you've learned:**
- How to query nucleotide and amino acid mutations
- How to use Manual Mode vs Variant Explorer
- How to track specific lineages over time and location
- How wastewater data connects to clinical sequences (CovSpectrum)

**Key takeaway:** The GenSpectrum interface lets you explore real wastewater data without writing any code, which makes it well suited to quick checks and hypothesis generation!

---
## Section 2: W-ASAP Loculus for Downloading Alignments

### Get your hands on real sequences

**What is Loculus?**  
Loculus is the database backend that stores all the wastewater sequencing data. You can download alignments, inspect metadata, and access raw data, usually from the day after processing.

**👉 Open this link:** [https://db.wasap.genspectrum.org/](https://db.wasap.genspectrum.org/)

**How to download alignment data:**
1. Browse available samples by location and date
2. Filter by your region or time period of interest
3. Download the alignment files (FASTA format)
4. Use these alignments in your own bioinformatics pipelines!
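
Once an alignment file is on disk, a few lines of plain Python are enough for a first look. The block below is a minimal sketch, not part of Loculus: the helper names `parse_fasta` and `coverage_fraction` are made up for this example, and the sequences are toy values rather than real W-ASAP data.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dict."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


def coverage_fraction(sequence: str) -> float:
    """Fraction of positions that are called bases (neither gap '-' nor 'N')."""
    if not sequence:
        return 0.0
    called = sum(1 for base in sequence.upper() if base not in "-N")
    return called / len(sequence)


# Toy example with made-up sequences (not real W-ASAP data)
example = [">read_1", "ACGTNN", "--AC", ">read_2", "ACGT"]
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), round(coverage_fraction(seq), 2))
```

For a downloaded file, pass the open file handle instead of the list, for example `with open("alignment.fasta") as handle: records = parse_fasta(handle)`, where `alignment.fasta` is a placeholder file name.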

**Where to find sequences:**
- Latest uploads appear within 24 hours of processing
- Metadata includes: collection date, location, sequencing coverage
- You can trace back to raw reads if needed

*Explore the interface and download a sample if you'd like!*

---
## Section 3: LAPIS + Loculus APIs for Programmatic Queries

Now let's write some code! The following cells show you how to:
- Query variant frequencies over time
- Check primer coverage
- Fetch sample metadata

All queries use **live APIs**, so you are working with real, up-to-date data.

### Cell 3a: LAPIS Query – Count a Set of Mutations Over Time

**Goal:** Visualize how specific mutations or lineages change over time in Swiss wastewater.

**What this does:**
- Queries the LAPIS API for read counts by lineage
- Groups data by time period
- Visualizes as a heatmap

**How to modify:**
- Change the lineage filter (e.g., `BA.2.86`, `XBB.1.5`)
- Adjust the time range
- Query specific mutations instead of lineages

In [None]:
# Cell 3a: Count mutations over time
# TODO: Add LAPIS query code here
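
The placeholder cell can be filled in along the following lines. This is a hedged sketch, not the workshop's official solution. The base URL `LAPIS_URL` is an assumption. The filter names `locationName`, `samplingDateFrom` and `samplingDateTo` are inferred from the GenSpectrum link parameters and the usual LAPIS naming pattern. The response is assumed to follow the LAPIS `{"data": [...]}` layout with a `count` per group. The counts below are invented so the example runs offline.

```python
from urllib.parse import urlencode

# Assumption: base URL of the W-ASAP LAPIS instance. Verify before use.
LAPIS_URL = "https://lapis.wasap.genspectrum.org"


def aggregated_url(mutation, location, date_from, date_to):
    """Build a /sample/aggregated URL that counts reads with one mutation per day."""
    params = {
        "nucleotideMutations": mutation,  # e.g. "T22054A"
        "locationName": location,         # assumed field name
        "samplingDateFrom": date_from,    # assumed field name
        "samplingDateTo": date_to,        # assumed field name
        "fields": "samplingDate",         # group the counts by sampling date
    }
    return f"{LAPIS_URL}/sample/aggregated?{urlencode(params)}"


def counts_by_date(payload):
    """Turn an aggregated response ({"data": [...]}) into {date: count}."""
    return {row["samplingDate"]: row["count"] for row in payload.get("data", [])}


def build_table(payloads):
    """Combine per-mutation responses into {mutation: {date: count}}."""
    return {mutation: counts_by_date(payload) for mutation, payload in payloads.items()}


# Made-up responses in the assumed shape, for illustration only
fake_payloads = {
    "T22054A": {"data": [{"samplingDate": "2025-11-22", "count": 3},
                         {"samplingDate": "2025-11-23", "count": 7}]},
    "C23625A": {"data": [{"samplingDate": "2025-11-23", "count": 5}]},
}
table = build_table(fake_payloads)
print(aggregated_url("T22054A", "Basel (BS)", "2025-11-01", "2025-12-31"))
print(table)
```

With network access, `requests.get(url, timeout=30).json()` would supply each payload in place of the invented ones. From there, `pd.DataFrame(table).T.fillna(0)` gives a mutations-by-date frame that `plt.imshow` can draw as a heatmap.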

### Cell 3b: LAPIS Query – Primer Coverage

**Goal:** Check if your primers are still binding to current sequences.

**Use case:**  
Your dPCR assay stopped working? Let's see if mutations appeared in the primer binding sites!

**What this does:**
- Queries reads with mutations at specific genomic positions
- Shows which primers are affected
- Calculates % of reads with problematic mutations

**How to modify:**
- Update primer coordinates for your assay (N1, N2, E gene, etc.)
- Add multiple primer sets to compare

In [None]:
# Cell 3b: Check primer coverage
# TODO: Add primer coverage query code here
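
One possible shape for the primer check is sketched below, under stated assumptions. The primer coordinates are hypothetical placeholders, not those of any real assay. The input rows are invented and only assumed to resemble a LAPIS mutation list with `mutation` and `proportion` keys. The 5% threshold is an arbitrary illustration.

```python
import re

# Hypothetical primer binding sites as (start, end), 1-based and inclusive.
# Replace these with the real coordinates of your assay.
PRIMERS = {"demo_forward": (100, 120), "demo_reverse": (180, 200)}

# Matches strings such as 'A405G' or 'A405-' (deletion)
MUTATION_PATTERN = re.compile(r"^([ACGTN-])(\d+)([ACGTN-])$")


def mutation_position(mutation):
    """Extract the genomic position from a string such as 'A405G'."""
    match = MUTATION_PATTERN.match(mutation)
    if match is None:
        raise ValueError(f"Unrecognised mutation format: {mutation!r}")
    return int(match.group(2))


def primer_hits(mutations, primers, min_proportion=0.05):
    """List mutations inside a primer site at or above min_proportion."""
    hits = []
    for row in mutations:
        position = mutation_position(row["mutation"])
        for name, (start, end) in primers.items():
            if start <= position <= end and row["proportion"] >= min_proportion:
                hits.append((name, row["mutation"], row["proportion"]))
    return hits


# Made-up rows, for illustration only
rows = [
    {"mutation": "C110T", "proportion": 0.40},  # inside demo_forward
    {"mutation": "A150G", "proportion": 0.90},  # outside both sites
    {"mutation": "G195A", "proportion": 0.01},  # inside demo_reverse, below threshold
]
print(primer_hits(rows, PRIMERS))
```

Keeping the primer sites in a dictionary makes it easy to add several primer sets and compare them in a single pass.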

### Cell 3c: Loculus API Query – Metadata Inspection

**Goal:** Fetch sample metadata to understand data quality and availability.

**What this does:**
- Queries Loculus for sample metadata
- Shows: collection dates, locations, sequencing coverage stats
- Helps you find samples relevant to your research

**How to modify:**
- Filter by specific locations (cantons, treatment plants)
- Adjust date ranges
- Add quality filters (e.g., minimum coverage)

In [None]:
# Cell 3c: Fetch metadata from Loculus
# TODO: Add Loculus metadata query code here
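
After metadata records have been fetched, a per-location summary is a quick way to see what is available. The sketch below assumes each record is a dictionary with `locationName` and `samplingDate` keys (names inferred from the GenSpectrum link parameters, not confirmed against the Loculus schema) and uses invented records. The helper `summarise_metadata` is hypothetical.

```python
from collections import defaultdict


def summarise_metadata(records):
    """Summarise sample metadata per location: sample count and date range."""
    by_location = defaultdict(list)
    for record in records:
        by_location[record["locationName"]].append(record["samplingDate"])
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings
    return {
        location: {"samples": len(dates), "first": min(dates), "last": max(dates)}
        for location, dates in by_location.items()
    }


# Made-up records with assumed field names, for illustration only
records = [
    {"locationName": "Zürich (ZH)", "samplingDate": "2025-11-03"},
    {"locationName": "Zürich (ZH)", "samplingDate": "2025-11-10"},
    {"locationName": "Basel (BS)", "samplingDate": "2025-11-05"},
]
summary = summarise_metadata(records)
for location, stats in summary.items():
    print(location, stats)
```

The same summary can feed a location or date filter: keep only the locations whose date range overlaps your study period before downloading any sequences.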

---
## Next Steps

**You now have the tools to:**
- Explore variant dynamics in your region
- Validate your primers against current sequences
- Access fresh data the day after processing

**Questions?** Ask during the Q&A!

**Want to go deeper?**
- Modify the queries above to answer your specific questions
- Combine LAPIS + Loculus data for richer analysis
- Integrate this data into your own workflows

Happy exploring! 🦠💧