# For Loops in Python – Tekin Analyzes Wastewater Data

A `for` loop allows us to iterate over a sequence (e.g., a string, list, or tuple) and execute a block of code multiple times. This is incredibly useful for analyzing large datasets, which is often the case in wastewater surveillance.

### Why Use `for` Loops in Wastewater Surveillance?

Because a `for` loop visits every element of a sequence in turn, it lets us process data without writing repetitive code. In wastewater surveillance, `for` loops are useful for tasks like:

- Iterating through multiple samples or time points
- Checking if viral loads exceed thresholds in a large number of samples
- Analyzing sequences of nucleotides or pathogens
- Processing large datasets of environmental data, such as reading multiple sequences from a file

By automating repetitive tasks, `for` loops help Tekin analyze wastewater surveillance data efficiently. For example, he could loop through a list of samples and apply a specific analysis to each one, or check each sample's viral load against an established threshold. This reduces manual work and makes it practical to process large batches of data.

---

## **Basic `for` Loop**

A `for` loop iterates over a sequence **element by element**, which is useful for analyzing multiple samples or sequences at once.



### But first, let's learn what a list is!

What is a List in Python?

A list is a built-in data structure in Python that allows you to store multiple values in a single variable. Lists are ordered, meaning the elements inside them maintain their sequence. They are also mutable, which means you can modify, add, or remove elements after creating them. Lists can store different data types, including numbers, strings, or even other lists. In bioinformatics, lists are useful for storing DNA sequences, protein residues, nucleotide frequencies, or gene names. Since lists support indexing and iteration, they allow efficient manipulation of biological data, such as extracting codons, filtering specific nucleotides, or analyzing sequence motifs.
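
To make "ordered" and "mutable" concrete, here is a minimal sketch. The sample IDs are hypothetical and used only for illustration; `append()` and `remove()` are standard list methods.

```python
# Hypothetical sample IDs, used only for illustration
samples = ["WWTP_01", "WWTP_02", "WWTP_03"]

samples.append("WWTP_04")      # add an element to the end
samples[0] = "WWTP_01_rerun"   # modify an element by its index
samples.remove("WWTP_02")      # remove an element by its value

print(samples)       # the remaining elements keep their order
print(len(samples))  # number of elements currently in the list
```

Lists keep their order after each change, which is why looping over them later gives predictable results.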

* Now we can move on to a simple example.

In [1]:
# Iterating through a list of wastewater samples
samples = ["WWTP_01", "WWTP_02", "WWTP_03"]

for sample in samples:
    print(f"Processing sample: {sample}")


Processing sample: WWTP_01
Processing sample: WWTP_02
Processing sample: WWTP_03


### Iterating Through a DNA Sequence in Wastewater Samples
A common task in wastewater surveillance is to analyze viral sequences. Tekin might use a `for` loop to iterate through each nucleotide in a viral genome sequence to detect certain patterns or mutations.

In [2]:
# Iterating through a viral genome sequence in wastewater
viral_sequence = "ATGCGT"

for nucleotide in viral_sequence:
    print(nucleotide)

A
T
G
C
G
T


### Analyzing Multiple Samples in Wastewater Surveillance
Tekin could also use for loops to analyze multiple samples, checking if their viral load exceeds a threshold, as in the following example:

In [None]:
# List of viral loads in wastewater samples
viral_loads = [18000, 32000, 25000, 27000, 15000]

# Define the threshold for high viral load
threshold = 25000

# Loop through each sample and check if it exceeds the threshold
for load in viral_loads:
    if load > threshold:
        print(f"High viral load detected: {load} copies/mL")
    else:
        print(f"Normal viral load: {load} copies/mL")


In [4]:
# Here is a simple example of a list.

dna_sequences = ["ATGCGT", "TTAAGC", "CCGATT"]
print(dna_sequences)

# This list has three elements in it.

# Each element in the list can be accessed using its index, and indexing starts from ZERO.

# Here is how to access the first element in the list:

dna_sequences[0]  # Index 0 holds the first element. Wrap it in print() to display it. Try it out!

# We can continue with 0, 1, 2 since there are three elements

# If we want, we can make it prettier as always!

print("First sequence:", dna_sequences[0])  # Output: ATGCGT
print("Second sequence:", dna_sequences[1])  # Output: TTAAGC
print("Third sequence:", dna_sequences[2])  # Output: CCGATT

# When you see the lovely square brackets, you know
# you're dealing with a list

# Check it out by running the code!

['ATGCGT', 'TTAAGC', 'CCGATT']
First sequence: ATGCGT
Second sequence: TTAAGC
Third sequence: CCGATT


In [None]:
# TRY IT OUT BY YOURSELF AREA

## **Iterating Through a List in Wastewater Surveillance**
A `for` loop can iterate over a list of items, which is useful when processing multiple samples, sequence bases, or other biological data.

```python
# List of nucleotides detected in a wastewater sample
nucleotides = ["A", "T", "C", "G"]

for base in nucleotides:
    print(f"Nucleotide: {base}")
```

In [None]:
# CODE PLAYGROUND

In this example, Tekin uses a for loop to iterate over a list of nucleotides in a sample and print each nucleotide. This approach can be extended to analyze sequence data, detect specific patterns, or validate the composition of wastewater samples in genomic studies.

For instance, Tekin could use this loop to check if all the nucleotides in a sequence are valid (A, T, C, G) or to detect patterns like the start codon ("ATG") in a viral genome.
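
The two checks mentioned above can be sketched as follows. This is a minimal illustration, not a full validation routine: the sequence is made up, and `N` stands in for any character that is not A, T, C, or G. The start codon check uses the built-in `str.find()` method, which returns `-1` when the pattern is absent.

```python
# Hypothetical sequence containing one invalid character (N)
sequence = "CCATGNGT"
valid_bases = ["A", "T", "C", "G"]

# Collect every character that is not a valid nucleotide
invalid = []
for base in sequence:
    if base not in valid_bases:
        invalid.append(base)

if invalid:
    print(f"Invalid characters found: {invalid}")
else:
    print("All nucleotides are valid.")

# Locate the first start codon; find() returns -1 if "ATG" is absent
start_position = sequence.find("ATG")
print(f"Start codon index: {start_position}")
```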

### The same pattern works for other lists, such as this list of gene names.

In [None]:
# List of gene names
genes = ["BRCA1", "TP53", "EGFR", "MYC"]

# Iterating through the list of genes
for gene in genes:
    print(f"Gene: {gene}")


### Which code prints the first letter of the element at index 0?

In [None]:
# CODE PLAYGROUND (Trying is free)

In [None]:
# SOLUTION

# List of gene names
genes = ["BRCA1", "TP53", "EGFR", "MYC"]

# Printing the first letter of the first gene (index 0)
print(genes[0][0])  


## **Using `range()` in a Loop for Wastewater Surveillance Tasks**
The `range()` function is useful for iterating a set number of times, which is beneficial when you need to loop through a fixed number of samples or time points.

```python
# Printing sample numbers from 1 to 5 (for example, representing sample IDs)
for i in range(1, 6):
    print(f"Processing Sample #{i}")
```

In this example, Tekin uses range() to loop through a set number of samples, printing a message for each sample ID. This could be applied to tasks like iterating through a batch of wastewater samples and performing quality control or viral load analysis.

For instance, Tekin could use range() to repeat an analysis multiple times, such as checking the viral load of several samples at once, or to automate statistical processing over several time points in a surveillance dataset.
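
As a quick reference, `range()` accepts one, two, or three arguments. The sketch below converts each range to a list so the generated numbers are visible. The `weekly_loads` values are hypothetical and only illustrate looping over time points by index.

```python
# range(stop): counts from 0 up to, but not including, stop
first_five = list(range(5))
print(first_five)

# range(start, stop): counts from start up to, but not including, stop
sample_ids = list(range(1, 6))
print(sample_ids)

# range(start, stop, step): counts in steps
codon_starts = list(range(0, 12, 3))
print(codon_starts)

# Hypothetical weekly viral loads, accessed by index
weekly_loads = [18000, 32000, 25000]
for week in range(len(weekly_loads)):
    print(f"Week {week + 1}: {weekly_loads[week]} copies/mL")
```

The stop value is never included, which is why `range(1, 6)` is needed to reach sample number 5.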

In [None]:
# Simulating gene expression levels for 5 genes
for i in range(1, 6):
    print(f"Gene {i}: Expression Level Recorded")


In [None]:
# CODE PLAYGROUND

In [None]:
# Generating gene IDs from Gene_1 to Gene_10
for i in range(1, 11):
    print(f"Gene_{i}")


In [None]:
# CODE PLAYGROUND

## **Looping Through a DNA Sequence in Triplets (Codons) for Wastewater Surveillance**

In wastewater surveillance, especially when analyzing viral genomes, it is often useful to **step through a DNA sequence in triplets**, extracting codons to detect gene start points or analyze mutations.

```python
# DNA sequence representing a viral genome
dna_sequence = "ATGCGTAAGTCC"

# Loop through the sequence in triplets
for i in range(0, len(dna_sequence), 3):
    codon = dna_sequence[i:i+3]
    print(codon)
```

In [1]:
# DNA sequence
dna_sequence = "ATGCGTAAGTCC"

# Loop through the sequence in triplets (codons)
for i in range(0, len(dna_sequence), 3):
    codon = dna_sequence[i:i+3]
    print(f"Codon {i//3 + 1}: {codon}")


Codon 1: ATG
Codon 2: CGT
Codon 3: AAG
Codon 4: TCC


In [None]:
# CODE PLAYGROUND

### Explanation:

**`range(0, len(dna_sequence), 3)`**: 
This loop starts at index `i = 0` and steps forward by 3 positions each time (i.e., processing three bases at a time).

- `range(0, len(dna_sequence), 3)` generates numbers like `0`, `3`, `6`, `9`, etc., corresponding to the positions where each codon starts.
- `len(dna_sequence)` ensures the loop stops when the sequence's end is reached.

**`codon = dna_sequence[i:i+3]`**: 
This extracts a triplet of nucleotides starting at index `i`.

- `dna_sequence[i:i+3]` slices the sequence into three consecutive bases, each representing a codon.

### Codon Numbering:
- `i//3 + 1` calculates the current codon number.
- `i//3` gives a 0-based index, and adding `+1` converts it to a 1-based codon number (e.g., Codon 1, Codon 2, etc.).
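
One detail worth knowing: when the sequence length is not a multiple of 3, the last slice is shorter than three bases, because slicing past the end of a string does not raise an error. The sketch below, using a made-up 11-base sequence, stores complete codons in a list and keeps any leftover bases separately. The names `codons` and `leftover` are illustrative.

```python
# Hypothetical sequence with 11 bases (not a multiple of 3)
dna_sequence = "ATGCGTAAGTC"

codons = []     # complete triplets
leftover = ""   # trailing bases that do not form a full codon

for i in range(0, len(dna_sequence), 3):
    codon = dna_sequence[i:i + 3]
    if len(codon) == 3:
        codons.append(codon)
    else:
        leftover = codon  # the slice ran off the end of the sequence

print(f"Complete codons: {codons}")
print(f"Leftover bases: {leftover}")
```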

---

### Example Iterations:

**First iteration (i = 0):**

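```python
codon = dna_sequence[0:3]  # "ATG"
codon_number = 0 // 3 + 1  # 1
```

The remaining iterations follow the same pattern: `i = 3` gives `"CGT"` (codon 2), `i = 6` gives `"AAG"` (codon 3), and `i = 9` gives `"TCC"` (codon 4).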

## **Using `enumerate()` for Indexed Iteration in Wastewater Surveillance**

The `enumerate()` function allows you to track both the **index** and the value of elements while iterating through a sequence, which is useful for tasks like locating specific elements within a DNA sequence or tracking sample positions.

```python
dna_sequence = "ATGCGT"

for index, nucleotide in enumerate(dna_sequence):
    print(f"Position {index}: {nucleotide}")
```

In [None]:
# List of AMR genes
amr_genes = ["blaCTX-M", "mecA", "tetM", "aadA1", "ermB"]

# Using enumerate() to track the index and gene name
for index, gene in enumerate(amr_genes):
    print(f"AMR Gene {index + 1}: {gene}")


In [None]:
# CODE PLAYGROUND

### Explanation:
* `enumerate(dna_sequence)`: This function returns both the index and the element (a nucleotide in this case) as you loop through the sequence.

* `for index, nucleotide in enumerate(dna_sequence):` Here `index` keeps track of the current position in the sequence (starting from 0), while `nucleotide` gives you the base at that position.

This is useful in wastewater surveillance when analyzing a sequence and needing both the nucleotide and its position, for example, to detect certain genetic patterns or flag specific mutations that occur at certain locations in the sequence.

In this example, Tekin is processing a DNA sequence and printing each nucleotide along with its position, which can be extended to larger genomic datasets in wastewater surveillance.
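
Biological positions are usually reported starting from 1 rather than 0. As a small sketch, `enumerate()` accepts an optional `start` argument that shifts the counter, so there is no need to add 1 by hand. The `lines` list is only there to keep the results for inspection.

```python
dna_sequence = "ATGCGT"

lines = []
# start=1 makes the counter begin at 1 instead of 0
for position, nucleotide in enumerate(dna_sequence, start=1):
    line = f"Position {position}: {nucleotide}"
    lines.append(line)
    print(line)
```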



## **Filtering with `for` Loops in Wastewater Surveillance**

Filtering specific bases in a sequence is crucial when analyzing genetic data. For example, Tekin might want to find specific nucleotides, like "G" (Guanine), in a viral genome sequence from a wastewater sample.

```python
dna_sequence = "ATGCGTAAGTCC"

# Finding only 'G' bases and reporting the position of each one
for index, base in enumerate(dna_sequence):
    if base == "G":
        print(f"Guanine found at position {index}")
```

### Explanation:
* `for index, base in enumerate(dna_sequence):` This loop iterates over each base in the DNA sequence and keeps track of its 0-based position.

* `if base == "G":` This condition filters the sequence so that only the bases that are "G" (Guanine) are processed.

* `index`: This is the position of the current base, supplied by `enumerate()`. Calling `dna_sequence.index(base)` here instead would report position 2 for every match, because `str.index()` always returns the position of the first occurrence.

* In wastewater surveillance, this approach is useful for identifying and tracking specific nucleotide occurrences, such as detecting mutations, analyzing gene expression, or flagging certain pathogen markers. Tekin could adapt this method to filter out specific bases in a large dataset of genetic sequences from wastewater samples.
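
Printing is fine for a quick look, but it is often more useful to store the matches for later analysis. The sketch below collects the 0-based positions of every "G" into a list. The name `g_positions` is illustrative.

```python
dna_sequence = "ATGCGTAAGTCC"

# Collect the 0-based position of every 'G' in the sequence
g_positions = []
for index, base in enumerate(dna_sequence):
    if base == "G":
        g_positions.append(index)

print(f"Guanine positions: {g_positions}")
print(f"Number of G bases: {len(g_positions)}")

# Note: str.index() only ever returns the first occurrence
print(f"First G according to index(): {dna_sequence.index('G')}")
```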



In [None]:
# List of AMR genes
amr_genes = ["blaCTX-M", "mecA", "tetM", "aadA1", "ermB", "blaTEM", "vanA", "blaOXA"]

# Filtering only β-lactam resistance genes (genes that start with 'bla')
for gene in amr_genes:
    if gene.startswith("bla"):
        print(f"β-lactam resistance gene found: {gene}")


In [None]:
# Playground is all yours!

In [None]:
# Playground is all yours!


##  **Summary**
- **`for` loops iterate** over strings, lists, and other sequences.
- **`range()` allows looping a specific number of times.**
- **We can loop through a DNA sequence in triplets (codons).**
- **`enumerate()` helps track positions in a sequence.**
- **`if` conditions inside loops allow filtering specific elements.**

Next, we will explore **list manipulations and operations in Python**.

# **Loops in Python: `while`, `continue`, and `break` in Wastewater Surveillance**

Loops are a fundamental part of Python programming, allowing us to execute a block of code **multiple times** until a condition is met. In this section, we will explore the **`while` loop** and how to control it using **`break` and `continue`**.

---

## **What is a `while` Loop?**

A `while` loop **repeats** a block of code **as long as a condition remains `True`**. It is especially useful when the number of iterations is unknown in advance, such as when Tekin needs to process a series of data points until a certain threshold is met.

### **Syntax:**

```python
while condition:
    # Code to execute
```

In [None]:
# Sample viral loads (in copies/mL) for each wastewater sample
viral_loads = [18000, 22000, 30000, 25000, 27000]

# Threshold for high viral load
threshold = 25000

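# Check viral loads until one exceeds the threshold (or the list runs out)
index = 0
while index < len(viral_loads) and viral_loads[index] <= threshold:
    print(f"Sample {index+1} has a normal viral load: {viral_loads[index]} copies/mL")
    index += 1

# Report the high sample only if the loop stopped before the end of the list
if index < len(viral_loads):
    print(f"Sample {index+1} has a high viral load: {viral_loads[index]} copies/mL")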




In this example, the **while** loop keeps checking viral loads for as long as each sample stays at or below the predefined threshold. Once a sample exceeds the threshold, the loop condition becomes `False`, the loop ends, and the code after it prints a message.

### Why `while`, `continue`, and `break` Matter for Wastewater Surveillance

These control flow tools allow Tekin to:

* Monitor data dynamically (e.g., checking viral load until a certain threshold is met)
* Skip invalid data with `continue`, ensuring that only relevant data is processed
* Stop the loop early with `break` when a critical condition is met, avoiding unnecessary work

By mastering these tools, Tekin can write efficient, adaptable code that reacts dynamically to the data, which is essential for handling the complexities and variability of wastewater surveillance data.
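
The points above can be combined in one loop. The following is a minimal sketch under made-up assumptions: missing readings are stored as `None`, negative numbers mark instrument errors, and the names `readings`, `processed`, and `alert` are illustrative.

```python
# Hypothetical readings: None = missing value, negative = instrument error
readings = [18000, None, 22000, -1, 30000, 27000]
threshold = 25000

processed = []  # valid readings seen before any alert
alert = None    # first reading above the threshold, if any

for reading in readings:
    if reading is None or reading < 0:
        continue  # skip invalid data and move to the next reading
    if reading > threshold:
        alert = reading
        print(f"High viral load detected: {reading} copies/mL")
        break  # stop early once the critical condition is met
    processed.append(reading)
    print(f"Viral load: {reading} copies/mL")
```

Because of the `break`, the last reading (27000) is never examined.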



In [None]:
# Sample viral loads
viral_loads = [18000, 22000, 30000, 25000, 27000]

# Threshold for high viral load
threshold = 25000

# Stop processing when a sample exceeds the threshold
for load in viral_loads:
    if load > threshold:
        print(f"High viral load detected: {load} copies/mL")
        break
    print(f"Viral load: {load} copies/mL")


In [None]:
# Example of using while. 
# Stop scanning a DNA sequence as soon as a mutation (X) is found.


dna_sequence = "ATGCGTXAGGCTAAGTGA"

index = 0
while index < len(dna_sequence):
    if dna_sequence[index] == "X":
        print(f"Mutation found at position {index + 1}! Stopping scan.")
        break  # Stops the loop when a mutation is found
    index += 1


In [None]:
# CODE PLAYGROUND

In [3]:
# Another example. Skip low-quality variants (Quality < 20) and only process high-quality ones.
 
quality_scores = [35, 18, 50, 12, 40, 5, 30]

for score in quality_scores:
    if score < 20:
        continue  # Skip low-quality scores
    
    print(f"Processing high-quality variant with score: {score}")


Processing high-quality variant with score: 35
Processing high-quality variant with score: 50
Processing high-quality variant with score: 40
Processing high-quality variant with score: 30


In [None]:
# CODE PLAYGROUND

In [4]:
# Finding the First Stop Codon (TAA, TAG, TGA)

dna_sequence = "ATGCCTGATTAGCGTAAAGTGA"
index = 0

while index < len(dna_sequence) - 2:
    codon = dna_sequence[index : index + 3]  # Extract triplet codon
    
    if codon not in ["TAA", "TAG", "TGA"]:
        index += 3  # Move to the next codon
        continue  # Skip processing if not a stop codon

    print(f"Stop codon '{codon}' found at position {index + 1}. Stopping scan.")
    break  # Exit loop when a stop codon is found

Stop codon 'TAG' found at position 10. Stopping scan.


In [None]:
# CODE PLAYGROUND

# Problem 1: Extracting Codons from a DNA Sequence

### **Scenario**
You are working with DNA sequences and need to extract codons (triplets of nucleotides) from a given sequence.

### **Task**
Write a Python program that:

1. Takes a **DNA sequence** as input from the user.
2. Extracts **codons (triplets of nucleotides)** from the sequence.
3. Prints each **codon with its position**.

---

### **Example Output**
For example, if the input DNA sequence is:

ATGCGTAAGTCC

the program should print:

    Codon 1: ATG
    Codon 2: CGT
    Codon 3: AAG
    Codon 4: TCC

### **Hint**
You can use a `for` loop to iterate over the DNA sequence in steps of 3 to extract each codon.

---




In [None]:
# CODE PLAYGROUND

In [None]:
# CODE PLAYGROUND


---

# Problem 2: Counting Specific Bases in a DNA Sequence
### Scenario
You need to analyze a DNA sequence and count how many times a **specific nucleotide** (A, T, C, or G) appears.

### Task
Write a Python program that:
- Takes a **DNA sequence** and a **nucleotide** from the user.
- Counts how many times the given nucleotide appears in the sequence.
- Prints the **count**.

In [None]:
# CODE PLAYGROUND

In [None]:
# CODE PLAYGROUND

In [None]:
# CODE PLAYGROUND

In [None]:
# CODE PLAYGROUND

In [7]:

# Solution Number 1:

# Step 1: Get input from the user
dna_sequence = input("Enter a DNA sequence: ").upper()

# Step 2: Extract codons (triplets of nucleotides)
for i in range(0, len(dna_sequence), 3):
    codon = dna_sequence[i:i+3]  # Extract 3 nucleotides at a time
    print(f"Codon {i//3 + 1}: {codon}")

Enter a DNA sequence: ACACT
Codon 1: ACA
Codon 2: CT


In [None]:
# CODE PLAYGROUND

In [None]:
# CODE PLAYGROUND

In [None]:
# Solution Number 2:

# Get user input
dna_sequence = input("Enter a DNA sequence: ").upper()
nucleotide = input("Enter the nucleotide to count: ").upper()

# Validate input
if nucleotide not in ["A", "T", "C", "G"]:
    print("Invalid nucleotide! Please enter A, T, C, or G.")
else:
    # Count occurrences
    count = dna_sequence.count(nucleotide)
    print(f"The nucleotide '{nucleotide}' appears {count} times in the sequence.")
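
The built-in `count()` method is the shortest route. Since this lesson is about loops, here is a sketch of the same count written with a `for` loop. The function name `count_base` is illustrative, and the sequence is passed in directly instead of being read with `input()` so the example runs without user interaction.

```python
def count_base(sequence, base):
    """Count how many times base appears in sequence, ignoring case."""
    count = 0
    for nucleotide in sequence.upper():
        if nucleotide == base.upper():
            count += 1  # one more match found
    return count

result = count_base("ATGCGTAAGTCC", "g")
print(f"The nucleotide 'G' appears {result} times in the sequence.")
```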


In [None]:
# CODE PLAYGROUND

# Questions?