In [None]:
# In Colab, run this cell first to set up the file structure!
%cd /content
!rm -rf MOL518-Intro-to-Data-Analysis

!git clone https://github.com/shaevitz/MOL518-Intro-to-Data-Analysis.git
%cd MOL518-Intro-to-Data-Analysis/Lecture_4

# Lecture 4: Loops and Control Flow

In this lecture we introduce **loops** and **conditional logic**. These tools let us repeat operations and make decisions when the structure of a problem does not naturally fit into a single operation.

## Loops and indexing sequential elements of an array

In many problems, we want to repeat an operation a specific number of times and keep track of where we are. To do this, Python provides the function `range`.

`range(n)` generates the sequence of integers starting at 0 and ending at `n - 1`. By using each of these integers as an index into an array, we can tell Python to perform an operation once for each position.

When we combine `range` with `len`, we can loop over the valid indices of an array. This lets us access elements by position using square bracket indexing.
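
As a small sketch of what these two functions produce, the cell below wraps `range` in `list` so the integers become visible. A plain list stands in for the array here (an assumption made to keep the example short); `len` and square bracket indexing behave the same way on it.

```python
# A plain list stands in for the array; len() and indexing work the same way.
values = [1, 3, 5, 7, 9]

# range(n) does not print its contents directly, so wrap it in list() to see them.
first_five = list(range(5))
print(first_five)        # [0, 1, 2, 3, 4]

# range(len(values)) produces exactly the valid indices of values.
valid_indices = list(range(len(values)))
print(valid_indices)     # [0, 1, 2, 3, 4]

# The last valid index is len(values) - 1, not len(values).
last_value = values[len(values) - 1]
print(last_value)        # 9
```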

In [4]:
import numpy as np

values = np.array([1, 3, 5, 7, 9])

indices = range(len(values))

A `for` loop repeats a block of code once for each item in a sequence. Sometimes that sequence is a range of numbers, and sometimes it is the values in a list or array.

First, let's look at the indices we can use to loop over the `values` array. This command tells Python to print the index `i` for each entry in `indices`.

In [5]:
for i in indices:
    print(i)

0
1
2
3
4


As expected, there are five indices, one for each entry in `values`. Notice that the index starts at zero (!), so the highest index is 4.

Now we can loop over the `values` array by index and build up the total sum one step at a time.

In [10]:
# Note that we initialize the total to zero before the loop!
total = 0

# Loop over the indices of the array and look up each value by position
for i in range(len(values)):
    v = values[i]
    total = total + v
    print("Running total:", total)

# Final result after the loop finishes
print("-----\nFinal total:", total)

Running total: 1
Running total: 4
Running total: 9
Running total: 16
Running total: 25
-----
Final total: 25


A `for` loop can also step through the values directly, without using indices. In the example below, we loop over the values in a list and build the total sum one step at a time, printing the running total as we go.

In [None]:
values = [1, 3, 5, 7, 9]

total = 0   # note that we initialize the total to zero before the loop!

for v in values:
    total = total + v   # the current value is added to the running total
    print("running total:", total)

total

running total: 1
running total: 4
running total: 9
running total: 16
running total: 25


25
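
One way to check a hand-written accumulator is to compare its result with Python's built-in `sum`. The sketch below does this for the same list; the names `loop_total` and `builtin_total` are introduced here only for illustration.

```python
values = [1, 3, 5, 7, 9]

# Accumulate the total with an explicit loop, as above.
loop_total = 0
for v in values:
    loop_total = loop_total + v

# sum() is a Python built-in that adds up the items of a sequence.
builtin_total = sum(values)

print(loop_total, builtin_total)      # 25 25
print(loop_total == builtin_total)    # True
```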

## Why indentation matters

In Python, indentation is not cosmetic. It defines which lines of code belong together. Loops and conditionals only apply to the indented block that follows them.

We will first see what happens when indentation is wrong.

In [None]:
# This cell intentionally contains an indentation error
for i in range(3):
print(i)

Python will report an `IndentationError`. This is helpful, because it stops your program before it does the wrong thing.
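
For comparison, here is a sketch of the same loop with the body indented, which runs without an error. The list `printed` is not part of the original example; it is added only to record what the loop prints.

```python
printed = []  # illustrative only: records each value the loop prints

for i in range(3):
    print(i)             # indented, so this line belongs to the loop body
    printed.append(i)
```

This version prints 0, 1, and 2, one per line.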

More dangerous are cases where the indentation is *legal* but incorrect, leading to logic bugs rather than explicit errors.

In [None]:
# Legal Python, but likely not what we intend
for i in range(3):
    x = i * 2
print(x)
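
In the cell above, `print(x)` is not indented, so it runs once, after the loop has finished, and shows only the last value. If the intent was to print every value, the `print` call belongs inside the loop body. A sketch of both versions follows; the names `after_loop` and `inside_loop` are added here only to record the results.

```python
# Version 1: print is outside the loop, so it runs once after the last iteration.
for i in range(3):
    x = i * 2
print(x)                 # 4
after_loop = x

# Version 2: print is inside the loop, so it runs on every iteration.
inside_loop = []         # illustrative only
for i in range(3):
    x = i * 2
    print(x)             # 0, then 2, then 4
    inside_loop.append(x)
```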

### In-class exercise

Compute the mean of the list `values` *without* using NumPy. Use a loop and check intermediate results.

In [None]:
# Your code here


## Conditionals as decision making

`if`, `elif`, and `else` let us run code only when certain conditions are met. The condition itself is a **boolean expression** that evaluates to `True` or `False`.
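
To make the idea of a boolean expression concrete, the sketch below stores the results of two comparisons in variables before using one of them in an `if` statement. The variable names are chosen here for illustration.

```python
n = -3

is_positive = n > 0      # a comparison produces a boolean value
is_negative = n < 0

print(is_positive)       # False
print(is_negative)       # True

# An if statement runs its indented block only when the condition is True.
if is_negative:
    print(n, "is negative")
```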

We often use conditionals together with loops.

In [None]:
numbers = [2, -1, 4, -3, 0]

for n in numbers:
    if n > 0:
        print(n, "is positive")
    elif n < 0:
        print(n, "is negative")
    else:
        print(n, "is zero")

## Example 1: Peak finding in a mass spectrum

We now apply loops and conditionals to a simple mass spectrum. The data consist of intensity as a function of m/z.

Our goal is to identify local maxima above a chosen threshold.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("Lecture_4/data/mass_spectrum.csv", delimiter=",", skiprows=1)
mz = data[:, 0]
intensity = data[:, 1]

In [None]:
# Plot the raw spectrum
plt.figure()
plt.plot(mz, intensity)
plt.xlabel("m/z")
plt.ylabel("intensity")
plt.show()

We will loop over indices so we can compare each point to its neighbors. We avoid the first and last points because they do not have two neighbors.

In [None]:
threshold = 0.4

peak_indices = []

for i in range(1, len(intensity) - 1):
    if intensity[i] > threshold:
        if intensity[i] > intensity[i-1] and intensity[i] > intensity[i+1]:
            peak_indices.append(i)

print("number of peaks found:", len(peak_indices))
print("first few peak indices:", peak_indices[:5])
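
To see the neighbor comparison at work on data small enough to check by hand, here is a sketch that runs the same loop on a short made-up list. The numbers in `toy_intensity` are invented for illustration and do not come from the mass spectrum file.

```python
# Made-up intensities, for illustration only.
toy_intensity = [0.1, 0.5, 0.2, 0.3, 0.9, 0.1, 0.35, 0.2]
toy_threshold = 0.4

toy_peaks = []

# Skip the first and last points: they do not have two neighbors.
for i in range(1, len(toy_intensity) - 1):
    if toy_intensity[i] > toy_threshold:
        # A local maximum is larger than the point before and the point after.
        if toy_intensity[i] > toy_intensity[i - 1] and toy_intensity[i] > toy_intensity[i + 1]:
            toy_peaks.append(i)

print(toy_peaks)         # [1, 4]
```

The point at index 6 (value 0.35) is also a local maximum, but it falls below the threshold of 0.4 and is therefore not reported.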

In [None]:
# Plot spectrum with peaks marked
plt.figure()
plt.plot(mz, intensity)
plt.plot(mz[peak_indices], intensity[peak_indices], "o")
plt.xlabel("m/z")
plt.ylabel("intensity")
plt.show()

### In-class exercise

Change the threshold and observe how the detected peaks change. What threshold gives the most reasonable result?

## Example 2: Categorical measurements from qPCR-style data

We now load a small categorical dataset from file and use a loop to prepare it for plotting.

In [None]:
qpcr = np.loadtxt("Lecture_4/data/fake_qpcr_data_long.csv", delimiter=",", skiprows=1, dtype=str)

genes = qpcr[:, 0]
measurements = qpcr[:, 1].astype(float)

In [None]:
# Build lists explicitly using a loop
gene_labels = []
values = []

for g, v in zip(genes, measurements):
    gene_labels.append(g)
    values.append(v)
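
The loop above relies on `zip`, which pairs up the items of two sequences position by position. A minimal sketch with made-up labels and numbers (not taken from the qPCR file) shows what each pass through the loop receives:

```python
# Made-up labels and numbers, for illustration only.
toy_genes = ["geneA", "geneB", "geneC"]
toy_measurements = [0.8, 0.3, 1.2]

pairs = []
for g, v in zip(toy_genes, toy_measurements):
    print(g, v)          # one label and its matching measurement per pass
    pairs.append((g, v))
```

Note that `zip` stops at the end of the shorter sequence, so mismatched lengths do not raise an error by default.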

In [None]:
# Bar plot
plt.figure()
plt.bar(gene_labels, values)
plt.xticks(rotation=60)
plt.ylabel("measurement")
plt.show()

## Loops versus arrays

Many operations can be written either as explicit loops or as NumPy array operations. Array-based code is often shorter and faster, but loops can be clearer when the logic is complex.

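Below is a thresholding operation written without a loop: the comparison produces a boolean mask, and indexing with that mask selects the genes whose measurement exceeds 0.5.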

In [None]:
mask = measurements > 0.5
genes[mask]
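
To compare the two styles side by side, the sketch below applies the same threshold with an explicit loop and with a boolean mask. The arrays are made up for illustration and are not the qPCR data.

```python
import numpy as np

# Made-up data, for illustration only.
toy_genes = np.array(["geneA", "geneB", "geneC", "geneD"])
toy_measurements = np.array([0.8, 0.3, 1.2, 0.5])

# Loop version: test each measurement and collect the matching gene.
selected_loop = []
for g, v in zip(toy_genes, toy_measurements):
    if v > 0.5:
        selected_loop.append(str(g))

# Array version: the comparison yields a boolean array, which is used as a mask.
mask = toy_measurements > 0.5
selected_mask = toy_genes[mask]

print(selected_loop)     # ['geneA', 'geneC']
print(selected_mask)
```

Both versions select the same genes. The value 0.5 is excluded in both because the comparison is strictly greater than.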

Both approaches are useful. The important habit is choosing the one that is easiest to read, debug, and explain.

## Common failure modes

- Off-by-one indexing errors
- Forgetting to initialize accumulator variables
- Confusing `=` with `==`
- Mixing data types when building lists
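
The sketch below gives one small illustration of each item in the list. The examples are made up for this purpose, and `try`/`except` is used only so that the cell keeps running after the deliberate indexing error.

```python
values = [1, 3, 5, 7, 9]

# Off-by-one: range(len(values)) stops at the last valid index (4).
# range(len(values) + 1) also tries index 5, which raises an IndexError.
caught_index_error = False
try:
    for i in range(len(values) + 1):
        v = values[i]
except IndexError:
    caught_index_error = True
print("IndexError raised:", caught_index_error)   # IndexError raised: True

# Accumulators must be initialized before the loop. Without `total = 0`,
# the first `total = total + v` would fail with a NameError.
total = 0
for v in values:
    total = total + v
print(total)             # 25

# `=` assigns a value; `==` compares two values.
x = 5                    # assignment
is_five = x == 5         # comparison, evaluates to True
print(is_five)           # True

# Mixing types: a number read from text is a string until it is converted.
# sum(mixed) would raise a TypeError because "2" is a string.
mixed = [1, "2", 3]
cleaned = []
for item in mixed:
    cleaned.append(int(item))
print(sum(cleaned))      # 6
```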