# Lab 0.1: Sequential vs Parallel Thinking

**Chapter 0: The Parallel Mindset**

This lab explores the fundamental differences between sequential and parallel programming through hands-on examples.

## Learning Objectives
- Understand why sequential intuition fails in parallel contexts
- Identify race conditions in naive parallel code
- Practice thinking about "what can happen simultaneously"

In [None]:
import numpy as np
import time
from concurrent.futures import ThreadPoolExecutor
import threading

## Part 1: The Sequential Sum

Let's start with a simple sequential sum - something we've all written many times.

In [None]:
def sequential_sum(arr):
    """Sequential sum - each step depends on the previous."""
    total = 0
    for x in arr:
        total += x
    return total

# Test it
arr = np.random.rand(1_000_000)
result = sequential_sum(arr)
print(f"Sequential sum: {result:.4f}")
print(f"NumPy sum (reference): {arr.sum():.4f}")

## Part 2: The Naive Parallel Sum (BROKEN!)

What happens if we try to parallelize this naively?

In [None]:
# WARNING: This code has a race condition!
total = 0

def add_to_total(x):
    global total
    total += x  # Race condition: read-modify-write is not atomic

# Reset and try parallel execution
total = 0
with ThreadPoolExecutor(max_workers=4) as executor:
    executor.map(add_to_total, arr[:10000])  # Use smaller array for demo

print(f"Naive parallel sum: {total:.4f}")
print(f"Expected: {arr[:10000].sum():.4f}")
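# Lost updates depend on thread timing; some runs (and some CPython versions) may show ~0.
print(f"Difference: {abs(total - arr[:10000].sum()):.4f} (should be ~0; a larger gap means lost updates)")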

## Part 3: Understanding the Race Condition

The race condition occurs because `total += x` is actually three operations:
1. Read `total`
2. Add `x`
3. Write result back to `total`

When two threads both read `total` before either one writes it back, the later write overwrites the earlier one and that update is lost.
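
To make the lost update concrete, here is a minimal sketch that replays one bad interleaving by hand. It uses no real threads: "thread A" and "thread B" are only labels, and the names `shared_total`, `a_read`, `b_read`, `a_new`, and `b_new` are hypothetical, chosen for this illustration.

```python
# Deterministic replay of one bad interleaving of `total += x` across two threads.
# No real threads are involved; A and B are just labels for the two workers.
shared_total = 0

# Step 1 (read): both threads read the shared value before either writes.
a_read = shared_total    # thread A sees 0
b_read = shared_total    # thread B also sees 0

# Step 2 (add): each thread adds its own x to its private copy.
a_new = a_read + 5       # thread A wants to add 5
b_new = b_read + 7       # thread B wants to add 7

# Step 3 (write): each thread writes back; the later write wins.
shared_total = a_new     # shared_total is now 5
shared_total = b_new     # shared_total is now 7, so thread A's +5 is lost

print(f"Final total: {shared_total} (a correct sum would be 12)")
```

Real threads hit this ordering only some of the time, which is why the error varies from run to run.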

In [None]:
# Run the broken version multiple times to see variability
results = []
expected = arr[:10000].sum()

for _ in range(10):
    total = 0
    with ThreadPoolExecutor(max_workers=4) as executor:
        executor.map(add_to_total, arr[:10000])
    results.append(total)

print("Results from 10 runs (should all be identical; any variation means lost updates):")
for i, r in enumerate(results):
    print(f"  Run {i+1}: {r:.4f} (error: {abs(r - expected):.4f})")

## Part 4: The Correct Parallel Approach

The GPU-friendly approach: each worker computes a partial sum over its own chunk, with no shared mutable state, and the partial sums are combined at the end.

In [None]:
def parallel_sum_correct(arr, num_workers=4):
    """Correct parallel sum using partial results."""
    chunk_size = len(arr) // num_workers
    
    def sum_chunk(start, end):
        return arr[start:end].sum()
    
    # Each worker computes an independent partial sum over its own chunk
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        futures = []
        for i in range(num_workers):
            start = i * chunk_size
            end = start + chunk_size if i < num_workers - 1 else len(arr)
            futures.append(executor.submit(sum_chunk, start, end))
        
        # Combine partial results (reduction)
        partial_sums = [f.result() for f in futures]
    
    return sum(partial_sums)

# Test the correct version
result = parallel_sum_correct(arr)
print(f"Correct parallel sum: {result:.4f}")
print(f"Expected: {arr.sum():.4f}")
print(f"Match: {np.isclose(result, arr.sum())}")

## Exercise: Identify the Pattern

For each operation below, think about:
1. Can it be parallelized?
2. What pattern does it follow? (embarrassingly parallel, reduction, stencil, irregular)

In [None]:
# Exercise 1: Element-wise sigmoid
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# What pattern is this? (Answer in next cell)

In [None]:
# Answer: Embarrassingly parallel
# Each output element depends only on its corresponding input element.
# No communication needed between parallel units.
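
As a rough sketch of what "no communication needed" can look like in practice, the input can be split into chunks that are processed independently and then stitched back together. The helper name `chunked_sigmoid` and the choice of four workers are assumptions made for this illustration, not part of the lab.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def chunked_sigmoid(x, num_workers=4):
    """Apply sigmoid to independent chunks, then concatenate the outputs."""
    # array_split tolerates lengths that are not divisible by num_workers
    chunks = np.array_split(x, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        # executor.map returns results in input order, so chunk order is preserved
        parts = list(executor.map(sigmoid, chunks))
    return np.concatenate(parts)

x = np.linspace(-5.0, 5.0, 1001)
print(np.allclose(chunked_sigmoid(x), sigmoid(x)))
```

No worker reads or writes another worker's chunk, so there is nothing to synchronize and no combine step beyond concatenation.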

In [None]:
# Exercise 2: Find maximum value
def find_max(arr):
    return np.max(arr)

# What pattern is this?

In [None]:
# Answer: Reduction
# Many values combined into one.
# Parallel approach: find max in chunks, then combine chunk maxes.
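
The chunk-then-combine idea for the maximum can be sketched as follows. The function name `parallel_max` and the default of four workers are assumptions for this example; it mirrors the shape of `parallel_sum_correct`, with `max` in place of `sum` as the combining step.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_max(values, num_workers=4):
    """Max as a reduction: per-chunk maxima first, then the max of those."""
    # Drop empty chunks, which appear when there are fewer elements than workers
    chunks = [c for c in np.array_split(values, num_workers) if c.size > 0]
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        chunk_maxes = list(executor.map(np.max, chunks))
    # Combine step: reduce the partial results to a single value
    return max(chunk_maxes)

data = np.array([3.0, -1.5, 9.25, 4.0, 0.5, 7.75, 2.0])
print(parallel_max(data))
```

Like `np.max`, this sketch raises `ValueError` on an empty input, because there are no partial results to combine.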

## Key Takeaways

1. **Sequential intuition fails**: `total += x` looks like a single step, but it is a read-modify-write that races when threads share `total`
2. **Think about data flow**: Who reads what? Who writes what? Can they overlap?
3. **Divide and combine**: The parallel pattern is often "compute partial results, then reduce"
4. **Independence is key**: The more independent the work units, the easier to parallelize