# Lesson 2.4: Broadcasting

## What is Broadcasting?

Broadcasting is how NumPy does elementwise math between arrays of **different shapes**.

### PHP Parallel
Imagine if in PHP you could do this:
```php
$prices = [10, 20, 30];
$taxed = $prices * 1.1;  // ERROR in PHP!
```
In NumPy, this just works! The scalar `1.1` gets "broadcast" to match the array size.

Broadcasting goes beyond scalars: it can also stretch smaller arrays to match bigger ones, as long as their shapes follow specific rules.

In [None]:
import numpy as np

## Level 1: Scalar Broadcasting (simplest)

A single number is applied to every element of the array.

In [None]:
prices = np.array([100, 200, 300, 400])

# The scalar 1.1 is "broadcast" to [1.1, 1.1, 1.1, 1.1]
taxed = prices * 1.1
print("With 10% tax:", taxed)  # [110. 220. 330. 440.]

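# Estimate TDS (ppm) from electrical conductivity (µS/cm).
# 0.64 is a commonly used approximate factor. Note that ppm and mg/L
# are effectively the same unit in dilute water, so they need no factor.
ec_us_cm = np.array([150, 300, 450, 600])
tds_ppm = ec_us_cm * 0.64
print("Estimated TDS in ppm:", tds_ppm)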
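
To make the "stretching" concrete, here is a minimal sketch that compares the scalar shortcut with an explicitly expanded array. `np.broadcast_to` is a real NumPy function; the variable names `factor`, `explicit`, and `implicit` are just illustrative.

```python
import numpy as np

prices = np.array([100, 200, 300, 400])

# Explicitly expand the scalar to the array's shape (a read-only view, no copy)
factor = np.broadcast_to(1.1, prices.shape)

explicit = prices * factor   # elementwise multiply of two (4,) arrays
implicit = prices * 1.1      # NumPy does the expansion for us

print(factor.shape)                         # (4,)
print(np.array_equal(explicit, implicit))   # True
```

In practice you would always write the short form. The explicit version is only here to show what broadcasting does conceptually.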

## Level 2: Array + Row/Column Broadcasting

A 1D array can broadcast across every row of a 2D array. To broadcast down the columns instead, reshape it into a column of shape `(n, 1)` first.

In [None]:
# Imagine: 3 water filters, measured on 4 different days
# Rows = filters, Columns = days
tds_readings = np.array([
    [40, 42, 45, 48],   # Filter A
    [60, 65, 68, 70],   # Filter B
    [90, 95, 100, 105],  # Filter C
])
print("TDS Readings (3 filters x 4 days):")
print(tds_readings)
print("Shape:", tds_readings.shape)  # (3, 4)

In [None]:
# Subtract Day 1 reading from ALL days (to see change from baseline)
# baseline is the first column: shape (3,)
baseline = tds_readings[:, 0]  # [40, 60, 90]
print("Baseline (Day 1):", baseline)

# Reshape baseline to (3, 1) so it broadcasts across columns.
# As a plain (3,) array it would be matched against the 4 columns and fail.
baseline_col = baseline.reshape(-1, 1)  # Shape: (3, 1)
print("Reshaped baseline:", baseline_col.shape)

change_from_baseline = tds_readings - baseline_col
print("\nChange from Day 1:")
print(change_from_baseline)
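
`reshape(-1, 1)` is not the only way to get a column. The sketch below shows two other common NumPy idioms that produce the same `(3, 1)` shape. The variable names are illustrative, and the data is copied from the cell above so the block runs on its own.

```python
import numpy as np

tds_readings = np.array([
    [40, 42, 45, 48],    # Filter A
    [60, 65, 68, 70],    # Filter B
    [90, 95, 100, 105],  # Filter C
])

# Slicing with 0:1 (instead of 0) keeps the column axis: shape (3, 1)
baseline_slice = tds_readings[:, 0:1]

# np.newaxis adds a length-1 axis to a 1D array: (3,) -> (3, 1)
baseline_newaxis = tds_readings[:, 0][:, np.newaxis]

change = tds_readings - baseline_slice
print(baseline_slice.shape, baseline_newaxis.shape)  # (3, 1) (3, 1)
print(change)
```

All three approaches give the same result, so pick whichever reads best to you.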

## Practical Example: Temperature Conversion

Convert an entire dataset of temperatures from Celsius to Fahrenheit.

In [None]:
# Weekly temperatures in Celsius for 3 cities
celsius = np.array([
    [22, 24, 26, 23, 21, 25, 27],  # City A
    [30, 32, 35, 33, 31, 34, 36],  # City B
    [15, 17, 14, 16, 13, 18, 20],  # City C
])

# F = C * 9/5 + 32 → each scalar is broadcast across the whole (3, 7) array
fahrenheit = celsius * 9/5 + 32

print("Celsius:")
print(celsius)
print("\nFahrenheit:")
print(fahrenheit)

## Practical Example: Normalizing Data (VERY common in ML!)

**Normalization** (here, min-max scaling) rescales each feature to the same range (usually 0 to 1).

This is crucial because many ML algorithms, especially distance-based and gradient-based ones, work better when features are on similar scales.

Formula: `normalized = (value - min) / (max - min)`

In [None]:
# Water filter sensor data
# Columns: [TDS (35-200), Flow_rate (0.5-2.5), Pressure (30-70)]
sensor_data = np.array([
    [35,  2.5, 60],
    [80,  1.5, 50],
    [120, 1.0, 42],
    [200, 0.5, 30],
])
print("Raw data:")
print(sensor_data)

# Normalize each column to 0-1 range
col_min = sensor_data.min(axis=0)  # Min of each column
col_max = sensor_data.max(axis=0)  # Max of each column
print(f"\nColumn mins: {col_min}")
print(f"Column maxs: {col_max}")

# Broadcasting magic! col_min and col_max are (3,), data is (4, 3)
# NumPy broadcasts (3,) across all 4 rows automatically
normalized = (sensor_data - col_min) / (col_max - col_min)
print("\nNormalized (0 to 1):")
print(np.round(normalized, 2))
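
One thing to watch for: if a column is constant, `max - min` is 0 and the formula divides by zero. Here is a minimal sketch of one possible guard, using made-up data with a constant last column. Mapping constant columns to 0 is an assumption for this example, not the only reasonable choice.

```python
import numpy as np

# Hypothetical data: the last column is constant, so max - min is 0
data = np.array([
    [35.0,  2.5, 50.0],
    [80.0,  1.5, 50.0],
    [200.0, 0.5, 50.0],
])

col_min = data.min(axis=0)
col_range = data.max(axis=0) - col_min   # shape (3,)

# Replace zero ranges with 1 to avoid 0/0; constant columns then map to 0
safe_range = np.where(col_range == 0, 1.0, col_range)

# (3, 3) - (3,) and (3, 3) / (3,) both broadcast across the rows
normalized = (data - col_min) / safe_range
print(np.round(normalized, 2))
```

Without the guard, the constant column would come out as `nan` values along with a runtime warning.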

## Broadcasting Rules (When Does It Work?)

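NumPy compares shapes **from right to left**, one dimension at a time. If one shape has fewer dimensions, it is treated as if it were padded with 1s on the left. Two dimensions are compatible when: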
1. They are equal, OR
2. One of them is 1

```
(4, 3) + (3,)   → Works! (3 matches 3)
(4, 3) + (4, 1) → Works! (1 broadcasts to 3)
(4, 3) + (4,)   → ERROR! (3 != 4)
```
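
You can check these rules without building any arrays. The sketch below uses `np.broadcast_shapes` (available in NumPy 1.20 and later); `can_broadcast` is a hypothetical helper written for this lesson, not part of NumPy.

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    """Return True if the two shapes are broadcast-compatible."""
    try:
        np.broadcast_shapes(shape_a, shape_b)
        return True
    except ValueError:
        # NumPy raises ValueError when the shapes cannot be broadcast
        return False

print(np.broadcast_shapes((4, 3), (3,)))   # (4, 3)
print(can_broadcast((4, 3), (4, 1)))       # True
print(can_broadcast((4, 3), (4,)))         # False
```

This is handy for debugging: print the shapes, test them, then fix whichever one does not line up.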

In [None]:
a = np.ones((4, 3))  # 4 rows, 3 cols

# This works: (4, 3) + (3,) → broadcast row across all rows
b = np.array([10, 20, 30])
print("(4,3) + (3,):")
print(a + b)

# This works: (4, 3) + (4, 1) → broadcast column across all cols
c = np.array([[1], [2], [3], [4]])  # Shape: (4, 1)
print("\n(4,3) + (4,1):")
print(a + c)

In [None]:
# This FAILS: (4, 3) + (4,) → shapes don't align from the right
d = np.array([1, 2, 3, 4])  # Shape: (4,)
try:
    result = a + d
except ValueError as e:
    print(f"Error: {e}")
    print("Shapes (4,3) and (4,) don't broadcast - 3 != 4 from the right")
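
The failing case above is easy to fix once you know the rules: give the `(4,)` array a trailing axis of length 1 so it lines up with the rows. A minimal sketch, with `d_col` as an illustrative name:

```python
import numpy as np

a = np.ones((4, 3))
d = np.array([1, 2, 3, 4])   # Shape: (4,)

# Add a trailing length-1 axis: (4,) -> (4, 1)
d_col = d[:, np.newaxis]

# (4, 3) + (4, 1) → each value in d_col is added across one whole row
result = a + d_col
print(result.shape)  # (4, 3)
print(result)
```

`d.reshape(-1, 1)` would work just as well here.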

## Exercise: Water Filter Data Normalization

1. Create a dataset of 5 water filters with columns: [TDS, Flow_rate, Temperature]
2. Normalize each column to 0-1 range using broadcasting
3. Also try **standardization**: `(value - mean) / std` for each column
4. Convert all temperature readings from Celsius to Fahrenheit

In [None]:
# YOUR CODE HERE

# 1. Create dataset
# filters = np.array([
#     [45, 2.0, 25],
#     [...],
# ])

# 2. Normalize (0-1)
# mins = filters.min(axis=0)
# maxs = filters.max(axis=0)
# normalized = ???

# 3. Standardize (mean=0, std=1)
# means = filters.mean(axis=0)
# stds = filters.std(axis=0)
# standardized = ???

# 4. Convert temperature column to Fahrenheit
# temps_f = filters[:, 2] * ??? + ???