# Lesson 2.2: Array Operations (Vectorization)

## The Big Idea: No More Loops!

In PHP, if you want to double every price:
```php
$doubled = array_map(fn($p) => $p * 2, $prices);
```

In NumPy, you just do:
```python
doubled = prices * 2
```

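This is called **vectorization** - one expression applies the operation to ALL elements at once. The looping still happens, but it runs inside NumPy's compiled C code rather than in the Python interpreter. You write no loop and no `array_map`, and the result is faster AND cleaner.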
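Here is a minimal sketch that checks the loop version and the vectorized version give the same result. It also roughly times both on a larger random array. Timings depend on your machine, so no numbers are shown.

```python
import timeit

import numpy as np

prices = np.array([10.0, 25.5, 8.0, 42.0, 15.5])

# Python-level loop: one interpreter step per element (like array_map in PHP)
doubled_loop = np.array([p * 2 for p in prices])

# Vectorized: a single expression, the loop runs inside NumPy
doubled_vec = prices * 2

print(np.array_equal(doubled_loop, doubled_vec))  # True

# Rough timing on a bigger array (your numbers will differ)
big = np.random.rand(100_000)
loop_time = timeit.timeit(lambda: [x * 2 for x in big], number=3)
vec_time = timeit.timeit(lambda: big * 2, number=3)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```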

In [None]:
import numpy as np

prices = np.array([10.0, 25.5, 8.0, 42.0, 15.5])
print("Original prices:", prices)

## Element-wise Arithmetic

Arithmetic operators (`+`, `-`, `*`, `/`, `**`) work on EVERY element automatically.

In [None]:
# Add 10% tax to every price
# PHP: array_map(fn($p) => $p * 1.1, $prices)
with_tax = prices * 1.1
print("With 10% tax:", with_tax)

# Apply a flat $5 discount
discounted = prices - 5
print("After $5 off:", discounted)

# Square each price (why not!)
squared = prices ** 2
print("Squared:", squared)

In [None]:
# Operations between TWO arrays (element by element)
quantities = np.array([2, 1, 5, 1, 3])

# Total cost per item = price * quantity
totals = prices * quantities
print("Prices:    ", prices)
print("Quantities:", quantities)
print("Totals:    ", totals)
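Element-wise operations between two arrays need the elements to pair up. The sketch below shows what happens when one quantity is missing. NumPy raises a `ValueError` instead of guessing. By contrast, PHP's `array_map()` given two arrays pads the shorter one with `null`. NumPy's broadcasting rules do allow some shape differences, such as combining an array with a single number, which is why `prices * 2` works.

```python
import numpy as np

prices = np.array([10.0, 25.5, 8.0, 42.0, 15.5])
short_quantities = np.array([2, 1, 5, 1])  # only 4 items: one is missing

error_message = None
try:
    totals = prices * short_quantities
except ValueError as err:
    # NumPy refuses to pair up 5 prices with 4 quantities
    error_message = str(err)
    print("Shape mismatch:", error_message)

# A single number is fine: it is "broadcast" to every element
print(prices * 2)
```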

## Comparison Operations

Compare every element at once - returns an array of True/False.

In [None]:
tds_readings = np.array([45, 120, 38, 200, 65, 95, 310, 50])

# Which readings exceed the safe limit (100 ppm)? One True/False per reading
# PHP: array_map(fn($r) => $r > 100, $readings)
above_limit = tds_readings > 100
print("Above 100 ppm:", above_limit)  # [False, True, False, True, ...]

# Use this boolean array to FILTER (very powerful!)
# PHP: array_filter($readings, fn($r) => $r > 100)
dangerous = tds_readings[above_limit]
print("Dangerous readings:", dangerous)

# Count how many: sum() treats True as 1 and False as 0
print(f"{above_limit.sum()} out of {len(tds_readings)} readings are above limit")
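Real filters often need more than one condition. On boolean arrays, NumPy uses `&` (and), `|` (or) and `~` (not) instead of PHP's `&&`, `||` and `!`. Each comparison needs its own parentheses, because `&` binds more tightly than `>`. In this sketch, the 50 to 100 ppm "band" and the 40 ppm lower cutoff are illustrative values, not from the lesson.

```python
import numpy as np

tds_readings = np.array([45, 120, 38, 200, 65, 95, 310, 50])

# Parentheses are required: without them Python evaluates 50 & tds_readings first
in_band = (tds_readings >= 50) & (tds_readings <= 100)
print("In 50-100 ppm band:", tds_readings[in_band])

# Either suspiciously low OR above the limit
flagged = (tds_readings < 40) | (tds_readings > 100)
print("Flagged:", tds_readings[flagged])

# ~ flips every True/False
print("Not flagged:", tds_readings[~flagged])
```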

## Aggregation Functions

Summarize your data in one call. Like Laravel's `->sum()`, `->avg()`, etc.

In [None]:
water_usage = np.array([12.5, 15.0, 8.3, 20.1, 14.7, 11.2, 16.8])
print("Daily water usage (liters):", water_usage)
print()

# PHP: array_sum($usage)  →  Python:
print(f"Total:   {water_usage.sum():.1f} liters")

# PHP: array_sum($usage) / count($usage)  →  Python:
print(f"Average: {water_usage.mean():.1f} liters")

print(f"Min:     {water_usage.min():.1f} liters")
print(f"Max:     {water_usage.max():.1f} liters")

# Standard deviation - measures how spread out values are around the mean
# (PHP has no built-in function for this - very useful in ML!)
print(f"Std Dev: {water_usage.std():.1f} liters")

# Which DAY had the max/min? argmax()/argmin() return the 0-based POSITION, not the value
print(f"Highest usage at day index: {water_usage.argmax()}")
print(f"Lowest usage at day index:  {water_usage.argmin()}")
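`std()` can feel like a black box, so here is a sketch of what it computes. NumPy's default (`ddof=0`) is the population standard deviation: the square root of the average squared distance from the mean. Passing `ddof=1` divides by n - 1 instead, which gives the sample standard deviation many statistics texts use.

```python
import numpy as np

water_usage = np.array([12.5, 15.0, 8.3, 20.1, 14.7, 11.2, 16.8])

mean = water_usage.sum() / len(water_usage)   # same as water_usage.mean()
squared_dist = (water_usage - mean) ** 2      # vectorized: one value per day
manual_std = np.sqrt(squared_dist.mean())     # population std (divide by n)

print(np.isclose(manual_std, water_usage.std()))  # True

# Sample standard deviation divides by n - 1 instead of n
sample_std = np.sqrt(squared_dist.sum() / (len(water_usage) - 1))
print(np.isclose(sample_std, water_usage.std(ddof=1)))  # True

# argmax() is 0-based; add 1 if you number days from 1
print(f"Highest usage on day {water_usage.argmax() + 1} (counting from 1)")
```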

## Axis Parameter (for 2D arrays)

When you have a table (2D array), you can aggregate along rows OR columns.

- `axis=0` → operate DOWN each column (collapse rows)
- `axis=1` → operate ACROSS each row (collapse columns)

In [None]:
# 3 filters, 4 daily TDS readings each
tds_data = np.array([
    [42, 45, 48, 50],      # Filter A
    [65, 70, 68, 72],      # Filter B
    [120, 125, 130, 128],  # Filter C (needs maintenance!)
])
print("TDS Data (3 filters x 4 days):")
print(tds_data)
print()

# Average TDS per FILTER (across columns → axis=1)
print("Avg per filter:", tds_data.mean(axis=1))  # [46.25, 68.75, 125.75]

# Average TDS per DAY (down rows → axis=0)
print("Avg per day:   ", tds_data.mean(axis=0))

# Overall average
print("Overall avg:   ", tds_data.mean())
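A handy way to remember `axis`: the axis you pass is the one that disappears from the result's shape. This sketch prints the shapes for the 3 x 4 filter table. It also uses `keepdims=True`, which keeps the collapsed axis with length 1. That is useful when you want to subtract each filter's own average from its row.

```python
import numpy as np

tds_data = np.array([
    [42, 45, 48, 50],
    [65, 70, 68, 72],
    [120, 125, 130, 128],
])

print(tds_data.shape)               # (3, 4): 3 filters, 4 days
print(tds_data.mean(axis=0).shape)  # (4,): filters axis collapsed, one value per day
print(tds_data.mean(axis=1).shape)  # (3,): days axis collapsed, one value per filter

# keepdims=True leaves a (3, 1) column that lines up with each row
per_filter = tds_data.mean(axis=1, keepdims=True)
print(per_filter.shape)             # (3, 1)

# How far each day's reading is from that filter's own average
deviations = tds_data - per_filter
print(deviations)
```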

## Universal Functions (ufuncs)

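NumPy's universal functions (ufuncs) apply a math operation to every element of an array in compiled code, so `np.sqrt(values)` replaces a whole loop of `sqrt()` calls.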

In [None]:
values = np.array([1, 4, 9, 16, 25])

print("Square root:", np.sqrt(values))   # [1. 2. 3. 4. 5.]
print("Absolute:",    np.abs(np.array([-3, -1, 0, 2, 5])))
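# np.round isn't strictly a ufunc, but it also works element-wise.
# Halves round to the nearest EVEN number (2.5 → 2.0, 3.5 → 4.0), unlike PHP's round()
print("Round:",       np.round(np.array([1.2, 2.7, 3.5]), 0))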

# These are used constantly in ML for things like:
# - np.exp() for exponential (used in sigmoid function)
# - np.log() for logarithm (used in loss functions)
# - np.sqrt() for standard deviation calculations
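The comments above mention the sigmoid function. Here is a sketch of how `np.exp` is used there. The sigmoid is 1 / (1 + e^(-x)), and it squashes any number into the range 0 to 1. Because `np.exp` is a ufunc, the same one-line function works on a single number or a whole array. The `scores` values are made up for illustration.

```python
import numpy as np


def sigmoid(x):
    """Map any real number (or array of numbers) into the range (0, 1)."""
    return 1 / (1 + np.exp(-x))


scores = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
probs = sigmoid(scores)
print(probs)

print(sigmoid(0.0))  # 0.5

# np.log undoes the sigmoid ("log-odds"), recovering the original scores
recovered = np.log(probs / (1 - probs))
print(recovered)
```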

## Exercise: Water Filter Analysis

You have TDS readings from a water filter over 10 days. Analyze the data using NumPy operations.

1. Calculate the average, min, and max TDS
2. Find how many days TDS was above 80 ppm
3. Calculate the daily change in TDS (hint: use `np.diff()`)
4. Apply a 10% measurement error correction (multiply by 0.9)
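If `np.diff()` is new to you, here is how it behaves on a small made-up array, deliberately not the exercise data. Each output element is `next - current`, so the result is one element shorter than the input.

```python
import numpy as np

sample = np.array([10, 13, 13, 20])  # illustrative values only

changes = np.diff(sample)
print(changes)                          # [3 0 7]
print(len(sample), "->", len(changes))  # 4 -> 3

# Positive = went up, zero = no change, negative = went down
print("Days that increased:", (changes > 0).sum())
```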

In [None]:
# YOUR CODE HERE
tds_10days = np.array([35, 42, 55, 60, 78, 85, 92, 88, 95, 110])

# 1. Basic stats
# print(f"Average TDS: {tds_10days.???()}")
# print(f"Min TDS: {tds_10days.???()}")
# print(f"Max TDS: {tds_10days.???()}")

# 2. Days above 80 ppm
# above_80 = tds_10days > ???
# print(f"Days above 80: {above_80.???()}")

# 3. Daily change (hint: np.diff gives difference between consecutive elements)
# daily_change = np.diff(???)
# print(f"Daily changes: {daily_change}")

# 4. Corrected readings
# corrected = tds_10days * ???
# print(f"Corrected: {corrected}")