# ⚡ Section 6: Vectorized Computation & Aggregation in NumPy

In this section, we explore the *heart* of NumPy's performance and expressiveness: **vectorized operations** and **aggregation**.

### Why It Matters
- Vectorization lets you perform elementwise computations **without explicit loops**.
- Aggregation operations summarize or reduce large datasets efficiently (e.g., `sum`, `mean`, `std`).
- Together, these allow high-speed mathematical and statistical processing that scales to millions of elements.

We'll learn both *how* to use them and *why* they're so fast.

In [None]:
import numpy as np

# Sample dataset: sales for 4 stores over 7 days
np.random.seed(42)
sales = np.random.randint(80, 200, size=(4, 7))
print("Sales data (rows=stores, cols=days):\n", sales)
print("Shape:", sales.shape)

## 🚀 1. Vectorized Operations: Elementwise Arithmetic

Vectorization means that mathematical operations apply **elementwise** across entire arrays.  
This avoids Python loops and is implemented in **fast C code** under the hood.

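The elements are still processed one after another (or a few at a time with SIMD), but in compiled code rather than in the Python interpreter, which gives both **clean syntax** and **large performance gains**.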

In [None]:
# Basic arithmetic
revenue = sales * 10   # Each sale worth $10
costs = sales * 6.5    # Each item costs $6.50 to produce

profit = revenue - costs
print("Profit per cell:\n", profit)

# Elementwise markup: profit as a percentage of cost
markup = (profit / costs) * 100
print("\nMarkup (%):\n", markup.round(2))
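
To make "without explicit loops" concrete, here is a minimal sketch that computes the same profit twice: once with a vectorized expression and once with nested Python loops. The names `profit_vec` and `profit_loop` are illustrative, not part of the cells above.

```python
import numpy as np

# Same construction as the sample data above (4 stores x 7 days).
np.random.seed(42)
sales = np.random.randint(80, 200, size=(4, 7))

# Vectorized: one expression over the whole array.
profit_vec = sales * 10 - sales * 6.5

# Explicit loops: the same arithmetic, one cell at a time.
profit_loop = np.empty(sales.shape, dtype=float)
for i in range(sales.shape[0]):
    for j in range(sales.shape[1]):
        profit_loop[i, j] = sales[i, j] * 10 - sales[i, j] * 6.5

print("Same result:", np.allclose(profit_vec, profit_loop))
# An integer array combined with a float scalar is promoted to a float array.
print("Result dtype:", profit_vec.dtype)
```

Both versions produce the same numbers; the vectorized one is shorter and leaves the looping to NumPy.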

### 🧠 Under the Hood: Why It's So Fast

- NumPy arrays store elements of a single type in one block of memory (contiguous by default), unlike Python lists, which hold references to separate objects.
- Operations are implemented in **compiled C**, and many use SIMD (vectorized CPU instructions) where the hardware supports it.
- The loop still exists; it is just **hidden inside** C code that runs at machine speed!

Try timing the difference yourself:

In [None]:
import time

arr = np.random.rand(10_000_000)

# Vectorized (perf_counter is a higher-resolution clock than time.time)
start = time.perf_counter()
result_vec = arr * 2
print("Vectorized time:", round(time.perf_counter() - start, 4), "s")

# Pure Python loop over a list holding the same values
arr_list = arr.tolist()
start = time.perf_counter()
result_loop = [x * 2 for x in arr_list]
print("Loop time:", round(time.perf_counter() - start, 4), "s")
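
The memory-layout point can also be inspected directly. The sketch below uses a small `float64` array (an arbitrary choice for illustration) and prints the attributes that describe how it sits in memory.

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)

print("dtype:", arr.dtype)          # one fixed type for every element
print("itemsize:", arr.itemsize)    # bytes per element
print("nbytes:", arr.nbytes)        # number of elements * itemsize
print("strides:", arr.strides)      # bytes to step along each axis
print("C-contiguous:", arr.flags["C_CONTIGUOUS"])

# A transposed view reuses the same buffer but is no longer C-contiguous.
view = arr.T
print("Transpose C-contiguous:", view.flags["C_CONTIGUOUS"])
print("Shares memory:", np.shares_memory(arr, view))
```

The transpose shows that "contiguous" is the default for freshly created arrays rather than a guarantee for every view.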

## 🧮 2. Universal Functions (UFuncs)

NumPy provides a rich collection of **UFuncs**, functions that operate elementwise on arrays and support:
- Broadcasting (automatic shape alignment)
- Optional output arrays (`out` parameter)
- Type casting and error handling

Examples include `np.add`, `np.subtract`, `np.sqrt`, `np.exp`, `np.log`, and many trigonometric functions.

In [None]:
x = np.array([1, 4, 9, 16, 25])
print("√x:", np.sqrt(x))
print("log(x):", np.log(x))
print("exp(x):", np.exp(x))
print("sin(x):", np.sin(x))
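
The three features listed above (output arrays, type casting, error handling) can be sketched in a few lines. The buffer name `buf` and the sample values are assumptions made for this illustration.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

# out=: write the result into an existing array instead of allocating a new one.
buf = np.empty_like(x)
np.sqrt(x, out=buf)
print("sqrt written into buf:", buf)

# Type casting: integer input is converted to floating point for sqrt.
int_roots = np.sqrt(np.array([1, 4, 9]))
print("dtype of sqrt(int array):", int_roots.dtype)

# Error handling: np.errstate controls floating-point warnings locally.
with np.errstate(divide="ignore"):
    logs = np.log(np.array([0.0, 1.0]))  # log(0) is -inf
print("log([0, 1]):", logs)
```

Outside the `np.errstate` block, `np.log(0.0)` would still return `-inf` but would also emit a `RuntimeWarning` under the default settings.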

### 🔄 Binary UFuncs

These take two arrays (or scalars) and perform elementwise operations. Examples: `add`, `multiply`, `power`.

In [None]:
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print("a + b:", np.add(a, b))
print("a * b:", np.multiply(a, b))
print("a ** 2:", np.power(a, 2))
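
Python's arithmetic operators on arrays give the same results as the corresponding binary ufuncs. A short sketch, reusing the values of `a` and `b` from the cell above:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Operators and the named ufuncs produce identical results.
print(np.array_equal(a + b, np.add(a, b)))
print(np.array_equal(a * b, np.multiply(a, b)))
print(np.array_equal(a ** 2, np.power(a, 2)))

# A scalar second argument is applied to every element.
quotient = np.divide(b, 4)        # true division, float result
floored = np.floor_divide(b, 4)   # same as b // 4
remainder = np.mod(b, 4)          # same as b % 4
print(quotient, floored, remainder)
```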

## 🧩 3. Broadcasting: The Secret Weapon

Broadcasting allows operations between arrays of **different shapes**, automatically expanding dimensions where necessary.

Rules:
1. Align shapes from right to left; missing leading dimensions are treated as 1.
2. Each pair of aligned dimensions must either match or one of them must be 1.

It avoids unnecessary data duplication and enables elegant expressions.

In [None]:
# Example: add daily bonus (per day) to all stores
bonus = np.array([5, 10, 15, 20, 25, 30, 35])  # shape (7,)
adjusted = sales + bonus  # (4,7) + (7,)

print("Bonus added:\n", adjusted)
print("Shapes:", sales.shape, bonus.shape, "→", adjusted.shape)
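
The rules also explain what happens when shapes do not line up. The sketch below uses a hypothetical per-store bonus (`store_bonus`, one value per store) to show a failing case, the usual fix with `np.newaxis`, and a check that the expansion does not copy data.

```python
import numpy as np

np.random.seed(42)
sales = np.random.randint(80, 200, size=(4, 7))

# Hypothetical per-store bonus: one value per store, shape (4,).
store_bonus = np.array([1, 2, 3, 4])

# (4, 7) + (4,) fails: the trailing dimensions 7 and 4 do not match.
try:
    sales + store_bonus
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)

# A trailing axis of length 1 gives shape (4, 1), which stretches across the 7 days.
adjusted = sales + store_bonus[:, np.newaxis]
print("Adjusted shape:", adjusted.shape)

# np.broadcast_to returns a view: stride 0 along the repeated axis, no copy.
expanded = np.broadcast_to(store_bonus[:, np.newaxis], sales.shape)
print("Expanded shape:", expanded.shape, "strides:", expanded.strides)
```

The zero stride means every column of `expanded` reads the same four values in memory, which is how broadcasting avoids duplicating data.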

## 📈 4. Aggregations (Reductions)

Aggregation functions **collapse** an array into a single value or along an axis.

Common examples:
- `sum`, `mean`, `std`, `var`, `min`, `max`
- `np.percentile`, `np.median`, `np.prod` (`np.cumsum` is related, but returns running totals rather than a single value)

Axis rules:
- `axis=None` (default): entire array.
- `axis=0`: column-wise (down each column).
- `axis=1`: row-wise (across each row).

In [None]:
# Total sales
print("Total:", np.sum(sales))

# Mean per store (row-wise)
print("Mean per store:", np.mean(sales, axis=1))

# Mean per day (col-wise)
print("Mean per day:", np.mean(sales, axis=0))

# Variance and standard deviation
print("Variance:", np.var(sales))
print("Standard Deviation:", np.std(sales))
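
Two options are worth knowing once the axis rules are clear: `keepdims`, which keeps the reduced axis so the result can broadcast back against the data, and `ddof`, which switches `std` and `var` between the population and sample formulas. A minimal sketch on the same sample data:

```python
import numpy as np

np.random.seed(42)
sales = np.random.randint(80, 200, size=(4, 7))

# keepdims=True keeps the reduced axis with length 1: shape (4, 1) instead of (4,).
store_means = sales.mean(axis=1, keepdims=True)
print("Shape with keepdims:", store_means.shape)

# The (4, 1) result broadcasts against the (4, 7) data.
centered = sales - store_means
print("Row means after centering:", centered.mean(axis=1).round(10))

# std defaults to the population formula (ddof=0); ddof=1 gives the sample estimate.
pop_std = sales.std(axis=1)
sample_std = sales.std(axis=1, ddof=1)
print("Population std per store:", pop_std.round(2))
print("Sample std per store:    ", sample_std.round(2))
```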

## ⚙️ 5. Axis Logic: A Visual Rule

Think of **axis** as the direction **being collapsed**:

| Array Shape | `axis=0` (↓ columns) | `axis=1` (→ rows) |
|--------------|---------------------|--------------------|
| (4, 7) | Collapses rows → 7 results | Collapses cols → 4 results |

A quick mnemonic:  
➡️ *Axis 0 → move down*, *Axis 1 → move across*.

In [None]:
print("Sum over axis=0 (columns):", np.sum(sales, axis=0))
print("Sum over axis=1 (rows):", np.sum(sales, axis=1))
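
The "direction being collapsed" rule extends beyond two dimensions: the result has the input's shape with the reduced axis removed. The 3-D array below (2 weeks, 4 stores, 7 days) is a made-up example for illustration.

```python
import numpy as np

# Hypothetical 3-D data: 2 weeks x 4 stores x 7 days.
data = np.arange(2 * 4 * 7).reshape(2, 4, 7)

print(data.sum(axis=0).shape)       # (4, 7): weeks collapsed
print(data.sum(axis=1).shape)       # (2, 7): stores collapsed
print(data.sum(axis=2).shape)       # (2, 4): days collapsed
print(data.sum(axis=(0, 2)).shape)  # (4,): weeks and days collapsed together
print(data.sum(axis=-1).shape)      # (2, 4): negative axes count from the end
```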

## 🔍 6. Cumulative and Logical Aggregations

You can also accumulate or combine conditions across elements.

- `np.cumsum`, `np.cumprod` for cumulative totals
- `np.all`, `np.any` for logical checks

In [None]:
print("Cumulative sum per row:\n", np.cumsum(sales, axis=1))
print("Any day > 180 per store:", np.any(sales > 180, axis=1))
print("All days > 100 per store:", np.all(sales > 100, axis=1))
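
Boolean masks combine naturally with these functions. The sketch below counts days above a threshold and finds the first day on which each store's running total reaches a target; the threshold of 150 and the target of 500 are arbitrary values chosen for illustration.

```python
import numpy as np

np.random.seed(42)
sales = np.random.randint(80, 200, size=(4, 7))

# Booleans sum as 0/1, so summing a mask counts its True entries.
busy_days = np.sum(sales > 150, axis=1)
print("Days above 150 per store:", busy_days)

# Running total per store, then the first day it reaches the target.
running = np.cumsum(sales, axis=1)
reached = running >= 500
first_day = np.argmax(reached, axis=1)  # index of the first True in each row
print("First day index reaching 500:", first_day)

# argmax returns 0 for a row with no True at all, so check that separately.
print("Target reached at all:", reached.any(axis=1))
```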

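## 🧠 7. Chaining Operations Efficiently

NumPy lets you combine multiple operations *without leaving C speed*.
This is sometimes called **vectorized chaining**: every step of the expression runs in compiled code. Each intermediate result is still a temporary array, so for very large inputs, in-place operators or the `out` parameter can reduce allocations.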

In [None]:
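# Compute average markup (profit / cost) for each store
# With a scalar price and cost this ratio is the same in every cell;
# per-store or per-day prices (as arrays) would make it vary.
price = 10
cost = 6.5

markups = ((sales * price - sales * cost) / (sales * cost)).mean(axis=1)
print("Avg markup per store (%):", (markups * 100).round(2))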
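
When memory matters, a chained expression can be rewritten to reuse one buffer. The following is a sketch under the assumption that the input `x` may not be modified; the array size and the names `y` and `buf` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000_000)

# Chained expression: x * 2 and (x * 2) + 1 each produce a new temporary array.
y = np.sqrt(x * 2 + 1)

# Same computation reusing a single buffer.
buf = np.multiply(x, 2)   # one allocation
buf += 1                  # in place
np.sqrt(buf, out=buf)     # in place

print("Same result:", np.allclose(y, buf))
```

For arrays of modest size the chained form is usually the better choice because it is easier to read; the buffer-reusing form is mainly useful when temporaries become a memory concern.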

## ⚠️ 8. Best Practices & Common Pitfalls

✅ **Best Practices:**
- Prefer **vectorized** code over explicit Python loops whenever an array operation exists for the task.
- Pass `axis` explicitly; it makes your intent clear.
- Use the `.sum()` and `.mean()` array methods (or `np.sum`, `np.mean`) rather than Python's built-in `sum()`, which iterates element by element.
- Chain operations to keep computations on the C side.

🚫 **Pitfalls:**
- Processing array elements one at a time in large Python loops: slow!
- Forgetting parentheses in chained math, which silently changes the result through operator precedence.
- Using `np.append` repeatedly: it creates a new copy each time. Prefer preallocation or a single `np.concatenate`.
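
The `np.append` pitfall is easiest to see side by side with its alternatives. This sketch builds the same array of squares three ways; the size `n` and the variable names are chosen only for illustration.

```python
import numpy as np

n = 1_000

# Anti-pattern: np.append returns a new array on every call, copying all earlier data.
grown = np.array([], dtype=np.int64)
for i in range(n):
    grown = np.append(grown, i * i)

# Better: allocate once, then fill in place.
filled = np.empty(n, dtype=np.int64)
for i in range(n):
    filled[i] = i * i

# Best when possible: no Python loop at all.
vectorized = np.arange(n, dtype=np.int64) ** 2

print(np.array_equal(grown, filled), np.array_equal(filled, vectorized))
```

All three give the same values; the difference is in how much copying and interpreter work happens along the way.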

## 🧩 Challenge Exercise: "Sales Insights Dashboard"

**Dataset:**  
Use the `sales` array (4 stores × 7 days).

**Tasks:**
1. Compute each store's **average, max, and min** daily sales.
2. Compute **total weekly revenue** if each sale = $12.
3. Find the **store with the highest mean sale** using vectorized logic (no loops).
4. Calculate **percent deviation from mean** for each entry.
5. Use broadcasting to apply a **daily growth factor**: `[1.02, 0.98, 1.05, 1.00, 1.03, 0.97, 1.01]`.

💡 *Bonus:* Compute the correlation (`np.corrcoef`) between stores' sales patterns.

Try to solve each part **without writing loops**; use vectorization and aggregation instead!

✅ **Next Up:**  
In **Section 7**, we'll tackle **broadcasting, reshaping, and combining arrays**, learning how to manipulate structure and dimensions to prepare data for analysis and machine learning.

# --- End of Section 6: Continue to Section 7 ---