# ⚡ Section 3: Vectorization and Performance Optimization in NumPy

In this section, we'll explore how NumPy achieves its **speed advantage** through *vectorization* and learn how to write code that takes full advantage of it.

Understanding vectorization helps you write efficient numerical code without explicit Python loops, bringing your operations closer to C-level performance.

---
## 🎯 Learning Objectives

By the end of this section, you will:
- Understand what **vectorization** means in NumPy.
- Learn the difference between **Python loops** and **vectorized array operations**.
- Use **universal functions (ufuncs)** for efficient elementwise computation.
- Learn about **performance measurement** with `%timeit`.
- Apply optimization techniques like **in-place operations** and **avoiding copies**.

---
## 🧠 1. What is Vectorization?

Vectorization means expressing computations as **array operations**, letting NumPy perform them internally in **compiled C code** instead of interpreted Python loops.

This makes NumPy both **faster** and **more concise**.

Let's compare an operation implemented with a Python loop vs. NumPy's vectorized approach.

In [ ]:
import numpy as np
import time

# Create large arrays
size = 1_000_000
a = np.arange(size)
b = np.arange(size)

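# Python loop version: one interpreted iteration per element
start = time.perf_counter()  # perf_counter is better suited to short timings than time.time
result_loop = np.empty_like(a)  # same dtype as a + b
for i in range(size):
    result_loop[i] = a[i] + b[i]
end = time.perf_counter()
print(f"Loop version time: {end - start:.4f} seconds")

# Vectorized version: the loop runs inside NumPy's compiled code
start = time.perf_counter()
result_vec = a + b
end = time.perf_counter()
print(f"Vectorized version time: {end - start:.4f} seconds")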

You'll typically see a speedup on the order of **100x or more** with the vectorized version, although the exact ratio depends on array size, dtype, and hardware.
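
To confirm that vectorizing changes only the speed and not the result, the sketch below evaluates a small polynomial both ways and checks that the outputs agree. The polynomial and the names `x`, `poly_loop`, and `poly_vec` are illustrative assumptions, not part of the example above.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 1_000)

# Loop version: evaluate 3x^2 + 2x + 1 one element at a time
poly_loop = np.empty_like(x)
for i in range(x.size):
    poly_loop[i] = 3 * x[i] ** 2 + 2 * x[i] + 1

# Vectorized version: the same expression applied to the whole array
poly_vec = 3 * x**2 + 2 * x + 1

# Both approaches should agree up to floating-point rounding
print(np.allclose(poly_loop, poly_vec))
```

The vectorized line reads almost exactly like the mathematical formula, which is the conciseness benefit mentioned earlier.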

---
## ⚙️ 2. Universal Functions (ufuncs)

NumPy provides **universal functions** (ufuncs): optimized, vectorized functions for elementwise operations (e.g., addition, exponentiation, trigonometric functions).

They automatically broadcast and handle elementwise computation efficiently.

In [ ]:
x = np.linspace(0, np.pi, 5)
print("x:", x)

# Using ufuncs
sin_x = np.sin(x)
cos_x = np.cos(x)
sum_trig = np.add(sin_x, cos_x)

print("\nSin(x):", sin_x)
print("Cos(x):", cos_x)
print("Sin(x) + Cos(x):", sum_trig)

Notice how each of these operates **elementwise** without explicit loops. This is the essence of vectorization.
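
The trigonometric example uses arrays of equal shape, so it does not exercise the broadcasting that ufuncs also perform. As a minimal sketch (the arrays `col`, `row`, and `table` are made up for illustration), a column and a row can be combined into a 2D grid without any loop:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,)

# np.add broadcasts the two inputs to a common shape (3, 4)
table = np.add(col * 10, row)
print(table.shape)
print(table)

# Arithmetic operators dispatch to the same ufuncs
print(np.array_equal(table, col * 10 + row))
```

Here `col * 10 + row` and `np.add(col * 10, row)` are interchangeable; the function form is mainly useful when you need extra arguments such as `out`.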

---
## 🧩 3. Performance Benchmarking with `%timeit`

The Jupyter `%timeit` magic command runs a statement many times and reports the average time per run, which gives more reliable measurements than a single call to `time.time()`.

Let's compare a Python list comprehension and NumPy vectorization side by side.

In [ ]:
# Use smaller arrays to make the test quick
size = 100_000
a = np.arange(size)
b = np.arange(size)

# Compare execution times
%timeit [a[i] + b[i] for i in range(size)]  # Python list comprehension
%timeit a + b                               # NumPy vectorized addition

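You'll see that the **NumPy version** is dramatically faster. It is also more memory-efficient, because the result is a single contiguous array rather than a list of separate Python objects.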
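
`%timeit` only works inside IPython or Jupyter. For a plain Python script, a rough equivalent can be sketched with the standard-library `timeit` module. The helper names `list_comp` and `vectorized` and the repeat counts below are arbitrary choices for illustration.

```python
import timeit
import numpy as np

size = 100_000
a = np.arange(size)
b = np.arange(size)

def list_comp():
    return [a[i] + b[i] for i in range(size)]

def vectorized():
    return a + b

# Take the best of several repeats to reduce noise from other processes
t_list = min(timeit.repeat(list_comp, number=1, repeat=3))
t_vec = min(timeit.repeat(vectorized, number=10, repeat=3)) / 10

print(f"List comprehension: {t_list * 1e3:.3f} ms per run")
print(f"Vectorized add:     {t_vec * 1e3:.3f} ms per run")
```

Taking the minimum over repeats is a common convention, since slower runs usually reflect interference from the rest of the system rather than the code itself.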

---
## 🧮 4. In-place Operations

NumPy allows **in-place updates**, which reuse memory instead of creating new arrays. This saves both time and memory.

Use the `out` parameter in ufuncs or operators like `+=`, `*=`, etc.

In [ ]:
arr = np.arange(5, dtype=float)
print("Original:", arr)

# In-place addition
np.add(arr, 10, out=arr)
print("After in-place add:", arr)

# Another in-place example
arr *= 2
print("After in-place multiply:", arr)

Using in-place operations helps avoid unnecessary memory allocations, which is an important optimization for large datasets.
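
Two details are easy to miss: `arr += 1` and `arr = arr + 1` are not equivalent, and an in-place operation cannot change the dtype of the array it writes into. The sketch below illustrates both, using the made-up names `alias` and `ints`.

```python
import numpy as np

arr = np.arange(5, dtype=float)
alias = arr            # second name for the same array object

arr += 1               # in-place: the existing buffer is modified
print(alias is arr)    # still the same object

arr = arr + 1          # not in-place: a new array is bound to `arr`
print(alias is arr)    # `alias` still refers to the old array

# In-place operations cannot change the dtype of the output buffer
ints = np.arange(5)
try:
    ints += 0.5        # float result cannot be stored safely in an int array
except TypeError as exc:
    print("In-place add failed:", type(exc).__name__)
```

If you need a float result from integer data, convert first (for example with `ints.astype(float)`) and then apply the in-place operation.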

---
## 🔍 5. Avoiding Unnecessary Copies

NumPy tries to **avoid copying data**: basic slicing returns a view of the original buffer. Other operations, such as fancy (integer or boolean) indexing or reshaping a non-contiguous array, silently create copies.

You can check whether two arrays share the same data buffer using `np.shares_memory()`.

In [ ]:
a = np.arange(10)
b = a[::2]   # Slicing every second element
c = a.copy() # Explicit copy

print("a:", a)
print("b (slice):", b)
print("c (copy):", c)

print("b shares memory with a:", np.shares_memory(a, b))
print("c shares memory with a:", np.shares_memory(a, c))

✅ **Tip:** Prefer views (basic slices) over copies when possible to save memory, but make copies explicitly if you need to protect the original data.
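
The difference between a view and a copy matters most when you write to the result. The following sketch (variable names such as `view`, `fancy`, and `flat` are illustrative) contrasts a basic slice, integer-array indexing, and a reshape of a transposed array:

```python
import numpy as np

a = np.arange(10)

view = a[2:6]            # basic slice: a view onto a's buffer
fancy = a[[2, 3, 4, 5]]  # integer-array indexing: always a copy

print(np.shares_memory(a, view))
print(np.shares_memory(a, fancy))

view[0] = 99             # writes through to a
fancy[1] = -1            # does not affect a
print(a)

# Reshaping a non-contiguous array may force a copy
m = np.arange(6).reshape(2, 3)
flat = m.T.reshape(-1)
print(np.shares_memory(m, flat))
```

Checking with `np.shares_memory()` before modifying a derived array is a cheap way to avoid accidentally changing the original.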

---
## ⚡ 6. Vectorizing Custom Functions

You can use `np.vectorize()` to make custom Python functions behave like ufuncs.

It's convenient for readability but rarely faster, because it still calls the Python function once per element under the hood.

In [ ]:
# Define a simple Python function
def categorize(x):
    if x < 0:
        return 'negative'
    elif x == 0:
        return 'zero'
    else:
        return 'positive'

# Vectorize it
vec_categorize = np.vectorize(categorize)

arr = np.array([-3, -1, 0, 2, 5])
print(vec_categorize(arr))

Even though this allows array-like syntax, performance gains are minimal. For heavy computations, try to **rewrite logic in NumPy terms** instead of using `np.vectorize()`.
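
As one possible way to express the same categorization in NumPy terms, the sketch below uses `np.select`, which picks a label per element from a list of boolean conditions. The names `conditions`, `labels`, and `categories` are chosen for illustration.

```python
import numpy as np

arr = np.array([-3, -1, 0, 2, 5])

# Each condition is evaluated on the whole array at once
conditions = [arr < 0, arr == 0]
labels = ["negative", "zero"]

# Elements matching no condition receive the default label
categories = np.select(conditions, labels, default="positive")
print(categories)
```

Unlike `np.vectorize()`, the comparisons here run as compiled ufuncs, so this form tends to scale better to large arrays.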

---
## 🧭 7. Best Practices for Vectorization

✅ **Think in arrays, not elements**: avoid Python `for` loops where an array operation exists.

✅ **Use ufuncs** (`np.add`, `np.sin`, `np.exp`, etc.) instead of writing loops manually.

✅ **Profile code** with `%timeit` or `time` to identify bottlenecks.

✅ **Use in-place operations** to save memory.

✅ **Avoid unnecessary copies**, and check with `np.shares_memory()`.

---
## 💡 Challenge Exercises

1. Use vectorization to compute the **Euclidean distance** between each pair of points in two 2D arrays without explicit loops.
2. Generate a 1D array of 1 million elements and compare the performance of squaring them using a Python loop vs. `np.square()`.
3. Write a vectorized version of a function that classifies numbers as even or odd.

---
## 🏁 Summary

- **Vectorization** transforms elementwise Python loops into efficient, C-level array operations.
- **Ufuncs** are the backbone of vectorized computations.
- **In-place operations** and **avoiding copies** make code both faster and more memory-efficient.
- **Performance profiling** ensures your optimizations are real.

In the next section, we'll dive into **Aggregation, Reduction, and Statistical Operations** to summarize and analyze numerical data efficiently using NumPy.
