# Module 01: NumPy Fundamentals

NumPy is the foundation of numerical computing in Python. Nearly every data science library builds on NumPy arrays.

## Learning Objectives

1. Create and manipulate NumPy arrays
2. Use vectorized operations (no loops!)
3. Apply broadcasting for efficient computation
4. Perform basic linear algebra
5. Generate random numbers for simulations

In [None]:
import numpy as np
import matplotlib.pyplot as plt

## Why NumPy?

Python lists are flexible but slow for numerical work. NumPy arrays are:

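- **Fast**: Core loops run as compiled C code instead of interpreted Python
- **Memory efficient**: Elements share one fixed-size dtype and are stored in a contiguous block, whereas a list holds pointers to separate Python objects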
- **Convenient**: Rich set of mathematical functions
- **Foundation**: Used by Pandas, scikit-learn, and virtually all scientific libraries
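To make the memory claim concrete, the sketch below compares the data buffer of an `int64` array with a rough estimate of an equivalent list's footprint. The list estimate simply adds the list's pointer table to one `int` object per element, and exact byte counts depend on the Python build, so treat the numbers as orders of magnitude.

```python
import sys
import numpy as np

n = 1_000_000
arr = np.arange(n, dtype=np.int64)
lst = list(range(n))

# NumPy: one contiguous buffer of 8-byte integers (n * arr.itemsize)
array_bytes = arr.nbytes

# List: a table of pointers plus a separate int object per element
# (rough estimate; it ignores CPython's cache of small integers)
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print(f"NumPy buffer:   {array_bytes / 1e6:.1f} MB")
print(f"List (approx.): {list_bytes / 1e6:.1f} MB")
print("C-contiguous:", arr.flags['C_CONTIGUOUS'])
```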

In [None]:
# Speed comparison
python_list = list(range(1000000))
numpy_array = np.arange(1000000)

# Python list
%timeit [x**2 for x in python_list]

# NumPy array
%timeit numpy_array**2
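`%timeit` is an IPython magic, so the cell above only runs inside Jupyter or IPython. In a plain Python script, the standard-library `timeit` module gives a comparable measurement. This is a sketch; absolute timings vary by machine.

```python
import timeit
import numpy as np

python_list = list(range(1_000_000))
numpy_array = np.arange(1_000_000)

# Total seconds for 5 runs of each statement
t_list = timeit.timeit(lambda: [x**2 for x in python_list], number=5)
t_numpy = timeit.timeit(lambda: numpy_array**2, number=5)

print(f"List comprehension: {t_list / 5 * 1e3:.1f} ms per loop")
print(f"NumPy:              {t_numpy / 5 * 1e3:.1f} ms per loop")
print(f"Speed-up: ~{t_list / t_numpy:.0f}x")
```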

## Creating Arrays

In [None]:
# From Python list
concentrations = np.array([0.1, 0.2, 0.5, 1.0, 2.0])  # mol/L
print("Concentrations:", concentrations)
print("Type:", type(concentrations))
print("Shape:", concentrations.shape)
print("Dtype:", concentrations.dtype)

In [None]:
# Common array creation functions
zeros = np.zeros(5)
ones = np.ones(5)
temps = np.linspace(300, 500, 5)  # 5 evenly spaced points from 300 to 500
pressures = np.arange(1, 11, 2)   # Start at 1, step 2, stop before 11 → [1, 3, 5, 7, 9]

print("Zeros:", zeros)
print("Ones:", ones)
print("Temperatures:", temps)
print("Pressures:", pressures)
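The two range functions treat the end point differently: `np.arange` excludes `stop` (like Python's `range`), while `np.linspace` includes it unless `endpoint=False` is passed. A short check of that behavior:

```python
import numpy as np

odd = np.arange(1, 11, 2)                          # 11 is excluded
grid = np.linspace(300, 500, 5)                    # 500 is included
grid_open = np.linspace(300, 500, 5, endpoint=False)

print(odd)        # [1 3 5 7 9]
print(grid)       # [300. 350. 400. 450. 500.]
print(grid_open)  # [300. 340. 380. 420. 460.]

# With a float step, floating-point rounding can make arange return one
# more point than expected; linspace, which takes a point count, avoids this
print(np.arange(1, 1.3, 0.1))
```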

In [None]:
# 2D arrays (matrices)
# Experimental data: rows = experiments, columns = [T, P, yield]
experiments = np.array([
    [300, 1.0, 45.2],
    [350, 1.5, 52.8],
    [400, 2.0, 68.1],
    [450, 2.5, 75.4],
    [500, 3.0, 82.0]
])

print("Shape:", experiments.shape)  # (5 rows, 3 columns)
print("\nExperiment data:")
print(experiments)
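Because arrays are homogeneous, NumPy picks one dtype for every element. In `experiments` the integer temperatures are stored as floats, since the other columns contain floats. A quick check with the same data:

```python
import numpy as np

experiments = np.array([
    [300, 1.0, 45.2],
    [350, 1.5, 52.8],
    [400, 2.0, 68.1],
    [450, 2.5, 75.4],
    [500, 3.0, 82.0],
])

print(experiments.dtype)   # float64: the ints were upcast to match the floats
print(experiments.ndim)    # 2 dimensions
print(experiments.size)    # 15 elements in total

# Mixing numbers and strings upcasts everything to a string dtype,
# which is rarely what you want for numerical data
mixed = np.array([300, "high"])
print(mixed.dtype)         # a Unicode string dtype such as <U21
```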

## Indexing and Slicing

In [None]:
# 1D indexing
temps = np.array([300, 350, 400, 450, 500])
print("First element:", temps[0])
print("Last element:", temps[-1])
print("First three:", temps[:3])
print("Every other:", temps[::2])

In [None]:
# 2D indexing
print("First row (experiment 1):", experiments[0])
print("First column (all temperatures):", experiments[:, 0])
print("Yields (third column):", experiments[:, 2])
print("Single element [2,1]:", experiments[2, 1])
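Slices and integer lists can be combined to pull out sub-blocks, for example the temperature and yield columns together, or a subset of experiments:

```python
import numpy as np

experiments = np.array([
    [300, 1.0, 45.2],
    [350, 1.5, 52.8],
    [400, 2.0, 68.1],
    [450, 2.5, 75.4],
    [500, 3.0, 82.0],
])

# Columns 0 (T) and 2 (yield) for every experiment -> shape (5, 2)
T_and_yield = experiments[:, [0, 2]]

# Experiments 2 and 3 (rows 1:3), first two columns (T, P) -> shape (2, 2)
subset = experiments[1:3, :2]

print(T_and_yield.shape, subset.shape)
print(subset)
```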

In [None]:
# Boolean indexing: select elements with a True/False mask
yields = experiments[:, 2]
temps = experiments[:, 0]

# Find experiments with yield > 60%
high_yield = yields > 60
print("High yield mask:", high_yield)
print("High yield values:", yields[high_yield])
print("Temps for high yield:", temps[high_yield])
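Conditions combine with `&` (and), `|` (or), and `~` (not). Python's `and`/`or` keywords do not work element-wise on arrays, and each comparison needs parentheses because `&` binds more tightly than `>`. `np.where` then returns the indices where the mask is True.

```python
import numpy as np

temps = np.array([300.0, 350.0, 400.0, 450.0, 500.0])
yields = np.array([45.2, 52.8, 68.1, 75.4, 82.0])

# High yield at a moderate temperature
mask = (yields > 60) & (temps < 480)
print(mask)                 # [False False  True  True False]
print(temps[mask])          # [400. 450.]

# Row indices where the condition holds
idx = np.where(mask)[0]
print(idx)                  # [2 3]

# Writing `yields > 60 and temps < 480` would raise ValueError
# ("The truth value of an array with more than one element is ambiguous")
```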

## Vectorized Operations

NumPy arithmetic operators and mathematical functions (ufuncs such as `np.exp`) apply element-wise to whole arrays, so no explicit Python loop is needed.

In [None]:
# Temperature conversion: K to °C
temps_K = np.array([300, 350, 400, 450, 500])
temps_C = temps_K - 273.15

print("Kelvin:", temps_K)
print("Celsius:", temps_C)

In [None]:
# Ideal gas law: PV = nRT
# Calculate molar volume V/n = RT/P

R = 8.314  # J/(mol·K)
T = np.linspace(300, 500, 5)  # K
P = 101325  # Pa (1 atm)

V_molar = R * T / P  # m³/mol
V_molar_L = V_molar * 1000  # L/mol

print("Temperature (K):", T)
print("Molar volume (L/mol):", V_molar_L)
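A quick sanity check on the formula: at 273.15 K and 1 atm, $V/n = RT/P$ should reproduce the familiar ideal-gas molar volume of about 22.4 L/mol. The same expression also works with an array of pressures at a fixed temperature:

```python
import numpy as np

R = 8.314      # J/(mol·K)
P = 101325     # Pa (1 atm)

# Molar volume at 0 °C, converted from m³/mol to L/mol
V_stp = R * 273.15 / P * 1000
print(f"V/n at 273.15 K: {V_stp:.2f} L/mol")   # about 22.41

# Fixed T = 300 K, pressures of 1, 2, 5, 10 atm: V/n scales as 1/P
P_array = np.array([1, 2, 5, 10]) * 101325
V_array = R * 300 / P_array * 1000
print(V_array)
```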

In [None]:
# Arrhenius equation: k = A * exp(-Ea/RT)
A = 1e13  # 1/s
Ea = 80000  # J/mol
R = 8.314  # J/(mol·K)
T = np.linspace(300, 600, 100)

k = A * np.exp(-Ea / (R * T))

plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot(T, k)
plt.xlabel('Temperature (K)')
plt.ylabel('Rate constant k (1/s)')
plt.title('Rate constant vs. temperature')

plt.subplot(1, 2, 2)
plt.semilogy(1000/T, k)
plt.xlabel('1000/T (1/K)')
plt.ylabel('Rate constant k (1/s)')
plt.title('Arrhenius plot (log k vs. 1/T)')

plt.tight_layout()
plt.show()
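The log-scale panel is a straight line because $\ln k = \ln A - \frac{E_a}{R}\,\frac{1}{T}$ is linear in $1/T$. Fitting a line to $\ln k$ versus $1/T$ with `np.polyfit` should recover the parameters. Here the data are noise-free, so the fit is essentially exact; with real measurements the same fit gives estimates of $E_a$ and $A$.

```python
import numpy as np

A = 1e13       # 1/s
Ea = 80000     # J/mol
R = 8.314      # J/(mol·K)
T = np.linspace(300, 600, 100)
k = A * np.exp(-Ea / (R * T))

# Degree-1 fit: ln k = slope * (1/T) + intercept
slope, intercept = np.polyfit(1 / T, np.log(k), 1)

Ea_fit = -slope * R          # slope = -Ea/R
A_fit = np.exp(intercept)    # intercept = ln A

print(f"Ea ≈ {Ea_fit:.0f} J/mol, A ≈ {A_fit:.3e} 1/s")
```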

## Broadcasting

Broadcasting lets NumPy combine arrays of different shapes: dimensions of length 1 are stretched to match the other array, without copying the data.

In [None]:
# Calculate reaction rates at multiple T and C combinations
T = np.array([300, 350, 400, 450, 500])  # 5 temperatures
C = np.array([0.1, 0.5, 1.0, 2.0])  # 4 concentrations

# Rate = k * C, where k depends on T
k = A * np.exp(-Ea / (R * T))  # Shape: (5,)

# We want a 5x4 array of rates
# Reshape k to (5, 1) and C stays (4,) → broadcasts to (5, 4)
rates = k.reshape(-1, 1) * C

print("k shape:", k.shape)
print("C shape:", C.shape)
print("rates shape:", rates.shape)
print("\nRates (rows=T, cols=C):")
print(rates)
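NumPy compares shapes starting from the trailing dimension; two dimensions are compatible when they are equal or one of them is 1. `k.reshape(-1, 1)` has shape (5, 1) and `C` has shape (4,), treated as (1, 4), so the result is (5, 4). The same column can be made with `np.newaxis`, and `np.broadcast_shapes` (available since NumPy 1.20) reports the result shape without computing anything:

```python
import numpy as np

A, Ea, R = 1e13, 80000, 8.314
T = np.array([300, 350, 400, 450, 500])
C = np.array([0.1, 0.5, 1.0, 2.0])
k = A * np.exp(-Ea / (R * T))

# Two equivalent ways to turn k into a (5, 1) column
rates_reshape = k.reshape(-1, 1) * C
rates_newaxis = k[:, np.newaxis] * C
print(np.array_equal(rates_reshape, rates_newaxis))   # True

# Result shape of broadcasting, computed from the shapes alone
print(np.broadcast_shapes((5, 1), (4,)))   # (5, 4)

# Incompatible trailing dimensions (5 vs 4) raise ValueError
try:
    k * C
except ValueError as e:
    print("Error:", e)
```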

## Aggregation Functions

In [None]:
# Simulated experimental yields
yields = np.array([78.2, 82.1, 79.5, 81.3, 80.0, 79.8, 83.2, 77.9, 80.5, 81.1])

print(f"Mean: {np.mean(yields):.2f}%")
print(f"Std: {np.std(yields):.2f}%")  # population std (ddof=0 by default)
print(f"Min: {np.min(yields):.2f}%")
print(f"Max: {np.max(yields):.2f}%")
print(f"Median: {np.median(yields):.2f}%")
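`np.std` divides by $N$ by default (`ddof=0`), giving the population standard deviation. For a small set of replicate measurements, the sample standard deviation (divide by $N-1$, `ddof=1`) is the usual choice, and the standard error of the mean is that value divided by $\sqrt{N}$:

```python
import numpy as np

yields = np.array([78.2, 82.1, 79.5, 81.3, 80.0, 79.8, 83.2, 77.9, 80.5, 81.1])
n = yields.size

std_pop = np.std(yields)              # divides by N
std_sample = np.std(yields, ddof=1)   # divides by N - 1
sem = std_sample / np.sqrt(n)         # standard error of the mean

print(f"Population std: {std_pop:.2f}%")
print(f"Sample std:     {std_sample:.2f}%")
print(f"Mean ± SEM:     {yields.mean():.2f} ± {sem:.2f}%")
```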

In [None]:
# Aggregation along axes for 2D arrays
# Rows = different catalysts, Cols = replicate experiments
catalyst_yields = np.array([
    [78, 82, 79, 81, 80],  # Catalyst A
    [65, 68, 66, 64, 67],  # Catalyst B
    [88, 91, 89, 87, 90],  # Catalyst C
])

print("Mean per catalyst (across replicates):")
print(np.mean(catalyst_yields, axis=1))  # axis=1 collapses the columns: one mean per catalyst (row)

print("\nMean per replicate (across catalysts):")
print(np.mean(catalyst_yields, axis=0))  # axis=0 collapses the rows: one mean per replicate (column)
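`keepdims=True` keeps the reduced axis with length 1, so the result broadcasts straight back against the original array. One use is subtracting each catalyst's mean from its own replicates:

```python
import numpy as np

catalyst_yields = np.array([
    [78, 82, 79, 81, 80],  # Catalyst A
    [65, 68, 66, 64, 67],  # Catalyst B
    [88, 91, 89, 87, 90],  # Catalyst C
])

row_means = catalyst_yields.mean(axis=1, keepdims=True)   # shape (3, 1)
deviations = catalyst_yields - row_means                  # (3, 5) - (3, 1) -> (3, 5)

print(row_means.ravel())        # [80. 66. 89.]
print(deviations)
print(deviations.mean(axis=1))  # each row now averages to 0
```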

## Linear Algebra

In [None]:
# Solving linear systems: Ax = b
# Material balance: 3 balance equations, 3 unknown flows

# Coefficient matrix (one row per balance equation)
A = np.array([
    [1, -1, 0],
    [0, 1, -1],
    [1, 0, 1]
])

# Right-hand side (inlet flows)
b = np.array([10, 5, 20])

# Solve
x = np.linalg.solve(A, b)
print("Solution x:", x)

# Verify: Ax should equal b
print("Verification A @ x:", A @ x)
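`np.linalg.inv(A) @ b` gives the same answer on a small, well-conditioned system, but `np.linalg.solve` is generally preferred: it factorizes the matrix instead of forming an explicit inverse, which is cheaper and more accurate. `np.allclose` is a safer verification than comparing printed values, since floating-point results rarely match exactly:

```python
import numpy as np

A = np.array([
    [1, -1, 0],
    [0, 1, -1],
    [1, 0, 1],
])
b = np.array([10, 5, 20])

x_solve = np.linalg.solve(A, b)
x_inv = np.linalg.inv(A) @ b

print(x_solve)                        # approximately [17.5, 7.5, 2.5]
print(np.allclose(x_solve, x_inv))    # True
print(np.allclose(A @ x_solve, b))    # True: residual is at round-off level
```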

In [None]:
# Matrix operations
A = np.array([[1, 2], [3, 4]])

print("Determinant:", np.linalg.det(A))  # -2, up to floating-point round-off
print("\nInverse:")
print(np.linalg.inv(A))

eigenvalues, eigenvectors = np.linalg.eig(A)
print("\nEigenvalues:", eigenvalues)
print("\nEigenvectors:")
print(eigenvectors)
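`np.linalg.eig` returns the eigenvectors as the *columns* of the second array, so `eigenvectors[:, i]` pairs with `eigenvalues[i]`. Checking $A\mathbf{v}_i = \lambda_i \mathbf{v}_i$ confirms this; with broadcasting, all pairs can be checked at once:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Column i of eigenvectors belongs to eigenvalues[i]
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(i, np.allclose(A @ v, eigenvalues[i] * v))   # True for each pair

# All pairs at once: A @ V == V * λ (λ broadcast across the columns)
print(np.allclose(A @ eigenvectors, eigenvectors * eigenvalues))   # True

# The product of the eigenvalues equals the determinant (-2)
print(np.prod(eigenvalues))
```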

## Random Numbers

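Random numbers are essential for simulations, sampling, and machine learning. Modern NumPy code creates a `Generator` with `np.random.default_rng()` rather than calling legacy functions such as `np.random.seed` and `np.random.rand`.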

In [None]:
# Create a Generator with a fixed seed for reproducibility
rng = np.random.default_rng(42)

# Uniform random numbers
uniform = rng.uniform(0, 1, 5)
print("Uniform [0,1):", uniform)

# Normal distribution (Gaussian)
normal = rng.normal(loc=50, scale=5, size=5)  # mean=50, std=5
print("Normal (μ=50, σ=5):", normal)

# Random integers
integers = rng.integers(1, 100, 5)
print("Random integers [1, 100):", integers)
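Seeding makes a run repeatable: two generators created with the same seed produce the same stream. A tuple `size` gives an array of that shape, and `np.random.SeedSequence(...).spawn(n)` derives independent child seeds, for example one per parallel worker:

```python
import numpy as np

# Same seed -> identical sequences
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
same = np.array_equal(rng_a.normal(size=3), rng_b.normal(size=3))
print(same)          # True

# A tuple size gives a 2D array: 4 replicates x 3 conditions
noise = np.random.default_rng(42).normal(loc=0, scale=0.5, size=(4, 3))
print(noise.shape)   # (4, 3)

# Independent child streams, e.g. one per parallel worker
child_seeds = np.random.SeedSequence(42).spawn(3)
workers = [np.random.default_rng(s) for s in child_seeds]
print([round(w.uniform(), 3) for w in workers])
```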

In [None]:
# Monte Carlo simulation: Propagating measurement uncertainty
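# Measured temperature: T = 400 ± 5 K (5 K taken as one standard deviation)
# What's the uncertainty in rate constant k?

rng = np.random.default_rng(42)
n_samples = 10000

# Re-define the Arrhenius parameters: A was reassigned to a matrix in the Linear Algebra section
A = 1e13    # 1/s
Ea = 80000  # J/mol
R = 8.314   # J/(mol·K)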

# Sample temperatures from normal distribution
T_samples = rng.normal(400, 5, n_samples)

# Calculate k for each sample
k_samples = A * np.exp(-Ea / (R * T_samples))

print(f"k = {np.mean(k_samples):.4e} ± {np.std(k_samples):.4e}")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

axes[0].hist(T_samples, bins=50, edgecolor='black')
axes[0].set_xlabel('Temperature (K)')
axes[0].set_ylabel('Count')
axes[0].set_title('Input: Temperature Distribution')

axes[1].hist(k_samples, bins=50, edgecolor='black')
axes[1].set_xlabel('Rate constant k (1/s)')
axes[1].set_ylabel('Count')
axes[1].set_title('Output: Rate Constant Distribution')

plt.tight_layout()
plt.show()
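For comparison, first-order (linearized) error propagation gives $\sigma_k \approx \left|\frac{dk}{dT}\right| \sigma_T$ with $\frac{dk}{dT} = k\,\frac{E_a}{R T^2}$. Because $k$ depends exponentially on $1/T$, its distribution is right-skewed, so percentiles describe the spread better than mean ± std. A sketch reusing the same parameters:

```python
import numpy as np

A, Ea, R = 1e13, 80000, 8.314   # same Arrhenius parameters as above
T0, sigma_T = 400.0, 5.0

rng = np.random.default_rng(42)
k_samples = A * np.exp(-Ea / (R * rng.normal(T0, sigma_T, 10000)))

# Linearized propagation around T0
k0 = A * np.exp(-Ea / (R * T0))
sigma_k_linear = k0 * Ea / (R * T0**2) * sigma_T

print(f"Monte Carlo: k = {k_samples.mean():.1f} ± {k_samples.std():.1f} 1/s")
print(f"Linearized:  k = {k0:.1f} ± {sigma_k_linear:.1f} 1/s")

# Median and central 95% interval of the simulated k values
p2_5, p50, p97_5 = np.percentile(k_samples, [2.5, 50, 97.5])
print(f"Median {p50:.1f}, 95% interval [{p2_5:.1f}, {p97_5:.1f}] 1/s")
```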

## Common Pitfalls

In [None]:
# Pitfall 1: Views vs Copies
a = np.array([1, 2, 3, 4, 5])
b = a[1:4]  # This is a VIEW, not a copy!

b[0] = 99  # This modifies 'a' too!
print("a:", a)  # [1, 99, 3, 4, 5]

# Use .copy() if you need an independent copy
a = np.array([1, 2, 3, 4, 5])
b = a[1:4].copy()
b[0] = 99
print("a (with copy):", a)  # [1, 2, 3, 4, 5]
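Basic slicing returns a view, while boolean masks and integer-list ("fancy") indexing always return copies. `np.shares_memory` shows which case you are in:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

view = a[1:4]            # basic slice -> view
fancy = a[[1, 2, 3]]     # integer-list indexing -> copy
masked = a[a > 1]        # boolean mask -> copy

print(np.shares_memory(a, view))     # True
print(np.shares_memory(a, fancy))    # False
print(np.shares_memory(a, masked))   # False

# Assigning *through* a mask still modifies the original in place
a[a > 3] = 0
print(a)                             # [1 2 3 0 0]
```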

In [None]:
# Pitfall 2: Integer arrays and division
a = np.array([1, 2, 3])  # Integer array
print("Integer array / 2:", a / 2)  # True division always returns floats

# Floor division keeps integers and rounds toward negative infinity
print("Integer array // 2:", a // 2)  # [0 1 1]
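The more common integer-dtype trap is assignment: an integer array cannot hold fractional values, so floats written into it are silently truncated, and an in-place true division such as `a /= 2` raises an error because the float result cannot be cast back to the integer dtype. Creating the array as float from the start avoids both:

```python
import numpy as np

a = np.array([1, 2, 3])     # integer dtype
a[0] = 2.7                  # truncated toward zero on assignment
print(a)                    # [2 2 3]

try:
    a /= 2                  # float result cannot be stored in an int array
    division_ok = True
except TypeError as e:      # NumPy raises a TypeError subclass (UFuncTypeError)
    division_ok = False
    print("Error:", e)

# Floor division rounds toward negative infinity, not toward zero
print(np.array([-3, 3]) // 2)   # [-2  1]

b = np.array([1, 2, 3], dtype=float)   # float from the start
b[0] = 2.7
b /= 2
print(b)                    # fractional values are kept
```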

In [None]:
# Pitfall 3: Shape mismatches
a = np.array([1, 2, 3])
b = np.array([1, 2])  # Different length!

try:
    c = a + b
except ValueError as e:
    print("Error:", e)

## Summary

Key NumPy concepts:

| Concept | Description |
|---------|-------------|
| Arrays | Homogeneous, n-dimensional containers |
| Vectorization | Apply operations to whole arrays instead of looping over elements in Python |
| Broadcasting | Automatic expansion of shapes for operations |
| Indexing | Powerful selection with slices and boolean masks |
| Linear algebra | `np.linalg` for matrices and equations |
| Random | `np.random.default_rng()` for simulations and sampling |

## Next Steps

In the next module, we'll learn Pandas, which builds on NumPy to provide labeled data structures that are easier to work with for tabular data.