# NumPy Essentials for Machine Learning

**Learning Objectives:**
- Master NumPy arrays, operations, and broadcasting for ML workflows
- Understand how NumPy's array model carries over to PyTorch and TensorFlow tensors
- Learn essential linear algebra operations used in deep learning
- Practice data manipulation patterns common in ML preprocessing

**Prerequisites:** Basic Python knowledge

**Estimated Time:** 30 minutes

---

NumPy's n-dimensional array is the model that PyTorch and TensorFlow tensors closely mirror, and both frameworks interoperate with it directly. Understanding NumPy is crucial because:
- Both frameworks can seamlessly convert to/from NumPy arrays
- Many preprocessing operations are done in NumPy
- The concepts translate directly to tensor operations
- Debugging often involves converting tensors to NumPy for inspection

In [None]:
import os
import sys

import numpy as np

# Add src to path for our utilities
sys.path.append(os.path.join('..', '..', 'src'))

# Set random seed for reproducibility
np.random.seed(42)

print(f"NumPy version: {np.__version__}")
print(f"Python version: {sys.version}")

## 1. Array Creation and Basic Properties

Understanding how to create and inspect arrays is fundamental to ML workflows.

In [None]:
# Different ways to create arrays (common in ML)

# From lists (loading data)
data_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr_from_list = np.array(data_list)
print("From list:")
print(arr_from_list)
print(f"Shape: {arr_from_list.shape}, Dtype: {arr_from_list.dtype}\n")

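# Zeros (placeholders, bias initialization)
# Note: weights are usually initialized randomly; all-zero weights would make
# every neuron in a layer compute the same output and receive the same gradient
zeros_array = np.zeros((3, 4))
print("Zeros (e.g., bias initialization):")
print(zeros_array)
print(f"Shape: {zeros_array.shape}\n")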

# Random arrays (data generation, weight initialization)
random_data = np.random.randn(100, 5)  # 100 samples, 5 features
print("Random data (first 5 rows):")
print(random_data[:5])
print(f"Shape: {random_data.shape}\n")

# Identity matrix (useful for regularization)
identity = np.eye(3)
print("Identity matrix:")
print(identity)
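
The cell above uses the legacy `np.random.seed` / `np.random.randn` interface. Newer NumPy code often draws from a `Generator` created with `np.random.default_rng`. Below is a minimal sketch of initializing one dense layer with that API; `init_dense_layer` is a hypothetical helper, and the `sqrt(2 / n_in)` (He-style) scaling is an assumed choice for ReLU networks, not something this notebook prescribes.

```python
import numpy as np


def init_dense_layer(n_in, n_out, seed=0):
    """Hypothetical helper: random weights and zero biases for one dense layer."""
    rng = np.random.default_rng(seed)  # independent, reproducible random stream
    # Assumed He-style scaling: std = sqrt(2 / n_in), a common choice before ReLU
    W = (rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)).astype(np.float32)
    b = np.zeros(n_out, dtype=np.float32)  # zeros are fine for biases
    return W, b


W, b = init_dense_layer(10, 5)
print(W.shape, W.dtype, b.shape)
```

Random weights break the symmetry between neurons, which is why only the biases start at zero here.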

In [None]:
# Array properties essential for ML
sample_array = np.random.randn(32, 10, 8)  # Batch size 32, sequence length 10, features 8

print("Array Properties (typical ML batch):")
print(f"Shape: {sample_array.shape} (batch_size, seq_len, features)")
print(f"Number of dimensions: {sample_array.ndim}")
print(f"Total elements: {sample_array.size}")
print(f"Data type: {sample_array.dtype}")
print(f"Memory usage: {sample_array.nbytes} bytes")
print(f"Memory usage: {sample_array.nbytes / 1024:.2f} KB")
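
`nbytes` is simply `size * itemsize`, so the dtype directly controls memory use. A short check, converting to float32 (the usual default precision for PyTorch and Keras model parameters) to halve the footprint:

```python
import numpy as np

batch = np.zeros((32, 10, 8))          # float64 by default: 8 bytes per element
batch_f32 = batch.astype(np.float32)   # 4 bytes per element

# nbytes is the element count times the bytes per element
assert batch.nbytes == batch.size * batch.itemsize

print(batch.nbytes)      # 20480
print(batch_f32.nbytes)  # 10240
```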

## 2. Array Indexing and Slicing

Critical for data manipulation, batch processing, and feature selection.

In [None]:
# Create sample data representing a batch of images
# Shape: (batch_size, height, width, channels)
batch_images = np.random.randint(0, 256, size=(8, 28, 28, 3))

print("Batch of images shape:", batch_images.shape)
print("\nIndexing and Slicing Examples:")

# Get first image
first_image = batch_images[0]
print(f"First image shape: {first_image.shape}")

# Get first 4 images (mini-batch)
mini_batch = batch_images[:4]
print(f"Mini-batch shape: {mini_batch.shape}")

# Get red channel from all images
red_channel = batch_images[:, :, :, 0]
print(f"Red channel shape: {red_channel.shape}")

# Get center crop (common preprocessing)
center_crop = batch_images[:, 7:21, 7:21, :]
print(f"Center crop shape: {center_crop.shape}")
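
The hard-coded `7:21` slice is a 14×14 center crop of a 28×28 image, because (28 − 14) / 2 = 7. A hypothetical `center_crop` helper generalizes that offset calculation:

```python
import numpy as np


def center_crop(batch, crop_h, crop_w):
    """Hypothetical helper: center-crop a (batch, height, width, channels) array."""
    _, height, width, _ = batch.shape
    top = (height - crop_h) // 2   # rows to skip at the top
    left = (width - crop_w) // 2   # columns to skip on the left
    return batch[:, top:top + crop_h, left:left + crop_w, :]


images = np.random.randint(0, 256, size=(8, 28, 28, 3))
cropped = center_crop(images, 14, 14)
print(cropped.shape)  # (8, 14, 14, 3)
```

Because it uses basic slicing, the result is a view of `images`, not a copy.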

In [None]:
# Boolean indexing (filtering data)
scores = np.array([85, 92, 78, 96, 88, 73, 91, 82])
names = np.array(['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace', 'Henry'])

print("Original scores:", scores)
print("Names:", names)

# Filter high performers (score > 85)
high_performers = scores > 85
print(f"\nHigh performers mask: {high_performers}")
print(f"High performer scores: {scores[high_performers]}")
print(f"High performer names: {names[high_performers]}")

# Multiple conditions
good_range = (scores >= 80) & (scores <= 90)
print(f"\nScores in 80-90 range: {scores[good_range]}")
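
Besides selecting values, a boolean mask can be turned into positions with `np.flatnonzero`, or used to choose between two values element-wise with `np.where`. A small sketch on the same scores:

```python
import numpy as np

scores = np.array([85, 92, 78, 96, 88, 73, 91, 82])

# Positions where the mask is True
high_idx = np.flatnonzero(scores > 85)
print(high_idx)  # [1 3 4 6]

# Element-wise choice: cap scores above 90 at 90, keep the rest
capped = np.where(scores > 90, 90, scores)
print(capped)  # [85 90 78 90 88 73 90 82]
```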

## 3. Array Operations and Broadcasting

Broadcasting is crucial for efficient ML computations and is used extensively in both PyTorch and TensorFlow.

In [None]:
# Element-wise operations (fundamental to neural networks)
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[2, 2, 2], [3, 3, 3]])

print("Array a:")
print(a)
print("\nArray b:")
print(b)

print("\nElement-wise operations:")
print("Addition (a + b):")
print(a + b)

print("\nMultiplication (a * b):")
print(a * b)

print("\nSquare (a**2):")
print(a**2)

# Activation functions
print("\nCommon activation functions:")
x = np.array([-2, -1, 0, 1, 2])
print(f"Input: {x}")
print(f"ReLU (max(0, x)): {np.maximum(0, x)}")
print(f"Sigmoid: {1 / (1 + np.exp(-x))}")
print(f"Tanh: {np.tanh(x)}")
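
The sigmoid formula above is fine for small inputs, but `np.exp(-x)` overflows for large negative `x` (NumPy emits an overflow `RuntimeWarning`, even though the final value still comes out as 0). One common workaround, sketched here as a hypothetical `stable_sigmoid`, only ever exponentiates non-positive numbers:

```python
import numpy as np


def stable_sigmoid(x):
    """Hypothetical helper: sigmoid that never calls exp on a large positive number."""
    x = np.asarray(x, dtype=np.float64)
    z = np.exp(-np.abs(x))  # always in (0, 1], so it cannot overflow
    # For x >= 0: 1 / (1 + e^-x); for x < 0: e^x / (1 + e^x), and here z = e^x
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))


x = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
print(stable_sigmoid(x))
```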

In [None]:
# Broadcasting examples (very important for ML)
print("Broadcasting Examples:")

# Example 1: Adding bias to all samples
features = np.random.randn(100, 5)  # 100 samples, 5 features
bias = np.array([0.1, -0.2, 0.3, -0.1, 0.2])  # bias for each feature

print(f"Features shape: {features.shape}")
print(f"Bias shape: {bias.shape}")

# Broadcasting adds bias to each sample
features_with_bias = features + bias
print(f"Result shape: {features_with_bias.shape}")
print(f"First sample before: {features[0]}")
print(f"First sample after: {features_with_bias[0]}")

print("\n" + "="*50)

# Example 2: Normalizing features (mean centering)
data = np.random.randn(1000, 3) * 10 + 5  # Add some offset and scale
print(f"\nOriginal data shape: {data.shape}")
print(f"Original means: {np.mean(data, axis=0)}")
print(f"Original stds: {np.std(data, axis=0)}")

# Normalize (broadcasting)
mean = np.mean(data, axis=0)  # Shape: (3,)
std = np.std(data, axis=0)    # Shape: (3,)
normalized_data = (data - mean) / std  # Broadcasting!

print(f"\nNormalized means: {np.mean(normalized_data, axis=0)}")
print(f"Normalized stds: {np.std(normalized_data, axis=0)}")
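
If a feature is constant, its standard deviation is 0 and `(data - mean) / std` produces NaN values. A common guard, shown here as a hypothetical `standardize` helper with an assumed `eps` of `1e-8`, adds a small constant to the denominator:

```python
import numpy as np


def standardize(data, eps=1e-8):
    """Hypothetical helper: zero-mean, unit-variance columns, safe for constant columns."""
    mean = data.mean(axis=0)            # shape: (n_features,)
    std = data.std(axis=0)              # shape: (n_features,)
    return (data - mean) / (std + eps)  # broadcasting over the sample axis


data = np.array([[1.0, 5.0],
                 [3.0, 5.0],
                 [5.0, 5.0]])  # second column is constant
print(standardize(data))
```

The constant column maps to zeros instead of NaN; the other column is unaffected apart from a negligible `eps` shift.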

In [None]:
# Broadcasting rules visualization
print("Broadcasting Rules Examples:")

# Rule: shapes are compared from the rightmost dimension; each pair of
# dimensions must be equal or one of them must be 1 (missing dimensions count as 1)
examples = [
    ((3, 4), (4,)),      # (3,4) + (4,) -> (3,4)
    ((2, 3, 4), (4,)),   # (2,3,4) + (4,) -> (2,3,4)
    ((2, 3, 4), (3, 4)), # (2,3,4) + (3,4) -> (2,3,4)
    ((2, 1, 4), (3, 4)), # (2,1,4) + (3,4) -> (2,3,4)
]

for shape1, shape2 in examples:
    a = np.ones(shape1)
    b = np.ones(shape2)
    result = a + b
    print(f"\n{shape1} + {shape2} -> {result.shape}")
    print("\nArray A:")
    print(a)
    print("\nArray B:")
    print(b)
    print("\nResult:")
    print(result)
    print("-"*50)
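
The rule can be written out directly: pad the shorter shape with leading 1s, then each pair of dimensions must match or contain a 1. The hypothetical `broadcast_shape` function below does this by hand, and `np.broadcast_shapes` (available since NumPy 1.20) computes the same result without allocating any arrays:

```python
import numpy as np


def broadcast_shape(shape1, shape2):
    """Hypothetical helper: apply NumPy's broadcasting rule to two shapes."""
    n = max(len(shape1), len(shape2))
    # Pad the shorter shape with leading 1s so both have n dimensions
    s1 = (1,) * (n - len(shape1)) + tuple(shape1)
    s2 = (1,) * (n - len(shape2)) + tuple(shape2)
    result = []
    for d1, d2 in zip(s1, s2):
        if d1 == d2 or d2 == 1:
            result.append(d1)
        elif d1 == 1:
            result.append(d2)
        else:
            raise ValueError(f"cannot broadcast dimensions {d1} and {d2}")
    return tuple(result)


print(broadcast_shape((2, 1, 4), (3, 4)))      # (2, 3, 4)
print(np.broadcast_shapes((2, 1, 4), (3, 4)))  # (2, 3, 4)

try:
    broadcast_shape((3, 4), (3,))  # trailing dimensions 4 and 3 do not match
except ValueError as err:
    print("Error:", err)
```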

## 4. Linear Algebra Operations

Essential for understanding neural network computations, matrix multiplications, and transformations.

In [None]:
# Matrix multiplication (core of neural networks)
print("Matrix Multiplication Examples:")

# Simulate a simple neural network layer
# Input: batch_size=32, input_features=10
# Layer: input_features=10, output_features=5
batch_size, input_features, output_features = 32, 10, 5

X = np.random.randn(batch_size, input_features)  # Input data
W = np.random.randn(input_features, output_features)  # Weights
b = np.random.randn(output_features)  # Bias

print(f"Input X shape: {X.shape}")
print(f"Weights W shape: {W.shape}")
print(f"Bias b shape: {b.shape}")

# Forward pass: Y = XW + b
Y = np.dot(X, W) + b  # or X @ W + b
print(f"Output Y shape: {Y.shape}")

print(f"\nFirst sample input: {X[0][:5]}...")  # Show first 5 features
print(f"First sample output: {Y[0]}")
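
Stacking the same pattern gives a multi-layer network. Below is a minimal sketch of a two-layer forward pass with a ReLU in between; `mlp_forward` and the hidden size of 16 are illustrative assumptions.

```python
import numpy as np


def mlp_forward(X, W1, b1, W2, b2):
    """Hypothetical two-layer MLP: Linear -> ReLU -> Linear."""
    hidden = np.maximum(0, X @ W1 + b1)  # (batch, hidden_size), ReLU applied element-wise
    return hidden @ W2 + b2              # (batch, output_size), raw scores (logits)


rng = np.random.default_rng(0)
batch_size, n_in, hidden_size, n_out = 32, 10, 16, 5
X = rng.standard_normal((batch_size, n_in))
W1 = rng.standard_normal((n_in, hidden_size))
b1 = np.zeros(hidden_size)
W2 = rng.standard_normal((hidden_size, n_out))
b2 = np.zeros(n_out)

logits = mlp_forward(X, W1, b1, W2, b2)
print(logits.shape)  # (32, 5)
```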

In [None]:
# Different matrix operations
A = np.random.randn(3, 4)
B = np.random.randn(4, 2)

print("Matrix Operations:")
print(f"A shape: {A.shape}")
print(f"B shape: {B.shape}")

# Matrix multiplication
C = A @ B  # Same as np.dot(A, B)
print(f"A @ B shape: {C.shape}")

# Transpose (very common in ML)
A_T = A.T
print(f"A transpose shape: {A_T.shape}")

# Element-wise vs matrix multiplication
square_matrix = np.random.randn(3, 3)
print(f"\nSquare matrix shape: {square_matrix.shape}")
print(f"Element-wise square: {(square_matrix * square_matrix).shape}")
print(f"Matrix multiplication: {(square_matrix @ square_matrix).shape}")
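
Because both results above have the same shape, the two operations are easy to mix up. A tiny matrix makes the difference concrete, and `@` also works on stacks of matrices (batched matrix multiplication over the leading axis):

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])

print(M * M)  # element-wise product -> [[1, 4], [9, 16]]
print(M @ M)  # matrix product -> [[7, 10], [15, 22]]

# Batched matmul: the leading axis is treated as a batch of independent matrices
batch_A = np.random.randn(8, 3, 4)
batch_B = np.random.randn(8, 4, 2)
batch_C = batch_A @ batch_B
print(batch_C.shape)  # (8, 3, 2)
```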

In [None]:
# Advanced linear algebra (useful for understanding ML algorithms)
print("Advanced Linear Algebra:")

# Create a symmetric matrix (e.g., covariance matrices and Hessians are symmetric)
A = np.random.randn(4, 4)
symmetric_A = A + A.T

print(f"Symmetric matrix shape: {symmetric_A.shape}")

# Eigenvalues and eigenvectors (PCA, optimization)
# np.linalg.eigh is designed for symmetric matrices: it returns real eigenvalues
# in ascending order and orthonormal eigenvectors
eigenvalues, eigenvectors = np.linalg.eigh(symmetric_A)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors shape: {eigenvectors.shape}")

# Matrix norms (regularization)
print("\nMatrix norms of A:")
print(f"Frobenius norm: {np.linalg.norm(A, 'fro'):.4f}")
print(f"Spectral norm (largest singular value): {np.linalg.norm(A, 2):.4f}")

# Determinant and inverse
det_A = np.linalg.det(symmetric_A)
print(f"\nDeterminant: {det_A:.4f}")

# The condition number is a more reliable invertibility check than a small determinant
cond_A = np.linalg.cond(symmetric_A)
print(f"Condition number: {cond_A:.2f}")

if cond_A < 1 / np.finfo(symmetric_A.dtype).eps:  # Check if numerically invertible
    inv_A = np.linalg.inv(symmetric_A)
    print(f"Inverse exists, shape: {inv_A.shape}")
    # Verify: A @ A^(-1) should be identity
    identity_check = symmetric_A @ inv_A
    print(f"A @ A^(-1) close to identity: {np.allclose(identity_check, np.eye(4))}")
else:
    print("Matrix is singular or ill-conditioned (not reliably invertible)")
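
The comment above mentions PCA. A minimal PCA sketch shows where `eigh` fits: center the data, form the covariance matrix, and project onto the eigenvectors with the largest eigenvalues. The `pca` function and the synthetic data are assumptions for illustration, not a production implementation (many libraries compute PCA via an SVD instead).

```python
import numpy as np


def pca(data, n_components):
    """Hypothetical helper: PCA via eigendecomposition of the covariance matrix."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / (len(data) - 1)  # (n_features, n_features), symmetric
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest eigenvalues
    components = eigvecs[:, top]                    # (n_features, n_components)
    return centered @ components, eigvals[top]


rng = np.random.default_rng(0)
data = rng.standard_normal((200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
projected, variances = pca(data, n_components=2)
print(projected.shape)  # (200, 2)
print(variances)        # variance captured by each component, largest first
```

The variance of each projected column equals its eigenvalue, which is why eigenvalues are read as "variance explained".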

## 5. Statistical Operations and Aggregations

Critical for data analysis, loss computation, and model evaluation.

In [None]:
# Statistical operations along different axes
# Simulate prediction scores for classification
# Shape: (batch_size, num_classes)
predictions = np.random.randn(100, 5)  # 100 samples, 5 classes

print(f"Predictions shape: {predictions.shape}")
print("\nStatistical Operations:")

# Overall statistics
print(f"Overall mean: {np.mean(predictions):.4f}")
print(f"Overall std: {np.std(predictions):.4f}")
print(f"Min value: {np.min(predictions):.4f}")
print(f"Max value: {np.max(predictions):.4f}")

# Statistics along axes
print(f"\nMean per class (axis=0): {np.mean(predictions, axis=0)}")
print(f"Mean per sample (axis=1) shape: {np.mean(predictions, axis=1).shape}")

# Useful for softmax and classification
max_per_sample = np.max(predictions, axis=1, keepdims=True)
print(f"\nMax per sample shape (keepdims=True): {max_per_sample.shape}")
print(f"Max per sample shape (keepdims=False): {np.max(predictions, axis=1).shape}")
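
`keepdims=True` matters because of broadcasting. Without it, a per-row statistic of shape `(n,)` is aligned with the last axis instead of the rows: that raises an error for non-square inputs and, worse, silently gives wrong results for square ones. A 3×3 example:

```python
import numpy as np

q = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

# Correct: (3, 1) broadcasts across columns, subtracting each row's own max
correct = q - q.max(axis=1, keepdims=True)
print(correct)  # every row becomes [-2, -1, 0]

# Wrong: (3,) is aligned with the columns, so column j gets row j's max subtracted
wrong = q - q.max(axis=1)
print(wrong)    # no error is raised, but the values are not per-row differences
```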

In [None]:
# Practical ML examples
print("Practical ML Statistical Operations:")

# 1. Softmax implementation
def softmax(x):
    """Numerically stable softmax"""
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

logits = np.random.randn(5, 3)  # 5 samples, 3 classes
probabilities = softmax(logits)

print(f"Logits shape: {logits.shape}")
print(f"Probabilities shape: {probabilities.shape}")
print(f"Probabilities sum per sample: {np.sum(probabilities, axis=1)}")
print(f"First sample probabilities: {probabilities[0]}")

# 2. Accuracy calculation
true_labels = np.array([0, 1, 2, 1, 0])
predicted_labels = np.argmax(probabilities, axis=1)

accuracy = np.mean(true_labels == predicted_labels)
print(f"\nTrue labels: {true_labels}")
print(f"Predicted labels: {predicted_labels}")
print(f"Accuracy: {accuracy:.2f}")
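
Accuracy collapses everything into one number. A confusion matrix, where entry `[i, j]` counts samples of true class `i` predicted as class `j`, shows which classes get confused. Below is a minimal sketch using `np.add.at`, which (unlike `cm[true, pred] += 1`) counts repeated index pairs correctly; `confusion_matrix` is a hypothetical helper name.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Hypothetical helper: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    # Unbuffered add: every (true, predicted) pair is counted, even repeated ones
    np.add.at(cm, (true_labels, predicted_labels), 1)
    return cm


true_labels = np.array([0, 1, 2, 1, 0])
predicted_labels = np.array([0, 2, 2, 1, 1])
cm = confusion_matrix(true_labels, predicted_labels, n_classes=3)
print(cm)
# Accuracy is the fraction of samples on the diagonal
print(np.trace(cm) / cm.sum())  # 0.6
```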

In [None]:
# Loss function implementations
print("Common Loss Functions:")

# Mean Squared Error (regression)
y_true = np.array([1.5, 2.3, 3.1, 4.2, 5.0])
y_pred = np.array([1.2, 2.1, 3.4, 4.0, 5.2])

mse = np.mean((y_true - y_pred)**2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(y_true - y_pred))

print(f"True values: {y_true}")
print(f"Predicted values: {y_pred}")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAE: {mae:.4f}")

# Cross-entropy loss (classification)
def cross_entropy_loss(y_true_labels, y_pred_probs):
    """Cross-entropy loss for classification"""
    # Convert integer class labels to one-hot vectors
    n_classes = y_pred_probs.shape[1]
    y_true_onehot = np.eye(n_classes)[y_true_labels]

    # Clip predictions to avoid log(0)
    y_pred_clipped = np.clip(y_pred_probs, 1e-15, 1 - 1e-15)

    # Calculate cross-entropy
    loss = -np.sum(y_true_onehot * np.log(y_pred_clipped)) / len(y_true_labels)
    return loss

ce_loss = cross_entropy_loss(true_labels, probabilities)
print(f"\nCross-entropy loss: {ce_loss:.4f}")
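
Clipping probabilities works, but frameworks usually compute cross-entropy directly from logits with the log-sum-exp trick, which avoids taking the log of a probability that has already been rounded off (for example, `torch.nn.functional.cross_entropy` and `tf.nn.sparse_softmax_cross_entropy_with_logits` both take logits). A NumPy sketch, with the hypothetical name `cross_entropy_from_logits`:

```python
import numpy as np


def cross_entropy_from_logits(logits, labels):
    """Hypothetical helper: mean cross-entropy computed from raw scores."""
    # Subtract the row max so exp() cannot overflow (same trick as the softmax above)
    shifted = logits - logits.max(axis=1, keepdims=True)
    # log-softmax: log p = shifted - log(sum(exp(shifted)))
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick each sample's log-probability for its true class
    return -log_probs[np.arange(len(labels)), labels].mean()


# Uniform logits over 3 classes give a loss of log(3), about 1.0986
print(cross_entropy_from_logits(np.zeros((4, 3)), np.array([0, 1, 2, 0])))

# Extreme logits stay finite: the true class is 1000 below the max, so the loss is 1000
print(cross_entropy_from_logits(np.array([[1000.0, 0.0]]), np.array([1])))
```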

## 6. Array Reshaping and Manipulation

Essential for preparing data for neural networks and handling different tensor shapes.

In [None]:
# Reshaping operations (very common in deep learning)
print("Array Reshaping:")

# Original data: flattened images
flattened_images = np.random.randint(0, 256, size=(100, 784))  # 100 images, 28x28 pixels
print(f"Flattened images shape: {flattened_images.shape}")

# Reshape to image format
images = flattened_images.reshape(100, 28, 28)
print(f"Reshaped to images: {images.shape}")

# Add channel dimension (for CNN)
images_with_channel = images.reshape(100, 28, 28, 1)
print(f"With channel dimension: {images_with_channel.shape}")

# Or using -1 for automatic calculation
auto_reshape = flattened_images.reshape(100, 28, 28, -1)
print(f"Auto reshape (-1): {auto_reshape.shape}")

# Flatten back
flattened_again = images_with_channel.reshape(100, -1)
print(f"Flattened again: {flattened_again.shape}")
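
A reshape only changes how the same elements are grouped, so flattening for a dense layer and then restoring the image shape is lossless. The total number of elements must match, which is also what lets NumPy infer a `-1`; an incompatible shape raises a `ValueError`:

```python
import numpy as np

batch = np.arange(2 * 28 * 28).reshape(2, 28, 28)

flat = batch.reshape(batch.shape[0], -1)  # (2, 784): -1 is inferred as 28 * 28
restored = flat.reshape(-1, 28, 28)       # (2, 28, 28): -1 is inferred as 2
print(flat.shape, restored.shape)
print(np.array_equal(restored, batch))    # True: no values are lost or reordered

try:
    flat.reshape(2, 30, -1)  # 784 is not divisible by 30
except ValueError as err:
    print("Error:", err)
```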

In [None]:
# Axis manipulation
print("Axis Manipulation:")

# Sample data: batch of sequences
sequences = np.random.randn(32, 50, 128)  # batch_size, seq_len, features
print(f"Original shape: {sequences.shape}")

# Transpose (swap axes)
transposed = sequences.transpose(1, 0, 2)  # seq_len, batch_size, features
print(f"Transposed: {transposed.shape}")

# Add new axis
with_new_axis = sequences[:, :, :, np.newaxis]
print(f"With new axis: {with_new_axis.shape}")

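# Squeeze (remove dimensions of size 1); passing axis avoids also dropping
# other size-1 dimensions, such as a batch containing a single sample
squeezed = np.squeeze(with_new_axis, axis=-1)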
print(f"Squeezed: {squeezed.shape}")

# Expand dimensions
expanded = np.expand_dims(sequences, axis=0)
print(f"Expanded (axis=0): {expanded.shape}")

expanded_last = np.expand_dims(sequences, axis=-1)
print(f"Expanded (axis=-1): {expanded_last.shape}")
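
A common mistake is using `reshape` where `transpose` is needed. Converting images from channels-last `(N, H, W, C)` (the Keras default) to channels-first `(N, C, H, W)` (the PyTorch convention) requires a transpose; a reshape to the same target shape scrambles the pixels:

```python
import numpy as np

nhwc = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)  # (N, H, W, C)

nchw = nhwc.transpose(0, 3, 1, 2)  # correct: moves the channel axis to position 1
wrong = nhwc.reshape(2, 3, 2, 2)   # same shape, but elements are regrouped in memory order

print(nchw[0, 0])   # channel 0 of image 0 -> [[0, 3], [6, 9]]
print(wrong[0, 0])  # just the first four values -> [[0, 1], [2, 3]]
```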

In [None]:
# Concatenation and stacking (combining data)
print("Concatenation and Stacking:")

# Create sample batches
batch1 = np.random.randn(16, 10)  # 16 samples, 10 features
batch2 = np.random.randn(16, 10)  # 16 samples, 10 features
batch3 = np.random.randn(16, 10)  # 16 samples, 10 features

print(f"Batch 1 shape: {batch1.shape}")
print(f"Batch 2 shape: {batch2.shape}")
print(f"Batch 3 shape: {batch3.shape}")

# Concatenate along batch dimension
combined_batches = np.concatenate([batch1, batch2, batch3], axis=0)
print(f"Combined batches: {combined_batches.shape}")

# Stack (creates new dimension)
stacked_batches = np.stack([batch1, batch2, batch3], axis=0)
print(f"Stacked batches: {stacked_batches.shape}")

# Horizontal stack (features)
features1 = np.random.randn(100, 5)
features2 = np.random.randn(100, 3)
combined_features = np.hstack([features1, features2])
print(f"\nFeatures 1: {features1.shape}")
print(f"Features 2: {features2.shape}")
print(f"Combined features: {combined_features.shape}")
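
Concatenation combines batches; the reverse operation, splitting a dataset into shuffled mini-batches, is just fancy indexing with a permutation. Below is a minimal sketch as a hypothetical generator `iterate_minibatches`, where the final batch is allowed to be smaller:

```python
import numpy as np


def iterate_minibatches(X, y, batch_size, rng):
    """Hypothetical helper: yield shuffled (X, y) mini-batches covering every sample once."""
    indices = rng.permutation(len(X))  # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch_idx = indices[start:start + batch_size]
        yield X[batch_idx], y[batch_idx]


rng = np.random.default_rng(0)
X = np.random.randn(50, 10)
y = np.arange(50)

batches = list(iterate_minibatches(X, y, batch_size=16, rng=rng))
print([len(xb) for xb, _ in batches])  # [16, 16, 16, 2]

# Concatenating the label batches recovers every sample exactly once
all_labels = np.concatenate([yb for _, yb in batches])
print(np.array_equal(np.sort(all_labels), y))  # True
```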

## 7. Performance and Memory Considerations

Understanding NumPy performance is crucial for efficient ML workflows.

In [None]:
import time

# Vectorization vs loops
print("Performance Comparison: Vectorization vs Loops")

# Create large arrays
size = 1000000
a = np.random.randn(size)
b = np.random.randn(size)

# Method 1: Python loop over array elements (slow)
n_loop = min(10000, size)  # Only loop over 10k elements to keep the cell fast
start_time = time.perf_counter()  # perf_counter has higher resolution than time.time
result_loop = []
for i in range(n_loop):
    result_loop.append(a[i] * b[i])
loop_time = time.perf_counter() - start_time

# Method 2: NumPy vectorization (fast)
start_time = time.perf_counter()
result_vectorized = a * b
vectorized_time = time.perf_counter() - start_time

print(f"Loop time ({n_loop} elements): {loop_time:.6f} seconds")
print(f"Vectorized time ({size} elements): {vectorized_time:.6f} seconds")
# Extrapolate the loop time linearly to the full array size for a rough estimate
estimated_speedup = (loop_time * size / n_loop) / vectorized_time
print(f"Estimated speedup factor: {estimated_speedup:.1f}x")

# Memory usage
print("\nMemory usage:")
print(f"Array 'a' memory: {a.nbytes / 1024 / 1024:.2f} MB")
print(f"Array 'b' memory: {b.nbytes / 1024 / 1024:.2f} MB")
print(f"Result memory: {result_vectorized.nbytes / 1024 / 1024:.2f} MB")
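
Single timer measurements are noisy. The standard-library `timeit` module repeats a statement many times, which gives steadier numbers. A sketch comparing a Python-loop dot product with `np.dot` on a smaller array (the array size and repeat count are arbitrary choices):

```python
import timeit

import numpy as np

x = np.random.randn(10_000)
y = np.random.randn(10_000)


def loop_dot(a, b):
    """Dot product with an explicit Python loop (for comparison only)."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


# Both approaches compute the same value, up to floating-point rounding
print(np.isclose(loop_dot(x, y), np.dot(x, y)))

loop_seconds = timeit.timeit(lambda: loop_dot(x, y), number=10) / 10
numpy_seconds = timeit.timeit(lambda: np.dot(x, y), number=10) / 10
print(f"Python loop: {loop_seconds:.6f} s per call")
print(f"np.dot:      {numpy_seconds:.6f} s per call")
```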

In [None]:
# Memory layout and views vs copies
print("Memory Layout and Views:")

# Original array
original = np.random.randn(1000, 1000)
print(f"Original array memory: {original.nbytes / 1024 / 1024:.2f} MB")

# View (shares memory)
view = original[::2, ::2]  # Every other element
print(f"View shares memory: {view.base is original}")
print(f"View shape: {view.shape}")

# Copy (new memory)
copy = original.copy()
print(f"Copy shares memory: {copy.base is original}")
print(f"Copy memory: {copy.nbytes / 1024 / 1024:.2f} MB")

# Demonstrate view behavior
original[0, 0] = 999
print("\nAfter modifying original[0,0] = 999:")
print(f"View[0,0] = {view[0, 0]} (should be 999 if it's a view)")
print(f"Copy[0,0] = {copy[0, 0]} (should be original value)")
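
Whether indexing returns a view or a copy depends on the kind of indexing: basic slices return views, while fancy (integer-array) indexing and boolean masks always return copies. `np.shares_memory` checks this directly:

```python
import numpy as np

arr = np.arange(10)

sliced = arr[2:5]        # basic slicing: a view
fancy = arr[[2, 3, 4]]   # integer-array indexing: a copy
masked = arr[arr > 6]    # boolean indexing: a copy

print(np.shares_memory(arr, sliced))  # True
print(np.shares_memory(arr, fancy))   # False
print(np.shares_memory(arr, masked))  # False

sliced[0] = 100  # writes through to arr
fancy[0] = -1    # does not affect arr
print(arr[2])    # 100
```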

## 8. Connection to PyTorch and TensorFlow

Understanding how NumPy concepts translate to tensor operations in both frameworks.

In [None]:
# Demonstrate NumPy as the bridge between frameworks
print("NumPy as the Bridge Between Frameworks:")

# Create sample data in NumPy
numpy_data = np.random.randn(32, 10).astype(np.float32)
numpy_labels = np.random.randint(0, 3, size=(32,))

print(f"NumPy data shape: {numpy_data.shape}")
print(f"NumPy data type: {numpy_data.dtype}")
print(f"NumPy labels shape: {numpy_labels.shape}")

# Show how this would convert to PyTorch (conceptually);
# torch.from_numpy returns a tensor that shares memory with the NumPy array
print("\nConversion to PyTorch (conceptual):")
print("torch_data = torch.from_numpy(numpy_data)")
print("torch_labels = torch.from_numpy(numpy_labels)")

# Show how this would convert to TensorFlow (conceptually)
print("\nConversion to TensorFlow (conceptual):")
print("tf_data = tf.constant(numpy_data)")
print("tf_labels = tf.constant(numpy_labels)")

# Common operations that work similarly (printing the result shapes)
print("\nCommon operations (NumPy syntax), result shapes:")
print(f"Mean over axis 1: {np.mean(numpy_data, axis=1).shape}")
print(f"Max over axis 1: {np.max(numpy_data, axis=1).shape}")
print(f"Reshape: {numpy_data.reshape(32, 2, 5).shape}")
print(f"Transpose: {numpy_data.T.shape}")
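
NumPy defaults to float64 (and, on most platforms, int64 integers), while deep learning models usually work in float32, and PyTorch's classification losses typically expect int64 class indices. Below is a minimal sketch of a hypothetical `prepare_for_framework` helper that fixes dtypes and memory layout before conversion:

```python
import numpy as np


def prepare_for_framework(data, labels):
    """Hypothetical helper: C-contiguous float32 features and int64 labels."""
    # Converts dtype and layout, copying only when necessary
    data = np.ascontiguousarray(data, dtype=np.float32)
    labels = np.ascontiguousarray(labels, dtype=np.int64)
    return data, labels


features = np.random.randn(10, 32).T  # (32, 10) float64, a transposed view (not C-contiguous)
labels = np.random.randint(0, 3, size=32)

ready_data, ready_labels = prepare_for_framework(features, labels)
print(ready_data.dtype, ready_data.flags["C_CONTIGUOUS"])  # float32 True
print(ready_labels.dtype)                                  # int64
```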

In [None]:
# Practical example: Data preprocessing pipeline
print("Practical Data Preprocessing Pipeline:")

# Simulate loading data (this is where NumPy shines)
raw_data = np.random.randn(1000, 20) * 5 + 10  # Synthetic data: mean ~10, std ~5
raw_labels = np.random.randint(0, 5, size=(1000,))

print(f"Raw data shape: {raw_data.shape}")
print(f"Raw data range: [{np.min(raw_data):.2f}, {np.max(raw_data):.2f}]")

# Step 1: Train/validation split (before computing any statistics)
n_train = int(0.8 * len(raw_data))
indices = np.random.permutation(len(raw_data))
train_idx, val_idx = indices[:n_train], indices[n_train:]

train_raw, val_raw = raw_data[train_idx], raw_data[val_idx]
train_labels, val_labels = raw_labels[train_idx], raw_labels[val_idx]

print(f"\nTrain set: {train_raw.shape}")
print(f"Validation set: {val_raw.shape}")

# Step 2: Normalize with statistics from the training set only,
# so no information from the validation set leaks into preprocessing
mean = np.mean(train_raw, axis=0)
std = np.std(train_raw, axis=0)
train_data = (train_raw - mean) / std
val_data = (val_raw - mean) / std

print("\nAfter normalization (training set):")
print(f"Mean: {np.mean(train_data, axis=0)[:5]}...")  # Should be ~0
print(f"Std: {np.std(train_data, axis=0)[:5]}...")    # Should be ~1

# Step 3: Convert to the dtypes frameworks typically expect
train_data = train_data.astype(np.float32)
val_data = val_data.astype(np.float32)
train_labels = train_labels.astype(np.int64)
val_labels = val_labels.astype(np.int64)

print("\nData ready for framework conversion:")
print(f"Data type: {train_data.dtype}")
print(f"Labels type: {train_labels.dtype}")
print(f"No NaN values: {not np.any(np.isnan(train_data))}")
print(f"Finite values: {np.all(np.isfinite(train_data))}")

## Summary and Key Takeaways

**What we've learned:**

1. **Array Creation & Properties**: Understanding shapes, dtypes, and memory usage
2. **Indexing & Slicing**: Essential for data manipulation and batch processing
3. **Broadcasting**: Enables efficient operations without explicit loops
4. **Linear Algebra**: Matrix operations that form the core of neural networks
5. **Statistical Operations**: Computing metrics, losses, and aggregations
6. **Reshaping**: Preparing data for different network architectures
7. **Performance**: Vectorization and memory considerations
8. **Framework Bridge**: How NumPy connects to PyTorch and TensorFlow

**Key Patterns for ML:**
- Use vectorized operations instead of loops
- Understand broadcasting for efficient computations
- Master axis-based operations for batch processing
- Know when operations create views vs copies
- Prepare data in NumPy before converting to framework tensors

**Next Steps:**
- Learn Pandas for structured data manipulation
- Understand how these concepts translate to PyTorch tensors
- See how TensorFlow operations mirror NumPy patterns

NumPy's array model underlies how tensors work in both PyTorch and TensorFlow. Mastering these concepts will make learning either framework much easier!