# Lab 1.2: Mathematical Foundations Review

## Duration: 45 minutes

## Learning Objectives
By the end of this lab, you will be able to:
- Recall the linear algebra concepts that neural networks rely on
- Implement vector and matrix operations in NumPy
- Apply derivatives, the chain rule, and gradients in the context of neural networks
- Relate these operations to the forward pass, loss functions, and gradient descent

## Prerequisites
- Completed Lab 1.1 (Environment Setup)
- Basic understanding of high school algebra and calculus

---

In [None]:
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the '3d' projection on older matplotlib)

# Set random seed for reproducibility
np.random.seed(42)

# Configure matplotlib
%matplotlib inline
plt.style.use('default')

print("Environment ready for mathematical foundations review!")

## Part 1: Vectors and Vector Operations

In neural networks, we frequently work with vectors representing data points, weights, and activations.

In [None]:
print("=" * 40)
print("PART 1: VECTORS AND VECTOR OPERATIONS")
print("=" * 40)

# Create sample vectors
vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])

print(f"Vector a: {vector_a}")
print(f"Vector b: {vector_b}")
print(f"Shape of vector a: {vector_a.shape}")
print(f"Length of vector a: {len(vector_a)}")

In [None]:
# Vector Addition and Subtraction
print("\nVector Arithmetic:")
print("-" * 20)

vector_sum = vector_a + vector_b
vector_diff = vector_a - vector_b

print(f"a + b = {vector_sum}")
print(f"a - b = {vector_diff}")

# Scalar multiplication
scalar = 2.5
scaled_vector = scalar * vector_a
print(f"\n{scalar} * a = {scaled_vector}")

In [None]:
# Dot Product (very important for neural networks!)
print("\nDot Product:")
print("-" * 15)

dot_product = np.dot(vector_a, vector_b)
print(f"a · b = {dot_product}")

# Alternative way to compute dot product
dot_product_alt = np.sum(vector_a * vector_b)
print(f"Alternative calculation: {dot_product_alt}")

# Manual calculation for understanding
manual_dot = vector_a[0]*vector_b[0] + vector_a[1]*vector_b[1] + vector_a[2]*vector_b[2]
print(f"Manual calculation: {vector_a[0]}*{vector_b[0]} + {vector_a[1]}*{vector_b[1]} + {vector_a[2]}*{vector_b[2]} = {manual_dot}")

In [None]:
# Vector Magnitude (Length)
print("\nVector Magnitude:")
print("-" * 20)

magnitude_a = np.linalg.norm(vector_a)
magnitude_b = np.sqrt(np.sum(vector_b**2))  # Alternative calculation

print(f"Magnitude of a: {magnitude_a:.4f}")
print(f"Magnitude of b: {magnitude_b:.4f}")

# Unit vector (normalized)
unit_vector_a = vector_a / magnitude_a
print(f"\nUnit vector of a: {unit_vector_a}")
print(f"Magnitude of unit vector: {np.linalg.norm(unit_vector_a):.6f}")
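
The dot product and the magnitude combine into one more useful quantity: the cosine of the angle between two vectors, cos(θ) = (a · b) / (|a| |b|). The cell below is a minimal sketch of that formula using the same example vectors; the names `cos_theta` and `angle_deg` are illustrative and not used elsewhere in the lab.

```python
import numpy as np

# Same example vectors as in the cells above
vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])

# cos(theta) = (a · b) / (|a| * |b|)
cos_theta = np.dot(vector_a, vector_b) / (
    np.linalg.norm(vector_a) * np.linalg.norm(vector_b)
)

# Clipping guards against rounding pushing the value slightly outside [-1, 1]
angle_deg = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

print(f"cos(theta) = {cos_theta:.4f}")
print(f"angle      = {angle_deg:.2f} degrees")
```

A value close to 1 means the two vectors point in nearly the same direction, which is why the dot product is often read as a similarity score.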

## Part 2: Matrices and Matrix Operations

Matrices are fundamental in neural networks for representing weights, data batches, and transformations.

In [None]:
print("=" * 40)
print("PART 2: MATRICES AND MATRIX OPERATIONS")
print("=" * 40)

# Create sample matrices
matrix_A = np.array([[1, 2, 3],
                     [4, 5, 6]])

matrix_B = np.array([[7, 8],
                     [9, 10],
                     [11, 12]])

print(f"Matrix A (2x3):\n{matrix_A}")
print(f"\nMatrix B (3x2):\n{matrix_B}")
print(f"\nShape of A: {matrix_A.shape}")
print(f"Shape of B: {matrix_B.shape}")

In [None]:
# Matrix Multiplication (most important operation in neural networks!)
print("\nMatrix Multiplication:")
print("-" * 25)

# A (2x3) × B (3x2) = C (2x2)
matrix_C = np.dot(matrix_A, matrix_B)
# Alternative: matrix_C = matrix_A @ matrix_B

print(f"A × B = \n{matrix_C}")
print(f"Shape of result: {matrix_C.shape}")

# Manual calculation for first element to understand
first_element = matrix_A[0,0]*matrix_B[0,0] + matrix_A[0,1]*matrix_B[1,0] + matrix_A[0,2]*matrix_B[2,0]
print(f"\nFirst element calculation: {matrix_A[0,0]}*{matrix_B[0,0]} + {matrix_A[0,1]}*{matrix_B[1,0]} + {matrix_A[0,2]}*{matrix_B[2,0]} = {first_element}")
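
To see where every entry of the product comes from, not only the first one, the same multiplication can be written with explicit loops. This is a sketch for understanding only (the name `manual_C` is illustrative); in practice `np.dot` or `@` is far faster.

```python
import numpy as np

matrix_A = np.array([[1, 2, 3],
                     [4, 5, 6]])
matrix_B = np.array([[7, 8],
                     [9, 10],
                     [11, 12]])

rows, inner = matrix_A.shape
inner_b, cols = matrix_B.shape
assert inner == inner_b, "inner dimensions must match"

# C[i, j] is the dot product of row i of A with column j of B
manual_C = np.zeros((rows, cols), dtype=matrix_A.dtype)
for i in range(rows):
    for j in range(cols):
        for k in range(inner):
            manual_C[i, j] += matrix_A[i, k] * matrix_B[k, j]

print(f"Loop result:\n{manual_C}")
print(f"Matches A @ B: {np.array_equal(manual_C, matrix_A @ matrix_B)}")
```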

In [None]:
# Matrix-Vector Multiplication (common in neural network forward pass)
print("\nMatrix-Vector Multiplication:")
print("-" * 35)

# Create a vector
vector_x = np.array([1, 2, 3])
print(f"Vector x: {vector_x}")

# Multiply matrix A with vector x
result_vector = np.dot(matrix_A, vector_x)
print(f"A × x = {result_vector}")
print(f"Shape: {result_vector.shape} - this is a {result_vector.shape[0]}-dimensional vector")

# This is essentially what happens in a neural network layer!
print("\n💡 This is essentially what happens in a neural network layer:")
print("   Weights (matrix) × Input (vector) = Output (vector)")

In [None]:
# Matrix Transpose (very important for backpropagation)
print("\nMatrix Transpose:")
print("-" * 20)

matrix_A_T = matrix_A.T
print(f"A transpose (3x2):\n{matrix_A_T}")
print(f"Shape: {matrix_A_T.shape}")

# Properties of transpose
print(f"\nOriginal A shape: {matrix_A.shape}")
print(f"Transposed A shape: {matrix_A_T.shape}")
print("Notice how rows become columns and vice versa!")
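
One transpose property shows up repeatedly when gradients are passed backward through a matrix product: (AB)ᵀ = BᵀAᵀ, with the order of the factors reversed. The sketch below checks it on the matrices from this part; `left`, `right`, and `wrong_order` are illustrative names.

```python
import numpy as np

matrix_A = np.array([[1, 2, 3],
                     [4, 5, 6]])      # shape (2, 3)
matrix_B = np.array([[7, 8],
                     [9, 10],
                     [11, 12]])       # shape (3, 2)

left = (matrix_A @ matrix_B).T        # transpose of the product, shape (2, 2)
right = matrix_B.T @ matrix_A.T       # product of transposes in reversed order

print(f"(AB)^T equals B^T A^T: {np.array_equal(left, right)}")

# Keeping the original order gives a differently shaped matrix altogether
wrong_order = matrix_A.T @ matrix_B.T  # (3, 2) @ (2, 3) -> (3, 3)
print(f"Shape of A^T B^T: {wrong_order.shape}")
```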

In [None]:
# Element-wise operations
print("\nElement-wise Operations:")
print("-" * 25)

# Create two matrices of the same shape
matrix_X = np.array([[1, 2], [3, 4]])
matrix_Y = np.array([[5, 6], [7, 8]])

print(f"Matrix X:\n{matrix_X}")
print(f"\nMatrix Y:\n{matrix_Y}")

# Element-wise multiplication (Hadamard product)
element_wise_mult = matrix_X * matrix_Y
print(f"\nElement-wise multiplication X * Y:\n{element_wise_mult}")

# Element-wise addition
element_wise_add = matrix_X + matrix_Y
print(f"\nElement-wise addition X + Y:\n{element_wise_add}")

## Part 3: Broadcasting in NumPy

Broadcasting lets NumPy apply element-wise operations to arrays of different but compatible shapes, which is how a single bias vector can be added to a whole batch of data.

In [None]:
print("=" * 30)
print("PART 3: BROADCASTING")
print("=" * 30)

# Matrix + scalar (broadcasting)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
scalar = 10

result = matrix + scalar
print(f"Original matrix:\n{matrix}")
print(f"\nMatrix + {scalar}:\n{result}")
print("The scalar is 'broadcast' to match the matrix shape!")

In [None]:
# Matrix + vector (broadcasting)
print("\nMatrix + Vector Broadcasting:")
print("-" * 35)

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Add a vector to each row
vector = np.array([10, 20, 30])
result = matrix + vector

print(f"Matrix (3x3):\n{matrix}")
print(f"\nVector (3,): {vector}")
print(f"\nMatrix + Vector:\n{result}")
print("\nThe vector is added to each row of the matrix!")

In [None]:
# Column vector broadcasting (adding bias in neural networks)
print("\nColumn Vector Broadcasting (like adding bias):")
print("-" * 50)

# Add a column vector to each column
column_vector = np.array([[100], [200], [300]])
result = matrix + column_vector

print(f"Matrix (3x3):\n{matrix}")
print(f"\nColumn vector (3x1):\n{column_vector}")
print(f"\nMatrix + Column Vector:\n{result}")
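print("\nThe column vector is added to each column of the matrix!")
print("💡 This is how a bias is added when each column holds one sample (z = Wx + b).")
print("   With one sample per row (z = XW + b, as in Part 6), the bias is a row vector instead.")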
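
Whether two shapes can be broadcast follows a simple rule: compare the shapes from the last dimension backward, and each pair of sizes must either be equal or contain a 1. The sketch below (with illustrative array names) asks NumPy for the resulting shape via `np.broadcast` and shows what happens when the rule is violated.

```python
import numpy as np

matrix = np.arange(1, 7).reshape(2, 3)   # shape (2, 3)
row_vec = np.array([10, 20, 30])         # shape (3,)   -> compatible
col_vec = np.array([[100], [200]])       # shape (2, 1) -> compatible
bad_vec = np.array([1, 2])               # shape (2,)   -> trailing 3 vs 2 clash

print(f"(2, 3) with (3,):   {np.broadcast(matrix, row_vec).shape}")
print(f"(2, 3) with (2, 1): {np.broadcast(matrix, col_vec).shape}")

mismatch_caught = False
try:
    matrix + bad_vec
except ValueError as err:
    mismatch_caught = True
    print(f"(2, 3) with (2,) fails: {err}")
```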

## Part 4: Calculus Review - Derivatives

Derivatives are essential for understanding how neural networks learn through backpropagation.

In [None]:
print("=" * 35)
print("PART 4: DERIVATIVES AND GRADIENTS")
print("=" * 35)

# Visualize a simple function and its derivative
x = np.linspace(-3, 3, 100)
y = x**2  # f(x) = x²
dy_dx = 2*x  # f'(x) = 2x

plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(x, y, 'b-', linewidth=2, label='f(x) = x²')
plt.plot(x, dy_dx, 'r--', linewidth=2, label="f'(x) = 2x")
plt.grid(True, alpha=0.3)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Function and its Derivative')
plt.legend()

# Visualize slope at specific points
plt.subplot(1, 2, 2)
test_points = [-2, -1, 0, 1, 2]
for point in test_points:
    y_point = point**2
    slope = 2*point
    
    # Plot tangent line
    x_tangent = np.linspace(point-0.5, point+0.5, 10)
    y_tangent = slope * (x_tangent - point) + y_point
    
    plt.plot(point, y_point, 'ro', markersize=8)
    plt.plot(x_tangent, y_tangent, 'g-', alpha=0.7)
    plt.text(point, y_point+0.5, f'slope={slope}', ha='center', fontsize=8)

plt.plot(x, y, 'b-', linewidth=2, label='f(x) = x²')
plt.grid(True, alpha=0.3)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Tangent Lines (Slopes)')
plt.legend()

plt.tight_layout()
plt.show()

print("The derivative tells us the slope of the function at each point!")
print("In neural networks, we use derivatives to find the direction to adjust weights.")

In [None]:
# Numerical derivative approximation
print("\nNumerical Derivative Approximation:")
print("-" * 40)

def f(x):
    """Example function: f(x) = x²"""
    return x**2

def numerical_derivative(func, x, h=1e-7):
    """Approximate the derivative of func at x with a central finite difference of step h"""
    return (func(x + h) - func(x - h)) / (2 * h)

# Test at various points
test_points = [-2, -1, 0, 1, 2]
print("Point\tAnalytical\tNumerical\tError")
print("-" * 45)

for x_test in test_points:
    analytical = 2 * x_test  # True derivative of x²
    numerical = numerical_derivative(f, x_test)
    error = abs(analytical - numerical)
    
    print(f"{x_test}\t{analytical:.6f}\t{numerical:.6f}\t{error:.2e}")

print("\nNumerical derivatives are mainly used to check gradients.")
print("Deep learning frameworks compute exact gradients with backpropagation (the chain rule).")
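
The finite-difference formula used above is the central difference. A quick comparison with the simpler forward difference shows why it is usually preferred. The sketch uses f(x) = x³ as an assumed test function, because for x² the central difference is exact up to rounding and would hide the effect; the helper names are illustrative.

```python
def cubic(x):
    """Test function with a non-zero third derivative: f(x) = x^3."""
    return x**3

def forward_difference(func, x, h):
    """One-sided estimate; error shrinks proportionally to h."""
    return (func(x + h) - func(x)) / h

def central_difference(func, x, h):
    """Two-sided estimate; error shrinks proportionally to h squared."""
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 2.0
exact = 3 * x0**2  # analytical derivative of x^3 is 3x^2

print("h\t\tforward error\tcentral error")
for h in (1e-1, 1e-3, 1e-5):
    forward_err = abs(forward_difference(cubic, x0, h) - exact)
    central_err = abs(central_difference(cubic, x0, h) - exact)
    print(f"{h:.0e}\t\t{forward_err:.2e}\t{central_err:.2e}")
```

Making `h` ever smaller does not help indefinitely: at some point floating-point rounding in `func(x + h) - func(x - h)` starts to dominate the error.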

In [None]:
# Chain rule demonstration (crucial for backpropagation)
print("\nChain Rule Demonstration:")
print("-" * 30)

# Composite function: h(x) = (x² + 1)³
# Let u = x² + 1, so h(x) = u³
# dh/dx = dh/du × du/dx = 3u² × 2x = 3(x² + 1)² × 2x

def h(x):
    return (x**2 + 1)**3

def h_derivative(x):
    return 3 * (x**2 + 1)**2 * 2 * x

x_test = 2.0
analytical = h_derivative(x_test)
numerical = numerical_derivative(h, x_test)

print("Function: h(x) = (x² + 1)³")
print(f"At x = {x_test}:")
print(f"  Analytical derivative: {analytical:.6f}")
print(f"  Numerical derivative:  {numerical:.6f}")
print(f"  Error: {abs(analytical - numerical):.2e}")

print("\n💡 The chain rule is the foundation of backpropagation in neural networks!")
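
The same chain-rule calculation can be laid out the way backpropagation organizes it: a forward pass that stores the intermediate value u, then a backward pass that multiplies the local derivatives. This is a hand-written sketch for the single function h(x) = (x² + 1)³, not a general algorithm; the variable names are illustrative.

```python
x = 2.0

# Forward pass: compute and keep the intermediate value
u = x**2 + 1        # inner function, u = x^2 + 1
h_value = u**3      # outer function, h = u^3

# Backward pass: local derivatives, multiplied from the output back to x
dh_du = 3 * u**2    # derivative of u^3 with respect to u
du_dx = 2 * x       # derivative of x^2 + 1 with respect to x
dh_dx = dh_du * du_dx

print(f"u = {u}, h = {h_value}")
print(f"dh/du = {dh_du}, du/dx = {du_dx}")
print(f"dh/dx = dh/du * du/dx = {dh_dx}")
```

Each layer of a neural network plays the role of one such intermediate step, which is why the forward values have to be kept around for the backward pass.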

## Part 5: Gradients and Multivariable Functions

Neural networks work with functions of many variables, so we need to understand gradients.

In [None]:
print("=" * 40)
print("PART 5: GRADIENTS (MULTIVARIABLE)")
print("=" * 40)

# Simple 2D function: f(x, y) = x² + y²
def f_2d(x, y):
    return x**2 + y**2

# Partial derivatives
def df_dx(x, y):
    return 2*x

def df_dy(x, y):
    return 2*y

# Visualize the function
x_range = np.linspace(-3, 3, 50)
y_range = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(x_range, y_range)
Z = f_2d(X, Y)

fig = plt.figure(figsize=(15, 5))

# 3D surface plot
ax1 = fig.add_subplot(131, projection='3d')
surf = ax1.plot_surface(X, Y, Z, cmap='viridis', alpha=0.7)
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.set_zlabel('f(x,y)')
ax1.set_title('f(x,y) = x² + y²')

# Contour plot
ax2 = fig.add_subplot(132)
contour = ax2.contour(X, Y, Z, levels=20)
ax2.clabel(contour, inline=True, fontsize=8)
ax2.set_xlabel('x')
ax2.set_ylabel('y')
ax2.set_title('Contour Plot')
ax2.grid(True, alpha=0.3)
ax2.axis('equal')

# Gradient vectors
ax3 = fig.add_subplot(133)
x_arrows = np.arange(-2, 3, 1)
y_arrows = np.arange(-2, 3, 1)
X_arrows, Y_arrows = np.meshgrid(x_arrows, y_arrows)
U = df_dx(X_arrows, Y_arrows)  # x-component of gradient
V = df_dy(X_arrows, Y_arrows)  # y-component of gradient

ax3.quiver(X_arrows, Y_arrows, U, V, scale=20, alpha=0.8, color='red')
ax3.contour(X, Y, Z, levels=10, alpha=0.5)
ax3.set_xlabel('x')
ax3.set_ylabel('y')
ax3.set_title('Gradient Vectors')
ax3.grid(True, alpha=0.3)
ax3.axis('equal')

plt.tight_layout()
plt.show()

print("The gradient vector points in the direction of steepest increase!")
print("In neural networks, we go in the opposite direction (gradient descent).")

In [None]:
# Gradient computation for specific points
print("\nGradient Computation at Specific Points:")
print("-" * 45)

test_points = [(1, 1), (2, -1), (-1, 2), (0, 0)]

print("Point\t\tGradient\t\tMagnitude")
print("-" * 50)

for x, y in test_points:
    grad_x = df_dx(x, y)
    grad_y = df_dy(x, y)
    magnitude = np.sqrt(grad_x**2 + grad_y**2)
    
    print(f"({x:2}, {y:2})\t\t({grad_x:2}, {grad_y:2})\t\t{magnitude:.3f}")

print("\nNote: At (0,0) the gradient is (0,0), so it is a critical point; for f(x,y) = x² + y² it is the global minimum!")
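
The finite-difference idea from Part 4 carries over to several variables: perturb one coordinate at a time and the results form the gradient vector. The sketch below checks the analytical gradient (2x, 2y) this way; `numerical_gradient` is a hypothetical helper written for this example, and it takes the point as a single array rather than separate `x` and `y` arguments.

```python
import numpy as np

def f_2d(point):
    """f(x, y) = x^2 + y^2, with the point passed as one array."""
    x, y = point
    return x**2 + y**2

def numerical_gradient(func, point, h=1e-5):
    """Central difference applied to one coordinate at a time."""
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(point.size):
        step = np.zeros_like(point)
        step[i] = h
        grad[i] = (func(point + step) - func(point - step)) / (2 * h)
    return grad

point = np.array([2.0, -1.0])
num_grad = numerical_gradient(f_2d, point)
exact_grad = 2 * point  # analytical gradient (2x, 2y)

print(f"Numerical gradient:  {num_grad}")
print(f"Analytical gradient: {exact_grad}")
```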

## Part 6: Practical Applications to Neural Networks

Let's connect these mathematical concepts to neural networks with simple examples.

In [None]:
print("=" * 45)
print("PART 6: NEURAL NETWORK CONNECTIONS")
print("=" * 45)

# Simple neural network computation example
print("Simple Neural Network Layer Example:")
print("-" * 40)

# Input data (3 features, 4 data points)
X = np.array([[1, 2, 3],      # Data point 1
              [4, 5, 6],      # Data point 2  
              [7, 8, 9],      # Data point 3
              [2, 1, 4]])     # Data point 4

print(f"Input data X (4 samples, 3 features):\n{X}")

# Weights (3 inputs, 2 neurons)
W = np.array([[0.1, 0.4],    # Weights for feature 1
              [0.2, 0.5],    # Weights for feature 2
              [0.3, 0.6]])   # Weights for feature 3

print(f"\nWeights W (3 features, 2 neurons):\n{W}")

# Bias
b = np.array([0.1, 0.2])
print(f"\nBias b: {b}")

# Forward pass: z = XW + b
z = np.dot(X, W) + b  # Broadcasting adds bias to each row
print(f"\nLinear output z = XW + b (4 samples, 2 neurons):\n{z}")

print("\n💡 This is exactly what happens in a neural network layer!")
print("   - Matrix multiplication: XW")
print("   - Broadcasting: + b")
print("   - Next step would be applying activation function")
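
To make the batched formula less abstract, the sketch below recomputes the same layer one sample at a time and confirms that both versions agree. The loop is only for illustration (the names `z_batch` and `z_loop` are not used elsewhere); the single matrix multiplication is what you would actually run.

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [2, 1, 4]])        # 4 samples, 3 features
W = np.array([[0.1, 0.4],
              [0.2, 0.5],
              [0.3, 0.6]])       # 3 features, 2 neurons
b = np.array([0.1, 0.2])         # one bias per neuron

# Batched version: one matrix multiplication for all samples
z_batch = X @ W + b

# Per-sample version: each row of X is one input vector x_i
z_loop = np.array([W.T @ x_i + b for x_i in X])

print(f"Batched output shape: {z_batch.shape}")
print(f"Per-sample loop gives the same result: {np.allclose(z_batch, z_loop)}")
```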

In [None]:
# Loss function example (Mean Squared Error)
print("\nLoss Function Example (MSE):")
print("-" * 35)

# Predictions and true labels
y_pred = np.array([0.8, 0.3, 0.9, 0.1])
y_true = np.array([1.0, 0.0, 1.0, 0.0])

print(f"Predictions: {y_pred}")
print(f"True labels: {y_true}")

# Mean Squared Error
mse = np.mean((y_pred - y_true)**2)
print(f"\nMean Squared Error: {mse:.4f}")

# Gradient of MSE with respect to predictions
gradient = 2 * (y_pred - y_true) / len(y_pred)
print(f"\nGradient ∂MSE/∂y_pred: {gradient}")

print("\n💡 This gradient tells us how to adjust our predictions!")
print("   Positive gradient → decrease prediction")
print("   Negative gradient → increase prediction")
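
A common sanity check is to compare an analytical gradient like the one above with a finite-difference estimate. The sketch below does this for the MSE example; the helper `mse` and the step size `h` are choices made for this illustration.

```python
import numpy as np

y_pred = np.array([0.8, 0.3, 0.9, 0.1])
y_true = np.array([1.0, 0.0, 1.0, 0.0])

def mse(pred):
    """Mean squared error against the fixed labels y_true."""
    return np.mean((pred - y_true) ** 2)

# Analytical gradient, same formula as in the cell above
analytic = 2 * (y_pred - y_true) / len(y_pred)

# Numerical gradient: nudge one prediction at a time
h = 1e-6
numeric = np.zeros_like(y_pred)
for i in range(len(y_pred)):
    step = np.zeros_like(y_pred)
    step[i] = h
    numeric[i] = (mse(y_pred + step) - mse(y_pred - step)) / (2 * h)

print(f"Analytical: {analytic}")
print(f"Numerical:  {numeric}")
print(f"Max difference: {np.max(np.abs(analytic - numeric)):.2e}")
```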

In [None]:
# Visualization of gradient descent concept
print("\nGradient Descent Visualization:")
print("-" * 35)

# Simple 1D function to minimize: f(w) = (w-2)² + 1
def loss_function(w):
    return (w - 2)**2 + 1

def loss_gradient(w):
    return 2 * (w - 2)

# Gradient descent simulation
w = -1.0  # Starting point
learning_rate = 0.3
history = [w]
loss_history = [loss_function(w)]

for i in range(10):
    grad = loss_gradient(w)
    w = w - learning_rate * grad  # Gradient descent step
    history.append(w)
    loss_history.append(loss_function(w))

# Plot the optimization process
w_range = np.linspace(-2, 4, 100)
loss_range = loss_function(w_range)

plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(w_range, loss_range, 'b-', linewidth=2, label='Loss function')
plt.plot(history, loss_history, 'ro-',
         markersize=6, label='Gradient descent path')
plt.xlabel('Weight (w)')
plt.ylabel('Loss')
plt.title('Gradient Descent on Loss Function')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.plot(loss_history, 'g.-', linewidth=2, markersize=8)
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.title('Loss Reduction Over Time')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Final weight: {w:.4f} (optimal is 2.0)")
print(f"Final loss: {loss_function(w):.6f} (minimum is 1.0)")
print("\n💡 This is how neural networks learn - by following gradients!")
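
The learning rate decides whether this process converges at all. For this particular loss, each step multiplies the distance to the optimum by (1 − 2 × learning_rate), so any rate above 1.0 makes the iterates move away from w = 2. The sketch below reruns the same update with a few assumed rates; `run_gradient_descent` is a hypothetical helper for this comparison.

```python
def loss_gradient(w):
    """Gradient of f(w) = (w - 2)^2 + 1."""
    return 2 * (w - 2)

def run_gradient_descent(learning_rate, w_start=-1.0, steps=10):
    """Apply the plain gradient descent update a fixed number of times."""
    w = w_start
    for _ in range(steps):
        w = w - learning_rate * loss_gradient(w)
    return w

print("learning rate\tfinal w\t\tdistance to optimum")
for lr in (0.1, 0.3, 0.9, 1.1):
    final_w = run_gradient_descent(lr)
    print(f"{lr}\t\t{final_w:.4f}\t\t{abs(final_w - 2):.4f}")
```

Small rates converge slowly, rates near 1.0 overshoot back and forth, and rates above 1.0 diverge.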

## Progress Checklist

Mark each concept as understood:

- [ ] Vector operations (addition, dot product, magnitude)
- [ ] Matrix operations (multiplication, transpose)
- [ ] Matrix-vector multiplication
- [ ] Broadcasting in NumPy
- [ ] Derivatives and their geometric meaning
- [ ] Chain rule
- [ ] Gradients for multivariable functions
- [ ] Connection to neural network computations
- [ ] Loss functions and their gradients
- [ ] Gradient descent concept

## Key Concepts Summary

1. **Vectors**: Represent data points, weights, and activations
2. **Matrices**: Store weights and transform data in layers
3. **Matrix Multiplication**: Core operation in neural network forward pass
4. **Broadcasting**: Allows operations between different shaped arrays
5. **Derivatives**: Measure rate of change, essential for learning
6. **Gradients**: Show direction of steepest increase in multivariable functions
7. **Chain Rule**: Enables backpropagation through network layers
8. **Gradient Descent**: Algorithm for minimizing loss functions

## Troubleshooting

### Common Issues:

**1. Matrix dimension mismatch:**
- Always check shapes before multiplication
- Remember: (m,n) × (n,p) = (m,p)
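
As a small illustration of the shape rule, the sketch below triggers a mismatch on purpose and then fixes it with a transpose. The arrays are placeholders filled with ones.

```python
import numpy as np

A = np.ones((2, 3))
B = np.ones((2, 3))

print(f"A: {A.shape}, B: {B.shape}")

mismatch_caught = False
try:
    A @ B            # (2, 3) @ (2, 3): inner dimensions 3 and 2 differ
except ValueError as err:
    mismatch_caught = True
    print(f"Multiplication failed: {err}")

C = A @ B.T          # (2, 3) @ (3, 2) -> (2, 2)
print(f"A @ B.T works, shape: {C.shape}")
```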

**2. Broadcasting errors:**
- Shapes are compared from the last dimension backward; each pair of sizes must be equal or one of them must be 1
- Use `.reshape()` to fix dimension issues
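
For example, a vector holding one value per row cannot be added to a (3, 2) matrix directly, but it can after being reshaped into a column. The arrays below are illustrative.

```python
import numpy as np

matrix = np.arange(6).reshape(3, 2)    # shape (3, 2)
per_row = np.array([10, 20, 30])       # shape (3,), one value per row

# matrix + per_row would fail: trailing dimensions 2 and 3 do not match.
# Reshaping to (3, 1) lets the vector broadcast across the columns.
fixed = matrix + per_row.reshape(-1, 1)

print(f"per_row reshaped: {per_row.reshape(-1, 1).shape}")
print(f"Result:\n{fixed}")
```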

**3. Numerical instability:**
- Be careful with very large or very small numbers, which can overflow, underflow, or lose precision
- Use appropriate data types (float32 vs float64)
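
The difference between the two types is easy to see with a tiny increment: `float32` carries roughly 7 significant decimal digits and `float64` roughly 16. A minimal sketch:

```python
import numpy as np

print(f"float32 machine epsilon: {np.finfo(np.float32).eps:.2e}")
print(f"float64 machine epsilon: {np.finfo(np.float64).eps:.2e}")

# Adding 1e-8 to 1.0 is below float32 resolution but not float64 resolution
x32 = np.float32(1.0) + np.float32(1e-8)
x64 = np.float64(1.0) + np.float64(1e-8)

print(f"float32: 1.0 + 1e-8 == 1.0 -> {x32 == np.float32(1.0)}")  # True
print(f"float64: 1.0 + 1e-8 == 1.0 -> {x64 == np.float64(1.0)}")  # False
```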

**4. Plotting issues:**
- Ensure matplotlib backend is properly configured
- Try `%matplotlib inline` if plots don't appear

## Next Steps

In the next lab, we'll implement activation functions that transform the linear combinations we compute using these mathematical operations.

---

**Congratulations! You've reviewed the essential mathematical foundations for neural networks!**