In [None]:
import marimo as mo

# Week 2: Calculus Foundations for Deep Learning

**IME775: Data Driven Modeling and Optimization**

📖 **Reference**: Krishnendu Chaudhury. *Math and Architectures of Deep Learning*, Chapter 3

---

## Learning Objectives

- Master derivatives and their role in optimization
- Understand multivariable calculus for neural networks
- Learn the chain rule as the foundation of backpropagation
- Connect gradients to learning algorithms

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

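## 2.1 Derivatives: Measuring Change

The derivative measures the instantaneous rate of change:

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

**Geometric interpretation**: $f'(x)$ is the slope of the tangent line at $x$. Its sign gives the direction in which $f$ increases: to the right when $f'(x) > 0$, to the left when $f'(x) < 0$.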
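The limit definition can be explored numerically. The sketch below evaluates the difference quotient for shrinking $h$; the helper name `forward_difference` and the choice of $\sin$ at $x_0 = 1$ are illustrative assumptions, not part of the course material.

```python
import numpy as np

def forward_difference(f, x, h):
    """Difference quotient (f(x + h) - f(x)) / h from the limit definition."""
    return (f(x + h) - f(x)) / h

f = np.sin            # example function (assumption); its exact derivative is cos
x0 = 1.0
exact = np.cos(x0)

# Evaluate the quotient for progressively smaller h and record the error.
errors = []
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    approx = forward_difference(f, x0, h)
    errors.append(abs(approx - exact))
    print(f"h = {h:.0e}  approx = {approx:.6f}  error = {errors[-1]:.2e}")
```

For this one-sided quotient the error shrinks roughly in proportion to $h$, until floating-point round-off takes over at very small $h$.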

## 2.2 The Gradient: Multivariate Extension

For a scalar function $f: \mathbb{R}^n \rightarrow \mathbb{R}$:

$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$

**Key property**: The gradient points in the direction of steepest ascent, so $-\nabla f$ is the direction of steepest descent.
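The steepest-ascent property can be checked by brute force. The sketch below takes a small step in many unit directions and compares the best one with the normalized gradient. The function $f(x, y) = x^2 + 3y^2$, the point $(1, 1)$, and all names are assumptions chosen for illustration.

```python
import numpy as np

def f(p):
    """Example scalar function f(x, y) = x^2 + 3 y^2 (illustrative choice)."""
    return p[0] ** 2 + 3.0 * p[1] ** 2

def grad_f(p):
    """Analytical gradient of f: (2x, 6y)."""
    return np.array([2.0 * p[0], 6.0 * p[1]])

p0 = np.array([1.0, 1.0])
g = grad_f(p0)
g_unit = g / np.linalg.norm(g)

# Take a small step in 360 evenly spaced unit directions and
# record how much f increases for each one.
step = 1e-3
angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
increases = np.array([f(p0 + step * d) - f(p0) for d in directions])
best = directions[np.argmax(increases)]

print("unit gradient:         ", g_unit)
print("best sampled direction:", best)
print("cosine similarity:     ", float(np.dot(best, g_unit)))
```

The best sampled direction agrees with the gradient direction up to the one-degree sampling resolution.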

## 2.3 Activation Functions and Their Derivatives

| Function | Formula | Derivative |
|----------|---------|------------|
| Sigmoid | $\sigma(x) = \frac{1}{1+e^{-x}}$ | $\sigma(x)(1-\sigma(x))$ |
| Tanh | $\tanh(x)$ | $1 - \tanh^2(x)$ |
| ReLU | $\max(0, x)$ | $\mathbb{1}_{x>0}$ |
| Leaky ReLU | $\max(\alpha x, x)$, with $0 < \alpha < 1$ | $\alpha$ if $x < 0$, $1$ if $x > 0$ |

ReLU and Leaky ReLU are not differentiable at $x = 0$; implementations pick a convention for that point.
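The derivative column of the table can be verified against central differences. This is a minimal sketch: the function names, the test points, and the default `alpha=0.01` are assumptions, and the test points deliberately avoid the kink at $x = 0$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2

def relu(x):
    return np.maximum(0.0, x)

def d_relu(x):
    # Indicator of x > 0; the value at exactly 0 is a convention.
    return (x > 0).astype(float)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

def d_leaky_relu(x, alpha=0.01):
    return np.where(x > 0, 1.0, alpha)

def central_diff(f, x, eps=1e-6):
    """Central-difference approximation of f'(x), applied elementwise."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

xs = np.array([-2.0, -0.5, 0.5, 2.0])  # avoids the non-differentiable point 0

checks = {
    "sigmoid": (sigmoid, d_sigmoid),
    "tanh": (np.tanh, d_tanh),
    "relu": (relu, d_relu),
    "leaky_relu": (leaky_relu, d_leaky_relu),
}
results = {}
for name, (fn, dfn) in checks.items():
    results[name] = bool(np.allclose(dfn(xs), central_diff(fn, xs), atol=1e-6))
    print(f"{name:<10} derivative matches finite difference: {results[name]}")
```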

## 2.4 The Chain Rule: Foundation of Backpropagation

For $z = f(g(x))$:

$$\frac{dz}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}$$

**Neural network**: The loss is a composition of layer functions:

$$L = \text{loss}(\text{layer}_n(\cdots \text{layer}_2(\text{layer}_1(x))))$$

Gradients flow backward through the chain: each layer multiplies the incoming gradient by its own local derivative.
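The backward flow can be sketched for a chain of scalar functions. Each "layer" below is a pair of a function and its derivative; the three layers, the input `x0`, and the names `forward` and `backward` are assumptions made up for this illustration.

```python
import numpy as np

# Each layer is (function, derivative of that function w.r.t. its input).
layers = [
    (lambda x: 3.0 * x + 1.0, lambda x: 3.0),
    (lambda u: u ** 2,        lambda u: 2.0 * u),
    (np.tanh,                 lambda v: 1.0 - np.tanh(v) ** 2),
]

def forward(x):
    """Apply the layers in order, caching the input to each layer."""
    inputs = []
    for fn, _ in layers:
        inputs.append(x)
        x = fn(x)
    return x, inputs

def backward(inputs):
    """Multiply local derivatives from the last layer back to the first."""
    grad = 1.0
    for (_, dfn), inp in zip(reversed(layers), reversed(inputs)):
        grad *= dfn(inp)
    return grad

x0 = 0.2
out, cache = forward(x0)
analytic = backward(cache)

# Central-difference check of the full composition.
eps = 1e-6
numeric = (forward(x0 + eps)[0] - forward(x0 - eps)[0]) / (2.0 * eps)

print(f"output            = {out:.6f}")
print(f"chain-rule d/dx   = {analytic:.6f}")
print(f"finite-difference = {numeric:.6f}")
```

Caching each layer's input during the forward pass is what lets the backward pass evaluate every local derivative without recomputing the chain.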

### Example: Simple Neural Network

Single neuron: $y = \sigma(wx + b)$

Loss: $L = (y - t)^2$, where $t$ is the target

**Forward pass**:

1. $z = wx + b$
2. $y = \sigma(z)$
3. $L = (y - t)^2$

**Backward pass** (chain rule):

- $\frac{\partial L}{\partial y} = 2(y - t)$
- $\frac{\partial L}{\partial z} = \frac{\partial L}{\partial y} \cdot \sigma'(z)$
- $\frac{\partial L}{\partial w} = \frac{\partial L}{\partial z} \cdot x$
- $\frac{\partial L}{\partial b} = \frac{\partial L}{\partial z}$

In [None]:
# Demonstrate the chain rule in action
def forward_backward_demo():
    # Single neuron: y = sigmoid(w*x + b)
    # Loss: L = (y - target)^2

    # Parameters
    w = 0.5
    b = 0.1
    x = 2.0
    target = 0.8

    # Forward pass
    z = w * x + b
    y = 1 / (1 + np.exp(-z))
    L = (y - target) ** 2

    # Backward pass (analytical gradients via the chain rule)
    dL_dy = 2 * (y - target)
    dy_dz = y * (1 - y)  # sigmoid derivative
    dL_dz = dL_dy * dy_dz
    dL_dw = dL_dz * x
    dL_db = dL_dz

    # Numerical dL/dw by central differences, for verification
    eps = 1e-7
    z_plus = (w + eps) * x + b
    y_plus = 1 / (1 + np.exp(-z_plus))
    L_plus = (y_plus - target) ** 2

    z_minus = (w - eps) * x + b
    y_minus = 1 / (1 + np.exp(-z_minus))
    L_minus = (y_minus - target) ** 2

    numerical_dL_dw = (L_plus - L_minus) / (2 * eps)

    return {
        'forward': {'z': z, 'y': y, 'L': L},
        'analytical': {'dL_dw': dL_dw, 'dL_db': dL_db},
        'numerical': {'dL_dw': numerical_dL_dw}
    }


result = forward_backward_demo()

print("Forward Pass:")
print(f"  z = {result['forward']['z']:.4f}")
print(f"  y = {result['forward']['y']:.4f}")
print(f"  L = {result['forward']['L']:.4f}")
print("\nGradients (Chain Rule):")
print(f"  Analytical dL/dw = {result['analytical']['dL_dw']:.6f}")
print(f"  Numerical  dL/dw = {result['numerical']['dL_dw']:.6f}")
print(f"  Match: {np.isclose(result['analytical']['dL_dw'], result['numerical']['dL_dw'])}")

## 2.5 Gradient Checking

**Numerical gradient** (for verification):

$$\frac{\partial f}{\partial x_i} \approx \frac{f(x + \epsilon e_i) - f(x - \epsilon e_i)}{2\epsilon}$$

Here $e_i$ is the $i$-th standard basis vector and $\epsilon$ is a small step size.

**Gradient check**: Compare the analytical gradient against the numerical one.

This is crucial for debugging custom neural networks!
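A gradient check for a vector-valued input can be sketched as follows. The helper names, the test function, the deliberately buggy gradient, and the relative-error measure are all assumptions for illustration; thresholds for "pass" vary by convention and precision.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference gradient, perturbing one coordinate at a time."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e_i = np.zeros_like(x)
        e_i[i] = 1.0                      # i-th standard basis vector
        grad[i] = (f(x + eps * e_i) - f(x - eps * e_i)) / (2.0 * eps)
    return grad

def relative_error(a, b):
    """Scale-independent distance between two gradient vectors."""
    denom = max(np.linalg.norm(a) + np.linalg.norm(b), 1e-12)
    return np.linalg.norm(a - b) / denom

def f(x):
    """Example function (assumption): sum of squares plus sin(x0 * x1)."""
    return np.sum(x ** 2) + np.sin(x[0] * x[1])

def grad_f(x):
    """Correct analytical gradient of f."""
    g = 2.0 * x
    c = np.cos(x[0] * x[1])
    g[0] += x[1] * c
    g[1] += x[0] * c
    return g

def buggy_grad_f(x):
    """Same as grad_f but with the factor 2 forgotten (intentional bug)."""
    g = 1.0 * x
    c = np.cos(x[0] * x[1])
    g[0] += x[1] * c
    g[1] += x[0] * c
    return g

x = np.array([0.3, -1.2, 2.0])
num = numerical_gradient(f, x)
err_good = relative_error(grad_f(x), num)
err_bad = relative_error(buggy_grad_f(x), num)

print(f"relative error, correct gradient: {err_good:.2e}")
print(f"relative error, buggy gradient:   {err_bad:.2e}")
```

A correct gradient gives a relative error many orders of magnitude smaller than a buggy one, which is what makes the check useful in practice.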

## 2.6 The Hessian: Second-Order Information

The Hessian is the matrix of second derivatives. For a function of two variables $f(x_1, x_2)$:

$$\mathbf{H} = \begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2}
\end{bmatrix}$$

At a critical point (where $\nabla f = 0$), the **eigenvalues** of $\mathbf{H}$ tell us about curvature:

- All positive → local minimum
- All negative → local maximum
- Mixed signs → saddle point

If some eigenvalues are zero and the rest share a sign, the test is inconclusive.
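The eigenvalue test can be sketched with `np.linalg.eigvalsh`, which computes eigenvalues of a symmetric matrix. The function name `classify_critical_point`, the tolerance, and the three example quadratics are assumptions; each Hessian below is written out by hand for a function whose critical point is the origin.

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    """Classify a critical point from the eigenvalues of a symmetric Hessian."""
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > tol):
        return "local minimum"
    if np.all(eig < -tol):
        return "local maximum"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point"
    return "inconclusive"   # some eigenvalues are (numerically) zero

# Hand-derived Hessians at the origin for three quadratics.
examples = {
    "x^2 + y^2":    np.array([[2.0, 0.0], [0.0, 2.0]]),
    "-(x^2 + y^2)": np.array([[-2.0, 0.0], [0.0, -2.0]]),
    "x^2 - y^2":    np.array([[2.0, 0.0], [0.0, -2.0]]),
}

labels = {}
for name, H in examples.items():
    labels[name] = classify_critical_point(H)
    print(f"f = {name:<13} eigenvalues = {np.linalg.eigvalsh(H)}  ->  {labels[name]}")
```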

## Summary

| Concept | Definition | Deep Learning Role |
|---------|------------|-------------------|
| **Derivative** | Rate of change | Sensitivity of loss to parameters |
| **Gradient** | Vector of partials | Direction for parameter updates |
| **Chain Rule** | Composition derivative | Backpropagation algorithm |
| **Hessian** | Second derivatives | Curvature, adaptive learning rates |

---

## References

- **Primary**: Krishnendu Chaudhury. *Math and Architectures of Deep Learning*, Chapter 3.
- **Supplementary**: Goodfellow, I., et al. (2016). *Deep Learning*, Chapter 4.

## Connection to ML Refined Curriculum

This calculus foundation supports:

- Weeks 2-3: Optimization algorithms (gradient descent variants)
- Weeks 4-8: Computing gradients for regression and classification