## 1.3: Partial Derivatives — The Engine Behind Gradient Descent in ML

### What Is a Partial Derivative?

A partial derivative measures how a function changes as only one variable changes, while holding all others constant.

It’s used when a function depends on multiple variables — like most loss functions in ML!

![image.png](attachment:afc693cf-6a32-47b7-8c85-ea8b40d7e58d.png)

**Each one answers:**

> "How does `f` change if I only increase `x` or `y` a tiny bit, while keeping the other constant?"
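
To make "only one variable changes" concrete, here is a minimal numerical sketch. It nudges one input by a small amount `h` while leaving the other fixed (a central finite difference). The helper names `partial_x` and `partial_y`, the step size `h`, and the evaluation point are illustrative assumptions; the polynomial is the one used in the sympy example later in this section.

```python
def f(x, y):
    # Same polynomial as the sympy example later in this section
    return 3 * x**2 + 2 * x * y + y**2

def partial_x(func, x, y, h=1e-6):
    # Nudge x only; y is held constant
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

def partial_y(func, x, y, h=1e-6):
    # Nudge y only; x is held constant
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

approx_dx = partial_x(f, 1.0, 2.0)
approx_dy = partial_y(f, 1.0, 2.0)

print("approx ∂f/∂x at (1, 2):", approx_dx)  # close to 6*1 + 2*2 = 10
print("approx ∂f/∂y at (1, 2):", approx_dy)  # close to 2*1 + 2*2 = 6
```

The approximations should agree with the symbolic results `6*x + 2*y` and `2*x + 2*y` up to floating-point rounding.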

### Why Partial Derivatives Matter in ML

Most loss functions are multi-variable because we optimize many weights:

![image.png](attachment:37fdbe6c-198b-4a68-a1e2-9ab62bba2fb7.png)

✅ This is the gradient: a vector of partial derivatives, one per weight.
It points in the direction of steepest increase of the loss, so we update each weight in the opposite direction to reduce the loss.

![image.png](attachment:fe7eb835-0f4f-4036-9ee4-05ae94b79573.png)

In [2]:
# Example 2: In Python (with sympy)

from sympy import symbols, diff

x, y = symbols('x y')
f = 3*x**2 + 2*x*y + y**2

df_dx = diff(f, x)
df_dy = diff(f, y)

print("∂f/∂x =", df_dx)
print("∂f/∂y =", df_dy)

∂f/∂x = 6*x + 2*y
∂f/∂y = 2*x + 2*y


### ML Application: Gradient Descent with Multiple Weights

![image.png](attachment:0a57c204-f7e5-4395-a765-3fcf41591d9c.png)

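These are partial derivatives in action: each weight is updated using the partial derivative of the loss with respect to that weight. This update is the core of gradient-based neural network training loops.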
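
As a rough sketch of such a loop, the snippet below repeatedly applies the update "weights minus learning rate times gradient". It is not tied to any framework. As a stand-in loss it reuses `f = 3*x**2 + 2*x*y + y**2` from the sympy example above, with the gradient hard-coded from that output. The starting point, learning rate, and step count are arbitrary assumptions.

```python
import numpy as np

def grad_f(w):
    # Partial derivatives of f = 3*x**2 + 2*x*y + y**2
    x, y = w
    return np.array([6 * x + 2 * y, 2 * x + 2 * y])

w = np.array([1.0, 1.0])  # assumed starting weights
eta = 0.1                 # assumed learning rate

for step in range(200):
    # Move every weight against its own partial derivative
    w = w - eta * grad_f(w)

print("Weights after 200 steps:", w)  # approaches the minimum at [0, 0]
```

With this learning rate the iterates shrink toward `[0, 0]`; a learning rate that is too large would instead make them diverge.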

### How does it differ from a total derivative?

> The partial derivative measures change in a function with respect to one variable, treating other variables as constants.
> The total derivative considers that all variables may depend on one another — it calculates the combined effect of direct and indirect changes.

![image.png](attachment:e7bce0c9-70f3-4272-8e28-c3f1969d2f3d.png)

This is the total effect on z when x changes, including how y changes with x.
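
A small sympy sketch can show the difference. It assumes the usual chain-rule form of the total derivative, dz/dx = ∂z/∂x + ∂z/∂y · dy/dx. Both the function `z` and the dependence `y = x**2` are made up for illustration.

```python
from sympy import symbols, diff, simplify

x, y = symbols('x y')

z = x**2 + x*y   # hypothetical function of x and y
y_of_x = x**2    # assumed dependence of y on x

# Partial derivative: y is treated as a constant
partial_z_x = diff(z, x)
partial_z_y = diff(z, y)

# Total derivative: direct effect plus the effect routed through y
total = (partial_z_x + partial_z_y * diff(y_of_x, x)).subs(y, y_of_x)

# Cross-check: substitute y first, then differentiate
direct = diff(z.subs(y, y_of_x), x)

print("∂z/∂x =", partial_z_x)
print("dz/dx =", simplify(total))
print("check =", simplify(direct))
```

The partial derivative ignores that `y` moves with `x`, while the total derivative and the substitute-then-differentiate check agree with each other.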

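✅ In ML, we usually work with partials of the loss with respect to each weight. Total derivatives become important when one quantity feeds into another, as the chain rule in backpropagation relies on them (e.g., recurrent nets, where a weight affects the loss through many time steps).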

### Why is the gradient made up of partials?

> The gradient is a vector of all partial derivatives of a multivariable function. Each component tells us how sensitive the function is to that specific variable. Together they point in the direction of steepest ascent; the negative gradient points in the direction of steepest descent, which is the one used in optimization.
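
The ascent/descent distinction can be checked numerically. The sketch below takes a small step along the gradient and a small step against it, then compares the function values. The function is the polynomial from the earlier sympy example; the test point and step size are arbitrary assumptions.

```python
import numpy as np

def f(p):
    x, y = p
    return 3 * x**2 + 2 * x * y + y**2

def grad_f(p):
    # Vector of partial derivatives [∂f/∂x, ∂f/∂y]
    x, y = p
    return np.array([6 * x + 2 * y, 2 * x + 2 * y])

point = np.array([1.0, 1.0])
g = grad_f(point)
step = 0.01  # small step so the local linear picture holds

f_here = f(point)
f_up = f(point + step * g)    # along the gradient
f_down = f(point - step * g)  # against the gradient

print("f at point:      ", f_here)
print("f along +grad:   ", f_up)
print("f along -grad:   ", f_down)
```

Stepping along the gradient raises `f`, and stepping against it lowers `f`, which is why gradient descent subtracts the gradient.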

### 🧪 1. Compute partial derivatives using `sympy`

![image.png](attachment:872e5e87-d7c7-49ca-af05-64ba757b6dca.png)

In [3]:
from sympy import symbols, diff, sin

x, y = symbols('x y')
f = x**2 * y + sin(x*y)

df_dx = diff(f, x)
df_dy = diff(f, y)

print("∂f/∂x =", df_dx)
print("∂f/∂y =", df_dy)

∂f/∂x = 2*x*y + y*cos(x*y)
∂f/∂y = x**2 + x*cos(x*y)


### 🧪 2. Compute gradient vector of a multivariable function

![image.png](attachment:5bd0b36e-34b7-4b78-8cb1-3140d49e75ae.png)

In [4]:
from sympy import symbols, diff

w1, w2, y = symbols('w1 w2 y')
L = (w1 + 2*w2 - y)**2

grad_w1 = diff(L, w1)
grad_w2 = diff(L, w2)

print("Gradient vector: [", grad_w1, ",", grad_w2, "]")

Gradient vector: [ 2*w1 + 4*w2 - 2*y , 4*w1 + 8*w2 - 4*y ]
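
To turn the symbolic gradient into numbers, one option is sympy's `lambdify`, which converts expressions into an ordinary Python function. This is a minimal sketch. The name `grad_fn` and the evaluation point (w1 = 1, w2 = 1, y = 5) are illustrative choices, picked to match the values used in the next exercise.

```python
from sympy import symbols, diff, lambdify

w1, w2, y = symbols('w1 w2 y')
L = (w1 + 2*w2 - y)**2

# Symbolic gradient with respect to the two weights
grad = [diff(L, w1), diff(L, w2)]

# Convert the symbolic gradient into a numeric function of (w1, w2, y)
grad_fn = lambdify((w1, w2, y), grad)

values = grad_fn(1.0, 1.0, 5.0)
print("Gradient at w1=1, w2=1, y=5:", values)
```

At this point the residual `w1 + 2*w2 - y` is -2, so both components are negative and a descent step will increase both weights.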


### 🧪 3. Manually perform one gradient descent step

> "Using your computed gradient, update weights using learning rate η = 0.1"

In [6]:
# Let y = 5, w1 = 1.0, w2 = 1.0
y_val = 5
w1 = 1.0
w2 = 1.0
eta = 0.1  # learning rate η

# Forward pass: residual and loss L = (w1 + 2*w2 - y)**2
residual = w1 + 2*w2 - y_val
loss = residual**2

# Gradients (by the chain rule, matching the sympy result above)
grad_w1 = 2 * residual      # ∂L/∂w1
grad_w2 = 2 * residual * 2  # ∂L/∂w2

# Update
w1 -= eta * grad_w1
w2 -= eta * grad_w2

print("Updated w1:", w1)
print("Updated w2:", w2)

Updated w1: 1.4
Updated w2: 1.8
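
A quick sanity check is to compare the loss before and after the update. The sketch below plugs the original and updated weights into the same loss. The helper name `loss` is an illustrative choice.

```python
def loss(w1, w2, y):
    # Same loss as above: L = (w1 + 2*w2 - y)**2
    return (w1 + 2 * w2 - y) ** 2

loss_before = loss(1.0, 1.0, 5)  # weights before the step
loss_after = loss(1.4, 1.8, 5)   # weights after the step

print("Loss before:", loss_before)  # 4.0
print("Loss after: ", loss_after)   # essentially zero
```

For this particular loss, one step scales the residual `w1 + 2*w2 - y` by `1 - 10*eta`, so `eta = 0.1` happens to land on a minimum in a single step. That is a property of this example, not of gradient descent in general, where a step typically only reduces the loss.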
