# Chain Rule and Gradient
This notebook provides a guide to understanding the **Chain Rule** and **Gradients**, their applications, and their role in AI.

## Contents:
- Chain Rule
- Gradient
- Applications in AI
- Symbolic and Numerical Computations using Python


## 1. Chain Rule
### Definition:
The Chain Rule allows us to differentiate composite functions by breaking them down into their components. It’s represented as:
$$ (f(g(x)))' = f'(g(x)) \cdot g'(x) $$

### Example:
Let’s calculate the derivative of a composite function $h(x) = \sin(x^2)$ using the Chain Rule.

In [None]:
# Example: Applying the Chain Rule
from sympy import symbols, sin, diff
x = symbols('x')
h = sin(x**2)
dh_dx = diff(h, x)
dh_dx
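
As a sanity check on the symbolic result: taking $f(u) = \sin(u)$ and $g(x) = x^2$ in the Chain Rule gives $h'(x) = \cos(x^2) \cdot 2x$. The sketch below compares that formula with a central finite difference at a single point. The helper names, the test point `x0 = 1.3`, and the step size are illustrative assumptions, not part of the original example.

```python
import math

def h_fn(t):
    """Composite function h(t) = sin(t^2)."""
    return math.sin(t ** 2)

def h_prime(t):
    """Chain Rule: outer derivative cos(t^2) times inner derivative 2t."""
    return math.cos(t ** 2) * 2 * t

def central_difference(fn, t, step=1e-6):
    """Approximate fn'(t) with a symmetric finite difference."""
    return (fn(t + step) - fn(t - step)) / (2 * step)

x0 = 1.3  # arbitrary test point (assumption)
analytic = h_prime(x0)
numeric = central_difference(h_fn, x0)
print(analytic, numeric)  # the two values should agree to several decimal places
```

If the two numbers agree closely, the Chain Rule derivative is very likely correct at that point.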

### Application in AI
The Chain Rule is critical in **backpropagation** for training neural networks, where it’s used to compute gradients of the loss function with respect to each layer's parameters by working from the output layer back to the input.
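
To make the backward pass concrete, here is a minimal sketch for a single sigmoid neuron with a squared-error loss, $L = (\sigma(wx + b) - t)^2$. The network, the parameter values, and all function names are hypothetical choices made for illustration. Each line of the backward pass applies one link of the Chain Rule, and the result is compared with a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x_in, target):
    """Squared error of a single sigmoid neuron."""
    return (sigmoid(w * x_in + b) - target) ** 2

def loss_gradients(w, b, x_in, target):
    """Return (dL/dw, dL/db) computed with the Chain Rule."""
    # Forward pass: keep the intermediate values
    z = w * x_in + b
    a = sigmoid(z)
    # Backward pass: multiply local derivatives from the output back to the input
    dL_da = 2.0 * (a - target)   # derivative of (a - t)^2 with respect to a
    da_dz = a * (1.0 - a)        # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    dL_dw = dL_dz * x_in         # dz/dw = x_in
    dL_db = dL_dz                # dz/db = 1
    return dL_dw, dL_db

# Hypothetical parameter and data values
w0, b0, x_in, target = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = loss_gradients(w0, b0, x_in, target)

# Finite-difference check of both gradients
step = 1e-6
numeric_w = (loss(w0 + step, b0, x_in, target) - loss(w0 - step, b0, x_in, target)) / (2 * step)
numeric_b = (loss(w0, b0 + step, x_in, target) - loss(w0, b0 - step, x_in, target)) / (2 * step)
print(grad_w, numeric_w)
print(grad_b, numeric_b)
```

Deep-learning frameworks automate exactly this bookkeeping across many layers; the hand-written version is only meant to show where each factor comes from.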

## 2. Gradient
### Definition:
The **gradient** of a function is a vector of partial derivatives, which indicates the direction and rate of the fastest increase of the function. It’s denoted as:
$$ \nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right) $$

### Example:
Let's calculate the gradient of a multivariable function $f(x, y) = x^2 + y^2$.

In [None]:
# Gradient calculation of a multivariable function
y = symbols('y')
f = x**2 + y**2
df_dx = diff(f, x)
df_dy = diff(f, y)
(df_dx, df_dy)

### Application in AI: Gradient Descent
In AI, the **gradient** is used to minimize loss functions. Because the gradient points in the direction of steepest increase, gradient descent adjusts the parameters in the opposite direction (along the negative gradient) to reduce the error.

### Example of Gradient Descent:
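For the function $J(\theta) = \theta^2$, the gradient is $\frac{dJ}{d\theta} = 2\theta$. Starting from an initial $\theta$, we repeatedly apply the update $\theta \leftarrow \theta - \alpha \cdot 2\theta$, where $\alpha$ is the learning rate, to approach the minimum at $\theta = 0$.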
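
A minimal sketch of this iteration follows. The starting point, the learning rate, and the number of steps are assumptions chosen for illustration. With a learning rate of 0.1, each update multiplies $\theta$ by $1 - 2 \cdot 0.1 = 0.8$, so the iterates shrink steadily toward 0.

```python
def grad_J(theta):
    """Gradient of J(theta) = theta**2."""
    return 2.0 * theta

theta = 1.0          # starting point (assumption)
learning_rate = 0.1  # step size (assumption)
history = [theta]

for _ in range(50):
    # Move against the gradient to decrease J
    theta = theta - learning_rate * grad_J(theta)
    history.append(theta)

print(history[:4])  # first few iterates
print(theta)        # final value, close to the minimum at 0
```

A learning rate that is too large (here, anything above 1.0) would make the iterates grow instead of shrink, which is why the step size is usually tuned.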

### Symbolic Computation Using Python (SymPy)
Using SymPy, we can compute derivatives and gradients symbolically. Below, we apply the Chain Rule to another composite function and compute the gradient of a multivariable function, reusing the symbols `x` and `y` defined earlier.

In [None]:
# Chain Rule on another composite function
g = x**3 + 2*x
f_g = sin(g)
df_g_dx = diff(f_g, x)
df_g_dx

In [None]:
# Gradient of a multivariable function
f_multi = x**2 + 3*x*y + y**2
df_dx_multi = diff(f_multi, x)
df_dy_multi = diff(f_multi, y)
(df_dx_multi, df_dy_multi)

### Numerical Computation Using Python (SciPy)
When a function is too complex to differentiate symbolically, its gradient is often approximated numerically and then used in methods such as gradient descent.

In this example, we approximate the gradient of a simple function at a single point using finite differences.

In [None]:
# Numerical gradient approximation using SciPy and NumPy
from scipy.optimize import approx_fprime
import numpy as np

def func(v):
    x, y = v
    return x**2 + y**2

# Point to calculate the gradient at
point = np.array([1.0, 1.0])

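# Approximate the gradient at the point with forward finite differences;
# epsilon is the step size (square root of machine epsilon is a common choice)
epsilon = np.sqrt(np.finfo(float).eps)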
gradient = approx_fprime(point, func, epsilon)
gradient
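
The numerical gradient can also drive gradient descent directly. The sketch below is one way to do that for the same function $x^2 + y^2$. The name `bowl`, the learning rate, and the iteration count are assumptions made for illustration. `approx_fprime` is called with the step size passed positionally, as in the cell above.

```python
import numpy as np
from scipy.optimize import approx_fprime

def bowl(v):
    """Same function as above: f(x, y) = x^2 + y^2."""
    return v[0] ** 2 + v[1] ** 2

eps = np.sqrt(np.finfo(float).eps)  # finite-difference step size
position = np.array([1.0, 1.0])     # starting point (assumption)
learning_rate = 0.1                 # step size for descent (assumption)

for step_index in range(100):
    # Estimate the gradient numerically, then step against it
    grad = approx_fprime(position, bowl, eps)
    position = position - learning_rate * grad

print(position)  # should end up very close to the minimum at (0, 0)
```

Because the gradient is only approximated, the iterates settle near the minimum rather than exactly on it; the residual error is on the order of the finite-difference step.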

## 3. Conclusion
The Chain Rule and the gradient are fundamental in AI, especially for optimizing models. In this notebook, we covered their definitions and applications, and computed them both symbolically and numerically. Understanding these tools is key to training models effectively.