**Automatic differentiation (autodiff)** is a technique to compute derivatives (gradients) of functions efficiently and accurately. It is a core part of many machine learning frameworks (like PyTorch, TensorFlow, and JAX) and is especially useful for training neural networks using gradient-based optimisation methods such as **stochastic gradient descent**.


### What is it?

Unlike symbolic differentiation (as in SymPy) or numerical differentiation (finite differences), autodiff works by **decomposing a function into a sequence of elementary operations** and **applying the chain rule** to them automatically. This avoids the truncation error of finite differences and the expression blow-up that symbolic derivatives can suffer from; the result is exact up to floating-point rounding.

There are two main modes:
- **Forward mode** (efficient when the number of inputs is small: one pass per input)
- **Reverse mode** (used in ML; efficient when the number of outputs is small, such as a scalar loss: one backward pass per output)
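
To make forward mode concrete, here is a minimal sketch using dual numbers, where every value carries its derivative alongside it. The `Dual` class is a hypothetical helper written for this illustration (it is not part of any framework), and it only supports addition and multiplication, which is enough for the polynomial used later in this notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative w.r.t. one chosen input (illustrative only)."""
    value: float
    deriv: float

    @staticmethod
    def lift(other):
        # Treat plain numbers as constants, whose derivative is zero.
        return other if isinstance(other, Dual) else Dual(float(other), 0.0)

    def __add__(self, other):
        other = Dual.lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def f(x):
    return x * x + 3 * x + 5


# Seed the input with derivative 1 (dx/dx = 1), then evaluate f as usual.
result = f(Dual(2.0, 1.0))
print(result.value, result.deriv)  # 15.0 7.0
```

Each arithmetic operation propagates the derivative together with the value, so one evaluation of `f` yields both `f(2)` and `f'(2)`. With several inputs, this sketch would need one such pass per input, which is why forward mode suits functions with few inputs.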


### How is autodiff used in machine learning?

In machine learning, we often define a **loss function** that measures how bad our model's predictions are. We then:

1. **Compute the gradient of the loss** with respect to model parameters.
2. **Update parameters** in the direction that reduces the loss.

Autodiff handles step 1 automatically; deriving these gradients by hand would be tedious and error-prone.
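
The two steps above can be sketched with a tiny hand-rolled reverse-mode engine. Everything here is an assumption made for illustration: the `Var` class is a hypothetical helper (real frameworks are far more elaborate), the data are samples of `y = 2x`, and the learning rate and step count are arbitrary choices.

```python
class Var:
    """Scalar node that records how it was computed (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each parent is stored with the local derivative d(self)/d(parent).
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Order nodes so each one is processed after everything that uses it.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                # Chain rule: accumulate d(output)/d(parent).
                parent.grad += node.grad * local


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # assumed samples of y = 2x
w = 0.0
learning_rate = 0.05  # arbitrary choice for this sketch

for step in range(100):
    weight = Var(w)
    loss = Var(0.0)
    for x, y in data:
        error = weight * Var(x) - Var(y)
        loss = loss + error * error       # sum of squared errors
    loss.backward()                       # step 1: gradient of the loss w.r.t. w
    w -= learning_rate * weight.grad      # step 2: move against the gradient

print(round(w, 4))
```

The loop never contains a hand-derived formula for the gradient: `backward()` walks the recorded operations in reverse and applies the chain rule, and the fitted weight ends up close to 2.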

### Summary

| Feature            | Autodiff                                                                      |
|--------------------|-------------------------------------------------------------------------------|
| Accuracy           | Exact up to floating-point rounding (applies the chain rule, no approximation) |
| Speed              | Reverse mode computes the full gradient of a scalar loss in one backward pass  |
| Use in ML          | Essential for training models                                                 |
| Example frameworks | PyTorch, TensorFlow, JAX                                                      |

### Example

In [1]:
import torch

# Create a tensor with gradient tracking
x = torch.tensor(2.0, requires_grad=True)

# Define a function
y = x**2 + 3 * x + 5

# Compute the derivative of y with respect to x
y.backward()

# Print the gradient dy/dx
print(x.grad)  # Should print 2*x + 3 = 2*2 + 3 = 7

tensor(7.)


### Finite Difference Method

This method approximates the derivative using:

$$f'(x) \approx \frac{f(x + h) - f(x)}{h}$$

where $h$ is a small number (the step size).

Let’s use the same function:

$$f(x) = x^2 + 3x + 5$$

and evaluate its derivative at $x = 2$.
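
For this particular quadratic, the error of the forward difference can be worked out by hand, which gives a reference value to compare any numerical result against. Expanding the difference quotient:

$$\frac{f(x + h) - f(x)}{h} = \frac{(x + h)^2 + 3(x + h) + 5 - (x^2 + 3x + 5)}{h} = \frac{2xh + h^2 + 3h}{h} = 2x + 3 + h$$

At $x = 2$ this equals $7 + h$, so in exact arithmetic the forward difference overestimates the true derivative $f'(2) = 7$ by exactly $h$. With $h = 10^{-5}$, for example, the approximation is $7.00001$; floating-point rounding adds a further, much smaller error on top of that.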

### Finite difference vs autodiff

In [2]:
import torch

# AutoDiff method (PyTorch)
x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 3 * x + 5
y.backward()
autodiff_grad = x.grad.item()

# Finite difference method
def f(x):
    return x**2 + 3 * x + 5

x_val = 2.0
h = 1e-5
finite_diff_grad = (f(x_val + h) - f(x_val)) / h

print(f"Autodiff Gradient:       {autodiff_grad:.6f}")
print(f"Finite Difference Grad:  {finite_diff_grad:.6f}")

Autodiff Gradient:       7.000000
Finite Difference Grad:  7.000010


### Conclusion

- **Autodiff** is **exact** up to floating-point precision.
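- **Finite difference** is approximate: an $h$ that is too large causes truncation error, while an $h$ that is too small amplifies floating-point rounding error.
- In machine learning, where models can have millions of parameters, **autodiff is the gold standard** for gradient computation: reverse mode yields every partial derivative in one backward pass, whereas finite differences need at least one extra function evaluation per parameter.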
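
The sensitivity to $h$ can be checked with a short experiment. This is a sketch in plain Python on the same function; the particular step sizes are arbitrary choices, and the exact error values depend on floating-point details, so only the overall trend should be read from it.

```python
def f(x):
    return x**2 + 3 * x + 5


def forward_difference(func, x, h):
    """Forward-difference approximation of func'(x) with step size h."""
    return (func(x + h) - func(x)) / h


exact = 7.0  # f'(2) = 2*2 + 3
step_sizes = [1e-1, 1e-3, 1e-6, 1e-9, 1e-13]  # arbitrary choices for illustration
errors = {h: abs(forward_difference(f, 2.0, h) - exact) for h in step_sizes}

for h, err in errors.items():
    print(f"h = {h:.0e}   absolute error = {err:.3e}")
```

For large $h$ the error is dominated by truncation and shrinks as $h$ decreases; for very small $h$ the subtraction `func(x + h) - func(x)` loses most of its significant digits and the error grows again. Autodiff has no step size to tune, so it avoids this trade-off entirely.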