# Curve Fitting via Gradient Descent
This notebook walks through fitting a model to data using **gradient descent**, breaking down the optimization problem step by step.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

## 1. The principles of curve fitting

Suppose we observe a pendulum in motion. We expect its displacement at time $x$ to be $a\sin(x + b)$, for parameters $a$ and $b$ to be determined.

Mathematically, the displacement function will be $f(x; a, b) = a\sin(x + b)$. This can be expressed in Python as follows:

In [None]:
def f(x, params):
    a, b = params
    return a * np.sin(x + b)

Let's plot this function for $0 \le x \le 10$ where $a = 2$ and $b = 1$. That is, we're plotting $f(x; 2, 1)$ for $0 \le x \le 10$.

In [None]:
# True parameters
a_true, b_true = 2, 1
params = (a_true, b_true)
x = np.linspace(0, 10, 100)
y_true = f(x, params)

# Plot data
plt.plot(x, y_true, 'r-', label='True function')
plt.xlabel('x')
plt.ylabel('y')
plt.legend();

We can simulate some *noisy* data by adding Gaussian (normally distributed) noise to the true function.

In [None]:
y_noisy = y_true + 0.1 * np.random.normal(size=x.size)

# Plot data
plt.scatter(x, y_noisy, label='Noisy data')
plt.plot(x, y_true, 'r-', label='True function')
plt.xlabel('x')
plt.ylabel('y')
plt.legend();
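
Because the noise is random, the scatter plot changes on every run. A minimal sketch of making it reproducible, assuming NumPy's `default_rng` generator is acceptable in place of `np.random.normal`; the helper `make_noisy_data` and the `_demo`-style names are illustrative and not part of the notebook:

```python
import numpy as np

def f(x, params):
    a, b = params
    return a * np.sin(x + b)

def make_noisy_data(seed, sigma=0.1):
    """Hypothetical helper: noisy samples of f with a = 2, b = 1."""
    # A seeded Generator produces the same noise on every run
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 10, 100)
    y_true = f(x, (2, 1))
    return x, y_true + sigma * rng.normal(size=x.size)

x_demo, y_first = make_noisy_data(seed=0)
_, y_second = make_noisy_data(seed=0)

# Same seed, same data
print(np.array_equal(y_first, y_second))
```

Changing `seed` gives a different but equally repeatable data set.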

## 2. Measuring the error

What if we *don't* know the parameters $a$ and $b$? If we only have the data, how would we estimate the parameters?

Here is some real data from an observation.

In [None]:
a_real_true, b_real_true = 3.4, 4.2
params = (a_real_true, b_real_true)
x = np.linspace(0, 10, 100)
y_real_true = f(x, params)
y_real = y_real_true + 0.01 * np.random.normal(size=x.size)

### Activity 1

**Play** with the parameters `a_guess` and `b_guess` below to try to find a good fit.

In [None]:
# Guess parameters. Change these!!
a_guess, b_guess = 2, 3

y_guess = f(x, (a_guess, b_guess))
plt.scatter(x, y_real, label='Real data')
plt.plot(x, y_guess, 'r-', label='Estimated function')
plt.xlabel('x')
plt.ylabel('y')
plt.legend();

### The optimization problem
We want to find parameters $(a, b)$ that minimize the cost function $J$, which is half the **mean squared error**:

$$
J(a, b) = \frac{1}{2n} \sum_{i=1}^n (y_i - f(x_i; a, b))^2.
$$

In [None]:
def J(x, y, params):
    n = x.size
    residuals = y - f(x, params)
    return (residuals**2).sum() / (2*n)
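
As a rough sanity check on the scale of $J$: if the data really come from the model plus Gaussian noise of standard deviation $\sigma$, then $J$ at the generating parameters should be close to $\sigma^2/2$, and noticeably larger elsewhere. The sketch below illustrates this on made-up data; `sigma_demo`, `params_demo` and the other `_demo` names are assumptions introduced here, not the notebook's data.

```python
import numpy as np

def f(x, params):
    a, b = params
    return a * np.sin(x + b)

def J(x, y, params):
    n = x.size
    residuals = y - f(x, params)
    return (residuals**2).sum() / (2 * n)

# Illustrative data with known parameters and noise level
rng = np.random.default_rng(0)
sigma_demo = 0.1
params_demo = (2.0, 1.0)
x_demo = np.linspace(0, 10, 100)
y_demo = f(x_demo, params_demo) + sigma_demo * rng.normal(size=x_demo.size)

print("J at generating parameters:", J(x_demo, y_demo, params_demo))
print("sigma^2 / 2:               ", sigma_demo**2 / 2)
print("J at a poor guess:         ", J(x_demo, y_demo, (1.0, 3.0)))
```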

### Activity 2

Play with the values `a_guess` and `b_guess` below to try to minimise $J$.

In [None]:
a_guess, b_guess = 1, 1

J(x, y_real, (a_guess, b_guess))

## 3. Contour plots of the cost function

For a fixed data set, our cost function $J$ depends on two parameters $a$ and $b$. We can think of the $J$-values as representing the height of a surface above the $a$-$b$ plane.

Our goal is to find the location of the lowest point on this surface: the $(a, b)$ pair that minimises $J$.

To start, let's visualise the surface using a contour plot.

In [None]:
%matplotlib widget
def contour_plot(func, amin=-10, amax=10, bmin=-10, bmax=10):
    a_vals = np.linspace(amin, amax, 100)
    b_vals = np.linspace(bmin, bmax, 100)
    A, B = np.meshgrid(a_vals, b_vals)
    
    # Compute the cost on the grid
    Cost = np.empty_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            Cost[i, j] = func(A[i, j], B[i, j])
    
    # Plot the contour
    plt.figure(figsize=(6,5))
    CS = plt.contour(A, B, Cost, levels=15)
    plt.clabel(CS, inline=True, fontsize=8)
    plt.xlabel('a parameter')
    plt.ylabel('b parameter')
    plt.title('Contour Plot')
    plt.show()
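
The double loop above works for any `func`, but for this particular cost the grid can also be filled in one shot with NumPy broadcasting. The following is a sketch under that assumption; `cost_grid` is a hypothetical helper specialised to $f(x; a, b) = a\sin(x + b)$, and the small grid is only there to compare against an explicit loop.

```python
import numpy as np

def cost_grid(x, y, a_vals, b_vals):
    """Hypothetical helper: J on an (a, b) grid for the model a*sin(x + b)."""
    A, B = np.meshgrid(a_vals, b_vals)  # shape (len(b_vals), len(a_vals))
    # A trailing axis lets every grid point broadcast against all data points
    predictions = A[..., None] * np.sin(x + B[..., None])
    residuals = y - predictions
    cost = (residuals**2).sum(axis=-1) / (2 * x.size)
    return A, B, cost

# Illustrative noise-free data and a small grid
x_demo = np.linspace(0, 10, 100)
y_demo = 2 * np.sin(x_demo + 1)
a_demo = np.linspace(-6, 6, 5)
b_demo = np.linspace(-6, 6, 4)

A_demo, B_demo, cost_demo = cost_grid(x_demo, y_demo, a_demo, b_demo)

# Reference values from an explicit loop (rows follow b, columns follow a)
looped_demo = np.array([
    [((y_demo - a * np.sin(x_demo + b))**2).sum() / (2 * x_demo.size) for a in a_demo]
    for b in b_demo
])
print(np.allclose(cost_demo, looped_demo))
```

The resulting array can be passed to `plt.contour(A, B, cost)` in the same way as `Cost` above.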

### Activity 3

The plot below shows contours of constant $J$. We're trying to find the lowest value. Estimate the optimal $(a, b)$ pair from the plot. Feel free to change the plot area by altering `amin`, `amax`, `bmin` and `bmax`.

In [None]:
contour_plot(lambda a, b : J(x, y_real, (a, b)), amin=-6, amax=6, bmin=-6, bmax=6)
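
The contour plot shows several equally deep basins rather than a single one. This follows from the model itself: $\sin$ has period $2\pi$, and $a\sin(x + b + \pi) = -a\sin(x + b)$, so $(a, b)$, $(a, b + 2\pi)$ and $(-a, b \pm \pi)$ all describe the same curve. A small check on illustrative parameters (the values `1.5` and `0.7` are arbitrary, chosen only for this sketch):

```python
import numpy as np

def f(x, params):
    a, b = params
    return a * np.sin(x + b)

def J(x, y, params):
    n = x.size
    residuals = y - f(x, params)
    return (residuals**2).sum() / (2 * n)

# Illustrative noise-free data from arbitrary parameters
p_demo = (1.5, 0.7)
x_demo = np.linspace(0, 10, 100)
y_demo = f(x_demo, p_demo)

# Parameter pairs that describe the same curve as p_demo
equivalents_demo = [
    (1.5, 0.7 + 2 * np.pi),
    (-1.5, 0.7 + np.pi),
    (-1.5, 0.7 - np.pi),
]

for q in equivalents_demo:
    print(q, J(x_demo, y_demo, q))
```

Each printed cost is zero up to rounding, so any of these pairs is an equally valid answer.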

## 4. Gradient descent

Gradient descent is a general method for numerically finding the minimal value of a function of many variables.

**Idea:** If you're standing on a hill and want to find the top of the hill, walk in the steepest upward direction. To find the bottom of the hill, walk in the steepest downward direction.

It turns out that the *gradient vector* gives this direction of steepest ascent. In our case, this is the vector
$$
\nabla J = \left(\frac{\partial J}{\partial a},\frac{\partial J}{\partial b}\right),
$$
where $\partial J/ \partial a$ is the *partial derivative* of $J$ with respect to $a$.

Evaluated at a point $(a_0, b_0)$, this partial derivative is the limit, as $h$ tends to $0$, of
$$
\frac{J(a_0 + h, b_0) - J(a_0, b_0)}{h},
$$
much like an ordinary derivative. Likewise, $\partial J/ \partial b$, evaluated at $(a_0, b_0)$, is the limit as $h$ tends to $0$ of
$$
\frac{J(a_0, b_0 + h) - J(a_0, b_0)}{h}.
$$

Here's a function that numerically estimates the gradient vector:

In [None]:
def numerical_gradient(func, params, h=1e-5):
    """
    Numerically estimates the gradient of func at params using central differences.
    """
    params = np.array(params, dtype=float)
    grad = np.zeros_like(params)

    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = h
        grad[i] = (func(params + step) - func(params - step)) / (2 * h)

    return grad
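
For this particular model the partial derivatives can also be worked out by hand. Writing $r_i = y_i - a\sin(x_i + b)$, the chain rule gives $\partial J/\partial a = -\frac{1}{n}\sum_i r_i \sin(x_i + b)$ and $\partial J/\partial b = -\frac{1}{n}\sum_i r_i\, a\cos(x_i + b)$. The sketch below compares these formulas with a central-difference estimate on illustrative data; `analytic_gradient` and `central_difference` are names introduced here, the latter using the same scheme as `numerical_gradient`.

```python
import numpy as np

def J(x, y, params):
    a, b = params
    residuals = y - a * np.sin(x + b)
    return (residuals**2).sum() / (2 * x.size)

def analytic_gradient(x, y, params):
    """Hand-derived gradient of J for the model a*sin(x + b)."""
    a, b = params
    residuals = y - a * np.sin(x + b)
    dJ_da = -(residuals * np.sin(x + b)).mean()
    dJ_db = -(residuals * a * np.cos(x + b)).mean()
    return np.array([dJ_da, dJ_db])

def central_difference(func, params, h=1e-5):
    """Central-difference gradient estimate, one coordinate at a time."""
    params = np.array(params, dtype=float)
    grad = np.zeros_like(params)
    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = h
        grad[i] = (func(params + step) - func(params - step)) / (2 * h)
    return grad

# Illustrative data and an arbitrary evaluation point
x_demo = np.linspace(0, 10, 100)
y_demo = 2 * np.sin(x_demo + 1)
point_demo = (1.0, 0.5)

grad_exact = analytic_gradient(x_demo, y_demo, point_demo)
grad_approx = central_difference(lambda v: J(x_demo, y_demo, v), point_demo)
print("analytic: ", grad_exact)
print("numerical:", grad_approx)
```

The numerical version has the advantage of working unchanged if the model `f` is swapped for something else.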

Using this function, we'll plot the *negative* gradient vectors (the directions of steepest descent) as arrows over our contour plot.

In [None]:
def contour_with_gradients(func, amin=-10, amax=10, bmin=-10, bmax=10):
    a_vals, b_vals = np.linspace(amin, amax, 100), np.linspace(bmin, bmax, 100)
    size = min(amax - amin, bmax - bmin)
    A, B = np.meshgrid(a_vals, b_vals)
    
    Cost = np.empty_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            Cost[i, j] = func((A[i, j], B[i, j]))

    # Compute gradients
    step = 8
    A_s, B_s = A[::step, ::step], B[::step, ::step]
    dA, dB = np.zeros_like(A_s), np.zeros_like(B_s)
    for i in range(A_s.shape[0]):
        for j in range(A_s.shape[1]):
            dA[i,j], dB[i,j] = -numerical_gradient(func,(A_s[i,j], B_s[i,j]))
    
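    # Normalise to unit length so the arrows show direction only;
    # leave zero gradients as zero-length arrows instead of dividing by zero
    norm = np.hypot(dA, dB)
    norm[norm == 0] = 1.0
    dA, dB = dA / norm, dB / norm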
    
    # Plot contour + quiver (gradient arrows)
    plt.figure(figsize=(6,5))
    CS = plt.contour(A, B, Cost, levels=20, cmap = 'cividis')
    plt.clabel(CS, inline=True, fontsize=8)
    plt.quiver(A_s, B_s, dA, dB, 
               angles='xy', scale_units='xy', scale=30/size, width=0.003, color='red')
    plt.xlabel('a parameter')
    plt.ylabel('b parameter')
    plt.title('Contours with Gradient Descent Directions')
    plt.show()


### Activity 4
Run the code below to produce the contour plot with arrows showing the negative gradient at each point. What is the relationship between the contour lines and the direction of the arrows?

In [None]:
contour_with_gradients(lambda v : J(x, y_real, v), amin=-6, amax=6, bmin=-6, bmax=6)

## 5. Implementing Gradient Descent

To find the minimum value of $J$ using gradient descent, we'll begin with a starting choice $(a_0, b_0)$ of parameters and iteratively compute:
$$
(a_{i+1}, b_{i+1}) = (a_i, b_i) - \alpha \nabla J(a_i, b_i),
$$
for a small positive value $\alpha$. This has the effect of taking a small step from the point $(a_i, b_i)$ in the direction *opposite* to the gradient vector $\nabla J$, that is, in the direction of steepest descent.

Gradient descent is heavily used in training machine learning models, where the value $\alpha$ is called the *learning rate*.

### Activity 5

Run the code below. Try a few different values of the learning rate $\alpha$ and number of iterations.

In [None]:
# Initialize parameters
params = np.array([5.0, 1.0])  # [a, b]
alpha = 3.2   # learning rate
iterations = 50 # number of iterations

cost_history = []
for i in range(iterations):
    grads = numerical_gradient(lambda v : J(x, y_real, v), params)
    params = params - alpha * grads
    cost_history.append(J(x, y_real, params))

a_gd, b_gd = params
print(f"After GD: a={a_gd:.3f}, b={b_gd:.3f}, J={J(x, y_real, params):.3e}")
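
To compare learning rates side by side, the loop can be wrapped in a function that returns the full cost history. This is a sketch on illustrative noise-free data, not the notebook's `y_real`; the helper `gradient_descent`, the starting point and the learning rates tried are all assumptions made for the example.

```python
import numpy as np

def f(x, params):
    a, b = params
    return a * np.sin(x + b)

def J(x, y, params):
    residuals = y - f(x, params)
    return (residuals**2).sum() / (2 * x.size)

def numerical_gradient(func, params, h=1e-5):
    params = np.array(params, dtype=float)
    grad = np.zeros_like(params)
    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = h
        grad[i] = (func(params + step) - func(params - step)) / (2 * h)
    return grad

def gradient_descent(cost, start, alpha, iterations):
    """Hypothetical helper: returns final parameters and the cost at every step."""
    params = np.array(start, dtype=float)
    history = [cost(params)]
    for _ in range(iterations):
        # Step against the gradient, scaled by the learning rate
        params = params - alpha * numerical_gradient(cost, params)
        history.append(cost(params))
    return params, np.array(history)

# Illustrative data generated with a = 2, b = 1
x_demo = np.linspace(0, 10, 100)
y_demo = f(x_demo, (2.0, 1.0))
cost_demo = lambda v: J(x_demo, y_demo, v)

results_demo = {}
for alpha_demo in (0.01, 0.1, 0.5):
    results_demo[alpha_demo] = gradient_descent(
        cost_demo, start=(1.0, 0.5), alpha=alpha_demo, iterations=200
    )
    p, hist = results_demo[alpha_demo]
    print(f"alpha={alpha_demo}: a={p[0]:.3f}, b={p[1]:.3f}, final J={hist[-1]:.3e}")
```

Plotting each `hist` on a logarithmic axis makes the difference in convergence speed easy to see.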

## 6. Visualizing Convergence and Fit
Let's plot the cost function over iterations and compare the fitted curve to the data.

In [None]:
# Plot cost history
plt.figure()
plt.plot(cost_history)
plt.xlabel('Iteration')
plt.ylabel('Cost J')
plt.title('Gradient Descent Convergence')

# Plot fitted curve
plt.figure()
plt.scatter(x, y_real, label='Real data')
y_fit_gd = f(x, params)
plt.plot(x, y_fit_gd, 'g--', label='GD Fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend();

## 7. Discussion
- You can see how the cost decreases over iterations when the learning rate is appropriate.
- Too large a learning rate can cause divergence; too small makes convergence slow.
- SciPy's `curve_fit` uses a more advanced method (Levenberg–Marquardt by default for unconstrained problems) for faster, more reliable fitting.
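
For comparison, here is a minimal sketch of the same kind of fit with `scipy.optimize.curve_fit`, assuming SciPy is installed. Note that `curve_fit` expects the model parameters as separate arguments, so the sketch defines its own `model(x, a, b)` rather than reusing `f(x, params)`; the data, seed and starting guess `p0` are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    # curve_fit passes each parameter as its own positional argument
    return a * np.sin(x + b)

# Illustrative noisy data generated with a = 2, b = 1
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 10, 100)
y_demo = model(x_demo, 2.0, 1.0) + 0.1 * rng.normal(size=x_demo.size)

# p0 is the starting guess, playing the same role as the initial params above
popt, pcov = curve_fit(model, x_demo, y_demo, p0=(1.0, 0.5))
a_fit, b_fit = popt
print(f"curve_fit: a={a_fit:.3f}, b={b_fit:.3f}")
```

As with gradient descent, the result depends on the starting guess: a poor `p0` can land in a different, equivalent or worse, basin.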

Feel free to experiment with different learning rates, number of iterations, and models!