**Group Details:**

- 2022BCD0038 - K. Sri Chaitan
- 2022BCD0041 - Karthik Raj
- 2022BCD0021 - Siddharth Chitikesi



# Optimization Paths for the Rosenbrock Function Using Adagrad, Adam, and RMSprop

This notebook compares the optimization paths taken by three gradient-based methods, **Adagrad**, **Adam**, and **RMSprop**, when minimizing the Rosenbrock function. All three are widely used for parameter optimization in machine learning, and each adapts the per-parameter step size in a different way.

The Rosenbrock function, given by:

$$ f(x, y) = (1 - x)^2 + 100(y - x^2)^2 $$

is a well-known test problem for optimization methods. Its global minimum, $f(1, 1) = 0$, lies inside a narrow, curved valley: reaching the valley is easy, but following it to the minimum is slow for many methods. Running the three optimizers on this landscape shows how their step-size adaptation behaves in practice.
    


# Import Libraries

This cell imports the necessary libraries for implementing and visualizing the optimization algorithms.

- `numpy` is used for numerical calculations, particularly for handling matrix operations and vectorized computations.
- `matplotlib.pyplot` is used for plotting the optimization paths.
- `LogNorm` from `matplotlib.colors` is used by the plotting cell to color the contour levels on a logarithmic scale.

These libraries are essential for performing mathematical operations, handling array data structures, and visualizing the results of each optimization method on the Rosenbrock function.
    


In [None]:
# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm  # log-scaled contour colors in the plotting cell

# Rosenbrock Function and Gradient

This cell defines the **Rosenbrock function** and its gradient. The Rosenbrock function is a popular test problem for optimization, defined as:

$$ f(x, y) = (1 - x)^2 + 100(y - x^2)^2 $$

The function `rosenbrock_grad` returns the partial derivatives `grad_x` and `grad_y` as a NumPy array; gradient-based methods such as Adagrad, Adam, and RMSprop evaluate it at every step.
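
For reference, differentiating $f$ term by term gives the partial derivatives below. This is a worked derivation added for clarity, and it agrees with the expressions coded in `rosenbrock_grad`:

$$ \frac{\partial f}{\partial x} = -2(1 - x) - 400x(y - x^2), \qquad \frac{\partial f}{\partial y} = 200(y - x^2) $$

Both partial derivatives vanish at $(x, y) = (1, 1)$, the global minimum, where $f = 0$.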


In [None]:
# Define the Rosenbrock function
def rosenbrock(w):
    x, y = w
    return (1 - x)**2 + 100 * (y - x**2)**2

# Define the gradient of the Rosenbrock function
def rosenbrock_grad(w):
    x, y = w
    grad_x = -2 * (1 - x) - 400 * x * (y - x**2)
    grad_y = 200 * (y - x**2)
    return np.array([grad_x, grad_y])
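
An analytic gradient is easy to get wrong, so it can help to compare it against a finite-difference estimate. The sketch below repeats the two definitions so that it runs on its own; `numerical_grad` is a hypothetical helper introduced here for illustration, and the step size `h` is an assumed value rather than a tuned one.

```python
import numpy as np

def rosenbrock(w):
    x, y = w
    return (1 - x)**2 + 100 * (y - x**2)**2

def rosenbrock_grad(w):
    x, y = w
    return np.array([-2 * (1 - x) - 400 * x * (y - x**2),
                     200 * (y - x**2)])

def numerical_grad(f, w, h=1e-6):
    # Hypothetical helper: central differences, one coordinate at a time
    w = np.asarray(w, dtype=np.float64)
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = h
        grad[i] = (f(w + step) - f(w - step)) / (2 * h)
    return grad

w_test = np.array([0.0, 1.5])  # the notebook's starting point
max_abs_error = np.max(np.abs(numerical_grad(rosenbrock, w_test)
                              - rosenbrock_grad(w_test)))
print(max_abs_error)  # should be tiny compared with the gradient entries
```

If the two gradients disagree by more than rounding error, the analytic expression is the first place to look.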

# Adagrad Implementation

This cell implements the **Adagrad** optimization algorithm. Adagrad adjusts the learning rate for each parameter based on the cumulative sum of squared gradients, reducing the step size for frequently updated parameters.

In this function:
- `grad_accum` keeps track of the accumulated squared gradients, one entry per parameter.
- Each step divides the gradient by `np.sqrt(grad_accum) + epsilon`, so the effective learning rate shrinks for parameters whose gradients have been large.
- The method iteratively updates the parameter vector `w = (x, y)` until the gradient norm falls below `epsilon` or `max_iters` is reached.
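
To see how quickly the accumulated sum shrinks the step, consider a gradient that stays constant. After `k` steps the accumulator equals `k * grad**2`, so the step size is roughly `alpha / sqrt(k)`. The snippet below is a minimal sketch of that effect; the constant gradient value is an assumption chosen only for illustration.

```python
import numpy as np

alpha, epsilon = 0.01, 1e-6   # same defaults as the notebook's Adagrad
grad = np.array([4.0])        # illustrative constant gradient (assumption)
grad_accum = np.zeros_like(grad)

step_sizes = []
for k in range(1, 6):
    grad_accum += grad**2                                  # sum of squared gradients
    step = alpha * grad / (np.sqrt(grad_accum) + epsilon)  # Adagrad update size
    step_sizes.append(float(step[0]))

expected = [alpha / np.sqrt(k) for k in range(1, 6)]
print(np.round(step_sizes, 5))
print(np.round(expected, 5))
```

The first step is close to `alpha` regardless of the gradient's magnitude, and every later step is smaller, because the accumulator never decreases.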


In [None]:
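# Adagrad optimizer with path tracking
def adagrad_path(w_init, grad_fn, alpha=0.01, epsilon=1e-6, max_iters=10000):
    # Note: epsilon is both the gradient-norm stopping tolerance
    # and the stabilizer in the update's denominator.
    w = np.array(w_init, dtype=np.float64)
    grad_accum = np.zeros_like(w)  # running sum of squared gradients
    path = [w.copy()]
    for _ in range(max_iters):
        grad = grad_fn(w)
        if np.linalg.norm(grad) < epsilon:  # converged
            break
        grad_accum += grad**2
        w -= alpha * grad / (np.sqrt(grad_accum) + epsilon)
        path.append(w.copy())
    # Report the number of updates actually performed
    return np.array(path), len(path) - 1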

# RMSprop Implementation

This cell contains the **RMSprop** optimization algorithm, which adjusts the learning rate based on the moving average of squared gradients.

RMSprop parameters include:
- `beta`, controlling the decay rate for the moving average.
- `grad_accum`, which maintains the running average of squared gradients.

RMSprop is well-suited for non-stationary objectives and addresses the issues of Adagrad’s rapidly decaying learning rate.
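
The difference from Adagrad shows up clearly with a constant gradient: Adagrad's sum keeps growing, while RMSprop's moving average settles near `grad**2`, so its step size stays near `alpha`. The following sketch compares the two accumulators after 200 steps; the gradient value and the step count are assumptions picked for illustration.

```python
import numpy as np

alpha, beta, epsilon = 0.001, 0.9, 1e-6  # the notebook's RMSprop defaults
grad = np.array([4.0])                   # illustrative constant gradient (assumption)
n_steps = 200                            # illustrative step count (assumption)

adagrad_accum = np.zeros_like(grad)
rmsprop_accum = np.zeros_like(grad)
for _ in range(n_steps):
    adagrad_accum += grad**2                                     # unbounded sum
    rmsprop_accum = beta * rmsprop_accum + (1 - beta) * grad**2  # moving average

adagrad_step = float((alpha * grad / (np.sqrt(adagrad_accum) + epsilon))[0])
rmsprop_step = float((alpha * grad / (np.sqrt(rmsprop_accum) + epsilon))[0])
print(adagrad_step)  # about alpha / sqrt(n_steps)
print(rmsprop_step)  # about alpha
```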


In [None]:
# RMSProp optimizer with path tracking
def rmsprop_path(w_init, grad_fn, alpha=0.001, beta=0.9, epsilon=1e-6, max_iters=10000):
    w = np.array(w_init, dtype=np.float64)
    grad_accum = np.zeros_like(w)  # moving average of squared gradients
    path = [w.copy()]
    for _ in range(max_iters):
        grad = grad_fn(w)
        if np.linalg.norm(grad) < epsilon:  # converged
            break
        grad_accum = beta * grad_accum + (1 - beta) * grad**2
        w -= alpha * grad / (np.sqrt(grad_accum) + epsilon)
        path.append(w.copy())
    # Report the number of updates actually performed
    return np.array(path), len(path) - 1

# Adam Implementation

This cell implements the **Adam** optimization algorithm, which combines elements of both Adagrad and RMSprop. Adam uses running averages of both gradients and squared gradients to adapt the learning rate.

Key parameters in Adam include:
- `beta1` and `beta2`, which control the decay rates for the moving averages of the gradient (`m`) and squared gradient (`v`).
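- `m_hat` and `v_hat`, bias-corrected versions of `m` and `v`, which compensate for both averages being initialized at zero.

Adam often converges in fewer iterations than Adagrad or RMSprop and is widely used in machine learning applications, although its behavior still depends on the learning rate and the problem.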
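
The effect of the bias correction is easiest to see on the very first iteration, where `m` and `v` are still close to zero. The sketch below works through that step by hand, using the Rosenbrock gradient at the notebook's starting point `(0, 1.5)`; the "uncorrected" variant is shown only for comparison and is not part of the algorithm.

```python
import numpy as np

alpha, beta1, beta2, epsilon = 0.01, 0.9, 0.999, 1e-6  # the notebook's Adam defaults
grad = np.array([-2.0, 300.0])  # Rosenbrock gradient at the start point (0, 1.5)

# First iteration (k = 1), starting from m = v = 0
m = (1 - beta1) * grad
v = (1 - beta2) * grad**2

# With bias correction: m_hat equals grad and v_hat equals grad**2
m_hat = m / (1 - beta1**1)
v_hat = v / (1 - beta2**1)
corrected_step = alpha * m_hat / (np.sqrt(v_hat) + epsilon)

# Without bias correction the two moments are shrunk by different factors
uncorrected_step = alpha * m / (np.sqrt(v) + epsilon)

print(corrected_step)    # roughly alpha * sign(grad)
print(uncorrected_step)  # roughly 3.16 times larger in magnitude
```

With the correction, the first step has magnitude close to `alpha` in each coordinate, independent of the gradient's scale.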


In [None]:
# Adam optimizer with path tracking
def adam_path(w_init, grad_fn, alpha=0.01, beta1=0.9, beta2=0.999, epsilon=1e-6, max_iters=10000):
    w = np.array(w_init, dtype=np.float64)
    m, v = np.zeros_like(w), np.zeros_like(w)  # first and second moment estimates
    path = [w.copy()]
    for k in range(1, max_iters + 1):  # k starts at 1 for the bias correction
        grad = grad_fn(w)
        if np.linalg.norm(grad) < epsilon:  # converged
            break
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad**2
        m_hat, v_hat = m / (1 - beta1**k), v / (1 - beta2**k)
        w -= alpha * m_hat / (np.sqrt(v_hat) + epsilon)
        path.append(w.copy())
    # Report the number of updates actually performed
    return np.array(path), len(path) - 1

# Plotting Optimization Paths

This cell is responsible for visualizing the optimization paths taken by the Adagrad, Adam, and RMSprop algorithms on the Rosenbrock function. The contour plot shows the landscape of the Rosenbrock function, while the paths demonstrate each optimizer’s convergence behavior.

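The plotting cell:
- Runs each optimizer from the starting point `w_init = [0.0, 1.5]` and records its path and iteration count.
- Plots the contour map of the Rosenbrock function on a logarithmic color scale.
- Overlays the paths of each optimization algorithm to allow for visual comparison.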


In [None]:
# Initial point
w_init = [0.0, 1.5]

# Generate paths and iteration counts
adagrad_path_points, adagrad_iters = adagrad_path(w_init, rosenbrock_grad, alpha=0.01, epsilon=1e-6)
rmsprop_path_points, rmsprop_iters = rmsprop_path(w_init, rosenbrock_grad, alpha=0.001, epsilon=1e-6)
adam_path_points, adam_iters = adam_path(w_init, rosenbrock_grad, alpha=0.01, epsilon=1e-6)

# Contour plot of the Rosenbrock function
x = np.linspace(-2, 2, 400)
y = np.linspace(-1, 3, 400)
X, Y = np.meshgrid(x, y)
Z = rosenbrock([X, Y])

plt.figure(figsize=(12, 8))

# Adjust contour levels and use a more visible color map
levels = np.logspace(0.1, 3.5, 30)  # Adjusted levels for better contour resolution
contour = plt.contour(X, Y, Z, levels=levels, norm=LogNorm(), cmap='plasma')  # Using 'plasma' for better contrast
plt.clabel(contour, inline=1, fontsize=10)

# Plot paths with distinct markers
plt.plot(adagrad_path_points[:, 0], adagrad_path_points[:, 1], 'o-', color="cyan", label="Adagrad", markersize=5)
plt.plot(rmsprop_path_points[:, 0], rmsprop_path_points[:, 1], 's-', color="magenta", label="RMSProp", markersize=5)
plt.plot(adam_path_points[:, 0], adam_path_points[:, 1], 'x-', color="red", label="Adam", markersize=5)

# Starting and optimal points
plt.plot(w_init[0], w_init[1], 'ko', markersize=8, label="Start")
plt.plot(1, 1, 'k*', markersize=15, label="Optimal (1, 1)")

# Color bar and labels
plt.colorbar(contour, label='Rosenbrock Function Value')
plt.title(f"Optimizer Paths on Rosenbrock Contour\n"
          f"Adagrad: {adagrad_iters} iters, RMSProp: {rmsprop_iters} iters, Adam: {adam_iters} iters")
plt.xlabel("w1")
plt.ylabel("w2")
plt.legend()
plt.grid(True)
plt.show()
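
A plot shows the shape of each path, but a few numbers per optimizer make the comparison easier to state. The sketch below assumes a path array of shape `(steps + 1, 2)`, like those returned by the optimizer functions above. `summarize_path` is a hypothetical helper that is not part of the original notebook, and the two-point path is toy data used only to show the call.

```python
import numpy as np

def rosenbrock(w):
    x, y = w
    return (1 - x)**2 + 100 * (y - x**2)**2

def summarize_path(path, optimum=(1.0, 1.0)):
    # Hypothetical helper: reduce a path to a few comparable numbers
    path = np.asarray(path, dtype=np.float64)
    final = path[-1]
    return {
        "steps": len(path) - 1,
        "final_value": float(rosenbrock(final)),
        "distance_to_optimum": float(np.linalg.norm(final - np.asarray(optimum))),
    }

# Toy path for illustration only: the start point followed by the optimum
toy_path = np.array([[0.0, 1.5], [1.0, 1.0]])
summary = summarize_path(toy_path)
print(summary)
```

In the notebook, the same call could be applied to `adagrad_path_points`, `rmsprop_path_points`, and `adam_path_points` to report how close each optimizer ends up to `(1, 1)`.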