                                                
# Gradient Descent notebook

## Introduction

This notebook examines the mechanism of gradient descent and its dependence on key parameters such as the number of iterations and the learning rate. The goal is to understand how gradient descent works and to gain hands-on experience with TensorFlow.

I begin with a simple case: a one-dimensional function. An interactive widget lets users adjust parameters and visually observe the gradient descent path. This provides an intuitive understanding of how the algorithm converges.

Next, I extend the exploration to two dimensions using TensorFlow. The gradient descent path is plotted on a three-dimensional surface representing the function. The widget shows how the optimization process works in more than one dimension.

Finally, I present another interactive widget to visualize the gradient descent path on the two-dimensional level curves of the function, offering a complementary perspective on the optimization process.

Through these exercises, the notebook aims to strengthen understanding of both the gradient descent algorithm and the fundamentals of TensorFlow. 

Part of the code is adapted from the code presented in Utah_Gradient (see References).
                                                                             

In [1]:
# Import libraries 
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
from ipywidgets import interact, IntSlider, FloatSlider

# Used for the two-dimensional gradient descent
import tensorflow as tf



## 1)  Gradient descent in one dimension 

Consider the simple quadratic function
$$
f(x) = (x - 2)^2\,.
$$
The gradient descent method is an iterative process that finds the point $x^*$ that minimizes the function, or a point close to the minimum.
To reach this point starting from a point $x^i$, one moves in the direction of decreasing function values. This direction is opposite to the sign of the derivative $f^\prime$. The movement is done iteratively in small steps scaled by $\epsilon$ and is described by the equation

$$
x^{i+1}  = x^i - \epsilon   \, f^\prime (x^i) \,,
$$
where $\epsilon$ is a positive scalar called the **learning rate**. The process continues until a stopping criterion is met.
In these notes, the algorithm stops when the maximum number of iterations is reached. This works because I am considering a convex quadratic function,
which has a single global minimum. For more complex functions with several local minima or other stationary points, other stopping criteria have to be considered.
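
As an illustration of one such alternative criterion, the sketch below stops once the derivative is close to zero instead of after a fixed number of iterations. It is not the code used in this notebook: the function name `gradient_descent_tol` and the parameters `tol` and `max_iter` are hypothetical choices, applied to the same quadratic $f(x) = (x-2)^2$ starting from $x^0 = 5$.

```python
def df(x):
    """Derivative of f(x) = (x - 2)**2."""
    return 2.0 * (x - 2.0)


def gradient_descent_tol(x0, learning_rate, tol=1e-6, max_iter=1000):
    """Iterate x <- x - learning_rate * f'(x) until |f'(x)| < tol.

    Returns the last point and the number of update steps taken.
    `tol` and `max_iter` are illustrative values, not taken from the notebook.
    """
    x = x0
    for step in range(max_iter):
        g = df(x)
        if abs(g) < tol:      # stopping criterion: derivative is (almost) zero
            return x, step
        x = x - learning_rate * g
    return x, max_iter        # iteration budget used up before reaching tol


x_min, n_steps = gradient_descent_tol(x0=5.0, learning_rate=0.2)
print(f"x = {x_min:.6f} after {n_steps} steps")
```

If the tolerance is never met, the function returns `max_iter` as the step count, which can serve as a simple signal that the chosen learning rate did not converge.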

There are several proposals for choosing the step $\epsilon$. Here, $\epsilon$ is kept constant throughout all iterations.

The following widget lets one vary the number of iterations (up to $n = 15$) and the learning rate (from 0.01 to 1).
For each value of the learning rate, one can step through the iterations one at a time.




In [2]:
# ==========================================
# Example quadratic function
# ==========================================
# define the function
def f(x):
    return (x-2)**2

# define the derivative
def gradient(x):
    return  2*(x-2)

# define the algorithm
def gradient_descent(learning_rate, n_iterations):
    x = 5.0  # initial point
    positions = [x]
    for _ in range(n_iterations):
        x = x - learning_rate * gradient(x)
        # stop if x explodes
        if abs(x) > 1e6:
            break
        positions.append(x)
    return positions

# define the plot and variables x and y for the plot
def plot_gd(n_iterations, learning_rate):
    xdata = np.linspace(-1, 6, 200) 
    ydata = f(xdata)

    # Run Gradient Descent
    x_val = gradient_descent(learning_rate, n_iterations)
    y_val = [f(x) for x in x_val]

    # Plot the function
    plt.figure(figsize=(7,5))
    plt.plot(xdata, ydata, label="f(x) = (x-2)²")
    plt.scatter(x_val, y_val, c="red")
    plt.plot(x_val, y_val, c="red", linestyle="--", label="Gradient descent path")
    plt.xlabel("x")
    plt.ylabel("f(x)")
    # Show the slider values actually used for this plot
    plt.title(f"Gradient descent: {n_iterations} iterations, learning rate = {learning_rate:.2f}")
    plt.legend()
    plt.grid(True)
    plt.show()

# Interactive sliders
interact(plot_gd, 
         n_iterations=IntSlider(value=5, min=1, max=15, step=1, description="Iterations"),
         learning_rate=FloatSlider(value=0.5, min=0.01, max=1.0, step=0.01, description="Learning Rate"));



* From the widget above, one can see that for $\epsilon < 0.12$ the minimum of the function is not reached within the 15 iterations.
* For a learning rate equal to $0.5$, the minimum is reached in just one step (one iteration).
* When $\epsilon = 0.2$, the minimum is approached smoothly over the 15 iterations. It appears to be a good value for the learning rate.
* On the other hand, for learning rates above $0.5$ each step overshoots the minimum, and above about $0.57$ the oscillation around the minimum becomes clearly visible.
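
These observations can be traced back to the update rule. Substituting $f^\prime(x) = 2(x-2)$ gives $x^{i+1} - 2 = (1 - 2\epsilon)\,(x^i - 2)$, so every step multiplies the distance to the minimum by the factor $1 - 2\epsilon$. The short check below is a sketch (the helper name `distance_after` is hypothetical) that evaluates this factor for a few learning rates, starting from $x^0 = 5$.

```python
def distance_after(n_iterations, learning_rate, x0=5.0, x_star=2.0):
    """Signed distance x^n - x* for f(x) = (x - 2)**2, from the closed form
    x^n - x* = (1 - 2 * learning_rate)**n * (x0 - x*)."""
    return (1.0 - 2.0 * learning_rate) ** n_iterations * (x0 - x_star)


for eps in (0.1, 0.2, 0.5, 0.6, 1.0):
    factor = 1.0 - 2.0 * eps
    print(f"eps = {eps:4.2f}  factor = {factor:+.2f}  "
          f"distance after 15 iterations = {distance_after(15, eps):+.2e}")
```

A factor between 0 and 1 gives a monotone approach, a factor of 0 ($\epsilon = 0.5$) lands on the minimum in one step, and a negative factor makes the iterates alternate between the two sides of the minimum. At $\epsilon = 1$ the factor is $-1$, so the iterates jump between $5$ and $-1$ without converging.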

##  2) Gradient Descent of a two-dimensional function

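For a function of two variables $f(x,y)$, the update rule becomes

$$
\vec{v}^{i+1} = \vec{v}^i - \epsilon \, \vec{\nabla} f(\vec{v}^i)
$$
where $\vec{v} = (x,y)$ and $\vec{\nabla}f$ is the gradient of $f$. As in the one-dimensional case, the minus sign moves each step against the gradient, toward decreasing values of $f$.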
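
The gradient does not have to be derived by hand: TensorFlow can compute it by automatic differentiation with `tf.GradientTape`. The sketch below is not part of the notebook's own code. It assumes the same two-variable function and starting point $(5, 4)$ used in the cells of this section, introduces a hypothetical helper `f_scalar` that returns only the value $z$, and compares the automatic gradient with the hand-derived one. The learning rate of `0.1` in the last step is an arbitrary illustrative value.

```python
import tensorflow as tf


def f_scalar(vec):
    """z = f(x, y) for the two-variable function used in this section."""
    x, y = vec[0], vec[1]
    return (0.75 * x - 1.5) ** 2 + (y - 2.0) ** 2 + 0.25 * x * y


v = tf.Variable([5.0, 4.0], dtype=tf.float32)

# Record the forward computation so TensorFlow can differentiate it
with tf.GradientTape() as tape:
    z = f_scalar(v)
auto_grad = tape.gradient(z, v)

# Hand-derived partial derivatives evaluated at the same point
x, y = v[0], v[1]
manual_grad = tf.stack([1.125 * x - 2.25 + 0.25 * y,
                        2.0 * y - 4.0 + 0.25 * x])

print("automatic:", auto_grad.numpy())
print("manual:   ", manual_grad.numpy())

# One gradient descent step: v <- v - 0.1 * grad (illustrative learning rate)
v.assign_sub(0.1 * auto_grad)
print("after one step:", v.numpy())
```

Both gradients should agree up to floating-point rounding, which is a quick way to validate the analytical expressions before using them in the descent loop.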

In [4]:

# ====================================================
# Gradient Descent with TensorFlow
# ====================================================
   
#  Ordered triples (x, y, z) for plotting
def function(vec):
    """ 
    Argument: 
    vec         -- vector v = (x,y) 
    
    Return:
    space_point --  a 3D point represented as the vector (x,y,z)
    """
    x, y = vec[0], vec[1]
    z =  (0.75 * x - 1.5) ** 2 + (y - 2.0) ** 2 + 0.25 * x * y
    space_point = tf.stack([x, y, z], axis=0)
    return space_point

# Gradient vector
def gradient(vec):
    """ 
    Argument: 
    vec         -- vector v = (x,y) 
    
    Return:
    grad_fun --  2D gradient vector (df/dx, df/dy)
    """
    x, y = vec[0], vec[1]
    df_dx = 1.125 * x - 2.25 + 0.25 * y
    df_dy = 2.0 * y - 4.0 + 0.25 * x
    grad_fun = tf.stack([df_dx, df_dy], axis=0)
    return grad_fun

# Gradient Descent function
def gradient_descent(learning_rate, num_iter):
    """ 
    Arguments: 
    learning_rate -- step size (epsilon) used in every update
    num_iter      -- number of iterations

    Return:
    None -- plots the descent path on the surface and prints the final
            point and the gradient norm
    """
    v_init = tf.constant([5.0, 4.0], dtype=tf.float32)
    v = v_init

    update_v = [v_init]
    grad_norms = []
    
    
    for i in range(1, num_iter + 1):
        grad = gradient(v)
        v = v - learning_rate *  grad
        update_v.append(v)
        grad_norms.append(tf.norm(grad).numpy())
       

    # Evaluate the function at every iterate to get the 3D points (x, y, z),
    # converting the tensors to NumPy arrays for plotting
    points_np = [function(p).numpy() for p in update_v]

    # Split into x, y, z lists
    x_vals = [p[0] for p in points_np]
    y_vals = [p[1] for p in points_np]
    z_vals = [p[2] for p in points_np]
    

    # Create a grid for contour plotting and plotting the function
    xdata = tf.linspace(-1.0, 6.0, 50)
    ydata = tf.linspace(-1.0, 6.0, 50)
    X, Y = tf.meshgrid(xdata, ydata)
    # Evaluate the function of two variables on the grid
    Z =  (0.75*X -1.5)**2 + (Y-2.0)**2 + 0.25*X*Y
    # Plot the surface
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_surface(X, Y, Z, edgecolor='none', alpha=0.6 )
    
    # Define the levels for the contour plot
    # tf.linspace expects floating-point endpoints; convert to NumPy for matplotlib
    lev = tf.linspace(0.0, 20.0, 21).numpy()
    # Plot the 3D contour lines
    ax.contour3D(X, Y, Z, levels=lev, cmap='gray')

    #  Plot gradient descent path
    ax.scatter(x_vals, y_vals, z_vals, color="blue", s=30, label="Points")
    # Optionally connect them with a line
    ax.plot(x_vals, y_vals, z_vals, color="red", linestyle="--")

    # Set plot labels
    ax.set_xlabel('x axis')
    ax.set_ylabel('y axis')
    ax.set_zlabel('z axis')
    # A single title (a second plt.title call would overwrite ax.set_title)
    ax.set_title('Gradient descent path on the surface z = f(x,y)')
    plt.show()


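    final_point = update_v[-1]
    # Gradient norm evaluated at the final point (grad_norms holds the norms
    # measured before each update)
    final_grad = tf.norm(gradient(final_point)).numpy()
    print("Final point:", final_point.numpy())
    print("Final gradient norm:", final_grad)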

# Create interactive widget
interact(gradient_descent,
         learning_rate=FloatSlider(value=0.9, min=0.01, max=1.0, step=0.01),
         num_iter=IntSlider(value=10, min=1, max=15));



## 3) Gradient Descent on 2D level curves
The gradient descent path can also be visualized on the 2D plot of the level curves.

In [7]:
# ====================================================
# Visualization of the Gradient Descent Path on the plane
# ====================================================

# Define the function and gradient
def function(vec):
    x, y = vec[0], vec[1]
    z = (0.75 * x - 1.5) ** 2 + (y - 2.0) ** 2 + 0.25 * x * y
    return tf.stack([x, y, z], axis=0)

def gradient(vec):
    x, y = vec[0], vec[1]
    df_dx = 1.125 * x - 2.25 + 0.25 * y
    df_dy = 2.0 * y - 4.0 + 0.25 * x
    return tf.stack([df_dx, df_dy], axis=0)

# Gradient Descent function
def gradient_descent(learning_rate, num_iter):
    v_init = tf.constant([5.0, 4.0], dtype=tf.float32)
    v = v_init

    update_vals = [v_init]
    grad_norms = []

    for i in range(1, num_iter + 1):
        grad = gradient(v)
        v = v - learning_rate * grad
        update_vals.append(v)
        grad_norms.append(tf.norm(grad).numpy())

    # Create grid for contour plot
    xdata = tf.linspace(-1.0, 6.0, 50)
    ydata = tf.linspace(-1.0, 6.0, 50)
    X, Y = tf.meshgrid(xdata, ydata)
    Z = (0.75*X - 1.5)**2 + (Y - 2.0)**2 + 0.25*X*Y

    plt.figure(figsize=(8, 6))
    plt.contour(X, Y, Z, levels=20)

    # Plot gradient descent path (converted to NumPy for matplotlib)
    vals = tf.stack(update_vals).numpy()
    plt.scatter(vals[:, 0], vals[:, 1], color='blue')
    # Connect the points with a dashed line
    plt.plot(vals[:, 0], vals[:, 1], color="red", linestyle="--")
    plt.title('Gradient Descent Path on the 2D level curves')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

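    final_point = update_vals[-1]
    print("Final point:", final_point.numpy())
    # Gradient norm evaluated at the final point
    print("Final gradient norm:", tf.norm(gradient(final_point)).numpy())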

# Create interactive widget
interact(gradient_descent,
         learning_rate=FloatSlider(value=0.9, min=0.01, max=1.0, step=0.01),
         num_iter=IntSlider(value=10, min=1, max=50));



## References:

* [Utah_Gradient](https://users.cs.utah.edu/~jeffp/IDABook/T6-GD.pdf)
* [Medium_Tensor_Flow](https://igormintz.medium.com/basic-tensor-creation-and-manipulation-with-tensorflow-ee4c910a00e2)