# Alternatives to gradient descent


Author: Mathurin Massias, Inria

In [None]:
import numpy as np 
import matplotlib.pyplot as plt 
from ipywidgets import FloatSlider, interact
%matplotlib inline

from matplotlib import rcParams 
rcParams["font.size"] = 16

## Newton's method


Newton's method iterations to minimize a function $f$ read $x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$. This is somewhat similar to gradient descent, but the gradient direction is modified by applying the inverse Hessian.
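As an illustration of the update above, here is a minimal sketch of one Newton step in several dimensions. The helper name `newton_step` and the quadratic test problem (matrix `A`, vector `b`) are assumptions made for this example, not part of the original notebook. The step solves a linear system instead of forming the inverse Hessian explicitly, which is a common choice but not the only one.

```python
import numpy as np


def newton_step(x, grad, hess):
    """One Newton step: solve hess(x) d = grad(x) instead of inverting the Hessian."""
    return x - np.linalg.solve(hess(x), grad(x))


# Hypothetical test problem: f(x) = 0.5 x^T A x - b^T x, with A symmetric positive definite.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])


def grad(x):
    return A @ x - b  # gradient of the quadratic


def hess(x):
    return A  # the Hessian of a quadratic is constant


x0 = np.array([10.0, -7.0])
x1 = newton_step(x0, grad, hess)

# Compare the Newton iterate with the minimizer, which solves A x = b.
print(x1, np.linalg.solve(A, b))
```

On a strictly convex quadratic, the Hessian is constant and a single Newton step should land on the minimizer, whatever the starting point.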

Let's see its behavior on 
$f(x) = \sqrt{1 + x^2}$. One has: 

$$f'(x) = \frac{x}{\sqrt{1 + x^2}}$$ 
$$f''(x) = \frac{\sqrt{1 + x^2} - x \cdot x / \sqrt{1 + x^2}}{1 + x^2} = \frac{1}{(1 + x^2)^{3/2}}$$

So the Newton iteration reads:
$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)} = x_k - (1 + x_k^2) x_k = - x_k^3$$


Since $|x_k| = |x_0|^{3^k}$, the behavior depends on the starting point: very fast convergence if $|x_0| < 1$, oscillations between $1$ and $-1$ if $|x_0| = 1$, and very fast divergence if $|x_0| > 1$!
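The three regimes can be checked numerically with a short sketch. The function name `newton_iterates` and the starting points 0.5, 1.0 and 1.5 are choices made for this example. The update is computed from $f'$ and $f''$ rather than from the closed form $-x_k^3$, so it also acts as a sanity check of the derivation.

```python
import numpy as np


def newton_iterates(x0, n_iter=5):
    """Run Newton's method on f(x) = sqrt(1 + x^2) and return all iterates."""
    xs = [x0]
    x = x0
    for _ in range(n_iter):
        grad = x / np.sqrt(1 + x ** 2)   # f'(x)
        hess = 1 / (1 + x ** 2) ** 1.5   # f''(x)
        x = x - grad / hess              # Newton update
        xs.append(x)
    return np.array(xs)


# One starting point inside, on, and outside the unit interval.
for x0 in (0.5, 1.0, 1.5):
    print(f"x0 = {x0}: {newton_iterates(x0)}")
```

In floating point, the iterates started from $|x_0| < 1$ reach machine-precision noise after a few steps instead of following $-x_k^3$ exactly, because the update subtracts two nearly equal numbers.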

In [None]:
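def plot_newton(x0, n_iter=5):
    """Plot Newton iterates on f(x) = sqrt(1 + x^2), starting from x0."""
    x = x0
    all_x = np.zeros(n_iter)
    for i in range(n_iter):
        all_x[i] = x
        x = - x ** 3  # closed-form Newton update for this f

    a = np.linspace(-10, 10, num=100)
    fig, axarr = plt.subplots(1, 2, figsize=(10, 4), constrained_layout=True)
    # Left: the function and the iterates, colored from first (dark) to last (light)
    axarr[0].plot(a, np.sqrt(1 + a**2))
    axarr[0].scatter(all_x, np.sqrt(1 + all_x ** 2), c=plt.get_cmap("viridis")(np.linspace(0, 1, n_iter)))
    axarr[0].set_xlabel("x")
    axarr[0].set_ylabel("f(x)")
    # Right: suboptimality on a log scale; the minimum is f(0) = 1
    axarr[1].semilogy(np.sqrt(1 + all_x ** 2) - 1)
    axarr[1].set_xlabel("iteration")
    axarr[1].set_ylabel("suboptimality $f(x_k) - f(x^*)$")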
    

In [None]:
interact(plot_newton, x0=FloatSlider(min=0.95, max=1.05, step=0.01))