In [1]:
import numpy as np

# Exercise 9.3
## Gradient Descent
The basic idea of the gradient descent method is to move in the direction of steepest decrease of the function, which is the negative of its gradient. The process iterates until convergence, with each iteration taking a step of some size in that direction. A drawback is that computing the optimal step size at each iteration is often not worth the cost, and the method can take many iterations to converge. It converges quickly for some problems but can be very slow for others (such as a function with narrow canyons or troughs). It is nevertheless a good method to start with when the initial point is far from the optimum.
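
To illustrate the sensitivity to narrow troughs, here is a minimal sketch of fixed-step gradient descent. The function `gradient_descent`, the two diagonal test matrices, and the step rule are assumptions chosen for illustration; they are not part of the exercise.

```python
import numpy as np

def gradient_descent(grad, x0, step, tol=1e-6, maxiters=10000):
    """Fixed-step gradient descent; returns (x, converged, iterations)."""
    x = np.asarray(x0, dtype=float)
    for k in range(maxiters):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, True, k
        x = x - step * g  # step along the negative gradient
    return x, False, maxiters

# Two hypothetical quadratics f(x) = 0.5 * x @ A @ x, both minimized at the origin
bowls = {"round bowl": np.diag([1.0, 2.0]), "narrow canyon": np.diag([1.0, 100.0])}
start = np.array([1.0, 1.0])
iteration_counts = {}
for name, A in bowls.items():
    # For these diagonal matrices the largest entry is the largest eigenvalue;
    # a step of 1 / (largest eigenvalue) keeps the iteration stable.
    step = 1.0 / A.max()
    x, converged, iters = gradient_descent(lambda x, A=A: A @ x, start, step)
    iteration_counts[name] = iters
    print(name, converged, iters)
```

Both runs should converge, but the narrow canyon needs far more iterations because the step size is limited by the steep direction while progress along the flat direction is slow.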

## Newton and Quasi-Newton Methods
Newton's method is a descent method based on a local quadratic approximation of the function. Each iteration steps to the minimizer of that approximation, $\mathbf{x_{k+1}} = \mathbf{x_k} - (D^2f(\mathbf{x_k}))^{-1}Df(\mathbf{x_k})$, that is, along the negative gradient rescaled by the inverse of the Hessian. If the function is quadratic, Newton's method converges very rapidly. However, it is less appropriate if the starting point is far from the optimum, if the Hessian is not positive definite, or if the Hessian and its inverse are expensive to compute. To circumvent the computational cost, quasi-Newton methods like BFGS approximate the Hessian, at the price of a slower convergence rate.
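
The Newton iteration can be sketched as follows. The test function $f(x, y) = e^{x+y} + x^2 + y^2$, its hand-coded gradient and Hessian, and the helper name `newton` are assumptions for illustration only. The Hessian system is solved with `np.linalg.solve` rather than forming the inverse explicitly.

```python
import numpy as np

def newton(grad, hess, x0, tol=1e-10, maxiters=50):
    """Plain Newton iteration; returns (x, converged, iterations)."""
    x = np.asarray(x0, dtype=float)
    for k in range(maxiters):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, True, k
        # Solve D^2 f(x) s = Df(x) instead of inverting the Hessian
        x = x - np.linalg.solve(hess(x), g)
    return x, False, maxiters

# Hypothetical strictly convex test function f(x, y) = exp(x + y) + x^2 + y^2
def grad(x):
    e = np.exp(x[0] + x[1])
    return np.array([e + 2 * x[0], e + 2 * x[1]])

def hess(x):
    e = np.exp(x[0] + x[1])
    return np.array([[e + 2, e], [e, e + 2]])

x_min, converged, iters = newton(grad, hess, [1.0, 1.0])
print(x_min, converged, iters)
```

This sketch has no safeguard such as a line search, so it is only reliable when the Hessian is positive definite along the way, as it is for this function.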

## Conjugate Gradient
This method sits between gradient descent and Newton's method: it uses more information than the current gradient alone, yet it never computes or stores a Hessian or an approximation of one. The method moves along Q-conjugate directions. Conjugate gradient is best used for large quadratic optimization problems where the matrix $Q$ of the quadratic term is symmetric, positive definite, and sparse.
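
A minimal sketch of the conjugate gradient iteration for $f(\mathbf{x})=\frac{1}{2}\mathbf{x}^TQ\mathbf{x}-\mathbf{b}^T\mathbf{x}$ is shown below, assuming $Q$ is symmetric positive definite. The function name `conjugate_gradient` and its defaults are illustrative choices; the test matrix is the small tridiagonal system used in Exercise 9.6.

```python
import numpy as np

def conjugate_gradient(Q, b, x0, tol=1e-10, maxiters=None):
    """Conjugate gradient for Qx = b with Q symmetric positive definite."""
    x = np.asarray(x0, dtype=float)
    if maxiters is None:
        maxiters = len(b)  # in exact arithmetic at most n steps are needed
    r = b - Q @ x          # residual, equal to the negative gradient
    d = r.copy()           # first search direction
    rr = r @ r
    for k in range(maxiters):
        if np.sqrt(rr) < tol:
            return x, k
        Qd = Q @ d
        alpha = rr / (d @ Qd)          # exact step length along d
        x = x + alpha * d
        r = r - alpha * Qd
        rr_new = r @ r
        d = r + (rr_new / rr) * d      # next direction, Q-conjugate to the previous ones
        rr = rr_new
    return x, maxiters

Q = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
b = np.array([2.0, 4.0, 9.0])
x_cg, steps = conjugate_gradient(Q, b, np.ones(3))
print(x_cg, steps)
```

Only matrix-vector products with $Q$ are required, which is why sparsity helps for large problems.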

# Exercise 9.6

In [40]:
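def steep_descent_quad(Q,b,x0,tol=1e-6,maxiters=200):
    # Minimize 0.5*x@Q@x - b@x by steepest descent with exact line search
    converge = False
    for k in range(maxiters):
        Df = Q@x0-b  # gradient at the current iterate
        if np.linalg.norm(Df)<tol:
            converge = True
            break
        alpha = Df.T@Df/((Df.T@Q)@Df)  # exact minimizer along -Df
        x0 = x0-alpha*Df
    # Return the current iterate, so a start that already satisfies tol works too
    return x0,converge,k+1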

In [41]:
Q = np.array([[2,-1,0],[-1,2,-1],[0,-1,2]])
b = np.array([2,4,9])
x0 = np.array([1,1,1])
result, convergence, iterations= steep_descent_quad(Q,b,x0)
result = np.round(result,decimals=3)
print(result,convergence,iterations)

[ 5.75  9.5   9.25] True 47


# Exercise 9.7

In [48]:
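def df_forward(f,x,rerr):
    # Forward-difference approximation of Df(x) for f: R^n -> R^m
    n = x.shape[0]
    fx = np.atleast_1d(f(x))  # evaluate f(x) once; a scalar output becomes shape (1,)
    m = fx.shape[0]
    h = 2*np.sqrt(rerr)  # step size chosen from the relative error rerr in f
    Df = np.zeros((m,n))
    for i in range(n):
        ei = np.zeros(n)
        ei[i] = 1
        Df[:,i] = (f(x+h*ei)-fx)*(1/h)
    return Df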

In [51]:
f = lambda x: 0.5*x@(Q@x)-b@x+3 
Df_approx = df_forward(f,x0,1e-6)
print('Approximation:', Df_approx, 'Actual:', Q@x0-b)

Approximation: [[-0.998 -3.998 -7.998]] Actual: [-1 -4 -8]


# Exercise 9.10

For $f(\mathbf{x})=\frac{1}{2}\mathbf{x}^TQ\mathbf{x}-\mathbf{b}^T\mathbf{x}$ with $Q$ symmetric and positive definite, the derivative of $f$ is $Df = Q\mathbf{x}-\mathbf{b}$. The minimum is found where $Q\mathbf{x^*}-\mathbf{b} = 0 \Longrightarrow Q\mathbf{x^*} = \mathbf{b}$. The Hessian is $D^2f = Q$. From Newton's method and an initial starting point $\mathbf{x_0}$, we have that $\mathbf{x_1} = \mathbf{x_0}-Q^{-1}Df(\mathbf{x_0}) \Longrightarrow Q\mathbf{x_1} = Q\mathbf{x_0}-Df(\mathbf{x_0})$. Since $Df(\mathbf{x}) = Q\mathbf{x}-\mathbf{b}$ for all $\mathbf{x}$, this gives $Q\mathbf{x_1} = Q\mathbf{x_0}-(Q\mathbf{x_0}-\mathbf{b}) = \mathbf{b} = Q\mathbf{x^*}$. Because $Q$ is invertible, $\mathbf{x_1}=\mathbf{x^*}$, so Newton's method reaches the optimum for quadratic problems in a single iteration.
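
As a quick numerical sanity check of this result, the sketch below takes a single Newton step from a few random starting points and compares it with the solution of $Q\mathbf{x}=\mathbf{b}$. The matrix and vector are those of Exercise 9.6; the random starting points and the seed are arbitrary choices for illustration.

```python
import numpy as np

Q = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
b = np.array([2.0, 4.0, 9.0])
x_star = np.linalg.solve(Q, b)  # minimizer of 0.5*x@Q@x - b@x

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
results = []
for _ in range(3):
    x0 = rng.normal(size=3)                   # arbitrary starting point
    x1 = x0 - np.linalg.solve(Q, Q @ x0 - b)  # one Newton step
    results.append(np.allclose(x1, x_star))
print(results)
```

Up to floating-point rounding, every entry of `results` should be `True`, whatever the starting point.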