# HW2 Problem 2: BFGS

We want to optimize the function $f: \mathbb{R}^3 \rightarrow \mathbb{R}$

$$f(x_1,x_2,x_3) = x_3\log(e^{\frac{x_1}{x_3}}+e^{\frac{x_2}{x_3}})+(x_3 - 2)^2+e^{\frac{1}{x_1+x_2}}$$
$$\textbf{dom } f=\{\mathbf{x}\in\mathbb{R}^3 : x_1+x_2>0,\,x_3>0\}$$

using the BFGS algorithm.

- Given starting point $x_0 = [2,3,5]^T$, convergence tolerance $\epsilon > 0$, starting matrix $H_0 = \mathbf{I}$, $k\leftarrow 0$
- While $||\nabla f_k||>\epsilon$:
    1. Get direction by solving $$p_k = -H_k^{-1} \nabla f_k$$
    2. Update $x_{k+1} = x_k + \alpha_k p_k$ where $\alpha_k$ is obtained from backtracking
    3. Define $s_k = x_{k+1} - x_k$ and $y_k = \nabla f_{k+1} - \nabla f_k$
    4. Compute $$H_{k+1} = (I - \rho_k s_k y_k^T)H_k(I - \rho_k s_k y_k^T) + \rho_k s_k s_k^T$$
- We can take $\epsilon = 10^{-4}$ and a constant $\rho_k = 10^4$

$$\nabla f(x_1,x_2,x_3)=
\begin{pmatrix}
\sigma \left(\dfrac{x_1 - x_2}{x_3}\right)
-\dfrac{e^{\frac{1}{x_1+x_2}}}{(x_1+x_2)^2}
\\[1.2em]
\sigma \left(-\dfrac{x_1 - x_2}{x_3}\right)
-\dfrac{e^{\frac{1}{x_1+x_2}}}{(x_1+x_2)^2}
\\[1.2em]
\log\!\big(e^{x_1/x_3}+e^{x_2/x_3}\big)
-\dfrac{x_2 + (x_1 - x_2)\sigma \left(\dfrac{x_1 - x_2}{x_3}\right)}
{x_3}
+2(x_3-2)
\end{pmatrix},$$

with the Sigmoid function $$\sigma(z) = \frac{1}{1+\exp(-z)}.$$

The finite difference approximation for the gradient at a point $x$ is: $$\nabla f(x) \approx \frac{f(x+h\mathbf{i})-f(x)}{h}$$

And the second derivative matrix is approximated as: $$\nabla^2 f(x) \approx \frac{\nabla f(x+h\mathbf{i}) - \nabla f(x)}{h}$$

In [1]:
# Import packages
import numpy as np
from numpy import log, exp
from scipy.special import expit

In [None]:
# Define the function
def f(x:np.ndarray) -> float:
    x1, x2, x3 = x 
    val = x3 * np.log(np.exp(x1 / x3) + np.exp(x2 / x3)) + (x3 - 2) ** 2 + np.exp(1 / (x1 + x2))
    return val

# Define the gradient of the function
def grad_f(x:np.ndarray) -> np.ndarray:
    
    x1, x2, x3 = x
    delta = (x1 - x2) / x3
    E = exp(1 / (x1 + x2))
    q = (x1 + x2)**2

    grad = np.zeros(3)
    grad[0] = expit(delta) - E / q
    grad[1] = expit(-delta) - E / q
    grad[2] = log(exp(x1 / x3) + exp(x2 / x3)) - (2 - x3) * 2 - (x2 + (x1 - x2)*expit(delta)) / x3

    return grad

# Define the approximate Hessian of the function
def hess_f(x:np.ndarray) -> np.ndarray:
    x = x.flatten()
    h = 1e-2  # Step size
    id = np.eye(len(x))  # Identity matrix

    # Construct the perturbation matrix with h values along the diagonal
    h_matrix = h * id

    # Calculate the forward differences for all components simultaneously
    perturbed_values = np.array([grad_f(x + h_vec) for h_vec in h_matrix])

    # Calculate the second derivative approximation
    approx_hessian = (perturbed_values - grad_f(x)) / h
    
    reshaped_approx_hessian = np.reshape(approx_hessian, (3, 3))
    
    return reshaped_approx_hessian 

In [4]:
# Parameters
alp = 0.4
beta = 0.5
eps = 10**(-5)
x_start = np.array([2,3,5])

# Domain check function
def in_domain(x: np.ndarray) -> bool:
    return (x[2] > 0) and ((x[0] + x[1]) > 0)

# BFGS Direction function
def bfgs_direction(x: np.ndarray, H: np.ndarray) -> np.ndarray:
    grad = grad_f(x)
    p = -np.linalg.solve(H, grad)
    return p

# Backtracking Line Search function
def backtracking_line_search(x: np.ndarray, p: np.ndarray, alpha: float, beta: float) -> float:
    t = 1.0
    while not in_domain(x + t * p) or f(x + t * p) > f(x) + alpha * t * np.dot(grad_f(x), p):
        t *= beta
    return t

# BFGS Algorithm
def bfgs_algorithm(x_start: np.ndarray, alp: float, beta: float, eps: float) -> np.ndarray:
    x = x_start
    H = np.eye(len(x_start))  # Initial Hessian approximation
    iteration = 0

    while np.linalg.norm(grad_f(x)) > eps:
        p = bfgs_direction(x, H)
        t = backtracking_line_search(x, p, alp, beta)
        s = t * p
        x_new = x + s
        y = grad_f(x_new) - grad_f(x)

        if np.dot(y, s) > 1e-10:  # To ensure numerical stability
            rho = 1.0 / np.dot(y, s)
            H = (np.eye(len(x_start)) - rho * np.outer(s, y)) @ H @ (np.eye(len(x_start)) - rho * np.outer(y, s)) + rho * np.outer(s, s)

        x = x_new
        iteration += 1

    return x, iteration

# Run the BFGS algorithm
optimal_x, iteration = bfgs_algorithm(x_start, alp, beta, eps)
print(f"Optimal solution: x = {optimal_x}, f(x) = {f(optimal_x)}")
print(f"Number of iterations: {iteration}")

Optimal solution: x = [0.92618691 0.92623001 1.65342475], f(x) = 3.908113786405081
Number of iterations: 142
