## Optimisation using Quasi-Newton Method:
### The Broyden-Fletcher-Goldfarb-Shanno (BFGS) update

In [117]:
# Imports
import matplotlib.pyplot as plt
import numpy as np
from numpy import log
import shutil
import sys
import os.path

Given function:
\begin{align*}
    f(x_1, x_2, x_3) = x_{3} \log \Big( e^{\frac{x_{1}} {x_{3}}}+ e^{\frac{x_{2}} {x_{3}}} \Big) + (x_{3}-2)^2 + e^{\frac{1}{x_{1} + x_{2}}}
\end{align*}

$ \textbf{dom} \; f: \{ \mathbf{x} \in \mathbb{R}^3 : x_1 +x _2 >0, x_3 > 0 \}  $

In [118]:
# Defining our function
def my_f(x):    
    val = x[2] * log(np.exp(x[0] / x[2]) + np.exp(x[1] / x[2])) + (x[2] - 2)**2 + np.exp(1/(x[0] + x[1]))
    return val

Defining the first derivative: 

$\nabla f = [ \partial f/\partial x_1 \; \partial f/\partial x_2 \; \partial f/\partial x_3]^T   $

$$ \implies \nabla f = \begin{Bmatrix}
\frac{e^{\frac{x_{1}} {x_{3}}}}{e^{\frac{x_{1}} {x_{3}}}+ e^{\frac{x_{2}} {x_{3}}}} - \frac{e^{ \frac{1}{x_1 + x_2}}}{(x_1 +x_2)^2}  \\ \\
\frac{e^{\frac{x_{2}} {x_{3}}}}{e^{\frac{x_{1}} {x_{3}}}+ e^{\frac{x_{2}} {x_{3}}}} - \frac{e^{ \frac{1}{x_1 + x_2}}}{(x_1 +x_2)^2} \\ \\
 log(e^{\frac{x_{1}} {x_{3}}}+ e^{\frac{x_{2}} {x_{3}}}) - \frac{x_1 e^{\frac{x_{1}} {x_{3}}} + x_2 e^{\frac{x_{2}} {x_{3}}}}{x_3 ( e^{\frac{x_{1}} {x_{3}}}+ e^{\frac{x_{2}} {x_{3}}}) } + 2(x_3-2)
\end{Bmatrix}$$

In [119]:
# Defining the first derivative of the function
def nabla_f(x):
    x1, x2, x3 = x[0], x[1], x[2]
    f = np.array([
        [np.exp(x1 / x3) / (np.exp(x1 / x3) + np.exp(x2 / x3)) - (1/((x1+x2)**2))*np.exp(1/(x1 + x2))],
        [np.exp(x2 / x3) / (np.exp(x1 / x3) + np.exp(x2 / x3)) - (1/((x1+x2)**2))*np.exp(1/(x1 + x2))],
        [np.log(np.exp(x1 / x3) + np.exp(x2 / x3)) - (x1 * np.exp(x1 / x3) + x2 * np.exp(x2 / x3)) /
         (x3 * (np.exp(x1 / x3) + np.exp(x2 / x3))) + 2 * (x[2] - 2)]
    ])
    return f

Defining the Second Derivative:

The gradient vector, denoted as $\nabla f$, is the vector of partial derivatives:
$
\nabla f(x) = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right)
$

The finite difference approximation for the gradient at a point $x$ is given by:
$
\nabla_f(x) \approx \frac{f(x + h\mathbf{i}) - f(x)}{h}
$
where $\mathbf{i}$ is a unit vector along one of the coordinate axes.

The second derivative matrix is then approximated as:
$
\nabla^2 f(x) \approx \frac{1}{h} \left(\nabla_f(x + h\mathbf{i}) - \nabla_f(x)\right)
$

The reshaped second derivative matrix is a $3 \times 3$ matrix obtained from the flattened vector.

In [120]:
# Defing second derivative of the function
def nabla2_f(x):
    x = x.flatten()
    h = 1e-2  # Step size
    identity_matrix = np.eye(len(x))  # Identity matrix

    # Construct the perturbation matrix with h values along the diagonal
    h_matrix = h * identity_matrix

    # Calculate the forward differences for all components simultaneously
    perturbed_values = np.array([nabla_f(x + h_vec) for h_vec in h_matrix])

    # Calculate the second derivative approximation
    second_derivative_matrix = (perturbed_values - nabla_f(x)) / h
    
    reshaped_second_derivative_matrix = np.reshape(second_derivative_matrix, (3, 3))
    
    return reshaped_second_derivative_matrix 

Defining parameters for backtracking search and the start point:

In [121]:
alp = 0.4
beta = 0.5
eps = 10**(-5)
x_start = np.array([2,3,5])

Ensuring domain:

$ \text{While} \; x + t\Delta x \notin \textbf{dom} f, \text{ set } t := \beta t $ 

Here, the direction $\Delta x$ has been taken as a parameter in the defined function and thus we need not explicitly calculate it again.

In [122]:
# Ensuring Domain
def domain_t(x,direction):
    t = 1
    while True:
        v = x + t * direction
        e3 = v[2]
        e2 = v[1]
        e1 = v[0]

        if e3 > 0 and (e2+e1>0):
            return t  # Exit the loop and return 't' if the condition is met

        # If (e3) or (e1+ e2) is negative , adjust 't' and update 'x'
        t *= beta

    return None  # Return None if the condition doesn't satisfy within the maximum iterations (which can be defined)

Backtracking algorithm:

$
\text{Given a descent direction } \Delta x  \text{ for } f \text{ at } x \in \textbf{dom} f, \alpha \in (0, 0.5), \beta \in (0, 1).$

\begin{array}{l}
\text{Set } t := 1. \\ 
\text{Ensure domain:} \; \text{While} \; x + t\Delta x \notin \textbf{dom} f, \text{ set } t := \beta t \\
\text{While } f(x + t\Delta x) > f(x) + \alpha t \nabla f(x)^T \Delta x, \text{ set } t := \beta t.
\end{array}



In [123]:
# Backtracking Algorithm
def Backtrack_t(x,direction):
    t = domain_t(x,direction)
    del_f = nabla_f(x).flatten()
    
    xv = x + t * direction
    le = my_f(xv)                                              # Left expression 
    re1 = my_f(x)
    re2 = np.dot(del_f.T,direction.flatten())           
    re = re1 + alp * t * re2                                   # Right expression

    while le > re:
        t *= beta
        
        xv = x + t * direction
        le = my_f(xv)          
        re = re1 + alp * t * re2
    return t     

### Running the BFGS Algorithm: 

**Given:**
- Starting point $x_0$
- Convergence tolerance $\epsilon > 0$ $(=10^{-5})$
- Starting matrix $H_0$ (taken as Identity matrix)
  
**Initialization:** $k \gets 0$

**While:** $\| \nabla f_k \| > \epsilon$
1. Compute search direction by solving:
   $ p_k = - H_k \nabla f_k $
2. Set $x_{k+1} = x_k + \alpha_k p_k$, where $\alpha_k$ is computed from backtracking line search procedure.
3. Define $s_k = x_{k+1} - x_k$ and $y_k = \nabla f_{k+1} - \nabla f_k$.
4. Compute $H_{k+1}$ using BFGS (as given in the tutorial).
5. $k \gets k + 1$

Finally, as mentioned in the implementation note to set $\rho_k $ as a constant after $y^T_k s_k$ gets smaller some certain $\epsilon$ say $10^{-5}$.\
We have used an $\epsilon = 10^{-4}$ and set the value for  $\rho_k = 10^{4}$.


In [124]:
# Running BFGS Algorithm
iter =0
H_start = np.eye(3)                             # Inverse Hessian aprroxmiation as Identity
norm_nabla_f = np.dot(nabla_f(x_start).flatten(), nabla_f(x_start).flatten())**0.5

while norm_nabla_f > eps:
    
    # Computing Search Direction        (By Avoiding explicit matrix inversion for numerical stability and efficiency)
    del_f = nabla_f(x_start).flatten()
    direction = - np.dot(H_start,del_f)                        # Finding Direction
    
    t = Backtrack_t(x_start,direction)                         # Choosing t using Line Search
    
    x_new = x_start + t * direction       # Update Step
    
    s = x_new - x_start                  
    y = nabla_f(x_new).flatten() - del_f
    
    p = np.dot(y.T,s)   # Finding p
    if p<1e-4:
        p = 1e-4
    p = 1/p            # 1/p
    
    t1 = np.eye(3) - p*np.dot(s,y.T)
    t2 = p*np.dot(s,s.T)
    
    m1 = np.dot(t1,H_start)
    t1 = np.dot(m1,t1) 

    H_new = t1 + t2     # Calculating H_(k+1)
    
    H_start = H_new
    x_start = x_new
    iter=iter+1        # Iteration counter
    norm_nabla_f = np.dot(nabla_f(x_start).flatten(), nabla_f(x_start).flatten())**0.5
    
print("Optimal solution:", x_start)
fopt = my_f(x_start)
print("Optimal function value:", fopt)
print("Number of iterations taken to converge:", iter)

Optimal solution: [0.92619643 0.92622045 1.65342624]
Optimal function value: 3.908113786305548
Number of iterations taken to converge: 183
