# HW2 Problem 1

We want to minimize the function $f: \mathbb{R}^3 \rightarrow \mathbb{R}$,

$$f(x_1,x_2,x_3) = x_3\log(e^{\frac{x_1}{x_3}}+e^{\frac{x_2}{x_3}})+(x_3 - 2)^2+e^{\frac{1}{x_1+x_2}}$$
$$\textbf{dom } f=\{\mathbf{x}\in\mathbb{R}^3 : x_1+x_2>0,\,x_3>0\}$$

The gradient is:

$$\nabla f(x_1,x_2,x_3)=
\begin{pmatrix}
\dfrac{e^{x_1/x_3}}{e^{x_1/x_3}+e^{x_2/x_3}}
-\dfrac{e^{\frac{1}{x_1+x_2}}}{(x_1+x_2)^2}
\\[1.2em]
\dfrac{e^{x_2/x_3}}{e^{x_1/x_3}+e^{x_2/x_3}}
-\dfrac{e^{\frac{1}{x_1+x_2}}}{(x_1+x_2)^2}
\\[1.2em]
\log\!\big(e^{x_1/x_3}+e^{x_2/x_3}\big)
-\dfrac{x_1 e^{x_1/x_3}+x_2 e^{x_2/x_3}}
{x_3\big(e^{x_1/x_3}+e^{x_2/x_3}\big)}
+2(x_3-2)
\end{pmatrix}$$
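A quick way to catch sign or chain-rule slips in a hand-derived gradient is to compare it with central finite differences. The sketch below codes the formula above term by term (without any rewrite) and checks it at a point of the domain. The helper names `grad_direct` and `grad_fd`, the test point `x_test`, and the step `h` are illustrative choices, not part of the assignment.

```python
import numpy as np

def f(x):
    x1, x2, x3 = x
    return x3 * np.log(np.exp(x1 / x3) + np.exp(x2 / x3)) + (x3 - 2) ** 2 + np.exp(1 / (x1 + x2))

# Hypothetical helper: the gradient exactly as displayed above
def grad_direct(x):
    x1, x2, x3 = x
    a, b = np.exp(x1 / x3), np.exp(x2 / x3)
    exp_term = np.exp(1 / (x1 + x2)) / (x1 + x2) ** 2
    return np.array([
        a / (a + b) - exp_term,
        b / (a + b) - exp_term,
        np.log(a + b) - (x1 * a + x2 * b) / (x3 * (a + b)) + 2 * (x3 - 2),
    ])

# Hypothetical helper: central differences (f(x + h e_i) - f(x - h e_i)) / (2h) per coordinate
def grad_fd(fun, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2 * h)
    return g

x_test = np.array([3.0, 4.0, 5.0])  # illustrative point with x1 + x2 > 0, x3 > 0
err = np.max(np.abs(grad_direct(x_test) - grad_fd(f, x_test)))
print(f"max abs difference: {err:.2e}")
```

A difference on the order of the finite-difference error (far below $10^{-6}$ here) indicates the analytic formula is consistent.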

To simplify the gradient, we use the sigmoid (logistic) function, available in SciPy as `scipy.special.expit`:
$$\sigma(z) = \frac{1}{1+\exp(-z)}$$
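The rewrite rests on two elementary identities. Dividing numerator and denominator by $e^{x_1/x_3}$ (respectively $e^{x_2/x_3}$) gives

$$\frac{e^{x_1/x_3}}{e^{x_1/x_3}+e^{x_2/x_3}} = \frac{1}{1+e^{-(x_1-x_2)/x_3}} = \sigma\!\left(\frac{x_1-x_2}{x_3}\right), \qquad \frac{e^{x_2/x_3}}{e^{x_1/x_3}+e^{x_2/x_3}} = \sigma\!\left(-\frac{x_1-x_2}{x_3}\right),$$

and, writing $\delta = (x_1-x_2)/x_3$ and using $\sigma(-\delta) = 1-\sigma(\delta)$, the fraction in the third component becomes

$$\frac{x_1 e^{x_1/x_3}+x_2 e^{x_2/x_3}}{e^{x_1/x_3}+e^{x_2/x_3}} = x_1\,\sigma(\delta) + x_2\big(1-\sigma(\delta)\big) = x_2 + (x_1-x_2)\,\sigma(\delta).$$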

Now the gradient is:

$$\nabla f(x_1,x_2,x_3)=
\begin{pmatrix}
\sigma \left(\dfrac{x_1 - x_2}{x_3}\right)
-\dfrac{e^{\frac{1}{x_1+x_2}}}{(x_1+x_2)^2}
\\[1.2em]
\sigma \left(-\dfrac{x_1 - x_2}{x_3}\right)
-\dfrac{e^{\frac{1}{x_1+x_2}}}{(x_1+x_2)^2}
\\[1.2em]
\log\!\big(e^{x_1/x_3}+e^{x_2/x_3}\big)
-\dfrac{x_2 + (x_1 - x_2)\sigma \left(\dfrac{x_1 - x_2}{x_3}\right)}
{x_3}
+2(x_3-2)
\end{pmatrix}$$
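Besides being shorter, the sigmoid form avoids overflow: evaluating $e^{x_1/x_3}/(e^{x_1/x_3}+e^{x_2/x_3})$ literally yields `inf / inf = nan` once $x_1/x_3$ exceeds roughly 709 (the float64 limit of `exp`), whereas `expit` saturates cleanly. The sketch below compares the two for the first component's weight; `weight_naive`, `weight_sigmoid`, and the test points are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy.special import expit

# Hypothetical helper: first component's weight computed literally
def weight_naive(x1, x2, x3):
    a, b = np.exp(x1 / x3), np.exp(x2 / x3)
    return a / (a + b)

# Hypothetical helper: the same quantity as sigma((x1 - x2) / x3)
def weight_sigmoid(x1, x2, x3):
    return expit((x1 - x2) / x3)

# Moderate inputs: both forms agree up to rounding error
print(weight_naive(3.0, 4.0, 5.0), weight_sigmoid(3.0, 4.0, 5.0))

# Large x1 / x3: exp(800) overflows to inf, so the literal form gives inf / inf = nan
with np.errstate(over="ignore", invalid="ignore"):
    print(weight_naive(800.0, 700.0, 1.0))   # nan
print(weight_sigmoid(800.0, 700.0, 1.0))     # 1.0
```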

```python
# Import packages
import numpy as np
from numpy import log, exp
from scipy.special import expit
```

```python
# Objective f(x1, x2, x3), defined on x1 + x2 > 0, x3 > 0
def f(x: np.ndarray) -> float:
    x1, x2, x3 = x
    val = x3 * np.log(np.exp(x1 / x3) + np.exp(x2 / x3)) + (x3 - 2) ** 2 + np.exp(1 / (x1 + x2))
    return val

# Gradient of f in the sigmoid form derived above
def grad_f(x: np.ndarray) -> np.ndarray:
    x1, x2, x3 = x
    delta = (x1 - x2) / x3     # argument of the sigmoid
    E = exp(1 / (x1 + x2))     # e^{1/(x1 + x2)}
    q = (x1 + x2) ** 2         # (x1 + x2)^2

    grad = np.zeros(3)
    grad[0] = expit(delta) - E / q
    grad[1] = expit(-delta) - E / q
    grad[2] = (log(exp(x1 / x3) + exp(x2 / x3)) + 2 * (x3 - 2)
               - (x2 + (x1 - x2) * expit(delta)) / x3)

    return grad
```
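The remaining log-sum-exp term $\log(e^{x_1/x_3}+e^{x_2/x_3})$, which appears in both `f` and `grad[2]`, can still overflow for large $x_i/x_3$. NumPy's `np.logaddexp(a, b)` evaluates $\log(e^a + e^b)$ without forming the exponentials, so a more robust objective could look like the sketch below. `f_stable` and `f_naive` are hypothetical names used only for this comparison; the same substitution would apply to `grad[2]`.

```python
import numpy as np

# Hypothetical variant: log(exp(a) + exp(b)) evaluated via np.logaddexp
def f_stable(x: np.ndarray) -> float:
    x1, x2, x3 = x
    return x3 * np.logaddexp(x1 / x3, x2 / x3) + (x3 - 2) ** 2 + np.exp(1 / (x1 + x2))

# Hypothetical copy of the literal formula, for comparison
def f_naive(x: np.ndarray) -> float:
    x1, x2, x3 = x
    return x3 * np.log(np.exp(x1 / x3) + np.exp(x2 / x3)) + (x3 - 2) ** 2 + np.exp(1 / (x1 + x2))

x = np.array([3.0, 4.0, 5.0])
print(np.isclose(f_stable(x), f_naive(x)))  # True

x_big = np.array([800.0, 700.0, 1.0])  # x1 / x3 = 800 overflows np.exp
with np.errstate(over="ignore"):
    print(f_naive(x_big))   # inf
print(f_stable(x_big))      # finite, about 802.0007
```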

## Gradient Descent with Backtracking Line Search
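For reference, the cell below first finds the largest step of the form $t = \beta^k$ ($k = 0, 1, \dots$) that keeps $x - t\nabla f(x)$ inside $\textbf{dom } f$, then keeps multiplying $t$ by $\beta$ until the sufficient-decrease (Armijo) condition holds, with $\alpha = 0.4$ and $\beta = 0.5$:

$$f\big(x - t\nabla f(x)\big) \le f(x) - \alpha\, t\, \|\nabla f(x)\|_2^2 .$$

The outer loop stops once $\|\nabla f(x)\|_2 \le \epsilon = 10^{-5}$.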

```python
# Parameters
alpha = 0.4                     # Armijo sufficient-decrease parameter, in (0, 0.5)
beta = 0.5                      # step-shrink factor, in (0, 1)
eps = 1e-5                      # stopping tolerance on ||grad f(x)||_2
x0 = np.array([3.0, 4.0, 5.0])  # starting point, inside dom f

# Largest step t = beta^k that keeps x - t * grad f(x) inside dom f
def domain(x: np.ndarray) -> float:
    g = grad_f(x)
    t = 1.0
    while True:
        v = x - t * g
        if v[2] > 0 and v[0] + v[1] > 0:
            return t  # trial point is in dom f
        t *= beta     # otherwise shrink t and try again

# Backtracking line search: shrink t until the Armijo condition holds
def backtracking(x: np.ndarray) -> float:
    t = domain(x)
    grad_fx = grad_f(x)
    fx = f(x)
    sq_norm = np.dot(grad_fx, grad_fx)
    while f(x - t * grad_fx) > fx - alpha * t * sq_norm:
        t *= beta
    return t

# Gradient descent with backtracking line search
grad_fx = grad_f(x0)
norm = np.linalg.norm(grad_fx)
n_iter = 0  # avoid shadowing the built-in iter()
while norm > eps:
    direction = -grad_fx
    t = backtracking(x0)
    x0 = x0 + t * direction
    grad_fx = grad_f(x0)
    norm = np.linalg.norm(grad_fx)
    n_iter += 1

print("Optimal solution:", x0)
print("Optimal function value:", f(x0))
print("Number of iterations taken to converge:", n_iter)
```

```
Optimal solution: [0.92618727 0.92622965 1.65342641]
Optimal function value: 3.9081137863976365
Number of iterations taken to converge: 30
```
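The printed solution can be cross-checked against the optimality conditions. Equating the first two gradient components gives $\sigma(\delta) = \sigma(-\delta)$, hence $\delta = 0$ and $x_1 = x_2$; then $\sigma(0) = 1/2$ turns the first component into $e^{1/s} = s^2/2$ with $s = x_1 + x_2$, and the third component reduces to $\log 2 + 2(x_3 - 2) = 0$, i.e. $x_3 = 2 - \tfrac{1}{2}\log 2$. Because $f$ is convex on its domain (the perspective of log-sum-exp, plus $(x_3-2)^2$, plus $e^{1/(x_1+x_2)}$, which is convex for $x_1+x_2>0$), this stationary point is the global minimizer. The sketch below solves the scalar equation with `scipy.optimize.brentq`; the bracket $[1, 3]$ is an assumption that contains the unique root, since $e^{1/s} - s^2/2$ is decreasing for $s > 0$.

```python
import numpy as np
from scipy.optimize import brentq

# s = x1 + x2 solves exp(1/s) = s^2 / 2 (unique root for s > 0)
s = brentq(lambda u: np.exp(1 / u) - u ** 2 / 2, 1.0, 3.0)
x3 = 2 - np.log(2) / 2
x_star = np.array([s / 2, s / 2, x3])

def f(x):
    x1, x2, x3 = x
    return x3 * np.log(np.exp(x1 / x3) + np.exp(x2 / x3)) + (x3 - 2) ** 2 + np.exp(1 / (x1 + x2))

x_gd = np.array([0.92618727, 0.92622965, 1.65342641])  # values printed by gradient descent
print("analytic x*:", x_star)
print("analytic f(x*):", f(x_star))
print("max |x_gd - x*|:", np.max(np.abs(x_gd - x_star)))
```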
