# TIES483 ex5, Mikael Myyrä
Note: I switched from Haskell to Python because implementing all the linear algebra by hand became too laborious, and I didn't want to require libraries because you can't submit the whole project directory on Moodle.

## Exercises

For exercises 1-2, we study the optimization problem
$$
\begin{align}
\min \qquad & x_1^2+x_2^2 + x_3^3+(1-x_4)^2\\
\text{s.t.}\qquad &x_1^2+x_2^2-1=0\\
    &x_1^2+x_3^2-1=0\\
    &x\in\mathbb R^4
\end{align}
$$
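The structure of the feasible set can be read off the constraints directly, which helps when interpreting the numerical results below. The following is a short worked derivation using only the problem as stated: subtract the second constraint from the first, then substitute the first constraint into $f$.
$$
\begin{aligned}
x_2^2 - x_3^2 = 0 &\implies x_3 = \pm x_2,\\
x_1^2 + x_2^2 = 1 &\implies f(x) = 1 + x_3^3 + (1 - x_4)^2,\qquad -1 \le x_3 \le 1.
\end{aligned}
$$
The reduced objective is smallest at $x_3 = -1$ and $x_4 = 1$, which forces $x_1 = 0$ and $x_2 = \pm 1$, so the global minimizers are $x^* = (0, \pm 1, -1, 1)$ with $f(x^*) = 0$.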




In [3]:
# setup the problem
import numpy as np

def objective_fn(x):
    return x[0]**2 + x[1]**2 + x[2]**3 + (1 - x[3])**2

eq_constraints = [
    lambda x: x[0]**2 + x[1]**2 - 1,
    lambda x: x[0]**2 + x[2]**2 - 1
]

# ad is not available on nixpkgs (my Linux distro's package repository)
# and I don't want to use pip, so I calculate the derivatives by hand

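def obj_gradient(x):
    # caution: 2*x[3] is the derivative of x4^2; the derivative of the
    # (1 - x4)^2 term in objective_fn is -2*(1 - x[3])
    return np.matrix([2*x[0], 2*x[1], 3*(x[2]**2), 2*x[3]])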

eq_constr_gradients = [
    lambda x: np.matrix([2*x[0], 2*x[1], 0, 0]),
    lambda x: np.matrix([2*x[0], 0, 2*x[2], 0])
]

def obj_hessian(x):
    return np.matrix([
        [2, 0, 0, 0],
        [0, 2, 0, 0],
        [0, 0, 6*x[2], 0],
        [0, 0, 0, 2]
    ])

eq_constr_hessians = [
    lambda x: np.matrix([
        [2, 0, 0, 0],
        [0, 2, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]
    ]),
    lambda x: np.matrix([
        [2, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 2, 0],
        [0, 0, 0, 0]
    ]),
]

def lagrangian(x, l):
    return objective_fn(x) \
        + sum([l[i] * eq_constraints[i](x) \
              for i in range(len(eq_constraints))])

def lagr_gradient(x, l):
    return obj_gradient(x) \
        + sum([l[i] * eq_constr_gradients[i](x) \
              for i in range(len(eq_constraints))])

def lagr_hessian(x, l):
    return obj_hessian(x) \
        + sum([l[i] * eq_constr_hessians[i](x) \
              for i in range(len(eq_constraints))])
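
Because all the derivatives above are written by hand, comparing them against a central finite-difference approximation is a cheap sanity check. A minimal sketch, assuming a step of `1e-6` is accurate enough for these polynomials; `objective`, `hand_gradient` and `numeric_gradient` are hypothetical names, the first two being flat-array copies of `objective_fn` and `obj_gradient` so the block runs on its own without shadowing the originals:

```python
import numpy as np

# Hypothetical copies of objective_fn and obj_gradient from the cell above,
# renamed and returning flat arrays so this block is self-contained.
def objective(x):
    return x[0]**2 + x[1]**2 + x[2]**3 + (1 - x[3])**2

def hand_gradient(x):
    return np.array([2*x[0], 2*x[1], 3*(x[2]**2), 2*x[3]], dtype=float)

def numeric_gradient(fn, x, eps=1e-6):
    """Hypothetical helper: central finite-difference gradient of fn at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (fn(x + step) - fn(x - step)) / (2 * eps)
    return grad

x = np.array([1.0, 1.0, 1.0, 1.0])
diff = hand_gradient(x) - numeric_gradient(objective, x)
print(diff)  # entries far from zero flag mismatching gradient components
```

At $x = (1, 1, 1, 1)$ only the last entry is far from zero (about 2): the derivative of $(1-x_4)^2$ is $-2(1-x_4)$, which is 0 there, while the hand-written component `2*x[3]` is 2. The same comparison can be applied to the constraint gradients and, column by column, to the Hessians.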


1. **(2 points)** Use the SQP method to solve the above problem. **Analyze carefully the result you got!** How does SQP work for this problem?


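Starting from $(x^0, \lambda^0)$, SQP repeatedly solves the following linear system for the step $p$ in $x$ and the step $v$ in the multipliers, then sets $x^{k+1} = x^k + p$ and $\lambda^{k+1} = \lambda^k + v$: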
$$
\left[
\begin{array}{cc}
\nabla^2_{xx}L(x^k,\lambda^k)&\nabla_x h(x^k)\\
\nabla_x h(x^k)^T & 0
\end{array}
\right]
\left[\begin{array}{c}p^T\\v^T\end{array}\right] =
\left[
\begin{array}{c}
-\nabla_x L(x^k,\lambda^k)\\
-h(x^k)^T
\end{array}
\right].
$$
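To make one iteration concrete, here is a minimal sketch of a single step of this system on a hypothetical toy problem, $\min x_1^2 + x_2^2$ s.t. $x_1 + x_2 - 1 = 0$ (not the exercise problem), using the same sign convention $L = f + \lambda h$. It assembles the block matrix with `np.block` and plain arrays rather than `np.matrix`; the helper names (`grad_f`, `h`, `jac_h`, `hess_L`, `kkt_step`) are hypothetical. Because the toy objective is quadratic and its constraint linear, a single step lands on the solution $x = (0.5, 0.5)$, $\lambda = -1$.

```python
import numpy as np

# Hypothetical toy problem: min x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 1 = 0,
# with Lagrangian L(x, l) = f(x) + l * h(x).
def grad_f(x):
    return 2 * x

def h(x):
    return np.array([x[0] + x[1] - 1])

def jac_h(x):
    # rows are constraint gradients (K x n)
    return np.array([[1.0, 1.0]])

def hess_L(x, l):
    # Hessian of f is 2I; the linear constraint adds no curvature
    return 2 * np.eye(2)

def kkt_step(x, l):
    """Hypothetical helper: one Newton-KKT (SQP) step, returning updated x and l."""
    A = jac_h(x)
    k = A.shape[0]
    kkt = np.block([[hess_L(x, l), A.T],
                    [A, np.zeros((k, k))]])
    rhs = -np.concatenate([grad_f(x) + A.T @ l, h(x)])
    sol = np.linalg.solve(kkt, rhs)
    p, v = sol[:len(x)], sol[len(x):]
    return x + p, l + v

x, l = kkt_step(np.array([0.0, 0.0]), np.array([0.0]))
print(x, l)
```

For a non-quadratic problem such as the exercise, the same step is simply repeated until the stopping test is met.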

In [55]:
# note: hardcoded objective function and constraints
# because this isn't needed for general-purpose use
# and piping in all the derivatives through parameters would be annoying

def solve_quadratic(x, l):
    """Solve the linear system defined in the above cell."""    
    # coefficient matrix
    constr_gradients = np.stack([cg(x) for cg in eq_constr_gradients])
    c_len = len(constr_gradients)
    coefs = np.concatenate((
        np.concatenate(
            (lagr_hessian(x, l), constr_gradients.transpose()),
            axis=1,
        ),
        np.concatenate(
            (constr_gradients, np.zeros((c_len, c_len))),
            axis=1,
        )), 
        axis=0,
    )
    
    # right-hand side
    constr_values = np.matrix([c(x) for c in eq_constraints])
    rhs = -1 * np.concatenate(
        (lagr_gradient(x, l), constr_values),
        axis=1,
    ).transpose()
    
    solution = np.linalg.solve(coefs, rhs)
    return np.array(solution[:len(x)].transpose())[0], \
        np.array(solution[len(x):].transpose())[0]
    
def sequential_quadratic(start_x, start_l, precision, error_tolerance):
    """Solve an optimization problem using the SQP method."""
    x = start_x
    l = start_l
    # record steps for playback later
    steps = [[x, l]]
    while True:
        x_step, l_step = solve_quadratic(x, l)
        x_new = x + x_step
        l_new = l + l_step
        
        steps.append([x_new, l_new])
        
        # stop if the difference in objective function values
        # is small enough and there's not too much constraint error
        if abs(objective_fn(x_new) - objective_fn(x)) <= precision \
            and all([abs(c(x_new)) <= error_tolerance for c in eq_constraints]):
            return x_new, steps
        
        # otherwise, loop
        x = x_new
        l = l_new
        
        
# test the above with some starting points
starts = [
    # this gives an error because it results in a non-invertible matrix
    # due to constraint gradients being zero
    # [[0, 0, 0, 0], [0, 0]],
    [[1, 5, -3, 2], [5, 5]],
    [[13, 18, 0, -1000], [10, -15]],
    [[-2, -2, -2, -2], [2, 2]],
    [[1, 1, 1, 1], [1, 1]],
]
precision = 0.001

np.set_printoptions(precision=4)

for [x, l] in starts:
    solution, steps = sequential_quadratic(x, l, precision, precision)
    print(f"Starting at (x: {x}, l: {l}) (took {len(steps)} steps):")
    print(f"\tx: {solution}")
    print(f"\tf(x): {objective_fn(solution):10.4f}")
    print(f"\th(x): {np.array([c(solution) for c in eq_constraints])}")
    # print("steps:")
    # for [x, l] in steps:
    #    print(f"x: {x}, l: {l}")
    #    print(f"\tf(x): {objective_fn(x):10.4f}")
    #    print(f"\tconstraint values: {np.array([c(x) for c in eq_constraints])}")
    print("")

Starting at (x: [1, 5, -3, 2], l: [5, 5]) (took 7 steps):
	x: [ 6.0506e-08  1.0000e+00 -1.0000e+00  0.0000e+00]
	f(x):     1.0000
	h(x): [1.0027e-07 9.7151e-08]

Starting at (x: [13, 18, 0, -1000], l: [10, -15]) (took 11 steps):
	x: [1.     0.0176 0.     0.    ]
	f(x):     2.0003
	h(x): [0.0003 0.    ]

Starting at (x: [-2, -2, -2, -2], l: [2, 2]) (took 7 steps):
	x: [-7.5208e-12 -1.0000e+00 -1.0000e+00  0.0000e+00]
	f(x):     1.0000
	h(x): [1.3554e-11 1.3554e-11]

Starting at (x: [1, 1, 1, 1], l: [1, 1]) (took 8 steps):
	x: [1.     0.0123 0.0123 0.    ]
	f(x):     2.0002
	h(x): [0.0002 0.0002]



Analysis:

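From each of the arbitrarily chosen starting points, SQP needs a similar number of recorded steps (7 to 11); even the start 1000 units away along the $x_4$ axis takes only 11. Part of the reason is that $x_4$ appears only in the separable quadratic term $(1-x_4)^2$ and in no constraint, so the second-order model is exact in that coordinate and a single Newton step settles $x_4$ however far away it starts. The KKT system as a whole is nonlinear (the $x_3^3$ term, and the multipliers multiplying the quadratic constraints), so the remaining coordinates take several iterations.

SQP converges to two different points depending on the starting point: $x \approx (0, \pm 1, -1, 0)$ with $f \approx 1$, and $x \approx (1, 0, 0, 0)$ with $f \approx 2$. On the feasible set the first constraint gives $x_1^2 + x_2^2 = 1$, so $f(x) = 1 + x_3^3 + (1-x_4)^2$ with $-1 \le x_3 \le 1$. The first point has the smallest possible $x_3$; the second, with $x_3 = 0$, is only a stationary (KKT) point, because moving along the feasible set towards $x_3 < 0$ lowers $f$. SQP applies Newton's method to the KKT conditions, so it converges to a nearby KKT point whether or not that point is a minimum, and it is not guaranteed to find the global optimum. At $(1, 0, 0, x_4)$ both constraint gradients equal $(2, 0, 0, 0)$, so the KKT matrix becomes singular in the limit; this is consistent with the larger residuals from those starting points ($x_2$ between 0.01 and 0.02, $h(x)$ up to $3\cdot 10^{-4}$).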
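Whether the point $x \approx (1, 0, 0, x_4)$ reached from some starting points is a local minimum can be checked directly by moving along a curve that keeps both constraints satisfied, here $x(t) = (\sqrt{1-t^2},\ t,\ -t,\ x_4)$, and comparing objective values. A minimal sketch, with $x_4$ fixed at 1 and $t = 0.1$ chosen arbitrarily; `feasible_point` is a hypothetical helper, and `objective_fn` and `eq_constraints` are copied from the setup cell so the block runs on its own:

```python
import numpy as np

# Copies of objective_fn and eq_constraints from the setup cell
def objective_fn(x):
    return x[0]**2 + x[1]**2 + x[2]**3 + (1 - x[3])**2

eq_constraints = [
    lambda x: x[0]**2 + x[1]**2 - 1,
    lambda x: x[0]**2 + x[2]**2 - 1,
]

def feasible_point(t, x4=1.0):
    """Hypothetical helper: (sqrt(1 - t^2), t, -t, x4) satisfies both constraints."""
    return np.array([np.sqrt(1 - t**2), t, -t, x4])

for t in [0.0, 0.1]:
    x = feasible_point(t)
    print(f"t={t}: f={objective_fn(x):.6f}, h={[float(c(x)) for c in eq_constraints]}")

# Constraint gradients (2x1, 2x2, 0, 0) and (2x1, 0, 2x3, 0) at t = 0
x0 = feasible_point(0.0)
grads = np.array([
    [2 * x0[0], 2 * x0[1], 0.0, 0.0],
    [2 * x0[0], 0.0, 2 * x0[2], 0.0],
])
print(np.linalg.matrix_rank(grads))  # 1: the two gradients are parallel
```

Along the curve, $f$ drops from 1 to $1 - t^3 = 0.999$ while both constraints stay at zero up to rounding, so the $x_3 = 0$ point is stationary but not a local minimum; the rank-1 constraint Jacobian there is why the KKT matrix degenerates.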

2. **(2 points)** Use the augmented Lagrangian method to solve the above problem. **Analyze carefully the result you got!** How does the method work for this problem?


The augmented Lagrangian is
$$
L_c(x,\lambda) = f(x)+\lambda^T h(x)+\frac12c\|h(x)\|^2 = L(x,\lambda) + \frac12c\|h(x)\|^2
$$
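The multiplier update used in the method, $\lambda \leftarrow \lambda + c\,h(x)$, can be seen on a hypothetical one-dimensional toy problem, $\min x^2$ s.t. $x - 1 = 0$ (solution $x^* = 1$, $\lambda^* = -2$), where the inner minimization of $L_c$ has a closed form. A minimal sketch with the penalty held fixed at $c = 10$ (an arbitrary choice); `argmin_aug_lagrangian` and `augmented_lagrangian_toy` are hypothetical names:

```python
# Hypothetical toy problem (not the exercise problem):
#   min x^2  s.t.  h(x) = x - 1 = 0, with solution x* = 1, lambda* = -2.
# L_c(x, l) = x^2 + l*(x - 1) + 0.5*c*(x - 1)^2, and setting
# dL_c/dx = 2x + l + c*(x - 1) = 0 gives the inner minimizer in closed form.

def argmin_aug_lagrangian(l, c):
    """Hypothetical helper: exact minimizer of L_c(., l) for the toy problem."""
    return (c - l) / (2 + c)

def augmented_lagrangian_toy(l=0.0, c=10.0, iterations=30):
    """Hypothetical helper: augmented Lagrangian iterations with a fixed penalty c."""
    for _ in range(iterations):
        x = argmin_aug_lagrangian(l, c)
        l = l + c * (x - 1)  # multiplier update l <- l + c*h(x)
    return x, l

x, l = augmented_lagrangian_toy()
print(x, l)
```

Here $\lambda_{k+1} + 2 = \frac{2}{2+c}(\lambda_k + 2)$, so with $c$ fixed the multiplier error shrinks by a constant factor each iteration and $x$ converges to 1 without the penalty growing without bound.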

In [26]:
def aug_lagrangian(x, l, penalty):
    return lagrangian(x, l) + 0.5 * penalty * sum([c(x)**2 for c in eq_constraints])

In [38]:
from scipy.optimize import minimize

def aug_lagrangian_solve(start_x, start_l, start_penalty, precision, error_tolerance):
    x = start_x
    l = np.array(start_l, dtype=float)
    pen = start_penalty
    # record steps for playback later
    steps = [[x, l, pen]]
    while True:
        x_new = minimize(lambda x: aug_lagrangian(x, l, pen), x).x
        # stop if the difference in objective function values
        # is small enough and there's not too much constraint error
        if abs(objective_fn(x_new) - objective_fn(x)) <= precision \
            and all([abs(c(x_new)) <= error_tolerance for c in eq_constraints]):
            return x_new, steps
        # otherwise, loop
        l += pen * np.array([c(x_new) for c in eq_constraints])
        pen *= 2
        x = x_new
        steps.append([x, l, pen])
        
        

In [56]:
# using same starting points and precision defined for SQP method
start_penalty = 1.0
for [x, l] in starts:
    solution, steps = aug_lagrangian_solve(x, l, start_penalty, precision, precision)
    print(f"Starting at (x: {x}, l: {l}) (took {len(steps)} steps):")
    print(f"\tx: {solution}")
    print(f"\tf(x): {objective_fn(solution):10.4f}")
    print(f"\th(x): {np.array([c(solution) for c in eq_constraints])}")
    # print("steps:")
    # for [x, l, pen] in steps:
    #     print(f"x: {x}, l: {l}, penalty: {pen}")
    #     print(f"\tf(x): {objective_fn(x):10.4f}")
    #     print(f"\th(x): {np.array([c(x) for c in eq_constraints])}")
    print("")

Starting at (x: [1, 5, -3, 2], l: [5, 5]) (took 10 steps):
	x: [-2.3367e-07 -1.0000e+00 -1.0000e+00  1.0000e+00]
	f(x):    -0.0000
	h(x): [7.4554e-09 6.2676e-08]

Starting at (x: [13, 18, 0, -1000], l: [10, -15]) (took 9 steps):
	x: [9.9996e-01 1.3740e-08 1.2580e-02 1.0000e+00]
	f(x):     0.9999
	h(x): [-7.9135e-05  7.9122e-05]

Starting at (x: [-2, -2, -2, -2], l: [2, 2]) (took 7 steps):
	x: [ 2.2268e-07  1.0000e+00 -1.0000e+00  1.0000e+00]
	f(x):    -0.0000
	h(x): [-3.6717e-08  9.2714e-07]

Starting at (x: [1, 1, 1, 1], l: [1, 1]) (took 8 steps):
	x: [ 9.9991e-01 -2.5709e-08  1.9189e-02  1.0000e+00]
	f(x):     0.9998
	h(x): [-0.0002  0.0002]



Analysis:

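The number of outer iterations (7 to 10 recorded steps) is similar to SQP's, but each iteration here is considerably more expensive: it runs a complete unconstrained minimization with `scipy.optimize.minimize` (BFGS by default for an unconstrained problem, with finite-difference gradients because no `jac` is passed), whereas an SQP step solves a single 6×6 linear system.

The $(x_1, x_2, x_3)$ parts of the results fall into the same two groups as with SQP, near $(0, \pm1, -1)$ and near $(1, 0, 0)$, so this method does not find a different optimum. The objective values are about 1 lower because this method returns $x_4 = 1$ where SQP returned $x_4 = 0$, which changes the $(1-x_4)^2$ term from 1 to 0. `minimize` differentiates `objective_fn` numerically, so it does not use the hand-written `obj_gradient`, whose last component `2*x[3]` is the derivative of $x_4^2$ rather than of $(1-x_4)^2$ (which is $-2(1-x_4)$). With that component corrected, SQP should also converge to $x_4 = 1$. Both methods still only find stationary points, and which one they reach depends on the starting point.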

3. **(2 points)** Solve the problem
$$
\begin{align}
\min   \  & x_1^2 + x_2^2\\
\text{s.t. } & x_1 + x_2 \geq 1.
\end{align}
$$ 
by using just the optimality conditions.



The general form of the problem is
$$
\begin{align}
\min \quad & f(x)\\
\text{s.t. } \quad & g(x) \geq 0
\end{align}
$$
where
$$
\begin{aligned}
f(x) &= x_1^2 + x_2^2\\
g(x) &= x_1 + x_2 - 1
\end{aligned}
$$

The Lagrangian of this is
$$
L(x,\mu) = f(x) - \mu g(x) = x_1^2 + x_2^2 - \mu (x_1 + x_2 - 1)
$$
and its gradient with respect to $x$ is
$$
\nabla_x L(x,\mu) = (2x_1 - \mu, 2x_2 - \mu)
$$

The stationarity condition states that
$$
\nabla_x L(x^*,\mu^*) = \mathbf{0}
$$
Therefore
$$
(2x_1^* - \mu^*, 2x_2^* - \mu^*) = (0, 0)
\iff
\begin{cases}
2x_1^* - \mu^* = 0\\
2x_2^* - \mu^* = 0
\end{cases}
$$
Solving this gives the line
$$
x_1^* = x_2^* = \frac{\mu^*}{2}
$$

Additionally, the dual feasibility condition states that
$$
\mu^* \geq 0
$$
Combined with the line from stationarity, this gives
$$
x_1^*, x_2^* \geq 0
$$

Next, primal feasibility states that
$$
g(x^*) = x_1^* + x_2^* - 1 \geq 0.
$$
Applying $x_1^* = x_2^*$ from stationarity gives
$$
2x_1^* - 1 \geq 0 \implies x_1^* = x_2^* \geq \frac{1}{2}.
$$

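The remaining KKT condition, complementary slackness, states that $\mu^* g(x^*) = 0$. If $\mu^* = 0$, stationarity gives $x^* = (0, 0)$, which violates primal feasibility since $g(0, 0) = -1 < 0$. Hence $\mu^* > 0$ and the constraint is active, $x_1^* + x_2^* = 1$; with $x_1^* = x_2^* = \mu^*/2$ this gives $\mu^* = 1$ and
$$
x^* = (\frac{1}{2}, \frac{1}{2}).
$$
Because $f$ is convex and $g$ is linear, the KKT conditions are also sufficient, so $x^*$ is the global minimizer, with $f(x^*) = \frac{1}{2}$.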
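The conditions used above can be checked numerically at the candidate point $(x^*, \mu^*) = ((\frac12, \frac12), 1)$. A minimal sketch in NumPy with hypothetical helper names (`f`, `g`, `grad_f`, `grad_g`, `kkt_residuals`) that reports how far a pair $(x, \mu)$ is from satisfying each KKT condition, plus an independent brute-force cross-check over a grid (the grid range and spacing are arbitrary choices):

```python
import numpy as np

# Hypothetical helpers for: min x1^2 + x2^2  s.t.  g(x) = x1 + x2 - 1 >= 0,
# with Lagrangian L(x, mu) = f(x) - mu * g(x) as in the text.
def f(x):
    return x[0]**2 + x[1]**2

def g(x):
    return x[0] + x[1] - 1

def grad_f(x):
    return np.array([2 * x[0], 2 * x[1]])

grad_g = np.array([1.0, 1.0])  # g is linear, so its gradient is constant

def kkt_residuals(x, mu):
    """Violation of each KKT condition at (x, mu); 0 means satisfied."""
    x = np.asarray(x, dtype=float)
    return {
        "stationarity": np.linalg.norm(grad_f(x) - mu * grad_g),
        "primal feasibility": max(0.0, -g(x)),
        "dual feasibility": max(0.0, -mu),
        "complementary slackness": abs(mu * g(x)),
    }

print(kkt_residuals([0.5, 0.5], 1.0))  # every residual is zero
print(kkt_residuals([0.0, 0.0], 0.0))  # primal feasibility fails: g = -1

# Brute-force cross-check on a grid (not part of the KKT argument)
grid = np.linspace(-1, 2, 301)
X1, X2 = np.meshgrid(grid, grid)
feasible = X1 + X2 - 1 >= -1e-9
values = np.where(feasible, X1**2 + X2**2, np.inf)
i = np.unravel_index(np.argmin(values), values.shape)
print(X1[i], X2[i], values[i])
```

The second call shows why $\mu^* = 0$ is ruled out: the unconstrained minimizer $(0, 0)$ satisfies stationarity but not primal feasibility.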

4. **(2 points)** Consider a problem
$$
\begin{align}
\min   \  & f(x)\\
\text{s.t. } & h_k(x)=0, \text{ for all } k=1,\dots,K,
\end{align}
$$

where all the functions are twice differentiable. Show that the *gradient of the augmented Lagrangian function* is zero at the minimizer $x^*$ of the above problem. In other words, show that $\nabla_xL_c(x^*,\lambda^*)=0$, where $\lambda^*\in \mathbb R^K$ is the corresponding optimal Lagrange multiplier vector.