# 18.065 Problem Set 5

Due Friday, April 21 at 1pm.

## Problem 1 (5+6 points)

Consider the following optimization problem:
$$
\min_{x \in \mathbb{R}^2} x_1 \\
\mbox{  subject to } x_2 \le x_1^3 \mbox{ and } x_2 \ge 0 \, .
$$

**(a)** Draw a sketch of the feasible set in the $x_1, x_2$ plane and indicate the optimum $x_*$.

**(b)** Show that the optimum $x_*$ does *not* satisfy the KKT conditions, but explain why this is possible because the LICQ conditions are violated (see the last slide of lecture 22).

(Most problems have local minima that satisfy KKT, but you can see from the picture in (a) that this is a weird case!)
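For reference, here is one common way to write the KKT conditions for a problem of the form $\min_x f(x)$ subject to $c_i(x) \le 0$. The symbols $f$, $c_i$, and $\lambda_i$ are assumed notation and may differ in sign or naming from the lecture slides:
$$
\nabla f(x_*) + \sum_i \lambda_i \nabla c_i(x_*) = 0, \qquad \lambda_i \ge 0, \qquad c_i(x_*) \le 0, \qquad \lambda_i c_i(x_*) = 0 \, .
$$
In this notation, LICQ at $x_*$ requires that the gradients $\nabla c_i(x_*)$ of the *active* constraints (those with $c_i(x_*) = 0$) be linearly independent.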

## Problem 2 (6+6+6 points)

Consider the convex problem:
$$
\min_{x\in \mathbb{R}^n} \Vert b - Ax \Vert_2^2 \\
\mbox{  subject to } \Vert x \Vert_2^2 \le r^2
$$
for some $r > 0$, $m \times n$ matrix $A$ (of rank $n$), and $b \in \mathbb{R}^m$. That is, this is least-squares optimization with the solution constrained to lie within a ball of radius $r$.

**(a)** What is the Lagrange dual function $g(\lambda)$?   (You can give a closed-form expression.  Hint: review Tikhonov-regularized least-squares.)   Define a corresponding Julia function `g(λ; r=1.0)` for the sample parameters given below (this syntax defines an optional keyword argument `r` that defaults to $r=1$).  Make a plot of $g(\lambda)$ for $r=1$ and $r=0.5$ for $\lambda \ge 0$ to verify that it looks concave with a single maximum.
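As a minimal sketch of the keyword-argument syntax only, here is a toy function. The name `h` and its formula are made up for illustration and have nothing to do with the $g(\lambda)$ you are asked to derive:

```jl
# Toy function with an optional keyword argument r (defaults to 1.0).
# This is NOT the dual function g; it only illustrates the syntax.
h(t; r=1.0) = r^2 * t - t^2

h_default = h(0.5)          # uses the default r = 1.0, giving 0.25
h_small   = h(0.5; r=0.5)   # overrides the keyword, giving -0.125
```

Keyword arguments come after the semicolon in both the definition and the call, so the same function can be evaluated or plotted for several values of `r` without redefining it.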

**(b)** If the unconstrained least-squares solution $\hat{x} = (A^T A)^{-1} A^T b$ satisfies $\Vert \hat{x} \Vert_2 < r$, then what must be true of the derivative $g'(0)$?  What if $\Vert \hat{x} \Vert_2 > r$?

Check in Julia that $g'(0)$ matches your expectations by computing the derivative using automatic differentiation:
```jl
using ForwardDiff
dgdλ(λ; r=1.0) = ForwardDiff.derivative(λ -> g(λ; r=r), λ)
```
and evaluating it at `dgdλ(0; r=???)` for two values of `r`: one $r > \Vert \hat{x} \Vert_2$ (so that the constraint is inactive) and one $r < \Vert \hat{x} \Vert_2$ (so that the constraint is active).

(In principle, you could take this derivative by hand using matrix calculus, but it's pretty error-prone.)

**(c)** You can take the *second* derivative of $g(\lambda)$ via AD by:
```jl
d²gdλ²(λ; r=0.5) = ForwardDiff.derivative(λ -> dgdλ(λ; r=r), λ)
```
Use this to implement a Newton iteration to maximize $g(\lambda)$ (for $\lambda \ge 0$) by finding a root of $g'(\lambda)$, starting with an initial guess of $\lambda=0$, for $r = 0.5$.  (It should converge in only a few iterations.  The solution should have $\lambda > 0$ in this case because ...?)   To at least 8 significant digits, give the resulting dual optimum $\lambda_*$ and the primal optimum $x_*$ (strong duality holds in this convex problem!), and check that $x_*$ is feasible.
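The general shape of such a Newton iteration can be sketched on a toy concave function. Everything here is an assumption for illustration: the helper name `newton_root`, its stopping rule, and the toy function $f(\lambda) = \log(1+\lambda) - \lambda/2$, which is not the $g(\lambda)$ of this problem and whose derivatives are written out by hand instead of via AD:

```jl
# Toy concave function (NOT the g(λ) of this problem): f(λ) = log(1 + λ) - λ/2,
# whose maximum over λ ≥ 0 is at λ = 1.
df(λ)  = 1 / (1 + λ) - 1 / 2     # f′(λ), written out by hand
d2f(λ) = -1 / (1 + λ)^2          # f″(λ), written out by hand

# Hypothetical helper: Newton's method for a root of `d1`, given its derivative `d2`.
function newton_root(d1, d2, λ0; tol=1e-12, maxiter=50)
    λ = float(λ0)
    for _ in 1:maxiter
        step = d1(λ) / d2(λ)                      # Newton step for solving d1(λ) = 0
        λ -= step
        abs(step) ≤ tol * max(abs(λ), 1) && break # stop once the step is negligible
    end
    return λ
end

λ_toy = newton_root(df, d2f, 0)   # start from λ = 0, as in the problem statement
```

For the actual problem, the two function arguments would presumably be the AD-based first and second derivatives defined above, with `r` fixed at the value of interest.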

```jl
using LinearAlgebra

m = 5
n = 4
A = [ -9   2  -2   3
      -5  -3   9   3
      -1  -6   9  -2
      -3  -4   5   4
      -8   9  -6   4 ]
b = [1,2,3,4,5];
```

```jl
g(λ; r=1) = ???
```

## Problem 3 (5+5+6 points)

In this problem, you will use ADMM to solve the (primal) optimization problem from problem 2 above, for the parameters from problem 2c, using the equivalent formulation:
$$
\min_{x \in \mathbb{R}^n} \left( \Vert b - Ax \Vert_2^2 + \begin{cases} 0 & \Vert x \Vert_2 \le r \\ \infty & \mbox{otherwise} \end{cases} \right)
$$
where the second term is the "indicator" function of the feasible set (the radius-$r$ ball) as in lecture 24 and section III.4 of the text.

A basic iteration of ADMM consists of 3 steps, as described in class and in the textbook:

1. $x^{(k+1)} = \mbox{arg }\min_x \Vert b - Ax \Vert_2^2  + \frac{\rho}{2} \Vert x - z^{(k)} + s^{(k)} \Vert_2^2$ (for some penalty parameter $\rho > 0$)
2. $z^{(k+1)}$ is the projection of $x^{(k+1)} + s^{(k)}$ onto the feasible set, i.e. the *closest* point in the feasible set.
3. $s^{(k+1)} = s^{(k)} + x^{(k+1)} - z^{(k+1)}$
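To see how the three steps fit together in a loop, here is a minimal sketch on a 1-D toy problem: minimize $(x - a)^2$ subject to $0 \le x \le 1$. The function name `admm_toy`, the scalar $a$, and the interval constraint are assumptions for illustration. This is not the least-squares problem above, and its closed-form updates are specific to the toy problem:

```jl
# Toy 1-D analogue (NOT the problem above): minimize (x - a)^2 subject to 0 ≤ x ≤ 1.
function admm_toy(a; ρ=1.0, iters=200)
    x = z = s = 0.0
    for _ in 1:iters
        # Step 1: minimizer of (x - a)^2 + (ρ/2)(x - z + s)^2, from setting the derivative to zero.
        x = (2a + ρ * (z - s)) / (2 + ρ)
        # Step 2: closest point of the feasible interval [0, 1] to x + s.
        z = clamp(x + s, 0.0, 1.0)
        # Step 3: update the scaled dual variable.
        s = s + x - z
    end
    return x, z, s
end

x_toy, z_toy, s_toy = admm_toy(3.0)   # the constrained minimizer for a = 3 is x = 1
```

In the toy problem the constraint is active, so `x_toy` and `z_toy` should agree at the boundary point while `s_toy` settles at a nonzero value.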

**(a)** Give a closed-form solution for step 1.  (Hint: a problem from pset 3 should be helpful.)

**(b)** Give a closed-form solution for step 2.

**(c)** Implement this iteration in Julia to solve this problem with the parameters from 2c above, starting from $x = z = s = \vec{0}$.   Make a (semi-log) plot of the error $\Vert x^{(k)} - x_* \Vert_2$ versus $k$, where $x_*$ is your solution from 2c, for $\rho = 1$ and $\rho = 10$.  (The error should converge to zero!)