# Constrained optimization

In [None]:
import numpy as np
import scipy
import matplotlib.pyplot as plt

## Basic setup

As in our lectures on unconstrained optimization, we consider a smooth objective/cost function $\phi:\mathbb R^n\to \mathbb R$ and the following problem

$$
  \min_{x \in \Omega} \phi(x)
$$

To define $\Omega$, we consider a smooth function $c:\mathbb R^n\to\mathbb R^m$, $m\ge 1$, and either the equality constraint set

$$
    \Omega:=\{x\in \mathbb R^n: c(x) = 0\},
$$

or the inequality constraint set (where the inequality is understood in a coordinate-wise sense):

$$
    \Omega:=\{x\in \mathbb R^n: c(x) \le 0\}.
$$

Since the minimization is restricted to points in $\Omega$, the first-order necessary condition for local optima from unconstrained optimization, $\nabla \phi = 0$, is no longer valid: a constrained minimizer on the boundary of $\Omega$ typically has $\nabla\phi \neq 0$. We will discuss the following approaches:

- Quadratic regularization
- Logarithmic barriers
- Forward/adjoint sensitivity for (implicit equation) equality constraints

## Simple model problem

Let's look at a simple two-dimensional model problem

$$
    \min_{x=(x_1,x_2)} x_1+x_2
$$

subject to the equality constraint

$$
    c(x) = x_1^2 + x_2^2-2 = 0
$$

or subject to the inequality constraint

$$
  c(x) = x_1^2 + x_2^2-2 \le 0
$$

The constraint describes a circle of radius $\sqrt{2}$ centered at the origin (for the inequality constraint, the closed disk bounded by this circle). We will see that both the equality-constrained and the inequality-constrained problem attain their minimum at $x^*=(-1,-1)$, with $\phi(x^*)=-2$.
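The location of the minimizer can be checked by hand. Parametrizing the circle as $x(\theta)=\sqrt2(\cos\theta,\sin\theta)$ gives

$$
\phi(x(\theta)) = \sqrt2\,(\cos\theta+\sin\theta) = 2\sin\left(\theta+\tfrac{\pi}{4}\right) \ge -2,
$$

with equality for $\theta=-3\pi/4$, i.e., at $x^*=(-1,-1)$ with $\phi(x^*)=-2$. For the inequality constraint, the linear objective has the nonvanishing gradient $\nabla\phi=(1,1)$, so it cannot attain its minimum in the interior of the disk; the minimizer lies on the boundary circle and is again $x^*=(-1,-1)$.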

In [None]:
def phi(x,y):
    "Simple linear objective"
    return x + y

def constraint(x,y):
    "Constraint describing a circle"
    return x**2 + y**2 - 2

In [None]:
def plot_phi_constraint():
    "Draw a filled contour plot"
    xx = np.linspace(-1.8, 1.8)
    X, Y = np.meshgrid(xx, xx)
    Z = phi(X,Y)
    fig,ax = plt.subplots(1,1)
    cp = ax.contourf(X, Y, Z, 20)
    circle = plt.Circle((0, 0), np.sqrt(2), color='r', fill=False)
    fig.colorbar(cp)
    ax.add_artist(circle)
    ax.set_aspect('equal')
    return fig, ax

plot_phi_constraint()
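As a quick numerical cross-check of the minimizer, both problems can be handed to a general-purpose constrained solver. The sketch below uses SciPy's `scipy.optimize.minimize` with `method='SLSQP'`, which is not one of the methods discussed in this notebook. `phi_vec` and `c_vec` are vector-argument versions of `phi` and `constraint` introduced here only for illustration, and the starting point `x0` is an arbitrary choice. Note that SciPy expects inequality constraints in the form `fun(x) >= 0`, so $c(x)\le 0$ is passed as $-c(x)\ge 0$.

```python
import numpy as np
from scipy.optimize import minimize

def phi_vec(x):
    "Objective phi(x) = x1 + x2 with a vector argument, as scipy expects"
    return x[0] + x[1]

def c_vec(x):
    "Constraint c(x) = x1^2 + x2^2 - 2 with a vector argument"
    return x[0]**2 + x[1]**2 - 2.0

x0 = np.array([-1.0, -0.5])  # illustrative starting point (strictly inside the disk)

# Equality constraint c(x) = 0
res_eq = minimize(phi_vec, x0, method='SLSQP',
                  constraints=[{'type': 'eq', 'fun': c_vec}],
                  options={'ftol': 1e-10})

# Inequality constraint c(x) <= 0, written in scipy's convention -c(x) >= 0
res_ineq = minimize(phi_vec, x0, method='SLSQP',
                    constraints=[{'type': 'ineq', 'fun': lambda x: -c_vec(x)}],
                    options={'ftol': 1e-10})

print("equality:  ", res_eq.x, res_eq.fun)
print("inequality:", res_ineq.x, res_ineq.fun)
```

Both results should be close to $(-1,-1)$ with objective value close to $-2$.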


## Quadratic penalization

Let us now attempt to solve this constrained optimization problem numerically.

### Equality constraints
First, we consider the equality-constrained problem. To remove the constraint, we use a quadratic penalization with a parameter $\mu>0$ and consider the (unconstrained!) problem

$$
\min_{x\in \mathbb R^n} Q(x,\mu), \quad \text{where}\quad Q(x,\mu) := \phi(x) + \frac{1}{2\mu} \|c(x)\|^2 
$$

By choosing a small $\mu$, we penalize constraint violations strongly: the smaller $\mu$, the more emphasis we put on satisfying the constraint. However, for most problems the constraint is satisfied exactly only in the limit $\mu\to 0$. Let us look at the contour lines of the penalized objective for different values of $\mu$ (the cell below uses $\mu=2$; other values can be passed to `plot_phi_quadratic`). For smaller $\mu$, the minimizer of $Q(\cdot,\mu)$ becomes a better approximation of the true minimizer, i.e., the minimizer of the constrained problem.
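For the model problem, the size of the constraint violation can be estimated by hand. Assume, as the symmetry of $Q$ in $x_1$ and $x_2$ and the contour plots suggest, that the minimizer of $Q(\cdot,\mu)$ lies on the diagonal, $x_\mu=(t,t)$. Then $c(x_\mu)=2t^2-2$ and

$$
Q((t,t),\mu) = 2t + \frac{1}{2\mu}\left(2t^2-2\right)^2, \qquad
\frac{d}{dt}Q((t,t),\mu) = 2 + \frac{4t}{\mu}\left(2t^2-2\right) = 0
\quad\Longleftrightarrow\quad c(x_\mu) = -\frac{\mu}{2t}.
$$

For small $\mu$ we have $t\approx -1$, hence $c(x_\mu)\approx \mu/2>0$: the minimizer of the penalized problem lies slightly outside the circle, and the constraint violation decreases only linearly in $\mu$.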

In [None]:
def phi_quadratic(x,y,mu):
    "Quadratic penalization of objective"
    pen = constraint(x,y)**2
    return phi(x,y) + 1/(2*mu)*pen

def plot_phi_quadratic(mu):
    xx = np.linspace(-1.75, 1.75)
    X, Y = np.meshgrid(xx, xx)
    Z = phi_quadratic(X,Y,mu)
    fig,ax = plt.subplots(1,1)
    cp = ax.contourf(X, Y, Z, 20)
    circle = plt.Circle((0, 0), np.sqrt(2), color='r', fill=False)
    fig.colorbar(cp)
    ax.add_artist(circle)
    ax.set_aspect('equal')
    return fig, ax

plot_phi_quadratic(2)


### Inequality constraint

A similar trick also works for inequality constraints $c(x)\le 0$. To remove this inequality constraint, we again use a quadratic penalization with a parameter $\mu>0$ and consider

$$
\min_{x\in \mathbb R^n} \bar Q(x,\mu), \quad \text{where}\quad \bar Q(x,\mu) := \phi(x) + \frac{1}{2\mu} \|\max(0,c(x))\|^2, 
$$

where the max-function acts componentwise (there is only one component in our simple test problem). Thus, if the inequality constraint is satisfied, $\phi$ and $\bar Q$ coincide; if not, the constraint violation is penalized. Note that for fixed $\mu$, the objective $\bar Q(\cdot,\mu)$ is only once continuously differentiable because of the max-function: its gradient is continuous, but its second derivatives jump across the boundary $c(x)=0$. Let's look at the contours of the penalized function.
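The loss of smoothness can be observed numerically. The sketch below restricts $\bar Q(\cdot,\mu)$ to the line $x=(s,0)$ (an illustrative choice), which crosses the boundary $c(x)=0$ at $s=\sqrt2$, and compares central finite-difference first and second derivatives just inside and just outside the circle. The helper names `barQ_on_axis`, `fd_first` and `fd_second` are hypothetical. The first derivative should be nearly the same on both sides, while the second derivative jumps from $0$ to about $8/\mu$, which is the value of $\frac{d^2}{ds^2}\frac{1}{2\mu}(s^2-2)^2$ at $s=\sqrt2$.

```python
import numpy as np

mu = 1.0  # illustrative penalty parameter

def barQ_on_axis(s):
    "bar Q(x, mu) restricted to the line x = (s, 0): s + max(0, s^2 - 2)^2 / (2 mu)"
    return s + np.maximum(0.0, s**2 - 2.0)**2 / (2.0 * mu)

def fd_first(f, s, h=1e-6):
    "Central finite-difference approximation of f'(s)"
    return (f(s + h) - f(s - h)) / (2.0 * h)

def fd_second(f, s, h=1e-5):
    "Central finite-difference approximation of f''(s)"
    return (f(s + h) - 2.0 * f(s) + f(s - h)) / h**2

s0 = np.sqrt(2.0)  # the point (s0, 0) lies on the circle c(x) = 0
d = 1e-4           # distance from the boundary; larger than h, so each stencil stays on one side

for label, s in [("inside ", s0 - d), ("outside", s0 + d)]:
    print(label, "f' =", fd_first(barQ_on_axis, s), " f'' =", fd_second(barQ_on_axis, s))
```

This kink in the second derivative is harmless for gradient-based methods, but it can slow down Newton-type methods near the boundary.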

In [None]:
def barphi_quadratic(x,y,mu):
    "Quadratic penalization of objective"
    pen = np.maximum(0,constraint(x,y))**2
    return phi(x,y) + 1/(2*mu)*pen

def plot_barphi_quadratic(mu):
    xx = np.linspace(-1.75, 1.75)
    X, Y = np.meshgrid(xx, xx)
    Z = barphi_quadratic(X,Y,mu)
    fig,ax = plt.subplots(1,1)
    cp = ax.contourf(X, Y, Z, 20)
    circle = plt.Circle((0, 0), np.sqrt(2), color='r', fill=False)
    fig.colorbar(cp)
    ax.add_artist(circle)
    ax.set_aspect('equal')
    return fig, ax

plot_barphi_quadratic(1)

For small $\mu$, the penalized objectives become more difficult to minimize: with equality constraints they develop narrow, steep valleys along the constraint set, and with inequality constraints steep slopes close to the minimizer. As we have seen, such ill-conditioned problems are challenging for unconstrained optimization methods. Hence, one often performs a continuation procedure as follows:

1. Choose $\mu$ relatively large and an initialization $x^0$
2. Solve the quadratically regularized optimization problem starting from $x^0$ to find solution $x^*_\mu$
3. If the constraint violation $\|c(x^*_\mu)\|$ is small, STOP. Otherwise, reduce $\mu$ (e.g., halve it), set the initialization $x^0:=x^*_\mu$, and go to step 2.
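A minimal sketch of this continuation procedure for the equality-constrained model problem, assuming SciPy's BFGS implementation (`scipy.optimize.minimize` with `method='BFGS'`) as the inner unconstrained solver. The tolerance `tol`, the initial $\mu$, the starting point and the helper names are illustrative choices, not prescribed values. The gradient passed to BFGS is $\nabla Q(x,\mu) = \nabla\phi(x) + \frac{c(x)}{\mu}\nabla c(x)$.

```python
import numpy as np
from scipy.optimize import minimize

def c(x):
    "Equality constraint of the model problem: c(x) = x1^2 + x2^2 - 2"
    return x[0]**2 + x[1]**2 - 2.0

def Q(x, mu):
    "Quadratic penalty objective Q(x, mu) = x1 + x2 + c(x)^2 / (2 mu)"
    return x[0] + x[1] + c(x)**2 / (2.0 * mu)

def grad_Q(x, mu):
    "Gradient of Q: grad(phi) + (c(x) / mu) * grad(c), with grad(c) = 2x"
    return np.ones(2) + (c(x) / mu) * 2.0 * x

def penalty_continuation(x0, mu=1.0, tol=1e-3, max_outer=30):
    """Steps 1-3: solve for fixed mu, stop if |c| < tol, otherwise halve mu and warm-start."""
    x = np.asarray(x0, dtype=float)
    history = []
    for _ in range(max_outer):
        # Step 2: unconstrained minimization of Q(., mu) starting from the current x
        x = minimize(Q, x, args=(mu,), jac=grad_Q, method='BFGS').x
        history.append((mu, x.copy(), c(x)))
        # Step 3: stop if the constraint violation is small, else reduce mu
        if abs(c(x)) < tol:
            break
        mu *= 0.5
    return x, history

x_star, history = penalty_continuation(x0=[-1.0, 0.0])
for mu, x, viol in history:
    print(f"mu = {mu:.2e}, x = {x}, c(x) = {viol:.2e}")
```

Each outer iteration is warm-started from the previous solution, so the inner solves start close to their minimizer even though $Q(\cdot,\mu)$ becomes increasingly ill-conditioned. For this problem the printed values of $c(x)$ should stay positive, i.e., the iterates approach the circle from outside.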


## Log barriers for inequality constraints

The solutions of quadratically regularized problems with inequality constraints are typically *infeasible*, i.e., they are outside the allowed region and only satisfy the inequality constraint in the limit $\mu\to 0$. Log barriers are an alternative that ensures that solutions (and also iterates) are always *feasible*, i.e., they satisfy the inequality constraint. The idea is to replace the inequality constraint $c(x)\le 0$ with a barrier term that is added to the objective as follows (components of $c(x)$ are denoted with index $i$):

$$
\min_{x\in \mathbb R^n} \bar P(x,\mu), \quad \text{where}\quad \bar P(x,\mu) := \phi(x) - \mu \sum_i \log(-c(x)_i).
$$

Note that $-\log(-c(x)_i)\to+\infty$ as $c(x)_i\to 0^-$. Thus, the barrier term blows up as we approach the boundary of the feasible region, and $\bar P(\cdot,\mu)$ is only defined in its interior, where $c(x)_i<0$ for all $i$.
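For the model problem, the barrier minimizers can be computed in closed form. On the open disk, $\bar P(\cdot,\mu)$ is strictly convex (a linear function plus $-\mu\log$ of the positive concave function $2-\|x\|^2$) and symmetric in $x_1,x_2$, so its unique minimizer lies on the diagonal, $x_\mu=(t,t)$ with $|t|<1$. Then

$$
\bar P((t,t),\mu) = 2t - \mu\log\left(2-2t^2\right), \qquad
\frac{d}{dt}\bar P((t,t),\mu) = 2 + \frac{4\mu t}{2-2t^2} = 0
\quad\Longleftrightarrow\quad t^2 - \mu t - 1 = 0 .
$$

The root with $|t|<1$ is $t_\mu = \frac{1}{2}\left(\mu-\sqrt{\mu^2+4}\right)$, and $c(x_\mu) = 2t_\mu^2-2 = 2\mu t_\mu<0$. Hence every $x_\mu$ is strictly feasible, and $x_\mu\to(-1,-1)$ with $c(x_\mu)\approx-2\mu$ as $\mu\to 0$.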

In [None]:
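def barphi_log(x,y,mu):
    "Log-barrier penalization of objective; NaN outside the feasible set"
    c = constraint(x,y)
    with np.errstate(divide='ignore', invalid='ignore'):
        pen = -np.log(-c)
    # The barrier only exists where c < 0; NaN values are masked in the contour plot
    pen = np.where(c < 0, pen, np.nan)
    return phi(x,y) + mu*pen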

def plot_barphi_log(mu):
    xx = np.linspace(-1.75, 1.75)
    X, Y = np.meshgrid(xx, xx)
    Z = barphi_log(X,Y,mu)
    fig,ax = plt.subplots(1,1)
    cp = ax.contourf(X, Y, Z, 20)
    circle = plt.Circle((0, 0), np.sqrt(2), color='r', fill=False)
    fig.colorbar(cp)
    ax.add_artist(circle)
    ax.set_aspect('equal')
    return fig, ax

plot_barphi_log(1)

When the constraint is active at the solution, the minimizer of $\bar P(\cdot,\mu)$ moves toward the boundary of the feasible region as $\mu$ decreases, and there the barrier term produces very large gradients. This makes the problem challenging for optimization algorithms such as steepest descent, Newton, or quasi-Newton methods. Thus, one usually decreases $\mu$ gradually, using a continuation procedure similar to the one sketched above for quadratic penalization. More sophisticated variants of the barrier method are known as interior point methods. Their name reflects that the iterates always stay in the interior of the feasible set; these algorithms provide a systematic way to decrease $\mu$ and to solve the optimization problem for each $\mu$ in a stable manner.
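A minimal sketch of a barrier continuation for the inequality-constrained model problem. As the inner solver it uses a damped Newton method with backtracking, an illustrative choice rather than the method of any particular library: trial points outside the disk are assigned $\bar P=+\infty$ and rejected, so all iterates remain strictly feasible. The helper name `barrier_newton`, the $\mu$ schedule and the tolerances are assumptions. The Newton step uses $\nabla\bar P = (1,1) + \frac{2\mu}{s}x$ and $\nabla^2\bar P = \frac{2\mu}{s}I + \frac{4\mu}{s^2}xx^T$ with $s = 2-\|x\|^2 = -c(x)$.

```python
import numpy as np

def barrier_newton(x, mu, tol=1e-12, max_iter=100):
    """Damped Newton method for the log-barrier objective of the model problem,
    P(x, mu) = x1 + x2 - mu * log(2 - |x|^2), started at a strictly feasible x."""
    def P(z):
        s = 2.0 - z @ z                      # s = -c(z); z is strictly feasible iff s > 0
        return np.inf if s <= 0 else z.sum() - mu * np.log(s)

    for _ in range(max_iter):
        s = 2.0 - x @ x
        grad = np.ones(2) + 2.0 * mu * x / s
        hess = (2.0 * mu / s) * np.eye(2) + (4.0 * mu / s**2) * np.outer(x, x)
        dx = -np.linalg.solve(hess, grad)
        decrement_sq = -grad @ dx            # squared Newton decrement
        if decrement_sq / 2.0 < tol:
            break
        # Backtracking with a sufficient-decrease test: infeasible trial points give
        # P = inf and are rejected, so every accepted iterate stays strictly feasible.
        step = 1.0
        while step > 1e-12 and P(x + step * dx) > P(x) - 0.25 * step * decrement_sq:
            step *= 0.5
        x = x + step * dx
    return x

x = np.zeros(2)                              # strictly feasible start: c(0) = -2 < 0
path = []
for mu in [1.0, 0.1, 0.01, 0.001]:
    x = barrier_newton(x, mu)                # warm start from the previous solution
    path.append((mu, x.copy(), x @ x - 2.0))
    print(f"mu = {mu:.0e}, x = {x}, c(x) = {x @ x - 2.0:.2e}")
```

In contrast to the quadratic penalty, each printed $c(x)$ should be negative, and $x$ should approach $(-1,-1)$ as $\mu$ decreases.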

## Forward/adjoint sensitivity analysis

See slides and Andrew's lab examples.




