# Lecture 8, Optimality conditions

We are still studying the full problem
$$
\begin{align}
\min \quad &f(x)\\
\text{s.t.} \quad & g_j(x) \geq 0\text{ for all }j=1,\ldots,J\\
& h_k(x) = 0\text{ for all }k=1,\ldots,K\\
&x\in \mathbb R^n.
\end{align}
$$

To identify which points are optimal, we want conditions analogous to the gradient condition that holds for unconstrained problems:

>If $x$ is a local optimum of a differentiable function $f$, then $\nabla f(x)=0$.

## KKT conditions



**Theorem (First order Karush-Kuhn-Tucker (KKT) Necessary Conditions)** 

Let $x^*$ be a local minimum for problem
$$
\begin{align}
\min \quad &f(x)\\
\text{s.t.} \quad & g_j(x) \geq 0\text{ for all }j=1,\ldots,J\\
& h_k(x) = 0\text{ for all }k=1,\ldots,K\\
&x\in \mathbb R^n.
\end{align}
$$

Let us assume that the objective and constraint functions are continuously differentiable at $x^*$ and that $x^*$ satisfies some regularity conditions (see e.g., https://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions#Regularity_conditions_.28or_constraint_qualifications.29 ). Then there exist Lagrange multiplier vectors $\mu^*=(\mu_1^*,\ldots,\mu_J^*)$ and $\lambda^* = (\lambda^*_1,\ldots,\lambda_K^*)$ (unique if, for example, the gradients of the active constraints are linearly independent) such that
$$
\begin{align}
&\nabla_xL(x^*,\lambda^*,\mu^*) = 0\\
&\mu_j^*\geq0,\text{ for all }j=1,\ldots,J\\
&\mu_j^*g_j(x^*)=0,\text{for all }j=1,\ldots,J,
\end{align}
$$
where $L$ is the *Lagrangian function*
$$
L(x,\lambda,\mu) = f(x)- \sum_{j=1}^J\mu_jg_j(x) -\sum_{k=1}^K\lambda_kh_k(x).
$$
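As a rough illustration, the three conditions of the theorem can be checked numerically once the gradients and constraint values at a candidate point are known. The sketch below is only that: the function name `kkt_check`, its arguments, and the tolerance are assumptions made for this example, and the small two-variable problem in the usage lines is hypothetical, not taken from the lecture.

```python
def kkt_check(grad_f, g_vals, g_grads, h_grads, mu, lam, tol=1e-9):
    """Check the KKT conditions at one point from precomputed values.

    grad_f  : gradient of f at the point (list of floats)
    g_vals  : values g_j(x) of the inequality constraints
    g_grads : gradients of the g_j (list of lists)
    h_grads : gradients of the h_k (list of lists)
    mu, lam : candidate multipliers
    """
    n = len(grad_f)
    # Stationarity: grad f - sum_j mu_j grad g_j - sum_k lam_k grad h_k = 0
    grad_L = list(grad_f)
    for m, gg in zip(mu, g_grads):
        for i in range(n):
            grad_L[i] -= m * gg[i]
    for l, hg in zip(lam, h_grads):
        for i in range(n):
            grad_L[i] -= l * hg[i]
    return {
        "stationarity": all(abs(c) <= tol for c in grad_L),
        "nonnegativity": all(m >= -tol for m in mu),
        "complementarity": all(abs(m * g) <= tol for m, g in zip(mu, g_vals)),
    }


# Hypothetical toy problem: min x1^2 + x2^2  s.t.  x1 + x2 - 2 >= 0, at x = (1, 1).
grad_f = [2.0, 2.0]      # gradient of f at (1, 1)
g_vals = [0.0]           # g_1(1, 1) = 0, so the constraint is active
g_grads = [[1.0, 1.0]]   # gradient of g_1

ok = kkt_check(grad_f, g_vals, g_grads, h_grads=[], mu=[2.0], lam=[])
bad = kkt_check(grad_f, g_vals, g_grads, h_grads=[], mu=[1.0], lam=[])
print(ok)
print(bad)
```

With $\mu_1=2$ all three checks pass, while $\mu_1=1$ breaks stationarity. Note that the function only verifies given multipliers; it does not search for them.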

## An example of constraint qualifications for inequality constraint problems
**Definition (regular point)**

A point $x^*\in S$ (where $S$ denotes the feasible set) is *regular* if the set of gradients of the active inequality constraints
$$
\{\nabla g_j(x^*) | \text{ constraint } j \text{ is active}\}
$$
is linearly independent.
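Regularity in this sense can be tested by comparing the rank of the matrix of active-constraint gradients with the number of those gradients. The following is a minimal sketch in plain Python, assuming the gradients are already available as lists; `rank` and `is_regular` are illustrative names, and the three linear constraints in the usage lines are hypothetical.

```python
def rank(rows, tol=1e-10):
    """Rank of a small matrix (list of rows) by Gaussian elimination."""
    m = [[float(v) for v in r] for r in rows]
    n_cols = len(m[0]) if m else 0
    rk = 0
    for col in range(n_cols):
        # Pick the row (from rk downwards) with the largest entry in this column.
        pivot = max(range(rk, len(m)), key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) <= tol:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rk][c]
        rk += 1
    return rk


def is_regular(active_gradients):
    """True if the active-constraint gradients are linearly independent."""
    return rank(active_gradients) == len(active_gradients)


# Hypothetical constraints in R^2, all active at the origin:
# g1 = x1, g2 = x2, g3 = x1 + x2.
two_active = [[1.0, 0.0], [0.0, 1.0]]
three_active = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(is_regular(two_active))
print(is_regular(three_active))
```

Three gradients in $\mathbb R^2$ can never be linearly independent, so the second case is not regular.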

KKT conditions were developed independently by 
* Karush: "Minima of Functions of Several Variables with Inequalities as Side Constraints". *M.Sc. Dissertation*, Dept. of Mathematics, Univ. of Chicago, 1939
* Kuhn & Tucker: "Nonlinear programming", In: *Proceedings of 2nd Berkeley Symposium*, pp. 481–492, 1951

The coefficients $\lambda$ and $\mu$ are called the KKT multipliers.

The first equality $\nabla_xL(x^*,\lambda^*,\mu^*) = 0$ is called the stationary rule, and the requirement $\mu_j^*g_j(x^*)=0,\text{ for all }j=1,\ldots,J$ is called the complementary rule.

## Example

Consider the optimization problem
$$
\begin{align}
\min &\qquad (x_1^2+x^2_2+x^2_3)\\
\text{s.t.}&\qquad -3+x_1+x_2+x_3\geq 0.
\end{align}
$$
Let us verify the KKT necessary conditions for the local optimum $x^*=(1,1,1)$.

We can see that
$$
L(x,\lambda,\mu) = (x_1^2+x_2^2+x_3^2)+\mu_1(3-x_1-x_2-x_3)
$$
and thus
$$
\nabla_x L(x,\lambda,\mu) = (2x_1-\mu_1,2x_2-\mu_1,2x_3-\mu_1)
$$
and if $\nabla_x L((1,1,1),\lambda,\mu)=0$, then
$$
2-\mu_1=0,
$$
which holds when
$$
\mu_1=2.
$$
In addition to this, we can see that $\mu_1=2\geq 0$ and $-3+x^*_1+x^*_2+x^*_3= 0$. Thus, the complementary rule holds even though $\mu_1\neq 0$.
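The hand computation can be cross-checked numerically. The sketch below approximates $\nabla_x L$ at $x^*=(1,1,1)$ with central differences instead of the analytic gradient; the helper names and the step size `1e-6` are choices made for this illustration only.

```python
def lagrangian(x, mu1):
    """L(x, mu) = f(x) - mu1 * g(x) for the example problem."""
    f = sum(xi ** 2 for xi in x)
    g = -3.0 + sum(x)
    return f - mu1 * g


def numerical_gradient(fun, x, step=1e-6):
    """Central-difference approximation of the gradient of fun at x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += step
        xm[i] -= step
        grad.append((fun(xp) - fun(xm)) / (2.0 * step))
    return grad


x_star = [1.0, 1.0, 1.0]
mu1 = 2.0

grad_L = numerical_gradient(lambda x: lagrangian(x, mu1), x_star)
g_value = -3.0 + sum(x_star)

print(grad_L)           # each component should be close to 0
print(mu1 * g_value)    # complementarity product
```

Each component of `grad_L` comes out at zero up to rounding error, and the product $\mu_1 g_1(x^*)$ is zero because the constraint is active.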

## Example 2

Let us check the KKT conditions for a point that is not a local optimum. Let us have $x^*=(0,1,1)$.

We can easily see that in this case, the stationary rule gives the conditions

$$\left\{
\begin{array}{c}
-\mu_1 = 0\\
2-\mu_1=0
\end{array}
\right.
$$
Clearly, there does not exist a $\mu_1\in \mathbb R$ such that these equalities would hold. Note that this point is also infeasible, since $-3+0+1+1=-1<0$.
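One way to see numerically that no multiplier works is to pick the $\mu_1$ that comes closest to satisfying stationarity in the least-squares sense and look at what is left over. This is only an illustrative sketch; the least-squares fit is a device chosen here, not part of the KKT conditions themselves.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


x = [0.0, 1.0, 1.0]
grad_f = [2.0 * xi for xi in x]   # gradient of f(x) = x1^2 + x2^2 + x3^2
grad_g = [1.0, 1.0, 1.0]          # gradient of g(x) = -3 + x1 + x2 + x3

# mu1 minimizing || grad_f - mu1 * grad_g ||^2
mu_ls = dot(grad_f, grad_g) / dot(grad_g, grad_g)

residual = [gf - mu_ls * gg for gf, gg in zip(grad_f, grad_g)]
residual_norm = dot(residual, residual) ** 0.5

g_value = -3.0 + sum(x)           # constraint value at the point

print(mu_ls, residual_norm, g_value)
```

Even the best-fitting multiplier leaves a residual well away from zero, so the stationary rule cannot be satisfied at this point.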

## Example 3

Let us check the KKT conditions for another point that is not a local optimum. Let us have $x^*=(2,2,2)$.

We can easily see that in this case, the stationary rule gives the condition
$$
4-\mu_1 = 0
$$
Now, $\mu_1=4$ satisfies this equation. However, now
$$
\mu_1(-3+x^*_1+x^*_2+x^*_3)=4(-3+6) = 12 \neq 0.
$$
Thus, the complementary rule fails and the KKT conditions do not hold.
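The failure of the complementary rule has a concrete meaning here: the constraint is inactive at $(2,2,2)$, so there is room to move in the direction $-\nabla f$ without leaving the feasible set. A small sketch, where the step length `0.5` is an arbitrary choice made for illustration:

```python
def f(x):
    return sum(xi ** 2 for xi in x)


def g(x):
    return -3.0 + sum(x)


x = [2.0, 2.0, 2.0]
mu1 = 4.0                               # the multiplier forced by stationarity

complementarity_product = mu1 * g(x)    # nonzero because g(x) = 3 > 0

# Move along the normalized direction (-1, -1, -1), i.e. along -grad f.
step = 0.5
x_new = [xi - step for xi in x]

print(complementarity_product)
print(f(x), f(x_new), g(x_new))
```

The new point is still feasible and has a smaller objective value, which confirms that $(2,2,2)$ is not a local optimum.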

## Geometric interpretation of the KKT conditions

## Stationary rule

The stationary rule can be written as follows: there exist $\lambda'$ and $\mu$ such that
$$
-\nabla f(x) = -\sum_{j=1}^J\mu_j\nabla g_j(x) + \sum_{k=1}^K\lambda'_k\nabla h_k(x).
$$
Notice that the sign of the equality multipliers has been flipped here: $\lambda'=-\lambda$.

Now, remember that for any differentiable function $v$, the negative gradient $-\nabla v(x)$ gives the direction in which $v$ decreases fastest.

Thus, the above equation means that the direction of reduction of the objective function, $-\nabla f(x)$, is countered by the directions of reduction of the inequality constraints, $-\nabla g_j(x)$, and the directions of either growth or reduction (since $\lambda'$ can be negative) of the equality constraints, $\nabla h_k(x)$.

**This means that the function cannot get reduced without reducing the inequality constraints (making the solution infeasible, if already at the bound), or increasing or decreasing the equality constraints (making, thus, the solution again infeasible).**
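This statement can be probed numerically on the example problem above at $x^*=(1,1,1)$, where $\nabla f(x^*)=(2,2,2)$ and $\nabla g_1(x^*)=(1,1,1)$. The sketch below samples random directions and counts how many keep the constraint satisfied while also decreasing $f$ to first order; the sample size, the seed, and the sampling box are arbitrary choices for this illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

grad_f = [2.0, 2.0, 2.0]   # gradient of f(x) = x1^2 + x2^2 + x3^2 at (1, 1, 1)
grad_g = [1.0, 1.0, 1.0]   # gradient of g(x) = -3 + x1 + x2 + x3


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


feasible_directions = 0
violations = 0
for _ in range(10000):
    d = [random.uniform(-1.0, 1.0) for _ in range(3)]
    if dot(grad_g, d) >= 0.0:        # moving along d does not decrease g
        feasible_directions += 1
        if dot(grad_f, d) < 0.0:     # d would decrease f to first order
            violations += 1

print(feasible_directions, violations)
```

No sampled direction both keeps the active constraint satisfied and reduces $f$, which is what the stationary rule with $\mu_1=2\geq 0$ predicts.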



#### With just one inequality constraint this means that the negative gradients of $f$ and $g$ must point in the same direction.

![alt text](images/KKT_inequality_constraints.svg "KKT with inequality constraint")

#### With equality constraints this means that the negative gradient of the function and the gradient of the equality constraint must point either in the same or in opposite directions

![alt text](images/KKT_equality_constraints.svg "KKT with equality constraint")

## Complementary conditions
Another way of stating the complementary condition
$$
\mu_jg_j(x) = 0 \text{ for all } j=1,\ldots,J
$$
is to say that $\mu_j$ and $g_j(x)$ cannot both be positive at the same time. In particular, if $\mu_j>0$, then $g_j(x)=0$.

**This means that if we want to use the gradient of a constraint for countering the reduction of the function, then the constraint must be at the border.**