## Lagrange Multipliers

**Lagrange Multipliers**, or **Undetermined Multipliers**, are used to find the stationary points of a function of several variables subject to one or more constraints.

Consider the problem of finding the maximum of a function $f(x_1, x_2)$ subject to a constraint relating $x_1$ and $x_2$, written in the form:

$$
    g(x_1, x_2) = 0
$$
A common approach is to solve the constraint $g(x_1, x_2) = 0$ for one variable, expressing $x_2$ as a function of $x_1$ in the form $x_2 = h(x_1)$. Substituting this into $f(x_1, x_2)$ gives a function of $x_1$ alone, $f(x_1, h(x_1))$. Its maximum with respect to $x_1$ can then be found in the usual way, by differentiation, giving the stationary value $x_1^*$, with the corresponding value of $x_2$ given by $x_2^* = h(x_1^*)$.
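
As a minimal sketch of this substitution approach, consider an illustrative problem that is not part of the text above: maximize $f(x_1, x_2) = 1 - x_1^2 - x_2^2$ subject to $g(x_1, x_2) = x_1 + x_2 - 1 = 0$. Here the constraint can be solved by hand as $x_2 = h(x_1) = 1 - x_1$. The function names and the golden-section search used to maximize the reduced one-variable function are assumptions made for the example.

```python
# Illustrative problem (an assumption, not from the text):
#   maximize f(x1, x2) = 1 - x1^2 - x2^2  subject to  x1 + x2 - 1 = 0.

def f(x1, x2):
    return 1.0 - x1**2 - x2**2

def h(x1):
    # Constraint x1 + x2 - 1 = 0 solved for x2.
    return 1.0 - x1

def reduced(x1):
    # One-variable objective f(x1, h(x1)) after eliminating x2.
    return f(x1, h(x1))

def golden_section_max(func, lo, hi, tol=1e-8):
    """Locate the maximizer of a unimodal function on [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) > func(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2

x1_star = golden_section_max(reduced, -2.0, 2.0)
x2_star = h(x1_star)
print(x1_star, x2_star)
```

For this problem the reduced function is $2x_1 - 2x_1^2$, so differentiating by hand gives $x_1^* = x_2^* = 1/2$, which the numerical search should reproduce to within its tolerance.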

However, the constraint cannot always be solved analytically for $h$, and even when it can, the derivative of $f(x_1, h(x_1))$ with respect to $x_1$ may be so complicated that only approximate numerical solutions are available. To overcome these limitations, Lagrange proposed a different approach for finding the stationary points of $f(x_1, x_2)$ under the constraint.

First, consider a $D$-dimensional variable $x = (x_1, x_2, \ldots, x_D)^T$. The constraint $g(x) = 0$ then defines a $(D-1)$-dimensional surface in the space of $x$, which we denote by $(D)$. Take a point $x$ on this surface and a nearby point $x + \epsilon$ that also lies on it. A first-order Taylor expansion of $g$ around $x$ gives:

$$
    g(x + \epsilon) \approx g(x) + \epsilon^T \nabla g(x)
$$
Because both $x$ and $x + \epsilon$ lie on the constraint surface $(D)$, we have $g(x) = g(x + \epsilon)$, and hence $\epsilon^T \nabla g(x) \approx 0$. In the limit $\|\epsilon\| \to 0$, the vector $\epsilon$ becomes tangent to $(D)$ at $x$, so $\nabla g(x) \perp \epsilon$ for every such tangent direction. In other words, $\nabla g(x) \perp (D)$: the gradient of the constraint function is normal to the constraint surface.
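
The orthogonality argument can be checked numerically. The sketch below assumes an illustrative constraint $g(x) = x_1^2 + x_2^2 - 1$ (the unit circle, chosen only for this example), takes a small step along the tangent at a point on the circle, and compares the first-order term $\epsilon^T \nabla g(x)$ with the actual change in $g$.

```python
import math

# Illustrative constraint (an assumption): the unit circle g(x) = 0.
def g(x):
    return x[0]**2 + x[1]**2 - 1.0

def grad_g(x):
    return (2.0 * x[0], 2.0 * x[1])

theta = 0.7                                      # arbitrary angle
x = (math.cos(theta), math.sin(theta))           # point on the circle
tangent = (-math.sin(theta), math.cos(theta))    # unit tangent at x

step = 1e-4
eps = (step * tangent[0], step * tangent[1])

# First-order term of the Taylor expansion: eps^T grad g(x).
first_order = sum(e * d for e, d in zip(eps, grad_g(x)))

# Actual change in g after the tangent step; only second-order terms remain.
x_moved = (x[0] + eps[0], x[1] + eps[1])
change = g(x_moved) - g(x)

print(first_order, change)
```

The first-order term vanishes up to rounding, while the actual change is of order `step**2`, which is the error one would expect from truncating the Taylor expansion after the linear term.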

Our objective is to find the point $x^*$ on the constraint surface $(D)$ at which $f(x)$ is maximized. A first-order Taylor expansion of $f$ around $x^*$ gives

$$
    f(x^* + \epsilon) \approx f(x^*) + \epsilon^T \nabla f(x^*)
$$
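Since $x^*$ maximizes $f$ on the surface, $f$ cannot change to first order along any direction $\epsilon$ tangent to $(D)$, so $\epsilon^T \nabla f(x^*) = 0$ for all such $\epsilon$. This means that $\nabla f(x^*) \perp (D)$ as well. Both gradients are therefore normal to the surface at $x^*$, so $\nabla f \parallel \nabla g$, and there is a scalar $\lambda$ such that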

$$
    \nabla f + \lambda \nabla g = 0
$$
This $\lambda$ is called the *Lagrange multiplier*.

For ease of notation, we now define the *Lagrange function* (also called the *Lagrangian*):

$$
    L(x, \lambda) \equiv f(x) + \lambda g(x)
$$
Setting the partial derivatives of $L(x, \lambda)$ with respect to (w.r.t.) $x$ to zero, $\frac{\partial}{\partial x} L(x, \lambda) = 0$, recovers $\nabla f(x) + \lambda \nabla g(x) = 0$, and setting $\frac{\partial}{\partial \lambda} L(x, \lambda) = 0$ recovers the constraint $g(x) = 0$. In other words, a stationary point $(x^*, \lambda^*)$ of the Lagrange function $L(x, \lambda)$ gives a point $x^*$ that is a constrained stationary point of $f(x)$ and also satisfies the constraint $g(x) = 0$. The constrained maximum of $f$ is among these points.
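
To make the stationarity conditions concrete, the sketch below uses an illustrative problem that is assumed for the example: $f(x_1, x_2) = 1 - x_1^2 - x_2^2$ with $g(x_1, x_2) = x_1 + x_2 - 1$. Solving $-2x_1 + \lambda = 0$, $-2x_2 + \lambda = 0$ and $x_1 + x_2 - 1 = 0$ by hand gives $x_1 = x_2 = 1/2$ and $\lambda = 1$. The code checks this candidate by estimating all partial derivatives of $L$ with central differences; the helper `numerical_gradient` is a hypothetical name.

```python
# Illustrative problem (an assumption, not from the text).
def f(x1, x2):
    return 1.0 - x1**2 - x2**2

def g(x1, x2):
    return x1 + x2 - 1.0

def lagrangian(x1, x2, lam):
    # L(x, lambda) = f(x) + lambda * g(x)
    return f(x1, x2) + lam * g(x1, x2)

def numerical_gradient(func, point, h=1e-6):
    """Central-difference estimate of the gradient of func at point."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grad.append((func(*up) - func(*down)) / (2 * h))
    return grad

# Candidate (x1, x2, lambda) obtained by solving the conditions by hand.
candidate = (0.5, 0.5, 1.0)
grad_L = numerical_gradient(lagrangian, candidate)
print(grad_L)
```

All three components of `grad_L` should vanish up to rounding. The last component is the derivative w.r.t. $\lambda$, which equals $g(x)$ and so confirms that the candidate lies on the constraint surface.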

So far we have discussed the equality constraint $g(x) = 0$. We next consider maximizing $f(x)$ subject to the inequality constraint $g(x) \ge 0$. There are now two kinds of solution: (1) the constrained stationary point lies on the boundary $g(x) = 0$, or (2) it lies in the interior region $g(x) > 0$. In the first case the constraint is said to be *active*, while in the second case it is said to be *inactive*.

In the active case, the situation is analogous to the equality constraint discussed above, and the solution corresponds to a stationary point of the Lagrange function with $\lambda \neq 0$. Now, however, the sign of the Lagrange multiplier matters. The gradient $\nabla g(x)$ points into the feasible region $g(x) > 0$, and $f(x)$ can only be at a maximum on the boundary if $\nabla f(x)$ points away from that region; otherwise, moving into the interior would increase $f$. Accordingly, we have $\nabla f(x) = -\lambda \nabla g(x)$ for some $\lambda > 0$.

In the inactive case, the function $g(x)$ plays no role, so the stationary condition is simply $\nabla f(x) = 0$, which corresponds to a stationary point of the Lagrange function with $\lambda = 0$. In either case, we have $\lambda g(x) = 0$. Thus the problem of maximizing $f(x)$ subject to $g(x) \ge 0$ is solved by optimizing the Lagrange function $L(x, \lambda)$ with respect to $x$ and $\lambda$, subject to the following conditions

\begin{align}
    g(x) & \ge 0 \\
    \lambda & \ge 0 \\
    \lambda g(x) & = 0
\end{align}
These are known as the **Karush-Kuhn-Tucker (KKT)** conditions.
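
The following sketch checks these conditions, together with the stationarity condition $\nabla f(x) + \lambda \nabla g(x) = 0$, for an illustrative objective $f(x_1, x_2) = 1 - x_1^2 - x_2^2$ under two assumed constraints: one that is active at the solution and one that is inactive. The function `kkt_holds` and its tolerance are hypothetical helpers written for this example.

```python
# Illustrative objective (an assumption): f(x) = 1 - x1^2 - x2^2.
def grad_f(x):
    return (-2.0 * x[0], -2.0 * x[1])

# Constraint A: x1 + x2 - 1 >= 0 (excludes the unconstrained maximum).
def g_a(x):
    return x[0] + x[1] - 1.0

def grad_g_a(x):
    return (1.0, 1.0)

# Constraint B: 1 - x1 - x2 >= 0 (contains the unconstrained maximum).
def g_b(x):
    return 1.0 - x[0] - x[1]

def grad_g_b(x):
    return (-1.0, -1.0)

def kkt_holds(x, lam, g, grad_g, tol=1e-9):
    """Check stationarity of L = f + lam * g plus the three KKT conditions."""
    stationary = all(abs(df + lam * dg) <= tol
                     for df, dg in zip(grad_f(x), grad_g(x)))
    feasible = g(x) >= -tol                  # g(x) >= 0
    sign_ok = lam >= -tol                    # lambda >= 0
    complementary = abs(lam * g(x)) <= tol   # lambda * g(x) = 0
    return stationary and feasible and sign_ok and complementary

active = kkt_holds((0.5, 0.5), 1.0, g_a, grad_g_a)      # boundary, lambda > 0
inactive = kkt_holds((0.0, 0.0), 0.0, g_b, grad_g_b)    # interior, lambda = 0
wrong_sign = kkt_holds((0.5, 0.5), -1.0, g_b, grad_g_b) # lambda < 0
print(active, inactive, wrong_sign)
```

The third call shows why the sign condition matters: under constraint B the boundary point $(1/2, 1/2)$ is stationary for the Lagrange function, but only with $\lambda = -1$, so it is rejected and the maximum is the interior point at the origin.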

If we instead wish to minimize the function $f(x)$ subject to the constraint $g(x) \ge 0$, then we minimize the Lagrange function

$$
    L(x, \lambda) = f(x) - \lambda g(x)
$$
with respect to $x$, and again $\lambda \ge 0$.
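
As a rough illustration of the minimization form, assume the example problem of minimizing $f(x_1, x_2) = x_1^2 + x_2^2$ subject to $g(x_1, x_2) = x_1 + x_2 - 1 \ge 0$. Setting the gradient of $L = f - \lambda g$ to zero with the constraint active gives $x_1 = x_2 = 1/2$ and $\lambda = 1 \ge 0$. The sketch verifies the stationarity condition at this candidate and compares it with a brute-force search over a coarse grid of feasible points; the grid range and spacing are arbitrary choices.

```python
# Illustrative problem (an assumption, not from the text):
#   minimize f(x1, x2) = x1^2 + x2^2  subject to  x1 + x2 - 1 >= 0.

def f(x1, x2):
    return x1**2 + x2**2

def g(x1, x2):
    return x1 + x2 - 1.0

# Candidate from solving grad f - lambda * grad g = 0 with g = 0 by hand.
x_star, lam_star = (0.5, 0.5), 1.0

# grad f = (2*x1, 2*x2) and grad g = (1, 1), so each component is 2*xi - lambda.
stationarity = [2.0 * xi - lam_star * 1.0 for xi in x_star]

# Independent check: brute-force minimum of f over feasible grid points.
grid = [i / 20 for i in range(-40, 41)]
feasible = [(a, b) for a in grid for b in grid if g(a, b) >= -1e-12]
best = min(feasible, key=lambda p: f(*p))

print(stationarity, best)
```

The grid search only confirms the candidate up to the grid resolution; it is included as a sanity check rather than as a practical solution method.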