# Updated translation and solution method

Ian's most recent set of notes partitions the vector of variables $x$ into three parts:

* $x_1\in\mathbb{R}^{N_rT}$ contains wind deviations

* $x_2\in\mathbb{R}^{(N+1)T}$ contains angle and mismatch variables

* $x_3\in\mathbb{R}^T$ contains angle difference variables involved in line temperature calculation

With this notation, the problem becomes

\begin{align}
&& \min~ & x_1^\top Q_x x_1 \\
s.t. && Ax &= b \\
&& x_3^\top x_3 &= c
\end{align}

where $x=[x_1~x_2~x_3]^\top$.

## Translation

As always, the first step in solving our problem is translation. We need to go from $Ax=b$ to $Ay=0$, where $y=x-x^*$ and $x^*$ is the translation. It is easy to find and translate by a point in the set $\{x:Ax=b\}$, but we have an additional requirement: We don't want to introduce any linear term to the norm constraint.

\begin{align}
&& \min~ & x_3^\top x_3 \\
s.t. && A_1x_1 + A_2x_2 + A_3x_3 &= b
\end{align}

How do we partition $A$?

* $A_1$ contains columns of $A$ corresponding to wind deviations. The indices of these columns are the indices for which $Q_{obj}$ is nonzero.

* $A_2$ contains columns corresponding to angles and mismatches. These are all the columns remaining after $A_1$ and $A_3$ are removed.

* $A_3$ contains columns corresponding to angle difference variables. This is just the last $T$ columns of $A$.

Once $A$ is partitioned, we can find $x_1^*$, $x_2^*$, and $x_3^*$. To avoid messing up the norm, we need $x_3^*=0$. With that constraint satisfied, we just need $A_1x_1^* + A_2x_2^* = b$, so we can obtain min-norm $x_1^*$ and $x_2^*$ subject to this constraint.
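The min-norm translation point described above can be sketched with NumPy's pseudoinverse. This is an illustrative sketch, not the notes' implementation: the function name `translation_point` and the assumption that the columns of $A$ are already ordered as $[A_1~A_2~A_3]$ are both made up for the example.

```python
import numpy as np

def translation_point(A, b, n1, n3):
    """Return (x1*, x2*, x3*) with A x* = b, x3* = 0, and [x1*; x2*] of minimum norm.

    Assumes (for this sketch only) that the columns of A are ordered
    [A1 | A2 | A3], with n1 columns in A1 and n3 columns in A3.
    """
    n = A.shape[1]
    A12 = A[:, : n - n3]               # dropping the A3 columns forces x3* = 0
    x12 = np.linalg.pinv(A12) @ b      # minimum-norm least-squares solution
    if not np.allclose(A12 @ x12, b):
        raise ValueError("A1 x1 + A2 x2 = b has no solution with x3 = 0")
    x3 = np.zeros(n3)
    return x12[:n1], x12[n1:], x3
```

The explicit feasibility check matters because the pseudoinverse returns a least-squares answer even when $A_1x_1 + A_2x_2 = b$ cannot be satisfied exactly.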

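The translation itself is a change of variables from $x$ to $y = x - x^*$. Substituting $x_1 = y_1 + x_1^*$ into the objective (with $Q_x$ symmetric) and dropping the constant term ${x_1^*}^\top Q_x x_1^*$ gives the translated problem. Note that $y_3 = x_3$ because $x_3^* = 0$, so the norm constraint is unchanged.

\begin{align}
&& \min~ & y_1^\top Q_x y_1 + 2 y_1^\top Q_x x_1^* \\
s.t. && Ay &= 0 \\
&& y_3^\top y_3 &= c
\end{align}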

## Kernel mapping

Now that the problem has been translated, we replace $y$ by $Nz$, where the columns of $N=[v_1~v_2~\ldots~v_k]\in\mathbb{R}^{n\times k}$ form a basis for the $k$-dimensional null space of $A$. This is somewhat similar to a rotation, but it reduces the problem dimension from $n$ variables to $k$ variables.

From Ian's notes:

\begin{align}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
&= \begin{bmatrix} N_1 \\ N_2 \\ N_3 \end{bmatrix} z
\end{align}

* The objective becomes $\min z^\top \hat{Q}_{obj} z + 2z^\top N_1^\top Q_x x_1^*$, where $\hat{Q}_{obj} = N_1^\top Q_x N_1$

* The temperature constraint becomes $z^\top N_3^\top N_3 z = c$.
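The kernel mapping can be sketched as follows. The helper names `null_space_basis` and `kernel_map` are hypothetical, the null space is taken from the SVD (one of several reasonable choices), and $y_1$ and $y_3$ are assumed to occupy the first `n1` and last `n3` entries of $y$.

```python
import numpy as np

def null_space_basis(A, tol=1e-10):
    """Orthonormal basis N (as columns) for {y : A y = 0}, from the SVD of A."""
    _, s, Vt = np.linalg.svd(A)        # full_matrices=True, so Vt is n x n
    rank = int(np.sum(s > tol * s.max())) if s.size else 0
    return Vt[rank:].T                 # right singular vectors beyond the rank

def kernel_map(A, Q_x, x1_star, n1, n3):
    """Return (Q_hat, lin, N3) so that the objective is z'Q_hat z + z'lin
    and the temperature constraint is z' N3' N3 z = c."""
    N = null_space_basis(A)
    n = A.shape[1]
    N1, N3 = N[:n1], N[n - n3:]        # row blocks matching y1 and y3
    Q_hat = N1.T @ Q_x @ N1
    lin = 2 * N1.T @ Q_x @ x1_star     # linear term carried over from the translation
    return Q_hat, lin, N3
```

Any basis of the null space works here; an orthonormal one is convenient because it keeps the eigenvalues of $N_3^\top N_3$ between 0 and 1.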

## Constraint Eigendecomposition and rotation

After kernel mapping, the constraint quadratic is no longer diagonal. We can fix this by performing an Eigendecomposition $N_3^\top N_3 = UDU^\top$ and letting $\hat{z} = U^\top z$ so that

\begin{align}
z^\top N_3^\top N_3 z &= \hat{z}^\top D\hat{z}
\end{align}

$D$ has at most $T$ nonzero elements, because $\mathrm{rank}(N_3) \leq T$ by virtue of its dimension. It will look like this:

\begin{align}
D &= \begin{bmatrix} 0 & 0 \\ 0 & \hat{D} \end{bmatrix}
\end{align}

Now the constraint quadratic is diagonal, but we really need it to be a norm constraint. We can effect this change by another change of variables, this time from $\hat{z}$ to $w$.

Let $w = [w_1~w_2]^\top$ and $\hat{z}=[\hat{z}_1~\hat{z}_2]^\top$, and relate them as follows:

\begin{align}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} &=
\begin{bmatrix} I & 0 \\ 0 & \hat{D}^{1/2} \end{bmatrix}
\begin{bmatrix} \hat{z}_1 \\ \hat{z}_2 \end{bmatrix} \\
\implies w &= K\hat{z} = KU^\top z
\end{align}

Then we can rewrite the constraint in terms of $w$:

\begin{align}
\hat{z}^\top D\hat{z} &= \hat{z}_2^\top\hat{D}^{1/2}\hat{D}^{1/2}\hat{z}_2 \\
&= w_2^\top w_2
\end{align}

Note that $z = UK^{-1}w$ because $UU^\top = I$. Changing from $z$ to $w$ amounts to rotating by $U^\top$ and then scaling by $K$, that is, multiplying by $KU^\top = (UK)^\top$.

Of course, this change of variables also influences the cost function:

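$$ z^\top \hat{Q}_{obj}z + 2z^\top N_1^\top Q_x x_1^* = w^\top K^{-1}U^\top \hat{Q}_{obj}UK^{-1}w + 2w^\top K^{-1}U^\top N_1^\top Q_x x_1^* = w^\top Bw + w^\top b$$

where $B = K^{-1}U^\top \hat{Q}_{obj}UK^{-1}$ and $b = 2K^{-1}U^\top N_1^\top Q_x x_1^*$.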

Thus, the optimization problem becomes

\begin{align}
&& \min~ w^\top Bw + w^\top b \\
s.t. && w_2^\top w_2 &= c
\end{align}
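The rotation and scaling of this section can be sketched numerically as below. The function name `rotate_and_scale` and the absolute tolerance used to decide which eigenvalues count as zero are assumptions of this sketch. It relies on `np.linalg.eigh` returning eigenvalues in ascending order, which places the zero block first, matching the layout of $D$ above.

```python
import numpy as np

def rotate_and_scale(N3, Q_hat, lin, tol=1e-10):
    """Change variables z -> w = K U' z so the constraint reads w2'w2 = c.

    Returns (B, b, M, k0): the objective is w'Bw + w'b, z = M w with
    M = U K^{-1}, and w2 = w[k0:].
    """
    evals, U = np.linalg.eigh(N3.T @ N3)   # ascending order: zeros come first
    k0 = int(np.sum(evals <= tol))         # count of numerically zero eigenvalues
    scale = np.ones_like(evals)
    scale[k0:] = np.sqrt(evals[k0:])       # diagonal of K = diag(I, D_hat^{1/2})
    M = U / scale                          # divides column j of U by scale[j]
    B = M.T @ Q_hat @ M
    b = M.T @ lin
    return B, b, M, k0
```

Because $K$ is diagonal, $K^{-1}$ is applied by broadcasting rather than by forming and inverting a matrix.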

The purpose of this section was to change variables to obtain a norm constraint. The next section eliminates $w_1$ using the KKT conditions of the above problem. This will allow us to write the objective in terms of $w_2$ only.

## Eliminating $w_1$

Note that $w_1$ is unconstrained. For a fixed $w_2$, we can use the KKT conditions to find $w_1$ such that the objective is minimized. Begin by expanding the objective:

\begin{align*} f(w_1,w_2) &=
\begin{bmatrix} w_1^\top & w_2^\top \end{bmatrix}
\begin{bmatrix} B_{11} & B_{12} \\ B_{12}^\top & B_{22}\end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} + 
\begin{bmatrix} w_1^\top & w_2^\top \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2\end{bmatrix} \\
&=
w_1^\top B_{11}w_1 + 2w_1^\top B_{12}w_2 + w_2^\top B_{22}w_2 + w_1^\top b_1 + w_2^\top b_2
\end{align*}

Now set the partial derivative with respect to $w_1$ equal to zero:

\begin{align}
\nonumber \frac{\partial f}{\partial w_1} = 2w_1^\top B_{11} + 2w_2^\top B_{12}^\top + b_1^\top &= 0 \\
\nonumber \iff B_{11}w_1 + B_{12}w_2 + \frac{1}{2}b_1 &= 0 \\
\iff w_1 &= -B_{11}^{-1}\left(B_{12}w_2 + \frac{1}{2}b_1 \right)
\end{align}

After substituting this expression for $w_1$ into the objective and simplifying, we obtain a new objective in terms of $w_2$ only:

\begin{align*}
f(w_2) &= w_2^\top\left(B_{22} - B_{12}^\top B_{11}^{-1} B_{12}\right)w_2 + 
w_2^\top (b_2 - B_{12}^\top B_{11}^{-1}b_1) - \frac{1}{4}b_1^\top B_{11}^{-1}b_1
\end{align*}

Note that the last term is constant and therefore plays no role in the minimization. Thus, the optimization problem in terms of $w_2$ is:

\begin{align*}
&& \min~ w_2^\top \hat{B}w_2 + w_2^\top \hat{b} \\
s.t. && w_2^\top w_2 &= c \\ \nonumber\\
\text{where} && \hat{B} = B_{22} - B_{12}^\top B_{11}^{-1}B_{12} &\text{ and }\hat{b} = b_2 - B_{12}^\top B_{11}^{-1}b_1
\end{align*}
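The elimination of $w_1$ is a Schur complement, which can be checked numerically with a short sketch. The function name `eliminate_w1` is hypothetical, and the sketch assumes $B$ is symmetric with $B_{11}$ positive definite so that the stationary point in $w_1$ is a minimizer.

```python
import numpy as np

def eliminate_w1(B, b, k0):
    """Reduce min over (w1, w2) of w'Bw + w'b to an objective in w2 alone.

    Assumes B is symmetric and B11 = B[:k0, :k0] is positive definite.
    Returns (B_hat, b_hat, const, recover_w1).
    """
    B11, B12, B22 = B[:k0, :k0], B[:k0, k0:], B[k0:, k0:]
    b1, b2 = b[:k0], b[k0:]
    B_hat = B22 - B12.T @ np.linalg.solve(B11, B12)
    b_hat = b2 - B12.T @ np.linalg.solve(B11, b1)
    const = -0.25 * b1 @ np.linalg.solve(B11, b1)   # constant term of f(w2)

    def recover_w1(w2):
        # Stationarity in w1: w1 = -B11^{-1} (B12 w2 + b1 / 2)
        return -np.linalg.solve(B11, B12 @ w2 + 0.5 * b1)

    return B_hat, b_hat, const, recover_w1
```

Using `np.linalg.solve` avoids forming $B_{11}^{-1}$ explicitly, and `recover_w1` gives back the full $w$ once the reduced problem in $w_2$ has been solved.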

Now we are minimizing a quadratic objective subject to a norm constraint. Note that $w_2$ has at most $T$ elements. Next we will diagonalize the objective matrix $\hat{B}$.

## Diagonalizing the objective matrix

Is this necessary? $\hat{B}$ is already diagonal for the RTS-96...

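Let the Eigendecomposition of $\hat{B}$ be given by $\hat{B} = \hat{U}\hat{D}\hat{U}^\top$ and perform a change of variables to $\hat{w}_2 = \hat{U}^\top w_2$. (From here on, $\hat{D}$ denotes the diagonal matrix of eigenvalues of $\hat{B}$, not the nonzero block of $D$ from the constraint decomposition.) Then the problem becomes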

\begin{align*}
&& \min~ \hat{w}_2^\top \hat{D}\hat{w}_2 + \hat{w}_2^\top \hat{d} \\
s.t. && \hat{w}_2^\top \hat{w}_2 &= c
\end{align*}

where $\hat{D}$ is a diagonal matrix and $\hat{d} = \hat{U}^\top\hat{b}$. To solve this optimization problem, we write the KKT conditions.
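A minimal sketch of this diagonalization, assuming $\hat{B}$ is symmetric (the function name `diagonalize_objective` is made up for illustration):

```python
import numpy as np

def diagonalize_objective(B_hat, b_hat):
    """Return (evals, d_hat, U_hat) with B_hat = U_hat diag(evals) U_hat'.

    In the rotated variable w2_hat = U_hat' w2 the objective is
    sum_i evals[i] * w2_hat[i]**2 + w2_hat @ d_hat.
    """
    evals, U_hat = np.linalg.eigh(B_hat)   # valid for symmetric B_hat only
    d_hat = U_hat.T @ b_hat
    return evals, d_hat, U_hat
```

Since $\hat{U}$ is orthogonal, the norm constraint keeps the same form in the rotated variable. When $\hat{B}$ is already diagonal, this step only reorders the variables.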

## Solution via first-order optimality conditions

The gradient of the objective function is $\nabla f(\hat{w}_2) = 2\hat{D}\hat{w}_2 + \hat{d}$, and the gradient of the constraint is $\nabla h(\hat{w}_2) = 2\hat{w}_2$. Letting $v$ be the Lagrange multiplier associated with the constraint, we write

\begin{align}
\frac{\partial \mathcal{L}(\hat{w}_2,v)}{\partial \hat{w}_2} = 2\hat{D}\hat{w}_2 + \hat{d} - v(2\hat{w}_2) &= 0 \\
\iff \hat{D}\hat{w}_2 + \frac{1}{2}\hat{d} &= v\hat{w}_2
\end{align}

The above expression gives us $\hat{w}_2$ for any $v$ we choose. We seek $v$ such that the corresponding $\hat{w}_2$ is also feasible (i.e. $\hat{w}_2^\top \hat{w}_2 = c$).

Proceed according to Dan's notes: first, check $v=0$ and $v=\hat{D}_{i,i}$ to rule these values out. Then write $\hat{w}_{2,i}$ in terms of $v$:

\begin{align}
\hat{w}_{2,i} &= \frac{1}{2}\left(\frac{\hat{d}_i}{v - \hat{D}_{i,i}}\right)
\end{align}

(Note that we have already ruled out values of $v$ that result in vanishing denominators.) Substituting the above expression into the constraint yields the secular equation:

\begin{align}
\frac{1}{4} \sum_{i}\left( \frac{\hat{d}_i}{v - \hat{D}_{i,i}}\right)^2 &= c
\end{align}

The secular equation has one pole per unique non-zero diagonal element of $\hat{D}$. There are at most two solutions per pole: one to the left and the other to the right. This is best understood graphically. Below we have a secular equation with one pole. The horizontal axis is the value of the Lagrange multiplier $v$, and the vertical axis is $s(v)$, the left-hand side of the secular equation. The horizontal line is $s(v)=c$. The two intersection points are the two solutions to this secular equation.

<img src="../images/secular96.png" width=500>
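The two outermost solutions of the secular equation (left of the smallest pole and right of the largest) can be found by bisection, since $s(v)$ is monotone on each of those intervals. The sketch below is one possible approach under stated assumptions, not the method from Dan's notes: the function names are hypothetical, it requires $c > 0$ and at least one nonzero $\hat{d}_i$, and it does not search for solutions between poles.

```python
import math

def secular(v, D, d):
    """s(v) = (1/4) * sum_i (d_i / (v - D_ii))**2, skipping terms with d_i = 0."""
    return 0.25 * sum((di / (v - Di)) ** 2 for Di, di in zip(D, d) if di != 0)

def bisect(g, lo, hi, iters=200):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid      # root lies in the upper half
        else:
            hi = mid                   # root lies in the lower half
    return 0.5 * (lo + hi)

def outer_roots(D, d, c):
    """Solutions of s(v) = c left of the smallest pole and right of the largest."""
    poles = [(Di, abs(di)) for Di, di in zip(D, d) if di != 0]
    lo_pole, lo_d = min(poles)
    hi_pole, hi_d = max(poles)
    root_c = math.sqrt(c)
    # At distance `far` from every pole, s(v) <= c / 4, so s(v) - c < 0 there.
    far = math.sqrt(sum(di * di for di in d)) / root_c
    g = lambda v: secular(v, D, d) - c
    # At distance |d| / (4 sqrt(c)) from a pole, that pole's term alone is 4c.
    left = bisect(g, lo_pole - far, lo_pole - lo_d / (4 * root_c))
    right = bisect(g, hi_pole + hi_d / (4 * root_c), hi_pole + far)
    return left, right

def recover_w2(v, D, d):
    """Components of w2_hat for multiplier v: (1/2) * d_i / (v - D_ii)."""
    return [0.5 * di / (v - Di) for Di, di in zip(D, d)]
```

For the one-pole case with `D = [2.0]`, `d = [4.0]`, and `c = 1.0`, the equation reduces to $4/(v-2)^2 = 1$, so `outer_roots` should return values close to $v=0$ and $v=4$. Each candidate $v$ gives a candidate $\hat{w}_2$ through `recover_w2`, and the objective values can then be compared.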