#### Barrier interior point method

For twice-differentiable convex functions $f_i$, we want to solve

$$\min f_0(x), \text{s.t. } f_i(x)\leq 0, i=1, \cdots, m, \,Ax=b$$

where $A\in \mathbf{R}^{p \times n}, \, \text{rank }A=p$, and $p^*$ denotes the optimal value

We saw previously that this problem can be formulated as a `sequence` of centering problems parameterized by $t>0$

For each $t$, we solve an equality-constrained optimization

$$\min tf_0(x)+\phi(x), \text{s.t. } Ax=b$$

where $\phi(x)$ is the log-barrier function eliminating the inequality constraints

$$\phi(x)=-\sum_{i=1}^m \log (-f_i(x)), \text{dom }\phi=\{x|f_1(x)<0,\cdots,f_m(x)<0\} $$

Once we solve the centering problem and obtain $x^*(t)$, we get a dual feasible pair $\lambda^*(t), \nu^*(t)$, and the associated duality gap is exactly $m/t$, since

$$p^*\geq g(\lambda^*(t), \nu^*(t))=f_0(x^*(t))-\frac{m}{t}$$

We then crank up $t$ and iteratively solve the `centering problem` to reduce the duality gap until the desired accuracy is reached

This is known as the `barrier method`. Each centering problem is solved with Newton's method, and we get `Newton's step` by solving the KKT equations

$$\boxed{\begin{bmatrix}
t\nabla^2 f_0(x)+\nabla^2 \phi(x) & A^T \\ A & 0
\end{bmatrix}\begin{bmatrix}
\Delta x_{nt}\\ \nu_{nt}
\end{bmatrix}=\begin{bmatrix}
-t\nabla f_0(x)-\nabla \phi(x)\\0
\end{bmatrix}}$$

On the central path, $x^*, \lambda^*, \nu^*$ almost satisfy the KKT conditions for the original problem, except that complementary slackness $-\lambda_i^*f_i(x^*)=0$ is relaxed to $-\lambda_i^*f_i(x^*)=1/t$
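
The loop above can be sketched in NumPy on a toy problem. This is only an illustration under assumptions that are not in the notes: the problem is the LP $\min c^Tx$ s.t. $x\geq 0, Ax=b$ (so $f_i(x)=-x_i$ and $\phi(x)=-\sum_i \log x_i$), and the function name, the tolerances, and the parameters `mu`, `alpha`, `beta` are hypothetical choices.

```python
import numpy as np

def barrier_lp(c, A, x0, t0=1.0, mu=10.0, eps=1e-5, alpha=0.1, beta=0.5):
    """Barrier method sketch for: min c@x  s.t.  x >= 0, A@x = b.

    Assumes x0 is strictly feasible (x0 > 0, A@x0 = b), so b never appears.
    Inequalities are f_i(x) = -x_i <= 0, hence phi(x) = -sum(log(x)).
    """
    x, t = np.asarray(x0, dtype=float).copy(), t0
    n, p = x.size, A.shape[0]
    m = n  # number of inequality constraints

    def obj(z, t):
        # Centering objective t*f0(z) + phi(z)
        return t * (c @ z) - np.sum(np.log(z))

    while True:
        # Centering step: Newton's method on t*f0 + phi subject to A@x = b
        for _ in range(50):
            grad = t * c - 1.0 / x                 # t*grad f0 + grad phi
            hess = np.diag(1.0 / x**2)             # hess phi (hess f0 = 0 for an LP)
            kkt = np.block([[hess, A.T], [A, np.zeros((p, p))]])
            rhs = np.concatenate([-grad, np.zeros(p)])
            dx = np.linalg.solve(kkt, rhs)[:n]     # Newton step
            if (-(grad @ dx)) / 2.0 <= 1e-10:      # Newton decrement^2 / 2
                break
            s = 1.0
            while np.any(x + s * dx <= 0):         # stay inside dom phi
                s *= beta
            while obj(x + s * dx, t) > obj(x, t) + alpha * s * (grad @ dx):
                s *= beta                          # backtracking line search
            x = x + s * dx
        if m / t < eps:                            # duality gap m/t small enough
            return x, t
        t *= mu                                    # crank up t

# Toy instance: minimize over the probability simplex, optimum is min(c) = 1
c = np.array([1.0, 2.0, 3.0])
A = np.ones((1, 3))
x_opt, t_final = barrier_lp(c, A, np.full(3, 1.0 / 3.0))
```

Note the right-hand side uses `-grad`, i.e. $-(t\nabla f_0(x)+\nabla\phi(x))$, and the stopping test uses the gap bound $m/t$.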

#### Interpretation as reducing primal and dual residual

We can also view the barrier method as applying Newton's method directly to the residuals of the modified KKT conditions of the `original problem`

We write the `residual` as

$$r(x, \lambda, w)=\begin{bmatrix}\nabla f_0(x)+\sum_{i=1}^m \lambda_i \nabla f_i(x)+ A^Tw \\ -\text{diag}(\lambda)f(x)-\frac{1}{t}\mathbf{1}\\Ax-b\end{bmatrix}\begin{array}{l}\text{dual residual} \\ \text{centrality residual}\\ \text{primal residual}\end{array}$$

The dual residual can also be written as

$$\begin{align*}
r_{d}&=\nabla f_0(x)+\sum_{i=1}^m \lambda_i\nabla f_i(x)+ A^Tw \\
&=\nabla f_0(x)+Df(x)^T \lambda+ A^Tw
\end{align*}$$

where

$$Df(x) = \begin{bmatrix}\nabla f_1(x)^T \\ \vdots \\ \nabla f_m(x)^T\end{bmatrix}$$

Setting the centrality residual to zero gives $\lambda_i=-1/(tf_i(x))$, and we plug this in to eliminate $\lambda_i$

$$r(x, w)=\begin{bmatrix}\nabla f_0(x)+\sum_{i=1}^m \frac{1}{-tf_i(x)}\nabla f_i(x)+ A^Tw \\Ax-b\end{bmatrix}\begin{array}{l}\text{dual residual} \\  \text{primal residual}\end{array}$$

Similar to the primal-dual interpretation of Newton step, we `linearize` the residual and set it to zero


$$r(x, w)+Dr(x, w)\begin{bmatrix} \Delta x_{nt} \\ \Delta w_{nt} \end{bmatrix}=0$$

For the Jacobian, we have

$$Dr(x, w)=\begin{bmatrix}\frac{\partial r_1}{\partial x} & \frac{\partial r_1}{\partial w}\\ \frac{\partial r_2}{\partial x} & \frac{\partial r_2}{\partial w} \end{bmatrix}=\begin{bmatrix}\nabla^2 f(x) & A^T\\ A & 0 \end{bmatrix}$$

More specifically, since

$$\nabla \left(\sum_{i=1}^m\frac{1}{-f_i(x)}\nabla f_i(x)\right)=\sum_{i=1}^m\frac{1}{f_i(x)^2}\nabla f_i(x)\nabla f_i(x)^T +\sum_{i=1}^m\frac{1}{-f_i(x)}\nabla^2 f_i(x)$$

we have

$$\begin{align*}
\nabla^2f(x)&=\nabla \left(\nabla f_0(x)+\sum_{i=1}^m\frac{1}{-tf_i(x)}\nabla f_i(x)\right)\\
&=\nabla^2 f_0(x) + \sum_{i=1}^m\frac{1}{tf_i(x)^2}\nabla f_i(x)\nabla f_i(x)^T +\sum_{i=1}^m\frac{1}{-tf_i(x)}\nabla^2 f_i(x)
\end{align*}$$

Replacing $w$ by the updated dual variable $w + \Delta w_{nt}$ and assuming $x$ is feasible ($Ax=b$), we have

$$H\Delta x_{nt} + A^Tw = -g, \,\,A\Delta x_{nt}=0$$

where

$$\begin{align*}
H&=\nabla^2 f_0(x) + \sum_{i=1}^m\frac{1}{tf_i(x)^2}\nabla f_i(x)\nabla f_i(x)^T +\sum_{i=1}^m\frac{1}{-tf_i(x)}\nabla^2 f_i(x)\\
g &= \nabla f_0(x) + \sum_{i=1}^m \frac{1}{-tf_i(x)}\nabla f_i(x)
\end{align*}$$

Comparing with the expressions for $\nabla \phi(x)$ and $\nabla^2 \phi(x)$, we see that

$$H=\nabla^2 f_0(x) + \frac{1}{t}\nabla^2 \phi(x),\,\, g=\nabla f_0(x)+\frac{1}{t}\nabla \phi(x)$$

Comparing with the KKT equations of the centering problem

$$\begin{bmatrix}
t\nabla^2 f_0(x)+\nabla^2 \phi(x) & A^T \\ A & 0
\end{bmatrix}\begin{bmatrix}
\Delta x_{nt}\\ \nu_{nt}
\end{bmatrix}=\begin{bmatrix}
-t\nabla f_0(x)-\nabla \phi(x)\\0
\end{bmatrix}$$

we see that the Newton step for the `primal` variable obtained from the residual of the `original` problem (based on the modified KKT conditions) is the same as the one obtained by directly solving the KKT equations of the `centering` problem (the two systems differ only by a factor of $t$ in the first block row)

For the `dual` variable, we get a scaled version compared to solving the KKT equations

$$w=\frac{1}{t}\nu_{nt}$$
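
A small numerical check of this equivalence, as a sketch under assumptions: the data are random, $f_0$ is a made-up quadratic $\tfrac{1}{2}x^TPx+q^Tx$, and the inequalities are $f_i(x)=-x_i$, so $\nabla\phi(x)=-1/x$ and $\nabla^2\phi(x)=\text{diag}(1/x^2)$. The current $x$ is taken as feasible, so the primal residual is zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, t = 4, 2, 5.0

# Hypothetical problem data (not from the notes)
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)            # positive definite Hessian of f0
q = rng.standard_normal(n)
A = rng.standard_normal((p, n))
x = rng.uniform(0.5, 1.5, n)       # strictly inside x > 0

grad_f0 = P @ x + q
grad_phi = -1.0 / x
hess_phi = np.diag(1.0 / x**2)
zeros_pp = np.zeros((p, p))

# KKT system of the centering problem: unknowns (dx_nt, nu_nt)
K_center = np.block([[t * P + hess_phi, A.T], [A, zeros_pp]])
rhs_center = np.concatenate([-(t * grad_f0 + grad_phi), np.zeros(p)])
sol_center = np.linalg.solve(K_center, rhs_center)
dx_center, nu_nt = sol_center[:n], sol_center[n:]

# Linearized residual of the modified KKT conditions: unknowns (dx_nt, w)
H = P + hess_phi / t
g = grad_f0 + grad_phi / t
K_resid = np.block([[H, A.T], [A, zeros_pp]])
rhs_resid = np.concatenate([-g, np.zeros(p)])
sol_resid = np.linalg.solve(K_resid, rhs_resid)
dx_resid, w = sol_resid[:n], sol_resid[n:]

same_primal_step = np.allclose(dx_center, dx_resid)
scaled_dual = np.allclose(w, nu_nt / t)
```

Both flags compare quantities that agree analytically, since multiplying the first block row of the second system by $t$ gives the first system with $\nu_{nt}=tw$.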

#### Primal-dual interior point method

An alternative to the barrier method is the primal-dual method, which does not get rid of $\lambda_i$ and instead solves the complete equations of residuals for $\Delta x, \Delta \lambda, \Delta \nu$ (analogous to infeasible start Newton's method)

The linearized residual becomes

$$r(x, \lambda, \nu)+Dr(x, \lambda, \nu)\begin{bmatrix} \Delta x \\ \Delta \lambda \\ \Delta \nu \end{bmatrix}=0$$

where

$$r(x, \lambda, \nu)=\begin{bmatrix}\nabla f_0(x)+\sum_{i=1}^m \lambda_i \nabla f_i(x)+ A^T\nu \\ -\text{diag}(\lambda)f(x)-\frac{1}{t}\mathbf{1}\\Ax-b\end{bmatrix}\begin{array}{l}\text{dual residual} \\ \text{centrality residual}\\ \text{primal residual}\end{array}$$

For the Jacobian, we have

$$Dr(x, \lambda, \nu)=\begin{bmatrix}\nabla^2 f_0(x)+\sum_{i=1}^m \lambda_i \nabla^2 f_i(x) & Df(x)^T & A^T \\
-\text{diag}(\lambda) Df(x) & -\text{diag}(f(x)) & 0 \\
A & 0 & 0 \end{bmatrix}$$
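
To make the block structure concrete, here is a sketch that assembles $r$ and $Dr$ and solves for the search direction. It assumes a quadratic objective $f_0(x)=\tfrac{1}{2}x^TPx+q^Tx$ and affine inequalities $f(x)=Gx-h$, so $Df(x)=G$ and $\nabla^2 f_i(x)=0$; the function names and the data are hypothetical.

```python
import numpy as np

def residual(x, lam, nu, t, P, q, G, h, A, b):
    """Stack the dual, centrality, and primal residuals."""
    f = G @ x - h                                   # f(x), must be < 0
    r_dual = P @ x + q + G.T @ lam + A.T @ nu
    r_cent = -lam * f - 1.0 / t                     # -diag(lam) f(x) - (1/t) 1
    r_pri = A @ x - b
    return np.concatenate([r_dual, r_cent, r_pri])

def jacobian(x, lam, P, G, h, A):
    """Dr(x, lam, nu) for affine f_i, where hess f_i = 0."""
    m, p = G.shape[0], A.shape[0]
    f = G @ x - h
    return np.block([
        [P,                 G.T,              A.T],
        [-np.diag(lam) @ G, -np.diag(f),      np.zeros((m, p))],
        [A,                 np.zeros((p, m)), np.zeros((p, p))],
    ])

def search_direction(x, lam, nu, t, P, q, G, h, A, b):
    """Solve r + Dr @ dy = 0 for dy = (dx, dlam, dnu)."""
    r = residual(x, lam, nu, t, P, q, G, h, A, b)
    dy = np.linalg.solve(jacobian(x, lam, P, G, h, A), -r)
    n, m = x.size, lam.size
    return dy[:n], dy[n:n + m], dy[n + m:]

# Example data: x >= 0 written as G x <= h with G = -I, h = 0
P, q = np.eye(3), np.array([1.0, 2.0, 3.0])
G, h = -np.eye(3), np.zeros(3)
A, b = np.ones((1, 3)), np.array([1.0])
x, lam, nu = np.array([0.2, 0.3, 0.5]), np.array([1.0, 2.0, 0.5]), np.array([0.3])
dx, dlam, dnu = search_direction(x, lam, nu, 4.0, P, q, G, h, A, b)
```

For general convex $f_i$, the top-left block would also include $\sum_i \lambda_i \nabla^2 f_i(x)$, as in the formula above.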

##### Surrogate duality gap

Since the iterates of the primal-dual method are not guaranteed to be feasible, except in the limit as the algorithm converges, a surrogate duality gap is used in place of the duality gap

For any $x$ such that $f(x)< 0$ and $\lambda \geq 0$, the surrogate duality gap is given by

$$\hat{\eta}(x, \lambda) = -f(x)^T\lambda$$

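The surrogate duality gap equals the true duality gap if $x$ is primal feasible and $\lambda, \nu$ are dual feasible, i.e., $r_p=0$ and $r_d=0$. The value of $t$ that corresponds to a surrogate gap $\hat{\eta}$ is $t=m/\hat{\eta}$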
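
As a quick check of the relation between $\hat{\eta}$ and $t$: at a central point we have $\lambda_i=-1/(tf_i(x))$, so each term of the sum contributes $1/t$

$$\hat{\eta}(x, \lambda) = -\sum_{i=1}^m \lambda_i f_i(x) = \sum_{i=1}^m \frac{1}{t} = \frac{m}{t}$$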

##### Steps for primal-dual method

Given $x$ such that $f_i(x)<0, i=1, \cdots, m$, $\lambda >0$, $\mu>1$, and tolerances $\epsilon_{\text{feas}}>0, \epsilon>0$, repeat

* Compute surrogate duality gap $\hat{\eta}$
* $t=\mu m /\hat{\eta}$
* Compute the primal-dual search direction $\Delta y=\begin{bmatrix} \Delta x \\ \Delta \lambda \\ \Delta \nu \end{bmatrix}$ from the linearized residual
* Line search to determine step size $s$
* $y \leftarrow y + s\Delta y$

until $\|r_p\|_2\leq \epsilon_{\text{feas}}, \|r_d\|_2\leq \epsilon_{\text{feas}}$, and $\hat{\eta}\leq \epsilon$

That is, we check both the residuals (how far the variables are from being feasible) and the surrogate duality gap

##### Step size

To determine step size, the line search ensures that $\lambda >0$ and $f(x)<0$

We first compute the largest positive step length not exceeding one that gives $\lambda^+\geq 0$

$$s^{\max}=\min\{1, \min\{-\lambda_i/\Delta \lambda_i|\Delta \lambda_i<0\}\}$$

Then, we start with $s=0.99s^{\max}$ and backtrack, $s\leftarrow \beta s$ (with backtracking parameters $\alpha, \beta \in (0,1)$), until

* $f(x^+)<0$ and
* $\|r(x^+, \lambda^+, \nu^+)\|_2\leq (1 - \alpha s)\|r(x, \lambda, \nu)\|_2$
