#### Recap

For the equality-constrained problem

$$\begin{align*}
\min\,\,\, & f(x) \\
\text{s.t. }\,\,\, & Ax= b \\
\end{align*}$$

the Lagrangian is

$$L(x, \nu)=f(x)+\nu^T(Ax-b)$$

The dual (sub)gradient method starts from some initial $\nu^0$ and repeats

$$\begin{align*}
x^{k+1} &\in \arg \min_x L(x, \nu^k)\\
\nu^{k+1} &= \nu^k+t_k(Ax^{k+1}-b)
\end{align*}$$

The catch is that convergence requires stringent assumptions (such as strong convexity of $f$), and without them the primal iterates $x^k$ may not even approach feasibility.
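As a rough illustration of the dual (sub)gradient iterates above, here is a minimal pure-Python sketch. It assumes the strongly convex objective $f(x)=\frac{1}{2}\|x\|_2^2$, for which the $x$-update has the closed form $x=-A^T\nu$. The function name `dual_ascent`, the fixed step size `t`, and the toy data are choices made for this example only.

```python
def dual_ascent(A, b, t, iters):
    """Dual ascent for min 0.5*||x||^2 s.t. Ax = b, with A given as a list of rows."""
    m, n = len(A), len(A[0])
    nu = [0.0] * m
    x = [0.0] * n
    for _ in range(iters):
        # x-update: minimize 0.5*||x||^2 + nu^T (Ax - b), which gives x = -A^T nu
        x = [-sum(A[i][j] * nu[i] for i in range(m)) for j in range(n)]
        # dual update: step of size t along the residual Ax - b
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        nu = [nu[i] + t * r[i] for i in range(m)]
    return x, nu


# Toy problem (assumed data): one constraint x1 + 2*x2 = 5.
# The minimum-norm solution is x = (1, 2) with multiplier nu = -1.
x, nu = dual_ascent(A=[[1.0, 2.0]], b=[5.0], t=0.1, iters=100)
print(x, nu)
```

For this toy problem the dual iteration contracts only when the step size is small enough relative to $\|A\|_2^2$, which hints at how much the plain dual method leans on problem structure.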

#### Augmented Lagrangian method

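The augmented Lagrangian method (also known as the method of multipliers) modifies the problem by adding a quadratic penalty that vanishes on the feasible set, so the solutions are unchanged. For $\rho>0$, the modified problem is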

$$\begin{align*}
\min\,\,\, & f(x)+\frac{\rho}{2}\|Ax-b\|_2^2 \\
\text{s.t. }\,\,\, & Ax= b \\
\end{align*}$$

Using a modified Lagrangian

$$L_{\rho}(x, \nu)=f(x)+\nu^T(Ax-b)+\frac{\rho}{2}\|Ax-b\|_2^2$$

we get the iterates

$$\begin{align*}
x^{k+1} &\in \arg \min_x L_{\rho}(x, \nu^k)\\
\nu^{k+1} &= \nu^k+\rho(Ax^{k+1}-b)
\end{align*}$$

Notice that the dual update uses the fixed step size $t_k=\rho$. This choice is justified as follows.

Since $x^{k+1}$ minimizes $f(x)+\left(\nu^k\right)^TAx+(\rho/2)\|Ax-b\|_2^2$ over $x$, we have

$$\begin{align*}
0&\in \partial f(x^{k+1})+A^T\left(\nu^k+\rho(Ax^{k+1}-b)\right) \\
&=\partial f(x^{k+1})+A^T\nu^{k+1}
\end{align*}$$

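This is the stationarity condition of the original problem, so with step size $\rho$ every pair $(x^{k+1}, \nu^{k+1})$ satisfies stationarity exactly. Under mild conditions the residual $Ax^{k}-b$ converges to zero, so primal feasibility, and hence the full set of KKT conditions, holds in the limit.

As a result, the augmented Lagrangian method converges under much weaker assumptions than the dual (sub)gradient method (for instance, $f$ need not be strongly convex).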

However, we lose decomposability: even when $f$ is decomposable, the augmented Lagrangian is not, because the penalty term $(\rho/2)\|Ax-b\|_2^2$ couples the components of $x$.
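To make the method of multipliers iterates concrete, the sketch below applies them to a scalar problem whose objective is not strongly convex: $\min |x|$ subject to $ax=b$. Completing the square shows that the $x$-update reduces to soft-thresholding. The function names and the problem data are assumptions made for illustration.

```python
def soft_threshold(v, kappa):
    """Scalar soft-thresholding: argmin_x |x| + (1 / (2 * kappa)) * (x - v)**2."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def method_of_multipliers(a, b, rho, iters):
    """Augmented Lagrangian method for min |x| s.t. a*x = b (scalar x, a != 0)."""
    nu = 0.0
    x = 0.0
    for _ in range(iters):
        # x-update: minimize |x| + nu*(a*x - b) + (rho/2)*(a*x - b)**2.
        # Completing the square gives |x| + (rho*a**2/2)*(x - v)**2 + const.
        v = (b - nu / rho) / a
        x = soft_threshold(v, 1.0 / (rho * a * a))
        # dual update with step size rho
        nu = nu + rho * (a * x - b)
    return x, nu


# Toy data (assumed): the only feasible point is x = b / a = 2,
# and stationarity 0 in sign(x) + a*nu gives nu = -0.5.
x, nu = method_of_multipliers(a=2.0, b=4.0, rho=1.0, iters=10)
print(x, nu)
```

On the same problem the plain dual method stalls: minimizing $|x|+\nu(ax-b)$ over $x$ returns $x=0$ whenever $|a\nu|<1$ and is unbounded below when $|a\nu|>1$, so its iterates never become feasible.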

#### Alternating direction method of multipliers (ADMM)

ADMM tries to recover decomposability on top of the augmented Lagrangian method

For the problem

$$\min_{x,z}\,\,\, f(x)+g(z) \quad \text{s.t. }\,\,\, Ax+Bz=c$$

define the augmented Lagrangian with $\rho>0$

$$L_{\rho}(x, z, \nu)=f(x)+g(z)+\nu^T(Ax+Bz-c)+\frac{\rho}{2}\|Ax+Bz-c\|_2^2$$

Instead of jointly optimizing over the primal variables $x, z$, ADMM minimizes over each in turn and repeats

$$\begin{align*}
x^{k+1}&=\arg \min_x L_{\rho}(x, z^k, \nu^k) \\
z^{k+1}&=\arg \min_z L_{\rho}(x^{k+1}, z, \nu^k) \\
\nu^{k+1}&=\nu^k+\rho(Ax^{k+1}+Bz^{k+1}-c) \\
\end{align*}$$

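Under modest assumptions on $f, g$ (e.g., closed and convex), the ADMM iterates satisfy residual convergence ($Ax^k+Bz^k-c\to 0$), objective convergence ($f(x^k)+g(z^k)$ approaches the optimal value), and dual convergence ($\nu^k$ approaches a dual optimal point)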
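The three ADMM updates above can be sketched on a small problem where both minimizations are available in closed form. The example assumes $f(x)=\frac{1}{2}\|x-a\|_2^2$, $g(z)=\lambda\|z\|_1$ and the constraint $x-z=0$ (so $A=I$, $B=-I$, $c=0$). The function names and data are hypothetical, chosen only for this illustration.

```python
def soft_threshold(v, kappa):
    """Scalar soft-thresholding: argmin_z |z| + (1 / (2 * kappa)) * (z - v)**2."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def admm_l1_denoise(a, lam, rho, iters):
    """Unscaled ADMM for min 0.5*||x - a||^2 + lam*||z||_1 s.t. x - z = 0."""
    n = len(a)
    x = [0.0] * n
    z = [0.0] * n
    nu = [0.0] * n
    for _ in range(iters):
        # x-update: set the gradient (x - a) + nu + rho*(x - z) to zero
        x = [(a[i] - nu[i] + rho * z[i]) / (1.0 + rho) for i in range(n)]
        # z-update: minimize lam*|z| - nu*z + (rho/2)*(x - z)**2 componentwise
        z = [soft_threshold(x[i] + nu[i] / rho, lam / rho) for i in range(n)]
        # dual update along the residual x - z with step size rho
        nu = [nu[i] + rho * (x[i] - z[i]) for i in range(n)]
    return x, z, nu


# Toy data (assumed). The exact solution is soft-thresholding of a at level lam,
# i.e. (2, 0, 0.5).
x, z, nu = admm_l1_denoise(a=[3.0, -0.5, 1.5], lam=1.0, rho=1.0, iters=100)
print(z)
```

Both primal updates here act componentwise, which is the kind of decomposability the joint minimization in the augmented Lagrangian method gives up.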

#### Scaled form ADMM

Defining the scaled dual variable $w=\nu/\rho$, we can rewrite the augmented Lagrangian as

$$L_{\rho}(x, z, w)=f(x)+g(z)+\frac{\rho}{2}\|Ax+Bz-c+w\|_2^2-\frac{\rho}{2}\|w\|_2^2$$

and rewrite ADMM iterates as

$$\begin{align*}
x^{k+1}&=\arg \min_x f(x)+\frac{\rho}{2}\|Ax+Bz^{k}-c+w^k\|_2^2 \\
z^{k+1}&=\arg \min_z g(z)+\frac{\rho}{2}\|Ax^{k+1}+Bz-c+w^k\|_2^2 \\
w^{k+1}&=w^k+Ax^{k+1}+Bz^{k+1}-c \\
\end{align*}$$

where the $k$th iterate $w^k$ is a running sum of the residuals

$$w^k=w^0+\sum_{i=1}^k(Ax^i+Bz^i-c)$$
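As a check on the scaled form of $L_{\rho}$ above, here is the completing-the-square step written out. The shorthand $r=Ax+Bz-c$ is introduced only for this derivation, and $\nu=\rho w$ is used throughout:

$$\begin{align*}
\nu^Tr+\frac{\rho}{2}\|r\|_2^2
&=\frac{\rho}{2}\left(\|r\|_2^2+2w^Tr+\|w\|_2^2\right)-\frac{\rho}{2}\|w\|_2^2 \\
&=\frac{\rho}{2}\|r+w\|_2^2-\frac{\rho}{2}\|w\|_2^2
\end{align*}$$

The term $\frac{\rho}{2}\|w\|_2^2$ does not depend on $x$ or $z$, so it drops out of both minimizations, and dividing the $\nu$-update by $\rho$ gives the $w$-update.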

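(When $A=I$, the $x$-update is exactly a proximal operator: $x^{k+1}=\text{prox}_{f, 1/\rho}(c-Bz^k-w^k)$, where $\text{prox}_{f, t}(v)=\arg \min_x f(x)+\frac{1}{2t}\|x-v\|_2^2$. The same holds for the $z$-update with $\text{prox}_{g, 1/\rho}$ when $B=I$)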
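The proximal-operator view of scaled ADMM can be sketched as follows. The example assumes the constraint $x-z=0$ (so $A=I$, $B=-I$, $c=0$), in which case the updates read $x^{k+1}=\text{prox}_{f, 1/\rho}(z^k-w^k)$ and $z^{k+1}=\text{prox}_{g, 1/\rho}(x^{k+1}+w^k)$. The names `admm_scaled`, `prox_quadratic`, `prox_box` and the data are hypothetical.

```python
def admm_scaled(prox_f, prox_g, n, rho, iters):
    """Scaled-form ADMM for min f(x) + g(z) s.t. x - z = 0.

    prox_f(v, rho) and prox_g(v, rho) must return
    argmin_u h(u) + (rho/2)*||u - v||^2 for h = f and h = g respectively.
    """
    x = [0.0] * n
    z = [0.0] * n
    w = [0.0] * n
    for _ in range(iters):
        x = prox_f([z[i] - w[i] for i in range(n)], rho)
        z = prox_g([x[i] + w[i] for i in range(n)], rho)
        # w accumulates the residuals x - z
        w = [w[i] + x[i] - z[i] for i in range(n)]
    return x, z, w


a = [1.7, -0.3, 0.4]  # toy data (assumed)


def prox_quadratic(v, rho):
    # prox of f(x) = 0.5*||x - a||^2: minimize 0.5*(x - a)^2 + (rho/2)*(x - v)^2
    return [(a[i] + rho * v[i]) / (1.0 + rho) for i in range(len(v))]


def prox_box(v, rho):
    # prox of the indicator of the box [0, 1]^n is the projection onto it,
    # which does not depend on rho
    return [min(1.0, max(0.0, vi)) for vi in v]


# Projects a onto the box [0, 1]^3; the exact answer is (1, 0, 0.4).
x, z, w = admm_scaled(prox_quadratic, prox_box, n=3, rho=1.0, iters=100)
print(z)
```

Passing the two proximal operators as callables keeps the loop itself problem-independent, so only `prox_f` and `prox_g` change from one problem to the next under the same constraint.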