---
title: Auxiliary Magic
description: Hierarchical regularization via a trick.
date: 7/7/2022
draft: true
bibliography: references.bib
csl: annals_statistics.csl
format:
  html:
    code-fold: true
---

[Last time](https://jlindbloom.github.io/posts/simple_image_deblurring.html) we looked at an image de-blurring problem, which we solved by finding
$$
x^\star = \text{argmin}_x \,\, \| A x - y \|_2^2 + \mathcal{R}(x)
$$ {#eq-inv_prob_obj}
where the regularization term was
$$
\mathcal{R}(x) = \gamma \| L x \|_2^2 = \gamma x^T L^T L x.
$$
In this post, our goal is to:

- describe a hierarchical prior and a method that can give us a better image reconstruction,
- walk through a "magic" trick that will speed up our method,
- and look at using [`CuPy`](https://cupy.dev/) to accelerate our reconstruction using a GPU.

# Probabilistic inverse problems

While the problem posed in @eq-inv_prob_obj is completely deterministic, we can think of it as having arisen from a probabilistic model. Suppose that
$$
\begin{align*}
x &\sim \mathcal{N}\left( 0, \left(\gamma L^T L \right)^{-1} \right), \\
y \, | \, x &\sim \mathcal{N}\left( A x, I \right).
\end{align*}
$$
Then the corresponding density functions are
$$
\begin{align*}
\pi(x) &\propto \exp\left\{ - \tfrac{1}{2} \gamma x^T L^T L x \right\}, \\
\pi(y \, | \, x) &\propto \exp\left\{ - \tfrac{1}{2} \| A x - y \|_2^2 \right\},
\end{align*}
$$
and by Bayes' theorem the posterior density for $x \, | \, y$ is given as
$$
\pi(x \, | \, y) \propto \exp\left\{ - \tfrac{1}{2} \| A x - y \|_2^2 \right\} \times \exp\left\{ - \tfrac{1}{2} \gamma x^T L^T L x \right\}.
$$
The [MAP estimate](https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation) of $x$ is then given as
$$
x^\star = \text{argmax}_x \,\, \pi(x \, | \, y),
$$
which is equivalent to
$$
\begin{align*}
x^\star &= \text{argmin}_x \,\, - \log \pi(x \, | \, y) \\
&= \text{argmin}_x \,\, \tfrac{1}{2} \| A x - y \|_2^2 + \tfrac{1}{2} \gamma \| L x \|_2^2 \\
&= \text{argmin}_x \,\, \| A x - y \|_2^2 + \gamma x^T L^T L x,
\end{align*}
$$
since additive constants and the common factor of $\tfrac{1}{2}$ do not change the minimizer. This is exactly @eq-inv_prob_obj. The role of $\mathcal{R}(x)$ can then be seen as contributing a prior of
$$
\pi(x) \propto \exp\left\{ - \tfrac{1}{2} \mathcal{R}(x) \right\}
$$
to the inference problem.
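As a quick numerical check of this equivalence, the sketch below builds a small 1D analogue of the problem and computes the MAP estimate by solving the normal equations $(A^T A + \gamma L^T L) x = A^T y$, which follow from setting the gradient of the objective to zero. The Gaussian blur matrix `A`, the first-difference matrix `L`, the test signal, and the value of `gamma` are all illustrative assumptions, not the setup from the previous post.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Assumed forward operator: a row-normalized Gaussian blur matrix
idx = np.arange(n)
A = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 2.0) ** 2)
A /= A.sum(axis=1, keepdims=True)

# Assumed regularization operator: first-order finite differences, shape (n-1, n)
L = np.diff(np.eye(n), axis=0)

# Piecewise-constant ground truth and noisy blurred data
x_true = np.where((idx > 30) & (idx < 70), 1.0, 0.0)
y = A @ x_true + 0.01 * rng.standard_normal(n)

gamma = 0.1  # illustrative regularization strength


def objective(x):
    """||A x - y||_2^2 + gamma * ||L x||_2^2 (negative log posterior up to scale and constants)."""
    return np.sum((A @ x - y) ** 2) + gamma * np.sum((L @ x) ** 2)


# MAP estimate from the normal equations (A^T A + gamma L^T L) x = A^T y
x_map = np.linalg.solve(A.T @ A + gamma * (L.T @ L), A.T @ y)

# The gradient of the objective should vanish at the minimizer
grad = 2 * A.T @ (A @ x_map - y) + 2 * gamma * L.T @ (L @ x_map)
print("gradient norm at x_map:", np.linalg.norm(grad))
print("objective at x_map:", objective(x_map))
print("objective at x_true:", objective(x_true))
```

Forming dense matrices like this is only sensible for a toy problem; for a full image one would apply $A$ and $L$ as operators instead.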

# A hierarchical prior

One reason it can be useful to think probabilistically is that it lets us motivate different choices of the regularizer $\mathcal{R}(x)$. If we pick
$$
Lx \sim \mathcal{N}\left(0, \frac{1}{\gamma} I \right)
$$
as our prior (which corresponds to $\mathcal{R}(x) = \gamma x^T L^T L x$), then we are saying that we believe *a priori* that the discrete gradients in our image are distributed according to a zero-mean Gaussian with variance $\gamma^{-1}$. We can tweak the strength of the prior, and in turn its influence on our reconstructed image, by adjusting $\gamma$, but note that the same $\gamma$ governs *all* of the discrete gradients in the image. We might therefore introduce a hierarchical prior on the discrete gradient that tries to (loosely) capture the fact that in some regions of an image the discrete gradient will be much larger than it is elsewhere. Define the prior
$$
\begin{align*}
\beta^H_{i,j}, \beta^V_{i,j} &\sim \Gamma\left( c, d \right), \\
Lx \, | \, \beta &\sim \mathcal{N}\left(0, B_{\beta}^{-1} \right),
\end{align*}
$$
where $B_{\beta}$ is the diagonal precision matrix whose entries are the hyper-parameters $\beta$. This prior has density
$$
\pi(x, \beta) = \pi(x \, | \, \beta) \pi(\beta) \propto \det \left( B_{\beta} \right)^{1/2} \exp\left\{ - \tfrac{1}{2} x^T L^T B_{\beta} L x  \right\} \pi(\beta).
$$
Here the superscripts $\left( \cdot \right)^{H}$ and $\left( \cdot \right)^{V}$ indicate that we assign two different hyper-parameters to govern the gradient in the horizontal and vertical directions, respectively, $\Gamma\left(c, d \right)$ represents the [gamma density function](https://en.wikipedia.org/wiki/Gamma_distribution), and
$$
\pi(\beta) \propto \left( \prod_{i,j}^{mn} \Gamma( \beta_{i,j}^H \, | \, c, d) \right) \times \left( \prod_{i,j}^{mn} \Gamma( \beta_{i,j}^V \, | \, c, d) \right),
$$
meaning that all hyper-parameters are assumed to be independent of one another. We use a $\Gamma$ distribution for the hyper-parameters because it is a [conjugate prior](https://en.wikipedia.org/wiki/Conjugate_prior) for the precision of a Gaussian, meaning that we can determine certain relevant conditional distributions analytically.
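To get a feel for what this prior encodes, the sketch below draws discrete gradients from a 1D analogue of the hierarchical model and compares them with draws from the single-$\gamma$ Gaussian prior. It assumes that $\Gamma(c, d)$ is parameterized by shape $c$ and rate $d$ and that each $\beta$ acts as the precision (inverse variance) of its gradient entry; the values of `c`, `d`, and `gamma` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed hyper-parameters: shape c and rate d of the gamma hyper-prior
c, d = 1.0, 1.0

# Hierarchical prior: one precision beta_i per gradient entry
beta = rng.gamma(shape=c, scale=1.0 / d, size=n)  # NumPy uses scale = 1 / rate
grad_hier = rng.standard_normal(n) / np.sqrt(beta)  # (Lx)_i | beta_i ~ N(0, 1 / beta_i)

# Non-hierarchical prior: a single precision gamma shared by every entry
gamma = 1.0
grad_flat = rng.standard_normal(n) / np.sqrt(gamma)


def kurtosis(samples):
    """Sample kurtosis (close to 3 for a large Gaussian sample)."""
    centered = samples - samples.mean()
    return np.mean(centered**4) / np.mean(centered**2) ** 2


print("kurtosis, shared gamma:", kurtosis(grad_flat))
print("kurtosis, hierarchical:", kurtosis(grad_hier))
print("largest |gradient|, shared gamma:", np.abs(grad_flat).max())
print("largest |gradient|, hierarchical:", np.abs(grad_hier).max())
```

Under these assumptions each hierarchical gradient entry is marginally Student-t distributed with $2c$ degrees of freedom, so most entries stay small while a few are allowed to be very large, which is the behaviour we want near edges.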

# The new posterior

Using our prior


We will consider hierarchical regularization motivated by a hierarchical prior.






```python
import numpy as np

# Image dimensions (rows, columns)
M, N = 3000, 4000

# Draw a random image
rand_img = np.random.randn(M, N)

# Repeatedly apply an orthonormal 2D FFT to the image
for j in range(100):
    mat_prod_rand_img = np.fft.fft2(rand_img, norm='ortho')
```