# Sparse and redundant representations: from theory to practice

## SparseLand model

Many problems in image/signal processing can be written as

$y = Cx + \epsilon$ where $\epsilon \sim N(0, \sigma^2)$

where $y$ is the observed signal, $x$ is the true signal, and $C$ is a linear operator that degrades the signal (blurring, image downsampling, etc.).

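If we have a prior $P(x)$, we can approach this problem as

$\hat{x} = \arg\max_{x} P(x) \text{ such that } \|Cx - y\|^2 < \sigma^2$

that is, we pick the most probable signal among those that agree with the observation up to the noise level.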
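
As a sketch of where such a formulation comes from: writing the prior as $P(x) \propto e^{-G(x)}$ (the penalty $G$ is notation introduced here for illustration, it is not in the original notes) and using the Gaussian noise model, the maximum a posteriori estimate is

$\hat{x}_{MAP} = \arg\max_{x} P(y \mid x) P(x) = \arg\min_{x} \frac{1}{2\sigma^2}\|Cx - y\|^2 + G(x)$

so a more probable signal corresponds to a smaller penalty $G(x)$, and the constrained problem above expresses the same trade-off with the data term moved into a constraint.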

#### Types of priors

Note: for images at least, I think that before wavelets these methods were actually used as priors on image patches rather than on whole images.

* Smoothness-inducing 

    $P(x) \propto e^{-\|Lx\|^2}$ ($L$ is Laplace operator, discrete 2nd order derivative)

* Smoothness-inducing with edge information

    $P(x) \propto e^{-\|LWx\|^2}$ ($W$ is diagonal matrix that has smaller values for edge locations)

* Total variation

    $P(x) \propto e^{-\|\nabla x\|_1}$

* Transform based

    $P(x) \propto e^{-\|T x\|^2}$, where $T$ is the transform matrix, for example the Fourier transform

* Wavelet-based, inducing sparsity

    $P(x) \propto e^{-\|Wx\|_1}$, where $W$ is the wavelet transform matrix, for example for Haar wavelets

* Sparseland - dictionary learning (?)

    $P(x) \propto e^{-\|\alpha\|_1}$ where $x = D\alpha$, $D$ is the dictionary matrix, and $\alpha$ is the sparse coefficient vector

In this terminology, the $W$ matrix generalizes the $T$ matrix, and the $D$ matrix generalizes further, because the dictionary can be either fixed or learned from previous data.
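
To make the role of a prior concrete, here is a minimal sketch for the smoothness-inducing prior in one dimension. It assumes Gaussian noise, so the estimate minimizes $\|Cx - y\|^2 + \lambda\|Lx\|^2$ and has a closed-form solution. The weight `lam`, the function names, and the test signal are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def second_difference(n):
    """Discrete 1-D Laplacian (second-order difference) as an (n-2) x n matrix."""
    L = np.zeros((n - 2, n))
    for i in range(n - 2):
        L[i, i:i + 3] = [1.0, -2.0, 1.0]
    return L

def smooth_map_estimate(y, C, lam):
    """Minimize ||C x - y||^2 + lam * ||L x||^2 via the normal equations."""
    L = second_difference(C.shape[1])
    # Setting the gradient to zero gives (C^T C + lam L^T L) x = C^T y.
    return np.linalg.solve(C.T @ C + lam * L.T @ L, C.T @ y)

rng = np.random.default_rng(0)
n = 100
t = np.linspace(0.0, 1.0, n)
x_true = np.sin(2 * np.pi * t)                    # smooth ground-truth signal
C = np.eye(n)                                     # identity degradation (pure denoising)
y = C @ x_true + 0.3 * rng.standard_normal(n)     # noisy observation
x_hat = smooth_map_estimate(y, C, lam=50.0)
```

The quadratic priors ($L$, $LW$, $T$) all lead to linear systems of this kind; the $\ell_1$ priors do not, which is why they need iterative solvers.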

#### SparseLand and other generative methods

SparseLand is related to the union-of-subspaces model (the subspaces are spanned by subsets of $D$'s columns), Gaussian mixture models, and local PCA.

## Use case: Image deblurring

Let $v$ denote the image and $H$ the blurring operator.

The observed image is given by $z = Hv + \epsilon$.

Since $v$ is an image, we can choose a dictionary $D$ such that $D\alpha \approx v$ for some sparse $\alpha$.

Thus we can formulate the problem as follows (this is the unconstrained, penalized form):

$\hat{\alpha} = \arg\min_{\alpha} \lambda \| \alpha \|_{0} + \|HD\alpha - z\|^2$

In practice $\|\cdot\|_0$ is replaced by $\|\cdot\|_1$, which makes the problem convex.

#### Used optimization methods

* gradient descent (with momentum)
* Majorization-Minimization:
    Idea: for a function $f(\alpha)$ that is hard to minimize directly, minimize a surrogate $Q(\alpha, \alpha_0)$ instead.
    $Q$ needs to satisfy
    * $Q(\alpha_0, \alpha_0) = f(\alpha_0)$
    * $\forall{\alpha}\; Q(\alpha, \alpha_0) \geq f(\alpha)$
    * $\nabla Q(\alpha_0, \alpha_0) = \nabla f(\alpha_0)$ (gradient taken with respect to the first argument)
    
    The minimization proceeds by updating $\alpha_{i+1} = \arg\min_{\alpha} Q(\alpha, \alpha_i)$, which guarantees $f(\alpha_{i+1}) \leq f(\alpha_i)$.
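
One well-known instance of this scheme for the $\ell_1$ objective is iterative soft thresholding (ISTA). The sketch below is an illustration under stated assumptions, not necessarily the method the notes had in mind: the matrix `A` stands in for the product $HD$, the surrogate is the quadratic upper bound described in the comments, and all names and test data are hypothetical.

```python
import numpy as np

def soft_threshold(u, tau):
    """Shrink every entry of u towards zero by tau (proximal map of tau * ||.||_1)."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def objective(A, z, alpha, lam):
    """f(alpha) = lam * ||alpha||_1 + ||A alpha - z||^2, with A standing in for H D."""
    return lam * np.sum(np.abs(alpha)) + np.sum((A @ alpha - z) ** 2)

def ista(A, z, lam, n_iter=200):
    # c / 2 is the squared spectral norm of A, so the surrogate
    #   Q(alpha, alpha_i) = f(alpha) + (c/2)||alpha - alpha_i||^2 - ||A (alpha - alpha_i)||^2
    # lies above f everywhere and touches it at alpha_i.
    c = 2.0 * np.linalg.norm(A, 2) ** 2
    alpha = np.zeros(A.shape[1])
    history = [objective(A, z, alpha, lam)]
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ alpha - z)                 # gradient of the quadratic term
        alpha = soft_threshold(alpha - grad / c, lam / c)  # exact minimizer of Q(., alpha_i)
        history.append(objective(A, z, alpha, lam))
    return alpha, history

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60))                 # hypothetical stand-in for H D
alpha_true = np.zeros(60)
alpha_true[[5, 20, 41]] = [3.0, -2.0, 1.5]        # sparse ground truth
z = A @ alpha_true + 0.01 * rng.standard_normal(30)
alpha_hat, history = ista(A, z, lam=0.5)
```

Because each step minimizes a majorizer that touches $f$ at the current iterate, the values in `history` never increase, which is the property the conditions on $Q$ are designed to give.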

## Dictionary Learning

Dictionary learning means learning the transform used in the prior from data, rather than fixing it in advance.

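Technically, dictionary learning can be thought of as matrix factorization:

$(\hat{X}, \hat{A}) = \arg\min_{X, A}\|Y - XA\|^2$ such that $\forall{i}\ \|A_i\|_p \leq k$ ($p$ is either 0 or 1), where the columns of $Y$ are the training signals, $X$ is the dictionary, and $A_i$, the $i$-th column of $A$, is the sparse code of the $i$-th signal.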

Most methods alternate between updating $X$ and updating $A$, and use either the reconstruction error or the sparsity of $A$ as the stopping criterion.
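
The alternation can be sketched as follows. This is a minimal illustration under several assumptions that are not in the original notes: $X$ is taken to be the dictionary and the columns of $A$ the sparse codes, the $A$ update is a simple thresholding pursuit (keep the $k$ most correlated atoms, then fit by least squares), the $X$ update is a plain least-squares fit, and a fixed iteration count replaces a real stopping criterion. All function names are hypothetical.

```python
import numpy as np

def sparse_code(Y, X, k):
    """For each column of Y, keep the k atoms most correlated with it and
    fit their coefficients by least squares."""
    A = np.zeros((X.shape[1], Y.shape[1]))
    corr = np.abs(X.T @ Y)
    for i in range(Y.shape[1]):
        support = np.argsort(corr[:, i])[-k:]
        coef, *_ = np.linalg.lstsq(X[:, support], Y[:, i], rcond=None)
        A[support, i] = coef
    return A

def learn_dictionary(Y, n_atoms, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the dictionary with randomly chosen, normalized training signals.
    X = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].copy()
    X /= np.linalg.norm(X, axis=0)
    errors = []
    for _ in range(n_iter):
        A = sparse_code(Y, X, k)                   # update A with X fixed
        errors.append(np.linalg.norm(Y - X @ A))   # reconstruction error of this pair
        X = Y @ np.linalg.pinv(A)                  # update X with A fixed (least squares)
        unused = np.linalg.norm(X, axis=0) < 1e-12
        # Atoms that no signal used come back as zero columns; re-draw them at random.
        X[:, unused] = rng.standard_normal((X.shape[0], int(unused.sum())))
        X /= np.linalg.norm(X, axis=0)             # keep atoms at unit norm
    A = sparse_code(Y, X, k)                       # final codes for the final dictionary
    errors.append(np.linalg.norm(Y - X @ A))
    return X, A, errors

# Synthetic data: signals that are exactly k-sparse in a random dictionary.
rng = np.random.default_rng(1)
n, n_atoms, n_signals, k = 16, 24, 200, 3
X_true = rng.standard_normal((n, n_atoms))
X_true /= np.linalg.norm(X_true, axis=0)
A_true = np.zeros((n_atoms, n_signals))
for i in range(n_signals):
    A_true[rng.choice(n_atoms, k, replace=False), i] = rng.standard_normal(k)
Y = X_true @ A_true
X_hat, A_hat, errors = learn_dictionary(Y, n_atoms, k)
```

The `errors` list is what a reconstruction-error stopping rule would monitor. With this simple coding step the error is not guaranteed to decrease at every iteration, so a practical implementation would use a stronger pursuit algorithm.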

## The denoising problem

The denoising problem, formulated as

$z = x + v$ where the entries of $v$ are $\mathcal{N}(0, \sigma^2)$ distributed,

is the simplest inverse problem, and an important one.

Denoising images is important because real-world images taken with a camera suffer from noise caused by lighting conditions, compression, and other factors.

Good denoising methods are useful for more than removing noise: some of them can also be used for cartooning, image decomposition, expanding dynamic range, or dehazing.

### Simple denoising methods

Thresholding