# Pyramids

## Scale

One of the common problems with finding objects/features in images is that of scale.

Humans are good at recognising the same type of object no matter what its scale is.
Computers are not.

What we need is a system whereby we can search images at several different scales.
This is the idea we will look at today.

And it involves a construct called a Pyramid.

## Building a Pyramid


A Pyramid is built by creating multiple copies of an image at different resolutions.
Each level of the Pyramid is $\frac{1}{4}$ the size (in area) of the level below it, i.e. it is halved in both x and y.
So the lowest level of the pyramid has the highest resolution and the highest level has the lowest resolution.
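As a quick illustration of how the sizes shrink, here is a minimal sketch that lists the dimensions of each level. The function name `pyramid_shapes` and the choice to round odd sizes up are assumptions made for this example, not part of the definition above.

```python
def pyramid_shapes(height, width, levels):
    """Return the (height, width) of each pyramid level, level 0 first."""
    shapes = [(height, width)]
    for _ in range(levels - 1):
        # Halve each axis; rounding up for odd sizes is an assumption of this sketch.
        height, width = (height + 1) // 2, (width + 1) // 2
        shapes.append((height, width))
    return shapes


for level, (h, w) in enumerate(pyramid_shapes(512, 512, 4)):
    print(f"level {level}: {h} x {w} = {h * w} pixels")
```

For a 512 x 512 image each level has a quarter of the pixels of the level below it.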



![](images/pyramid.png)
Credit: Mubarak Shah

## Shannon Nyquist

If it seems that downsizing an image is straightforward, then think again.

If you've previously taken a class in DSP or data communications, you may be familiar with the Shannon Nyquist sampling theorem.

It states that the sampling rate must be at least twice the highest frequency present in the signal.
If we downsample, i.e. set the sampling frequency to half of its current value, then we need to ensure that there is nothing in our signal (the image in this case) at a frequency higher than half of the new sampling frequency.

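Half the sampling rate is called the Nyquist frequency. (The related term Nyquist rate refers to twice the highest frequency in the signal, i.e. the minimum sampling rate that avoids aliasing.)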

## Aliasing - Moiré


What happens if we don't?
A phenomenon called aliasing will occur.

This is where frequencies above the Nyquist frequency alias (appear to be) frequencies below the Nyquist frequency.

In images this normally takes the form of moiré patterns.
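Aliasing is easy to reproduce on a 1D signal. The sketch below, which assumes NumPy and uses the hypothetical helper `dominant_frequency`, samples a sine at 0.4 cycles per sample and then keeps every second sample without filtering. Relative to the new sample spacing the sine is at 0.8 cycles per sample, above the new Nyquist frequency of 0.5, so it shows up at 0.2 instead.

```python
import numpy as np


def dominant_frequency(signal):
    """Return the strongest non-DC frequency in cycles per sample."""
    spectrum = np.abs(np.fft.rfft(signal))
    spectrum[0] = 0.0  # ignore the DC term
    return np.argmax(spectrum) / len(signal)


n = np.arange(200)
signal = np.sin(2 * np.pi * 0.4 * n)  # 0.4 cycles/sample, below the 0.5 limit

decimated = signal[::2]  # keep every second sample, with no low-pass filter

print(dominant_frequency(signal))     # 0.4
print(dominant_frequency(decimated))  # 0.2, an alias of the original 0.8
```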


![](images/Originalandmoire.jpg)

Credit: Gordon Pritchard [The Print Guide: Moiré](http://the-print-guide.blogspot.com/2009/12/moire.html)

## So how do we down-sample?
Before discarding samples, we must low-pass filter the signal (image) to remove the frequencies between the new Nyquist frequency and the old one.

We can do this in an image with our good friend the Gaussian.

We have to choose the size of the Gaussian carefully so that it removes no more of the signal than necessary.
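The effect of filtering before down-sampling can be sketched on a 1D signal. The example assumes NumPy and a 5-tap binomial kernel `[1, 4, 6, 4, 1] / 16` as the discrete Gaussian approximation (a common choice, but an assumption here). The helper `amplitude_at` is hypothetical. The signal mixes a low frequency (0.05 cycles per sample) that should survive with a high frequency (0.4) that would alias to 0.2 after decimation.

```python
import numpy as np

n = np.arange(400)
signal = np.sin(2 * np.pi * 0.05 * n) + np.sin(2 * np.pi * 0.4 * n)

# Assumed kernel: binomial weights approximating a small Gaussian.
kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
smoothed = np.convolve(signal, kernel, mode="same")


def amplitude_at(samples, freq):
    """Approximate amplitude of the sinusoid at `freq` cycles per sample."""
    spectrum = np.abs(np.fft.rfft(samples)) / (len(samples) / 2)
    return spectrum[int(round(freq * len(samples)))]


naive = signal[::2]       # decimate without filtering
filtered = smoothed[::2]  # low-pass filter first, then decimate

# After decimation the wanted component sits at 0.1 and the alias at 0.2.
print("alias, unfiltered:", amplitude_at(naive, 0.2))
print("alias, filtered:  ", amplitude_at(filtered, 0.2))
print("wanted, filtered: ", amplitude_at(filtered, 0.1))
```

The filter strongly attenuates the aliased component while only slightly reducing the low-frequency one. A wider Gaussian would suppress the alias further, but would also remove more of the wanted signal.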



## The Gaussian Pyramid - Reduce

In a Gaussian Pyramid, the lowest level $g_0$ is the original image $I$.

The next level up, $g_1$, is computed as a discrete approximation of a Gaussian-weighted average of the values of $g_0$.

Each level $g_l$ is therefore computed as a Gaussian-weighted average of the values of $g_{l-1}$.

For a $5\times5$ Gaussian this would be as follows.

\begin{equation}
	g_l(i,j) = \sum_{m=-2}^2\sum_{n=-2}^2 w(m,n)g_{l-1}(2i+m, 2j+n)
\end{equation}

Here $w(m,n)$ is the Gaussian weighting function, which can of course be made separable.

Note: the $2i$ and $2j$ have the effect of reducing the image by half in each direction.
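The reduce equation can be sketched directly in NumPy. The weights (the outer product of `[1, 4, 6, 4, 1] / 16`), the reflected border handling, and the function names `reduce_level` and `build_gaussian_pyramid` are assumptions made for this example. The text above only requires that $w$ approximates a Gaussian.

```python
import numpy as np

# Assumed 5-tap kernel: binomial weights, a common discrete Gaussian approximation.
W1D = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
W2D = np.outer(W1D, W1D)  # w(m, n); the weights sum to 1


def reduce_level(g_prev):
    """Compute g_l from g_{l-1}: a 5x5 weighted average sampled at (2i, 2j)."""
    # Reflect-pad by 2 so that 2i+m and 2j+n stay in range at the borders.
    padded = np.pad(g_prev, 2, mode="reflect")
    rows = (g_prev.shape[0] + 1) // 2
    cols = (g_prev.shape[1] + 1) // 2
    g = np.zeros((rows, cols))
    for m in range(-2, 3):
        for n in range(-2, 3):
            # padded[2 + 2i + m, 2 + 2j + n] equals g_prev(2i + m, 2j + n).
            shifted = padded[2 + m::2, 2 + n::2][:rows, :cols]
            g += W2D[m + 2, n + 2] * shifted
    return g


def build_gaussian_pyramid(image, levels):
    """Return [g_0, g_1, ...] with g_0 equal to the input image."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid


image = np.random.default_rng(0).random((16, 12))
for level, g in enumerate(build_gaussian_pyramid(image, 3)):
    print(level, g.shape)
```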

## The Gaussian Pyramid - Expand
Related to this is the idea of expanding an image from low resolution to higher resolution.

If we want to do this, the formula is very similar with only a few key differences.

For a $5\times5$ Gaussian this would be as follows.

\begin{equation}
g_{l,n}(i,j) = \sum_{p=-2}^2\sum_{q=-2}^2 w(p,q)g_{l,n-1}\left(\frac{i-p}{2},\frac{j-q}{2}\right)
\end{equation}
    
The major difference is that we are dividing by 2 now instead of multiplying by 2, and this has the effect of upsampling, i.e. we end up with twice as many pixels as before in both x and y. Only the terms for which $\frac{i-p}{2}$ and $\frac{j-q}{2}$ are integers are included in the sum.
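One way to sketch the expand step is to place the low-resolution samples on the even grid positions, leave zeros in between, and then apply the same 5x5 weights. The kernel `[1, 4, 6, 4, 1] / 16`, the reflected borders and the name `expand_level` are assumptions of this example. The factor of 4 follows Burt and Adelson's formulation and is not part of the equation as written above. It compensates for only a quarter of the terms contributing to each output pixel.

```python
import numpy as np

# Assumed 5-tap kernel: binomial weights, a common discrete Gaussian approximation.
W1D = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
W2D = np.outer(W1D, W1D)  # w(p, q)


def expand_level(g, out_shape=None):
    """Expand g to roughly twice its size in each axis."""
    rows, cols = g.shape
    if out_shape is None:
        out_shape = (2 * rows, 2 * cols)
    # Known samples go on the even grid. The zeros in between play the role of
    # the non-integer (i-p)/2, (j-q)/2 terms that are left out of the sum.
    up = np.zeros(out_shape)
    up[::2, ::2] = g
    padded = np.pad(up, 2, mode="reflect")
    out = np.zeros(out_shape)
    for p in range(-2, 3):
        for q in range(-2, 3):
            # padded[2 + i - p, 2 + j - q] equals up(i - p, j - q).
            shifted = padded[2 - p:2 - p + out_shape[0], 2 - q:2 - q + out_shape[1]]
            out += W2D[p + 2, q + 2] * shifted
    # Assumed normalisation (Burt and Adelson): keeps the brightness unchanged.
    return 4.0 * out


small = np.full((4, 4), 3.0)
large = expand_level(small)
print(large.shape, large.min(), large.max())
```

With these weights every low-resolution pixel contributes the same total weight to the output, so a constant image stays constant after expansion.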




Expansion is normally applied to the levels of the standard Gaussian (reduce) pyramid, which is why we see the indices $l,n$: $g_{l,n}$ is level $l$ expanded $n$ times. In other words, we are using a reduced level to recreate a different version of a higher-resolution level.

Why? More on that later.

And do we need to use the Gaussian for upscaling?

It works well as an interpolation method because it ensures that each pixel has the same impact as every other pixel.

Also, when up-scaling you generally do need to low-pass filter to remove aliased frequencies, although this is less obvious and a little beyond the scope of these lectures.

## Separability
One of the reasons the Gaussian pyramid is not excessively expensive to compute is that the Gaussian is separable.

In the case of reduction, this means that the image is filtered and reduced along one axis (making a rectangular image from a square one) and then filtered and reduced along the other axis. This brings it back to its original proportions, but now at half the size in each axis, or a quarter of the area in pixels of the original image.
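The two-pass scheme can be checked numerically against the direct 2D sum. This is a sketch under assumptions: the kernel `[1, 4, 6, 4, 1] / 16`, reflected borders, and the helper names `reduce_axis` and `reduce_2d` are chosen for the example.

```python
import numpy as np

# Assumed 5-tap kernel: binomial weights, a common discrete Gaussian approximation.
W1D = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def reduce_axis(image, axis):
    """Filter with the 1D kernel along one axis and keep every second sample."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (2, 2)
    padded = np.pad(image, pad, mode="reflect")
    size = (image.shape[axis] + 1) // 2
    out = 0.0
    for m in range(-2, 3):
        # Indices 2 + 2i + m in the padded array, i.e. sample 2i + m of the input.
        index = np.arange(2 + m, 2 + m + 2 * size, 2)
        out = out + W1D[m + 2] * np.take(padded, index, axis=axis)
    return out


def reduce_2d(image):
    """Direct 5x5 weighted average sampled at (2i, 2j), for comparison."""
    w2d = np.outer(W1D, W1D)
    padded = np.pad(image, 2, mode="reflect")
    rows, cols = (image.shape[0] + 1) // 2, (image.shape[1] + 1) // 2
    out = np.zeros((rows, cols))
    for m in range(-2, 3):
        for n in range(-2, 3):
            out += w2d[m + 2, n + 2] * padded[2 + m::2, 2 + n::2][:rows, :cols]
    return out


image = np.random.default_rng(0).random((8, 6))
half_rows = reduce_axis(image, axis=0)      # shape (4, 6): reduced on one axis only
separable = reduce_axis(half_rows, axis=1)  # shape (4, 3)
direct = reduce_2d(image)

print(half_rows.shape, separable.shape)
print(np.allclose(separable, direct))
```

Each pass uses only 5 weights per output value, compared with 25 for the direct 2D sum, and the second pass already works on half as many pixels.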

## Laplacian Pyramid

The levels of a Laplacian pyramid look similar to edge-detected images.

It can be used for image compression because most pixel values are zero or close to zero.
\begin{equation}
L_l = g_l - \text{Expand}[g_{l+1}]
\end{equation}

This can also be used to synthetically combine (blend) images.
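A minimal sketch of building a Laplacian pyramid and recovering the image from it follows. It assumes the kernel `[1, 4, 6, 4, 1] / 16`, reflected borders, and a factor of 4 in the expand step (as in Burt and Adelson's formulation). The function names are chosen for the example. For brevity, `reduce_level` blurs at full resolution and then discards samples, which is simple rather than efficient.

```python
import numpy as np

# Assumed 5-tap kernel: binomial weights, a common discrete Gaussian approximation.
W1D = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(image):
    """Apply the separable 5x5 kernel with reflected borders (same-size output)."""
    out = np.asarray(image, dtype=float)
    for axis in (0, 1):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (2, 2)
        padded = np.pad(out, pad, mode="reflect")
        n = out.shape[axis]
        out = sum(W1D[k] * np.take(padded, np.arange(k, k + n), axis=axis)
                  for k in range(5))
    return out


def reduce_level(g):
    return blur(g)[::2, ::2]


def expand_level(g, shape):
    up = np.zeros(shape)
    up[::2, ::2] = g       # known samples on the even grid, zeros elsewhere
    return 4.0 * blur(up)  # assumed factor of 4 restores the brightness


def laplacian_pyramid(image, levels):
    gaussian = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        gaussian.append(reduce_level(gaussian[-1]))
    # L_l = g_l - Expand[g_{l+1}]
    bands = [g - expand_level(g_next, g.shape)
             for g, g_next in zip(gaussian[:-1], gaussian[1:])]
    bands.append(gaussian[-1])  # keep the coarsest Gaussian level as the top
    return bands


def reconstruct(bands):
    image = bands[-1]
    for band in reversed(bands[:-1]):
        image = band + expand_level(image, band.shape)
    return image


image = np.random.default_rng(0).random((32, 32))
bands = laplacian_pyramid(image, 3)
print([b.shape for b in bands])
print(np.allclose(reconstruct(bands), image))
```

Because each band stores exactly what the expand step failed to predict, adding the bands back in from the top level down recovers the original image.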

## Difference of Gaussian Pyramid
An efficient way of calculating the Laplacian Pyramid was introduced by Burt and Adelson: the DoG, or Difference of Gaussians.

It is only an approximation, but it's a good one and much more efficient to compute than the Laplacian.

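\begin{equation}
\frac{\partial G}{\partial \sigma}=\sigma \nabla^2 G
\end{equation}

This is referred to in physics as the heat (diffusion) equation, here parameterised by $\sigma$.

Approximating the derivative by a finite difference between the nearby scales $k\sigma$ and $\sigma$ gives

\begin{equation}
\frac{\partial G}{\partial \sigma} \approx\frac{ G(x,y,k\sigma)-G(x,y,\sigma)}{k\sigma-\sigma}
\end{equation}

and substituting the first equation and rearranging gives

\begin{equation}
G(x,y,k\sigma)-G(x,y,\sigma)\approx (k-1)\sigma^2\nabla^2G
\end{equation}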
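The quality of this approximation can be checked numerically. The sketch below assumes the usual normalised 2D Gaussian $G(x,y,\sigma)=\frac{1}{2\pi\sigma^2}e^{-(x^2+y^2)/2\sigma^2}$, whose Laplacian is $\frac{x^2+y^2-2\sigma^2}{\sigma^4}G$. The grid, the value of $\sigma$ and the function names are chosen for illustration.

```python
import numpy as np


def gaussian(x, y, sigma):
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)


def laplacian_of_gaussian(x, y, sigma):
    r2 = x**2 + y**2
    return (r2 - 2 * sigma**2) / sigma**4 * gaussian(x, y, sigma)


def dog_relative_error(k, sigma=2.0):
    """Largest deviation of the DoG from (k-1) sigma^2 LoG, relative to its peak."""
    coords = np.linspace(-10, 10, 201)
    x, y = np.meshgrid(coords, coords)
    dog = gaussian(x, y, k * sigma) - gaussian(x, y, sigma)
    approx = (k - 1) * sigma**2 * laplacian_of_gaussian(x, y, sigma)
    return np.max(np.abs(dog - approx)) / np.max(np.abs(approx))


for k in (1.01, 1.1, 2 ** (1 / 3)):
    print(f"k = {k:.3f}: relative error {dog_relative_error(k):.3f}")
```

The error shrinks as $k$ approaches 1, as expected from a finite-difference approximation of the derivative.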



![](images/DoGPyramid.png)

Credit: David Lowe, [SIFT Paper](https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf)