# Predictive Coding
stough 202-

In this notebook we look at a way to exploit very simple spatial redundancy to get much better compressibility. In short, predicting a pixel's value from its left neighbor is usually quite effective: keeping only the prediction error gives a signal that is more compressible than the original, without any loss of information. Read on for more.

**Huffman**:
In our discussion of [entropy](./entropy_intro.ipynb) we noted the different kinds of redundancy that we might leverage or account for in order to compress an image. We accounted for **coding redundancy** by applying Huffman variable length encoding. Huffman leverages differences in the relative probability of pixel values (a non-uniform histogram, and hence low entropy) to define an encoding scheme that minimizes the average number of bits needed to represent each pixel value. How much Huffman coding helps therefore depends on the entropy implied by the histogram of the image, with lower entropy meaning fewer bits per pixel on average:
- If the histogram tends toward uniform, entropy is high and Huffman coding will accomplish little.
- If the histogram is highly non-uniform, with for example a few large spikes, then entropy is low and Huffman coding will work well.
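
To make the two cases above concrete, here is a minimal sketch that compares a flat histogram with a spiky one using `scipy.stats.entropy`, the same function this notebook uses further down. Both histograms are made up for illustration and are not measured from any image.

```python
import numpy as np
from scipy.stats import entropy

# Hypothetical histograms over the 256 possible 8-bit values.
uniform_hist = np.ones(256)        # every value equally likely

spiky_hist = np.ones(256)          # mostly rare values...
spiky_hist[[40, 200]] = 10_000     # ...plus two large spikes

# entropy() normalizes the counts to probabilities before computing.
h_uniform = entropy(uniform_hist, base=2)
h_spiky = entropy(spiky_hist, base=2)

print(f"uniform: {h_uniform:.2f} bits/pixel")
print(f"spiky:   {h_spiky:.2f} bits/pixel")
```

The uniform case needs the full 8 bits per pixel, so a variable-length code has nothing to gain, while the spiky case needs far fewer bits on average.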

**Predictive coding** computes a derived version of the image: the value at every pixel is replaced by its difference with respect to the pixel to its left, with the first column left unchanged. We'll see that while this is a completely reversible function, the predictive-coded version of an image can be much more compressible than the original.

In [None]:
%matplotlib widget
import matplotlib.pyplot as plt
import numpy as np

# For spatial filtering/operations
from scipy.ndimage import (correlate,
                           convolve)
from scipy.stats import entropy

# For importing from alternative directory sources
import sys  
sys.path.insert(0, '../dip_utils')

from matrix_utils import (arr_info,
                          make_linmap)
from vis_utils import (vis_rgb_cube,
                       vis_hists,
                       vis_pair)

In [None]:
I = plt.imread('../dip_pics/skyandsea.jpg')
vis_hists(I)
print(arr_info(I))

In the above, we're viewing the image in its original, human-readable form. Let's compute the entropy of the image.

In [None]:
# Bin edges 0..256 give exactly one bin per possible 8-bit value
freq, bb = np.histogram(I.ravel(), bins = np.arange(257))

In [None]:
entropy(freq, base=2)

Let's observe just one of the color channels, without loss of generality.

In [None]:
J = I[...,0]
vis_hists(J)

## Using [`correlate`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.correlate.html) 

Here we're going to use the `correlate` function from `scipy.ndimage` to apply a simple linear function to neighborhoods of pixels. This *filtering* is a core element of image processing, from simple edge detection to convolutional neural networks, and we will be seeing it again and again. You should read more about filtering in the spatial domain [here](https://www.mathworks.com/help/images/what-is-image-filtering-in-the-spatial-domain.html).

In this case the neighborhood in question is extremely simple. When our process is at a pixel $i,j$, the neighborhood consists of that pixel and its neighbor to the left, $i,j-1$; for a pixel already at the left edge of the image, the missing neighbor is taken to be 0. The linear combination of the two pixels that we want to compute is just the difference. That is, $$p_{i,j}' = -1 \cdot p_{i,j-1} + 1 \cdot p_{i,j}$$
We'll construct a simple array (also called a mask, or filter) that represents this linear combination, and then apply it.
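
As a quick sanity check of the formula above, the sketch below applies the same kind of mask to a single made-up row and compares the result with a difference computed by plain NumPy slicing. The toy values and the `manual` cross-check are assumptions added for illustration only.

```python
import numpy as np
from scipy.ndimage import correlate

# A made-up 1x5 "image" row, cast to int16 so differences can go negative.
row = np.array([[10, 12, 12, 15, 9]], dtype='int16')

# The two-element mask: -1 on the left neighbor, +1 on the pixel itself.
h = np.array([[-1, 1]], dtype='int16')

# mode='constant', cval=0 treats the missing neighbor at the left edge as 0.
diff = correlate(row, h, mode='constant', cval=0)

# Cross-check with slicing: first column unchanged, the rest differenced.
manual = row.copy()
manual[:, 1:] = row[:, 1:] - row[:, :-1]

print(diff)    # [[10  2  0  3 -6]]
print(manual)  # same values
```

Because the left edge is padded with 0, the first entry is just the original pixel value, which is what makes the operation easy to undo later.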

In [None]:
h = np.array([-1, 1], ndmin=2).astype('int16')

In [None]:
h

In [None]:
arr_info(h)

In [None]:
# Cast to int16 so negative differences fit; pad beyond the left edge with 0
Jf = correlate(J.astype('int16'), h, mode='constant', cval=0)
arr_info(Jf)

In [None]:
vis_hists(Jf)

Notice that if we take one pixel value and subtract another from it, we may get a negative number. That's why in the above we cast the original image to `int16`, which can hold both positive and negative values.
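
A small sketch of what goes wrong without the cast: unsigned 8-bit arithmetic in NumPy wraps around instead of going negative. The two pixel values here are made up for illustration.

```python
import numpy as np

left = np.array([12], dtype='uint8')
right = np.array([10], dtype='uint8')

# Subtracting uint8 arrays stays in uint8, so -2 wraps around to 254.
wrapped = right - left

# Casting to a signed type first leaves room for the negative result.
signed = right.astype('int16') - left.astype('int16')

print(wrapped)  # [254]
print(signed)   # [-2]
```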

Look at this image and its histogram compared to the one above. Which do you think has less entropy and therefore may be more highly compressible?

Note as well that this operation is completely reversible; more on that below.

In [None]:
Jf[:5, :10]

In [None]:
J[:5,:10]

In [None]:
# Undo the differencing: a running sum along each row recovers the original values
Jr = np.cumsum(Jf, axis=1)
vis_hists(Jr)
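
The round trip can also be checked numerically rather than by eye. The sketch below is self-contained: it uses a small random array in place of the notebook's image and NumPy slicing in place of `correlate` (both substitutions are assumptions made to keep the example standalone), then confirms that the cumulative sum reproduces the input exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 6), dtype=np.uint8)  # made-up test image

# Forward: difference with the left neighbor, first column left unchanged.
signed = img.astype('int16')
coded = signed.copy()
coded[:, 1:] = signed[:, 1:] - signed[:, :-1]

# Inverse: a running sum along each row undoes the differencing.
decoded = np.cumsum(coded, axis=1)

print(np.array_equal(decoded, img))  # True
```

The same comparison, `np.array_equal(Jr, J)`, could be run on the arrays in this notebook.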

In [None]:
freq, bins = np.histogram(J.ravel(), bins=np.arange(257))

In [None]:
entropy(freq, base=2)

In [None]:
# Differences of 8-bit values range from -255 to 255: one bin per possible value
ff, bb = np.histogram(Jf.ravel(), bins = np.arange(-255,257))

In [None]:
entropy(ff, base=2)

## Conclusions

Here we saw how predictive coding (filtering with that simple mask) is a completely reversible operation that leads to an image with significantly lower entropy than the original image. Why is this?

Basically, from what we've learned so far, images that are interesting to us generally contain large swaths of constant or smoothly varying color or intensity. That is, most of the time the context of a pixel is highly predictive of that pixel's value. Where is this not the case? Really only at *edges*.

Since changes are smooth and small most of the time, the predictive-coded image will have many more values close to zero, giving a sharply peaked histogram and therefore a much more compressible distribution.
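
An extreme synthetic case shows the effect clearly. The sketch below builds a smooth horizontal ramp (an illustrative assumption, not one of the notebook's images) and compares the entropy before and after left-neighbor differencing. The `entropy_bits` helper is a hypothetical stand-in written in plain NumPy.

```python
import numpy as np

def entropy_bits(values):
    """Shannon entropy, in bits per sample, of the values in an integer array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Synthetic smooth image: every row ramps from 0 to 255, one step per pixel.
ramp = np.tile(np.arange(256, dtype='int16'), (64, 1))

# Left-neighbor differences, with the first column left unchanged.
coded = ramp.copy()
coded[:, 1:] = ramp[:, 1:] - ramp[:, :-1]

h_raw = entropy_bits(ramp)
h_coded = entropy_bits(coded)
print(f"raw: {h_raw:.3f} bits/pixel, coded: {h_coded:.3f} bits/pixel")
```

The ramp uses all 256 values equally often, so its histogram is flat at 8 bits per pixel, yet almost every difference is exactly 1, so the coded version needs only a small fraction of a bit per pixel. Real images are less extreme, but the same effect is at work.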