# Spatial Coherence and Change of Basis
stough, 202-

- [Khan Academy on Change of Basis](https://www.khanacademy.org/math/linear-algebra/alternate-bases/change-of-basis/v/linear-algebra-coordinates-with-respect-to-a-basis)

Complicated way of saying it: Image compression relies on the spatial coherence of image data. Because a pixel's value is highly correlated with its neighbors' values most of the time, a change of basis like the one below can lead to a much more compressible coordinate representation (more zeros, or a shorter description length)...

Little simpler: We are all familiar with plotting points in the plane.

<img src="../dip_figs/xy_plane.png" style="width:150px"/>
<!--![XY-plane](../dip_figs/xy_plane.png)-->

A point $A$, for example, might have the coordinates $\langle4,3\rangle$, by which we mean $4\cdot\langle1,0\rangle + 3\cdot\langle0,1\rangle$, or $4$ in the $x$ direction and $3$ in the $y$. But the same point in space can also be represented as, for example, $3.5\cdot\langle1,1\rangle + 0.5\cdot\langle1,-1\rangle$, or rather $\langle\frac{7}{\sqrt2}, \frac{1}{\sqrt2}\rangle$ if we insist that the two directions we're measuring in ($\langle1,1\rangle,\langle1,-1\rangle$) be scaled to unit length.



If we think about the magnitude of the point (its energy, or distance from the origin), notice that in this alternative representation most of that magnitude is accounted for by the first component: $\frac{7}{8}$ of the summed absolute coefficients, or $98\%$ of the squared length. In fact, for the many points whose $x$ and $y$ components are *nearly the same*, the second component could potentially be ignored without losing too much. This is where the compression will come from: if we can ignore a coefficient, then we don't have to store or transmit it.
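
As a quick check of those numbers, here is a minimal sketch that recomputes the coefficients of $\langle4,3\rangle$ by projecting onto the two unit directions. The names `u0`, `u1`, `l1_share` and `energy_share` are made up for this illustration and are not used elsewhere in the notebook.

```python
import numpy as np

# The point A from the text, in the standard basis.
A = np.array([4.0, 3.0])

# Unit-length versions of the directions <1,1> and <1,-1>.
u0 = np.array([1.0, 1.0]) / np.sqrt(2)
u1 = np.array([1.0, -1.0]) / np.sqrt(2)

# Because u0 and u1 are orthonormal, each coefficient is just a dot product.
c = np.array([A @ u0, A @ u1])
print(c * np.sqrt(2))  # approximately [7, 1], i.e. c = <7/sqrt(2), 1/sqrt(2)>

# Share of the summed absolute coefficients held by each component.
l1_share = np.abs(c) / np.abs(c).sum()

# Share of the squared length (energy) held by each component.
energy_share = c**2 / (c**2).sum()

print(l1_share)      # approximately [7/8, 1/8]
print(energy_share)  # approximately [0.98, 0.02]
```

Either way of measuring gives the same qualitative picture: the first component dominates.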

In this demo we'll try to show how such a change of basis can improve the compressibility of image data. We'll look at a pixel and its neighbor, over all pairs in an image. Spatial coherence implies that most of the time, a pixel and its neighbor will be similar.

In [None]:
%matplotlib widget
import matplotlib.pyplot as plt
import numpy as np

# For importing from alternative directory sources
import sys  
sys.path.insert(0, '../dip_utils')

from matrix_utils import (arr_info,
                          make_linmap)

from vis_utils import (vis_image,
                       vis_pair,
                       vis_triple)

In [None]:
I = plt.imread('../dip_pics/cat_small.png')
arr_info(I)

In [None]:
vis_image(I, show_ticks=False, title=None)

In [None]:
vis_image(I[...,0], show_ticks=False, title=None, cmap='gray')

&nbsp;
## Let's look at pixel pairs
We'll just consider one of the color channels so as not to confuse the issue more.

In [None]:
pix_pairs = np.reshape(I[...,0], (-1,2)).copy()
arr_info(pix_pairs)

In [None]:
pix_pairs[:10]

In [None]:
numpoints = 5000
randomInds = np.random.choice(pix_pairs.shape[0], numpoints, replace=False)

In [None]:
f, ax = plt.subplots()
ax.scatter(pix_pairs[randomInds,0], pix_pairs[randomInds,1], alpha=.05)
ax.set_aspect('equal')
ax.set_xlabel(r'$pix_0$')
ax.set_ylabel(r'$pix_1$')
ax.set_title('Pixel pair scatter');

It looks like neighboring pixels are highly correlated with one another. **This is the essence of spatial coherence**. Rather than considering them as independent pieces of information, we can instead reframe how we represent a pixel pair.
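
One way to put a number on "highly correlated" is the Pearson correlation between the two columns of the pair array. The sketch below is self-contained, so it uses a synthetic smooth image (a ramp plus mild noise) as a hypothetical stand-in for the cat picture; with the notebook's own data you would pass the columns of `pix_pairs` instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a natural image: a smooth ramp plus mild noise.
rows, cols = 64, 64
x = np.linspace(0, 1, cols)
y = np.linspace(0, 1, rows)
img = 0.5 * (x[None, :] + y[:, None]) + 0.02 * rng.standard_normal((rows, cols))

# Same pairing as in the notebook: consecutive pixels along each row.
pairs = img.reshape(-1, 2)

# Pearson correlation between the first and second pixel of each pair.
r = np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]
print(f"correlation between pix_0 and pix_1: {r:.3f}")
```

A value close to 1 corresponds to the tight diagonal cloud seen in the scatter plot.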

&nbsp;

## Changing the basis of a pixel pair
We want to reframe any pixel pair to take advantage of spatial coherence. To do so, we can move from the independent basis $\{\langle1,0\rangle,\langle0,1\rangle\}$ to one like $\{\frac{1}{\sqrt2}\langle1,1\rangle,\frac{1}{\sqrt2}\langle1,-1\rangle\}$. Rather than thinking of the pixel pair as "this pixel, then that pixel", we'll have something like "the average of the pair, then the difference of the pair" (each up to a constant scale factor). If spatial coherence is as prevalent as the above scatter plot implies, then "the difference of the pair" will be a relatively insignificant part of the typical pair.

To transform the normal representation of a pixel pair $\langle pix_0,pix_1\rangle$ to the new basis, we can multiply it by a matrix of the two new basis vectors:

\begin{equation*}
\frac{1}{\sqrt2}\begin{bmatrix}
1 & 1 \\
1 & -1
\end{bmatrix}
\begin{bmatrix}
pix_0 \\
pix_1
\end{bmatrix} =
\frac{1}{\sqrt2}\begin{bmatrix}
pix_0 + pix_1 \\
pix_0 - pix_1
\end{bmatrix} =
\begin{bmatrix}
c_0 \\
c_1
\end{bmatrix}
\end{equation*}

You can check this with respect to the pixel pair $\langle4,3\rangle$ noted at the top. Below we'll call this representation **avg/diff**.
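
A small sketch of that check, which also confirms two properties the rest of the notebook relies on: the matrix is orthogonal (its rows are unit length and perpendicular), so the change of basis preserves the length of a pair. The variable names here are local to the example.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# Rows are unit length and mutually perpendicular, so H @ H.T is the identity.
is_orthogonal = np.allclose(H @ H.T, np.eye(2))
print(is_orthogonal)

# Apply the matrix to the pair <4,3> as a column vector.
p = np.array([4.0, 3.0])
c = H @ p
print(c * np.sqrt(2))  # approximately [7, 1]

# An orthogonal change of basis leaves the vector's length unchanged.
print(np.linalg.norm(p), np.linalg.norm(c))
```

Because the length is preserved, any energy that leaves $c_1$ must show up in $c_0$.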

In [None]:
H = np.array([[1, 1],[1, -1]])/np.sqrt(2)

In [None]:
H

In [None]:
# H is 2x2, while the pix_pairs is Nx2.  So
# we transpose the pix_pair to get 2xN. The
# result of the matmul is 2xN, then we transpose it back.
new_coords = np.matmul(H, pix_pairs.T).T
arr_info(new_coords)

In [None]:
np.mean(new_coords, axis=0)

&nbsp;
## Let's see how much energy is in each component for each pair.
We would like to understand what percentage of the total is in each component in this new representation. To get there, we take the absolute value of each component and scale each pair so that its two components sum to 1.

In [None]:
# We want to use the absolute values, and divide by the sum so that the sum of each pair is one.
# The np.abs(new_coords) will have shape (N,2), whereas the .sum(axis=-1) produces a (N,). 
# In order to broadcast the shapes correctly, we can add [:,None] so that we're dividing 
# (N,2) with (N,1), and numpy will know what that means. Or just keepdims!
normed_coords = np.abs(new_coords) / np.abs(new_coords).sum(axis=-1, keepdims=True)

In [None]:
normed_coords[:10]

In [None]:
normed_coords[:10].sum(axis=-1, keepdims=True)

You can see that, at least for the first ten pixel pairs, the energy in the $\frac{1}{\sqrt2}\langle1,1\rangle$ direction is most of the total in this new representation. Let's look at a histogram of these two components, to see if this trend holds across the whole image.

In [None]:
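# 401 edges give 400 equal bins on [0, 1]. Including the right edge keeps
# pairs with identical pixels (c_0 share exactly 1) in the histogram.
bins = np.linspace(0, 1, 401)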

In [None]:
f, ax = plt.subplots()
ax.hist(normed_coords[:,0], bins, alpha = .6, label = r'$c_0$', color = 'r');
ax.hist(normed_coords[:,1], bins, alpha = .6, label = r'$c_1$', color = 'b');

plt.legend();

It looks like in the new representation, the $c_0$ component, associated with the $\frac{1}{\sqrt2}\langle1,1\rangle$ direction, has almost all the power/energy/magnitude associated with any pair. How does this compare to the original representation?

In [None]:
normed_pix = np.abs(pix_pairs) / np.abs(pix_pairs).sum(axis=-1, keepdims=True)
bins = np.linspace(0, 1, 401)  # same 400 bins on [0, 1] as the avg/diff histogram

In [None]:
f, ax = plt.subplots()
ax.hist(normed_pix[:,0], bins, alpha = .6, label = r'$pix_0$', color = 'r');
ax.hist(normed_pix[:,1], bins, alpha = .6, label = r'$pix_1$', color = 'b');

plt.legend();

The above plot shows that when we consider a pair as two independent pixels, each component ($pix_0$ and $pix_1$) is equally important.

&nbsp;
## What if we zeroed out all of that less important dimension?
Given that the $c_1$ components seem to matter so little, what if we just didn't keep them? Would that change the image all that much? 
Once we've zeroed out the $c_1$ components, then going back to the pixel space is a matter of multiplying in the other direction:

\begin{equation*}
\begin{bmatrix}
c_0 & c_1
\end{bmatrix} \cdot
\frac{1}{\sqrt2}\begin{bmatrix}
1 & 1 \\
1 & -1
\end{bmatrix} =
\frac{1}{\sqrt2}\begin{bmatrix}
pix_0 + pix_1 & pix_0 - pix_1
\end{bmatrix} \cdot
\frac{1}{\sqrt2}\begin{bmatrix}
1 & 1 \\
1 & -1
\end{bmatrix} =
\frac{1}{2}\begin{bmatrix}
2pix_0 & 2pix_1
\end{bmatrix} =
\begin{bmatrix}
pix_0 & pix_1
\end{bmatrix}
\end{equation*}

You can check with respect to the coefficient representation $\langle\frac{7}{\sqrt2}, \frac{1}{\sqrt2}\rangle$ from above.
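
Here is that check as a short sketch, together with what happens to the same pair once the difference coefficient is zeroed. The names `rec` and `rec_trunc` are only for this illustration.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# Coefficients of the pair <4,3> in the avg/diff basis, as a row vector.
c = np.array([7.0, 1.0]) / np.sqrt(2)

# Row vector times H maps coefficients back to pixel values.
rec = c @ H
print(rec)  # approximately [4, 3]

# Zero the difference coefficient, then map back.
c_trunc = c.copy()
c_trunc[1] = 0
rec_trunc = c_trunc @ H
print(rec_trunc)  # approximately [3.5, 3.5]
```

With $c_1$ dropped, both pixels in the pair are replaced by the pair's mean, so the error per pixel is half the original difference between the two.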

In [None]:
new_coords[:,1] = 0

In [None]:
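# Map the (N,2) coefficients back to pixel pairs. H is symmetric and
# orthogonal, so H.T equals H and is also its own inverse.
rec_pairs = np.matmul(new_coords, H.T)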
arr_info(rec_pairs)

In [None]:
Ir = np.reshape(rec_pairs, I.shape[:2])

In [None]:
vis_pair(I[...,0], Ir, cmap='gray', show_ticks=False, second_title='Using half in avg/diff')

It appears that eliminating half of the coefficients (all of the $c_1$'s) has a negligible effect on the image. How does this compare to eliminating one of the pixels in every pixel pair in the original representation?

In [None]:
pix_pairs[:,1] = 0
Irp = np.reshape(pix_pairs, I.shape[:2])

In [None]:
vis_triple(I[...,0], Irp, Ir, cmap='gray', show_ticks=False, 
           second_title='Using half in pixel space', 
           third_title='Using half in avg/diff')

In the above we see that eliminating half of the coefficients in the pixel space leads to more noticeable degradation than doing so in the avg/diff representation. Due to spatial coherence, the avg/diff space usually accounts for most of the energy of a pixel pair in only the "avg" component. Such a change of basis followed by truncating or zeroing out components is the idea behind lossy compression schemes like JPEG, which applies a discrete cosine transform to $8\times8$ pixel blocks and then quantizes the resulting coefficients.
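
The visual comparison can also be given a number. The sketch below measures the mean squared error of both truncations. To stay self-contained it uses a synthetic smooth image as a hypothetical stand-in for the cat picture, and `mse` is a helper defined here, not something from `dip_utils`.

```python
import numpy as np

rng = np.random.default_rng(0)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# Hypothetical stand-in image: smooth ramp plus mild noise, values roughly in [0, 1].
x = np.linspace(0, 1, 64)
img = 0.5 * (x[None, :] + x[:, None]) + 0.02 * rng.standard_normal((64, 64))
pairs = img.reshape(-1, 2)

# Truncate in avg/diff space: drop every c_1, then map back to pixels.
coords = pairs @ H.T
coords[:, 1] = 0
rec_avgdiff = coords @ H

# Truncate in pixel space: drop the second pixel of every pair.
rec_pixel = pairs.copy()
rec_pixel[:, 1] = 0


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return np.mean((a - b) ** 2)


mse_avgdiff = mse(pairs, rec_avgdiff)
mse_pixel = mse(pairs, rec_pixel)
print(f"avg/diff truncation MSE: {mse_avgdiff:.5f}")
print(f"pixel truncation MSE:    {mse_pixel:.5f}")
```

On spatially coherent data the avg/diff error should be far smaller, since it only depends on how much neighbors differ rather than on how bright they are.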

### Little experiment figuring out how to consider a pixel and its horizontal neighbor.

In [None]:
a = np.arange(16).reshape(4,4)

In [None]:
a

In [None]:
np.reshape(a, (8,2))

In [None]:
np.reshape(a, (8,-1))

In [None]:
np.reshape(a, (-1,2))
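
The reshape above produces non-overlapping pairs: (0,1), (2,3), and so on. If instead every pixel should be paired with its right-hand neighbor, one possible sketch uses two shifted slices of the same array. This is an alternative pairing for comparison, not the one used in the demo above.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# Non-overlapping pairs, as in the reshape above: (0,1), (2,3), (4,5), ...
disjoint = a.reshape(-1, 2)

# Overlapping pairs: every pixel with its right-hand neighbor in the same row.
left = a[:, :-1].ravel()   # all columns except the last
right = a[:, 1:].ravel()   # all columns except the first
overlapping = np.stack([left, right], axis=1)

print(disjoint.shape, overlapping.shape)  # (8, 2) (12, 2)
print(overlapping[:4])
```

The slicing never pairs the last pixel of one row with the first pixel of the next, which a plain reshape would do for images with an odd width.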