Linear Data Lab 8

Original lab written by: Emily J. King

Goals: Visualize inner product and cosine similarity. Examine how inner product and its variants (e.g., convolution, cosine similarity, correlation) can be used to compare data and extract information.

Additional files needed: Linear_Data_Chapter_8_Lab.pdf, petedog-lab.png (from Module 5 Lab or another image of your own choice), autocorr.py

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from numpy.linalg import norm
import scipy.ndimage # for image convolution (import the submodule explicitly)
from autocorr import autocorr # for cyclic autocorrelation

Section 1: Visualizing inner product and cosine similarity.

Go to Linear_Data_Chapter_8_Lab.pdf

Section 2: Implementing inner product and cosine similarity.

Create two random vectors x and y in R^7.

In [None]:
x=np.random.rand(7)
y=np.random.rand(7)

Their inner product is:

In [None]:
np.dot(x,y)

Their cosine similarity is:

In [None]:
np.dot(x/norm(x),y/norm(y))
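
The same computation can be wrapped in a small helper. The following is only a minimal sketch: the name cosine_similarity, the vectors u and v, and the fixed seed are illustrative choices, not part of the lab files. It checks that the helper agrees with the expression in the cell above and that rescaling either vector by a positive constant leaves the cosine similarity unchanged.

```python
import numpy as np
from numpy.linalg import norm

def cosine_similarity(u, v):
    """Inner product of u and v after rescaling each to unit length."""
    return np.dot(u, v) / (norm(u) * norm(v))

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
u = rng.random(7)
v = rng.random(7)

c = cosine_similarity(u, v)
# Same value as the expression used in the cell above
same_as_lab = np.isclose(c, np.dot(u / norm(u), v / norm(v)))
# Positive rescaling changes the inner product but not the cosine similarity
scale_invariant = np.isclose(c, cosine_similarity(3.0 * u, 0.5 * v))
print(c, same_as_lab, scale_invariant)
```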

Note that the cosine similarity must be between -1 and 1.  Now set z=-2.7x and compute the cosine similarity of x and z.

In [None]:
z = -2.7*x
np.dot(x/norm(x),z/norm(z))

Why does the answer make sense?  Discuss.

Section 3: Autocorrelation

Let's make a random periodic vector by repeating a random pattern.

In [None]:
r=np.random.rand(50)
per=np.tile(r,10)
plt.plot(per)

Now let's add noise to hide the periodicity.

In [None]:
nper=per+0.5*np.random.normal(0,1,np.size(per))
plt.plot(nper)

Now let's compute the (cyclic) autocorrelation.  (The function defined in autocorr.py might not look like the definition of autocorrelation from class, but it is mathematically the same and a better implementation than directly encoding the formula.)

In [None]:
NPER=autocorr(nper)
plt.plot(NPER)
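
To see how two different-looking formulas can give the same autocorrelation, here is a hedged sketch. It assumes the unnormalized cyclic definition a[k] = sum over n of x[n]*x[(n+k) mod N]; the definition from class and the code in autocorr.py may normalize differently, and autocorr.py is not reproduced here. The function names autocorr_direct and autocorr_fft are made up for this illustration. The FFT version is one common way to get the same numbers with far fewer operations.

```python
import numpy as np

def autocorr_direct(x):
    """Cyclic autocorrelation from the definition: a[k] = sum_n x[n] * x[(n + k) mod N]."""
    n = len(x)
    # np.roll(x, -k)[n] equals x[(n + k) mod N]
    return np.array([np.dot(x, np.roll(x, -k)) for k in range(n)])

def autocorr_fft(x):
    """Same quantity via the FFT: inverse transform of the squared magnitude spectrum."""
    spectrum = np.fft.fft(x)
    return np.fft.ifft(np.abs(spectrum) ** 2).real

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
# Noisy periodic signal built the same way as nper above
signal = np.tile(rng.random(50), 10) + 0.5 * rng.normal(0, 1, 500)

direct = autocorr_direct(signal)
fast = autocorr_fft(signal)
print(np.allclose(direct, fast))  # do the two computations agree?
```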

Discuss the output.

Section 4: 2D cross-correlation / convolution

Let's reload Pete Dog and look at him.  (You may also load another image of your choice by modifying the first command below.)

In [None]:
I=plt.imread('petedog-lab.png')
plt.imshow(I)

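Let's add some noise.  Here, eta denotes the amount of noise ("noise level") and is a positive number; the larger the number, the more noise.  Remember that the image must be in floating point before noise is added.  plt.imread already returns PNG images as floating-point arrays with values in [0, 1], so no explicit conversion is needed here.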


In [None]:
eta = 0.2
nI=I+eta*np.random.normal(0,1,size=I.shape)
plt.imshow(nI)

Let's apply Gaussian blurring with various values of sigma. Note from the SciPy documentation: the default filter size is 2*round(4*sigma)+1. So setting sigma to 0.25 results in a 3x3 filter and setting it to 0.5 results in a 5x5 filter.

Try sigma = 0.25, 0.5, 1, 2.5, 5.

In [None]:
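# sigma=(s, s, 0) blurs only the two spatial axes; a scalar sigma would also blur across the color channels
plt.imshow(scipy.ndimage.gaussian_filter(nI, sigma=(0.25, 0.25, 0)))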
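
The filter-size note above can be checked empirically. The sketch below is an illustration, not part of the lab files: it filters a single spike with the one-dimensional routine scipy.ndimage.gaussian_filter1d and counts how many entries come out nonzero, which is the width of the filter actually applied. The variable names impulse and sizes are assumptions made for this example.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

impulse = np.zeros(101)
impulse[50] = 1.0  # single spike, far from both ends of the array

sizes = []
for sigma in [0.25, 0.5, 1, 2.5, 5]:
    response = gaussian_filter1d(impulse, sigma=sigma)
    width = np.count_nonzero(response)    # number of entries the filter spread the spike over
    predicted = 2 * round(4 * sigma) + 1  # size formula quoted in the note above
    sizes.append((sigma, width, predicted))
    print(sigma, width, predicted)
```

In two dimensions the same width applies along each blurred axis, which is where the 3x3 and 5x5 sizes come from.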

Let's compute the cross-correlation with the Sobel filter for horizontal edges.  For ease of visualization, we'll first convert the color image to grayscale.  Note that scipy.ndimage.sobel(bwI, 0) differentiates along axis 0 (down the rows), so it yields the cross-correlation with the Sobel filter for horizontal edges.

In [None]:
bwI=0.2989 * I[:,:,0] + 0.5870 * I[:,:,1] + 0.1140 * I[:,:,2] 
plt.imshow(scipy.ndimage.sobel(bwI, 0),cmap='gray')

And now for vertical edges.

In [None]:
plt.imshow(scipy.ndimage.sobel(bwI, 1),cmap='gray')
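
To make the connection between scipy.ndimage.sobel and cross-correlation concrete, the following sketch writes out a 3x3 Sobel kernel and compares an explicit cross-correlation against the built-in call. The names K0, img, by_hand_0 and so on are illustrative, and a small random array stands in for the image. The comparison is restricted to the interior so that boundary handling does not enter into it.

```python
import numpy as np
from scipy import ndimage

# 3x3 Sobel kernel: difference down the rows (axis 0), smoothing along the columns.
# np.outer gives [[-1, -2, -1], [0, 0, 0], [1, 2, 1]].
K0 = np.outer([-1, 0, 1], [1, 2, 1])

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
img = rng.random((20, 30))      # small stand-in for a grayscale image

# Axis 0: explicit cross-correlation with K0 versus the built-in Sobel filter
by_hand_0 = ndimage.correlate(img, K0)
built_in_0 = ndimage.sobel(img, 0)
match_0 = np.allclose(by_hand_0[1:-1, 1:-1], built_in_0[1:-1, 1:-1])

# Axis 1: the transposed kernel takes the difference along the columns instead
by_hand_1 = ndimage.correlate(img, K0.T)
built_in_1 = ndimage.sobel(img, 1)
match_1 = np.allclose(by_hand_1[1:-1, 1:-1], built_in_1[1:-1, 1:-1])

print(match_0, match_1)
```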

Exercises

1. Which of the sigma values used in the Gaussian blurring gave the "best" result?  (Note: There is no correct answer here, but you must justify your answer to get any credit.)

2a. Create the following image.

In [None]:
A=np.hstack((np.ones([500,250]), np.zeros([500,250])))

b. Visualize A.  (If you want grayscale visualization, use cmap='gray' as above.)

c. Choose a sigma level and perform a Gaussian filter on A. Visualize the output.

d. Explain why the output of c makes sense.

e. Compute the cross correlation of A with the Sobel filter for horizontal edges. Visualize the output.


f. Explain why the output of e makes sense.

g. Compute the cross correlation of A with the Sobel filter for vertical edges. Visualize the output.


h. Explain why the output of g makes sense.

3. Go to Linear_Data_Chapter_8_Lab.pdf.