## EECS 453/551
# Eigenimages

What can SVD tell us about the way people write digits?
___

## Before we begin

If you are new to the Jupyter notebook interface, take the tour by clicking Help -> User Interface Tour. The most important thing to know is that you can run a code cell (like the one below) by clicking on it and pressing Ctrl+Enter.

Run the code cell below to load the Julia code and data we need.

In [None]:
using MAT, PyPlot, Interact
include("eigenimages.jl")

# load data
trn = matread("TRAIN_DIGITS.mat")["TRAIN_DIGITS"]
testdata = matread("TEST_DIGITS.mat")
tst = testdata["TEST_DIGITS"]
labels = Vector{Int64}(testdata["TEST_DIGIT_LABELS"][:]);

## Comparing digit predictions with labels

This section should be familiar. Run the following two cells, then use the slider to scroll through the test dataset and see which digits are correctly classified. The predictions are made using a `classify_image` function just like the one you wrote. To try a different value of $k$, edit it in the first cell and rerun both cells.

In [None]:
# classify all images:
k = 10
predictions = classify_image(tst,trn,k)

fig1 = figure(figsize=(5,5))

ax1 = fig1[:add_subplot](111)
set_cmap("gray_r")
im1 = ax1[:matshow](vec2mat(tst[:,1]))
ax1[:axis]("off")
ttl = ax1[:text](0,-1,"Predicted: ",size=16)
n, T = size(tst)

In [None]:
@manipulate for i in 1:T; withfig(fig1) do
        test_image = tst[:,i]
        correct_label = Int64(labels[i])
        which_digit = predictions[i]
        im1[:set_data](vec2mat(test_image))
        # which_digit is the classifier's prediction, so label it as such
        ttl[:set_text]("Predicted: $which_digit Actual: $correct_label")
        ttl[:set_color](which_digit == correct_label ? "black" : "red")
        fig1[:canvas][:draw]()
    end
end

## Interactive Eigenimages

We know SVD can do better than mean-based classification, but why? What insight do we gain by taking the SVD over a set of images instead of just using average images?

Run the following two cells to generate an interactive figure. The top row of plots shows the first three left singular vectors for a particular digit: $U[:,1]$, $U[:,2]$, and $U[:,3]$. The bottom plot shows the linear combination $a_1 U[:,1] + a_2 U[:,2] + a_3 U[:,3]$. Think of $U[:,1]$ as the "base image" and $U[:,2]$ and $U[:,3]$ as the two most common deviations from the base image. Changing the coefficients $a_2$ and $a_3$ adds or subtracts these deviations, which raises or lowers pixel values in the base image.
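The linear combination described above can be sketched on a toy example. This is a minimal sketch, assuming Julia 1.x (where `svd` lives in `LinearAlgebra`); the matrix `A` and the helper `combo` are hypothetical stand-ins, not the notebook's data or its `linear_combo` function.

```julia
using LinearAlgebra

# Hypothetical stand-in for one digit's training set: each column is a
# vectorized 2x2 "image" (the notebook's real images are 16x16).
A = [1.0 1.0 0.9;
     0.0 0.1 0.0;
     0.0 0.0 0.1;
     1.0 0.9 1.0]

U, S, V = svd(A)

# Weighted sum of the first three left singular vectors (hypothetical helper).
combo(a1, a2, a3) = a1 * U[:, 1] + a2 * U[:, 2] + a3 * U[:, 3]

base = combo(1.0, 0.0, 0.0)      # the "base image" alone
tweaked = combo(1.0, 0.3, 0.0)   # base image plus some of the first deviation

# Reshape the vector back into a 2x2 image for display.
println(reshape(tweaked, 2, 2))
```

Singular vectors are only defined up to sign, so a base image may come out negated. That changes which color it takes in a diverging colormap, not its shape.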

Set "Digit" to 0 and play with the sliders. What does this tell you about the way people write "0"?

*Note: you can drag a slider or use the arrow keys to change its value.*

In [None]:
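# compute the leading left singular vectors for each digit
n,T = size(tst)
ncomps = 3
Uvecs = zeros(n,10,ncomps)
for i in 1:10
    # trn[:,:,i] holds the training images for digit i-1, one per column
    U,S,V = svd(trn[:,:,i])
    Uvecs[:,i,:] = U[:,1:ncomps]
end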
    
fig2 = figure(figsize=(8,8))
set_cmap("bwr")
ax21 = subplot2grid((3,2), (0,0), colspan=2)
ax21[:axis]("off")
ax21[:text](6,-1,"u1",size=16)
ax21[:text](27,-1,"u2",size=16)
ax21[:text](49,-1,"u3",size=16)

ax22 = subplot2grid((3,2), (1,0), colspan=2, rowspan=2)
ax22[:axis]("off")
lincomblabel = ax22[:text](2,17,"a1*u1 + a2*u2 + a3*u3",size=16)

# initialize plot with digit "0"
v1,v2,v3 = [vec2mat(Uvecs[:,1,i]) for i in 1:3]
ws = zeros(16,5)
im21 = ax21[:matshow]([v1 ws v2 ws v3],vmin=-0.5,vmax=0.5)

lc = vec2mat(linear_combo(1.0, 0.0, 0.0, 0, trn))
im22 = ax22[:matshow](lc,vmin=-0.5,vmax=0.5)

In [None]:
@manipulate for
    a1=slider(0.1:0.1:1.0, value=1.0, label="a1"),
    a2=slider(-0.5:0.1:0.5, value=0.0, label="a2"),
    a3=slider(-0.5:0.1:0.5, value=0.0, label="a3"),
    d=dropdown(0:9, value=0, label="Digit:");
    withfig(fig2) do
        v1, v2, v3 = [vec2mat(Uvecs[:,d+1,i]) for i in 1:3]
        im21[:set_data]([v1 ws v2 ws v3])
        v = vec2mat(linear_combo(a1,a2,a3,d,trn))
        im22[:set_data](v)
        lincomblabel[:set_text]("$a1*u1 + $a2*u2 + $a3*u3")
        fig2[:canvas][:draw]()
    end
end

## Plot first three vectors for each digit

Run the following cell to see the first three left singular vectors for all ten digits.

Now save the figure, [print it][1], and hang it in your room. (optional)

[1]: http://www.itcs.umich.edu/sites/printing/poster.php

In [None]:
fig3 = figure(figsize=(17,4))
set_cmap("bwr")
for i in 1:3
    for j in 1:10
        ax3 = fig3[:add_subplot](3,10,(i-1)*10 + j)
        v = Uvecs[:,j,i]
        ax3[:matshow](vec2mat(v))
        ax3[:axis]("off")
    end
end

## Singular value "knee"

In class we plotted $P_{correct}$ versus $k$ and found that $P_{correct}$ was highest around $k=11$. Why did accuracy decrease when we moved away from this value? In general, prediction accuracy is highest when we capture the most signal and the least noise, and we can use singular value magnitudes to distinguish the two.
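To see how singular value magnitudes can separate signal from noise, here is a minimal sketch on a toy matrix, assuming Julia 1.x. Everything in it is hypothetical: the 6x6 matrix is built as a rank-2 "signal" plus a small deterministic perturbation standing in for noise, and the threshold of 1.0 is arbitrary.

```julia
using LinearAlgebra

# Hypothetical 6x6 data matrix: a rank-2 "signal" plus a small perturbation.
u1 = ones(6) / sqrt(6)
u2 = [1, -1, 1, -1, 1, -1] / sqrt(6)
signal = 10.0 * u1 * u1' + 4.0 * u2 * u2'

# Deterministic stand-in for noise (real noise would be random).
noise = 0.05 * [sin(i * j) for i in 1:6, j in 1:6]

S = svdvals(signal + noise)   # singular values, sorted largest first

# Arbitrary threshold chosen for this toy example only.
n_signal = count(S .> 1.0)
println("singular values: ", round.(S, digits=2))
println("values that break away: ", n_signal)
```

By construction, two singular values stay near 10 and 4 while the other four remain small. Keeping more than two components here would add mostly perturbation.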

Run the two cells below to plot singular value magnitudes for the training set of a particular digit. Use the dropdown to choose the digit. Use the slider to set a cutoff value for $k$ and compute the fraction

$$\frac{\text{sum}(S[1:k])}{\text{sum}(S)}.$$

A couple of things to think about:

* How many points "break away" from the smooth (lower-right) portion of the plot?
* What fraction of the typical 16x16 image of a digit is signal?
* Why is there such a dramatic separation between $S[1]$ and $S[2]$ for the digit "1"?
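The fraction $\text{sum}(S[1:k])/\text{sum}(S)$ can be computed directly. The sketch below assumes Julia 1.x (for the `digits` keyword of `round`), and both the helper name `captured_fraction` and the singular values are hypothetical, chosen only to make the arithmetic easy to follow.

```julia
# Fraction of the total singular value sum captured by the first k values
# (hypothetical helper, not part of eigenimages.jl).
captured_fraction(S, k) = sum(S[1:k]) / sum(S)

S = [8.0, 4.0, 2.0, 1.0, 1.0]   # made-up singular values, sorted largest first

for k in 0:length(S)
    pct = round(100 * captured_fraction(S, k), digits=1)
    println("k = $k: $pct% of sum(S)")
end
```

With these values the first component alone captures half of the total and the first two capture three quarters, so the curve flattens quickly once the large values are included.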

In [None]:
fig4 = figure(figsize=(8,8))
ax4 = fig4[:add_subplot](1,1,1)
ax4[:set_xlabel]("index")
ax4[:set_ylabel]("singular value magnitude")
ax4[:axis]([-2,258,0,250])

U,S,V = svd(trn[:,:,1])
line, = ax4[:plot]([10.5,10.5],[0,250])
pts, = ax4[:plot](S,lw=0,marker="o",c="k",markersize=4)

ttl41 = ax4[:text](100,255,"Digit: ",size=16)
ttl42 = ax4[:text](50,230,"sum(S[1:cutoff])/sum(S): ",size=14)

In [None]:
@manipulate for 
    digit=dropdown(0:9, label="Digit"),
    cutoff=slider(0:256, value=10, label="Cutoff k");
    withfig(fig4) do
        U,S,V = svd(trn[:,:,digit+1])
        line[:set_xdata]([cutoff,cutoff])
        pts[:set_ydata](S)
        ttl41[:set_text]("Digit: $digit")
        pct = round(100*sum(S[1:cutoff])/sum(S),1)
        ttl42[:set_text](string("$pct% of sum(S) is captured in first\n",
            round(100*cutoff/256,1), "% of components"))
        fig4[:canvas][:draw]()
    end
end