In [1]:
%run Latex_macros.ipynb

<IPython.core.display.Latex object>

In [2]:
# My standard magic !  You will see this in almost all my notebooks.

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Reload all modules imported with %aimport
%load_ext autoreload
%autoreload 1

%matplotlib inline

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import neural_net_helper
%aimport neural_net_helper

nnh = neural_net_helper.NN_Helper()

# Interpreting Representations: Preview

We have described an $L$ layer (Sequential) Neural Network as
- a sequence of transformations of the input
    - each transformation a *layer* $1 \le \ll \le (L-1)$, producing a new *representation* $\y_\llp$
- that feeds the final representation $\y_{(L-1)}$ to a *head* (classifier, regressor)

<div>
    <center>Layers</center>
    <br>
<img src=images/NN_Layers.png>
    </div>

Is it possible to *interpret* each representation $\y_\llp$ ?
- What do the new "synthetic features" mean ?
- Is there some structure among the new features ?
    - e.g., does each feature encode a "concept" ?

We will briefly introduce the topic of Interpretation.

A deeper dive will be the subject of a later lecture.

Our goal, for the moment, is to motivate Autoencoders.



# Interpretation: Examine the weights

Perhaps the most obvious way to obtain insight into the workings of a Neural Network is to examine the weights.
- When the weights are used in a dot product
- They can be interpreted as "patterns" that a layer is trying to match


The linear models of Classical Machine Learning  motivate this idea.

Linear Regression
- $\hat{\y} = \Theta^T \cdot \x$
- Prediction $\hat{\y}$, given features $\x$, is linear in parameters $\Theta$.

Logistic Regression
- $\hat{\mathbf{s}} = \Theta^T \cdot \x$
- Score $\hat{\mathbf{s}}$, which is turned into a probability via the sigmoid function $\sigma$
$$\hat{\mathbf{p}} = \sigma(\hat{\mathbf{s}})$$
is linear in $\Theta$

Let's examine the role of $\Theta_j$ in the dot product.

Consider one *numeric* feature $\x^\ip_j$ for example $i$.

- A unit increase in $\x^\ip_j$
- Holding constant the values for all other features,
- Increases $(\Theta^T \cdot \x^\ip)$ by $\Theta_j$

So $\Theta_j$ may be interpreted as the sensitivity of the dot product to a unit change in feature $j$
$$
\Theta_j = \frac{\partial } {\partial \x_j} (\Theta^T \cdot \x)
$$

That is: how much does the prediction or score depend on the value of the feature.

Suppose instead that $\x_j$ corresponds to the binary feature (indicator/dummy variable)
- $\text{Is } c_1$

Then the  dot product formula indicates that
- $\Theta_j$ is the *increment* to $(\Theta^T \cdot \x)$ 
- Arising from $\x^\ip_j = 1$
- Compared to $\x^\ip_j = 0$

That is: how much the presence of feature $\x_j$ increases the prediction or score.
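
Both readings of $\Theta_j$ can be checked numerically. The following is a minimal sketch, assuming a made-up parameter vector `Theta` and example `x` (random values chosen purely for illustration, not taken from any fitted model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters and one example with 4 features (illustrative values only)
Theta = rng.normal(size=4)
x = rng.normal(size=4)

# Numeric feature: unit increase in feature j, all other features held constant
j = 2
x_plus = x.copy()
x_plus[j] += 1.0
numeric_change = Theta @ x_plus - Theta @ x

# Binary indicator: compare feature k = 1 against feature k = 0
k = 3
x_off, x_on = x.copy(), x.copy()
x_off[k], x_on[k] = 0.0, 1.0
indicator_increment = Theta @ x_on - Theta @ x_off

# Each difference should match the corresponding entry of Theta
print(numeric_change, Theta[j])
print(indicator_increment, Theta[k])
```

In both cases the change in the dot product equals the single weight attached to the feature that was changed, which is what makes the weights of a linear model readable one at a time.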

This idea is even more appealing when the original input $\x^\ip$ is an image.
- We may be able to relate weights to recognizable sub-images of the input

In Convolutional Layers, there is some evidence that
- The first layer recognizes features (matches patterns) for *primitive* concepts
- The second layer recognizes features that are *combinations* of primitive concepts (layer 1 concepts)
- Layer $\ll$ recognizes features that are *combinations* of layer $(\ll-1)$ concepts

<center>
<div>
    <center><strong>Features by layer</strong></center>
    <br>
     <!-- edX: Original: <img src="images/Layer_features.png"> replace by EdX created image -->
    <img src="images/ThreeLayers_W8_L2_Sl21.png" width=20%>
    </div>
</center>

Although this technique is simple, it may be naive to hope that it will provide insight into multi-layer Neural Networks
- The layers $1 \le \ll \le (L-1)$ preceding the head Regression/Classification layer $L$
- Are *transforming* input $\x$ into synthetic features $\y_{(L-1)}$
- That are extremely useful for prediction
- But which may no longer be interpretable

For example
- Do we recognize the digit "0"
- Because of interpretable features like the doughnut shape
- Or because of the *ratio* of dark to light pixels ?

We will make further attempts at interpretability that work
- *Not* by interpreting the weights 
- Instead: by finding groups of inputs
- And relating them to synthetic features in some layer

# Interpretation: Clustering of examples

One way to try to interpret $\y_\llp$ is relative to a dataset $\langle \X, \y \rangle = \{ \x^\ip, \y^\ip | 1 \le i \le m \}$

By passing each example $\x^\ip$ through the layers to obtain $\y^\ip_\llp$
- We create a mapping from examples to layer $\ll$ representations
$$
\langle \X, \y_\llp \rangle = \{ \x^\ip, \y^\ip_\llp \; | \; 1 \le i \le m \}
$$
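
The mapping from examples to layer representations can be sketched in a few lines. This is only an illustration under stated assumptions: the network is a tiny stack of Dense layers with ReLU activations, the weights are random (untrained), and the names `layer_sizes`, `weights`, and `representations` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n_in = 5, 8            # number of examples and input features (illustrative)
layer_sizes = [6, 3]      # hypothetical widths n_(1), n_(2)

X = rng.normal(size=(m, n_in))   # stand-in for the examples x^(i)

# Hypothetical, untrained weights: one (W, b) pair per layer
weights = []
n_prev = n_in
for n_out in layer_sizes:
    weights.append((rng.normal(size=(n_prev, n_out)), np.zeros(n_out)))
    n_prev = n_out

def representations(X, weights):
    """Return the list [y_(1), y_(2), ...], one row per example."""
    reps = []
    y = X
    for W, b in weights:
        y = np.maximum(0.0, y @ W + b)   # Dense layer followed by ReLU
        reps.append(y)
    return reps

reps = representations(X, weights)
for l, y_l in enumerate(reps, start=1):
    print(f"layer {l}: representation shape {y_l.shape}")
```

Row $i$ of `reps[l-1]` plays the role of $\y^\ip_\llp$: the layer $\ll$ representation of example $i$.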

<table>
    <tr>
        <th><center>Mapping inputs to layer l representations</center></th>
    </tr>
    <tr>
        <td><img src="images/Representation_1.png"></td>
    </tr>
</table>

Let's create a scatter plot of each example's representation $\y^\ip_\llp$ 
- In $n_\llp$-dimensional space
- Labeling each point 
- With the target $\y^\ip$
- Or with a set of input attributes, e.g., $(\x^\ip_j, \x^\ip_{j'})$

Perhaps clusters of examples will appear.

If all points in the cluster have the same label
- We might be able to identify the representation with a target or set of input features
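
A scatter plot is a visual check; a rough numeric proxy for "all points in the cluster have the same label" is to ask how often a point lies closest to the centroid of its own label. The sketch below uses synthetic 2-dimensional representations (`Y_l`, `centers`, and `labels` are made up for illustration), not the output of a real network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D representations: three labels, each scattered around its own center
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
labels = np.repeat(np.arange(3), 50)
Y_l = centers[labels] + rng.normal(scale=0.5, size=(labels.size, 2))

# Centroid of the representations belonging to each label
centroids = np.stack([Y_l[labels == c].mean(axis=0) for c in range(3)])

# Distance from every point to every centroid, then the index of the nearest one
dists = np.linalg.norm(Y_l[:, None, :] - centroids[None, :, :], axis=2)
nearest = dists.argmin(axis=1)

# Fraction of points whose nearest centroid belongs to their own label
purity = (nearest == labels).mean()
print(f"fraction of points closest to their own label's centroid: {purity:.2f}")
```

The same arrays can be drawn with `plt.scatter(Y_l[:, 0], Y_l[:, 1], c=labels)` to obtain the kind of labeled scatter plot described above.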

Here is an example of the representation of the MNIST digits in an intermediate layer of a particular network
- The output of the Encoder half of an Autoencoder
- Which we will study in a subsequent lecture

<div>
    <center>MNIST clustering produced by a VAE</center>
    <br>
<img src=images/VAE_plot_test-in_latent.png width=800>
    </div>

- Each point is an example $\x^\ip$
- With coordinates chosen from two of the synthetic features in $\y_\llp$
- The color corresponds to the label $\y^\ip$ (i.e., the digit that is represented by the image)

You can see that some digits form tight clusters.

By understanding
- The commonality of examples within a cluster
- How the digit labels vary as a synthetic feature varies

we might be able to infer the meaning of the synthetic features.

The first two synthetic features in $\y_\llp$ of MNIST may correspond to properties of those digits
- digits with "tops"
- digits with "curves"

**Note**

This is not too different from trying to interpret Principal Components. 

# Interpretation: Examining the latent space

Suppose we could *invert* the representation $\y_\llp$ to obtain a value $\x$ that lies in the input domain.

Then 
- By perturbing individual synthetic features $\y_{\llp,j}$ in a given representation
    - Perturb $\y_\llp$ to obtain $\y'_\llp$
- And examining the effect on the inverted value $\x'$
- We might be able to assign meaning to the layer $\ll$ feature $\y_{\llp,j}$


Note that the inverted value $\x'$ **is not necessarily** (and probably not) a value in the training set $\X$ !
- It is merely a value obtained by the mathematical inversion of a function
- Especially since the perturbed $\y'$ may not be the mapping of any example $\x^\ip \in \X$
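
To make the perturb-and-invert idea concrete, here is a toy sketch. It assumes a single layer that happens to be exactly invertible (a square weight matrix followed by a leaky ReLU); this is a stand-in chosen for illustration, not the inversion method developed in the later lecture. The names `W`, `ALPHA`, `encode`, and `invert` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 4
# Hypothetical square weight matrix; the added diagonal keeps it well away from singular
W = rng.normal(size=(n, n)) + 3.0 * np.eye(n)
ALPHA = 0.1   # slope of the leaky ReLU for negative inputs

def leaky_relu(z):
    return np.where(z > 0, z, ALPHA * z)

def leaky_relu_inv(y):
    # Leaky ReLU preserves sign, so it can be undone element by element
    return np.where(y > 0, y, y / ALPHA)

def encode(x):
    """Map an input x to its (toy) layer representation y."""
    return leaky_relu(W @ x)

def invert(y):
    """Map a representation y back to the input domain."""
    return np.linalg.solve(W, leaky_relu_inv(y))

x = rng.normal(size=n)
y = encode(x)

# Perturb a single synthetic feature j and invert the perturbed representation
j = 1
y_pert = y.copy()
y_pert[j] += 0.5
x_pert = invert(y_pert)

print("round trip recovers x:", np.allclose(invert(y), x))
print("change in inverted input:", x_pert - x)
```

Here `x_pert` is a point in the input domain that was never supplied as an example; it exists only because the map was inverted at the perturbed representation.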

<table>
    <tr>
        <th><center>Invert layer l representation</center></th>
    </tr>
    <tr>
        <td><img src="images/Representation_2.png"></td>
    </tr>
</table>

Here are the inverted images obtained by perturbing two synthetic features in $\y_\llp$
- Horizontal axis perturbs one feature
- Vertical axis perturbs a second feature

<center>
<div>
    <center>Inverted MNIST images obtained by perturbing two synthetic features of a VAE</center>
    <br>
<img src=images/VAE_examine_latent.png>
    </div>
    </center>

Some observations (with possible interpretation)
- Does the  synthetic feature on the horizontal axis control slant ?
    - Examine 0's along bottom row
- Does the synthetic feature on the vertical axis control "curviness" ?
    - Examine the 2's column at the right edge, from bottom to top


There is *no reason to expect* that the inversion of an arbitrary representation
*looks like* a digit, but it does !

Perhaps
- The mapping from inputs to representations is such that similar inputs have very similar representations
- Or we impose some constraints on the inversion to force the inverted value to look like a digit


In order for this method to work, we must be able to *invert* $\y_\llp$.

We will show how to do this in a later lecture.

# Deja vu: have we seen this before ?

These two methods of interpretation have been encountered in an earlier lecture
- mapping original features $\x^\ip$ to synthetic features $\tilde{\x}^\ip$
- inverting synthetic feature $\tilde{\x}^\ip$ to obtain original feature $\x^\ip$

Principal Component Analysis (PCA) !

PCA is an Unsupervised Learning task that can be used for
- dimensionality reduction
- clustering

The key to its interpretability was the simplicity of transforming and inverting

$$
\begin{array}{llll}
\X & = & U \Sigma V^T & \text{SVD decomposition of } \X\\
\tilde\X & = & \X V  & \text{transformation to synthetic features}\\
\X & = & \tilde\X V^T  & \text{inverse transformation to original features}\\
\end{array}
$$

The transformation $V$ via matrix multiplication is *linear*.
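
The transform and its inverse can be verified directly with NumPy's SVD. This is a minimal sketch on random, centered data (the matrix `X` is made up for illustration); it keeps all components, so the inverse is exact.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 20 examples, 5 features, centered column by column
X = rng.normal(size=(20, 5))
X = X - X.mean(axis=0)

# SVD decomposition: X = U Sigma V^T  (s holds the diagonal of Sigma)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

X_tilde = X @ V          # transformation to synthetic features
X_back = X_tilde @ V.T   # inverse transformation to original features

print("inverse recovers X:", np.allclose(X_back, X))
print("X_tilde equals U Sigma:", np.allclose(X_tilde, U * s))
```

Because $V$ is orthogonal, multiplying by $V^T$ undoes multiplying by $V$. Keeping only the first few columns of `V` would give dimensionality reduction, at the cost of an approximate rather than exact inverse.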

We will explore *non-linear, invertible* transformations during our study of Autoencoders.

# Conclusion

Neural Networks have the reputation of being magical but opaque.

We hope this brief introduction to interpretation provides some hope that we can understand their inner workings.

A separate lecture will explore this topic in greater depth.

In [4]:
print("Done")

Done
