## Introduction

- As we navigate our visual environment, we compress high-dimensional visual input into low-dimensional "abstractions," or representations.
- The primate visual cortex learns to extract this low-dimensional information in a highly concurrent way, pulling out several independent facets of a scene (object identity, position, pose, etc.) at once.

- The visual cortex learns to extract this information predominantly without supervision, i.e. with few or no labels or ground-truth reference values.



## Motivating questions
- The brain has finite "space" (in number of neurons) to represent these abstractions, so how is that space allocated?
- Does this allocation change if our environment changes? If so, how?
- How do we learn to allocate representational space efficiently as a function of our inputs (e.g. our visual environment)?
- How does a single network extract multi-faceted abstractions (e.g. spatial position, category, and style) from a common input?

## Approach

- Create an autoencoder that learns several orthogonal features
- Train it semi-supervised: a supervised loss on the class label plus an unsupervised loss on its reconstruction of the input
- Analyze what the learned latent space represents

### Dataset

We generate image datasets containing varying amounts of spatial shift (dx, dy) and test whether the network learns to represent this property.
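A minimal sketch of how such a dataset could be generated, assuming each source image is pasted onto a larger blank canvas at an offset (dx, dy) drawn uniformly from `[-max_shift, max_shift]`. The canvas size, the uniform shift distribution, and the name `shift_into_canvas` are illustrative assumptions, not necessarily how the original datasets were built; `max_shift` plays the role of the spatial-variation level.

```python
import numpy as np


def shift_into_canvas(images, max_shift, canvas=56, rng=None):
    """Paste each (h, w) image onto a blank canvas at a random (dx, dy) offset.

    Assumption: offsets are integers drawn uniformly from [-max_shift, max_shift].
    Returns the shifted images plus the dx, dy labels for every trial.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w = images.shape
    cy, cx = (canvas - h) // 2, (canvas - w) // 2  # top-left corner when centered
    if max_shift > min(cy, cx):
        raise ValueError("max_shift would push images off the canvas")
    out = np.zeros((n, canvas, canvas), dtype=images.dtype)
    dy = rng.integers(-max_shift, max_shift + 1, size=n)
    dx = rng.integers(-max_shift, max_shift + 1, size=n)
    for k in range(n):
        y0, x0 = cy + dy[k], cx + dx[k]
        out[k, y0:y0 + h, x0:x0 + w] = images[k]
    return out, dx, dy
```

One model per spatial-variation level would then be trained on, e.g., `shift_into_canvas(images, max_shift=level)`, with the returned `dx`, `dy` kept as the per-trial properties analyzed below.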

### Model

#### Architecture
We use the S-AE architecture depicted in the middle panel of the figure below.

![mod](https://raw.githubusercontent.com/elijahc/tensorflow-generative-model-collections/master/assets/etc/S-AE_structures.png)

- Z latent space has 25 units
- I latent space has 10 units, a one-hot encoding of the class

#### Loss
The model is trained using a 3-part weighted loss function composed of:
- Categorical cross-entropy between the identity latent space $I$ and the true class $y$:
    - $XEntropy\big(y_i,I_i\big)$
    
- Mean squared error between the input, $x$, and reconstruction $g(x)$:
    - $MSE\big(x_i,g(x_i)\big)$
    
- Cross-covariance between the activations of the two latent spaces
    - $XCov\big(Z, I\big)$
    
$$
Loss = \alpha \cdot XEntropy\big(y_i,I_i\big) + \beta \cdot MSE\big(x_i,g(x_i)\big) + \gamma \cdot XCov\big(Z, I\big)
$$
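The loss above leaves XCov undefined. The cross-covariance penalty introduced by Cheung et al. (cited under Style Embedding) is, over a batch of $N$ examples indexed by $i$,

$$
XCov\big(Z, I\big) = \frac{1}{2}\sum_{j,k}\Big[\frac{1}{N}\sum_{i=1}^{N}\big(Z_{ij}-\bar{Z}_j\big)\big(I_{ik}-\bar{I}_k\big)\Big]^2
$$

A minimal numpy sketch of the full weighted loss follows, assuming that form of XCov, batch-mean reductions for the other two terms, that $I$ holds softmax probabilities, and placeholder weights $\alpha=\beta=\gamma=1$ (the actual weights are not stated here). The function names are hypothetical.

```python
import numpy as np


def xentropy(y_onehot, i_probs, eps=1e-7):
    # Mean categorical cross-entropy between one-hot labels y and identity units I.
    i_probs = np.clip(i_probs, eps, 1.0)
    return float(-np.mean(np.sum(y_onehot * np.log(i_probs), axis=1)))


def mse(x, x_hat):
    # Mean squared reconstruction error between x and g(x).
    return float(np.mean((x - x_hat) ** 2))


def xcov(z, i):
    # Assumed Cheung et al. form: half the sum of squared entries of the
    # (dz, di) cross-covariance matrix between Z and I over the batch.
    zc = z - z.mean(axis=0)
    ic = i - i.mean(axis=0)
    c = zc.T @ ic / z.shape[0]
    return float(0.5 * np.sum(c ** 2))


def total_loss(x, x_hat, y_onehot, i_probs, z, alpha=1.0, beta=1.0, gamma=1.0):
    # alpha, beta, gamma are placeholder weights, not the values used in training.
    return (alpha * xentropy(y_onehot, i_probs)
            + beta * mse(x, x_hat)
            + gamma * xcov(z, i_probs))
```

The XCov term is zero whenever Z is uncorrelated with I across the batch, so minimizing it pushes category information out of Z and into I.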

#### Training

- 5% of the training data (n=60000) was held out as a validation set for detecting when learning plateaued
- All models were trained until the improvement in validation loss over the last 5 trials fell below 0.05
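A sketch of the split and stopping rule, assuming a random 5% hold-out and that "improvement over the last 5 trials" means the validation loss from 5 trials ago minus the best loss since, in absolute units. Both function names are hypothetical. Keras offers a similar, though not identical, rule via `tf.keras.callbacks.EarlyStopping(monitor="val_loss", min_delta=0.05, patience=5)`.

```python
import numpy as np


def train_val_split(n, val_frac=0.05, seed=0):
    # Randomly hold out val_frac of the n training examples (3000 of 60000 at 5%).
    idx = np.random.default_rng(seed).permutation(n)
    n_val = int(round(n * val_frac))
    return idx[n_val:], idx[:n_val]


def has_plateaued(val_losses, window=5, min_delta=0.05):
    # Needs a reference loss plus `window` more trials before it can decide.
    if len(val_losses) <= window:
        return False
    reference = val_losses[-window - 1]
    best_recent = min(val_losses[-window:])
    # Assumed rule: stop when the absolute improvement is below min_delta.
    return (reference - best_recent) < min_delta
```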

## Methods

### Style Embedding

Cheung et al. show that split autoencoder networks tend to represent "style" when Z is forced to learn category-orthogonal information.

To measure the degree to which Z represents style, we need a continuous metric or property that captures "style".

I used Isomap to learn a 1-D manifold embedding of all test set images (n=10000), which gives a continuous surrogate "style" metric.

Sorting images by their position along this 1-D embedding shows that it does a decent job of grouping within-category "styles".
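To make the embedding step concrete, here is a numpy-only sketch of the standard Isomap recipe: build a k-nearest-neighbor graph, compute geodesic (shortest-path) distances on it, then run classical MDS down to one dimension. It is O(n³) and only meant for small examples; which implementation produced the figures is not stated, but for the full 10000-image test set `sklearn.manifold.Isomap(n_neighbors=k, n_components=1).fit_transform(X)` is the practical choice. The name `isomap_1d` is hypothetical.

```python
import numpy as np


def isomap_1d(X, n_neighbors=5):
    """1-D Isomap embedding of the rows of X (each row = one flattened image)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    n = X.shape[0]
    # Pairwise Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))
    # Symmetrized k-nearest-neighbor graph; missing edges are infinite.
    g = np.full((n, n), np.inf)
    nn = np.argsort(d, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    g[rows, nn.ravel()] = d[rows, nn.ravel()]
    g = np.minimum(g, g.T)
    np.fill_diagonal(g, 0.0)
    # Geodesic distances via Floyd-Warshall.
    for k in range(n):
        g = np.minimum(g, g[:, k:k + 1] + g[k:k + 1, :])
    if np.isinf(g).any():
        raise ValueError("kNN graph is disconnected; increase n_neighbors")
    # Classical MDS: double-center squared geodesics, keep the top eigenvector.
    h = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * h @ (g ** 2) @ h
    vals, vecs = np.linalg.eigh(b)
    return vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
```

Sorting by the result, e.g. `images[np.argsort(isomap_1d(images))]`, gives the style-ordered views shown below; `n_neighbors` corresponds to the 5-, 7-, and 10-neighbor variants.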

#### 5-neighbor Isomap
<img src='https://raw.githubusercontent.com/elijahc/vae/master/figures/style_embeddings/isomap_5_neighbor_fashion.png' height ='250px'>

#### 7-neighbor Isomap
![7_neighbor](https://raw.githubusercontent.com/elijahc/vae/master/figures/style_embeddings/isomap_7_neighbor_fashion.png)

#### 10-neighbor Isomap
![10_neighbor](https://raw.githubusercontent.com/elijahc/vae/master/figures/style_embeddings/isomap_10_neighbor_fashion.png)

### Representational variance explained

We want to quantify how well the network abstracts scene properties:

> e.g. object location (dx, dy) or style variation within an object category

The relationship between unit activity and a property may not be (and probably isn't) linear, so plain correlation may not capture it.
One way to measure it is to discretize the range of a unit's activity and examine the variance of the property within each range.
A "well abstracted" property should have a within-range variance smaller than the property's global variance.

- Define $X_{n,t}$ as the activations of the latent-space units over $T$ trials, where $n$ indexes units and $t$ indexes trials; $X_n$ is the activity of unit $n$ across all trials

- Define a continuous property $P_t$ (e.g. dx from the center of the field of view) that is indexed by and varies across trials $t$

- If the network learns to represent $P$ in the activity level of $X_n$, then each sub-range of $X_n$'s activity should correspond to a restricted sub-range of $P$

- Split the full activity range of unit $n$ across all trials, $X_{n,T}$, into $B$ mutually exclusive bins, so that $X_{n,b}$ is the activity range of bin $b$ and a subset of $X_{n,T}$

- For each activity bin $b$, calculate the variance of the property over the trials evoking activity in that bin, $\sigma(P \mid X_b)$, abbreviated $\sigma(P_b)$

- A continuous property $P$ that is "well represented" by a unit should have a narrower variance in each bin than its global variance, $\sigma(P)$

- A poorly represented property would be expected to have binned variances, $\sigma(P_{b})$, similar to global variance $\sigma(P)$

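- The representational variance explained is the mean fractional reduction in variance across bins: $VE_R = \mathbb{E}_b\left[\frac{\sigma(P)-\sigma(P_b)}{\sigma(P)}\right]$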
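The binned procedure above, written out for a single unit. This is a sketch under stated assumptions: equal-width bins over the unit's full activity range, population variance (`np.var`) for $\sigma$, bins with fewer than two trials skipped, and an unweighted mean over the remaining bins for $\mathbb{E}_b$. The name `binned_variance_explained` and the default of 10 bins are hypothetical.

```python
import numpy as np


def binned_variance_explained(x, p, n_bins=10, min_count=2):
    """VE_R for one unit: mean over activity bins of (var(P) - var(P_b)) / var(P).

    x: the unit's activity on each trial, X_n
    p: the property value on each trial, P_t (e.g. dx)
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    global_var = np.var(p)
    if global_var == 0:
        return float("nan")
    # Equal-width bins spanning the unit's full activity range across trials.
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bin_idx = np.digitize(x, edges[1:-1])  # bin index 0 .. n_bins-1 per trial
    ratios = []
    for b in range(n_bins):
        p_b = p[bin_idx == b]
        if len(p_b) < min_count:  # skip empty or near-empty bins
            continue
        ratios.append((global_var - np.var(p_b)) / global_var)
    return float(np.mean(ratios)) if ratios else float("nan")
```

Applied per unit, e.g. `[binned_variance_explained(Z[:, n], dx) for n in range(Z.shape[1])]`, this yields one value per unit of Z. A property that is a monotone function of activity scores near 1; a property unrelated to activity scores near 0.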

## Results

### Model
- All models retained high classification accuracy, which decreased only slightly as spatial variation increased

![pic](https://raw.githubusercontent.com/elijahc/vae/master/figures/2019-01-28/acc_vs_spatial_variation.png)


### Latent Representation Z

#### Representational Variance Explained
![pic](https://raw.githubusercontent.com/elijahc/vae/master/figures/2019-01-28/unit_fve_waterfall.png)

- Each panel of the grid summarizes what the units of latent space Z learned, in terms of representational variance explained
- Each column (1-10) is a separate model, trained on a dataset generated with more spatial variation than the column before it

![pic](https://raw.githubusercontent.com/elijahc/vae/master/figures/2019-01-28/auc_vs_spatial_variation.png)
![pic](https://raw.githubusercontent.com/elijahc/vae/master/figures/2019-01-28/fve_max_vs_spatial_variation.png)

#### Shannon Mutual Information

- I realized we could linearly rescale each unit's activity in Z to an arbitrary range 0-N and bin the values to integers
- Do the same for dx, dy, and the Isomap embedding
- Calculate the joint probability distribution of each unit and dx over all trials (25 joint distributions, one per unit of Z)
- Use these to calculate the Shannon mutual information for each unit
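A plug-in sketch of these steps, assuming the integer binning is done by linear rescaling followed by rounding, and that mutual information is reported in bits: $I(A;B)=\sum_{a,b} p(a,b)\log_2\frac{p(a,b)}{p(a)\,p(b)}$. The names `discretize` and `mutual_info_bits` and the default of 20 levels are hypothetical.

```python
import numpy as np


def discretize(v, n_levels=20):
    # Linearly rescale v to [0, n_levels - 1] and round to integer bins.
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        return np.zeros(len(v), dtype=int)
    return np.rint((v - v.min()) / span * (n_levels - 1)).astype(int)


def mutual_info_bits(a, b):
    # Plug-in Shannon mutual information (bits) between two integer-binned variables.
    a = np.asarray(a)
    b = np.asarray(b)
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)  # count co-occurrences over trials
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)  # marginal p(a), shape (na, 1)
    pb = joint.sum(axis=0, keepdims=True)  # marginal p(b), shape (1, nb)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])))
```

For the 25 units of Z this would be `[mutual_info_bits(discretize(Z[:, n]), discretize(dx)) for n in range(25)]`. The plug-in estimate is biased upward when the number of bins is large relative to the number of trials, so the choice of N matters when comparing units or models.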

![pic](https://raw.githubusercontent.com/elijahc/vae/master/figures/2019-01-28/unit_shanon_waterfall.png)


![](https://raw.githubusercontent.com/elijahc/vae/master/figures/2019-01-28/shannon_auc_vs_spatial_variation.png)