# Internal models for vision

In [None]:
;;
#require "pkp"

open Owl

open Pkp.Visual_coding

let _ = Pkp.Misc.quiet_owl ()

let data_dir = "/home/opam/pkp/pkp-tutorials/data/"

In this tutorial, we will investigate two candidate “internal models” for the perception of small patches of natural images.

As discussed in class, our brain gets an input image $s$ from the retina (two, actually, but we'll neglect stereo vision here!), and must infer the underlying physical “causes” $x$ (features in the environment) that might have given rise to that image $s$. Suppose the brain has an “internal model of the world” composed of a prior $p(x)$ over features and a likelihood function $p(s|x)$ describing the optics / response properties of the eye (i.e. how specific environmental features $x$ give rise to images $s$). Then all the brain needs to do is compute the so-called posterior distribution $p(x|s)$, which is the result of **perception**.
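
For reference, the posterior is obtained from the prior and the likelihood through Bayes' rule, written here with the same symbols as above:

$$ p(x|s) = \frac{p(s|x)\, p(x)}{p(s)} \propto p(s|x)\, p(x), \qquad p(s) = \int p(s|x)\, p(x)\, dx $$

The normalising constant $p(s)$ does not depend on $x$, so the most probable causes can be found by maximising the product $p(s|x)\, p(x)$ alone.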

During the lecture, the question arose as to what exactly are those “features of the environment” that our brain should infer. For example, how do we know that the world contains chairs and tables, and that inferring their presence in a scene helps us explain the images we get from our eyes? In this notebook, we consider the problem of learning such internal models. For tractability, we will focus on trying to discover latent causes for small patches of natural images, and work with a simple family of probabilistic models.

## 1. The data

Let's start by visualising the data:

In [None]:
let raw_imgs = load_images ~file:(data_dir ^ "natural_images_raw.bin") ()

Check out the shape of `raw_imgs`:

In [None]:
let _ = Arr.shape raw_imgs

As you can see, it's a 3D array, i.e. an array of 9 images, each represented as a 512x512 matrix.

Run the code below for a high-tech movie of those 9 images: 

In [None]:
let _ =
  let ph = placeholder () in
  Arr.iter_slice
    ~axis:0
    (fun img ->
      plot_image ~ph img;
      Unix.sleepf 1.0)
    raw_imgs

In the rest of this notebook, we will work with a pre-processed version of these images. Specifically, the preprocessing mimics the filtering properties of retinal ganglion cells:

In [None]:
let imgs = load_images ~file:(data_dir ^ "natural_images.bin") ()

In [None]:
let _ =
  let ph = placeholder () in
  Arr.iter_slice
    ~axis:0
    (fun img ->
      plot_image ~ph img;
      Unix.sleepf 1.0)
    imgs

Can you guess how, technically, these images were obtained from the raw images?

The learning algorithms below will need to randomly sample a lot of image patches (14x14) from the 9 big images in `imgs`. For this, they will need a `stream`, which is basically an “infinite fountain” of natural image patches.
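
To make the idea of such a “fountain” concrete, here is a minimal sketch in plain OCaml (no Owl, no Pkp) of what drawing random patches could look like. The helpers `random_patch` and `sample_patches` are hypothetical names introduced for illustration only; they are not part of the `pkp` library, whose stream may well be implemented differently.

```ocaml
(* Hypothetical helper, not part of Pkp: crop one random square patch
   from an image stored as a float array array (rows x columns). *)
let random_patch ?(size = 14) img =
  let rows = Array.length img and cols = Array.length img.(0) in
  (* top-left corner chosen uniformly so that the patch fits in the image *)
  let r0 = Random.int (rows - size + 1) in
  let c0 = Random.int (cols - size + 1) in
  Array.init size (fun r -> Array.sub img.(r0 + r) c0 size)

(* Draw n patches, each from a randomly chosen image of the collection. *)
let sample_patches imgs n =
  Array.init n (fun _ ->
      let img = imgs.(Random.int (Array.length imgs)) in
      random_patch img)
```

Calling `sample_patches imgs 16` repeatedly never runs out of data, which is all that “infinite” means here.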

**TODO**: create a “data stream” from those `imgs` using the `create_stream` function, and visualise the stream using `visualise_stream` (see [documentation](https://pkp-neuro.github.io/pkp-tutorials/pkp/Pkp/Visual_coding/index.html)).

In [None]:
(* your code here *)

## 2. Internal models of small patches

The family of models we are going to investigate postulates that any retinal image $s$ arises from the noisy linear superposition of a fixed set of “prototypical images” $p_i$ ($i=1,2,\ldots,K$) of the same size as $s$ (14x14 pixels), each weighted by some intensity $x_i$:

$$ s = \sum_i x_i p_i  + \text{noise} $$

where $\text{noise}$ is some random Gaussian noise that corrupts the image. The prototypical images $p_i$ are the same for all image patches, but their intensities $x_i$ differ from image to image. Perception is about inferring the $x_i$ that might have given rise to a particular $s$. We assume that, before even observing an image patch, we have a prior belief $p(x)$ over what each $x_i$ might be. Together, $p(x)$ and the $p_i$ templates form our “internal model”.

Think of each $p_i$ as a possible “local feature” of the visual scene, and $x_i$ as the intensity with which it contributes to the given image patch $s$. The $x_i$ are thought to be represented in neural activity (here, we're agnostic as to exactly how), and will therefore be subject to constraints (e.g. on energy and resources).
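
As a sketch of the generative equation above, the following plain-OCaml snippet builds a patch from a set of templates and intensities. It assumes each 14x14 patch has been flattened into a vector of pixels; the function names and the `noise_sd` parameter are illustrative choices, not part of `pkp`.

```ocaml
(* Standard normal sample via the Box-Muller transform. *)
let gaussian () =
  (* clamp away from 0 so that log is always finite *)
  let u1 = Float.max min_float (Random.float 1.) in
  let u2 = Random.float 1. in
  sqrt (-2. *. log u1) *. cos (2. *. Float.pi *. u2)

(* Noise-free part of the model: sum_i x_i p_i.
   protos : K templates, each flattened to a vector of n pixels;
   x      : K intensities. *)
let superpose protos x =
  let n = Array.length protos.(0) in
  Array.init n (fun j ->
      let acc = ref 0. in
      Array.iteri (fun i p -> acc := !acc +. (x.(i) *. p.(j))) protos;
      !acc)

(* Full generative step: add independent Gaussian noise of standard
   deviation noise_sd (an illustrative parameter) to every pixel. *)
let generate ~noise_sd protos x =
  Array.map (fun m -> m +. (noise_sd *. gaussian ())) (superpose protos x)
```

With `noise_sd` set to `0.`, `generate` reduces to the pure superposition $\sum_i x_i p_i$.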

## 3. Dense coding models

We will begin by considering “dense coding models”, a family of models which assume that the feature intensities are normally distributed _a priori_ (i.e. $p(x)$ is a Gaussian distribution). They are called “dense” models because samples from a Gaussian distribution are “densely spread” around the mean. This will be contrasted later with “sparse models”, which have highly non-Gaussian prior distributions $p(x)$.
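
For concreteness, the two unit-scale densities compared in the plot below are the standard Gaussian and, as an illustrative example of a sparse prior, the Laplace distribution:

$$ p_\text{dense}(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad p_\text{sparse}(x) = \frac{1}{2}\, e^{-|x|} $$

Whether the sparse model learned later in this notebook uses exactly a Laplace prior is not stated here; it is used in the plot only as a representative heavy-tailed density.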

In [None]:
let _ =
  let fig (module P : Plot) =
    let xs = Mat.linspace (-6.) 6. 400 in
    let dense = Owl_stats.gaussian_pdf ~mu:0. ~sigma:1. in
    let sparse = Owl_stats.laplace_pdf ~loc:0. ~scale:1. in
    P.plots
      [ item (F (dense, xs)) ~legend:"dense" ~style:"l lc 7 lw 2"
      ; item (F (sparse, xs)) ~legend:"sparse" ~style:"l lc 3 lw 2"
      ]
      [ barebone
      ; borders [ `bottom; `left ]
      ; xtics (`regular [ -10.; 2. ])
      ; ytics `auto
      ; xlabel "x"
      ; ylabel "density"
      ]
  in
  Juplot.draw ~size:(400, 300) fig

(Try adding `set "log y"` to the list of plot properties above (e.g. after `barebone`), to plot these two distributions on a logarithmic y-axis. What do you notice?)

### 3.1 A random dense model

Let's get warmed up with the library by looking at a model with 25 completely random features. This will be a bad model and will motivate learning of better ones.

In [None]:
let proto = Arr.gaussian [| 25; 14; 14 |]

In [None]:
let _ = plot_patches proto

To convince ourselves that this is a bad model of natural image patches, we can do the following:
1. sample a few image patches $s$ from our stream of natural image patches
2. figure out the most likely feature intensities $x_i$ for each image patch
3. try to “reconstruct the image patches” according to the equation above, using the most likely feature intensities $x_i$ found in step 2, and compare to the original patches.

Step 1: sample 16 image patches from the stream:

In [None]:
let s = sample_stream stream 16

In [None]:
let _ = Arr.shape s

In [None]:
let _ = plot_patches s

Step 2: infer the most likely feature intensities for these 16 patches:

In [None]:
let x_best = Dense_model.most_likely_intensities proto s

In [None]:
let _ = Arr.shape x_best
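
To see what “most likely intensities” can mean in a dense model, here is a worked derivation under explicit assumptions: a zero-mean Gaussian prior with variance $\sigma_x^2$ on each $x_i$, and independent Gaussian pixel noise with variance $\sigma_n^2$. Whether `Dense_model.most_likely_intensities` uses exactly this formula is not confirmed by this notebook; $\sigma_x$, $\sigma_n$ and the matrix $P$ are notation introduced here. Maximising the posterior amounts to minimising the negative log of likelihood times prior:

$$ \hat{x} = \arg\max_x \, p(x|s) = \arg\min_x \left[ \frac{1}{2\sigma_n^2} \Big\| s - \sum_i x_i p_i \Big\|^2 + \frac{1}{2\sigma_x^2} \sum_i x_i^2 \right] $$

Writing $P$ for the matrix whose columns are the flattened templates $p_i$, and setting the gradient to zero, gives a closed form:

$$ \hat{x} = \left( P^\top P + \frac{\sigma_n^2}{\sigma_x^2}\, I \right)^{-1} P^\top s $$

Under these assumptions the inferred intensities are a linear function of the patch, which is the familiar ridge-regression solution.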

Step 3: attempt to reconstruct each $s$ as $\sum_i x_i p_i$:

In [None]:
let s_reconstructed = reconstruct proto x_best

In [None]:
let _ = plot_patches s_reconstructed

The comparison should look pretty awful. In other words, densely combined random features are a poor description of natural images, which are in fact much more structured. Try increasing the number of random prototypical patches in the model (it was set to 25 above). How many “random features” do you need to have a decent-looking reconstruction?
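
If you would like a number to go with the visual comparison, one simple option is the fraction of a patch's energy that the reconstruction fails to explain. The sketch below is plain OCaml on flattened patches; `relative_error` is a hypothetical helper, not a `pkp` function.

```ocaml
(* Fraction of the patch's energy left unexplained by the reconstruction:
   ||s - s_hat||^2 / ||s||^2. Both arguments are flattened patches of
   equal length; the result is nan if s is identically zero. *)
let relative_error s s_hat =
  let num = ref 0. and den = ref 0. in
  Array.iteri
    (fun j sj ->
      let d = sj -. s_hat.(j) in
      num := !num +. (d *. d);
      den := !den +. (sj *. sj))
    s;
  !num /. !den
```

A value of `0.` means a perfect reconstruction, while `1.` is no better than predicting an all-zero patch.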

### 3.2 Learning a better dense model

Now, we are going to optimise the prototypical patches in our dense model, so that the distribution of model-generated patches becomes progressively more similar to the empirical distribution of image patches given by our stream:

In [None]:
let proto = Dense_model.learn stream

This will take a few minutes. Observe the process!

Now, with this new optimised set of prototypical patches, try to run the same reconstruction analysis as above. Are these 25 templates any better than the randomly-generated ones?


## 4. Sparse coding model

We are now going to learn a model in which the feature intensities are distributed in a sparse way. We will consider a larger bank of prototypical features ($K=100$, compared to $25$ above), but require that, statistically, only a few of these features be used in any given image. In other words, $p(x)$ is such that each $x_i$ is very often very small, but occasionally very large.
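
A toy scalar example can illustrate why a sparse prior drives most intensities to be tiny. Assume a single feature, a single observation `s`, unit noise variance, and a penalty weight `lam` coming from the prior; these are simplifying assumptions made here, and `Sparse_model` may use a different prior or inference scheme. The most likely intensity then has a closed form under both a Gaussian and a Laplace prior:

```ocaml
(* Gaussian prior: argmin_x (s - x)^2 / 2 + lam * x^2 / 2.
   Every observation is shrunk by the same factor, never exactly to 0. *)
let map_gaussian ~lam s = s /. (1. +. lam)

(* Laplace prior: argmin_x (s - x)^2 / 2 + lam * |x|.
   This is soft thresholding: small observations map to exactly 0,
   large ones are shifted towards 0 by lam. *)
let map_laplace ~lam s =
  let shrunk = Float.abs s -. lam in
  if shrunk <= 0. then 0. else Float.copy_sign shrunk s
```

For instance, with `~lam:1.`, an observation of `0.5` gives `0.25` under the Gaussian prior but exactly `0.` under the Laplace prior.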

In [None]:
let proto = Sparse_model.learn stream

What do you notice? How do these features compare with the receptive fields of neurons in the primary visual cortex?

Revisit the reconstruction analysis performed above. Is sparse coding an efficient way of coding natural images?

What does the distribution of “most likely feature intensities” look like under this new sparse model? Compare with the dense coding model. Given a matrix `x` of intensities, you can plot a histogram with the following code:

In [None]:
let plot_intensity_hist x =
  let open Gp in
  let fig (module P : Plot) =
    P.plot
      (A Pkp.Misc.(hist ~n_bins:50 x))
      ~style:"boxes fs solid 0.5 lc 8"
      [ barebone; borders [ `bottom ]; xtics `auto; xlabel "intensity" ]
  in
  Juplot.draw ~size:(400, 300) fig
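
Beyond looking at the histogram, one common summary of how heavy-tailed a distribution is, is its excess kurtosis: it is 0 for a Gaussian and 3 for a Laplace distribution. The sketch below computes it in plain OCaml for a flat `float array`; `excess_kurtosis` is a hypothetical helper, and you would first need to copy your intensities into such an array.

```ocaml
(* Excess kurtosis of a sample: E[(x - mean)^4] / var^2 - 3,
   using the (biased) sample moments. *)
let excess_kurtosis xs =
  let n = float_of_int (Array.length xs) in
  let mean = Array.fold_left ( +. ) 0. xs /. n in
  (* k-th central moment of the sample *)
  let moment k =
    Array.fold_left (fun acc x -> acc +. ((x -. mean) ** k)) 0. xs /. n
  in
  (moment 4. /. (moment 2. ** 2.)) -. 3.
```

Values well above 0 indicate a distribution that is sharply peaked with occasional large values, as expected from a sparse code.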