# The Convolutional Neural Network

## Learning objectives

## Historical and theoretical background

### Hubel and Wiesel

Rosenblatt's photo-perceptron (XXXX) was the first neural network model attempting to emulate human visual and perceptual capacities. Unfortunately, little was known at the time about the mammalian visual cortex that could inform Rosenblatt's work. Consequently, the photo-perceptron architecture was inspired by a very coarse idea of how information flows from the eyes to the brain, where it is processed. This changed quickly in the years following the introduction of the perceptron.

In 1962, David H. Hubel and Torsten Wiesel published one of the major breakthroughs in the neurophysiology of the visual cortex: **the existence of orientation selectivity and columnar organization**. Here is what they did: they inserted a tiny microelectrode to record from a single neuron in the primary visual cortex (V1) of an anesthetized cat, and projected light and dark dots into the cat's eyes. It did not work at all: they could not get a response from the neuron. But then they had a lucky accident. Since they were using a slide projector to show the dots, the edge of the slide holding the dot was also projected into the cat's eyes, and, bam!, the neuron fired. From there, they experimented with light and dark bars in different orientations, which led them to propose the existence of three types of cells in the visual cortex:

1. **Simple cells**, which fire at a higher (or lower) rate depending on the orientation of the bar. They are sometimes called "line detectors".
2. **Complex cells**, which fire in response to a wider variety of orientations, yet still show a preference (a higher firing rate) for certain orientations. They are sometimes called "motion detectors". Importantly, these cells receive input from several *simple cells*.
3. **Hypercomplex cells**, characterized by their response to "stopped" oriented edges, that is, by a firing rate that decreases as the stimulus gets larger. Again, these cells receive input from several *complex cells*.

As you may have noticed, these three types of cells are organized hierarchically. Keep this in mind, as it will become important later. Altogether, these discoveries were the basis of the work that earned them the Nobel Prize in Physiology or Medicine in 1981. Below is a short video from their experiments.

[![](http://img.youtube.com/vi/jw6nBWo21Zk/0.jpg)](http://www.youtube.com/watch?v=jw6nBWo21Zk "Hubel & Wiesel's demonstration of simple, complex and hypercomplex cells in the cat's visual cortex")

### Fukushima's Neocognitron

The work of Hubel and Wiesel served as the basis for the precursor of modern convolutional neural networks: **Fukushima's Neocognitron** (1980). Kunihiko Fukushima, a Japanese computer scientist, developed the Neocognitron while working at the NHK Science & Technology Research Laboratories, by implementing the ideas of simple and complex cells in a multilayer neural network architecture. **Figure X** shows a simplified diagram of the Neocognitron with 3 layers (4 if you count the input layer).

<center> Figure X: Simplified Neocognitron </center>

<img src="./images/cov-net/neocognitron.svg">

The general idea behind the Neocognitron is the following: the **input layer $L_0$ works as the retina**, reading the raw input pattern. Then, each cell in an $S_1$ patch "reads" a sub-section of the input image based on a "preference" for a certain type of pattern. Any given layer $L_n$ will have several of these $S_j$ patches, forming a collection of **feature "filters"**. Some may detect a diagonal line, others a small triangle, or something else. Each $S_j$ patch connects to a $C_k$ cell, and such a cell fires if it gets any positive input from its corresponding patch. This process is also known as **"pooling"**. This cycle of "feature detection" and "pooling" is repeated for each intermediate layer in the network. The last layer corresponds to the output, where some output neuron will fire depending on the input pattern. Mathematically, "feature detection" is accomplished by multiplying the input by a fixed matrix of weights, whereas "pooling" corresponds to taking an average of the S-cells.
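To make the two operations concrete, here is a minimal NumPy sketch of one S-layer followed by one C-layer. It is a simplification rather than Fukushima's original equations (for instance, it omits the inhibitory inputs of his S-cells), and the functions `s_layer` and `c_layer`, the 3x3 vertical-bar `kernel`, and the 2x2 pooling blocks are all hypothetical choices made for illustration. The S-layer applies the same fixed weight matrix to every sub-section of the input and keeps only positive responses; the C-layer averages non-overlapping blocks of S-cells.

```python
import numpy as np

def s_layer(image, kernel):
    """S-cells (simplified): every output unit applies the same fixed weight
    matrix `kernel` to one sub-section of the input and fires only if the match is positive."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # the sub-section this S-cell "reads"
            out[i, j] = np.sum(patch * kernel)  # weighted sum = strength of the match
    return np.maximum(out, 0.0)                 # negative matches do not fire

def c_layer(s_out, size=2):
    """C-cells (simplified): each unit averages a non-overlapping size x size block of S-cells."""
    h = s_out.shape[0] // size
    w = s_out.shape[1] // size
    trimmed = s_out[:h * size, :w * size]       # drop rows/cols that do not fill a block
    return trimmed.reshape(h, size, w, size).mean(axis=(1, 3))

# Hypothetical "preference": a bright vertical line flanked by darker pixels.
kernel = np.array([[-1.0, 2.0, -1.0]] * 3)

# A 6x6 input containing a vertical bar in column 2.
image = np.zeros((6, 6))
image[:, 2] = 1.0

s_out = s_layer(image, kernel)   # 4x4 map of feature detections
c_out = c_layer(s_out)           # 2x2 pooled map
print(s_out)
print(c_out)
```

Only the S-cells whose window is centered on the bar respond (a column of 6s in `s_out`), and the C-layer summarizes that response as a coarser 2x2 map whose left column is 3 and right column is 0.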

You may have noticed that the behavior of the S-cells and C-cells replicates (to some extent) what Hubel and Wiesel found in their experiments. The great thing about this architecture is that it is **robust to shifts in the input image**: you can move the image around, and the combination of "feature detection" and "pooling" will still detect the presence of each part of the image, regardless of its position. **Figure X** exemplifies this trait.
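The shift robustness described here can be checked with a small, self-contained NumPy experiment. It assumes a simplified S-then-C stage: a fixed filter slid over the input with negative responses clipped to zero, followed by 2x2 average pooling. The function `s_then_c` and the vertical-bar `kernel` are hypothetical, chosen only for illustration. The experiment moves a vertical bar one pixel to the left and compares both layers.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def s_then_c(image, kernel, pool=2):
    """One simplified S-layer (feature detection) followed by one C-layer (average pooling)."""
    windows = sliding_window_view(image, kernel.shape)   # every kernel-sized sub-section
    s_out = np.maximum(np.einsum("ijkl,kl->ij", windows, kernel), 0.0)
    h, w = s_out.shape[0] // pool, s_out.shape[1] // pool
    c_out = s_out[:h * pool, :w * pool].reshape(h, pool, w, pool).mean(axis=(1, 3))
    return s_out, c_out

kernel = np.array([[-1.0, 2.0, -1.0]] * 3)   # hypothetical vertical-bar detector

original = np.zeros((6, 6))
original[:, 2] = 1.0                         # bar in column 2
shifted = np.zeros((6, 6))
shifted[:, 1] = 1.0                          # the same bar moved one pixel left

s_a, c_a = s_then_c(original, kernel)
s_b, c_b = s_then_c(shifted, kernel)

print(np.array_equal(s_a, s_b))  # the responding S-cells have moved
print(np.array_equal(c_a, c_b))  # the pooled C-cell responses are unchanged
```

Running it prints `False` and then `True`. This tolerance is local: in this sketch, a shift that moves the bar into a different pooling block changes the C-layer output as well. Stacking several S/C stages, as the Neocognitron does, is what lets the tolerance grow to larger shifts.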

<center> Figure X </center>

<img src="./images/cov-net/neocognitron-cells.svg">

The Neocognitron is also **robust to deformation**: it will detect the object even if it is enlarged, reduced in size, or blurred, by virtue of the same mechanism that allows robustness to positional shifts. It is also important to notice that the pooling operation will "blur" the input image, and the fact that C-cells take the average of their corresponding S-cells makes the pooling more robust to random noise added to the image. Below you can also find a short video explaining the basics of the Neocognitron.
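The noise-robustness argument can be illustrated numerically. The sketch below assumes pure, independent Gaussian pixel noise (an idealization; real image noise is often correlated) and uses a hypothetical `average_pool` helper with 2x2 blocks to stand in for the C-cells. Averaging four independent values divides their standard deviation by $\sqrt{4} = 2$, so the pooled noise should be about half as strong as the input noise.

```python
import numpy as np

def average_pool(x, size=2):
    """Average non-overlapping size x size blocks, as the C-cells do in this simplified picture."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).mean(axis=(1, 3))

rng = np.random.default_rng(0)
noise = rng.normal(loc=0.0, scale=1.0, size=(1000, 1000))  # pure pixel noise, std = 1

pooled = average_pool(noise, size=2)
ratio = pooled.std() / noise.std()

print(f"noise std before pooling: {noise.std():.3f}, after: {pooled.std():.3f}")
print(f"ratio: {ratio:.3f}")  # expected to be close to 1 / sqrt(4) = 0.5
```

The same averaging also smooths any genuine fine detail in the image, which is the "blurring" side effect mentioned above.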

[![](http://img.youtube.com/vi/Qil4kmvm2Sw/0.jpg)](http://www.youtube.com/watch?v=Qil4kmvm2Sw "Neocognitron Movie - Part #1")

If you are familiar with convolutional neural networks, you may be wondering what the difference is between the Neocognitron and later models like Yann LeCun's LeNet (XXXX), since they look remarkably similar. The main (but not only) difference is the training algorithm: **the Neocognitron does not use backpropagation**. At the time, backpropagation was not widely known as a training method for multilayer neural networks, which is why Fukushima never used it; instead, he trained his model with an unsupervised learning approach. Regardless, the Neocognitron laid the groundwork for modern neural network models of vision, and for computer vision more generally.

### LeCun's LeNet

The basis of what today is known as the convolutional neural network was introduced by Yann LeCun in 1989.

Inspired by the work of Fukushima, LeCun and colleagues demonstrated that a small convolutional neural network trained with backpropagation could effectively recognize handwritten zip codes.


TODO:
- complete LeNet 
- Introduce AlexNet
- Introduce issue of computer vision diverging from cog-science/neuroscience

## Mathematical formalization

## Code implementation

## Application

## Limitations

## References