# Limits of MLP for image recognition

A neural network architecture can exploit the specific structure of its domain.

An MLP takes a 1D vector as input. Images are 2D if greyscale and 3D if color.  
The color dimension is called the *channel* dimension.

By *flattening* the image, we don't lose any pixel values, but we lose meta-information: the spatial layout that tells us which pixels were neighbors.
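As a minimal sketch of this point, the snippet below flattens a stand-in 28x28 array (hypothetical data, not an MNIST sample) with NumPy and reshapes it back. It assumes row-major flattening, which is the default for `reshape`.

```python
import numpy as np

# Stand-in 28x28 greyscale "image" (hypothetical data, not an MNIST sample).
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)

# Flatten to the 1D vector an MLP expects.
flat = image.reshape(-1)
print(flat.shape)  # (784,)

# Reshaping back recovers the image exactly: no pixel value was lost...
restored = flat.reshape(28, 28)
print(np.array_equal(image, restored))  # True

# ...but only because we remembered the original shape. The vector alone
# does not say whether it came from a 28x28 or, say, a 14x56 grid.
other = flat.reshape(14, 56)
print(other.shape)  # (14, 56)
```

The values survive the round trip, but the shape has to be supplied from outside the vector.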

In [1]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import torch
import torchvision
mnist_dataset = torchvision.datasets.MNIST('./', download=True,
                                           transform=torchvision.transforms.Compose([
                                               torchvision.transforms.ToTensor(),
                                               torchvision.transforms.Normalize(
                                                 (0.1307,), (0.3081,))
                                             ]))
train_loader = torch.utils.data.DataLoader(dataset=mnist_dataset,
                                           batch_size=1, shuffle=False)
data, _ = next(iter(train_loader))
one_example = data[0]
matplotlib.rcParams['figure.figsize'] = [5, 5]  # figure size in inches

In [2]:
import numpy as np

def show(img, title=None):
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)), interpolation='nearest')
    if title is not None:
        plt.title(title)
    plt.show()

show(mnist_dataset[0][0], mnist_dataset[0][1])


In [None]:
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = [20, 1]  # wide, thin strip (size in inches)
plt.imshow(one_example.view(1, 1, -1).permute(1, 2, 0), aspect='auto')

Pixels that were adjacent in the 2D image can end up very far apart in the flattened vector.

We can no longer recognize the digit.
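To put a number on "far apart", here is a small sketch that computes where a pixel's neighbors land after flattening. The helper `flat_index` is a hypothetical name introduced for illustration, and it assumes row-major flattening of a 28x28 image.

```python
import numpy as np

height, width = 28, 28

def flat_index(row, col, width=width):
    """Position of pixel (row, col) after row-major flattening."""
    return row * width + col

center = flat_index(14, 14)
right = flat_index(14, 15)   # horizontal neighbor
below = flat_index(15, 14)   # vertical neighbor

print(right - center)  # 1: horizontal neighbors stay adjacent
print(below - center)  # 28: vertical neighbors are a full row apart

# Cross-check against NumPy's own row-major layout.
grid = np.arange(height * width).reshape(height, width)
print(grid[15, 14] == below)  # True
```

Horizontal neighbors remain next to each other, while vertical neighbors are separated by the image width.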

Images are natural signals, which have three properties:

* **Stationarity**: Certain motifs are repeated throughout the input.
* **Locality**: Nearby points are correlated, which means the information is **sparse**: most of what matters about a pixel is found in its immediate neighborhood.
* **Compositionality**: Parts are composed of sub-parts. A deep neural network can decompose the information layer after layer.

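One major drawback of MLPs is that they are not translation *invariant*: each weight is tied to a fixed pixel position, so a pattern learned at one location is not recognized at another. If the images in the training set are always centered, the MLP will mostly rely on the central pixels.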
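The sketch below illustrates this with a single hypothetical linear unit whose weights are set by hand to match a centered square. Nothing here is trained. The synthetic image and the 8-pixel shift are assumptions chosen so that the shifted square no longer overlaps the original.

```python
import numpy as np

# Synthetic 28x28 image: a 6x6 bright square in the center.
image = np.zeros((28, 28), dtype=np.float32)
image[11:17, 11:17] = 1.0

# Same square, moved 8 pixels to the right (columns 19 to 24).
shifted = np.roll(image, shift=8, axis=1)

# A hypothetical unit whose weights respond to the centered square.
# Each weight is tied to one fixed position in the flattened input.
weights = image.reshape(-1)

score_centered = float(weights @ image.reshape(-1))
score_shifted = float(weights @ shifted.reshape(-1))

print(score_centered)  # 36.0: every pixel of the square matches a weight
print(score_shifted)   # 0.0: same shape, but no weight sits under it
```

The content of the image is unchanged by the shift, yet the unit's response drops to zero because its weights only cover the central positions.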

This becomes clear when we compare with a random signal:

In [None]:
%matplotlib inline
plt.imshow(torch.randn(1, 28, 28).permute(1, 2, 0))
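As a rough numerical counterpart to the plot, the sketch below measures how strongly each pixel correlates with its right-hand neighbor. It uses a synthetic Gaussian blob as a stand-in for a structured image (an assumption, not an MNIST digit) and compares it with Gaussian noise. The helper `neighbor_correlation` is a hypothetical name introduced for illustration.

```python
import numpy as np

def neighbor_correlation(img):
    """Pearson correlation between each pixel and its right-hand neighbor."""
    left = img[:, :-1].reshape(-1)
    right = img[:, 1:].reshape(-1)
    return float(np.corrcoef(left, right)[0, 1])

rng = np.random.default_rng(0)

# Structured stand-in image: a smooth 2D Gaussian blob.
ys, xs = np.mgrid[0:28, 0:28]
blob = np.exp(-((ys - 13.5) ** 2 + (xs - 13.5) ** 2) / (2 * 5.0 ** 2))

# Random signal: independent Gaussian noise at every pixel.
noise = rng.standard_normal((28, 28))

blob_corr = neighbor_correlation(blob)
noise_corr = neighbor_correlation(noise)

print(f"structured image: {blob_corr:.3f}")
print(f"random noise:     {noise_corr:.3f}")
```

Under these assumptions the structured image should show a correlation close to 1 and the noise a correlation close to 0, which is the locality property expressed as a number.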