In [1]:
import torch, torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

# a special module that converts [batch, channel, h, w] to [batch, units]
class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)
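
A quick way to see what the module above does is to push a dummy tensor through it. This is only a sketch: the tensor shape `[4, 128, 2, 2]` is an arbitrary choice made for illustration, and the comparison with the built-in `nn.Flatten` assumes a PyTorch version recent enough to ship it.

```python
import torch
import torch.nn as nn

# same module as in the cell above, repeated so the snippet is self-contained
class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

# dummy feature map: batch of 4, 128 channels, 2x2 spatial size (illustrative values)
x = torch.zeros(4, 128, 2, 2)

flat = Flatten()(x)        # keeps the batch axis, merges the rest: 128 * 2 * 2 = 512
builtin = nn.Flatten()(x)  # built-in equivalent (flattens from dim 1 by default)

print(flat.shape, builtin.shape)
```

The number of units after flattening is channels x height x width, which is exactly the value the next `nn.Linear` needs as its input size.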

In [2]:
# assuming input shape [batch, 3, 32, 32]
# 3 channels, 32 width, 32 height
cnn = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=512, kernel_size=(3,3)),
    # 512 filters is far too many: 3x3 patches of a 3-channel image don't carry enough information to fill them
    # usually we start with a small number of filters and increase it with depth
    # kernel_size - (height, width) of the filter
    nn.Conv2d(in_channels=512, out_channels=128, kernel_size=(3,3)),
    nn.Conv2d(in_channels=128, out_channels=16, kernel_size=(3,3)),
    nn.ReLU(),
    nn.MaxPool2d((6,6)),
    # the pooling window is too big: too much information is thrown away
    nn.Conv2d(in_channels=6, out_channels=32, kernel_size=(10,10)),
    # in_channels must be 16: ReLU and max pooling don't change the number of channels
    # a 10x10 kernel is also too large
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(10,10)),
    # the feature map won't have enough resolution left for a kernel this size
    nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(10,10)),
    nn.Softmax(),
    Flatten(),
    # squeeze picture into a vector
    nn.Linear(64, 256),
    # in_features must be 128xHxW (of the remaining feature map), not 64
    nn.Softmax(),
    # better to replace this with ReLU
    nn.Linear(256, 10),
    nn.Sigmoid(),
    # softmax is the better choice for the output layer
    nn.Dropout(0.5)
    # dropout must not come after the predicted probabilities
)
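
The spatial sizes in the stack above can be checked by hand. The sketch below is plain Python and relies on the `nn.Conv2d` defaults used in the cell (stride 1, no padding, no dilation) and on the `nn.MaxPool2d` default of a stride equal to the window size. `conv_out` and `trace` are helper names made up for this illustration, not PyTorch functions.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis of a conv / pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def trace(size, layers):
    """Return the spatial size after every (kernel, stride) layer in turn."""
    sizes = [size]
    for kernel, stride in layers:
        size = conv_out(size, kernel, stride)
        sizes.append(size)
    return sizes

# three 3x3 convolutions (stride 1), then 6x6 max pooling (stride 6)
front = [(3, 1), (3, 1), (3, 1), (6, 6)]
sizes = trace(32, front)
print(sizes)  # [32, 30, 28, 26, 4]

# a 10x10 kernel needs at least a 10x10 input
fits = sizes[-1] >= 10
print(fits)  # False
```

Under these assumptions the feature map entering the first 10x10 convolution is only 4x4, so that layer cannot be applied at all.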



# Book of grudges
* Input sizes are wrong literally half the time: `in_channels=6` after pooling should be 16, and `nn.Linear(64, 256)` after `Flatten` should take 128xHxW features.
* Too many filters for the first 3x3 convolution: it leads to an enormous weight matrix, while there just aren't enough relevant 3x3 patterns to justify it (overkill).
* Usually the further you go, the more filters you need.
* Large filters (10x10) are generally bad practice, and you definitely need more than 10 of them.
* With a 32x32 input, the feature map left after the three 3x3 convolutions and the 6x6 pooling is only 16x4x4, so the 10x10 convolutions are technically unable to run at all.
* A softmax nonlinearity effectively lets only one or a few neurons in the entire layer "fire", rendering the 256-neuron hidden layer almost useless. Softmax at the output layer is okay, though.
* Dropout after probability prediction is just lame. A few random classes get a probability of 0 and the rest are scaled up, so your probabilities no longer sum to 1, and the log of a zeroed probability is -inf, which makes the cross-entropy infinite.
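
For comparison, here is one possible network that avoids the complaints listed above. It is a sketch, not the reference solution: the filter counts (16, 32, 64), the `padding=1` setting, the 2x2 pooling and the position of the dropout layer are all assumptions chosen for illustration, and the built-in `nn.Flatten` stands in for the custom `Flatten` module.

```python
import torch
import torch.nn as nn

# assumed input shape: [batch, 3, 32, 32]
fixed_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # -> [batch, 16, 32, 32]
    nn.ReLU(),
    nn.MaxPool2d(2),                              # -> [batch, 16, 16, 16]
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> [batch, 32, 16, 16]
    nn.ReLU(),
    nn.MaxPool2d(2),                              # -> [batch, 32, 8, 8]
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # -> [batch, 64, 8, 8]
    nn.ReLU(),
    nn.MaxPool2d(2),                              # -> [batch, 64, 4, 4]
    nn.Flatten(),                                 # -> [batch, 64 * 4 * 4]
    nn.Linear(64 * 4 * 4, 256),                   # in_features matches the flattened size
    nn.ReLU(),
    nn.Dropout(0.5),                              # dropout on hidden units, not on the output
    nn.Linear(256, 10),                           # raw class scores (logits)
)

x = torch.randn(8, 3, 32, 32)         # random batch, just for a shape check
logits = fixed_cnn(x)
probs = torch.softmax(logits, dim=1)  # softmax over the class axis only

print(logits.shape)
print(probs.sum(dim=1))
```

The network stops at logits because `nn.CrossEntropyLoss` applies log-softmax internally; the explicit `torch.softmax` call is only needed when actual probabilities are wanted.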