# Lab 7: Neural Nets Advanced

## 1. Dataset downloading
Download one of the small image datasets: [MNIST](http://yann.lecun.com/exdb/mnist/), [FashionMNIST](https://github.com/zalandoresearch/fashion-mnist), or [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html). The full list of available datasets is in the [torchvision documentation](https://pytorch.org/vision/stable/datasets.html). Help:

```python
training_data = torchvision.datasets.NAME_OF_THE_DATASET(root="data", train=True, download=True, 
                                                  transform=torchvision.transforms.ToTensor())
test_data = ...
```

Check your `data` folder to confirm that the files were downloaded.

In [1]:
# place for code
import torch, torchvision
training_data = torchvision.datasets.FashionMNIST(root="data", train=True, download=True, transform=torchvision.transforms.ToTensor())
test_data = torchvision.datasets.FashionMNIST(root="data", train=False, download=True, transform=torchvision.transforms.ToTensor())


## 2. Printing
Print the shape and label of an image. The datasets store labels as integer indexes, not names, so use a `label_map` to translate them:

```python
label_map = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']  # CIFAR10
label_map = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']  # FashionMNIST
img, label = training_data[image_index]
```

In [3]:
# place for code
image_index = 1
# label_map = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']  # CIFAR10 (use this line instead if you downloaded CIFAR10)
label_map = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']  # FashionMNIST
img, label = training_data[image_index]

print(img, label)

tensor([[[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0039, 0.0000, 0.0000,
          0.0000, 0.0000, 0.1608, 0.7373, 0.4039, 0.2118, 0.1882, 0.1686,
          0.3412, 0.6588, 0.5216, 0.0627, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0039, 0.0000, 0.0000, 0.0000, 0.1922,
          0.5333, 0.8588, 0.8471, 0.8941, 0.9255, 1.0000, 1.0000, 1.0000,
          1.0000, 0.8510, 0.8431, 0.9961, 0.9059, 0.6275, 0.1765, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0549, 0.6902, 0.8706,
          0.8784, 0.8314, 0.7961, 0.7765, 0.7686, 0.7843, 0.8431, 0.8000,
          0.7922, 0.7882, 0.7882, 0.7882, 0.8196, 0.8549, 0.8784, 0.6431,
          0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.7373, 0.8588, 0.7843,
          0.7765, 0.7922, 0.7765, 0.7804, 0.7804, 0.7882, 0.7686, 0.7765,
          0.7765, 0.7843, 0.7843, 0.7843, 0.7843, 0.7882,
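
The printed values lie between 0 and 1 because `torchvision.transforms.ToTensor()` converts 8-bit pixel intensities (0..255) to floats by dividing by 255. A minimal sketch of that scaling in plain Python, where `to_unit_range` is a hypothetical helper written for illustration and is not part of torchvision:

```python
def to_unit_range(pixel):
    """Map an 8-bit intensity (0..255) to [0.0, 1.0], as ToTensor does for uint8 images."""
    return pixel / 255


# A few sample intensities: black, the darkest non-zero level, a mid-gray, white
for pixel in (0, 1, 188, 255):
    print(pixel, '->', round(to_unit_range(pixel), 4))
```

Under this scaling, a printed value such as `0.0039` corresponds to raw intensity 1 and `1.0000` to raw intensity 255.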

## 3. Plotting
Display several images from the dataset. Help:

```python
img1 = torchvision.transforms.ToPILImage()(img)
plt.imshow(img1)
plt.title(f'class = {label_map[label]}')
plt.show()
```

In [4]:
# place for code
%matplotlib inline
import matplotlib.pyplot as plt

img1 = torchvision.transforms.ToPILImage()(img)
plt.imshow(img1)
plt.title(f'class = {label_map[label]}')
plt.show()

## 4. Neural Net Structure
Define the neural network structure, the loss criterion, and the optimizer.

In [None]:
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        
        # first argument - input channel count (3 for color images such as CIFAR10,
        # 1 for grayscale images such as MNIST and FashionMNIST); it must match your dataset
        # second argument - number of output channels (number of different kernels to use)
        # third argument - kernel size
        self.conv1 = torch.nn.Conv2d(in_channels=3, out_channels=..., kernel_size=...)
        
        # takes the maximum over each 2x2 window, halving width and height
        self.pool = torch.nn.MaxPool2d(kernel_size=2)
        
        # define one more convolution layer
        # the output size of each layer must match the input size of the next layer
        self.conv2 = ...
        
        # define fully connected linear layer
        # first argument - input size (channel_num * width * height - for images)
        # second argument - output size
        self.fc1 = torch.nn.Linear(in_features=..., out_features=...)
        
        # define several more fully connected linear layers
        # output size of previous layers must match input size of next layers
        # usually output size decreases from layer to layer
        self.fc2 = ...
        self.fc3 = ...  # output size of the last layer must equal the number of classes

    def forward(self, x):
        print('Initial input shape:', x.shape)
        
        # apply self.conv1 layer to the input
        x = self.conv1(x)
        print('Output of conv1 shape:', x.shape)
        
        # apply F.relu activation function to the x
        x = ....
        
        # apply the subsampling layer self.pool and print the resulting shape
        x = ....
        print('Output of first subsampling shape:', x.shape)
        
        # apply second convolution layer conv2, F.relu and self.pool (print shapes to debug errors)
        ....
        
        # linearize output of previous layer, but keep batch dimension
        # first argument of view - batch size (-1 lets PyTorch infer it), second - flattened size of x
        x = x.view(-1, ....)
        
        # apply fully connected layer 1 
        x = ...
        
        # apply F.relu activation function to the x
        x = ...

        # apply fully connected layer 2
        x = ...
        
        # apply F.relu activation function to the x
        x = ...

        # apply fully connected layer 3
        x = ...
        
        # no activation is needed after the last layer: torch.nn.CrossEntropyLoss
        # applies log-softmax to the raw outputs itself

        # return result
        return x

# construct neural network
net = Net()

# define loss criterion
criterion = torch.nn.CrossEntropyLoss()

# define optimizer
optimizer = torch.optim.SGD(net.parameters(), lr=...., momentum=....)

# define data loaders
trainloader = torch.utils.data.DataLoader(training_data, batch_size=32, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=32, shuffle=False)
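
The `in_features` of the first linear layer depends on how each convolution and pooling layer shrinks the image. The sketch below computes that size in plain Python, assuming the default stride, padding, and dilation of `torch.nn.Conv2d` and `torch.nn.MaxPool2d`. The helper names and the layer settings in the example (6 and 16 output channels, kernel size 5) are hypothetical values chosen only for illustration, not the ones the lab expects.

```python
def conv2d_out(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output width/height of a Conv2d layer along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def maxpool2d_out(size, kernel_size, stride=None, padding=0):
    """Output width/height of a MaxPool2d layer (stride defaults to kernel_size)."""
    stride = kernel_size if stride is None else stride
    return (size + 2 * padding - kernel_size) // stride + 1


def flattened_size(side, conv_layers):
    """Size of the flattened tensor after a stack of conv + 2x2 max-pool blocks.

    conv_layers is a list of (out_channels, kernel_size) pairs; the input is
    assumed to be a square image with the given side length.
    """
    channels = None
    for out_channels, kernel_size in conv_layers:
        side = conv2d_out(side, kernel_size)
        side = maxpool2d_out(side, 2)
        channels = out_channels
    return channels * side * side


# Hypothetical architecture on 28x28 inputs (MNIST / FashionMNIST)
print(flattened_size(28, [(6, 5), (16, 5)]))
# The same hypothetical architecture on 32x32 inputs (CIFAR10)
print(flattened_size(32, [(6, 5), (16, 5)]))
```

The shapes printed inside `forward` should agree with these numbers; a mismatch usually means `in_features` or the argument of `view` needs adjusting.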

## 5. Training

Train your model with gradient descent. Choose the learning rate and the number of epochs so that the loss reaches 0.098.

In [None]:
mean_loss = 0
for epoch in range(...):  # loop over the dataset multiple times (epochs)

    running_loss = 0.0
    for cycle, data_batch in enumerate(trainloader):
        # get the inputs; data_batch is a list of [inputs, labels]
        x_batch, label_batch = data_batch

        # forward step
        output = net(x_batch)
        
        # calculate loss
        loss = criterion(output, label_batch)
        
        # reset the gradients from the previous step, then calculate new ones
        optimizer.zero_grad()
        loss.backward()
        
        # change weights
        optimizer.step()
        
        # Compute the mean loss over recent cycles as an exponential moving average with alpha = 0.01
        # (use loss.item() to get the loss as a plain Python number)
        mean_loss = ....
        
        # Print the loss every 200 cycles
        if cycle % 200 == 0:
            print(f'epoch = {epoch} cycle = {cycle}, loss = {mean_loss}')

print('Training finished')
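
The exponential moving average mentioned in the training loop can be sketched in plain Python as follows. This assumes the common convention in which `alpha` weights the newest value; `ema_update` is a hypothetical helper and the loss values are made up for illustration.

```python
def ema_update(mean, value, alpha=0.01):
    """One exponential-moving-average step: keep (1 - alpha) of the old mean."""
    return (1 - alpha) * mean + alpha * value


# Made-up per-batch losses, standing in for loss.item() values
losses = [2.3, 2.1, 1.9, 1.8]

mean_loss = 0.0
history = []
for loss_value in losses:
    mean_loss = ema_update(mean_loss, loss_value)
    history.append(mean_loss)

print(history)
```

With `alpha = 0.01` and a starting value of 0, the average approaches the real loss level slowly: after 200 cycles roughly 13% of the initial zero is still present (0.99 to the power 200), so early printed values underestimate the loss.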

In [None]:
# save model
torch.save(net.state_dict(), 'model.pkl')

In [None]:
# load previously saved model
net = Net()
net.load_state_dict(torch.load('model.pkl'))

## 6. Prediction
For several images, print the true class and the predicted probability of each class.

In [None]:
image_index = 3
img, label = training_data[image_index]
print('true class =', label_map[label])

# The network requires batches as input. Build a batch of size 1 from the image
batch = torch.unsqueeze(img, 0)
print('Batch shape =', batch.shape)

output = net(batch)
# convert output to probabilities using the F.softmax function
# (pass dim=1 to apply softmax across the class dimension; this also avoids the implicit-dimension warning)
probabilities = ....

# print class name and its probability for each class
....
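
To make the softmax step concrete, here is a minimal plain-Python sketch of the computation that `F.softmax(output, dim=1)` performs on each row of the batch. The `softmax` helper is written for illustration and the logits are made-up numbers, not the output of a trained network.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


label_map = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat',
             'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']  # FashionMNIST

# Made-up raw network outputs for a single image (one value per class)
logits = [1.2, -0.3, 0.5, 4.1, 0.0, -2.0, 0.9, -1.5, 0.2, -0.7]

probabilities = softmax(logits)
for name, probability in zip(label_map, probabilities):
    print(f'{name}: {probability:.4f}')
```

The class with the largest raw output always receives the largest probability, so the predicted class can be read from either the logits or the probabilities.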