## Imports

Import `padl` and, most importantly, the `transform` decorator, which turns any callable into a `padl.Transform`.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

import torch
from torchvision import models

import padl
from padl import transform

## Kaggle Digit Recognizer dataset
This notebook uses the Kaggle Digit Recognizer dataset, which can be downloaded from the Kaggle link below.

https://www.kaggle.com/c/digit-recognizer

Details on the structure of the data are available at the link above. The key points are given in the excerpt below:

> The data files train.csv and test.csv contain gray-scale images of hand-drawn digits, from zero through nine.
Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels in total. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.
The training data set, (train.csv), has 785 columns. The first column, called "label", is the digit that was drawn by the user. The rest of the columns contain the pixel-values of the associated image.


### 0. Reading `csv` files for training and testing
Note: in the Kaggle dataset, `test.csv` does not contain labels; it is intended for submissions to the Kaggle competition. Here, we use it for quick inference.

In [None]:
train_csv = 'mnist/train.csv'
test_csv = 'mnist/test.csv'

with open(train_csv) as f:
    train_data = f.readlines()
train_array = torch.tensor([list(map(int, line.split(','))) for line in train_data[1:]])


with open(test_csv) as f:
    test_data = f.readlines()
test_array = torch.tensor([list(map(int, line.split(','))) for line in test_data[1:]])

In [None]:
print('Train data shape:', train_array.shape)
print('Test data shape:', test_array.shape)
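The cell above parses the CSV by hand. As an alternative sketch, NumPy's `np.loadtxt` can skip the header row and parse the comma-separated integers in one call. The tiny in-memory CSV below (three pixel columns instead of 784) is made-up data for illustration only.

```python
import io

import numpy as np

# A tiny stand-in for train.csv: a header row, then "label,pixel0,pixel1,pixel2".
# Made-up values; the real file has 784 pixel columns.
csv_text = "label,pixel0,pixel1,pixel2\n5,0,128,255\n3,17,0,64\n"

# Hand-rolled parsing, as in the cell above: skip the header, split on commas.
lines = csv_text.splitlines()
manual = np.array([list(map(int, line.split(','))) for line in lines[1:]])

# The same parsing with NumPy: skiprows=1 drops the header row.
parsed = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1, dtype=np.int64)

print(parsed.shape)                    # (2, 4): 2 images, 1 label + 3 pixels each
print(np.array_equal(manual, parsed))  # True
```

`torch.from_numpy(parsed)` would then turn the result into a tensor that shares memory with the NumPy array.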

<hr style="border:1px solid"> </hr>


### 1. Plot few images to check the data

`load_image` is a normal Python function that takes an image tensor and plots it with `matplotlib.pyplot`. The `@transform` decorator converts it into a `padl.Transform`, which lets us use the `padl` functional API to build data pipelines quickly.

A quick recap of the `padl` operators:
- `>>`: Compose operator: $(f_1 >> f_2)(x) \rightarrow f_2(f_1(x))$
- `+`: Rollout operator: $(f_1 + f_2) (x) \rightarrow (f_1(x), f_2(x))$
- `/`: Parallel operator: $(f_1 / f_2)((x_1,x_2)) \rightarrow (f_1(x_1), f_2(x_2))$
- `-`: Name operator: names a transform, so that its output can be accessed by the given name, or the transform itself can be accessed by name from the pipeline:
    - $((f_1 - \text{'zulu'})+f_2)(x) \rightarrow \text{Namedtuple}(\text{'zulu'}:f_1(x), \text{'out_1'}:f_2(x))$
    - $((f_1 - \text{'zulu'})+f_2)[\text{'zulu'}] = f_1$
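To make the operator semantics concrete, here is a toy re-implementation in plain Python. `Toy` is a hypothetical class written for illustration, not `padl`'s actual code. In particular, it only handles two-element rollouts and parallels, and its naming rule (unnamed positions become `out_<index>`) simply follows the recap above.

```python
from collections import namedtuple


class Toy:
    """Hypothetical stand-in for a padl transform, for illustration only."""

    def __init__(self, fn, name=None):
        self.fn = fn
        self.name = name

    def __call__(self, x):
        return self.fn(x)

    @staticmethod
    def _pack(parts, values):
        # Plain tuple if no part is named; otherwise a namedtuple in which
        # unnamed positions are called out_<index>.
        if all(p.name is None for p in parts):
            return tuple(values)
        fields = [p.name or f'out_{i}' for i, p in enumerate(parts)]
        return namedtuple('Output', fields)(*values)

    def __rshift__(self, other):   # compose: (f1 >> f2)(x) = f2(f1(x))
        return Toy(lambda x: other(self(x)))

    def __add__(self, other):      # rollout: (f1 + f2)(x) = (f1(x), f2(x))
        return Toy(lambda x: Toy._pack([self, other], [self(x), other(x)]))

    def __truediv__(self, other):  # parallel: (f1 / f2)((x1, x2)) = (f1(x1), f2(x2))
        return Toy(lambda xs: Toy._pack([self, other], [self(xs[0]), other(xs[1])]))

    def __sub__(self, name):       # name: same function, now carrying a name
        return Toy(self.fn, name)


inc = Toy(lambda x: x + 1)
dbl = Toy(lambda x: x * 2)

print((inc >> dbl)(3))             # 8, i.e. dbl(inc(3))
print((inc + dbl)(3))              # (4, 6)
print((inc / dbl)((3, 10)))        # (4, 20)
out = ((inc - 'zulu') + dbl)(3)
print(out.zulu, out.out_1)         # 4 6
print((inc + dbl >> Toy(sum))(3))  # 10: parsed as (inc + dbl) >> Toy(sum)
```

Python's operator precedence (`/` binds tighter than `+` and `-`, which bind tighter than `>>`) is what lets a pipeline such as `padl.this[0] + padl.this[1:] >> A / B` (with `A` and `B` standing for any transforms) parse as `(padl.this[0] + padl.this[1:]) >> (A / B)` without extra parentheses.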

In [None]:
@transform
def load_image(img_tensor):
    fig= plt.figure(figsize=(2,2))
    ax = fig.add_subplot(111)
    ax.imshow(img_tensor, cmap='gray')
    plt.axis('off')
    plt.show()

### 1.1 Building a simple plotting pipeline using `padl` operators

Built-in transforms used below:

- `padl.this` is a self-referential transform that applies an attribute access, method call or indexing operation to its input.

        Example: padl.this[0]([1,2,3]) = 1

- `padl.Identity()` is a simple transform that does exactly what it sounds like: it passes the input through unchanged.

        Example: padl.Identity()([1,2,3]) = [1,2,3]
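The behaviour of `padl.this` can be pictured as an object that records operations and replays them on the input. The `ThisSketch` class below is a hypothetical, simplified stand-in written for illustration, not how `padl` implements `this`. Its rule that a call right after an attribute access records a method call, while any other call applies the recorded operations, is our own assumption.

```python
class ThisSketch:
    """Hypothetical, simplified stand-in for padl.this (illustration only)."""

    def __init__(self, ops=()):
        self._ops = ops  # recorded operations, replayed in order

    def __getattr__(self, name):        # this.reshape -> record attribute access
        return ThisSketch(self._ops + (('attr', name),))

    def __getitem__(self, key):         # this[0] -> record indexing
        return ThisSketch(self._ops + (('item', key),))

    def __call__(self, *args, **kwargs):
        # A call right after an attribute access records a method call;
        # any other call applies the recorded operations to its single input.
        if self._ops and self._ops[-1][0] == 'attr':
            return ThisSketch(self._ops + (('call', args, kwargs),))
        (x,) = args
        for op in self._ops:
            if op[0] == 'attr':
                x = getattr(x, op[1])
            elif op[0] == 'item':
                x = x[op[1]]
            else:
                x = x(*op[1], **op[2])
        return x


this = ThisSketch()
print(this[0]([1, 2, 3]))           # 1, as in the example above
print(this.upper()('abc'))          # ABC
print(this.split(',')[1]('a,b,c'))  # b
```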



Description of the `transform` pipeline defined below:
- `reshape_load`: takes a `torch.Tensor` of length `784`, reshapes it into a `28x28` image and plots it with the `load_image` transform defined above.
- `plot_train_datapoint`:
    - The first step is a rollout (`+`) that splits a datapoint of length `785` into the label (its first element) and the `784` pixel values.
    - The second step is a parallel (`/`) that passes the label through unchanged (`padl.Identity()`) and sends the pixel values to `reshape_load`.
    - In the second step, the transforms are also named with `-`, so the parts of the output can be accessed by those names (e.g. `output.label`).
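Setting the plotting aside, the rollout, parallel and reshape steps of `plot_train_datapoint` amount to plain indexing and reshaping. The NumPy sketch below mirrors them on a made-up row whose pixel values equal their pixel index, which makes the row-major layout of `reshape(28, 28)` easy to check.

```python
import numpy as np

# Made-up 785-value row: label 7, then pixel values 0..783, so that each
# pixel's value equals its pixel index (not real image data).
row = np.concatenate([[7], np.arange(784)])

label, pixels = row[0], row[1:]   # rollout: padl.this[0] + padl.this[1:]
image = pixels.reshape(28, 28)    # what reshape_load does before plotting

print(label)        # 7
print(image.shape)  # (28, 28)

# Row-major reshape: image[i, j] is pixel index 28 * i + j,
# i.e. column 1 + 28 * i + j of the CSV row.
i, j = 3, 5
print(image[i, j] == row[1 + 28 * i + j])  # True
```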




In [None]:
reshape_load = (padl.this.reshape(28, 28) >> load_image)

plot_train_datapoint = (
    padl.this[0] + padl.this[1:]
    >> (padl.Identity() - 'label')/ (reshape_load - 'image')
)

In [None]:
plot_train_datapoint

In [None]:
for _ in range(5):
    output = plot_train_datapoint(train_array[np.random.randint(len(train_array))])
    print(f'Label : {output.label}')
    print('-'*30)

<hr style="border:1px solid"> </hr>


### 2. Model
We will build a simple convolutional neural network (CNN) to classify the handwritten `MNIST` digits. In the cell below, a plain PyTorch network is defined with just one added decorator, `@transform`. This is enough to wrap the PyTorch model into a `padl.Transform` and combine it with other transforms to build a data pipeline.

### 2.1 Simple CNN

In [None]:

import torch.nn.functional as F
import torchvision.models.resnet 
from torch.utils.data import DataLoader
import torch.optim as optim
from torch.optim import lr_scheduler

        
@transform
class SimpleNet(torch.nn.Module):
    def __init__(self):
        super().__init__()

        # Conv 1
        # size : input: 28x28x1 -> output : 26 x 26 x 32
        self.conv1 = torch.nn.Conv2d(1, 32, kernel_size=3)
        self.batchnorm1 = torch.nn.BatchNorm2d(32)
        
        # Conv 2
        # size : input: 26x26x32 -> output : 24 x 24 x 32
        self.conv2 = torch.nn.Conv2d(32, 32, kernel_size=3)
        self.batchnorm2 = torch.nn.BatchNorm2d(32)
        
        # Conv 3
        # size : input: 24x24x32 -> output : 12 x 12 x 32
        self.conv3 = torch.nn.Conv2d(32, 32, kernel_size=2, stride = 2)
        self.batchnorm3 = torch.nn.BatchNorm2d(32)
        
        # Conv 4
        # size : input : 12 x 12 x 32 -> output : 8 x 8 x 64
        self.conv4 = torch.nn.Conv2d(32, 64, kernel_size=5)
        self.batchnorm4 = torch.nn.BatchNorm2d(64)
        
        # Conv 5
        # size : input: 8x8x64 -> output : 4 x 4 x 64 -> Linearize = 1024
        self.conv5 = torch.nn.Conv2d(64, 64, kernel_size=2, stride = 2)
        self.batchnorm5 = torch.nn.BatchNorm2d(64)
        
        # dropout layer 
        self.conv5_drop = torch.nn.Dropout2d()
        
        # FC 1 
        self.fc1 = torch.nn.Linear(1024, 128)
        
        # FC 2
        self.fc2 = torch.nn.Linear(128, 10)

    def forward(self, x):
        x = self.batchnorm1(F.relu(self.conv1(x)))
        x = self.batchnorm2(F.relu(self.conv2(x)))
        x = self.batchnorm3(F.relu(self.conv3(x)))
        x = self.batchnorm4(F.relu(self.conv4(x)))
        x = self.batchnorm5(F.relu(self.conv5(x)))
        x = self.conv5_drop(x)
        x = x.view(-1, 1024)
        x = F.relu(self.fc1(x))
        x = F.log_softmax(self.fc2(x), dim=1)
        return x
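The size comments in `SimpleNet` can be checked with the output-size formula from the `torch.nn.Conv2d` documentation, $H_{out} = \lfloor (H_{in} + 2p - d(k-1) - 1)/s \rfloor + 1$, for padding $p$, dilation $d$, kernel size $k$ and stride $s$. The short sketch below applies it to one spatial dimension (every layer here uses padding 0 and dilation 1) and confirms that the flattened size is `1024`, the input size of `fc1`.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    # Output size along one spatial dimension (torch.nn.Conv2d docs formula).
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# (name, kernel_size, stride, out_channels) as in SimpleNet.
# BatchNorm, ReLU and dropout keep the spatial size, so only the convolutions matter.
layers = [
    ('conv1', 3, 1, 32),
    ('conv2', 3, 1, 32),
    ('conv3', 2, 2, 32),
    ('conv4', 5, 1, 64),
    ('conv5', 2, 2, 64),
]

size = 28
for name, k, s, c in layers:
    size = conv_out(size, k, s)
    print(f'{name}: {size} x {size} x {c}')

flat = size * size * layers[-1][3]
print('flattened:', flat)  # 1024, matching fc1's in_features
```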

### 2.2 Preprocessing

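Each row of `train_array` has `785` values, and the pixel part needs to be reshaped into a `28x28` image.
The `preprocess` pipeline below first converts the row to floats, then splits it into two tensors: the `784` pixel values for the image and the label (note that the image now comes first). The image is then reshaped with `reshape(-1, 28, 28)` to shape `(1, 28, 28)`, which adds the single channel dimension that `Conv2d` expects.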

In [None]:
preprocess = (
    padl.this.type(torch.FloatTensor)
    >> padl.this[1:] + padl.this[0]
    >> padl.this.reshape(-1, 28, 28) / padl.Identity()
)
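The shape bookkeeping in `preprocess` can be mirrored in NumPy. This is a sketch on a made-up row, not `padl` code; the comments name the `padl` step each line corresponds to.

```python
import numpy as np

# Made-up raw row: a label followed by 784 pixel values.
row = np.arange(785)

x = row.astype(np.float32)         # like padl.this.type(torch.FloatTensor)
image, label = x[1:], x[0]         # like padl.this[1:] + padl.this[0]: image first
image = image.reshape(-1, 28, 28)  # -1 is inferred as 784 / (28 * 28) = 1

print(image.shape)  # (1, 28, 28): one grayscale channel, then height and width
```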

### 2.3 Instantiating the network and loss function

Next, we create an instance of `SimpleNet` and the loss function. The loss function is PyTorch's negative log-likelihood loss (`F.nll_loss`), wrapped with the same `transform` call.

In [None]:
simplenet = SimpleNet()
loss_func = transform(F.nll_loss)

In [None]:
simplenet

In [None]:
loss_func
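Since `SimpleNet` ends with `log_softmax`, applying `F.nll_loss` to its output is equivalent to the cross-entropy loss on the raw scores. The NumPy sketch below re-implements both pieces for illustration (it is not PyTorch's code) on made-up scores for two samples and three classes.

```python
import numpy as np


def log_softmax(z):
    # Numerically stable log-softmax over the class axis (axis 1).
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def nll_loss(log_probs, targets):
    # Mean negative log-probability of each sample's target class
    # (like F.nll_loss with its default reduction='mean').
    return -log_probs[np.arange(len(targets)), targets].mean()


logits = np.array([[2.0, 0.5, -1.0],  # made-up scores: 2 samples, 3 classes
                   [0.1, 0.2, 3.0]])
targets = np.array([0, 2])

loss = nll_loss(log_softmax(logits), targets)

# Cross-entropy computed directly from softmax probabilities, for comparison.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
cross_entropy = -np.log(probs[np.arange(len(targets)), targets]).mean()

print(loss, np.isclose(loss, cross_entropy))  # about 0.1755, True
```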

### 2.4  Building the training model

`train_model` is composed (`>>`) from the transforms already defined:
- `preprocess`: the preprocessing transform defined above.
- `padl.Batchify()`: a built-in transform that marks the end of preprocessing (data loading) and adds a batch dimension to the inputs. It also moves the input tensors to the device specified for the model.
- `simplenet`: the instance of `SimpleNet`, applied to the images.
- `padl.this.type(torch.long)`: applied in parallel (`/`) to the labels, casting them to `torch.long`, the integer type `F.nll_loss` expects for targets.

`train_model` is then sent to the intended device with `pd_to`; by default, it is on the `cpu`.


In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Device to be used: ', device)

In [None]:

train_model = (
    preprocess
    >> padl.Batchify()
    >> simplenet / padl.this.type(torch.long)
)

train_model.pd_to(device)
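Conceptually, batching stacks the per-item outputs of `preprocess` along a new leading axis. The NumPy sketch below assumes `padl.Batchify()` collates `(image, label)` pairs the way PyTorch's default collation does, stacking each field separately; the three items are made up.

```python
import numpy as np

# Three made-up preprocessed items: (image of shape (1, 28, 28), float label).
items = [(np.zeros((1, 28, 28), dtype=np.float32), np.float32(d)) for d in (3, 1, 4)]

# Stack each field along a new axis 0, as default collation does.
images = np.stack([img for img, _ in items])
labels = np.stack([lbl for _, lbl in items])

# The parallel step `simplenet / padl.this.type(torch.long)`: images go to the
# network, labels become 64-bit integers, the target dtype nll_loss expects.
labels = labels.astype(np.int64)

print(images.shape)  # (3, 1, 28, 28): (N, C, H, W), the layout Conv2d expects
print(labels)        # [3 1 4]
```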

<hr style="border:1px solid"> </hr>


### 3. Training and validating the `train_model`

Training is not much different from the usual PyTorch training loop, except that data loading and batching are made even simpler by `train_apply`. It is one of three built-in methods, along with `eval_apply` and `infer_apply`, that run the model in the corresponding stage (training, evaluation or inference).

In [None]:
learning_rate = 0.01
momentum = 0.5
log_interval = 10
nepoch = 2
num_workers = 4

optimizer = optim.SGD(train_model.pd_parameters(), lr=learning_rate, momentum = momentum)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.95)



for epoch in range(nepoch):
    step_counter = 0
    for batch_output, batch_targets in train_model.train_apply(train_array, num_workers=num_workers, batch_size=256):

        optimizer.zero_grad()

        loss = F.nll_loss(batch_output, batch_targets)

        loss.backward()

        optimizer.step()
        exp_lr_scheduler.step()  # StepLR counts calls: stepping per batch decays the LR every 7 batches

        if step_counter % log_interval == 0:
            print(f'Epoch: {epoch}; Step: {step_counter}; loss: {loss.item():.4f}')
        step_counter += 1
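`StepLR` multiplies the learning rate by `gamma` once every `step_size` calls to `scheduler.step()`, so after $n$ calls the rate is $\text{lr}_n = \text{lr}_0 \, \gamma^{\lfloor n / s \rfloor}$, with $s$ the `step_size`. Because the loop above calls `exp_lr_scheduler.step()` after every batch rather than every epoch, the rate decays every 7 batches. The sketch below estimates the resulting schedule, assuming the 42,000-row Kaggle `train.csv` and that the last partial batch is kept; check `train_array.shape` for your copy of the data.

```python
import math

lr0, gamma, step_size = 0.01, 0.95, 7  # values used in the training cell


def step_lr(n_calls):
    # Learning rate after n_calls calls to StepLR.step() (closed form).
    return lr0 * gamma ** (n_calls // step_size)


# Assumption: 42,000 training rows, batch_size=256, last partial batch kept.
batches_per_epoch = math.ceil(42000 / 256)

for epoch in range(2):  # nepoch = 2 in the cell above
    n = (epoch + 1) * batches_per_epoch
    print(f'end of epoch {epoch}: {n} scheduler steps, lr = {step_lr(n):.6f}')
```

If the intent was to decay the rate once per epoch, moving `exp_lr_scheduler.step()` out of the batch loop to the end of each epoch would do that.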

### 3.1 Accuracy of the model

We can quickly build a `validation_model` by adding one more step to `train_model` that picks the digit with the highest predicted score.

Note: as we don't have a separate labeled validation set, this example validates the model on the same training data.

First, let's look at the format of a prediction by running `infer_apply` on a single datapoint.

In [None]:
train_model

In [None]:
train_model.infer_apply(train_array[0])

The first element of `train_model`'s output holds the log-probabilities (from `log_softmax`) for the 10 digits, and the second is the label. The index of the largest log-probability is the model's prediction, so we can add another transform to `train_model` that returns this index.
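`x.max(1)` returns both the maximum values and their `indices` along dimension 1, so `x.max(1).indices` is the argmax over the 10 classes. The NumPy sketch below uses made-up scores for two images; since `log_softmax` only subtracts a per-row constant, the argmax of the log-probabilities equals the argmax of the raw scores.

```python
import numpy as np

# Made-up raw scores for 2 images over the 10 digit classes (shape (N, 10)).
scores = np.array([[0.1, 0.2, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.5, 0.0, 0.1]])

# log_softmax: subtract each row's log-sum-exp (a per-row constant).
log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))

print(np.exp(log_probs).sum(axis=1))  # each row sums to 1 (up to rounding)
print(log_probs.argmax(axis=1))       # [2 7]: the predicted digits
print(scores.argmax(axis=1))          # [2 7]: same as for the raw scores
```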

Note that `validation_model` is a new `Transform` instance, but it contains the same objects as `train_model`, followed by a new parallel step of two transforms. All the transforms already in `train_model` are on `device` as assigned above, but the `validation_model` object itself is assigned the `cpu` device by default. To move it to (or assign it) the correct device, we call `pd_to(device)` again.

In [None]:
validation_model = (
    train_model
    >> padl.transform(lambda x: x.max(1).indices) / padl.Identity()
)

# We need to send the validation_model to device again
validation_model.pd_to(device)

In [None]:
accuracy = 0
for batch_output, batch_targets in validation_model.eval_apply(train_array, num_workers=0, batch_size=256):
    accuracy += (batch_targets == batch_output).sum()

accuracy = accuracy.item()/ train_array.shape[0]
print(f'Accuracy of model: {accuracy}')

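An accuracy of about `0.95` is not bad for a quickly trained model. Keep in mind that it is measured on the training data, so it likely overestimates the accuracy on unseen images.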

### 3.2 Inferring a few images from the test data (`test.csv`)

Although the test data has no labels, we can still run inference and check the predictions by eye. For that, we reuse the `simplenet` object trained through `train_model` and combine it with other transforms to build an inference model. Here, `padl.this.unsqueeze(1)` after `padl.Batchify()` adds the channel dimension, turning a batch of shape `(N, 28, 28)` into `(N, 1, 28, 28)`.

In [None]:
plot_infer_datapoint = reshape_load - 'image'

valid_preprocess =(
    padl.this.type(torch.FloatTensor)
    >> padl.this.reshape(28, 28)
)
infer_model = (
    valid_preprocess
    >> padl.Batchify()
    >> padl.this.unsqueeze(1) 
    >> simplenet
    >> padl.transform(lambda x: x.max(1).indices)
)
infer_model.pd_to(device)

In [None]:
for _ in range(5):
    data_point = test_array[np.random.randint(len(test_array))]
    plot_infer_datapoint(data_point)
    print(f'Prediction: {infer_model.infer_apply(data_point).item()}')
    print('-'*30)

<hr style="border:1px solid"> </hr>


### 4. Using further image augmentation on training

We can easily use some of the `torchvision.transforms` for image augmentation and add them to our image preprocessing to help with training. Let's add two augmentations: `GaussianBlur` and `RandomRotation`.
All we need to do is wrap these `torchvision.transforms` classes with `padl.transform` before instantiating them.

In [None]:
import torchvision.transforms as T

In [None]:
gaussian_blur = transform(T.GaussianBlur)(kernel_size=(3,3), sigma=0.1)

rotate_img = transform(T.RandomRotation)(degrees=(-15,15))
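To our understanding, torchvision's `GaussianBlur` builds a separable kernel by sampling a Gaussian at integer offsets around the centre and normalising it. The NumPy sketch below re-implements that idea (it is not torchvision's code) to show that with a 3x3 kernel and `sigma=0.1` the neighbouring weights are about $e^{-50} \approx 2 \times 10^{-22}$, so this blur leaves images essentially unchanged, whereas a larger sigma such as `1.0` blurs noticeably.

```python
import numpy as np


def gaussian_kernel1d(kernel_size, sigma):
    # Sample exp(-x^2 / (2 sigma^2)) at integer offsets around the centre,
    # then normalise so the weights sum to 1.
    half = (kernel_size - 1) / 2
    x = np.linspace(-half, half, kernel_size)
    pdf = np.exp(-0.5 * (x / sigma) ** 2)
    return pdf / pdf.sum()


weak = gaussian_kernel1d(3, sigma=0.1)      # the setting used above
moderate = gaussian_kernel1d(3, sigma=1.0)  # for comparison

print(weak)                   # centre weight ~1, neighbours ~2e-22
print(moderate)               # roughly [0.274 0.452 0.274]
print(np.outer(weak, weak))   # separable 3x3 kernel: almost an identity filter
```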

Now we can again use `padl`'s functional API to build an `image_augmentation` pipeline.

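Note: `torchvision.transforms` such as `GaussianBlur` expect image tensors with a channel dimension, `(C, H, W)`, but our grayscale images have shape `(28, 28)`. So we add a channel dimension with `unsqueeze(0)` and remove it again afterwards with `padl.this[0]`.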

In [None]:
image_augmentation = (
    padl.this.unsqueeze(0)
    >> rotate_img
    >> gaussian_blur
    >> padl.this[0]
)

### 4.1 Sample of image augmentation

Let's try the augmentation on one image.

In [None]:
out = infer_model[:2](test_array[0])
plot_infer_datapoint(out)

Augmented images: 

In [None]:
for _ in range(5):
    out_aug = image_augmentation(out)
    plot_infer_datapoint(out_aug)

### 4.2 Adding `image_augmentation` to `preprocess` and rebuilding `train_model`

In [None]:
preprocess_with_augmentation = (
    padl.this.type(torch.FloatTensor)
    >> padl.this[1:] + padl.this[0]
    >> padl.this.reshape(-1, 28, 28) / padl.Identity()
    >> image_augmentation / padl.Identity()
)


train_model = (
    preprocess_with_augmentation
    >> padl.Batchify()
    >> simplenet / padl.this.type(torch.long)
)


train_model.pd_to(device)

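### 4.3 Retraining the model with `image_augmentation`

Note that `simplenet` keeps the weights learned in section 3, so this continues training from those weights rather than starting from scratch.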

In [None]:
learning_rate = 0.01
momentum = 0.5
log_interval = 10
nepoch = 3
num_workers = 4

optimizer = optim.SGD(train_model.pd_parameters(), lr=learning_rate, momentum = momentum)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.95)



for epoch in range(nepoch):
    step_counter = 0
    for batch_output, batch_targets in train_model.train_apply(train_array, num_workers=num_workers, batch_size=256):

        optimizer.zero_grad()

        loss = F.nll_loss(batch_output, batch_targets)

        loss.backward()

        optimizer.step()
        exp_lr_scheduler.step()  # StepLR counts calls: stepping per batch decays the LR every 7 batches

        if step_counter % log_interval == 0:
            print(f'Epoch: {epoch}; Step: {step_counter}; loss: {loss.item():.4f}')
        step_counter += 1

### 4.4 Validate the model

In [None]:
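# Calculate accuracy. `validation_model` was built from the earlier `train_model`, so it applies
# the un-augmented `preprocess`, but it shares the retrained `simplenet` weights.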
accuracy = 0
for batch_output, batch_targets in validation_model.eval_apply(train_array, num_workers=0, batch_size=256):
    accuracy += (batch_targets == batch_output).sum()

accuracy = accuracy.item()/ train_array.shape[0]
print(f'Accuracy of model: {accuracy}')