# *Fundamentals of Machine Learning*
## Linear Regression and Classification Programming Report 2
---
18B09790 Tanpipat Kornvik

## 1. Training on load_digits
First, we import the necessary libraries. We also check whether we have access to a GPU.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from tqdm.auto import tqdm

# Use the GPU if one is available
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(f"device: {device}")


### 1.1 Import and Process Data
We import the data used in this problem from sklearn. We can also check the number of samples and the dimensionality (width*height of each image), as shown below.

In [None]:
from sklearn.datasets import load_digits

digits = load_digits()
print(digits.data.shape)


After that, we standardize the features and split the data 8:1:1 into training, validation, and test sets by calling train_test_split from sklearn twice, as shown below. In this process, we also reshape the data to add an extra dimension for the number of channels.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

x_all = digits.data
y_all = digits.target

# standardize each pixel feature to zero mean and unit variance
sc = StandardScaler()
x_std = sc.fit_transform(x_all)

# add a channel dimension: (N, 64) --> (N, 1, 8, 8)
x_all = x_std.reshape(-1,1,8,8)

# split 8:2 first, then split the remaining 20% in half (8:1:1 overall)
x_train, x_tmp, y_train, y_tmp = train_test_split(x_all,y_all,train_size=0.8)
x_valid, x_test, y_valid, y_test = train_test_split(x_tmp,y_tmp,train_size=0.5)
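
The cell above fits the scaler on all samples before splitting, so statistics from the validation and test samples leak slightly into the preprocessing. A minimal sketch of the alternative, computing the statistics on the training split only, is given here with NumPy on synthetic data. `standardize_fit` and `standardize_apply` are hypothetical helper names, not part of sklearn. Constant features get a scale of 1 to avoid division by zero.

```python
import numpy as np

def standardize_fit(x_train):
    """Compute per-feature mean and scale from the training split only."""
    mean = x_train.mean(axis=0)
    scale = x_train.std(axis=0)
    scale[scale == 0] = 1.0  # constant features: avoid division by zero
    return mean, scale

def standardize_apply(x, mean, scale):
    """Apply training-split statistics to any split."""
    return (x - mean) / scale

rng = np.random.default_rng(0)
x = rng.uniform(0, 16, size=(100, 64))  # synthetic stand-in for the 64 pixel features
x[:, 0] = 0.0                            # a constant feature, like an always-blank pixel
x_tr, x_te = x[:80], x[80:]

mean, scale = standardize_fit(x_tr)
x_tr_std = standardize_apply(x_tr, mean, scale)
x_te_std = standardize_apply(x_te, mean, scale)
```

With sklearn, the same idea would be to call fit_transform on the training split and transform on the other two.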

Next, we convert the data into torch tensors so that it can be used with torch.

In [None]:
x_train = torch.from_numpy(x_train).float()
y_train = torch.from_numpy(y_train).long()

x_valid = torch.from_numpy(x_valid).float()
y_valid = torch.from_numpy(y_valid).long()

x_test  = torch.from_numpy(x_test).float()
y_test = torch.from_numpy(y_test).long()

trainset = torch.utils.data.TensorDataset(x_train,y_train)
validset = torch.utils.data.TensorDataset(x_valid,y_valid)
testset  = torch.utils.data.TensorDataset(x_test,y_test)


Once we have the datasets, we specify the batch size and create the data loaders. The batch size is the number of training examples used in one iteration. There is no magic number for it; however, it has been observed in practice that a very large batch can hurt how well the trained model generalizes. In general, a batch size of 32 is a good starting point, and 64, 128, and 256 are also worth trying.

In [None]:
batch_size=32
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)
# shuffling is only needed for training; evaluation results do not depend on the order
validloader = torch.utils.data.DataLoader(validset, batch_size=batch_size, shuffle=False, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)
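
To make the effect of the batch size concrete, the sketch below counts the parameter updates per epoch for the candidate batch sizes. It assumes 1437 training samples, which is what an 80% split of the 1797 load_digits samples should give; `updates_per_epoch` is a hypothetical helper.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)

n_train = 1437  # assumed size of the 80% training split
counts = {bs: updates_per_epoch(n_train, bs) for bs in (32, 64, 128, 256)}
print(counts)
```

A larger batch therefore means far fewer updates per epoch, so the learning rate or the number of epochs may need to be retuned when the batch size changes.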


### 1.2 Define CNN and training function
Next, we define the network used for our classification problem. Following the instructions of the problem, we define the CNN with the configuration shown in the comments. Lastly, we send the created network to the device (the GPU, if available).

In [None]:
# (CNN, Pooling) x1 (4 filters 3x3, pooling 2x2), Feed Forward x1 Network
# Input 8x8 grayscale image
# 1x8x8 --> 4x6x6 (3x3 conv, no padding, 4ch)
# 4x6x6 --> 4x3x3 (2x2 max pooling)
# 4x3x3 --> 10 (full connection)

model1 = nn.Sequential(
    nn.Conv2d(1,4,3),  # input channels,output channels,kernel_size, 
    nn.ReLU(),
    nn.MaxPool2d(2,2), # define pooling size
    nn.Flatten(),
    nn.Linear(36,10), # apply feedforward in the end to 10 classes
)

model1 = model1.to(device) # send the network to GPU
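
The layer shapes follow from the usual output-size formula for a convolution or pooling layer, out = floor((in + 2*padding - kernel) / stride) + 1. A minimal sketch that traces the shapes of model1 with this formula; `out_size` is a hypothetical helper, not a PyTorch function.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

size = 8                                   # 8x8 input image
size = out_size(size, kernel=3)            # 3x3 conv, no padding: 8 -> 6
conv_size = size
size = out_size(size, kernel=2, stride=2)  # 2x2 max pooling: 6 -> 3
pool_size = size
flat_features = 4 * size * size            # 4 channels * 3 * 3 inputs to nn.Linear
print(conv_size, pool_size, flat_features)
```

The flattened size is what the first argument of nn.Linear has to match. It can also be checked empirically by passing a dummy tensor such as torch.zeros(1,1,8,8) through the layers.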

After that, we create the functions used to train and validate the model, defined as below.

In [None]:
def validate(model, criterion, validloader):
  size_valid = 0
  loss_valid = 0.0
  correct_valid = 0
  # gradients are not needed for evaluation
  with torch.no_grad():
    for inputs, labels in validloader:
      # send the data to the GPU
      inputs = inputs.to(device)
      labels = labels.to(device)

      # calculate output
      outputs = model(inputs)
      # calculate loss (mean over the batch)
      loss = criterion(outputs, labels)

      # accumulate statistics
      size_valid += inputs.shape[0]                 # number of samples seen so far
      loss_valid += loss.item() * inputs.shape[0]   # sum of per-sample losses
      _, predicted = torch.max(outputs, 1)          # predicted class = index of the largest output
      correct_valid += (predicted == labels).sum().item()  # number of correct predictions

  # average over the whole dataset once, after the loop
  loss_valid /= size_valid
  acc_valid = correct_valid / size_valid * 100
  return loss_valid, acc_valid

def train(model, criterion, optimizer, trainloader, validloader, epochs):
  # use tqdm to show a progress bar
  with tqdm(range(epochs)) as pbar:
    for epoch_index in pbar:
      model.train()  # back to training mode (the validation step below switches to eval mode)
      loss_train, size_train, correct_train = 0.0, 0, 0
      # train with each batch of the data
      for batch_index, (inputs, labels) in enumerate(trainloader):
        pbar.set_description("[Epoch %d/%d, train batch %d/%d]" % (epoch_index + 1, epochs, batch_index + 1, len(trainloader)))

        # send the data to the GPU for calculation
        inputs = inputs.to(device)
        labels = labels.to(device)

        # reset the gradients of the optimized parameters to 0
        optimizer.zero_grad()
        # calculate the output using the model
        outputs = model(inputs)
        # calculate the loss using the given loss function
        loss = criterion(outputs, labels)
        # backpropagate: compute the gradient of the loss for each parameter
        loss.backward()
        # update the weights using the computed gradients
        optimizer.step()

        # accumulate training statistics
        size_train += inputs.shape[0]                 # number of samples seen in this epoch
        loss_train += loss.item() * inputs.shape[0]   # sum of per-sample losses
        _, predicted = torch.max(outputs, 1)          # predicted class
        correct = (predicted == labels).sum().item()  # correct predictions in this batch
        correct_train += correct                      # add to the overall number of correct predictions
        pbar.set_postfix({'loss': ("%.3f" % (loss_train / size_train)),
                          'acc%': ("%.1f" % (correct_train / size_train * 100)),})
      # calculate the loss and accuracy of this epoch
      loss_train /= size_train
      acc_train = correct_train / size_train * 100
      # validate the trained model with validloader and calculate the accuracy
      model.eval()
      loss_valid, acc_valid = validate(model, criterion, validloader)

      if (epoch_index+1)%10 == 0:
        print("epoch %d, train loss %.4f, train acc %.1f%%, valid loss %.4f, valid acc %.1f%%" %
            (epoch_index + 1, loss_train, acc_train, loss_valid, acc_valid))

  print('Finished Training')
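
The loss returned by the criterion is the mean over one batch, which is why the code multiplies it by the batch size before accumulating. The toy numbers below are made up for illustration. They show that this recovers the true per-sample mean, while a plain average of the batch means is biased when the last batch is smaller.

```python
# per-sample losses for 5 samples, split into batches of size 2, 2 and 1
batches = [[1.0, 3.0], [2.0, 2.0], [10.0]]

total_loss, total_size = 0.0, 0
batch_means = []
for batch in batches:
    batch_mean = sum(batch) / len(batch)   # what a mean-reduced criterion returns
    batch_means.append(batch_mean)
    total_loss += batch_mean * len(batch)  # undo the mean: sum of per-sample losses
    total_size += len(batch)

weighted = total_loss / total_size                # 18 / 5
unweighted = sum(batch_means) / len(batch_means)  # (2 + 2 + 10) / 3
print(weighted, unweighted)
```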

Next, we define the loss function and the optimizer used to train the model.

In [None]:
import torch.optim as optim

# Use cross entropy as loss function
criterion = nn.CrossEntropyLoss()

# Use adam as optimizer
optimizer = optim.Adam(model1.parameters(), lr=0.001, betas=(0.9, 0.999))
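
For reference, cross-entropy on raw network outputs (logits) is the negative log-softmax probability of the true class, averaged over the batch. A NumPy sketch of that computation with made-up logits; `cross_entropy` here is a hypothetical re-implementation for illustration, not the PyTorch code.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of the true classes, computed from raw logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])
loss = cross_entropy(logits, labels)
print(loss)
```

This is also why the model ends with a plain nn.Linear and no softmax layer: nn.CrossEntropyLoss expects raw logits and applies the log-softmax internally.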

Finally, we train the model using the training function, loss function, optimizer, and processed dataset defined above. Here we set the number of epochs to 100.

In [None]:
epochs = 100
train(model1,criterion,optimizer, trainloader, validloader, epochs)

### 1.3 Calculate accuracy with test data
We can easily calculate the accuracy of the trained model on the test data by using the validate function defined earlier.

In [None]:
model1.eval()
loss_test, acc_test = validate(model1, criterion, testloader)
print(f"test_accuracy: {acc_test}%")

## 2. [Optional] Test with another model
### 2.1 Training data with new model
Using the same process as above, we create a different model, defined below, and train it on the same data. This time we also use a different optimizer to try to increase the accuracy.

In [None]:
# (CNN, Pooling) x2 (4 filters 3x3, pooling 2x2), Feed Forward x1 Network
# Input 8x8 grayscale image
# 1x8x8 --> 4x6x6 (3x3 conv, no padding, 4ch)
# 4x6x6 --> 4x3x3 (2x2 max pooling)
# 4x3x3 --> 30x1x1 (3x3 conv, no padding, 30ch)
# 30x1x1 --> 10 (full connection)

model2 = nn.Sequential(
    nn.Conv2d(1,4,3),  # input channels,output channels,kernel_size, 
    # nn.ReLU(),
    nn.MaxPool2d(2,2), # define pooling size
    nn.Conv2d(4,30,3),  # input channels,output channels,kernel_size, 
    nn.Flatten(),
    nn.Linear(30,10), # apply feedforward in the end to 10 classes
)

model2 = model2.to(device) # send the network to GPU

In [None]:
epochs = 100
optimizer2 = torch.optim.SGD(model2.parameters(), lr=0.001, momentum=0.9)
train(model2,criterion,optimizer2, trainloader, validloader, epochs)
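
With momentum, SGD keeps a running velocity of past gradients and steps along it. The sketch below applies the update v = momentum * v + grad, p = p - lr * v (the basic form, without dampening or Nesterov) to a one-dimensional quadratic. The objective, the starting point, and the helper name `sgd_momentum_step` are made up for illustration.

```python
def sgd_momentum_step(p, v, grad, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update on a scalar parameter."""
    v = momentum * v + grad  # accumulate the velocity
    p = p - lr * v           # step along the velocity
    return p, v

# minimize f(p) = (p - 3)^2, whose gradient is 2 * (p - 3)
p, v = 0.0, 0.0
for _ in range(5000):
    grad = 2.0 * (p - 3.0)
    p, v = sgd_momentum_step(p, v, grad)
print(p)
```

With momentum=0.9 the accumulated velocity makes each step roughly ten times larger than plain SGD with the same lr once the gradients point in a consistent direction.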

### 2.2 Evaluating the result
The result of the new model with the SGD optimizer is shown below.

In [None]:
model2.eval()
loss_test, acc_test = validate(model2, criterion, testloader)
print(f"test_accuracy: {acc_test}%")

We can clearly see that this model performs better than the previous one.

## 3. [Optional] Showing images that are misclassified

We can reuse the prediction code from the train function to get the outputs of model1 on the test data. This time, however, we look for the predictions that are incorrect.

In [None]:
model1.eval()
with torch.no_grad():                        # no gradients are needed for prediction
  output = model1(x_test.to(device))
labels = y_test.to(device)
_, predicted = torch.max(output, 1)          # output the prediction
incorrect = (predicted != labels)            # find the incorrect prediction

Once we know which predictions are wrong, we can map them back to the corresponding images and their true labels, as shown below.

In [None]:
fig, axs = plt.subplots(1,5,figsize=(15,10))
num_incor = 0 # variable to count number of incorrect prediction
for i, val in enumerate(incorrect):
  if(val): # for each incorrect prediction, plot the image 
    image = x_test[i].reshape(8,8).cpu().numpy() # i is the index of image that the prediction went wrong
    axs[num_incor].imshow(image, cmap=plt.cm.gray_r) 
    axs[num_incor].title.set_text(f'Real: {y_test[i]}, Predicted: {predicted[i]}')
    num_incor += 1
    if(num_incor>=5):
      break
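
The same lookup can be written without an explicit loop over the mask by turning the boolean mask into indices. A small NumPy sketch with made-up labels; in the notebook, the arrays would be the predictions and y_test moved to the CPU.

```python
import numpy as np

y_true = np.array([3, 8, 1, 9, 4, 7])   # made-up true labels
y_pred = np.array([3, 1, 1, 4, 4, 7])   # made-up predictions

wrong_idx = np.flatnonzero(y_pred != y_true)  # positions of the misclassified samples
pairs = [(int(y_true[i]), int(y_pred[i])) for i in wrong_idx]
print(wrong_idx, pairs)
```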

## 4. [Optional] Fashion MNIST database
### 4.1 Train the model
In this last part, we train on Fashion MNIST, which contains images of 10 different classes of clothing. Note that the image size is 28x28 here. We also normalize the images.

In [None]:
from torchvision import datasets, transforms

# define transformation to normalize data
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, ), (0.5, ))])

# download the train and test data (the test split is used as the validation set here)
batch_size=256
trainset = datasets.FashionMNIST(root="./", train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)
validset = datasets.FashionMNIST(root="./", train=False, download=True, transform=transform)
validloader = torch.utils.data.DataLoader(validset, batch_size=batch_size, shuffle=False, num_workers=2)
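
ToTensor scales the pixel values to [0, 1], and Normalize((0.5, ), (0.5, )) then applies (x - mean) / std per channel, which maps that range to [-1, 1]. A minimal sketch of the arithmetic on a few raw pixel values; `normalize_pixel` is a hypothetical helper that mimics the two transforms for a single value.

```python
def normalize_pixel(raw, mean=0.5, std=0.5):
    """Mimic ToTensor (divide by 255) followed by Normalize((0.5,), (0.5,))."""
    x = raw / 255.0          # ToTensor: uint8 [0, 255] -> float [0, 1]
    return (x - mean) / std  # Normalize: [0, 1] -> [-1, 1]

values = [normalize_pixel(raw) for raw in (0, 51, 255)]
print(values)
```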

The first few images from Fashion MNIST are shown below.

In [None]:
# Images
fig, ax = plt.subplots(1, 5, figsize=(15, 6), sharex='col', sharey='row')
for j in range(5):
    ax[j].imshow(trainset.data[j], cmap=plt.cm.gray_r)

We also know that each training and test example is assigned to one of the following labels:
*   0 T-shirt/top
*   1 Trouser
*   2 Pullover
*   3 Dress
*   4 Coat
*   5 Sandal
*   6 Shirt
*   7 Sneaker
*   8 Bag
*   9 Ankle boot

Thus we can create a mapping from index to text label by defining the following array.

In [None]:
text_labels = ["T-shirt/top","Trouser","Pullover","Dress","Coat","Sandal","Shirt","Sneaker","Bag","Ankle boot"]

Next, we define a more complex CNN to classify this dataset. It uses two convolutional layers with different kernel sizes.

In [None]:
# (CNN, Pooling) x2 (filters 3x3 and 5x5, pooling 2x2 and 3x3), Feed Forward x3 Network
# Input 28x28 grayscale image
# 1x28x28 --> 16x26x26 (3x3 conv, no padding, 16ch)
# 16x26x26 --> 16x13x13 (2x2 max pooling)
# 16x13x13 --> 32x9x9 (5x5 conv, no padding, 32ch)
# 32x9x9 --> 32x3x3 (3x3 max pooling)
# 32x3x3 --> 100 (full connection)
# 100 --> 40 (full connection)
# 40 --> 10 (full connection)

model3 = nn.Sequential(
    nn.Conv2d(1,16,3),  # input channels,output channels,kernel_size, 
    nn.ReLU(inplace=True), 
    nn.MaxPool2d(2,2), # define pooling size
    nn.Conv2d(16,32,5), 
    nn.ReLU(inplace=True), 
    nn.MaxPool2d(3,3),  
    nn.Flatten(),
    nn.Linear(32*3*3,100), 
    nn.ReLU(inplace=True), 
    nn.Linear(100,40), 
    nn.ReLU(inplace=True), 
    nn.Linear(40,10), # apply feedforward in the end to 10 classes
)

model3 = model3.to(device) # send the network to GPU
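
As a sanity check on the architecture, the number of trainable parameters can be worked out by hand: a convolution has in_ch * out_ch * k * k weights plus out_ch biases, and a linear layer has in_features * out_features weights plus out_features biases. The helpers below are hypothetical and only do this arithmetic for the layers of model3.

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a k x k convolution."""
    return in_ch * out_ch * k * k + out_ch

def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

layer_params = [
    conv_params(1, 16, 3),           # nn.Conv2d(1,16,3)
    conv_params(16, 32, 5),          # nn.Conv2d(16,32,5)
    linear_params(32 * 3 * 3, 100),  # nn.Linear(32*3*3,100)
    linear_params(100, 40),          # nn.Linear(100,40)
    linear_params(40, 10),           # nn.Linear(40,10)
]
total_params = sum(layer_params)
print(layer_params, total_params)
```

Most of the parameters sit in the first fully connected layer. The total can be compared against sum(p.numel() for p in model3.parameters()).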

In [None]:
epochs = 30
criterion = nn.CrossEntropyLoss()
optimizor3 = torch.optim.SGD(model3.parameters(),lr=0.05)
train(model3, criterion, optimizor3, trainloader, validloader, epochs)

### 4.2 Accuracy
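The accuracy on the Fashion MNIST test split is shown below. Note that this is the same split that was passed to train as validloader, since no separate validation set was held out here.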

In [None]:
model3.eval()
loss_test, acc_test = validate(model3, criterion, validloader)
print(f"test_accuracy: {acc_test}%")

### 4.3 Showing images that are misclassified
Since this dataset is much larger and is only accessed through a data loader, unlike the tensors in the previous section, predicting on the whole test set at once as before may not work well. Instead, we loop over the batches and extract the incorrect predictions, as in the following code.

In [None]:
for batch_index, (inputs, labels) in enumerate(validloader):
  # send the data to the GPU for calculation
  inputs = inputs.to(device)
  labels = labels.to(device)

  # calculate the output using the model
  outputs = model3(inputs)
  _, predicted = torch.max(outputs, 1)          # output the prediction
  incorrect = (predicted != labels)            # find the incorrect prediction
  num_incorrect = torch.sum(incorrect)
  correct_labels = labels[incorrect].cpu().detach().numpy()
  predicted_labels = predicted[incorrect].cpu().detach().numpy()
  imgs = inputs[incorrect].cpu().detach().numpy().reshape((-1,28,28))
  if num_incorrect>=5:
    fig, axs = plt.subplots(1,5,figsize=(20,10))
    for i in range(5):  # plot the first five misclassified images in this batch
      axs[i].imshow(imgs[i], cmap=plt.cm.gray_r)
      axs[i].title.set_text(f'Real: {text_labels[correct_labels[i]]}, Predicted: {text_labels[predicted_labels[i]]}')
    break