# CIS6930 Week 1: Deep Learning Basics (For students)

---

Preparation: Go to `Runtime > Change runtime type` and choose `GPU` for the hardware accelerator.



## PyTorch Basics

`torch.Tensor` is similar to `numpy.ndarray` in that it "wraps" vector/matrix values together with data type information. PyTorch implements many of the same operations as NumPy.

Let's look at how to create a `torch.Tensor`, move its data between the CPU and the GPU, and convert it into a `numpy.ndarray` or a `list`.

In [None]:
import numpy as np
import torch
torch.__version__

In [None]:
np.array([1, 2, 3])

In [None]:
torch.Tensor([1, 2, 3])

In [None]:
a = torch.Tensor([1, 2, 3]).dtype

In [None]:
a
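
A detail worth noting about the cells above: `torch.Tensor` is an alias for the default tensor type (`torch.FloatTensor`), so `torch.Tensor([1, 2, 3])` stores `float32` values even though the inputs are integers. The factory function `torch.tensor` infers the dtype from the data instead. A minimal sketch (the variable names are only illustrative):

```python
import torch

x_float = torch.Tensor([1, 2, 3])   # constructor: default floating-point dtype
x_int = torch.tensor([1, 2, 3])     # factory function: dtype inferred from the data
x_cast = torch.tensor([1, 2, 3], dtype=torch.float32)  # explicit dtype

print(x_float.dtype, x_int.dtype, x_cast.dtype)
```

This matters later in the notebook: the features are wrapped as float tensors, while the class labels must be integer (`long`) tensors for the loss function.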

In [None]:
# Requires a GPU runtime (see "Preparation" above)
device = torch.device("cuda")
a = torch.Tensor([1, 2, 3]).to(device)  # Load on GPU
b = torch.Tensor([4, 5, 6])             # Load on CPU

In [None]:
b

In [None]:
# This raises an error because `a` is on the GPU and `b` is on the CPU.
# Uncomment the line below to see the error.
#a + b

In [None]:
# GPU -> CPU
a.detach().cpu() + b
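
The same idea can be written so that it also runs on a machine without a GPU. The sketch below assumes nothing beyond `torch` itself: it falls back to the CPU when CUDA is unavailable and shows both directions of the fix (moving `b` to the device of `a`, or moving `a` back to the CPU).

```python
import torch

# Fall back to the CPU when no GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.Tensor([1, 2, 3]).to(device)
b = torch.Tensor([4, 5, 6])           # created on the CPU

# Option 1: bring `b` to the device where `a` lives
total_on_device = a + b.to(a.device)

# Option 2: bring `a` back to the CPU
total_on_cpu = a.cpu() + b

print(total_on_device.device, total_on_cpu.device)
```

Either option works; in a training loop it is usually more convenient to move the data to the model's device, as the experiment code later in this notebook does.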

In [None]:
# PyTorch Tensor -> NumPy Array
a.detach().cpu().numpy()

In [None]:
# PyTorch Tensor -> Built-in List
a.detach().cpu().tolist()
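
Going in the other direction (NumPy array to tensor), there are two common options that behave differently. As a small sketch: `torch.from_numpy` returns a tensor that shares memory with the array, while `torch.tensor` copies the data. The array contents below are made up for illustration.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0], dtype=np.float32)

shared = torch.from_numpy(arr)   # shares memory with `arr`
copied = torch.tensor(arr)       # copies the data

arr[0] = 100.0                   # modify the NumPy array in place

print(shared)  # reflects the change made to `arr`
print(copied)  # still holds the original values
```

Similarly, calling `.numpy()` on a CPU tensor returns an array that shares memory with the tensor, so in-place changes on one side are visible on the other.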

## Logistic Regression/Multi-layer Perceptron with PyTorch

Run the following code to conduct an experiment with `LogisticRegression`. Keep the default configuration for the first experiment. 

In [None]:
import random

import numpy as np
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.utils.data import TensorDataset, DataLoader

### Logistic Regression

Note that `forward()` does not need to apply `softmax()` because `nn.CrossEntropyLoss` applies log-softmax internally and expects raw scores (logits) as input. Please see the [PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html).

In [None]:
class LogisticRegression(nn.Module):
    def __init__(self,
                 num_input,
                 num_output):
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(num_input, num_output)

    def forward(self, X):
        out = self.linear(X)
        return out
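
To make the note about `softmax()` concrete, the sketch below checks on random logits (made-up values, not the digits data) that `nn.CrossEntropyLoss` gives the same result as log-softmax followed by the negative log-likelihood loss. It also checks that the predicted class is the same with or without softmax, since softmax preserves the ordering of the scores.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw model outputs: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])   # class indices

loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Same computation spelled out: log-softmax, then negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Softmax does not change which class has the highest score
probs = F.softmax(logits, dim=1)
same_pred = torch.equal(logits.argmax(1), probs.argmax(1))

print(torch.allclose(loss_ce, loss_manual), same_pred)
```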

### Experiment Code (Copy this block for the assignments)

In [None]:
## Configurations ======
n_epochs = 10
batch_size = 16

lr = 0.01
momentum = 0.

num_input = 64
num_output = 10

# Random Seeds
torch.manual_seed(0)
random.seed(0)
np.random.seed(0)
## ======================


# GPU configuration
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the handwritten digit dataset
# https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html
data = load_digits()
X, y = data.data, data.target

# Split into 60% train, 20% valid, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

X_train, X_valid, y_train, y_valid = train_test_split(
    X_train, y_train, test_size=0.25, random_state=1)  # 0.25 x 0.8 = 0.2

# NumPy array -> Torch tensor -> Dataset -> DataLoader
# for the train, validation, test datasets
dataset_train = TensorDataset(torch.Tensor(X_train),
                              torch.LongTensor(y_train))
dl_train = DataLoader(dataset_train,
                      batch_size=batch_size,
                      shuffle=True)

dataset_valid = TensorDataset(torch.Tensor(X_valid),
                              torch.LongTensor(y_valid))
dl_valid = DataLoader(dataset_valid)

dataset_test = TensorDataset(torch.Tensor(X_test),
                              torch.LongTensor(y_test))
dl_test = DataLoader(dataset_test)

# Model, Optimizer, Loss function
model = LogisticRegression(num_input=num_input,
                           num_output=num_output).to(device)
optimizer = optim.SGD(model.parameters(),
                      lr=lr, momentum=momentum)

loss_fn = nn.CrossEntropyLoss()

# For each epoch
eval_list = []
for n in range(n_epochs):
    print("Epoch {}".format(n))
    # Training
    train_loss = 0.
    train_pred_list = []
    train_true_list = []
    model.train()  # Switch to the training mode

    # For each batch
    for batch in dl_train:
        optimizer.zero_grad()              # Initialize gradient information
        X, y = batch
        out = model(X.to(device))          # Call `forward()` function of the model
        loss = loss_fn(out, y.to(device))  # Calculate loss 
        loss.backward()                    # Backpropagate the loss value
        optimizer.step()                   # Update the parameters
        
        train_loss += loss.item() * X.size(0)  # Batch-mean loss x actual batch size
        train_pred_list += out.argmax(1).detach().cpu().tolist()
        train_true_list += y.detach().cpu().tolist()

    train_loss /= len(dataset_train)  # Mean loss per training sample
    train_acc = accuracy_score(train_true_list, train_pred_list)
    print("    Training loss: {:.4f}\t  Training acc: {:.4f}".format(train_loss, train_acc))

    # Validation
    valid_loss = 0.
    valid_pred_list = []
    valid_true_list = []

    model.eval()  # Switch to the evaluation mode
    for batch in dl_valid:
        X, y = batch
        out = model(X.to(device))
        loss = loss_fn(out, y.to(device))
        valid_loss += loss.item() * X.size(0)  # Batch-mean loss x actual batch size
        valid_pred_list += out.argmax(1).detach().cpu().tolist()
        valid_true_list += y.detach().cpu().tolist()

    valid_loss /= len(dataset_valid)  # Mean loss per validation sample
    valid_acc = accuracy_score(valid_true_list, valid_pred_list)
    print("  Validation loss: {:.4f}\tValidation acc: {:.4f}".format(valid_loss, valid_acc))
    # Store train/validation loss, accuracy values
    eval_list.append([n, train_loss, train_acc, valid_loss, valid_acc])

eval_df = pd.DataFrame(eval_list, columns=["epoch", "train_loss", "train_acc",
                                           "valid_loss", "valid_acc"])

# Test
model.eval()
pred_list = []
true_list = []
for batch in dl_test:
    X, y = batch
    out = model(X.to(device))
    pred = out.argmax().item()
    pred_list.append(pred)
    true_list.append(y.item())
y_pred = np.array(pred_list)
y_true = np.array(true_list)

test_accuracy = accuracy_score(y_true, y_pred)
print("\nTest accuracy: {:.4f}".format(test_accuracy))

eval_df[["train_loss", "valid_loss"]].plot()
eval_df[["train_acc", "valid_acc"]].plot()

# Please do NOT directly edit the code above but copy the code to blocks below
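
The validation and test loops above process one sample at a time (the default `batch_size` of `DataLoader` is 1) and keep gradient tracking enabled. One possible refactoring, shown only as a sketch, is a helper that evaluates in batches under `torch.no_grad()` and averages the loss per sample. The name `evaluate` and the toy data are hypothetical and not part of the assignment code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, data_loader, loss_fn, device):
    """Return (mean loss per sample, accuracy). Hypothetical helper."""
    model.eval()
    total_loss, n_correct, n_samples = 0.0, 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for X, y in data_loader:
            X, y = X.to(device), y.to(device)
            out = model(X)
            # loss_fn returns the batch mean, so weight it by the actual batch size
            total_loss += loss_fn(out, y).item() * X.size(0)
            n_correct += (out.argmax(1) == y).sum().item()
            n_samples += X.size(0)
    return total_loss / n_samples, n_correct / n_samples


# Toy data standing in for the digits dataset (random values, illustration only)
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
X_toy = torch.randn(50, 64)
y_toy = torch.randint(0, 10, (50,))
dl_toy = DataLoader(TensorDataset(X_toy, y_toy), batch_size=16)  # last batch has 2 samples

model_toy = nn.Linear(64, 10).to(device)
toy_loss, toy_acc = evaluate(model_toy, dl_toy, nn.CrossEntropyLoss(), device)
print("loss: {:.4f}  acc: {:.4f}".format(toy_loss, toy_acc))
```

Weighting each batch loss by `X.size(0)` keeps the average correct even when the last batch is smaller than `batch_size`.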

# Assignments (due Fri 9/17)

- Quiz 1. Complete the `MLP` class and run the same experiment with the MLP model. Use the default hidden layer size and the original random seed (i.e., `0`). Do you see better performance? How do `{train|valid}_loss` and `{train|valid}_acc` compare with those for `LogisticRegression`?

- Quiz 2. Replace the logistic sigmoid function with ReLU in `MLP` (`MLPReLU`) and run the same experiment. Discuss the difference.

- Quiz 3. Try a smaller learning rate `lr` (e.g., `0.001`) and a larger `n_epochs` (e.g., `100`) for training (with `SGD` and `MLP`). Is the performance more or less stable? Discuss why.

- Quiz 4. Replace `SGD` with another optimizer, `Adam`, and run the same experiment (`n_epochs=10`). Look at [torch.optim](https://pytorch.org/docs/stable/optim.html) for the details of `Adam`. Discuss the results.

- Quiz 5. Summarize the test accuracies of the experiments. Discuss the results. Which configuration is the best? Why? (Hint: Is this a random effect? If you think so, how can you remove the randomness from the experiments?)


After execution, keep the figures/results as they are and print the Colab notebook as PDF. 

## Quiz 1

Complete the `MLP` class and run the same experiment with the MLP model. Use the default hidden layer size and the original random seed (i.e., `0`). Do you see better performance? How do `{train|valid}_loss` and `{train|valid}_acc` compare with those for `LogisticRegression`?


In [None]:
class MLP(nn.Module):
    def __init__(self,
                 num_input,
                 num_output,
                 num_hidden=16):
        super(MLP, self).__init__()
        self.linear = nn.Linear(_, _)  ## COMPLETE CODE ##
        self.sigmoid =                 ## COMPLETE CODE ##
        self.hidden = nn.Linear(_, _)  ## COMPLETE CODE ##

    def forward(self, X):
        out = self.linear(X)
        ## ADD 1 LINE HERE ##
        ## ADD 1 LINE HERE ##
        return out

In [None]:
## Paste the experiment code here and modify it

## Configurations ======
n_epochs = 10
batch_size = 16

lr = 0.01
momentum = 0.

num_input = 64
num_output = 10

# Random Seeds
torch.manual_seed(0)
random.seed(0)
np.random.seed(0)
## ======================


# GPU configuration
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the handwritten digit dataset
# https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html
data = load_digits()
X, y = data.data, data.target

# Split into 60% train, 20% valid, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

X_train, X_valid, y_train, y_valid = train_test_split(
    X_train, y_train, test_size=0.25, random_state=1)  # 0.25 x 0.8 = 0.2

# NumPy array -> Torch tensor -> Dataset -> DataLoader
# for the train, validation, test datasets
dataset_train = TensorDataset(torch.Tensor(X_train),
                              torch.LongTensor(y_train))
dl_train = DataLoader(dataset_train,
                      batch_size=batch_size,
                      shuffle=True)

dataset_valid = TensorDataset(torch.Tensor(X_valid),
                              torch.LongTensor(y_valid))
dl_valid = DataLoader(dataset_valid)

dataset_test = TensorDataset(torch.Tensor(X_test),
                              torch.LongTensor(y_test))
dl_test = DataLoader(dataset_test)

# Model, Optimizer, Loss function
model = LogisticRegression(num_input=num_input,
                           num_output=num_output).to(device)
optimizer = optim.SGD(model.parameters(),
                      lr=lr, momentum=momentum)

loss_fn = nn.CrossEntropyLoss()

# For each epoch
eval_list = []
for n in range(n_epochs):
    print("Epoch {}".format(n))
    # Training
    train_loss = 0.
    train_pred_list = []
    train_true_list = []
    model.train()  # Switch to the training mode

    # For each batch
    for batch in dl_train:
        optimizer.zero_grad()              # Initialize gradient information
        X, y = batch
        out = model(X.to(device))          # Call `forward()` function of the model
        loss = loss_fn(out, y.to(device))  # Calculate loss 
        loss.backward()                    # Backpropagate the loss value
        optimizer.step()                   # Update the parameters
        
        train_loss += loss.item() * X.size(0)  # Batch-mean loss x actual batch size
        train_pred_list += out.argmax(1).detach().cpu().tolist()
        train_true_list += y.detach().cpu().tolist()

    train_loss /= len(dataset_train)  # Mean loss per training sample
    train_acc = accuracy_score(train_true_list, train_pred_list)
    print("    Training loss: {:.4f}\t  Training acc: {:.4f}".format(train_loss, train_acc))

    # Validation
    valid_loss = 0.
    valid_pred_list = []
    valid_true_list = []

    model.eval()  # Switch to the evaluation mode
    for batch in dl_valid:
        X, y = batch
        out = model(X.to(device))
        loss = loss_fn(out, y.to(device))
        valid_loss += loss.item() * X.size(0)  # Batch-mean loss x actual batch size
        valid_pred_list += out.argmax(1).detach().cpu().tolist()
        valid_true_list += y.detach().cpu().tolist()

    valid_loss /= len(dataset_valid)  # Mean loss per validation sample
    valid_acc = accuracy_score(valid_true_list, valid_pred_list)
    print("  Validation loss: {:.4f}\tValidation acc: {:.4f}".format(valid_loss, valid_acc))
    # Store train/validation loss, accuracy values
    eval_list.append([n, train_loss, train_acc, valid_loss, valid_acc])

eval_df = pd.DataFrame(eval_list, columns=["epoch", "train_loss", "train_acc",
                                           "valid_loss", "valid_acc"])

# Test
model.eval()
pred_list = []
true_list = []
for batch in dl_test:
    X, y = batch
    out = model(X.to(device))
    pred = out.argmax().item()
    pred_list.append(pred)
    true_list.append(y.item())
y_pred = np.array(pred_list)
y_true = np.array(true_list)

test_accuracy = accuracy_score(y_true, y_pred)
print("\nTest accuracy: {:.4f}".format(test_accuracy))

eval_df[["train_loss", "valid_loss"]].plot()
eval_df[["train_acc", "valid_acc"]].plot()

## Quiz 2

Replace the logistic sigmoid function with ReLU in `MLP` (`MLPReLU`) and run the same experiment. Discuss the difference.


In [None]:
class MLPReLU(nn.Module):
    def __init__(self,
                 num_input,
                 num_output,
                 num_hidden=16):
        super(MLPReLU, self).__init__()
        self.linear = nn.Linear(_, _)  ## COMPLETE CODE ##
        self.relu =                    ## COMPLETE CODE ##
        self.hidden = nn.Linear(_, _)  ## COMPLETE CODE ##

    def forward(self, X):
        out = self.linear(X)
        ## ADD 1 LINE HERE ##
        ## ADD 1 LINE HERE ##
        return out

In [None]:
## Paste the experiment code here and modify it

## Quiz 3

Try a smaller learning rate `lr` (e.g., `0.001`) and a larger `n_epochs` (e.g., `100`) for training (with `SGD` and `MLPReLU`). Is the performance more or less stable? Discuss why.

In [None]:
## Paste the experiment code here and modify it


## Quiz 4

Replace `SGD` with another optimizer, `Adam`, and run the same experiment (`n_epochs=10`). Look at [torch.optim](https://pytorch.org/docs/stable/optim.html) for the details of `Adam`. Discuss the results.

In [None]:
## Paste the experiment code here and modify it

## Quiz 5

Summarize the test accuracies of the experiments. Discuss the results. Which configuration is the best? Why? (Hint: Is this a random effect? If you think so, how can you remove the randomness from the experiments?)

(Write your answer here)