**NOTE: This notebook is written for the Google Colab platform, which provides free hardware acceleration. However, it can also be run (possibly with minor modifications) as a standard Jupyter notebook using a local GPU.** 



In [None]:
#@title -- Installation of Packages -- { display-mode: "form" }
import sys
!{sys.executable} -m pip install git+https://github.com/michalgregor/class_utils.git

In [None]:
#@title -- Import of Necessary Packages -- { display-mode: "form" }
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OrdinalEncoder, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score
import torch.nn as nn
import torch

In [None]:
#@title -- Downloading Data -- { display-mode: "form" }
from class_utils.download import download_file_maybe_extract
download_file_maybe_extract("https://www.dropbox.com/s/v3ptdkv5fvmx5zk/iris.csv?dl=1", directory="data")

# also create a directory for storing any outputs
import os
os.makedirs("output", exist_ok=True)

## Neural Network Classifiers

This notebook applies a neural network built with the `PyTorch` Python package to a simple classification task. We will show how such a network can be created and trained. The architecture will be very simple: no convolutional layers, batch normalization, or anything like that.

### The Dataset

In this example, we will again be using the Iris dataset, with which we are very familiar by now. We will now load it from the CSV file and split it into the train and test folds:



In [None]:
#@title -- Loading and Splitting the dataset df_train, df_test -- { display-mode: "form" }

# we load the data from the CSV
df = pd.read_csv("data/iris.csv")
display(df.head())

# we split it into train and test, stratifying by species
df_train, df_test = train_test_split(df, test_size=0.25,
                                     stratify=df['species'],
                                     random_state=4)

As usual, we sort the columns into categorical, numerical and output.



In [None]:
categorical_inputs = []
numeric_inputs = list(df.columns[:-1])
output = ["species"]

The preprocessing we have used so far re-encodes categorical attributes as numbers by assigning an integer to each unique value of the attribute (using the `OrdinalEncoder` transformer). For neural networks, one-hot encoding is usually more suitable: each categorical column gets as many input neurons as it has distinct values, and exactly one of them is active for any given sample. This kind of preprocessing is provided by the `OneHotEncoder` transformer. The preprocessing of numeric values can remain unchanged.
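
To make the difference between the two encodings concrete, here is a minimal sketch on a made-up categorical column. The `colors` array is purely hypothetical; the Iris dataset itself has no categorical inputs.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# A hypothetical categorical column, used only for illustration.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# OrdinalEncoder maps each category to a single integer code
# (categories are sorted, so blue=0, green=1, red=2).
ordinal = OrdinalEncoder().fit_transform(colors)

# OneHotEncoder creates one column per category; fit_transform returns
# a sparse matrix by default, so we convert it to a dense array.
onehot = OneHotEncoder().fit_transform(colors).toarray()

print(ordinal.ravel())
print(onehot)
```

Each row of `onehot` contains exactly one 1, in the column of the category that row belongs to.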

We must also remember to convert the arrays into PyTorch tensors with appropriate datatypes: 32-bit floats for the inputs and long integers for the class labels (outputs). We also pick a device at this point, the same way we did in the previous notebooks.



In [None]:
input_preproc = make_column_transformer(
    (make_pipeline(
        SimpleImputer(strategy='constant', fill_value='MISSING'),
        OneHotEncoder()),
     categorical_inputs),
    
    (make_pipeline(
        SimpleImputer(),
        StandardScaler()),
     numeric_inputs)
)

In [None]:
output_preproc = OrdinalEncoder()

X_train = input_preproc.fit_transform(df_train[categorical_inputs+numeric_inputs])
Y_train = output_preproc.fit_transform(df_train[output]).reshape(-1)

X_test = input_preproc.transform(df_test[categorical_inputs+numeric_inputs])
Y_test = output_preproc.transform(df_test[output]).reshape(-1)

device = "cuda" if torch.cuda.is_available() else "cpu"

X_train = torch.as_tensor(X_train, dtype=torch.float32).to(device)
Y_train = torch.as_tensor(Y_train, dtype=torch.long).to(device)
X_test = torch.as_tensor(X_test, dtype=torch.float32).to(device)
Y_test = torch.as_tensor(Y_test, dtype=torch.long).to(device)

### Creating the Neural Network

Our neural network will be very similar to that used for regression. The number of inputs will once again equal the number of columns in our dataset, while the number of outputs will now, of course, equal the number of classes, since the network returns one value per class, from which the class probabilities are obtained.

You will recall that in classifiers, we generally use the softmax function as the activation function of the output layer. This function ensures that the outputs of the last layer always sum to 1, so that they can be interpreted as properly normalized probabilities. It also applies a nonlinear transformation that makes it easier to get probabilities close to 1.

**ATTENTION: In the PyTorch framework, the softmax function is part of the `nn.CrossEntropyLoss` loss function, so we DO NOT ADD IT AT THE END OF OUR MODEL! We leave the last layer linear.** 
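
The point above can be checked numerically. The following sketch uses randomly generated stand-in logits and targets (not the Iris data) and compares `nn.CrossEntropyLoss` on raw logits with `nn.NLLLoss` applied to an explicit log-softmax:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in values for illustration: 4 samples, 3 classes.
logits = torch.randn(4, 3)             # raw outputs of a linear last layer
targets = torch.tensor([0, 2, 1, 2])   # class indices (dtype long)

# CrossEntropyLoss expects raw logits and applies log-softmax internally.
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# The same quantity computed in two explicit steps.
loss_manual = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_ce, loss_manual))
```

Adding a softmax layer to the model on top of this would therefore normalize the outputs twice.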



In [None]:
class Net(nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super().__init__()
        # two hidden layers with 50 units each
        self.fc1 = nn.Linear(num_inputs, 50)
        self.fc2 = nn.Linear(50, 50)
        # linear output layer: one logit per class, no softmax
        self.fc3 = nn.Linear(50, num_outputs)

    def forward(self, x):
        y = self.fc1(x)
        y = torch.relu(y)

        y = self.fc2(y)
        y = torch.relu(y)

        # raw logits; softmax is applied inside nn.CrossEntropyLoss
        y = self.fc3(y)

        return y

---
### Task 1: Training the Network

**In the cell below, complete the training loop and train the neural network.** 

The training loop is going to be pretty much the same as for regression, except that we now use `nn.CrossEntropyLoss` as the loss function.

---
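
For orientation, one possible shape of such a training loop is sketched below on synthetic stand-in data rather than on the Iris tensors. The `nn.Sequential` model, the Adam optimizer, the learning rate, and the number of epochs are all assumptions made for illustration, not necessarily the intended solution.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in data (assumption): 90 samples, 4 features, 3 classes.
X = torch.randn(90, 4)
Y = X[:, :3].argmax(dim=1)   # labels derived from the inputs, dtype long

# Small stand-in model; the last layer is linear (no softmax).
model = nn.Sequential(nn.Linear(4, 50), nn.ReLU(), nn.Linear(50, 3))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # assumed choice

num_epochs = 200   # assumed value
loss_train = []

model.train()
for epoch in range(num_epochs):
    optimizer.zero_grad()            # clear gradients from the previous step
    logits = model(X)                # forward pass on the full batch
    loss = criterion(logits, Y)      # cross-entropy computed on raw logits
    loss.backward()                  # backpropagate
    optimizer.step()                 # update the weights
    loss_train.append(loss.item())   # record the loss for plotting
```

The list `loss_train` holds one loss value per epoch, which is the form a loss-curve plot expects.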


In [None]:
num_inputs = X_train.shape[1]
num_outputs = len(np.unique(Y_train.cpu()))

model = Net(num_inputs, num_outputs)
model.to(device)

criterion = nn.CrossEntropyLoss()



# ----




In [None]:
plt.plot(loss_train)
plt.xlabel("epoch")
plt.ylabel("loss")
plt.grid(ls='--')

### Testing

Now we are ready to test the performance again. We need to remember to put the model into evaluation mode using `model.eval()` first, and to run the model inside `torch.no_grad()` so that no computational graph is built.

Note that, since our network does not contain the final softmax layer, the values it returns are logits rather than normalized probabilities. To get the class labels, we run `argmax` on them and thus identify the most probable class. Working with logits makes no difference here, because softmax preserves the ordering of the values, so the maximum is in the same position.
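
A small sketch can confirm that taking `argmax` of the logits gives the same labels as taking `argmax` of the softmax probabilities. The logits here are random stand-in values, not outputs of the trained model.

```python
import torch

torch.manual_seed(0)

# Hypothetical raw network outputs: 5 samples, 3 classes.
logits = torch.randn(5, 3)

with torch.no_grad():
    # Explicit softmax turns the logits into normalized probabilities.
    probs = torch.softmax(logits, dim=1)

# Softmax is monotonic, so the position of the maximum does not change.
print(torch.equal(logits.argmax(dim=1), probs.argmax(dim=1)))

# Each row of probabilities sums to 1.
print(probs.sum(dim=1))
```

If the probabilities themselves are needed, for example to report the confidence of a prediction, `torch.softmax` can be applied to the logits in the same way.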

#### On Training Data



In [None]:
model.eval()
with torch.no_grad():
    y_train_logit = model(X_train)
    y_train = y_train_logit.argmax(dim=1)

In [None]:
Y_train_cpu = Y_train.cpu()
y_train_cpu = y_train.cpu()

cm = pd.crosstab(
    output_preproc.inverse_transform(
        Y_train_cpu.reshape(-1, 1)).reshape(-1),
    output_preproc.inverse_transform(
        y_train_cpu.reshape(-1, 1)).reshape(-1),
    rownames=['actual'],
    colnames=['predicted']
)
print(cm)

In [None]:
acc = accuracy_score(Y_train_cpu, y_train_cpu)
print("Accuracy = {}".format(acc))

#### On Testing Data



In [None]:
model.eval()
with torch.no_grad():
    y_test_logit = model(X_test)
    y_test = y_test_logit.argmax(dim=1)

In [None]:
Y_test_cpu = Y_test.cpu()
y_test_cpu = y_test.cpu()

cm = pd.crosstab(
    output_preproc.inverse_transform(
        Y_test_cpu.reshape(-1, 1)).reshape(-1),
    output_preproc.inverse_transform(
        y_test_cpu.reshape(-1, 1)).reshape(-1),
    rownames=['actual'],
    colnames=['predicted']
)
print(cm)

In [None]:
acc = accuracy_score(Y_test_cpu, y_test_cpu)
print("Accuracy = {}".format(acc))