# Introduction

<div class="alert alert-warning">
<font color=black>

**What?** How to Develop an MLP for Multiclass Classification

</font>
</div>

# Theoretical recall: ML life cycle

<div class="alert alert-block alert-info">
<font color=black><br>

The five steps in the life-cycle are as follows:

1. Prepare the Data.
2. Define the Model.
3. Train the Model.
4. Evaluate the Model.
5. Make Predictions.

<br></font>
</div>

# Import modules

In [None]:
from numpy import vstack
from numpy import argmax
from pandas import read_csv
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
from torch import Tensor
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torch.utils.data import random_split
from torch.nn import Linear
from torch.nn import ReLU
from torch.nn import Softmax
from torch.nn import Module
from torch.optim import SGD
from torch.nn import BCELoss
from torch.nn import CrossEntropyLoss
from torch.nn.init import kaiming_uniform_
from torch.nn.init import xavier_uniform_

# Load the dataset

<div class="alert alert-block alert-info">
<font color=black><br>

- We use the Iris flowers multiclass classification dataset.
- The problem is to predict the species of an iris flower from four measurements of the flower.
- PyTorch provides the **Dataset** class, which you can extend and customize to load your dataset.
- Your subclass implements the **\_\_len\_\_()** function, which returns the length of the dataset.
- It also implements the **\_\_getitem\_\_()** function, which returns a specific sample by index.
- PyTorch provides the **DataLoader** class to iterate over a Dataset instance in mini-batches during the training and evaluation of your model. We create one DataLoader for the training set and one for the test set; a validation set would get its own DataLoader in the same way.

<br></font>
</div>
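
The Dataset and DataLoader roles described above can be illustrated on a small in-memory example. This is only a sketch: `ToyDataset`, its 10 made-up rows, and the split sizes are assumptions for illustration and are not part of the Iris workflow below.

```python
import numpy as np
from torch.utils.data import Dataset, DataLoader, random_split

class ToyDataset(Dataset):
    """Hypothetical in-memory dataset: 10 rows, 4 features, 3 classes."""
    def __init__(self):
        self.X = np.arange(40, dtype='float32').reshape(10, 4)
        self.y = (np.arange(10) % 3).astype('int64')

    # number of rows in the dataset
    def __len__(self):
        return len(self.X)

    # one (features, label) sample at a given index
    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

toy = ToyDataset()
# split the 10 rows into 7 training rows and 3 test rows
toy_train, toy_test = random_split(toy, [7, 3])
# the DataLoader groups samples into mini-batches of at most 4 rows
toy_dl = DataLoader(toy_train, batch_size=4, shuffle=True)
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in toy_dl]
print(len(toy_train), len(toy_test))
print(batch_shapes)
```

The last batch is smaller because 7 rows do not divide evenly into batches of 4.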

In [None]:
class CSVDataset(Dataset):
    """load the dataset
    """
    def __init__(self, path):
        # load the csv file as a dataframe
        df = read_csv(path, header=None)
        # store the inputs and outputs
        self.X = df.values[:, :-1]
        self.y = df.values[:, -1]
        
        # ensure input data is floats
        self.X = self.X.astype('float32')
        # label encode the target as integer class indices (0, 1, 2),
        # which is the format CrossEntropyLoss expects
        self.y = LabelEncoder().fit_transform(self.y).astype('int64')
 
    # number of rows in the dataset
    def __len__(self):
        return len(self.X)
 
    # get a row at an index
    def __getitem__(self, idx):
        return [self.X[idx], self.y[idx]]
 
    # get indexes for train and test rows
    def get_splits(self, n_test=0.33):
        # determine sizes
        test_size = round(n_test * len(self.X))
        train_size = len(self.X) - test_size
        # calculate the split
        return random_split(self, [train_size, test_size])

In [None]:
# prepare the dataset
def prepare_data(path):
    # load the dataset
    dataset = CSVDataset(path)
    # calculate split
    train, test = dataset.get_splits()
    # prepare data loaders
    train_dl = DataLoader(train, batch_size=32, shuffle=True)
    test_dl = DataLoader(test, batch_size=1024, shuffle=False)
    return train_dl, test_dl

In [None]:
# prepare the data
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv'
#path = '../DATASETS/ionosphere_V2.csv'
train_dl, test_dl = prepare_data(path)
print(len(train_dl.dataset), len(test_dl.dataset))

# Building the MLP model

<div class="alert alert-block alert-info">
<font color=black><br>

- It is good practice to pair a non-linear activation with a matching weight initialisation: here, ReLU with He (Kaiming) initialisation in the hidden layers and Xavier (Glorot) initialisation in the output layer.
- This combination helps combat the problem of vanishing gradients.
- The **PyTorch idiom** to define a model is to define the layers in the class constructor and then use the **forward()** function to define how to propagate the inputs through the layers.
- Given that it is a **multiclass** classification, the model must have one node for each class in the output layer and use the softmax activation function.
- The loss function is the cross entropy, which is appropriate for integer encoded class labels (e.g. 0 for one class, 1 for the next class, etc.). Note that PyTorch's **CrossEntropyLoss** applies log-softmax internally and is normally given raw scores; keeping the softmax layer in the model still trains, but the softmax is then applied twice.

<br></font>
</div>
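
To make the point about integer encoded labels concrete, here is a minimal sketch of how `CrossEntropyLoss` consumes them. The scores and labels are made-up values, chosen only for illustration. The loss is compared with a manual calculation: the mean negative log-probability of the correct class, where the probabilities come from a softmax over the raw scores.

```python
import torch
from torch.nn import CrossEntropyLoss
from torch.nn.functional import log_softmax

# hypothetical raw scores for 2 samples and 3 classes
scores = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
# integer encoded class labels, one per sample (not one-hot vectors)
labels = torch.tensor([0, 2])

# loss as computed by PyTorch (mean over the batch by default)
loss = CrossEntropyLoss()(scores, labels)

# manual version: negative log-probability of the correct class, averaged
log_probs = log_softmax(scores, dim=1)
manual = -(log_probs[0, 0] + log_probs[1, 2]) / 2
print(float(loss), float(manual))
```

Both values agree, which shows that the labels act as indices into the per-class scores rather than as numeric targets.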

In [None]:
# model definition
class MLP(Module):
    # Definition of the model elements in the class constructor
    def __init__(self, n_inputs):
        super(MLP, self).__init__()
        # input to first hidden layer
        self.hidden1 = Linear(n_inputs, 10)
        kaiming_uniform_(self.hidden1.weight, nonlinearity='relu')
        self.act1 = ReLU()
        # second hidden layer
        self.hidden2 = Linear(10, 8)
        kaiming_uniform_(self.hidden2.weight, nonlinearity='relu')
        self.act2 = ReLU()
        # third hidden layer and output
        self.hidden3 = Linear(8, 3)
        xavier_uniform_(self.hidden3.weight)
        self.act3 = Softmax(dim=1)
 
    # Forward propagate input
    def forward(self, X):
        # input to first hidden layer
        X = self.hidden1(X)        
        X = self.act1(X)        
        # second hidden layer
        X = self.hidden2(X)
        X = self.act2(X)
        # output layer
        X = self.hidden3(X)
        X = self.act3(X)
        return X

In [None]:
# 4 is the number of input features
model = MLP(4)

# Train the model

<div class="alert alert-block alert-info">
<font color=black><br>

- **CrossEntropyLoss:** Cross-entropy loss for multiclass classification with integer encoded class labels.
- **SGD:** Stochastic gradient descent is used for optimization.
- First, a loop is required for the number of training epochs. Then an inner loop is required for the mini-batches for stochastic gradient descent.

Each update to the model follows the **same** general pattern:

1. Clearing the last error gradient.
2. A forward pass of the input through the model.
3. Calculating the loss for the model output.
4. Backpropagating the error through the model.
5. Updating the model in an effort to reduce loss.

<br></font>
</div>
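
The five-step update pattern can be tried in isolation on a toy problem. The sketch below is an assumption-laden illustration: it uses random made-up data, a single `Linear` layer named `toy_model` instead of the MLP, and full-batch updates, and it records the loss at every step so the effect of the updates is visible.

```python
import torch
from torch.nn import Linear, CrossEntropyLoss
from torch.optim import SGD

torch.manual_seed(0)
# hypothetical data: 30 rows, 4 features, 3 classes
X = torch.randn(30, 4)
y = torch.randint(0, 3, (30,))

toy_model = Linear(4, 3)
criterion = CrossEntropyLoss()
optimizer = SGD(toy_model.parameters(), lr=0.1)

losses = []
for step in range(20):
    optimizer.zero_grad()        # 1. clear the last error gradient
    yhat = toy_model(X)          # 2. forward pass
    loss = criterion(yhat, y)    # 3. calculate the loss
    loss.backward()              # 4. backpropagate the error
    optimizer.step()             # 5. update the weights
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The loss after the last step is lower than at the first step, even though random labels cannot be fit well.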

In [None]:
# train the model
def train_model(train_dl, model):
    # define the loss and the optimizer; cross entropy suits multiclass targets
    criterion = CrossEntropyLoss()
    optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
    # enumerate epochs
    for epoch in range(500):
        # enumerate mini batches
        for i, (inputs, targets) in enumerate(train_dl):
            # CrossEntropyLoss expects integer class indices of shape (batch,)
            targets = targets.long().reshape(-1)
            # clear the gradients
            optimizer.zero_grad()
            # compute the model output
            yhat = model(inputs)
            # calculate loss
            loss = criterion(yhat, targets)
            # credit assignment
            loss.backward()
            # update model weights
            optimizer.step()

In [None]:
# train the model
train_model(train_dl, model)

# Evaluate the model

<div class="alert alert-block alert-info">
<font color=black><br>

- Once the model is fit, it can be evaluated on the test dataset.
- This can be achieved by using the DataLoader for the test dataset and collecting the predictions for the test set, then comparing the predictions to the expected values of the test set and calculating a performance metric.

<br></font>
</div>
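
For a multiclass model, comparing predictions with expected values first requires turning each row of class probabilities into a single class label, which is done by taking the index of the largest probability. The sketch below uses made-up probabilities and labels and computes accuracy directly with NumPy; `accuracy_score` from scikit-learn reports the same fraction of matching labels.

```python
import numpy as np

# hypothetical softmax outputs for 4 test rows and 3 classes
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
# hypothetical expected class labels for the same rows
actual = np.array([0, 1, 2, 1])

# the predicted class is the index of the largest probability in each row
predicted = np.argmax(probs, axis=1)
# accuracy is the fraction of rows where prediction and label match
accuracy = float((predicted == actual).mean())
print(predicted, accuracy)
```

Rounding the probabilities instead of taking the argmax would not give class labels, since a row such as `[0.3, 0.3, 0.4]` rounds to all zeros.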

In [None]:
# evaluate the model
def evaluate_model(test_dl, model):
    predictions, actuals = list(), list()
    for i, (inputs, targets) in enumerate(test_dl):
        # evaluate the model on the test set
        yhat = model(inputs)
        # retrieve numpy array
        yhat = yhat.detach().numpy()
        # expected class labels as an integer column vector
        actual = targets.numpy().astype('int64').reshape((-1, 1))
        # convert class probabilities to class labels
        yhat = argmax(yhat, axis=1).reshape((-1, 1))
        # store
        predictions.append(yhat)
        actuals.append(actual)
    predictions, actuals = vstack(predictions), vstack(actuals)
    # calculate accuracy
    acc = accuracy_score(actuals, predictions)
    return acc

In [None]:
# evaluate the model
acc = evaluate_model(test_dl, model)
print('Accuracy: %.3f' % acc)

# Make predictions

<div class="alert alert-block alert-info">
<font color=black><br>

- Suppose you have a single row of data (or a single image) and want to make a prediction.
- This requires that you wrap the data in a PyTorch Tensor data structure.
- A Tensor is the PyTorch counterpart of a NumPy array for holding data.
- It also supports automatic differentiation in the model graph, which is what makes calling backward() possible when training the model.
- The prediction will also be a Tensor, although you can retrieve a NumPy array by detaching the Tensor from the automatic differentiation graph and calling its numpy() function.

<br></font>
</div>
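
The round trip from a plain Python row to a Tensor and back to NumPy can be sketched as follows. The weight tensor `w` is a made-up stand-in for a model, used here only so that the output is attached to the automatic differentiation graph. The sketch uses `torch.tensor`, whereas the cell below uses the `Tensor` constructor; both give a float32 tensor for this input.

```python
import numpy as np
import torch

row = [5.1, 3.5, 1.4, 0.2]
# wrap the row in a Tensor; the extra brackets add a batch dimension of 1
x = torch.tensor([row])

# hypothetical stand-in for a model: a weight that requires gradients
w = torch.ones(4, 1, requires_grad=True)
out = x @ w  # the result is tracked by the autograd graph

# calling out.numpy() directly raises an error because out requires grad;
# detach() returns a tensor that is cut off from the graph
arr = out.detach().numpy()
print(tuple(x.shape), out.requires_grad, type(arr).__name__, arr.shape)
```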

In [None]:
# make a class prediction for one row of data
def predict(row, model):
    # convert the row to a Tensor with a batch dimension of 1
    row = Tensor([row])
    # make prediction
    yhat = model(row)
    # retrieve numpy array
    yhat = yhat.detach().numpy()
    return yhat

In [None]:
# make a single prediction
row = [5.1,3.5,1.4,0.2]
yhat = predict(row, model)
print('Predicted: %s (class=%d)' % (yhat, argmax(yhat)))

# References

<div class="alert alert-warning">
<font color=black>


- https://machinelearningmastery.com/pytorch-tutorial-develop-deep-learning-models/
- [Dataset repository](https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv)

</font>
</div>

# Conclusion

<div class="alert alert-block alert-danger">
<font color=black><br>

- TBD

<br></font>
</div>