# <font color='blue'>Table of contents</font>

- [Create the Model](#create-model)
- [Generate Dataset](#data)
- [Train the Model](#train)
- [Confusion Matrix](#conf-mat)
- [Accuracy](#accuracy)
- [Precision](#precision)
- [Recall / Sensitivity](#recall)
- [F-1 Score](#F1-score)
- [ROC Curve](#roc-curve)

# <font style="color:blue">Classification Evaluation Metrics</font>

This notebook shows how to implement common classification metrics in code. Most of these metrics are available in popular Machine Learning packages such as Scikit-Learn. You need a good understanding of these metrics because they play an important role in business decision-making. Every data scientist or Machine Learning practitioner should know their significance and be familiar with their inner workings.

# <font style="color:blue">1. Create the Model</font><a name="create-model"></a>

For the sake of simplicity, we will illustrate the performance metrics on the task of classifying points into two classes: $\{0, 1\}$.


Let's start by importing all the required packages.

In [1]:
%matplotlib inline
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
plt.style.use('ggplot')

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

import warnings
warnings.filterwarnings('ignore')

In [2]:
plt.rcParams["figure.figsize"] = (8, 8)

# <font style="color:blue">2. Generate Dataset</font><a name="data"></a>

The Scikit-Learn library provides a range of supervised and unsupervised Machine Learning algorithms. See [A blog on Introduction to Scikit-Learn](https://towardsdatascience.com/an-introduction-to-scikit-learn-the-gold-standard-of-python-machine-learning-e2b9238a98ab).

You start by creating a dataset. Use the [make_classification](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html) function from the scikit-learn library. It generates a random `n-class` classification problem with normally distributed clusters of points. Then add uniformly distributed noise to the points.

For more details on `sklearn.datasets.make_classification`, [click here](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html).

In [3]:
# for reproducible results
seed = 42  
rng = np.random.RandomState(seed)
torch.manual_seed(seed)


# generate two class classification problem
X, y = make_classification(
    n_features=2, n_redundant=0, n_informative=2, random_state=seed, n_clusters_per_class=1
)

# add uniform random noise
X += 4 * rng.uniform(size=X.shape)

print('Inputs (X) shape: {}'.format(X.shape))
print('Labels (y) shape: {}'.format(y.shape))

plt.scatter(X[:,0],X[:,1],c=y,edgecolor='k')
plt.show()

# <font style="color:blue">3. Train the Model</font><a name="train"></a>

Here, you train a [Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression) classifier. It is equivalent to a one-layer neural network with sigmoid activation. You have already implemented this using the basic functionality of PyTorch. Now let's implement it again using PyTorch's `torch.nn` module.

We use sigmoid activation, so the model output is the predicted probability of class `1`.
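
To make the role of the sigmoid concrete, here is a minimal plain-Python sketch. The weights `w`, the bias `b`, and the helper `predict_proba` are hypothetical values and names chosen only for illustration; they are not the parameters learned later in this notebook.

```python
import math

def sigmoid(z):
    """Map a real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights and bias for a two-feature model (illustration only).
w = [1.5, -0.5]
b = 0.25

def predict_proba(x):
    # Linear score w.x + b, then sigmoid to get the probability of class 1.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

p = predict_proba([2.0, 1.0])   # linear score = 3.0 - 0.5 + 0.25 = 2.75
label = int(p > 0.5)            # p > 0.5 exactly when the linear score is > 0
print(p, label)
```

Because `sigmoid(0) = 0.5`, thresholding the probability at `0.5` is the same as checking the sign of the linear score.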

Ideally, you should create separate classes for different functionalities. In this case, training a model has two parts: the model and the training procedure. If you want to train several models to find the best one for the dataset (a common practice in Machine Learning), it is a good idea to have a `Trainer` class that takes the `model` and the data and trains it.

Let's start with this practice. Here, we will create two classes: `LogisticRegression` (model) and `Trainer`.

## <font style="color:rgb(8,133,37)">3.1. Create the Model Class</font>

The `LogisticRegression` class has these methods:


**`__init__`:**  It takes `n_features` (number of input data features) and initializes an `nn.Linear` layer.

**`forward`:** It takes `x` (input data) and runs the forward pass of the network.

In [4]:
class LogisticRegression(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        
        # define linear layer (WX + B)
        self.linear = nn.Linear(n_features, 1, bias=True)

    def forward(self, x):
        # calculate WX + B
        x = self.linear(x)
        
        # sigmoid activation (prediction probability of class 1)
        predictions = torch.sigmoid(x)
        return predictions

## <font style="color:rgb(8,133,37)">3.2. Create the Trainer Class</font>

Because PyTorch does not provide unified methods for training, we create a simple `Trainer` class to fit our model and make predictions.

The **`Trainer` class has these methods:**

**`__init__`:** 

It takes four arguments.

- `model`: The model to be trained. Although we pass the Logistic Regression model defined above, it can be any PyTorch model.

- `criterion`: Any `torch.nn` loss function.

- `optimizer`: The optimization algorithm.

- `epoch_num`: Number of epochs for training.

**`fit`:** It takes two arguments, input and target.

This method does the following:
1. Runs a forward pass of the `model` on the `input`.

1. Computes the loss from the forward-pass output and the `target`.

1. Computes the gradients using backpropagation (`loss.backward()`).

1. Updates the parameters using `optimizer.step()`.

**`predict`:** It takes `input` as an argument, runs only the forward pass, and returns the prediction.
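
The four steps of `fit` can also be sketched without autograd. The block below is not the `Trainer` class; it is a minimal NumPy illustration that assumes a mean binary cross-entropy loss, for which the gradient with respect to the linear score is `p - y`. The toy data `X_toy`, `y_toy` and the learning rate `lr` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y, eps=1e-12):
    # Mean binary cross-entropy; eps guards against log(0).
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

# Hypothetical toy data: four points with two features each.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 2.0]])
y_toy = np.array([0.0, 0.0, 1.0, 1.0])

w = np.zeros(2)
b = 0.0
lr = 0.1
losses = []

for _ in range(200):
    p = sigmoid(X_toy @ w + b)                     # 1. forward pass
    losses.append(bce(p, y_toy))                   # 2. loss
    grad_w = X_toy.T @ (p - y_toy) / len(y_toy)    # 3. gradients (what backprop computes)
    grad_b = float(np.mean(p - y_toy))
    w -= lr * grad_w                               # 4. parameter update
    b -= lr * grad_b

print(losses[0], losses[-1])
```

In the `Trainer` class, steps 3 and 4 are delegated to `loss.backward()` and `optimizer.step()`, so the same loop works for any differentiable model.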


In [5]:
class Trainer:
    def __init__(self, model, criterion, optimizer, epoch_num):
        self.model = model
        
        # loss function
        self.criterion = criterion
        
        # Optimizer
        self.optimizer = optimizer
        
        # num of epochs
        self.epoch_num = epoch_num

    def fit(self, inputs, targets):
        """
        Updating model trainable parameters in loop for given number of epochs
        """
        
        # set model in train mode.
        # Why this (and model.eval()) matters will become clear
        # when we train a deep neural network.
        self.model.train()
        
        # run train loop for given epochs
        for _ in range(self.epoch_num):
            
            # reset previously calculated gradient to zero
            self.optimizer.zero_grad()
            
            # predict probability of class '1'
            preds = self.model(inputs)
            
            # get loss
            loss = self.criterion(preds, targets)
            
            # calculate gradients
            loss.backward()
            
            # update parameters with gradient
            self.optimizer.step()

    def predict(self, inputs):
        
        # set model in eval mode
        self.model.eval()
        # disable gradient tracking for the forward pass
        with torch.no_grad():
            # probability of class one prediction
            preds = self.model(inputs)
        return preds

## <font style="color:rgb(8,133,37)">3.3. Training and Prediction</font>

1. Divide the data into `train` (75%) and `test` (25%) sets.

2. Create a model object using `LogisticRegression` model class.

3. Define `criterion` as binary cross-entropy loss.

4. Define `optimizer` as `SGD` optimizer.

5. Create the trainer object.

6. Train the model using the `fit` method defined in `Trainer` class.

7. Finally, get predictions for test data.

In [6]:
# Divide data into train (0.75) and test (0.25) set. 
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=seed)

# train data from numpy to torch
x_train, y_train = torch.from_numpy(x_train).float(), torch.from_numpy(y_train).float()

# create model object
log_regression = LogisticRegression(n_features=2)

# define loss, in this case binary cross-entropy loss
criterion = nn.BCELoss()

# define optimizer, in this case Stochastic Gradient Descent  
optimizer = torch.optim.SGD(log_regression.parameters(), lr=0.01)

# create trainer object
trainer = Trainer(log_regression, criterion, optimizer, 200)

# train the model
trainer.fit(x_train, y_train.unsqueeze(dim=1))

# test data from numpy to torch
x_test, y_test = torch.from_numpy(x_test).float(), torch.from_numpy(y_test).float()

# probability of class one prediction
y_predicted = trainer.predict(x_test)

# <font style="color:blue">4. Confusion Matrix</font><a name="conf-mat"></a>

<img src="https://www.learnopencv.com/wp-content/uploads/2020/01/c3_w3_confusion_matrix.png" width=600>

Let's assume class `1` is the **`positive`** class and class `0` is the **`negative`** class.

To get the confusion matrix and derive the other metrics from it, we implement a `ConfusionMatrix` class with these methods:


**`__init__`:**  `self.conf` (the confusion matrix variable) is initialized as a `2x2` `ndarray`.

**`reset`:** Resets `self.conf` to zero.

**`add`:** It takes `pred` (prediction) and `target` (target label) and updates `self.conf`. It uses `numpy.histogramdd` to compute a multidimensional histogram. For more details, [click here](https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogramdd.html).

Let's see what happens in the implementation.

`np.histogramdd(replace_indices, bins=(2, 2), range=[(0, 2), (0, 2)])`


Suppose `replace_indices` is a `7 x 2` `ndarray`, where column `0` corresponds to `target` and column `1` corresponds to `prediction`.

```
replace_indices = [ [0, 1],
                    [0, 0],
                    [1, 1],
                    [1, 1],
                    [1, 0],
                    [0, 1],
                    [0, 0] ]
```

`bins=(2, 2)` means it will return a `2 x 2` `ndarray`.

`range=[(0, 2), (0, 2)]` means each axis spans `0` to `2`; split into two bins per axis, this gives one cell for each (target, prediction) pair:
```
[0, 0], [0, 1], [1, 0] and [1, 1]
```

So the following `ndarray` is returned:
```
[[count_of([0, 0]), count_of([0, 1])],
 [count_of([1, 0]), count_of([1, 1])]] 
i.e.
[[2, 2],
 [1, 2]]
```

Since `0` is the negative class and `1` is the positive class:
```
count_of([1, 1]) = TP,
count_of([0, 1]) = FP,
count_of([0, 0]) = TN, and
count_of([1, 0]) = FN
```
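
The worked example above can be checked directly. This is a small standalone sketch that reuses the same toy `replace_indices`; the variable names `edges`, `TN`, `FP`, `FN`, and `TP` are introduced here only for illustration.

```python
import numpy as np

# Column 0 is the target, column 1 is the prediction (same toy data as above).
replace_indices = np.array([[0, 1],
                            [0, 0],
                            [1, 1],
                            [1, 1],
                            [1, 0],
                            [0, 1],
                            [0, 0]])

# Each axis is split into two bins over the range 0 to 2,
# so value 0 lands in bin 0 and value 1 lands in bin 1.
conf, edges = np.histogramdd(replace_indices, bins=(2, 2), range=[(0, 2), (0, 2)])
conf = conf.astype(np.int32)   # histogramdd returns float counts

# Rows index the target, columns index the prediction.
TN, FP = conf[0, 0], conf[0, 1]
FN, TP = conf[1, 0], conf[1, 1]

print(conf)
print('TP={}, FP={}, TN={}, FN={}'.format(TP, FP, TN, FN))
```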

**`TP`:** Returns `true positive`

**`FP`:** Returns `false positive`

**`TN`:** Returns `true negative`

**`FN`:** Returns `false negative`

**`confusion_matrix`:** Returns confusion matrix:
```
[[TP, FP],
 [FN, TN]]
```



In [7]:
class ConfusionMatrix:
    def __init__(self):
        # init confusion matrix with zeros (np.ndarray would leave it uninitialized)
        self.conf = np.zeros((2, 2), dtype=np.int32)

    def reset(self):
        # reset to zero
        self.conf.fill(0)

    def add(self, pred, target):
        """
        Take thresholded predictions and true labels, and add their counts to the confusion matrix
        """
        replace_indices = np.vstack((target.flatten(), pred.flatten())).T

        conf, _ = np.histogramdd(replace_indices, bins=(2, 2), range=[(0, 2), (0, 2)])

        self.conf += conf.astype(np.int32)

    def TP(self):
        return self.conf[1,1]
    
    def FP(self):
        return self.conf[0, 1]
    
    def TN(self):
        return self.conf[0, 0]
    
    def FN(self):
        return self.conf[1, 0]
    
    def confusion_matrix(self):
        """
        get confusion matrix as defined in figure
        """
        cm = np.array([[self.TP(), self.FP()],
                      [self.FN(), self.TN()]])
        return cm

Let's compute the confusion matrix for threshold probabilities `0.5` and `0.6`.

Follow these steps:


1. Instantiate the `ConfusionMatrix` class.
1. Get `prediction` by comparing `y_predicted` with the `threshold probability`.
1. Reset the confusion matrix.
1. Compute confusion matrix using `add`.
1. Call `confusion_matrix()`.

In [8]:
# threshold probability 0.5

cm = ConfusionMatrix()

thres_prob = 0.5
predictions = y_predicted > thres_prob

# reset confusion matrix
cm.reset()

# compute confusion matrix
cm.add(predictions, y_test)


print('Confusion Matrix for threshold probability 0.5:\n{}'.format(cm.confusion_matrix()))


thres_prob = 0.6
predictions = y_predicted > thres_prob

# reset confusion matrix
cm.reset()

# compute confusion matrix
cm.add(predictions, y_test)


print('Confusion Matrix for threshold probability 0.6:\n{}'.format(cm.confusion_matrix()))


# <font style="color:blue">5. Accuracy</font><a name="accuracy"></a>

<img src="https://www.learnopencv.com/wp-content/uploads/2020/01/c3_w3_accuracy.png" width=600>

$$
accuracy = \frac{TP + TN}{TP + FP + FN + TN}
$$
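
As a worked example, take the toy counts from the `histogramdd` illustration in the Confusion Matrix section ($TP = 2$, $FP = 2$, $FN = 1$, $TN = 2$):

$$
accuracy = \frac{2 + 2}{2 + 2 + 1 + 2} = \frac{4}{7} \approx 0.571
$$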

In [9]:
def accuracy(thres_prob, y_predicted, y_true):
    predictions = y_predicted > thres_prob

    # reset confusion matrix
    cm.reset()
    # compute confusion matrix
    cm.add(predictions, y_true)
    
    # accuracy 
    acc = (cm.TP() + cm.TN())/(cm.TP() + cm.FP() + cm.FN() + cm.TN())
    
    return acc


In [10]:
acc = accuracy(0.5, y_predicted, y_test)

print('Accuracy at threshold 0.5: {}'.format(acc))

# <font style="color:blue">6. Precision</font><a name="precision"></a>

<img src="https://www.learnopencv.com/wp-content/uploads/2020/01/c3_w3_precision.png" width=600>

$$
precision = \frac{TP}{TP + FP}
$$

In [11]:
def precision(thres_prob, y_predicted, y_true):
    predictions = y_predicted > thres_prob

    # reset confusion matrix
    cm.reset()
    # compute confusion matrix
    cm.add(predictions, y_true)
    
    # precision
    pre = cm.TP()/(cm.TP() + cm.FP())
    
    return pre

In [12]:
pre = precision(0.5, y_predicted, y_test)

print('Precision at threshold 0.5: {0:.3}'.format(pre))

# <font style="color:blue">7. Recall / Sensitivity</font><a name="recall"></a>

<img src='https://www.learnopencv.com/wp-content/uploads/2020/01/c3_w3_recall.png' width=600>

$$
recall = \frac{TP}{TP + FN}
$$

In [13]:
def recall(thres_prob, y_predicted, y_true):
    predictions = y_predicted > thres_prob

    # reset confusion matrix
    cm.reset()
    # compute confusion matrix
    cm.add(predictions, y_true)
    
    # recall
    rec = cm.TP()/(cm.TP() + cm.FN())
    
    return rec
    

In [14]:
rec = recall(0.5, y_predicted, y_test)

print('Recall at threshold 0.5: {0:.3}'.format(rec))
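
The threshold affects precision and recall differently. The sketch below uses hypothetical scores and labels (not the notebook's `y_predicted` and `y_test`) and a hypothetical helper `precision_recall` to compare thresholds `0.5` and `0.6`.

```python
# Hypothetical scores and labels, chosen only for illustration.
scores = [0.95, 0.80, 0.65, 0.55, 0.52, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0]

def precision_recall(threshold):
    # A sample is predicted positive when its score exceeds the threshold.
    preds = [s > threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(preds, labels))
    fp = sum(p and y == 0 for p, y in zip(preds, labels))
    fn = sum((not p) and y == 1 for p, y in zip(preds, labels))
    return tp / (tp + fp), tp / (tp + fn)

p5, r5 = precision_recall(0.5)   # TP=3, FP=2, FN=1
p6, r6 = precision_recall(0.6)   # TP=2, FP=1, FN=2

print('threshold 0.5: precision={:.3}, recall={:.3}'.format(p5, r5))
print('threshold 0.6: precision={:.3}, recall={:.3}'.format(p6, r6))
```

On this toy data, raising the threshold increases precision and lowers recall. Recall can never increase when the threshold goes up, while precision usually rises but is not guaranteed to.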

# <font style="color:blue">8. F-1 Score</font><a name="F1-score"></a>

$$
F_1\ score = \frac{2 TP}{2TP + FP + FN}
$$
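
This expression is the harmonic mean of precision and recall. A short derivation, using the precision and recall definitions from the previous sections:

$$
\begin{aligned}
F_1 &= \frac{2 \cdot precision \cdot recall}{precision + recall}
     = \frac{2 \cdot \frac{TP}{TP + FP} \cdot \frac{TP}{TP + FN}}{\frac{TP}{TP + FP} + \frac{TP}{TP + FN}} \\
    &= \frac{2\,TP^2}{TP\,(TP + FN) + TP\,(TP + FP)}
     = \frac{2\,TP}{2\,TP + FP + FN}
\end{aligned}
$$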


In [15]:
def f1_score(thres_prob, y_predicted, y_true):
    predictions = y_predicted > thres_prob

    # reset confusion matrix
    cm.reset()
    # compute confusion matrix
    cm.add(predictions, y_true)
    
    # f1 score
    score = (2*cm.TP())/(2*cm.TP() + cm.FP() + cm.FN())
    
    return score
    

In [16]:
# store the result under a new name so the f1_score function is not overwritten
f1 = f1_score(0.5, y_predicted, y_test)

print('F1 score at threshold 0.5: {}'.format(f1))

# <font style="color:blue">9. ROC Curve</font><a name="roc-curve"></a>

<img src="https://www.learnopencv.com/wp-content/uploads/2020/01/c3_w3_roc.png" width=700>


\begin{align}
TPR (recall) &= \frac{TP}{TP + FN} \\
FPR &= \frac{FP}{FP + TN} \\
\end{align}

## <font style="color:rgb(8,133,37)">9.1. ROC Curve Using Confusion Matrix</font>

For each `threshold_probability` in `[0, 1]`:
1. Use the `ConfusionMatrix` class to get `TP` (true positive), `FP` (false positive), `FN` (false negative), and `TN` (true negative).

2. Calculate `TPR` (true positive rate) and `FPR` (false positive rate).

Then plot `TPR`-vs-`FPR`.


In [17]:
thresholds = np.linspace(0.001, 0.999, 1000)

tp_rates = []
fp_rates = []
cm = ConfusionMatrix()

for threshold in thresholds:

    # get prediction
    predictions = y_predicted > threshold
    
    # reset confusion matrix
    cm.reset()
    
    # calculate confusion matrix
    cm.add(predictions, y_test)
    
    # get TP, FP, FN, and TN to calculate TPR and FPR
    TN = cm.TN()
    FP = cm.FP()
    FN = cm.FN()
    TP = cm.TP()

    # Sensitivity, recall, or true positive rate
    TPR = TP / (TP + FN)
    tp_rates.append(TPR)

    # False positive rate
    FPR = FP / (FP + TN)
    fp_rates.append(FPR)

**Let's plot `true positive rate` vs `false positive rate`.**

In [18]:
plt.plot(fp_rates, tp_rates, label='ROC curve', color='b')
plt.plot([0, 1], [0, 1], label='Random Classifier (AUC = 0.5)', linestyle='--', lw=2, color='r')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc="lower right")
plt.title('ROC Curve')
plt.show()

## <font style="color:rgb(8,133,37)">9.2. ROC Curve Using Definition & AUC</font>

To plot the ROC Curve and calculate the ROC AUC Score, let's create a class `ROCCurve`. This approach uses the [False Positive Rate](https://en.wikipedia.org/wiki/False_positive_rate) and the [True Positive Rate](https://en.wikipedia.org/wiki/Sensitivity_and_specificity#Sensitivity) to build the ROC Curve.

By sweeping the threshold between the two classes and counting the `true positives`, `true negatives`, `false positives` and `false negatives` at each threshold, we get a set of corresponding `true positive rates` and `false positive rates`.


The `ROCCurve` class has the following methods to plot the ROC curve and calculate the AUC:


**`__init__`:** It takes `y_test` and `y_pred_score` as arguments and stores them as the attributes `y_test` and `y_pred_score` respectively.

**`_get_fpr_tpr`:** Returns `FPR` and `TPR` for a range of thresholds.

**`_get_tp_fp_tn_fn`:** Returns `TP`, `FP`, `TN`, and `FN` for a range of thresholds.

**`plot_roc`:** Gets `TPR` and `FPR` from `_get_fpr_tpr` and plots `TPR`-vs-`FPR` (ROC Curve).

**`get_auc_score`:** Gets `TPR` and `FPR` from `_get_fpr_tpr` and calculates `AUC`.
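
The class evaluates all thresholds at once by broadcasting a row of scores against a column of thresholds. The following NumPy sketch illustrates that idea on hypothetical data; the arrays `scores`, `labels`, and `thresholds` are made up for the example, and the class itself uses PyTorch tensors.

```python
import numpy as np

# Hypothetical scores for N = 4 samples, stored as a column (N, 1) like a model output.
scores = np.array([[0.9], [0.4], [0.7], [0.2]])
labels = np.array([1, 0, 1, 0], dtype=bool)     # shape (N,)

thresholds = np.array([[0.3], [0.5], [0.8]])    # shape (T, 1)

# (1, N) > (T, 1) broadcasts to (T, N): row t holds the predictions at thresholds[t].
preds = scores.T > thresholds

# Count along the sample axis to get one value per threshold.
tp = (preds & labels).sum(axis=1)
fp = (preds & ~labels).sum(axis=1)
fn = (~preds & labels).sum(axis=1)
tn = (~preds & ~labels).sum(axis=1)

tpr = tp / (tp + fn)
fpr = fp / (fp + tn)

print(preds.shape)
print(tpr, fpr)
```

This avoids the explicit Python loop over thresholds used in section 9.1, at the cost of holding a `T x N` boolean array in memory.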

In [19]:
class ROCCurve:
    def __init__(self, y_test, y_pred_score):
        # Init attributes 
        self.y_test = y_test
        self.y_pred_score = y_pred_score

    def _get_fpr_tpr(self):
        # thresholds
        thresholds = torch.linspace(0.001, 0.999, 1000).unsqueeze(1)
        
        # get prediction for all thresholds
        self.y_pred = self.y_pred_score.T > thresholds
        
        # get TP, FP, TN, and FN for all thresholds
        tp, fp, tn, fn = self._get_tp_fp_tn_fn()
        
        # calculate true positive rate for all thresholds
        tpr = tp.float() / (tp + fn)
        
        # calculate false positive rate for all thresholds
        fpr = fp.float() / (fp + tn)
        
        return fpr.flip((0, )), tpr.flip((0, ))
        

    def _get_tp_fp_tn_fn(self):
        
        # change datatype to bool
        self.y_pred = self.y_pred.bool()
        self.y_test = self.y_test.bool()
        
        # calculate TP
        tp = (self.y_pred & self.y_test).sum(dim=1)
        
        # calculate FP
        fp = (self.y_pred & ~self.y_test).sum(dim=1)
        
        # calculate TN
        tn = (~self.y_pred & ~self.y_test).sum(dim=1)
        
        # calculate FN
        fn = (~self.y_pred & self.y_test).sum(dim=1)
        
        return tp, fp, tn, fn

    def plot_roc(self):
        
        # get TPR and FPR and plot TPR-vs-FPR
        plt.plot(*self._get_fpr_tpr(), label='ROC curve', color='g')
        plt.plot([0, 1], [0, 1], label='Random Classifier (AUC = 0.5)', linestyle='--', lw=2, color='r')
        plt.xlabel('False Positive Rate')
        plt.ylabel('True Positive Rate')
        plt.legend(loc="lower right")
        plt.title('ROC Curve')
        plt.show()

    def get_auc_score(self):
        # Get TPR and FPR
        fpr, tpr = self._get_fpr_tpr()
        
        # get area under the curve of TPR-vs-FPR plot
        return np.trapz(tpr, fpr), fpr, tpr

**Now we can use an object of our class to plot the ROC curve**

In [20]:
roc_auc = ROCCurve(y_test, y_predicted)
roc_auc.plot_roc()

**Also, we implemented the function to calculate the area under the ROC curve.**

In [21]:
roc_auc_score, fpr, tpr = roc_auc.get_auc_score()
print('ROC AUC Score: {0:.3}'.format(roc_auc_score))
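
The area is computed with the trapezoidal rule. As a minimal sketch of what that rule does, the hypothetical helper `trapezoid_area` below sums the trapezoids between consecutive ROC points; the three curves are toy inputs, not results from the model above.

```python
def trapezoid_area(xs, ys):
    """Area under the piecewise-linear curve through (xs[i], ys[i]).

    xs is assumed to be in non-decreasing order.
    """
    area = 0.0
    for i in range(1, len(xs)):
        # Width of the step times the average height of its two endpoints.
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area

# Diagonal of a random classifier.
auc_random = trapezoid_area([0.0, 1.0], [0.0, 1.0])

# Perfect classifier: TPR reaches 1 while FPR is still 0.
auc_perfect = trapezoid_area([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])

# A toy curve in between.
auc_toy = trapezoid_area([0.0, 0.0, 0.5, 1.0], [0.0, 0.5, 1.0, 1.0])

print(auc_random, auc_perfect, auc_toy)
```

The rule expects the x-values in ascending order, which is why `_get_fpr_tpr` flips its outputs: increasing thresholds produce decreasing false positive rates.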