# Evaluating a Classifier

For classifiers, squared error is a less suitable metric for measuring a model's effectiveness, because the target is a nominal variable (e.g. good/bad wine). For these tasks, we often use **Recall** and **Precision**. Both assume that there is one class we are more interested in. Let's say that our objective is to identify good wines:

- Recall: the fraction of truly good wines that are estimated to be good wines
- Precision: the fraction of truly good wines among the wines that were estimated to be good wines

We have written out an example in a 'confusion matrix' below. In this example, there are 1000 bottles of wine in the data set, of which only 100 are truly good. Our classifier estimates that 120 bottles are good, but only 80 of those 120 are actually good. In this case, recall = TP / (TP + FN) = 80 / 100 = 0.8 and precision = TP / (TP + FP) = 80 / 120 ≈ 0.67.

| Estimated/True      | Truly Good Wine         | Truly Bad Wine          | Total |
|---------------------|------------------------:|------------------------:|------:|
| Estimated Good Wine | True Positives (TP)  80 | False Positives (FP) 40 |   120 |
| Estimated Bad Wine  | False Negatives (FN) 20 | True Negatives (TN) 860 |   880 |
| Total               |                     100 |                     900 |  1000 |
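
As a quick check of the arithmetic, the four cells of the confusion matrix can be plugged into the two formulas directly. This is only a minimal sketch: the variable names are chosen here for illustration, and the counts are the ones from the wine example.

```python
# Counts copied from the confusion matrix of the wine example.
tp = 80    # estimated good, truly good
fp = 40    # estimated good, truly bad
fn = 20    # estimated bad, truly good
tn = 860   # estimated bad, truly bad

# Sanity check: the four cells should add up to all bottles in the data set.
total = tp + fp + fn + tn

recall = tp / (tp + fn)      # share of the truly good wines that were found
precision = tp / (tp + fp)   # share of the estimated-good wines that are truly good

print(f"total={total} recall={recall:.2f} precision={precision:.2f}")
```

Note that the True Negatives do not appear in either formula, which is why both metrics focus on the class we are interested in.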


# Data

We will start by loading the Iris data set for classification with Logistic Regression, i.e. using the flower type as the target variable and petal length and width as features.

In [8]:
from ml import iris_pd

Load the Iris dataset. The dataset originally contains 3 classes; we keep only classes 1 and 2 and relabel them as 0 and 1 so that we can apply binary classification.

In [6]:
df = iris_pd()
df = df[df.target > 0]
df.target -= 1

In [11]:
X = df.drop(columns='target')
y = df.target

In [12]:
from sklearn.model_selection import train_test_split

train_X, valid_X, train_y, valid_y = train_test_split(X, y)

#### Computing Recall

In [13]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(train_X, train_y)


To compute recall, we count the flowers that truly belong to type 1 and the True Positives, i.e. the type 1 flowers that the classifier also predicts as type 1. Recall is the ratio of these two counts. Our classifier correctly recognized 97% of the type 1 flowers in the training set and all of them in the validation set.

In [16]:
y_pred = model.predict(train_X)
type_1 = sum(train_y)
true_positives = sum(train_y * y_pred)
recall = true_positives / type_1
recall

0.9714285714285714

In [17]:
y_pred = model.predict(valid_X)
type_1 = sum(valid_y)
true_positives = sum(valid_y * y_pred)
recall = true_positives / type_1
recall

1.0
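
The same counting approach extends to precision: instead of dividing the True Positives by the number of true type 1 items, we divide by the number of items predicted as type 1. The sketch below wraps both in a helper. The function name `recall_and_precision` and the small label lists are made up for illustration and are not part of the notebook or the Iris data; the helper assumes binary 0/1 labels with class 1 as the class of interest.

```python
def recall_and_precision(y_true, y_pred):
    """Return (recall, precision) for binary 0/1 labels, treating 1 as the positive class."""
    # A pair contributes 1 only when both the true and the predicted label are 1.
    true_positives = sum(t * p for t, p in zip(y_true, y_pred))
    actual_positives = sum(y_true)      # TP + FN
    predicted_positives = sum(y_pred)   # TP + FP
    # Guard against division by zero when a class is absent.
    recall = true_positives / actual_positives if actual_positives else float("nan")
    precision = true_positives / predicted_positives if predicted_positives else float("nan")
    return recall, precision


# Made-up labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

recall, precision = recall_and_precision(y_true, y_pred)
print(recall, precision)
```

With the variables from the cells above, the helper could be called as `recall_and_precision(valid_y, model.predict(valid_X))`. scikit-learn also provides `recall_score` and `precision_score` in `sklearn.metrics`, which compute the same quantities.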