# Classification Metrics Roundup 

_By Jeff Hale_

---

## Learning Objectives
By the end of this lesson students will be able to:

- Understand binary classification metrics 
- Compute binary classification metrics by hand
- Use sklearn to compute binary classification metrics
---

### Read in the Titanic data from seaborn

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

In [None]:
df_titanic = sns.load_dataset('titanic')
df_titanic.head()

In [None]:
df_titanic.info()

#### Split into X and y

Let's use `survived` for y and `sex` and `class` for X.

In [None]:
X = df_titanic[['sex', 'class']]
y = df_titanic['survived']

In [None]:
X.head()

In [None]:
y.head()

In [None]:
y.value_counts()

In [None]:
y.value_counts(normalize=True)

## Split into training and test sets

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

In [None]:
X_train.head(2)

In [None]:
X_test.head(2)

In [None]:
y_train.head(2)

In [None]:
y_test.head(2)

### Make an object from the OneHotEncoder class. 

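#### Warning! ☝️ The arguments are important here.

`handle_unknown='ignore'` encodes any category that was not seen during fitting as all zeros instead of raising an error. The other argument asks for a dense NumPy array rather than a sparse matrix; it is named `sparse_output` in scikit-learn 1.2 and later, and `sparse` in older versions.

In [None]:
ohe = OneHotEncoder(sparse_output=False, handle_unknown='ignore')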
ohe

### Save the fit and transformed training data

In [None]:
X_train_dummified = ohe.fit_transform(X_train, y_train)
X_train_dummified

### Save the transformed `X_test`

In [None]:
X_test_dummified = ohe.transform(X_test)
X_test_dummified
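
To make the effect of `handle_unknown='ignore'` concrete, here is a minimal sketch on a made-up column. The `deck` column and its values are hypothetical and are not taken from the Titanic data. The sketch keeps the encoder's default sparse output and calls `.toarray()` on it, which sidesteps the `sparse` / `sparse_output` naming difference between scikit-learn versions.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: "C" never appears in the training frame
train_toy = pd.DataFrame({"deck": ["A", "B", "A"]})
test_toy = pd.DataFrame({"deck": ["B", "C"]})

# Fit on the training data only, then reuse the fitted encoder on the test data
ohe_toy = OneHotEncoder(handle_unknown="ignore")
train_encoded = ohe_toy.fit_transform(train_toy).toarray()
test_encoded = ohe_toy.transform(test_toy).toarray()

print(ohe_toy.categories_)  # categories learned from the training data
print(test_encoded)         # the unseen "C" row is encoded as all zeros
```

Fitting the encoder on the training set and only transforming the test set keeps the two arrays aligned on the same columns, which is what the model expects at prediction time.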

In [None]:
pd.get_dummies(X_train)

## Make a LogisticRegression model

In [None]:
logreg = LogisticRegression()


### Fit the model

In [None]:
logreg.fit(X_train_dummified, y_train)

### Create the model predictions

In [None]:
preds = logreg.predict(X_test_dummified)

In [None]:
preds[:6]

In [None]:
probs = logreg.predict_proba(X_test_dummified)
probs[:6]

In [None]:
logreg.score(X_test_dummified, y_test)

### Generate the confusion matrix

In [None]:
confusion_matrix(y_test, preds)

In [None]:
tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
tn

In [None]:
fp

In [None]:
fn

In [None]:
tp
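
The order of the four values returned by `.ravel()` is easy to mix up. The sketch below uses a small set of made-up labels (not the Titanic predictions) to show that, for binary labels 0 and 1, rows are actual classes and columns are predicted classes, so the flattened order is TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Toy labels, made up for illustration (not the Titanic data)
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred_toy = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]

cm_toy = confusion_matrix(y_true_toy, y_pred_toy)
# Layout for labels 0 and 1:
# [[TN, FP],
#  [FN, TP]]
tn_toy, fp_toy, fn_toy, tp_toy = cm_toy.ravel()

print(cm_toy)
print(tn_toy, fp_toy, fn_toy, tp_toy)
```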

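### Plot the confusion matrix with ConfusionMatrixDisplay

In [None]:
from sklearn.metrics import ConfusionMatrixDisplay

In [None]:
# ConfusionMatrixDisplay.from_estimator replaces plot_confusion_matrix,
# which was removed in scikit-learn 1.2
ConfusionMatrixDisplay.from_estimator(logreg, X_test_dummified, y_test, values_format='.5g')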

Accuracy = 73%
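
Accuracy is the share of predictions that are correct, which is (TP + TN) / (TP + TN + FP + FN). For a classifier, `.score()` reports this same mean accuracy. A minimal sketch on made-up labels (not the Titanic predictions) computes it by hand and with `accuracy_score`:

```python
from sklearn.metrics import accuracy_score

# Toy labels, made up for illustration (not the Titanic data)
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred_toy = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]

# By hand: count the matching pairs and divide by the number of predictions
n_correct = sum(1 for t, p in zip(y_true_toy, y_pred_toy) if t == p)
accuracy_by_hand = n_correct / len(y_true_toy)

# With sklearn
accuracy_sklearn = accuracy_score(y_true_toy, y_pred_toy)

print(accuracy_by_hand, accuracy_sklearn)
```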

#### Compute the True Positive Rate

In [None]:
tp / (tp + fn)

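#### Compute the Sensitivity

In [None]:
# Sensitivity is another name for the true positive rate
tp / (tp + fn)

#### Compute the Recall

In [None]:
# Recall is also the same quantity as the true positive rate
tp / (tp + fn)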

#### Compute the Precision

In [None]:
tp / (tp + fp)

#### Compute the Specificity

In [None]:
tn / (tn + fp)
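
As a cross-check of the hand formulas above, here is a minimal sketch on made-up labels (not the Titanic predictions). It counts the four outcomes directly and compares the results with sklearn. sklearn has no dedicated specificity function, so the sketch relies on the fact that specificity is the recall of the negative class and passes `pos_label=0` to `recall_score`.

```python
from sklearn.metrics import precision_score, recall_score

# Toy labels, made up for illustration (not the Titanic data)
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred_toy = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]

# Count the four outcomes by hand
pairs = list(zip(y_true_toy, y_pred_toy))
tp_toy = sum(1 for t, p in pairs if t == 1 and p == 1)
tn_toy = sum(1 for t, p in pairs if t == 0 and p == 0)
fp_toy = sum(1 for t, p in pairs if t == 0 and p == 1)
fn_toy = sum(1 for t, p in pairs if t == 1 and p == 0)

recall_by_hand = tp_toy / (tp_toy + fn_toy)       # a.k.a. sensitivity, TPR
precision_by_hand = tp_toy / (tp_toy + fp_toy)
specificity_by_hand = tn_toy / (tn_toy + fp_toy)  # a.k.a. TNR

# The same quantities from sklearn
recall_sklearn = recall_score(y_true_toy, y_pred_toy)
precision_sklearn = precision_score(y_true_toy, y_pred_toy)
specificity_sklearn = recall_score(y_true_toy, y_pred_toy, pos_label=0)

print(recall_by_hand, precision_by_hand, specificity_by_hand)
print(recall_sklearn, precision_sklearn, specificity_sklearn)
```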

In [None]:
from sklearn.metrics import recall_score, precision_score

In [None]:
recall_score(y_test, preds)

In [None]:
precision_score(y_test, preds)

In [None]:
from sklearn.metrics import classification_report

In [None]:
classification_report(y_test, preds, output_dict=True)

### Make the ROC curve

In [None]:
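from sklearn.metrics import roc_auc_score, RocCurveDisplay
import matplotlib.pyplot as plt

In [None]:
# RocCurveDisplay.from_estimator replaces plot_roc_curve,
# which was removed in scikit-learn 1.2
RocCurveDisplay.from_estimator(logreg, X_test_dummified, y_test)
# Dashed diagonal: the ROC curve of a classifier that guesses at random
plt.plot([0, 1], [0, 1], 'k--');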

### What's the ROC AUC score?

Recall that the `.predict_proba()` method returns the probabilities of both classes in a NumPy array, with one column per class.

In [None]:
both_probs = logreg.predict_proba(X_test_dummified)
both_probs[:5]

In [None]:
# For a binary target, pass only the positive-class probabilities (column 1)
roc_auc_score(y_test, both_probs[:, 1])
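
For a binary target, `roc_auc_score` expects a one-dimensional array of scores for the positive class, so the second column of the `predict_proba` output is the one to pass. The sketch below uses made-up probabilities (not the Titanic model's output) and also recomputes the score by hand as the fraction of positive/negative pairs in which the positive example receives the higher score. The pairwise count assumes there are no tied scores.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy labels and two-column probabilities, made up for illustration
y_true_toy = np.array([0, 0, 1, 1])
both_probs_toy = np.array([
    [0.90, 0.10],
    [0.60, 0.40],
    [0.65, 0.35],
    [0.20, 0.80],
])

# Column 1 holds the probability of the positive class (label 1)
positive_probs_toy = both_probs_toy[:, 1]
auc_sklearn = roc_auc_score(y_true_toy, positive_probs_toy)

# By hand: share of (positive, negative) pairs ranked in the right order
pos_scores = positive_probs_toy[y_true_toy == 1]
neg_scores = positive_probs_toy[y_true_toy == 0]
n_ordered = sum(1 for p in pos_scores for n in neg_scores if p > n)
auc_by_hand = n_ordered / (len(pos_scores) * len(neg_scores))

print(auc_sklearn, auc_by_hand)
```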

### F1 score

F1 = 2 * (Precision * Recall) / (Precision + Recall)

In [None]:
from sklearn.metrics import f1_score

In [None]:
f1_score(y_test, preds)
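
The F1 score is the harmonic mean of precision and recall, which is what the formula in this section expresses. A minimal sketch on made-up labels (not the Titanic predictions) computes it from that formula and checks it against `f1_score`:

```python
from sklearn.metrics import f1_score

# Toy labels, made up for illustration (not the Titanic data)
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred_toy = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]

# Count the outcomes needed for precision and recall
pairs = list(zip(y_true_toy, y_pred_toy))
tp_toy = sum(1 for t, p in pairs if t == 1 and p == 1)
fp_toy = sum(1 for t, p in pairs if t == 0 and p == 1)
fn_toy = sum(1 for t, p in pairs if t == 1 and p == 0)

precision_toy = tp_toy / (tp_toy + fp_toy)
recall_toy = tp_toy / (tp_toy + fn_toy)

# F1 by hand, following the formula above
f1_by_hand = 2 * (precision_toy * recall_toy) / (precision_toy + recall_toy)

# F1 with sklearn
f1_sklearn = f1_score(y_true_toy, y_pred_toy)

print(f1_by_hand, f1_sklearn)
```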

### Balanced Accuracy Score

When is balanced accuracy a good metric to use?

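Balanced accuracy is the average of the true positive rate (TPR, sensitivity) and the true negative rate (TNR, specificity):

Balanced Accuracy = (Sensitivity + Specificity) / 2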

In [None]:
# compute sensitivity


In [None]:
# compute specificity


In [None]:
# compute balanced accuracy


In [None]:
# use sklearn to do it the fast way
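
One possible way to work through the cells above is sketched below on made-up, deliberately imbalanced labels (not the Titanic predictions). With eight negatives and two positives, plain accuracy looks strong even though half of the positives are missed, while balanced accuracy weights both classes equally.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Imbalanced toy labels, made up for illustration: 8 negatives, 2 positives
y_true_toy = [0] * 8 + [1] * 2
y_pred_toy = [0] * 9 + [1]  # one of the two positives is missed

pairs = list(zip(y_true_toy, y_pred_toy))
tp_toy = sum(1 for t, p in pairs if t == 1 and p == 1)
tn_toy = sum(1 for t, p in pairs if t == 0 and p == 0)
fp_toy = sum(1 for t, p in pairs if t == 0 and p == 1)
fn_toy = sum(1 for t, p in pairs if t == 1 and p == 0)

sensitivity_toy = tp_toy / (tp_toy + fn_toy)  # TPR
specificity_toy = tn_toy / (tn_toy + fp_toy)  # TNR
balanced_by_hand = (sensitivity_toy + specificity_toy) / 2

# The fast way with sklearn, plus plain accuracy for comparison
balanced_sklearn = balanced_accuracy_score(y_true_toy, y_pred_toy)
plain_accuracy = accuracy_score(y_true_toy, y_pred_toy)

print(plain_accuracy, balanced_by_hand, balanced_sklearn)
```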


## Summary

You've seen how to compute common classification metrics by hand and using sklearn. 🎉

### Check for Understanding

What do the following terms mean in words? How would you compute them? When would you want to use each one?

- Recall
- Precision
- Sensitivity
- Specificity
- Balanced Accuracy
- F1 Score
- ROC AUC 
- Accuracy