### Evaluating Classification Models

**OBJECTIVES**
- Use the confusion matrix to evaluate classification models
- Explore precision and recall as evaluation metrics
- Determine the cost of targeting the highest-probability predictions

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler, OneHotEncoder, PolynomialFeatures
from sklearn.compose import make_column_transformer
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.datasets import load_breast_cancer, load_digits, fetch_openml

### Evaluating Classifiers

Today, we want to think a bit more about the appropriate classification metrics in different situations.  Please use this [form](https://forms.gle/cSBk5cGSXwTxZTyB8) to summarize your work.

### Problem

Below, a dataset with measurements of cancerous and non-cancerous breast tumors is loaded and displayed.  Use `LogisticRegression` and `KNeighborsClassifier` to build predictive models on train/test splits.  Generate a confusion matrix for each model and explore the classifiers' mistakes.

- Which model do you prefer and why?
- Do you care about predicting each of these classes equally?
- Is there a ratio other than accuracy you think is more important based on the confusion matrix?  

In [2]:
cancer = load_breast_cancer(as_frame=True).frame

In [3]:
cancer.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,0
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0


In [4]:
# changing target label
#cancer['target'] = np.where(cancer['target'] == 0, 1, 0)

In [5]:
from sklearn.model_selection import train_test_split, cross_val_score

In [6]:

X = cancer.iloc[:, :-1]
y = cancer['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 11)

In [7]:

lgr = LogisticRegression()
knn = KNeighborsClassifier(n_neighbors=30)

In [8]:

scaler = StandardScaler()

In [9]:
from sklearn.pipeline import Pipeline

In [10]:

lgr_pipe = Pipeline([('scale', scaler), ('model', lgr)])
knn_pipe = Pipeline([('scale', scaler), ('model', knn)])

In [11]:

lgr_pipe.fit(X_train, y_train)
knn_pipe.fit(X_train, y_train)

In [22]:
#plot confusion matrices

### Problem

Below, a dataset around customer churn is loaded and displayed.  Build classification models on the data and visualize the confusion matrix.  

- Suppose you want to offer an incentive to customers you think are likely to churn.  What is an appropriate evaluation metric?
- Suppose you only have the budget to target 100 individuals you expect to churn.  If you target the 100 customers with the highest predicted probability of churning, what percent of churned customers do you capture?

In [13]:
churn = fetch_openml(data_id = 43390).frame

In [14]:
churn.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1.0,15634602.0,Hargrave,619.0,France,Female,42.0,2.0,0.0,1.0,1.0,1.0,101348.88,1.0
1,2.0,15647311.0,Hill,608.0,Spain,Female,41.0,1.0,83807.86,1.0,0.0,1.0,112542.58,0.0
2,3.0,15619304.0,Onio,502.0,France,Female,42.0,8.0,159660.8,3.0,1.0,0.0,113931.57,1.0
3,4.0,15701354.0,Boni,699.0,France,Female,39.0,1.0,0.0,2.0,0.0,0.0,93826.63,0.0
4,5.0,15737888.0,Mitchell,850.0,Spain,Female,43.0,2.0,125510.82,1.0,1.0,1.0,79084.1,0.0


In [15]:

# Drop the identifier columns and the target in one step (avoids in-place edits on a slice)
X = churn.drop(columns = ['Surname', 'RowNumber', 'CustomerId', 'Exited'])
y = churn['Exited']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 11)

In [16]:

encoder = make_column_transformer((OneHotEncoder(drop = 'first'), ['Geography', 'Gender']),
                                  remainder = StandardScaler())

In [17]:

knn_pipe = Pipeline([('transform', encoder), ('model', KNeighborsClassifier())])
lgr_pipe = Pipeline([('transform', encoder), ('model', LogisticRegression())])

In [18]:

knn_pipe.fit(X_train, y_train)
lgr_pipe.fit(X_train, y_train)

In [23]:
#plot confusion matrices
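
The sketch below shows how precision and recall for the positive class can be read off a confusion matrix. To stay runnable offline it uses a synthetic, imbalanced stand-in generated with `make_classification` rather than the churn data; `X_demo`, `demo_models`, and `scores` are hypothetical names. With the notebook's own objects, the same calls should apply to `knn_pipe`, `lgr_pipe`, `X_test`, and `y_test`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in with roughly 20% positives, not the churn data
X_demo, y_demo = make_classification(n_samples=2000, n_features=8,
                                     weights=[0.8, 0.2], random_state=11)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, random_state=11, stratify=y_demo)

demo_models = {
    "knn": Pipeline([("scale", StandardScaler()),
                     ("model", KNeighborsClassifier())]),
    "lgr": Pipeline([("scale", StandardScaler()),
                     ("model", LogisticRegression())]),
}

scores = {}
for name, model in demo_models.items():
    model.fit(Xd_train, yd_train)
    preds = model.predict(Xd_test)
    # For binary labels, ravel() returns the counts in this order
    tn, fp, fn, tp = confusion_matrix(yd_test, preds).ravel()
    scores[name] = {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        # precision = tp / (tp + fp): of those flagged, how many were positive
        "precision": precision_score(yd_test, preds, zero_division=0),
        # recall = tp / (tp + fn): of the true positives, how many were flagged
        "recall": recall_score(yd_test, preds),
    }
    print(name, scores[name])
```

If an incentive is cheap and a lost customer is expensive, recall is arguably the more relevant of the two; if each incentive is costly, precision matters more.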


### Predicting Positives

Return to the churn example and fit a Logistic Regression model on the data.



1. If you were to make predictions on a random 30% of the data, what percent of the true positives would you expect to capture?

2. Use the estimator's `predict_proba` method to create a `DataFrame` with the following columns:

| probability of prediction = 1 | true label | 
| -----------  | -------------- |
| .8 | 1 |
| .7 | 1 |
| .4 | 0 |

3. Sort the probabilities from largest to smallest.  What percentage of the positives are in the first 3000 rows?
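
The three steps above can be sketched as follows. This is a minimal illustration on a synthetic 10,000-row stand-in (again from `make_classification`), not the churn data, and the column names `probability` and `true_label` are assumptions. It scores the same rows the model was fit on, which is what the "first 3000 rows" of the full dataset implies; a held-out set would give a more honest estimate.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# ASSUMPTION: synthetic stand-in with 10,000 rows and roughly 20% positives
X_syn, y_syn = make_classification(n_samples=10_000, n_features=10,
                                   weights=[0.8, 0.2], random_state=11)
model = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)

# Column 1 of predict_proba is P(label == 1), since model.classes_ is [0, 1]
results = pd.DataFrame({
    "probability": model.predict_proba(X_syn)[:, 1],
    "true_label": y_syn,
})

# Step 3: sort from most to least likely positive
results = results.sort_values("probability", ascending=False).reset_index(drop=True)

k = 3000
total_positives = results["true_label"].sum()
captured = results["true_label"].iloc[:k].sum() / total_positives

# Step 1: a random 30% of rows is expected to contain 30% of the positives
random_baseline = k / len(results)

print(f"Top {k} rows capture {captured:.1%} of positives "
      f"(random targeting would expect {random_baseline:.0%})")
```

The gap between `captured` and `random_baseline` is the gain from ranking customers by predicted probability instead of choosing them at random.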

### `scikit-learn` visualizers

- `PrecisionRecallDisplay`
- `RocCurveDisplay`

From `scikit-plot`, imported below as `skplot` ([docs](https://scikit-plot.readthedocs.io/en/stable/metrics.html)):

- `plot_cumulative_gain`

In [None]:
from sklearn.metrics import PrecisionRecallDisplay, RocCurveDisplay

In [None]:
import scikitplot as skplot