# Evaluating Classification

### Introduction

### Loading our Data

In [5]:
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer_data = load_breast_cancer()
X = pd.DataFrame(cancer_data['data'], columns = cancer_data['feature_names'])
y = pd.Series(cancer_data['target'])

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 1)

In [10]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(solver = 'lbfgs', max_iter = 5000)
model.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=5000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

### Evaluating our Logistic Regression Model

There are several ways to evaluate how well our logistic regression model fits the data.  The first is accuracy.

> Accuracy is the percentage of observations that were predicted correctly.

Sklearn gives us the accuracy score of our model out of the box: for classifiers, the `score` method returns the accuracy on the data we pass in.

In [13]:
model.score(X_test, y_test)

0.9440559440559441

So about 94% of the observations in the test set were predicted correctly.
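To make the definition concrete, here is a minimal sketch of what an accuracy calculation does under the hood. The `accuracy` helper and the toy labels are made up for illustration; they are not part of sklearn or of our cancer dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the observed labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical toy labels: 8 observations, 2 of them predicted incorrectly.
y_true_toy = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred_toy = [1, 0, 0, 1, 0, 1, 1, 0]

toy_accuracy = accuracy(y_true_toy, y_pred_toy)
print(toy_accuracy)  # 6 correct out of 8 -> 0.75
```

On our real data, `model.score(X_test, y_test)` performs the same kind of comparison between `model.predict(X_test)` and `y_test`.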

### Beyond Accuracy

Accuracy tells us the percentage of observations predicted correctly, but not which kinds of observations the model gets right.  The next step is to find the number of cancerous observations that were correctly predicted, and the number of benign observations that were correctly predicted.

When our classification model predicts that the event occurred, we call it a **positive** prediction, and when it predicts a non-event, we call it a **negative** prediction.  This leads us to the following terms:

* **True positive** - Our model predicts the event occurred, and it did occur
* **True negative** - Our model predicts the event **did not** occur, and it did not occur

We also have terms to distinguish between the types of mistakes our model makes:
* **False positive** - The model predicts an event, but it did not occur (False alarm)
* **False negative** - The model predicts a non-event, but it did occur (Missed opportunity)

A false positive is also called a type I error, and a false negative is called a type II error.
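The four terms above can be sketched as a small counting function. This is an illustrative helper, not an sklearn function: the name `count_outcomes`, its `positive` parameter, and the toy labels are all assumptions made for the example.

```python
def count_outcomes(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a chosen positive label."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for observed, predicted in zip(y_true, y_pred):
        if predicted == positive and observed == positive:
            counts["TP"] += 1  # predicted the event, and it occurred
        elif predicted != positive and observed != positive:
            counts["TN"] += 1  # predicted a non-event, and it did not occur
        elif predicted == positive:
            counts["FP"] += 1  # false alarm (type I error)
        else:
            counts["FN"] += 1  # missed event (type II error)
    return counts

# Hypothetical toy labels, with 1 standing for "event occurred".
y_true_toy = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred_toy = [1, 0, 0, 1, 0, 1, 1, 0]

outcomes = count_outcomes(y_true_toy, y_pred_toy)
print(outcomes)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```

Note that which label counts as "positive" is a choice we make. Passing a different `positive` value swaps the roles of the two classes.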

### Working with a Confusion Matrix

We can summarize the four outcomes above with a confusion matrix.  The confusion matrix counts how many of our predictions fall into each category.

<img src="./confusion-matrix.png" width="50%">

The four conditions in our confusion matrix account for each of our predicted observations.  Sklearn has a built-in `confusion_matrix` function to calculate the above for a given model.  It takes the observed labels first and the predicted labels second.

In [16]:
from sklearn.metrics import confusion_matrix
y_pred_test = model.predict(X_test)

conf_data = confusion_matrix(y_test, y_pred_test)
conf_data

array([[50,  5],
       [ 3, 85]])

In [21]:
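# confusion_matrix puts the observed labels in the rows and the predicted
# labels in the columns.  Label 0 (malignant) comes first, so here we
# treat malignant as the positive class.
conf_mat_df = pd.DataFrame(conf_data, 
             columns = ['predicted +', 'predicted -'], 
             index = ['observed +', 'observed -'])
conf_mat_df

| |predicted +|predicted -|
|---|---|---|
|observed +|50|5|
|observed -|3|85|

Sklearn sorts the rows and columns by label, so the first row and column correspond to label 0, which is malignant in this dataset.  Treating malignant as the positive class, our model has 50 true positives and 85 true negatives.  In the top right we see 5 false negatives (malignant tumors predicted as benign), and in the bottom left we see 3 false positives (benign tumors predicted as malignant).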
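As a quick sanity check, the counts in the confusion matrix can be tied back to the accuracy score from earlier: the diagonal entries are the correct predictions, and all four entries together make up the test set. The sketch below copies the counts by hand into a plain list (the variable names are chosen for illustration).

```python
# Counts copied from the confusion matrix output above.
conf_counts = [[50, 5],
               [3, 85]]

# Correct predictions sit on the diagonal.
correct = conf_counts[0][0] + conf_counts[1][1]

# Every test observation lands in exactly one of the four cells.
total = sum(sum(row) for row in conf_counts)

accuracy_from_matrix = correct / total
print(correct, total, accuracy_from_matrix)
```

This gives 135 correct predictions out of 143, which matches the accuracy of roughly 0.944 returned by `model.score(X_test, y_test)`.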

### Resources

[ROC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic)