## Module 17: Learning Notebook: More on Evaluation Metrics

Let's introduce some new classification metrics:
- Precision
- Recall
- F1 Score
- Classification Report

These metrics build toward the material in the next notebook.

In [2]:
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import f1_score
from sklearn.metrics import recall_score
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import matplotlib.pyplot as plt
import boto3
import pandas as pd
import numpy as np

### 1. Load and investigate data

In [3]:
# Load from S3
sess = boto3.session.Session()
s3 = sess.client('s3') 
source_bucket = 'machinelearning-read-only'
source_key = 'data/cancer-10.csv' 
response = s3.get_object(Bucket = source_bucket, Key = source_key)
df = pd.read_csv(response.get("Body"))
print('df size (rows, columns):',df.shape)
df.head(3)

df size (rows, columns): (1000, 7)


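|   | gene1 | gene2 | gene3 | gene4 | gene5 | gene6 | cancer_detected |
|---|-------|-------|-------|-------|-------|-------|-----------------|
| 0 | 3.447535 | 14.196807 | 80.524611 | -36.487496 | 289.932591 | 146.27369 | 0 |
| 1 | 3.276234 | 17.705782 | 72.786907 | -63.487129 | 293.618375 | 90.953863 | 0 |
| 2 | 4.036522 | 14.942696 | 67.819683 | -48.681795 | 249.619909 | 165.576714 | 0 |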


In [4]:
df['cancer_detected'].value_counts()

0    899
1    101
Name: cancer_detected, dtype: int64

In [5]:
# Prepare data
X = df.drop(['cancer_detected'], axis = 1)
y = df['cancer_detected']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20,random_state = 7)
# Verify the sizes of the split datasets
print('X_train:', X_train.shape)
print('y_train:', y_train.shape)
print('X_test:', X_test.shape)
print('y_test:', y_test.shape)

X_train: (800, 6)
y_train: (800,)
X_test: (200, 6)
y_test: (200,)


In [6]:
# Logistic Regression
scaler = StandardScaler() # Standardize the data
lr = LogisticRegression()
steps = [('Scaler', scaler), ('LogReg', lr)]
pipe = Pipeline(steps)
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print('Logistic Regression accuracy:', round(acc,4))

Logistic Regression accuracy: 0.905


### Recall the confusion matrix
<P>
<img src="images/cm.png" width=200 height=200 /><BR>
True Positive (TP) 

    The predicted value matches the actual value
    The actual value was positive and the model predicted a positive value

True Negative (TN) 

    The predicted value matches the actual value
    The actual value was negative and the model predicted a negative value

False Positive (FP) – Type 1 error

    The predicted value does not match the actual value
    The actual value was negative but the model predicted a positive value

False Negative (FN) – Type 2 error

    The predicted value does not match the actual value
    The actual value was positive but the model predicted a negative value
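
The four outcomes above can be counted directly from arrays of actual and predicted labels. The following is a minimal sketch on a small made-up example; the arrays `actual` and `predicted` and the `*_count` names are illustrative and are not the notebook's data. It assumes 1 is the positive class.

```python
import numpy as np

# Illustrative labels only (not the cancer dataset): 1 = positive, 0 = negative
actual = np.array([1, 1, 1, 0, 0, 0, 0, 0])
predicted = np.array([1, 0, 0, 1, 0, 0, 0, 0])

# Each outcome is a combination of the actual and the predicted label
tp_count = int(np.sum((actual == 1) & (predicted == 1)))  # actual positive, predicted positive
tn_count = int(np.sum((actual == 0) & (predicted == 0)))  # actual negative, predicted negative
fp_count = int(np.sum((actual == 0) & (predicted == 1)))  # actual negative, predicted positive
fn_count = int(np.sum((actual == 1) & (predicted == 0)))  # actual positive, predicted negative

print('TP:', tp_count, 'TN:', tn_count, 'FP:', fp_count, 'FN:', fn_count)
```

The four counts always add up to the number of samples, which is a quick sanity check when computing them by hand.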


In [7]:
# True values from the test dataset
y_test.value_counts()

0    182
1     18
Name: cancer_detected, dtype: int64
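
With 182 negatives among 200 test samples, a model that always predicts 0 would score 182 / 200 = 0.91 accuracy on this test set, which is why accuracy alone says little on imbalanced data. The sketch below illustrates that baseline with scikit-learn's `DummyClassifier`. It is a simplification: the labels are rebuilt from the counts above, the feature matrix is a placeholder of zeros, and the baseline is fitted and scored on the same labels rather than on the notebook's train/test split.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Labels rebuilt from the test-set counts shown above: 182 negatives, 18 positives
y_demo = np.array([0] * 182 + [1] * 18)
# Placeholder features: the most-frequent baseline ignores them
X_demo = np.zeros((len(y_demo), 1))

# Always predict the most frequent class seen during fit
baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(X_demo, y_demo)
baseline_acc = accuracy_score(y_demo, baseline.predict(X_demo))
print('Baseline accuracy:', baseline_acc)
```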

In [8]:
# Predicted values from the logistic regression algorithm
pd.DataFrame(data = y_pred).value_counts()

0    197
1      3
dtype: int64

In [9]:
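# Compute our confusion matrix for 1 = cancer detected and 0 = no cancer detected
# Rows are the actual classes and columns are the predicted classes, both in the order given by labels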
confusion_matrix(y_test, y_pred,labels = [1,0])

array([[  1,  17],
       [  2, 180]])
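
scikit-learn's `confusion_matrix` puts the actual classes on the rows and the predicted classes on the columns, both ordered by `labels`. The sketch below uses made-up labels (the `*_demo` arrays and the suffixed variable names are illustrative) to show how the order returned by `ravel()` depends on the label order, so the unpacking has to match it.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative labels only (not the cancer dataset)
y_true_demo = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_pred_demo = np.array([1, 0, 0, 1, 0, 0, 0, 0])

# labels=[1, 0] gives [[TP, FN], [FP, TN]], so ravel() yields TP, FN, FP, TN
tp_a, fn_a, fp_a, tn_a = confusion_matrix(y_true_demo, y_pred_demo, labels=[1, 0]).ravel()

# The default sorted order [0, 1] gives [[TN, FP], [FN, TP]], so ravel() yields TN, FP, FN, TP
tn_b, fp_b, fn_b, tp_b = confusion_matrix(y_true_demo, y_pred_demo).ravel()

print('labels=[1, 0]:', tp_a, fn_a, fp_a, tn_a)
print('default order:', tn_b, fp_b, fn_b, tp_b)
```

A quick check on any confusion matrix: each row sum should equal the number of actual samples of that class.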

In [10]:
# Store each value in the matrix
# With labels = [1,0], rows are actual [1, 0] and columns are predicted [1, 0],
# so ravel() returns the values in the order TP, FN, FP, TN
tp, fn, fp, tn = confusion_matrix(y_test, y_pred,labels = [1,0]).ravel()
# Print them out
tp, fn, fp, tn

(1, 17, 2, 180)

#### Precision
Precision tells us how many of the cases predicted as positive actually turned out to be positive.<P>
Intuitively, precision is the ability of the classifier to **not label a negative sample as positive**.<P>

The best value is 1 and the worst value is 0.<BR>
<img src="images/precision.png" width=200 height=200 />

In [11]:
# Manually calculate: of the 3 predicted positives (tp + fp), 1 is an actual positive
p = tp / (tp + fp)
print('Precision:',p)

Precision: 0.3333333333333333


#### Recall
Recall tells us how many of the actual positive cases we were able to predict correctly with our model.<P>
Intuitively, recall is the ability of the classifier to **find all the positive samples**.<P>
The best value is 1 and the worst value is 0.<BR>
<img src="images/recall.png" width=200 height=200 />

In [12]:
# Manually calculate: of the 18 actual positives (tp + fn), 1 was found
r = tp / (tp + fn)
print('Recall:',r)

Recall: 0.05555555555555555
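
The manual precision and recall calculations can be cross-checked against scikit-learn's `precision_score` and `recall_score`. The sketch below relies only on the confusion matrix printed earlier: it rebuilds label arrays from those four counts, reading rows as actual classes and columns as predicted classes (scikit-learn's convention). The names `y_true_cm`, `y_pred_cm`, `precision_check`, and `recall_check` are illustrative.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Rebuild label arrays from the confusion matrix [[1, 17], [2, 180]]
# (rows = actual [1, 0], columns = predicted [1, 0])
y_true_cm = np.array([1] * 1 + [1] * 17 + [0] * 2 + [0] * 180)
y_pred_cm = np.array([1] * 1 + [0] * 17 + [1] * 2 + [0] * 180)

# Both functions treat label 1 as the positive class by default (pos_label=1)
precision_check = precision_score(y_true_cm, y_pred_cm)
recall_check = recall_score(y_true_cm, y_pred_cm)

print('Precision:', precision_check)
print('Recall:', recall_check)
```

In the notebook itself, `precision_score(y_test, y_pred)` and `recall_score(y_test, y_pred)` should give the same values, since only the counts matter.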


#### F1-Score
F1-score is the harmonic mean of Precision and Recall, so it combines the two metrics into a single number.<P>
Its maximum value is 1, which is reached only when both Precision and Recall are 1.<BR>
<img src="images/f1.png" width=200 height=200 />

In [13]:
# Manually calculate
f1 = 2 / ((1/r) + (1/p))
print('F1 score:', f1)

F1 score: 0.09523809523809523
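
Because F1 is a harmonic mean, it stays close to the smaller of the two values, whereas an arithmetic mean can hide a poor precision or recall. The sketch below compares the two for a few illustrative pairs; `harmonic_f1` is a hypothetical helper written for this example, and the last pair uses the two values 1/3 and 1/18 that appear in this notebook.

```python
def harmonic_f1(precision, recall):
    """F1 as the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative (precision, recall) pairs; the last one comes from this notebook
pairs = [(1.0, 1.0), (0.5, 0.5), (0.9, 0.1), (1 / 3, 1 / 18)]

for p_demo, r_demo in pairs:
    arithmetic = (p_demo + r_demo) / 2
    f1_demo = harmonic_f1(p_demo, r_demo)
    print(f'precision={p_demo:.3f} recall={r_demo:.3f} '
          f'arithmetic mean={arithmetic:.3f} F1={f1_demo:.3f}')
```

scikit-learn's `f1_score(y_test, y_pred)`, already imported at the top of the notebook, computes the same quantity from the labels directly.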


#### Classification Report
A classification report computes precision, recall, and F1 score for every class in a single call.

In [15]:
# https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
from sklearn.metrics import classification_report
# The true labels come first, then the predictions
print(classification_report(y_test, y_pred,target_names = ['cancer','no cancer'], digits = 3, labels = [1,0]))

              precision    recall  f1-score   support

      cancer      0.333     0.056     0.095        18
   no cancer      0.914     0.989     0.950       182

    accuracy                          0.905       200
   macro avg      0.624     0.522     0.523       200
weighted avg      0.861     0.905     0.873       200
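
To use these numbers in code rather than read them off the printed table, `classification_report` accepts `output_dict=True` and returns a nested dictionary keyed by class name. The sketch below is self-contained: it rebuilds label arrays from the confusion matrix printed earlier (rows as actual classes, columns as predicted classes), and the names `y_true_cm`, `y_pred_cm`, and `report` are illustrative.

```python
import numpy as np
from sklearn.metrics import classification_report

# Rebuild label arrays from the confusion matrix [[1, 17], [2, 180]]
# (rows = actual [1, 0], columns = predicted [1, 0])
y_true_cm = np.array([1] * 1 + [1] * 17 + [0] * 2 + [0] * 180)
y_pred_cm = np.array([1] * 1 + [0] * 17 + [1] * 2 + [0] * 180)

# True labels first, predictions second; output_dict=True returns a dict instead of a string
report = classification_report(
    y_true_cm,
    y_pred_cm,
    labels=[1, 0],
    target_names=['cancer', 'no cancer'],
    output_dict=True,
)

cancer_recall = report['cancer']['recall']
cancer_support = report['cancer']['support']
print('Recall for the cancer class:', round(cancer_recall, 3))
print('Actual cancer cases (support):', cancer_support)
```

The `support` column is the number of actual samples of each class, so it is a quick way to confirm that the true labels were passed as the first argument.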

