
# Confusion Matrix

|      |          |        Predicted        |        Predicted        |
|------|----------|:-----------------------:|:-----------------------:|
|      |          | Negative                | Positive                |
| Real | Negative | True Negative      (TN) | False Positive     (FP) |
| Real | Positive | False Negative     (FN) | True Positive     (TP)  |

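A confusion matrix summarizes how well a classification model performs by counting its predictions against the real labels.

Metrics:
- **Accuracy**: (TP + TN) / (TP + TN + FP + FN). The fraction of all predictions that are correct.
- **Precision**: TP / (TP + FP). High precision means few false positives among the positive predictions.
- **Recall**: TP / (TP + FN). High recall means few false negatives among the real positives.
- **F1 score**: 2 × (Precision × Recall) / (Precision + Recall). The harmonic mean of precision and recall.

We want to know which predictions are wrong, and a prediction can be wrong in two ways: as a False Positive or as a False Negative.
Precision accounts for the False Positives, while Recall accounts for the False Negatives.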
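
As a minimal sketch of the two kinds of error, the snippet below counts the four outcomes for a small set of hypothetical labels (made up for illustration, not taken from any dataset) and derives the metrics from them. The helper name `count_outcomes` is an assumption of this example.

```python
# Hypothetical toy labels: 1 = positive, 0 = negative.
real      = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
predicted = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

def count_outcomes(real, predicted):
    """Count TN, FP, FN and TP for binary labels (0 or 1)."""
    tn = fp = fn = tp = 0
    for r, p in zip(real, predicted):
        if r == 0 and p == 0:
            tn += 1
        elif r == 0 and p == 1:
            fp += 1
        elif r == 1 and p == 0:
            fn += 1
        else:
            tp += 1
    return tn, fp, fn, tp

tn, fp, fn, tp = count_outcomes(real, predicted)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(tn, fp, fn, tp)               # 7 1 1 1
print(accuracy, precision, recall)  # 0.8 0.5 0.5
```

Accuracy looks reasonable here because negatives dominate, yet only half of the real positives are found and only half of the positive predictions are correct.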


In [1]:
import pandas as pd

In [2]:
diabetes_df = pd.read_csv('datasets/diabetes_clean.csv')
diabetes_df.head(2)

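|   | pregnancies | glucose | diastolic | triceps | insulin | bmi  | dpf   | age | diabetes |
|---|-------------|---------|-----------|---------|---------|------|-------|-----|----------|
| 0 | 6           | 148     | 72        | 35      | 0       | 33.6 | 0.627 | 50  | 1        |
| 1 | 1           | 85      | 66        | 29      | 0       | 26.6 | 0.351 | 31  | 0        |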


## Goal
The goal is to **predict whether or not each individual is likely to have diabetes based on the features body mass index (BMI) and age (in years)**. \
Therefore, it is a **binary classification problem**. 

if target == 0 -> no diabetes\
if target == 1 -> diabetes

We will **use a KNN model**.
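
To make the idea behind KNN concrete, here is a minimal sketch of a k-nearest-neighbors vote written by hand. It assumes Euclidean distance and a simple majority vote; the points, labels, and the function name `knn_predict` are hypothetical and only serve as illustration. scikit-learn's `KNeighborsClassifier` uses a more efficient neighbor search and its own tie handling.

```python
from collections import Counter
from math import dist  # Euclidean distance, available since Python 3.8

# Hypothetical (bmi, age) points with made-up labels, for illustration only.
train_points = [(22.0, 25), (24.0, 30), (35.0, 50), (33.0, 45), (28.0, 40)]
train_labels = [0, 0, 1, 1, 0]

def knn_predict(query, points, labels, k=3):
    """Return the majority label among the k training points closest to query."""
    # Sort the training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(points)), key=lambda i: dist(query, points[i]))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

prediction_a = knn_predict((34.0, 48), train_points, train_labels)
prediction_b = knn_predict((23.0, 28), train_points, train_labels)
print(prediction_a, prediction_b)  # 1 0
```

With an even `k`, the vote can end in a tie, and this sketch does not resolve ties in any principled way.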

In [3]:
# Split the data into training and test sets
from sklearn.model_selection import train_test_split

In [4]:
features = ["bmi", "age"]  
target = ["diabetes"]

df_features = diabetes_df[features]
df_target = diabetes_df[target]

## Features

In [5]:
df_features.head(2)

|   | bmi  | age |
|---|------|-----|
| 0 | 33.6 | 50  |
| 1 | 26.6 | 31  |


## Target

In [6]:
df_target.head(2)

|   | diabetes |
|---|----------|
| 0 | 1        |
| 1 | 0        |


## Train & Test

In [7]:
feature_train, feature_test, target_train, target_test = train_test_split(df_features, df_target, test_size=0.30, random_state = 1)

In [8]:
print(feature_train.shape)
print(feature_test.shape)

print(target_train.shape)
print(target_test.shape)

(537, 2)
(231, 2)
(537, 1)
(231, 1)
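
The shapes above add up to 768 rows (537 + 231). The sketch below reproduces the split sizes on placeholder arrays of that length, assuming the same `test_size` and `random_state`; the arrays `X` and `y` are stand-ins and contain no real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with 768 rows; the values themselves are irrelevant here.
X = np.zeros((768, 2))
y = np.zeros((768, 1))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)

print(X_train.shape, X_test.shape)  # (537, 2) (231, 2)
print(y_train.shape, y_test.shape)  # (537, 1) (231, 1)
```

Because 30% of 768 is 230.4, the test set size is rounded up to 231 and the remaining 537 rows go to the training set.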


## KNN Model

In [9]:
from sklearn.neighbors import KNeighborsClassifier

In [10]:
model = KNeighborsClassifier(n_neighbors=6)
model.fit(feature_train, target_train)
target_predicted = model.predict(feature_test)

In [11]:
target_predicted[:3]

array([1, 0, 0])

## Confusion Matrix

In [12]:
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(target_test, target_predicted)
print(cm)

[[123  23]
 [ 52  33]]


In [13]:
true_negatives = cm[0][0]
false_positives = cm[0][1]
false_negatives = cm[1][0]
true_positives = cm[1][1]

print("True Negatives: {}".format(true_negatives))
print("False Negatives: {}".format(false_negatives))

print("True Positives: {}".format(true_positives))
print("False Positives: {}".format(false_positives))


True Negatives: 123
False Negatives: 52
True Positives: 33
False Positives: 23
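
As an alternative to indexing each cell, the four counts can be unpacked in one step with NumPy's `ravel()`. This sketch hard-codes the matrix printed above so that it runs on its own; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix values copied from the printed output.
# Rows are the real classes, columns are the predicted classes.
cm = np.array([[123, 23],
               [52, 33]])

# ravel() flattens row by row, so the order is TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 123 23 52 33

# Row sums give the number of real negatives and real positives.
real_totals = cm.sum(axis=1)
# Column sums give the number of predicted negatives and predicted positives.
predicted_totals = cm.sum(axis=0)
print(real_totals, predicted_totals)  # [146  85] [175  56]
```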


## Metrics from Confusion Matrix

In [14]:
accuracy = (true_negatives + true_positives) / (true_negatives + true_positives + false_negatives + false_positives)
precision = true_positives / (true_positives + false_positives)
# Recall divides by all real positives: TP + FN
recall = true_positives / (true_positives + false_negatives)

# F1 is the harmonic mean of precision and recall
f1_score = 2 * (precision * recall) / (precision + recall)

print("Accuracy: {:.2f}".format(accuracy))
print("Precision: {:.2f}".format(precision))
print("Recall: {:.2f}".format(recall))
print("F1 Score: {:.2f}".format(f1_score))

Accuracy: 0.68
Precision: 0.59
Recall: 0.39
F1 Score: 0.47


In [15]:
# The metrics calculated above can be computed more easily with scikit-learn
from sklearn.metrics import classification_report

report = classification_report(target_test, target_predicted)
print(report)

              precision    recall  f1-score   support

           0       0.70      0.84      0.77       146
           1       0.59      0.39      0.47        85

    accuracy                           0.68       231
   macro avg       0.65      0.62      0.62       231
weighted avg       0.66      0.68      0.66       231
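
The row for class `1` in the report can also be obtained as plain numbers. The sketch below is one possible way to do it: it rebuilds label lists that reproduce the confusion matrix `[[123, 23], [52, 33]]` (the ordering of the labels is an assumption made only so the example runs without the dataset) and then calls scikit-learn's individual score functions and `classification_report` with `output_dict=True`.

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             f1_score, precision_score, recall_score)

# Label lists that reproduce the confusion matrix [[123, 23], [52, 33]]:
# 146 real negatives (123 predicted 0, 23 predicted 1) and
# 85 real positives (52 predicted 0, 33 predicted 1).
y_true = [0] * 146 + [1] * 85
y_pred = [0] * 123 + [1] * 23 + [0] * 52 + [1] * 33

scores = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
}
for name, value in scores.items():
    print("{}: {:.2f}".format(name, value))

# The same report as a nested dictionary, keyed by class label as a string.
report_dict = classification_report(y_true, y_pred, output_dict=True)
print("{:.2f}".format(report_dict["1"]["recall"]))
```

By default the score functions treat label `1` as the positive class, which matches the convention used in this notebook.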

