# Fine Tuning Machine Learning Models

We can use accuracy, the fraction of correctly classified labels, to measure model performance. However, accuracy is not always a useful metric.

Consider fraud detection: if 99% of transactions are legitimate, a model that labels every transaction as legitimate achieves 99% accuracy while failing to catch a single fraud. This situation, where one class is much more frequent than the other, is called **class imbalance**.
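To make this concrete, here is a minimal sketch with hypothetical numbers (990 legitimate and 10 fraudulent transactions, invented for illustration). A trivial "model" that labels every transaction as legitimate scores 99% accuracy yet catches none of the frauds.

```python
# Hypothetical labels: 0 = legitimate, 1 = fraudulent (numbers invented for illustration)
y_true = [0] * 990 + [1] * 10

# A trivial "model" that always predicts the majority class (legitimate)
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the true label
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Fraction of actual frauds that the model flagged as fraud
frauds_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fraud_detection_rate = frauds_caught / y_true.count(1)

print(f"Accuracy: {accuracy:.2%}")                   # Accuracy: 99.00%
print(f"Frauds caught: {fraud_detection_rate:.2%}")  # Frauds caught: 0.00%
```

The accuracy looks excellent only because the majority class dominates the data; the model is useless for the class we actually care about.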

A **confusion matrix** is a table used in machine learning and statistics to evaluate the performance of a classification algorithm. It is computed on data where the true labels are known, and it summarizes the model's predictions by breaking them down into four categories (in the fraud example, a fraudulent transaction is the positive class):

- True Positive (TP): the model correctly predicts the positive class.

- True Negative (TN): the model correctly predicts the negative class.

- False Positive (FP): the model predicts the positive class when the true class is negative. Also known as a Type I error.

- False Negative (FN): the model predicts the negative class when the true class is positive. Also known as a Type II error.



|                  | Predicted Legitimate | Predicted Fraudulent |
|------------------|-----------------------|-----------------------|
| **Actual Legitimate**   | True Negative (TN)   | False Positive (FP)  |
| **Actual Fraudulent**   | False Negative (FN)  | True Positive (TP)   |
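The four cells can be counted directly from paired true and predicted labels. This sketch uses a short hypothetical label list (1 = fraudulent is the positive class; the helper `confusion_counts` is our own, not a library function) and arranges the counts in the same layout as the table above. For 0/1 labels this is also the layout scikit-learn's `confusion_matrix` returns: `[[TN, FP], [FN, TP]]`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1   # fraud correctly flagged
        elif actual != positive and predicted != positive:
            tn += 1   # legitimate transaction correctly passed
        elif actual != positive and predicted == positive:
            fp += 1   # legitimate transaction wrongly flagged (Type I error)
        else:
            fn += 1   # fraud missed (Type II error)
    return tp, tn, fp, fn


# Hypothetical labels: 0 = legitimate, 1 = fraudulent
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)

# Rows = actual class, columns = predicted class (same layout as the table)
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)  # [[6, 1], [1, 2]]
```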


From the confusion matrix, various performance metrics can be calculated, such as:

- Accuracy: (TP + TN) / (TP + TN + FP + FN)

- Precision: TP / (TP + FP)

- Recall (Sensitivity or True Positive Rate): TP / (TP + FN)

- Specificity (True Negative Rate): TN / (TN + FP)

- F1 Score: 2 * (Precision * Recall) / (Precision + Recall)
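As a worked example, the formulas can be applied to the counts from the churn model's confusion matrix reported later in this notebook (TN = 1126, FP = 12, FN = 158, TP = 38, with churn = 1 as the positive class). This is only arithmetic on those published counts, not a re-run of the model.

```python
# Counts from the churn model's confusion matrix (churn = 1 is the positive class)
tn, fp, fn, tp = 1126, 12, 158, 38

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)          # sensitivity / true positive rate
specificity = tn / (tn + fp)     # true negative rate
f1 = 2 * (precision * recall) / (precision + recall)

for name, value in [("Accuracy", accuracy), ("Precision", precision),
                    ("Recall", recall), ("Specificity", specificity),
                    ("F1", f1)]:
    print(f"{name:<12}{value:.2f}")
```

Rounded to two decimals this gives accuracy 0.87, precision 0.76, recall 0.19, specificity 0.99 and F1 0.31, matching the class-1 row of the churn classification report. An accuracy of 87% hides the fact that fewer than one in five churners is detected.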

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
```

### Churn data

```python
churn_df = pd.read_csv('Data/telecom_churn_clean.csv')

# Drop the unnamed column (typically a row index saved into the CSV)
churn_df = churn_df.drop(columns='Unnamed: 0')
print(churn_df.shape)
churn_df.head()
```

```
(3333, 19)
```

|   | account_length | area_code | international_plan | voice_mail_plan | number_vmail_messages | total_day_minutes | total_day_calls | total_day_charge | total_eve_minutes | total_eve_calls | total_eve_charge | total_night_minutes | total_night_calls | total_night_charge | total_intl_minutes | total_intl_calls | total_intl_charge | customer_service_calls | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 415 | 0 | 1 | 25 | 265.1 | 110 | 45.07 | 197.4 | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.7 | 1 | 0 |
| 1 | 107 | 415 | 0 | 1 | 26 | 161.6 | 123 | 27.47 | 195.5 | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.7 | 1 | 0 |
| 2 | 137 | 415 | 0 | 0 | 0 | 243.4 | 114 | 41.38 | 121.2 | 110 | 10.3 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | 0 |
| 3 | 84 | 408 | 1 | 0 | 0 | 299.4 | 71 | 50.9 | 61.9 | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | 0 |
| 4 | 75 | 415 | 1 | 0 | 0 | 166.7 | 113 | 28.34 | 148.3 | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | 0 |


```python
# Features: every column except the target; target: churn (1 = churned)
X = churn_df.drop('churn', axis=1).values
y = churn_df['churn'].values

# Hold out 40% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]]
# Here: 1126 true negatives, 12 false positives, 158 false negatives, 38 true positives
print('Confusion Matrix')
print(confusion_matrix(y_test, y_pred))

print()

print('Classification Report')
print(classification_report(y_test, y_pred))
```

```
Confusion Matrix
[[1126   12]
 [ 158   38]]

Classification Report
              precision    recall  f1-score   support

           0       0.88      0.99      0.93      1138
           1       0.76      0.19      0.31       196

    accuracy                           0.87      1334
   macro avg       0.82      0.59      0.62      1334
weighted avg       0.86      0.87      0.84      1334
```
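The averaged rows of the report can be reproduced from the confusion matrix alone. This pure-Python sketch (the helper `per_class_scores` is our own, not part of scikit-learn) recomputes each class's precision, recall, F1 and support, then the macro average (plain mean over classes) and the weighted average (mean weighted by support). It also computes the accuracy of a baseline that always predicts "no churn", which puts the model's 0.87 accuracy in context.

```python
# Churn confusion matrix printed above: rows = actual class, columns = predicted class
cm = [[1126, 12],
      [158, 38]]


def per_class_scores(cm, k):
    """Precision, recall, F1 and support for class k, treating k as the positive class."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column total: everything predicted as k
    actual_k = sum(cm[k])                    # row total: support of class k
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)  # assumes precision + recall > 0
    return precision, recall, f1, actual_k


scores = [per_class_scores(cm, k) for k in range(len(cm))]
total = sum(s[3] for s in scores)

for k, (p, r, f, n) in enumerate(scores):
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={n}")

# Macro average: unweighted mean of each metric over the classes
macro = [sum(s[i] for s in scores) / len(scores) for i in range(3)]
# Weighted average: mean of each metric weighted by class support
weighted = [sum(s[i] * s[3] for s in scores) / total for i in range(3)]
print("macro avg:   ", [round(v, 2) for v in macro])
print("weighted avg:", [round(v, 2) for v in weighted])

# Baseline that always predicts the majority class (0 = no churn)
baseline_accuracy = max(s[3] for s in scores) / total
print(f"majority-class baseline accuracy: {baseline_accuracy:.2f}")
```

With these counts the sketch reproduces the report: macro avg 0.82 / 0.59 / 0.62 and weighted avg 0.86 / 0.87 / 0.84. The always-"no churn" baseline scores 0.85, so the KNN model is only about two points more accurate than predicting that nobody churns. Note also that the weighted-average recall always equals accuracy, because it reduces to the total number of correct predictions divided by the number of samples.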



### Diabetes Data

The goal is to predict whether each individual is likely to have diabetes based on two features: body mass index (BMI) and age (in years). The target takes only two values, so this is a binary classification problem: a target value of 0 indicates that the individual does not have diabetes, while a value of 1 indicates that the individual does have diabetes.

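```python
diabetes_df = pd.read_csv('Data/diabetes_clean.csv')
print(diabetes_df.shape)
diabetes_df.head()
```

```
(768, 9)
```

|   | pregnancies | glucose | diastolic | triceps | insulin | bmi | dpf | age | diabetes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |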


```python
# Use only two features: body mass index and age
X = diabetes_df[['bmi', 'age']].values
y = diabetes_df['diabetes'].values

# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# y_test is the ground truth: the actual labels of the test observations.
# It is the reference against which the predictions (y_pred) are compared.
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

```
[[117  34]
 [ 47  33]]
              precision    recall  f1-score   support

           0       0.71      0.77      0.74       151
           1       0.49      0.41      0.45        80

    accuracy                           0.65       231
   macro avg       0.60      0.59      0.60       231
weighted avg       0.64      0.65      0.64       231
```
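Individual metrics are also available as separate functions in `sklearn.metrics`. Because the notebook's real `y_test` and `y_pred` are not reproduced here, this sketch builds stand-in label lists whose confusion matrix matches the diabetes output above (`[[117, 34], [47, 33]]`); they are not the actual test labels, only lists with the same counts. The `pos_label` argument selects which class is treated as positive.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Stand-in labels with the same counts as the diabetes confusion matrix:
# 151 actual non-diabetic (117 predicted 0, 34 predicted 1)
# 80 actual diabetic (47 predicted 0, 33 predicted 1)
y_true = [0] * 151 + [1] * 80
y_pred = [0] * 117 + [1] * 34 + [0] * 47 + [1] * 33

print(confusion_matrix(y_true, y_pred))

# Scores with the diabetic class (1) as the positive class
print("precision:", round(precision_score(y_true, y_pred, pos_label=1), 2))
print("recall:   ", round(recall_score(y_true, y_pred, pos_label=1), 2))
print("f1:       ", round(f1_score(y_true, y_pred, pos_label=1), 2))

# The same metrics with the non-diabetic class (0) as the positive class
print("precision (class 0):", round(precision_score(y_true, y_pred, pos_label=0), 2))
print("recall (class 0):   ", round(recall_score(y_true, y_pred, pos_label=0), 2))
print("accuracy:", round(accuracy_score(y_true, y_pred), 2))
```

With `pos_label=1` the functions give the class-1 row of the report (0.49, 0.41, 0.45); switching to `pos_label=0` gives the class-0 row (precision 0.71, recall 0.77). Each row of `classification_report` is simply the binary metrics computed with that row's class as the positive one.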



- True Positive (TP): instances where both y_test and y_pred are positive.

- True Negative (TN): instances where both y_test and y_pred are negative.

- False Positive (FP): instances where y_test is negative, but y_pred is positive.

- False Negative (FN): instances where y_test is positive, but y_pred is negative.

The columns of the classification report are defined as follows (each row of the report treats its own class as the positive one):

- Precision: the ratio of correctly predicted positive observations to all predicted positives, TP / (TP + FP). It measures how reliable the positive predictions are.

- Recall (Sensitivity or True Positive Rate): the ratio of correctly predicted positive observations to all observations in the actual positive class, TP / (TP + FN). It measures the model's ability to capture all the positives.

- F1-score: the harmonic mean of precision and recall. It is high only when both precision and recall are high, so it balances the two.

- Support: the number of actual occurrences of the class in the evaluated data (here, y_test).
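To see why the F1-score "balances" precision and recall, compare the harmonic mean with the plain arithmetic mean for the churn class of the churn model (TP = 38, FP = 12, FN = 158 from its confusion matrix above). This sketch uses only Python's standard `statistics` module. The harmonic mean is pulled toward the smaller of the two values, so a model cannot earn a high F1 by excelling at only one of them.

```python
from statistics import harmonic_mean, mean

# Churn class (1) from the churn confusion matrix: TP = 38, FP = 12, FN = 158
precision = 38 / (38 + 12)
recall = 38 / (38 + 158)

print(f"arithmetic mean:    {mean([precision, recall]):.2f}")
print(f"F1 (harmonic mean): {harmonic_mean([precision, recall]):.2f}")

# F1 can also be written directly in terms of the counts: 2TP / (2TP + FP + FN)
print(f"F1 from counts:     {2 * 38 / (2 * 38 + 12 + 158):.2f}")
```

The arithmetic mean (0.48) would suggest a middling model, while the harmonic mean (0.31, matching the report) exposes the very low recall.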