[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/francisco-ortin/data-science-course/blob/main/classification/metrics.ipynb)
[![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)

# Evaluation metrics for classification 

This example notebook shows how to use common evaluation metrics for classification.

We use the [Breast Cancer Wisconsin (Diagnostic) Data Set](https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data) stored in `data/cancer.csv`.

The dataset features relevant to this notebook are listed below (the models are trained on `radius_mean` and `texture_mean` only):
- diagnosis: target variable (M = malignant, B = benign).
- radius_mean: average size of the nuclei.
- texture_mean: variability in the intensity of pixel values.
- smoothness_mean: regularity of the nuclei surfaces.
- compactness_mean: compactness of the nuclei, indicating shape irregularities.

In [88]:
# make sure the required packages are installed
%pip install pandas seaborn matplotlib scikit-learn --quiet
# if running in Colab, clone the course repository and copy the necessary files
directory='data-science-course/classification'
if get_ipython().__class__.__module__.startswith('google.colab'):
    !git clone --depth 1 https://github.com/francisco-ortin/data-science-course.git  2>/dev/null
    !cp --update {directory}/*.py .
    !mkdir -p img data
    !cp {directory}/data/* data/.
# import the required modules
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

import utils
import models

random_state = 42

Note: you may need to restart the kernel to use updated packages.


## Dataset

Load the dataset, encode the target as 0/1, show the percentage of malignant samples, then split the data into training and test sets and scale the features.
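
The helpers `utils.split_dataset` and `utils.scale_X_dataset` are not shown in this notebook, so their exact behaviour is unknown here. As a rough sketch of the same steps using scikit-learn directly, assuming a plain random 80/20 split and a scaler fitted on the training set only, something like the following could be used. The function name `split_and_scale` and the toy DataFrame are made up for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_and_scale(df, feature_cols, target_col, test_size, seed):
    """Hypothetical stand-in for the notebook's utils helpers."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        df[feature_cols], df[target_col], test_size=test_size, random_state=seed)
    scaler = StandardScaler()
    # fit on the training split only, so no test-set statistics leak into training
    X_tr_scaled = scaler.fit_transform(X_tr)
    X_te_scaled = scaler.transform(X_te)
    return (X_tr_scaled, y_tr), (X_te_scaled, y_te)


# toy data, invented for this example (not taken from data/cancer.csv)
toy = pd.DataFrame({
    'radius_mean': [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 28.0],
    'texture_mean': [15.0, 18.0, 14.0, 20.0, 22.0, 19.0, 25.0, 23.0, 27.0, 30.0],
    'diagnosis': ['B', 'B', 'B', 'B', 'B', 'B', 'M', 'M', 'M', 'M'],
})
toy['diagnosis'] = toy['diagnosis'].map({'M': 1, 'B': 0})
(X_tr, y_tr), (X_te, y_te) = split_and_scale(
    toy, ['radius_mean', 'texture_mean'], 'diagnosis', 0.2, 42)
print(X_tr.shape, X_te.shape)
```

Fitting the scaler on the training split alone means the training features end up with zero mean and unit variance, while the test features are transformed with the same parameters.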

In [89]:
dataset_file_name = 'data/cancer.csv'
independent_vars = ['radius_mean', 'texture_mean']
dependent_var = 'diagnosis'
# read the dataset
dataset = pd.read_csv(dataset_file_name)
# replace diagnosis values with 0 and 1
dataset[dependent_var] = dataset[dependent_var].map({'M': 1, 'B': 0})
# Describe the dependent variable of the dataset
print(f"Percentage of positive labels: {dataset[dependent_var].mean() * 100:.2f}%.")
# Split the dataset into training and testing sets
(X_train, y_train), (X_test, y_test) = utils.split_dataset(dataset, independent_vars,
                                                           dependent_var, 0.2, random_state)
# Scale the dataset with a Standard Scaler
X_train, X_test = utils.scale_X_dataset(X_train, X_test, StandardScaler)

Percentage of positive labels: 37.26%.


## Classification models

We create the following classifiers:
- AlwaysTrue: always predicts the positive class (malignant).
- AlwaysFalse: always predicts the negative class (benign).
- LR_0_5: logistic regression model with a threshold of 0.5.
- LR_0_8: logistic regression model with a threshold of 0.8.
- LR_0_2: logistic regression model with a threshold of 0.2.
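
The course module `models.py` is not reproduced here. To make the idea of a decision threshold concrete, the sketch below shows one way such a wrapper could work, assuming the wrapped estimator exposes `predict_proba`. The class name `SimpleThresholdModel` and the toy data are hypothetical; the real `models.ThresholdModel` may be implemented differently.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


class SimpleThresholdModel:
    """Hypothetical wrapper; the course's models.ThresholdModel may differ."""

    def __init__(self, model, threshold):
        self.model = model
        self.threshold = threshold

    def predict(self, X):
        # probability of the positive class (column 1 of predict_proba)
        positive_proba = self.model.predict_proba(X)[:, 1]
        # predict 1 when that probability reaches the threshold, 0 otherwise
        return (positive_proba >= self.threshold).astype(int)


# toy one-feature data, made up for illustration
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
base = LogisticRegression().fit(X, y)

counts = []  # number of positive predictions per threshold
for threshold in (0.2, 0.5, 0.8):
    predictions = SimpleThresholdModel(base, threshold).predict(X)
    counts.append(int(predictions.sum()))
    print(threshold, counts[-1])
```

The same fitted model is shared by all wrappers; only the cut-off applied to its predicted probabilities changes.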

In [90]:
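# import the classes explicitly: the list below rebinds the name `models`,
# so `models.AlwaysTrue()` would fail if this cell were run a second time
from models import AlwaysTrue, AlwaysFalse, ThresholdModel

# fit one logistic regression and share it among the three threshold models
lr_model = LogisticRegression(random_state=random_state)
lr_model.fit(X_train, y_train)
always_true_model = AlwaysTrue()
always_false_model = AlwaysFalse()
lr_0_5_model = ThresholdModel(lr_model, 0.5)
lr_0_8_model = ThresholdModel(lr_model, 0.8)
lr_0_2_model = ThresholdModel(lr_model, 0.2)
# list of models (kept under the name `models` because the cells below use it)
models = [always_true_model, always_false_model, lr_0_5_model, lr_0_8_model, lr_0_2_model]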

## ✨ Questions ✨ about accuracy

1. Which model achieves the higher accuracy: always_true_model or always_false_model?
2. Which threshold, 0.8 or 0.2, gives the higher accuracy for the logistic regression model?
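
As a reminder of what is being measured, accuracy is the fraction of predictions that match the true labels. The sketch below checks a manual computation against `accuracy_score` on invented labels that are unrelated to the cancer dataset.

```python
from sklearn.metrics import accuracy_score

# toy labels, made up for illustration
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

# accuracy = number of correct predictions / number of samples
correct = sum(t == p for t, p in zip(y_true, y_pred))
manual_accuracy = correct / len(y_true)
sklearn_accuracy = accuracy_score(y_true, y_pred)
print(manual_accuracy, sklearn_accuracy)
```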

### Answers

*Write your answers here.*



In [91]:
# Uncomment the following code and see if your answers are correct
#accuracies = utils.compute_metrics(models, X_test, y_test, accuracy_score)
#utils.show_metrics('Accuracy', accuracies, models)

## ✨ Questions ✨ about precision and recall

3. What will the precision and recall of always_false_model be?
4. What will the precision and recall of always_true_model be?
5. Which threshold gives the best precision?
6. Which threshold gives the best recall?
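
For reference, precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP and FN are the counts of true positives, false positives and false negatives. The sketch below derives both from a confusion matrix on invented labels and compares them with scikit-learn's functions.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# toy labels, made up for illustration
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]

# for binary 0/1 labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
manual_precision = tp / (tp + fp)  # of the predicted positives, how many are right
manual_recall = tp / (tp + fn)     # of the actual positives, how many are found
print(manual_precision, precision_score(y_true, y_pred))
print(manual_recall, recall_score(y_true, y_pred))
```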

### Answers

*Write your answers here.*



In [92]:
# Uncomment the following code and see if your answers are correct
#precisions = utils.compute_metrics(models, X_test, y_test, precision_score)
#recalls = utils.compute_metrics(models, X_test, y_test, recall_score)
#utils.show_metrics('Precision', precisions, models)
#utils.show_metrics('Recall', recalls, models)

## ✨ Questions ✨ about F1 score

7. What is the F1 score of always_false_model?
8. What is the F1 score of always_true_model?
9. Which threshold gives the best F1 score?
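
The F1 score is the harmonic mean of precision and recall, F1 = 2 · P · R / (P + R). The sketch below verifies this identity against `f1_score` on invented labels.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# toy labels, made up for illustration
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
# harmonic mean of precision and recall
manual_f1 = 2 * precision * recall / (precision + recall)
sklearn_f1 = f1_score(y_true, y_pred)
print(manual_f1, sklearn_f1)
```

Because it is a harmonic mean, the F1 score is pulled towards the smaller of the two values.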

### Answers

*Write your answers here.*


In [93]:
# Uncomment the following code and see if your answers are correct
#f1_scores = utils.compute_metrics(models, X_test, y_test, f1_score)
#utils.show_metrics('F1-score', f1_scores, models)

## ✨ Questions ✨ about AUC

10. What is the AUC of always_true_model?
11. What is the AUC of always_false_model?
12. Does it make sense to compute the AUC of the logistic regression model for different thresholds?
13. What will the AUC of the logistic regression model be?
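
The helper `utils.compute_AUCs` is not shown here. As background, the area under the ROC curve can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties counting as one half). The sketch below checks that interpretation against `roc_auc_score` on invented scores.

```python
from itertools import product
from sklearn.metrics import roc_auc_score

# toy labels and scores, made up for illustration
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

positives = [s for s, t in zip(scores, y_true) if t == 1]
negatives = [s for s, t in zip(scores, y_true) if t == 0]
# count positive/negative pairs ranked correctly; a tie counts as half
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
           for p, n in product(positives, negatives))
manual_auc = wins / (len(positives) * len(negatives))
sklearn_auc = roc_auc_score(y_true, scores)
print(manual_auc, sklearn_auc)
```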

### Answers

*Write your answers here.*



In [94]:
# Uncomment the following code and see if your answers are correct
#auc_models = [always_true_model, always_false_model, lr_0_5_model]
#auc_scores = utils.compute_AUCs(auc_models, X_test, y_test)
#utils.show_metrics('AUC score', auc_scores, auc_models)