# Metrics

Hello again, welcome to this chapter on metrics in scikit-learn.

Among its many tools, scikit-learn provides functions for measuring how good a model's predictions are.

Metrics play a crucial role in the evaluation and selection of machine learning models. These metrics allow us to quantify the performance of our models in terms of accuracy, robustness, and generalization. At the same time, they also allow us to communicate the performance of the models to others outside our machine learning team.

scikit-learn offers more than 30 metrics, which are too many to cover in a single chapter, but the concepts and code you will see in this chapter can be applied to all of them since they follow a very similar interface.

## Classification metrics

First, let's create a small dataset that we'll use to test the classification metrics:

In [None]:
import numpy as np

y_pred = np.array(
    [0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0]
)

y_true = np.array(
    [1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0]
)
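
Every metric in this section can be written in terms of four counts: true positives, true negatives, false positives, and false negatives. As a minimal sketch, they can be counted with NumPy boolean masks. The variable names `tp`, `tn`, `fp`, and `fn` are illustrative, and label 1 is assumed to be the positive class:

```python
import numpy as np

y_pred = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0])
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0])

# Assumption: label 1 is the positive class and label 0 the negative class.
tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # predicted 1, actually 1
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # predicted 0, actually 0
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # predicted 1, actually 0
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # predicted 0, actually 1

print(tp, tn, fp, fn)  # 5 5 3 1
```

scikit-learn's `confusion_matrix`, also in `sklearn.metrics`, reports these same four counts in a single array.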


### Accuracy

Accuracy is the proportion of correct predictions out of all predictions made. It's an easy-to-understand metric, but it can be misleading when the data is imbalanced.

The formula is as follows:

$$
  \frac{{True\ positives}\ +\ {True\ negatives}}{Total\ number\ of\ predictions}
$$

In other words, it counts the positions where the predicted label matches the true label and divides that count by the total number of predictions.

To use it in scikit-learn, you need to import it from `sklearn.metrics`:

In [None]:
from sklearn.metrics import accuracy_score

accuracy_score(y_true, y_pred)
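
To illustrate why accuracy can be misleading on imbalanced data, here is a small sketch with a hypothetical dataset (the arrays `y_true_imbalanced` and `y_pred_majority` are made up for this example and are not part of the chapter's data). A "model" that always predicts the majority class scores high accuracy while finding none of the positive cases:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced dataset: 95 negative cases and 5 positive ones.
y_true_imbalanced = np.array([0] * 95 + [1] * 5)

# A "model" that always predicts the majority class (0).
y_pred_majority = np.zeros_like(y_true_imbalanced)

acc = accuracy_score(y_true_imbalanced, y_pred_majority)
rec = recall_score(y_true_imbalanced, y_pred_majority)

print(acc)  # 0.95
print(rec)  # 0.0
```

The 95% accuracy only reflects the class distribution: every one of the five positive cases is missed.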


The closer the result is to 1, the better. Remember, though, that in some problems accuracy is not the best metric for evaluating your model's performance.

### Precision

Precision, also known as positive predictive value, measures how many of the cases the model predicted as positive are actually positive. Formally, it is defined as the number of true positives divided by the sum of true positives and false positives. Here, a value close to 1 is ideal.

$$
  \frac{{True\ positives}}{{True\ positives}\ +\ {False\ positives}}
$$

In [None]:
from sklearn.metrics import precision_score

precision_score(y_true, y_pred)
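
As a sanity check, the definition of precision (true positives divided by the sum of true positives and false positives) can be applied by hand with NumPy. This is only a sketch of the arithmetic, assuming label 1 is the positive class; the name `manual_precision` is illustrative:

```python
import numpy as np

y_pred = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0])
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0])

# Only the cases predicted as positive matter for precision.
predicted_positive = y_pred == 1

tp = int(np.sum(predicted_positive & (y_true == 1)))  # correct positive predictions
fp = int(np.sum(predicted_positive & (y_true == 0)))  # wrong positive predictions

manual_precision = tp / (tp + fp)
print(manual_precision)  # 0.625
```

Of the 8 cases predicted as positive, 5 are truly positive, which gives 5 / 8 = 0.625.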


Precision is especially useful when the cost of a false positive is high, that is, when avoiding false positives matters more than avoiding false negatives. For example, think about a system that detects spam messages: here, precision is the most appropriate measure, since it's crucial to minimize the number of false positives, in this case legitimate emails flagged as spam. Failing to do so could cause our users to miss important information.

### Recall

The *recall* metric, also known as sensitivity or true positive rate, measures the model's ability to identify all positive cases. Formally, it's defined as the number of true positives divided by the sum of true positives and false negatives. Here, a value close to 1 is ideal.

$$
  \frac{{True\ positives}}{{True\ positives}\ +\ {False\ negatives}}
$$

In [None]:
from sklearn.metrics import recall_score

recall_score(y_true, y_pred)
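
A high recall on its own does not guarantee a useful model. As a hedged illustration, the sketch below uses a hypothetical "model" that labels every case as positive (the array `y_pred_all_ones` is made up for this example). It finds every positive case, so its recall is perfect, but its precision suffers:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0])

# Hypothetical predictions: every case is labeled as positive.
y_pred_all_ones = np.ones_like(y_true)

rec = recall_score(y_true, y_pred_all_ones)
prec = precision_score(y_true, y_pred_all_ones)

print(rec)             # 1.0
print(round(prec, 4))  # 0.4286
```

There are no false negatives, so recall is 1.0, while only 6 of the 14 positive predictions are correct.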


Unlike precision, recall is especially useful when the cost of a false negative is high, that is, when avoiding false negatives matters more than avoiding false positives. Think about early cancer detection: it is important to minimize the number of false negatives (cancer patients who are not identified), as missing them can delay treatment and put the person's life at risk. In this case, recall is more important than precision.

### F1 score

The F1 score, also known as the F1 measure, is a commonly used metric for evaluating the quality of a classification model. It combines precision and recall into a single number.

The F1 score is the harmonic mean of precision and recall. Its values range between 0 and 1, with 1 being the best possible result.

The formula that represents it is as follows:

$$
  2\ \times\ \frac{{Precision}\ \times\ {Recall}}{{Precision}\ +\ {Recall}}
$$

In [None]:
from sklearn.metrics import f1_score

f1_score(y_true, y_pred)
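
To make the "harmonic mean" part concrete, the sketch below computes the F1 score by hand from precision and recall and compares it with their plain arithmetic mean. It assumes label 1 is the positive class, and the variable names are illustrative:

```python
import numpy as np

y_pred = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0])
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0])

tp = int(np.sum((y_pred == 1) & (y_true == 1)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))

precision = tp / (tp + fp)  # 5 / 8
recall = tp / (tp + fn)     # 5 / 6

# Harmonic mean of precision and recall.
manual_f1 = 2 * precision * recall / (precision + recall)

# Arithmetic mean, shown only for comparison.
arithmetic_mean = (precision + recall) / 2

print(round(manual_f1, 4))        # 0.7143
print(round(arithmetic_mean, 4))  # 0.7292
```

The harmonic mean is never larger than the arithmetic mean, and it drops sharply when either precision or recall is low, which is why a model cannot reach a high F1 score by doing well on only one of the two.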


As F1 is a combination of the two metrics, it can be used in various scenarios to measure the overall performance of the model. It's ideal when a balance between being precise and retrieving all possible cases must be found. In the end, the metric to choose depends entirely on the problem you're trying to solve.
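
Because the four functions used in this chapter share the same `(y_true, y_pred)` call pattern, they can be evaluated together in a single loop. The following is a minimal sketch; the `metrics` and `results` dictionaries are illustrative helpers, not part of scikit-learn:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_pred = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0])
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0])

# Illustrative mapping from a display name to a metric function.
metrics = {
    "accuracy": accuracy_score,
    "precision": precision_score,
    "recall": recall_score,
    "f1": f1_score,
}

# Each function takes the true labels first and the predictions second.
results = {name: float(metric(y_true, y_pred)) for name, metric in metrics.items()}

for name, value in results.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.714
# precision: 0.625
# recall: 0.833
# f1: 0.714
```

Comparing the metrics side by side makes it easier to see which kind of error the model makes and to pick the metric that matches the cost of that error in your problem.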