# Visualizations

Hello, it's great to see you here. This time, we're going to talk a bit about visualizing our model's performance.

Data visualization isn't only for exploratory data analysis before training a model; it's also a very useful tool for understanding the performance of our models in a more intuitive and easy-to-interpret way. Fortunately, scikit-learn offers us several utilities to visualize the results of our models.

The visualizations I'm going to tell you about apply specifically to classification models.

To create these visualizations, you need to have a model already trained, so that's what I'm doing in the next few cells.

In [None]:
# Create a dataset
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=1000, random_state=42, noise=0.40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

print(f"Total number of samples: {len(X)}")
print(f"Samples in the test set: {len(X_test)}")


In [None]:
# Visualize the dataset
import matplotlib.pyplot as plt

plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")


In [None]:
# Create a basic classifier for this lesson
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)


## Confusion matrix

A confusion matrix is used to evaluate the performance of a classification model. It is a square matrix with one row per actual class and one column per predicted class. For a binary problem, its four cells count the true negatives (TN), false positives (FP), false negatives (FN), and true positives (TP) in a test dataset.

In [None]:
from sklearn.metrics import ConfusionMatrixDisplay

ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)


```plain
         Predicted 0 | Predicted 1
Actual 0     TN      |     FP
Actual 1     FN      |     TP
```

The confusion matrix works by comparing the model's predictions with the actual class labels in the test dataset. The main diagonal of the confusion matrix shows the true positives and true negatives, which are the correct predictions of the model. The other entries in the matrix show the false positives and false negatives, which are the incorrect predictions of the model.

Use it to see which type of error your model is making: false positives or false negatives. These errors can have very different consequences depending on the problem being solved, so the confusion matrix tells you more about the model's performance than a single accuracy number.
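If you want the numbers behind the plot, here is a minimal sketch using `confusion_matrix` from `sklearn.metrics`. It rebuilds the same dataset and model as above; the `random_state=42` on the classifier is my own addition so the counts are repeatable. In scikit-learn, rows are the actual labels and columns are the predicted labels, so for a binary problem `ravel()` returns the cells in the order TN, FP, FN, TP.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Same setup as earlier in the lesson (random_state on the model is an assumption)
X, y = make_moons(n_samples=1000, random_state=42, noise=0.40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Rows = actual class, columns = predicted class
cm = confusion_matrix(y_test, model.predict(X_test))

# For a binary problem the flattened order is TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

The four counts always add up to the number of test samples, which is a quick sanity check when you read the matrix.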

## Visualization of decision boundaries

The `DecisionBoundaryDisplay` class from scikit-learn is a useful tool for visualizing the decision boundaries of a classification model. It lets us see the regions of the feature space where the model predicts each class, which helps us better understand how the model is making classifications.

Visualization of decision boundaries is particularly useful in cases where classes cannot be perfectly separated in the feature space. In these cases, the model may have difficulties making accurate classifications, and the decision boundaries may be irregular or complex.

In [None]:
from sklearn.inspection import DecisionBoundaryDisplay

DecisionBoundaryDisplay.from_estimator(
    estimator=model,
    X=X,
    alpha=0.5,
)

plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
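
To get a feel for what this kind of plot represents, here is a rough sketch of the idea done by hand: build a grid of points over the feature space and ask the model for a prediction at each one. This is only an illustration of the concept, not scikit-learn's actual implementation. The 100 x 100 resolution, the margin of 1 around the data, and `random_state=42` on the classifier are my own assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Same setup as earlier in the lesson (random_state on the model is an assumption)
X, y = make_moons(n_samples=1000, random_state=42, noise=0.40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Regular grid covering both features, with a margin of 1 (arbitrary choice)
x0 = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 100)
x1 = np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 100)
xx0, xx1 = np.meshgrid(x0, x1)

# One row per grid point, one column per feature
grid = np.c_[xx0.ravel(), xx1.ravel()]

# Predicted class at every grid point, reshaped back to the grid layout
zz = model.predict(grid).reshape(xx0.shape)

print(zz.shape)  # (100, 100)
```

Passing `xx0`, `xx1`, and `zz` to `plt.contourf` would color the regions where each class is predicted, similar in spirit to the display above.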


## Precision-recall curve

The precision-recall curve is a useful tool for evaluating the performance of a classification model in terms of precision and recall for different classification thresholds. The curve is generated by plotting precision on the *y*-axis and recall on the *x*-axis for different classification thresholds.

In [None]:
from sklearn.metrics import PrecisionRecallDisplay

PrecisionRecallDisplay.from_estimator(model, X_test, y_test)


In other words, it shows the *tradeoff* between precision and recall, where a large area under the curve represents both high recall and high precision. High precision means few false positives among the samples predicted as positive, and high recall means few false negatives among the samples that are actually positive.

This graph is particularly useful when our dataset is imbalanced, because neither precision nor recall counts true negatives, which dominate when the positive class is rare.
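The points on this curve can also be computed directly. Here is a minimal sketch with `precision_recall_curve` and `average_precision_score` from `sklearn.metrics`, using the predicted probability of class 1 as the score. The dataset and model repeat the setup above, and `random_state=42` on the classifier is my own addition.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Same setup as earlier in the lesson (random_state on the model is an assumption)
X, y = make_moons(n_samples=1000, random_state=42, noise=0.40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Score each test sample with the predicted probability of the positive class
scores = model.predict_proba(X_test)[:, 1]

# One (precision, recall) pair per threshold, plus a final point at recall 0
precision, recall, thresholds = precision_recall_curve(y_test, scores)

# Average precision summarizes the curve in a single number
ap = average_precision_score(y_test, scores)

print(f"Thresholds evaluated: {len(thresholds)}")
print(f"Average precision: {ap:.3f}")
```

Each threshold is a candidate cutoff on the predicted probability, so walking through `thresholds` shows how precision and recall move in opposite directions as the cutoff changes.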

## ROC curve

The ROC (Receiver Operating Characteristic) curve is a useful visualization for evaluating the performance of a classification model in terms of its ability to distinguish between classes.

The curve is generated by plotting the true positive rate (TPR) on the *y*-axis and the false positive rate (FPR) on the *x*-axis for different classification thresholds.

In [None]:
from sklearn.metrics import RocCurveDisplay

RocCurveDisplay.from_estimator(model, X_test, y_test)


This lets us evaluate how well a classification model distinguishes between classes, even when class distributions are unequal. It is useful because it shows the trade-off between the true positive rate and the false positive rate as the threshold changes. An ideal classifier sits in the upper-left corner of the graph, with a true positive rate of 1 and a false positive rate of 0, meaning it distinguishes perfectly between the two classes. A classifier that guesses at random falls along the diagonal.
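As with the previous curve, the underlying values are available as arrays. Here is a minimal sketch with `roc_curve` and `roc_auc_score` from `sklearn.metrics`. The dataset and model repeat the setup above, and `random_state=42` on the classifier is my own addition.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Same setup as earlier in the lesson (random_state on the model is an assumption)
X, y = make_moons(n_samples=1000, random_state=42, noise=0.40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Score each test sample with the predicted probability of the positive class
scores = model.predict_proba(X_test)[:, 1]

# False positive rate and true positive rate at each threshold
fpr, tpr, thresholds = roc_curve(y_test, scores)

# Area under the ROC curve: 0.5 is random guessing, 1.0 is perfect separation
auc = roc_auc_score(y_test, scores)

print(f"Points on the curve: {len(fpr)}")
print(f"ROC AUC: {auc:.3f}")
```

The curve always starts at (0, 0) and ends at (1, 1); what matters is how close it gets to the upper-left corner in between.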

## Conclusion

In summary, visualizations are a powerful tool for understanding and communicating the results of our models more effectively. Scikit-learn offers us several functions to visualize the results of our classification and regression models, and it's important to consider these tools when evaluating and presenting our model results.

In the next chapter, we'll discuss what is perhaps the most famous function in scikit-learn. Join me.