# Classification Matrix and Classification Report

In machine learning, classification is a supervised learning task where the goal is to predict the class labels of new instances based on labeled past observations. Classification metrics quantify how well a classification model performs.

In this notebook, we'll explore two tools for evaluating classifiers: the classification matrix (more commonly called the confusion matrix) and the classification report. We'll use a simple example to illustrate both.

In [ ]:
# Importing necessary libraries
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix, classification_report

### Example Dataset

Let's consider a simple example of a binary classification problem where we predict whether an email is spam or not spam (ham) based on two features: length of the email and number of exclamation marks.

In [ ]:
# Example dataset
X = np.array([[10, 2], [20, 1], [15, 3], [8, 0], [25, 2]])  # Features: length, exclamation marks
y_true = np.array([1, 0, 1, 0, 1])  # True labels: 1 (spam), 0 (ham)

# Sample predictions (you can replace this with actual predictions from your model).
# Note: these match y_true exactly, so every metric computed below comes out perfect.
y_pred = np.array([1, 0, 1, 0, 1])  # Predicted labels

### Classification Matrix

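A classification matrix, more commonly known as a confusion matrix, is a table that summarizes the performance of a classification model on a set of test data for which the true values are known. Each cell counts the instances with a given combination of actual and predicted class. In scikit-learn's `confusion_matrix`, rows correspond to actual classes and columns to predicted classes, so correct predictions lie on the diagonal and errors lie off it. Let's calculate the classification matrix for our example.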

In [ ]:
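# Calculate classification matrix (rows: actual classes, columns: predicted classes)
# labels=[0, 1] fixes the class order to ham, spam and guarantees a 2x2 result
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
cm_df = pd.DataFrame(cm,
                     columns=['Predicted Ham', 'Predicted Spam'],
                     index=['Actual Ham', 'Actual Spam'])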

# Display classification matrix
cm_df

### Classification Report

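A classification report summarizes the quality of predictions from a classification algorithm. For each class it gives precision (the fraction of instances predicted as that class that truly belong to it), recall (the fraction of instances of that class that were correctly predicted), F1-score (the harmonic mean of precision and recall), and support (the number of true instances of that class). Let's calculate the classification report for our example.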

In [ ]:
# Calculate classification report
cr = classification_report(y_true, y_pred)

# Display classification report
print(cr)
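
The printed report is a formatted string, which is convenient to read but awkward to use in later code. As a minimal sketch, assuming the same labels as above, `classification_report` can also return a nested dictionary via `output_dict=True`. The `target_names` values (`"ham"`, `"spam"`) are our own display names for labels 0 and 1, not something the data defines.

```python
import numpy as np
from sklearn.metrics import classification_report

y_true = np.array([1, 0, 1, 0, 1])  # same true labels as above
y_pred = np.array([1, 0, 1, 0, 1])  # same sample predictions as above

# output_dict=True returns nested dicts instead of a formatted string.
# target_names are display names for the sorted labels 0 and 1 (our choice).
report = classification_report(
    y_true, y_pred, target_names=["ham", "spam"], output_dict=True
)

# Individual metrics can now be looked up by class name and metric name
spam_recall = report["spam"]["recall"]
ham_precision = report["ham"]["precision"]
print(spam_recall, ham_precision)
```

This form is handy when the metrics need to be logged, compared across models, or loaded into a DataFrame.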

### Interpretation

In our example, spam is the positive class (label 1) and ham is the negative class (label 0):
- **True Positives (TP)**: Emails correctly classified as spam (actual spam and predicted spam)
- **False Positives (FP)**: Emails incorrectly classified as spam (actual ham but predicted spam)
- **True Negatives (TN)**: Emails correctly classified as ham (actual ham and predicted ham)
- **False Negatives (FN)**: Emails incorrectly classified as ham (actual spam but predicted ham)
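
The four counts defined above can be read directly out of the matrix. The sketch below uses a hypothetical set of imperfect predictions (`y_pred_imperfect` is made up for illustration and is not the output of any model) so that each of the four cells is non-zero, then derives precision and recall for the spam class from the counts.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 0, 1])  # same true labels as above
# Hypothetical predictions: one ham flagged as spam, one spam missed
y_pred_imperfect = np.array([1, 1, 0, 0, 1])

# For binary labels ordered [0, 1], ravel() flattens the 2x2 matrix
# row by row into TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_imperfect, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)  # 1 1 1 2

# Precision and recall for the positive (spam) class, from the counts
precision = tp / (tp + fp)  # of the emails flagged as spam, how many were spam
recall = tp / (tp + fn)     # of the actual spam emails, how many were caught
print(precision, recall)
```

With the notebook's original `y_pred`, which matches `y_true` exactly, the same code gives TN = 2, FP = 0, FN = 0, TP = 3.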

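From the classification matrix and report, we can see how well our classification model is performing. Because the sample predictions match the true labels exactly, all five emails land on the diagonal of the matrix (3 true positives and 2 true negatives) and every score in the report is 1.00; a real model will usually make some errors. Depending on the application, we may prioritize certain metrics. For example, in spam detection, we may want to minimize false positives (legitimate emails incorrectly classified as spam), so precision would be an important metric. On the other hand, in medical diagnosis, we may want to minimize false negatives (missed diagnoses), so recall would be crucial.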
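
To make the trade-off concrete, here is a small sketch comparing two hypothetical classifiers on the same five emails. Both prediction arrays (`y_pred_cautious` and `y_pred_aggressive`) are invented for illustration; they do not come from a trained model.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([1, 0, 1, 0, 1])  # same true labels as above

# Hypothetical cautious filter: flags spam only when very sure
y_pred_cautious = np.array([1, 0, 0, 0, 0])
# Hypothetical aggressive filter: flags anything suspicious
y_pred_aggressive = np.array([1, 1, 1, 0, 1])

for name, y_pred in [("cautious", y_pred_cautious),
                     ("aggressive", y_pred_aggressive)]:
    p = precision_score(y_true, y_pred)  # TP / (TP + FP)
    r = recall_score(y_true, y_pred)     # TP / (TP + FN)
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
```

The cautious filter never mislabels ham (precision 1.00) but lets two of the three spam emails through (recall 0.33), while the aggressive filter catches all spam (recall 1.00) at the cost of one false alarm (precision 0.75).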