# <font color="#418FDE" size="6.5" uppercase>**Classification Metrics**</font>

>Last update: 20260201.
    
By the end of this Lecture, you will be able to:
- Define accuracy as the fraction of correctly classified examples. 
- Construct simple confusion-style summaries from prediction results. 
- Explain why accuracy alone may be insufficient in some classification problems. 


## **1. Understanding Accuracy**

### **1.1. Correct prediction fraction**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_06/Lecture_B/image_01_01.jpg?v=1769962652" width="250">



>* Accuracy is correct predictions over all predictions
>* Each example is one chance to be right

>* Accuracy offers a simple, everyday way to compare performance
>* Count correct predictions, divide by total cases

>* Accuracy depends on the specific evaluation dataset
>* Different example sets can change measured accuracy
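As a minimal sketch of the definition above, the function below counts matching labels and divides by the total. The labels are hypothetical, and the helper name `accuracy` is chosen for illustration only.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true_labels and predicted_labels must have equal length")
    if len(true_labels) == 0:
        raise ValueError("at least one example is required")
    # Each example is one chance to be right: count the matches.
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)

# Hypothetical labels for five emails.
true_labels = ["spam", "not_spam", "spam", "not_spam", "not_spam"]
predicted_labels = ["spam", "not_spam", "not_spam", "not_spam", "spam"]

print(accuracy(true_labels, predicted_labels))  # 3 correct out of 5 -> 0.6

# The same predictions score differently on a different subset of examples.
print(accuracy(true_labels[:2], predicted_labels[:2]))  # 2 of 2 correct -> 1.0
```

The second call illustrates the last point above: measured accuracy depends on which examples are evaluated.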



### **1.2. Accuracy as Percentage**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_06/Lecture_B/image_01_02.jpg?v=1769962662" width="250">



>* Accuracy is the percentage of correct predictions
>* Percentages give a familiar, easy-to-read summary

>* Percent accuracy makes model comparisons immediately clear
>* Percentages summarize progress and support business decisions

>* Percent accuracy hides large error counts at scale
>* Small percentage changes can greatly impact real outcomes
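To see how a percentage can hide error counts, the sketch below converts percent accuracy into an approximate number of wrong predictions. The one-million-prediction volume is an assumed figure for illustration, and `errors_at_scale` is a hypothetical helper.

```python
def errors_at_scale(accuracy_percent, num_predictions):
    """Convert a percent accuracy into an approximate count of wrong predictions."""
    error_fraction = 1 - accuracy_percent / 100
    return round(error_fraction * num_predictions)

# Converting a correct/total fraction into a percentage.
correct, total = 930, 1000
print(f"Accuracy: {correct / total * 100:.1f}%")

# Assumed volume: one million predictions (hypothetical).
num_predictions = 1_000_000
for acc in (99.0, 99.5):
    print(f"{acc:.1f}% accuracy -> about {errors_at_scale(acc, num_predictions):,} errors")
```

A half-point difference in accuracy here corresponds to roughly 5,000 fewer mistakes.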



### **1.3. Limits of Accuracy**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_06/Lecture_B/image_01_03.jpg?v=1769962675" width="250">



>* High accuracy can hide costly classification mistakes
>* Rare but important cases may be missed

>* Accuracy hides which specific errors are made
>* Can mask unfair treatment of certain groups

>* Accuracy ignores different costs of prediction errors
>* Judging threshold choices and real-world impact requires additional metrics
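A small sketch of how accuracy hides *which* errors occur: the two hypothetical fraud models below have the same accuracy, but one misses a fraud case while the other raises a false alarm. The labels and the `error_types` helper are illustrative assumptions.

```python
from collections import Counter

def error_types(true_labels, predicted_labels):
    """Count each (true, predicted) pair where the prediction is wrong."""
    return Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )

# Hypothetical labels for ten transactions ("fraud" is the rare class).
true_labels = ["fraud", "ok", "ok", "ok", "fraud", "ok", "ok", "ok", "ok", "ok"]
model_a = ["ok", "ok", "ok", "ok", "fraud", "ok", "ok", "ok", "ok", "ok"]       # misses one fraud
model_b = ["fraud", "ok", "fraud", "ok", "fraud", "ok", "ok", "ok", "ok", "ok"]  # one false alarm

for name, preds in [("Model A", model_a), ("Model B", model_b)]:
    correct = sum(t == p for t, p in zip(true_labels, preds))
    acc = correct / len(true_labels)
    print(name, "accuracy:", acc, "errors:", dict(error_types(true_labels, preds)))
```

Both models print an accuracy of 0.9, yet their single errors differ in kind and, in practice, in cost.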



## **2. Classification Error Breakdown**

### **2.1. Predicted Class Counts**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_06/Lecture_B/image_02_01.jpg?v=1769962691" width="250">



>* Count how many predictions fall in each class
>* These counts show how decisions are distributed and guide analysis

>* Predicted label counts reveal imbalance and bias
>* Rarely used labels signal collapsed or missing classes

>* Predicted counts link model decisions to real impacts
>* They guide deeper analysis of important classification errors



In [None]:
#@title Python Code - Predicted Class Counts

# This script explores predicted class counts simply.
# We use tiny example predictions for clarity.
# Focus on counting how often each label appears.

# Import collections for convenient counting tools.
from collections import Counter

# Create a tiny list of predicted labels.
predicted_labels = [
    "spam",
    "not_spam",
    "spam",
    "spam",
    "not_spam",
    "spam",
    "not_spam",
    "not_spam",
]

# Confirm the predictions are stored in a list.
if not isinstance(predicted_labels, list):
    raise TypeError("predicted_labels must be a list here")

# Use Counter to count how often each label appears.
predicted_counts = Counter(predicted_labels)

# Convert counts into a list of (label, count) pairs sorted by label.
sorted_counts = sorted(
    predicted_counts.items(),
    key=lambda pair: pair[0],
)

# Compute the total number of predictions for later percentages.
total_predictions = len(predicted_labels)

# Guard against division by zero in case the list is empty.
if total_predictions == 0:
    raise ValueError("There must be at least one prediction here")

# Print a short header explaining the upcoming summary.
print("Predicted class counts for our tiny spam classifier:")

# Loop through each label and show count and percentage.
for label, count in sorted_counts:
    percentage = (count / total_predictions) * 100.0
    print(
        f"Label '{label}' predicted {count} times, {percentage:.1f}% of predictions."
    )

# Print a final line summarizing the total number of predictions.
print("Total number of predictions considered:", total_predictions)
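Predicted counts become more informative when placed next to the true label counts. The sketch below reuses the same eight predictions but pairs them with hypothetical true labels (the lesson does not provide them) to show a classifier that over-predicts `spam`.

```python
from collections import Counter

# Same eight predictions as in the script above.
predicted_labels = ["spam", "not_spam", "spam", "spam", "not_spam", "spam", "not_spam", "not_spam"]
# Hypothetical true labels (assumed for illustration; not given in the lesson).
true_labels = ["spam", "not_spam", "not_spam", "spam", "not_spam", "not_spam", "not_spam", "not_spam"]

true_counts = Counter(true_labels)
pred_counts = Counter(predicted_labels)

# Include any label appearing in either list; Counter returns 0 for missing keys.
for label in sorted(set(true_counts) | set(pred_counts)):
    gap = pred_counts[label] - true_counts[label]
    print(f"{label}: true={true_counts[label]}, predicted={pred_counts[label]}, gap={gap:+d}")
```

A positive gap means the model assigns that label more often than it actually occurs, a hint of bias worth investigating.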




### **2.2. Per Class Outcomes**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_06/Lecture_B/image_02_02.jpg?v=1769962722" width="250">



>* Look at results for each class separately
>* Compare correct hits, misses, false alarms, and correct rejections

>* Pick one target class, group all others
>* Count correct, missed, and mistaken predictions per class

>* Per class results reveal domain-specific behavior
>* They show strengths, failures, and improvement opportunities



In [None]:
#@title Python Code - Per Class Outcomes

# This script explores per class classification outcomes.
# We manually build a tiny prediction example.
# Then we compute confusion style counts per class.

# Required libraries are already available in Colab.
# No additional installations are necessary here.

# Define tiny true labels for a three class problem.
true_labels = ["bear", "bear", "deer", "fox", "bear", "deer"]

# Define corresponding model predictions for each example.
pred_labels = ["bear", "deer", "deer", "bear", "bear", "fox"]

# Collect the unique classes appearing in true or predicted labels.
classes = sorted(set(true_labels) | set(pred_labels))

# Validate that labels and predictions have equal lengths.
assert len(true_labels) == len(pred_labels)

# Define a function computing per class outcome counts.
def per_class_outcomes(true_list, pred_list, class_name):
    # Initialize counters for four outcome types.
    tp_count = 0
    fn_count = 0
    fp_count = 0
    tn_count = 0

    # Loop through all paired true and predicted labels.
    for true_value, pred_value in zip(true_list, pred_list):
        # Check if current example truly belongs to class.
        is_true_class = true_value == class_name
        # Check if prediction assigns current example to class.
        is_pred_class = pred_value == class_name

        # Update counts based on true and predicted membership.
        if is_true_class and is_pred_class:
            tp_count += 1
        elif is_true_class and not is_pred_class:
            fn_count += 1
        elif (not is_true_class) and is_pred_class:
            fp_count += 1
        else:
            tn_count += 1

    # Return a dictionary summarizing the four outcomes.
    return {"TP": tp_count, "FN": fn_count, "FP": fp_count, "TN": tn_count}

# Print a short header describing the tiny dataset.
print("True labels:", true_labels, "Predicted labels:", pred_labels)

# Loop over each class and display its outcome summary.
for current_class in classes:
    # Compute outcomes for the current focus class.
    outcomes = per_class_outcomes(true_labels, pred_labels, current_class)

    # Nicely format the per class confusion style summary.
    print("Class", current_class, "->", outcomes)




### **2.3. Finding Systematic Errors**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_06/Lecture_B/image_02_03.jpg?v=1769962759" width="250">



>* Use confusion summaries to spot repeated misclassifications
>* Scan rows and columns to locate error clusters

>* Relate confusion cells to real-world class meanings
>* Use patterns to find and prioritize serious errors

>* Inspect examples from confusion cells to diagnose causes
>* Use insights to improve data and model
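One way to scan a confusion summary for repeated mistakes is to count every (true, predicted) pair and then look for the largest off-diagonal cell. The sketch below assumes hypothetical animal labels; rows are true classes and columns are predicted classes.

```python
from collections import Counter

# Hypothetical labels: an animal classifier that often confuses deer with fox.
true_labels = ["bear", "deer", "deer", "fox", "deer", "bear", "fox", "deer", "bear", "fox"]
pred_labels = ["bear", "fox", "fox", "fox", "deer", "bear", "fox", "fox", "deer", "fox"]

classes = sorted(set(true_labels) | set(pred_labels))
pair_counts = Counter(zip(true_labels, pred_labels))

# Print a confusion matrix: rows are true classes, columns are predicted classes.
print("true\\pred".ljust(10) + "".join(c.ljust(6) for c in classes))
for t in classes:
    print(t.ljust(10) + "".join(str(pair_counts[(t, p)]).ljust(6) for p in classes))

# Find the most frequent off-diagonal cell (the most common mistake).
errors = {pair: n for pair, n in pair_counts.items() if pair[0] != pair[1]}
worst_pair, worst_count = max(errors.items(), key=lambda item: item[1])
print(f"Most common error: true '{worst_pair[0]}' predicted as '{worst_pair[1]}' ({worst_count} times)")
```

The next diagnostic step would be to inspect the actual deer images that landed in that cell.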



## **3. Why Context Matters**

### **3.1. Imbalanced Class Pitfalls**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_06/Lecture_B/image_03_01.jpg?v=1769962771" width="250">



>* High accuracy can hide imbalanced class problems
>* Model may miss rare, important minority cases

>* High accuracy can ignore rare critical events
>* Minority class errors can matter more than overall accuracy

>* Similar accuracy can hide minority class failures
>* Use class-specific metrics to avoid misleading accuracy



In [None]:
#@title Python Code - Imbalanced Class Pitfalls

# This script shows imbalanced class accuracy pitfalls.
# We use a tiny medical style disease example.
# Focus on accuracy and confusion style summaries.

# No extra installations are required for this script.
# All used libraries are available by default.
# You can run everything directly in Colab.

# Import numpy for simple array handling.
import numpy as np

# Set a deterministic random seed value.
np.random.seed(42)

# Create labels for one thousand patients.
num_patients = 1000

# Define rare disease count and healthy count.
num_disease = 10
num_healthy = num_patients - num_disease

# Build true labels array with zeros and ones.
true_labels = np.array(([1] * num_disease) + ([0] * num_healthy))

# Shuffle labels to avoid ordered structure.
np.random.shuffle(true_labels)

# Validate labels shape before further operations.
assert true_labels.shape == (num_patients,)

# Model A always predicts every patient as healthy.
pred_all_healthy = np.zeros_like(true_labels)

# Model B randomly guesses disease with small probability.
pred_random = (np.random.rand(num_patients) < 0.02).astype(int)

# Define a helper function computing accuracy safely.
def compute_accuracy(y_true, y_pred):
    assert y_true.shape == y_pred.shape
    correct = np.sum(y_true == y_pred)
    return correct / y_true.size


# Define a helper function building confusion style counts.
def confusion_counts(y_true, y_pred):
    assert y_true.shape == y_pred.shape
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    return tp, fn, tn, fp


# Compute accuracy for both simple models.
acc_all_healthy = compute_accuracy(true_labels, pred_all_healthy)
acc_random = compute_accuracy(true_labels, pred_random)

# Compute confusion style summaries for both models.
conf_all_healthy = confusion_counts(true_labels, pred_all_healthy)
conf_random = confusion_counts(true_labels, pred_random)

# Unpack confusion counts for readability and printing.
(tp_a, fn_a, tn_a, fp_a) = conf_all_healthy
(tp_b, fn_b, tn_b, fp_b) = conf_random

# Print overall accuracy for both models.
print("Model A accuracy value:", round(acc_all_healthy, 4))
print("Model B accuracy value:", round(acc_random, 4))

# Print confusion style summary for model A.
print("Model A counts tp fn tn fp:", tp_a, fn_a, tn_a, fp_a)

# Print confusion style summary for model B.
print("Model B counts tp fn tn fp:", tp_b, fn_b, tn_b, fp_b)

# Print a short explanation highlighting minority class performance.
print("Notice that both accuracies look high, yet the tp counts show how few disease cases either model detects.")
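Class-specific metrics expose what accuracy hides in the script above. The sketch below computes recall (share of actual disease cases found) and precision (share of disease predictions that were correct) from confusion counts. Model A's counts follow directly from its always-healthy rule; Model C's counts are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def recall(tp, fn):
    """Fraction of actual positives the model found; 0.0 if there are none."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

def precision(tp, fp):
    """Fraction of positive predictions that were correct; 0.0 if none were made."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def accuracy_from_counts(tp, fn, tn, fp):
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)

# Model A from the script above: predicts "healthy" for all 1000 patients.
model_a = dict(tp=0, fn=10, tn=990, fp=0)
# Hypothetical Model C (illustrative counts, not produced by the script above).
model_c = dict(tp=8, fn=2, tn=960, fp=30)

for name, c in [("Model A", model_a), ("Model C", model_c)]:
    print(name,
          "accuracy:", round(accuracy_from_counts(**c), 3),
          "recall:", round(recall(c["tp"], c["fn"]), 3),
          "precision:", round(precision(c["tp"], c["fp"]), 3))
```

Model A keeps 99% accuracy while finding none of the 10 disease cases; Model C has lower accuracy (96.8%) but finds 8 of them.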




### **3.2. Error Costs in Classification**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_06/Lecture_B/image_03_02.jpg?v=1769962818" width="250">



>* Different error types can have unequal consequences
>* Relying only on accuracy can misguide model choices

>* Different errors have very different real costs
>* Accuracy can hide costly mistakes like missed fraud

>* Different mistakes have very different real impacts
>* Choose metrics that match real-world error costs
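A simple way to reflect unequal error costs is to weight each error type and sum the result. In the sketch below, the costs (500 per missed fraud, 5 per false alarm) and both models' error counts are hypothetical; the point is that equal accuracy can hide very different total costs.

```python
def total_cost(fn, fp, cost_fn, cost_fp):
    """Sum the cost of all errors, weighting each error type separately."""
    return fn * cost_fn + fp * cost_fp

# Hypothetical costs: a missed fraud (false negative) is far more expensive
# than a false alarm (false positive) that a human can quickly review.
COST_FN = 500.0
COST_FP = 5.0

# Two hypothetical models evaluated on the same 1000 transactions.
# Both make 20 errors, so both have 98% accuracy.
models = {
    "Model X": {"fn": 15, "fp": 5},
    "Model Y": {"fn": 2, "fp": 18},
}

for name, errs in models.items():
    accuracy = 1 - (errs["fn"] + errs["fp"]) / 1000
    cost = total_cost(errs["fn"], errs["fp"], COST_FN, COST_FP)
    print(f"{name}: accuracy={accuracy:.1%}, total error cost={cost:,.0f}")
```

Under these assumed costs, Model Y is far cheaper despite making more errors of the cheap kind.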



### **3.3. Selecting Appropriate Metrics**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_06/Lecture_B/image_03_03.jpg?v=1769962828" width="250">



>* Choose metrics based on task and consequences
>* Different problems prioritize different error types, metrics

>* Metrics must handle rare classes and imbalance
>* Threshold-based metrics reveal trade-offs and guide choices

>* Balance easy-to-explain metrics with technical detail
>* Use multiple metrics to capture fairness and costs



# <font color="#418FDE" size="6.5" uppercase>**Classification Metrics**</font>


In this lecture, you learned to:
- Define accuracy as the fraction of correctly classified examples. 
- Construct simple confusion-style summaries from prediction results. 
- Explain why accuracy alone may be insufficient in some classification problems. 

In the next Module (Module 7), we will cover 'Data Preparation'.