# Simple Precision, Recall, and F-Measure Program

## Introduction

This program demonstrates how to calculate three fundamental evaluation metrics in information retrieval:

1. **Precision**: What fraction of the retrieved documents are actually relevant?
2. **Recall**: What fraction of the relevant documents were successfully retrieved?
3. **F-measure**: The harmonic mean of precision and recall, which balances the two

We'll use a simple example to show these calculations step by step.

## Mathematical Formulas

### Precision
$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} = \frac{\text{Relevant Retrieved}}{\text{Total Retrieved}}$$

### Recall
$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} = \frac{\text{Relevant Retrieved}}{\text{Total Relevant}}$$

### F-Measure
$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
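
As a cross-check on the formula above, substituting the definitions of precision and recall gives an equivalent form in terms of the raw counts. This is a short algebraic sketch, writing TP, FP, and FN for true positives, false positives, and false negatives, and P and R for precision and recall; it assumes TP > 0 so that no denominator is zero:

$$\frac{2 \times P \times R}{P + R} = \frac{2 \cdot \frac{\text{TP}}{\text{TP} + \text{FP}} \cdot \frac{\text{TP}}{\text{TP} + \text{FN}}}{\frac{\text{TP}}{\text{TP} + \text{FP}} + \frac{\text{TP}}{\text{TP} + \text{FN}}} = \frac{2 \cdot \text{TP}^2}{\text{TP}(\text{TP} + \text{FN}) + \text{TP}(\text{TP} + \text{FP})} = \frac{2 \cdot \text{TP}}{2 \cdot \text{TP} + \text{FP} + \text{FN}}$$

True negatives do not appear anywhere in this expression, so the F-measure is unaffected by how many irrelevant documents the system correctly left out.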

In [1]:

```python
# Documents in the collection
all_documents = ["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8", "doc9", "doc10"]

# Ground truth: Which documents are actually relevant for our query
relevant_documents = ["doc1", "doc3", "doc5", "doc7", "doc9"]  # 5 relevant documents

# What our system retrieved
retrieved_documents = ["doc1", "doc2", "doc3", "doc4", "doc6", "doc7"]  # 6 retrieved documents

print("All Documents:", all_documents)
print("Relevant Documents (Ground Truth):", relevant_documents)
print("Retrieved Documents (System Output):", retrieved_documents)
print()
```

```
All Documents: ['doc1', 'doc2', 'doc3', 'doc4', 'doc5', 'doc6', 'doc7', 'doc8', 'doc9', 'doc10']
Relevant Documents (Ground Truth): ['doc1', 'doc3', 'doc5', 'doc7', 'doc9']
Retrieved Documents (System Output): ['doc1', 'doc2', 'doc3', 'doc4', 'doc6', 'doc7']
```
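
The full collection is not needed for precision, recall, or F-measure, but it does allow a quick sanity check of the example data and a count of the true negatives (documents that are neither relevant nor retrieved). The following is a minimal sketch under that reading; the names `collection` and `true_negatives` are illustrative and not part of the original program.

```python
# Same example data as in the cell above.
all_documents = ["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8", "doc9", "doc10"]
relevant_documents = ["doc1", "doc3", "doc5", "doc7", "doc9"]
retrieved_documents = ["doc1", "doc2", "doc3", "doc4", "doc6", "doc7"]

collection = set(all_documents)
relevant = set(relevant_documents)
retrieved = set(retrieved_documents)

# Sanity check: every relevant or retrieved document must exist in the collection.
assert relevant <= collection, "relevant list contains unknown documents"
assert retrieved <= collection, "retrieved list contains unknown documents"

# True negatives: in the collection, but neither relevant nor retrieved.
true_negatives = collection - (relevant | retrieved)
print("True Negatives (TN) count:", len(true_negatives))
```

With this data the count is 2 (`doc8` and `doc10`). None of the three metrics in this program uses that number.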



In [2]:

```python
# Count true positives, false positives, and false negatives, then derive the metrics
def calculate_metrics(retrieved_docs, relevant_docs):
    """
    Calculate precision, recall, and F-measure for one query.

    Prints the TP/FP/FN breakdown and returns
    (precision, recall, f_measure, tp_count, fp_count, fn_count).
    """
    # Convert to sets for easy intersection operations
    retrieved_set = set(retrieved_docs)
    relevant_set = set(relevant_docs)

    # True Positives: Documents that are both retrieved AND relevant
    true_positives = retrieved_set.intersection(relevant_set)
    tp_count = len(true_positives)

    # False Positives: Documents that are retrieved but NOT relevant
    false_positives = retrieved_set - relevant_set
    fp_count = len(false_positives)

    # False Negatives: Documents that are relevant but NOT retrieved
    false_negatives = relevant_set - retrieved_set
    fn_count = len(false_negatives)

    # Sets are unordered, so the order of the printed lists may vary between runs
    print("=== CONFUSION MATRIX ANALYSIS ===")
    print(f"True Positives (TP):  {list(true_positives)} -> Count: {tp_count}")
    print(f"False Positives (FP): {list(false_positives)} -> Count: {fp_count}")
    print(f"False Negatives (FN): {list(false_negatives)} -> Count: {fn_count}")
    print()

    # Calculate Precision (defined as 0.0 when nothing was retrieved)
    precision = tp_count / (tp_count + fp_count) if (tp_count + fp_count) > 0 else 0.0

    # Calculate Recall (defined as 0.0 when there are no relevant documents)
    recall = tp_count / (tp_count + fn_count) if (tp_count + fn_count) > 0 else 0.0

    # Calculate F-measure (defined as 0.0 when precision and recall are both zero)
    f_measure = (2 * precision * recall) / (precision + recall) if (precision + recall) > 0 else 0.0

    return precision, recall, f_measure, tp_count, fp_count, fn_count

# Calculate metrics for our example
precision, recall, f_measure, tp, fp, fn = calculate_metrics(retrieved_documents, relevant_documents)

print("=== METRIC CALCULATIONS ===")
print(f"Precision = TP / (TP + FP) = {tp} / ({tp} + {fp}) = {tp}/{tp+fp} = {precision:.3f}")
print(f"Recall    = TP / (TP + FN) = {tp} / ({tp} + {fn}) = {tp}/{tp+fn} = {recall:.3f}")
print(f"F-measure = 2 * P * R / (P + R) = 2 * {precision:.3f} * {recall:.3f} / ({precision:.3f} + {recall:.3f}) = {f_measure:.3f}")
print()

print("=== FINAL RESULTS ===")
print(f"Precision: {precision:.3f} ({precision*100:.1f}%)")
print(f"Recall:    {recall:.3f} ({recall*100:.1f}%)")
print(f"F-measure: {f_measure:.3f} ({f_measure*100:.1f}%)")
```

```
=== CONFUSION MATRIX ANALYSIS ===
True Positives (TP):  ['doc7', 'doc3', 'doc1'] -> Count: 3
False Positives (FP): ['doc6', 'doc2', 'doc4'] -> Count: 3
False Negatives (FN): ['doc5', 'doc9'] -> Count: 2

=== METRIC CALCULATIONS ===
Precision = TP / (TP + FP) = 3 / (3 + 3) = 3/6 = 0.500
Recall    = TP / (TP + FN) = 3 / (3 + 2) = 3/5 = 0.600
F-measure = 2 * P * R / (P + R) = 2 * 0.500 * 0.600 / (0.500 + 0.600) = 0.545
```

```
=== FINAL RESULTS ===
Precision: 0.500 (50.0%)
Recall:    0.600 (60.0%)
F-measure: 0.545 (54.5%)
```
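
The rounded result above can be double-checked with exact arithmetic. The sketch below relies on the algebraic identity F-measure = 2·TP / (2·TP + FP + FN), which follows from substituting the precision and recall definitions into the harmonic-mean formula. The helper name `f_measure_from_counts` is hypothetical and not part of the original program, and returning zero for an all-zero input is an assumed convention that mirrors the zero-division guards used earlier.

```python
from fractions import Fraction

def f_measure_from_counts(tp, fp, fn):
    """F-measure from raw counts: 2*TP / (2*TP + FP + FN), as an exact fraction."""
    denominator = 2 * tp + fp + fn
    # Assumed convention: with no TP, FP, or FN at all, report 0 instead of dividing by zero.
    return Fraction(2 * tp, denominator) if denominator > 0 else Fraction(0)

# Same example data as above.
relevant = {"doc1", "doc3", "doc5", "doc7", "doc9"}
retrieved = {"doc1", "doc2", "doc3", "doc4", "doc6", "doc7"}

tp = len(retrieved & relevant)   # retrieved and relevant
fp = len(retrieved - relevant)   # retrieved but not relevant
fn = len(relevant - retrieved)   # relevant but not retrieved

f_exact = f_measure_from_counts(tp, fp, fn)
print(f"F-measure (exact): {f_exact} = {float(f_exact):.3f}")
# prints "F-measure (exact): 6/11 = 0.545"
```

The exact value 6/11 rounds to 0.545, which agrees with the harmonic-mean calculation in the program's output.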
