Implementing ROUGE Score
Medium
Machine Learning

Implement the ROUGE-1 (Recall-Oriented Understudy for Gisting Evaluation) score to evaluate the quality of a generated summary by comparing it to a reference summary. ROUGE-1 focuses on unigram (single word) overlaps between the candidate and reference texts. Your task is to write a function that computes the ROUGE-1 recall, precision, and F1 score based on the number of overlapping unigrams.

Example:
Input:
rouge_1_score('the cat sat on the mat', 'the cat is on the mat')
Output:
{'precision': 0.8333333333333334, 'recall': 0.8333333333333334, 'f1': 0.8333333333333334}
Reasoning:
Both texts have 6 tokens: the reference 'the cat sat on the mat' and the candidate 'the cat is on the mat'. Each shared word contributes the minimum of its two counts: 'the' (2 in the reference, 2 in the candidate → 2), 'cat' (1, 1 → 1), 'on' (1, 1 → 1), and 'mat' (1, 1 → 1). 'sat' and 'is' each appear in only one text and contribute nothing. Total overlap = 2+1+1+1 = 5. Precision = 5/6 ≈ 0.833 (overlap divided by the 6 candidate tokens). Recall = 5/6 ≈ 0.833 (overlap divided by the 6 reference tokens). F1 = 2×(0.833×0.833)/(0.833+0.833) ≈ 0.833; because precision equals recall, F1 equals both.
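The clipped overlap count in the reasoning above can be checked with a minimal sketch. It assumes plain whitespace tokenization and uses the standard `Counter` intersection operator (`&`), which keeps the minimum count per word; the variable names are illustrative only.

```python
from collections import Counter

reference = "the cat sat on the mat"
candidate = "the cat is on the mat"

ref_counts = Counter(reference.split())
cand_counts = Counter(candidate.split())

# Counter intersection keeps, for each word, the minimum of the two counts
# and drops words that are missing from either side ('sat', 'is').
clipped = ref_counts & cand_counts
overlap = sum(clipped.values())

print(dict(clipped))  # {'the': 2, 'cat': 1, 'on': 1, 'mat': 1}
print(overlap)        # 5
```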

from collections import Counter

def rouge_1_score(reference: str, candidate: str):
    # lowercase + whitespace tokenization (simple ROUGE-1)
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()

    if len(ref_tokens) == 0 and len(cand_tokens) == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}

    ref_counts = Counter(ref_tokens)
    cand_counts = Counter(cand_tokens)

    # overlap counts = sum of min(ref[w], cand[w]) for all shared unigrams
    overlap = 0
    for word in ref_counts:
        if word in cand_counts:
            # add the clipped count (minimum of the two counts) so repeated words are not over-counted
            overlap += min(ref_counts[word], cand_counts[word])

    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    recall    = overlap / len(ref_tokens)  if ref_tokens  else 0.0
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

    return {"precision": precision, "recall": recall, "f1": f1}

# Example
print(rouge_1_score('the cat sat on the mat', 'the cat is on the mat'))
# {'precision': 0.8333333333333334, 'recall': 0.8333333333333334, 'f1': 0.8333333333333334}
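In the example above precision and recall coincide because both texts have the same length. The sketch below shows a case where they differ, using a candidate much shorter than the reference. `rouge_1_compact` is a hypothetical helper written only for this illustration; it assumes the same lowercase, whitespace-split tokenization and swaps the explicit loop for `Counter` intersection.

```python
from collections import Counter

def rouge_1_compact(reference: str, candidate: str) -> dict:
    # hypothetical helper: lowercase + whitespace tokenization (an assumption)
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()

    # clipped unigram overlap: per-word minimum of the two counts
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())

    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# A two-word candidate: every candidate word matches, but most of the reference is missed.
scores = rouge_1_compact('the cat sat on the mat', 'the cat')
print({k: round(v, 3) for k, v in scores.items()})
# {'precision': 1.0, 'recall': 0.333, 'f1': 0.5}
```

Precision is 2/2 because both candidate words occur in the reference, while recall is only 2/6, and F1 sits between the two.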