# FiftyOne Text Evaluation Metrics Plugin

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/harpreetsahota204/text_evaluation_metrics/blob/main/text_evaluation_demo.ipynb)

This notebook demonstrates the **Text Evaluation Metrics** plugin for FiftyOne.

## Available Metrics

1. **ANLS** - Average Normalized Levenshtein Similarity (primary VLM OCR metric)
2. **Exact Match** - Binary exact match accuracy
3. **Normalized Similarity** - Continuous similarity without threshold
4. **CER** - Character Error Rate
5. **WER** - Word Error Rate

🔗 [GitHub Repository](https://github.com/harpreetsahota204/text_evaluation_metrics)

## Installation

First, install FiftyOne and the required dependencies.

In [None]:
!pip install -q fiftyone python-Levenshtein

### Install the Plugin

Download and install the plugin directly from GitHub.

In [None]:
!fiftyone plugins download https://github.com/harpreetsahota204/text_evaluation_metrics

## Setup

Import FiftyOne and its operators module, which is used to look up and run the plugin's operators.

In [None]:
import fiftyone as fo
import fiftyone.operators as foo


### Download a Dataset

In [None]:
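from fiftyone.utils.huggingface import load_from_hub

# Load the demo dataset from the Hugging Face Hub.
# persistent=True keeps it in the local FiftyOne database across sessions.
dataset = load_from_hub(
    "harpreetsahota/visual_ai_at_neurips2025_jina_with_ocr",
    persistent=True,
)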

---
## 1. Compute ANLS

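**ANLS** (Average Normalized Levenshtein Similarity) is the primary metric for VLM OCR evaluation. In the standard definition, each prediction/ground-truth pair scores its normalized Levenshtein similarity, pairs that are too dissimilar (controlled by `threshold`, 0.5 here) score 0, and the per-pair scores are averaged.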

In [None]:
anls_op = foo.get_operator("@harpreetsahota/text-evaluation-metrics/compute_anls")

result = anls_op(
    dataset,
    pred_field="md_abstract",
    gt_field="abstract",
    output_field="md_anls",
    threshold=0.5,
    delegate=False,
)
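
To make the metric concrete, here is a minimal pure-Python sketch of ANLS under the standard convention (score `1 - NL` when the normalized distance `NL` is below the threshold, otherwise 0). The function names are hypothetical, and the plugin's own implementation may differ in details such as text normalization or how the threshold boundary is handled.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) between two strings."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # drop a character of a
                current[j - 1] + 1,      # add a character of b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def anls_score(pred: str, gt: str, threshold: float = 0.5) -> float:
    """Per-pair score: 1 - NL if NL < threshold, else 0 (assumed convention)."""
    longest = max(len(pred), len(gt))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as a perfect match
    nl = levenshtein(pred, gt) / longest
    return 1.0 - nl if nl < threshold else 0.0


def anls(preds, gts, threshold: float = 0.5) -> float:
    """Average of the per-pair scores."""
    scores = [anls_score(p, g, threshold) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


print(anls_score("kitten", "sitten"))  # one substitution in six characters
print(anls_score("kitten", "zebra"))   # too dissimilar, so clipped to 0.0
```

The clipping is what distinguishes ANLS from a plain similarity: a prediction that is mostly wrong earns no partial credit.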


---
## 2. Compute Exact Match

**Exact Match** returns 1.0 only for perfect matches.

In [None]:
em_op = foo.get_operator("@harpreetsahota/text-evaluation-metrics/compute_exact_match")

result = em_op(
    dataset, 
    pred_field="md_abstract", 
    gt_field="abstract",
    output_field="md_exact_match",
    delegate=False
    )
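
The strictness of exact match is easy to underestimate, so the sketch below shows how a trailing space or a case difference flips the score. The `ignore_case` and `strip` flags are hypothetical and exist only for illustration; they are not documented options of the plugin.

```python
def exact_match(pred: str, gt: str, *, ignore_case: bool = False,
                strip: bool = False) -> float:
    """Return 1.0 if the strings are identical, else 0.0."""
    if strip:
        pred, gt = pred.strip(), gt.strip()
    if ignore_case:
        pred, gt = pred.casefold(), gt.casefold()
    return 1.0 if pred == gt else 0.0


pairs = [("Hello", "Hello"), ("Hello ", "Hello"), ("hello", "Hello")]

# Strict comparison: only the first pair matches.
strict = [exact_match(p, g) for p, g in pairs]
# Lenient comparison (illustrative flags): all three pairs match.
lenient = [exact_match(p, g, ignore_case=True, strip=True) for p, g in pairs]

print(strict, sum(strict) / len(strict))
print(lenient, sum(lenient) / len(lenient))
```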


---
## 3. Compute Normalized Similarity

**Normalized Similarity** provides a continuous score with no threshold applied, so partially correct predictions keep partial credit.

In [None]:
sim_op = foo.get_operator("@harpreetsahota/text-evaluation-metrics/compute_normalized_similarity")

result = sim_op(
    dataset, 
    pred_field="md_abstract", 
    gt_field="abstract",
    output_field="md_norm_sim",
    delegate=False
    )
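
The effect of dropping the threshold can be seen on a single pair. This is a minimal sketch assuming the similarity is `1 - distance / max(len)`; the helper names are hypothetical and the plugin's implementation may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,
                current[j - 1] + 1,
                previous[j - 1] + (ch_a != ch_b),
            ))
        previous = current
    return previous[-1]


def normalized_similarity(pred: str, gt: str) -> float:
    """1 - distance / longest length, with no clipping."""
    longest = max(len(pred), len(gt))
    if longest == 0:
        return 1.0  # assumption: two empty strings are identical
    return 1.0 - levenshtein(pred, gt) / longest


pred, gt = "abcdefgh", "abcxxxxx"  # 5 of 8 characters differ
sim = normalized_similarity(pred, gt)

# An ANLS-style threshold of 0.5 on the normalized distance would zero this pair.
thresholded = sim if (1.0 - sim) < 0.5 else 0.0

print(sim, thresholded)
```

Keeping the raw value makes it possible to tell a near miss from a complete failure, which a thresholded score hides.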


---
## 4. Compute CER (Character Error Rate)

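**CER** measures the character-level edits (insertions, deletions, substitutions) needed to turn the prediction into the ground truth, relative to the ground-truth length. Lower is better.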

In [None]:
cer_op = foo.get_operator("@harpreetsahota/text-evaluation-metrics/compute_cer")

result = cer_op(
    dataset, 
    pred_field="md_abstract", 
    gt_field="abstract",
    output_field="md_cer",
    delegate=False
    )


---
## 5. Compute WER (Word Error Rate)

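**WER** measures the word-level edits (insertions, deletions, substitutions) needed to turn the prediction into the ground truth, relative to the number of ground-truth words. Lower is better.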

In [None]:
wer_op = foo.get_operator("@harpreetsahota/text-evaluation-metrics/compute_wer")

result = wer_op(
    dataset,
    pred_field="md_abstract",
    gt_field="abstract",
    output_field="md_wer",
    delegate=False,
)
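
CER and WER follow the same recipe and differ only in the unit being compared: characters for CER, whitespace-separated words for WER. The sketch below assumes the usual definition (edit distance divided by the reference length); the function names and the handling of empty references are illustrative choices, not the plugin's documented behavior.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance over any sequence (characters or word lists)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,
                current[j - 1] + 1,
                previous[j - 1] + (r != h),
            ))
        previous = current
    return previous[-1]


def cer(pred: str, gt: str) -> float:
    """Character edits divided by the number of reference characters."""
    if not gt:
        return 0.0 if not pred else 1.0  # assumption for empty references
    return edit_distance(gt, pred) / len(gt)


def wer(pred: str, gt: str) -> float:
    """Word edits divided by the number of reference words."""
    gt_words, pred_words = gt.split(), pred.split()
    if not gt_words:
        return 0.0 if not pred_words else 1.0  # assumption for empty references
    return edit_distance(gt_words, pred_words) / len(gt_words)


gt = "the quick brown fox"
pred = "the quick brwn fox"

print(cer(pred, gt))  # 1 missing character over 19 reference characters
print(wer(pred, gt))  # 1 wrong word over 4 reference words
```

The same single-character slip costs about 5% in CER but 25% in WER, which is why the two are usually read together. Unlike the similarity metrics, both rates can exceed 1.0 when the prediction is much longer than the reference.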


---
## Conclusion

This notebook demonstrated all five text evaluation metrics provided by the plugin.

### Key Takeaways
- **ANLS** is the primary metric for VLM OCR tasks
- **Exact Match** provides a strict accuracy baseline
- **Normalized Similarity** helps understand error distribution
- **CER/WER** provide detailed error analysis

### Resources
- 📚 [Plugin Documentation](https://github.com/harpreetsahota204/text_evaluation_metrics)
- 🌐 [FiftyOne Docs](https://docs.voxel51.com/)

**Author:** Harpreet Sahota | **License:** Apache 2.0