<a href="https://colab.research.google.com/github/patrickfleith/datapipes/blob/main/Evaluation_101.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook walks through metrics provided by the 🤗 Hugging Face `evaluate` library, starting with Exact Match.

In [13]:
!pip install evaluate --quiet
# You can safely ignore pip ERROR messages about dependency requirements (e.g. fsspec==2024.10.0) and similar.

# Exact Match
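Exact Match is a straightforward metric: a prediction counts only if it is identical to its reference, and the score is the fraction of pairs that match. Its strictness can still surprise you.
We use the `evaluate` library from 🤗 Hugging Face.

With `evaluate`, a metric generally takes two inputs:
- A list of **references**: the ground-truth labels 🙏
- A list of **predictions**: the outputs generated by the LLM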
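Conceptually, the score is just the fraction of positions where the prediction string equals the reference string. A minimal plain-Python sketch of that idea (the function `exact_match_rate` is our own illustration, not part of `evaluate`):

```python
def exact_match_rate(references, predictions):
    """Fraction of (reference, prediction) pairs that are identical strings."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    if not references:
        raise ValueError("need at least one pair")
    # True counts as 1 and False as 0 when summed.
    matches = sum(ref == pred for ref, pred in zip(references, predictions))
    return matches / len(references)


print(exact_match_rate(["the cat", "theater"], ["cat", "theater"]))  # 0.5
```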

In [4]:
from evaluate import load
exact_match_metric = load("exact_match")

## Exactly exact
Each prediction is close to its reference, but there is **only 1 perfect match out of 4**.

In [25]:
references = ["the cat", "theater", "YELLING", "agent007"] # the ground truth labels
predictions = ["cat", "theater", "yelling?", "agent"] # what's generated from your LLM

results = exact_match_metric.compute(
    references=references,
    predictions=predictions,
)

print(round(results["exact_match"],2))

0.25
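To see which pair produced the single match, you can compare the pairs directly with plain string equality. This sketch redefines the same lists and does not call `evaluate`:

```python
references = ["the cat", "theater", "YELLING", "agent007"]
predictions = ["cat", "theater", "yelling?", "agent"]

# Plain string equality per pair, i.e. exact match with no normalization options.
per_pair = [ref == pred for ref, pred in zip(references, predictions)]
for ref, pred, ok in zip(references, predictions, per_pair):
    print(f"{ref!r:>12} vs {pred!r:<12} match={ok}")

print(sum(per_pair) / len(per_pair))  # 0.25
```

Only `"theater"` matches; case, a question mark, an article, and digits are each enough to break the other pairs.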


## Exactly except
- `regexes_to_ignore`: a list of regular expressions; every match is removed from both predictions and references before comparison. Note: this removal happens before the normalization options described below (`ignore_case`, `ignore_punctuation`, `ignore_numbers`) are applied.
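The pattern `"the "` includes a trailing space, so it strips the article from `"the cat"` but leaves `"theater"` untouched. A plain-Python sketch of that removal step, assuming each pattern is replaced with the empty string via `re.sub` (our reading of the metric, not its exact code):

```python
import re

references = ["the cat", "theater", "YELLING", "agent007"]
predictions = ["cat", "theater", "yelling?", "agent"]

# Assumed behaviour of regexes_to_ignore: delete every match of each pattern
# from both lists before comparing.
patterns = ["the "]
for pattern in patterns:
    references = [re.sub(pattern, "", text) for text in references]
    predictions = [re.sub(pattern, "", text) for text in predictions]

print(references)   # ['cat', 'theater', 'YELLING', 'agent007']
print(predictions)  # ['cat', 'theater', 'yelling?', 'agent']
```

After this step, `"cat"` and `"theater"` both match, which is consistent with the 0.5 score below.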

In [24]:
results = exact_match_metric.compute(
    references=references,
    predictions=predictions,
    regexes_to_ignore=["the "]
)

print(round(results["exact_match"],2))

0.5


## Quasi exact
`compute` also accepts three normalization options:
- **`ignore_case`**: Boolean, defaults to False. If True, lowercases everything so that capitalization differences are ignored.
- **`ignore_punctuation`**: Boolean, defaults to False. If True, removes all punctuation before comparing predictions and references.
- **`ignore_numbers`**: Boolean, defaults to False. If True, removes all digits before comparing predictions and references.
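These options can be read as a normalization step applied to every string before comparison. Below is a minimal plain-Python sketch, assuming regexes are removed first and that "punctuation" and "digits" mean Python's `string.punctuation` and `string.digits`; the helper `normalize` is hypothetical and not part of `evaluate`:

```python
import re
import string


def normalize(text, regexes_to_ignore=(), ignore_case=False,
              ignore_punctuation=False, ignore_numbers=False):
    """Hypothetical helper approximating the exact_match normalization options."""
    for pattern in regexes_to_ignore:
        text = re.sub(pattern, "", text)  # regexes are removed first
    if ignore_case:
        text = text.lower()
    if ignore_punctuation:  # assumed: ASCII punctuation from string.punctuation
        text = text.translate(str.maketrans("", "", string.punctuation))
    if ignore_numbers:  # assumed: the digits 0-9 from string.digits
        text = text.translate(str.maketrans("", "", string.digits))
    return text


references = ["the cat", "theater", "YELLING", "agent007"]
predictions = ["cat", "theater", "yelling?", "agent"]
options = dict(regexes_to_ignore=["the "], ignore_case=True, ignore_punctuation=True)

pairs = [(normalize(r, **options), normalize(p, **options))
         for r, p in zip(references, predictions)]
score = sum(r == p for r, p in pairs) / len(pairs)
print(pairs)
print(score)  # 0.75
```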

In [26]:
results = exact_match_metric.compute(
    references=references,
    predictions=predictions,
    regexes_to_ignore=["the "],
    ignore_case=True,
    ignore_punctuation=True,
    ignore_numbers=False
)

print(round(results["exact_match"],2))

0.75
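The one pair that still fails is `"agent007"` vs `"agent"`, which differ only by digits because `ignore_numbers=False` here. Assuming `ignore_numbers=True` strips the characters in Python's `string.digits`, enabling it should make this pair match as well and bring the score to 1.0. The digit-stripping step on its own:

```python
import string

# Assumed equivalent of ignore_numbers=True: delete the characters 0-9.
strip_digits = str.maketrans("", "", string.digits)

reference, prediction = "agent007", "agent"
stripped_reference = reference.translate(strip_digits)
print(stripped_reference)                                     # agent
print(stripped_reference == prediction.translate(strip_digits))  # True
```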


## Example with full sentences
In practice, you will rarely use Exact Match on individual words; you will more often compare full sentences or completions.

Here is a full walkthrough:

In [32]:
from evaluate import load
exact_match_metric = load("exact_match")

references = [
    "I like to eat chocolate with my coffee 😀",
    "Tomorrow, I'll graduate!! So excited"
]

predictions = [
    "I like chocolate with coffee",
    "Tomorrow, I'll graduate! So excited"
]

results = exact_match_metric.compute(
    references=references,
    predictions=predictions,
    ignore_case=True,
    ignore_punctuation=True,
    ignore_numbers=True
)

print(round(results["exact_match"],2))

0.5
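Only the second pair matches: once case and punctuation are removed, `"graduate!!"` and `"graduate!"` both become `"graduate"`. The first pair fails because the prediction drops words. Note also that, assuming the metric strips only the ASCII characters in Python's `string.punctuation`, the emoji survives normalization. A plain-Python sketch of the normalization that shows this (`normalize` is a hypothetical helper, not part of `evaluate`):

```python
import string


def normalize(text):
    """Hypothetical helper: lowercase, then drop ASCII punctuation and digits."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.translate(str.maketrans("", "", string.digits))


references = [
    "I like to eat chocolate with my coffee 😀",
    "Tomorrow, I'll graduate!! So excited",
]
predictions = [
    "I like chocolate with coffee",
    "Tomorrow, I'll graduate! So excited",
]

for ref, pred in zip(references, predictions):
    print(repr(normalize(ref)), repr(normalize(pred)), normalize(ref) == normalize(pred))
```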
