# Span Evaluator Example
This notebook demonstrates how to use the `SpanEvaluator` class to create spans from a DataFrame of tokens and evaluate predictions.  
The evaluation is performed at the **span level** using token-level Intersection over Union (IoU), which allows for partial matches between predicted and annotated entities.
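
As a rough illustration of token-level IoU, the sketch below treats each span as a pair of token positions with an exclusive end and divides the number of shared tokens by the number of tokens covered by either span. The helper `token_iou` and the tuple representation are assumptions made for illustration; they are not part of `presidio_evaluator`.

```python
def token_iou(span_a, span_b):
    """IoU of two spans given as (start, end) token positions, end exclusive.

    Hypothetical helper for illustration; not the library's implementation.
    """
    tokens_a = set(range(*span_a))
    tokens_b = set(range(*span_b))
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


# A two-token annotation against a prediction covering only its first token
iou_partial = token_iou((2, 4), (2, 3))   # 1 shared token / 2 tokens in the union
# A three-token annotation against a prediction covering its first two tokens
iou_larger = token_iou((6, 9), (6, 8))    # 2 shared tokens / 3 tokens in the union
# Spans that do not overlap at all
iou_disjoint = token_iou((0, 1), (5, 6))

print(iou_partial, round(iou_larger, 3), iou_disjoint)  # 0.5 0.667 0.0
```

A partial match therefore earns a score between 0 and 1 instead of being treated as a complete miss.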

In [1]:
# Import required libraries
import pandas as pd
from presidio_evaluator.evaluation.span_evaluator import SpanEvaluator
from presidio_evaluator.data_objects import Span



## Example DataFrame
Below is a sample DataFrame representing tokenized text.  
Each row corresponds to a token, with columns for:
- `sentence_id`: The sentence this token belongs to.
- `token`: The text of the token.
- `annotation`: The ground truth entity label for the token.
- `prediction`: The predicted entity label for the token.
- `is_entity_start`: Whether the token begins a new entity span.

This data is designed so that the annotation spans and prediction spans only partially overlap, which yields span-level IoU values strictly between 0 and 1.

- "John Doe" is annotated as PERSON, but only "John" is predicted as PERSON.
- "New York City" is annotated as LOCATION, but only "New" and "York" are predicted as LOCATION (not "City").

In [2]:
# Create a sample DataFrame with partially matching annotations and predictions
# 'John Doe' and 'New York City' are fully annotated, but each is only partly predicted
sample_data = [
    {"sentence_id": 1, "token": "Hello", "annotation": "O", "prediction": "O", "is_entity_start": False},
    {"sentence_id": 1, "token": "Mr.", "annotation": "O", "prediction": "O", "is_entity_start": False},
    {"sentence_id": 1, "token": "John", "annotation": "PERSON", "prediction": "PERSON", "is_entity_start": True},
    {"sentence_id": 1, "token": "Doe", "annotation": "PERSON", "prediction": "O", "is_entity_start": False},  # Continues the PERSON span started by 'John'
    {"sentence_id": 1, "token": "went", "annotation": "O", "prediction": "O", "is_entity_start": False},
    {"sentence_id": 1, "token": "to", "annotation": "O", "prediction": "O", "is_entity_start": False},
    {"sentence_id": 1, "token": "New", "annotation": "LOCATION", "prediction": "LOCATION", "is_entity_start": True},
    {"sentence_id": 1, "token": "York", "annotation": "LOCATION", "prediction": "LOCATION", "is_entity_start": False},
    {"sentence_id": 1, "token": "City", "annotation": "LOCATION", "prediction": "O", "is_entity_start": False},
]
df = pd.DataFrame(sample_data)
df

Unnamed: 0,sentence_id,token,annotation,prediction,is_entity_start
0,1,Hello,O,O,False
1,1,Mr.,O,O,False
2,1,John,PERSON,PERSON,True
3,1,Doe,PERSON,O,False
4,1,went,O,O,False
5,1,to,O,O,False
6,1,New,LOCATION,LOCATION,True
7,1,York,LOCATION,LOCATION,False
8,1,City,LOCATION,O,False


## Create Spans from Tokens
The `SpanEvaluator` reconstructs entity spans from token-level labels.  
Adjacent tokens with the same entity label are merged into a single span.  
This is important for evaluating at the entity (span) level rather than the token level.
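
The merging step can be sketched as follows, assuming a flat list of labels for one sentence and `O` for non-entity tokens. The function `labels_to_spans` and the `(label, start, end)` tuples are hypothetical stand-ins for illustration; the actual logic inside `SpanEvaluator` may differ, for example in how it uses `is_entity_start`.

```python
from itertools import groupby


def labels_to_spans(labels, outside="O"):
    """Merge runs of identical entity labels into (label, start, end) spans.

    Positions are token indices and `end` is exclusive.
    Hypothetical helper for illustration; not the library's implementation.
    """
    spans = []
    position = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label != outside:
            spans.append((label, position, position + length))
        position += length
    return spans


# Label columns equivalent to the sample DataFrame above
annotation_labels = ["O", "O", "PERSON", "PERSON", "O", "O", "LOCATION", "LOCATION", "LOCATION"]
prediction_labels = ["O", "O", "PERSON", "O", "O", "O", "LOCATION", "LOCATION", "O"]

sketch_annotation_spans = labels_to_spans(annotation_labels)
sketch_prediction_spans = labels_to_spans(prediction_labels)
print(sketch_annotation_spans)  # [('PERSON', 2, 4), ('LOCATION', 6, 9)]
print(sketch_prediction_spans)  # [('PERSON', 2, 3), ('LOCATION', 6, 8)]
```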

In [3]:
# Initialize the SpanEvaluator
span_evaluator = SpanEvaluator()

# Create annotation spans
annotation_spans = span_evaluator._create_spans(df, "annotation")

# Create prediction spans
prediction_spans = span_evaluator._create_spans(df, "prediction")

# Display the created spans
print("Annotation Spans:")
for span in annotation_spans:
    print(span)

print("\nPrediction Spans:")
for span in prediction_spans:
    print(span)

Annotation Spans:
Span(type: PERSON, value: ['John', 'Doe'], char_span: [2: 4])
Span(type: LOCATION, value: ['New', 'York', 'City'], char_span: [6: 9])

Prediction Spans:
Span(type: PERSON, value: ['John'], char_span: [2: 3])
Span(type: LOCATION, value: ['New', 'York'], char_span: [6: 8])


## Evaluate Predictions
The `evaluate` method compares the annotation spans and prediction spans using token-level IoU.  
For each annotation span, it finds the best-matching prediction span of the same entity type.  
If the IoU is above the threshold (default 0.5), the pair is counted as a true positive.  
Otherwise, the annotation span is a false negative (a missed entity). Prediction spans left unmatched are counted as false positives.

The method returns:
- Overall precision, recall, and F1 score
- Per-entity-type metrics
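
To make the matching procedure concrete, here is a simplified sketch on a small toy example. It is not the library's implementation: the tuple span format, the helper names, the strict `>` comparison (taken from the wording "above the threshold"), and the greedy one-to-one matching are all assumptions, so its numbers need not agree with what `evaluate` returns.

```python
def token_iou(a, b):
    """IoU of two (start, end) token ranges, end exclusive."""
    tokens_a, tokens_b = set(range(*a)), set(range(*b))
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def match_spans(annotations, predictions, threshold=0.5):
    """Count (tp, fp, fn) for spans given as (label, start, end) tuples.

    Assumption: each prediction can be matched to at most one annotation.
    """
    matched = set()
    tp = fn = 0
    for ann in annotations:
        best_iou, best_idx = 0.0, None
        for idx, pred in enumerate(predictions):
            if pred[0] != ann[0] or idx in matched:
                continue  # only unmatched predictions of the same entity type compete
            iou = token_iou(ann[1:], pred[1:])
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou > threshold:
            matched.add(best_idx)
            tp += 1
        else:
            fn += 1
    fp = len(predictions) - len(matched)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (hypothetical, unrelated to the DataFrame in this notebook)
toy_annotations = [("PERSON", 0, 2), ("LOCATION", 5, 8)]
toy_predictions = [("PERSON", 0, 2), ("LOCATION", 5, 6), ("ORG", 9, 10)]

counts = match_spans(toy_annotations, toy_predictions)
scores = precision_recall_f1(*counts)
print(counts)  # (1, 2, 1)
```

Here the PERSON spans match exactly, the LOCATION prediction overlaps only one of three tokens (IoU 1/3) and so leaves a false negative, and both the weak LOCATION prediction and the ORG prediction count as false positives.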

In [4]:
# Evaluate the predictions
results = span_evaluator.evaluate(df)
results

{'precision': 0.0,
 'recall': 0.0,
 'f_beta': 0.0,
 'per_type': {'PERSON': {'precision': 0.0, 'recall': 0.0, 'f_beta': 0.0},
  'LOCATION': {'precision': 0.0, 'recall': 0.0, 'f_beta': 0.0}}}

## Pairwise Span IoU Table for Error Analysis
You can also compute the IoU for **all pairs** of annotation and prediction spans using `span_pairwise_iou_df`.  
This is useful for error analysis and debugging, because it shows how well each predicted span overlaps each annotated span.  
The returned DataFrame keeps the span objects alongside their IoU values, so individual pairs can be inspected directly.
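
Conceptually, such a table is the Cartesian product of annotation spans and prediction spans with one IoU per pair. The sketch below builds the rows as plain dictionaries, which could be passed to `pd.DataFrame` to obtain a table; the helper names and tuple span format are assumptions for illustration, not how `span_pairwise_iou_df` is implemented.

```python
from itertools import product


def token_iou(a, b):
    """IoU of two (start, end) token ranges, end exclusive."""
    tokens_a, tokens_b = set(range(*a)), set(range(*b))
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def pairwise_iou_rows(annotations, predictions):
    """One row per (annotation, prediction) pair, regardless of entity type."""
    rows = []
    for ann, pred in product(annotations, predictions):
        rows.append({
            "ann_entity": ann[0], "ann_start": ann[1], "ann_end": ann[2],
            "pred_entity": pred[0], "pred_start": pred[1], "pred_end": pred[2],
            "iou": token_iou(ann[1:], pred[1:]),
        })
    return rows


# Spans as (label, start, end) tuples, mirroring the sample sentence
pair_rows = pairwise_iou_rows(
    [("PERSON", 2, 4), ("LOCATION", 6, 9)],
    [("PERSON", 2, 3), ("LOCATION", 6, 8)],
)
for row in pair_rows:
    print(row["ann_entity"], row["pred_entity"], round(row["iou"], 3))
```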

You have now seen how to use the `SpanEvaluator` to create spans and evaluate predictions from a token-level DataFrame. The remaining cells inspect the pairwise IoU table.

In [5]:
iou_results = span_evaluator.span_pairwise_iou_df(df)

In [6]:
iou_results

Unnamed: 0,sentence_id,annotation_span,prediction_span,ann_entity,ann_start,ann_end,pred_entity,pred_start,pred_end,iou
0,1,"Span(type: PERSON, value: ['John', 'Doe'], cha...","Span(type: PERSON, value: ['John'], char_span:...",PERSON,2,4,PERSON,2,3,0.5
1,1,"Span(type: PERSON, value: ['John', 'Doe'], cha...","Span(type: LOCATION, value: ['New', 'York'], c...",PERSON,2,4,LOCATION,6,8,0.0
2,1,"Span(type: LOCATION, value: ['New', 'York', 'C...","Span(type: PERSON, value: ['John'], char_span:...",LOCATION,6,9,PERSON,2,3,0.0
3,1,"Span(type: LOCATION, value: ['New', 'York', 'C...","Span(type: LOCATION, value: ['New', 'York'], c...",LOCATION,6,9,LOCATION,6,8,0.666667


In [7]:
iou_results["annotation_span"].tolist()

[Span(type: PERSON, value: ['John', 'Doe'], char_span: [2: 4]),
 Span(type: PERSON, value: ['John', 'Doe'], char_span: [2: 4]),
 Span(type: LOCATION, value: ['New', 'York', 'City'], char_span: [6: 9]),
 Span(type: LOCATION, value: ['New', 'York', 'City'], char_span: [6: 9])]

In [8]:
iou_results["prediction_span"].tolist()

[Span(type: PERSON, value: ['John'], char_span: [2: 3]),
 Span(type: LOCATION, value: ['New', 'York'], char_span: [6: 8]),
 Span(type: PERSON, value: ['John'], char_span: [2: 3]),
 Span(type: LOCATION, value: ['New', 'York'], char_span: [6: 8])]

In [10]:
i = 3  # Row pairing the LOCATION annotation with the LOCATION prediction
iou_results["annotation_span"].iloc[i]

Span(type: LOCATION, value: ['New', 'York', 'City'], char_span: [6: 9])

In [11]:
iou_results["prediction_span"].iloc[i]

Span(type: LOCATION, value: ['New', 'York'], char_span: [6: 8])

### Example: Why Merging Spans is Important
This example demonstrates a scenario where annotation errors cause an entity to be split into multiple spans, while the prediction is correct. Merging adjacent spans is necessary for fair evaluation.

In [13]:
import pandas as pd

# Example: annotation error splits 'John Doe' into two PERSON spans, prediction is correct
example_data = [
    {"sentence_id": 1, "token": "Hello", "annotation": "O", "prediction": "O", "is_entity_start": False},
    {"sentence_id": 1, "token": "Mr.", "annotation": "O", "prediction": "O", "is_entity_start": False},
    {"sentence_id": 1, "token": "John", "annotation": "PERSON", "prediction": "PERSON", "is_entity_start": True},
    {"sentence_id": 1, "token": "Doe", "annotation": "PERSON", "prediction": "PERSON", "is_entity_start": True},  # Should be False if contiguous
    {"sentence_id": 1, "token": "went", "annotation": "O", "prediction": "O", "is_entity_start": False},
    {"sentence_id": 1, "token": "home", "annotation": "O", "prediction": "O", "is_entity_start": False},
]
example_df = pd.DataFrame(example_data)
example_df

Unnamed: 0,sentence_id,token,annotation,prediction,is_entity_start
0,1,Hello,O,O,False
1,1,Mr.,O,O,False
2,1,John,PERSON,PERSON,True
3,1,Doe,PERSON,PERSON,True
4,1,went,O,O,False
5,1,home,O,O,False


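In this example, the `is_entity_start` flag on 'Doe' is set to `True` even though the token is contiguous with 'John' and has the same label, so the annotation is mistakenly split into two PERSON spans. The prediction correctly identifies 'John Doe' as a single PERSON span. Merging adjacent spans in the annotation is necessary for a fair IoU comparison; without it, neither annotation fragment fully overlaps the predicted span.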
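
A minimal sketch of such a merge step is shown below, assuming spans are `(label, start, end)` tuples with an exclusive end. The function `merge_adjacent_spans` is hypothetical and only illustrates the idea; it is not the library's code.

```python
def merge_adjacent_spans(spans):
    """Merge spans of the same label whose token ranges touch.

    Spans are (label, start, end) tuples, end exclusive.
    Hypothetical helper for illustration only.
    """
    merged = []
    for label, start, end in sorted(spans, key=lambda s: s[1]):
        if merged and merged[-1][0] == label and merged[-1][2] == start:
            prev_label, prev_start, _ = merged[-1]
            merged[-1] = (prev_label, prev_start, end)  # extend the previous span
        else:
            merged.append((label, start, end))
    return merged


# 'John' and 'Doe' split into two PERSON spans by the erroneous flag
split_annotation = [("PERSON", 2, 3), ("PERSON", 3, 4)]
merged_annotation = merge_adjacent_spans(split_annotation)
print(merged_annotation)  # [('PERSON', 2, 4)]
```

Against a predicted span of `("PERSON", 2, 4)`, each unmerged fragment shares only one of two tokens (IoU 0.5), while the merged span matches it exactly (IoU 1.0).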

## BIO Scheme Input
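
In the BIO scheme, a `B-` prefix marks the first token of an entity, `I-` marks a continuation of the current entity, and `O` marks tokens outside any entity. As a rough illustration of how such tags map to spans, the sketch below decodes a tag sequence into `(label, start, end)` tuples. The function `bio_to_spans` is hypothetical, and treating a stray `I-` tag as the start of a new span is an assumption; how `SpanEvaluator(schema="BIO")` handles these cases is not shown in this notebook.

```python
def bio_to_spans(tags):
    """Decode BIO tags into (label, start, end) spans, end exclusive.

    Hypothetical helper for illustration; not the library's implementation.
    """
    spans = []
    current = None  # (label, start) of the span being built, if any
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current is not None and current[0] == label:
            continue  # token extends the current span
        if current is not None:
            spans.append((current[0], current[1], i))  # close the open span
            current = None
        if prefix in ("B", "I"):
            # Assumption: an I- tag with no matching open span starts a new one
            current = (label, i)
    if current is not None:
        spans.append((current[0], current[1], len(tags)))
    return spans


annotation_tags = ["O", "O", "B-PERSON", "I-PERSON", "O", "O",
                   "B-LOCATION", "I-LOCATION", "I-LOCATION"]
prediction_tags = ["O", "O", "B-PERSON", "O", "O", "O",
                   "B-LOCATION", "O", "B-LOCATION"]

bio_annotation_spans = bio_to_spans(annotation_tags)
bio_prediction_spans = bio_to_spans(prediction_tags)
print(bio_annotation_spans)  # [('PERSON', 2, 4), ('LOCATION', 6, 9)]
print(bio_prediction_spans)  # [('PERSON', 2, 3), ('LOCATION', 6, 7), ('LOCATION', 8, 9)]
```

Under this decoding, the prediction breaks "New York City" into two separate one-token LOCATION spans because the tag for "York" is `O`.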

In [None]:
sample_data_bio = [
    {"sentence_id": 1, "token": "Hello", "annotation": "O", "prediction": "O"},
    {"sentence_id": 1, "token": "Mr.", "annotation": "O", "prediction": "O"},
    {"sentence_id": 1, "token": "John", "annotation": "B-PERSON", "prediction": "B-PERSON"},
    {"sentence_id": 1, "token": "Doe", "annotation": "I-PERSON", "prediction": "O"},
    {"sentence_id": 1, "token": "went", "annotation": "O", "prediction": "O"},
    {"sentence_id": 1, "token": "to", "annotation": "O", "prediction": "O"},
    {"sentence_id": 1, "token": "New", "annotation": "B-LOCATION", "prediction": "B-LOCATION"},
    {"sentence_id": 1, "token": "York", "annotation": "I-LOCATION", "prediction": "O"},
    {"sentence_id": 1, "token": "City", "annotation": "I-LOCATION", "prediction": "B-LOCATION"}
]

df_bio = pd.DataFrame(sample_data_bio)
df_bio

In [None]:
# Initialize SpanEvaluator with BIO schema
span_evaluator_bio = SpanEvaluator(schema="BIO")

# Create and display spans
annotation_spans = span_evaluator_bio._create_spans(df_bio, "annotation")
prediction_spans = span_evaluator_bio._create_spans(df_bio, "prediction")

print("Annotation Spans:")
for span in annotation_spans:
    print(span)

print("\nPrediction Spans:")
for span in prediction_spans:
    print(span)

# Evaluate
results = span_evaluator_bio.evaluate(df_bio)
print("\nEvaluation Results:")
print(results)

In [None]:
# Evaluate the predictions
results = span_evaluator_bio.evaluate(df_bio)
results

In [None]:
iou_results = span_evaluator_bio.span_pairwise_iou_df(df_bio)
iou_results

In [None]:
iou_results["annotation_span"].tolist()

In [None]:
iou_results["prediction_span"].tolist()