# Span Evaluator Example
This notebook demonstrates how to use the `SpanEvaluator` class to create spans from a DataFrame of tokens and evaluate predictions.  
The evaluation is performed at the **span level** using token-level Intersection over Union (IoU), which allows for partial matches between predicted and annotated entities.
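
To make the metric concrete, here is a minimal sketch of token-level IoU between two spans, assuming each span is represented by the set of token positions it covers. The helper `token_iou` is hypothetical and is not part of `presidio_evaluator`; the library's own computation may differ in detail.

```python
def token_iou(annotated: set, predicted: set) -> float:
    """Intersection over Union of two sets of token positions."""
    union = annotated | predicted
    if not union:
        return 0.0  # two empty spans: IoU defined as 0 here (an assumption)
    return len(annotated & predicted) / len(union)


# "John Doe" covers token positions {0, 1}; a prediction of "John" covers {0}.
john = token_iou({0, 1}, {0})
# "New York City" covers {4, 5, 6}; a prediction of "New" covers {4}.
new_york = token_iou({4, 5, 6}, {4})

print(john, round(new_york, 6))  # 0.5 0.333333
```

These values agree with the 0.5 and 0.333333 reported for the same pairs in the pairwise IoU table later in this notebook.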

In [1]:
# Import required libraries
import pandas as pd
from presidio_evaluator.evaluation.span_evaluator import SpanEvaluator
from presidio_evaluator.data_objects import Span

stanza and spacy_stanza are not installed
Flair is not installed by default


## Example DataFrame
Below is a sample DataFrame representing tokenized text.  
Each row corresponds to a token, with columns for:
- `sentence_id`: The sentence this token belongs to.
- `token`: The text of the token.
- `annotation`: The ground truth entity label for the token.
- `prediction`: The predicted entity label for the token.
- `start`: The character start offset of the token. In this sample, offsets run continuously across both sentences.

This data is designed so that some annotation spans and prediction spans overlap only partially, so the IoU threshold decides which spans count as matches and the resulting metrics (precision, recall, F1) are not all trivially 1.

- In sentence 1, "John Doe" is annotated as a PERSON, but only "John" is predicted as PERSON.
- "New York City" is annotated as LOCATION, but only "New" and "City" are predicted as LOCATION (not "York").
- In sentence 2, "Jane Smith" is annotated as PERSON, but only "Smith" is predicted as PERSON.
- "Paris" is correctly annotated and predicted as LOCATION.

In [2]:
# Create a sample DataFrame with partially matching annotations and predictions
sample_data = [
    {"sentence_id": 1, "token": "John", "annotation": "PERSON", "prediction": "PERSON", "start": 0},
    {"sentence_id": 1, "token": "Doe", "annotation": "PERSON", "prediction": "O", "start": 5},
    {"sentence_id": 1, "token": "lives", "annotation": "O", "prediction": "O", "start": 9},
    {"sentence_id": 1, "token": "in", "annotation": "O", "prediction": "O", "start": 15},
    {"sentence_id": 1, "token": "New", "annotation": "LOCATION", "prediction": "LOCATION", "start": 18},
    {"sentence_id": 1, "token": "York", "annotation": "LOCATION", "prediction": "O", "start": 22},
    {"sentence_id": 1, "token": "City", "annotation": "LOCATION", "prediction": "LOCATION", "start": 27},
    {"sentence_id": 1, "token": ".", "annotation": "O", "prediction": "O", "start": 31},
    {"sentence_id": 2, "token": "Jane", "annotation": "PERSON", "prediction": "O", "start": 33},
    {"sentence_id": 2, "token": "Smith", "annotation": "PERSON", "prediction": "PERSON", "start": 38},
    {"sentence_id": 2, "token": "visited", "annotation": "O", "prediction": "O", "start": 44},
    {"sentence_id": 2, "token": "Paris", "annotation": "LOCATION", "prediction": "LOCATION", "start": 52},
    {"sentence_id": 2, "token": "last", "annotation": "O", "prediction": "O", "start": 58},
    {"sentence_id": 2, "token": "summer", "annotation": "O", "prediction": "O", "start": 63},
    {"sentence_id": 2, "token": ".", "annotation": "O", "prediction": "O", "start": 69},
]

df = pd.DataFrame(sample_data)
df

,sentence_id,token,annotation,prediction,start
0,1,John,PERSON,PERSON,0
1,1,Doe,PERSON,O,5
2,1,lives,O,O,9
3,1,in,O,O,15
4,1,New,LOCATION,LOCATION,18
5,1,York,LOCATION,O,22
6,1,City,LOCATION,LOCATION,27
7,1,.,O,O,31
8,2,Jane,PERSON,O,33
9,2,Smith,PERSON,PERSON,38


## Create Spans from Tokens
The `SpanEvaluator` reconstructs entity spans from token-level labels.  
Adjacent tokens with the same entity label are merged into a single span.  
This is important for evaluating at the entity (span) level rather than the token level.
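
The merging rule can be sketched in plain Python as follows. This is an illustration only: `merge_tokens_to_spans` is a hypothetical helper, it assumes that `"O"` marks non-entity tokens and that spans never cross sentence boundaries, and the library's `_create_spans` may handle these cases differently.

```python
from itertools import groupby


def merge_tokens_to_spans(rows, label_key):
    """Merge adjacent tokens that share a non-"O" label into spans.

    rows: list of dicts with "sentence_id", "token" and a label column.
    Returns a list of (sentence_id, label, tokens) tuples.
    """
    spans = []
    # groupby only groups *consecutive* rows, so adjacency is preserved.
    # Keying on (sentence_id, label) also splits spans at sentence boundaries.
    for (sentence_id, label), group in groupby(
        rows, key=lambda r: (r["sentence_id"], r[label_key])
    ):
        if label == "O":  # assumption: "O" means "not an entity"
            continue
        spans.append((sentence_id, label, [r["token"] for r in group]))
    return spans


rows = [
    {"sentence_id": 1, "token": "John", "annotation": "PERSON"},
    {"sentence_id": 1, "token": "Doe", "annotation": "PERSON"},
    {"sentence_id": 1, "token": "lives", "annotation": "O"},
    {"sentence_id": 1, "token": "in", "annotation": "O"},
    {"sentence_id": 1, "token": "New", "annotation": "LOCATION"},
    {"sentence_id": 1, "token": "York", "annotation": "LOCATION"},
    {"sentence_id": 1, "token": "City", "annotation": "LOCATION"},
]
spans = merge_tokens_to_spans(rows, "annotation")
print(spans)
```

Because the grouping key includes the label, two neighbouring entities of different types stay separate, while a run of identically labelled tokens collapses into one span.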

In [3]:
# Initialize the SpanEvaluator
span_evaluator = SpanEvaluator()

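# Create annotation spans
# Note: _create_spans is a private helper (leading underscore); it is called directly here for illustration only
annotation_spans = span_evaluator._create_spans(df, "annotation")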

# Create prediction spans
prediction_spans = span_evaluator._create_spans(df, "prediction")

# Display the created spans
print("Annotation Spans:")
for span in annotation_spans:
    print(span)

print("\nPrediction Spans:")
for span in prediction_spans:
    print(span)

Annotation Spans:
Span(type: PERSON, value: ['John', 'Doe'], char_span: [0: 9])
Span(type: LOCATION, value: ['New', 'York', 'City'], char_span: [18: 31])
Span(type: PERSON, value: ['Jane', 'Smith'], char_span: [33: 44])
Span(type: LOCATION, value: ['Paris'], char_span: [52: 58])

Prediction Spans:
Span(type: PERSON, value: ['John'], char_span: [0: 5])
Span(type: LOCATION, value: ['New'], char_span: [18: 22])
Span(type: LOCATION, value: ['City'], char_span: [27: 31])
Span(type: PERSON, value: ['Smith'], char_span: [38: 44])
Span(type: LOCATION, value: ['Paris'], char_span: [52: 58])


## Evaluate Predictions
The `evaluate` method compares annotation spans and prediction spans using token-level IoU.  
For each annotation span, it finds the best-matching prediction span of the same entity type.  
If the IoU reaches the threshold (default 0.5), the span is counted as a true positive; in the output below, the PERSON spans with an IoU of exactly 0.5 are counted as matches.  
Otherwise, it is counted as a false negative (missed entity), and unmatched predictions are counted as false positives.

The method returns:
- Overall precision, recall, and F1 score
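- Per-entity-type metrics (`per_type`)
- An `error_analysis` dictionary with counts per error category (for example, `low_iou_LOCATION`)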
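
The annotation-side matching described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: `classify_annotations` is a hypothetical helper, the IoU values are entered by hand, the threshold comparison is assumed to be inclusive, and the handling of false positives (and therefore precision) is left out.

```python
from collections import Counter


def classify_annotations(candidates, threshold=0.5):
    """Label each annotation span as a true positive or a false negative.

    candidates maps (annotation_type, span_text) to a list of
    (prediction_type, iou) pairs from the same sentence.
    """
    outcome = Counter()
    for (ann_type, _text), pairs in candidates.items():
        # Only predictions of the same entity type are eligible matches.
        same_type = [iou for pred_type, iou in pairs if pred_type == ann_type]
        best = max(same_type, default=0.0)
        # Assumption: an IoU equal to the threshold still counts as a match.
        outcome["tp" if best >= threshold else "fn"] += 1
    return outcome


# Hand-entered IoU values for the four annotation spans in the sample data.
candidates = {
    ("PERSON", "John Doe"): [("PERSON", 0.5), ("LOCATION", 0.0), ("LOCATION", 0.0)],
    ("LOCATION", "New York City"): [("PERSON", 0.0), ("LOCATION", 1 / 3), ("LOCATION", 1 / 3)],
    ("PERSON", "Jane Smith"): [("PERSON", 0.5), ("LOCATION", 0.0)],
    ("LOCATION", "Paris"): [("PERSON", 0.0), ("LOCATION", 1.0)],
}
outcome = classify_annotations(candidates)
recall = outcome["tp"] / (outcome["tp"] + outcome["fn"])
print(dict(outcome), recall)
```

Under these assumptions, three of the four annotation spans are matched and "New York City" is missed, which gives a recall of 0.75.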

In [4]:
# Evaluate the predictions
results = span_evaluator.evaluate(df)
results

{'precision': 1.0,
 'recall': 0.75,
 'f1': 0.8571428571428571,
 'per_type': {'PERSON': {'precision': 1.0, 'recall': 1.0, 'f1': 1.0},
  'LOCATION': {'precision': 1.0, 'recall': 0.5, 'f1': 0.6666666666666666}},
 'error_analysis': {'low_iou_LOCATION': 1}}
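
As a quick sanity check, the F1 values in the output above are the harmonic mean of the reported precision and recall. The helper `f1_score` below is written only for this check and is not taken from the library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


overall_f1 = f1_score(1.0, 0.75)   # overall precision and recall
location_f1 = f1_score(1.0, 0.5)   # LOCATION precision and recall

print(round(overall_f1, 4), round(location_f1, 4))  # 0.8571 0.6667
```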

## Pairwise Span IoU Table for Error Analysis
You can also compute the IoU for **all pairs** of annotation and prediction spans within each sentence using `span_pairwise_iou_df`.  
This is useful for detailed analysis and debugging, as it shows how well each predicted span overlaps with each annotated span.  
The `annotation_span` and `prediction_span` columns hold the actual span objects, so individual pairs can be inspected alongside their IoU values.

This completes the walkthrough of using `SpanEvaluator` to create spans and evaluate predictions from a token-level DataFrame; the remaining cells inspect the pairwise IoU table.

In [5]:
# Compute the IoU for every annotation/prediction span pair within each sentence
iou_results = span_evaluator.span_pairwise_iou_df(df)

In [6]:
iou_results

,sentence_id,annotation_span,prediction_span,ann_entity,ann_start,ann_end,pred_entity,pred_start,pred_end,iou
0,1,"Span(type: PERSON, value: ['John', 'Doe'], cha...","Span(type: PERSON, value: ['John'], char_span:...",PERSON,0,9,PERSON,0,5,0.5
1,1,"Span(type: PERSON, value: ['John', 'Doe'], cha...","Span(type: LOCATION, value: ['New'], char_span...",PERSON,0,9,LOCATION,18,22,0.0
2,1,"Span(type: PERSON, value: ['John', 'Doe'], cha...","Span(type: LOCATION, value: ['City'], char_spa...",PERSON,0,9,LOCATION,27,31,0.0
3,1,"Span(type: LOCATION, value: ['New', 'York', 'C...","Span(type: PERSON, value: ['John'], char_span:...",LOCATION,18,31,PERSON,0,5,0.0
4,1,"Span(type: LOCATION, value: ['New', 'York', 'C...","Span(type: LOCATION, value: ['New'], char_span...",LOCATION,18,31,LOCATION,18,22,0.333333
5,1,"Span(type: LOCATION, value: ['New', 'York', 'C...","Span(type: LOCATION, value: ['City'], char_spa...",LOCATION,18,31,LOCATION,27,31,0.333333
6,2,"Span(type: PERSON, value: ['Jane', 'Smith'], c...","Span(type: PERSON, value: ['Smith'], char_span...",PERSON,33,44,PERSON,38,44,0.5
7,2,"Span(type: PERSON, value: ['Jane', 'Smith'], c...","Span(type: LOCATION, value: ['Paris'], char_sp...",PERSON,33,44,LOCATION,52,58,0.0
8,2,"Span(type: LOCATION, value: ['Paris'], char_sp...","Span(type: PERSON, value: ['Smith'], char_span...",LOCATION,52,58,PERSON,38,44,0.0
9,2,"Span(type: LOCATION, value: ['Paris'], char_sp...","Span(type: LOCATION, value: ['Paris'], char_sp...",LOCATION,52,58,LOCATION,52,58,1.0
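
One way to use this table for error analysis is to keep only same-type pairs that overlap but fall below the matching threshold. The sketch below rebuilds a few rows by hand so that it runs on its own; it relies only on the `ann_entity`, `pred_entity`, `ann_start`, `pred_start`, and `iou` columns shown above, and the 0.5 threshold is the default mentioned earlier.

```python
import pandas as pd

# A hand-copied subset of the pairwise table (span objects omitted).
pairs = pd.DataFrame(
    {
        "ann_entity": ["PERSON", "PERSON", "LOCATION", "LOCATION", "LOCATION"],
        "pred_entity": ["PERSON", "LOCATION", "LOCATION", "LOCATION", "LOCATION"],
        "ann_start": [0, 0, 18, 18, 52],
        "pred_start": [0, 18, 18, 27, 52],
        "iou": [0.5, 0.0, 1 / 3, 1 / 3, 1.0],
    }
)

threshold = 0.5
partial = pairs[
    (pairs["ann_entity"] == pairs["pred_entity"])  # same entity type
    & (pairs["iou"] > 0)                           # some overlap...
    & (pairs["iou"] < threshold)                   # ...but not enough to match
]
print(partial)
```

On the full table, the same mask can be applied to `iou_results` directly; here it isolates the two fragments of "New York City".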


In [8]:
iou_results["annotation_span"].tolist()

[Span(type: PERSON, value: ['John', 'Doe'], char_span: [0: 9]),
 Span(type: PERSON, value: ['John', 'Doe'], char_span: [0: 9]),
 Span(type: PERSON, value: ['John', 'Doe'], char_span: [0: 9]),
 Span(type: LOCATION, value: ['New', 'York', 'City'], char_span: [18: 31]),
 Span(type: LOCATION, value: ['New', 'York', 'City'], char_span: [18: 31]),
 Span(type: LOCATION, value: ['New', 'York', 'City'], char_span: [18: 31]),
 Span(type: PERSON, value: ['Jane', 'Smith'], char_span: [33: 44]),
 Span(type: PERSON, value: ['Jane', 'Smith'], char_span: [33: 44]),
 Span(type: LOCATION, value: ['Paris'], char_span: [52: 58]),
 Span(type: LOCATION, value: ['Paris'], char_span: [52: 58])]

In [None]:
iou_results["prediction_span"].tolist()

In [13]:
i = 5  # row index into the pairwise IoU table
iou_results["annotation_span"].iloc[i]

Span(type: LOCATION, value: ['New', 'York', 'City'], char_span: [18: 31])

In [14]:
iou_results["prediction_span"].iloc[i]

Span(type: LOCATION, value: ['City'], char_span: [27: 31])

In [None]:
iou_results["prediction_span"].iloc[3]