# Named-entity Recognition (NER) 

Named-entity recognition (NER) is a fundamental task in information extraction from text. Named entities originally referred to real-world entities that have a name, such as persons, organizations and locations, but the concept has since been extended to many other types of information: chemical molecules, product numbers, amounts, addresses, etc. In this practical assignment, we will apply several named-entity extraction libraries to a small French corpus. The objective is not to train the best possible model, but to try out each of these libraries.



## The AdminSet dataset
The AdminSet dataset is a corpus of French administrative documents produced by optical character recognition (OCR) and manually annotated with named entities. The corpus is quite difficult because the recognition process produces noisy text (errors due to layout, recognition, fonts, etc.).

The paper describing the dataset is available [here](https://hal.science/hal-04855066v1/file/AdminSet_et_AdminBERT__version___preprint.pdf).

The corpus is available on HuggingFace: [Adminset-NER](https://huggingface.co/datasets/taln-ls2n/Adminset-NER).

In [1]:
from datasets import load_dataset
ds = load_dataset('taln-ls2n/Adminset-NER')
print(ds)



DatasetDict({
    train: Dataset({
        features: ['tokens', 'ner_tags'],
        num_rows: 729
    })
    validation: Dataset({
        features: ['tokens', 'ner_tags'],
        num_rows: 85
    })
})


#### Question
> * Compute descriptive statistics on the texts for each split (train, dev)
> * Compute descriptive statistics on the entities for each split (train, dev)
> * Compare with the statistics reported in the paper (Table 2)
> * Display a couple of random texts with their entities
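Counting entities, as opposed to entity tokens, requires grouping BIO tags into spans. The following is a minimal sketch of such a grouping; the helper name `bio_to_spans` and the toy sentence are illustrative assumptions, not part of the dataset or of any library. A stray `I-` tag that does not continue an entity of the same type is treated here as the start of a new entity, which is one possible convention among several.

```python
from collections import Counter

def bio_to_spans(tokens, tags):
    """Group BIO tags into (label, start, end, text) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An entity continues only on an I- tag carrying the current label
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the open entity, if there is one
        if label is not None:
            spans.append((label, start, i, " ".join(tokens[start:i])))
            start, label = None, None
        # B- (or a stray I-) opens a new entity
        if tag != "O":
            start, label = i, tag[2:]
    # Close an entity that runs to the end of the text
    if label is not None:
        spans.append((label, start, len(tags), " ".join(tokens[start:])))
    return spans

# Toy example (hypothetical sentence, not taken from the corpus)
tokens = ["Monsieur", "DUPONT", "habite", "à", "Nantes", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
spans = bio_to_spans(tokens, tags)
counts = Counter(label for label, _, _, _ in spans)
print(spans)
print(counts)
```

Applied to every row of a split, the resulting spans give the number of entities per type as well as their length in tokens.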

In [2]:
import numpy as np
import pandas as pd

train_df = pd.DataFrame(ds['train'])
train_df.head()

Unnamed: 0,tokens,ner_tags
0,"[fin, Procès-Verbal, Conseil, communautaire, d...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ..."
1,"[Monsieur, MORLET, excuse, Monsieur, Christoph...","[B-PER, I-PER, O, B-PER, I-PER, I-PER, O, O, O..."
2,"[Monsieur, MORLET, annonce, le, décès, de, Mon...","[B-PER, I-PER, O, O, O, O, B-PER, I-PER, I-PER..."
3,"[Commentaires, ,, débat, Constatant, qu'il, n'...","[O, O, O, O, O, O, O, O, O, O, O, B-PER, I-PER..."
4,"[Page, 4, sur, 15, <, page, >, 4, <, /, page, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ..."


In [3]:
val_df = pd.DataFrame(ds['validation'])
val_df.head()

Unnamed: 0,tokens,ner_tags
0,"[et, L’Office, Communautaire, d’Animations, et...","[O, B-ORG, I-ORG, I-ORG, I-ORG, I-ORG, I-ORG, ..."
1,"[Signé, le, 22, février, 2024, Reçu, au, Contr...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ..."
2,"[Reçu, au, Contrôle, de, légalité, le, 12, déc...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ..."
3,"[Etaient, absents, et, représentés, Mesdames, ...","[O, O, O, O, O, O, O, O, B-PER, I-PER, O, O, B..."
4,"[Commune, d'Ollioules, -, Departement, du, Var...","[B-LOC, I-LOC, O, B-LOC, I-LOC, I-LOC, O, O, O..."


In [4]:
train_df.shape, val_df.shape

((729, 2), (85, 2))

In [6]:
# Compute statistics on the number of tokens in train and validation: min, max, mean, std, median

import numpy as np
from collections import Counter
import random

# for train set
train_df["n_tokens"] = train_df["tokens"].apply(len)
train_df.head()

Unnamed: 0,tokens,ner_tags,n_tokens
0,"[fin, Procès-Verbal, Conseil, communautaire, d...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...",63
1,"[Monsieur, MORLET, excuse, Monsieur, Christoph...","[B-PER, I-PER, O, B-PER, I-PER, I-PER, O, O, O...",24
2,"[Monsieur, MORLET, annonce, le, décès, de, Mon...","[B-PER, I-PER, O, O, O, O, B-PER, I-PER, I-PER...",31
3,"[Commentaires, ,, débat, Constatant, qu'il, n'...","[O, O, O, O, O, O, O, O, O, O, O, B-PER, I-PER...",18
4,"[Page, 4, sur, 15, <, page, >, 4, <, /, page, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...",41


In [7]:
train_df["n_tokens"].describe()

count    729.000000
mean      63.367627
std       52.705843
min       15.000000
25%       30.000000
50%       45.000000
75%       75.000000
max      379.000000
Name: n_tokens, dtype: float64

In [9]:
print(train_df["n_tokens"].median())

45.0


In [12]:
# for validation set
val_df["n_tokens"] = val_df["tokens"].apply(len)

print("Median:", val_df["n_tokens"].median())
val_df["n_tokens"].describe()

Median: 50.0


count     85.000000
mean      79.835294
std       68.679006
min       19.000000
25%       35.000000
50%       50.000000
75%       86.000000
max      352.000000
Name: n_tokens, dtype: float64

**Table 2 of the paper**

<img src="images/paper_table2.png" width="500">


In [16]:
train_tags = train_df["ner_tags"].explode()

# remove "O"
train_entities = train_tags[train_tags != "O"]

print("Total entity tokens:", len(train_entities))
print("Number of entity labels:", train_entities.nunique())
print(train_entities.value_counts().sort_values(ascending=False))


Total entity tokens: 4983
Number of entity labels: 6
ner_tags
I-ORG    1476
I-PER    1092
B-ORG     770
B-PER     764
B-LOC     454
I-LOC     427
Name: count, dtype: int64


In [17]:
val_tags = val_df["ner_tags"].explode()

val_entities = val_tags[val_tags != "O"]

print("Total entity tokens:", len(val_entities))
print("Number of entity labels:", val_entities.nunique())
print(val_entities.value_counts().sort_values(ascending=False))

Total entity tokens: 694
Number of entity labels: 6
ner_tags
I-ORG    203
I-PER    138
B-PER    124
B-ORG    123
I-LOC     54
B-LOC     52
Name: count, dtype: int64


In [19]:
train_b_entities = train_entities[train_entities.str.startswith("B-")]

print("Total entities:", len(train_b_entities))
print("Entity types:", train_b_entities.str[2:].value_counts().sort_values(ascending=False))


Total entities: 1988
Entity types: ner_tags
ORG    770
PER    764
LOC    454
Name: count, dtype: int64


In [21]:
val_b_entities = val_entities[val_entities.str.startswith("B-")]

print("Total entities:", len(val_b_entities))
print("Entity types:", val_b_entities.str[2:].value_counts())


Total entities: 299
Entity types: ner_tags
PER    124
ORG    123
LOC     52
Name: count, dtype: int64


In [22]:
sample = train_df.sample(2)

for i, row in sample.iterrows():
    print("Text:")
    print(" ".join(row["tokens"]))
    
    print("\nEntities:")
    
    current_entity = []
    current_label = None
    
    for token, tag in zip(row["tokens"], row["ner_tags"]):
        
        if tag.startswith("B-"):
            if current_entity:
                print(" ".join(current_entity), "->", current_label)
            current_entity = [token]
            current_label = tag[2:]
        
        elif tag.startswith("I-"):
            current_entity.append(token)
        
        else:
            if current_entity:
                print(" ".join(current_entity), "->", current_label)
                current_entity = []
                current_label = None
    
    if current_entity:
        print(" ".join(current_entity), "->", current_label)


Text:
Article 3 : En contrepartie , la Direction de l’Action Socioculturelle s'engage à faire bénéficier aux adhérents du Comité Social et Économique Airbus Defence and Space Toulouse uniquement de tarifs spécifiques « Comité d’Entreprise » pour l’achat de place de spectacles des centres culturels de la Direction de l’Action Socioculturelle de Toulouse .

Entities:
l’Action Socioculturelle -> ORG
Comité Social et Économique Airbus Defence and Space Toulouse -> ORG
l’Action Socioculturelle de Toulouse -> ORG
Text:
Si le dossier est retenu , une convention sera établie et communiquée au porteur de projet précisant : 1 La nature , la durée et l’objet de l’intervention de la CAPB .

Entities:
CAPB . -> ORG


### Creation of the splits

The train_test_split() method of the Hugging Face datasets library splits a dataset randomly into two parts: https://huggingface.co/docs/datasets/v4.5.0/process#split

The ```spacy_utils.py``` file contains functions to save a dataset in text format (```save_text```, useful for inspection), BIO format (```save_bio```) and spaCy format (```save_docbin```).

#### Questions
>* Using the split function, create a train/dev/test split corresponding to the proportions reported in the paper
>* Save the sets in a corpus directory, in text, bio and docbin formats.

In [23]:
from spacy_utils import save_bio, save_text, save_docbin
from datasets import concatenate_datasets

full_ds = concatenate_datasets([ds["train"], ds["validation"]])
full_ds.shape

(814, 2)

In [24]:
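# Split sizes from the paper: 583 train / 146 dev / 85 test.
# Rows are taken in order (no shuffling), so the original validation split becomes the test set.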
train_ds = full_ds.select(range(0, 583))
dev_ds   = full_ds.select(range(583, 583 + 146))
test_ds  = full_ds.select(range(583 + 146, 814))

train_ds.shape, dev_ds.shape, test_ds.shape

((583, 2), (146, 2), (85, 2))

In [25]:
from spacy_utils import save_bio, save_text, save_docbin

# save the datasets in different formats
save_text(train_ds, "corpus/train.txt")
save_text(dev_ds, "corpus/dev.txt")
save_text(test_ds, "corpus/test.txt")

save_bio(train_ds, "corpus/train.bio")
save_bio(dev_ds, "corpus/dev.bio")
save_bio(test_ds, "corpus/test.bio")

save_docbin(train_ds, "corpus/train.spacy")
save_docbin(dev_ds, "corpus/dev.spacy")
save_docbin(test_ds, "corpus/test.spacy")

Saving text to corpus/train.txt...
100%|██████████| 583/583 [00:00<00:00, 9137.61it/s]
Saved to corpus/train.txt
Saving text to corpus/dev.txt...
100%|██████████| 146/146 [00:00<00:00, 13776.57it/s]
Saved to corpus/dev.txt
Saving text to corpus/test.txt...
100%|██████████| 85/85 [00:00<00:00, 9229.94it/s]
Saved to corpus/test.txt
Saving BIO text to corpus/train.bio...
100%|██████████| 583/583 [00:00<00:00, 9312.33it/s]
Saved to corpus/train.bio
Saving BIO text to corpus/dev.bio...
100%|██████████| 146/146 [00:00<00:00, 18575.19it/s]
Saved to corpus/dev.bio
Saving BIO text to corpus/test.bio...
100%|██████████| 85/85 [00:00<00:00, 12294.50it/s]
Saved to corpus/test.bio
Creating corpus/train.spacy with 583 examples...
100%|██████████| 583/583 [00:00<00:00, 3595.48it/s]
Saved to corpus/train.spacy
Creating corpus/dev.spacy with 146 examples...
100%|██████████| 146/146 [00:00<00:00, 4539.79it/s]
Saved to corpus/dev.spacy
Creating corpus/test.spacy with 85 examples...
100%|██████████| 85/85 [00:00<00:00, 2793.57it/s]
Saved to corpus/test.spacy





### Testing spaCy pre-trained NER models

spaCy comes with several pretrained models for many languages. For French, 4 models are provided: https://spacy.io/models/fr

To apply a pretrained model to a dataset, use:
- ```nlp = spacy.load(MODEL_NAME)``` to load the model. You need to download it first with ```python -m spacy download MODEL_NAME```
- ```DocBin().from_disk()``` to load a dataset in spaCy format from disk
- ```doc_bin.get_docs(nlp.vocab)``` to convert the binary dataset into spaCy ```Doc``` objects
- ```nlp(doc.text)``` to apply the NER model to a text

To evaluate the predictions, you can use the spaCy [Scorer](https://spacy.io/api/scorer):
- ```scorer.score(examples)``` where examples is a list of spaCy ```Example(prediction, reference)``` objects
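For intuition about the `ents_p`, `ents_r` and `ents_f` values returned by the scorer, entity-level scores are computed by exact match: a predicted entity counts as correct only if both its boundaries and its label match a reference entity. The sketch below reproduces this computation on hand-written spans; the function name and the example spans are assumptions, and this is not the spaCy implementation.

```python
def prf(predicted, reference):
    """Exact-match precision, recall and F1 over sets of (start, end, label) spans."""
    predicted, reference = set(predicted), set(reference)
    # True positives: spans found with the correct boundaries and label
    tp = len(predicted & reference)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical character-offset spans for one text
reference = [(0, 15, "PER"), (26, 32, "LOC"), (40, 52, "ORG")]
predicted = [(0, 15, "PER"), (26, 32, "ORG"), (60, 64, "LOC"), (70, 75, "PER")]

p, r, f = prf(predicted, reference)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Under this definition, a prediction with the right boundaries but the wrong label counts both as a false positive and as a missed reference entity.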

#### Question

>* Using a spaCy pretrained model for French, evaluate its performance for NER prediction on the train, dev and test sets
>* Compare this model to results reported in the paper

In [27]:
import spacy
from spacy.tokens import DocBin
from spacy.scorer import Scorer
from spacy.training import Example
from tqdm import tqdm
from prettytable import PrettyTable

nlp = spacy.load("fr_core_news_md")

In [28]:
# for train dataset
doc_bin = DocBin().from_disk("corpus/train.spacy")
gold_docs = list(doc_bin.get_docs(nlp.vocab))

examples = []

for gold_doc in tqdm(gold_docs):
    pred_doc = nlp(gold_doc.text)
    examples.append(Example(pred_doc, gold_doc))

scorer = Scorer()
train_scores = scorer.score(examples)

print("Precision:", train_scores["ents_p"])
print("Recall:", train_scores["ents_r"])
print("F1:", train_scores["ents_f"])


100%|██████████| 583/583 [00:06<00:00, 93.70it/s]


Precision: 0.19733252500757806
Recall: 0.4033457249070632
F1: 0.26501119478933444


In [29]:
# for dev dataset
doc_bin = DocBin().from_disk("corpus/dev.spacy")
gold_docs = list(doc_bin.get_docs(nlp.vocab))

examples = []

for gold_doc in tqdm(gold_docs):
    pred_doc = nlp(gold_doc.text)
    examples.append(Example(pred_doc, gold_doc))

scorer = Scorer()
dev_scores = scorer.score(examples)

print("Precision:", dev_scores["ents_p"])
print("Recall:", dev_scores["ents_r"])
print("F1:", dev_scores["ents_f"])


100%|██████████| 146/146 [00:01<00:00, 95.97it/s]


Precision: 0.1683599419448476
Recall: 0.31016042780748665
F1: 0.21825023518344308


In [30]:
# for test dataset
doc_bin = DocBin().from_disk("corpus/test.spacy")
gold_docs = list(doc_bin.get_docs(nlp.vocab))

examples = []

for gold_doc in tqdm(gold_docs):
    pred_doc = nlp(gold_doc.text)
    examples.append(Example(pred_doc, gold_doc))

scorer = Scorer()
test_scores = scorer.score(examples)

print("Precision:", test_scores["ents_p"])
print("Recall:", test_scores["ents_r"])
print("F1:", test_scores["ents_f"])

100%|██████████| 85/85 [00:01<00:00, 82.51it/s]


Precision: 0.1555944055944056
Recall: 0.2976588628762542
F1: 0.20436280137772675


In [31]:
table = PrettyTable()
table.field_names = ["Split", "Precision", "Recall", "F1"]

table.add_row(["Train",
               round(train_scores["ents_p"],3),
               round(train_scores["ents_r"],3),
               round(train_scores["ents_f"],3)])

table.add_row(["Dev",
               round(dev_scores["ents_p"],3),
               round(dev_scores["ents_r"],3),
               round(dev_scores["ents_f"],3)])

table.add_row(["Test",
               round(test_scores["ents_p"],3),
               round(test_scores["ents_r"],3),
               round(test_scores["ents_f"],3)])

print(table)


+-------+-----------+--------+-------+
| Split | Precision | Recall |   F1  |
+-------+-----------+--------+-------+
| Train |   0.197   | 0.403  | 0.265 |
|  Dev  |   0.168   |  0.31  | 0.218 |
|  Test |   0.156   | 0.298  | 0.204 |
+-------+-----------+--------+-------+


### Training a custom spaCy model

A custom spaCy NER model can be trained either with the command-line interface (CLI) or from a Python script. Using the CLI is usually more optimized. The whole training setup is defined in a configuration file, which is good practice for documentation, traceability and reproducibility.

The configuration file can be generated online using the [Quickstart](https://spacy.io/usage/training#quickstart) widget.

<img src="images/spacy_quickstart.jpg" width="600" >

You can run the training process as a script using the train function (https://spacy.io/usage/training#api-train), specifying the configuration file and the directory in which to save the model as parameters. Once the training is complete, the best and last models are saved in the directory.
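If the corpus paths are left empty in the generated configuration file, they can also be supplied when calling the train function, through its `overrides` argument. The sketch below assumes that the configuration file uses the `paths.train` and `paths.dev` keys (as in files produced by the Quickstart) and that the corpus files are the ones saved earlier in this notebook; the helper names `build_overrides` and `run_training` are illustrative.

```python
from pathlib import Path

def build_overrides(corpus_dir):
    """Map the config keys for the corpora to the .spacy files in corpus_dir."""
    corpus_dir = Path(corpus_dir)
    return {
        "paths.train": (corpus_dir / "train.spacy").as_posix(),
        "paths.dev": (corpus_dir / "dev.spacy").as_posix(),
    }

def run_training(config_path="config.cfg", output_path="training_output"):
    # Imported here so that build_overrides can be used without loading spaCy
    from spacy.cli.train import train
    # use_gpu=-1 trains on CPU; pass a GPU id (e.g. 0) to train on GPU
    train(config_path, output_path, use_gpu=-1, overrides=build_overrides("corpus"))

print(build_overrides("corpus"))
```

Calling `run_training()` starts the training; it is left uncalled here because it needs the configuration file and the corpus on disk.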

#### Question
> * Generate a training configuration file for a French NER model
> * Add the correct paths to the training and dev sets generated previously
> * Train a NER model
> * Evaluate the model on the train, dev and test sets. Compare to the results reported in the paper.

In [41]:
# train the model
from spacy.cli.train import train

train(
    config_path="config.cfg",
    output_path="training_output",
    use_gpu=0
)


[38;5;4m‚Ñπ Saving to output directory: training_output[0m
[38;5;4m‚Ñπ Using GPU: 0[0m
[1m
[38;5;2m‚úî Initialized pipeline[0m
[1m
[38;5;4m‚Ñπ Pipeline: ['tok2vec', 'ner'][0m
[38;5;4m‚Ñπ Initial learn rate: 0.001[0m
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00     31.93    0.45    1.33    0.27    0.00
  0     200         49.08   1895.69   22.43   38.82   15.78    0.22
  1     400        321.35   1407.31   31.31   36.27   27.54    0.31
  1     600         34.73   1288.48   38.43   46.74   32.62    0.38
  2     800         81.49   1377.11   33.94   44.21   27.54    0.34
  3    1000       1257.20   1377.33   43.22   52.69   36.63    0.43
[38;5;2m‚úî Saved pipeline to output directory[0m
training_output/model-last


In [42]:
import spacy

nlp = spacy.load("training_output/model-best")

In [43]:
from spacy.tokens import DocBin

def load_spacy_dataset(path, nlp):
    doc_bin = DocBin().from_disk(path)
    return list(doc_bin.get_docs(nlp.vocab))

train_docs = load_spacy_dataset("corpus/train.spacy", nlp)
dev_docs   = load_spacy_dataset("corpus/dev.spacy", nlp)
test_docs  = load_spacy_dataset("corpus/test.spacy", nlp)

In [44]:
# evaluate the model
from spacy.scorer import Scorer
from spacy.training import Example
from tqdm import tqdm

def evaluate_model(docs, nlp):
    scorer = Scorer()
    examples = []

    for doc in tqdm(docs):
        pred = nlp(doc.text)
        examples.append(Example(pred, doc))

    scores = scorer.score(examples)
    return scores

train_scores = evaluate_model(train_docs, nlp)
dev_scores   = evaluate_model(dev_docs, nlp)
test_scores  = evaluate_model(test_docs, nlp)

100%|██████████| 583/583 [00:19<00:00, 29.38it/s]
100%|██████████| 146/146 [00:04<00:00, 31.76it/s]
100%|██████████| 85/85 [00:03<00:00, 26.61it/s]


In [45]:
from prettytable import PrettyTable

table = PrettyTable()
table.field_names = ["Split", "Precision", "Recall", "F1"]

table.add_row([
    "Train",
    round(train_scores["ents_p"], 3),
    round(train_scores["ents_r"], 3),
    round(train_scores["ents_f"], 3)
])

table.add_row([
    "Dev",
    round(dev_scores["ents_p"], 3),
    round(dev_scores["ents_r"], 3),
    round(dev_scores["ents_f"], 3)
])

table.add_row([
    "Test",
    round(test_scores["ents_p"], 3),
    round(test_scores["ents_r"], 3),
    round(test_scores["ents_f"], 3)
])

print(table)

+-------+-----------+--------+-------+
| Split | Precision | Recall |   F1  |
+-------+-----------+--------+-------+
| Train |   0.733   | 0.729  | 0.731 |
|  Dev  |   0.525   | 0.366  | 0.431 |
|  Test |   0.681   | 0.528  | 0.595 |
+-------+-----------+--------+-------+


In [48]:
# Compare with a pretrained model (fr_core_news_lg): F1 on the test set
nlp_pretrained = spacy.load("fr_core_news_lg")
test_scores_pretrained = evaluate_model(test_docs, nlp_pretrained)
test_scores_pretrained['ents_f']

100%|██████████| 85/85 [00:01<00:00, 66.71it/s]


0.19955654101995565

### Zero-shot NER prediction with GLiNER


[GLiNER](https://github.com/fastino-ai/GLiNER2/tree/main) is a library that provides models for zero-shot named entity recognition. This means that the entity types to extract are given as labels at prediction time, without training the model on annotated examples. GLiNER also supports [structured information extraction](https://github.com/fastino-ai/GLiNER2/blob/main/tutorial/3-json_extraction.md), which means that the extracted information can be organised in a structured JSON format. GLiNER does not provide the location of entities in the text by default, but you can configure the model to output this information (```include_spans=True```). Finally, GLiNER allows entities to overlap and to be nested, which is not supported by the spaCy scorer. The spaCy [filter_spans](https://spacy.io/api/top-level#util.filter_spans) function can be used to remove overlapping entities for evaluation.
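To make the effect of overlap removal concrete, the sketch below applies a greedy longest-span-first filter to plain (start, end, label) tuples. It illustrates the idea behind filter_spans rather than the spaCy code itself; the function name `remove_overlaps` and the example offsets are assumptions.

```python
def remove_overlaps(spans):
    """Keep the longest spans first and drop any span that overlaps a kept one.

    spans: iterable of (start, end, label) tuples with an exclusive end offset.
    """
    # Longest first; ties are broken in favour of the earliest start
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    for start, end, label in ordered:
        # A span is kept only if it lies entirely before or after every kept span
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)

# Hypothetical nested and overlapping predictions
spans = [(0, 30, "ORG"), (22, 30, "LOC"), (28, 40, "PER"), (45, 50, "LOC")]
filtered = remove_overlaps(spans)
print(filtered)
```

Here the location nested inside the organization and the person span that straddles its end are both discarded, so that each character belongs to at most one entity, as the scorer expects.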

#### Question
> * Define the entities to extract from the text.
> * Apply GLiNER on the dev and test sets
> * Evaluate the models on the dev and test sets and compare to the results reported in the paper.

In [57]:
import numpy as np

all_tags = []
for example in ds["train"]:
    all_tags.extend(example["ner_tags"])

unique_tags = sorted(set(all_tags))
unique_tags

['B-LOC', 'B-ORG', 'B-PER', 'I-LOC', 'I-ORG', 'I-PER', 'O']

In [71]:
from gliner2 import GLiNER2
extractor = GLiNER2.from_pretrained("fastino/gliner2-base-v1")

from spacy.util import filter_spans
nlp = spacy.blank("fr")  # tokenizer only

doc_bin = DocBin().from_disk("corpus/dev.spacy")
gold_docs = list(doc_bin.get_docs(nlp.vocab))

label_map = {
    "PER": "nom de personne dans un document administratif",
    "ORG": "organisation administrative ou institution",
    "LOC": "lieu géographique ou adresse postale"
}

gliner_labels = list(label_map.values())
reverse_map = {v: k for k, v in label_map.items()}

examples = []

for gold_doc in gold_docs:
    text = gold_doc.text
    
    predictions = extractor.extract_entities(text, gliner_labels, include_spans=True)
    pred_doc = nlp.make_doc(text)

    spans = []
    for gliner_label, entities in predictions["entities"].items():
        spacy_label = reverse_map.get(gliner_label)
        if not spacy_label:
            continue
        for ent in entities:
            start = ent["start"]
            end = ent["end"]

            # char_span returns None when the offsets do not align with token boundaries
            span = pred_doc.char_span(start, end, label=spacy_label)
            if span:
                spans.append(span)

    spans = filter_spans(spans)
    pred_doc.ents = spans
    examples.append(Example(pred_doc, gold_doc))

# Evaluate
scorer = Scorer()
scores_dev = scorer.score(examples)

print("Precision:", round(scores_dev["ents_p"], 3))
print("Recall:", round(scores_dev["ents_r"], 3))
print("F1:", round(scores_dev["ents_f"], 3))

You are using a model of type extractor to instantiate a model of type . This is not supported for all configurations of models and can yield errors.


🧠 Model Configuration
Encoder model      : microsoft/deberta-v3-base
Counting layer     : count_lstm_v2
Token pooling      : first
Precision: 0.288
Recall: 0.39
F1: 0.331


In [72]:
# Load TEST dataset
doc_bin = DocBin().from_disk("corpus/test.spacy")
gold_docs = list(doc_bin.get_docs(nlp.vocab))

examples = []

for gold_doc in tqdm(gold_docs):
    text = gold_doc.text

    predictions = extractor.extract_entities(
        text,
        gliner_labels,
        include_spans=True
    )

    pred_doc = nlp.make_doc(text)

    spans = []
    for gliner_label, entities in predictions["entities"].items():
        spacy_label = reverse_map.get(gliner_label)
        if not spacy_label:
            continue

        for ent in entities:
            start = ent["start"]
            end = ent["end"]

            # char_span returns None when the offsets do not align with token boundaries
            span = pred_doc.char_span(start, end, label=spacy_label)
            if span:
                spans.append(span)

    spans = filter_spans(spans)
    pred_doc.ents = spans

    examples.append(Example(pred_doc, gold_doc))

# Evaluate TEST
scorer = Scorer()
scores_test = scorer.score(examples)

print("Precision:", round(scores_test["ents_p"], 3))
print("Recall:", round(scores_test["ents_r"], 3))
print("F1:", round(scores_test["ents_f"], 3))

100%|██████████| 85/85 [00:10<00:00,  8.22it/s]

Precision: 0.321
Recall: 0.371
F1: 0.344





In [73]:
from prettytable import PrettyTable

table = PrettyTable()
table.field_names = ["Model", "Split", "Precision", "Recall", "F1"]

# GLiNER
table.add_row(["GLiNER", "Dev",
               round(scores_dev["ents_p"], 3),
               round(scores_dev["ents_r"], 3),
               round(scores_dev["ents_f"], 3)])

table.add_row(["GLiNER", "Test",
               round(scores_test["ents_p"], 3),
               round(scores_test["ents_r"], 3),
               round(scores_test["ents_f"], 3)])

print(table)

+--------+-------+-----------+--------+-------+
| Model  | Split | Precision | Recall |   F1  |
+--------+-------+-----------+--------+-------+
| GLiNER |  Dev  |   0.288   |  0.39  | 0.331 |
| GLiNER |  Test |   0.321   | 0.371  | 0.344 |
+--------+-------+-----------+--------+-------+


### Results from the paper

**Table 3 of the paper**

<img src="images/table3.png" width="500">


**Table 4 of the paper**

<img src="images/table4.png" width="500">
