# NER benchmark: `spacy` vs `spacy-llm`+Ollama+langchain vs `span_marker`

References:
- Setup for spacy-llm with Ollama via LangChain: https://spacy.io/api/large-language-models#install-hf and https://spacy.io/api/large-language-models#langchain-models
    ```sh
    pip install "spacy-llm[extras]" "spacy-llm[transformers]" "transformers[sentencepiece]" "accelerate>=0.16.0,<1.0"
    ```
- The Mistral model served through Ollama defaults to a temperature of 0.8 and a context length of 2048 tokens

Results:

|                                  | Default | LLM with simple prompt | LLM with complex prompt and chain of thought | SpanMarker |
|----------------------------------|---------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|--------------------------------------------------------|
| model                            | [spacy/en_core_web_sm](https://huggingface.co/spacy/en_core_web_sm)                                                                                                            | [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)                                             | [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)                                                                     | [tomaarsen/span-marker-mbert-base-multinerd](https://huggingface.co/tomaarsen/span-marker-mbert-base-multinerd)             |
| mechanism                        | Fine-tuned on OntoNotes 5, 85% F1-score; transition-based algorithm with [EntityRecognizer](https://spacy.io/api/entityrecognizer)                                                            | Prompt engineering with local Ollama LLM, spacy-llm with [spacy.NER.v2](https://spacy.io/api/large-language-models#ner-v2) | Prompt engineering with local Ollama LLM, spacy-llm with [spacy.NER.v3](https://spacy.io/api/large-language-models#ner-v3)                         | Fine-tuned on MultiNERD, 93% F1-score; PL-marker paper [docs](https://tomaarsen.github.io/SpanMarkerNER/index.html)|
| inference time on test paragraph (MacBook Pro M1, no GPU) | 0.097s | 26.16s | 37.22s | 3.22s |
| eyeballed performance | Baseline: acceptable, but with issues, e.g. one span split into two entities and PERSON entities misclassified as GPE or ORG | Good: high precision and recall | Poor: high precision but very low recall; the docs mention it performs best with few-shot learning | Best |
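
The inference times above are single wall-clock measurements. A minimal sketch of how such a timing could be taken, assuming a hypothetical helper `time_call` (not part of spaCy or spacy-llm) and using `str.upper` as a stand-in for `nlp(paragraph)`:

```python
import time
from typing import Any, Callable, Tuple


def time_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in for `nlp(paragraph)`; any callable works here.
result, seconds = time_call(str.upper, "some test paragraph")
print(f"{seconds:.3f}s")
```

With a loaded pipeline, the call would look like `doc, seconds = time_call(nlp, paragraph)`. A single run includes any warm-up cost, so repeated runs would give a steadier number.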

In [1]:
import json
from typing import cast
from IPython.display import display, Markdown

import spacy
from spacy_llm.util import assemble, Config, assemble_from_config
from spacy_llm.pipeline import LLMWrapper

In [2]:
with open("../data/wiki_guardians.json", "r") as fh:
    text: str = json.load(fh)["text"]
    paragraph = text.split("\n\n\n")[0]

## Using default spaCy model

In [14]:
nlp = spacy.load("en_core_web_sm")

In [15]:
doc = nlp(paragraph)

In [16]:
spacy.displacy.render(
    doc,
    style = "ent",
    jupyter = True,
)
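
The displaCy rendering is convenient for eyeballing, but a plain listing is easier to compare across pipelines. A small sketch, assuming a hypothetical helper `entities_by_label` that works on anything exposing `.text` and `.label_` (as spaCy spans do). The entities below are made-up stand-ins, not output from this notebook:

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def entities_by_label(ents: Iterable[Any]) -> Dict[str, List[str]]:
    """Group entity texts by label, keeping document order within each label."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Made-up stand-ins for spaCy Span objects (illustration only).
fake_ents = [
    SimpleNamespace(text="Ada Lovelace", label_="PERSON"),
    SimpleNamespace(text="London", label_="GPE"),
    SimpleNamespace(text="Alan Turing", label_="PERSON"),
]
grouped = entities_by_label(fake_ents)
print(grouped)
```

On a real document this would be called as `entities_by_label(doc.ents)`.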

## spacy-llm with spacy.NER.v2


In [27]:
cfg_string = """
[nlp]
lang = "en"
pipeline = ["llm"]
[components]
[components.llm]
factory = "llm"
[components.llm.model]
@llm_models = "langchain.Ollama.v1"
name = "mistral"
context_length = 2048
config = {"temperature": 0.0}
[components.llm.task]
@llm_tasks = "spacy.NER.v2"
labels = ["CARDINAL", "DATE", "EVENT", "FAC", "GPE", "LANGUAGE", "LAW", "LOC", "MONEY", "NORP", "ORDINAL", "ORG", "PERCENT", "PERSON", "PRODUCT", "QUANTITY", "TIME", "WORK_OF_ART"]
"""
config = Config().from_str(cfg_string)
nlp = assemble_from_config(config)
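
The v2 and v3 configs in this notebook differ only in the task section, so the shared part could be generated. A sketch under that assumption, where `build_ollama_ner_config` is a hypothetical helper and the defaults mirror the values used in this notebook:

```python
import json
from typing import List


def build_ollama_ner_config(
    task: str,
    labels: List[str],
    model_name: str = "mistral",
    context_length: int = 2048,
    temperature: float = 0.0,
) -> str:
    """Return a spacy-llm config string for an Ollama-backed NER task."""
    # json.dumps renders the label list in the JSON-style syntax the config expects.
    return f"""
[nlp]
lang = "en"
pipeline = ["llm"]
[components]
[components.llm]
factory = "llm"
[components.llm.model]
@llm_models = "langchain.Ollama.v1"
name = "{model_name}"
context_length = {context_length}
config = {{"temperature": {temperature}}}
[components.llm.task]
@llm_tasks = "{task}"
labels = {json.dumps(labels)}
"""


cfg_v2 = build_ollama_ner_config("spacy.NER.v2", ["PERSON", "ORG", "GPE"])
print(cfg_v2)
```

The resulting string can be passed to `Config().from_str(...)` and `assemble_from_config(...)` exactly as in the cell above.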

In [28]:
doc = nlp(paragraph)
# On the first run, Ollama adds roughly 10 s of warm-up time

In [29]:
llm_pipe = cast(LLMWrapper, nlp.get_pipe("llm"))
prompts = list(llm_pipe._task.generate_prompts([doc]))
print(prompts[0][0][0])

You are an expert Named Entity Recognition (NER) system. Your task is to accept Text as input and extract named entities for the set of predefined entity labels.
From the Text input provided, extract named entities for each label in the following format:

CARDINAL: <comma delimited list of strings>
DATE: <comma delimited list of strings>
EVENT: <comma delimited list of strings>
FAC: <comma delimited list of strings>
GPE: <comma delimited list of strings>
LANGUAGE: <comma delimited list of strings>
LAW: <comma delimited list of strings>
LOC: <comma delimited list of strings>
MONEY: <comma delimited list of strings>
NORP: <comma delimited list of strings>
ORDINAL: <comma delimited list of strings>
ORG: <comma delimited list of strings>
PERCENT: <comma delimited list of strings>
PERSON: <comma delimited list of strings>
PRODUCT: <comma delimited list of strings>
QUANTITY: <comma delimited list of strings>
TIME: <comma delimited list of strings>
WORK_OF_ART: <comma delimited list of strings>
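
The prompt asks the model to answer with one `LABEL: comma delimited list` line per label. To illustrate what that format implies for downstream parsing, here is a rough sketch. It is not spacy-llm's own parser; `parse_ner_response` is a hypothetical helper and the reply text is invented:

```python
from typing import Dict, List


def parse_ner_response(response: str, labels: List[str]) -> Dict[str, List[str]]:
    """Parse 'LABEL: a, b, c' lines, ignoring lines that do not start with a known label."""
    allowed = set(labels)
    parsed: Dict[str, List[str]] = {}
    for line in response.splitlines():
        label, sep, values = line.partition(":")
        label = label.strip()
        if not sep or label not in allowed:
            continue  # skip chatter or unknown labels
        parsed[label] = [v.strip() for v in values.split(",") if v.strip()]
    return parsed


# Invented reply, for illustration only.
reply = "PERSON: Ada Lovelace, Alan Turing\nGPE: London\nDATE:\nSure, here you go!"
parsed = parse_ner_response(reply, ["PERSON", "GPE", "DATE"])
print(parsed)
```

Lines the model adds outside the requested format are dropped, which is one reason a temperature of 0.0 and a strict format description help.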

In [30]:
spacy.displacy.render(
    doc,
    style = "ent",
    jupyter = True,
)

### Leverage sharding for documents of arbitrary length
The docs do not mention chunking, but after reading the two PRs below, I realised that `context_length` must be passed in the model config.
References:
- [Updated docs w.r.t. infinite doc length changes](https://github.com/explosion/spaCy/pull/13214)
- [Support arbitrarily long docs](https://github.com/explosion/spacy-llm/pull/332)
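
To give an intuition for what sharding means, here is a deliberately naive sketch. It splits on whitespace-separated words rather than model tokens and is not how spacy-llm implements it; `shard_by_word_budget` is a hypothetical helper:

```python
from typing import List


def shard_by_word_budget(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


shards = shard_by_word_budget("one two three four five", max_words=2)
print(shards)  # ['one two', 'three four', 'five']
```

Each shard would then be sent to the model separately and the extracted entities merged back into one document.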

## spacy-llm with spacy.NER.v3
Filling the config with references from:
- [NER definition](https://spacy.io/usage/spacy-101#features)
- [NER label list](https://spacy.io/models/en)
- [NER label descriptions](https://github.com/DerwenAI/textgraphs/blob/dcdd568268f4986c72378b740fa970df5f492697/textgraphs/kg.py#L52)

In [31]:
cfg_string = """
[nlp]
lang = "en"
pipeline = ["llm"]
[components]
[components.llm]
factory = "llm"
[components.llm.model]
@llm_models = "langchain.Ollama.v1"
name = "mistral"
context_length = 2048
config = {"temperature": 0.0}
[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = ["CARDINAL", "DATE", "EVENT", "FAC", "GPE", "LANGUAGE", "LAW", "LOC", "MONEY", "NORP", "ORDINAL", "ORG", "PERCENT", "PERSON", "PRODUCT", "QUANTITY", "TIME", "WORK_OF_ART"]
description = "Labelling named “real-world” objects, like persons, companies or locations."
[components.llm.task.label_definitions]
CARDINAL= "Numerals that do not fall under another type"
DATE= "Absolute or relative dates or periods"
EVENT= "Named hurricanes, battles, wars, sports events, etc."
FAC= "Buildings, airports, highways, bridges, etc."
GPE= "Countries, cities, states"
LANGUAGE= "Any named language"
LAW= "Named documents made into laws"
LOC= "Non-GPE locations, mountain ranges, bodies of water"
MONEY= "Monetary values, including unit"
NORP= "Nationalities or religious or political groups"
ORDINAL= "Ordinal number, i.e., first, second, etc."
ORG= "Companies, agencies, institutions, etc."
PERCENT= "Percentage"
PERSON= "People, including fictional"
PRODUCT= "Vehicles, weapons, foods, etc. (Not services)"
QUANTITY= "Measurements, as of weight or distance"
TIME= "Times smaller than a day"
WORK_OF_ART= "Titles of books, songs, etc."
"""
config = Config().from_str(cfg_string)
nlp = assemble_from_config(config)
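
Label definitions are only useful if their keys match the entries in `labels` exactly; a key written as `WORK OF ART` would not match the label `WORK_OF_ART`. A small sanity check can catch this before the pipeline is assembled. `check_label_definitions` is a hypothetical helper, not part of spacy-llm:

```python
from typing import Dict, List, Set, Tuple


def check_label_definitions(
    labels: List[str], definitions: Dict[str, str]
) -> Tuple[Set[str], Set[str]]:
    """Return (labels without a definition, definition keys that are not labels)."""
    label_set, defined = set(labels), set(definitions)
    return label_set - defined, defined - label_set


missing, unknown = check_label_definitions(
    ["PERSON", "WORK_OF_ART"],
    {
        "PERSON": "People, including fictional",
        "WORK OF ART": "Titles of books, songs, etc.",  # space instead of underscore
    },
)
print(missing, unknown)
```

Both sets being empty means every label has a definition and no definition is orphaned.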

In [32]:
doc = nlp(paragraph)

In [33]:
spacy.displacy.render(
    doc,
    style = "ent",
    jupyter = True,
)

## Comparing with SpanMarker

In [34]:
# Load the spaCy model with the span_marker pipeline component
nlp = spacy.load("en_core_web_sm", exclude=["ner"])
nlp.add_pipe("span_marker", config={"model": "tomaarsen/span-marker-mbert-base-multinerd"})

<span_marker.spacy_integration.SpacySpanMarkerWrapper at 0x2b066fdc0>

In [35]:
# Feed some text through the model to get a spacy Doc
doc = nlp(paragraph)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
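
The warning above can be avoided by setting the environment variable it names before `tokenizers` is first used in the process, for example at the top of the notebook. A minimal sketch:

```python
import os

# Must run before the tokenizers library is first used in this process.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
print(os.environ["TOKENIZERS_PARALLELISM"])
```

Setting it to `"false"` trades tokenizer parallelism for a quieter, fork-safe run, which is usually acceptable for a single paragraph.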


In [36]:
spacy.displacy.render(
    doc,
    style = "ent",
    jupyter = True,
)
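
The performance row in the results table is eyeballed. If a hand-labelled set of entities were available for the test paragraph, precision and recall could be computed instead. A sketch using exact matching on `(text, label)` pairs; `precision_recall` is a hypothetical helper and both entity sets below are invented:

```python
from typing import Set, Tuple

Entity = Tuple[str, str]  # (entity text, label)


def precision_recall(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float]:
    """Exact-match precision and recall over (text, label) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented data, for illustration only.
gold = {
    ("Ada Lovelace", "PERSON"),
    ("London", "GPE"),
    ("1843", "DATE"),
    ("Royal Society", "ORG"),
}
predicted = {("Ada Lovelace", "PERSON"), ("London", "ORG")}
precision, recall = precision_recall(predicted, gold)
print(precision, recall)  # 0.5 0.25
```

With a real pipeline, `predicted` could be built as `{(ent.text, ent.label_) for ent in doc.ents}`. Note that SpanMarker's MultiNERD label set differs from the OntoNotes labels used by the other pipelines, so labels would need to be mapped before comparing.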