# GLINER Usage Example

This notebook shows how to use the compact, transformer-based named entity recognition system [GLINER](https://github.com/urchade/GLiNER). Unlike most other named entity recognition systems, GLINER is not bound to a fixed set of entity types. This means that you can use it to recognize whichever entity types your application needs.

* GLINER code: [https://github.com/urchade/GLiNER](https://github.com/urchade/GLiNER)
* GLINER paper: [https://aclanthology.org/2024.naacl-long.300/](https://aclanthology.org/2024.naacl-long.300/)

## 1. Prerequisites

To run this notebook, you need to install the following packages:

! pip install --upgrade pip
! pip install gliner
! pip install spacy

## 2. Required Python modules

These are the Python modules needed for running this notebook:

In [1]:
from gliner import GLiNER
import spacy



## 3. GLINER Large Language Model

GLINER needs a pretrained model to operate. We use the model `urchade/gliner_large-v2.1` (about 1.7 GB), which is downloaded from Hugging Face the first time it is loaded. Two smaller models are also available; their names contain `medium` and `small` instead of `large`.
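The three model IDs share one naming pattern. The helper below is a hypothetical convenience function (it is not part of the GLiNER library) that builds the ID for a given size, assuming the pattern `urchade/gliner_<size>-v2.1` described above. Its result could be passed to `GLiNER.from_pretrained` in place of the hard-coded name.

```python
# Hypothetical helper (not part of GLiNER): build the Hugging Face ID of a
# GLiNER v2.1 model, assuming the naming pattern urchade/gliner_<size>-v2.1
GLINER_SIZES = ("small", "medium", "large")


def gliner_model_name(size="large"):
    """Return the model ID for the requested model size."""
    if size not in GLINER_SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {GLINER_SIZES}")
    return f"urchade/gliner_{size}-v2.1"


for size in GLINER_SIZES:
    print(gliner_model_name(size))
```

For example, `GLiNER.from_pretrained(gliner_model_name("small"))` would load the smallest model, which is quicker to download and run, though possibly less accurate.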

In [2]:
GLINER_HUGGINGFACE_MODEL = "urchade/gliner_large-v2.1"

model = GLiNER.from_pretrained(GLINER_HUGGINGFACE_MODEL)



## 4. Example text

We use the example text from the GLINER GitHub page, which is the first paragraph of the English Wikipedia article about the soccer player [Cristiano Ronaldo](https://en.wikipedia.org/wiki/Cristiano_Ronaldo). You may replace this text with any other English text that you want to process.

In [3]:
text = """
Cristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985) is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards,[note 3] a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player. He has won 33 trophies in his career, including seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, goals in the European Championship (14), international goals (128) and international appearances (205). He is one of the few players to have made over 1,200 professional career appearances, the most by an outfield player, and has scored over 850 official senior career goals for club and country, making him the top goalscorer of all time.
"""

## 5. Target labels

Since GLINER has no preset entity types, you need to specify which types of entities you want the system to identify. You can describe the types with arbitrary text, but make sure that they are types of names, for example: Person, Movie actor or Renaissance painter. If you use descriptions of other kinds of words, like Noun, Verb or Five letter word, these types will probably not be identified correctly.

In [4]:
labels = ["Person", "Award", "Date", "Competitions", "Teams", "Country"]

## 6. Running GLINER

Next, you can ask GLINER to identify the entities by calling the model's `predict_entities` method. The `threshold` parameter sets the minimum confidence score, between 0.0 and 1.0, that an entity needs in order to be accepted. This notebook uses the low value 0.1, which returns more entities, including less certain ones; GLINER's default is 0.5.

In [5]:
entities = model.predict_entities(text, labels, threshold=0.1)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
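Every entity returned by `predict_entities` is a dict that includes a confidence `score`, so the results of one run at a low threshold such as 0.1 can be filtered afterwards to preview what higher thresholds would keep, without calling the model again. The sketch below assumes the `start`, `end`, `text`, `label` and `score` keys that GLINER returns; the function name and the sample entities (including their scores) are made up for illustration.

```python
# Illustrative sketch: filter GLiNER-style entity dicts by confidence score.
# The sample entities are made up; real ones come from model.predict_entities.
def filter_by_threshold(entities, threshold):
    """Keep only the entities whose score is at least `threshold`."""
    return [entity for entity in entities if entity["score"] >= threshold]


sample_entities = [
    {"start": 0, "end": 7, "text": "Ronaldo", "label": "Person", "score": 0.93},
    {"start": 18, "end": 26, "text": "Al Nassr", "label": "Teams", "score": 0.61},
    {"start": 30, "end": 34, "text": "2023", "label": "Date", "score": 0.27},
]

# Show which entities survive at increasingly strict thresholds
for threshold in (0.1, 0.5, 0.9):
    kept = filter_by_threshold(sample_entities, threshold)
    print(threshold, [entity["text"] for entity in kept])
```

On the notebook's own results, `filter_by_threshold(entities, 0.5)` would keep only the entities scored at 0.5 or higher.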


## 7. Visualizing the entities

The entities can be visualized in the text with the `visualize_entities` function below:

In [6]:
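from spacy import displacy

def visualize_entities(text, entities):
    # A blank English pipeline is enough here: it only tokenizes the text
    nlp = spacy.blank("en")
    doc = nlp(text)
    # Convert GLINER's character offsets into spaCy spans; "contract" shrinks
    # a span to the tokens that lie completely inside the offsets
    spans = [doc.char_span(entity["start"],
                           entity["end"],
                           label=entity["label"],
                           alignment_mode="contract")
             for entity in entities]
    # char_span returns None if no token fits inside the offsets: drop those
    spans = [span for span in spans if span is not None]
    # Resolve overlapping spans (spaCy keeps the longest) and render them
    doc.ents = spacy.util.filter_spans(spans)
    displacy.render(doc, style="ent")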

In [7]:
visualize_entities(text, entities)
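Outside a notebook, for example in a terminal, displaCy's visual rendering is not shown, but the same character offsets can be used to mark up the text directly. The function below is a hypothetical, dependency-free alternative to `visualize_entities` that wraps each entity as `[text](label)`. It handles overlaps differently from `filter_spans`: the entity that starts first wins, and later overlapping ones are skipped. The sample text and entities are made up.

```python
def annotate_inline(text, entities):
    """Return `text` with each entity marked up as [entity text](label)."""
    pieces = []
    position = 0  # end offset of the last entity written so far
    for entity in sorted(entities, key=lambda e: e["start"]):
        if entity["start"] < position:
            continue  # overlaps an entity that was already placed: skip it
        pieces.append(text[position:entity["start"]])
        pieces.append(f"[{text[entity['start']:entity['end']]}]({entity['label']})")
        position = entity["end"]
    pieces.append(text[position:])  # remaining text after the last entity
    return "".join(pieces)


sample_text = "Ronaldo plays for Al Nassr in 2023"
sample_entities = [
    {"start": 18, "end": 26, "text": "Al Nassr", "label": "Teams", "score": 0.61},
    {"start": 0, "end": 7, "text": "Ronaldo", "label": "Person", "score": 0.93},
]
print(annotate_inline(sample_text, sample_entities))
```

Applied to the notebook's data, `print(annotate_inline(text, entities))` would print the example paragraph with its entities marked inline.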

## 8. Analyze your own texts

You can now experiment with GLINER to learn about its strengths and weaknesses:

1. Repeat steps 4-7 with an English text of your own choice and with your own entity labels
2. Experiment with different confidence threshold levels (step 6) to check what effects these have
3. Check the contents of the `entities` variable (a list of dicts) to see what is in there
4. You can also try changing the [model used by GLINER](https://huggingface.co/urchade/gliner_large-v2.1), defined in step 3. Apart from a [multilingual model](https://huggingface.co/urchade/gliner_multi-v2.1), there are also models for [Italian and Korean](https://huggingface.co/collections/urchade/language-specific-gliner)
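For task 3, it can help to summarize the `entities` list instead of reading the raw dicts. The sketch below groups the recognized entity texts by label. The function name is hypothetical and the sample data is made up, in the same dict shape that `predict_entities` returns.

```python
from collections import defaultdict


def group_by_label(entities):
    """Map each label to the list of entity texts recognized with that label."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity["label"]].append(entity["text"])
    return dict(groups)


sample_entities = [
    {"start": 0, "end": 7, "text": "Ronaldo", "label": "Person", "score": 0.93},
    {"start": 18, "end": 26, "text": "Al Nassr", "label": "Teams", "score": 0.61},
    {"start": 30, "end": 34, "text": "2023", "label": "Date", "score": 0.27},
]

for label, texts in group_by_label(sample_entities).items():
    print(f"{label}: {', '.join(texts)}")
```

Calling `group_by_label(entities)` on real results also shows at a glance which of the requested labels produced no entities at all, since those labels are simply absent from the result.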

Can you make GLINER identify entities of the types that you are interested in? How well does it do? Good luck!