# Named Entity Processing with Impresso Models

<a target="_blank" href="https://colab.research.google.com/github/impresso/impresso-datalab-notebooks/blob/main/annotate/NE-processing_ImpressoHF.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## What is this notebook about?

This notebook demonstrates how to use Impresso models for named entity recognition (NER) and entity linking (EL).

NER detects and classifies entities such as persons, locations, and organizations in text, while EL connects recognized entities to unique identifiers in a knowledge base, like Wikipedia or its data counterpart, Wikidata.

In this notebook, both NER and EL are performed using models trained by Impresso and hosted on [Hugging Face](https://huggingface.co/impresso-project/) (thus the 'HF' suffix in the notebook name):

- The **Impresso NER model** is a Transformer model trained on the Impresso HIPE-2020 portion of the [HIPE-2022 dataset](https://github.com/hipe-eval/HIPE-2022-data). It recognizes entity types such as person, location, and organization and supports the complete [HIPE typology](https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-hipe2020.md), including coarse and fine-grained entity types as well as components like names, titles, and roles. The model's backbone ([dbmdz/bert-medium-historic-multilingual-cased](https://huggingface.co/dbmdz/bert-medium-historic-multilingual-cased)) was trained on various European historical datasets, including data from the Europeana and British Library collections in German, French, English, Finnish, and Swedish. Thanks to this multilingual backbone, the NER model may also recognize entities in languages other than French and German.
  
- The **Impresso NEL model** links detected entities to unique identifiers in Wikipedia and Wikidata or assigns a 'NIL' label (indicating "not in list" in NLP) if no reference is found. The NEL model was trained on various historical datasets (AjMC, CLEF-HIPE-2020, LeTemps, Living with Machines, NewsEye, SoNAR) across multiple languages, including German, French, English, Finnish, and Swedish, to support comprehensive entity linking (EL) and named entity recognition (NER). Its backbone, [mGENRE](https://huggingface.co/facebook/mgenre-wiki), uses a multilingual text generation approach for Wikipedia entity prediction and was trained on 105 languages from Wikipedia.

Both models can also be tested interactively in Hugging Face spaces: the [NER space](https://huggingface.co/spaces/impresso-project/multilingual-named-entity-recognition) and the [EL space](https://huggingface.co/spaces/impresso-project/multilingual-entity-linking).

## What will you learn in this notebook?

By the end of this notebook, you will know how to:
- Install the necessary libraries to run the models
- Load the models and modules from Hugging Face
- Perform NER on a text input
- Perform EL on the NER output

**Warning**:
No Hugging Face token is needed to run this notebook: if you are prompted for one, simply select Cancel. If you prefer to authenticate anyway, set the `HF_TOKEN` environment variable in the `.env` file (refer to `.env.example`). You can obtain a token by signing up on the [Hugging Face website](https://huggingface.co/join) and find additional information in the [official documentation](https://huggingface.co/docs/huggingface_hub/v0.20.2/en/quick-start#environment-variable).
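A quick way to see whether a token is visible to the notebook is to read the environment directly. The sketch below uses only the standard library. The helper name `get_hf_token` is ours for illustration, and it assumes that your setup has already loaded the `.env` file into the process environment.

```python
import os

def get_hf_token(env=None):
    """Return the HF_TOKEN value if it is set and non-empty, else None."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None

if get_hf_token() is None:
    print("HF_TOKEN is not set; continuing without authentication.")
else:
    print("HF_TOKEN found in the environment.")
```

Passing a plain dictionary as `env` makes the helper easy to try out without touching the real environment.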

## Prerequisites
First, we install the necessary libraries:

- **torch**: PyTorch is a popular open-source library for deep learning that provides tools for tensor computation, GPU acceleration, and building neural networks.
- **protobuf**: Protobuf, short for 'Protocol Buffers', is a library developed by Google for serializing structured data in a fast and efficient way, ideal for communication between services. 
- **sentencepiece**: SentencePiece is a text processing library used primarily for tokenization, especially for languages with complex scripts. It supports subword tokenization, which is key for training language models that need flexible token units (e.g., parts of words). Many multilingual Transformer models rely on SentencePiece (BERT itself uses the related WordPiece scheme).
- **transformers**: Developed by Hugging Face, this library offers many functionalities to support the development of NLP deep learning models. It provides pre-trained models for various tasks, supports architectures like BERT, GPT, T5, and others for model development and manipulation, offers useful pipelines, and integrates easily with PyTorch. The Impresso NER model is BERT-based, while the NEL model builds on mGENRE.
- **nltk**: The Natural Language Toolkit is a library for NLP in Python that offers tools for text processing tasks like tokenization, stemming, lemmatization, and parsing, as well as datasets for linguistic research and training.

Libraries can be installed from the notebook, or within your environment:

```bash
pip install torch protobuf sentencepiece transformers nltk
```

In [None]:
!pip install torch
!pip install protobuf
!pip install sentencepiece
!pip install transformers
!pip install nltk
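After installation, it can be useful to confirm that the libraries are visible to the current kernel. The following is a minimal sketch using the standard library's `importlib`; the helper name `missing_modules` is ours, and note that the `protobuf` package is imported under the name `google.protobuf`.

```python
import importlib.util

# Import names for the packages listed above
# (the protobuf package is imported as google.protobuf)
REQUIRED_MODULES = ["torch", "google.protobuf", "sentencepiece", "transformers", "nltk"]

def missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when a parent package such as `google` is absent
            found = False
        if not found:
            missing.append(name)
    return missing

print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty list means every library was found; otherwise, re-run the installation cell for the names that are reported.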

## Entity Recognition

In [None]:
# Import necessary Python modules from the Transformers library
from transformers import AutoModelForTokenClassification, AutoTokenizer
from transformers import pipeline

For NER, we use the Impresso NER model named 'ner-stacked-bert-multilingual' and published on Hugging Face: https://huggingface.co/impresso-project/ner-stacked-bert-multilingual.

In [None]:
# Set MODEL_NAME to the chosen model so that we can load it for token classification and NER
MODEL_NAME = "impresso-project/ner-stacked-bert-multilingual"

# Load the tokenizer corresponding to the model
ner_tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

Next, we create a pipeline for our task (`generic-ner`) from the loaded model and tokenizer. This custom NER pipeline combines tokenization, language-specific rules, and post-processing in a single workflow: it identifies, aligns, and cleans entities while handling multilingual punctuation rules, the attachment of complementary information (e.g., titles), and the removal of redundant tokens. Running everything through one pipeline keeps manual data processing to a minimum.


In [None]:
ner_pipeline = pipeline("generic-ner", model=MODEL_NAME, 
                        tokenizer=ner_tokenizer, 
                        trust_remote_code=True,
                        device='cpu')

In [5]:
# We define some test input
sentence = """In the year 1789, King Louis XVI, ruler of France, convened the Estates-General at the Palace of Versailles, 
                where Marie Antoinette, the Queen of France, alongside Maximilien Robespierre, a leading member of the National Assembly, 
                debated with Jean-Jacques Rousseau, the famous philosopher, and Charles de Talleyrand, the Bishop of Autun, 
                regarding the future of the French monarchy. At the same time, across the Atlantic in Philadelphia, 
                George Washington, the first President of the United States, and Thomas Jefferson, the nation's Secretary of State, 
                were drafting policies for the newly established American government following the signing of the Constitution."""

print(sentence)

In the year 1789, King Louis XVI, ruler of France, convened the Estates-General at the Palace of Versailles, 
                where Marie Antoinette, the Queen of France, alongside Maximilien Robespierre, a leading member of the National Assembly, 
                debated with Jean-Jacques Rousseau, the famous philosopher, and Charles de Talleyrand, the Bishop of Autun, 
                regarding the future of the French monarchy. At the same time, across the Atlantic in Philadelphia, 
                George Washington, the first President of the United States, and Thomas Jefferson, the nation's Secretary of State, 
                were drafting policies for the newly established American government following the signing of the Constitution.


In [6]:
# A function that formats and displays the model output in a readable structure
def print_nicely(data):
    for entry in data:
        for key, value in entry.items():
            print(f"  {key.capitalize()}: {value}")
        print()  # Blank line between entries

We apply the pipeline to the input and print the output in a readable form:

In [7]:
# Recognize stacked entities in the input text
entities = ner_pipeline(sentence)

# Print the recognized entities and their attributes
print_nicely(entities)

  Type: time
  Confidence_ner: 79.68
  Index: (0, 4)
  Surface: year 1789
  Loffset: 7
  Roffset: 16

  Type: pers
  Confidence_ner: 95.57
  Index: (5, 12)
  Surface: King Louis XVI, ruler of France
  Loffset: 18
  Roffset: 49
  Title: King
  Name: King Louis XVI
  Function: ruler of France

  Type: loc
  Confidence_ner: 68.91
  Index: (20, 23)
  Surface: Palace of Versailles
  Loffset: 87
  Roffset: 107

  Type: pers
  Confidence_ner: 77.25
  Index: (25, 32)
  Surface: Marie Antoinette, the Queen of France
  Loffset: 132
  Roffset: 169
  Name: Marie Antoinette
  Function: Queen of France

  Type: pers
  Confidence_ner: 90.78
  Index: (34, 44)
  Surface: Maximilien Robespierre, a leading member of the National Assembly
  Loffset: 181
  Roffset: 246
  Name: Maximilien Robespierre
  Function: leading member of the National Assembly

  Type: pers
  Confidence_ner: 93.25
  Index: (47, 55)
  Surface: Jean-Jacques Rousseau, the famous philosopher
  Loffset: 278
  Roffset: 323
  Name: Jean-Ja
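Since the pipeline returns a list of dictionaries, the results can be filtered with plain Python. The sketch below selects entities by type and minimum confidence. The sample records are copied from the output above, and the key names (`type`, `confidence_ner`, `surface`) are assumed to be the lower-case forms of the printed labels; the helper `filter_entities` is ours, not part of the pipeline.

```python
# Sample records copied from the output above; key names are assumed to be
# the lower-case forms of the printed labels
sample_entities = [
    {"type": "time", "confidence_ner": 79.68, "surface": "year 1789"},
    {"type": "pers", "confidence_ner": 95.57, "surface": "King Louis XVI, ruler of France"},
    {"type": "loc", "confidence_ner": 68.91, "surface": "Palace of Versailles"},
    {"type": "pers", "confidence_ner": 77.25, "surface": "Marie Antoinette, the Queen of France"},
]

def filter_entities(entities, entity_type=None, min_confidence=0.0):
    """Keep entities of the given type whose confidence reaches the threshold."""
    return [
        entity
        for entity in entities
        if (entity_type is None or entity.get("type") == entity_type)
        and entity.get("confidence_ner", 0.0) >= min_confidence
    ]

# Persons recognized with at least 90% confidence
confident_persons = filter_entities(sample_entities, entity_type="pers", min_confidence=90.0)
print([entity["surface"] for entity in confident_persons])
```

With real pipeline output, `sample_entities` would simply be replaced by the `entities` list.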

### Example of Entity Recognition with OCR Errors

Below, we introduce simulated OCR errors, such as character misrecognition, missing spaces, and incorrect capitalization.

In [8]:
sentence_with_ocr_errors = """In the year 1789, K1ng L0uis XVl, ruler of France, convened the Estatzs-General at the Palaceof Versailles,
                where Marie Antoinette, the Qveen of France, alongside Max1milien Robespierre, a leading member of the National Assembly,
                debated with JeanJacques Rousseau, the fam0us philos0pher, and Charles de Talleyrand, the B1shop of Autun,
                regarding the futureoftheFrench monarchy. At the same time, across the Atlant1c in Philadelp1ia,
                GeorgeWashington, the first President of the United States, and Thomas Jeffers0n, the nation’s SecretaryofState,
                were drafting policies for the newly establ1shed American govemment foll0wing the sign1ng of the Const1tution."""


Now, let’s run the OCR-affected text through the NER pipeline to observe how well the algorithm performs under OCR-induced distortions.


In [9]:
entities_with_errors = ner_pipeline(sentence_with_ocr_errors)

print_nicely(entities_with_errors)

  Type: time
  Confidence_ner: 76.76
  Index: (0, 4)
  Surface: year 1789
  Loffset: 7
  Roffset: 16

  Type: pers
  Confidence_ner: 90.26
  Index: (5, 12)
  Surface: K1ng L0uis XVl, ruler of France
  Loffset: 18
  Roffset: 49
  Name: K1ng L0uis XVl
  Function: ruler of France

  Type: loc
  Confidence_ner: 77.74
  Index: (20, 22)
  Surface: Palaceof Versailles
  Loffset: 87
  Roffset: 106

  Type: pers
  Confidence_ner: 80.7
  Index: (24, 31)
  Surface: Marie Antoinette, the Qveen of France
  Loffset: 130
  Roffset: 167
  Name: Marie Antoinette
  Function: Qveen of France

  Type: pers
  Confidence_ner: 86.82
  Index: (33, 43)
  Surface: Max1milien Robespierre, a leading member of the National Assembly
  Loffset: 179
  Roffset: 244
  Name: Max1milien Robespierre
  Function: member of the National Assembly

  Type: pers
  Confidence_ner: 84.25
  Index: (46, 52)
  Surface: JeanJacques Rousseau, the fam0us philos0pher
  Loffset: 275
  Roffset: 319
  Name: JeanJacques Rousseau
  Function:

In [10]:
# Verify that the entity counts match for the original and OCR-affected sentences
original_entities = ner_pipeline(sentence)
entities_with_errors = ner_pipeline(sentence_with_ocr_errors)

print("Number of entities in the original text:", len(original_entities))
print("Number of entities in the OCR-affected text:", len(entities_with_errors))
print("Are entity counts equal?", len(original_entities) == len(entities_with_errors))


Number of entities in the original text: 11
Number of entities in the OCR-affected text: 11
Are entity counts equal? True
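Equal counts do not guarantee that the same mentions were found, so a closer look at the surface forms can help. The sketch below pairs mentions by position and scores their similarity with the standard library's `difflib`. The surfaces are copied from the two outputs above; pairing by position is an assumption that only holds when both runs return entities in the same order.

```python
from difflib import SequenceMatcher

# Surface forms copied from the outputs of the clean and OCR-affected runs
original_surfaces = ["year 1789", "King Louis XVI, ruler of France", "Palace of Versailles"]
ocr_surfaces = ["year 1789", "K1ng L0uis XVl, ruler of France", "Palaceof Versailles"]

def surface_similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a, b).ratio()

# Pair mentions by position (assumes both runs list entities in the same order)
for clean, noisy in zip(original_surfaces, ocr_surfaces):
    print(f"{surface_similarity(clean, noisy):.2f}  {clean!r} vs {noisy!r}")
```

A ratio close to 1.0 suggests that the OCR-affected mention covers the same span as the clean one.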


## Entity Linking

With the EL model, we can link the previously recognised entity mentions to unique referents in Wikipedia and Wikidata.

We use the Impresso model named 'nel-mgenre-multilingual' and published on Hugging Face: https://huggingface.co/impresso-project/nel-mgenre-multilingual.

In [None]:
# Import the necessary modules from the transformers library
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import pipeline

NEL_MODEL_NAME = "impresso-project/nel-mgenre-multilingual"

# Load the tokenizer for the pre-trained model
# (model page: https://huggingface.co/impresso-project/nel-mgenre-multilingual)
nel_tokenizer = AutoTokenizer.from_pretrained(NEL_MODEL_NAME)

In [None]:
nel_pipeline = pipeline("generic-nel", model=NEL_MODEL_NAME, 
                        tokenizer=nel_tokenizer, 
                        trust_remote_code=True,
                        device='cpu')

Our entity linker requires a specific format to correctly identify the entity that needs to be linked, as follows:

```
The event was held at the [START] Palace of Versailles [END], a symbol of French monarchy.
```

Assuming that the string "Palace of Versailles" was previously detected by an NER tool, we need to surround it with the `[START]` and `[END]` markers. 

The EL pipeline processes only one entity per input text. Therefore, for multiple entities within the same input, we must create separate inputs for each entity. For instance:

```
The event was held at the [START] Palace of Versailles [END], a symbol of French monarchy.
The event was held at the Palace of Versailles, a symbol of [START] French monarchy [END].
```
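The two inputs above can be generated programmatically. The following is a minimal sketch, assuming the character span of each mention is known; the helper `mark_entity` is ours for illustration, and the spans are looked up with `str.find` here, whereas in practice they would come from the NER step.

```python
def mark_entity(text, left, right):
    """Wrap text[left:right] in [START] / [END] markers."""
    return f"{text[:left]}[START] {text[left:right]} [END]{text[right:]}"

text = "The event was held at the Palace of Versailles, a symbol of French monarchy."
mentions = ["Palace of Versailles", "French monarchy"]

# Build one input per mention, since the linker handles one entity at a time
nel_inputs = []
for mention in mentions:
    left = text.find(mention)
    nel_inputs.append(mark_entity(text, left, left + len(mention)))

for nel_input in nel_inputs:
    print(nel_input)
```

Each resulting string can then be passed to the EL pipeline separately.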

Let's take this example:

In [13]:
simple_sentence = "The event was held at the [START] Palace of Versailles [END], a symbol of French monarchy."

linked_entity = nel_pipeline(simple_sentence)

print_nicely(linked_entity)

  Surface: Palace of Versailles
  Wkd_id: Q2946
  Wkpedia_pagename: Palace of Versailles
  Wkpedia_url: https://en.wikipedia.org/wiki/Palace_of_Versailles
  Type: UNK
  Confidence_nel: 99.98999786376953
  Loffset: 33
  Roffset: 55



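The pipeline _could_ also be run without the special markers on texts that mention only one entity, but we do not recommend it: in the example below, it links the sentence to France rather than to the Palace of Versailles.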

In [14]:
simple_sentence = "The event was held at the Palace of Versailles, a symbol of French monarchy."

linked_entity = nel_pipeline(simple_sentence)

print_nicely(linked_entity)

  Surface: None
  Wkd_id: Q142
  Wkpedia_pagename: France
  Wkpedia_url: https://en.wikipedia.org/wiki/France
  Type: UNK
  Confidence_nel: 63.689998626708984
  Loffset: None
  Roffset: None
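To avoid sending unmarked text to the linker by accident, the input can be checked first. This is a small sketch with a hypothetical helper, `has_single_marked_entity`, which only inspects the string and is not part of the Impresso pipeline.

```python
def has_single_marked_entity(text):
    """Return True if text contains exactly one [START] ... [END] pair, in order."""
    starts = text.count("[START]")
    ends = text.count("[END]")
    return starts == 1 and ends == 1 and text.index("[START]") < text.index("[END]")

marked = "The event was held at the [START] Palace of Versailles [END], a symbol of French monarchy."
unmarked = "The event was held at the Palace of Versailles, a symbol of French monarchy."

print(has_single_marked_entity(marked))
print(has_single_marked_entity(unmarked))
```

A caller could skip or log any input for which the check returns `False`.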



By using our NER tool, we can automatically generate sentences with entity markers and subsequently link each entity:

In [15]:
# Run the NER pipeline on the input sentence and store the results
entities = ner_pipeline(sentence)

print(f'{len(entities)} entities were detected.')

# List to keep track of already processed words to avoid duplicate tagging
already_done = []

# Process each entity for linking
for entity in entities:
    if entity['surface'] not in already_done:
        # Tag the entity in the text

        language = 'en'
        tokens = sentence.split(' ')
        start, end = (
            entity["index"][0],
            entity["index"][1],
        )

        context_start = max(0, start - 10)
        context_end = min(len(tokens), end + 11)

        nel_sentence = (
            " ".join(tokens[context_start:start])
            + " [START] "
            + entity['surface']
            + " [END] "
            + " ".join(tokens[end + 1 : context_end])
        )

        linked_entities = nel_pipeline(nel_sentence)
        print(nel_sentence)
        print_nicely(linked_entities)
        # Remember the mention so that it is not linked twice
        already_done.append(entity['surface'])

11 entities were detected.
 [START] year 1789 [END] Louis XVI, ruler of France, convened the Estates-General at the
  Surface: year 1789
  Wkd_id: Q142
  Wkpedia_pagename: France
  Wkpedia_url: https://en.wikipedia.org/wiki/France
  Type: UNK
  Confidence_nel: 38.290000915527344
  Loffset: 8
  Roffset: 19

In the year 1789, King [START] King Louis XVI, ruler of France [END] at the Palace of Versailles, 
    
  Surface: King Louis XVI, ruler of France
  Wkd_id: NIL
  Wkpedia_pagename: NIL
  Wkpedia_url: None
  Type: UNK
  Confidence_nel: 100.0
  Loffset: 30
  Roffset: 63

convened the Estates-General at the Palace of Versailles, 
  [START] Palace of Versailles [END]          
  Surface: Palace of Versailles
  Wkd_id: Q2946
  Wkpedia_pagename: Palace of Versailles
  Wkpedia_url: https://en.wikipedia.org/wiki/Palace_of_Versailles
  Type: UNK
  Confidence_nel: 100.0
  Loffset: 68
  Roffset: 90

Palace of Versailles, 
       [START] Marie Antoinette, the Queen of France [END]  where Marie A
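In the printed inputs above, some context words are repeated or misplaced (for example "King [START] King Louis XVI"), which suggests that the token indices returned by the NER step do not line up with a plain `split(' ')`. An alternative is to build the context window from character offsets, which the NER output reports as `Loffset` and `Roffset`. The sketch below assumes these are character positions into the input text (this matches "year 1789" at 7 to 16); the helper `mark_with_context` and the window size are ours.

```python
def mark_with_context(text, left, right, window=60):
    """Mark text[left:right] and keep up to `window` characters on each side."""
    start = max(0, left - window)
    end = min(len(text), right + window)
    # Snap the window to whitespace so that no word is cut in half
    if start > 0:
        space = text.find(" ", start, left)
        start = space + 1 if space != -1 else left
    if end < len(text):
        space = text.rfind(" ", right, end)
        end = space if space != -1 else right
    return (
        text[start:left] + "[START] " + text[left:right] + " [END]" + text[right:end]
    ).strip()

text = ("In the year 1789, King Louis XVI, ruler of France, convened the "
        "Estates-General at the Palace of Versailles.")
surface = "King Louis XVI, ruler of France"
left = text.find(surface)      # in practice, taken from the NER output
right = left + len(surface)

print(mark_with_context(text, left, right, window=20))
```

Because the slice boundaries come from the same offsets as the mention itself, the surrounding context can no longer overlap with the marked span.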

### Example of Entity Linking with OCR Errors

To evaluate the robustness of entity linking to OCR errors, we compare the results above with those obtained on the OCR-affected sentence. Below, each entity identified by NER in the noisy text is linked individually to a Wikipedia/Wikidata entry.



In [16]:
print(f'{len(entities_with_errors)} entities were previously detected in OCR-affected text.')

# List to avoid reprocessing the same entities
already_done_ocr = []

# Process each detected entity in OCR-affected text for linking
for entity in entities_with_errors:
    if entity['surface'] not in already_done_ocr:
        # Format sentence with entity markers for EL
        language = 'en'
        tokens = sentence_with_ocr_errors.split(' ')
        start, end = entity["index"][0], entity["index"][1]

        context_start = max(0, start - 10)
        context_end = min(len(tokens), end + 11)

        # Surround entity with [START] and [END] tags
        nel_sentence = (
            " ".join(tokens[context_start:start])
            + " [START] "
            + entity['surface']
            + " [END] "
            + " ".join(tokens[end + 1:context_end])
        )

        # Perform entity linking on OCR-affected sentence
        linked_entity_ocr = nel_pipeline(nel_sentence)
        print("Sentence with OCR Error:")
        print(nel_sentence)
        print("Linked Entity:")
        print_nicely(linked_entity_ocr)
        already_done_ocr.append(entity['surface'])


11 entities were previously detected in OCR-affected text.
Sentence with OCR Error:
 [START] year 1789 [END] L0uis XVl, ruler of France, convened the Estatzs-General at the
Linked Entity:
  Surface: year 1789
  Wkd_id: Q7772
  Wkpedia_pagename: 1789
  Wkpedia_url: https://en.wikipedia.org/wiki/1789
  Type: UNK
  Confidence_nel: 73.66000366210938
  Loffset: 8
  Roffset: 19

Sentence with OCR Error:
In the year 1789, K1ng [START] K1ng L0uis XVl, ruler of France [END] at the Palaceof Versailles,
      
Linked Entity:
  Surface: K1ng L0uis XVl, ruler of France
  Wkd_id: NIL
  Wkpedia_pagename: NIL
  Wkpedia_url: None
  Type: UNK
  Confidence_nel: 79.62000274658203
  Loffset: 30
  Roffset: 63

Sentence with OCR Error:
convened the Estatzs-General at the Palaceof Versailles,
    [START] Palaceof Versailles [END]          where
Linked Entity:
  Surface: Palaceof Versailles
  Wkd_id: Q2946
  Wkpedia_pagename: Palace of Versailles
  Wkpedia_url: https://en.wikipedia.org/wiki/Palace_of_Versaille
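One simple way to summarize the comparison is the share of mentions that receive the same Wikidata ID in both runs. The sketch below uses the IDs of the first three mentions printed in the two outputs above; the helper `link_agreement` is ours, and pairing by position assumes both runs list the mentions in the same order.

```python
# Wikidata IDs of the first three mentions, copied from the two outputs above
original_ids = ["Q142", "NIL", "Q2946"]
ocr_ids = ["Q7772", "NIL", "Q2946"]

def link_agreement(ids_a, ids_b):
    """Return the fraction of positions at which both runs give the same ID."""
    if len(ids_a) != len(ids_b):
        raise ValueError("Both runs must cover the same number of mentions")
    if not ids_a:
        return 0.0
    matches = sum(a == b for a, b in zip(ids_a, ids_b))
    return matches / len(ids_a)

print(f"Agreement: {link_agreement(original_ids, ocr_ids):.2f}")
```

Agreement only says that the two runs coincide, not that the shared link is correct, so it complements rather than replaces a manual check.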

## Looking up entities in the Impresso Corpus

Are the previously recognised entities present in the Impresso Corpus? For each entity, we use `impresso_session.entities.find()` to look it up by name. The search attempts to match the exact name provided, so querying an OCR-distorted name (e.g., "Max1milien Robespierre" instead of "Maximilien Robespierre") shows how resilient it is to spelling variations. Let's explore using the Impresso API and Python library.

In [None]:
!pip install --upgrade --force-reinstall impresso
from impresso import version
print(version)

In [18]:
from impresso import connect

impresso_session = connect()


Click on the following link to access the login page: https://impresso-project.ch/datalab/token
 - 🔤 Enter your email/password on this page.
 - 🔑 Once logged in, a secret token will be generated for you.
 - 📋 Copy this token and paste it into the input field below. Then press "Enter". 👇🏼.



🔑 Enter your token:  ········


🎉 You are now connected to the Impresso API!  🎉


In [19]:
entity = impresso_session.entities.find("Maximilien Robespierre")

entity

uid,name,type,countItems,countMentions,wikidataId,wikidata.birthDate,wikidata.deathDate,wikidata.birthPlace.id,wikidata.birthPlace.type,wikidata.birthPlace.labels.fr,wikidata.birthPlace.labels.it,wikidata.birthPlace.labels.de,wikidata.birthPlace.labels.en,wikidata.birthPlace.descriptions.en,wikidata.birthPlace.descriptions.fr,wikidata.birthPlace.descriptions.de,wikidata.birthPlace.descriptions.it,wikidata.birthPlace.images,wikidata.birthPlace.coordinates.latitude,wikidata.birthPlace.coordinates.longitude,wikidata.birthPlace.coordinates.altitude,wikidata.birthPlace.coordinates.precision,wikidata.birthPlace.coordinates.globe,wikidata.birthPlace.country.entity-type,wikidata.birthPlace.country.numeric-id,wikidata.birthPlace.country.id,wikidata.deathPlace.id,wikidata.deathPlace.type,wikidata.deathPlace.labels.de,wikidata.deathPlace.labels.en,wikidata.deathPlace.labels.fr,wikidata.deathPlace.labels.it,wikidata.deathPlace.descriptions.fr,wikidata.deathPlace.descriptions.it,wikidata.deathPlace.descriptions.de,wikidata.deathPlace.descriptions.en,wikidata.deathPlace.images,wikidata.deathPlace.coordinates.latitude,wikidata.deathPlace.coordinates.longitude,wikidata.deathPlace.coordinates.altitude,wikidata.deathPlace.coordinates.precision,wikidata.deathPlace.coordinates.globe,wikidata.deathPlace.country.entity-type,wikidata.deathPlace.country.numeric-id,wikidata.deathPlace.country.id,wikidata.id,wikidata.type,wikidata.labels.fr,wikidata.labels.it,wikidata.labels.de,wikidata.labels.en,wikidata.descriptions.it,wikidata.descriptions.de,wikidata.descriptions.fr,wikidata.descriptions.en,wikidata.images
aida-0001-50-Maximilien_Robespierre,Maximilien Robespierre,person,275,297,Q44197,+1758-05-06T00:00:00Z,+1794-07-28T00:00:00Z,Q131329,location,Arras,Arras,Arras,Arras,"commune in Pas-de-Calais, France",commune française du département du Pas-de-Cal...,Gemeinde in Frankreich,comune francese,"[{'value': 'Arras Rathaus.JPG', 'rank': 'norma...",50.289167,2.78,,0.000278,http://www.wikidata.org/entity/Q2,item,142,Q142,Q90,location,Paris,Paris,Paris,Parigi,capitale de la France,capitale della Francia,Hauptstadt und bevölkerungsreichste Stadt Fran...,capital city of France,[{'value': 'Paris - Eiffelturm und Marsfeld2.j...,48.856667,2.352222,,0.000278,http://www.wikidata.org/entity/Q2,item,142,Q142,Q44197,human,Maximilien de Robespierre,Maximilien de Robespierre,Maximilien de Robespierre,Maximilien Robespierre,Avvocato e politico rivoluzionario francese (1...,"französischer Politiker, Führer der Jakobiner ...","avocat, homme d'État et révolutionnaire français",French revolutionary lawyer and politician (17...,"[{'value': 'Robespierre.jpg', 'rank': 'normal'..."


This command checks if "Maximilien Robespierre" exists in the Impresso database. Similarly, we test the resilience of the search function by querying slightly altered names (e.g., "Max1milien Robespierre").



In [22]:
entity = impresso_session.entities.find("Max1milien Robespierre")

entity
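If an exact-name lookup comes back empty for a distorted name, one option is to normalize the name against a list of clean candidates before querying again. The sketch below uses `difflib.get_close_matches` from the standard library. It is not a feature of the Impresso library: the candidate list `known_names` and the helper `suggest_name` are hypothetical, and in practice the candidates might come from a clean NER run or a curated list.

```python
from difflib import get_close_matches

# Hypothetical list of clean candidate names
known_names = ["Maximilien Robespierre", "Marie Antoinette", "Jean-Jacques Rousseau"]

def suggest_name(noisy_name, candidates, cutoff=0.8):
    """Return the closest candidate above the similarity cutoff, or None."""
    matches = get_close_matches(noisy_name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(suggest_name("Max1milien Robespierre", known_names))
```

The suggested name could then be passed to `impresso_session.entities.find()` in place of the distorted one.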

In [23]:
entity = impresso_session.entities.find("Marie Antoinette")

entity

uid,name,type,countItems,countMentions,wikidataId,wikidata.birthDate,wikidata.deathDate,wikidata.birthPlace.id,wikidata.birthPlace.type,wikidata.birthPlace.labels.fr,wikidata.birthPlace.labels.it,wikidata.birthPlace.labels.de,wikidata.birthPlace.labels.en,wikidata.birthPlace.descriptions.de,wikidata.birthPlace.descriptions.en,wikidata.birthPlace.descriptions.it,wikidata.birthPlace.descriptions.fr,wikidata.birthPlace.images,wikidata.birthPlace.coordinates.latitude,wikidata.birthPlace.coordinates.longitude,wikidata.birthPlace.coordinates.altitude,wikidata.birthPlace.coordinates.precision,wikidata.birthPlace.coordinates.globe,wikidata.birthPlace.country.entity-type,wikidata.birthPlace.country.numeric-id,wikidata.birthPlace.country.id,wikidata.deathPlace.id,wikidata.deathPlace.type,wikidata.deathPlace.labels.fr,wikidata.deathPlace.labels.it,wikidata.deathPlace.labels.de,wikidata.deathPlace.labels.en,wikidata.deathPlace.descriptions.fr,wikidata.deathPlace.descriptions.en,wikidata.deathPlace.descriptions.de,wikidata.deathPlace.descriptions.it,wikidata.deathPlace.images,wikidata.deathPlace.coordinates.latitude,wikidata.deathPlace.coordinates.longitude,wikidata.deathPlace.coordinates.altitude,wikidata.deathPlace.coordinates.precision,wikidata.deathPlace.coordinates.globe,wikidata.deathPlace.country.entity-type,wikidata.deathPlace.country.numeric-id,wikidata.deathPlace.country.id,wikidata.id,wikidata.type,wikidata.labels.fr,wikidata.labels.it,wikidata.labels.de,wikidata.labels.en,wikidata.descriptions.de,wikidata.descriptions.fr,wikidata.descriptions.en,wikidata.descriptions.it,wikidata.images
aida-0001-50-Marie_Antoinette,Marie Antoinette,person,1621,1806,Q47365,+1755-11-02T00:00:00Z,+1793-10-16T00:00:00Z,Q46242,location,Hofburg,Hofburg,Hofburg,Hofburg Palace,frühere Residenz der Habsburger und Amtssitz d...,"palace located in Vienna, Austria",residenza ufficiale dei presidenti federali d'...,"palais, musée et résidence officielle du prési...","[{'value': 'Wien - Neue Hofburg.JPG', 'rank': ...",48.206389,16.365278,,0.000278,http://www.wikidata.org/entity/Q2,item,40.0,Q40,Q189503,location,place de la Concorde,place de la Concorde,Place de la Concorde,place de la Concorde,"place de Paris, en France","square in the 8th arrondissement of Paris, France",Platz in Paris‎,"Piazza a Parigi, Francia",[{'value': 'Place de la Concorde from the Eiff...,48.865633,2.321236,,3e-06,http://www.wikidata.org/entity/Q2,item,142.0,Q142,Q47365,human,Marie-Antoinette,Maria Antonietta d'Asburgo-Lorena,Marie Antoinette,Marie Antoinette,"Erzherzogin von Österreich, Königin von Frankr...","reine de France originaire d'Autriche, épouse ...",last Queen of France prior to the French Revol...,regina consorte di Francia e di Navarra e mogl...,[{'value': 'Louise Elisabeth Vigée-Lebrun - Ma...
aida-0001-50-Marie_Antoinette_$28$2006_film$29$,Marie Antoinette (2006 film),person,268,284,Q829695,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Q829695,item,Marie-Antoinette,Marie Antoinette,Marie Antoinette,Marie Antoinette,Film von Sofia Coppola (2006),film sorti en 2006,2006 film directed by Sofia Coppola,film del 2006 scritto e diretto da Sofia Coppola,[]
aida-0001-50-Marie_Antoinette_$28$1938_film$29$,Marie Antoinette (1938 film),person,175,188,Q1897128,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Q1897128,item,Marie-Antoinette,Maria Antonietta,Marie-Antoinette,Marie Antoinette,Film 1938,film sorti en 1938,1938 American film,"film del 1938 diretto da W. S. Van Dyke II e, ...",[{'value': 'Shearer Marie Antoinette 1938.jpg'...
