<a href="https://colab.research.google.com/github/simon-clematide/colab-notebooks-for-teaching/blob/main/impresso_ner_displacy_visualization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Visualize coarse- and fine-grained Named Entity information
This notebook gives an overview of multilingual fine-grained and nested Named Entity Recognition (NER).

*Thanks to Emma Bourous for training the models and programming the API, and to the HIPE team for providing the training material.*

## Install spaCy

In [None]:
! pip install spacy



## Call the impresso NER and NEL API
We only use the NER part in the visualization.

In [None]:
import requests


def get_linked_entities(text, coarse_only=False):
    """
    Calls the external API to get named entity recognition (NER) results.

    Returns the parsed JSON response with the input text added under "text",
    or None if the request fails.
    """
    url = "https://impresso-annotation.epfl.ch/api/ner/"
    payload = {"data": text}
    try:
        # The timeout keeps the notebook from hanging if the service is unreachable
        response = requests.post(url, json=payload, timeout=60)
        if response.status_code == 200:
            data = response.json()
            data["text"] = text
            # Remove fine-grained types and components (their type contains a ".")
            if coarse_only:
                data["nes"] = [ne for ne in data["nes"] if "." not in ne["type"]]
            return data
        else:
            print(f"Request failed with status code {response.status_code}")
            return None
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
        return None
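
Judging from the filter in `get_linked_entities`, coarse types such as `loc` contain no dot, while fine-grained types and components such as `loc.adm.town` do. The following is a minimal sketch of that filter in isolation. The helper name `keep_coarse` is hypothetical, and the sample entities are hand-written for illustration, not real API output.

```python
def keep_coarse(entities):
    """Keep only entities whose type has no dot, i.e. coarse-grained types."""
    return [ne for ne in entities if "." not in ne["type"]]


# Hand-written sample for illustration only (not real API output)
entities = [
    {"type": "loc", "surface": "Zürich"},
    {"type": "loc.adm.town", "surface": "Zürich"},
    {"type": "pers", "surface": "Hermann Fietz"},
    {"type": "pers.ind", "surface": "Hermann Fietz"},
]

print([ne["type"] for ne in keep_coarse(entities)])  # ['loc', 'pers']
```

Keeping the filter separate from the network call makes it easy to check without contacting the API.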


In [None]:
# Let's test this function
import json

In [None]:
text = "Die Zentrallbibliothek in Zürich wurde am 1. Oktober 1917 nourgeschrieben."
api_result = get_linked_entities(text, coarse_only=True)
print(json.dumps(api_result, ensure_ascii=False, indent=2))

{
  "ts": "2024-09-11T00:23:57Z",
  "sys_id": "stacked-2-bert-medium-historic-multilingual-v3-base|mgenre",
  "nes": [
    {
      "type": "org",
      "surface": "Zentrallbibliothek",
      "lOffset": 4,
      "rOffset": 22,
      "id": "4:22:org:stacked-2-bert-medium-historic-multilingual-v3-base|mgenre",
      "nested": false,
      "wkd_id": "NIL",
      "confidence_ner": 77.14,
      "confidence_nel": 59.1
    },
    {
      "type": "loc",
      "surface": "Zürich",
      "lOffset": 26,
      "rOffset": 32,
      "id": "26:32:loc:stacked-2-bert-medium-historic-multilingual-v3-base|mgenre",
      "nested": false,
      "wkd_id": "Q72",
      "wkpedia_pagename": "Zürich >> de ",
      "confidence_ner": 90.33,
      "confidence_nel": 78.77
    }
  ],
  "text": "Die Zentrallbibliothek in Zürich wurde am 1. Oktober 1917 nourgeschrieben."
}
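
Each entry in `nes` carries character offsets into the input text, so `text[lOffset:rOffset]` should reproduce `surface`. Below is a minimal sketch of reading the response, assuming the field names stay as in the output above. `summarize_entities` is a hypothetical helper, and `sample` is a trimmed copy of that output rather than a live API call.

```python
def summarize_entities(api_result):
    """Return (surface, type, Wikidata ID) tuples, checking offsets against the text."""
    text = api_result["text"]
    rows = []
    for ne in api_result["nes"]:
        span_text = text[ne["lOffset"]:ne["rOffset"]]
        if span_text != ne["surface"]:
            print(f"Offset mismatch: {span_text!r} vs {ne['surface']!r}")
        # "NIL" marks an entity that was not linked to Wikidata
        rows.append((ne["surface"], ne["type"], ne.get("wkd_id", "NIL")))
    return rows


# Trimmed copy of the response shown above
sample = {
    "text": "Die Zentrallbibliothek in Zürich wurde am 1. Oktober 1917 nourgeschrieben.",
    "nes": [
        {"type": "org", "surface": "Zentrallbibliothek",
         "lOffset": 4, "rOffset": 22, "wkd_id": "NIL"},
        {"type": "loc", "surface": "Zürich",
         "lOffset": 26, "rOffset": 32, "wkd_id": "Q72"},
    ],
}

for surface, ne_type, wkd_id in summarize_entities(sample):
    print(f"{ne_type:4} {surface:20} {wkd_id}")
```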


## Code for the visualization

In [None]:
import spacy
from spacy.tokens import Span
from spacy import displacy
import requests


# Define the color scheme for the entities and components
colors = {
    # Coarse-grained categories
    "pers": "#CBC3E3",  # Pastel Lavender
    "org": "#FFC0CB",  # Pastel Pink
    "prod": "#ADD8E6",  # Pastel Blue
    "time": "#FDFD96",  # Pastel Goldenrod
    "loc": "#B0E57C",  # Pastel Green
    # Fine-grained categories
    "pers.ind": "#E6E6FA",  # Light Lavender
    "pers.coll": "#D8BFD8",  # Lavender Mist
    "pers.ind.articleauthor": "#DDA0DD",  # Thistle
    "org.adm": "#FFD1DC",  # Pale Pink
    "org.ent": "#FFB6C1",  # Pink
    "org.ent.pressagency": "#F08080",  # Light Coral
    "prod.media": "#87CEFA",  # Light Sky Blue
    "prod.doctr": "#B0E0E6",  # Powder Blue
    "time.date.abs": "#EEE8AA",  # Pale Goldenrod
    "loc.adm.town": "#98FB98",  # Pale Green
    "loc.adm.reg": "#90EE90",  # Light Green
    "loc.adm.nat": "#F0FFF0",  # Honeydew
    "loc.adm.sup": "#F5FFFA",  # Mint Cream
    "loc.phys.geo": "#20B2AA",  # Light Sea Green
    "loc.phys.hydro": "#66CDAA",  # Medium Aquamarine
    "loc.phys.astro": "#AFEEEE",  # Pale Turquoise
    "loc.oro": "#FFDAB9",  # Peach Puff
    "loc.fac": "#FFE4B5",  # Moccasin
    "loc.add.phys": "#FFFFE0",  # Light Yellow
    "loc.add.elec": "#FFFACD",  # Lemon Chiffon
    "loc.unk": "#FAF0E6",  # Linen
}

options = {
    "spans_key": "sc",  # Custom key for spans
    "colors": colors  # Custom color mapping
}

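def visualize_overlapping_spans(api_response):
    """
    Visualizes overlapping entities from the given API response using spaCy's span visualizer.
    """
    # get_linked_entities returns None when the request fails
    if api_response is None:
        print("No API response to visualize.")
        return

    # Load a blank spaCy pipeline (tokenizer only, no NER model)
    nlp = spacy.blank("en")
    text = api_response.get('text', '')
    entities = api_response.get('nes', [])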

    # Create a spaCy doc object
    doc = nlp(text)

    # Collect all entities without filtering
    spans = []
    for entity in entities:
        char_start = entity['lOffset']
        char_end = entity['rOffset']
        label = entity['type']
        span = doc.char_span(char_start, char_end, label=label, alignment_mode="expand")
        if span is not None:
            spans.append(span)
        else:
            print(f"Warning: Span could not be created for {entity['surface']} at offsets {char_start}-{char_end}")

    # Add the spans to the Doc
    doc.spans["sc"] = spans  # "sc" is a custom key for the overlapping spans

    # Visualize the overlapping spans using the "span" style
    displacy.render(doc, style="span", jupyter=True, options=options)
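
The `colors` mapping lists only some of the possible types, so a label missing from it has no custom color. The following is a minimal sketch, assuming one wants an unlisted fine-grained type to inherit the color of its nearest listed ancestor. `resolve_color` is a hypothetical helper, the grey default is an arbitrary choice, `palette` is an excerpt of the mapping above, and `loc.adm.city` is a made-up label used only to show the fallback.

```python
def resolve_color(label, palette, default="#DDDDDD"):
    """Walk from the full label up to its coarse type until a color is found."""
    parts = label.split(".")
    while parts:
        candidate = ".".join(parts)
        if candidate in palette:
            return palette[candidate]
        parts.pop()  # drop the most specific level, e.g. loc.adm.town -> loc.adm
    return default


# Excerpt of the color mapping, for illustration
palette = {"loc": "#B0E57C", "loc.adm.town": "#98FB98", "pers": "#CBC3E3"}

print(resolve_color("loc.adm.town", palette))  # exact match
print(resolve_color("loc.adm.city", palette))  # made-up label, falls back to "loc"
print(resolve_color("event", palette))         # unknown type, uses the default
```

One way to use it would be to extend `colors` with resolved entries for the labels found in a response before calling `displacy.render`.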




## The interactive visualization app
Enter German, French, or English text (try noisy input):

```
1917 wurde die Zentralbibliothek der Stadt Zürich vom zürcherischen Kantonsbaumeister Hermann Fietz fertig gebaut.

In 1917, the Central Library of the city of Zurich was completed by the Zurich cantonal architect Hermann Fietz.

La construction de la Bibliothèque Centrale de la ville de Zurich a été achevée en 1917 par l'architecte cantonal zurichois Hermann Fietz.
```
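
The cell that follows calls `create_text_input_interface`, which is not defined anywhere in this notebook as shown. The following is a minimal sketch of what such a function might look like, assuming `ipywidgets` is available (as it is in Colab). It is a stand-in, not the original implementation: the `visualize_fn` parameter, the `run_analysis` helper, the layout values, and the split into one output area for the rendering and one for the raw JSON are all assumptions.

```python
import json


def run_analysis(text, api_fn, visualize_fn=None):
    """Call api_fn on the text; return (result, message), with result None on failure."""
    text = text.strip()
    if not text:
        return None, "Please enter some text."
    result = api_fn(text)
    if result is None:
        return None, "The API returned no result."
    if visualize_fn is not None:
        visualize_fn(result)
    return result, json.dumps(result, ensure_ascii=False, indent=2)


def create_text_input_interface(api_fn, placeholder_text="",
                                button_description="Analyze", visualize_fn=None):
    """Hypothetical stand-in: a text box, a button, and two output areas."""
    # Imported here so that run_analysis stays usable outside Jupyter
    import ipywidgets as widgets
    from IPython.display import display

    text_area = widgets.Textarea(
        value="",
        placeholder=placeholder_text,
        layout=widgets.Layout(height="100px", width="800px"),
    )
    button = widgets.Button(description=button_description, button_style="info")
    vis_output = widgets.Output()   # rendered spans
    json_output = widgets.Output()  # raw API response or an error message

    def on_click(_button):
        vis_output.clear_output()
        json_output.clear_output()
        with vis_output:
            _, message = run_analysis(text_area.value, api_fn, visualize_fn)
        with json_output:
            print(message)

    button.on_click(on_click)
    display(text_area, widgets.HBox([button]), vis_output, json_output)
```

With this sketch, passing `visualize_fn=visualize_overlapping_spans` as an extra argument would render the spans. Without it, only the JSON response is printed.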


In [None]:
create_text_input_interface(get_linked_entities, placeholder_text="Enter your text here...", button_description="Analyze")

