<a href="https://colab.research.google.com/github/simon-clematide/colab-notebooks-for-teaching/blob/main/impresso_ner_displacy_visualization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Visualizing Named Entities with Impresso API

Welcome to this interactive notebook designed for visualizing named entities extracted from text using the Impresso API. This notebook focuses on fine-grained and nested Named Entity Recognition (NER) and provides an interface for exploring multilingual texts.

**Key Features:**

- **Multilingual Support:** Visualize named entities in texts from different languages including German, French, and English.
- **Fine-Grained Details:** Explore both coarse and fine-grained named entity categories.
- **Interactive Visualization:** Use a custom interface to input text and view visualizations of detected entities directly in this notebook.

**Getting Started:**

1. **Installation:** Install spaCy, the NLP library used here to visualize the entities.
2. **API Integration:** The notebook integrates with the Impresso NER and Named Entity Linking (NEL) API to fetch entity recognition results.
3. **Visualization:** Use spaCy's displaCy to render and explore named entities, with custom color schemes for the different entity types.
4. **Interactive Interface:** An interactive text input interface allows you to test various texts and view their NER results dynamically.

**Acknowledgements:**

Special thanks to Ema Boros for training the models and programming the API, and to the HIPE team for providing the training material.

For further experimentation, you can directly access the experimental API at [Impresso Annotation](https://impresso-annotation.epfl.ch/).


## The code behind the demo app

### Install spaCy for the visualization

In [None]:
%pip install spacy

### Call the impresso NER and NEL API
We only use the NER part in the visualization.

In [None]:
import requests

def get_linked_entities(text, coarse_only=False):
    """
    Calls the external API to get named entity recognition (NER) results.

    Returns the parsed JSON response with the input text added under "text",
    or None if the request fails. With coarse_only=True, fine-grained types
    and components (any type containing a dot) are dropped.
    """
    url = "https://impresso-annotation.epfl.ch/api/ner/"
    payload = {"data": text}
    try:
        # A timeout keeps the notebook from hanging if the service is unreachable
        response = requests.post(url, json=payload, timeout=60)
        if response.status_code == 200:
            data = response.json()
            data["text"] = text
            # Remove fine-grained types and components
            if coarse_only:
                data["nes"] = [ne for ne in data["nes"] if "." not in ne["type"]]
            return data
        else:
            print(f"Request failed with status code {response.status_code}")
            return None
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
        return None
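
The `coarse_only` filter relies on the convention that fine-grained types and components contain a dot (`loc.adm.town`) while coarse types do not (`loc`). A minimal offline sketch of that filter follows; the sample entities and the helper name `keep_coarse` are illustrative assumptions, not actual API output.

```python
def keep_coarse(nes):
    """Return only entities whose type has no dot, i.e. the coarse level."""
    return [ne for ne in nes if "." not in ne["type"]]

# Hand-written sample in the shape this notebook expects (not real API output)
sample_nes = [
    {"surface": "Zürich", "type": "loc", "lOffset": 26, "rOffset": 32},
    {"surface": "Zürich", "type": "loc.adm.town", "lOffset": 26, "rOffset": 32},
    {"surface": "1. Oktober 1917", "type": "time", "lOffset": 42, "rOffset": 57},
]

coarse = keep_coarse(sample_nes)
print([ne["type"] for ne in coarse])  # ['loc', 'time']
```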


In [None]:
# Let's test this function
import json

In [None]:
text = "Die Zentrallbibliothek in Zürich wurde am 1. Oktober 1917 nourgeschrieben."
api_result = get_linked_entities(text, coarse_only=True)
print(json.dumps(api_result, ensure_ascii=False, indent=2))
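
The visualization code in this notebook reads four fields from each entity: `lOffset`, `rOffset`, `type`, and `surface`. One way to sanity-check a response before rendering is to slice the text with each entity's offsets and compare the result with the reported surface form. The sketch below assumes `rOffset` is exclusive and uses a hand-written sample; `find_offset_mismatches` is a hypothetical helper name. If the API normalizes surface forms, a mismatch is not necessarily an error.

```python
def find_offset_mismatches(text, nes):
    """Return entities whose offsets do not reproduce their surface form."""
    mismatches = []
    for ne in nes:
        start, end = ne["lOffset"], ne["rOffset"]
        if text[start:end] != ne["surface"]:
            mismatches.append(ne)
    return mismatches

# Hand-written sample (not real API output); the last entry is deliberately off by one
text = "Hermann Fietz baute in Zürich."
nes = [
    {"surface": "Hermann Fietz", "type": "pers", "lOffset": 0, "rOffset": 13},
    {"surface": "Zürich", "type": "loc", "lOffset": 23, "rOffset": 29},
    {"surface": "Zürich", "type": "loc.adm.town", "lOffset": 24, "rOffset": 30},
]

for ne in find_offset_mismatches(text, nes):
    found = text[ne["lOffset"]:ne["rOffset"]]
    print(f"{ne['type']}: expected {ne['surface']!r}, found {found!r}")
```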

### Code for the visualization

In [None]:
import spacy
from spacy.tokens import Span
from spacy import displacy


# Define the color scheme for the entities and components
colors = {
    # Coarse categories
    "pers": "#CBC3E3",  # Pastel Lavender
    "org": "#FFC0CB",  # Pastel Pink
    "prod": "#ADD8E6",  # Pastel Blue
    "time": "#FDFD96",  # Pastel Yellow
    "loc": "#B0E57C",  # Pastel Green
    # Fine-grained categories
    "pers.ind": "#E6E6FA",  # Light Lavender
    "pers.coll": "#D8BFD8",  # Lavender Mist
    "pers.ind.articleauthor": "#DDA0DD",  # Thistle
    "org.adm": "#FFD1DC",  # Pale Pink
    "org.ent": "#FFB6C1",  # Pink
    "org.ent.pressagency": "#F08080",  # Light Coral
    "prod.media": "#87CEFA",  # Light Sky Blue
    "prod.doctr": "#B0E0E6",  # Powder Blue
    "time.date.abs": "#EEE8AA",  # Pale Goldenrod
    "loc.adm.town": "#98FB98",  # Pale Green
    "loc.adm.reg": "#90EE90",  # Light Green
    "loc.adm.nat": "#F0FFF0",  # Honeydew
    "loc.adm.sup": "#F5FFFA",  # Mint Cream
    "loc.phys.geo": "#20B2AA",  # Light Sea Green
    "loc.phys.hydro": "#66CDAA",  # Medium Aquamarine
    "loc.phys.astro": "#AFEEEE",  # Pale Turquoise
    "loc.oro": "#FFDAB9",  # Peach Puff
    "loc.fac": "#FFE4B5",  # Moccasin
    "loc.add.phys": "#FFFFE0",  # Light Yellow
    "loc.add.elec": "#FFFACD",  # Lemon Chiffon
    "loc.unk": "#FAF0E6",  # Linen
}

options = {
    "spans_key": "sc",  # Custom key for spans
    "colors": colors  # Custom color mapping
}

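def visualize_overlapping_spans(api_response):
    """
    Visualizes overlapping entities from the given API response using spaCy's span visualizer.

    Expects a dict with the original 'text' and a list 'nes' whose items carry
    'lOffset', 'rOffset', 'type', and 'surface'.
    """
    # Load a blank spaCy pipeline (tokenizer only, no NER model)
    nlp = spacy.blank("en")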
    text = api_response.get('text', '')
    entities = api_response.get('nes', [])

    # Create a spaCy doc object
    doc = nlp(text)

    # Collect all entities without filtering
    spans = []
    for entity in entities:
        char_start = entity['lOffset']
        char_end = entity['rOffset']
        label = entity['type']
        span = doc.char_span(char_start, char_end, label=label, alignment_mode="expand")
        if span is not None:
            spans.append(span)
        else:
            print(f"Warning: Span could not be created for {entity['surface']} at offsets {char_start}-{char_end}")

    # Add the spans to the Doc
    doc.spans["sc"] = spans  # "sc" is a custom key for the overlapping spans

    # Visualize the overlapping spans using the "span" style
    displacy.render(doc, style="span", jupyter=True, options=options)
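
The `colors` mapping only covers the labels listed in it, so a label returned by the API without an entry there gets no custom color. One possible refinement, sketched here under the assumption that a dotted label should inherit from its nearest listed ancestor (for example `loc.adm.city` falling back to `loc`), is a small lookup helper. The name `resolve_color`, the shortened palette, and the gray default are all illustrative choices.

```python
def resolve_color(label, palette, default="#DDDDDD"):
    """Walk up the dotted hierarchy until a label with a color is found."""
    parts = label.split(".")
    while parts:
        candidate = ".".join(parts)
        if candidate in palette:
            return palette[candidate]
        parts.pop()  # drop the most specific level and try the parent
    return default

# Shortened palette for illustration only
palette = {"loc": "#B0E57C", "loc.adm.town": "#98FB98", "pers": "#CBC3E3"}

print(resolve_color("loc.adm.town", palette))  # exact match: #98FB98
print(resolve_color("loc.adm.city", palette))  # inherits from "loc": #B0E57C
print(resolve_color("misc", palette))          # no ancestor listed: #DDDDDD
```

The resolved values could be written back into the color mapping for every label in a response before rendering.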




### Code for the interactive visualization app


In [None]:
import ipywidgets as widgets
from IPython.display import display, HTML

def create_text_input_interface(process_callback, api_options=None, placeholder_text='Type something...', button_description="Process"):
    """
    Creates an interactive text input interface with a processing button, clear button, and spinner.

    Args:
    process_callback (function): Function to call when the process button is clicked. It should accept the input string plus the keyword arguments in api_options.
    api_options (dict, optional): Keyword arguments forwarded to process_callback. Defaults to no extra arguments.
    placeholder_text (str): Placeholder text for the text area.
    button_description (str): Description for the process button.

    Returns:
    None: Displays the interactive interface in a Jupyter environment.
    """
    # Avoid sharing one mutable default dict across calls
    api_options = api_options or {}

    # Spinner output
    spinner_output = widgets.Output()

    # Spinner HTML content
    spinner_html = """
    <div class="loader" style="border: 6px solid #f3f3f3; border-radius: 50%; border-top: 6px solid #3498db; width: 30px; height: 30px; -webkit-animation: spin 2s linear infinite; animation: spin 2s linear infinite;"></div>
    <style>
    @keyframes spin {
      0% { transform: rotate(0deg); }
      100% { transform: rotate(360deg); }
    }
    </style>
    """

    # Create a larger text area for user input
    text_area = widgets.Textarea(
        placeholder=placeholder_text,
        layout=widgets.Layout(width='800px', height='100px')
    )

    # Process Button to trigger the visualization
    button_process = widgets.Button(
        description=button_description,
        button_style='info',
        layout=widgets.Layout(
            width='150px',
            height='40px',
            margin='20px 10px 0px 0px',
            border_radius='8px',
            box_shadow='2px 2px 12px rgba(0, 0, 0, 0.2)',
        ),
    )

    # Clear Button to clear the text area and output
    button_clear = widgets.Button(
        description="Clear",
        button_style='warning',
        layout=widgets.Layout(
            width='150px',
            height='40px',
            margin='20px 0px 0px 0px',
            border_radius='8px',
            box_shadow='2px 2px 12px rgba(0, 0, 0, 0.2)',
        ),
    )

    # Function to run when the process button is clicked
    def on_button_process_clicked(b):
        if not text_area.value or text_area.value == placeholder_text:
            print("Please enter some text.")
            return

        with spinner_output:
            spinner_output.clear_output()  # Clear previous output
            display(HTML(spinner_html))  # Show spinner

        text_input = text_area.value

        # Call the external process callback
        api_response = process_callback(text_input,**api_options)

        # Hide spinner after processing
        spinner_output.clear_output()

        if api_response:
            visualize_overlapping_spans(api_response)
        else:
            print("Failed to get linked entities from the API.")

    # Function to run when the clear button is clicked
    def on_button_clear_clicked(b):
        text_area.value = ''  # Clear the text area
        spinner_output.clear_output()  # Clear any spinner or output content

    # Link the buttons to their respective functions
    button_process.on_click(on_button_process_clicked)
    button_clear.on_click(on_button_clear_clicked)

    # Display the form with both buttons and spinner output
    display(text_area, widgets.HBox([button_process, button_clear]), spinner_output)

def ner_coarse():
    create_text_input_interface(get_linked_entities, api_options={"coarse_only": True}, placeholder_text="Enter your text here...", button_description="Analyze")

def ner_fine():
    create_text_input_interface(get_linked_entities, api_options={"coarse_only": False}, placeholder_text="Enter your text here...", button_description="Analyze")
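
`ner_coarse` and `ner_fine` differ only in the options forwarded to the callback. The same binding can also be expressed with `functools.partial`. The sketch below uses a stand-in `fake_fetch` instead of the real API call so that it runs offline; the stand-in and its made-up return value are assumptions for illustration.

```python
from functools import partial

def fake_fetch(text, coarse_only=False):
    """Stand-in for get_linked_entities; returns a fixed, made-up response."""
    nes = [
        {"surface": "Zurich", "type": "loc"},
        {"surface": "Zurich", "type": "loc.adm.town"},
    ]
    if coarse_only:
        nes = [ne for ne in nes if "." not in ne["type"]]
    return {"text": text, "nes": nes}

# Bind the option once; the resulting callables take only the text
fetch_coarse = partial(fake_fetch, coarse_only=True)
fetch_fine = partial(fake_fetch, coarse_only=False)

print(len(fetch_coarse("Zurich")["nes"]))  # 1
print(len(fetch_fine("Zurich")["nes"]))    # 2
```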


# Play with the impresso NER visualization
Enter German, French, or English text (try noisy input):

```
1917 wurde die Zentralbibliothek der Stadt Zürich vom zürcherischen Kantonsbaumeister Hermann Fietz fertig gebaut.

In 1917, the Central Library of the city of Zurich was completed by the Zurich cantonal architect Hermann Fietz.

La construction de la Bibliothèque Centrale de la ville de Zurich a été achevée en 1917 par l'architecte cantonal zurichois Hermann Fietz.
```
Show only coarse-grained results:

In [None]:
ner_coarse()

Show all recognized named entities and components, including fine-grained types:

In [None]:
ner_fine()