## Displaying Annotations

### Options
- displaCy: Easy to implement; an example is given below.
- BRAT: Requires `brat-widget` for use in Jupyter, which in turn needs Node.js.
- Pigeon: More of an annotation tool than a display tool.
- ipywidgets: Allows us to build our own customised display.


### Load Data

To keep our visualisations compatible with any NER model, we read the entities directly from the saved output data rather than relying on built-in NER tools such as spaCy's.

In [6]:
import json

with open("../../../example_output/example_pipeline_14_05_24/llm.json") as f:
    string_data = json.load(f)

with open(
    "../../../example_output/example_pipeline_14_05_24/extraction.json"
) as f:
    entity_data = json.load(f)
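
The later cells index both files by position and read `start`, `end` and `label` from each entity, so it can help to check that shape before rendering anything. The sketch below is a hypothetical helper (`check_entities` is not part of the pipeline). It assumes `start` and `end` are character offsets into the matching string, with `end` exclusive, and the sample data is made up for illustration.

```python
def check_entities(text, entities):
    """Return a list of problems found in one document's entities (empty if none)."""
    problems = []
    for n, ent in enumerate(entities):
        # Every entity is assumed to carry these three keys
        missing = {"start", "end", "label"} - ent.keys()
        if missing:
            problems.append(f"entity {n}: missing keys {sorted(missing)}")
            continue
        # Offsets must describe a non-empty range inside the text
        if not 0 <= ent["start"] < ent["end"] <= len(text):
            problems.append(
                f"entity {n}: offsets {ent['start']}-{ent['end']} outside text"
            )
    return problems


# Hypothetical sample in the assumed shape, not taken from the example output
sample_text = "Jane Doe was diagnosed with asthma."
sample_entities = [
    {"start": 0, "end": 8, "label": "person"},
    {"start": 28, "end": 34, "label": "diagnosis"},
    {"start": 30, "end": 99, "label": "diagnosis"},  # deliberately out of range
]
for problem in check_entities(sample_text, sample_entities):
    print(problem)
```

With the real files, the same check could be run as `check_entities(string_data[i], entity_data[i]["Entities"])` for each index `i`.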

## displaCy

In [7]:
import spacy
from spacy import displacy

In [8]:
string_id = 1

# Turn entities into a dictionary compatible with displaCy.
dic_ents = {
    "text": string_data[string_id],
    "ents": entity_data[string_id]["Entities"],
}

In [10]:
displacy.render(dic_ents, manual=True, style="ent")
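
The manual mode used above takes a plain dictionary with `text` and `ents`, so the conversion can be wrapped in a small function. This is a minimal sketch: `to_displacy_input` and the sample data are hypothetical, and the colour values are arbitrary. Sorting by start offset is a cheap safeguard, since the entity renderer walks the text from left to right. The colour keys are upper-cased on the assumption that displaCy looks colours up by the upper-cased label; that is worth verifying against the installed spaCy version.

```python
def to_displacy_input(text, entities):
    """Build the dict accepted by displacy.render(..., manual=True, style="ent")."""
    ents = [
        # Keep only the keys displaCy needs; any extra keys are dropped
        {"start": e["start"], "end": e["end"], "label": e["label"]}
        for e in sorted(entities, key=lambda e: (e["start"], e["end"]))
    ]
    return {"text": text, "ents": ents}


# Hypothetical sample, deliberately out of order and with an extra key
sample = to_displacy_input(
    "Jane Doe was diagnosed with asthma.",
    [
        {"start": 28, "end": 34, "label": "diagnosis", "score": 0.9},
        {"start": 0, "end": 8, "label": "person", "score": 0.8},
    ],
)

# Illustrative colours, keyed by upper-cased label (assumption, see above)
palette = {"person": "#aa9cfc", "diagnosis": "#7aecec"}
options = {"colors": {label.upper(): colour for label, colour in palette.items()}}

# In a notebook with spaCy installed:
# from spacy import displacy
# displacy.render(sample, style="ent", manual=True, options=options)
```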

## ipywidgets

In [5]:
# pip install ipywidgets

ipywidgets is a versatile tool for building interactive 'widgets', including ones for labelling or annotating text.

The prototype below is deliberately simple and leaves plenty of room for future changes.

In [17]:
import html

import ipywidgets as widgets
from IPython.display import display, HTML


def generate_colours(num_colours, transparency=1):
    """Return `num_colours` CSS hsla() strings spread evenly around the hue wheel."""
    colours = []
    for i in range(num_colours):
        hue = i * (360 / num_colours)
        saturation = 90 + (i * (10 / num_colours))
        lightness = 50 + (i * (20 / num_colours))
        # hsla() takes the alpha value as its fourth argument
        colours.append(
            f"hsla({hue}, {saturation}%, {lightness}%, {transparency})"
        )
    return colours


def highlight_character_ranges(text, entities):
    """Display `text` with each entity's character range highlighted."""
    # One colour per entity, in the order the entities are given
    colours = generate_colours(len(entities), transparency=0.5)

    # Map every character position inside an entity to its label and colour
    position_to_label = {}
    colour_of_position = {}
    for entity, colour in zip(entities, colours):
        for i in range(entity["start"], entity["end"]):
            position_to_label[i] = entity["label"]
            colour_of_position[i] = colour

    # Build the HTML one character at a time. Everything is escaped so that
    # characters such as "<" or "&" in the text are not interpreted as markup.
    formatted_chars = []
    for i, char in enumerate(text):
        safe_char = html.escape(char)
        if i in position_to_label:
            # Highlighted: wrap in a span with a background colour and a tooltip
            formatted_chars.append(
                f'<span class="highlight" style="background-color: {colour_of_position[i]};" '
                f'title="{html.escape(position_to_label[i])}">{safe_char}</span>'
            )
        else:
            # Not highlighted: use the (escaped) character as is
            formatted_chars.append(safe_char)

    # Join the formatted characters back into a single string
    highlighted_text = "".join(formatted_chars)

    # Custom CSS for the highlight class
    custom_css = """
    <style>
    .highlight {
        cursor: pointer; /* Change cursor to pointer on hover */
    }
    </style>
    """

    # Create an HTML widget holding the highlighted text and display it
    html_widget = widgets.HTML(value=custom_css + highlighted_text)
    display(html_widget, clear=True)

In [18]:
text = string_data[string_id]
entities = entity_data[string_id]["Entities"]

highlight_character_ranges(text, entities)

HTML(value='\n    <style>\n    .highlight {\n        cursor: pointer; /* Change cursor to pointer on hover */\…
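
One possible refinement of the prototype is to give every entity with the same label the same colour, and to emit one `<span>` per entity rather than one per character. The sketch below builds only the HTML string, using the standard library. `label_colours` and `entities_to_html` are hypothetical names, the sample data is made up, and the code assumes entities do not overlap.

```python
import html


def label_colours(labels, alpha=0.5):
    """Map each distinct label to an hsla() colour, in first-seen order."""
    unique = list(dict.fromkeys(labels))
    step = 360 / max(len(unique), 1)
    return {
        label: f"hsla({i * step:.0f}, 90%, 60%, {alpha})"
        for i, label in enumerate(unique)
    }


def entities_to_html(text, entities):
    """Return `text` as HTML with one <span> per entity (assumes no overlaps)."""
    colours = label_colours(e["label"] for e in entities)
    pieces, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        start, end, label = ent["start"], ent["end"], ent["label"]
        # Plain text between the previous entity and this one
        pieces.append(html.escape(text[cursor:start]))
        pieces.append(
            f'<span style="background-color: {colours[label]};" '
            f'title="{html.escape(label)}">{html.escape(text[start:end])}</span>'
        )
        cursor = end
    # Any text after the last entity
    pieces.append(html.escape(text[cursor:]))
    return "".join(pieces)


# Hypothetical sample: two "person" entities share one colour
sample_text = "Jane Doe & John Roe have asthma."
sample_entities = [
    {"start": 0, "end": 8, "label": "person"},
    {"start": 11, "end": 19, "label": "person"},
    {"start": 25, "end": 31, "label": "diagnosis"},
]
markup = entities_to_html(sample_text, sample_entities)
print(markup)
```

In the notebook, the resulting string could be shown with `display(widgets.HTML(value=markup))`.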

### Aggregate Analysis

In [19]:
from collections import Counter

all_entities = []
for entities in entity_data:
    for entity in entities["Entities"]:
        all_entities.append(entity["label"])

entity_counter = Counter(all_entities)

In [20]:
import pandas as pd

top_n = 3

keys = list(entity_counter.keys())
counter_df = pd.DataFrame(
    {"keys": keys, "counts": [entity_counter[k] for k in keys]}
)

counter_df = counter_df.sort_values("counts", ascending=False).iloc[:top_n, :]

print(counter_df)

         keys  counts
1   diagnosis      10
0      person       8
2  nhs number       4
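
The same top-n ranking can also be read straight from the `Counter` with `most_common`, without building a DataFrame. A minimal sketch, using made-up data in the same shape as `entity_data`:

```python
from collections import Counter

# Hypothetical entity data, not taken from the example output
sample_entity_data = [
    {"Entities": [{"label": "person"}, {"label": "diagnosis"}]},
    {"Entities": [{"label": "diagnosis"}]},
]

# Count every label across all documents
label_counts = Counter(
    entity["label"]
    for doc in sample_entity_data
    for entity in doc["Entities"]
)

# most_common(n) returns the n most frequent (label, count) pairs
top_labels = label_counts.most_common(1)
print(top_labels)  # [('diagnosis', 2)]
```

If a table is still wanted, the list of pairs can be passed to `pd.DataFrame(top_labels, columns=["keys", "counts"])`.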
