**Named Entity Recognition (NER)** is a subtask of information extraction in natural language processing. Its goal is to identify named entities in a text and classify them into types such as people, organizations, locations, and dates.

NER systems analyze the text and locate the spans of words or tokens that correspond to each entity type.

NER is a key component of many NLP applications, including information retrieval, question answering, and entity linking. It helps extract structured information from unstructured text, making it easier to analyze large volumes of textual data.

### Importing libraries

In [1]:
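# Optional: install the dependencies and download the model used below
#!pip install spacy ipywidgets
#!python -m spacy download en_core_web_sm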

In [2]:
import spacy
from spacy import displacy
from ipywidgets import widgets, Layout, VBox, HTML  # HTML here is the ipywidgets widget
from IPython.display import display, clear_output

### Loading the Named Entity Recognition Model
The **"en_core_web_sm" model** is a small, efficient pre-trained English pipeline provided by spaCy. It covers a range of NLP tasks, including tokenization, part-of-speech tagging, dependency parsing, and, most relevant here, Named Entity Recognition (NER).

A typical NER pipeline can be summarised in the following steps (not every system uses all of them):
1. **Tokenization:** The first step in NER is tokenization, where the input text is divided into individual words or tokens. Each token represents a unit of meaning in the text, and this step is crucial because NER operates at the token level.

2. **Part-of-Speech Tagging (POS):** NER often begins with part-of-speech tagging. Each token is tagged with its grammatical part of speech (e.g., noun, verb, adjective). This information helps NER models understand the syntactic structure of the text.

3. **Dependency Parsing:** NER systems may also perform dependency parsing. Dependency parsing analyzes the relationships between tokens in a sentence. For example, it identifies which words are the subjects, objects, and modifiers in a sentence. This information can be useful in understanding the context of named entities.

4. **Pattern Matching and Machine Learning:** NER can be approached through rule-based systems or machine learning models. In rule-based systems, predefined patterns or rules are used to identify entities based on their surrounding words, grammatical structures, or regular expressions. Machine learning models, such as conditional random fields (CRFs) or deep learning models like bidirectional LSTMs, learn to recognize entities from labeled training data.

5. **Feature Extraction:** For machine learning-based NER, features are extracted from the tokens and their context. These may include word embeddings, part-of-speech tags, dependency relationships, and more. They are used as input to the NER model.

6. **Classification:** At the core of NER, the model classifies each token or span of tokens into predefined entity categories. The model assigns a label to each token, indicating whether it belongs to an entity and, if so, which type of entity (e.g., PERSON, ORGANIZATION, LOCATION).

7. **Contextual Analysis:** NER models often consider the context of tokens to make entity recognition more accurate. They take into account neighboring words and the relationships between them. For example, "Apple" might be classified as an ORGANIZATION when it refers to the technology company, but left untagged when it refers to the fruit.
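
The steps above can be sketched end to end with a toy, rule-based example. This is only an illustration under stated assumptions: the gazetteer entries and the helper names (`tokenize`, `tag_tokens`, `decode_spans`) are hypothetical, and the lookup-table approach is not how "en_core_web_sm" works internally (that model is statistical). It shows tokenization, per-token labels in the common BIO scheme (B = beginning of an entity, I = inside, O = outside), and the merging of token labels into entity spans.

```python
import re

# Toy gazetteer: hypothetical entries chosen purely for illustration.
GAZETTEER = {
    ("barack", "obama"): "PERSON",
    ("new", "york"): "GPE",
    ("google",): "ORG",
}


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag_tokens(tokens):
    """Assign a BIO tag to every token using longest-match gazetteer lookup."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max(len(key) for key in GAZETTEER)
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(lowered[i:i + n]))
            if label:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                break
        else:
            i += 1  # No phrase starts here; move to the next token.
    return tags


def decode_spans(tokens, tags):
    """Merge consecutive B-/I- tags into (entity_text, label) pairs."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)
        else:
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans


tokens = tokenize("Barack Obama visited Google in New York.")
tags = tag_tokens(tokens)
print(list(zip(tokens, tags)))
print(decode_spans(tokens, tags))
```

A statistical model replaces `tag_tokens` with a learned classifier that uses context, which is what lets it handle names that appear in no lookup table.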

In [3]:
# Load the English NER model
nlp = spacy.load("en_core_web_sm")
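
Once the pipeline is loaded, calling `nlp(text)` returns a `Doc` whose `ents` attribute holds the recognized entity spans, each exposing `text`, `label_`, `start_char`, and `end_char`. The sketch below shows one way to turn those spans into plain tuples. The helper name `entity_rows` is hypothetical, and the objects in the demo are hand-written stand-ins that mimic those attributes so the block runs without the model; they are not actual model output.

```python
from types import SimpleNamespace


def entity_rows(doc):
    """Return (text, label, start_char, end_char) for every entity in doc.ents."""
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]


# Hand-written stand-in for a spaCy Doc (assumption: NOT produced by the model).
text = "Ada Lovelace was born in London."
fake_doc = SimpleNamespace(
    ents=[
        SimpleNamespace(text="Ada Lovelace", label_="PERSON", start_char=0, end_char=12),
        SimpleNamespace(text="London", label_="GPE", start_char=25, end_char=31),
    ]
)

for ent_text, label, start, end in entity_rows(fake_doc):
    # The character offsets index back into the original string.
    print(f"{ent_text!r:16} {label:8} text[{start}:{end}] = {text[start:end]!r}")
```

In the notebook itself, `entity_rows(nlp("some sentence"))` would apply the same helper to real model output.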

### Creating an interactive GUI

In [4]:
# Create a Text widget for input
text_input = widgets.Textarea(
    value='',
    placeholder='Enter text...',
    layout=Layout(width='95%', height='150px')
)

# Create an Output widget for displaying NER visualization
output = widgets.Output(layout=Layout(width='95%', height='auto'))

### Mapping explanations for each NER label available in our model

In [5]:
explanation_mapping = {
    "PERSON": "Person's name",
    "NORP": "Nationality, Religious, or Political Group",
    "FAC": "Building, airport, highway, bridge, etc.",
    "ORG": "Organization, company, agency, institution, etc.",
    "GPE": "Countries, cities, states",
    "LOC": "Non-GPE locations, mountain ranges, bodies of water",
    "PRODUCT": "Objects, vehicles, foods, etc. (not services)",
    "EVENT": "Named hurricanes, battles, wars, sports events, etc.",
    "WORK_OF_ART": "Titles of books, songs, etc.",
    "LAW": "Named documents made into laws",
    "LANGUAGE": "Any named language",
    "DATE": "Absolute or relative dates or periods",
    "TIME": "Times smaller than a day",
    "PERCENT": "Percentage, including '%'",
    "MONEY": "Monetary values, including unit",
    "QUANTITY": "Measurements, as of weight or distance",
    "ORDINAL": "Ordinal numbers (e.g., 'first', 'second')",
    "CARDINAL": "Cardinal numbers (e.g., 'one', 'two')",
    # Add more labels and explanations here
}
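
spaCy also ships short built-in descriptions for its labels through `spacy.explain`, which can serve as a fallback when a label is missing from a hand-written mapping like the one above. The following is a minimal sketch; the helper name `describe_label` and the small `overrides` dictionary are assumptions made for illustration.

```python
import spacy

# Hypothetical custom wording that takes priority over spaCy's glossary.
overrides = {"PERSON": "Person's name"}


def describe_label(label, custom=None):
    """Prefer a custom explanation, then spaCy's glossary, then a default."""
    custom = custom or {}
    return custom.get(label) or spacy.explain(label) or "No description available"


print(describe_label("PERSON", overrides))  # custom wording wins
print(describe_label("GPE", overrides))     # falls back to spaCy's glossary
```

This keeps the legend complete even if the model emits a label that the mapping does not cover.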


#### Defining the callback that runs the NER model on the input text and renders the output and explanations

In [6]:
# Define a callback that runs NER on the current text and renders the result
def process_text(change):
    text = text_input.value
    doc = nlp(text)

    with output:
        clear_output(wait=True)  # Replace the previous content

        if not doc.ents:
            display(HTML("<i>No named entities found.</i>"))
            return

        # jupyter=False makes displacy return the markup as a string;
        # inside a notebook it would otherwise display it itself and return None
        html = displacy.render(doc, style="ent", page=False, jupyter=False)
        display(HTML(html))

        # List each label once, in order of first appearance in the text
        legend_html = "<b>NER Label Explanations:</b><br>"
        seen = set()
        for ent in doc.ents:
            label = ent.label_
            if label in explanation_mapping and label not in seen:
                seen.add(label)
                legend_html += f"<b>{label}:</b> {explanation_mapping[label]}<br>"
        display(HTML(legend_html))

# Attach the process_text function to the text_input widget's value change event
text_input.observe(process_text, names='value')

# Create a VBox layout to display widgets vertically
layout = VBox([text_input, output])

### Running the model

In [7]:
display(layout)

VBox(children=(Textarea(value='', layout=Layout(height='150px', width='95%'), placeholder='Enter text...'), Ou…