## **Simple tasks in Named Entity Recognition (NER) - NLP**

Both tasks perform basic NER using **spaCy**, a widely used open-source Python library for natural language processing (NLP).

---

### **Steps performed in the First Task:**
- Asking the user to enter a sentence
- Identifying the named entities in the sentence
- Printing the details of each entity in the sentence provided by the user
- Visualizing the recognized entities

#### **Test Sentence**
*"Chandler tried to quit his job on Monday, but his boss at Tulsa Data Systems didn’t even notice."*

#### **Expected Entities**
- **Chandler** → PERSON  
- **Monday** → DATE  
- **Tulsa Data Systems** → ORG  



In [1]:
# Import the spaCy library and load the English language model
import spacy
nlp = spacy.load("en_core_web_sm")

In [2]:
# Input text for NER
input_sen = input('Hey You! Enter a sentence: ')
print(f'You have entered the following sentence: {input_sen}')
doc1 = nlp(input_sen)
print(f"Entities recognized in entered sentence: {doc1.ents}")  # doc.ents is a tuple of the recognized entity spans

You have entered the following sentence: Chandler tried to quit his job on Monday, but his boss at Tulsa Data Systems didn’t even notice.
Entities recognized in entered sentence: (Chandler, Monday, Tulsa Data Systems)


In [3]:
# Iterate through recognized entities and print their details
for entity in doc1.ents:
    print(f"Entity: {entity.text}")
    print(f"Label: {entity.label_}")
    print(f"Start Position: {entity.start}")  # index of the entity's first token
    print(f"End Position: {entity.end}")  # index of the token after the entity (exclusive)
    print(f"Entity Type: {spacy.explain(entity.label_)}")
    print("_"*10)

Entity: Chandler
Label: PERSON
Start Position: 0
End Position: 1
Entity Type: People, including fictional
__________
Entity: Monday
Label: DATE
Start Position: 7
End Position: 8
Entity Type: Absolute or relative dates or periods
__________
Entity: Tulsa Data Systems
Label: ORG
Start Position: 13
End Position: 16
Entity Type: Companies, agencies, institutions, etc.
__________


In [4]:
# Entity Visualization
from spacy import displacy
displacy.render(doc1, style = "ent", jupyter = True)

In the second task, a longer paragraph is processed using spaCy's pre-trained NER model.

### **Steps Performed in the Second Task:**
- Applying spaCy’s NER model to the given paragraph
- Listing all detected named entities
- Identifying any entities that were **not recognized** by the model  
- Counting the number of **PERSON** and **ORG** entities  
- Visualizing the recognized entities using `displacy`

#### **Paragraph Used**
*“This agreement is made and entered into by and between ‘Abc & Co.’ and ‘Bcd LLC’ for term 1 year starting from April 1, 2020, hereinafter collectively referred to as the Parties. Samuel P. Jackson in the place (New York) and on the date written below, with the following terms and conditions.”*


#### **Expected Entities**

| Entity Text            | Expected Label |
|------------------------|----------------|
| Abc & Co.              | ORG            |
| Bcd LLC                | ORG            |
| 1 year                 | DATE           |
| April 1, 2020          | DATE           |
| Samuel P. Jackson      | PERSON         |
| New York               | GPE            |


#### **Expected Counts**
- **PERSON:** 1  
- **ORG:** 2  


In [5]:
from spacy.training.example import Example
from spacy.training.iob_utils import offsets_to_biluo_tags

In [6]:
nlp = spacy.load("en_core_web_sm")
text = "This agreement is made and entered into by and between ‘Abc & Co.’  and ‘Bcd LLC’ for term 1 year starting from April 1, 2020,  hereinafter collectively referred to as the Parties. Samuel P. Jackson in the place (New York) and on the date written below, with the following terms and conditions."
doc2 = nlp(text)
print(f"Entities recognized in entered sentence: {doc2.ents}") 

Entities recognized in entered sentence: (between ‘Abc & Co., 1 year, April 1, 2020, Samuel P. Jackson, New York)


#### **Notes:**
- spaCy incorrectly includes **“between”** in the first ORG entity: the opening quotation mark before *‘Abc & Co.’* appears to have led the model to treat the preceding token as part of the name. This reflects a known limitation of the small English model.
- spaCy missed **“Bcd LLC”** entirely, so it must be annotated manually.
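
The comparison behind these notes can also be done programmatically. The sketch below is a minimal illustration, assuming the expected entities from the table above and the model's predictions (with the labels printed in the next cell) are written out by hand as `(text, label)` pairs. The helper name `compare_entities` is hypothetical and not part of spaCy. With a real `Doc`, the predicted pairs could be built from `(ent.text, ent.label_)` for each `ent` in `doc2.ents`.

```python
def compare_entities(expected, predicted):
    """Return (missed, spurious) entity pairs using exact text-and-label matching."""
    expected, predicted = set(expected), set(predicted)
    missed = sorted(expected - predicted)    # expected but not produced by the model
    spurious = sorted(predicted - expected)  # produced by the model but not expected
    return missed, spurious


# Expected entities, copied from the table above
expected = [
    ("Abc & Co.", "ORG"),
    ("Bcd LLC", "ORG"),
    ("1 year", "DATE"),
    ("April 1, 2020", "DATE"),
    ("Samuel P. Jackson", "PERSON"),
    ("New York", "GPE"),
]

# Entities returned by en_core_web_sm for this paragraph
predicted = [
    ("between ‘Abc & Co.", "ORG"),
    ("1 year", "DATE"),
    ("April 1, 2020", "DATE"),
    ("Samuel P. Jackson", "PERSON"),
    ("New York", "GPE"),
]

missed, spurious = compare_entities(expected, predicted)
print("Missed:", missed)
print("Spurious:", spurious)
```

Because the matching is exact, *Abc & Co.* is reported as missed even though the model found an overlapping span: the entity was detected, but with the wrong boundary.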

In [7]:
# Iterate through recognized entities and print their labels
for entity in doc2.ents:
    print(f"Entity: {entity.text} > Label: {entity.label_}")

Entity: between ‘Abc & Co. > Label: ORG
Entity: 1 year > Label: DATE
Entity: April 1, 2020 > Label: DATE
Entity: Samuel P. Jackson > Label: PERSON
Entity: New York > Label: GPE


In [8]:
# Manually annotate the expected entities as (start_char, end_char, label) tuples
annotations = [
    (text.find("Abc & Co."), text.find("Abc & Co.") + len("Abc & Co."), "ORG"),
    (text.find("Bcd LLC"), text.find("Bcd LLC") + len("Bcd LLC"), "ORG"),
    (text.find("1 year"), text.find("1 year") + len("1 year"), "DATE"),
    (text.find("April 1, 2020"), text.find("April 1, 2020") + len("April 1, 2020"), "DATE"),
    (text.find("Samuel P. Jackson"), text.find("Samuel P. Jackson") + len("Samuel P. Jackson"), "PERSON"),
    (text.find("New York"), text.find("New York") + len("New York"), "GPE"),
]

print("Annotations:", annotations)

Annotations: [(56, 65, 'ORG'), (73, 80, 'ORG'), (91, 97, 'DATE'), (112, 125, 'DATE'), (181, 198, 'PERSON'), (213, 221, 'GPE')]
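
One caveat with this approach: `str.find` returns `-1` when the phrase is absent, which would silently produce a wrong offset pair, and it only locates the first occurrence. A small helper can guard against the first problem. This is a sketch only; `find_span` and the shortened sample text are hypothetical and are not part of the notebook or of spaCy.

```python
def find_span(text, phrase, label):
    """Return (start_char, end_char, label) for the first occurrence of phrase."""
    start = text.find(phrase)
    if start == -1:
        # Fail loudly instead of returning a bogus (-1, ...) offset pair
        raise ValueError(f"{phrase!r} not found in text")
    return (start, start + len(phrase), label)


# Shortened sample text, used only for illustration
sample = "Abc & Co. and Bcd LLC signed in New York."
targets = [("Abc & Co.", "ORG"), ("Bcd LLC", "ORG"), ("New York", "GPE")]

spans = [find_span(sample, phrase, label) for phrase, label in targets]
for start, end, label in spans:
    # Slicing with the offsets should give back the original phrase
    print(start, end, label, repr(sample[start:end]))
```

If the same phrase occurs more than once in the text, only its first occurrence is annotated, so repeated mentions would need separate handling.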


In [9]:
# Check entity alignment
# Create a fresh spaCy Doc: tokenized only, no pipeline components run
doc = nlp.make_doc(text)
# Convert the character offsets into BILUO tags (Begin, Inside, Last, Unit, Outside)
# to check that the entity offsets align with token boundaries
tags = offsets_to_biluo_tags(doc, annotations)
print("Alignment Tags:", tags)

Alignment Tags: ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'I-ORG', 'L-ORG', 'O', 'O', 'O', 'O', 'B-ORG', 'L-ORG', 'O', 'O', 'O', 'B-DATE', 'L-DATE', 'O', 'O', 'B-DATE', 'I-DATE', 'I-DATE', 'L-DATE', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'B-GPE', 'L-GPE', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
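
A long tag list is hard to check by eye. In spaCy v3, `offsets_to_biluo_tags` marks tokens whose annotated span does not line up with token boundaries as `"-"`, so the check can be automated. The sketch below works on a plain list of tag strings; the helper name `summarize_biluo` and the short sample list are hypothetical.

```python
def summarize_biluo(tags):
    """Return (misaligned_indices, entity_count) for a list of BILUO tags."""
    # "-" marks a token whose annotated span does not match token boundaries
    misaligned = [i for i, tag in enumerate(tags) if tag == "-"]
    # Each entity starts with exactly one B- (multi-token) or U- (single-token) tag
    entity_count = sum(tag.startswith(("B-", "U-")) for tag in tags)
    return misaligned, entity_count


# Shortened, hand-written tag list used only for illustration
sample_tags = ["O", "B-ORG", "I-ORG", "L-ORG", "O", "U-GPE", "-", "O"]
misaligned, entity_count = summarize_biluo(sample_tags)
print("Misaligned token indices:", misaligned)
print("Entities:", entity_count)
```

Applied to the `tags` list printed above, this would report no misaligned tokens and six entities, since the output contains six `B-` tags and no `"-"` entries.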


In [10]:
example = Example.from_dict(doc, {"entities": annotations})

In [11]:
# Update the pipeline on the single annotated example for 30 iterations (drop is the dropout rate)
for i in range(30):
    nlp.update([example], drop = 0.03)

In [12]:
# Save model
nlp.to_disk("custom_ner_model")
print("Model trained and saved successfully :)")

Model trained and saved successfully :)


In [13]:
# Load and use the custom model for entity recognition
custom_nlp = spacy.load("custom_ner_model")

# Process the text
doc2 = custom_nlp(text)

# Print recognized entities with their labels
for entity in doc2.ents:
    print(f"Entity: {entity.text} > Label: {entity.label_}")

Entity: Abc & Co. > Label: ORG
Entity: Bcd LLC > Label: ORG
Entity: 1 year > Label: DATE
Entity: April 1, 2020 > Label: DATE
Entity: Samuel P. Jackson > Label: PERSON
Entity: New York > Label: GPE


In [14]:
# Count PERSON and ORG in the text

person_count = 0
org_count = 0

for entity in doc2.ents:
    if entity.label_ == "PERSON":
        person_count += 1
    elif entity.label_ == "ORG":
        org_count += 1

print("PERSON count:", person_count) # Expected to be 1: Samuel P. Jackson
print("ORG count:", org_count) # Expected to be 2: Abc & Co. + Bcd LLC

PERSON count: 1
ORG count: 2
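
The same tally can be written with `collections.Counter`, which counts every label in one pass rather than one counter variable per label. The sketch below assumes the entities are available as `(text, label)` pairs copied from the custom model's output above; with a spaCy `Doc`, `Counter(ent.label_ for ent in doc2.ents)` would give the same counts.

```python
from collections import Counter

# (text, label) pairs copied from the custom model's output above
entities = [
    ("Abc & Co.", "ORG"),
    ("Bcd LLC", "ORG"),
    ("1 year", "DATE"),
    ("April 1, 2020", "DATE"),
    ("Samuel P. Jackson", "PERSON"),
    ("New York", "GPE"),
]

# Count how many entities carry each label
label_counts = Counter(label for _, label in entities)

print("PERSON count:", label_counts["PERSON"])
print("ORG count:", label_counts["ORG"])
print("All labels:", dict(label_counts))
```

A `Counter` returns 0 for labels that never occur, so looking up a label the model did not produce does not raise an error.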


In [15]:
# Entity visualization
from spacy import displacy
displacy.render(doc2, style = "ent", jupyter = True)