Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP) that identifies and extracts named entities from text. Named entities are real-world objects or concepts referred to by name, such as persons, organizations, and locations; in practice, NER systems also tag dates and numerical expressions such as monetary amounts and percentages.

NER systems typically use machine learning models to recognize and classify named entities automatically, based on their context and characteristics. NER supports a wide range of applications, such as information extraction and question answering, and can supply features for text classification and sentiment analysis.

The output of NER is a structured representation of the text in which each named entity is tagged with a predefined category. This makes NER a key component of many NLP pipelines: it turns unstructured text into structured information that is easier to process and analyze.
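
To make "structured representation" concrete, here is a minimal sketch that tags entities with a hand-written lookup table instead of a trained model. The `Entity` record, the `GAZETTEER` dictionary, and `tag_entities` are illustrative assumptions, not part of any library, and a plain lookup ignores the context that real NER models rely on.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One tagged entity: surface text, category, and character offsets."""
    text: str
    label: str
    start_char: int
    end_char: int


# Hypothetical lookup table standing in for a trained model.
GAZETTEER = {
    "Tesla Inc": "ORG",
    "Twitter Inc": "ORG",
    "California": "GPE",
}


def tag_entities(text):
    """Return every gazetteer match in `text`, ordered by position."""
    entities = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            entities.append(Entity(name, label, match.start(), match.end()))
    return sorted(entities, key=lambda e: e.start_char)


entities = tag_entities("Tesla Inc is going to acquire Twitter Inc")
for e in entities:
    print(e.text, "|", e.label, "|", e.start_char, "|", e.end_char)
```

Each record carries the same kind of information that the spaCy cells in this notebook print: the entity text, its label, and where it sits in the input.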

In [1]:
import numpy as np



In [2]:
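# Requires spaCy and its small English model:
#   pip install spacy
#   python -m spacy download en_core_web_sm
import spacy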


ModuleNotFoundError: No module named 'spacy'

In [None]:
pwd

'c:\\Users\\Omar\\Desktop\\NLP\\NER'

In [None]:
nlp = spacy.load("en_core_web_sm")
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [None]:
doc = nlp("Tesla Inc is going to Omar Hamed ali")
#print(doc.ents)
print(doc)
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", spacy.explain(ent.label_))

Tesla Inc is going to Omar Hamed ali
Tesla Inc  |  ORG  |  Companies, agencies, institutions, etc.
Omar Hamed  |  PERSON  |  People, including fictional


In [None]:
from spacy import displacy

displacy.render(doc, style="ent")

### List all the entity labels


In [None]:
nlp.pipe_labels['ner']


['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']

In [None]:
doc = nlp("Omar Hamed founded Bloomberg in 1982")
doc.ents


(Omar Hamed, Bloomberg, 1982)

In [None]:
doc = nlp("Omar Hamed founded Bloomberg in 1982")
# for ent in doc.ents:
#     print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))
displacy.render(doc, style="ent")

In [None]:
doc = nlp("Tesla Inc is going to acquire Twitter Inc for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", ent.start_char, "|", ent.end_char)

Tesla Inc  |  ORG  |  0 | 9
Twitter Inc  |  PERSON  |  30 | 41
$45 billion  |  MONEY  |  46 | 57
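
The `start_char` and `end_char` values are character offsets into the original string, with the end offset exclusive, so ordinary Python slicing recovers each entity's text. The sketch below needs no spaCy; the `spans` list is copied by hand from the output above, and the variable names are only illustrative.

```python
text = "Tesla Inc is going to acquire Twitter Inc for $45 billion"

# (start_char, end_char, label) triples copied from the printed output.
spans = [(0, 9, "ORG"), (30, 41, "PERSON"), (46, 57, "MONEY")]

# Slicing with [start:end] returns exactly the entity's surface text.
recovered = [(text[start:end], label) for start, end, label in spans]
for surface, label in recovered:
    print(surface, "|", label)
```

Note that the model labels "Twitter Inc" as PERSON here; the offsets are still correct even though the label is not.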


### Setting custom entities


In [None]:
doc = nlp("Tesla is going to acquire Twitter for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_)

Tesla  |  ORG
Twitter  |  PERSON
$45 billion  |  MONEY


In [None]:
s = doc[2:5]
s

going to acquire

In [None]:
type(s)


spacy.tokens.span.Span

In [None]:
from spacy.tokens import Span

# Create a Span object for the first entity
s1 = Span(doc, 0, 1, label="ORG")  # 'Tesla' is labeled as an organization (ORG)
# Create a Span object for the second entity
s2 = Span(doc, 5, 6, label="ORG")  # 'Twitter' is labeled as an organization (ORG)

# Set the custom entities in the doc; default="unmodified" leaves the
# other existing entities (here the MONEY span) untouched
doc.set_ents([s1, s2], default="unmodified")

In [None]:
for ent in doc.ents:
    print(ent.text, " | ", ent.label_)

Tesla  |  ORG
Twitter  |  ORG
$45 billion  |  MONEY
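
Token-index spans like `Span(doc, 0, 1)` are often flattened into one tag per token for training or evaluation. The sketch below shows one common convention, BIO tagging, in plain Python. The token list is written out by hand and assumed to match spaCy's tokenization (with "$45" split into "$" and "45"), the MONEY indices are inferred from that assumption, and `to_bio` is a hypothetical helper rather than a spaCy function.

```python
# Tokens written by hand; assumed to match spaCy's tokenization.
tokens = ["Tesla", "is", "going", "to", "acquire",
          "Twitter", "for", "$", "45", "billion"]

# (start_token, end_token, label) with an exclusive end, like Span(doc, 0, 1).
entity_spans = [(0, 1, "ORG"), (5, 6, "ORG"), (7, 10, "MONEY")]


def to_bio(n_tokens, spans):
    """Convert non-overlapping token spans into a list of BIO tags."""
    tags = ["O"] * n_tokens              # O = outside any entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"       # B = first token of an entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"       # I = continuation token
    return tags


bio_tags = to_bio(len(tokens), entity_spans)
for token, tag in zip(tokens, bio_tags):
    print(f"{token:10} {tag}")
```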


In [None]:
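# The tokenizer, POS tagger, and NE chunker each need NLTK data packages,
# installed once with nltk.download(...); the exact package names depend
# on the NLTK version.
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk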

# Define input text
input_text = "Steve Jobs was the CEO of Apple Corp. in California."

# Tokenize input text
tokens = word_tokenize(input_text)

# Perform Part-of-Speech (POS) tagging
pos_tags = pos_tag(tokens)
print('pos_tags', pos_tags)

# Perform Named Entity Recognition (NER)
ne_tree = ne_chunk(pos_tags)
print('ne_tree', ne_tree)

# Extract named entities and their labels.
# Note: 'GPE' is not in the list below, so chunks labeled GPE are skipped.
named_entities = []
for subtree in ne_tree.subtrees():
    if subtree.label() in ['PERSON', 'ORGANIZATION', 'LOCATION']:
        named_entity = ' '.join(word for word, tag in subtree.leaves())
        print(named_entity)
        named_entities.append((named_entity, subtree.label()))

# Print named entities and their labels
print(named_entities)


pos_tags [('Steve', 'NNP'), ('Jobs', 'NNP'), ('was', 'VBD'), ('the', 'DT'), ('CEO', 'NNP'), ('of', 'IN'), ('Apple', 'NNP'), ('Corp.', 'NNP'), ('in', 'IN'), ('California', 'NNP'), ('.', '.')]
ne_tree (S
  (PERSON Steve/NNP)
  (PERSON Jobs/NNP)
  was/VBD
  the/DT
  (ORGANIZATION CEO/NNP)
  of/IN
  (ORGANIZATION Apple/NNP Corp./NNP)
  in/IN
  (GPE California/NNP)
  ./.)
Steve
Jobs
CEO
Apple Corp.
[('Steve', 'PERSON'), ('Jobs', 'PERSON'), ('CEO', 'ORGANIZATION'), ('Apple Corp.', 'ORGANIZATION')]
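
In the output above, "Steve" and "Jobs" come back as two separate PERSON chunks. One possible post-processing step is to merge directly adjacent chunks that share a label. The sketch below does this in plain Python on a hand-copied version of the tree; the `chunks` structure and `merge_adjacent` are illustrative assumptions, not NLTK API, and the heuristic would wrongly merge two distinct entities of the same type that happen to sit side by side.

```python
# Chunks in sentence order, copied by hand from the ne_tree output:
# (label, words) for entity chunks, (None, [word]) for plain tokens.
chunks = [
    ("PERSON", ["Steve"]),
    ("PERSON", ["Jobs"]),
    (None, ["was"]),
    (None, ["the"]),
    ("ORGANIZATION", ["CEO"]),
    (None, ["of"]),
    ("ORGANIZATION", ["Apple", "Corp."]),
    (None, ["in"]),
    ("GPE", ["California"]),
    (None, ["."]),
]


def merge_adjacent(chunks):
    """Merge neighbouring entity chunks that carry the same label."""
    merged = []
    prev_label = None
    for label, words in chunks:
        if label is not None and label == prev_label:
            merged[-1][1].extend(words)          # extend the previous entity
        elif label is not None:
            merged.append((label, list(words)))  # start a new entity
        prev_label = label                       # plain tokens reset the run
    return [(" ".join(words), label) for label, words in merged]


merged_entities = merge_adjacent(chunks)
print(merged_entities)
```

"CEO" and "Apple Corp." stay separate because the plain token "of" sits between them, and "California" is kept because this sketch does not filter by label.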


In [None]:
for subtree in ne_tree.subtrees():
    print(subtree)

(S
  (PERSON Steve/NNP)
  (PERSON Jobs/NNP)
  was/VBD
  the/DT
  (ORGANIZATION CEO/NNP)
  of/IN
  (ORGANIZATION Apple/NNP Corp./NNP)
  in/IN
  (GPE California/NNP)
  ./.)
(PERSON Steve/NNP)
(PERSON Jobs/NNP)
(ORGANIZATION CEO/NNP)
(ORGANIZATION Apple/NNP Corp./NNP)
(GPE California/NNP)


In [None]:
# `subtree` still refers to the last subtree from the loop above
for word, tag in subtree.leaves():
    print(word)

California
