<div style="
    background: linear-gradient(90deg,rgb(251, 255, 10), #ff758c, #ff4d6d);
    -webkit-background-clip: text;
    -webkit-text-fill-color: transparent;
    font-size: 20px;
    font-weight: bold;
    text-align: center;">
 Named Entity Recognition
</div>


* Named Entity Recognition (NER), sometimes referred to as entity chunking, entity extraction, or entity identification, is the task of identifying and categorizing key information (entities) in text.

* An entity can be any word or series of words that consistently refers to the same thing. Every detected entity is classified into a predetermined category.

* For example, an NER machine learning (ML) model might detect the phrase "MITU Skillogies" in a text and classify it as a "Company".
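To make "detect an entity and assign it a predetermined category" concrete, here is a minimal sketch of the simplest possible approach: a lookup table (gazetteer). This is not how the statistical NLTK and spaCy models used below work, since they learn their decisions from annotated data. The `GAZETTEER` dictionary, its category names, the example sentence, and the `lookup_entities` function are all hypothetical.

```python
import re

# Hypothetical gazetteer: known surface strings mapped to predetermined categories.
GAZETTEER = {
    "MITU Skillogies": "Company",
    "Mumbai": "City",
    "India": "Country",
}


def lookup_entities(text, gazetteer):
    """Return (entity, category, start, end) for every gazetteer match in text."""
    matches = []
    # Try longer names first so a multi-word entry wins over a shorter entry inside it.
    for name in sorted(gazetteer, key=len, reverse=True):
        # \b stops "India" from matching inside a longer word such as "Indiana".
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            # Skip a match that overlaps an entity that was already accepted.
            if any(m.start() < end and start < m.end() for _, _, start, end in matches):
                continue
            matches.append((name, gazetteer[name], m.start(), m.end()))
    # Report entities in reading order.
    return sorted(matches, key=lambda item: item[2])


sentence = "A student from Mumbai, India enrolled at MITU Skillogies."
for entity, category, start, end in lookup_entities(sentence, GAZETTEER):
    print(f"{entity!r:20} -> {category} [{start}:{end}]")
```

A fixed list cannot recognize names it has never seen, which is the main reason practical NER relies on trained models like the ones below.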

<div style="
    background: linear-gradient(90deg,rgb(251, 255, 10), #ff758c, #ff4d6d);
    -webkit-background-clip: text;
    -webkit-text-fill-color: transparent;
    font-size: 17px;
    font-weight: bold;
    text-align: center;">
    Using NLTK <br>
-----------------------------------------------------------------------------------------------------------------------------------------
</div>


In [1]:
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk

In [2]:
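nltk.download('words')                           # word list used by the NE chunker
nltk.download('maxent_ne_chunker')
nltk.download('maxent_ne_chunker_tab')           # NLTK >= 3.9 loads this version
nltk.download('punkt_tab')                       # tokenizer model for word_tokenize
nltk.download('averaged_perceptron_tagger_eng')  # tagger model for pos_tag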

[nltk_data] Downloading package words to
[nltk_data]     C:\Users\DAI.STUDENTSDC\AppData\Roaming\nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     C:\Users\DAI.STUDENTSDC\AppData\Roaming\nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package maxent_ne_chunker_tab to
[nltk_data]     C:\Users\DAI.STUDENTSDC\AppData\Roaming\nltk_data...
[nltk_data]   Package maxent_ne_chunker_tab is already up-to-date!


True

In [3]:
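text = 'Sachin Tenduakal was born in Mumbai, India on April 24, 1974.'

tokens = word_tokenize(text)         # split the sentence into word tokens
tagged_tokens = pos_tag(tokens)      # attach a Penn Treebank POS tag to each token
ner_tree = ne_chunk(tagged_tokens)   # group tagged tokens into named-entity chunks
print(ner_tree)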

(S
  (PERSON Sachin/NNP)
  (PERSON Tenduakal/NNP)
  was/VBD
  born/VBN
  in/IN
  (GPE Mumbai/NNP)
  ,/,
  (GPE India/NNP)
  on/IN
  April/NNP
  24/CD
  ,/,
  1974/CD
  ./.)
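The chunk tree above can also be written as a flat, per-token IOB encoding: `B-` marks the first token of an entity, `I-` a continuation token, and `O` a token outside any entity. NLTK's `nltk.chunk.tree2conlltags` produces this `(word, tag, iob)` format from a tree. The stdlib-only sketch below assumes the tree has been copied by hand into a list where an entity chunk is a `(label, [(word, tag), ...])` pair and every other item is a plain `(word, tag)` token; the `to_iob` helper and that nested-list layout are illustrative, not NLTK's own data structure.

```python
def to_iob(chunked):
    """Flatten a chunked sentence into (word, pos_tag, iob_tag) triples."""
    triples = []
    for item in chunked:
        # An entity chunk is (label, [(word, tag), ...]); a plain token is (word, tag).
        if isinstance(item[1], list):
            label, tokens = item
            for i, (word, tag) in enumerate(tokens):
                prefix = "B" if i == 0 else "I"
                triples.append((word, tag, f"{prefix}-{label}"))
        else:
            word, tag = item
            triples.append((word, tag, "O"))
    return triples


# Hand-copied from the first part of the ne_chunk output above.
chunked = [
    ("PERSON", [("Sachin", "NNP")]),
    ("PERSON", [("Tenduakal", "NNP")]),
    ("was", "VBD"),
    ("born", "VBN"),
    ("in", "IN"),
    ("GPE", [("Mumbai", "NNP")]),
    (",", ","),
    ("GPE", [("India", "NNP")]),
]

for triple in to_iob(chunked):
    print(triple)
```

Because `ne_chunk` put "Sachin" and "Tenduakal" in two separate PERSON chunks, both tokens come out as `B-PERSON` rather than `B-PERSON` followed by `I-PERSON`.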


In [4]:
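ner_tree.draw()   # opens the tree in a separate Tk window; close it to continue
ner_tree.pos()    # flatten to ((word, tag), label) pairs; 'S' marks tokens outside any entity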

[(('Sachin', 'NNP'), 'PERSON'),
 (('Tenduakal', 'NNP'), 'PERSON'),
 (('was', 'VBD'), 'S'),
 (('born', 'VBN'), 'S'),
 (('in', 'IN'), 'S'),
 (('Mumbai', 'NNP'), 'GPE'),
 ((',', ','), 'S'),
 (('India', 'NNP'), 'GPE'),
 (('on', 'IN'), 'S'),
 (('April', 'NNP'), 'S'),
 (('24', 'CD'), 'S'),
 ((',', ','), 'S'),
 (('1974', 'CD'), 'S'),
 (('.', '.'), 'S')]
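`ner_tree.pos()` gives one label per token, so a multi-word name is spread over several entries. A common follow-up is to join adjacent tokens that share an entity label into a single string. The stdlib sketch below (`merge_entities` is a hypothetical helper) runs on the list printed above. Note that this heuristic also glues together two distinct neighboring entities of the same type; here that happens to be the desired result, because NLTK emitted "Sachin" and "Tenduakal" as two separate PERSON chunks.

```python
def merge_entities(pos_pairs, outside_label="S"):
    """Group consecutive tokens with the same entity label into (text, label) pairs."""
    entities = []
    current_words, current_label = [], None
    for (word, _tag), label in pos_pairs:
        if label == current_label and label != outside_label:
            # Same entity type as the previous token: extend the running entity.
            current_words.append(word)
            continue
        # Label changed: close the running entity, if there is one.
        if current_words:
            entities.append((" ".join(current_words), current_label))
        if label == outside_label:
            current_words, current_label = [], None
        else:
            current_words, current_label = [word], label
    # Flush an entity that runs to the end of the sentence.
    if current_words:
        entities.append((" ".join(current_words), current_label))
    return entities


# Copied from the ner_tree.pos() output above.
pos_pairs = [
    (("Sachin", "NNP"), "PERSON"), (("Tenduakal", "NNP"), "PERSON"),
    (("was", "VBD"), "S"), (("born", "VBN"), "S"), (("in", "IN"), "S"),
    (("Mumbai", "NNP"), "GPE"), ((",", ","), "S"), (("India", "NNP"), "GPE"),
    (("on", "IN"), "S"), (("April", "NNP"), "S"), (("24", "CD"), "S"),
    ((",", ","), "S"), (("1974", "CD"), "S"), ((".", "."), "S"),
]

print(merge_entities(pos_pairs))
```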

In [5]:
# Map each noun token to its entity label ('S' means it is not part of an entity)
nouns = {word: label for (word, tag), label in ner_tree.pos() if tag.startswith('NN')}
nouns

{'Sachin': 'PERSON',
 'Tenduakal': 'PERSON',
 'Mumbai': 'GPE',
 'India': 'GPE',
 'April': 'S'}

<div style="
    background: linear-gradient(90deg,rgb(251, 255, 10), #ff758c, #ff4d6d);
    -webkit-background-clip: text;
    -webkit-text-fill-color: transparent;
    font-size: 17px;
    font-weight: bold;
    text-align: center;">
    Using spaCy <br>
-----------------------------------------------------------------------------------------------------------------------------------------
</div>


In [10]:
# !python -m pip install spacy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.6.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.6.0/en_core_web_sm-3.6.0-py3-none-any.whl (12.8 MB)
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.6.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [62]:
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_sm')

In [40]:
text = 'Mark Zukerberg will meet Aditya Joshi in New York, USA on Monday 21, 2024 4pm for $3 Trillion deal.'

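sent = nlp(text)   # run the spaCy pipeline; the result is a Doc object

# sent.ents is a tuple of entity Spans; sent.cats holds text-classification
# scores and is empty because en_core_web_sm has no text classifier
print(sent.ents, sent.cats, end='\n\n')
for ent in sent.ents:
    print(f'{ent.text} --> {ent.label_}')

(Mark Zukerberg, Aditya Joshi, New York, USA, Monday 21, 2024 4, $3 Trillion) {}

Mark Zukerberg --> PERSON
Aditya Joshi --> PERSON
New York --> GPE
USA --> GPE
Monday 21, 2024 4 --> DATE
$3 Trillion --> MONEY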
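Each item in `sent.ents` is a `Span`; besides its text and `.label_` it records where it sits in the original string through `.start_char` and `.end_char`. Storing entities as `(start_char, end_char, label)` triples is the same shape spaCy accepts for annotated training examples. The stdlib sketch below uses a hypothetical `char_offsets` helper to rebuild those triples from the entities detected above by searching the input left to right, so it only approximates what spaCy stores: for a repeated string it simply takes the next occurrence.

```python
def char_offsets(text, entities):
    """Convert (entity_text, label) pairs into (start_char, end_char, label) triples."""
    triples = []
    cursor = 0  # search after the previous match so repeated strings map in order
    for ent_text, label in entities:
        start = text.find(ent_text, cursor)
        if start == -1:
            raise ValueError(f"{ent_text!r} not found after position {cursor}")
        end = start + len(ent_text)
        triples.append((start, end, label))
        cursor = end
    return triples


text = 'Mark Zukerberg will meet Aditya Joshi in New York, USA on Monday 21, 2024 4pm for $3 Trillion deal.'
# Entities and labels as reported by en_core_web_sm above.
entities = [
    ("Mark Zukerberg", "PERSON"),
    ("Aditya Joshi", "PERSON"),
    ("New York", "GPE"),
    ("USA", "GPE"),
    ("Monday 21, 2024 4", "DATE"),
    ("$3 Trillion", "MONEY"),
]

for start, end, label in char_offsets(text, entities):
    # Slicing with the offsets recovers the entity text exactly.
    print(f"{start:>3} {end:>3}  {text[start:end]!r:22} {label}")
```

The offsets also make boundary errors easy to spot: the DATE span stops before "pm", so the time of day is only partly captured.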


In [69]:
text = 'Sachin Tendukar was born in Mumbai, India on April 24, 1974.'

sent = nlp(text)

for ent in sent.ents:
    print(f'{ent.text} --> {ent.label_}')


displacy.render(sent)   # default style='dep' draws the dependency parse; style='ent' highlights entities

Sachin Tendukar --> PERSON
Mumbai --> GPE
India --> GPE
April 24, 1974 --> DATE


In [49]:
spacy.explain('PERSON'), spacy.explain('DATE'), spacy.explain('GPE'), spacy.explain('MONEY')

('People, including fictional',
 'Absolute or relative dates or periods',
 'Countries, cities, states',
 'Monetary values, including unit')
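`spacy.explain` returns a short description for a label (or `None` for a label it does not know). When printing a report, those descriptions can be attached to each entity. The sketch below hard-codes the four descriptions returned above into a dict; `LABEL_DESCRIPTIONS` and `describe` are illustrative stand-ins, and in a live session you would call `spacy.explain(label)` directly instead.

```python
# Descriptions copied from the spacy.explain output above (a stand-in for calling it live).
LABEL_DESCRIPTIONS = {
    "PERSON": "People, including fictional",
    "DATE": "Absolute or relative dates or periods",
    "GPE": "Countries, cities, states",
    "MONEY": "Monetary values, including unit",
}


def describe(entities, descriptions):
    """Yield 'text (LABEL: description)' lines, with a fallback for unknown labels."""
    for ent_text, label in entities:
        note = descriptions.get(label) or "no description"
        yield f"{ent_text} ({label}: {note})"


# Entities printed for the Sachin Tendukar sentence earlier.
entities = [
    ("Sachin Tendukar", "PERSON"),
    ("Mumbai", "GPE"),
    ("India", "GPE"),
    ("April 24, 1974", "DATE"),
]

for line in describe(entities, LABEL_DESCRIPTIONS):
    print(line)
```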

In [59]:
text = """
Indigenous people have lived in Alaska for thousands of years, and it is widely believed that the region served as the entry point for the initial settlement of North America by way of the Bering land bridge. The Russian Empire was the first to actively colonize the area beginning in the 18th century, eventually establishing Russian America, which spanned most of the current state and promoted and maintained a native Alaskan Creole population.[7] The expense and logistical difficulty of maintaining this distant possession prompted its sale to the U.S. in 1867 for US$7.2 million (equivalent to $157 million in 2023). The area went through several administrative changes before becoming organized as a territory on May 11, 1912. It was admitted as the 49th state of the U.S. on January 3, 1959.
"""

sent = nlp(text)

for ent in sent.ents:
    print(f'{ent.text:<25} {ent.label_}')

Alaska                    GPE
thousands of years        DATE
North America             LOC
The Russian Empire        GPE
first                     ORDINAL
the 18th century          DATE
Russian America           LOC
Alaskan                   NORP
U.S.                      GPE
1867                      DATE
US$7.2 million            MONEY
$157 million              MONEY
2023                      DATE
May 11, 1912              DATE
49th                      ORDINAL
U.S.                      GPE
January 3, 1959           DATE
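On a longer passage the same entity often appears more than once (`U.S.` is detected twice above). The stdlib sketch below summarizes the entities listed above: `collections.Counter` counts entities per label, and a dict collects the distinct mentions under each label in first-seen order. The `summarize` helper is hypothetical; with a live `Doc` the input pairs could be built as `[(ent.text, ent.label_) for ent in sent.ents]`.

```python
from collections import Counter


def summarize(entities):
    """Return (label counts, distinct mentions per label in first-seen order)."""
    counts = Counter(label for _, label in entities)
    mentions = {}
    for ent_text, label in entities:
        bucket = mentions.setdefault(label, [])
        if ent_text not in bucket:
            bucket.append(ent_text)
    return counts, mentions


# Copied from the Alaska output above.
entities = [
    ("Alaska", "GPE"), ("thousands of years", "DATE"), ("North America", "LOC"),
    ("The Russian Empire", "GPE"), ("first", "ORDINAL"), ("the 18th century", "DATE"),
    ("Russian America", "LOC"), ("Alaskan", "NORP"), ("U.S.", "GPE"), ("1867", "DATE"),
    ("US$7.2 million", "MONEY"), ("$157 million", "MONEY"), ("2023", "DATE"),
    ("May 11, 1912", "DATE"), ("49th", "ORDINAL"), ("U.S.", "GPE"),
    ("January 3, 1959", "DATE"),
]

counts, mentions = summarize(entities)
for label, n in counts.most_common():
    print(f"{label:<8} {n}  {mentions[label]}")
```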


In [63]:
displacy.render(sent, style='ent')
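Outside a notebook, `displacy.render(..., style='ent', jupyter=False)` returns the markup as a string, and `displacy.serve` starts a small local web server instead. To show roughly what the `ent` view does, here is a stdlib-only sketch that wraps each `(start_char, end_char, label)` span in a `<mark>` tag. The `highlight` function and its markup are hypothetical and far simpler than displaCy's real template (no colors or styling), and it assumes the spans do not overlap.

```python
import html


def highlight(text, spans):
    """Wrap each non-overlapping (start_char, end_char, label) span of text in <mark>."""
    parts = []
    cursor = 0
    for start, end, label in sorted(spans):
        # Escape the plain text between entities so it is safe to embed in HTML.
        parts.append(html.escape(text[cursor:start]))
        parts.append(
            f'<mark title="{html.escape(label)}">{html.escape(text[start:end])}'
            f' <b>{html.escape(label)}</b></mark>'
        )
        cursor = end
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)


text = "Sachin Tendukar was born in Mumbai, India on April 24, 1974."
# Character offsets worked out by hand for the entities spaCy reported for this sentence.
spans = [(0, 15, "PERSON"), (28, 34, "GPE"), (36, 41, "GPE"), (45, 59, "DATE")]
print(highlight(text, spans))
```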