## Named Entity Recognition

Named entity recognition (NER) is one of the most popular data preprocessing tasks. It involves identifying key information in text and classifying it into a set of predefined categories. An entity is the thing that is consistently talked about or referred to in the text.

**NER is a form of NLP.**

At its core, NER is a two-step process:

- Detecting the entities from the text
- Classifying them into different categories
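
The two steps can be illustrated with a deliberately tiny sketch. It assumes a hand-written lookup table (`GAZETTEER`, a hypothetical name introduced here) instead of a trained model, so it only shows how detection and classification fit together, not how NLTK or spaCy work internally.

```python
import re

# Hypothetical toy gazetteer: maps known surface forms to a category.
# A real NER system learns this from annotated data instead.
GAZETTEER = {
    "Coretta Scott King": "PERSON",
    "George W. Bush": "PERSON",
    "Supreme Court": "ORGANIZATION",
    "America": "LOCATION",
}

def detect_entities(text):
    """Step 1: find character spans in the text that match a known name."""
    spans = []
    for name in GAZETTEER:
        for match in re.finditer(re.escape(name), text):
            spans.append((match.start(), match.end()))
    return sorted(spans)

def classify_entities(text, spans):
    """Step 2: assign each detected span to a predefined category."""
    return [(text[start:end], GAZETTEER[text[start:end]]) for start, end in spans]

sentence = "George W. Bush thanked the Supreme Court and praised Coretta Scott King."
spans = detect_entities(sentence)
print(classify_entities(sentence, spans))
```

A lookup table cannot handle unseen names or ambiguous words, which is why the libraries used in this notebook rely on statistical models.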

Some of the most important categories in NER are:

- Person
- Organization
- Place/location

Other common tasks include classifying the following:

- Date/time expression
- Numerical measurement (money, percent, weight, etc.)
- E-mail address
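
Entities with a regular surface form, such as e-mail addresses, money amounts, and percentages, can often be found with patterns alone. The following is a minimal sketch using Python's `re` module. The patterns and the `find_pattern_entities` helper are illustrative assumptions, simplified on purpose, and will miss many real-world formats.

```python
import re

# Illustrative patterns only; they are simplified and not exhaustive.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "MONEY": re.compile(r"\$\d[\d,]*(?:\.\d+)?"),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?%"),
}

def find_pattern_entities(text):
    """Return (label, matched_text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    # Sort by start offset so results follow the reading order of the text.
    return [(label, value) for _, label, value in sorted(found)]

sample = "Contact info@example.org about the $1,250.50 fee, which rose 4.5% this year."
print(find_pattern_entities(sample))
```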

In [1]:
# import modules and download packages
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer
import matplotlib.pyplot as plt

In [2]:
nltk.download('words')
nltk.download('punkt')
nltk.download('maxent_ne_chunker')
nltk.download('averaged_perceptron_tagger')
nltk.download('state_union')


[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping chunkers/maxent_ne_chunker.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package state_union to /root/nltk_data...
[nltk_data]   Unzipping corpora/state_union.zip.


True

In [3]:
# load the State of the Union corpus
# train_text: all speeches, used to train the sentence tokenizer
train_text = state_union.raw()

# sample_text: the 2006 address, the text we will analyze
sample_text = state_union.raw("2006-GWBush.txt")
print(sample_text)

PRESIDENT GEORGE W. BUSH'S ADDRESS BEFORE A JOINT SESSION OF THE CONGRESS ON THE STATE OF THE UNION
 
January 31, 2006

THE PRESIDENT: Thank you all. Mr. Speaker, Vice President Cheney, members of Congress, members of the Supreme Court and diplomatic corps, distinguished guests, and fellow citizens: Today our nation lost a beloved, graceful, courageous woman who called America to its founding ideals and carried on a noble dream. Tonight we are comforted by the hope of a glad reunion with the husband who was taken so long ago, and we are grateful for the good life of Coretta Scott King. (Applause.)

President George W. Bush reacts to applause during his State of the Union Address at the Capitol, Tuesday, Jan. 31, 2006. White House photo by Eric DraperEvery time I'm invited to this rostrum, I'm humbled by the privilege, and mindful of the history we've seen together. We have gathered under this Capitol dome in moments of national mourning and national achievement. We have served America 

In [4]:
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
tokenized = custom_sent_tokenizer.tokenize(sample_text)
tokenized

["PRESIDENT GEORGE W. BUSH'S ADDRESS BEFORE A JOINT SESSION OF THE CONGRESS ON THE STATE OF THE UNION\n \nJanuary 31, 2006\n\nTHE PRESIDENT: Thank you all.",
 'Mr. Speaker, Vice President Cheney, members of Congress, members of the Supreme Court and diplomatic corps, distinguished guests, and fellow citizens: Today our nation lost a beloved, graceful, courageous woman who called America to its founding ideals and carried on a noble dream.',
 'Tonight we are comforted by the hope of a glad reunion with the husband who was taken so long ago, and we are grateful for the good life of Coretta Scott King.',
 '(Applause.)',
 'President George W. Bush reacts to applause during his State of the Union Address at the Capitol, Tuesday, Jan. 31, 2006.',
 "White House photo by Eric DraperEvery time I'm invited to this rostrum, I'm humbled by the privilege, and mindful of the history we've seen together.",
 'We have gathered under this Capitol dome in moments of national mourning and national achieve

In [5]:
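for sentence in tokenized:
    words = nltk.word_tokenize(sentence)  # split the sentence into word tokens
    tagged = nltk.pos_tag(words)          # attach a part-of-speech tag to each token
    # binary=False keeps the specific entity types (PERSON, ORGANIZATION, GPE, ...)
    named_entities = nltk.ne_chunk(tagged, binary=False)
    print(named_entities)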

Streaming output truncated to the last 5000 lines.
  to/TO
  national/JJ
  elections/NNS
  ./.)
(S
  At/IN
  the/DT
  same/JJ
  time/NN
  ,/,
  our/PRP$
  coalition/NN
  has/VBZ
  been/VBN
  relentless/VBN
  in/IN
  shutting/VBG
  off/RP
  terrorist/JJ
  infiltration/NN
  ,/,
  clearing/VBG
  out/RP
  insurgent/JJ
  strongholds/NNS
  ,/,
  and/CC
  turning/VBG
  over/RP
  territory/NN
  to/TO
  (GPE Iraqi/NNP)
  security/NN
  forces/NNS
  ./.)
(S
  I/PRP
  am/VBP
  confident/JJ
  in/IN
  our/PRP$
  plan/NN
  for/IN
  victory/NN
  ;/:
  I/PRP
  am/VBP
  confident/JJ
  in/IN
  the/DT
  will/MD
  of/IN
  the/DT
  (GPE Iraqi/NNP)
  people/NNS
  ;/:
  I/PRP
  am/VBP
  confident/JJ
  in/IN
  the/DT
  skill/NN
  and/CC
  spirit/NN
  of/IN
  our/PRP$
  military/JJ
  ./.)
(S
  (GPE Fellow/NNP)
  citizens/NNS
  ,/,
  we/PRP
  are/VBP
  in/IN
  this/DT
  fight/NN
  to/TO
  win/VB
  ,/,
  and/CC
  we/PRP
  are/VBP
  winning/VBG
  ./.)
(S (/( (ORGANIZATION Applause/NNP) ./. )/))
(S
  

## NE Chunk


In [6]:
# Define the text to be analyzed
text = """A joint investigation launched by rights organisations into the death of juvenile undertrial Kamal Basnet,
18, and the escape of other juvenile delinquents and undertrials from a child correction home at Sanothimi, Bhaktapur has blamed the child correction centre administration for the incident.
Ten rights activists representing rights bodies, including INSEC, had investigated the incident. The report was released today.
The report stated that juvenile undertrial Kamal Basnet, a resident of Ramechhap district died due to negligence of the child correction centre."""

# Tokenize the text into words
tokens = nltk.word_tokenize(text)

# Apply part-of-speech tagging to the tokens
tagged = nltk.pos_tag(tokens)

# Apply named entity recognition to the tagged words
entities = nltk.chunk.ne_chunk(tagged)

# Print the entities found in the text
for entity in entities:
    # Entity chunks are subtrees with a label; plain tokens are (word, tag) tuples
    if hasattr(entity, 'label') and entity.label() in ('ORGANIZATION', 'GPE'):
        # Join with a space so multi-word entities are not run together
        print(entity.label(), '-->', ' '.join(c[0] for c in entity))

ORGANIZATION --> Sanothimi
ORGANIZATION --> INSEC
GPE --> Ramechhap
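
The loop above prints only two labels. A more general variant, sketched below, collects every labelled chunk into a dictionary keyed by entity type. The helper name `group_entities` is an assumption introduced for illustration. It relies only on the structure already used above: entity chunks expose `.label()` and iterate over `(word, tag)` pairs, while ordinary tokens are plain tuples.

```python
from collections import defaultdict

def group_entities(tree):
    """Collect entity strings from an ne_chunk-style tree, keyed by label.

    Assumes each entity is a subtree exposing .label() and iterating over
    (word, tag) pairs, while plain tokens are (word, tag) tuples.
    """
    grouped = defaultdict(list)
    for node in tree:
        if hasattr(node, "label"):
            # Join with a space so multi-word entities stay readable.
            grouped[node.label()].append(" ".join(word for word, _tag in node))
    return dict(grouped)
```

Calling `group_entities(entities)` on the tree built in the cell above should return one list of entity strings per label, which is easier to inspect or count than the printed tree.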


## NER using spaCy

In [None]:
# commands to run before the code below
! pip install spacy
! pip install nltk
! python -m spacy download en_core_web_sm

In [8]:
# imports and load the spaCy English language model
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_sm')

In [9]:
# Load the text and process it
# The text is an excerpt from the 2006 State of the Union address used above
text =("""
THE PRESIDENT: Thank you all. Mr. Speaker, Vice President Cheney, members of Congress, members of the Supreme Court and diplomatic corps, distinguished guests, and fellow citizens: Today our nation lost a beloved, graceful, courageous woman who called America to its founding ideals and carried on a noble dream. Tonight we are comforted by the hope of a glad reunion with the husband who was taken so long ago, and we are grateful for the good life of Coretta Scott King. (Applause.)
President George W. Bush reacts to applause during his State of the Union Address at the Capitol, Tuesday, Jan. 31, 2006. White House photo by Eric DraperEvery time I'm invited to this rostrum, I'm humbled by the privilege, and mindful of the history we've seen together. We have gathered under this Capitol dome in moments of national mourning and national achievement. We have served America through one of the most consequential periods of our history -- and it has been my honor to serve with you.
In a system of two parties, two chambers, and two elected branches, there will always be differences and debate. But even tough debates can be conducted in a civil tone, and our differences cannot be allowed to harden into anger. To confront the great issues before us, we must act in a spirit of goodwill and respect for one another -- and I will do my part. Tonight the state of our Union is strong -- and together we will make it stronger. (Applause.)
In this decisive year, you and I will make choices that determine both the future and the character of our country. We will choose to act confidently in pursuing the enemies of freedom -- or retreat from our duties in the hope of an easier life. We will choose to build our prosperity by leading the world economy -- or shut ourselves off from trade and opportunity. In a complex and challenging time, the road of isolationism and protectionism may seem broad and inviting -- yet it ends in danger and decline. The only way to protect our people, the only way to secure the peace, the only way to control our destiny is by our leadership -- so the United States of America will continue to lead. (Applause.)
Abroad, our nation is committed to an historic, long-term goal -- we seek the end of tyranny in our world. Some dismiss that goal as misguided idealism. In reality, the future security of America depends on it. On September the 11th, 2001, we found that problems originating in a failed and oppressive state 7,000 miles away could bring murder and destruction to our country. Dictatorships shelter terrorists, and feed resentment and radicalism, and seek weapons of mass destruction. Democracies replace resentment with hope, respect the rights of their citizens and their neighbors, and join the fight against terror. Every step toward freedom in the world makes our country safer -- so we will act boldly in freedom's cause. (Applause.)
Far from being a hopeless dream, the advance of freedom is the great story of our time. In 1945, there were about two dozen lonely democracies in the world. Today, there are 122. And we're writing a new chapter in the story of self-government -- with women lining up to vote in Afghanistan, and millions of Iraqis marking their liberty with purple ink, and men and women from Lebanon to Egypt debating the rights of individuals and the necessity of freedom. At the start of 2006, more than half the people of our world live in democratic nations. And we do not forget the other half -- in places like Syria and Burma, Zimbabwe, North Korea, and Iran -- because the demands of justice, and the peace of this world, require their freedom, as well. (Applause.)
President George W. Bush delivers his State of the Union Address at the Capitol, Tuesday, Jan. 31, 2006. White House photo by Eric Draper No one can deny the success of freedom, but some men rage and fight against it. And one of the main sources of reaction and opposition is radical Islam -- the perversion by a few of a noble faith into an ideology of terror and death. Terrorists like bin Laden are serious about mass murder -- and all of us must take their declared intentions seriously. They seek to impose a heartless system of totalitarian control throughout the Middle East, and arm themselves with weapons of mass murder.
Their aim is to seize power in Iraq, and use it as a safe haven to launch attacks against America and the world. Lacking the military strength to challenge us directly, the terrorists have chosen the weapon of fear. When they murder children at a school in Beslan, or blow up commuters in London, or behead a bound captive, the terrorists hope these horrors will break our will, allowing the violent to inherit the Earth. But they have miscalculated: We love our freedom, and we will fight to keep it. (Applause.)
In a time of testing, we cannot find security by abandoning our commitments and retreating within our borders. If we were to leave these vicious attackers alone, they would not leave us alone. They would simply move the battlefield to our own shores. There is no peace in retreat. And there is no honor in retreat. By allowing radical Islam to work its will -- by leaving an assaulted world to fend for itself -- we would signal to all that we no longer believe in our own ideals, or even in our own courage. But our enemies and our friends can be certain: The United States will not retreat from the world, and we will never surrender to evil. (Applause.)
America rejects the false comfort of isolationism. We are the nation that saved liberty in Europe, and liberated death camps, and helped raise up democracies, and faced down an evil empire. Once again, we accept the call of history to deliver the oppressed and move this world toward peace. We remain on the offensive against terror networks. We have killed or captured many of their leaders -- and for the others, their day will come.
President George W. Bush greets members of Congress after his State of the Union Address at the Capitol, Tuesday, Jan. 31, 2006. White House photo by Eric Draper We remain on the offensive in Afghanistan, where a fine President and a National Assembly are fighting terror while building the institutions of a new democracy. We're on the offensive in Iraq, with a clear plan for victory. First, we're helping Iraqis build an inclusive government, so that old resentments will be eased and the insurgency will be marginalized.
Second, we're continuing reconstruction efforts, and helping the Iraqi government to fight corruption and build a modern economy, so all Iraqis can experience the benefits of freedom. And, third, we're striking terrorist targets while we train Iraqi forces that are increasingly capable of defeating the enemy. Iraqis are showing their courage every day, and we are proud to be their allies in the cause of freedom. (Applause.)
Our work in Iraq is difficult because our enemy is brutal. But that brutality has not stopped the dramatic progress of a new democracy. In less than three years, the nation has gone from dictatorship to liberation, to sovereignty, to a constitution, to national elections. At the same time, our coalition has been relentless in shutting off terrorist infiltration, clearing out insurgent strongholds, and turning over territory to Iraqi security forces. I am confident in our plan for victory; I am confident in the will of the Iraqi people; I am confident in the skill and spirit of our military. Fellow citizens, we are in this fight to win, and we are winning. (Applause.)
""")
# run the spaCy pipeline, then split the document into sentences
doc = nlp(text)
sentences = list(doc.sents)
print(sentences)

[
THE PRESIDENT:, Thank you all., Mr. Speaker, Vice President Cheney, members of Congress, members of the Supreme Court and diplomatic corps, distinguished guests, and fellow citizens: Today our nation lost a beloved, graceful, courageous woman who called America to its founding ideals and carried on a noble dream., Tonight we are comforted by the hope of a glad reunion with the husband who was taken so long ago, and we are grateful for the good life of Coretta Scott King., (Applause.)
, President George W. Bush reacts to applause during his State of the Union Address at the Capitol, Tuesday, Jan. 31, 2006., White House photo by Eric DraperEvery time I'm invited to this rostrum, I'm humbled by the privilege, and mindful of the history we've seen together., We have gathered under this Capitol dome in moments of national mourning and national achievement., We have served America through one of the most consequential periods of our history -- and it has been my honor to serve with you.
, 

In [10]:
# tokenization
for token in doc:
    print(token.text)



THE
PRESIDENT
:
Thank
you
all
.
Mr.
Speaker
,
Vice
President
Cheney
,
members
of
Congress
,
members
of
the
Supreme
Court
and
diplomatic
corps
,
distinguished
guests
,
and
fellow
citizens
:
Today
our
nation
lost
a
beloved
,
graceful
,
courageous
woman
who
called
America
to
its
founding
ideals
and
carried
on
a
noble
dream
.
Tonight
we
are
comforted
by
the
hope
of
a
glad
reunion
with
the
husband
who
was
taken
so
long
ago
,
and
we
are
grateful
for
the
good
life
of
Coretta
Scott
King
.
(
Applause
.
)


President
George
W.
Bush
reacts
to
applause
during
his
State
of
the
Union
Address
at
the
Capitol
,
Tuesday
,
Jan.
31
,
2006
.
White
House
photo
by
Eric
DraperEvery
time
I
'm
invited
to
this
rostrum
,
I
'm
humbled
by
the
privilege
,
and
mindful
of
the
history
we
've
seen
together
.
We
have
gathered
under
this
Capitol
dome
in
moments
of
national
mourning
and
national
achievement
.
We
have
served
America
through
one
of
the
most
consequential
periods
of
our
history
--
and
it
has
been
my
honor
t
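
The token list above contains `DraperEvery`, two words fused together in the source text (a photo caption ran into the speech). Fused words like this can mislead an entity recognizer, so it may help to repair them before calling `nlp(text)`. The following is a heuristic sketch, not part of spaCy; `split_fused_words` is a hypothetical helper, and the rule will also split legitimate names such as "McDonald".

```python
import re

# Heuristic: a lowercase letter directly followed by an uppercase letter plus a
# lowercase letter usually marks two words fused during extraction.
# Caveat: this also splits legitimate names such as "McDonald".
FUSED_WORD_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z][a-z])")

def split_fused_words(text):
    """Insert a space at each suspected fused-word boundary."""
    return FUSED_WORD_BOUNDARY.sub(" ", text)

print(split_fused_words("White House photo by Eric DraperEvery time I'm invited"))
```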

In [11]:
# print entities
ents = [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
print(ents)
# now render the entities in doc with displaCy
displacy.render(doc, style='ent', jupyter=True)

[('Speaker', 35, 42, 'PERSON'), ('Cheney', 59, 65, 'PERSON'), ('Congress', 78, 86, 'ORG'), ('the Supreme Court', 99, 116, 'ORG'), ('diplomatic corps', 121, 137, 'ORG'), ('Today', 182, 187, 'DATE'), ('America', 253, 260, 'GPE'), ('Tonight', 314, 321, 'TIME'), ('Coretta Scott King', 454, 472, 'PERSON'), ('George W. Bush', 496, 510, 'PERSON'), ('the Union Address', 550, 567, 'ORG'), ('Capitol', 575, 582, 'FAC'), ('Tuesday, Jan. 31, 2006', 584, 606, 'DATE'), ('White House', 608, 619, 'ORG'), ('Eric DraperEvery', 629, 645, 'PERSON'), ('Capitol', 786, 793, 'ORG'), ('America', 872, 879, 'GPE'), ('one', 888, 891, 'CARDINAL'), ('two', 1003, 1006, 'CARDINAL'), ('two', 1016, 1019, 'CARDINAL'), ('two', 1034, 1037, 'CARDINAL'), ('harden', 1199, 1205, 'GPE'), ('Tonight', 1348, 1355, 'TIME'), ('this decisive year', 1446, 1464, 'DATE'), ('the United States of America', 2087, 2115, 'GPE'), ('America', 2339, 2346, 'GPE'), ('September the 11th, 2001', 2365, 2389, 'DATE'), ('7,000 miles', 2459, 2470, 'QUA