Named Entity Recognition (NER): finding and labeling essential pieces of information in text
NER systems combine rules and machine learning to find entities in text. They can identify people, places, organizations, dates, percentages, and other important information.
NER is a powerful tool for turning unstructured text into structured data, which makes the text easier to analyze and understand.
NER often serves as a building block for tasks such as entity-level sentiment analysis, topic modeling, and information retrieval.

In [1]:
import re  # used later to strip punctuation

import spacy
from spacy import displacy  # entity visualizer
from IPython.display import HTML, display

In [2]:
# Small English pipeline; install it first with: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')

In [3]:
google_text = "Google was founded on September 4, 1998, by computer scientists Larry Page and Sergey Brin while they were PhD students at Stanford University in California. Together they own about 14% of its publicly listed shares and control 56% of its stockholder voting power through super-voting stock. The company went public via an initial public offering (IPO) in 2004. In 2015, Google was reorganized as a wholly owned subsidiary of Alphabet Inc. Google is Alphabet's largest subsidiary and is a holding company for Alphabet's internet properties and interests. Sundar Pichai was appointed CEO of Google on October 24, 2015, replacing Larry Page, who became the CEO of Alphabet. On December 3, 2019, Pichai also became the CEO of Alphabet."

In [4]:
spacy_doc = nlp(google_text)

In [5]:
spacy_doc

Google was founded on September 4, 1998, by computer scientists Larry Page and Sergey Brin while they were PhD students at Stanford University in California. Together they own about 14% of its publicly listed shares and control 56% of its stockholder voting power through super-voting stock. The company went public via an initial public offering (IPO) in 2004. In 2015, Google was reorganized as a wholly owned subsidiary of Alphabet Inc. Google is Alphabet's largest subsidiary and is a holding company for Alphabet's internet properties and interests. Sundar Pichai was appointed CEO of Google on October 24, 2015, replacing Larry Page, who became the CEO of Alphabet. On December 3, 2019, Pichai also became the CEO of Alphabet.

In [6]:
# doc.ents holds entity spans (possibly several tokens each), not single words
for ent in spacy_doc.ents:
    print(ent.text, ent.label_)

Google ORG
September 4, 1998 DATE
Larry Page PERSON
Sergey Brin PERSON
PhD WORK_OF_ART
Stanford University ORG
California GPE
about 14% PERCENT
56% PERCENT
IPO ORG
2004 DATE
2015 DATE
Google ORG
Alphabet Inc. ORG
Alphabet ORG
Alphabet ORG
Sundar Pichai PERSON
Google ORG
October 24, 2015 DATE
Larry Page PERSON
Alphabet GPE
December 3, 2019 DATE
Pichai PERSON
Alphabet GPE
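
The flat list above can be turned into a small structured record, which is the "unstructured text into structured data" step described in the introduction. The sketch below is only an illustration: `entity_pairs` is a hand-copied subset of the printed output, and `group_by_label` is a hypothetical helper, not part of spaCy. With a live pipeline the pairs would come from `[(ent.text, ent.label_) for ent in spacy_doc.ents]`.

```python
from collections import defaultdict

# (text, label) pairs copied by hand from the output above (a subset, for brevity).
entity_pairs = [
    ("Google", "ORG"), ("September 4, 1998", "DATE"), ("Larry Page", "PERSON"),
    ("Sergey Brin", "PERSON"), ("Stanford University", "ORG"), ("California", "GPE"),
    ("Sundar Pichai", "PERSON"), ("Google", "ORG"), ("Larry Page", "PERSON"),
]

def group_by_label(pairs):
    """Map each label to its distinct entity texts, in first-seen order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:  # skip repeated mentions
            grouped[label].append(text)
    return dict(grouped)

entities_by_label = group_by_label(entity_pairs)
print(entities_by_label["PERSON"])  # ['Larry Page', 'Sergey Brin', 'Sundar Pichai']
```

Deduplicating by exact text is a deliberate simplification; it would not merge "Pichai" with "Sundar Pichai", which needs entity linking or coreference handling.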


In [7]:
html = displacy.render(spacy_doc, style='ent', jupyter=False)
display(HTML(html))

In [8]:
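# Remove every character that is neither a word character nor whitespace.
# This also drops sentence periods, apostrophes, and % signs, which changes what the model sees.
google_text_clean = re.sub(r'[^\w\s]', '', google_text)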
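
Running the same pattern on a short fragment shows what the model loses. This is a minimal sketch using only the standard `re` module; `sample` and `percent_sample` are shortened excerpts of `google_text` chosen here for illustration.

```python
import re

# Shortened excerpts of google_text, picked for illustration.
sample = "a wholly owned subsidiary of Alphabet Inc. Google is Alphabet's largest subsidiary."
percent_sample = "about 14% of its publicly listed shares"

# Same pattern as the cell above: keep only word characters and whitespace.
stripped = re.sub(r"[^\w\s]", "", sample)
stripped_percent = re.sub(r"[^\w\s]", "", percent_sample)

print(stripped)          # a wholly owned subsidiary of Alphabet Inc Google is Alphabets largest subsidiary
print(stripped_percent)  # about 14 of its publicly listed shares
```

With the period gone, "Alphabet Inc" and "Google" sit next to each other with no sentence boundary, the possessive becomes "Alphabets", and "14%" is reduced to a bare number.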

In [9]:
spacy_doc_clean = nlp(google_text_clean)

In [10]:
# Same loop as before, now on the punctuation-free text
for ent in spacy_doc_clean.ents:
    print(ent.text, ent.label_)

Google ORG
September 4 1998 DATE
Larry Page PERSON
Sergey Brin PERSON
PhD WORK_OF_ART
Stanford University ORG
California GPE
about 14 CARDINAL
56 CARDINAL
IPO ORG
2004 DATE
2015 DATE
Alphabet Inc Google ORG
Alphabets ORG
Sundar Pichai PERSON
Google ORG
October 24 2015 DATE
Larry Page PERSON
Alphabet On ORG
December 3 2019 DATE
Alphabet ORG
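
A quick tally makes the effect of punctuation stripping easier to see. The sketch below hard-codes the labels in the order they were printed in the two runs above, so it runs without the model. The variable names are introduced here for illustration only.

```python
from collections import Counter

# Labels copied in order from the two printed entity lists above.
labels_original = (
    "ORG DATE PERSON PERSON WORK_OF_ART ORG GPE PERCENT PERCENT ORG DATE DATE "
    "ORG ORG ORG ORG PERSON ORG DATE PERSON GPE DATE PERSON GPE"
).split()
labels_cleaned = (
    "ORG DATE PERSON PERSON WORK_OF_ART ORG GPE CARDINAL CARDINAL ORG DATE DATE "
    "ORG ORG PERSON ORG DATE PERSON ORG DATE ORG"
).split()

before, after = Counter(labels_original), Counter(labels_cleaned)

# Print per-label counts for the original text versus the cleaned text.
for label in sorted(set(before) | set(after)):
    print(f"{label:12} {before[label]:2} -> {after[label]:2}")

print("total", len(labels_original), "->", len(labels_cleaned))
```

In this run the cleaned text yields 21 entities instead of 24. Both PERCENT entities become CARDINAL, the standalone "Pichai" mention is no longer found, and entities are merged across the missing sentence boundaries ("Alphabet Inc Google", "Alphabet On"). The counts alone do not say which run is more accurate; the original run also mislabels "Alphabet" as GPE twice.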


In [11]:
html = displacy.render(spacy_doc_clean, style='ent', jupyter=False)
display(HTML(html))