Installing needed libraries

In [1]:
!pip install spacy 



In [1]:
import spacy

In [2]:
print(spacy.__version__)

3.2.0


In [3]:
!python -m spacy download en_core_web_sm 

[+] Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')


In [3]:
nlp = spacy.load('en_core_web_sm')

In [24]:
doc = nlp("He went to play football")

In [5]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

### **POS Tagging**

In [6]:
for token in doc:
    print(token.text," ", token.pos_)

He   PRON
went   VERB
to   PART
play   VERB
football   NOUN


In [7]:
spacy.explain("PART")

'particle'

### **Dependency Parsing**

In [8]:
for token in doc:
    print(token.text, " ", token.dep_)

He   nsubj
went   ROOT
to   aux
play   advcl
football   dobj


In [9]:
spacy.explain("nsubj"), spacy.explain("aux")

('nominal subject', 'auxiliary')
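
The dependency labels describe edges of a tree: every token points at a head, and the root points at itself. A minimal plain-Python sketch of how those edges form a tree is shown below. The labels are copied from the output above; the head column is an assumption made for illustration, since the notebook does not print `token.head`.

```python
from collections import defaultdict

# (token, dependency label, head token).
# Labels come from the printed output; heads are assumed for illustration.
parsed = [
    ("He", "nsubj", "went"),
    ("went", "ROOT", "went"),
    ("to", "aux", "play"),
    ("play", "advcl", "went"),
    ("football", "dobj", "play"),
]

# Map each head to the tokens that depend on it
children = defaultdict(list)
root = None
for text, dep, head in parsed:
    if dep == "ROOT":
        root = text
    else:
        children[head].append((text, dep))


def render(token, dep="ROOT", depth=0):
    """Return the subtree under `token` as a list of indented lines."""
    lines = [f"{'  ' * depth}{token} ({dep})"]
    for child, child_dep in children[token]:
        lines.extend(render(child, child_dep, depth + 1))
    return lines


tree_lines = render(root)
print("\n".join(tree_lines))
```

With a real `Doc`, the same information is available as `token.head` and `token.children`.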

___
## Fine-grained POS Tag Examples
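The table below gives an example (shown in **bold**) for a selection of fine-grained tags, which are available through `token.tag_`. Punctuation and rarely used tags are omitted. The POS column follows an older spaCy tag map, so current models may assign a different coarse tag to some rows (for example, `MD` is typically tagged `AUX` and `CC` is tagged `CCONJ`):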
<table>
<tr><th>POS</th><th>TAG</th><th>DESCRIPTION</th><th>EXAMPLE</th></tr>
<tr><td>ADJ</td><td>AFX</td><td>affix</td><td>The Flintstones were a **pre**-historic family.</td></tr>
<tr><td>ADJ</td><td>JJ</td><td>adjective</td><td>This is a **good** sentence.</td></tr>
<tr><td>ADJ</td><td>JJR</td><td>adjective, comparative</td><td>This is a **better** sentence.</td></tr>
<tr><td>ADJ</td><td>JJS</td><td>adjective, superlative</td><td>This is the **best** sentence.</td></tr>
<tr><td>ADJ</td><td>PDT</td><td>predeterminer</td><td>Waking up is **half** the battle.</td></tr>
<tr><td>ADJ</td><td>PRP\$</td><td>pronoun, possessive</td><td>**His** arm hurts.</td></tr>
<tr><td>ADJ</td><td>WDT</td><td>wh-determiner</td><td>It's blue, **which** is odd.</td></tr>
<tr><td>ADJ</td><td>WP\$</td><td>wh-pronoun, possessive</td><td>We don't know **whose** it is.</td></tr>
<tr><td>ADP</td><td>IN</td><td>conjunction, subordinating or preposition</td><td>It arrived **in** a box.</td></tr>
<tr><td>ADV</td><td>EX</td><td>existential there</td><td>**There** is cake.</td></tr>
<tr><td>ADV</td><td>RB</td><td>adverb</td><td>He ran **quickly**.</td></tr>
<tr><td>ADV</td><td>RBR</td><td>adverb, comparative</td><td>He ran **quicker**.</td></tr>
<tr><td>ADV</td><td>RBS</td><td>adverb, superlative</td><td>He ran **fastest**.</td></tr>
<tr><td>ADV</td><td>WRB</td><td>wh-adverb</td><td>**When** was that?</td></tr>
<tr><td>CONJ</td><td>CC</td><td>conjunction, coordinating</td><td>The balloon popped **and** everyone jumped.</td></tr>
<tr><td>DET</td><td>DT</td><td>determiner</td><td>**This** is **a** sentence.</td></tr>
<tr><td>INTJ</td><td>UH</td><td>interjection</td><td>**Um**, I don't know.</td></tr>
<tr><td>NOUN</td><td>NN</td><td>noun, singular or mass</td><td>This is a **sentence**.</td></tr>
<tr><td>NOUN</td><td>NNS</td><td>noun, plural</td><td>These are **words**.</td></tr>
<tr><td>NOUN</td><td>WP</td><td>wh-pronoun, personal</td><td>**Who** was that?</td></tr>
<tr><td>NUM</td><td>CD</td><td>cardinal number</td><td>I want **three** things.</td></tr>
<tr><td>PART</td><td>POS</td><td>possessive ending</td><td>Fred**'s** name is short.</td></tr>
<tr><td>PART</td><td>RP</td><td>adverb, particle</td><td>Put it **back**!</td></tr>
<tr><td>PART</td><td>TO</td><td>infinitival to</td><td>I want **to** go.</td></tr>
<tr><td>PRON</td><td>PRP</td><td>pronoun, personal</td><td>**I** want **you** to go.</td></tr>
<tr><td>PROPN</td><td>NNP</td><td>noun, proper singular</td><td>**Kilroy** was here.</td></tr>
<tr><td>PROPN</td><td>NNPS</td><td>noun, proper plural</td><td>The **Flintstones** were a pre-historic family.</td></tr>
<tr><td>VERB</td><td>MD</td><td>verb, modal auxiliary</td><td>This **could** work.</td></tr>
<tr><td>VERB</td><td>VB</td><td>verb, base form</td><td>I want to **go**.</td></tr>
<tr><td>VERB</td><td>VBD</td><td>verb, past tense</td><td>This **was** a sentence.</td></tr>
<tr><td>VERB</td><td>VBG</td><td>verb, gerund or present participle</td><td>I am **going**.</td></tr>
<tr><td>VERB</td><td>VBN</td><td>verb, past participle</td><td>The treasure was **lost**.</td></tr>
<tr><td>VERB</td><td>VBP</td><td>verb, non-3rd person singular present</td><td>I **want** to go.</td></tr>
<tr><td>VERB</td><td>VBZ</td><td>verb, 3rd person singular present</td><td>He **wants** to go.</td></tr>
</table>

In [25]:
# Render the dependency parse immediately inside Jupyter:
from spacy import displacy
displacy.render(doc, style='dep', jupyter=True, options={'distance': 110})

## **Named Entity Recognition**

In [6]:
text = "Sherlock Holmes is a famous detective"

In [7]:
doc = nlp(text)
 
for ent in doc.ents:
    print(ent.text, ent.label_)
    print(ent.text, ent.start_char, ent.end_char,
    ent.label_, spacy.explain(ent.label_))

Sherlock Holmes PERSON
Sherlock Holmes 0 15 PERSON People, including fictional
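
The `start_char` and `end_char` values printed above are character offsets into the original string, so slicing the text with them should give back the entity text. A minimal check in plain Python, using the values from the output above:

```python
text = "Sherlock Holmes is a famous detective"

# Offsets copied from the printed output: start_char=0, end_char=15
start_char, end_char = 0, 15

# end_char is exclusive, like the upper bound of a Python slice
entity_text = text[start_char:end_char]
print(entity_text)
```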


**Visualising results using displacy**

In [12]:
article = "The university was founded in 1885 by Leland and Jane Stanford in memory of \
their only child, Leland Stanford Jr., who had died of typhoid fever at age 15 the previous \
year. Stanford was a former Governor of California and U.S. Senator; he made his fortune as a railroad tycoon. \
The school admitted its first students on October 1, 1891,[2][3] as a coeducational and non-denominational institution."

In [13]:
document = nlp(article)
print('Original Sentence: %s' % (article))
for element in document.ents:
    print('Type: %s, Value: %s' % (element.label_, element))

Original Sentence: The university was founded in 1885 by Leland and Jane Stanford in memory of their only child, Leland Stanford Jr., who had died of typhoid fever at age 15 the previous year. Stanford was a former Governor of California and U.S. Senator; he made his fortune as a railroad tycoon. The school admitted its first students on October 1, 1891,[2][3] as a coeducational and non-denominational institution.
Type: DATE, Value: 1885
Type: GPE, Value: Leland
Type: PERSON, Value: Jane Stanford
Type: PERSON, Value: Leland Stanford Jr.
Type: DATE, Value: age 15
Type: DATE, Value: the previous year
Type: ORG, Value: Stanford
Type: GPE, Value: California
Type: GPE, Value: U.S.
Type: ORDINAL, Value: first
Type: DATE, Value: October 1
Type: DATE, Value: 1891,[2][3


In [14]:
displacy.render(document, style='ent', jupyter=True)

In [16]:
options = {'ents': ['ORG', 'GPE']}

displacy.render(document, style='ent', jupyter=True, options=options)
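
The `ents` option only restricts which labels displacy highlights. The same filter can be sketched in plain Python. The `(label, text)` pairs below are copied from the printed output above, and `wanted` mirrors the `options` dictionary.

```python
# (label, text) pairs copied from the entity listing printed earlier
entities = [
    ("DATE", "1885"), ("GPE", "Leland"), ("PERSON", "Jane Stanford"),
    ("PERSON", "Leland Stanford Jr."), ("DATE", "age 15"),
    ("DATE", "the previous year"), ("ORG", "Stanford"),
    ("GPE", "California"), ("GPE", "U.S."), ("ORDINAL", "first"),
    ("DATE", "October 1"), ("DATE", "1891,[2][3"),
]

# Keep only the labels we asked displacy to show
wanted = {"ORG", "GPE"}
selected = [(label, text) for label, text in entities if label in wanted]

for label, text in selected:
    print(label, text)
```

On a real `Doc` the equivalent would be `[ent for ent in document.ents if ent.label_ in wanted]`.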

## NER Tags
Tags are accessible through the `.label_` property of an entity.
<table>
<tr><th>TYPE</th><th>DESCRIPTION</th><th>EXAMPLE</th></tr>
<tr><td>`PERSON`</td><td>People, including fictional.</td><td>*Fred Flintstone*</td></tr>
<tr><td>`NORP`</td><td>Nationalities or religious or political groups.</td><td>*The Republican Party*</td></tr>
<tr><td>`FAC`</td><td>Buildings, airports, highways, bridges, etc.</td><td>*Logan International Airport, The Golden Gate*</td></tr>
<tr><td>`ORG`</td><td>Companies, agencies, institutions, etc.</td><td>*Microsoft, FBI, MIT*</td></tr>
<tr><td>`GPE`</td><td>Countries, cities, states.</td><td>*France, UAR, Chicago, Idaho*</td></tr>
<tr><td>`LOC`</td><td>Non-GPE locations, mountain ranges, bodies of water.</td><td>*Europe, Nile River, Midwest*</td></tr>
<tr><td>`PRODUCT`</td><td>Objects, vehicles, foods, etc. (Not services.)</td><td>*Formula 1*</td></tr>
<tr><td>`EVENT`</td><td>Named hurricanes, battles, wars, sports events, etc.</td><td>*Olympic Games*</td></tr>
<tr><td>`WORK_OF_ART`</td><td>Titles of books, songs, etc.</td><td>*The Mona Lisa*</td></tr>
<tr><td>`LAW`</td><td>Named documents made into laws.</td><td>*Roe v. Wade*</td></tr>
<tr><td>`LANGUAGE`</td><td>Any named language.</td><td>*English*</td></tr>
<tr><td>`DATE`</td><td>Absolute or relative dates or periods.</td><td>*20 July 1969*</td></tr>
<tr><td>`TIME`</td><td>Times smaller than a day.</td><td>*Four hours*</td></tr>
<tr><td>`PERCENT`</td><td>Percentage, including "%".</td><td>*Eighty percent*</td></tr>
<tr><td>`MONEY`</td><td>Monetary values, including unit.</td><td>*Twenty Cents*</td></tr>
<tr><td>`QUANTITY`</td><td>Measurements, as of weight or distance.</td><td>*Several kilometers, 55kg*</td></tr>
<tr><td>`ORDINAL`</td><td>"first", "second", etc.</td><td>*9th, Ninth*</td></tr>
<tr><td>`CARDINAL`</td><td>Numerals that do not fall under another type.</td><td>*2, Two, Fifty-two*</td></tr>
</table>

___
## Adding a Named Entity to a Span
Normally spaCy learns named entities by training on many annotated samples of text.<br>Here we only want to add a single entity by hand. First, check which entities the pretrained model finds:

In [20]:
doc = nlp(u'Tesla to build a U.K. factory for $6 million')

if doc.ents:
    for ent in doc.ents:
        print(ent.text+' - '+ent.label_+' - '+str(spacy.explain(ent.label_)))
else:
    print('No named entities found.')

U.K. - GPE - Countries, cities, states
$6 million - MONEY - Monetary values, including unit


In [21]:
from spacy.tokens import Span

# Get the hash value of the ORG entity label
ORG = doc.vocab.strings[u'ORG']  

# Create a Span for the new entity
new_ent = Span(doc, 0, 1, label=ORG)

# Add the entity to the existing Doc object
doc.ents = list(doc.ents) + [new_ent]

In [22]:
if doc.ents:
    for ent in doc.ents:
        print(ent.text+' - '+ent.label_+' - '+str(spacy.explain(ent.label_)))
else:
    print('No named entities found.')

Tesla - ORG - Companies, agencies, institutions, etc.
U.K. - GPE - Countries, cities, states
$6 million - MONEY - Monetary values, including unit


### **Custom Named Entity Recognition**

In [15]:
DATA = [
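  # Entity offsets are (start_char, end_char, label); end_char is exclusive
  # and the span must not include surrounding whitespace.
  ("Search Analytics: Business Value & BigData NoSQL Backend, Otis Gospodnetic ", {'entities': [ (58,74,'PERSON') ] }),
  ("Introduction to Elasticsearch by Radu ", {'entities': [ (16,29,'TECH'), (33, 37, 'PERSON') ] }),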
]
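
Character offsets in hand-written training data are easy to get wrong by one character, and a span that starts or ends on whitespace will not line up with token boundaries. The sketch below is a quick sanity check in plain Python. `check_offsets` is a hypothetical helper, not part of spaCy, and it only checks bounds and whitespace, not full token alignment.

```python
def check_offsets(text, entities):
    """Return (start, end, label, snippet, ok) for each annotated span."""
    report = []
    for start, end, label in entities:
        snippet = text[start:end]
        ok = (
            0 <= start < end <= len(text)      # span lies inside the text
            and snippet == snippet.strip()     # no leading/trailing whitespace
        )
        report.append((start, end, label, snippet, ok))
    return report


text = "Introduction to Elasticsearch by Radu "
candidates = [(16, 29, "TECH"), (32, 36, "PERSON"), (33, 37, "PERSON")]

report = check_offsets(text, candidates)
for row in report:
    print(row)
```

Printing the snippet next to each label makes an off-by-one span such as `' Rad'` obvious at a glance.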

In [16]:
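# spaCy v3 API (the notebook reports spacy 3.2.0): labels are added on the
# 'ner' component and training data is wrapped in Example objects.
from spacy.training import Example

ner = nlp.get_pipe('ner')
ner.add_label('PERSON')
ner.add_label('TECH')

# resume_training() keeps the pretrained weights of the loaded pipeline
optimizer = nlp.resume_training()

for i in range(200):
    #random.shuffle(DATA)
    for text, annotations in DATA:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer)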






In [21]:
doc = nlp("Running High Performance And Fault Tolerant Elasticsearch by Radu")
for entity in doc.ents:
    print(entity.label_, ' | ', entity.text)

In [None]:
import json
with open('/content/NER/stock_market_training.json', encoding="utf8") as file:
    TRAIN_DATA = json.load(file)

In [None]:
TRAIN_DATA

{'annotations': [['The share price of PVR rose over 7 percent on Wednesday after the multiplex chain said that it has reduced losses in Q2 despite nil revenue from the core movie exhibition business.',
   {'entities': [[19, 22, 'COMPANY'],
     [33, 42, 'PERCENTAGE'],
     [46, 55, 'WEEKDAY']]}],
  ['The company managed to get rent waivers from most landlords, CFO Nitin Sood said in an interview to CNBC-TV18. “The big focus for us right now as revenues have been nil is to really reduce our fixed cost and we have managed to do that, ” he added.',
   {'entities': [[61, 64, 'ROLE'], [65, 75, 'NAME']]}],
  ['Sood further said that they have brought down the fixed cost down by almost 75-80 percent.',
   {'entities': [[0, 4, 'NAME'], [76, 89, 'PERCENTAGE']]}],
  ["The stock rose as much as 7.6 percent to the day's high of Rs 1,186.85 per share on the BSE.",
   {'entities': [[26, 37, 'PERCENTAGE'],
     [59, 70, 'PERCENTAGE'],
     [88, 91, 'COMPANY']]}],
  ['Meanwhile, for the September quar
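
The set of labels can also be derived from the annotations themselves, which is a useful cross-check against a `classes` list stored in the file. A minimal sketch in plain Python follows. It uses two of the samples shown above, and `sample` is a stand-in name for the loaded data.

```python
# Two annotated samples copied from the output above, in the same layout
sample = {
    "annotations": [
        [
            "The share price of PVR rose over 7 percent on Wednesday after the "
            "multiplex chain said that it has reduced losses in Q2 despite nil "
            "revenue from the core movie exhibition business.",
            {"entities": [[19, 22, "COMPANY"],
                          [33, 42, "PERCENTAGE"],
                          [46, 55, "WEEKDAY"]]},
        ],
        [
            "Sood further said that they have brought down the fixed cost down "
            "by almost 75-80 percent.",
            {"entities": [[0, 4, "NAME"], [76, 89, "PERCENTAGE"]]},
        ],
    ]
}

# Collect every distinct label used in the annotations
labels = sorted({
    label
    for _text, annotation in sample["annotations"]
    for _start, _end, label in annotation["entities"]
})
print(labels)
```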

**Prepare an empty model to train**

In [None]:
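# spaCy v3: add_pipe takes the component name and returns the component
nlp = spacy.blank('en')
ner = nlp.add_pipe('ner')

**Add the custom NER Tags as entities into the model**

In [None]:
for label in TRAIN_DATA["classes"]:
    ner.add_label(label)

**Training the model**

In [None]:
# initialize() replaces the deprecated begin_training() and returns the optimizer
optimizer = nlp.initialize()

In [None]:
from spacy.training import Example

for itn in range(40):
    for text, annotations in TRAIN_DATA["annotations"]:
        loss = {}
        if len(text) > 0:
            # Wrap the raw text and its annotations in an Example, as v3 requires
            example = Example.from_dict(nlp.make_doc(text), annotations)
            nlp.update([example], drop=0.3, sgd=optimizer, losses=loss)
            print("Current loss", loss)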

Current loss {'ner': 1.0543702592388791e-05}
Current loss {'ner': 0.0007745897731913514}
Current loss {'ner': 0.0044470114939623735}
Current loss {'ner': 1.634943158654324e-05}
Current loss {'ner': 5.002866962152886e-05}
Current loss {'ner': 0.2562790723688976}
Current loss {'ner': 3.7534676041222357e-06}
Current loss {'ner': 1.1076708293067133e-07}
Current loss {'ner': 0.0013167336306275509}
Current loss {'ner': 0.0007900404783257226}
Current loss {'ner': 1.4560951283629219e-05}
Current loss {'ner': 9.314342240692522e-07}
Current loss {'ner': 0.0011773366178242492}
Current loss {'ner': 5.6043441575788066e-05}
Current loss {'ner': 0.00015759252101732417}
Current loss {'ner': 0.2172696527948575}
Current loss {'ner': 2.8459472719723857e-08}
Current loss {'ner': 3.1220374595278924e-11}
Current loss {'ner': 5.241847919561377e-05}
Current loss {'ner': 7.323120826862261e-05}
Current loss {'ner': 5.855586548650994e-05}
Current loss {'ner': 0.0033469863621008993}
Current loss {'ner': 0.0006287
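
The loop above updates on one example at a time and in a fixed order, which is why the printed loss jumps around so much. A common variation is to shuffle the data each epoch and group it into small batches. The sketch below shows only the batching logic in plain Python. `make_batches` is a hypothetical helper, and spaCy ships its own `spacy.util.minibatch` for the same purpose.

```python
import random


def make_batches(items, size, seed=None):
    """Shuffle a copy of `items` and split it into lists of at most `size`."""
    rng = random.Random(seed)
    shuffled = list(items)          # leave the caller's list untouched
    rng.shuffle(shuffled)
    return [shuffled[i:i + size] for i in range(0, len(shuffled), size)]


# Stand-ins for (text, annotations) pairs
examples = [f"example {i}" for i in range(7)]

batches = make_batches(examples, size=3, seed=0)
print([len(batch) for batch in batches])
```

Each batch would then be passed to a single `nlp.update` call, and the loss reported once per epoch rather than once per sentence.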

**Testing the model**

In [None]:
test_text = "The company managed to get rent waivers from most landlords, CEO Sonu Sood said in an interview to CNBC-TV18. “The big focus for us right now as revenues have been nil is to really reduce our fixed cost and we have managed to do that, ”"
doc = nlp(test_text)
print("Entities in '%s'" % test_text)
for ent in doc.ents:
    print(ent.label_, ent.text)

Entities in 'The company managed to get rent waivers from most landlords, CEO Sonu Sood said in an interview to CNBC-TV18. “The big focus for us right now as revenues have been nil is to really reduce our fixed cost and we have managed to do that, ”'
ROLE CEO
NAME Sonu Sood


**Saving the model**

In [None]:
nlp.to_disk('/content/NER/')
print("Model Saved")


Model Saved
