## Python for NLP: Parts of Speech Tagging and Named Entity Recognition
https://stackabuse.com/python-for-nlp-parts-of-speech-tagging-and-named-entity-recognition/

## Parts of Speech (POS) Tagging

In [2]:
import spacy
sp = spacy.load('en_core_web_sm')

In [3]:
sen = sp(u"I like to play football. I hated it in my childhood though")

In [4]:
print(sen.text)

I like to play football. I hated it in my childhood though


In [6]:
print(sen[7].pos_)  # sen[7] is the token 'hated'

VERB


In [8]:
print(sen[7].tag_)

VBD


In [9]:
print(spacy.explain(sen[7].tag_))

verb, past tense


In [10]:
for word in sen:
    print(f'{word.text:{12}} {word.pos_:{10}} {word.tag_:{8}} {spacy.explain(word.tag_)}')

I            PRON       PRP      pronoun, personal
like         VERB       VBP      verb, non-3rd person singular present
to           PART       TO       infinitival to
play         VERB       VB       verb, base form
football     NOUN       NN       noun, singular or mass
.            PUNCT      .        punctuation mark, sentence closer
I            PRON       PRP      pronoun, personal
hated        VERB       VBD      verb, past tense
it           PRON       PRP      pronoun, personal
in           ADP        IN       conjunction, subordinating or preposition
my           DET        PRP$     pronoun, possessive
childhood    NOUN       NN       noun, singular or mass
though       ADP        IN       conjunction, subordinating or preposition
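The aligned columns in the loop above come from nested f-string format specs such as `{word.text:{12}}`, where the inner braces supply the field width. A minimal sketch of that mechanism in plain Python follows; the `format_row` helper and the `rows` list are illustrative assumptions (plain tuples copied from the output above), not part of spaCy.

```python
# Rows copied from the tagger output above: (text, coarse POS, fine-grained tag).
# These plain tuples are stand-ins for spaCy tokens.
rows = [
    ('I', 'PRON', 'PRP'),
    ('hated', 'VERB', 'VBD'),
    ('childhood', 'NOUN', 'NN'),
]

def format_row(text, pos, tag, widths=(12, 10, 8)):
    """Left-align each field in a fixed-width column (hypothetical helper)."""
    w_text, w_pos, w_tag = widths
    # {text:{w_text}} nests a replacement field inside the format spec,
    # so the column width is read from a variable instead of being hard-coded.
    return f'{text:{w_text}} {pos:{w_pos}} {tag:{w_tag}}'

for row in rows:
    print(format_row(*row))
```

Strings are left-aligned by default, so no explicit `<` is needed; a value longer than its width is not truncated and simply pushes the later columns to the right.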


### Why Is POS Tagging Useful?

In [12]:
sen = sp(u'Can you google it?')
word = sen[2]

print(f'{word.text:{12}} {word.pos_:{10}} {word.tag_:{8}} {spacy.explain(word.tag_)}')

google       VERB       VB       verb, base form


In [13]:
sen = sp(u'Can you search it on google?')
word = sen[5]

print(f'{word.text:{12}} {word.pos_:{10}} {word.tag_:{8}} {spacy.explain(word.tag_)}')

google       PROPN      NNP      noun, proper singular


### Finding the Number of POS Tags

In [15]:
sen = sp(u"I like to play football. I hated it in my childhood though")

num_pos = sen.count_by(spacy.attrs.POS)
num_pos

{85: 2, 90: 1, 92: 2, 94: 1, 95: 3, 97: 1, 100: 3}

In [16]:
for k,v in sorted(num_pos.items()):
    print(f'{k}. {sen.vocab[k].text:{8}}: {v}')

85. ADP     : 2
90. DET     : 1
92. NOUN    : 2
94. PART    : 1
95. PRON    : 3
97. PUNCT   : 1
100. VERB    : 3
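`count_by` returns integer attribute IDs, which is why the loop above has to look each key up in `sen.vocab`. As an alternative sketch, the string tags can be counted directly with `collections.Counter`. The `pos_tags` list below is an assumption: it is copied by hand from the per-token table earlier in this section so the example runs without a model.

```python
from collections import Counter

# Coarse POS tags in token order, copied from the per-token table earlier in
# this section. With a spaCy Doc the list would be [token.pos_ for token in sen].
pos_tags = ['PRON', 'VERB', 'PART', 'VERB', 'NOUN', 'PUNCT',
            'PRON', 'VERB', 'PRON', 'ADP', 'DET', 'NOUN', 'ADP']

pos_counts = Counter(pos_tags)

# Sort alphabetically by tag name and print in aligned columns
for tag, count in sorted(pos_counts.items()):
    print(f'{tag:{8}}: {count}')
```

Counting on the string attribute trades the compact integer keys for output that is readable without a vocabulary lookup.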


### Visualizing Parts of Speech Tags

In [19]:
from spacy import displacy

sen = sp(u"I like to play football. I hated it in my childhood though")
displacy.render(sen, style='dep', jupyter=True, options={'distance': 85})

In [20]:
displacy.serve(sen, style='dep', options={'distance': 120})

# displacy.serve blocks until interrupted; while it runs, open http://127.0.0.1:5000/ in a browser.



Using the 'dep' visualizer
Serving on http://0.0.0.0:5000 ...



127.0.0.1 - - [30/Sep/2019 16:34:19] "GET / HTTP/1.1" 200 9471
127.0.0.1 - - [30/Sep/2019 16:34:19] "GET /favicon.ico HTTP/1.1" 200 9471


Shutting down server on port 5000.


## Named Entity Recognition

In [22]:
import spacy
sp = spacy.load('en_core_web_sm')

sen = sp(u'Manchester United is looking to sign Harry Kane for $90 million')

In [23]:
print(sen.ents)

(Manchester United, Harry Kane, $90 million)


In [24]:
for entity in sen.ents:
    print(entity.text + ' - ' + entity.label_ + ' - ' + str(spacy.explain(entity.label_)))

Manchester United - ORG - Companies, agencies, institutions, etc.
Harry Kane - PERSON - People, including fictional
$90 million - MONEY - Monetary values, including unit


### Adding New Entities

In [26]:
sen = sp(u'Nesfruita is setting up a new company in India')
for entity in sen.ents:
    print(entity.text + ' - ' + entity.label_ + ' - ' + str(spacy.explain(entity.label_)))


Nesfruita - ORG - Companies, agencies, institutions, etc.
India - GPE - Countries, cities, states


In [28]:
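from spacy.tokens import Span

# Look up the integer ID of the ORG label and wrap token 0 ('Nesfruita') in a Span
ORG = sen.vocab.strings[u'ORG']
new_entity = Span(sen, 0, 1, label=ORG)
# This assignment fails when the model has already tagged token 0 as an entity:
# doc.ents does not accept spans that share a token
sen.ents = list(sen.ents) + [new_entity]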

ValueError: [E103] Trying to set conflicting doc.ents: '(0, 1, 'ORG')' and '(0, 1, 'ORG')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
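spaCy rejects entity lists in which two spans share a token (error E103), so a manually added entity has to replace any existing entity it overlaps rather than be appended next to it. The sketch below illustrates that rule with plain `(start, end, label)` tuples standing in for spaCy `Span` objects. The `overlaps` and `add_entity` helpers are hypothetical names introduced for illustration, and the token offsets assume the tokenization of the sentence above.

```python
# Entities modelled as (start_token, end_token, label) tuples; end is exclusive,
# mirroring Span.start / Span.end. These tuples stand in for real spaCy Spans.

def overlaps(a, b):
    """True when two (start, end, label) spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]

def add_entity(existing, new):
    """Return a sorted entity list containing `new`, minus anything it overlaps."""
    kept = [ent for ent in existing if not overlaps(ent, new)]
    return sorted(kept + [new])

# Assumed entities for 'Nesfruita is setting up a new company in India'
existing = [(0, 1, 'ORG'), (8, 9, 'GPE')]

updated = add_entity(existing, (0, 1, 'ORG'))
print(updated)
```

With a real `Doc`, the same filter could be applied to `sen.ents` using each entity's `start` and `end` attributes before assigning the combined list back to `sen.ents`.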

### Counting Entities

In [30]:
sen = sp(u'Manchester United is looking to sign Harry Kane for $90 million. David demand 100 Million Dollars')
for entity in sen.ents:
    print(entity.text + ' - ' + entity.label_ + ' - ' + str(spacy.explain(entity.label_)))


Manchester United - ORG - Companies, agencies, institutions, etc.
Harry Kane - PERSON - People, including fictional
$90 million - MONEY - Monetary values, including unit
David - PERSON - People, including fictional
100 Million Dollars - MONEY - Monetary values, including unit


In [31]:
len([ent for ent in sen.ents if ent.label_=='PERSON'])


2
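The list comprehension above counts a single label. To see every label at once, the entities can be grouped by label in one pass. This is a minimal sketch; the `entities` list is an assumption, copied by hand from the entity listing above so that the example runs without loading a model.

```python
from collections import defaultdict

# (text, label) pairs copied from the entity listing above; with a spaCy Doc
# they would come from [(ent.text, ent.label_) for ent in sen.ents].
entities = [
    ('Manchester United', 'ORG'),
    ('Harry Kane', 'PERSON'),
    ('$90 million', 'MONEY'),
    ('David', 'PERSON'),
    ('100 Million Dollars', 'MONEY'),
]

# Map each label to the list of entity texts that carry it
by_label = defaultdict(list)
for text, label in entities:
    by_label[label].append(text)

for label, texts in by_label.items():
    print(f'{label}: {len(texts)} -> {texts}')
```

Keeping the texts rather than only the counts makes it easy to check which mentions were assigned to each label.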

### Visualizing Named Entities

In [33]:
from spacy import displacy

sen = sp(u'Manchester United is looking to sign Harry Kane for $90 million. David demand 100 Million Dollars')
displacy.render(sen, style='ent', jupyter=True)