# Part of Speech (PoS) Tagging

- PoS tagging is the process of assigning a part of speech to each word in a text.
- Parts of speech include nouns, verbs, adverbs, adjectives, pronouns, and conjunctions, along with their sub-categories.


In [3]:
import spacy

- spaCy provides separate models for different languages. The default model for English is en_core_web_sm. Because the models are large, they are installed separately from spaCy itself; bundling every language into one package would make the download too big.

- Load the English pipeline: tokenizer, tagger, parser, and NER

In [4]:
nlp = spacy.load("en_core_web_sm")

In [7]:
doc = nlp("google flew to mars yesterday. He carried burger with him")

- Each token (an instance of the Token class) exposes two tag attributes:

1. .tag_ holds the fine-grained tag.
2. .pos_ holds the coarse-grained tag, which is a reduced version of the fine-grained tags.
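
To make "reduced version" concrete, the sketch below groups a few fine-grained tags under a coarse-grained tag. The mapping is hand-written for illustration, based on the Penn Treebank style tag descriptions that spacy.explain() prints later in this notebook; it is not spaCy's internal table, and the name FINE_TO_COARSE is hypothetical.

```python
from collections import defaultdict

# Hand-written, partial mapping for illustration only (not spaCy's own table).
# In a real pipeline the coarse tag can also depend on context.
FINE_TO_COARSE = {
    "NN": "NOUN",    # noun, singular or mass
    "NNS": "NOUN",   # noun, plural
    "NNP": "PROPN",  # noun, proper singular
    "VBD": "VERB",   # verb, past tense
    "VBN": "VERB",   # verb, past participle
    "CD": "NUM",     # cardinal number
}

# Invert the mapping to see which fine-grained tags share a coarse tag.
coarse_to_fine = defaultdict(list)
for fine, coarse in FINE_TO_COARSE.items():
    coarse_to_fine[coarse].append(fine)

for coarse, fines in sorted(coarse_to_fine.items()):
    print(coarse, "<-", ", ".join(fines))
```

Several fine-grained tags collapse into one coarse tag, which is why .pos_ is usually the more convenient attribute for simple filtering and counting.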

In [8]:
for i in doc:
    print(i, "-", i.pos_)

google - PROPN
flew - VERB
to - ADP
mars - NOUN
yesterday - NOUN
. - PUNCT
He - PRON
carried - VERB
burger - NOUN
with - ADP
him - PRON


In [9]:
for i in doc:
    print(i, "-", spacy.explain(i.pos_))

google - proper noun
flew - verb
to - adposition
mars - noun
yesterday - noun
. - punctuation
He - pronoun
carried - verb
burger - noun
with - adposition
him - pronoun


- spacy.explain() returns a short description of a given POS tag. The next cell prints it alongside the tag itself.

In [10]:
for i in doc:
    print(i, "-",i.pos_, "-", spacy.explain(i.pos_))

google - PROPN - proper noun
flew - VERB - verb
to - ADP - adposition
mars - NOUN - noun
yesterday - NOUN - noun
. - PUNCT - punctuation
He - PRON - pronoun
carried - VERB - verb
burger - NOUN - noun
with - ADP - adposition
him - PRON - pronoun


In [11]:
for i in doc:
    # .pos (no trailing underscore) is the tag's integer ID
    print(i, "-", i.pos)

google - 96
flew - 100
to - 85
mars - 92
yesterday - 92
. - 97
He - 95
carried - 100
burger - 92
with - 85
him - 95


In [19]:
doc1 = nlp("Wow ! Dr.Strange movie has earned $455 billion dollar in ten days")

In [20]:
for i in doc1:
    print(i, "-", i.pos_)

Wow - INTJ
! - PUNCT
Dr. - PROPN
Strange - PROPN
movie - NOUN
has - AUX
earned - VERB
$ - SYM
455 - NUM
billion - NUM
dollar - NOUN
in - ADP
ten - NUM
days - NOUN


In [21]:
for i in doc1:
    # .pos_ is the coarse-grained tag; the description is for the fine-grained .tag_
    print(i, "-", i.pos_, "-", spacy.explain(i.tag_))

Wow - INTJ - interjection
! - PUNCT - punctuation mark, sentence closer
Dr. - PROPN - noun, proper singular
Strange - PROPN - noun, proper singular
movie - NOUN - noun, singular or mass
has - AUX - verb, 3rd person singular present
earned - VERB - verb, past participle
$ - SYM - symbol, currency
455 - NUM - cardinal number
billion - NUM - cardinal number
dollar - NOUN - noun, singular or mass
in - ADP - conjunction, subordinating or preposition
ten - NUM - cardinal number
days - NOUN - noun, plural


In [5]:
sent = "One of the essential parts of spaCy is its ability to create and use custom models for specific NLP tasks, such as named entity recognition or part-of-speech tagging. "

In [6]:
doc = nlp(sent)

In [25]:
final = []
for i in doc:
    if i.pos_ == 'VERB' or i.pos_ == 'ADV' or i.pos_ == 'ADJ':
        final.append(i)

In [26]:
final

[essential, create, use, specific, such, named]

- Another way: test membership in a list of tags

In [29]:
final = []
for i in doc:
    if i.pos_ in ['VERB', 'ADV', 'ADJ']:
        final.append(i)

In [30]:
final

[essential, create, use, specific, such, named]

- The opposite: keep every token that is not a verb, adverb, or adjective

In [31]:
final = []
for i in doc:
    if i.pos_ not in ['VERB', 'ADV', 'ADJ']:
        final.append(i)

In [32]:
final

[One,
 of,
 the,
 parts,
 of,
 spaCy,
 is,
 its,
 ability,
 to,
 and,
 custom,
 models,
 for,
 NLP,
 tasks,
 ,,
 as,
 entity,
 recognition,
 or,
 part,
 -,
 of,
 -,
 speech,
 tagging,
 .]
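
The filtering cells above can be folded into one small helper. This is a minimal sketch, not part of spaCy: filter_by_pos and the Tok stand-in are hypothetical names, and the sample pairs are copied from the earlier outputs so that the block runs without the model installed.

```python
from collections import namedtuple

# Stand-in for a spaCy token: only the two attributes used below.
Tok = namedtuple("Tok", ["text", "pos_"])


def filter_by_pos(tokens, wanted, invert=False):
    """Return tokens whose coarse-grained tag is in `wanted`.

    With invert=True, return the tokens whose tag is NOT in `wanted`.
    """
    wanted = set(wanted)  # set membership checks are fast
    return [tok for tok in tokens if (tok.pos_ in wanted) != invert]


# (token, tag) pairs copied from the outputs earlier in this notebook.
sample = [
    Tok("He", "PRON"),
    Tok("carried", "VERB"),
    Tok("burger", "NOUN"),
    Tok("with", "ADP"),
    Tok("him", "PRON"),
]

kept = filter_by_pos(sample, {"VERB", "ADV", "ADJ"})
dropped = filter_by_pos(sample, {"VERB", "ADV", "ADJ"}, invert=True)
print([tok.text for tok in kept])
print([tok.text for tok in dropped])
```

Because the helper only reads the pos_ attribute, passing the doc object in place of sample should behave like the loops above, assuming the model is loaded.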

- Create a dictionary that counts the tokens for each POS tag (keyed by the tag's integer ID)

In [34]:
count = doc.count_by(spacy.attrs.POS)

In [35]:
count

{93: 1,
 85: 5,
 90: 1,
 84: 3,
 92: 10,
 96: 2,
 87: 1,
 95: 1,
 94: 1,
 100: 3,
 89: 2,
 97: 4}

- To get the name of a POS tag from its integer ID, look the ID up in the vocabulary

In [36]:
doc.vocab[98].text

'SCONJ'

In [37]:
for i, j in count.items():
    print((doc.vocab[i].text), ":", j)

NUM : 1
ADP : 5
DET : 1
ADJ : 3
NOUN : 10
PROPN : 2
AUX : 1
PRON : 1
PART : 1
VERB : 3
CCONJ : 2
PUNCT : 4


- Print the description of each POS tag instead of its name

In [38]:
for i, j in count.items():
    print(spacy.explain(doc.vocab[i].text), ":", j)

numeral : 1
adposition : 5
determiner : 1
adjective : 3
noun : 10
proper noun : 2
auxiliary : 1
pronoun : 1
particle : 1
verb : 3
coordinating conjunction : 2
punctuation : 4
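
If the integer IDs are not needed, one alternative is to count the string tags directly with collections.Counter, which skips the vocabulary lookup. This is a sketch under the same assumptions as before: count_pos and Tok are hypothetical names, and the sample pairs are copied from an earlier output so the block runs without the model.

```python
from collections import Counter, namedtuple

# Stand-in for a spaCy token: only the two attributes used below.
Tok = namedtuple("Tok", ["text", "pos_"])


def count_pos(tokens):
    """Count tokens per coarse-grained tag, keyed by the tag name."""
    return Counter(tok.pos_ for tok in tokens)


# (token, tag) pairs copied from an earlier output in this notebook.
sample = [
    Tok("He", "PRON"),
    Tok("carried", "VERB"),
    Tok("burger", "NOUN"),
    Tok("with", "ADP"),
    Tok("him", "PRON"),
]

counts = count_pos(sample)
# most_common() lists the tags from most to least frequent.
for tag, n in counts.most_common():
    print(tag, ":", n)
```

Calling count_pos(doc) on a processed document should give the same totals as doc.count_by(spacy.attrs.POS), with tag names as keys instead of integer IDs.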
