# Part of Speech (POS) Tagging

* POS tagging is the process of assigning a part-of-speech label to each word (token) in a text.

* Parts of speech include nouns, verbs, adverbs, adjectives, pronouns and conjunctions, along with their sub-categories.

In [1]:
import spacy

* spaCy provides separate models for different languages. The default model for the 
English language is en_core_web_sm. Because the models are quite large,
they are installed separately from the library; bundling every language into one package would make the download too large.
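
Because the model is installed as its own Python package, one way to check for it before calling spacy.load() is to ask Python whether that package can be found. The following is a minimal sketch using only the standard library. The helper name model_installed is hypothetical, and the message assumes the model is fetched with spaCy's download command.

```python
import importlib.util


def model_installed(name: str) -> bool:
    """Return True if a top-level package called `name` can be found."""
    # find_spec returns None when no package with that name is installed.
    return importlib.util.find_spec(name) is not None


model_name = "en_core_web_sm"
if model_installed(model_name):
    print(f"{model_name} is installed")
else:
    # Assumption: the model is fetched with spaCy's download command.
    print(f"{model_name} not found; try: python -m spacy download {model_name}")
```

The check only confirms that the package is present; spacy.load() is still the call that actually builds the pipeline.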

* Load the English pipeline, which includes the tokenizer, tagger, parser and NER.

In [2]:
nlp = spacy.load("en_core_web_sm")

In [3]:
doc = nlp("google flew to mars yesterday. He carried burger with him")

* Two attributes of the Token class hold the POS tags:

1. .tag_ displays a fine-grained tag.
2. .pos_ displays a coarse-grained tag, which is a reduced version of the fine-grained tags.
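
To illustrate what "reduced version" means, the sketch below collapses a handful of fine-grained tags into coarse-grained ones. The table is a hand-written sample for illustration only, not spaCy's internal mapping, and the helper name coarse is hypothetical. In spaCy the coarse tag can also depend on context, and in a live session both values would simply be read from token.tag_ and token.pos_.

```python
# Hand-written sample: fine-grained (Penn Treebank style) -> coarse-grained.
# Assumption: illustrative only, this is not spaCy's own lookup table.
FINE_TO_COARSE = {
    "NN": "NOUN",    # noun, singular or mass
    "NNS": "NOUN",   # noun, plural
    "NNP": "PROPN",  # noun, proper singular
    "VBD": "VERB",   # verb, past tense
    "VBN": "VERB",   # verb, past participle
    "CD": "NUM",     # cardinal number
}


def coarse(tag: str) -> str:
    """Map a fine-grained tag to a coarse tag, or 'X' (other) if unknown."""
    return FINE_TO_COARSE.get(tag, "X")


# Several fine-grained tags share one coarse tag.
for fine in ["NN", "NNS", "VBD", "VBN"]:
    print(fine, "-", coarse(fine))
```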

In [4]:
for i in doc:
    print(i, "-", i.pos_)

google - PROPN
flew - VERB
to - ADP
mars - NOUN
yesterday - NOUN
. - PUNCT
He - PRON
carried - VERB
burger - NOUN
with - ADP
him - PRON


In [5]:
for i in doc:
    print(i,"-", spacy.explain(i.pos_))

google - proper noun
flew - verb
to - adposition
mars - noun
yesterday - noun
. - punctuation
He - pronoun
carried - verb
burger - noun
with - adposition
him - pronoun


* spacy.explain() returns a short description of a given POS tag.

In [6]:
for i in doc:
    print(i, "-", i.pos_, "-", spacy.explain(i.pos_))

google - PROPN - proper noun
flew - VERB - verb
to - ADP - adposition
mars - NOUN - noun
yesterday - NOUN - noun
. - PUNCT - punctuation
He - PRON - pronoun
carried - VERB - verb
burger - NOUN - noun
with - ADP - adposition
him - PRON - pronoun


In [7]:
for i in doc:
    print(i, "-", i.pos)

google - 96
flew - 100
to - 85
mars - 92
yesterday - 92
. - 97
He - 95
carried - 100
burger - 92
with - 85
him - 95
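
The numbers printed above are the integer IDs behind the string tags: .pos holds the ID and .pos_ holds its name. As a rough sketch, the pairs below are copied from the outputs of cells In [4] and In [7] (for example, 96 next to PROPN for "google"). The dictionary name POS_NAMES is made up for this illustration and covers only the IDs that appear in those outputs.

```python
# ID -> name pairs read off the outputs of cells In [4] and In [7].
POS_NAMES = {85: "ADP", 92: "NOUN", 95: "PRON", 96: "PROPN", 97: "PUNCT", 100: "VERB"}

# (text, integer ID) pairs as printed by cell In [7], first sentence only.
tokens = [("google", 96), ("flew", 100), ("to", 85),
          ("mars", 92), ("yesterday", 92), (".", 97)]

for text, pos_id in tokens:
    print(text, "-", pos_id, "-", POS_NAMES[pos_id])
```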


In [8]:
doc1 = nlp("wow ! Dr.Strange movie has earned $455 billion dollar in ten days")

In [9]:
for i in doc1:
    print(i, "_", i.pos_)

wow _ INTJ
! _ PUNCT
Dr. _ PROPN
Strange _ PROPN
movie _ NOUN
has _ AUX
earned _ VERB
$ _ SYM
455 _ NUM
billion _ NUM
dollar _ NOUN
in _ ADP
ten _ NUM
days _ NOUN


In [10]:
for i in doc1:
    print(i, "-", i.pos_, "-", spacy.explain(i.tag_))

wow - INTJ - interjection
! - PUNCT - punctuation mark, sentence closer
Dr. - PROPN - noun, proper singular
Strange - PROPN - noun, proper singular
movie - NOUN - noun, singular or mass
has - AUX - verb, 3rd person singular present
earned - VERB - verb, past participle
$ - SYM - symbol, currency
455 - NUM - cardinal number
billion - NUM - cardinal number
dollar - NOUN - noun, singular or mass
in - ADP - conjunction, subordinating or preposition
ten - NUM - cardinal number
days - NOUN - noun, plural


In [11]:
sent = "One of the essential parts of spacy is its ability to create and use"

In [12]:
doc = nlp(sent)

In [13]:
# Keep only the verbs, adverbs and adjectives.
final = []
for i in doc:
    if i.pos_ in ("VERB", "ADV", "ADJ"):
        final.append(i)

In [14]:
final

[essential, create, use]

* The opposite filter: keep every token that is not a verb, adverb or adjective.

In [15]:
final = []
for i in doc:
    if i.pos_ not in ["VERB", "ADV", "ADJ"]:
        final.append(i)

In [16]:
final

[One, of, the, parts, of, spacy, is, its, ability, to, and]
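
The two filters above can also be written as list comprehensions over a set of tags. The sketch below runs on plain (text, tag) pairs, so it needs no model. The tags are an assumption, reconstructed from the filter results above and the tag counts shown later in this notebook; with a real Doc the pairs would come from (token.text, token.pos_).

```python
# (text, coarse tag) pairs for the example sentence.
# Assumption: tags reconstructed from this notebook's outputs, not produced by a model here.
tagged = [
    ("One", "NUM"), ("of", "ADP"), ("the", "DET"), ("essential", "ADJ"),
    ("parts", "NOUN"), ("of", "ADP"), ("spacy", "NOUN"), ("is", "AUX"),
    ("its", "PRON"), ("ability", "NOUN"), ("to", "PART"), ("create", "VERB"),
    ("and", "CCONJ"), ("use", "VERB"),
]

wanted = {"VERB", "ADV", "ADJ"}  # a set gives a short, fast membership test

kept = [text for text, pos in tagged if pos in wanted]
rest = [text for text, pos in tagged if pos not in wanted]

print(kept)
print(rest)
```

Every token lands in exactly one of the two lists, so together they cover the whole sentence.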

* Create a dictionary that counts how many tokens carry each POS tag. The keys are integer POS IDs.

In [17]:
count = doc.count_by(spacy.attrs.POS)

In [18]:
count

{93: 1, 85: 2, 90: 1, 84: 1, 92: 3, 87: 1, 95: 1, 94: 1, 100: 2, 89: 1}

* To get the name of a POS tag from its integer ID, look the ID up in the vocabulary:

In [19]:
doc.vocab[98].text

'SCONJ'

In [22]:
for i, j in count.items():
    print((doc.vocab[i].text), ":", j)

NUM : 1
ADP : 2
DET : 1
ADJ : 1
NOUN : 3
AUX : 1
PRON : 1
PART : 1
VERB : 2
CCONJ : 1
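
If only the tag names are needed, the counts can also be built directly from the string tags, skipping the ID lookup. A minimal sketch with collections.Counter follows. It uses hand-entered (text, tag) pairs that are assumed to match the sentence above; a real session would iterate over token.pos_ instead.

```python
from collections import Counter

# Assumption: tags entered by hand to match the counts printed above.
tagged = [
    ("One", "NUM"), ("of", "ADP"), ("the", "DET"), ("essential", "ADJ"),
    ("parts", "NOUN"), ("of", "ADP"), ("spacy", "NOUN"), ("is", "AUX"),
    ("its", "PRON"), ("ability", "NOUN"), ("to", "PART"), ("create", "VERB"),
    ("and", "CCONJ"), ("use", "VERB"),
]

# Count tag names directly; keys are strings rather than integer IDs.
counts = Counter(pos for _, pos in tagged)

# most_common() lists the most frequent tags first.
for pos, n in counts.most_common():
    print(pos, ":", n)
```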


* A description of each of the tags above:

In [24]:
for i, j in count.items():
    print(spacy.explain(doc.vocab[i].text), ":", j)

numeral : 1
adposition : 2
determiner : 1
adjective : 1
noun : 3
auxiliary : 1
pronoun : 1
particle : 1
verb : 2
coordinating conjunction : 1
