
# Part of Speech Tagging (POS)

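- The same word can belong to different parts of speech depending on how it is used.
- The same words in a different order can carry a different meaning.
- Linguistic knowledge, such as part-of-speech tags, adds useful information beyond the raw text.
- spaCy annotates each token in a `Doc` with a coarse part-of-speech tag (`pos_`) and a fine-grained tag (`tag_`).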

In [1]:
import spacy



In [2]:
nlp = spacy.load('en_core_web_sm')

In [3]:
doc = nlp(u"The quick brown fox jumped over the lazy dog's back.")

In [4]:
print(doc.text)

The quick brown fox jumped over the lazy dog's back.


In [7]:
print(doc[4], doc[4].tag_)

jumped VBD


In [9]:
print(doc[4].pos_)

VERB


In [11]:
for token in doc:
  print(f"{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_):{10}}")

The        DET        DT         determiner
quick      ADJ        JJ         adjective (English), other noun-modifier (Chinese)
brown      ADJ        JJ         adjective (English), other noun-modifier (Chinese)
fox        NOUN       NN         noun, singular or mass
jumped     VERB       VBD        verb, past tense
over       ADP        IN         conjunction, subordinating or preposition
the        DET        DT         determiner
lazy       ADJ        JJ         adjective (English), other noun-modifier (Chinese)
dog        NOUN       NN         noun, singular or mass
's         PART       POS        possessive ending
back       ADV        RB         adverb    
.          PUNCT      .          punctuation mark, sentence closer
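The two tag columns above differ in granularity: `pos_` holds the coarse part of speech, while `tag_` holds the fine-grained tag. As a rough sketch of how the two relate, the snippet below groups the fine-grained tags under each coarse label. It uses rows copied from the output above as plain tuples so that it runs without a model; with spaCy the pairs would come from `token.pos_` and `token.tag_`. The helper name `group_tags` is made up for this illustration.

```python
from collections import defaultdict

# (text, pos_, tag_) rows copied from the output above.
ROWS = [
    ("The", "DET", "DT"),
    ("quick", "ADJ", "JJ"),
    ("brown", "ADJ", "JJ"),
    ("fox", "NOUN", "NN"),
    ("jumped", "VERB", "VBD"),
    ("over", "ADP", "IN"),
    ("the", "DET", "DT"),
    ("lazy", "ADJ", "JJ"),
    ("dog", "NOUN", "NN"),
    ("'s", "PART", "POS"),
    ("back", "ADV", "RB"),
    (".", "PUNCT", "."),
]


def group_tags(rows):
    """Map each coarse POS label to the sorted fine-grained tags seen with it."""
    groups = defaultdict(set)
    for _text, pos, tag in rows:
        groups[pos].add(tag)
    return {pos: sorted(tags) for pos, tags in groups.items()}


tag_groups = group_tags(ROWS)
for pos, tags in sorted(tag_groups.items()):
    print(f"{pos:{10}} {', '.join(tags)}")
```

In this one sentence every coarse label maps to a single fine-grained tag; on longer text a label such as `VERB` would typically collect several.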


## spaCy uses the surrounding words to determine a token's POS tag

In [12]:
doc = nlp(u"I read books on NLP.")

In [13]:
word = doc[1]

In [14]:
word.text

'read'

In [15]:
token = word
print(f"{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_):{10}}")

read       VERB       VBP        verb, non-3rd person singular present
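The same f-string is typed out for every token inspected in this notebook, so a small helper may be convenient. The sketch below is one possible version; `format_token_row` is a hypothetical name, not part of spaCy. It also guards against `spacy.explain` returning `None` for a label it does not know, since `None` cannot be formatted with a width. The values are copied from the output above so the snippet runs without a model.

```python
def format_token_row(text, pos, tag, explanation, width=10):
    """Return one aligned row; a missing explanation becomes an empty string."""
    # spacy.explain returns None for labels it does not know, and None
    # cannot be formatted with a width, so fall back to "".
    explanation = explanation or ""
    return f"{text:{width}} {pos:{width}} {tag:{width}} {explanation}"


# Values copied from the output above; with spaCy they would come from
# token.text, token.pos_, token.tag_ and spacy.explain(token.tag_).
row = format_token_row("read", "VERB", "VBP", "verb, non-3rd person singular present")
print(row)

# A missing explanation no longer raises a TypeError.
print(format_token_row("read", "VERB", "VBP", None))
```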


In [22]:
doc = nlp(u"I did read a book on NLP.")

In [23]:
word = doc[1]  # index 1 is now the auxiliary "did"; "read" has moved to index 2

In [24]:
token = word
print(f"{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_):{10}}")

did        AUX        VBD        verb, past tense


# POS Counts

In [25]:
doc = nlp(u"The quick brown fox jumped over the lazy dog's back.")

In [26]:
POS_counts = doc.count_by(spacy.attrs.POS)  # maps integer POS IDs to their frequencies

In [27]:
POS_counts

{90: 2, 84: 3, 92: 2, 100: 1, 85: 1, 94: 1, 86: 1, 97: 1}

In [30]:
doc.vocab[100].text  # look up the string label for ID 100

'VERB'

In [35]:
for k, v in sorted(POS_counts.items()):
  print(f"{k:{5}}. {doc.vocab[k].text:{10}} {v:{10}}")

   84. ADJ                 3
   85. ADP                 1
   86. ADV                 1
   90. DET                 2
   92. NOUN                2
   94. PART                1
   97. PUNCT               1
  100. VERB                1
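When the integer IDs are not needed, the same tally can be keyed by the string labels directly. The following is a minimal sketch using `collections.Counter`; the `Token` namedtuple is only a stand-in for spaCy tokens so that the snippet runs without a model, and `count_labels` is a name invented here. With spaCy, passing `doc` in place of `tokens` should work the same way, since each token exposes `pos_`.

```python
from collections import Counter, namedtuple

# Stand-in for a spaCy token; only the attributes used below are modelled.
Token = namedtuple("Token", ["text", "pos_"])

# Tokens and POS labels copied from the tagging output earlier in the notebook.
tokens = [
    Token("The", "DET"), Token("quick", "ADJ"), Token("brown", "ADJ"),
    Token("fox", "NOUN"), Token("jumped", "VERB"), Token("over", "ADP"),
    Token("the", "DET"), Token("lazy", "ADJ"), Token("dog", "NOUN"),
    Token("'s", "PART"), Token("back", "ADV"), Token(".", "PUNCT"),
]


def count_labels(tokens, attr="pos_"):
    """Count how often each string label occurs for the given attribute."""
    return Counter(getattr(token, attr) for token in tokens)


pos_label_counts = count_labels(tokens)
for label, count in sorted(pos_label_counts.items()):
    print(f"{label:{10}} {count:{10}}")
```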


In [37]:
# By Tags
TAG_counts = doc.count_by(spacy.attrs.TAG)
for k, v in sorted(TAG_counts.items()):
  print(f"{k:{10}}. {doc.vocab[k].text:{10}} {v:{10}}")

        74. POS                 1
164681854541413346. RB                  1
1292078113972184607. IN                  1
10554686591937588953. JJ                  3
12646065887601541794. .                   1
15267657372422890137. DT                  2
15308085513773655218. NN                  2
17109001835818727656. VBD                 1
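The rows above are ordered by their integer keys, most of which are hash values, so the order carries no meaning. Sorting by frequency is often more useful. The sketch below assumes the counts have already been converted to string labels (the values are copied from the output above); `by_frequency` is a hypothetical helper.

```python
# Tag labels and counts copied from the output above.
tag_counts = {"POS": 1, "RB": 1, "IN": 1, "JJ": 3, ".": 1, "DT": 2, "NN": 2, "VBD": 1}


def by_frequency(counts):
    """Sort (label, count) pairs by descending count, then by label."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranked = by_frequency(tag_counts)
for label, count in ranked:
    print(f"{label:{10}} {count:{10}}")
```

Breaking ties by label keeps the output stable from run to run.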


In [38]:
# By Syntactic Dependency
DEP_counts = doc.count_by(spacy.attrs.DEP)
for k, v in sorted(DEP_counts.items()):
  print(f"{k:{10}}. {doc.vocab[k].text:{10}} {v:{10}}")

       402. amod                3
       415. det                 2
       429. nsubj               1
       439. pobj                1
       440. poss                1
       443. prep                1
       445. punct               1
8110129090154140942. case                1
8206900633647566924. ROOT                1


# Visualizing Parts of Speech

In [39]:
from spacy import displacy

In [40]:
displacy.render(doc, style='dep', jupyter=True)
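With `jupyter=True` the parse is drawn inline in the notebook. Outside a notebook, `displacy.render` returns the markup as a string instead, which can be written to a file. The sketch below shows only the file-writing step and uses placeholder markup so that it runs without a model; `save_svg` and the file name are assumptions made for illustration.

```python
from pathlib import Path
import tempfile


def save_svg(markup, path):
    """Write rendered markup to a file and return the path."""
    path = Path(path)
    path.write_text(markup, encoding="utf-8")
    return path


# Placeholder standing in for the real rendering call:
#   markup = displacy.render(doc, style="dep", jupyter=False)
markup = '<svg xmlns="http://www.w3.org/2000/svg"></svg>'

out_dir = Path(tempfile.mkdtemp())
svg_path = save_svg(markup, out_dir / "dependency_parse.svg")
print(svg_path.name)
```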