In [47]:
import spacy
nlp = spacy.load('en_core_web_sm')

In [48]:
doc = nlp(u"The quick brown fox jumped over the lazy dog's back.")

In [49]:
print(doc.text)

The quick brown fox jumped over the lazy dog's back.


In [50]:
print(doc[4])

jumped


In [51]:
print(doc[4].pos_)

VERB


## View the Token Tags

Recall that you can obtain a particular token by its index position.

To view the coarse POS tag, use `token.pos_`.

To view the fine-grained tag, use `token.tag_`.

To view the description of either type of tag, use `spacy.explain(tag)`.

In [52]:
for token in doc:
    print(f'{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_):{10}}')

The        DET        DT         determiner
quick      ADJ        JJ         adjective 
brown      ADJ        JJ         adjective 
fox        NOUN       NN         noun, singular or mass
jumped     VERB       VBD        verb, past tense
over       ADP        IN         conjunction, subordinating or preposition
the        DET        DT         determiner
lazy       ADJ        JJ         adjective 
dog        NOUN       NN         noun, singular or mass
's         PART       POS        possessive ending
back       NOUN       NN         noun, singular or mass
.          PUNCT      .          punctuation mark, sentence closer
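One caveat about the loop above: `spacy.explain()` returns `None` for a label it has no description for, and applying a width spec such as `{10}` to `None` raises a `TypeError`. Every tag in this sentence has a description, so the loop works here. A defensive pattern is to fall back to an empty string. The sketch below uses a hypothetical stand-in, `explain_stub`, backed by a few descriptions copied from the table above, so it runs without spaCy.

```python
# Hypothetical stand-in for spacy.explain(): a tiny glossary copied from the
# table above. Like spacy.explain(), it returns None for unknown labels.
GLOSSARY = {
    "DT": "determiner",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "VBD": "verb, past tense",
}


def explain_stub(label):
    return GLOSSARY.get(label)


def format_row(text, pos, tag):
    # `or ''` replaces None with an empty string before the width spec is applied
    return f"{text:{10}} {pos:{10}} {tag:{10}} {explain_stub(tag) or '':{10}}"


print(format_row("fox", "NOUN", "NN"))
print(format_row("xyz", "X", "XX"))  # unknown tag: no crash, empty description

try:
    _ = f"{None:{10}}"  # what the original loop would do with an unknown tag
except TypeError as err:
    print("TypeError:", err)
```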


## Working with POS Tags

In English, the same string of characters can have different meanings, even within the same sentence. For this reason, morphology (features such as tense and number) cannot be read off a word's spelling alone; it depends on context. spaCy uses a trained machine learning model to predict how each token is used in its sentence.
Is "I read books on NLP" present or past tense? Is "wind" a verb or a noun?
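To see why context matters, consider the simplest possible tagger: a most-frequent-tag baseline that always assigns each word the tag it carried most often in its training data. The sketch below trains one on three hypothetical hand-tagged sentences (illustration only, not spaCy's training data). Because it ignores context, it must give "read" the same tag in every sentence, which is the kind of mistake a context-aware model is meant to avoid.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged sentences using Penn Treebank tags (illustration only).
TRAINING = [
    [("She", "PRP"), ("read", "VBD"), ("it", "PRP")],
    [("They", "PRP"), ("read", "VBD"), ("papers", "NNS")],
    [("I", "PRP"), ("read", "VBP"), ("books", "NNS")],
]


def train_baseline(sentences):
    """Map each lowercased word to the tag it received most often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_words(words, model, default="NN"):
    """Tag each word independently of its neighbors; unknown words get `default`."""
    return [(w, model.get(w.lower(), default)) for w in words]


model = train_baseline(TRAINING)
print(tag_words(["I", "read", "books"], model))  # 'read' -> 'VBD'
print(tag_words(["She", "read", "it"], model))   # 'read' -> 'VBD' again
```

Even though "I read books" appears in the toy training data with `VBP`, the baseline outputs `VBD` because that tag is more frequent for "read" overall.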

In [53]:
doc = nlp(u'I read books on NLP.')

In [54]:
word = doc[1]

In [55]:
word.text

'read'

In [56]:
token = word
print(f'{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_):{10}}')

read       VERB       VBP        verb, non-3rd person singular present


In [67]:
doc = nlp(u'I am a teacher.')

In [68]:
token = doc[1]

In [69]:
print(f'{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_):{10}}')

am         VERB       VBP        verb, non-3rd person singular present


In [70]:
doc = nlp(u'I was a teacher.')
r = doc[1]

print(f'{r.text:{10}} {r.pos_:{8}} {r.tag_:{6}} {spacy.explain(r.tag_)}')

was        VERB     VBD    verb, past tense


## Counting POS Tags

The `Doc.count_by()` method accepts a token attribute ID (such as `spacy.attrs.POS`) as its argument and returns a dictionary of frequency counts for that attribute. The keys are the integer IDs of the attribute's values (for example, the ID for `NOUN`), and the values are their frequencies. Values that never occur (counts of zero) are not included.
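Conceptually, `count_by()` is a frequency count keyed by integer attribute ID. The pure-Python sketch below reproduces that behavior with `collections.Counter`. The coarse tags are copied from the token table earlier in this notebook, and the tag-to-ID mapping is an assumption taken from the integer IDs this notebook prints; real code should not hard-code them.

```python
from collections import Counter

# Coarse tags for "The quick brown fox jumped over the lazy dog's back.",
# copied from the token table earlier in this notebook.
POS_TAGS = ["DET", "ADJ", "ADJ", "NOUN", "VERB", "ADP",
            "DET", "ADJ", "NOUN", "PART", "NOUN", "PUNCT"]

# Integer IDs as printed in this notebook's count_by output (assumption).
POS_IDS = {"ADJ": 84, "ADP": 85, "DET": 90, "NOUN": 92,
           "PART": 94, "PUNCT": 97, "VERB": 100}


def count_by_sketch(tags, ids):
    """Mimic Doc.count_by: map each tag to its ID, then count occurrences.

    Counter only creates keys for values it actually sees, so zero counts
    never appear, matching the behavior described above.
    """
    return dict(Counter(ids[t] for t in tags))


print(count_by_sketch(POS_TAGS, POS_IDS))  # same counts as POS_counts, possibly in a different order
```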

In [71]:
doc = nlp(u"The quick brown fox jumped over the lazy dog's back.")

In [72]:
POS_counts = doc.count_by(spacy.attrs.POS)

In [73]:
POS_counts

{97: 1, 84: 3, 100: 1, 85: 1, 90: 2, 92: 3, 94: 1}

In [74]:
doc.vocab[97].text

'PUNCT'

In [75]:
doc.vocab[92].text

'NOUN'

In [76]:
doc[2]

brown

In [77]:
doc[2].pos

84

In [78]:
doc[2].pos_

'ADJ'

## Create a frequency list of POS tags from the entire document

Since `POS_counts` is a dictionary, `POS_counts.items()` gives us its (ID, count) pairs.

Sorting these pairs orders them by tag ID, so each line shows the ID, the tag name, and its count.

In [84]:
for k,v in sorted(POS_counts.items()):
    print(f'{k:{4}}. {doc.vocab[k].text:{8}} {v}')

  84. ADJ      3
  85. ADP      1
  90. DET      2
  92. NOUN     3
  94. PART     1
  97. PUNCT    1
 100. VERB     1
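Sorting by key orders the rows by tag ID, which is not especially meaningful. To rank tags by frequency instead, sort on the count. This sketch uses a plain dictionary with the same IDs and counts as `POS_counts` above, plus a hypothetical `ID_TO_NAME` lookup standing in for `doc.vocab[k].text`, so it runs without spaCy.

```python
# Same IDs and counts as POS_counts above (copied from this notebook's output).
pos_counts = {97: 1, 84: 3, 100: 1, 85: 1, 90: 2, 92: 3, 94: 1}

# Hypothetical stand-in for doc.vocab[k].text
ID_TO_NAME = {84: "ADJ", 85: "ADP", 90: "DET", 92: "NOUN",
              94: "PART", 97: "PUNCT", 100: "VERB"}


def by_frequency(counts):
    """Sort (id, count) pairs by descending count, breaking ties by ID."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


for k, v in by_frequency(pos_counts):
    print(f'{k:{4}}. {ID_TO_NAME[k]:{8}} {v}')
```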


In [85]:
# We can also count the fine-grained tags
TAG_counts = doc.count_by(spacy.attrs.TAG)

for k, v in sorted(TAG_counts.items()):
    print(f'{k:{4}}. {doc.vocab[k].text:{8}} {v}')

  74. POS      1
1292078113972184607. IN       1
10554686591937588953. JJ       3
12646065887601541794. .        1
15267657372422890137. DT       2
15308085513773655218. NN       3
17109001835818727656. VBD      1
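Why are most fine-grained IDs so large? spaCy stores strings in a `StringStore` that maps each string to a 64-bit hash, while a fixed table of predefined symbols (coarse POS tags such as `ADJ`, attribute names such as `POS`) keeps small integer IDs. The tag `POS` (possessive ending) is spelled like the predefined `POS` attribute, which presumably explains its small ID of 74. The sketch below mimics that two-tier lookup. It is a toy: the symbol IDs are copied from this notebook's output, and `hashlib.blake2b` stands in for spaCy's actual hash function, so the large IDs it produces will not match spaCy's.

```python
import hashlib


class ToyStringStore:
    """Two-way string <-> integer ID lookup, loosely modeled on spaCy's StringStore.

    Predefined symbols keep small fixed IDs; every other string gets a 64-bit
    hash. blake2b is a stand-in here, NOT the hash spaCy actually uses.
    """

    def __init__(self, symbols):
        self._by_str = dict(symbols)
        self._by_id = {i: s for s, i in symbols.items()}

    def add(self, string):
        if string in self._by_str:
            return self._by_str[string]
        digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
        key = int.from_bytes(digest, "little")  # always in [0, 2**64)
        self._by_str[string] = key
        self._by_id[key] = string
        return key

    def __getitem__(self, key):
        # Integer -> string (like doc.vocab[k].text); string -> ID (added if new)
        if isinstance(key, int):
            return self._by_id[key]
        return self.add(key)


# Symbol IDs copied from this notebook's output (version-specific assumption).
store = ToyStringStore({"POS": 74, "ADJ": 84, "NOUN": 92, "PUNCT": 97})

print(store.add("ADJ"))       # 84: predefined symbol
vbd_id = store.add("VBD")     # not a symbol, so it gets a 64-bit hash
print(vbd_id, store[vbd_id])  # the hash, then VBD
```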
