# Counting Part-of-Speech (POS) Tags

Let us import `Counter` from the `collections` module to count how often each POS tag occurs.

In [24]:
from collections import Counter
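As a quick illustration of what `Counter` does, here is a minimal sketch that counts a short list of tags. The list `sample_tags` is hand-written for this example and is not produced by a tagger.

```python
from collections import Counter

# Hypothetical, hand-written tag sequence used only for illustration.
sample_tags = ["DT", "NN", "VBZ", "DT", "JJ", "NN"]

# Counter tallies how many times each distinct item appears.
tag_counts = Counter(sample_tags)

# most_common(n) returns the n most frequent (tag, count) pairs.
print(tag_counts.most_common(2))

# A tag that never occurred yields 0 instead of raising a KeyError.
print(tag_counts["RB"])
```

Because a missing key returns 0, a `Counter` is convenient for comparing tag frequencies across sentences that do not share the same set of tags.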

We will split the document into individual sentences with `sent_tokenize` and then break each sentence down into words with `word_tokenize`.

In [25]:
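import nltk
from nltk.tokenize import sent_tokenize
# If NLTK raises a LookupError later on, download the data package named in the error with nltk.download(...).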

The following sample text is extracted from The Hindu Editorial page: https://www.thehindu.com/opinion/editorial/failure-of-justice/article25814414.ece

In [26]:
text = "It is unfortunate that the families of the victims do not have the consolation of anyone being brought to justice. While Sohrabuddin’s killing has ‘encounter’ as an explanation, his wife’s disappearance remains a mystery. It was not proved that she was taken to a farm, killed and her body burnt. And it cannot be a coincidence that Prajapati was killed a year later in Rajasthan in another encounter. It was under a cloud of suspicion over the circumstances of their death that Sohrabuddin’s brother had approached the Supreme Court and obtained an order for an investigation, which was subsequently handed over to the CBI."

In [27]:
tokenized_text = sent_tokenize(text)

The document is split into a list of sentences, as shown below:

In [28]:
tokenized_text

['It is unfortunate that the families of the victims do not have the consolation of anyone being brought to justice.',
 'While Sohrabuddin’s killing has ‘encounter’ as an explanation, his wife’s disappearance remains a mystery.',
 'It was not proved that she was taken to a farm, killed and her body burnt.',
 'And it cannot be a coincidence that Prajapati was killed a year later in Rajasthan in another encounter.',
 'It was under a cloud of suspicion over the circumstances of their death that Sohrabuddin’s brother had approached the Supreme Court and obtained an order for an investigation, which was subsequently handed over to the CBI.']

Run a `for` loop over every sentence in `tokenized_text` and perform the following steps:

1. Convert the sentence to lower case.
2. Tokenize the sentence into words.
3. Assign a POS tag to each word.
4. Count the POS tags in the sentence, for example the number of nouns, verbs, adjectives, etc. in the first sentence.
5. Print the counts.

In [23]:
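for sentence in tokenized_text:
    lower_case = sentence.lower()              # 1. lower-case the sentence
    tokens = nltk.word_tokenize(lower_case)    # 2. split it into word tokens
    tags = nltk.pos_tag(tokens)                # 3. list of (word, tag) pairs
    counts = Counter(tag for _, tag in tags)   # 4. count each tag
    print(counts, '\n')                        # 5. print the counts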

Counter({'IN': 3, 'DT': 3, 'NN': 3, 'NNS': 2, 'PRP': 1, 'VBZ': 1, 'JJ': 1, 'VBP': 1, 'RB': 1, 'VB': 1, 'VBG': 1, 'VBN': 1, 'TO': 1, '.': 1}) 

Counter({'NN': 7, 'NNP': 3, 'IN': 2, 'VBZ': 2, 'DT': 2, 'JJ': 1, 'VBN': 1, 'RB': 1, ',': 1, 'PRP$': 1, '.': 1}) 

Counter({'VBN': 3, 'NN': 3, 'PRP': 2, 'VBD': 2, 'RB': 1, 'IN': 1, 'TO': 1, 'DT': 1, ',': 1, 'CC': 1, 'PRP$': 1, '.': 1}) 

Counter({'NN': 5, 'DT': 3, 'IN': 3, 'RB': 2, 'CC': 1, 'PRP': 1, 'MD': 1, 'VB': 1, 'VBD': 1, 'VBN': 1, '.': 1}) 

Counter({'NN': 9, 'IN': 6, 'DT': 6, 'VBD': 4, 'VBN': 2, 'PRP': 1, 'NNS': 1, 'PRP$': 1, 'VBZ': 1, 'NNP': 1, 'JJ': 1, 'CC': 1, ',': 1, 'WDT': 1, 'RB': 1, 'RP': 1, 'TO': 1, '.': 1}) 
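Step 4 talks about the number of nouns, verbs and adjectives, whereas the output reports fine-grained tags such as `NN`, `NNS` or `VBZ`. The sketch below shows one possible way to roll those tags up into broad classes, using the counts printed for the first sentence. The helper `coarse_category` and its prefix-based grouping are assumptions made for this example, not part of NLTK.

```python
from collections import Counter

# Counts copied from the output printed for the first sentence.
first_sentence = Counter({'IN': 3, 'DT': 3, 'NN': 3, 'NNS': 2, 'PRP': 1,
                          'VBZ': 1, 'JJ': 1, 'VBP': 1, 'RB': 1, 'VB': 1,
                          'VBG': 1, 'VBN': 1, 'TO': 1, '.': 1})

def coarse_category(tag):
    """Map a fine-grained tag to a broad class by its prefix (simplified, assumed scheme)."""
    if tag.startswith("NN"):
        return "noun"
    if tag.startswith("VB"):
        return "verb"
    if tag.startswith("JJ"):
        return "adjective"
    if tag.startswith("RB"):
        return "adverb"
    return "other"

# Add each fine-grained count to the total of its broad class.
coarse_counts = Counter()
for tag, count in first_sentence.items():
    coarse_counts[coarse_category(tag)] += count

print(coarse_counts)
```

Under this grouping the first sentence has 5 nouns (`NN` + `NNS`) and 5 verbs (`VBZ`, `VBP`, `VB`, `VBG`, `VBN`). Tags outside the four classes, such as `MD` or punctuation, fall into `other`.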



The table below explains what each POS tag stands for, which helps in interpreting the counts for each sentence.

<img src = "tagset.png">
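Since the table is an image, it can be handy to keep a small lookup in code as well. The following sketch attaches short descriptions to the counts printed for the fourth sentence. The dictionary `TAG_DESCRIPTIONS` covers only a subset of the Penn Treebank tags, and both it and the helper `describe` are written for this example.

```python
from collections import Counter

# Short descriptions for a subset of Penn Treebank tags (not the full tag set).
TAG_DESCRIPTIONS = {
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBN": "verb, past participle",
    "VBZ": "verb, 3rd person singular present",
    "JJ": "adjective",
    "RB": "adverb",
    "IN": "preposition or subordinating conjunction",
    "DT": "determiner",
    "PRP": "personal pronoun",
    "PRP$": "possessive pronoun",
    "CC": "coordinating conjunction",
    "MD": "modal",
    "TO": "to",
}

# Counts copied from the output printed for the fourth sentence.
fourth_sentence = Counter({'NN': 5, 'DT': 3, 'IN': 3, 'RB': 2, 'CC': 1,
                           'PRP': 1, 'MD': 1, 'VB': 1, 'VBD': 1, 'VBN': 1,
                           '.': 1})

def describe(counter):
    """Return one readable line per tag, most frequent first."""
    lines = []
    for tag, count in counter.most_common():
        # Fall back to a placeholder for tags missing from the glossary.
        description = TAG_DESCRIPTIONS.get(tag, "not in this glossary")
        lines.append(f"{tag} ({description}): {count}")
    return lines

for line in describe(fourth_sentence):
    print(line)
```

NLTK also provides `nltk.help.upenn_tagset()`, which prints tag descriptions when the corresponding tag set data has been downloaded.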