# Learning Objectives

In this lab we are going to:

*   Explore POS Tagging using NLTK
*   Learn about Hidden Markov Models (HMM)
*   Perform POS tagging with HMM

In [None]:
# Download the NLTK data packages used in this lab
import nltk
nltk.download('brown')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

# POS Tagging 
POS tagging is the process of assigning a part-of-speech label to each word in an input text. The tagging model takes a sequence of words and a tagset as input and outputs a sequence of tags, one per token. Several part-of-speech tagsets exist; the most common are:

1- <a href="http://ucrel.lancs.ac.uk/claws5tags.html">CLAWS5</a>: 62 tags <br>
2- <a href="https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html">Penn Treebank</a>: 45 tags <br>
3- <a href="https://en.wikipedia.org/wiki/Brown_Corpus">Brown Corpus tagset</a>: 87 tags <br>
4- <a href="https://universaldependencies.org/u/pos/">Universal Dependencies (UD) tagset</a>: 17 tags

## Approaches

POS tagging can be approached in several ways:

*   Pointwise prediction: a classifier predicts the tag of each word independently, e.g. a perceptron.

*   Generative sequence models: a probabilistic model assigns a joint probability to a sequence of words and its tags, e.g. a Hidden Markov Model.

*   Discriminative sequence models: a classifier predicts the whole tag sequence at once, e.g. conditional random fields (CRF).


# NLTK POS Tagging

The NLTK tagger can be used as follows:


In [None]:
from nltk.tokenize import word_tokenize
# tokenize the sentence before POS tagging
text = word_tokenize("And now for something completely different")
nltk.pos_tag(text)

[('And', 'CC'),
 ('now', 'RB'),
 ('for', 'IN'),
 ('something', 'NN'),
 ('completely', 'RB'),
 ('different', 'JJ')]

The Brown corpus has been manually tagged with part-of-speech tags, which makes it useful both for training statistical taggers and for evaluating taggers. To read the tagged corpus we can use:

In [None]:
from nltk.corpus import brown
# Access the manually tagged Brown corpus
print(brown.tagged_words())

[('The', 'AT'), ('Fulton', 'NP-TL'), ...]
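
The result behaves like a sequence of `(word, tag)` tuples, so ordinary Python counting tools apply to it. The following is a minimal sketch of that counting pattern. It uses a small hand-made list (`toy_tagged`, hypothetical data that is not taken from the corpus) in place of `brown.tagged_words()`, so it runs without any downloads.

```python
from collections import Counter

# Hypothetical stand-in for brown.tagged_words(): a list of (word, tag) pairs.
toy_tagged = [
    ("The", "AT"), ("cat", "NN"), ("sat", "VBD"),
    ("the", "AT"), ("Cat", "NN"), ("runs", "VBZ"),
]

# Count how often each tag occurs overall.
tag_counts = Counter(tag for _, tag in toy_tagged)
print(tag_counts)

# Count the tags assigned to one word, ignoring case.
cat_tags = Counter(tag for word, tag in toy_tagged if word.lower() == "cat")
print(dict(cat_tags))
```

`Counter` is only one option here; `nltk.FreqDist` offers a similar interface and may be more convenient when working with NLTK corpora.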


## Exercise 1:
Count how often each POS tag is assigned to the word "_dog_" (**ignoring case**) in the **news** category of the Brown corpus.

In [None]:
# Restrict the Brown corpus to the news category
tagged_words = brown.tagged_words(categories='news')

# your code goes here;

"""output should be: {'NN': 7}"""

## Exercise 2:
Find the frequency distribution of the tags in the Brown corpus.

In [None]:
# your code goes here;

# output should be 
"""[('NN', 152470),('IN', 120557),('AT', 97959),....]"""

## Exercise 3:

What are the most common verbs in the **fiction** category of the Brown corpus?

In [None]:
verb_tags = ['VB', 'VBN', 'VBD', 'VBG', 'VBZ']
tagged_words = brown.tagged_words(categories=['fiction'])

# your code goes here

# Answer should be
"""[(('said', 'VBD'), 177),
 (('came', 'VBD'), 91),
 (('went', 'VBD'), 79),
 (('get', 'VB'), 78),
 (('know', 'VB'), 74)]"""

# Hidden Markov Model

The sequence of tags can be viewed as a Markov chain, so let us explore how a Hidden Markov Model is constructed and used.

An HMM has two components:

*   **Transition probabilities** represent the probability of a tag occurring given the previous tag, i.e. $P(t_i|t_{i-1})$.
  * For example, modal verbs (`MD`) like *will* are very likely to be followed by a verb in the base form (`VB`) like *race*, so the transition probability from `MD` to `VB` is high.
  * We compute the maximum likelihood estimate of this transition probability by counting, out of the times we see the first tag in a labeled corpus, how often it is followed by the second:
  $$
  \begin{equation}
  P(t_{i} | t_{i-1}) = \frac{C(t_{i-1}, t_{i})}{C(t_{i-1})} \\
  P(VB | MD) = \frac{C(MD, VB)}{C(MD)}
  \end{equation}
  $$

*   **Emission probabilities** represent the probability that a given tag is associated with a given word, i.e. $P(w_i|t_i)$.
  * For example, the probability that the tag `MD` is associated with the word *will* is:
  $$
  \begin{equation}
  P(w_i|t_i) = \frac{C(t_i, w_i)}{C(t_i)} \\
  P(will|MD) = \frac{C(MD, \text{will})}{C(MD)}
  \end{equation}
  $$
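
Both estimates are relative frequencies, so they can be sketched with simple counting. The example below is a minimal illustration on a tiny hand-made tagged sentence; the data is hypothetical and chosen only so the counts are easy to check by hand, and `mle` is an illustrative helper name, not an NLTK function.

```python
from collections import Counter

# Hypothetical tagged sentence: (word, tag) pairs in order.
tagged = [("I", "PPSS"), ("will", "MD"), ("race", "VB"),
          ("and", "CC"), ("you", "PPSS"), ("can", "MD"), ("win", "VB")]

tags = [t for _, t in tagged]

# C(t_{i-1}, t_i): counts of adjacent tag pairs.
bigram_counts = Counter(zip(tags, tags[1:]))
# C(t_{i-1}): count each tag only in positions where it has a successor.
prev_counts = Counter(tags[:-1])
# C(t_i, w_i) and C(t_i) for the emission estimate.
emit_counts = Counter((t, w) for w, t in tagged)
tag_counts = Counter(tags)

def mle(numerator, denominator):
    """Relative-frequency estimate; returns 0.0 for an unseen context."""
    return numerator / denominator if denominator else 0.0

# P(VB | MD): both occurrences of MD are followed by VB.
p_vb_given_md = mle(bigram_counts[("MD", "VB")], prev_counts["MD"])
# P(will | MD): MD is realised as "will" once and as "can" once.
p_will_given_md = mle(emit_counts[("MD", "will")], tag_counts["MD"])
print(p_vb_given_md, p_will_given_md)
```

Counting the previous tag only where it has a successor is one possible design choice; it keeps the transition probabilities out of each tag summing to 1 and differs from the plain tag count only for the final token.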

---

## Exercise 4: 

Consider an HMM with hidden states Noun, Verb and Adj, where $p(Y_{i+1}|Y_i)$ is the probability of state $Y_{i+1}$ occurring after $Y_i$. The transition probabilities are:

| $p(Y_{i+1} \mid Y_i)$ | $Y_{i+1}$=Start | $Y_{i+1}$=Noun | $Y_{i+1}$=Verb | $Y_{i+1}$=Adj |
|:----------------------|:---------------:|:--------------:|:--------------:|:-------------:|
| $Y_i$=Start           | 0.0             | 0.5            | 0.4            | 0.1           |
| $Y_i$=Noun            | 0.0             | 0.3            | 0.5            | 0.2           |
| $Y_i$=Verb            | 0.0             | 0.7            | 0.2            | 0.1           |
| $Y_i$=Adj             | 0.0             | 0.8            | 0.1            | 0.1           |

Furthermore, the model has the following vocabulary, with emission probabilities $p(X_i|Y_i)$:

| $p(X_i \mid Y_i)$ | cats | dogs | drink | water | milk | fresh |
|:-------------|:----:|:----:|:-----:|:-----:|:----:|:-----:|
| $Y_i$=Noun   | 0.2  | 0.2  |  0.2  | 0.2   | 0.1  | 0.0   |
| $Y_i$=Verb   | 0.1  | 0.1  | 0.4   | 0.2   | 0.1  | 0.1   |
| $Y_i$=Adj    | 0.0  | 0.0  | 0.2   | 0.0   | 0.2  | 0.8   |


Implement the tables above and write a function that takes a sequence of words and a sequence of part-of-speech tags and returns their joint probability under this model. Calculate the probability of the sentence "*cats drink fresh milk*" with the tags "*noun verb adj verb*".

In [3]:
all_tags = ["start","noun","verb","adj"]
all_words = ["cats","dogs","drink","water","milk","fresh"]

In [None]:
transitions = {
  'start': {'noun': 0.5, 'verb': 0.4, 'adj': 0.1, 'start': 0.0},
  'noun': {'noun': 0.3, 'verb': 0.5, 'adj': 0.2, 'start': 0.0},
  'verb': {'noun': 0.7, 'verb': 0.2, 'adj': 0.1, 'start': 0.0},
  'adj': {'noun': 0.8, 'verb': 0.1, 'adj': 0.1, 'start': 0.0},
}


emissions = {
    # your code goes here
}

In [None]:
def hmm(words, tags):
    prob = 1.0
    
    # your code goes here

    return prob

print(hmm(["cats","drink","fresh","milk"], ["noun","verb","adj","verb"]))

## Hint

P(noun|start) x P(cats|noun) x P(verb|noun) x P(drink|verb) x .....
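
In general, an HMM scores a word sequence together with its tag sequence as a product of one transition term and one emission term per position. Written with the notation from the Hidden Markov Model section, and taking $t_0$ to be the start state (an assumption that matches the `start` row of the transition table):

$$
P(w_1 \dots w_n, t_1 \dots t_n) = \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i), \qquad t_0 = \text{start}
$$

For instance, the first position of the exercise sentence contributes $P(\text{noun} \mid \text{start}) \times P(\text{cats} \mid \text{noun}) = 0.5 \times 0.2 = 0.1$, using the values from the two tables above.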

## Exercise 5 
Write a function that learns the emission and transition probabilities for the Hidden Markov Model using the tagged corpus given below.

In [1]:
all_tags = ["start","noun","verb","adj"]
all_words = ["cats","dogs","drink","water","milk","fresh"]

sentences = [
    ["cats","drink","milk"],
    ["dogs","drink","water"],
    ["fresh","milk"],
    ["dogs","drink","fresh","milk"],
    ["cats","milk"]
]

tagged = [
    ["noun","verb","noun"],
    ["noun","verb","noun"],
    ["adj","noun"],
    ["noun","verb","adj","noun"],
    ["noun","noun"]
]

In [None]:
def hmm_learn(sentences, tagged):
    transitions = {t: {t2: 0.0 for t2 in all_tags} for t in all_tags}
    emissions = {t: {w: 0.0 for w in all_words} for t in all_tags}
    # your code goes here
    return transitions, emissions

print(hmm_learn(sentences, tagged))

## Hint

* Iterate through each sentence together with its tag sequence
* Iterate through each word and tag pair within the sentence
* Update the transition and emission counts
* Compute the total transition count and the total emission count for each tag by summing its counts
* Normalize the transition counts by their totals
* Normalize the emission counts by their totals