# Grammatical Tagging with LSTM

Create a Recurrent Neural Network (RNN) to determine which of the 9 major word categories each word in a sentence belongs to:
- noun
- verb
- article
- adjective 
- preposition 
- pronoun 
- adverb 
- conjunction
- interjection

Because this is a simplified example for experimenting with a Long Short-Term Memory (LSTM) neural network, it uses only a subset of the 9 categories, specifically the following 5 categories:
- noun (N)
- verb (V)
- article (ART)
- adjective (ADJ)
- pronoun (PRO)

With these we can analyze only simple sentences, such as "I like McDonalds".

## Prerequisites

In [12]:
import platform
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

%matplotlib inline

print("Python version: ", platform.python_version())
print("Torch version: ", torch.__version__)

Python version:  3.6.10
Torch version:  1.5.0


## Make up some simple sentences as Training Data

In [18]:
# Create a list of some simple sentences as training data and the category tags
training_sentences = [
    ("The cat caught the mouse".lower().split(), ["ART", "N", "V", "ART", "N"]),
    ("The mouse loves cheese".lower().split(), ["ART", "N", "V", "N"]),
    ("The dog hates the cat".lower().split(), ["ART", "N", "V", "ART", "N"]),
    ("The dog sleeps".lower().split(), ["ART", "N", "V"]),
    ("The cat runs".lower().split(), ["ART", "N", "V"]),
    ("I like cheese".lower().split(), ["PRO", "V", "N"]),
    ("You like the cat".lower().split(), ["PRO", "V", "ART", "N"]),
    ("She watches TV".lower().split(), ["PRO", "V", "N"])
]

# print(training_sentences)

# Dictionary to map words to indices
wordIndex = {}
for sentence, tags in training_sentences:
    for word in sentence:
        if word not in wordIndex:
            wordIndex[word] = len(wordIndex)
            
print(wordIndex)

# Dictionary to map tags to indices
tagIndex = {"N": 0, "V": 1, "ART": 2, "ADJ": 3, "PRO": 4 }
print(tagIndex)

{'the': 0, 'cat': 1, 'caught': 2, 'mouse': 3, 'loves': 4, 'cheese': 5, 'dog': 6, 'hates': 7, 'sleeps': 8, 'runs': 9, 'i': 10, 'like': 11, 'you': 12, 'she': 13, 'watches': 14, 'tv': 15}
{'N': 0, 'V': 1, 'ART': 2, 'ADJ': 3, 'PRO': 4}


In [30]:
# Convert a sentence to a numerical tensor
def sentence2tensor(sentence, wordIndex):
    '''Convert a word sentence to numerical tensor'''
    indexes = [wordIndex[word] for word in sentence]
    indexes = np.array(indexes)
    return torch.from_numpy(indexes).type(torch.LongTensor)

# Check the tensor conversion
sample_tensor = sentence2tensor("I like cheese".lower().split(), wordIndex)
print(sample_tensor)

tensor([10, 11,  5])
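
The target tags need the same kind of conversion before they can be compared with the model output. A minimal sketch, assuming a helper named `tags2tensor` (a hypothetical name, not part of the original notebook) and the same `tagIndex` mapping defined above:

```python
import torch

# Same tag-to-index mapping as defined earlier in the notebook
tagIndex = {"N": 0, "V": 1, "ART": 2, "ADJ": 3, "PRO": 4}

def tags2tensor(tags, tagIndex):
    '''Convert a list of tag strings to a LongTensor of tag indices (hypothetical helper)'''
    return torch.tensor([tagIndex[tag] for tag in tags], dtype=torch.long)

# Tags for "I like cheese": pronoun, verb, noun
sample_targets = tags2tensor(["PRO", "V", "N"], tagIndex)
print(sample_targets)  # tensor([4, 1, 0])
```

The result has one index per word, so it lines up row by row with the scores the model produces for the same sentence.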


## Define the LSTM Neural Network

A simple LSTM that takes in a sentence broken down into a sequence of words.  All words in the sentence must come from the known word list.  The network predicts the category of each word in the sentence.  The first layer of the model is an embedding layer.  The prediction is made by mapping the LSTM output for each word through a linear layer and applying log-softmax.

In [31]:
class GrammaticalTagger(nn.Module):
    
    def __init__(self, embedding_dim, hidden_dim, vocabulary_size, tagset_size):
        '''Init'''
        super(GrammaticalTagger, self).__init__()
        
        self.hidden_dim = hidden_dim
        
        # Embedding layer turning words into vectors of a specified size
        self.word_embeddings = nn.Embedding(vocabulary_size, embedding_dim)
        
        # LSTM layer takes embedded word vectors as inputs and outputs hidden states
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        
        # Linear layer maps hidden layer into the output layer with the number of tags
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)
        
        # Initialize hidden state
        self.hidden = self.init_hidden()
        
    def init_hidden(self):
        '''Initialize the hidden state'''
        # (number of layers, batch size, hidden_dim)
        return (torch.zeros(1, 1, self.hidden_dim), torch.zeros(1, 1, self.hidden_dim))
    
    def forward(self, sentence):
        '''Model feedforward inference'''
        # first create embedded word vectors
        embeds = self.word_embeddings(sentence)
        
        # Get Output and hidden states 
        lstm_out, self.hidden = self.lstm(embeds.view(len(sentence), 1, -1), self.hidden)
        
        # Get the scores for tags
        tag_outputs = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_outputs, dim=1)
        
        return tag_scores
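
To make the reshaping in `forward` easier to follow, the sketch below traces the tensor shapes through the same three layers outside the class. The sizes (16 words, 5 tags, 6-dimensional vectors) match the values used in this notebook, but the standalone layer variables are only for illustration and the weights are random:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes: 16 known words, 5 tags, 6-dimensional vectors
vocabulary_size, tagset_size, embedding_dim, hidden_dim = 16, 5, 6, 6

embedding = nn.Embedding(vocabulary_size, embedding_dim)
lstm = nn.LSTM(embedding_dim, hidden_dim)
hidden2tag = nn.Linear(hidden_dim, tagset_size)

# "i like cheese" as word indices
sentence = torch.tensor([10, 11, 5])

# One embedding vector per word: (sentence length, embedding_dim)
embeds = embedding(sentence)

# nn.LSTM expects (sequence length, batch size, input size); the batch size is 1 here
hidden = (torch.zeros(1, 1, hidden_dim), torch.zeros(1, 1, hidden_dim))
lstm_out, hidden = lstm(embeds.view(len(sentence), 1, -1), hidden)

# Drop the batch dimension, map to tag scores, normalize each row
tag_scores = F.log_softmax(hidden2tag(lstm_out.view(len(sentence), -1)), dim=1)

print(embeds.shape)      # torch.Size([3, 6])
print(lstm_out.shape)    # torch.Size([3, 1, 6])
print(tag_scores.shape)  # torch.Size([3, 5])

# Each row holds log-probabilities, so exponentiating a row sums to 1
row_sums = tag_scores.exp().sum(dim=1)
print(row_sums)
```

Each row of `tag_scores` corresponds to one word, and each column to one tag.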
                        

## Instantiate model and set hyperparameters

In [32]:
# Embedding_dim defines the size of word vectors
embeddeding_dim = 6
hidden_dim = 6

# Instantiate model
taggerModel = GrammaticalTagger(embeddeding_dim, hidden_dim, len(wordIndex), len(tagIndex))
                                
# Define loss function and optimizer
loss_function = nn.NLLLoss()
optimizer = optim.SGD(taggerModel.parameters(), lr=0.1)
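
`nn.NLLLoss` expects log-probabilities, which is why the model ends with `log_softmax`. The sketch below, using made-up scores rather than real model output, shows that the loss is the mean negative log-probability assigned to the correct tags, and that it matches `F.cross_entropy` applied to the raw scores:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up raw scores for a 3-word sentence over 5 tags (illustrative values only)
raw_scores = torch.tensor([[0.1, 0.2, 0.3, 0.0, 2.0],
                           [0.5, 1.5, 0.1, 0.2, 0.3],
                           [1.8, 0.4, 0.2, 0.1, 0.0]])

# Correct tag indices: PRO, V, N
targets = torch.tensor([4, 1, 0])

log_probs = F.log_softmax(raw_scores, dim=1)

# Loss as computed by the notebook's loss function
nll = nn.NLLLoss()(log_probs, targets)

# The same quantity by hand: pick the log-probability of each correct tag, negate, average
manual = -log_probs[torch.arange(len(targets)), targets].mean()

# Cross-entropy combines log_softmax and NLLLoss in one call
ce = F.cross_entropy(raw_scores, targets)

print(nll.item(), manual.item(), ce.item())
```

All three printed values should agree up to floating-point rounding.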


## Sanity Check

Pass a test sentence through the model to check that the forward pass returns a reasonable response.

In [33]:
test_sentence = "The dog caught the cat".lower().split()

input_tensor = sentence2tensor(test_sentence, wordIndex)
print("Input_tensor: ", input_tensor)

tag_scores = taggerModel(input_tensor)
print(tag_scores)

Input_tensor:  tensor([0, 6, 2, 0, 1])
tensor([[-1.5739, -1.6331, -2.0575, -1.2742, -1.6606],
        [-1.5722, -1.6784, -2.0284, -1.2758, -1.6348],
        [-1.6024, -1.6161, -2.0454, -1.2536, -1.6869],
        [-1.5750, -1.6239, -2.0569, -1.2741, -1.6695],
        [-1.5348, -1.6278, -2.0508, -1.3001, -1.6767]],
       grad_fn=<LogSoftmaxBackward>)
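
Each row of the output holds the log-probabilities of the 5 tags for one word, so the predicted tag is the column with the highest value. A minimal sketch of that decoding step, assuming a reverse mapping named `indexTag` (a hypothetical name) and reusing the scores printed above:

```python
import torch

tagIndex = {"N": 0, "V": 1, "ART": 2, "ADJ": 3, "PRO": 4}

# Hypothetical reverse mapping from index back to tag name
indexTag = {index: tag for tag, index in tagIndex.items()}

# Log-probabilities copied from the sanity check output
tag_scores = torch.tensor([[-1.5739, -1.6331, -2.0575, -1.2742, -1.6606],
                           [-1.5722, -1.6784, -2.0284, -1.2758, -1.6348],
                           [-1.6024, -1.6161, -2.0454, -1.2536, -1.6869],
                           [-1.5750, -1.6239, -2.0569, -1.2741, -1.6695],
                           [-1.5348, -1.6278, -2.0508, -1.3001, -1.6767]])

# Index of the highest score in each row
predicted = tag_scores.argmax(dim=1)
predicted_tags = [indexTag[i] for i in predicted.tolist()]
print(predicted_tags)  # ['ADJ', 'ADJ', 'ADJ', 'ADJ', 'ADJ']
```

The model has not been trained yet, so predicting the same tag for every word is expected at this stage; the check only confirms that the forward pass runs and returns one row of scores per word.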
