# Assignment 3
Training a simple neural named entity recognizer (NER)

In [None]:
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
import csv
from random import sample

In this assignment you are required to build a full training and testing pipeline for a neural sequential tagger for named entities, using an LSTM.

The dataset that you will be working on is called ReCoNLL 2003, which is a corrected version of the CoNLL 2003 dataset: https://www.clips.uantwerpen.be/conll2003/ner/


The three files (train, test and eval) are available from the course git repository (https://github.com/kfirbar/nlp-course)

As you can see, the annotated texts are labeled according to the IOB annotation scheme, for 3 entity types: Person, Organization, Location.
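
To make the IOB scheme concrete, here is a minimal sketch that groups tagged tokens back into entity spans. The helper name `iob_to_spans` and the example sentence are illustrative assumptions and are not part of the course repository; the sketch assumes a `B-` tag opens an entity, `I-` tags continue it, and `O` marks tokens outside any entity.

```python
def iob_to_spans(words, tags):
    """Group IOB-tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # a B- tag (or an I- tag with no matching open entity) opens a new span
            if current_words:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = tag[2:], [word]
        elif tag.startswith("I-"):
            # continue the entity that is currently open
            current_words.append(word)
        else:
            # "O": close any open entity
            if current_words:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_words:
        spans.append((current_type, " ".join(current_words)))
    return spans


# made-up sentence, for illustration only
words = ["John", "Smith", "visited", "New", "York"]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(iob_to_spans(words, tags))  # [('PER', 'John Smith'), ('LOC', 'New York')]
```

The tagger in this assignment predicts one of these tags per token, so the spans are only recovered after tagging.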

**Task 1:** Write a function *read_data* for reading the data from a single file (either train, test or eval). This function receives a filepath and returns a list of sentences. Every sentence is encoded as a pair of lists: one list contains the words and the other contains the labels.

In [None]:
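def read_data(filepath):
    data = []
    words = []
    tags = []

    # the context manager closes the file once reading is done
    with open(filepath, 'r') as f:
        for line in f:
            if line == '\n':  # blank line: end of the current sentence
                data.append((words, tags))
                words = []
                tags = []
            else:
                # each line holds a word and its tag, separated by a single space
                word, tag = line.split(' ', 1)
                words.append(word)
                tags.append(tag.rstrip())

    if words:  # last sentence, when the file does not end with a blank line
        data.append((words, tags))

    return data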

In [None]:
!git clone https://github.com/kfirbar/nlp-course

Cloning into 'nlp-course'...
remote: Enumerating objects: 58, done.
remote: Counting objects: 100% (58/58), done.
remote: Compressing objects: 100% (47/47), done.
remote: Total 58 (delta 24), reused 31 (delta 8), pack-reused 0
Unpacking objects: 100% (58/58), done.


In [None]:
train = read_data('/content/nlp-course/connl03_train.txt')
test = read_data('/content/nlp-course/connl03_test.txt')
dev = read_data('/content/nlp-course/connl03_dev.txt')

The following Vocab class can serve as a dictionary that maps words and tags to IDs. The UNK_TOKEN should be used for words that are not part of the training data.

In [None]:
UNK_TOKEN = 0

class Vocab:
    def __init__(self):
        self.word2id = {"__unk__": UNK_TOKEN}
        self.id2word = {UNK_TOKEN: "__unk__"}
        self.n_words = 1
        
        self.tag2id = {"O":0, "B-PER":1, "I-PER": 2, "B-LOC": 3, "I-LOC": 4, "B-ORG": 5, "I-ORG": 6}
        self.id2tag = {0:"O", 1:"B-PER", 2:"I-PER", 3:"B-LOC", 4:"I-LOC", 5:"B-ORG", 6:"I-ORG"}
        
    def index_words(self, words):
      word_indexes = [self.index_word(w) for w in words]
      return word_indexes

    def index_tags(self, tags):
      tag_indexes = [self.tag2id[t] for t in tags]
      return tag_indexes
    
    def index_word(self, w):
        if w not in self.word2id:
            self.word2id[w] = self.n_words
            self.id2word[self.n_words] = w
            self.n_words += 1
        return self.word2id[w]
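
Note that `index_word` above adds every unseen word to the vocabulary, so calling it on dev or test sentences assigns fresh IDs instead of UNK_TOKEN. One possible way to map unseen words to UNK_TOKEN is a read-only lookup, sketched below. The helper `lookup_words` and the toy `word2id` dictionary are hypothetical and are not part of the provided class.

```python
UNK_TOKEN = 0


def lookup_words(word2id, words):
    """Map words to IDs without growing the vocabulary; unseen words get UNK_TOKEN."""
    return [word2id.get(w, UNK_TOKEN) for w in words]


# toy vocabulary, standing in for one built from the training data only
word2id = {"__unk__": UNK_TOKEN, "Peter": 1, "lives": 2, "in": 3}
ids = lookup_words(word2id, ["Peter", "lives", "in", "Narnia"])
print(ids)  # [1, 2, 3, 0]
```

With this approach the vocabulary size, and therefore the embedding matrix, is fixed by the training set alone.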
            

**Task 2:** Write a function *prepare_data* that takes one of the [train, dev, test] datasets and the Vocab instance, and converts each pair of (words, labels) to a pair of index lists (from Vocab). Each pair should be added to *data_sequences*, which the function returns.

In [None]:
vocab = Vocab()

In [None]:
def prepare_data(data, vocab):
    data_sequences = []

    for words, tags in data:
      words_ids = vocab.index_words(words)
      tags_ids = vocab.index_tags(tags)
      data_sequences.append((words_ids, tags_ids))

    return data_sequences, vocab

In [None]:
train_sequences, vocab = prepare_data(train, vocab)
dev_sequences, vocab = prepare_data(dev, vocab)
test_sequences, vocab = prepare_data(test, vocab)

**Task 3:** Write NERNet, a PyTorch Module for labeling words with NER tags. 

*input_size:* the size of the vocabulary

*embedding_size:* the size of the embeddings

*hidden_size:* the LSTM hidden size

*output_size:* the number of tags we are predicting

*n_layers:* the number of layers we want to use in LSTM

*directions:* either 1 or 2, indicating a unidirectional or bidirectional LSTM, respectively

The input for your forward function should be a single sentence tensor.
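
The way these parameters determine tensor shapes can be sketched as follows. The sizes are small arbitrary values chosen for illustration, not the ones used in the assignment. The sketch relies on `nn.LSTM` expecting input of shape (sequence length, batch, features) by default, so a single sentence is treated as a batch of size 1.

```python
import torch
import torch.nn as nn

# small illustrative sizes (assumptions, not the assignment's values)
vocab_size, embedding_size, hidden_size, n_tags = 10, 4, 3, 7
n_layers, directions = 2, 2

embedding = nn.Embedding(vocab_size, embedding_size)
lstm = nn.LSTM(embedding_size, hidden_size, n_layers, bidirectional=(directions == 2))
out = nn.Linear(hidden_size * directions, n_tags)

sentence = torch.LongTensor([1, 5, 2, 7, 3])  # a sentence of 5 word IDs
embedded = embedding(sentence)                # (5, embedding_size)
lstm_out, _ = lstm(embedded.unsqueeze(1))     # (5, 1, hidden_size * directions)
scores = out(lstm_out.squeeze(1))             # (5, n_tags): one score per tag, per word

print(embedded.shape, lstm_out.shape, scores.shape)
```

A bidirectional LSTM concatenates the forward and backward hidden states, which is why the linear layer takes `hidden_size * directions` input features.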

In [None]:
class NERNet(nn.Module):
    def __init__(self, input_size, embedding_size, hidden_size, output_size, n_layers, directions):
        super(NERNet, self).__init__()
        self.input_size = input_size
        self.embedding_size = embedding_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers
        self.directions = directions

        self.embedding = nn.Embedding(input_size, embedding_size)
        self.lstm = nn.LSTM(embedding_size, hidden_size, n_layers, bidirectional=(directions == 2))
        self.out = nn.Linear(hidden_size*directions, output_size)
    
    def forward(self, input_sentence):
        embedded = self.embedding(input_sentence)
        lstm, _ = self.lstm(embedded.view(len(input_sentence), 1, -1))
        output = self.out(lstm.view(len(input_sentence), -1))
        return output

**Task 4:** write a training loop, which takes a model (instance of NERNet) and number of epochs to train on. The loss is always CrossEntropyLoss and the optimizer is always Adam.
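
As a reminder of what the loss measures, the following sketch computes cross-entropy by hand for a tiny made-up example: for each token it takes the negative log of the softmax probability assigned to the gold tag, then averages over the tokens (averaging is the default reduction of `nn.CrossEntropyLoss`). The function name `cross_entropy` and the numbers are illustrative assumptions.

```python
import math


def cross_entropy(logits, gold):
    """Mean negative log-likelihood of the gold tags under a softmax over the logits."""
    total = 0.0
    for scores, tag in zip(logits, gold):
        # log of the softmax normalizer, shifted by the max score for numerical stability
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[tag]
    return total / len(gold)


# two tokens, three possible tags (made-up scores)
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
gold = [0, 2]
loss = cross_entropy(logits, gold)
print(round(loss, 3))
```

The second token has uniform scores, so its contribution is log(3); the first token scores the gold tag highest and therefore contributes less.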

In [None]:
def train_loop(model, n_epochs):
  # Loss function
  criterion = nn.CrossEntropyLoss()

  # Optimizer (ADAM is a fancy version of SGD)
  optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
  
  train_loss = []

  for e in range(1, n_epochs + 1):
    train_shuff = sample(train_sequences, len(train_sequences))
    running_loss = 0.0
    for i, sequence in enumerate(train_shuff):
      if len(sequence[0]) == 0:
        continue
      # zero the parameter gradients
      optimizer.zero_grad() 
      
      # forward + backward + optimize
      outputs = model(torch.LongTensor(sequence[0]).cuda())
      loss = criterion(outputs, torch.LongTensor(sequence[1]).cuda())
      loss.backward()
      optimizer.step()

      # print statistics
      running_loss += loss.item()
      if (i+1) % 500 == 0:   
          print('[%d, %5d] loss: %.3f' %
                (e, i + 1, running_loss / 500))
          train_loss.append(running_loss / 500)
          running_loss = 0.0

  print('Finished Training')

**Task 5:** Write an evaluation loop for a trained model, using the dev and test datasets. This function prints the recall, also known as the true positive rate (TPR), and the precision of each label separately (7 labels in total), and of all 6 labels (except O) together. The caption argument should be used as a prefix when printing the results.
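
The two metrics can be illustrated on a toy example. This is a minimal token-level sketch: the helper names `label_counts` and `safe_div`, the label IDs and the predictions are made up for illustration, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def label_counts(gold, predicted, n_labels):
    """Count true positives, false positives and false negatives per label."""
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted label was wrongly assigned
            fn[g] += 1  # the gold label was missed
    return tp, fp, fn


def safe_div(a, b):
    # assumed convention: a metric with an empty denominator is reported as 0.0
    return a / b if b else 0.0


gold = [0, 1, 2, 0, 3]       # label 0 plays the role of "O"
predicted = [0, 1, 0, 3, 3]
tp, fp, fn = label_counts(gold, predicted, n_labels=4)

# per-label scores
recall = [safe_div(tp[l], tp[l] + fn[l]) for l in range(4)]
precision = [safe_div(tp[l], tp[l] + fp[l]) for l in range(4)]

# all labels except label 0 together (counts pooled before dividing)
pooled_recall = safe_div(sum(tp[1:]), sum(tp[1:]) + sum(fn[1:]))
pooled_precision = safe_div(sum(tp[1:]), sum(tp[1:]) + sum(fp[1:]))
print(recall, precision, pooled_recall, pooled_precision)
```

Here label 3 is always found when it occurs (recall 1.0) but is also predicted once where it does not belong (precision 0.5).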

In [None]:
def evaluate(model, caption):
  print(caption)
  labels = ['O', 'B-PER', 'I-PER', 'B-LOC', 'I-LOC', 'B-ORG', 'I-ORG']

  # run the same evaluation on the dev set and then on the test set
  for name, sequences in [("dev", dev_sequences), ("test", test_sequences)]:
    print(name)
    all_predicted = []
    all_tags = []

    with torch.no_grad():
      for words_ids, tags_ids in sequences:
        if len(words_ids) == 0:
          continue
        predicted = model(torch.LongTensor(words_ids).cuda()).max(1).indices
        all_predicted.extend(predicted.tolist())
        all_tags.extend(tags_ids)

    # calculate recall = TP/(TP+FN) = TP/P and precision = TP/(TP+FP)
    TP = np.zeros(7)
    P = np.zeros(7)
    FP = np.zeros(7)

    for gold, pred in zip(all_tags, all_predicted):
      P[gold] += 1
      if gold == pred:
        TP[gold] += 1
      else:
        FP[pred] += 1

    recall = np.divide(TP, P)
    precision = np.divide(TP, np.add(TP, FP))

    # calculate accuracy over the tokens whose gold label is not O
    accuracy = float(TP[1:].sum() / P[1:].sum())

    # print evaluation
    df = pd.DataFrame({'recall': recall, 'precision': precision}, index=labels)
    print(df)
    if name == "test":
      print("accuracy: ", accuracy, "\n")
    else:
      print("accuracy: ", accuracy)

**Task 6:** Train and evaluate a few models, all with embedding_size=300, and with the following hyper parameters (you may use that as captions for the models as well):

Model 1: (hidden_size: 500, n_layers: 1, directions: 1)

Model 2: (hidden_size: 500, n_layers: 2, directions: 1)

Model 3: (hidden_size: 500, n_layers: 3, directions: 1)

Model 4: (hidden_size: 500, n_layers: 1, directions: 2)

Model 5: (hidden_size: 500, n_layers: 2, directions: 2)

Model 6: (hidden_size: 500, n_layers: 3, directions: 2)

Model 7: (hidden_size: 800, n_layers: 1, directions: 2)

Model 8: (hidden_size: 800, n_layers: 2, directions: 2)

Model 9: (hidden_size: 800, n_layers: 3, directions: 2)

In [None]:
input_size = len(vocab.word2id)
output_size = 7
embedding_size = 300

model1 = NERNet(input_size, embedding_size, 500, output_size, 1, 1).cuda()
print("model 1 - start training")
train_loop(model1, 10)

model2 = NERNet(input_size, embedding_size, 500, output_size, 2, 1).cuda()
print("model 2 - start training")
train_loop(model2, 10)

model3 = NERNet(input_size, embedding_size, 500, output_size, 3, 1).cuda()
print("model 3 - start training")
train_loop(model3, 10)

model4 = NERNet(input_size, embedding_size, 500, output_size, 1, 2).cuda()
print("model 4 - start training")
train_loop(model4, 10)

model5 = NERNet(input_size, embedding_size, 500, output_size, 2, 2).cuda()
print("model 5 - start training")
train_loop(model5, 10)

model6 = NERNet(input_size, embedding_size, 500, output_size, 3, 2).cuda()
print("model 6 - start training")
train_loop(model6, 10)

model7 = NERNet(input_size, embedding_size, 800, output_size, 1, 2).cuda()
print("model 7 - start training")
train_loop(model7, 10)

model8 = NERNet(input_size, embedding_size, 800, output_size, 2, 2).cuda()
print("model 8 - start training")
train_loop(model8, 10)

model9 = NERNet(input_size, embedding_size, 800, output_size, 3, 2).cuda()
print("model 9 - start training")
train_loop(model9, 10)

model 1 - start training
[1,   500] loss: 2.963
[1,  1000] loss: 2.245
[1,  1500] loss: 1.973
[2,   500] loss: 1.574
[2,  1000] loss: 1.336
[2,  1500] loss: 1.243
[3,   500] loss: 0.966
[3,  1000] loss: 0.877
[3,  1500] loss: 0.851
[4,   500] loss: 0.602
[4,  1000] loss: 0.588
[4,  1500] loss: 0.596
[5,   500] loss: 0.422
[5,  1000] loss: 0.397
[5,  1500] loss: 0.405
[6,   500] loss: 0.252
[6,  1000] loss: 0.257
[6,  1500] loss: 0.264
[7,   500] loss: 0.153
[7,  1000] loss: 0.165
[7,  1500] loss: 0.177
[8,   500] loss: 0.099
[8,  1000] loss: 0.110
[8,  1500] loss: 0.096
[9,   500] loss: 0.059
[9,  1000] loss: 0.065
[9,  1500] loss: 0.068
[10,   500] loss: 0.042
[10,  1000] loss: 0.035
[10,  1500] loss: 0.043
Finished Training
model 2 - start training
[1,   500] loss: 2.866
[1,  1000] loss: 2.192
[1,  1500] loss: 1.845
[2,   500] loss: 1.321
[2,  1000] loss: 1.230
[2,  1500] loss: 1.157
[3,   500] loss: 0.792
[3,  1000] loss: 0.729
[3,  1500] loss: 0.690
[4,   500] loss: 0.442
[4,  1000

In [None]:
evaluate(model1, "hidden_size: 500, n_layers: 1, directions: 1")
evaluate(model2, "hidden_size: 500, n_layers: 2, directions: 1")
evaluate(model3, "hidden_size: 500, n_layers: 3, directions: 1")
evaluate(model4, "hidden_size: 500, n_layers: 1, directions: 2")
evaluate(model5, "hidden_size: 500, n_layers: 2, directions: 2")
evaluate(model6, "hidden_size: 500, n_layers: 3, directions: 2")
evaluate(model7, "hidden_size: 800, n_layers: 1, directions: 2")
evaluate(model8, "hidden_size: 800, n_layers: 2, directions: 2")
evaluate(model9, "hidden_size: 800, n_layers: 3, directions: 2")

hidden_size: 500, n_layers: 1, directions: 1
dev
         recall  precision
O      0.946059   0.925434
B-PER  0.675000   0.572034
I-PER  0.675159   0.736111
B-LOC  0.688525   0.759036
I-LOC  0.434783   0.833333
B-ORG  0.589286   0.626582
I-ORG  0.387931   0.725806
accuracy:  0.615112160566706
test
         recall  precision
O      0.940155   0.937016
B-PER  0.702765   0.571161
I-PER  0.706081   0.694352
B-LOC  0.702624   0.800664
I-LOC  0.471698   0.833333
B-ORG  0.554286   0.552707
I-ORG  0.400000   0.583942
accuracy:  0.6288782816229117 

hidden_size: 500, n_layers: 2, directions: 1
dev
         recall  precision
O      0.964793   0.929661
B-PER  0.685000   0.774011
I-PER  0.713376   0.888889
B-LOC  0.754098   0.793103
I-LOC  0.391304   0.642857
B-ORG  0.571429   0.585366
I-ORG  0.413793   0.640000
accuracy:  0.6375442739079102
test
         recall  precision
O      0.962692   0.933550
B-PER  0.672811   0.724566
I-PER  0.699324   0.778195
B-LOC  0.758017   0.738636
I-LOC  0.528302   

**Task 7:** Download the GloVe embeddings from https://nlp.stanford.edu/projects/glove/ (use the 300-dim vectors from glove.6B.zip). Then initialize the nn.Embedding module in your NERNet with these embeddings, so that you can start your training with pre-trained vectors. Repeat Task 6 and print the results for each model.

Note: make sure that the vectors are aligned with the IDs in your Vocab; in other words, make sure that, for example, the vector of the word with ID 0 is the first row of the GloVe matrix that you initialize nn.Embedding with. For a discussion on how to do that, check out this link:
https://discuss.pytorch.org/t/can-we-use-pre-trained-word-embeddings-for-weight-initialization-in-nn-embedding/1222
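
The alignment requirement can be sketched on a tiny in-memory stand-in for the GloVe file. The function name `build_embedding_matrix`, the toy vectors and the toy vocabulary are assumptions made for illustration. The lowercase matching is also an assumption: the glove.6B vectors are uncased, so capitalized vocabulary entries are looked up by their lowercased form here.

```python
import io

import numpy as np


def build_embedding_matrix(lines, word2id, dim):
    """Return a (len(word2id), dim) matrix whose row i is the vector of the word with ID i."""
    matrix = np.zeros((len(word2id), dim), dtype="float32")

    # group vocabulary IDs by lowercased word, so "The" and "the" share one GloVe vector
    ids_by_lower = {}
    for word, idx in word2id.items():
        ids_by_lower.setdefault(word.lower(), []).append(idx)

    # stream the file: each line is a word followed by its vector components
    for line in lines:
        parts = line.rstrip().split(" ")
        for idx in ids_by_lower.get(parts[0], []):
            matrix[idx] = np.asarray(parts[1:], dtype="float32")
    return matrix


# toy stand-in for glove.6B.300d.txt, with 3-dimensional made-up vectors
fake_glove = io.StringIO("the 0.1 0.2 0.3\nparis 0.4 0.5 0.6\n")
word2id = {"__unk__": 0, "Paris": 1, "the": 2, "The": 3}
matrix = build_embedding_matrix(fake_glove, word2id, dim=3)
print(matrix)
```

Words without a GloVe vector, such as `__unk__` here, keep an all-zero row. A matrix like this can then be converted with `torch.from_numpy` and passed to `nn.Embedding.from_pretrained(..., freeze=False)` so that the vectors are still fine-tuned during training.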

In [None]:
!wget http://nlp.stanford.edu/data/glove.6B.zip

--2021-06-03 08:26:05--  http://nlp.stanford.edu/data/glove.6B.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/data/glove.6B.zip [following]
--2021-06-03 08:26:05--  https://nlp.stanford.edu/data/glove.6B.zip
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip [following]
--2021-06-03 08:26:05--  http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862182613 (822M) [application/zip]
Saving to: ‘glove.6B.zip’


2021-0

In [None]:
!unzip glove.6B.zip

Archive:  glove.6B.zip
  inflating: glove.6B.50d.txt        
  inflating: glove.6B.100d.txt       
  inflating: glove.6B.200d.txt       
  inflating: glove.6B.300d.txt       


In [None]:
# use the 300-dim vectors from glove.6B.zip
!rm glove.6B.50d.txt
!rm glove.6B.100d.txt
!rm glove.6B.200d.txt

In [None]:
def load_glove_embeddings(path, word2id):
  # creates an embeddings tensor whose rows are aligned with the IDs in the vocabulary;
  # words that do not appear in the GloVe file keep an all-zero vector
  embeddings = np.zeros((len(word2id), 300))
  with open(path, 'r', encoding='utf-8') as f:
      for line in f:  # stream the file instead of loading it all into memory
          temp = line.split()
          word = temp[0]
          idx = word2id.get(word)
          if idx is not None:  # a plain truthiness check would also skip ID 0
              embed = np.array(temp[1:], dtype='float32')
              embeddings[idx] = embed
  return torch.from_numpy(embeddings).float()

In [None]:
glove_embeddings = load_glove_embeddings('glove.6B.300d.txt', vocab.word2id)

In [None]:
glove_embeddings.size()

torch.Size([8955, 300])

In [None]:
glove_embeddings.type()

'torch.FloatTensor'

In [None]:
class NERNetGlove(nn.Module):
    def __init__(self, input_size, embedding_size, hidden_size, output_size, n_layers, directions):
        super(NERNetGlove, self).__init__()
        self.input_size = input_size
        self.embedding_size = embedding_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers
        self.directions = directions

        self.embedding = nn.Embedding(glove_embeddings.size(0), glove_embeddings.size(1))
        self.embedding.weight = nn.Parameter(glove_embeddings)
        self.lstm = nn.LSTM(embedding_size, hidden_size, n_layers, bidirectional=(directions == 2))
        self.out = nn.Linear(hidden_size*directions, output_size)
    
    def forward(self, input_sentence):
        embedded = self.embedding(input_sentence)
        lstm, _ = self.lstm(embedded.view(len(input_sentence), 1, -1))
        output = self.out(lstm.view(len(input_sentence), -1))
        return output

In [None]:
def train_loop_glove(model, n_epochs):
  # training does not depend on how the embeddings were initialized,
  # so this reuses train_loop from Task 4
  train_loop(model, n_epochs)

In [None]:
def evaluate_glove(model, caption):
  # evaluation does not depend on how the embeddings were initialized,
  # so this reuses evaluate from Task 5
  evaluate(model, caption)

In [None]:
input_size = len(vocab.word2id)
output_size = 7
embedding_size = 300

model1 = NERNetGlove(input_size, embedding_size, 500, output_size, 1, 1).cuda()
print("model 1 - start training")
train_loop_glove(model1, 10)

model2 = NERNetGlove(input_size, embedding_size, 500, output_size, 2, 1).cuda()
print("model 2 - start training")
train_loop_glove(model2, 10)

model3 = NERNetGlove(input_size, embedding_size, 500, output_size, 3, 1).cuda()
print("model 3 - start training")
train_loop_glove(model3, 10)

model4 = NERNetGlove(input_size, embedding_size, 500, output_size, 1, 2).cuda()
print("model 4 - start training")
train_loop_glove(model4, 10)

model5 = NERNetGlove(input_size, embedding_size, 500, output_size, 2, 2).cuda()
print("model 5 - start training")
train_loop_glove(model5, 10)

model6 = NERNetGlove(input_size, embedding_size, 500, output_size, 3, 2).cuda()
print("model 6 - start training")
train_loop_glove(model6, 10)

model7 = NERNetGlove(input_size, embedding_size, 800, output_size, 1, 2).cuda()
print("model 7 - start training")
train_loop_glove(model7, 10)

model8 = NERNetGlove(input_size, embedding_size, 800, output_size, 2, 2).cuda()
print("model 8 - start training")
train_loop_glove(model8, 10)

model9 = NERNetGlove(input_size, embedding_size, 800, output_size, 3, 2).cuda()
print("model 9 - start training")
train_loop_glove(model9, 10)

model 1 - start training
[1,   500] loss: 2.494
[1,  1000] loss: 1.885
[1,  1500] loss: 1.471
[2,   500] loss: 1.237
[2,  1000] loss: 1.205
[2,  1500] loss: 1.084
[3,   500] loss: 0.975
[3,  1000] loss: 0.875
[3,  1500] loss: 0.791
[4,   500] loss: 0.700
[4,  1000] loss: 0.590
[4,  1500] loss: 0.539
[5,   500] loss: 0.430
[5,  1000] loss: 0.366
[5,  1500] loss: 0.339
[6,   500] loss: 0.257
[6,  1000] loss: 0.257
[6,  1500] loss: 0.223
[7,   500] loss: 0.148
[7,  1000] loss: 0.167
[7,  1500] loss: 0.149
[8,   500] loss: 0.102
[8,  1000] loss: 0.109
[8,  1500] loss: 0.070
[9,   500] loss: 0.065
[9,  1000] loss: 0.059
[9,  1500] loss: 0.051
[10,   500] loss: 0.039
[10,  1000] loss: 0.042
[10,  1500] loss: 0.037
Finished Training
model 2 - start training
[1,   500] loss: 2.486
[1,  1000] loss: 1.661
[1,  1500] loss: 1.440
[2,   500] loss: 1.176
[2,  1000] loss: 1.049
[2,  1500] loss: 0.911
[3,   500] loss: 0.720
[3,  1000] loss: 0.584
[3,  1500] loss: 0.587
[4,   500] loss: 0.423
[4,  1000

In [None]:
evaluate_glove(model1, "hidden_size: 500, n_layers: 1, directions: 1")
evaluate_glove(model2, "hidden_size: 500, n_layers: 2, directions: 1")
evaluate_glove(model3, "hidden_size: 500, n_layers: 3, directions: 1")
evaluate_glove(model4, "hidden_size: 500, n_layers: 1, directions: 2")
evaluate_glove(model5, "hidden_size: 500, n_layers: 2, directions: 2")
evaluate_glove(model6, "hidden_size: 500, n_layers: 3, directions: 2")
evaluate_glove(model7, "hidden_size: 800, n_layers: 1, directions: 2")
evaluate_glove(model8, "hidden_size: 800, n_layers: 2, directions: 2")
evaluate_glove(model9, "hidden_size: 800, n_layers: 3, directions: 2")


hidden_size: 500, n_layers: 1, directions: 1
dev
         recall  precision
O      0.968992   0.973710
B-PER  0.805000   0.865591
I-PER  0.847134   0.943262
B-LOC  0.874317   0.860215
I-LOC  0.608696   0.583333
B-ORG  0.827381   0.646512
I-ORG  0.655172   0.690909
accuracy:  0.8063754427390791
test
         recall  precision
O      0.968783   0.978920
B-PER  0.831797   0.907035
I-PER  0.847973   0.912727
B-LOC  0.862974   0.865497
I-LOC  0.849057   0.818182
B-ORG  0.868571   0.660870
I-ORG  0.740000   0.691589
accuracy:  0.8383054892601431 

hidden_size: 500, n_layers: 2, directions: 1
dev
         recall  precision
O      0.973191   0.976345
B-PER  0.740000   0.865497
I-PER  0.789809   0.953846
B-LOC  0.857923   0.857923
I-LOC  0.608696   0.583333
B-ORG  0.803571   0.584416
I-ORG  0.637931   0.627119
accuracy:  0.7697756788665879
test
         recall  precision
O      0.971067   0.981832
B-PER  0.788018   0.892950
I-PER  0.804054   0.911877
B-LOC  0.857143   0.849711
I-LOC  0.811321  

**Good luck!**