<a href="https://colab.research.google.com/github/jbishop45/CS-7650/blob/project-2/project_2_NER_release_sp23.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Licensing Information:  You are free to use or extend this project for
# educational purposes provided that (1) you do not distribute or publish
# solutions, (2) you retain this notice, and (3) you provide clear
# attribution to The Georgia Institute of Technology, including a link to  https://aritter.github.io/CS-7650-sp22/

# Attribution Information: This assignment was developed at The Georgia Institute of Technology
# by Alan Ritter (alan.ritter@cc.gatech.edu)
# Contributors: Xurui Zhang (Spring 2022)

# Project #2: Named Entity Recognition

In this assignment, you will implement a bidirectional LSTM-CNN-CRF for sequence labeling, following [this paper by Xuezhe Ma and Ed Hovy](https://www.aclweb.org/anthology/P16-1101.pdf), on the CoNLL named entity recognition dataset.  Before starting the assignment, we recommend reading the Ma and Hovy paper.

First, let's import some libraries and make sure the runtime has access to a GPU.


In [2]:
import torch
import torch.nn as nn
import torch.optim as optim

gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
    print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
    print('and then re-execute this cell.')
else:
    print(gpu_info)

print(f'GPU available: {torch.cuda.is_available()}')

Mon Mar 20 17:50:37 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   41C    P0    24W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Download the Data

Run the following code to download the English part of the CoNLL 2003 dataset, the evaluation script and pre-filtered GloVe embeddings we are providing for this data.

In [3]:
#CoNLL 2003 data
!wget https://raw.githubusercontent.com/patverga/torch-ner-nlp-from-scratch/master/data/conll2003/eng.train
!wget https://raw.githubusercontent.com/patverga/torch-ner-nlp-from-scratch/master/data/conll2003/eng.testa
!wget https://raw.githubusercontent.com/patverga/torch-ner-nlp-from-scratch/master/data/conll2003/eng.testb
!cat eng.train | awk '{print $1 "\t" $4}' > train
!cat eng.testa | awk '{print $1 "\t" $4}' > dev
!cat eng.testb | awk '{print $1 "\t" $4}' > test

#Evaluation Script
!wget https://raw.githubusercontent.com/aritter/twitter_nlp/master/data/annotated/wnut16/conlleval.pl

#Pre-filtered GloVe embeddings
!wget https://raw.githubusercontent.com/aritter/aritter.github.io/master/files/glove.840B.300d.conll_filtered.txt

--2023-03-20 17:50:38--  https://raw.githubusercontent.com/patverga/torch-ner-nlp-from-scratch/master/data/conll2003/eng.train
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3283420 (3.1M) [text/plain]
Saving to: ‘eng.train’


2023-03-20 17:50:38 (82.5 MB/s) - ‘eng.train’ saved [3283420/3283420]

--2023-03-20 17:50:39--  https://raw.githubusercontent.com/patverga/torch-ner-nlp-from-scratch/master/data/conll2003/eng.testa
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 827443 (808K) [text/plain]
Saving to: ‘eng.testa’


2023-03-20

## CoNLL Data Format

Run the following cell to see a sample of the data in CoNLL format.  As you can see, each line in the file represents a word and its labeled named entity tag in BIO format.  A blank line is used to seperate sentences.

In [4]:
!head -n 20 train

-DOCSTART-	O
	
EU	I-ORG
rejects	O
German	I-MISC
call	O
to	O
boycott	O
British	I-MISC
lamb	O
.	O
	
Peter	I-PER
Blackburn	I-PER
	
BRUSSELS	I-LOC
1996-08-22	O
	
The	O
European	I-ORG


## Reading in the Data

Below we proivide a bit of code to read in data in the CoNLL format.  This also reads in the filtered GloVe embeddings, to save you some effort - we will discuss this more later.

In [5]:
#Read in the training data
def read_conll_format(filename):
    (words, tags, currentSent, currentTags) = ([],[],['-START-'],['START'])
    for line in open(filename).readlines():
        line = line.strip()
        #print(line)
        if line == "":
            currentSent.append('-END-')
            currentTags.append('END')
            words.append(currentSent)
            tags.append(currentTags)
            (currentSent, currentTags) = (['-START-'], ['START'])
        else:
            (word, tag) = line.split()
            currentSent.append(word)
            currentTags.append(tag)
    return (words, tags)

def sentences2char(sentences):
    return [[['start'] + [c for c in w] + ['end'] for w in l] for l in sentences]


(sentences_train, tags_train) = read_conll_format("train")
(sentences_dev, tags_dev)     = read_conll_format("dev")

print(sentences_train[2])
print(tags_train[2])

sentencesChar = sentences2char(sentences_train)

print(sentencesChar[2])

['-START-', 'Peter', 'Blackburn', '-END-']
['START', 'I-PER', 'I-PER', 'END']
[['start', '-', 'S', 'T', 'A', 'R', 'T', '-', 'end'], ['start', 'P', 'e', 't', 'e', 'r', 'end'], ['start', 'B', 'l', 'a', 'c', 'k', 'b', 'u', 'r', 'n', 'end'], ['start', '-', 'E', 'N', 'D', '-', 'end']]


In [6]:
#Read GloVe embeddings.
def read_GloVe(filename):
    embeddings = {}
    for line in open(filename).readlines():
        #print(line)
        fields = line.strip().split(" ")
        word = fields[0]
        embeddings[word] = [float(x) for x in fields[1:]]
    return embeddings

GloVe = read_GloVe("glove.840B.300d.conll_filtered.txt")

print(GloVe["the"])
print("dimension of glove embedding:", len(GloVe["the"]))

[0.27204, -0.06203, -0.1884, 0.023225, -0.018158, 0.0067192, -0.13877, 0.17708, 0.17709, 2.5882, -0.35179, -0.17312, 0.43285, -0.10708, 0.15006, -0.19982, -0.19093, 1.1871, -0.16207, -0.23538, 0.003664, -0.19156, -0.085662, 0.039199, -0.066449, -0.04209, -0.19122, 0.011679, -0.37138, 0.21886, 0.0011423, 0.4319, -0.14205, 0.38059, 0.30654, 0.020167, -0.18316, -0.0065186, -0.0080549, -0.12063, 0.027507, 0.29839, -0.22896, -0.22882, 0.14671, -0.076301, -0.1268, -0.0066651, -0.052795, 0.14258, 0.1561, 0.05551, -0.16149, 0.09629, -0.076533, -0.049971, -0.010195, -0.047641, -0.16679, -0.2394, 0.0050141, -0.049175, 0.013338, 0.41923, -0.10104, 0.015111, -0.077706, -0.13471, 0.119, 0.10802, 0.21061, -0.051904, 0.18527, 0.17856, 0.041293, -0.014385, -0.082567, -0.035483, -0.076173, -0.045367, 0.089281, 0.33672, -0.22099, -0.0067275, 0.23983, -0.23147, -0.88592, 0.091297, -0.012123, 0.013233, -0.25799, -0.02972, 0.016754, 0.01369, 0.32377, 0.039546, 0.042114, -0.088243, 0.30318, 0.087747, 0.1634

## Mapping Tokens to Indices

As in the last project, we will need to convert words in the dataset to numeric indices, so they can be presented as input to a neural network.  Code to handle this for you with sample usage is provided below.

In [7]:
#Create mappings between tokens and indices.

from collections import Counter
import random

#Will need this later to remove 50% of words that only appear once in the training data from the vocabulary (and don't have GloVe embeddings).
wordCounts = Counter([w for l in sentences_train for w in l])
charCounts = Counter([c for l in sentences_train for w in l for c in w])
singletons = set([w for (w,c) in wordCounts.items() if c == 1 and not w in GloVe.keys()])
charSingletons = set([w for (w,c) in charCounts.items() if c == 1])

#Build dictionaries to map from words, characters to indices and vice versa.
#Save first two words in the vocabulary for padding and "UNK" token.
word2i = {w:i+2 for i,w in enumerate(set([w for l in sentences_train for w in l] + list(GloVe.keys())))}
char2i = {w:i+2 for i,w in enumerate(set([c for l in sentencesChar for w in l for c in w]))}
i2word = {i:w for w,i in word2i.items()}
i2char = {i:w for w,i in char2i.items()}

vocab_size = max(word2i.values()) + 1
char_vocab_size = max(char2i.values()) + 1

#Tag dictionaries.
tag2i = {w:i for i,w in enumerate(set([t for l in tags_train for t in l]))}
i2tag = {i:t for t,i in tag2i.items()}

#When training, randomly replace singletons with UNK tokens sometimes to simulate situation at test time.
def getDictionaryRandomUnk(w, dictionary, train=False):
    if train and (w in singletons and random.random() > 0.5):
        return 1
    else:
        return dictionary.get(w, 1)

#Map a list of sentences from words to indices.
def sentences2indices(words, dictionary, train=False):
    #1.0 => UNK
    return [[getDictionaryRandomUnk(w,dictionary, train=train) for w in l] for l in words]

#Map a list of sentences containing to indices (character indices)
def sentences2indicesChar(chars, dictionary):
    #1.0 => UNK
    return [[[dictionary.get(c,1) for c in w] for w in l] for l in chars]

#Indices
X       = sentences2indices(sentences_train, word2i, train=True)
X_char  = sentences2indicesChar(sentencesChar, char2i)
Y       = sentences2indices(tags_train, tag2i)

print("vocab size:", vocab_size)
print("char vocab size:", char_vocab_size)
print()

print("index of word 'the':", word2i["the"])
print("word of index 253:", i2word[253])
print()

#Print out some examples of what the dev inputs will look like
for i in range(10):
    print(" ".join([i2word.get(w,'UNK') for w in X[i]]))

vocab size: 29148
char vocab size: 88

index of word 'the': 858
word of index 253: Kouame

-START- -DOCSTART- -END-
-START- EU rejects German call to boycott British lamb . -END-
-START- Peter Blackburn -END-
-START- BRUSSELS 1996-08-22 -END-
-START- The European Commission said on Thursday it disagreed with German advice to consumers to shun British lamb until scientists determine whether mad cow disease can be transmitted to sheep . -END-
-START- Germany 's representative to the European Union 's veterinary committee Werner Zwingmann said on Wednesday consumers should buy sheepmeat from countries other than Britain until the scientific advice was clearer . -END-
-START- " We do n't support any such recommendation because we do n't see any grounds for it , " the Commission 's chief spokesman Nikolaus van der Pas told a news briefing . -END-
-START- He said further scientific study was required and if it was found that action was needed it should be taken by the European Union . -END-


In [26]:
[print(i,i2word[i],word2i[i2word[i]]) for i in range(2,10)]

2 6-8 2
3 Bloomsbury 3
4 94.33 4
5 Anguita 5
6 CHANG 6
7 predicting 7
8 slides 8
9 Ice 9


[None, None, None, None, None, None, None, None]

## Padding and Batching

In this assignment, you should train your models using minibatched SGD, rather than using a batch size of 1 as we did in the previous project.  When presenting multiple sentences to the network at the same time, we will need to pad them to be of the same length. We use [torch.nn.utils.rnn.pad_sequence](https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.pad_sequence.html) to do so.

Below we provide some code to prepare batches of data to present to the network.  pad the sequence so that all the sequences have the same length.

**Side Note:** PyTorch includes utilities in [`torch.utils.data`](https://pytorch.org/docs/stable/data.html) to help with padding, batching, shuffling and some other things, but for this assignment we will do everything from scratch to help you see exactly how this works.

In [8]:
#Pad inputs to max sequence length (for batching)
def prepare_input(X_list):
    X_padded = torch.nn.utils.rnn.pad_sequence([torch.as_tensor(l) for l in X_list], batch_first=True).type(torch.LongTensor) # padding the sequences with 0
    X_mask   = torch.nn.utils.rnn.pad_sequence([torch.as_tensor([1.0] * len(l)) for l in X_list], batch_first=True).type(torch.FloatTensor) # consisting of 0 and 1, 0 for padded positions, 1 for non-padded positions
    return (X_padded, X_mask)

#Maximum word length (for character representations)
MAX_CLEN=32

def prepare_input_char(X_list):
    MAX_SLEN = max([len(l) for l in X_list])
    X_padded  = [l + [[]]*(MAX_SLEN-len(l))  for l in X_list]
    X_padded  = [[w[0:MAX_CLEN] for w in l] for l in X_padded]
    X_padded  = [[w + [1]*(MAX_CLEN-len(w)) for w in l] for l in X_padded]
    return torch.as_tensor(X_padded).type(torch.LongTensor)

#Pad outputs using one-hot encoding
def prepare_output_onehot(Y_list, NUM_TAGS=max(tag2i.values())+1):
    Y_onehot = [torch.zeros(len(l), NUM_TAGS) for l in Y_list]
    for i in range(len(Y_list)):
        for j in range(len(Y_list[i])):
            Y_onehot[i][j,Y_list[i][j]] = 1.0
    Y_padded = torch.nn.utils.rnn.pad_sequence(Y_onehot, batch_first=True).type(torch.FloatTensor)
    return Y_padded

print("max slen:", max([len(x) for x in X_char]))  #Max sequence length in the training data is 39.

(X_padded, X_mask) = prepare_input(X)
X_padded_char      = prepare_input_char(X_char)
Y_onehot           = prepare_output_onehot(Y)

print("X_padded:", X_padded.shape)
print("X_mask:", X_mask.shape)
print("X_padded_char:", X_padded_char.shape)
print("Y_onehot:", Y_onehot.shape)

max slen: 115
X_padded: torch.Size([14987, 115])
X_mask: torch.Size([14987, 115])
X_padded_char: torch.Size([14987, 115, 32])
Y_onehot: torch.Size([14987, 115, 10])


In [9]:
input = prepare_input(X[0:5])[0]
print("input: ", input.shape)

input:  torch.Size([5, 32])


## **Your code starts here:** Basic LSTM Tagger (10 points)

OK, now you should have everything you need to get started.

Recall that your goal is to to implement the BiLSTM-CNN-CRF, as described in [(Ma and Hovy, 2016)](https://www.aclweb.org/anthology/P16-1101.pdf).  This is a relatively complex network with various components.  Below we provide starter code to break down your implementation into increasingly complex versions of the final model, starting with a Basic LSTM tagger.  This way you can be confident that each part is working correctly before incrementally increasing the complexity of your implementation.  This is generally a good approach to take when implementing complex models, since buggy PyTorch code is often partially working, but produces worse results than a correct implementation, so it's hard to know whether added complexities are helping or hurting.  Also, if you aren't able to match published results it's hard to know which component of your model has the problem (or even whether or not it is a problem in the published result!)

Fill in the functions marked as `TODO` in the code block below.  If everything is working correctly, you should be able to achieve an **F1 score of 0.87 on the dev set and 0.83 on the test set (with GloVe embeddings)**. You are required to initialize word embeddings with GloVe later, but you can randomly initialize the word embeddings in the beginning.

In [37]:
class BasicLSTMtagger(nn.Module):
    def __init__(self, DIM_EMB=10, DIM_HID=10, VOCAB_SIZE=29148, debug=False):
        super(BasicLSTMtagger, self).__init__()
        NUM_TAGS = max(tag2i.values())+1
        (self.DIM_EMB, self.NUM_TAGS) = (DIM_EMB, NUM_TAGS)

        #TODO: initialize parameters - embedding layer, nn.LSTM, nn.Linear and nn.LogSoftmax
        bidirectional=True
        in_features = DIM_HID*2 if bidirectional else DIM_HID

        self.word_embeddings = nn.Embedding(num_embeddings=VOCAB_SIZE,embedding_dim=DIM_EMB)
        self.lstm = nn.LSTM(input_size=DIM_EMB, hidden_size=DIM_HID, batch_first=True, bidirectional=bidirectional)
        self.hidden2tag = nn.Linear(in_features=in_features, out_features=NUM_TAGS)

        self.debug = debug
        if self.debug:
          print('VOCAB_SIZE: ' + str(VOCAB_SIZE))
          print('NUM_TAGS: '  +str(NUM_TAGS))
          print('DIM_EMB: ' + str(DIM_EMB))
          print('DIM_HID: ' + str(DIM_HID))
          print('bidirectional?: ' + str(bidirectional) + '\n')

        

    def forward(self, X, train=False):
        # X is X_padded from prepare_input()
        #TODO: Implement the forward computation.
        embeds = self.word_embeddings(X)
        if self.debug:
          print('embeds: ' + str(embeds.shape))

        lstm_out,_ = self.lstm(embeds)
        if self.debug:
          print('lstm_out: ' + str(lstm_out.shape))

        tag_space = self.hidden2tag(lstm_out)
        if self.debug:
          print('tag_space: ' + str(tag_space.shape))

        tag_scores = torch.nn.functional.log_softmax(tag_space,dim=1)
        if self.debug:
          print('tag_scores: ' + str(tag_scores.shape) + '\n')

        return tag_scores
        #return torch.randn((X.shape[0], X.shape[1], self.NUM_TAGS))  #Random baseline.

    def init_glove(self, GloVe):
        #TODO: initialize word embeddings using GloVe (you can skip this part in your first version, if you want, see instructions below).
        for i in range(2,self.word_embeddings.num_embeddings):
          word = i2word[i]
          if word not in singletons:
            try: 
              self.word_embeddings.weight[i,:] = torch.tensor(GloVe[word])
            except KeyError:
              print(i,word,'neither in singletons nor GloVe')
        pass

    def inference(self, sentences):
        X       = prepare_input(sentences2indices(sentences, word2i))[0].cuda()
        pred = self.forward(X).argmax(dim=2)
        return [[i2tag[pred[i,j].item()] for j in range(len(sentences[i]))] for i in range(len(sentences))]

    def print_predictions(self, words, tags):
        Y_pred = self.inference(words)
        for i in range(len(words)):
            print("----------------------------")
            print(" ".join([f"{words[i][j]}/{Y_pred[i][j]}/{tags[i][j]}" for j in range(len(words[i]))]))
            print("Predicted:\t", Y_pred[i])
            print("Gold:\t\t", tags[i])

    def write_predictions(self, sentences, outFile):
        fOut = open(outFile, 'w')
        for s in sentences:
            y = self.inference([s])[0]
            #print("\n".join(y[1:len(y)-1]))
            fOut.write("\n".join(y[1:len(y)-1]))  #Skip start and end tokens
            fOut.write("\n\n")

#The following code will initialize a model and test that your forward computation runs without errors.
lstm_test   = BasicLSTMtagger(DIM_HID=7, DIM_EMB=300)
lstm_output = lstm_test.forward(prepare_input(X[0:5])[0]) # torch.Size([5, 32])
Y_onehot    = prepare_output_onehot(Y[0:5])

#Check the shape of the lstm_output and one-hot label tensors.
print("lstm output shape:", lstm_output.shape)
print("Y onehot shape:", Y_onehot.shape, "\n")

lstm output shape: torch.Size([5, 32, 10])
Y onehot shape: torch.Size([5, 32, 10]) 



In [38]:
#Read in the data

(sentences_dev, tags_dev)     = read_conll_format('dev')
(sentences_train, tags_train) = read_conll_format('train')
(sentences_test, tags_test)   = read_conll_format('test')

# Train your Model (10 points)

Next, implement the function below to train your basic BiLSTM tagger.  See [torch.nn.lstm](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html).  Make sure to save your predictions on the test set (`test_pred_lstm.txt`) for submission to GradeScope. Feel free to change number of epochs, optimizer, learning rate and batch size.

In [42]:
#Training

from random import sample
import tqdm
import os
import subprocess
import random

def shuffle_sentences(sentences, tags):
    shuffled_sentences = []
    shuffled_tags      = []
    indices = list(range(len(sentences)))
    random.shuffle(indices)
    for i in indices:
        shuffled_sentences.append(sentences[i])
        shuffled_tags.append(tags[i])
    return (shuffled_sentences, shuffled_tags)

nEpochs = 100

def train_basic_lstm(sentences, tags, lstm, glove):
  #TODO: initialize optimizer
    #loss_function = nn.NLLLoss()
    with torch.no_grad():
      lstm.init_glove(glove)

    optimizer = optim.Adadelta(lstm.parameters(),lr=0.5) #lr=0.1)
    #optimizer = optim.SGD(lstm.parameters(),lr=0.1)
    batchSize = 50

    for epoch in range(nEpochs):
        totalLoss = 0.0

        (sentences_shuffled, tags_shuffled) = shuffle_sentences(sentences, tags)
        if lstm.debug:
          print('sentences shuffled: ' + str(len(sentences_shuffled)))
          print('tags shuffled: ' + str(len(tags_shuffled)))
        for batch in tqdm.notebook.tqdm(range(0, len(sentences), batchSize), leave=False):
            lstm.zero_grad()
            #TODO: Implement gradient update.
              # https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html

            X_batch = sentences2indices(sentences_shuffled[batch:batch+batchSize], word2i, train=True) #train=True
            Y_batch = sentences2indices(tags_shuffled[batch:batch+batchSize], tag2i)
            if lstm.debug:
              print('len(X_batch): ' + str(len(X_batch)))

            X_batch_prepared = prepare_input(X_batch)[0].cuda() # [0] returns padded sequence, [1] returns mask
            Y_batch_onehot   = prepare_output_onehot(Y_batch).cuda() #.argmax(dim=1)
            if lstm.debug:
              print('X_batch_prepared: ' + str(X_batch_prepared.shape))
              print('Y_onehot: ' + str(Y_batch_onehot.shape))

            pred = lstm.forward(X_batch_prepared) #.argmax(dim=1)
            if lstm.debug:
              print('pred: ' + str(pred.shape))
            
            #loss = loss_function(pred, Y_batch_onehot)
            loss = torch.einsum('bij,bij',torch.neg(pred),Y_batch_onehot) / batchSize
            loss.backward()
            optimizer.step()
            totalLoss += loss

        print(f"loss on epoch {epoch} = {totalLoss}")
        lstm.write_predictions(sentences_dev, 'dev_pred')   #Performance on dev set
        print('conlleval:')
        print(subprocess.Popen('paste dev dev_pred | perl conlleval.pl -d "\t"', shell=True, stdout=subprocess.PIPE,stderr=subprocess.STDOUT).communicate()[0].decode('UTF-8'))

        if epoch % 10 == 0:
            s = sample(range(len(sentences_dev)), 5)
            lstm.print_predictions([sentences_dev[i] for i in s], [tags_dev[i] for i in s])

torch.manual_seed(1)

lstm = BasicLSTMtagger(DIM_HID=500, DIM_EMB=300, debug=False).cuda()
train_basic_lstm(sentences_train, tags_train, lstm, GloVe)

126 Times-Stock neither in singletons nor GloVe
186 AL-ANWAR neither in singletons nor GloVe
219 VEREINSBANK neither in singletons nor GloVe
297 AD-DIYAR neither in singletons nor GloVe
323 Tirgoviste neither in singletons nor GloVe
353 Becanovic neither in singletons nor GloVe
488 UNDER-21 neither in singletons nor GloVe
511 4-205 neither in singletons nor GloVe
647 Silpa-archa neither in singletons nor GloVe
693 BAYERISCHE neither in singletons nor GloVe
738 55,000-B neither in singletons nor GloVe
755 Lambrecks neither in singletons nor GloVe
780 Esnaider neither in singletons nor GloVe
841 Moada neither in singletons nor GloVe
944 unimported neither in singletons nor GloVe
958 Cevaer neither in singletons nor GloVe
1191 38:32.149 neither in singletons nor GloVe
1219 Constructorul neither in singletons nor GloVe
1375 Sitanyi neither in singletons nor GloVe
1447 Mushota neither in singletons nor GloVe
1531 Drnovice neither in singletons nor GloVe
1667 42-2-2423-0003 neither in single

  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 0 = 10974.0478515625
conlleval:
processed 51578 tokens with 5942 phrases; found: 15777 phrases; correct: 3625.
accuracy:  74.96%; precision:  22.98%; recall:  61.01%; FB1:  33.38
              LOC: precision:  32.11%; recall:  73.54%; FB1:  44.70  4208
             MISC: precision:  12.45%; recall:  63.45%; FB1:  20.81  4699
              ORG: precision:  13.34%; recall:  33.93%; FB1:  19.15  3411
              PER: precision:  35.68%; recall:  66.99%; FB1:  46.56  3459

----------------------------
-START-/START/START France/I-LOC/I-LOC on/O/O Friday/O/O expelled/O/O another/O/O African/I-MISC/I-MISC man/O/O seized/O/O in/O/O a/O/O police/O/O raid/O/O on/O/O a/O/O Paris/I-LOC/I-LOC church/O/O as/O/O about/O/O 100/B-MISC/O Air/I-ORG/I-ORG France/I-LOC/I-ORG workers/O/O denounced/I-PER/O "/O/O charters/I-PER/O of/O/O shame/I-PER/O "/O/O used/O/O to/O/O fly/O/O illegal/O/O immigrants/B-LOC/O home/O/O ./O/O -END-/END/END
Predicted:	 ['START', 'I-LOC', 'O', 'O', 'O', 'O', 'I-

  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 1 = 10593.529296875
conlleval:
processed 51578 tokens with 5942 phrases; found: 16035 phrases; correct: 3644.
accuracy:  75.10%; precision:  22.73%; recall:  61.33%; FB1:  33.16
              LOC: precision:  28.91%; recall:  72.24%; FB1:  41.29  4590
             MISC: precision:  15.86%; recall:  65.94%; FB1:  25.57  3833
              ORG: precision:  10.48%; recall:  32.14%; FB1:  15.80  4113
              PER: precision:  36.52%; recall:  69.38%; FB1:  47.86  3499



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 2 = 10524.1220703125
conlleval:
processed 51578 tokens with 5942 phrases; found: 16731 phrases; correct: 3720.
accuracy:  73.71%; precision:  22.23%; recall:  62.61%; FB1:  32.81
              LOC: precision:  28.92%; recall:  72.73%; FB1:  41.39  4619
             MISC: precision:  15.26%; recall:  66.81%; FB1:  24.84  4037
              ORG: precision:  10.43%; recall:  37.29%; FB1:  16.31  4792
              PER: precision:  38.62%; recall:  68.84%; FB1:  49.48  3283



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 3 = 10485.642578125
conlleval:
processed 51578 tokens with 5942 phrases; found: 16917 phrases; correct: 3696.
accuracy:  73.49%; precision:  21.85%; recall:  62.20%; FB1:  32.34
              LOC: precision:  27.96%; recall:  72.56%; FB1:  40.37  4767
             MISC: precision:  16.43%; recall:  67.46%; FB1:  26.43  3785
              ORG: precision:   9.25%; recall:  36.69%; FB1:  14.78  5318
              PER: precision:  40.99%; recall:  67.81%; FB1:  51.09  3047



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 4 = 10457.787109375
conlleval:
processed 51578 tokens with 5942 phrases; found: 17041 phrases; correct: 3687.
accuracy:  73.29%; precision:  21.64%; recall:  62.05%; FB1:  32.08
              LOC: precision:  27.97%; recall:  73.60%; FB1:  40.54  4833
             MISC: precision:  16.43%; recall:  68.00%; FB1:  26.46  3817
              ORG: precision:   8.95%; recall:  35.79%; FB1:  14.32  5362
              PER: precision:  40.54%; recall:  66.67%; FB1:  50.42  3029



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 5 = 10438.181640625
conlleval:
processed 51578 tokens with 5942 phrases; found: 17786 phrases; correct: 3680.
accuracy:  71.61%; precision:  20.69%; recall:  61.93%; FB1:  31.02
              LOC: precision:  25.64%; recall:  71.42%; FB1:  37.73  5118
             MISC: precision:  16.23%; recall:  68.33%; FB1:  26.23  3881
              ORG: precision:   8.45%; recall:  37.21%; FB1:  13.78  5904
              PER: precision:  42.98%; recall:  67.26%; FB1:  52.44  2883



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 6 = 10423.0888671875
conlleval:
processed 51578 tokens with 5942 phrases; found: 18029 phrases; correct: 3751.
accuracy:  71.53%; precision:  20.81%; recall:  63.13%; FB1:  31.30
              LOC: precision:  28.87%; recall:  74.25%; FB1:  41.57  4725
             MISC: precision:  16.13%; recall:  68.00%; FB1:  26.08  3886
              ORG: precision:   7.74%; recall:  37.14%; FB1:  12.81  6432
              PER: precision:  42.26%; recall:  68.51%; FB1:  52.28  2986



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 7 = 10410.75
conlleval:
processed 51578 tokens with 5942 phrases; found: 18155 phrases; correct: 3730.
accuracy:  71.27%; precision:  20.55%; recall:  62.77%; FB1:  30.96
              LOC: precision:  27.17%; recall:  74.69%; FB1:  39.85  5049
             MISC: precision:  16.30%; recall:  69.20%; FB1:  26.39  3914
              ORG: precision:   7.70%; recall:  36.91%; FB1:  12.74  6431
              PER: precision:  44.37%; recall:  66.50%; FB1:  53.23  2761



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 8 = 10401.4033203125
conlleval:
processed 51578 tokens with 5942 phrases; found: 18286 phrases; correct: 3611.
accuracy:  70.83%; precision:  19.75%; recall:  60.77%; FB1:  29.81
              LOC: precision:  26.65%; recall:  72.29%; FB1:  38.94  4984
             MISC: precision:  15.94%; recall:  68.33%; FB1:  25.85  3952
              ORG: precision:   6.85%; recall:  32.96%; FB1:  11.35  6450
              PER: precision:  41.76%; recall:  65.74%; FB1:  51.08  2900



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 9 = 10391.6591796875
conlleval:
processed 51578 tokens with 5942 phrases; found: 18757 phrases; correct: 3675.
accuracy:  69.82%; precision:  19.59%; recall:  61.85%; FB1:  29.76
              LOC: precision:  26.43%; recall:  72.84%; FB1:  38.78  5063
             MISC: precision:  15.47%; recall:  68.76%; FB1:  25.26  4098
              ORG: precision:   6.76%; recall:  33.63%; FB1:  11.26  6670
              PER: precision:  42.79%; recall:  67.97%; FB1:  52.52  2926



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 10 = 10385.3212890625
conlleval:
processed 51578 tokens with 5942 phrases; found: 19215 phrases; correct: 3613.
accuracy:  68.99%; precision:  18.80%; recall:  60.80%; FB1:  28.72
              LOC: precision:  26.93%; recall:  73.22%; FB1:  39.37  4995
             MISC: precision:  15.08%; recall:  68.87%; FB1:  24.74  4211
              ORG: precision:   6.13%; recall:  32.74%; FB1:  10.32  7163
              PER: precision:  41.95%; recall:  64.82%; FB1:  50.94  2846

----------------------------
-START-/START/START Magna/I-LOC/I-ORG 's/B-LOC/O traditional/I-MISC/O strength/O/O has/O/O been/O/O instrument/O/O panels/O/O ,/O/O door/O/O panels/O/O and/O/O other/O/O interior/I-MISC/O components/B-MISC/O ./B-ORG/O -END-/END/END
Predicted:	 ['START', 'I-LOC', 'B-LOC', 'I-MISC', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'I-MISC', 'B-MISC', 'B-ORG', 'END']
Gold:		 ['START', 'I-ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'END']
---

  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 11 = 10378.6435546875
conlleval:
processed 51578 tokens with 5942 phrases; found: 19292 phrases; correct: 3615.
accuracy:  68.69%; precision:  18.74%; recall:  60.84%; FB1:  28.65
              LOC: precision:  27.30%; recall:  73.43%; FB1:  39.81  4941
             MISC: precision:  14.98%; recall:  68.87%; FB1:  24.61  4238
              ORG: precision:   6.05%; recall:  32.66%; FB1:  10.21  7235
              PER: precision:  41.45%; recall:  64.77%; FB1:  50.55  2878



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 12 = 10374.25390625
conlleval:
processed 51578 tokens with 5942 phrases; found: 19133 phrases; correct: 3626.
accuracy:  69.03%; precision:  18.95%; recall:  61.02%; FB1:  28.92
              LOC: precision:  26.87%; recall:  73.05%; FB1:  39.29  4995
             MISC: precision:  15.66%; recall:  69.63%; FB1:  25.57  4099
              ORG: precision:   5.90%; recall:  31.32%; FB1:   9.93  7118
              PER: precision:  41.83%; recall:  66.34%; FB1:  51.31  2921



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 13 = 10368.884765625
conlleval:
processed 51578 tokens with 5942 phrases; found: 19760 phrases; correct: 3560.
accuracy:  67.69%; precision:  18.02%; recall:  59.91%; FB1:  27.70
              LOC: precision:  27.39%; recall:  73.05%; FB1:  39.85  4899
             MISC: precision:  14.65%; recall:  67.90%; FB1:  24.10  4273
              ORG: precision:   5.49%; recall:  31.54%; FB1:   9.35  7706
              PER: precision:  40.56%; recall:  63.46%; FB1:  49.49  2882



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 14 = 10365.701171875
conlleval:
processed 51578 tokens with 5942 phrases; found: 19536 phrases; correct: 3585.
accuracy:  68.12%; precision:  18.35%; recall:  60.33%; FB1:  28.14
              LOC: precision:  27.44%; recall:  72.46%; FB1:  39.80  4851
             MISC: precision:  15.02%; recall:  69.31%; FB1:  24.70  4253
              ORG: precision:   5.74%; recall:  32.21%; FB1:   9.75  7524
              PER: precision:  40.68%; recall:  64.22%; FB1:  49.81  2908



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 15 = 10361.9482421875
conlleval:
processed 51578 tokens with 5942 phrases; found: 19562 phrases; correct: 3598.
accuracy:  68.38%; precision:  18.39%; recall:  60.55%; FB1:  28.22
              LOC: precision:  28.07%; recall:  73.49%; FB1:  40.63  4809
             MISC: precision:  15.32%; recall:  69.09%; FB1:  25.07  4159
              ORG: precision:   5.71%; recall:  33.33%; FB1:   9.76  7823
              PER: precision:  42.01%; recall:  63.19%; FB1:  50.47  2771



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 16 = 10358.4443359375
conlleval:
processed 51578 tokens with 5942 phrases; found: 19709 phrases; correct: 3624.
accuracy:  67.85%; precision:  18.39%; recall:  60.99%; FB1:  28.26
              LOC: precision:  28.33%; recall:  74.31%; FB1:  41.02  4819
             MISC: precision:  14.68%; recall:  69.09%; FB1:  24.21  4340
              ORG: precision:   5.91%; recall:  34.23%; FB1:  10.09  7761
              PER: precision:  41.70%; recall:  63.14%; FB1:  50.23  2789



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 17 = 10356.2587890625
conlleval:
processed 51578 tokens with 5942 phrases; found: 19848 phrases; correct: 3561.
accuracy:  67.59%; precision:  17.94%; recall:  59.93%; FB1:  27.62
              LOC: precision:  28.50%; recall:  72.84%; FB1:  40.97  4694
             MISC: precision:  15.29%; recall:  69.63%; FB1:  25.08  4198
              ORG: precision:   5.15%; recall:  31.32%; FB1:   8.84  8163
              PER: precision:  41.57%; recall:  63.03%; FB1:  50.10  2793



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 18 = 10353.49609375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20024 phrases; correct: 3576.
accuracy:  67.19%; precision:  17.86%; recall:  60.18%; FB1:  27.54
              LOC: precision:  28.04%; recall:  72.46%; FB1:  40.43  4747
             MISC: precision:  14.98%; recall:  69.52%; FB1:  24.65  4278
              ORG: precision:   5.28%; recall:  31.84%; FB1:   9.06  8084
              PER: precision:  40.38%; recall:  63.90%; FB1:  49.48  2915



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 19 = 10352.060546875
conlleval:
processed 51578 tokens with 5942 phrases; found: 20196 phrases; correct: 3582.
accuracy:  66.88%; precision:  17.74%; recall:  60.28%; FB1:  27.41
              LOC: precision:  27.91%; recall:  73.05%; FB1:  40.39  4809
             MISC: precision:  14.65%; recall:  69.20%; FB1:  24.18  4355
              ORG: precision:   5.31%; recall:  31.92%; FB1:   9.11  8058
              PER: precision:  39.48%; recall:  63.74%; FB1:  48.75  2974



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 20 = 10350.0693359375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20105 phrases; correct: 3555.
accuracy:  67.12%; precision:  17.68%; recall:  59.83%; FB1:  27.30
              LOC: precision:  29.26%; recall:  73.38%; FB1:  41.84  4607
             MISC: precision:  14.28%; recall:  68.66%; FB1:  23.65  4432
              ORG: precision:   5.38%; recall:  33.33%; FB1:   9.26  8309
              PER: precision:  40.88%; recall:  61.18%; FB1:  49.01  2757

----------------------------
-START-/START/START Mauritania/I-LOC/I-LOC 's/O/O soccer/O/O federation/O/O dissolved/O/O the/O/O national/O/O team/O/O and/O/O suspended/O/O this/O/O season/O/O 's/O/O domestic/O/O championship/O/O on/O/O Saturday/O/O in/O/O the/O/O wake/O/O of/O/O the/O/O country/O/O 's/O/O failure/O/O to/O/O qualify/O/O for/O/O the/B-ORG/O African/I-ORG/I-MISC Nations/B-MISC/I-MISC '/B-ORG/I-MISC Cup/I-MISC/I-MISC ./B-ORG/O -END-/END/END
Predicted:	 ['START', 'I-LOC', 'O', 'O', 'O', 'O', '

  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 21 = 10348.89453125
conlleval:
processed 51578 tokens with 5942 phrases; found: 20354 phrases; correct: 3576.
accuracy:  66.60%; precision:  17.57%; recall:  60.18%; FB1:  27.20
              LOC: precision:  28.71%; recall:  73.05%; FB1:  41.22  4675
             MISC: precision:  14.46%; recall:  69.20%; FB1:  23.92  4413
              ORG: precision:   5.34%; recall:  33.48%; FB1:   9.20  8416
              PER: precision:  40.25%; recall:  62.27%; FB1:  48.89  2850



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 22 = 10347.0537109375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20155 phrases; correct: 3577.
accuracy:  67.01%; precision:  17.75%; recall:  60.20%; FB1:  27.41
              LOC: precision:  28.65%; recall:  72.78%; FB1:  41.12  4666
             MISC: precision:  14.55%; recall:  69.74%; FB1:  24.08  4418
              ORG: precision:   5.04%; recall:  31.10%; FB1:   8.68  8272
              PER: precision:  42.16%; recall:  64.06%; FB1:  50.85  2799



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 23 = 10346.0849609375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20068 phrases; correct: 3548.
accuracy:  67.26%; precision:  17.68%; recall:  59.71%; FB1:  27.28
              LOC: precision:  28.52%; recall:  73.54%; FB1:  41.10  4737
             MISC: precision:  14.26%; recall:  68.76%; FB1:  23.62  4446
              ORG: precision:   5.17%; recall:  31.39%; FB1:   8.88  8138
              PER: precision:  41.57%; recall:  62.00%; FB1:  49.77  2747



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 24 = 10344.4443359375
conlleval:
processed 51578 tokens with 5942 phrases; found: 19916 phrases; correct: 3545.
accuracy:  67.66%; precision:  17.80%; recall:  59.66%; FB1:  27.42
              LOC: precision:  28.96%; recall:  73.92%; FB1:  41.62  4689
             MISC: precision:  14.67%; recall:  68.98%; FB1:  24.20  4335
              ORG: precision:   5.26%; recall:  32.07%; FB1:   9.04  8176
              PER: precision:  41.27%; recall:  60.86%; FB1:  49.19  2716



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 25 = 10343.6474609375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20487 phrases; correct: 3573.
accuracy:  66.49%; precision:  17.44%; recall:  60.13%; FB1:  27.04
              LOC: precision:  28.14%; recall:  72.35%; FB1:  40.52  4723
             MISC: precision:  14.04%; recall:  68.00%; FB1:  23.28  4465
              ORG: precision:   5.24%; recall:  33.04%; FB1:   9.05  8449
              PER: precision:  41.19%; recall:  63.74%; FB1:  50.04  2850



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 26 = 10342.615234375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20000 phrases; correct: 3578.
accuracy:  67.60%; precision:  17.89%; recall:  60.22%; FB1:  27.58
              LOC: precision:  29.39%; recall:  72.95%; FB1:  41.89  4560
             MISC: precision:  15.17%; recall:  68.76%; FB1:  24.86  4179
              ORG: precision:   5.13%; recall:  32.21%; FB1:   8.85  8423
              PER: precision:  41.30%; recall:  63.63%; FB1:  50.09  2838



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 27 = 10341.541015625
conlleval:
processed 51578 tokens with 5942 phrases; found: 20395 phrases; correct: 3562.
accuracy:  66.71%; precision:  17.47%; recall:  59.95%; FB1:  27.05
              LOC: precision:  29.42%; recall:  73.43%; FB1:  42.01  4586
             MISC: precision:  14.84%; recall:  69.31%; FB1:  24.44  4307
              ORG: precision:   4.85%; recall:  31.39%; FB1:   8.40  8686
              PER: precision:  40.94%; recall:  62.60%; FB1:  49.51  2816



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 28 = 10340.693359375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20335 phrases; correct: 3530.
accuracy:  66.83%; precision:  17.36%; recall:  59.41%; FB1:  26.87
              LOC: precision:  29.44%; recall:  73.22%; FB1:  41.99  4569
             MISC: precision:  14.42%; recall:  68.55%; FB1:  23.83  4382
              ORG: precision:   5.05%; recall:  32.36%; FB1:   8.73  8602
              PER: precision:  40.22%; recall:  60.75%; FB1:  48.40  2782



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 29 = 10340.001953125
conlleval:
processed 51578 tokens with 5942 phrases; found: 20525 phrases; correct: 3572.
accuracy:  66.36%; precision:  17.40%; recall:  60.11%; FB1:  26.99
              LOC: precision:  28.86%; recall:  73.33%; FB1:  41.42  4667
             MISC: precision:  14.35%; recall:  69.20%; FB1:  23.77  4447
              ORG: precision:   5.15%; recall:  33.11%; FB1:   8.92  8615
              PER: precision:  40.88%; recall:  62.05%; FB1:  49.29  2796



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 30 = 10339.18359375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20556 phrases; correct: 3561.
accuracy:  66.37%; precision:  17.32%; recall:  59.93%; FB1:  26.88
              LOC: precision:  28.79%; recall:  73.27%; FB1:  41.33  4676
             MISC: precision:  14.86%; recall:  68.98%; FB1:  24.45  4280
              ORG: precision:   5.02%; recall:  32.81%; FB1:   8.71  8763
              PER: precision:  40.15%; recall:  61.83%; FB1:  48.69  2837

----------------------------
-START-/START/START Relations/I-ORG/O between/O/O the/O/O two/O/O countries/O/O are/O/O currently/O/O under/O/O strain/O/O because/O/O of/O/O the/O/O testimony/O/O in/O/O a/O/O Berlin/I-LOC/I-LOC court/O/O of/O/O former/O/O Iranian/I-MISC/I-MISC president/O/O Abolhassan/I-PER/I-PER Banisadr/I-PER/I-PER ./B-ORG/O -END-/END/END
Predicted:	 ['START', 'I-ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'I-LOC', 'O', 'O', 'O', 'I-MISC', 'O', 'I-PER', 'I-PE

  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 31 = 10338.4697265625
conlleval:
processed 51578 tokens with 5942 phrases; found: 20337 phrases; correct: 3528.
accuracy:  66.90%; precision:  17.35%; recall:  59.37%; FB1:  26.85
              LOC: precision:  29.40%; recall:  72.78%; FB1:  41.88  4548
             MISC: precision:  14.58%; recall:  68.11%; FB1:  24.02  4307
              ORG: precision:   4.97%; recall:  31.92%; FB1:   8.59  8620
              PER: precision:  39.66%; recall:  61.62%; FB1:  48.26  2862



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 32 = 10338.0146484375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20637 phrases; correct: 3523.
accuracy:  66.07%; precision:  17.07%; recall:  59.29%; FB1:  26.51
              LOC: precision:  29.32%; recall:  73.27%; FB1:  41.88  4591
             MISC: precision:  14.41%; recall:  69.09%; FB1:  23.85  4420
              ORG: precision:   4.74%; recall:  31.02%; FB1:   8.23  8772
              PER: precision:  39.38%; recall:  61.02%; FB1:  47.87  2854



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 33 = 10337.2119140625
conlleval:
processed 51578 tokens with 5942 phrases; found: 20330 phrases; correct: 3529.
accuracy:  66.84%; precision:  17.36%; recall:  59.39%; FB1:  26.87
              LOC: precision:  29.57%; recall:  72.62%; FB1:  42.02  4512
             MISC: precision:  14.47%; recall:  69.09%; FB1:  23.93  4402
              ORG: precision:   4.91%; recall:  31.62%; FB1:   8.50  8636
              PER: precision:  40.79%; recall:  61.56%; FB1:  49.07  2780



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 34 = 10337.0751953125
conlleval:
processed 51578 tokens with 5942 phrases; found: 20483 phrases; correct: 3545.
accuracy:  66.56%; precision:  17.31%; recall:  59.66%; FB1:  26.83
              LOC: precision:  29.63%; recall:  73.76%; FB1:  42.28  4573
             MISC: precision:  14.44%; recall:  69.09%; FB1:  23.89  4411
              ORG: precision:   4.97%; recall:  32.44%; FB1:   8.62  8754
              PER: precision:  40.73%; recall:  60.69%; FB1:  48.75  2745



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 35 = 10336.37890625
conlleval:
processed 51578 tokens with 5942 phrases; found: 20436 phrases; correct: 3524.
accuracy:  66.56%; precision:  17.24%; recall:  59.31%; FB1:  26.72
              LOC: precision:  29.43%; recall:  72.78%; FB1:  41.91  4543
             MISC: precision:  14.72%; recall:  69.31%; FB1:  24.29  4340
              ORG: precision:   4.65%; recall:  30.57%; FB1:   8.08  8813
              PER: precision:  41.53%; recall:  61.78%; FB1:  49.67  2740



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 36 = 10335.7294921875
conlleval:
processed 51578 tokens with 5942 phrases; found: 20569 phrases; correct: 3528.
accuracy:  66.26%; precision:  17.15%; recall:  59.37%; FB1:  26.62
              LOC: precision:  29.71%; recall:  72.51%; FB1:  42.15  4484
             MISC: precision:  14.71%; recall:  69.20%; FB1:  24.26  4338
              ORG: precision:   4.63%; recall:  30.72%; FB1:   8.05  8895
              PER: precision:  40.18%; recall:  62.21%; FB1:  48.83  2852



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 37 = 10335.322265625
conlleval:
processed 51578 tokens with 5942 phrases; found: 20787 phrases; correct: 3513.
accuracy:  65.77%; precision:  16.90%; recall:  59.12%; FB1:  26.29
              LOC: precision:  29.49%; recall:  72.89%; FB1:  41.99  4541
             MISC: precision:  14.37%; recall:  68.11%; FB1:  23.73  4370
              ORG: precision:   4.61%; recall:  31.10%; FB1:   8.03  9048
              PER: precision:  39.92%; recall:  61.29%; FB1:  48.35  2828



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 38 = 10334.7763671875
conlleval:
processed 51578 tokens with 5942 phrases; found: 20778 phrases; correct: 3499.
accuracy:  65.88%; precision:  16.84%; recall:  58.89%; FB1:  26.19
              LOC: precision:  29.07%; recall:  73.05%; FB1:  41.59  4617
             MISC: precision:  14.42%; recall:  68.66%; FB1:  23.84  4389
              ORG: precision:   4.68%; recall:  31.32%; FB1:   8.15  8969
              PER: precision:  39.39%; recall:  59.93%; FB1:  47.53  2803



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 39 = 10334.4580078125
conlleval:
processed 51578 tokens with 5942 phrases; found: 20652 phrases; correct: 3513.
accuracy:  66.03%; precision:  17.01%; recall:  59.12%; FB1:  26.42
              LOC: precision:  29.67%; recall:  73.11%; FB1:  42.21  4527
             MISC: precision:  14.29%; recall:  68.87%; FB1:  23.67  4444
              ORG: precision:   4.78%; recall:  31.54%; FB1:   8.30  8851
              PER: precision:  39.29%; recall:  60.37%; FB1:  47.60  2830



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 40 = 10334.0224609375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20608 phrases; correct: 3521.
accuracy:  66.24%; precision:  17.09%; recall:  59.26%; FB1:  26.52
              LOC: precision:  29.70%; recall:  73.38%; FB1:  42.29  4538
             MISC: precision:  14.35%; recall:  68.76%; FB1:  23.74  4419
              ORG: precision:   4.75%; recall:  31.39%; FB1:   8.25  8870
              PER: precision:  40.20%; recall:  60.69%; FB1:  48.37  2781

----------------------------
-START-/START/START "/O/O The/I-LOC/O people/I-MISC/O have/B-LOC/O asked/I-PER/O us/I-PER/O to/O/O establish/B-LOC/O order/O/O and/O/O that/I-PER/O 's/I-ORG/O our/I-ORG/O main/I-ORG/O aim/B-MISC/O ./B-ORG/O "/B-ORG/O -END-/END/END
Predicted:	 ['START', 'O', 'I-LOC', 'I-MISC', 'B-LOC', 'I-PER', 'I-PER', 'O', 'B-LOC', 'O', 'O', 'I-PER', 'I-ORG', 'I-ORG', 'I-ORG', 'B-MISC', 'B-ORG', 'B-ORG', 'END']
Gold:		 ['START', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O'

  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 41 = 10333.7529296875
conlleval:
processed 51578 tokens with 5942 phrases; found: 20827 phrases; correct: 3512.
accuracy:  65.80%; precision:  16.86%; recall:  59.10%; FB1:  26.24
              LOC: precision:  29.75%; recall:  73.16%; FB1:  42.30  4518
             MISC: precision:  14.27%; recall:  68.55%; FB1:  23.63  4428
              ORG: precision:   4.56%; recall:  30.95%; FB1:   7.94  9108
              PER: precision:  40.43%; recall:  60.86%; FB1:  48.58  2773



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 42 = 10333.63671875
conlleval:
processed 51578 tokens with 5942 phrases; found: 20828 phrases; correct: 3530.
accuracy:  65.71%; precision:  16.95%; recall:  59.41%; FB1:  26.37
              LOC: precision:  29.77%; recall:  73.22%; FB1:  42.33  4518
             MISC: precision:  14.11%; recall:  69.20%; FB1:  23.44  4521
              ORG: precision:   4.70%; recall:  31.92%; FB1:   8.20  9104
              PER: precision:  41.68%; recall:  60.75%; FB1:  49.44  2685



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 43 = 10333.3955078125
conlleval:
processed 51578 tokens with 5942 phrases; found: 20656 phrases; correct: 3515.
accuracy:  66.16%; precision:  17.02%; recall:  59.16%; FB1:  26.43
              LOC: precision:  29.22%; recall:  72.78%; FB1:  41.70  4575
             MISC: precision:  14.45%; recall:  68.66%; FB1:  23.87  4381
              ORG: precision:   4.78%; recall:  31.54%; FB1:   8.31  8844
              PER: precision:  39.29%; recall:  60.91%; FB1:  47.77  2856



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 44 = 10333.34765625
conlleval:
processed 51578 tokens with 5942 phrases; found: 20826 phrases; correct: 3494.
accuracy:  65.73%; precision:  16.78%; recall:  58.80%; FB1:  26.11
              LOC: precision:  29.41%; recall:  73.22%; FB1:  41.97  4573
             MISC: precision:  14.39%; recall:  68.55%; FB1:  23.79  4391
              ORG: precision:   4.58%; recall:  30.87%; FB1:   7.98  9040
              PER: precision:  39.09%; recall:  59.88%; FB1:  47.30  2822



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 45 = 10332.6318359375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20811 phrases; correct: 3503.
accuracy:  65.78%; precision:  16.83%; recall:  58.95%; FB1:  26.19
              LOC: precision:  30.05%; recall:  73.05%; FB1:  42.58  4466
             MISC: precision:  14.15%; recall:  68.33%; FB1:  23.45  4452
              ORG: precision:   4.61%; recall:  31.17%; FB1:   8.03  9071
              PER: precision:  39.44%; recall:  60.42%; FB1:  47.73  2822



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 46 = 10332.25390625
conlleval:
processed 51578 tokens with 5942 phrases; found: 21213 phrases; correct: 3520.
accuracy:  64.95%; precision:  16.59%; recall:  59.24%; FB1:  25.93
              LOC: precision:  29.43%; recall:  73.65%; FB1:  42.05  4598
             MISC: precision:  14.50%; recall:  68.55%; FB1:  23.94  4358
              ORG: precision:   4.36%; recall:  30.65%; FB1:   7.63  9429
              PER: precision:  39.75%; recall:  61.02%; FB1:  48.14  2828



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 47 = 10332.23046875
conlleval:
processed 51578 tokens with 5942 phrases; found: 20912 phrases; correct: 3486.
accuracy:  65.62%; precision:  16.67%; recall:  58.67%; FB1:  25.96
              LOC: precision:  29.69%; recall:  72.73%; FB1:  42.17  4500
             MISC: precision:  14.19%; recall:  68.33%; FB1:  23.50  4439
              ORG: precision:   4.61%; recall:  31.25%; FB1:   8.03  9093
              PER: precision:  38.23%; recall:  59.77%; FB1:  46.63  2880



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 48 = 10331.9609375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20682 phrases; correct: 3471.
accuracy:  65.98%; precision:  16.78%; recall:  58.41%; FB1:  26.07
              LOC: precision:  29.63%; recall:  72.84%; FB1:  42.12  4516
             MISC: precision:  14.39%; recall:  68.11%; FB1:  23.77  4363
              ORG: precision:   4.53%; recall:  30.57%; FB1:   7.89  9058
              PER: precision:  39.89%; recall:  59.45%; FB1:  47.74  2745



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 49 = 10331.697265625
conlleval:
processed 51578 tokens with 5942 phrases; found: 20940 phrases; correct: 3521.
accuracy:  65.64%; precision:  16.81%; recall:  59.26%; FB1:  26.20
              LOC: precision:  29.54%; recall:  73.54%; FB1:  42.15  4574
             MISC: precision:  14.29%; recall:  68.87%; FB1:  23.66  4445
              ORG: precision:   4.70%; recall:  31.99%; FB1:   8.19  9137
              PER: precision:  39.73%; recall:  60.04%; FB1:  47.82  2784



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 50 = 10331.2021484375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20964 phrases; correct: 3509.
accuracy:  65.50%; precision:  16.74%; recall:  59.05%; FB1:  26.08
              LOC: precision:  29.61%; recall:  72.89%; FB1:  42.11  4522
             MISC: precision:  14.32%; recall:  68.22%; FB1:  23.67  4393
              ORG: precision:   4.61%; recall:  31.69%; FB1:   8.06  9210
              PER: precision:  39.31%; recall:  60.59%; FB1:  47.68  2839

----------------------------
-START-/START/START EASTERN/I-ORG/I-MISC DIVISION/O/I-MISC -END-/END/END
Predicted:	 ['START', 'I-ORG', 'O', 'END']
Gold:		 ['START', 'I-MISC', 'I-MISC', 'END']
----------------------------
-START-/START/START Department/I-ORG/O officials/B-MISC/O said/B-ORG/O July/I-PER/O 's/B-ORG/O slight/B-ORG/O gain/B-ORG/O in/O/O incomes/O/O was/O/O the/O/O weakest/B-ORG/O for/B-MISC/O any/B-ORG/O month/O/O since/O/O January/I-MISC/O ,/O/O when/O/O they/O/O were/O/O flat/B-ORG/O ./B-ORG/

  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 51 = 10331.2724609375
conlleval:
processed 51578 tokens with 5942 phrases; found: 21005 phrases; correct: 3516.
accuracy:  65.48%; precision:  16.74%; recall:  59.17%; FB1:  26.10
              LOC: precision:  29.56%; recall:  73.33%; FB1:  42.13  4557
             MISC: precision:  14.08%; recall:  68.44%; FB1:  23.35  4482
              ORG: precision:   4.64%; recall:  31.47%; FB1:   8.08  9104
              PER: precision:  38.99%; recall:  60.59%; FB1:  47.45  2862



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 52 = 10330.990234375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20928 phrases; correct: 3497.
accuracy:  65.53%; precision:  16.71%; recall:  58.85%; FB1:  26.03
              LOC: precision:  29.40%; recall:  72.51%; FB1:  41.83  4531
             MISC: precision:  14.31%; recall:  68.33%; FB1:  23.67  4402
              ORG: precision:   4.42%; recall:  30.20%; FB1:   7.71  9162
              PER: precision:  39.89%; recall:  61.35%; FB1:  48.34  2833



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 53 = 10330.5380859375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20968 phrases; correct: 3517.
accuracy:  65.49%; precision:  16.77%; recall:  59.19%; FB1:  26.14
              LOC: precision:  29.06%; recall:  73.11%; FB1:  41.59  4621
             MISC: precision:  14.46%; recall:  69.31%; FB1:  23.93  4419
              ORG: precision:   4.57%; recall:  31.32%; FB1:   7.98  9183
              PER: precision:  40.62%; recall:  60.53%; FB1:  48.62  2745



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 54 = 10330.5634765625
conlleval:
processed 51578 tokens with 5942 phrases; found: 21109 phrases; correct: 3488.
accuracy:  65.17%; precision:  16.52%; recall:  58.70%; FB1:  25.79
              LOC: precision:  29.31%; recall:  73.05%; FB1:  41.84  4578
             MISC: precision:  14.24%; recall:  68.00%; FB1:  23.54  4404
              ORG: precision:   4.34%; recall:  30.20%; FB1:   7.59  9329
              PER: precision:  39.81%; recall:  60.48%; FB1:  48.02  2798



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 55 = 10330.5830078125
conlleval:
processed 51578 tokens with 5942 phrases; found: 20841 phrases; correct: 3485.
accuracy:  65.83%; precision:  16.72%; recall:  58.65%; FB1:  26.02
              LOC: precision:  29.76%; recall:  72.89%; FB1:  42.27  4499
             MISC: precision:  14.29%; recall:  68.11%; FB1:  23.63  4394
              ORG: precision:   4.48%; recall:  30.72%; FB1:   7.81  9203
              PER: precision:  40.29%; recall:  60.04%; FB1:  48.22  2745



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 56 = 10330.27734375
conlleval:
processed 51578 tokens with 5942 phrases; found: 21063 phrases; correct: 3501.
accuracy:  65.26%; precision:  16.62%; recall:  58.92%; FB1:  25.93
              LOC: precision:  29.81%; recall:  73.00%; FB1:  42.34  4498
             MISC: precision:  14.53%; recall:  68.66%; FB1:  23.98  4357
              ORG: precision:   4.43%; recall:  31.02%; FB1:   7.75  9400
              PER: precision:  39.57%; recall:  60.31%; FB1:  47.78  2808



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 57 = 10330.1396484375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20997 phrases; correct: 3484.
accuracy:  65.50%; precision:  16.59%; recall:  58.63%; FB1:  25.87
              LOC: precision:  30.40%; recall:  72.89%; FB1:  42.91  4404
             MISC: precision:  14.21%; recall:  68.55%; FB1:  23.54  4448
              ORG: precision:   4.47%; recall:  31.17%; FB1:   7.81  9361
              PER: precision:  39.33%; recall:  59.45%; FB1:  47.34  2784



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 58 = 10329.76171875
conlleval:
processed 51578 tokens with 5942 phrases; found: 20945 phrases; correct: 3471.
accuracy:  65.60%; precision:  16.57%; recall:  58.41%; FB1:  25.82
              LOC: precision:  29.87%; recall:  72.46%; FB1:  42.30  4456
             MISC: precision:  14.19%; recall:  68.33%; FB1:  23.49  4441
              ORG: precision:   4.52%; recall:  31.32%; FB1:   7.91  9285
              PER: precision:  39.45%; recall:  59.17%; FB1:  47.34  2763



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 59 = 10329.7021484375
conlleval:
processed 51578 tokens with 5942 phrases; found: 21313 phrases; correct: 3501.
accuracy:  64.85%; precision:  16.43%; recall:  58.92%; FB1:  25.69
              LOC: precision:  30.30%; recall:  72.78%; FB1:  42.78  4413
             MISC: precision:  14.06%; recall:  68.22%; FB1:  23.32  4473
              ORG: precision:   4.43%; recall:  31.77%; FB1:   7.78  9616
              PER: precision:  39.45%; recall:  60.21%; FB1:  47.67  2811



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 60 = 10329.67578125
conlleval:
processed 51578 tokens with 5942 phrases; found: 21062 phrases; correct: 3502.
accuracy:  65.37%; precision:  16.63%; recall:  58.94%; FB1:  25.94
              LOC: precision:  30.27%; recall:  73.11%; FB1:  42.81  4437
             MISC: precision:  14.17%; recall:  68.33%; FB1:  23.48  4445
              ORG: precision:   4.47%; recall:  31.47%; FB1:   7.83  9434
              PER: precision:  40.31%; recall:  60.10%; FB1:  48.26  2746

----------------------------
-START-/START/START -DOCSTART-/O/O -END-/END/END
Predicted:	 ['START', 'O', 'END']
Gold:		 ['START', 'O', 'END']
----------------------------
-START-/START/START Australia/I-LOC/I-LOC will/O/O defend/O/O the/O/O Ashes/I-PER/I-MISC in/B-ORG/O -END-/END/END
Predicted:	 ['START', 'I-LOC', 'O', 'O', 'O', 'I-PER', 'B-ORG', 'END']
Gold:		 ['START', 'I-LOC', 'O', 'O', 'O', 'I-MISC', 'O', 'END']
----------------------------
-START-/START/START Eric/I-PER/I-PER Anthony/I-PER/I-PER hit/O

  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 61 = 10329.38671875
conlleval:
processed 51578 tokens with 5942 phrases; found: 21201 phrases; correct: 3470.
accuracy:  65.15%; precision:  16.37%; recall:  58.40%; FB1:  25.57
              LOC: precision:  30.09%; recall:  73.16%; FB1:  42.65  4466
             MISC: precision:  14.08%; recall:  68.11%; FB1:  23.34  4460
              ORG: precision:   4.34%; recall:  30.87%; FB1:   7.61  9533
              PER: precision:  39.53%; recall:  58.85%; FB1:  47.29  2742



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 62 = 10329.37890625
conlleval:
processed 51578 tokens with 5942 phrases; found: 21184 phrases; correct: 3510.
accuracy:  65.15%; precision:  16.57%; recall:  59.07%; FB1:  25.88
              LOC: precision:  30.05%; recall:  73.05%; FB1:  42.58  4466
             MISC: precision:  14.25%; recall:  68.55%; FB1:  23.60  4434
              ORG: precision:   4.48%; recall:  31.69%; FB1:   7.86  9477
              PER: precision:  39.58%; recall:  60.31%; FB1:  47.80  2807



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 63 = 10329.111328125
conlleval:
processed 51578 tokens with 5942 phrases; found: 21267 phrases; correct: 3471.
accuracy:  64.92%; precision:  16.32%; recall:  58.41%; FB1:  25.51
              LOC: precision:  30.03%; recall:  72.84%; FB1:  42.52  4456
             MISC: precision:  14.13%; recall:  68.55%; FB1:  23.43  4472
              ORG: precision:   4.31%; recall:  30.80%; FB1:   7.56  9581
              PER: precision:  39.45%; recall:  59.07%; FB1:  47.30  2758



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 64 = 10328.8203125
conlleval:
processed 51578 tokens with 5942 phrases; found: 21133 phrases; correct: 3483.
accuracy:  65.27%; precision:  16.48%; recall:  58.62%; FB1:  25.73
              LOC: precision:  30.26%; recall:  73.00%; FB1:  42.78  4432
             MISC: precision:  14.36%; recall:  68.22%; FB1:  23.73  4380
              ORG: precision:   4.42%; recall:  31.39%; FB1:   7.75  9522
              PER: precision:  39.01%; recall:  59.28%; FB1:  47.06  2799



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 65 = 10328.705078125
conlleval:
processed 51578 tokens with 5942 phrases; found: 21259 phrases; correct: 3490.
accuracy:  65.00%; precision:  16.42%; recall:  58.73%; FB1:  25.66
              LOC: precision:  29.84%; recall:  72.95%; FB1:  42.35  4491
             MISC: precision:  14.16%; recall:  68.00%; FB1:  23.43  4429
              ORG: precision:   4.37%; recall:  31.10%; FB1:   7.66  9541
              PER: precision:  39.53%; recall:  60.04%; FB1:  47.67  2798



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 66 = 10328.9912109375
conlleval:
processed 51578 tokens with 5942 phrases; found: 21058 phrases; correct: 3477.
accuracy:  65.37%; precision:  16.51%; recall:  58.52%; FB1:  25.76
              LOC: precision:  30.29%; recall:  72.73%; FB1:  42.77  4410
             MISC: precision:  14.62%; recall:  68.55%; FB1:  24.10  4323
              ORG: precision:   4.29%; recall:  30.57%; FB1:   7.53  9550
              PER: precision:  39.60%; recall:  59.66%; FB1:  47.61  2775



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 67 = 10328.951171875
conlleval:
processed 51578 tokens with 5942 phrases; found: 21350 phrases; correct: 3475.
accuracy:  64.77%; precision:  16.28%; recall:  58.48%; FB1:  25.47
              LOC: precision:  30.14%; recall:  72.89%; FB1:  42.64  4443
             MISC: precision:  14.17%; recall:  68.44%; FB1:  23.48  4452
              ORG: precision:   4.30%; recall:  30.95%; FB1:   7.56  9645
              PER: precision:  38.79%; recall:  59.17%; FB1:  46.86  2810



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 68 = 10328.9345703125
conlleval:
processed 51578 tokens with 5942 phrases; found: 20987 phrases; correct: 3467.
accuracy:  65.46%; precision:  16.52%; recall:  58.35%; FB1:  25.75
              LOC: precision:  30.65%; recall:  73.27%; FB1:  43.22  4392
             MISC: precision:  14.41%; recall:  67.90%; FB1:  23.78  4343
              ORG: precision:   4.33%; recall:  30.80%; FB1:   7.60  9533
              PER: precision:  39.79%; recall:  58.74%; FB1:  47.45  2719



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 69 = 10328.478515625
conlleval:
processed 51578 tokens with 5942 phrases; found: 21281 phrases; correct: 3473.
accuracy:  64.92%; precision:  16.32%; recall:  58.45%; FB1:  25.52
              LOC: precision:  30.30%; recall:  72.95%; FB1:  42.81  4423
             MISC: precision:  14.19%; recall:  68.00%; FB1:  23.48  4418
              ORG: precision:   4.38%; recall:  31.39%; FB1:   7.69  9608
              PER: precision:  38.31%; recall:  58.90%; FB1:  46.43  2832



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 70 = 10328.4130859375
conlleval:
processed 51578 tokens with 5942 phrases; found: 21262 phrases; correct: 3482.
accuracy:  64.92%; precision:  16.38%; recall:  58.60%; FB1:  25.60
              LOC: precision:  30.05%; recall:  72.62%; FB1:  42.51  4439
             MISC: precision:  14.23%; recall:  68.11%; FB1:  23.54  4414
              ORG: precision:   4.41%; recall:  31.47%; FB1:   7.73  9576
              PER: precision:  38.76%; recall:  59.61%; FB1:  46.97  2833

----------------------------
-START-/START/START -DOCSTART-/O/O -END-/END/END
Predicted:	 ['START', 'O', 'END']
Gold:		 ['START', 'O', 'END']
----------------------------
-START-/START/START Mladost/I-LOC/I-ORG (/I-ORG/I-ORG L/I-PER/I-ORG )/I-ORG/I-ORG 4/I-MISC/O 2/I-MISC/O 1/I-MISC/O 1/O/O 8/O/O 5/O/O 7/O/O -END-/END/END
Predicted:	 ['START', 'I-LOC', 'I-ORG', 'I-PER', 'I-ORG', 'I-MISC', 'I-MISC', 'I-MISC', 'O', 'O', 'O', 'O', 'END']
Gold:		 ['START', 'I-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'O', 'O', 'O', '

  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 71 = 10328.1923828125
conlleval:
processed 51578 tokens with 5942 phrases; found: 21320 phrases; correct: 3493.
accuracy:  64.86%; precision:  16.38%; recall:  58.78%; FB1:  25.63
              LOC: precision:  30.24%; recall:  73.22%; FB1:  42.80  4448
             MISC: precision:  14.23%; recall:  68.11%; FB1:  23.54  4414
              ORG: precision:   4.37%; recall:  31.39%; FB1:   7.66  9644
              PER: precision:  39.05%; recall:  59.66%; FB1:  47.21  2814



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 72 = 10328.19921875
conlleval:
processed 51578 tokens with 5942 phrases; found: 21340 phrases; correct: 3474.
accuracy:  64.86%; precision:  16.28%; recall:  58.47%; FB1:  25.47
              LOC: precision:  30.04%; recall:  72.73%; FB1:  42.51  4448
             MISC: precision:  14.09%; recall:  68.11%; FB1:  23.35  4456
              ORG: precision:   4.29%; recall:  30.87%; FB1:   7.54  9646
              PER: precision:  39.28%; recall:  59.50%; FB1:  47.32  2790



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 73 = 10328.0234375
conlleval:
processed 51578 tokens with 5942 phrases; found: 20998 phrases; correct: 3466.
accuracy:  65.52%; precision:  16.51%; recall:  58.33%; FB1:  25.73
              LOC: precision:  30.63%; recall:  72.67%; FB1:  43.10  4358
             MISC: precision:  14.44%; recall:  68.11%; FB1:  23.83  4349
              ORG: precision:   4.40%; recall:  31.17%; FB1:   7.71  9503
              PER: precision:  38.92%; recall:  58.90%; FB1:  46.87  2788



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 74 = 10327.72265625
conlleval:
processed 51578 tokens with 5942 phrases; found: 21310 phrases; correct: 3475.
accuracy:  64.91%; precision:  16.31%; recall:  58.48%; FB1:  25.50
              LOC: precision:  30.28%; recall:  73.11%; FB1:  42.83  4435
             MISC: precision:  14.14%; recall:  68.22%; FB1:  23.42  4449
              ORG: precision:   4.33%; recall:  31.10%; FB1:   7.60  9639
              PER: precision:  38.97%; recall:  58.96%; FB1:  46.92  2787



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 75 = 10327.662109375
conlleval:
processed 51578 tokens with 5942 phrases; found: 21234 phrases; correct: 3465.
accuracy:  64.96%; precision:  16.32%; recall:  58.31%; FB1:  25.50
              LOC: precision:  30.25%; recall:  72.95%; FB1:  42.76  4430
             MISC: precision:  14.14%; recall:  67.79%; FB1:  23.40  4420
              ORG: precision:   4.43%; recall:  31.69%; FB1:   7.77  9603
              PER: precision:  38.66%; recall:  58.36%; FB1:  46.51  2781



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 76 = 10327.720703125
conlleval:
processed 51578 tokens with 5942 phrases; found: 21294 phrases; correct: 3465.
accuracy:  64.93%; precision:  16.27%; recall:  58.31%; FB1:  25.44
              LOC: precision:  30.22%; recall:  72.73%; FB1:  42.70  4421
             MISC: precision:  14.20%; recall:  68.00%; FB1:  23.50  4415
              ORG: precision:   4.34%; recall:  31.32%; FB1:   7.63  9672
              PER: precision:  38.84%; recall:  58.74%; FB1:  46.76  2786



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 77 = 10327.5986328125
conlleval:
processed 51578 tokens with 5942 phrases; found: 21167 phrases; correct: 3484.
accuracy:  65.18%; precision:  16.46%; recall:  58.63%; FB1:  25.70
              LOC: precision:  30.40%; recall:  73.22%; FB1:  42.96  4425
             MISC: precision:  14.26%; recall:  68.00%; FB1:  23.57  4398
              ORG: precision:   4.38%; recall:  31.25%; FB1:   7.69  9562
              PER: precision:  39.29%; recall:  59.34%; FB1:  47.28  2782



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 78 = 10327.7900390625
conlleval:
processed 51578 tokens with 5942 phrases; found: 21270 phrases; correct: 3462.
accuracy:  64.95%; precision:  16.28%; recall:  58.26%; FB1:  25.44
              LOC: precision:  30.35%; recall:  72.95%; FB1:  42.87  4415
             MISC: precision:  14.36%; recall:  68.44%; FB1:  23.74  4393
              ORG: precision:   4.27%; recall:  30.87%; FB1:   7.51  9687
              PER: precision:  38.81%; recall:  58.47%; FB1:  46.65  2775



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 79 = 10327.697265625
conlleval:
processed 51578 tokens with 5942 phrases; found: 21288 phrases; correct: 3457.
accuracy:  64.84%; precision:  16.24%; recall:  58.18%; FB1:  25.39
              LOC: precision:  29.79%; recall:  72.95%; FB1:  42.30  4498
             MISC: precision:  13.99%; recall:  68.00%; FB1:  23.21  4481
              ORG: precision:   4.24%; recall:  30.35%; FB1:   7.44  9593
              PER: precision:  39.87%; recall:  58.79%; FB1:  47.52  2716



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 80 = 10327.697265625
conlleval:
processed 51578 tokens with 5942 phrases; found: 21387 phrases; correct: 3474.
accuracy:  64.68%; precision:  16.24%; recall:  58.47%; FB1:  25.42
              LOC: precision:  30.49%; recall:  73.49%; FB1:  43.10  4427
             MISC: precision:  14.27%; recall:  68.33%; FB1:  23.61  4414
              ORG: precision:   4.30%; recall:  31.47%; FB1:   7.57  9806
              PER: precision:  39.12%; recall:  58.20%; FB1:  46.79  2740

----------------------------
-START-/START/START Hull/I-ORG/I-ORG 4/I-MISC/O 2/I-MISC/O 2/O/O 0/O/O 4/O/O 2/O/O 8/O/O -END-/END/END
Predicted:	 ['START', 'I-ORG', 'I-MISC', 'I-MISC', 'O', 'O', 'O', 'O', 'O', 'END']
Gold:		 ['START', 'I-ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'END']
----------------------------
-START-/START/START SOCCER/O/O -/O/O UKRAINE/I-PER/I-LOC BEAT/O/O NORTHERN/I-ORG/I-LOC IRELAND/I-LOC/I-LOC IN/O/O WORLD/I-MISC/I-MISC CUP/I-MISC/I-MISC QUALIFIER/B-ORG/O ./O/O -END-/END/END
Predict

  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 81 = 10327.5888671875
conlleval:
processed 51578 tokens with 5942 phrases; found: 21327 phrases; correct: 3463.
accuracy:  64.83%; precision:  16.24%; recall:  58.28%; FB1:  25.40
              LOC: precision:  30.23%; recall:  72.95%; FB1:  42.75  4432
             MISC: precision:  14.23%; recall:  68.00%; FB1:  23.54  4406
              ORG: precision:   4.28%; recall:  31.02%; FB1:   7.53  9711
              PER: precision:  38.88%; recall:  58.63%; FB1:  46.75  2778



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 82 = 10327.39453125
conlleval:
processed 51578 tokens with 5942 phrases; found: 21331 phrases; correct: 3464.
accuracy:  64.79%; precision:  16.24%; recall:  58.30%; FB1:  25.40
              LOC: precision:  30.60%; recall:  73.27%; FB1:  43.18  4398
             MISC: precision:  14.08%; recall:  68.55%; FB1:  23.36  4488
              ORG: precision:   4.24%; recall:  30.80%; FB1:   7.46  9730
              PER: precision:  39.52%; recall:  58.25%; FB1:  47.09  2715



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 83 = 10327.501953125
conlleval:
processed 51578 tokens with 5942 phrases; found: 21487 phrases; correct: 3461.
accuracy:  64.47%; precision:  16.11%; recall:  58.25%; FB1:  25.24
              LOC: precision:  30.24%; recall:  72.84%; FB1:  42.74  4424
             MISC: precision:  14.12%; recall:  67.90%; FB1:  23.38  4432
              ORG: precision:   4.23%; recall:  31.02%; FB1:   7.44  9845
              PER: precision:  38.80%; recall:  58.69%; FB1:  46.72  2786



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 84 = 10327.314453125
conlleval:
processed 51578 tokens with 5942 phrases; found: 21410 phrases; correct: 3468.
accuracy:  64.74%; precision:  16.20%; recall:  58.36%; FB1:  25.36
              LOC: precision:  30.16%; recall:  72.73%; FB1:  42.64  4429
             MISC: precision:  14.10%; recall:  67.90%; FB1:  23.35  4440
              ORG: precision:   4.25%; recall:  30.87%; FB1:   7.47  9743
              PER: precision:  39.03%; recall:  59.28%; FB1:  47.07  2798



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 85 = 10327.2333984375
conlleval:
processed 51578 tokens with 5942 phrases; found: 21208 phrases; correct: 3439.
accuracy:  65.01%; precision:  16.22%; recall:  57.88%; FB1:  25.33
              LOC: precision:  30.46%; recall:  72.51%; FB1:  42.90  4373
             MISC: precision:  14.22%; recall:  67.35%; FB1:  23.49  4366
              ORG: precision:   4.21%; recall:  30.35%; FB1:   7.40  9664
              PER: precision:  38.47%; recall:  58.58%; FB1:  46.44  2805



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 86 = 10327.0849609375
conlleval:
processed 51578 tokens with 5942 phrases; found: 21350 phrases; correct: 3454.
accuracy:  64.78%; precision:  16.18%; recall:  58.13%; FB1:  25.31
              LOC: precision:  30.36%; recall:  72.89%; FB1:  42.87  4410
             MISC: precision:  14.20%; recall:  68.11%; FB1:  23.51  4421
              ORG: precision:   4.27%; recall:  31.10%; FB1:   7.51  9761
              PER: precision:  38.80%; recall:  58.09%; FB1:  46.52  2758



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 87 = 10327.1337890625
conlleval:
processed 51578 tokens with 5942 phrases; found: 21476 phrases; correct: 3473.
accuracy:  64.56%; precision:  16.17%; recall:  58.45%; FB1:  25.33
              LOC: precision:  29.93%; recall:  73.00%; FB1:  42.46  4480
             MISC: precision:  14.32%; recall:  67.90%; FB1:  23.65  4371
              ORG: precision:   4.24%; recall:  30.95%; FB1:   7.45  9795
              PER: precision:  38.55%; recall:  59.23%; FB1:  46.70  2830



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 88 = 10326.8837890625
conlleval:
processed 51578 tokens with 5942 phrases; found: 21596 phrases; correct: 3453.
accuracy:  64.27%; precision:  15.99%; recall:  58.11%; FB1:  25.08
              LOC: precision:  30.13%; recall:  73.05%; FB1:  42.66  4454
             MISC: precision:  14.11%; recall:  68.00%; FB1:  23.37  4444
              ORG: precision:   4.18%; recall:  31.02%; FB1:   7.37  9945
              PER: precision:  38.79%; recall:  57.98%; FB1:  46.49  2753



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 89 = 10326.8427734375
conlleval:
processed 51578 tokens with 5942 phrases; found: 21661 phrases; correct: 3459.
accuracy:  64.16%; precision:  15.97%; recall:  58.21%; FB1:  25.06
              LOC: precision:  30.21%; recall:  73.05%; FB1:  42.75  4442
             MISC: precision:  14.06%; recall:  68.11%; FB1:  23.31  4466
              ORG: precision:   4.14%; recall:  30.80%; FB1:   7.30  9973
              PER: precision:  38.71%; recall:  58.41%; FB1:  46.56  2780



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 90 = 10326.7841796875
conlleval:
processed 51578 tokens with 5942 phrases; found: 21395 phrases; correct: 3468.
accuracy:  64.65%; precision:  16.21%; recall:  58.36%; FB1:  25.37
              LOC: precision:  30.04%; recall:  72.95%; FB1:  42.55  4461
             MISC: precision:  14.00%; recall:  67.90%; FB1:  23.21  4472
              ORG: precision:   4.28%; recall:  30.87%; FB1:   7.52  9673
              PER: precision:  39.01%; recall:  59.07%; FB1:  46.99  2789

----------------------------
-START-/START/START Proleter/I-PER/I-ORG (/B-LOC/I-ORG Z/B-MISC/I-ORG )/B-LOC/I-ORG 4/B-ORG/O 0/O/O 1/O/O 3/O/O 2/O/O 9/O/O 1/O/O -END-/END/END
Predicted:	 ['START', 'I-PER', 'B-LOC', 'B-MISC', 'B-LOC', 'B-ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'END']
Gold:		 ['START', 'I-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'END']
----------------------------
-START-/START/START Scarborough/I-ORG/I-ORG 4/B-LOC/O 1/O/O 3/O/O 0/O/O 5/O/O 3/O/O 6/O/O -END-/END/END
Pr

  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 91 = 10326.791015625
conlleval:
processed 51578 tokens with 5942 phrases; found: 21327 phrases; correct: 3457.
accuracy:  64.87%; precision:  16.21%; recall:  58.18%; FB1:  25.35
              LOC: precision:  30.34%; recall:  73.00%; FB1:  42.86  4420
             MISC: precision:  14.25%; recall:  67.90%; FB1:  23.55  4394
              ORG: precision:   4.29%; recall:  31.17%; FB1:   7.53  9754
              PER: precision:  38.85%; recall:  58.20%; FB1:  46.60  2759



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 92 = 10326.720703125
conlleval:
processed 51578 tokens with 5942 phrases; found: 21491 phrases; correct: 3455.
accuracy:  64.52%; precision:  16.08%; recall:  58.15%; FB1:  25.19
              LOC: precision:  30.16%; recall:  73.22%; FB1:  42.72  4460
             MISC: precision:  14.14%; recall:  68.33%; FB1:  23.44  4454
              ORG: precision:   4.23%; recall:  31.02%; FB1:   7.45  9833
              PER: precision:  38.78%; recall:  57.76%; FB1:  46.40  2744



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 93 = 10326.7744140625
conlleval:
processed 51578 tokens with 5942 phrases; found: 21458 phrases; correct: 3455.
accuracy:  64.58%; precision:  16.10%; recall:  58.15%; FB1:  25.22
              LOC: precision:  30.15%; recall:  72.89%; FB1:  42.66  4441
             MISC: precision:  14.05%; recall:  67.68%; FB1:  23.27  4442
              ORG: precision:   4.24%; recall:  31.02%; FB1:   7.47  9801
              PER: precision:  38.79%; recall:  58.41%; FB1:  46.62  2774



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 94 = 10326.708984375
conlleval:
processed 51578 tokens with 5942 phrases; found: 21319 phrases; correct: 3455.
accuracy:  64.88%; precision:  16.21%; recall:  58.15%; FB1:  25.35
              LOC: precision:  29.99%; recall:  73.11%; FB1:  42.53  4478
             MISC: precision:  14.29%; recall:  68.33%; FB1:  23.64  4408
              ORG: precision:   4.27%; recall:  30.80%; FB1:   7.50  9669
              PER: precision:  38.68%; recall:  58.03%; FB1:  46.42  2764



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 95 = 10326.4921875
conlleval:
processed 51578 tokens with 5942 phrases; found: 21390 phrases; correct: 3444.
accuracy:  64.71%; precision:  16.10%; recall:  57.96%; FB1:  25.20
              LOC: precision:  30.48%; recall:  73.00%; FB1:  43.00  4400
             MISC: precision:  13.92%; recall:  67.35%; FB1:  23.08  4460
              ORG: precision:   4.29%; recall:  31.25%; FB1:   7.55  9757
              PER: precision:  38.33%; recall:  57.71%; FB1:  46.07  2773



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 96 = 10326.5712890625
conlleval:
processed 51578 tokens with 5942 phrases; found: 21378 phrases; correct: 3459.
accuracy:  64.63%; precision:  16.18%; recall:  58.21%; FB1:  25.32
              LOC: precision:  30.58%; recall:  72.78%; FB1:  43.07  4372
             MISC: precision:  14.21%; recall:  67.90%; FB1:  23.50  4405
              ORG: precision:   4.28%; recall:  31.25%; FB1:   7.52  9799
              PER: precision:  38.44%; recall:  58.47%; FB1:  46.38  2802



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 97 = 10326.423828125
conlleval:
processed 51578 tokens with 5942 phrases; found: 21431 phrases; correct: 3435.
accuracy:  64.58%; precision:  16.03%; recall:  57.81%; FB1:  25.10
              LOC: precision:  30.27%; recall:  73.11%; FB1:  42.81  4437
             MISC: precision:  14.03%; recall:  67.14%; FB1:  23.21  4412
              ORG: precision:   4.20%; recall:  30.72%; FB1:   7.39  9802
              PER: precision:  38.17%; recall:  57.60%; FB1:  45.91  2780



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 98 = 10326.4140625
conlleval:
processed 51578 tokens with 5942 phrases; found: 21591 phrases; correct: 3463.
accuracy:  64.32%; precision:  16.04%; recall:  58.28%; FB1:  25.16
              LOC: precision:  30.71%; recall:  72.78%; FB1:  43.19  4354
             MISC: precision:  14.20%; recall:  68.55%; FB1:  23.53  4451
              ORG: precision:   4.20%; recall:  31.32%; FB1:   7.40  10007
              PER: precision:  38.65%; recall:  58.31%; FB1:  46.48  2779



  0%|          | 0/300 [00:00<?, ?it/s]

loss on epoch 99 = 10326.341796875
conlleval:
processed 51578 tokens with 5942 phrases; found: 21424 phrases; correct: 3448.
accuracy:  64.57%; precision:  16.09%; recall:  58.03%; FB1:  25.20
              LOC: precision:  30.55%; recall:  72.62%; FB1:  43.01  4366
             MISC: precision:  14.14%; recall:  68.00%; FB1:  23.41  4434
              ORG: precision:   4.26%; recall:  31.32%; FB1:   7.49  9867
              PER: precision:  38.70%; recall:  57.93%; FB1:  46.40  2757



In [None]:
#Evaluation on test data
lstm.write_predictions(sentences_test, 'test_pred_lstm.txt')
!wget https://raw.githubusercontent.com/aritter/twitter_nlp/master/data/annotated/wnut16/conlleval.pl
!paste test test_pred_lstm.txt | perl conlleval.pl -d "\t"

## Initialization with GloVe Embeddings (5 points)

If you haven't already, implement the `init_glove()` method in `BasicLSTMtagger` above.

Rather than initializing word embeddings randomly, it is common to use learned word embeddings (GloVe or Word2Vec), as discussed in lecture.  To make this simpler, we have already pre-filtered [GloVe](https://nlp.stanford.edu/projects/glove/) embeddings to only contain words in the vocabulary of the CoNLL NER dataset, and loaded them into a dictionary (`GloVe`) at the beginning of this notebook.



## Character Embeddings (10 points)

Now that you have your basic LSTM tagger working, the next step is to add a convolutional network that computes word embeddings from character representations of words.  See Figure 2 and Figure 3 in the [Ma and Hovy](https://www.aclweb.org/anthology/P16-1101.pdf) paper.  We have provided code in `sentences2input_tensors` to convert sentences into lists of word and character indices.  See also [nn.Conv1d](https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html) and [MaxPool1d](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool1d.html).

Hint: The nn.Conv1d accepts input size $(N, C_{in}, L_{in})$, but we have input size $(N, \text{SLEN}, \text{CLEN}, \text{EMB_DIM})$. We can reshape and [permute](https://pytorch.org/docs/stable/generated/torch.permute.html) our input to satisfy the nn.Conv1d, and recover the dimensions later.

Make sure to save your predictions on the test set, for submission to GradeScope.  You should be able to achieve **90 F1 / 85 F1 on the dev/test sets**.

In [None]:
import torch.nn.functional as F


class CharLSTMtagger(BasicLSTMtagger): 
    def __init__(self, DIM_EMB=10, DIM_CHAR_EMB=30, DIM_HID=10):
        super(CharLSTMtagger, self).__init__(DIM_EMB=DIM_EMB, DIM_HID=DIM_HID)
        NUM_TAGS = max(tag2i.values())+1

        (self.DIM_EMB, self.NUM_TAGS) = (DIM_EMB, NUM_TAGS)
        #TODO: Initialize parameters.

    def forward(self, X, X_char, train=False):
        #TODO: Implement the forward computation.
        return torch.randn((X.shape[0], X.shape[1], self.NUM_TAGS))  #Random baseline.

    def sentences2input_tensors(self, sentences):
        (X, X_mask)   = prepare_input(sentences2indices(sentences, word2i))
        X_char        = prepare_input_char(sentences2indicesChar(sentences, char2i))
        return (X, X_mask, X_char)

    def inference(self, sentences):
        (X, X_mask, X_char) = self.sentences2input_tensors(sentences)
        pred = self.forward(X.cuda(), X_char.cuda()).argmax(dim=2)
        return [[i2tag[pred[i,j].item()] for j in range(len(sentences[i]))] for i in range(len(sentences))]

    def print_predictions(self, words, tags):
        Y_pred = self.inference(words)
        for i in range(len(words)):
            print("----------------------------")
            print(" ".join([f"{words[i][j]}/{Y_pred[i][j]}/{tags[i][j]}" for j in range(len(words[i]))]))
            print("Predicted:\t", Y_pred[i])
            print("Gold:\t\t", tags[i])

char_lstm_test = CharLSTMtagger(DIM_HID=7, DIM_EMB=300)
lstm_output    = char_lstm_test.forward(prepare_input(X[0:5])[0], prepare_input_char(X_char[0:5]))
Y_onehot       = prepare_output_onehot(Y[0:5])

print("lstm output shape:", lstm_output.shape)
print("Y onehot shape:", Y_onehot.shape)

In [None]:
#Training LSTM w/ character embeddings. Feel free to change number of epochs, optimizer, learning rate and batch size.

nEpochs = 10

def train_char_lstm(sentences, tags, lstm):
  #optimizer = optim.Adadelta(lstm.parameters(), lr=0.1)
  #TODO: initialize optimizer

    batchSize = 50

    for epoch in range(nEpochs):
        totalLoss = 0.0

        (sentences_shuffled, tags_shuffled) = shuffle_sentences(sentences, tags)
        for batch in tqdm.notebook.tqdm(range(0, len(sentences), batchSize), leave=False):
            lstm.zero_grad()
            #TODO: Gradient update


        print(f"loss on epoch {epoch} = {totalLoss}")
        lstm.write_predictions(sentences_dev, 'dev_pred')   #Performance on dev set
        print('conlleval:')
        print(subprocess.Popen('paste dev dev_pred | perl conlleval.pl -d "\t"', shell=True, stdout=subprocess.PIPE,stderr=subprocess.STDOUT).communicate()[0].decode('UTF-8'))

        if epoch % 10 == 0:
            s = sample(range(len(sentences_dev)), 5)
            lstm.print_predictions([sentences_dev[i] for i in s], [tags_dev[i] for i in s])

char_lstm = CharLSTMtagger(DIM_HID=500, DIM_EMB=300).cuda()
train_char_lstm(sentences_train, tags_train, char_lstm)

In [None]:
#Evaluation on test set
char_lstm.write_predictions(sentences_test, 'test_pred_cnn_lstm.txt')
!wget https://raw.githubusercontent.com/aritter/twitter_nlp/master/data/annotated/wnut16/conlleval.pl
!paste test test_pred_cnn_lstm.txt | perl conlleval.pl -d "\t"

## Conditional Random Fields (15 points)

Now we are ready to add a CRF layer to the `CharacterLSTMTagger`.  To train the model, implement `conditional_log_likelihood`, using the score (unnormalized log probability) of the gold sequence, in addition to the partition function, $Z(X)$, which is computed using the forward algorithm.  Then, you can simply use Pytorch's automatic differentiation to compute gradients by running backpropagation through the computation graph of the dynamic program (this should be very simple, so long as you are able to correctly implement the forward algorithm using a computation graph that is supported by PyTorch).  This approach to computing gradients for CRFs is discussed in Section 7.5.3 of the [Eisenstein Book](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)

You will also need to implement the Viterbi algorithm for inference during decoding.

After including CRF training and Viterbi decoding, you should be getting about **92 F1 / 88 F1 on the dev and test set**, respectively.

**IMPORTANT:** Note that training will be substantially slower this time - depending on the efficiency of your implementation, it could take about 5 minutes per epoch (e.g. 50 minutes for 10 iterations).  It is recommended to start out training on a single batch of data (and testing on this same batch), so that you can quickly debug, making sure your model can memorize the labels on a single batch, and then optimize your code.  Once you are fairly confident your code is working properly, then you can train using the full dataset.  We have provided a (commented out) line of code to switch between training on a single batch and the full dataset below.

**Hint #1:** While debugging your implementation of the Forward algorithm it is helpful to look at the loss during training.  The loss should never be less than zero (the log-likelihood should always be negative).

**Hint #2:** To sum log-probabilities in a numerically stable way at the end of the Forward algorithm, you will want to use [`torch.logsumexp`](https://pytorch.org/docs/stable/generated/torch.logsumexp.html).

In [None]:
#For F.max_pool1d()
import torch.nn.functional as F

class LSTM_CRFtagger(CharLSTMtagger):
    def __init__(self, DIM_EMB=10, DIM_CHAR_EMB=30, DIM_HID=10, N_TAGS=max(tag2i.values())+1):
        super(LSTM_CRFtagger, self).__init__(DIM_EMB=DIM_EMB, DIM_HID=DIM_HID, DIM_CHAR_EMB=DIM_CHAR_EMB)

        #TODO: Initialize parameters.

        self.transitionWeights = nn.Parameter(torch.zeros((N_TAGS, N_TAGS), requires_grad=True))
        nn.init.normal_(self.transitionWeights)

    def gold_score(self, lstm_scores, Y):
        #TODO: compute score of gold sequence Y (unnormalized conditional log-probability)
        return 0

    #Forward algorithm for a single sentence
    #Efficiency will eventually be important here.  We recommend you start by 
    #training on a single batch and make sure your code can memorize the 
    #training data.  Then you can go back and re-write the inner loop using 
    #tensor operations to speed things up.
    def forward_algorithm(self, lstm_scores, sLen):
        #TODO: implement forward algorithm.
        return 0

    def conditional_log_likelihood(self, sentences, tags, train=True):
        #Todo: compute conditional log likelihood of Y (use forward_algorithm and gold_score)
        return 0

    def viterbi(self, lstm_scores, sLen):
        #TODO: Implement Viterbi algorithm, soring backpointers to recover the argmax sequence.  Returns the argmax sequence in addition to its unnormalized conditional log-likelihood.
        return (torch.as_tensor([random.randint(0,lstm_scores.shape[1]-1) for x in range(sLen)]), 0)

    #Computes Viterbi sequences on a batch of data.
    def viterbi_batch(self, sentences):
        viterbiSeqs = []
        (X, X_mask, X_char) = self.sentences2input_tensors(sentences)
        lstm_scores = self.forward(X.cuda(), X_char.cuda())
        for s in range(len(sentences)):
            (viterbiSeq, ll) = self.viterbi(lstm_scores[s], len(sentences[s]))
            viterbiSeqs.append(viterbiSeq)
        return viterbiSeqs

    def forward(self, X, X_char, train=False):
        #TODO: Implement the forward computation.
        return torch.randn((X.shape[0], X.shape[1], self.NUM_TAGS))  #Random baseline.

    def print_predictions(self, words, tags):
        Y_pred = self.inference(words)
        for i in range(len(words)):
            print("----------------------------")
            print(" ".join([f"{words[i][j]}/{Y_pred[i][j]}/{tags[i][j]}" for j in range(len(words[i]))]))
            print("Predicted:\t", [Y_pred[i][j] for j in range(len(words[i]))])
            print("Gold:\t\t", tags[i])

    #Need to use Viterbi this time.
    def inference(self, sentences, viterbi=True):
        pred = self.viterbi_batch(sentences)
        return [[i2tag[pred[i][j].item()] for j in range(len(sentences[i]))] for i in range(len(sentences))]

lstm_crf = LSTM_CRFtagger(DIM_EMB=300).cuda()

In [None]:
print(lstm_crf.conditional_log_likelihood(sentences_dev[0:3], tags_dev[0:3]))

In [None]:
#CharLSTM-CRF Training. Feel free to change number of epochs, optimizer, learning rate and batch size.
import tqdm
import os
import subprocess
import random

nEpochs = 10

#Get CoNLL evaluation script
os.system('wget https://raw.githubusercontent.com/aritter/twitter_nlp/master/data/annotated/wnut16/conlleval.pl')

def train_crf_lstm(sentences, tags, lstm):
    #optimizer = optim.Adadelta(lstm.parameters(), lr=1.0)
    #TODO: initialize optimizer

    batchSize = 50

    for epoch in range(nEpochs):
        totalLoss = 0.0
        lstm.train()

        #Shuffle the sentences
        (sentences_shuffled, tags_shuffled) = shuffle_sentences(sentences, tags)
        for batch in tqdm.notebook.tqdm(range(0, len(sentences), batchSize), leave=False):
            lstm.zero_grad()
            #TODO: take gradient step on a batch of data.

        print(f"loss on epoch {epoch} = {totalLoss}")
        lstm.write_predictions(sentences_dev, 'dev_pred')   #Performance on dev set
        print('conlleval:')
        print(subprocess.Popen('paste dev dev_pred | perl conlleval.pl -d "\t"', shell=True, stdout=subprocess.PIPE,stderr=subprocess.STDOUT).communicate()[0].decode('UTF-8'))

        if epoch % 10 == 0:
            lstm.eval()
            s = random.sample(range(50), 5)
            lstm.print_predictions([sentences_train[i] for i in s], [tags_train[i] for i in s])   #Print predictions on train data (useful for debugging)

crf_lstm = LSTM_CRFtagger(DIM_HID=500, DIM_EMB=300, DIM_CHAR_EMB=30).cuda()
train_crf_lstm(sentences_train, tags_train, crf_lstm)             #Train on the full dataset
#train_crf_lstm(sentences_train[0:50], tags_train[0:50])          #Train only the first batch (use this during development/debugging)

In [None]:
crf_lstm.eval()
crf_lstm.write_predictions(sentences_test, 'test_pred_cnn_lstm_crf.txt')
!wget https://raw.githubusercontent.com/aritter/twitter_nlp/master/data/annotated/wnut16/conlleval.pl
!paste test test_pred_cnn_lstm_crf.txt | perl conlleval.pl -d "\t"

## Gradescope

Gradescope allows you to add multiple files to your submission. Please submit this notebook along with the test set prediction:
* test_pred_lstm.txt
* test_pred_cnn_lstm.txt
* test_pred_cnn_lstm_crf.txt
* NER_release.ipynb

To download this notebook, go to `File > Download.ipynb`. You can download the predictions from Colab by clicking the folder icon on the left and finding them under Files. 

Please make sure that you name the files as specified above. You will be able to see the test set accuracy for your predictions. However, the final score will be assigned later based on accuracy and implementation. 

When submitting the .ipynb notebook, please make sure that all the cells run when executed in order starting from a fresh session. If the code doesn't take too long to run, you can re-run everything with `Runtime -> Restart and run all`

You can submit multiple times before the deadline and choose the submission which you want to be graded by going to `Submission History` on gradescope.
