# Lab 4: Recurrent models

This lab is supposed to give you some initial practice with neural models in NLP.

**This is the complete Lab 4, in two parts.** The purpose of the first part of the lab is to get you started with using neural models. The second part of the lab contains exercises on ELMo embeddings, applying them to the task of word sense disambuiguation following the approach from the original paper by Peters et al.


## Part 1 (50 points)

In the first part of lab 4, we will play with training a recurrent model for part of speech tagging. As an easy exercise, you will observe what happens when you plug in pretrained word embeddings into an neural NLP model and will experiment with different sizes of training data.

## Exercise 1: prepare the data (5 points)

Linguistic data come in a variety of formats. You already had a chance to play with POS-annotated corpus data in Lab 1.

In the first exercise, you will access POS-annotated data in one format (NLTK) and save it on the disk in a text format. Start with the tagged sentences from the Brown corpus, which can be retrieved as below:

In [None]:
import random
import numpy as np
import nltk
nltk.download('brown')
import nltk
nltk.download('semcor')
from nltk.corpus import brown
brown.tagged_sents()
!pip install allennlp==0.9
import allennlp

Now randomize the order of all sentences in the corpus using <code>random.shuffle()</code> function and split it into 50K sentences for training, 5K for validation, and the rest for testing.

In [2]:
#Write your code here

tagged_sentence = list(brown.tagged_sents())
random.shuffle(tagged_sentence)

training_brown= tagged_sentence[:50000]
validation_brown=tagged_sentence[50000:55000]
testing_brown=tagged_sentence[55000:]




Define a function for saving your datasets to a text file in the following format:
* one sentence per line
* tokens separated by spaces
* POS tag separated from the token by "###", for example <code>said###VBD</code>.

In [3]:
def write_posdata(sentences,outfile):
    #Write your code here
    with open(outfile, 'w') as f:
      for list_ in sentences:
        for sentence in list_:
          temp = '###'.join(sentence[:])
          f.write(temp + ' ')
        f.write('\n')

Now save your data partitions in different sizes. We will start with small data samples since training on a large dataset may be very slow depending on your machine.

In [4]:
write_posdata(training_brown,"train_brown.txt")
write_posdata(testing_brown,"test_brown.txt")
write_posdata(validation_brown,"validation_brown.txt")
write_posdata(training_brown[:50],"train_brown_50.txt")
write_posdata(validation_brown[:50],"validation_brown_50.txt")
write_posdata(training_brown[:500],"train_brown_500.txt")
write_posdata(validation_brown[:500],"validation_brown_500.txt")
write_posdata(training_brown[:5000],"train_brown_5000.txt")
write_posdata(training_brown[:5000],"validation_brown_5000.txt")

Congratulations, you have now saved the POS tagged data for model training purposes!

## Exercise 2: train neural POS tagger models (35 points)

We will now play with a neural model. First of all, install <code>allennlp</code>. The LSTM model we will train follows the AllenNLP tutorial https://allennlp.org/tutorials which contains ample explanations of the underlying code. Let us start by loading the model code and data, starting with a tiny sample for demonstration purposes

In [5]:
from lstm_tutorial import *

train_dataset_tiny = reader.read("train_brown_50.txt")
validation_dataset_tiny = reader.read("validation_brown_50.txt")

50it [00:00, 18449.48it/s]
50it [00:00, 9887.56it/s]


Fist of all we need to initialize the vocabulary and define an embedding (vector) for each token. We set the embedding size at 300, common in realistic applications. By default, the embeddings are initialized randomly and updated during trining (this can be changed but we start with a standard configuration). We also need to specify the <code>HIDDEN_DIM</code> parameter: the dimensionality of the hidden vector representations in the LSTM cell.

In [6]:
vocab_tiny = Vocabulary.from_instances(train_dataset_tiny + validation_dataset_tiny)

EMBEDDING_DIM = 300
HIDDEN_DIM = 20

token_embedding_tiny = Embedding(num_embeddings=vocab_tiny.get_vocab_size('tokens'),
                            embedding_dim=EMBEDDING_DIM)

100%|██████████| 100/100 [00:00<00:00, 18257.54it/s]


Download the smallest pretrained word vector model from https://nlp.stanford.edu/projects/glove/, unzip it, and extract the relevant file <code>'glove.6B.300d.txt'</code> in your working directory.

In [None]:
glove_token_embedding_tiny = Embedding.from_params(vocab=vocab_tiny,
                            params=Params({'pretrained_file':'glove.6B.300d.txt',
                                           'embedding_dim' : EMBEDDING_DIM}))

400000it [00:01, 260294.89it/s]


Now from embedding a single word with <code>token_embedding_tiny</code> we can proceed to mapping a word sequence into a sequence of vectors:

In [None]:
word_embeddings_tiny = BasicTextFieldEmbedder({"tokens": token_embedding_tiny})

The following initializes parameters of an LSTM model using <code>word_embeddings_tiny</code> input encoding

In [None]:
lstm = PytorchSeq2SeqWrapper(torch.nn.LSTM(EMBEDDING_DIM, HIDDEN_DIM, batch_first=True))

model_tiny = LstmTagger(word_embeddings_tiny, lstm, vocab_tiny)

Now define an LSTM model called <code>glove_model_tiny</code> that uses <code>glove_token_embedding_tiny</code>:

In [None]:
#write your code here
glove_embeddings_tiny = BasicTextFieldEmbedder({"tokens": glove_token_embedding_tiny})
glove_model_tiny = LstmTagger(glove_embeddings_tiny, lstm, vocab_tiny)

Train the basic model for the tiny dataset:

In [None]:
basic_trainer_tiny=initialize_trainer(model_tiny,vocab_tiny,train_dataset_tiny,validation_dataset_tiny,batch_size=50)
basic_trainer_tiny.train()

You have trained an LSTM POS tagger for the basic model. Now train the <code>glove_model_tiny</code>. 

In [None]:
#Write your code here
glove_trainer_tiny=initialize_trainer(glove_model_tiny,vocab_tiny,train_dataset_tiny,validation_dataset_tiny,batch_size=50)
glove_trainer_tiny.train()

## Exercise 3: Explore training parameters (10 points)

Create separate models on the basis of bigger datasets: the 500 sentence training and 500 sentence validation and 5000 sentence training and 5000 sentence validation. Using the full training set (50K sentences) is optional (your machine might be too slow). Initialize and train the basic model on 500 sentence training and 500 sentence validation data:

In [None]:
#train the basic model on 500 sentences

train_dataset_tiny_500 = reader.read("train_brown_500.txt")
validation_dataset_tiny_500 = reader.read("validation_brown_500.txt")
vocab_tiny_500 = Vocabulary.from_instances(train_dataset_tiny_500 + validation_dataset_tiny_500)

token_embedding_tiny_500 = Embedding(num_embeddings=vocab_tiny_500.get_vocab_size('tokens'),
                            embedding_dim=EMBEDDING_DIM)

word_embeddings_tiny_500 = BasicTextFieldEmbedder({"tokens": token_embedding_tiny_500})
model_tiny_500 = LstmTagger(word_embeddings_tiny_500, lstm, vocab_tiny_500)

basic_trainer_tiny_500=initialize_trainer(model_tiny_500,vocab_tiny_500,train_dataset_tiny_500,validation_dataset_tiny_500,batch_size=5)
basic_trainer_tiny_500.train()

Now do the same training (500 sentence training and 500 sentence validation sets) with GloVE embeddings:

In [None]:
glove_token_embedding_tiny_500 = Embedding.from_params(vocab=vocab_tiny_500,
                            params=Params({'pretrained_file':'glove.6B.300d.txt',
                                           'embedding_dim' : EMBEDDING_DIM}))
glove_word_embeddings_tiny_500 = BasicTextFieldEmbedder({"tokens": glove_token_embedding_tiny_500})
glove_model_tiny_500 = LstmTagger(glove_word_embeddings_tiny_500, lstm, vocab_tiny_500)

glove_trainer_tiny_500=initialize_trainer(glove_model_tiny_500,vocab_tiny_500,train_dataset_tiny_500,validation_dataset_tiny_500,batch_size=50)
glove_trainer_tiny_500.train()

Use a bigger training set now with 5K sentence training and 5K sentence validation sets and random initial embeddings:

In [None]:
train_dataset_tiny_5000 = reader.read("train_brown_5000.txt")
validation_dataset_tiny_5000 = reader.read("validation_brown_5000.txt")
vocab_tiny_5000 = Vocabulary.from_instances(train_dataset_tiny_5000 + validation_dataset_tiny_5000)

token_embedding_tiny_5000 = Embedding(num_embeddings=vocab_tiny_5000.get_vocab_size('tokens'),
                            embedding_dim=EMBEDDING_DIM)

word_embeddings_tiny_5000 = BasicTextFieldEmbedder({"tokens": token_embedding_tiny_5000})
model_tiny_5000 = LstmTagger(word_embeddings_tiny_5000, lstm, vocab_tiny_5000)

basic_trainer_tiny_5000=initialize_trainer(model_tiny_5000,vocab_tiny_5000,train_dataset_tiny_5000,validation_dataset_tiny_5000,batch_size=50)
basic_trainer_tiny_5000.train()

Now do the same training (5K sentence training and 5K sentence validation sets) with GloVE embeddings:

In [None]:
glove_token_embedding_tiny_5000 = Embedding.from_params(vocab=vocab_tiny_5000,
                            params=Params({'pretrained_file':'glove.6B.300d.txt',
                                           'embedding_dim' : EMBEDDING_DIM}))
glove_word_embeddings_tiny_5000 = BasicTextFieldEmbedder({"tokens": glove_token_embedding_tiny_5000})
glove_model_tiny_5000 = LstmTagger(glove_word_embeddings_tiny_5000, lstm, vocab_tiny_5000)

glove_trainer_tiny_5000=initialize_trainer(glove_model_tiny_5000,vocab_tiny_5000,train_dataset_tiny_5000,validation_dataset_tiny_5000,batch_size=50)
glove_trainer_tiny_5000.train()

For each trained model, record validation accuracy and training duration (they are returned along with other training stats after training a model) and accuracy on the training set. Fill in the numbers in the table below:

| model | validation accuracy | training accuracy | training duration|
|-------|---------------------|---------------|-------------------------------------------
| basic model on 50 sentences||||
| glove model on 50 sentences||||
| basic model on 500 sentences|0.73|0.93|0:04:12|
| glove model on 500 sentences|0.78|0.91|0:05:32|
| basic model on 5000 sentences|0.98|0.98|0:41:50|
| glove model on 5000 sentences|0:95|0:95|0:41:54|

**Question.** What do you conclude from these comparisons? when can it be especially beneficial to initialize a model with pretrained embeddings?

**Answer.** WRITE YOU ANSWER HERE

From the table, we see that the glove model generalizes better to the  validation data than the basic model. However, we also notiice that the perfomance of the basic model increased with larger training data. It is beneficially to use pre-trained data when you have small training set and some similarity between the pretrained model and actual training data.

During training, data is processed in batches so that the model performs computation for multiple examples simultaneously. How does batching affect model training? Modify the training to have smaller batches of data - let's use batches of 5 or 500 instead of 50. How does this affect the results? 

In [None]:
#Define your trainers with alternative batching here: batches of 5, 50 sentences
basic_trainer_tiny_50_b5=initialize_trainer(model_tiny,vocab_tiny,train_dataset_tiny,validation_dataset_tiny,batch_size=5)
basic_trainer_tiny_50_b5.train()

In [None]:
# batches of 5, 500 sentences
basic_trainer_tiny_500_b5=initialize_trainer(model_tiny_500,vocab_tiny_500,train_dataset_tiny_500,validation_dataset_tiny_500,batch_size=5)
basic_trainer_tiny_500_b5.train()

In [None]:
#batches of 500, 50 sentences
basic_trainer_tiny_50_b500=initialize_trainer(model_tiny,vocab_tiny,train_dataset_tiny,validation_dataset_tiny,batch_size=500)
basic_trainer_tiny_50_b500.train()

In [None]:
#batches of 500, 500 sentences
basic_trainer_tiny_500_b500=initialize_trainer(model_tiny_500,vocab_tiny_500,train_dataset_tiny_500,validation_dataset_tiny_500,batch_size=500)
basic_trainer_tiny_500_b500.train()

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Report your results below:

**batches of 5**:

| model | validation accuracy | training accuracy | training duration|
|-------|---------------------|---------------|-------------------------------------------
| basic model on 50 sentences|0.54|0.90|0:00:59|
| basic model on 500 sentences|0.73|0.94|0:01:25|

**batches of 500**:

| model | validation accuracy | training accuracy | training duration|
|-------|---------------------|---------------|-------------------------------------------
| basic model on 50 sentences|0.55|0.92|0:00:37|
| basic model on 500 sentences|0:73|0.95|0:02:12|

**Question.** What do these results tell you?
**Answer.** WRITE YOUR ANSWER HERE

The larger the batch size the better the performance on both the training and validation set. By seeing more of the data during each training iteration, we see that the performance of model increases, however, we notice that the model significantly overfits when it has a small training data and a large batch size.

## Comment 
In this lab we used pretrained GloVe embeddings in a model for part of speech tagging. GloVe in its turn is also a neural word embedding model, but it had been trained on a completely different objective. GloVe vectors had been optimised on word cooccurrence matrix decomposition, i.e. on the task of predicting which words tend to occur with which other words. Part of speech certainly plays a role in determining statistical cooccurrence of words, but this role is indirect, and explicit part of speech information has not been used in training GloVe.

This makes our application an example of **transfer learning**, whereby a learned model trained on one objective (e.g. word cooccurrence) can benefit a different application (e.g. POS tagging), because some information is shared between them. 

## Part 2 - ELMo vectors (50 points)

In the second part of this lab we will reproduce the word sense disambiguation strategy that the authors of the ELMo vectors explored. The strategy consists in the following:

- create ELMo embeddings for all tokens in a sense-annotated corpus
- calculate mean sense vectors for each word sense in the training partition of the corpus
- for each sense-annotated token in the test partition of the corpus, assign it to the sense of the word to which its ELMo vector is the closest according to the cosine distance metric
- as a backup strategy, use the 1st sense of the word by default.

As a sense annotated corpus, we can use SemCor, conveniently available within NLTK. <code>semcor.sents()</code> iterates over all sentences represented as lists of tokens, while <code>semcor.tagged_sents()</code> iterates over the same sentences with additional annotation including WordNet lemma identifiers (lemmas in WordNet stand for a word taken in a specific sense).

In [7]:
from nltk.corpus import semcor
from nltk.corpus import wordnet as wn
import nltk
nltk.download('wordnet')
#wn.lemmas("fish")
semcor.sents()
semcor.tagged_sents(tag="sem")
semcor_sentences = list(semcor.tagged_sents(tag="sem"))

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


## Exercise 1. Extract relevant data from SemCor (5 points)

First, split all the sentences in SemCor randomly into 90% training and 10% testing partitions:

In [8]:
#write your code here
random.shuffle(semcor_sentences)
semcor_train= semcor_sentences[:int(0.9*len(semcor_sentences))]
semcor_test= semcor_sentences[int(0.9*len(semcor_sentences)):]


Create a function that takes as input a sentence from SemCor and extracts a list which contains, for each token of the sentence, either the corresponding WordNet Lemma (e.g. <code>Lemma('friday.n.01.Friday')</code>) or <code>None</code>. <code>None</code> corresponds to tokens that are either 1) not annotated for word senses (e.g. articles); 2) are marked up as (part of) a named entity (e.g. "City of Atlanta" or placename "Fulton" annotated as  <code>Tree(Lemma('location.n.01.location'), [Tree('NE', ['Fulton'])])</code>)

In [14]:
def get_lemmas(semcor_sentence):
  #sentence= semcor_sentence
  word_to_lemma = dict()
  for i in range(len(semcor_sentence)):
    if str(type(semcor_sentence[i])) == "<class 'nltk.tree.Tree'>":
      temp = str(semcor_sentence[i])
      temp = temp.strip('( )')
      if len(temp.split()) == 2 and  ("Lemma('location.n.01.location')" not in temp or "(NE" not in temp):
        word_to_lemma[temp.split()[1]] =  temp.split()[0]
      elif len(temp.split()) > 2 or ("Lemma('location.n.01.location')" in temp or "(NE"  in temp):
        for token in temp.split()[1:]:
          if token != "(NE":
            word_to_lemma[token] = "None"
    elif type(semcor_sentence[i]) == list:
      word_to_lemma["".join(semcor_sentence[i])] = "None"


  return word_to_lemma    #we return a dictionary instead of just the list of lemmas/none incase we need to do a lookup later and always get the list of values from the dict
 

You are now able to extract word senses (instantiated by WordNet lemmas) from the corpus. The next step is to associate senses with ELMo vectors. Create a dictionary of contextualized token embeddings from the training corpus grouped by the WordNet sense:

In [25]:
from collections import defaultdict
import numpy

Train_embeddings=defaultdict(list)

Now let's create contextualized ELMo word embeddings for the tokens in this corpus. We can load the pretrained ELMo model and define a function <code>sentences_to_elmo()</code> that receives a list of tokenized sentences as input and produces their ELMo vectors.

In [11]:
from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = "https://allennlp.s3.amazonaws.com/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json"
weight_file = "https://allennlp.s3.amazonaws.com/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5"
elmo = Elmo(options_file, weight_file, 1, dropout=0)

def sentences_to_elmo(sentences):
    character_ids = batch_to_ids(sentences)
    embeddings = elmo(character_ids)
    return embeddings

Now you can process the corpus sentences and produce their ELMo vectors. It is recommended to pass the input to ELMo encoder in batches. A suggested batch size is 50 sentences. For example, the code below processes the first 50 sentences from the corpus:

In [None]:
sentences=semcor.sents()[:50]
embeddings=sentences_to_elmo(sentences)

[['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday', 'an', 'investigation', 'of', 'Atlanta', "'s", 'recent', 'primary', 'election', 'produced', '``', 'no', 'evidence', "''", 'that', 'any', 'irregularities', 'took', 'place', '.'], ['The', 'jury', 'further', 'said', 'in', 'term', 'end', 'presentments', 'that', 'the', 'City', 'Executive', 'Committee', ',', 'which', 'had', 'over-all', 'charge', 'of', 'the', 'election', ',', '``', 'deserves', 'the', 'praise', 'and', 'thanks', 'of', 'the', 'City', 'of', 'Atlanta', "''", 'for', 'the', 'manner', 'in', 'which', 'the', 'election', 'was', 'conducted', '.'], ['The', 'September', 'October', 'term', 'jury', 'had', 'been', 'charged', 'by', 'Fulton', 'Superior', 'Court', 'Judge', 'Durwood', 'Pye', 'to', 'investigate', 'reports', 'of', 'possible', '``', 'irregularities', "''", 'in', 'the', 'hard-fought', 'primary', 'which', 'was', 'won', 'by', 'Mayor-nominate', 'Ivan', 'Allen', 'Jr.', '.'], ['``', 'Only', 'a', 'relative', 'handful', 'of', 's

The <code>embeddings</code> that we obtained is a dictionary that contains a list of ELMo embeddings and a list of masks. The mask tells us which embeddings correspond to tokens in the original input sentences and which correspond to the padding (introduced to give all sentences in the batch the same length).
In principle all embeddings are stored in PyTorch tensors so that they can be used in bigger neural models, but we are not going to do it now. For our purposes, PyTorch tensors can be converted to numpy arrays:

In [None]:
embeddings['elmo_representations'][0].detach().numpy()

We can check the size of the embeddings we got. It has three dimensions: 1) the number of sentences 2) the number of tokens (corresponds to the tokens in the longest original sentence of the batch; shorter ones were padded)

In [None]:
embeddings['elmo_representations'][0].detach().size()

torch.Size([50, 59, 1024])

Another thing contained in the <code>embeddings</code> is the mask, a tensor encoding which tokens vectors correspond to original tokens and which are paddings:

In [None]:
embeddings['mask']



tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        ...,
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0]])

## Exercise 2. Extract ELMo encoding of sentences using a mask (5 points)  

Now define a function <code>get_masked_vectors(embeddings)</code> that takes embeddings as input and returns a list of ELMo sentence encodings to which the mask has been applied, i.e. where the padding vectors have been removed so the representation of each sentence contains as many vectors as there were tokens in the original sentence.

In [13]:
def get_masked_vectors(embeddings):
    #Your code here
    sentences =list()
    emb = embeddings['elmo_representations'][0].detach().numpy()
    for i in range(len(embeddings['mask'])):
      index_no_padding = max(np.where(embeddings['mask'][i] == 1)[0])
      sentence = emb[i,:index_no_padding+1]
      sentences.append(sentence)

    return sentences


## Exercise 3. Collect ELMo vectors from the training corpus (15 points)

Process the corpus updating your train word sense vectors. Iterate over the all the train sentences in the corpus, and retrieve for each lemma-annotated token (where lemma is not <code>None</code>) the corresponding ELMo vector. Store the ELMo sense embeddings that correspond to each lemma in the dictionary <code>Train_embeddings</code>.

In [23]:
del Train_embeddings

In [None]:
#Your code here
counter =0
j_index=50
i_index=0
while j_index <= len(semcor_train[:10000]): #RAM issues when i try to train on more dataset 
  print(i_index,j_index)
  token_with_lemmas = list()
  lemmas_list = list()
  tokens_ = list()
  lemmas_ = list()
  semcor_train_ = semcor_train[i_index : j_index]

  for each in semcor_train_:
    lemas = get_lemmas(each)
    for tokken, lemma in lemas.items():
      if lemma != "None":
        token_with_lemmas.append(tokken)
        lemmas_list.append(lemma)
    if len(token_with_lemmas) !=0:
      tokens_.append(token_with_lemmas)
      lemmas_.append(lemmas_list)
    token_with_lemmas = []
    lemmas_list = []

  sentence_encode = get_masked_vectors(sentences_to_elmo(tokens_))
  for i in range(len(sentence_encode)):
    for j, lemma in enumerate(lemmas_[i]):
      Train_embeddings[lemma].append(sentence_encode[i][j]) 
  tokens_ =[]
  lemmas_ = []
  sentence_encode = []
  i_index = j_index
  j_index += 50
  

#divide training embeddings into different dictionaries then combine at the end. 

  

## Exercise 4. Vector averaging (5 points)

Now you can calculate the average ELMo vector for each word sense in the training corpus:

In [27]:
#Your code here
Train_embeddings_=dict()
sum_vec =0
for lemma, vectors  in Train_embeddings.items():
  length_vec = len(vectors)
  for vector in vectors:
    sum_vec += vector/ length_vec
  Train_embeddings_[lemma] = sum_vec

## Exercise 5. Testing the sense vectors (20 points)

Test your sense embeddings on your test data, which is a subset of the SemCor corpus. Use the strategy outlined above, with 1st WordNet sense as a fallback: 

- rely on mean sense vectors for each word sense in the training partition of the corpus
- for each sense-annotated token <i>t</i> (e.g. the verb "run") in the test partition of the corpus, assign it to the sense of the word "Lemma(*.v*.run)" to which ithe ELMo vector <i>t</i> is the closest according to the cosine distance metric
- as a backup strategy, use the 1st sense of the word (e.g. <code>Lemma(run.v.01.run)</code>) by default.

Report WSD accuracy in percentage points on your test data.

In [None]:
# Your code here
from sklearn.metrics.pairwise import cosine_similarity
lema_vec = list()
lemmas_sysnet =list()
backup_strategy = list()
cosine_values = list()
predicted_lemma = defaultdict(list)
lemmas_in_training = list()
count = 0
total_lemmas_count =0
for sentences in semcor_test:
  sentence_dict = get_lemmas(sentences)
  for word, values in sentence_dict.items():
    if values != "None":
      total_lemmas_count +=1
      lemmas = wn.lemmas(word)
      lemmas_sysnet.append(wn.synsets(word))
      lemmas_sysnet_ =[item for i in lemmas_sysnet for item in i]
      for i,lemma in enumerate(lemmas):
        if str(lemma) in Train_embeddings_.keys():
          lemma_elmo = get_masked_vectors(sentences_to_elmo(lemmas_sysnet_[i].name()))[0][0]
          elmo_cosine_score = cosine_similarity(Train_embeddings_[str(lemma)].reshape((1,-1)), lemma_elmo.reshape(1,-1))
          cosine_values.append(elmo_cosine_score)
          lemmas_in_training.append(lemma)
        else:
          backup = lemmas[0]
          backup_strategy.append(backup)
 
      cosine_values_ = sorted(cosine_values, reverse= True)
      lemmas_in_training_ = [x for _,x in sorted(zip(cosine_values,lemmas_in_training),reverse=True)]

      if len(lemmas_in_training_) != 0:
        if str(values) == str(lemmas_in_training_[0]):
          count += 1
      elif len(backup_strategy) !=0:
        if str(values) == str(backup_strategy[0]):
          count += 1

      lema_vec = []
      lemmas_sysnet =[]
      backup_strategy = []
      cosine_values = []
      lemmas_in_training_ =[]


#Accuracy

accuracy =  count/total_lemmas_count



print(accuracy)

## The end
Congratulations! this is the end of Lab 4.

**Acknowledgements** Tejaswini Deoskar has given valuable comments that helped improve this lab assignment. Timothee Mickus helped to test this assignment and gave extensive feedback on the instructions. Many thanks to both.