Copy this notebook (File>Save a copy in Drive), rename it using your name, and then work on your copy.
==
To send me your work: use the sharing menu (top-right of the window) to share it with timothee.m.r.bernard@gmail.com.
(I don't check this address very often, so, for questions, please use Moodle or my u-paris.fr address.)

Goal
==

We are about to design and train a neural system to perform sentiment analysis on film reviews. More precisely, the network will have to output the probability that the input review expresses a positive opinion (overall).

The system will be a bag-of-words model using GloVe embeddings. It will have to first average the embeddings of the words of the input review, and then send the result through a simple network that should output a probability.
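
As a rough illustration of the bag-of-words idea described above, the following sketch averages a few made-up 4-dimensional vectors and maps the result to a probability. The vectors are random stand-ins for GloVe embeddings and the linear layer is untrained, so every name and value here is an assumption for illustration only.

```python
import torch

torch.manual_seed(0)

# Toy "embeddings" for a 5-word vocabulary (random values, not GloVe). Shape: (5, 4)
toy_embeddings = torch.randn(5, 4)

# A review represented as a sequence of word ids. Shape: (3)
review_ids = torch.tensor([0, 3, 1])

# Bag-of-words representation: the average of the word vectors. Shape: (4)
bow = toy_embeddings[review_ids].mean(dim=0)

# A simple (untrained) network: one linear layer followed by a sigmoid.
network = torch.nn.Sequential(torch.nn.Linear(4, 1), torch.nn.Sigmoid())
probability = network(bow) # Shape: (1)
print(probability.item())
```

Because the representation is an average, the order of the words in the review has no effect on the output.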

There is a lot of code already written at the beginning of the notebook. It is important that you understand it as you will have to reuse/reproduce it for future work.

Remarks:
==
*   Follow the instructions very carefully. Do not ignore any comment.
*   Keep in mind all remarks given in TP 1.
*   Comment your code (including the role of all functions and the type of their arguments). A piece of code that is not appropriately commented can be considered incorrect (irrespective of whether it works or not).
*   Indicate the shape of each tensor that you define.
*   Comment all the changes that you make. Any work that is not properly explained might be ignored.

Loading PyTorch
==

In [None]:
# Imports PyTorch.
import torch

Downloading the dataset
==
The dataset we are going to use is the Large Movie Review Dataset (https://ai.stanford.edu/~amaas/data/sentiment/).

Downloading the dataset and pre-processing it might take several minutes, so ask Colab to execute all cells while you are reading the code.

In [None]:
# Downloads the dataset.
import urllib.request # The 'request' submodule has to be imported explicitly.

tmp = urllib.request.urlretrieve("https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz")
filename = tmp[0]

In [None]:
filename

'/tmp/tmpnqmq8cph'

In [None]:
# Extracts the dataset.
import tarfile
tar = tarfile.open(filename)
tar.extractall()
tar.close()

In [None]:
import os # Useful library to read files and inspect directories.

In [None]:
# Shows which files and directories are present in the current working directory.
for filename in os.listdir("."):
  print(filename)

.config
aclImdb
sample_data


In [None]:
dataset_root = "aclImdb"
# Shows which files and directories are present at the root of the dataset directory.
for filename in os.listdir(dataset_root):
  print(filename)

train
test
imdb.vocab
imdbEr.txt
README


In [None]:
# Shows several reviews.
dirname = os.path.join(dataset_root, "train", "neg") # "aclImdb/{train|test}/{neg|pos}"
for idx, filename in enumerate(os.listdir(dirname)):
  if(idx >= 5): break # Stops after the 5th file.

  print(filename)
  with open(os.path.join(dirname, filename)) as f:
    review = f.read()
    print(review)
  print()

8510_1.txt
When we were in junior high school, some of us boys would occasionally set off stinkbombs. It was considered funny then. But the producers, directors and cast of "Semana Santa" ("Angel of Death" in the DVD section of your local video rental) are adults and they are STILL setting them off.<br /><br />Like the previous reviewer who wondered if the cast were anxious to get off the set and home, I doubt more than one take was done for any of the scenes.<br /><br />Mira Sorvino, hot in "Mighty Aphrodite" and other top-rated films, seems to have undersold herself to this project. Her acting is non-existent, confined mostly to wistful stares that are supposed to indicate how "sensitive" she is to the plight of the film's various victims.<br /><br />But let me warn you--do not be the next victim! Step away from the DVD if you find it on the shelf. Tbere are not many good leg shots of Mira (the only high points I could find in the film) and the supporting cast is of inferior quality,

Preprocessing the dataset
==

In [None]:
import nltk # Imports NLTK, an NLP library.
nltk.download('punkt') # Loads a module required for tokenization.
import collections # This library defines useful data structures.

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [None]:
newline = "<br />" # The reviews sometimes contain this HTML tag to indicate a line break.
def preprocess(text):
  text = text.replace(newline, " ") # Replaces the newline HTML tag with a space.
  tokens = nltk.word_tokenize(text) # Converts the text to a list of tokens (strings).
  tokens = [token.lower() for token in tokens] # Lowercases all tokens.

  return tokens

# Reads and pre-processes the reviews.
dataset = {"train": [], "test": []}
binary_classes = {"neg": 0, "pos": 1}
for part_name, l in dataset.items():
  for class_name, value in binary_classes.items():
    path = os.path.join(dataset_root, part_name, class_name)
    print("Processing %s..." % path, end='')
    for filename in os.listdir(path):
      with open(os.path.join(path, filename)) as f:
        review_text = f.read()
        review_tokens = preprocess(review_text)

        l.append((review_tokens, value))
    print(" done")

Processing aclImdb/train/neg... done
Processing aclImdb/train/pos... done
Processing aclImdb/test/neg... done
Processing aclImdb/test/pos... done


In [None]:
# Splits the train set into a proper train set and a development/validation set.
# 'dataset["train"]' happens to be a list composed of a certain number of negative examples followed by the same number of positive examples.
# We are going to use 3/4 of the original train set as our actual train set, and 1/4 as our development set.
# We want to keep balanced train and development sets, i.e. for both, half of the reviews should be positive and half should be negative.
if("dev" in dataset): print("This should only be run once.")
else:
  dev_set_half_size = int((len(dataset["train"]) / 4) / 2) # Half of a quarter of the training set size.
  dataset["dev"] = dataset["train"][:dev_set_half_size] + dataset["train"][-dev_set_half_size:] # Takes some negative examples at the beginning and some positive ones at the end.
  dataset["train"] = dataset["train"][dev_set_half_size:-dev_set_half_size] # Removes the examples used for the development set.

  for (part, data) in dataset.items():
    class_counts = collections.defaultdict(int)
    for (_, p) in data: class_counts[p] += 1
    print(f"{part}: {class_counts}")
  print("Train set split into train/dev.")

train: defaultdict(<class 'int'>, {0: 9375, 1: 9375})
test: defaultdict(<class 'int'>, {0: 12500, 1: 12500})
dev: defaultdict(<class 'int'>, {0: 3125, 1: 3125})
Train set split into train/dev.
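
The slicing used for the train/dev split can be checked on a toy list. The sketch below assumes, as in the cell above, that all negative examples come first and all positive ones last; the example names are made up.

```python
# Toy dataset: 8 negative examples (label 0) followed by 8 positive ones (label 1).
toy_train = [(f"neg_{i}", 0) for i in range(8)] + [(f"pos_{i}", 1) for i in range(8)]

half = len(toy_train) // 8 # Half of a quarter of the data: 2 examples per class.
toy_dev = toy_train[:half] + toy_train[-half:] # First negatives + last positives.
toy_train = toy_train[half:-half] # What remains is the new train set.

print(len(toy_train), len(toy_dev)) # 12 4
print(sum(label for _, label in toy_train), sum(label for _, label in toy_dev)) # 6 2
```

Note that this slicing only works if `half` is greater than 0: with `half == 0`, `toy_train[-0:]` would select the whole list.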


Loading the word embeddings
==
We are going to use GloVe embeddings.

All word forms with a frequency below a given threshold are going to be considered unknown forms.

In [None]:
# Computes the frequency of all word forms in the train set.
word_counts = collections.defaultdict(int)
for tokens, _ in dataset["train"]:
  for token in tokens: word_counts[token] += 1

print(word_counts)
print(len(word_counts))

89384


In [None]:
# Builds a vocabulary containing only those words present in the train set with a frequency above a given threshold.
count_threshold = 4
vocabulary = set()
for word, count in word_counts.items():
  if(count > count_threshold): vocabulary.add(word)

print(vocabulary)
print(len(vocabulary))

26263
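
To make the effect of the frequency threshold concrete, here is a small sketch on a made-up corpus. The threshold value and the way ids are assigned (alphabetical order) are assumptions for this toy example; the point is only that every form outside the vocabulary ends up sharing a single "unknown" id.

```python
import collections

toy_reviews = [["a", "good", "film"], ["a", "bad", "film"], ["a", "film"]]

# Counts every word form.
counts = collections.Counter(token for review in toy_reviews for token in review)

threshold = 1 # Hypothetical value chosen for this toy corpus.
toy_vocabulary = {word for word, count in counts.items() if count > threshold}

# Gives an id to each known word; unknown forms share one extra id.
word_to_id = {word: idx for idx, word in enumerate(sorted(toy_vocabulary))}
unknown_id = len(word_to_id)
ids = [word_to_id.get(token, unknown_id) for token in ["a", "great", "film"]]
print(ids) # [0, 2, 1]
```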


In [None]:
import zipfile
import numpy as np

In [None]:
# Returns a dictionary {word[String]: id[Integer]} and a list of Numpy arrays.
# `data_path` is the path of the directory containing the GloVe files (if None, 'glove.6B' is used)
# `max_size` is the number of word embeddings read (starting from the most frequent; in the GloVe files, the words are sorted)
# If `vocabulary` is specified (as a set of strings, or a dictionary from strings to integers), the output vocabulary contains the intersection of `vocabulary` and the words with a defined embedding. Otherwise, all words with a defined embedding are used.
def get_glove(dim=50, vocabulary=None, max_size=-1, data_path=None):
  dimensions = set([50, 100, 200, 300]) # Available dimensions for GloVe 6B
  fallback_url = 'http://nlp.stanford.edu/data/glove.6B.zip' # (Remember that in GloVe 6B, words are lowercased.)

  assert (dim in dimensions), (f'Unavailable GloVe 6B dimension: {dim}.')

  if(data_path is None): data_path = 'glove.6B'

  # Checks that the data is here, otherwise downloads it.
  if(not os.path.isdir(data_path)):
    #print('Directory "%s" does not exist. Creation.' % data_path)
    os.makedirs(data_path)

  glove_weights_file_path = os.path.join(data_path, f'glove.6B.{dim}d.txt')

  if(not os.path.isfile(glove_weights_file_path)):
    local_zip_file_path = os.path.join(data_path, os.path.basename(fallback_url))

    if(not os.path.isfile(local_zip_file_path)):
      print(f'Retrieving GloVe embeddings from {fallback_url}.')
      urllib.request.urlretrieve(fallback_url, local_zip_file_path)

    with zipfile.ZipFile(local_zip_file_path, 'r') as z:
      print(f'Extracting GloVe embeddings from {local_zip_file_path}.')
      z.extractall(path=data_path)

  assert os.path.isfile(glove_weights_file_path), (f"GloVe file {glove_weights_file_path} not found.")

  # Reads GloVe data.
  print('Reading GloVe embeddings.')
  new_vocabulary = {} # A dictionary {word[String]: id[Integer]}
  embeddings = [] # The list of embeddings (Numpy arrays)
  with open(glove_weights_file_path, 'r') as f:
    for line in f: # Each line consists of the word followed by a space and all of the coefficients of the vector separated by a space.
      values = line.split()

      # Here, I'm trying to detect where on the line the word ends and where the vector begins. As in some version(s) of GloVe words can contain spaces, this is not entirely trivial.
      vector_part = ' '.join(values[-dim:])
      x = line.find(vector_part)
      word = line[:(x - 1)]

      if((vocabulary is not None) and (word not in vocabulary)): # If a vocabulary was specified and if the word is not in it…
        continue # …this word is skipped.

      new_vocabulary[word] = len(new_vocabulary)
      embedding = np.asarray(values[-dim:], dtype=np.float32)
      embeddings.append(embedding)

      if(len(new_vocabulary) == max_size): break
  print('(GloVe embeddings loaded.)')
  print()

  return (new_vocabulary, embeddings)

In [None]:
%%time
(new_vocabulary, embeddings) = get_glove(dim=100, vocabulary=vocabulary)

Retrieving GloVe embeddings from http://nlp.stanford.edu/data/glove.6B.zip.
Extracting GloVe embeddings from glove.6B/glove.6B.zip.
Reading GloVe embeddings.
(GloVe embeddings loaded.)

CPU times: user 21.4 s, sys: 5.07 s, total: 26.5 s
Wall time: 3min 7s


In [None]:
print(len(new_vocabulary)) # Shows the size of the vocabulary.
print(new_vocabulary) # Shows each word and its id.

25481
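
The relation between the returned vocabulary and the list of embeddings can be illustrated on two made-up lines in the GloVe text format. This sketch is not the parsing strategy used by `get_glove`: it splits each line from the right, which is one possible way to cope with words containing spaces. The dimension and the values are assumptions.

```python
import numpy as np

dim = 3 # Toy dimension (real GloVe 6B files use 50, 100, 200 or 300).
toy_lines = ["the 0.1 0.2 0.3\n", "film -0.5 0.0 0.25\n"] # Made-up values.

toy_vocabulary = {} # {word[String]: id[Integer]}
toy_vectors = [] # List of Numpy arrays of shape (dim)
for line in toy_lines:
  # Splits from the right so that only the last `dim` fields are treated as coefficients.
  fields = line.rstrip("\n").rsplit(" ", dim)
  word, coefficients = fields[0], fields[1:]
  toy_vocabulary[word] = len(toy_vocabulary)
  toy_vectors.append(np.asarray(coefficients, dtype=np.float32))

matrix = np.stack(toy_vectors) # Shape: (vocabulary size, dim)
print(matrix[toy_vocabulary["film"]]) # The id of a word is the index of its row.
```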


Batch generator
==

In [None]:
# Defines a class of objects that produce batches from the dataset.
class BatchGenerator:
  def __init__(self, dataset, vocabulary):
    self.dataset = dataset
    for part in self.dataset.values(): # Shuffles the dataset so that positive and negative examples are mixed.
      np.random.shuffle(part)

    self.vocabulary = vocabulary # Dictionary {word[String]: id[Integer]}
    self.unknown_word_id = len(vocabulary) # Id for unknown forms
    self.padding_idx = len(vocabulary) + 1 # Not all reviews of a given batch will have the same length. We will "pad" shorter reviews with a special token id so that the batch can be represented by a matrix.

  def length(self, data_type='train'):
    return len(self.dataset[data_type])

  # Returns a random batch.
  # Batches are output as triples (word_ids, polarity, texts).
  # If `subset` is an integer, only a subset of the corpus is used. This can be useful to debug the system.
  def get_batch(self, batch_size, data_type, subset=None):
    data = self.dataset[data_type] # selects the relevant portion of the dataset.

    max_i = len(data) if(subset is None) else min(subset, len(data))
    instance_ids = np.random.randint(max_i, size=batch_size) # Randomly picks some instance ids.

    return self._ids_to_batch(data, instance_ids)

  def _ids_to_batch(self, data, instance_ids):
    word_ids = [] # Will be a list of lists of word ids (Integer)
    polarity = [] # Will be a list of review polarities (Boolean)
    texts = [] # Will be a list of lists of words (String)
    for instance_id in instance_ids:
      text, p = data[instance_id]

      word_ids.append([self.vocabulary.get(w, self.unknown_word_id) for w in text])
      polarity.append(p)
      texts.append(text)

    # Padding
    self.pad(word_ids)

    word_ids = torch.tensor(word_ids, dtype=torch.long) # Conversion to a tensor
    polarity = torch.tensor(polarity, dtype=torch.bool) # Conversion to a tensor

    return (word_ids, polarity, texts) # We don't really need `texts` but it might be useful to debug the system.

  # Pads a list of lists (i.e. adds fake word ids so that all sequences in the batch have the same length, so that we can use a matrix to represent them).
  # In place
  def pad(self, word_ids):
    max_length = max([len(s) for s in word_ids])
    for s in word_ids: s.extend([self.padding_idx] * (max_length - len(s)))

  # Returns a generator of batches for a full epoch.
  # If `subset` is an integer, only a subset of the corpus is used. This can be useful to debug the system.
  def all_batches(self, batch_size, data_type="train", subset=None):
    data = self.dataset[data_type]

    max_i = len(data) if(subset is None) else min(subset, len(data))

    # Loop that generates all full batches (batches of size 'batch_size').
    i = 0
    while((i + batch_size) <= max_i):
      instance_ids = np.arange(i, (i + batch_size))
      yield self._ids_to_batch(data, instance_ids)
      i += batch_size

    # Possibly generates the last (not full) batch.
    if(i < max_i):
      instance_ids = np.arange(i, max_i)
      yield self._ids_to_batch(data, instance_ids)

  # Turns a list of arbitrary pre-processed texts into a batch.
  # This function will be used to infer the polarity of an unannotated review.
  def turn_into_batch(self, texts):
    word_ids = [[self.vocabulary.get(w, self.unknown_word_id) for w in text] for text in texts]
    self.pad(word_ids)
    return torch.tensor(word_ids, dtype=torch.long)

batch_generator = BatchGenerator(dataset=dataset, vocabulary=new_vocabulary)
print(batch_generator.length('train')) # Prints the number of instances in the train set.

18750


In [None]:
tmp = batch_generator.get_batch(3, data_type="train")
print(tmp[0]) # Prints the matrix of token ids. This matrix is what will be fed as input to the model (defined below).
print(tmp[1]) # Prints the vector of polarities. This vector will be used to compute the loss when training the model.
print(tmp[2]) # Prints the list of reviews.

tensor([[   40,   796,     0,  ..., 25482, 25482, 25482],
        [    7,  4876,   871,  ...,  7054,  1473,     2],
        [   40,  1698,    59,  ..., 25482, 25482, 25482]])
tensor([ True,  True, False])
[['i', 'thought', 'the', 'movie', 'was', 'extremely', 'funny', 'and', 'actually', 'very', 'interesting', '.', 'it', 'was', 'raw', 'and', 'honest', 'and', 'felt', 'as', 'if', 'i', 'was', 'really', 'watching', 'the', '``', 'real', 'people', "''", 'not', 'actors', '.', 'it', "'s", 'great', 'entertainment', ',', 'it', 'also', 'painted', 'the', 'people', 'as', 'human', 'on', 'our', 'level', 'not', 'below', 'us', '.', 'it', 'is', 'a', 'very', 'good', 'film', '.'], ['a', 'remarkable', 'example', 'of', 'cinematic', 'alchemy', 'at', 'work', ',', 'with', 'a', "trite'n'turgid", 'lump', 'of', 'lead', 'script', '(', 'penned', 'by', 'numbingly', 'mediocre', 'hollywood', 'hack', 'nonpareil', 'jole', 'schumacher', ',', 'no', 'less', ')', 'being', 'magically', 'converted', 'into', 'a', 'choice', 'chun

In [None]:
len(list(batch_generator.all_batches(batch_size=3, data_type="train"))) # Number of batches in the training set for batches of size 3

6250
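
Padding can be illustrated on a tiny batch. In this sketch the padding id (9) is a hypothetical value, and, unlike the `pad` method above, new lists are built instead of modifying the input in place.

```python
import torch

toy_word_ids = [[4, 2, 7], [5], [1, 3]] # Three "reviews" of different lengths.
padding_idx = 9 # Hypothetical id reserved for padding in this toy example.

max_length = max(len(s) for s in toy_word_ids)
padded = [s + [padding_idx] * (max_length - len(s)) for s in toy_word_ids]

batch = torch.tensor(padded, dtype=torch.long) # Shape: (3, 3)
mask = (batch != padding_idx) # Shape: (3, 3); True where there is an actual token.
print(batch)
print(mask.sum(dim=1)) # Number of actual tokens per review.
```

A mask of this kind is what makes it possible to ignore the padding positions later on.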

The model
==
Here you have to complete the implementation of the model.
This model is expected to accept as input a matrix of token ids (in which each row represents a review) and to output a vector (in which each value represents the probability that the corresponding review is positive).

Please, **pay attention to all comments**.
They contain useful information.

You might wonder what the ".to" method of tensors is for.
To execute the neural network faster, we will run it on a GPU instead of a CPU.
To do so, data and parameters should be sent to the GPU, which is done by using the ".to" method.
This is possible if the parameters of the notebook allow it (i.e. if Edit/Notebook Settings/Hardware Accelerator is "GPU"), which should be the case.
If you implement things correctly, you should not need to add any call to the ".to" method here (and only one or two later during the training process).
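
A minimal sketch of how ".to" is typically used, assuming that falling back to the CPU when no GPU is available is acceptable. The variable names are illustrative only.

```python
import torch

# Uses the GPU if one is available, and falls back to the CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.zeros(2, 3) # Created on the CPU. Shape: (2, 3)
x = x.to(device) # For tensors, the result of `.to` must be reassigned.

layer = torch.nn.Linear(3, 1).to(device) # For modules, `.to` moves the parameters.
y = layer(x) # Works because the input and the parameters are on the same device. Shape: (2, 1)
print(y.device)
```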

For your system to be efficient, you should **avoid looping over a tensor** whenever it is possible to do otherwise.
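
To illustrate the difference, the sketch below counts the actual (non-padding) tokens of each review in a toy batch, first with a Python loop and then with a single tensor expression. The padding id (9) is a hypothetical value.

```python
import torch

padding_idx = 9 # Hypothetical padding id for this toy example.
batch = torch.tensor([[4, 2, 7], [5, 9, 9], [1, 3, 9]]) # Shape: (3, 3)

# Slow: a Python loop over the rows (and the elements) of the tensor.
lengths_loop = []
for row in batch:
  lengths_loop.append(sum(1 for token in row if token.item() != padding_idx))

# Fast: a single vectorised expression.
lengths_vectorised = (batch != padding_idx).sum(dim=1) # Shape: (3)

print(lengths_loop, lengths_vectorised.tolist()) # [3, 1, 2] [3, 1, 2]
```

Both versions give the same result, but the second one is executed by optimised tensor operations and can run on a GPU.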

In [None]:
class SentimentClassifier(torch.nn.Module):
  # embeddings: list of Numpy arrays
  # hidden_sizes: list of the size (Integer) of each hidden layer; there may be 0 or more hidden layers
  # freeze_embeddings: boolean; indicates whether the embeddings should be frozen (i.e., not fine-tuned) during training
  # device: string; indicates on which type of hardware PyTorch computation should be run
  def __init__(self, embeddings, hidden_sizes, freeze_embeddings=True, device='cpu'):
    embeddings = list(embeddings) # Creates a copy of the list of embeddings, so we can add or remove entries without affecting the original list.
    super().__init__() # Calls the constructor of the parent class. Usually, this is necessary when creating a custom module.

    # Here you have to (i) define a vector for unknown forms (the average of actual word embeddings) and a vector for the padding token (full of 0s)
    # and (ii) define an embedding layer 'self.embeddings' using torch.nn.Embedding.from_pretrained and without forgetting to use the 'freeze' and 'padding_idx' arguments (this last argument is used to keep the padding embedding at 0 even when fine-tuning the other embeddings).
    # The following error (if you get it) indicates that the value provided for 'padding_idx' does not correspond to any embedding in the matrix that you provide (in other words, the matrix is likely to be incomplete): "Padding_idx must be within num_embeddings".
    #################
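    embeddings = torch.from_numpy(np.stack(embeddings)) # Shape: (vocabulary size, embedding size)
    self.embedding_size = embeddings.shape[1]
    self.unknown_embedding = torch.mean(embeddings, dim=0).view((1, self.embedding_size)) # Average of all word embeddings. Shape: (1, embedding size)
    self.padding_embedding = torch.zeros(size=(1, self.embedding_size)) # Shape: (1, embedding size)

    # The order matters: the unknown-form vector gets id len(vocabulary) and the padding vector gets id len(vocabulary) + 1, as expected by BatchGenerator.
    embeddings = torch.cat((embeddings, self.unknown_embedding, self.padding_embedding), 0) # Shape: (vocabulary size + 2, embedding size)

    self.embeddings = torch.nn.Embedding.from_pretrained(embeddings,
                                                         freeze=freeze_embeddings, # Uses the constructor argument instead of a hard-coded value.
                                                         padding_idx=embeddings.shape[0]-1)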
    #################

    self.embeddings = self.embeddings.to(device) # Sends the word embeddings to 'device', which is potentially a GPU.
    # Here you have to define self.main_part, the network that computes a probability for any review given as input (represented as the average of the embeddings of the tokens).
    # The number of hidden layers is determined by 'hidden_sizes', which is a list of integers describing the (output) size of each of them.
    # Use torch.nn.Linear to build linear layers.
    # torch.nn.Sequential takes one argument per module and not a list of modules as argument, but if 'modules' is a list of modules, 'torch.nn.Sequential(*modules)' (with the star notation) works.
    #################
    modules = []
    in_size = self.embedding_size
    for hidden_size in hidden_sizes: # One linear layer followed by a ReLU per hidden layer.
      modules.append(torch.nn.Linear(in_size, hidden_size))
      modules.append(torch.nn.ReLU())
      in_size = hidden_size
    modules.append(torch.nn.Linear(in_size, 1)) # Output layer: a single score per review.
    modules.append(torch.nn.Sigmoid()) # Turns the score into a probability.
    self.main_part = torch.nn.Sequential(*modules)
    #################
    self.main_part = self.main_part.to(device) # Sends the network to 'device', which is potentially a GPU.

    self.device = device

  # 'batch' is 2D tensor (i.e. a matrix) of word ids (Integer).
  def forward(self, batch):
    # Here you have to (i) turn 'batch' into a matrix of embeddings (i.e. a tensor of rank 3), (ii) average all embeddings for a given review while being careful not to take into account padding vectors, (iii) send these bag-of-words representations to the network.
    # Return a tensor of shape (batch size) instead of (batch size, 1).
    # Once you think the function works, check that the presence of padding ids does NOT impact the result in any way (i.e. the same probability should be computed for a given review independently of the number of padding ids).
    #################
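    embeddings_batch = self.embeddings(batch) # Shape: (batch size, max length, embedding size)
    # Counts the actual (non-padding) tokens of each review from the ids rather than from the embedding values.
    lengths = (batch != self.embeddings.padding_idx).sum(dim=1, keepdim=True) # Shape: (batch size, 1)
    # The padding embedding is full of zeros, so it does not contribute to the sum.
    bow = embeddings_batch.sum(dim=1) / lengths.clamp(min=1) # Shape: (batch size, embedding size)

    out = self.main_part(bow) # Shape: (batch size, 1)
    out = out.squeeze(dim=1) # Shape: (batch size), even when the batch contains a single review.
    return out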
    #################

model = SentimentClassifier(embeddings, hidden_sizes=[], freeze_embeddings=True)
batch = batch_generator.get_batch(3, data_type="train")
print(model(batch[0])) # This output (its shape) should be checked.


tensor([0.5075, 0.4891, 0.4873], grad_fn=<SqueezeBackward0>)


In [None]:
model = SentimentClassifier(embeddings, hidden_sizes=[100], freeze_embeddings=True)

batch = batch_generator.get_batch(3, data_type="train")
print(model(batch[0])) # This output (its shape) should be checked.

tensor([0.4952, 0.5004, 0.4957], grad_fn=<SqueezeBackward0>)
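
One possible way to check that padding ids do not affect the result is sketched below on a toy embedding matrix. The function `masked_average`, the matrix and the padding id are all hypothetical; the same kind of comparison can be run on the actual model.

```python
import torch

torch.manual_seed(0)
padding_idx = 4
weights = torch.randn(5, 3) # Toy embedding matrix. Shape: (5, 3)
weights[padding_idx] = 0. # The padding vector is full of zeros.
embedding_layer = torch.nn.Embedding.from_pretrained(weights, freeze=True, padding_idx=padding_idx)

def masked_average(batch):
  # batch: 2D tensor of word ids. Shape: (batch size, max length)
  vectors = embedding_layer(batch) # Shape: (batch size, max length, 3)
  lengths = (batch != padding_idx).sum(dim=1, keepdim=True) # Shape: (batch size, 1)
  return vectors.sum(dim=1) / lengths.clamp(min=1) # Shape: (batch size, 3)

short = torch.tensor([[0, 2]]) # No padding.
long = torch.tensor([[0, 2, 4, 4, 4]]) # Same review followed by three padding ids.
print(torch.allclose(masked_average(short), masked_average(long))) # True
```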


In [None]:
# Function that computes the accuracy of the model on a given part of the dataset.
evaluation_batch_size = 256
def evaluation(model, data_type, subset=None):
  nb_correct = 0
  total = 0
  for batch in batch_generator.all_batches(evaluation_batch_size, data_type=data_type, subset=subset):
    prob = model(batch[0].to(model.device)) # Forward pass
    answer = (prob > 0.5) # Shape: (batch size); the last batch may be smaller than evaluation_batch_size.
    nb_correct += (answer == batch[1].to(model.device)).sum().item()
    total += batch[0].shape[0]

  accuracy = (nb_correct / total)
  return accuracy
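
The accuracy computation used by `evaluation` can be followed by hand on a toy batch. The probabilities and gold polarities below are made up.

```python
import torch

prob = torch.tensor([0.9, 0.2, 0.6, 0.4]) # Toy model outputs. Shape: (4)
gold = torch.tensor([True, False, False, False]) # Toy gold polarities. Shape: (4)

answer = (prob > 0.5) # Predicted polarities. Shape: (4)
nb_correct = (answer == gold).sum().item() # Number of correct predictions.
accuracy = nb_correct / gold.shape[0]
print(accuracy) # 0.75
```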

Training
==
Once everything works, try to find better hyperparameters.
The goal is to maximise the accuracy on the development set.
If everything works properly and if you want to maximise your grade, find ways to improve the model and/or the training process.
Graphs used for visualising the training process are also much welcome.
(These instructions apply to all future TPs as well.)

You should document in a text cell as much as possible what you do and, when relevant, how it affects the performance of the model.
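
Since the model outputs a probability, a natural choice of loss is binary cross-entropy. As a small sanity check, the sketch below compares `torch.nn.BCELoss` with the formula written by hand on made-up predictions and labels.

```python
import torch

predictions = torch.tensor([0.9, 0.2, 0.6]) # Toy probabilities. Shape: (3)
gold = torch.tensor([1., 0., 0.]) # Toy gold labels as floats. Shape: (3)

# Library version (averaged over the batch).
loss = torch.nn.BCELoss(reduction="mean")(predictions, gold)

# Manual version: -[y*log(p) + (1-y)*log(1-p)], averaged over the batch.
manual = -(gold * torch.log(predictions) + (1 - gold) * torch.log(1 - predictions)).mean()

print(loss.item(), manual.item())
```

The two values should agree up to floating-point error. Note that `BCELoss` expects the gold labels as floats with the same shape as the predictions.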

In [None]:
!pip install numpy --pre torch torchvision torchaudio --force-reinstall

In [None]:
model = SentimentClassifier(embeddings, hidden_sizes=[200,100], freeze_embeddings=True, device="cpu")

# Tests the model on a couple of instances before training.
model.eval() # Tells PyTorch we are in evaluation/inference mode (can be useful if dropout is used, for instance).
print(model(batch_generator.turn_into_batch([preprocess(text) for text in ["This movie was terrible!!", "Pure gold!"]]).to(model.device)))

# Training procedure
learning_rate = 0.004
l2_reg = 0.001
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.99, weight_decay=l2_reg) # Once the backward propagation has been done, just call the 'step' method (with no argument) of this object to update the parameters.
batch_size = 64
subset = None # Use an integer to train on a smaller portion of the training set, otherwise use None.
epoch_size = batch_generator.length("train") if(subset is None) else subset # In number of instances

nb_epoch = 20
epoch_id = 0 # Id of the current epoch
instances_processed = 0 # Number of instances trained on in the current epoch
epoch_loss = [] # Will contain the loss for each batch of the current epoch
while(epoch_id < nb_epoch):
  model.train() # Tells PyTorch that we are in training mode (can be useful if dropout is used, for instance).

  batch = batch_generator.get_batch(batch_size, data_type="train", subset=subset) # The batch to train on at this iteration.

  # You have to (i) compute the prediction of the model (you might want to use ".to(model.device)" on the input of the model), (ii) compute the loss (use an average over the batch), (iii) call "backward" on the loss and (iv) store the loss in "epoch_loss".
  ###################
  model.zero_grad() # Makes sure the gradient is reinitialised to zero.

  predictions = model(batch[0].to(model.device))
  gold = batch[1].to(model.device).float() # Gold polarities as floats (required by BCELoss), converted without looping over the tensor. Shape: (batch size)

  loss = torch.nn.BCELoss(reduction="mean")(predictions, gold)

  loss.backward()
  epoch_loss.append(loss.item())
  ###################
  optimizer.step() # Updates the parameters.

  instances_processed += batch_size
  if(instances_processed > epoch_size): # If this iteration corresponds to the end of an epoch.
    print(f"-- END OF EPOCH {epoch_id}.")
    print(f"Average loss: {sum(epoch_loss) / len(epoch_loss)}.")

    # Evaluation
    model.eval() # Tells PyTorch we are in evaluation/inference mode (can be useful if dropout is used, for instance).
    with torch.no_grad(): # Deactivates Autograd (it is computationally expensive and we don't need it here).
      accuracy = evaluation(model, "train")
      print(f"Accuracy on the train set: {accuracy}.")

      accuracy = evaluation(model, "dev")
      print(f"Accuracy on the dev set: {accuracy}.")

    epoch_id += 1
    instances_processed -= epoch_size
    epoch_loss = []

tensor([0.5046, 0.5130], grad_fn=<SqueezeBackward0>)
-- END OF EPOCH 0.
Average loss: 0.6822266570537163.
Accuracy on the train set: 0.6196266666666667.
Accuracy on the dev set: 0.62736.
-- END OF EPOCH 1.
Average loss: 0.5828959327508972.
Accuracy on the train set: 0.7549866666666667.
Accuracy on the dev set: 0.75792.
-- END OF EPOCH 2.
Average loss: 0.5714536743562783.
Accuracy on the train set: 0.6467733333333333.
Accuracy on the dev set: 0.64864.
-- END OF EPOCH 3.
Average loss: 0.527533637683953.
Accuracy on the train set: 0.7810133333333333.
Accuracy on the dev set: 0.7848.
-- END OF EPOCH 4.
Average loss: 0.5028950535396667.
Accuracy on the train set: 0.78256.
Accuracy on the dev set: 0.7816.
-- END OF EPOCH 5.
Average loss: 0.5097179383344618.
Accuracy on the train set: 0.7723733333333334.
Accuracy on the dev set: 0.77264.
-- END OF EPOCH 6.
Average loss: 0.5316626697066701.
Accuracy on the train set: 0.67968.
Accuracy on the dev set: 0.6792.
-- END OF EPOCH 7.
Average loss: 0.

In [None]:
function = torch.nn.Sigmoid()
out = function(torch.Tensor([0.4, 0.3]))
print(out.requires_grad)

In [None]:
model.eval() # Tells PyTorch that we are in evaluation/inference mode (can be useful if dropout is used, for instance).
model(batch_generator.turn_into_batch([preprocess(text) for text in ["This movie was terrible!!", "Pure gold!", "Bad.", "Not bad!"]]).to(model.device))