# 2-Step Sentiment Analysis for Movie Reviews

**[SA2] Sentiment Analysis 2 project**

Objective: classify movie reviews as positive or negative.

Implement 2-step classification:
* Sentence-level subjectivity detection;
* Aggregate into document-level sentiment-polarity.
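A minimal sketch of how the two steps compose, assuming each review has already been split into sentences. `is_subjective` and `sentence_polarity` are hypothetical callables standing in for whichever subjectivity detector and polarity scorer get plugged in; the toy lambdas in the usage lines are purely illustrative.

```python
from typing import Callable, List


def two_step_polarity(
    sentences: List[str],
    is_subjective: Callable[[str], bool],
    sentence_polarity: Callable[[str], float],
) -> int:
    """Return 1 (positive) or 0 (negative) for a review given as a list of sentences."""
    # Step 1: keep only the sentences the detector marks as subjective
    subjective = [s for s in sentences if is_subjective(s)]
    # Step 2: aggregate their signed polarity scores into one document score
    total = sum(sentence_polarity(s) for s in subjective)
    # Ties, and reviews without any subjective sentence, default to positive
    return 0 if total < 0 else 1


# Toy stand-ins (hypothetical, for illustration only)
review = ["The film runs two hours.", "I loved the acting.", "The plot was dull."]
toy_subjective = lambda s: any(w in s.lower() for w in ("loved", "dull"))
toy_polarity = lambda s: 1.0 if "loved" in s.lower() else -0.5
print(two_step_polarity(review, toy_subjective, toy_polarity))  # 1
```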

Data Set: [Movie Reviews](https://ai.stanford.edu/~amaas/data/sentiment/)

Evaluation: [F1 score](https://en.wikipedia.org/wiki/F-score)
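F1 is the harmonic mean of precision and recall, computed per class. The objectivity-classifier evaluation functions below average the two per-class values, which corresponds to the *macro avg* row printed by `classification_report`. A small pure-Python sketch of that computation on a toy example:

```python
from typing import List


def f1_per_class(y_true: List[int], y_pred: List[int], label: int) -> float:
    # Count true positives, false positives and false negatives for `label`
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def macro_f1(y_true: List[int], y_pred: List[int]) -> float:
    # Unweighted mean of the per-class F1 scores
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, c) for c in labels) / len(labels)


y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(round(f1_per_class(y_true, y_pred, 1), 3))  # 0.667
print(round(f1_per_class(y_true, y_pred, 0), 3))  # 0.8
print(round(macro_f1(y_true, y_pred), 3))         # 0.733
```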


Install spaCy and download its English language model [en_core_web_lg](https://spacy.io/models/en#en_core_web_lg)

In [None]:
!pip install spacy==3.1.2
!python -m spacy download en_core_web_lg-3.1.0 --direct


Download the required resources and import the packages needed to run the code

In [None]:
import nltk
import numpy as np
import re

nltk.download("vader_lexicon")
from nltk.sentiment.vader import SentimentIntensityAnalyzer

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

import spacy
nlp = spacy.load("en_core_web_lg", exclude=["tagger", "parser", "attribute_ruler", "lemmatizer", "ner"])
nlp.add_pipe("sentencizer")
print(nlp.pipeline)

import torch
import torch.nn as nn
from torchtext.datasets import IMDB
from torch.utils.data import DataLoader, Dataset

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec object at 0x7f4ccfdfe9b0>), ('sentencizer', <spacy.pipeline.sentencizer.Sentencizer object at 0x7f4cd00ac280>)]


Load the movie reviews from the [IMDB dataset available in PyTorch (torchtext)](https://pytorch.org/text/stable/datasets.html#imdb)

In [None]:
train_labels = []
train_reviews = []
for label, review in IMDB(split="train"):
  train_labels.append(1 if label == "pos" else 0)
  train_reviews.append(review)


test_labels = []
test_reviews = []
for label, review in IMDB(split="test"):
  test_labels.append(1 if label == "pos" else 0)
  test_reviews.append(review)

aclImdb_v1.tar.gz: 100%|██████████| 84.1M/84.1M [00:02<00:00, 41.8MB/s]


## Baseline using CountVectorizer and SVC for document-level sentiment-polarity classification

In [None]:
vectorizer = CountVectorizer()
classifier = SVC()

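# The vocabulary is fitted on the train and test reviews together; slice the rows back into the two splits
vectors = vectorizer.fit_transform(train_reviews + test_reviews)
train_vectors = vectors[:len(train_reviews)]
test_vectors = vectors[len(train_reviews):]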

classifier.fit(train_vectors, train_labels)
test_labels_predictions = classifier.predict(test_vectors)

print(classification_report(test_labels, test_labels_predictions, digits=3))

              precision    recall  f1-score   support

           0      0.870     0.851     0.861     12500
           1      0.854     0.873     0.864     12500

    accuracy                          0.862     25000
   macro avg      0.862     0.862     0.862     25000
weighted avg      0.862     0.862     0.862     25000
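The cell above fits the `CountVectorizer` vocabulary on the training and test reviews together, so words that only occur in the test split still become features. A common alternative is to fit on the training split only and reuse that vocabulary for the test split (in scikit-learn, `vectorizer.fit_transform(train_reviews)` followed by `vectorizer.transform(test_reviews)`). The dependency-free sketch below uses a hypothetical `fit_vocabulary`/`count_vector` pair with naive lowercase whitespace tokenization (not scikit-learn's regex tokenizer) to show the effect: unseen test words are simply dropped. Scores obtained that way may differ from the ones reported above.

```python
from typing import Dict, List


def fit_vocabulary(documents: List[str]) -> Dict[str, int]:
    # Map every token seen in the training documents to a column index
    vocabulary: Dict[str, int] = {}
    for doc in documents:
        for token in doc.lower().split():
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def count_vector(document: str, vocabulary: Dict[str, int]) -> List[int]:
    # Count known tokens; tokens outside the training vocabulary are ignored
    counts = [0] * len(vocabulary)
    for token in document.lower().split():
        if token in vocabulary:
            counts[vocabulary[token]] += 1
    return counts


train_docs = ["great movie", "awful movie"]
vocabulary = fit_vocabulary(train_docs)               # {'great': 0, 'movie': 1, 'awful': 2}
print(count_vector("great great plot", vocabulary))   # [2, 0, 0]
```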



## Baseline using [VADER](https://www.nltk.org/api/nltk.sentiment.html#module-nltk.sentiment.vader) for document-level sentiment-polarity classification

In [None]:
analyzer = SentimentIntensityAnalyzer()

In [None]:
train_scores_predictions = [analyzer.polarity_scores(review) for review in train_reviews]
train_scores_predictions = [0 if score["neg"] > score["pos"] else 1 for score in train_scores_predictions]

print(classification_report(train_labels, train_scores_predictions, digits=3))


test_scores_predictions = [analyzer.polarity_scores(review) for review in test_reviews]
test_scores_predictions = [0 if score["neg"] > score["pos"] else 1 for score in test_scores_predictions]

print(classification_report(test_labels, test_scores_predictions, digits=3))

              precision    recall  f1-score   support

           0      0.788     0.523     0.629     12500
           1      0.643     0.859     0.736     12500

    accuracy                          0.691     25000
   macro avg      0.716     0.691     0.682     25000
weighted avg      0.716     0.691     0.682     25000

              precision    recall  f1-score   support

           0      0.793     0.523     0.630     12500
           1      0.644     0.864     0.738     12500

    accuracy                          0.693     25000
   macro avg      0.719     0.693     0.684     25000
weighted avg      0.719     0.693     0.684     25000



## Baseline using [VADER](https://www.nltk.org/api/nltk.sentiment.html#module-nltk.sentiment.vader) for sentence-level sentiment-polarity classification, with removal of objective sentences

In [None]:
# nlp.pipe processes the reviews in batches, which is usually faster than calling nlp() on each review
train_list_of_sentences = [[sent.text for sent in doc.sents] for doc in nlp.pipe(train_reviews)]
train_scores_predictions = [[analyzer.polarity_scores(sent) for sent in doc] for doc in train_list_of_sentences]


test_list_of_sentences = [[sent.text for sent in doc.sents] for doc in nlp.pipe(test_reviews)]
test_scores_predictions = [[analyzer.polarity_scores(sent) for sent in doc] for doc in test_list_of_sentences]

Majority vote: label each sentence as negative when its `neg` score exceeds its `pos` score (positive otherwise), then assign the review its most frequent sentence label

In [None]:
train_labels_predictions = [[0 if sent_score["neg"] > sent_score["pos"] else 1 for sent_score in doc] for doc in train_scores_predictions]
train_labels_predictions = [0 if doc.count(0) > doc.count(1) else 1 for doc in train_labels_predictions]

print(classification_report(train_labels, train_labels_predictions, digits=3))


test_labels_predictions = [[0 if sent_score["neg"] > sent_score["pos"] else 1 for sent_score in doc] for doc in test_scores_predictions]
test_labels_predictions = [0 if doc.count(0) > doc.count(1) else 1 for doc in test_labels_predictions]

print(classification_report(test_labels, test_labels_predictions, digits=3))

              precision    recall  f1-score   support

           0      0.826     0.265     0.401     12500
           1      0.562     0.944     0.705     12500

    accuracy                          0.605     25000
   macro avg      0.694     0.605     0.553     25000
weighted avg      0.694     0.605     0.553     25000

              precision    recall  f1-score   support

           0      0.829     0.259     0.395     12500
           1      0.561     0.946     0.704     12500

    accuracy                          0.603     25000
   macro avg      0.695     0.603     0.550     25000
weighted avg      0.695     0.603     0.550     25000
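The majority vote in the cell above reduces to the rule below. Because it tests `count(0) > count(1)`, a tie (including a review with no sentences at all) goes to the positive class; the toy label lists make that explicit.

```python
from typing import List


def majority_vote(sentence_labels: List[int]) -> int:
    # 0 = negative, 1 = positive; negative wins only with a strict majority
    return 0 if sentence_labels.count(0) > sentence_labels.count(1) else 1


print(majority_vote([0, 0, 1]))  # 0: strict negative majority
print(majority_vote([0, 1]))     # 1: a tie goes to positive
print(majority_vote([]))         # 1: no sentences at all
```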



Sum the `pos` and `neg` scores over all sentences and assign the polarity with the larger total

In [None]:
train_labels_predictions = []
for doc in train_scores_predictions:
  positive_score = 0.0
  negative_score = 0.0
  for sent_score in doc:
    positive_score += sent_score["pos"]
    negative_score += sent_score["neg"]
  train_labels_predictions.append(0 if negative_score > positive_score else 1)

print(classification_report(train_labels, train_labels_predictions, digits=3))


test_labels_predictions = []
for doc in test_scores_predictions:
  positive_score = 0.0
  negative_score = 0.0
  for sent_score in doc:
    positive_score += sent_score["pos"]
    negative_score += sent_score["neg"]
  test_labels_predictions.append(0 if negative_score > positive_score else 1)

print(classification_report(test_labels, test_labels_predictions, digits=3))

              precision    recall  f1-score   support

           0      0.801     0.544     0.648     12500
           1      0.655     0.865     0.745     12500

    accuracy                          0.705     25000
   macro avg      0.728     0.705     0.697     25000
weighted avg      0.728     0.705     0.697     25000

              precision    recall  f1-score   support

           0      0.808     0.545     0.651     12500
           1      0.657     0.870     0.749     12500

    accuracy                          0.708     25000
   macro avg      0.732     0.708     0.700     25000
weighted avg      0.732     0.708     0.700     25000



Sum the per-sentence `compound` scores and predict negative when the total is below zero

In [None]:
train_labels_predictions = []
for doc in train_scores_predictions:
  compound_score = 0.0
  for sent_score in doc:
    compound_score += sent_score["compound"]
  train_labels_predictions.append(0 if compound_score < 0 else 1)

print(classification_report(train_labels, train_labels_predictions, digits=3))


test_labels_predictions = []
for doc in test_scores_predictions:
  compound_score = 0.0
  for sent_score in doc:
    compound_score += sent_score["compound"]
  test_labels_predictions.append(0 if compound_score < 0 else 1)

print(classification_report(test_labels, test_labels_predictions, digits=3))

              precision    recall  f1-score   support

           0      0.798     0.544     0.647     12500
           1      0.654     0.862     0.744     12500

    accuracy                          0.703     25000
   macro avg      0.726     0.703     0.695     25000
weighted avg      0.726     0.703     0.695     25000

              precision    recall  f1-score   support

           0      0.800     0.545     0.649     12500
           1      0.655     0.864     0.745     12500

    accuracy                          0.705     25000
   macro avg      0.728     0.705     0.697     25000
weighted avg      0.728     0.705     0.697     25000
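To see how the three aggregation rules (majority vote, summed `pos`/`neg`, summed `compound`) can disagree, here is a toy review of three sentences with hand-made VADER-style score dictionaries. The numbers are invented for illustration, not real VADER output: two mildly positive sentences win the vote, while one strongly negative sentence dominates both sums.

```python
from typing import Dict, List

Scores = Dict[str, float]


def by_majority(doc: List[Scores]) -> int:
    labels = [0 if s["neg"] > s["pos"] else 1 for s in doc]
    return 0 if labels.count(0) > labels.count(1) else 1


def by_pos_neg_sum(doc: List[Scores]) -> int:
    positive = sum(s["pos"] for s in doc)
    negative = sum(s["neg"] for s in doc)
    return 0 if negative > positive else 1


def by_compound_sum(doc: List[Scores]) -> int:
    return 0 if sum(s["compound"] for s in doc) < 0 else 1


# Invented scores: two mildly positive sentences, one strongly negative one
toy_review = [
    {"neg": 0.0, "neu": 0.5, "pos": 0.5, "compound": 0.4},
    {"neg": 0.0, "neu": 0.6, "pos": 0.4, "compound": 0.3},
    {"neg": 1.0, "neu": 0.0, "pos": 0.0, "compound": -0.9},
]
print(by_majority(toy_review), by_pos_neg_sum(toy_review), by_compound_sum(toy_review))  # 1 0 0
```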



Take sentence-level objectivity into account: keep only the sentences whose `neg` or `pos` score exceeds their `neu` score, and apply the three previous aggregation strategies to those sentences

In [None]:
train_labels_predictions = [[0 if sent_score["neg"] > sent_score["pos"] else 1 for sent_score in doc if sent_score["neg"] > sent_score["neu"] or sent_score["pos"] > sent_score["neu"]] for doc in train_scores_predictions]
train_labels_predictions = [0 if doc.count(0) > doc.count(1) else 1 for doc in train_labels_predictions]

print(classification_report(train_labels, train_labels_predictions, digits=3))


test_labels_predictions = [[0 if sent_score["neg"] > sent_score["pos"] else 1 for sent_score in doc if sent_score["neg"] > sent_score["neu"] or sent_score["pos"] > sent_score["neu"]] for doc in test_scores_predictions]
test_labels_predictions = [0 if doc.count(0) > doc.count(1) else 1 for doc in test_labels_predictions]

print(classification_report(test_labels, test_labels_predictions, digits=3))

              precision    recall  f1-score   support

           0      0.800     0.194     0.312     12500
           1      0.541     0.951     0.690     12500

    accuracy                          0.573     25000
   macro avg      0.670     0.573     0.501     25000
weighted avg      0.670     0.573     0.501     25000

              precision    recall  f1-score   support

           0      0.801     0.188     0.304     12500
           1      0.540     0.953     0.689     12500

    accuracy                          0.570     25000
   macro avg      0.670     0.570     0.497     25000
weighted avg      0.670     0.570     0.497     25000



In [None]:
train_labels_predictions = []
for doc in train_scores_predictions:
  positive_score = 0.0
  negative_score = 0.0
  for sent_score in doc:
    if sent_score["neg"] > sent_score["neu"] or sent_score["pos"] > sent_score["neu"]:
      positive_score += sent_score["pos"]
      negative_score += sent_score["neg"]
  train_labels_predictions.append(0 if negative_score > positive_score else 1)

print(classification_report(train_labels, train_labels_predictions, digits=3))


test_labels_predictions = []
for doc in test_scores_predictions:
  positive_score = 0.0
  negative_score = 0.0
  for sent_score in doc:
    if sent_score["neg"] > sent_score["neu"] or sent_score["pos"] > sent_score["neu"]:
      positive_score += sent_score["pos"]
      negative_score += sent_score["neg"]
  test_labels_predictions.append(0 if negative_score > positive_score else 1)

print(classification_report(test_labels, test_labels_predictions, digits=3))

              precision    recall  f1-score   support

           0      0.781     0.207     0.327     12500
           1      0.543     0.942     0.689     12500

    accuracy                          0.574     25000
   macro avg      0.662     0.574     0.508     25000
weighted avg      0.662     0.574     0.508     25000

              precision    recall  f1-score   support

           0      0.790     0.203     0.323     12500
           1      0.543     0.946     0.690     12500

    accuracy                          0.575     25000
   macro avg      0.666     0.575     0.506     25000
weighted avg      0.666     0.575     0.506     25000



In [None]:
train_labels_predictions = []
for doc in train_scores_predictions:
  compound_score = 0.0
  for sent_score in doc:
    if sent_score["neg"] > sent_score["neu"] or sent_score["pos"] > sent_score["neu"]:
      compound_score += sent_score["compound"]
  train_labels_predictions.append(0 if compound_score < 0 else 1)

print(classification_report(train_labels, train_labels_predictions, digits=3))


test_labels_predictions = []
for doc in test_scores_predictions:
  compound_score = 0.0
  for sent_score in doc:
    if sent_score["neg"] > sent_score["neu"] or sent_score["pos"] > sent_score["neu"]:
      compound_score += sent_score["compound"]
  test_labels_predictions.append(0 if compound_score < 0 else 1)

print(classification_report(test_labels, test_labels_predictions, digits=3))

              precision    recall  f1-score   support

           0      0.787     0.209     0.330     12500
           1      0.544     0.944     0.690     12500

    accuracy                          0.576     25000
   macro avg      0.666     0.576     0.510     25000
weighted avg      0.666     0.576     0.510     25000

              precision    recall  f1-score   support

           0      0.789     0.202     0.321     12500
           1      0.542     0.946     0.689     12500

    accuracy                          0.574     25000
   macro avg      0.666     0.574     0.505     25000
weighted avg      0.666     0.574     0.505     25000
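The filter used in the three cells above keeps a sentence when its `neg` or its `pos` proportion exceeds `neu`, which is the same test as `max(neg, pos) > neu`; the intuition is that mostly neutral sentences, such as plot description, carry little opinion. In all three cells a review left with no subjective sentence falls back to the positive class (an empty vote and zero sums). A small sketch with invented score dictionaries:

```python
from typing import Dict, List

Scores = Dict[str, float]


def is_subjective(score: Scores) -> bool:
    # Equivalent to: score["neg"] > score["neu"] or score["pos"] > score["neu"]
    return max(score["neg"], score["pos"]) > score["neu"]


def subjective_only(doc: List[Scores]) -> List[Scores]:
    # Drop the sentences whose neutral proportion dominates
    return [s for s in doc if is_subjective(s)]


plot_sentence = {"neg": 0.0, "neu": 0.8, "pos": 0.2, "compound": 0.3}     # invented
opinion_sentence = {"neg": 0.0, "neu": 0.3, "pos": 0.7, "compound": 0.8}  # invented
print(len(subjective_only([plot_sentence, opinion_sentence])))  # 1
print(subjective_only([plot_sentence]))                         # []
```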



Use sentence-level objectivity to remove the objective sentences, then score the concatenation of the remaining subjective sentences with VADER to obtain the document-level sentiment polarity

In [None]:
train_labels_predictions = []
for i, doc in enumerate(train_scores_predictions):
  subjective_text = " ".join([train_list_of_sentences[i][j] for j, sent_score in enumerate(doc) if sent_score["neg"] > sent_score["neu"] or sent_score["pos"] > sent_score["neu"]])
  train_labels_predictions.append(0 if analyzer.polarity_scores(subjective_text)["compound"] < 0 else 1)

print(classification_report(train_labels, train_labels_predictions, digits=3))


test_labels_predictions = []
for i, doc in enumerate(test_scores_predictions):
  subjective_text = " ".join([test_list_of_sentences[i][j] for j, sent_score in enumerate(doc) if sent_score["neg"] > sent_score["neu"] or sent_score["pos"] > sent_score["neu"]])
  test_labels_predictions.append(0 if analyzer.polarity_scores(subjective_text)["compound"] < 0 else 1)

print(classification_report(test_labels, test_labels_predictions, digits=3))

              precision    recall  f1-score   support

           0      0.787     0.208     0.329     12500
           1      0.544     0.944     0.690     12500

    accuracy                          0.576     25000
   macro avg      0.665     0.576     0.509     25000
weighted avg      0.665     0.576     0.509     25000

              precision    recall  f1-score   support

           0      0.786     0.202     0.321     12500
           1      0.542     0.945     0.689     12500

    accuracy                          0.573     25000
   macro avg      0.664     0.573     0.505     25000
weighted avg      0.664     0.573     0.505     25000



## Objectivity Classification

Download the [Rotten_IMDB subjectivity dataset](https://www.cs.cornell.edu/people/pabo/movie-review-data/)

In [None]:
!wget http://www.cs.cornell.edu/people/pabo/movie-review-data/rotten_imdb.tar.gz
!mkdir rotten_imdb
!tar -xvf rotten_imdb.tar.gz -C rotten_imdb

--2021-08-31 16:43:38--  http://www.cs.cornell.edu/people/pabo/movie-review-data/rotten_imdb.tar.gz
Resolving www.cs.cornell.edu (www.cs.cornell.edu)... 132.236.207.36
Connecting to www.cs.cornell.edu (www.cs.cornell.edu)|132.236.207.36|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 519599 (507K) [application/x-gzip]
Saving to: ‘rotten_imdb.tar.gz’


2021-08-31 16:43:38 (2.63 MB/s) - ‘rotten_imdb.tar.gz’ saved [519599/519599]

quote.tok.gt9.5000
plot.tok.gt9.5000
subjdata.README.1.0


Read the files, label the objective (plot) sentences as 1 and the subjective (quote) sentences as 0, and split the dataset

In [None]:
def read_file(path: str) -> np.ndarray:
  # The files are read as Latin-1 (ISO-8859-1), one sentence per line
  with open(path, encoding="ISO-8859-1") as file_to_read:
    return np.array([line.strip().lower() for line in file_to_read])

In [None]:
objective_sentences = read_file("rotten_imdb/plot.tok.gt9.5000")
subjective_sentences = read_file("rotten_imdb/quote.tok.gt9.5000")
objective_labels = np.ones(len(objective_sentences))
subjective_labels = np.zeros(len(subjective_sentences))
X = np.append(objective_sentences, subjective_sentences)
y = np.append(objective_labels, subjective_labels)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Define a function that embeds each sentence token by token with the spaCy word vectors and left-pads it with zero vectors (or truncates it) to a fixed length

In [None]:
embedding_size = 300
sequence_length = 128

In [None]:
def preprocess_review(review: str, embedding_size: int, sequence_length: int) -> list:
  # Look up the spaCy vector of each whitespace-separated token
  embedded_review = [nlp.vocab[word].vector.astype(np.float32).tolist() for word in review.split()]

  if len(embedded_review) <= sequence_length:
    # Left-pad short sentences with zero vectors
    zero = [0.0] * embedding_size
    padded_review = [zero] * (sequence_length - len(embedded_review)) + embedded_review
  else:
    # Truncate long sentences to their first `sequence_length` tokens
    padded_review = embedded_review[:sequence_length]

  return padded_review
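The padding logic can be checked without spaCy by swapping `nlp.vocab[word].vector` for a hypothetical toy lookup table (`toy_vectors`, 2-dimensional), with unknown words mapped to zeros to mimic the all-zero vector spaCy returns for a word that has no vector. Real tokens end up at the end of the sequence, after the zero padding.

```python
from typing import Dict, List


def pad_or_truncate(tokens: List[str], table: Dict[str, List[float]],
                    embedding_size: int, sequence_length: int) -> List[List[float]]:
    zero = [0.0] * embedding_size
    # Unknown tokens map to the zero vector
    embedded = [table.get(token, zero) for token in tokens]
    if len(embedded) <= sequence_length:
        # Left-pad: zeros first, real tokens at the end of the sequence
        return [zero] * (sequence_length - len(embedded)) + embedded
    # Too long: keep only the first `sequence_length` tokens
    return embedded[:sequence_length]


toy_vectors = {"good": [1.0, 0.5], "film": [0.2, 0.1]}  # hypothetical 2-d embeddings
print(pad_or_truncate("good film".split(), toy_vectors, 2, 4))
# [[0.0, 0.0], [0.0, 0.0], [1.0, 0.5], [0.2, 0.1]]
print(len(pad_or_truncate(["good"] * 10, toy_vectors, 2, 4)))  # 4
```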

In [None]:
x_train = [preprocess_review(x, embedding_size, sequence_length) for x in X_train]
x_test = [preprocess_review(x, embedding_size, sequence_length) for x in X_test]

Define the dataset and the data loaders

In [None]:
class ObjectivityDataset(Dataset):

  def __init__(self, reviews: list, labels: list) -> None:
    super().__init__()
    self.reviews = reviews
    self.labels = labels

  def __len__(self) -> int:
    return len(self.reviews)

  def __getitem__(self, index: int) -> tuple:
    return torch.tensor(self.reviews[index]), torch.tensor(self.labels[index])

In [None]:
train_dataset = ObjectivityDataset(x_train, y_train)
test_dataset = ObjectivityDataset(x_test, y_test)

batch_size = 128
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2, drop_last=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True, num_workers=2, drop_last=False)
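The two files hold 10,000 sentences in total, so `test_size=0.2` leaves 8,000 training and 2,000 test sentences. With `batch_size = 128` and `drop_last=False`, the final batch of each loader is therefore smaller than 128, which is why the GRU training and evaluation loops below re-create the hidden state when `x.size(0) < batch_size`. The arithmetic, as a quick sketch (`batch_sizes` is a hypothetical helper):

```python
def batch_sizes(n_samples: int, batch_size: int) -> list:
    # Sizes of the batches produced by a DataLoader with drop_last=False
    n_full, remainder = divmod(n_samples, batch_size)
    return [batch_size] * n_full + ([remainder] if remainder else [])


train_batches = batch_sizes(8000, 128)
test_batches = batch_sizes(2000, 128)
print(len(train_batches), train_batches[-1])  # 63 64
print(len(test_batches), test_batches[-1])    # 16 80
```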

Define the device that will be used for training and testing the neural networks

In [None]:
if torch.cuda.is_available():
  device = torch.device("cuda")
else:
  device = torch.device("cpu")
print(device)

cuda


Define the neural network for objectivity classification

In [None]:
class ObjectivityNetwork(nn.Module):
  """
  This neural network will be used to perform Objectivity classification
  """

  def __init__(self, output_size: int, embedding_size: int, sequence_length: int, hidden_dimension: int, layers_number: int) -> None:
    super().__init__()
    self.output_size = output_size
    self.embedding_size = embedding_size
    self.sequence_length = sequence_length
    self.hidden_dimension = hidden_dimension
    self.layers_number = layers_number

    # GRU layer
    self.gru = nn.GRU(embedding_size, hidden_dimension, layers_number, dropout=0.5, batch_first=True, bidirectional=True)

    # Dropout and activation function layers
    self.dropout = nn.Dropout(0.3)
    self.activation = nn.LeakyReLU(0.1)

    # Linear layers
    self.fc = nn.Linear(2 * hidden_dimension * sequence_length, hidden_dimension)
    self.fc_out = nn.Linear(hidden_dimension, output_size)

  def forward(self, x: torch.Tensor, hidden: torch.Tensor) -> tuple:
    gru_out, hidden = self.gru(x, hidden)
    out = self.dropout(gru_out.contiguous().view(x.size(0), -1))
    out = self.activation(out)
    out = self.fc(out)
    out = self.dropout(out)
    out = self.activation(out)
    out = self.fc_out(out)
    return out, hidden
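Because the whole GRU output sequence is flattened before `self.fc`, that layer's input width is `2 * hidden_dimension * sequence_length`. The sketch below counts parameters for the configuration instantiated in the next cell (`hidden_dimension=128`, `layers_number=3`, 300-dimensional embeddings, length 128), using the weight shapes documented for `torch.nn.GRU` (`weight_ih`, `weight_hh`, `bias_ih`, `bias_hh` per layer and direction) and `torch.nn.Linear`. By this count, roughly four fifths of the parameters sit in the flattening layer.

```python
def gru_parameters(input_size: int, hidden: int, layers: int, directions: int = 2) -> int:
    total = 0
    for layer in range(layers):
        # Layers above the first receive the concatenated outputs of both directions
        layer_input = input_size if layer == 0 else hidden * directions
        # weight_ih (3H x I), weight_hh (3H x H), bias_ih and bias_hh (3H each)
        per_direction = 3 * hidden * layer_input + 3 * hidden * hidden + 6 * hidden
        total += directions * per_direction
    return total


def linear_parameters(in_features: int, out_features: int) -> int:
    return in_features * out_features + out_features  # weights + bias


gru = gru_parameters(input_size=300, hidden=128, layers=3)
fc = linear_parameters(2 * 128 * 128, 128)
fc_out = linear_parameters(128, 2)
total = gru + fc + fc_out
print(gru, fc, fc_out, total)  # 923136 4194432 258 5117826
print(round(fc / total, 2))    # 0.82
```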

In [None]:
objectivity_classifier = ObjectivityNetwork(output_size=2, embedding_size=embedding_size, sequence_length=sequence_length, hidden_dimension=128, layers_number=3)
objectivity_classifier = objectivity_classifier.to(device)

Define the evaluation function

In [None]:
from sklearn.metrics import accuracy_score, f1_score

def evaluate(model: ObjectivityNetwork, loader: DataLoader) -> None:
  model.eval()
  criterion = nn.CrossEntropyLoss()
  # Hidden state shape: (layers * 2 directions, batch, hidden_dimension)
  h = torch.zeros((model.layers_number * 2, batch_size, model.hidden_dimension), device=device)
  total_loss = 0.0
  all_labels, all_predictions = [], []
  with torch.no_grad():  # gradients are not needed for evaluation
    for x, y in loader:
      x, y = x.to(device), y.to(device)
      if x.size(0) < batch_size:
        # The last batch can be smaller, so the hidden state is resized
        h = torch.zeros((model.layers_number * 2, x.size(0), model.hidden_dimension), device=device)
      # Note: the final hidden state of each batch initializes the next batch
      outputs, h = model(x, h)
      total_loss += criterion(outputs, y.long()).item()
      all_predictions.extend(outputs.argmax(1).tolist())
      all_labels.extend(y.long().tolist())

  # Per-class F1 (1 = objective, 0 = subjective) and their unweighted mean
  positive_f1_score, negative_f1_score = f1_score(all_labels, all_predictions, labels=[1, 0], average=None)
  overall_f1_score = (positive_f1_score + negative_f1_score) / 2
  accuracy = accuracy_score(all_labels, all_predictions)
  # total_loss sums per-batch mean losses and is divided by the number of samples
  print(f"loss: {total_loss/len(loader.dataset)}, accuracy: {accuracy}, f1 score: {overall_f1_score} (positive: {positive_f1_score}, negative: {negative_f1_score})")

Define the training function

In [None]:
from sklearn.metrics import accuracy_score, f1_score

def train(model: ObjectivityNetwork, loader: DataLoader, epochs: int = 5, lr: float = 0.001, weight_decay: float = 0.0001) -> None:
  criterion = nn.CrossEntropyLoss()
  optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
  model.train()
  for _ in range(epochs):
    # Hidden state shape: (layers * 2 directions, batch, hidden_dimension)
    h = torch.zeros((model.layers_number * 2, batch_size, model.hidden_dimension), device=device)
    total_loss = 0.0
    all_labels, all_predictions = [], []
    for x, y in loader:
      x, y = x.to(device), y.to(device)
      if x.size(0) < batch_size:
        # The last batch can be smaller, so the hidden state is resized
        h = torch.zeros((model.layers_number * 2, x.size(0), model.hidden_dimension), device=device)
      # Detach so gradients do not flow into previous batches; the final hidden
      # state of each batch still initializes the next one
      h = h.detach()
      outputs, h = model(x, h)
      loss = criterion(outputs, y.long())
      total_loss += loss.item()
      all_predictions.extend(outputs.argmax(1).tolist())
      all_labels.extend(y.long().tolist())

      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

    # Per-class F1 (1 = objective, 0 = subjective) and their unweighted mean
    positive_f1_score, negative_f1_score = f1_score(all_labels, all_predictions, labels=[1, 0], average=None)
    overall_f1_score = (positive_f1_score + negative_f1_score) / 2
    accuracy = accuracy_score(all_labels, all_predictions)
    print(f"loss: {total_loss/len(loader.dataset)}, accuracy: {accuracy}, f1 score: {overall_f1_score} (positive: {positive_f1_score}, negative: {negative_f1_score})")

Evaluate the untrained network, then train it and evaluate it again

In [None]:
print("train set:")
evaluate(objectivity_classifier, train_loader)

print("test set:")
evaluate(objectivity_classifier, test_loader)

train set:
loss: 0.005458624057471752, accuracy: 0.494875, f1 score: 0.4888436667535828 (positive: 0.5443680234524749, negative: 0.43331931005469077)
test set:
loss: 0.0055452798902988435, accuracy: 0.4905, f1 score: 0.48487470297516455 (positive: 0.5387052965142599, negative: 0.4310441094360692)


In [None]:
train(objectivity_classifier, train_loader)

loss: 0.0028308588732033966, accuracy: 0.836, f1 score: 0.8359701055517368 (positive: 0.8337557019766851, negative: 0.8381845091267884)
loss: 0.0017218348290771247, accuracy: 0.91625, f1 score: 0.9162498115620761 (positive: 0.9161241862794192, negative: 0.9163754368447329)
loss: 0.0015444577597081661, accuracy: 0.9185, f1 score: 0.9184970658943722 (positive: 0.9180080482897384, negative: 0.918986083499006)
loss: 0.001272466266527772, accuracy: 0.937875, f1 score: 0.9378708985085343 (positive: 0.9373660995589161, negative: 0.9383756974581525)
loss: 0.0011273515694774688, accuracy: 0.945375, f1 score: 0.9453734218040142 (positive: 0.9450798039462108, negative: 0.9456670396618176)


In [None]:
print("train set:")
evaluate(objectivity_classifier, train_loader)

print("test set:")
evaluate(objectivity_classifier, test_loader)

train set:
loss: 0.0008216371382586658, accuracy: 0.96, f1 score: 0.9599909979745442 (positive: 0.9593908629441623, negative: 0.9605911330049262)
test set:
loss: 0.0014615438170731067, accuracy: 0.937, f1 score: 0.936999432994897 (positive: 0.9368104312938816, negative: 0.9371884346959123)


## Trial using a CNN for objectivity classification

In [None]:
class ObjectivityCNN(nn.Module):
  """
  This neural network will be used to perform Objectivity classification
  """

  def __init__(self, output_size: int, embedding_size: int, filters_number: int) -> None:
    super().__init__()
    self.output_size = output_size
    self.embedding_size = embedding_size
    self.filters_number = filters_number

    # Convolutional layers
    self.convs = nn.ModuleList([nn.Conv1d(in_channels=embedding_size, out_channels=filters_number, kernel_size=filter_size) for filter_size in range(1, 9)])

    # Dropout and activation function layers
    self.dropout = nn.Dropout(p=0.4)
    self.activation = nn.LeakyReLU(0.1)

    # Classifier set of layers
    self.classifier = nn.Sequential(
      nn.BatchNorm1d(8 * filters_number),
      self.activation,
      self.dropout,
      nn.Linear(8 * filters_number, 128),
      nn.BatchNorm1d(128),
      self.activation,
      nn.Linear(128, output_size)
    )

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    # x: (batch, embedding_size, sequence_length)
    cnn_outputs = [conv(x) for conv in self.convs]
    # Global max pooling over time: one value per filter and kernel width
    pooled_outputs = [cnn_out.max(dim=2).values for cnn_out in cnn_outputs]
    out = self.classifier(torch.cat(pooled_outputs, dim=1))
    return out
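Each `Conv1d` uses stride 1 and no padding, so a kernel of width k turns the 128-step sequence into 128 - k + 1 positions. Global max pooling over time then keeps one value per filter, and concatenating the 8 kernel widths gives the `8 * filters_number` features expected by the first `BatchNorm1d`. A quick count for `filters_number=64`, as instantiated in the next cell (weights and biases only; BatchNorm running statistics are buffers, not parameters):

```python
sequence_length, embedding_size, filters = 128, 300, 64
kernel_widths = range(1, 9)

# Output length of a stride-1, unpadded convolution
print([sequence_length - k + 1 for k in kernel_widths])  # [128, 127, 126, 125, 124, 123, 122, 121]

# One max-pooled value per filter and kernel width
features = len(kernel_widths) * filters
print(features)  # 512

# Conv1d weight shape is (out_channels, in_channels, kernel_size), plus one bias per filter
conv_params = sum(embedding_size * filters * k + filters for k in kernel_widths)
classifier_params = (
    2 * features              # BatchNorm1d(512): weight and bias
    + features * 128 + 128    # Linear(512, 128)
    + 2 * 128                 # BatchNorm1d(128): weight and bias
    + 128 * 2 + 2             # Linear(128, 2)
)
print(conv_params, classifier_params, conv_params + classifier_params)  # 691712 67202 758914
```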

In [None]:
cnn_objectivity_classifier = ObjectivityCNN(output_size=2, embedding_size=embedding_size, filters_number=64)
cnn_objectivity_classifier = cnn_objectivity_classifier.to(device)

In [None]:
from sklearn.metrics import accuracy_score, f1_score

def evaluate(model: ObjectivityCNN, loader: DataLoader) -> None:
  model.eval()
  criterion = nn.CrossEntropyLoss()
  total_loss = 0.0
  all_labels, all_predictions = [], []
  with torch.no_grad():  # gradients are not needed for evaluation
    for x, y in loader:
      x, y = x.to(device), y.to(device)
      # Conv1d expects (batch, channels = embedding_size, sequence_length)
      outputs = model(x.swapaxes(1, 2))
      total_loss += criterion(outputs, y.long()).item()
      all_predictions.extend(outputs.argmax(1).tolist())
      all_labels.extend(y.long().tolist())

  # Per-class F1 (1 = objective, 0 = subjective) and their unweighted mean
  positive_f1_score, negative_f1_score = f1_score(all_labels, all_predictions, labels=[1, 0], average=None)
  overall_f1_score = (positive_f1_score + negative_f1_score) / 2
  accuracy = accuracy_score(all_labels, all_predictions)
  # total_loss sums per-batch mean losses and is divided by the number of samples
  print(f"loss: {total_loss/len(loader.dataset)}, accuracy: {accuracy}, f1 score: {overall_f1_score} (positive: {positive_f1_score}, negative: {negative_f1_score})")

In [None]:
def train(model: ObjectivityCNN, loader: DataLoader, epochs: int = 5, lr: float = 0.001, weight_decay: float = 0.0001) -> None:
  criterion = nn.CrossEntropyLoss()
  optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
  model.train()
  for i in range(epochs):
    total_loss = 0.0
    true_positive = 0
    false_negative = 0
    positive_predictions = 0
    true_negative = 0
    false_positive = 0
    negative_predictions = 0
    for x, y in loader:
      x, y = x.to(device), y.to(device)
      outputs = model(x.swapaxes(1, 2))
      loss = criterion(outputs, y.long())
      total_loss += loss.item()
      _, predicted = outputs.max(1)

      for prediction_index, prediction in enumerate(predicted):
        label = y[prediction_index].item()
        if label == 1 and prediction == label:
          true_positive += 1
        if prediction == 0 and label == 1:
          false_negative += 1
        if prediction == 1:
          positive_predictions += 1
        if label == 0 and prediction == label:
          true_negative += 1
        if prediction == 1 and label == 0:
          false_positive += 1
        if prediction == 0:
          negative_predictions += 1

      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

    positive_precision = true_positive / positive_predictions if positive_predictions != 0 else 0
    positive_recall = true_positive / (true_positive + false_negative)
    positive_f1_score = 2 * (positive_precision * positive_recall) / (positive_precision + positive_recall) if positive_precision + positive_recall != 0 else 0
    negative_precision = true_negative / negative_predictions if negative_predictions != 0 else 0
    negative_recall = true_negative / (true_negative + false_positive)
    negative_f1_score = 2 * (negative_precision * negative_recall) / (negative_precision + negative_recall) if negative_precision + negative_recall != 0 else 0
    overall_f1_score = (positive_f1_score + negative_f1_score)/2
    print(f"loss: {total_loss/len(loader.dataset)}, accuracy: {(true_positive + true_negative)/len(loader.dataset)}, f1 score: {overall_f1_score} (positive: {positive_f1_score}, negative: {negative_f1_score})")
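The per-sample Python loop inside `train` (and in the evaluation cells) runs six `if` checks for every prediction. Below is a minimal sketch of computing the same counts with vectorised boolean masks. It uses NumPy arrays for illustration and assumes that in the notebook you would first call `predicted.cpu().numpy()` and `y.cpu().numpy()`. PyTorch tensors also support the same `==`, `&` and `.sum()` operations. `confusion_counts` is a hypothetical helper name.

```python
import numpy as np


def confusion_counts(predicted: np.ndarray, labels: np.ndarray) -> dict:
    """Binary confusion-matrix counts with class 1 as the positive class."""
    predicted = np.asarray(predicted).astype(int)
    labels = np.asarray(labels).astype(int)
    return {
        "true_positive": int(((predicted == 1) & (labels == 1)).sum()),
        "false_negative": int(((predicted == 0) & (labels == 1)).sum()),
        "true_negative": int(((predicted == 0) & (labels == 0)).sum()),
        "false_positive": int(((predicted == 1) & (labels == 0)).sum()),
        "positive_predictions": int((predicted == 1).sum()),
        "negative_predictions": int((predicted == 0).sum()),
    }


# Accumulate over mini-batches, as the notebook loop does (toy predictions/labels)
batches = [
    (np.array([1, 0, 1, 1]), np.array([1, 0, 0, 1])),
    (np.array([0, 0]), np.array([1, 0])),
]
totals = {}
for batch_predicted, batch_labels in batches:
    for name, count in confusion_counts(batch_predicted, batch_labels).items():
        totals[name] = totals.get(name, 0) + count
print(totals)
```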

In [None]:
print("train set:")
evaluate(cnn_objectivity_classifier, train_loader)

print("test set:")
evaluate(cnn_objectivity_classifier, test_loader)

train set:
loss: 0.005462134793400765, accuracy: 0.492125, f1 score: 0.4796848417918723 (positive: 0.5601385731298041, negative: 0.39923111045394055)
test set:
loss: 0.005549371659755707, accuracy: 0.506, f1 score: 0.4903932946486138 (positive: 0.5795744680851064, negative: 0.40121212121212124)


In [None]:
train(cnn_objectivity_classifier, train_loader)

loss: 0.002072600895538926, accuracy: 0.891125, f1 score: 0.8911178120743127 (positive: 0.8902331442974165, negative: 0.892002479851209)
loss: 0.0006555817448534072, accuracy: 0.973875, f1 score: 0.9738746077109064 (positive: 0.9737733718157863, negative: 0.9739758436060266)
loss: 0.00024829809193033724, accuracy: 0.990625, f1 score: 0.9906249225091251 (positive: 0.9905979691613389, negative: 0.9906518758569114)
loss: 0.00019522384798619896, accuracy: 0.991375, f1 score: 0.9913749157706618 (positive: 0.9913479623824452, negative: 0.9914018691588785)
loss: 0.00020271150389453396, accuracy: 0.991625, f1 score: 0.9916249307748185 (positive: 0.9916008524507961, negative: 0.9916490090988409)


In [None]:
print("train set:")
evaluate(cnn_objectivity_classifier, train_loader)

print("test set:")
evaluate(cnn_objectivity_classifier, test_loader)

train set:
loss: 3.8168397979461585e-05, accuracy: 0.999125, f1 score: 0.9991249914549947 (positive: 0.9991222570532915, negative: 0.9991277258566977)
test set:
loss: 0.001789900705218315, accuracy: 0.933, f1 score: 0.9329957117255503 (positive: 0.9335317460317459, negative: 0.9324596774193548)


## Sentence-level objectivity removal and document-level sentiment-polarity classification

Remove the objective sentences from the sentiment dataset using the ObjectivityNetwork trained on the Rotten_IMDB dataset. Reviews whose sentences are all classified as objective are kept whole.

In [None]:
def objectivity_remotion(model: ObjectivityNetwork, review: str, embedding_size: int, sequence_length: int) -> str:
  model.eval()
  review = review.strip().lower()

  # Remove HTML tags (e.g. <br />) and the double quotes that can occur in the reviews
  review = re.sub(r'<.*?>', "", review)
  review = re.sub(r'"', "", review)

  # Split the review into sentences and embed each one for the ObjectivityNetwork
  sentences = []
  padded_sentences = []
  for sentence in nlp(review).sents:
    sentences.append(sentence.text)
    padded_sentences.append(preprocess_review(sentence.text, embedding_size, sequence_length))

  # Classify all the sentences of the review as one batch; no gradients are needed at inference time.
  # Hidden state shape: (layers_number * 2 directions, batch, hidden_dimension)
  with torch.no_grad():
    h = torch.zeros((3 * 2, len(padded_sentences), 128)).to(device)
    outputs, h = model(torch.tensor(padded_sentences).to(device), h)
    _, objectivity = outputs.max(1)

  if objectivity.all():  # Keep all the sentences if every one is classified as objective (label 1)
    subjective_text = " ".join(sentences)
  else:  # Otherwise keep only the subjective sentences (label 0)
    subjective_text = " ".join([sentences[i] for i, item in enumerate(objectivity) if item == 0])

  return subjective_text
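The selection rule at the end of `objectivity_remotion` can be isolated from spaCy and the network. Below is a minimal sketch. It assumes, as the comment in the cell indicates, that label 1 means "objective" and 0 means "subjective". `keep_subjective` is a hypothetical name, and the example sentences are made up.

```python
from typing import List


def keep_subjective(sentences: List[str], objectivity: List[int]) -> str:
    """Join the sentences predicted subjective (0); fall back to all sentences
    when every sentence is predicted objective (1)."""
    if len(sentences) != len(objectivity):
        raise ValueError("one prediction per sentence is required")
    if all(label == 1 for label in objectivity):
        return " ".join(sentences)
    return " ".join(s for s, label in zip(sentences, objectivity) if label == 0)


sentences = ["The film runs two hours.", "I loved every minute.", "It stars two actors."]
print(keep_subjective(sentences, [1, 0, 1]))  # only the subjective sentence survives
print(keep_subjective(sentences, [1, 1, 1]))  # all objective: the review is kept whole
```

The fallback avoids passing an empty review to the document-level classifiers when the objectivity model rejects every sentence.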

In [None]:
train_reviews = [objectivity_remotion(objectivity_classifier, review, embedding_size, sequence_length) for review in train_reviews]
test_reviews = [objectivity_remotion(objectivity_classifier, review, embedding_size, sequence_length) for review in test_reviews]

Re-run the CountVectorizer + SVC baseline on the preprocessed reviews, now without objective sentences

In [None]:
vectorizer = CountVectorizer()
classifier = SVC()

# The vocabulary is built on the train and test reviews together, then the rows are split back
vectors = vectorizer.fit_transform(train_reviews + test_reviews)
train_vectors = vectors[:len(train_reviews)]
test_vectors = vectors[len(train_reviews):]

classifier.fit(train_vectors, train_labels)
test_labels_predictions = classifier.predict(test_vectors)

print(classification_report(test_labels, test_labels_predictions, digits=3))

              precision    recall  f1-score   support

           0      0.877     0.837     0.857     12500
           1      0.844     0.883     0.863     12500

    accuracy                          0.860     25000
   macro avg      0.861     0.860     0.860     25000
weighted avg      0.861     0.860     0.860     25000
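The cell above builds the `CountVectorizer` vocabulary on the training and test reviews together, so words that appear only in the test set still get a column. Below is a minimal sketch of a stricter alternative that learns the vocabulary from the training split only. A scikit-learn `Pipeline` fits the vectorizer on the data passed to `fit` and only transforms the data passed to `predict`. The toy reviews are hypothetical stand-ins for `train_reviews` / `test_reviews`. The report above was produced with the original cell, not with this variant.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data standing in for train_reviews / train_labels / test_reviews
train_texts = ["good great", "great fun", "bad awful", "awful boring"]
train_y = [1, 1, 0, 0]
test_texts = ["great", "awful", "great unseenword"]

# fit() learns the vocabulary from train_texts only; predict() reuses it,
# so "unseenword" is simply ignored at test time
model = make_pipeline(CountVectorizer(), SVC())
model.fit(train_texts, train_y)
predictions = model.predict(test_texts)

vocabulary = model.named_steps["countvectorizer"].vocabulary_
print(sorted(vocabulary))
print(predictions)
```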



Define the dataset and the data loaders

In [None]:
sequence_length = 384

In [None]:
class SentimentDataset(Dataset):

  def __init__(self, reviews: list, labels: list, embedding_size: int, sequence_length: int) -> None:
    super().__init__()
    self.reviews = reviews
    self.labels = labels
    self.embedding_size = embedding_size
    self.sequence_length = sequence_length

  def __len__(self) -> int:
    return len(self.reviews)

  def __getitem__(self, index: int) -> tuple:
    return torch.tensor(preprocess_review(self.reviews[index], self.embedding_size, self.sequence_length)), torch.tensor(self.labels[index])

In [None]:
train_dataset = SentimentDataset(train_reviews, train_labels, embedding_size, sequence_length)
test_dataset = SentimentDataset(test_reviews, test_labels, embedding_size, sequence_length)

batch_size = 128
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2, drop_last=False)
# The test set does not need shuffling; a fixed order keeps evaluation reproducible
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=2, drop_last=False)

Define the network for sentiment-polarity classification (using the same structure as for objectivity detection)

In [None]:
class SentimentNetwork(nn.Module):
  """
  This neural network will be used to perform sentiment-polarity classification
  """

  def __init__(self, output_size: int, embedding_size: int, sequence_length: int, hidden_dimension: int, layers_number: int) -> None:
    super().__init__()
    self.output_size = output_size
    self.embedding_size = embedding_size
    self.sequence_length = sequence_length
    self.hidden_dimension = hidden_dimension
    self.layers_number = layers_number

    # GRU layer
    self.gru = nn.GRU(embedding_size, hidden_dimension, layers_number, dropout=0.5, batch_first=True, bidirectional=True)

    # Dropout and activation function layers
    self.dropout = nn.Dropout(0.3)
    self.activation = nn.LeakyReLU(0.1)

    # Linear layers
    self.fc = nn.Linear(2 * hidden_dimension * sequence_length, hidden_dimension)
    self.fc_out = nn.Linear(hidden_dimension, output_size)

  def forward(self, x: torch.Tensor, hidden: torch.Tensor) -> tuple:
    gru_out, hidden = self.gru(x, hidden)
    out = self.dropout(gru_out.contiguous().view(x.size(0), -1))
    out = self.activation(out)
    out = self.fc(out)
    out = self.dropout(out)
    out = self.activation(out)
    out = self.fc_out(out)
    return out, hidden
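`SentimentNetwork` flattens the GRU output of every time step before the first linear layer, so `self.fc` grows with `sequence_length`. Below is a quick back-of-the-envelope count using the values from this notebook: a bidirectional GRU, `hidden_dimension=128`, and `sequence_length = 384`. The GRU's own parameters are left out because they depend on `embedding_size`, which is defined earlier in the notebook. The helper names are hypothetical.

```python
def flattened_gru_features(hidden_dimension: int, sequence_length: int, directions: int = 2) -> int:
    """Size of gru_out.view(batch, -1) for a batch_first GRU: one vector per time step."""
    return directions * hidden_dimension * sequence_length


def linear_parameters(in_features: int, out_features: int) -> int:
    """Weights plus biases of an nn.Linear layer."""
    return in_features * out_features + out_features


in_features = flattened_gru_features(hidden_dimension=128, sequence_length=384)
fc_params = linear_parameters(in_features, 128)
fc_out_params = linear_parameters(128, 2)
print(in_features, fc_params, fc_out_params)
```

That gives about 12.6 million weights in `self.fc` alone. By contrast, the classifier input of the SentimentCNN defined later (`8 * filters_number`) does not depend on the sequence length.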

In [None]:
sentiment_classifier = SentimentNetwork(output_size=2, embedding_size=embedding_size, sequence_length=sequence_length, hidden_dimension=128, layers_number=3)
sentiment_classifier = sentiment_classifier.to(device)

Define the code for training and evaluating the network

In [None]:
epochs = 10
f1_score_max = 0.0
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(sentiment_classifier.parameters(), lr=0.001, weight_decay=0.0001)

In [None]:
for epoch in range(epochs):
  print(f"Epoch {epoch + 1}")
  # Training phase
  sentiment_classifier.train()
  h = torch.zeros((3 * 2, batch_size, 128)).to(device)
  total_loss = 0.0
  true_positive = 0
  false_negative = 0
  positive_predictions = 0
  true_negative = 0
  false_positive = 0
  negative_predictions = 0
  for inputs, labels in train_loader:
    inputs, labels = inputs.to(device), labels.to(device)
    if inputs.size(0) < batch_size:
      h = torch.zeros((3 * 2, inputs.size(0), 128)).to(device)

    h = h.detach().clone().to(device)
    outputs, h = sentiment_classifier(inputs, h)
    loss = criterion(outputs, labels)
    total_loss += loss.item()
    _, predicted = outputs.max(1)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    for prediction_index, prediction in enumerate(predicted):
      label = labels[prediction_index].item()
      if label == 1 and prediction == label:
        true_positive += 1
      if prediction == 0 and label == 1:
        false_negative += 1
      if prediction == 1:
        positive_predictions += 1
      if label == 0 and prediction == label:
        true_negative += 1
      if prediction == 1 and label == 0:
        false_positive += 1
      if prediction == 0:
        negative_predictions += 1

  positive_precision = true_positive / positive_predictions if positive_predictions != 0 else 0
  positive_recall = true_positive / (true_positive + false_negative)
  positive_f1_score = 2 * (positive_precision * positive_recall) / (positive_precision + positive_recall) if positive_precision + positive_recall != 0 else 0
  negative_precision = true_negative / negative_predictions if negative_predictions != 0 else 0
  negative_recall = true_negative / (true_negative + false_positive)
  negative_f1_score = 2 * (negative_precision * negative_recall) / (negative_precision + negative_recall) if negative_precision + negative_recall != 0 else 0
  overall_f1_score = (positive_f1_score + negative_f1_score)/2
  print(f"train loss: {total_loss/len(train_loader.dataset)}, train accuracy: {(true_positive + true_negative)/len(train_loader.dataset)}, train f1 score: {overall_f1_score} (positive: {positive_f1_score}, negative: {negative_f1_score})")

  # Evaluation phase
  sentiment_classifier.eval()
  h = torch.zeros((3 * 2, batch_size, 128)).to(device)
  total_loss = 0.0
  true_positive = 0
  false_negative = 0
  positive_predictions = 0
  true_negative = 0
  false_positive = 0
  negative_predictions = 0
  for inputs, labels in test_loader:
    inputs, labels = inputs.to(device), labels.to(device)
    if inputs.size(0) < batch_size:
      h = torch.zeros((3 * 2, inputs.size(0), 128)).to(device)

    h = h.detach().clone().to(device)
    with torch.no_grad():  # no gradients are needed during evaluation
      outputs, h = sentiment_classifier(inputs, h)
    loss = criterion(outputs, labels)
    total_loss += loss.item()
    _, predicted = outputs.max(1)

    for prediction_index, prediction in enumerate(predicted):
      label = labels[prediction_index].item()
      if label == 1 and prediction == label:
        true_positive += 1
      if prediction == 0 and label == 1:
        false_negative += 1
      if prediction == 1:
        positive_predictions += 1
      if label == 0 and prediction == label:
        true_negative += 1
      if prediction == 1 and label == 0:
        false_positive += 1
      if prediction == 0:
        negative_predictions += 1

  positive_precision = true_positive / positive_predictions if positive_predictions != 0 else 0
  positive_recall = true_positive / (true_positive + false_negative)
  positive_f1_score = 2 * (positive_precision * positive_recall) / (positive_precision + positive_recall) if positive_precision + positive_recall != 0 else 0
  negative_precision = true_negative / negative_predictions if negative_predictions != 0 else 0
  negative_recall = true_negative / (true_negative + false_positive)
  negative_f1_score = 2 * (negative_precision * negative_recall) / (negative_precision + negative_recall) if negative_precision + negative_recall != 0 else 0
  overall_f1_score = (positive_f1_score + negative_f1_score)/2
  print(f"test loss: {total_loss/len(test_loader.dataset)}, test accuracy: {(true_positive + true_negative)/len(test_loader.dataset)}, test f1 score: {overall_f1_score} (positive: {positive_f1_score}, negative: {negative_f1_score})")

  if overall_f1_score >= f1_score_max:
    print(f"Increase in f1 score from {f1_score_max} to {overall_f1_score}")
    torch.save(sentiment_classifier.state_dict(), "SentimentNetwork.pth")
    f1_score_max = overall_f1_score
  print(10 * "================")

Epoch 1
train loss: 0.003938392232060432, train accuracy: 0.75024, train f1 score: 0.750186457563375 (positive: 0.7465291873021028, negative: 0.753843727824647)
test loss: 0.0026351734215021134, test accuracy: 0.856, test f1 score: 0.8558213568940459 (positive: 0.8507462686567164, negative: 0.8608964451313754)
Increase in f1 score from 0.0 to 0.8558213568940459
Epoch 2
train loss: 0.0025699855095148087, train accuracy: 0.86016, train f1 score: 0.8601571932989324 (positive: 0.8595306975249116, negative: 0.8607836890729531)
test loss: 0.0024470092803239823, test accuracy: 0.86604, test f1 score: 0.8656660656961948 (positive: 0.858578607322326, negative: 0.8727535240700635)
Increase in f1 score from 0.8558213568940459 to 0.8656660656961948
Epoch 3
train loss: 0.0023251209831237794, train accuracy: 0.87448, train f1 score: 0.8744798425475144 (positive: 0.8743392599711676, negative: 0.8746204251238613)
test loss: 0.002685638816356659, test accuracy: 0.86472, test f1 score: 0.863973507502845
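The loop above saves `SentimentNetwork.pth` whenever the test macro-F1 is greater than or equal to the best value seen so far. The file therefore ends up holding the last epoch that reached the maximum. Below is a minimal sketch of that rule as a pure function, applied to the first three test F1 scores printed above (the log is truncated after epoch 3, so later epochs are not included). `select_checkpoint` is a hypothetical name.

```python
from typing import List, Tuple


def select_checkpoint(scores: List[float]) -> Tuple[int, float]:
    """Return (1-based epoch, score) of the checkpoint kept by the `>=` rule.

    A new checkpoint overwrites the old one whenever its score is greater
    than or equal to the best score so far, as in the training loop.
    """
    best_epoch, best_score = 0, 0.0
    for epoch, score in enumerate(scores, start=1):
        if score >= best_score:
            best_epoch, best_score = epoch, score
    return best_epoch, best_score


# Test macro-F1 values printed for epochs 1-3 of SentimentNetwork
logged = [0.8558213568940459, 0.8656660656961948, 0.863973507502845]
print(select_checkpoint(logged))
```

The checkpoint is chosen using test-set F1, so the final test score is also the selection criterion. A separate validation split would keep the test score independent of model selection.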

Load the best stored SentimentNetwork checkpoint and evaluate it on the test set

In [None]:
sentiment_classifier.load_state_dict(torch.load("SentimentNetwork.pth", map_location=device))

<All keys matched successfully>

In [None]:
sentiment_classifier.eval()
h = torch.zeros((3 * 2, batch_size, 128)).to(device)
total_loss = 0.0
true_positive = 0
false_negative = 0
positive_predictions = 0
true_negative = 0
false_positive = 0
negative_predictions = 0
for inputs, labels in test_loader:
  inputs, labels = inputs.to(device), labels.to(device)
  if inputs.size(0) < batch_size:
    h = torch.zeros((3 * 2, inputs.size(0), 128)).to(device)

  h = h.detach().clone().to(device)
  with torch.no_grad():  # no gradients are needed during evaluation
    outputs, h = sentiment_classifier(inputs, h)
  loss = criterion(outputs, labels)
  total_loss += loss.item()
  _, predicted = outputs.max(1)

  for prediction_index, prediction in enumerate(predicted):
    label = labels[prediction_index].item()
    if label == 1 and prediction == label:
      true_positive += 1
    if prediction == 0 and label == 1:
      false_negative += 1
    if prediction == 1:
      positive_predictions += 1
    if label == 0 and prediction == label:
      true_negative += 1
    if prediction == 1 and label == 0:
      false_positive += 1
    if prediction == 0:
      negative_predictions += 1

positive_precision = true_positive / positive_predictions if positive_predictions != 0 else 0
positive_recall = true_positive / (true_positive + false_negative)
positive_f1_score = 2 * (positive_precision * positive_recall) / (positive_precision + positive_recall) if positive_precision + positive_recall != 0 else 0
negative_precision = true_negative / negative_predictions if negative_predictions != 0 else 0
negative_recall = true_negative / (true_negative + false_positive)
negative_f1_score = 2 * (negative_precision * negative_recall) / (negative_precision + negative_recall) if negative_precision + negative_recall != 0 else 0
overall_f1_score = (positive_f1_score + negative_f1_score)/2
print(f"test loss: {total_loss/len(test_loader.dataset)}, test accuracy: {(true_positive + true_negative)/len(test_loader.dataset)}, test f1 score: {overall_f1_score} (positive: {positive_f1_score}, negative: {negative_f1_score})")

test loss: 0.0022735300797224046, test accuracy: 0.88592, test f1 score: 0.885917102111861 (positive: 0.8864920799172171, negative: 0.8853421243065048)
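The printed "loss" is `total_loss / len(loader.dataset)`, where `total_loss` sums the batch-mean losses returned by `nn.CrossEntropyLoss()` (its default reduction is the mean over the batch). The value is therefore roughly the per-sample loss divided by the batch size. Below is a small sketch that recovers the mean per-batch loss from the final test line above. It assumes the 25,000 test reviews and the `batch_size = 128` used in this section. `mean_batch_loss` is a hypothetical name.

```python
import math


def mean_batch_loss(printed_loss: float, dataset_size: int, batch_size: int) -> float:
    """Undo the notebook's normalisation: printed = sum(batch means) / dataset_size."""
    total_loss = printed_loss * dataset_size
    number_of_batches = math.ceil(dataset_size / batch_size)
    return total_loss / number_of_batches


print(round(mean_batch_loss(0.0022735300797224046, 25000, 128), 4))
```

If the loop accumulated `loss.item() * inputs.size(0)` instead of `loss.item()`, dividing by `len(loader.dataset)` would give the exact per-sample mean. That change would also weight the smaller last batch (40 reviews) correctly.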


## Alternative: a CNN for document-level sentiment-polarity classification

In [None]:
class SentimentCNN(nn.Module):
  """
  This neural network will be used to perform sentiment-polarity classification
  """

  def __init__(self, output_size: int, embedding_size: int, filters_number: int) -> None:
    super().__init__()
    self.output_size = output_size
    self.embedding_size = embedding_size
    self.filters_number = filters_number

    # Convolutional layers
    self.convs = nn.ModuleList([nn.Conv1d(in_channels=embedding_size, out_channels=filters_number, kernel_size=filter_size) for filter_size in range(1, 9)])

    # Dropout and activation function layers
    self.dropout = nn.Dropout(p=0.4)
    self.activation = nn.LeakyReLU(0.1)

    # Classifier set of layers
    self.classifier = nn.Sequential(
      nn.BatchNorm1d(8 * filters_number),
      self.activation,
      self.dropout,
      nn.Linear(8 * filters_number, 128),
      nn.BatchNorm1d(128),
      self.activation,
      nn.Linear(128, output_size)
    )

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    # x: (batch, embedding_size, sequence_length); one feature map per kernel size
    cnn_outputs = [conv(x) for conv in self.convs]
    # Global max pooling over the time dimension -> (batch, filters_number) per kernel size
    pooled_outputs = [cnn_out.max(dim=2).values for cnn_out in cnn_outputs]
    out = self.classifier(torch.cat(pooled_outputs, dim=1))
    return out
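`SentimentCNN` applies eight `Conv1d` layers with kernel sizes 1 to 8 and no padding, then max-pools each feature map over its full length. Every kernel size therefore contributes exactly `filters_number` features, whatever the review length. Below is a small shape check using the notebook's `sequence_length = 384` and `filters_number=64`. `conv1d_output_length` is a hypothetical helper implementing the output-length formula given in the PyTorch `Conv1d` documentation.

```python
def conv1d_output_length(length: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Output length of a 1-D convolution (PyTorch Conv1d formula)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


sequence_length = 384
filters_number = 64
kernel_sizes = range(1, 9)

# Length of each feature map before global max pooling
lengths = [conv1d_output_length(sequence_length, k) for k in kernel_sizes]
# Pooling collapses each map to one value per filter; the eight maps are concatenated
classifier_input = filters_number * len(kernel_sizes)
print(lengths, classifier_input)
```

This is why the first layer of the classifier, `nn.BatchNorm1d(8 * filters_number)`, matches the size of the concatenated pooled features.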

In [None]:
cnn_sentiment_classifier = SentimentCNN(output_size=2, embedding_size=embedding_size, filters_number=64)
cnn_sentiment_classifier = cnn_sentiment_classifier.to(device)

In [None]:
epochs = 10
f1_score_max = 0.0
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(cnn_sentiment_classifier.parameters(), lr=0.001, weight_decay=0.0001)

In [None]:
for epoch in range(epochs):
  print(f"Epoch {epoch + 1}")
  # Training phase
  cnn_sentiment_classifier.train()
  total_loss = 0.0
  true_positive = 0
  false_negative = 0
  positive_predictions = 0
  true_negative = 0
  false_positive = 0
  negative_predictions = 0
  for inputs, labels in train_loader:
    inputs, labels = inputs.to(device), labels.to(device)
    outputs = cnn_sentiment_classifier(inputs.swapaxes(1, 2))
    loss = criterion(outputs, labels)
    total_loss += loss.item()
    _, predicted = outputs.max(1)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    for prediction_index, prediction in enumerate(predicted):
      label = labels[prediction_index].item()
      if label == 1 and prediction == label:
        true_positive += 1
      if prediction == 0 and label == 1:
        false_negative += 1
      if prediction == 1:
        positive_predictions += 1
      if label == 0 and prediction == label:
        true_negative += 1
      if prediction == 1 and label == 0:
        false_positive += 1
      if prediction == 0:
        negative_predictions += 1

  positive_precision = true_positive / positive_predictions if positive_predictions != 0 else 0
  positive_recall = true_positive / (true_positive + false_negative)
  positive_f1_score = 2 * (positive_precision * positive_recall) / (positive_precision + positive_recall) if positive_precision + positive_recall != 0 else 0
  negative_precision = true_negative / negative_predictions if negative_predictions != 0 else 0
  negative_recall = true_negative / (true_negative + false_positive)
  negative_f1_score = 2 * (negative_precision * negative_recall) / (negative_precision + negative_recall) if negative_precision + negative_recall != 0 else 0
  overall_f1_score = (positive_f1_score + negative_f1_score)/2
  print(f"train loss: {total_loss/len(train_loader.dataset)}, train accuracy: {(true_positive + true_negative)/len(train_loader.dataset)}, train f1 score: {overall_f1_score} (positive: {positive_f1_score}, negative: {negative_f1_score})")

  # Evaluation phase
  cnn_sentiment_classifier.eval()
  total_loss = 0.0
  true_positive = 0
  false_negative = 0
  positive_predictions = 0
  true_negative = 0
  false_positive = 0
  negative_predictions = 0
  for inputs, labels in test_loader:
    inputs, labels = inputs.to(device), labels.to(device)
    with torch.no_grad():  # no gradients are needed during evaluation
      outputs = cnn_sentiment_classifier(inputs.swapaxes(1, 2))
    loss = criterion(outputs, labels)
    total_loss += loss.item()
    _, predicted = outputs.max(1)
    for prediction_index, prediction in enumerate(predicted):
      label = labels[prediction_index].item()
      if label == 1 and prediction == label:
        true_positive += 1
      if prediction == 0 and label == 1:
        false_negative += 1
      if prediction == 1:
        positive_predictions += 1
      if label == 0 and prediction == label:
        true_negative += 1
      if prediction == 1 and label == 0:
        false_positive += 1
      if prediction == 0:
        negative_predictions += 1

  positive_precision = true_positive / positive_predictions if positive_predictions != 0 else 0
  positive_recall = true_positive / (true_positive + false_negative)
  positive_f1_score = 2 * (positive_precision * positive_recall) / (positive_precision + positive_recall) if positive_precision + positive_recall != 0 else 0
  negative_precision = true_negative / negative_predictions if negative_predictions != 0 else 0
  negative_recall = true_negative / (true_negative + false_positive)
  negative_f1_score = 2 * (negative_precision * negative_recall) / (negative_precision + negative_recall) if negative_precision + negative_recall != 0 else 0
  overall_f1_score = (positive_f1_score + negative_f1_score)/2
  print(f"test loss: {total_loss/len(test_loader.dataset)}, test accuracy: {(true_positive + true_negative)/len(test_loader.dataset)}, test f1 score: {overall_f1_score} (positive: {positive_f1_score}, negative: {negative_f1_score})")

  if overall_f1_score >= f1_score_max:
    print(f"Increase in f1 score from {f1_score_max} to {overall_f1_score}")
    torch.save(cnn_sentiment_classifier.state_dict(), "SentimentCNN.pth")
    f1_score_max = overall_f1_score
  print(10 * "================")

Epoch 1
train loss: 0.0030546901553869248, train accuracy: 0.82068, train f1 score: 0.8206458437944922 (positive: 0.8231209311501282, negative: 0.8181707564388563)
test loss: 0.002373743687868118, test accuracy: 0.87124, test f1 score: 0.87123465963778 (positive: 0.8720639084297127, negative: 0.8704054108458472)
Increase in f1 score from 0.0 to 0.87123465963778
Epoch 2
train loss: 0.0020627069264650346, train accuracy: 0.89036, train f1 score: 0.8903546273767413 (positive: 0.8911221449851041, negative: 0.8895871097683786)
test loss: 0.002300197804570198, test accuracy: 0.8754, test f1 score: 0.8753993522759929 (positive: 0.8756834417528037, negative: 0.8751152627991822)
Increase in f1 score from 0.87123465963778 to 0.8753993522759929
Epoch 3
train loss: 0.0015444465312361717, train accuracy: 0.92188, train f1 score: 0.9218791389206209 (positive: 0.9221385001794044, negative: 0.9216197776618373)
test loss: 0.0023242112112045288, test accuracy: 0.87888, test f1 score: 0.8788782876296792 

Load the best stored SentimentCNN checkpoint and evaluate it on the test set

In [None]:
cnn_sentiment_classifier.load_state_dict(torch.load("SentimentCNN.pth", map_location=device))

<All keys matched successfully>

In [None]:
cnn_sentiment_classifier.eval()
total_loss = 0.0
true_positive = 0
false_negative = 0
positive_predictions = 0
true_negative = 0
false_positive = 0
negative_predictions = 0
for inputs, labels in test_loader:
  inputs, labels = inputs.to(device), labels.to(device)
  with torch.no_grad():  # no gradients are needed during evaluation
    outputs = cnn_sentiment_classifier(inputs.swapaxes(1, 2))
  loss = criterion(outputs, labels)
  total_loss += loss.item()
  _, predicted = outputs.max(1)

  for prediction_index, prediction in enumerate(predicted):
    label = labels[prediction_index].item()
    if label == 1 and prediction == label:
      true_positive += 1
    if prediction == 0 and label == 1:
      false_negative += 1
    if prediction == 1:
      positive_predictions += 1
    if label == 0 and prediction == label:
      true_negative += 1
    if prediction == 1 and label == 0:
      false_positive += 1
    if prediction == 0:
      negative_predictions += 1

positive_precision = true_positive / positive_predictions if positive_predictions != 0 else 0
positive_recall = true_positive / (true_positive + false_negative)
positive_f1_score = 2 * (positive_precision * positive_recall) / (positive_precision + positive_recall) if positive_precision + positive_recall != 0 else 0
negative_precision = true_negative / negative_predictions if negative_predictions != 0 else 0
negative_recall = true_negative / (true_negative + false_positive)
negative_f1_score = 2 * (negative_precision * negative_recall) / (negative_precision + negative_recall) if negative_precision + negative_recall != 0 else 0
overall_f1_score = (positive_f1_score + negative_f1_score)/2
print(f"test loss: {total_loss/len(test_loader.dataset)}, test accuracy: {(true_positive + true_negative)/len(test_loader.dataset)}, test f1 score: {overall_f1_score} (positive: {positive_f1_score}, negative: {negative_f1_score})")

test loss: 0.0023254449409246445, test accuracy: 0.87888, test f1 score: 0.8788782876296792 (positive: 0.8793337052681917, negative: 0.8784228699911668)
