# CNN for Sentiment Analysis of Movie Reviews

The idea of this project is to implement Yoon Kim's [paper](https://arxiv.org/abs/1408.5882) on using a convolutional neural network to classify the sentiment of movie reviews. The paper shows that even a simple convolutional neural network, combined with pre-trained word embeddings such as **word2vec**, can achieve state-of-the-art results.

My approach is to build the CNN incrementally, starting with randomly initialized, non-static word embeddings and working towards pre-trained word embeddings. Once that works, I plan to add more "channels" (i.e., static and non-static word embeddings), as in Kim's paper.

## Setup

In this notebook, I have followed the setup described in Section *3.1: Hyperparameters and Training* of Kim's paper. Since the IMDB dataset has no dedicated validation set, 10% of the training set is held out for validation. The batch size is 50, and the filter windows have sizes 3, 4, and 5, each with 100 output channels (feature maps).

One difference is that I keep the model weights with the highest validation accuracy observed over a fixed number of epochs, rather than using the early stopping employed in Kim's paper.

Lastly, the **GloVe** word embeddings are used instead of **word2vec**.

## Imports

Several packages need to be imported to get started.

In [1]:
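# This notebook uses the legacy torchtext API (Field, LabelField, BucketIterator), which later
# torchtext releases moved to torchtext.legacy and then removed
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchtext
from torchtext.datasets import IMDB
from torchtext.data import BucketIterator
from torchtext.data import Field, LabelField
import spacy  # required by Field(tokenize="spacy")
import numpy as np
from tqdm import tqdm_notebook
import copy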



Note that `torchtext` makes loading the IMDB review dataset much easier, since it provides the `IMDB` class to work with. Integrating other datasets with `torchtext` is similarly straightforward.

## Dataset

A few helper functions keep the loading of the IMDB movie reviews dataset organized.

In [2]:
def create_fields(input_tokenizer="spacy"):
    """Creates the input and output fields for extracting the reviews and the labels
    
    Args:
        input_tokenizer: The tokenizer to use for the input that is a supported tokenizer or custom function
    
    Returns:
        The input and output fields to use
    """
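    input_field = Field(tokenize=input_tokenizer)
    # Float labels are required by BCEWithLogitsLoss; tensor_type is the keyword used by early
    # torchtext releases (later versions replaced it with dtype=torch.float)
    output_field = LabelField(tensor_type=torch.FloatTensor)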
    
    return input_field, output_field

In [3]:
def prepare_and_load_dataset(input_field, output_field, dataset_path=".data", validation_set_split_ratio=0.0, input_vectors_cache="vectors", input_vectors=None):
    """Loads the dataset and builds the vocabularies and returns the appropriate datasets
    
    Args:
        input_field: The input field to put the review for each example
        output_field: The output field to put the label of each example onto
        dataset_path: The path to extract and save the dataset to
        validation_set_split_ratio: A ratio to split the training set into training and validation sets by
        input_vectors_cache: The location to store the pre-trained word embeddings
        input_vectors: The name of the pre-trained word embeddings to use
        
    Returns:
        The training, optional validation, and test set iterators
    """
    training_dataset, test_dataset = IMDB.splits(input_field, output_field, root=dataset_path)

    # Only create a validation split when a positive ratio is requested
    validation_dataset = None
    if validation_set_split_ratio > 0.0:
        # split_ratio is the fraction of examples kept for training
        training_dataset, validation_dataset = training_dataset.split(split_ratio=1.0 - validation_set_split_ratio)

    # Build the input vocabulary from the training split only, optionally attaching pre-trained vectors
    if input_vectors is None:
        input_field.build_vocab(training_dataset)
    else:
        input_field.build_vocab(training_dataset, vectors=input_vectors, vectors_cache=input_vectors_cache)

    output_field.build_vocab(training_dataset)

    # BucketIterator groups reviews of similar length into batches of 50 to reduce padding
    if validation_dataset is None:
        return BucketIterator.splits((training_dataset, test_dataset), sort_key=IMDB.sort_key, batch_size=50, repeat=False)
    else:
        return BucketIterator.splits((training_dataset, validation_dataset, test_dataset), sort_key=IMDB.sort_key, batch_size=50, repeat=False)

Note that `split_ratio` is the fraction of examples assigned to the training split, so `validation_set_split_ratio` is subtracted from `1.0`.

Next, the dataset is downloaded to `dataset_path`, and the vocabularies of the input and output fields are built.

In [155]:
input_field, output_field = create_fields()

kwargs = {"dataset_path": "data", "validation_set_split_ratio": 0.1}
training_dataset, validation_dataset, test_dataset = prepare_and_load_dataset(input_field, output_field, **kwargs)
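
With `validation_set_split_ratio=0.1`, the resulting split sizes can be worked out by hand. The sketch below uses a hypothetical helper (`split_sizes` is not part of torchtext); it assumes the training share is rounded to the nearest whole example and relies on the IMDB training and test splits each containing 25,000 reviews. With a batch size of 50, this gives 450 training batches and 50 validation batches per epoch, plus 500 test batches.

```python
import math


def split_sizes(num_examples, validation_set_split_ratio, batch_size=50):
    """Hypothetical helper: example and batch counts for a train/validation split."""
    split_ratio = 1.0 - validation_set_split_ratio  # fraction of examples kept for training
    num_train = int(round(num_examples * split_ratio))
    num_valid = num_examples - num_train
    # The last batch of a split may be smaller than batch_size, hence the ceiling
    return (num_train, num_valid,
            math.ceil(num_train / batch_size), math.ceil(num_valid / batch_size))


# The IMDB training split has 25,000 reviews; the test split has another 25,000
print(split_sizes(25000, 0.1))  # (22500, 2500, 450, 50)
print(math.ceil(25000 / 50))    # 500 test batches
```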

## Model training and evaluation

Before building the different model architectures, it helps to set up the helper functions used to train, validate, and test them. Two functions come first: `run_model()`, which makes one pass over a dataset in either training or evaluation mode, and `calculate_accuracy()`, which computes the accuracy of a batch of predictions. Note that _accuracy_ is the only metric I will consider.

In [4]:
def run_model(model, loss_function, dataset, optimizer=None, is_training_model=True):
    """Runs the model given its dataset and using the loss function and optimizer
    
    Args:
        model: The model to train
        loss_function: The loss function to use
        dataset: The dataset
        optimizer: The optimizer to use when training the model
        is_training_model: A flag indicating whether to run the model in training or evaluation mode
        
    Returns:
        The average loss and accuracy for running through the dataset
    """
    if is_training_model:
        model.train()
    else:
        model.eval()
        
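    accuracies = 0.0
    losses = 0.0
    
    for example in tqdm_notebook(dataset):
        if is_training_model:
            model.zero_grad()
        
        inputs, targets = example.text, example.label
        
        # Build the autograd graph only when training; evaluation skips gradient tracking
        with torch.set_grad_enabled(is_training_model):
            predictions = model(inputs)
            loss = loss_function(predictions, targets)
        # The model outputs logits, so apply a sigmoid before thresholding in calculate_accuracy()
        accuracy = calculate_accuracy(torch.sigmoid(predictions), targets)
        
        losses += loss.item()
        accuracies += accuracy.item()
        
        if is_training_model:
            loss.backward()
            optimizer.step()
    
    # len(dataset) is the number of batches, so these are per-batch averages
    return losses / len(dataset), accuracies / len(dataset)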

In [5]:
def calculate_accuracy(predictions, targets):
    """Calculates the accuracy given the predictions and targets
    
    Args:
        predictions: The predictions made by the model
        targets: The targets for the set of examples
        
    Returns:
        The accuracy of the model
    """
    return (predictions.round() == targets).float().mean()
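
`calculate_accuracy()` receives sigmoid probabilities and thresholds them with `round()`. The pure-Python sketch below mirrors that logic on plain lists so the arithmetic is easy to follow; `batch_accuracy` is a hypothetical name, and the sketch is an illustration rather than a replacement for the tensor version.

```python
import math


def batch_accuracy(logits, targets):
    """Hypothetical list-based mirror of calculate_accuracy(torch.sigmoid(logits), targets)."""
    # The sigmoid turns each logit into a probability of the positive class
    probabilities = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # round() thresholds at 0.5; an exact 0.5 rounds to 0 (half to even), like torch.round
    predictions = [float(round(p)) for p in probabilities]
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Probabilities are about 0.88, 0.27, 0.57 and 0.05, so three of four predictions are correct
print(batch_accuracy([2.0, -1.0, 0.3, -3.0], [1.0, 0.0, 0.0, 0.0]))  # 0.75
```

Because the sigmoid is monotonic, thresholding the probability at 0.5 is equivalent to checking whether the logit is positive.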

The `run_model()` function handles both the training and the evaluation phase of each epoch. The last helper, `train_evaluate_model()`, runs the model for a given number of epochs. When a validation set is provided, it keeps a copy of the parameters with the highest validation accuracy seen across all epochs and restores them before evaluating on the test set.

In [6]:
def train_evaluate_model(model, loss_function, optimizer, training_dataset, test_dataset, validation_dataset=None, num_epochs=5):
    """Trains and evaluates the given model
    
    Args:
        model: The model to train and evaluate
        loss_function: The loss function
        optimizer: The optimizer to use when training the model
        training_dataset: The dataset to use for training the model
        test_dataset: The dataset to use for testing the model on unseen data
        validation_dataset: The optional dataset used to validate the model after each training epoch
        num_epochs: The number of epochs to train and evaluate the model for
    """
    
    if torch.cuda.is_available():
        model = model.cuda()
        
    best_model_weights = copy.deepcopy(model.state_dict())
    best_valid_accuracy = 0.0
        
    for epoch in range(1, num_epochs + 1):
        training_loss, training_accuracy = run_model(model, loss_function, training_dataset, optimizer=optimizer)
        
        print("[Training - Epoch {}] Loss: {:.2f}".format(epoch, training_loss))
        print("[Training - Epoch {}] Accuracy: {:.2f}".format(epoch, training_accuracy))
        
        if validation_dataset is not None:
            validation_loss, validation_accuracy = run_model(model, loss_function, validation_dataset, is_training_model=False)
            
            print("[Validation - Epoch {}] Loss: {:.2f}".format(epoch, validation_loss))
            print("[Validation - Epoch {}] Accuracy: {:.2f}".format(epoch, validation_accuracy))
            
            if validation_accuracy > best_valid_accuracy:
                best_valid_accuracy = validation_accuracy
                best_model_weights = copy.deepcopy(model.state_dict())
            
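    # Restore the best weights only if a validation set was used to select them;
    # otherwise keep the weights from the final epoch
    if validation_dataset is not None:
        model.load_state_dict(best_model_weights)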
    
    test_loss, test_accuracy = run_model(model, loss_function, test_dataset, is_training_model=False)
    
    print("[Test] Loss: {:.2f}".format(test_loss))
    print("[Test] Accuracy: {:.2f}".format(test_accuracy))

## Model 1: CNNRand

In this model, I am going to create a convolutional neural network in which the embedding of each word in every review is randomly initialized and learned during training. This is the baseline model that I will build upon. In particular, I am going to use the following layers: `Embedding`, `Conv2d`, max pooling over time (`F.max_pool1d`), `Linear`, and `Dropout`.

In [104]:
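class CNNRand(nn.Module):
    def __init__(self, num_vocab, embedding_dim, channel_out, filter_sizes, num_output, dropout_prob):
        super(CNNRand, self).__init__()
        
        # Randomly initialized word embeddings, learned during training
        self.embedding = nn.Embedding(num_vocab, embedding_dim)
        # One convolution per filter size; each kernel spans the full embedding dimension
        self.convs = nn.ModuleList([nn.Conv2d(1, channel_out, filter_size) for filter_size in filter_sizes])
        self.fc = nn.Linear(len(filter_sizes) * channel_out, num_output)
        self.dropout = nn.Dropout(dropout_prob)
    
    def forward(self, x):
        # torchtext batches are [seq_len, batch]; transpose to [batch, seq_len]
        x = x.t()
        
        # [batch, 1, seq_len, embedding_dim]: a single input channel for Conv2d
        x = self.embedding(x).unsqueeze(1)
        # Each conv output is [batch, channel_out, seq_len - k + 1, 1]; squeeze(3) drops the last dimension
        x_convs = [F.relu(conv(x)).squeeze(3) for conv in self.convs]
        # Max-over-time pooling keeps each filter's strongest response: [batch, channel_out]
        x_pools = [F.max_pool1d(x_conv, x_conv.size(2)).squeeze(2) for x_conv in x_convs]
        # Concatenate all filter sizes ([batch, len(filter_sizes) * channel_out]) and apply dropout
        x_cat = self.dropout(torch.cat(x_pools, 1))
        # [batch, num_output] -> [batch] when num_output == 1
        x_fc = self.fc(x_cat).squeeze(1)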
        
        return x_fc
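
The tensor shapes in `CNNRand.forward()` are easy to lose track of, so here is a small bookkeeping sketch that computes them without creating any tensors. The function `trace_shapes` and the example review length of 200 tokens are hypothetical; the rules assume stride 1, no padding, and a sequence at least as long as the largest filter window (otherwise the convolution fails).

```python
def trace_shapes(batch_size, seq_len, embedding_dim, channel_out, window_sizes, in_channels=1):
    """Hypothetical shape bookkeeping for CNNRand.forward (no tensors are created)."""
    shapes = {"tokens": (seq_len, batch_size)}  # torchtext's default [seq_len, batch] layout
    # After x.t(), the embedding lookup and unsqueeze(1)
    shapes["embedded"] = (batch_size, in_channels, seq_len, embedding_dim)
    for k in window_sizes:
        # A (k, embedding_dim) kernel slides only along the sequence, so the width
        # collapses to 1 (removed by squeeze(3)) and the length becomes seq_len - k + 1
        shapes[f"conv_{k}"] = (batch_size, channel_out, seq_len - k + 1)
    # Max over time, then squeeze(2), leaves one value per output channel
    shapes["pooled_each"] = (batch_size, channel_out)
    shapes["concatenated"] = (batch_size, channel_out * len(window_sizes))
    # Linear to num_output=1, then squeeze(1)
    shapes["logits"] = (batch_size,)
    return shapes


# Hypothetical batch: 50 reviews padded to 200 tokens, with the CNNRand hyperparameters
for name, shape in trace_shapes(50, 200, 50, 100, [3, 4, 5]).items():
    print(name, shape)
```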

### Training and evaluation

These are the values that we will be using as arguments to this model:

In [161]:
EMBEDDING_DIM = 50
CHANNEL_OUT = 100
FILTER_SIZES = [(i, EMBEDDING_DIM) for i in range(3, 6)]
NUM_OUTPUT = 1
DROPOUT_PROB = 0.5
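
These settings also determine how many weights sit outside the embedding layer. The sketch below (a hypothetical helper, `conv_and_fc_params`) counts the convolution and fully connected parameters for this model and for the later variants, assuming a bias on every layer, as PyTorch's `Conv2d` and `Linear` have by default. Each embedding layer adds a further vocabulary size times embedding dimension weights, which usually dominates the total for a vocabulary built from IMDB reviews.

```python
def conv_and_fc_params(embedding_dim, channel_out, window_sizes, in_channels=1, num_output=1):
    """Hypothetical helper: trainable weights outside the embedding layer(s)."""
    # Each Conv2d has channel_out * in_channels * k * embedding_dim weights plus channel_out biases
    conv = sum(channel_out * in_channels * k * embedding_dim + channel_out for k in window_sizes)
    # The Linear layer maps the concatenated pooled features to num_output logits
    fc = len(window_sizes) * channel_out * num_output + num_output
    return conv + fc


print(conv_and_fc_params(50, 100, [3, 4, 5]))                  # CNNRand: 60601
print(conv_and_fc_params(100, 100, [3, 4, 5]))                 # CNNStatic / CNNNonStatic: 120601
print(conv_and_fc_params(100, 100, [3, 4, 5], in_channels=2))  # CNNMultiChannel: 240601
```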

In [162]:
args = [len(input_field.vocab), EMBEDDING_DIM, CHANNEL_OUT, FILTER_SIZES, NUM_OUTPUT, DROPOUT_PROB]
cnn_rand = CNNRand(*args)

The loss function that will be used is `BCEWithLogitsLoss` and the optimizer used is `Adam`.
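
`BCEWithLogitsLoss` takes raw logits and folds the sigmoid into the loss, which the PyTorch documentation describes as more numerically stable than a separate sigmoid followed by `BCELoss`. The pure-Python sketch below shows one standard stable rearrangement of the binary cross-entropy formula; it illustrates the idea and is not necessarily the exact code path PyTorch uses. `bce_with_logits` and `naive_bce` are hypothetical helpers.

```python
import math


def bce_with_logits(logit, target):
    """Stable binary cross-entropy on a raw logit (hypothetical pure-Python helper)."""
    # Algebraically equal to -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))],
    # rearranged so that exp() is only ever called on a non-positive number
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def naive_bce(logit, target):
    """Direct formula: sigmoid first, then log; breaks down for large |logit|."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


logits, targets = [2.0, -1.0], [1.0, 0.0]
# Default reduction='mean' averages the per-example losses
mean_loss = sum(bce_with_logits(z, y) for z, y in zip(logits, targets)) / len(logits)
print(round(mean_loss, 4))
print(bce_with_logits(1000.0, 0.0))  # stays finite; naive_bce(1000.0, 0.0) raises ValueError
```

For a confidently wrong logit such as 1000 with target 0, the naive version fails because `1 - sigmoid(1000)` rounds to 0 and `log(0)` is undefined, while the stable form returns 1000.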

In [163]:
loss_function = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(cnn_rand.parameters())

Now the `CNNRand` model is going to be trained and evaluated.

In [164]:
train_evaluate_model(cnn_rand, loss_function, optimizer, training_dataset, test_dataset, validation_dataset=validation_dataset)

[Training - Epoch 1] Loss: 0.67
[Training - Epoch 1] Accuracy: 0.60
[Validation - Epoch 1] Loss: 0.52
[Validation - Epoch 1] Accuracy: 0.77

[Training - Epoch 2] Loss: 0.53
[Training - Epoch 2] Accuracy: 0.74
[Validation - Epoch 2] Loss: 0.46
[Validation - Epoch 2] Accuracy: 0.77

[Training - Epoch 3] Loss: 0.46
[Training - Epoch 3] Accuracy: 0.78
[Validation - Epoch 3] Loss: 0.38
[Validation - Epoch 3] Accuracy: 0.83

[Training - Epoch 4] Loss: 0.40
[Training - Epoch 4] Accuracy: 0.82
[Validation - Epoch 4] Loss: 0.34
[Validation - Epoch 4] Accuracy: 0.85

[Training - Epoch 5] Loss: 0.34
[Training - Epoch 5] Accuracy: 0.85
[Validation - Epoch 5] Loss: 0.31
[Validation - Epoch 5] Accuracy: 0.86

[Test] Loss: 0.32
[Test] Accuracy: 0.86


### Results

After trying out several filter sizes, this model reaches approximately 86% accuracy on the test set.

Even with multiple filter sizes, the model struggles to identify the words that most strongly determine whether a movie review is positive or negative. A likely reason is that the embedding matrix is randomly initialized and learned only during training, so at the start it carries no information about the words in a sentence. I expect that a pre-trained word embedding, like word2vec or GloVe, will give the model more information about each word and help it detect the words that matter most for sentiment analysis.

Finally, based on my exploration [here](https://gitlab.com/bigbawsboy/IMDB-sentiment-analysis), training and evaluating a CNN for sentiment classification is much faster than training an RNN, and this accuracy is reached in fewer than 10 epochs.

## Model 2: CNNStatic

In this model, the `Embedding` layer is initialized with the weights of a pre-trained word embedding, specifically the 100-dimensional GloVe vectors (`glove.6B.100d`). The model is called static because the embedding layer's parameters stay fixed during training, while the parameters of the remaining layers are updated.

To support pre-trained word embeddings, `prepare_and_load_dataset()` accepts the name of a supported pre-trained word embedding (`input_vectors`) and the directory to download it to (`input_vectors_cache`).

In [8]:
input_field, output_field = create_fields()

kwargs = {"dataset_path": "data", "validation_set_split_ratio": 0.1, "input_vectors_cache": "vectors", "input_vectors": "glove.6B.100d"}
training_dataset, validation_dataset, test_dataset = prepare_and_load_dataset(input_field, output_field, **kwargs)



Here is the implementation of the model.

In [125]:
class CNNStatic(nn.Module):
    def __init__(self, embedding_weights, num_vocab, embedding_dim, channel_out, filter_sizes, num_output, dropout_prob):
        super(CNNStatic, self).__init__()
        
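        # Initialize the embeddings with the pre-trained vectors
        self.embedding = nn.Embedding(num_vocab, embedding_dim)
        self.embedding.weight.data.copy_(embedding_weights)
        # Freeze the weight parameter itself; setting requires_grad on the module object has no effect
        self.embedding.weight.requires_grad = False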
        
        self.convs = nn.ModuleList([nn.Conv2d(1, channel_out, filter_sizes[i]) for i in range(len(filter_sizes))])
        self.fc = nn.Linear(len(filter_sizes) * channel_out, num_output)
        self.dropout = nn.Dropout(dropout_prob)
    
    def forward(self, x):
        x = x.t()
        
        x = self.embedding(x).unsqueeze(1)
        x_convs = [F.relu(conv(x)).squeeze(3) for conv in self.convs]
        x_pools = [F.max_pool1d(x_conv, x_conv.size(2)).squeeze(2) for x_conv in x_convs]
        x_cat = self.dropout(torch.cat(x_pools, 1))
        x_fc = self.fc(x_cat).squeeze(1)
        
        return x_fc
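
The difference between setting `requires_grad` on a module and on its parameters is easy to miss. The sketch below uses a toy 10-word, 4-dimensional tensor as a stand-in for the GloVe vectors. Assigning `requires_grad` to an `nn.Module` only creates an ordinary Python attribute, so the embedding weight stays trainable; freezing has to target the `weight` parameter, or use `nn.Embedding.from_pretrained` with `freeze=True`.

```python
import torch
import torch.nn as nn

# Toy stand-in for pre-trained vectors (assumption: 10 words, 4 dimensions)
pretrained = torch.randn(10, 4)

# Pitfall: this only sets a plain Python attribute on the module object
pitfall = nn.Embedding(10, 4)
pitfall.weight.data.copy_(pretrained)
pitfall.requires_grad = False
print(pitfall.weight.requires_grad)  # True: the weight is still trainable

# Option 1: freeze the weight parameter itself
frozen = nn.Embedding(10, 4)
frozen.weight.data.copy_(pretrained)
frozen.weight.requires_grad = False

# Option 2: let PyTorch load and freeze the weights in one step
frozen_too = nn.Embedding.from_pretrained(pretrained, freeze=True)
print(frozen.weight.requires_grad, frozen_too.weight.requires_grad)  # False False
```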

### Training and evaluation

The hyperparameters used for this model are as follows:

In [9]:
EMBEDDING_WEIGHTS = input_field.vocab.vectors
NUM_VOCAB = EMBEDDING_WEIGHTS.size(0)
EMBEDDING_DIM = EMBEDDING_WEIGHTS.size(1)
CHANNEL_OUT = 100
FILTER_SIZES = [(i, EMBEDDING_DIM) for i in range(3, 6)]
NUM_OUTPUT = 1
DROPOUT_PROB = 0.5

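Note that most of the hyperparameters are the same as before, except for the embedding weights and their dimensions: `EMBEDDING_DIM` is now taken from the 100-dimensional GloVe vectors instead of being set to 50.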

In [13]:
args = [EMBEDDING_WEIGHTS, NUM_VOCAB, EMBEDDING_DIM, CHANNEL_OUT, FILTER_SIZES, NUM_OUTPUT, DROPOUT_PROB]

In [150]:
cnn_static = CNNStatic(*args)

loss_function = nn.BCEWithLogitsLoss()
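# Only pass trainable parameters to the optimizer, so the frozen embedding is left out
optimizer = optim.Adam(filter(lambda p: p.requires_grad, cnn_static.parameters()))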

train_evaluate_model(cnn_static, loss_function, optimizer, training_dataset, test_dataset, validation_dataset=validation_dataset)

[Training - Epoch 1] Loss: 0.46
[Training - Epoch 1] Accuracy: 0.78
[Validation - Epoch 1] Loss: 0.32
[Validation - Epoch 1] Accuracy: 0.87

[Training - Epoch 2] Loss: 0.27
[Training - Epoch 2] Accuracy: 0.89
[Validation - Epoch 2] Loss: 0.26
[Validation - Epoch 2] Accuracy: 0.89

[Training - Epoch 3] Loss: 0.17
[Training - Epoch 3] Accuracy: 0.94
[Validation - Epoch 3] Loss: 0.26
[Validation - Epoch 3] Accuracy: 0.89

[Training - Epoch 4] Loss: 0.08
[Training - Epoch 4] Accuracy: 0.97
[Validation - Epoch 4] Loss: 0.29
[Validation - Epoch 4] Accuracy: 0.90

[Training - Epoch 5] Loss: 0.04
[Training - Epoch 5] Accuracy: 0.99
[Validation - Epoch 5] Loss: 0.32
[Validation - Epoch 5] Accuracy: 0.90

[Test] Loss: 0.38
[Test] Accuracy: 0.87


### Results

After 5 epochs, this model achieves 87% accuracy on the test set, using the same filter sizes as the `CNNRand` model. Because the `Embedding` layer is meant to stay fixed during training, it cannot adapt the pre-trained vectors to the context of the training dataset. One caveat: assigning `requires_grad` on the `Embedding` module itself, rather than on its `weight`, does not freeze anything, so in a run configured that way the embeddings are still fine-tuned. That would explain why the training curves above are nearly identical to those of the `CNNNonStatic` run.

Allowing the `Embedding` layer to adjust its parameters, so that it can learn how words are used in the training reviews, is expected to increase the accuracy further.

## Model 3: CNNNonStatic

In this model, the `Embedding` layer adjusts its parameters during training, while the architecture stays the same. As a result, I subclass `CNNStatic` and set `requires_grad` to `True` on the embedding weights. Another approach would be to change the initializer to accept an `embedding_requires_grad` argument and use its value to configure the `Embedding` layer. However, I chose subclassing to keep a clear distinction between `CNNStatic` and `CNNNonStatic`.

In [131]:
class CNNNonStatic(CNNStatic):
    def __init__(self, embedding_weights, num_vocab, embedding_dim, channel_out, filter_sizes, num_output, dropout_prob):
        super(CNNNonStatic, self).__init__(embedding_weights, num_vocab, embedding_dim, channel_out, filter_sizes, num_output, dropout_prob)
        
        # Unfreeze the embedding weights so they are fine-tuned during training
        self.embedding.weight.requires_grad = True
        
    def forward(self, x):
        return super().forward(x)
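
The alternative mentioned above, passing an `embedding_requires_grad` argument to a single class, could look like the sketch below. The class name `CNNText` and its argument order are hypothetical, and the toy vocabulary stands in for the GloVe vectors. It uses `nn.Embedding.from_pretrained`, whose `freeze` flag sets `requires_grad` on the weight directly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNNText(nn.Module):
    """Hypothetical single-class alternative to CNNStatic/CNNNonStatic, selected by one flag."""

    def __init__(self, embedding_weights, channel_out, window_sizes, num_output, dropout_prob,
                 embedding_requires_grad=False):
        super().__init__()
        # clone() so that models built from the same vectors never share (and jointly fine-tune) one tensor
        self.embedding = nn.Embedding.from_pretrained(
            embedding_weights.clone(), freeze=not embedding_requires_grad
        )
        embedding_dim = embedding_weights.size(1)
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, channel_out, (k, embedding_dim)) for k in window_sizes]
        )
        self.fc = nn.Linear(len(window_sizes) * channel_out, num_output)
        self.dropout = nn.Dropout(dropout_prob)

    def forward(self, x):
        x = self.embedding(x.t()).unsqueeze(1)  # [batch, 1, seq_len, embedding_dim]
        x_convs = [F.relu(conv(x)).squeeze(3) for conv in self.convs]
        x_pools = [F.max_pool1d(c, c.size(2)).squeeze(2) for c in x_convs]
        return self.fc(self.dropout(torch.cat(x_pools, 1))).squeeze(1)


# Toy check: a 20-word vocabulary with 8-dimensional vectors (stand-ins for GloVe)
weights = torch.randn(20, 8)
static = CNNText(weights, 4, [3, 4, 5], 1, 0.5, embedding_requires_grad=False)
non_static = CNNText(weights, 4, [3, 4, 5], 1, 0.5, embedding_requires_grad=True)
tokens = torch.randint(0, 20, (12, 2))  # [seq_len, batch]
print(static(tokens).shape)  # torch.Size([2])
```

Cloning matters when two models, or two channels, start from the same pre-trained tensor: without it they would share storage, and fine-tuning one would silently change the other.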

In [151]:
cnn_non_static = CNNNonStatic(*args)

loss_function = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(cnn_non_static.parameters())

train_evaluate_model(cnn_non_static, loss_function, optimizer, training_dataset, test_dataset, validation_dataset=validation_dataset)

[Training - Epoch 1] Loss: 0.46
[Training - Epoch 1] Accuracy: 0.77
[Validation - Epoch 1] Loss: 0.30
[Validation - Epoch 1] Accuracy: 0.87

[Training - Epoch 2] Loss: 0.27
[Training - Epoch 2] Accuracy: 0.89
[Validation - Epoch 2] Loss: 0.26
[Validation - Epoch 2] Accuracy: 0.89

[Training - Epoch 3] Loss: 0.17
[Training - Epoch 3] Accuracy: 0.94
[Validation - Epoch 3] Loss: 0.25
[Validation - Epoch 3] Accuracy: 0.90

[Training - Epoch 4] Loss: 0.08
[Training - Epoch 4] Accuracy: 0.97
[Validation - Epoch 4] Loss: 0.29
[Validation - Epoch 4] Accuracy: 0.89

[Training - Epoch 5] Loss: 0.04
[Training - Epoch 5] Accuracy: 0.99
[Validation - Epoch 5] Loss: 0.31
[Validation - Epoch 5] Accuracy: 0.90

[Test] Loss: 0.28
[Test] Accuracy: 0.89


### Results

After 5 epochs, this model reaches 89% accuracy on the test set, 2 percentage points higher than the `CNNStatic` model. Fine-tuning the pre-trained word embeddings lets them adapt to how words are used in movie reviews. However, no random seed is fixed and the training curves of the two models are nearly identical, so part of this gap may be run-to-run variance. Next, I will create a 2-channel embedding and observe its performance.

## Model 4: CNNMultiChannel

In this model architecture, two channels are used as input to the convolutional layers. Both channels are initialized with the pre-trained word embedding. However, one channel (non-static) is fine-tuned during training, while the other (static) keeps the pre-trained values fixed.

In [16]:
class CNNMultiChannel(nn.Module):
    def __init__(self, embedding_weights, num_vocab, embedding_dim, channel_out, filter_sizes, num_output, dropout_prob):
        super(CNNMultiChannel, self).__init__()
        
        self.non_static_embedding = nn.Embedding(num_vocab, embedding_dim)
        self.non_static_embedding.weight.data.copy_(embedding_weights)
        
        self.static_embedding = nn.Embedding(num_vocab, embedding_dim)
        self.static_embedding.weight.data.copy_(embedding_weights)
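        # Freeze the weight parameter itself; setting requires_grad on the module object has no effect
        self.static_embedding.weight.requires_grad = False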
        
        self.convs = nn.ModuleList([nn.Conv2d(2, channel_out, filter_sizes[i]) for i in range(len(filter_sizes))])
        self.fc = nn.Linear(len(filter_sizes) * channel_out, num_output)
        self.dropout = nn.Dropout(dropout_prob)
    
    def forward(self, x):
        x = x.t()
        
        x_non_static_embed = self.non_static_embedding(x).unsqueeze(1)
        x_static_embed = self.static_embedding(x).unsqueeze(1)
        x_embed = torch.cat((x_non_static_embed, x_static_embed), 1)
        x_convs = [F.relu(conv(x_embed)).squeeze(3) for conv in self.convs]
        x_pools = [F.max_pool1d(x_conv, x_conv.size(2)).squeeze(2) for x_conv in x_convs]
        x_cat = self.dropout(torch.cat(x_pools, 1))
        x_fc = self.fc(x_cat).squeeze(1)
        
        return x_fc
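
To check that only one channel of `CNNMultiChannel` is trained, the sketch below builds the two channels from a toy stand-in for the pre-trained vectors, stacks them the same way `forward()` does, and backpropagates a dummy loss. Only the non-static embedding receives a gradient. The toy tensors and the dummy loss (`stacked.sum()`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

pretrained = torch.randn(10, 4)  # toy stand-in for the GloVe vectors

# Two independent copies of the same vectors: one fine-tuned, one frozen
non_static = nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)
static = nn.Embedding.from_pretrained(pretrained.clone(), freeze=True)

tokens = torch.tensor([[1, 2, 3], [4, 5, 6]])  # [batch, seq_len], i.e. after x.t()
# Stack the two lookups as input channels: [batch, 2, seq_len, embedding_dim]
stacked = torch.cat((non_static(tokens).unsqueeze(1), static(tokens).unsqueeze(1)), 1)
print(stacked.shape)  # torch.Size([2, 2, 3, 4])

stacked.sum().backward()  # dummy loss, just to populate gradients
print(non_static.weight.grad is not None)  # True: the non-static channel would be updated
print(static.weight.grad)                  # None: the static channel receives no gradient
```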

In [17]:
cnn_multi_channel = CNNMultiChannel(*args)

loss_function = nn.BCEWithLogitsLoss()
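# Only pass trainable parameters to the optimizer, so the static embedding is left out
optimizer = optim.Adam(filter(lambda p: p.requires_grad, cnn_multi_channel.parameters()))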

train_evaluate_model(cnn_multi_channel, loss_function, optimizer, training_dataset, test_dataset, validation_dataset=validation_dataset)

[Training - Epoch 1] Loss: 0.44
[Training - Epoch 1] Accuracy: 0.79
[Validation - Epoch 1] Loss: 0.31
[Validation - Epoch 1] Accuracy: 0.88

[Training - Epoch 2] Loss: 0.27
[Training - Epoch 2] Accuracy: 0.89
[Validation - Epoch 2] Loss: 0.26
[Validation - Epoch 2] Accuracy: 0.89

[Training - Epoch 3] Loss: 0.15
[Training - Epoch 3] Accuracy: 0.94
[Validation - Epoch 3] Loss: 0.26
[Validation - Epoch 3] Accuracy: 0.89

[Training - Epoch 4] Loss: 0.08
[Training - Epoch 4] Accuracy: 0.97
[Validation - Epoch 4] Loss: 0.31
[Validation - Epoch 4] Accuracy: 0.89

[Training - Epoch 5] Loss: 0.03
[Training - Epoch 5] Accuracy: 0.99
[Validation - Epoch 5] Loss: 0.35
[Validation - Epoch 5] Accuracy: 0.90

[Test] Loss: 0.42
[Test] Accuracy: 0.87


### Results

After 5 epochs, this model achieves a test set accuracy of 87%. Kim's paper likewise reports mixed results for the multichannel model, so combining static and non-static channels is not expected to help much. The intent is that the static channel keeps the non-static channel from drifting too far from the information in the pre-trained word embeddings, while the non-static channel still learns word relationships specific to movie reviews.

## Conclusion

Working through Kim's paper suggests that convolutional neural networks have the potential to become one of the standard architectures for sentiment analysis, alongside recurrent neural networks. Overall, `CNNStatic`, `CNNNonStatic`, and `CNNMultiChannel` reach similar test set accuracies (87% to 89%). Further hyperparameter tuning could improve on an already good accuracy on the IMDB movie reviews dataset.

Furthermore, I have learned a lot about using the PyTorch framework through this project (thanks to [this](https://github.com/bentrevett/pytorch-sentiment-analysis) reference for guiding me through it). Lastly, this project has taught me how to write more streamlined functions to train and evaluate models.