# Convolutional Neural Networks for Sentence Classification using Ignite

This is a tutorial on using Ignite to train neural network models, setup experiments and validate models.

In this experiment, we'll be replicating [
Convolutional Neural Networks for Sentence Classification by Yoon Kim](https://arxiv.org/abs/1408.5882)! This paper uses CNN for text classification, a task typically reserved for RNNs, Logistic Regression, Naive Bayes.

In this notebook, we'll be using PyTorch to create the model, torchtext to import data and Ignite to train and monitor the models! 

## Setting Up Environment


Example for virtualenv setup on a Linux machine:

`virtualenv --python=/usr/bin/python3.5 env`

`source env/bin/activate`

`pip install torch torchtext pytorch-ignite spacy`

`python -m spacy download en`

## Import Libraries

In [1]:
import random

`torchtext` is a library that provides multiple datasets for NLP tasks, similar to `torchvision`. Below we import the following:
* **data**: A module to setup the data in the form Fields and Labels.
* **datasets**: A module to download NLP datasets.
* **GloVe**: A module to download and use pretrained GloVe embedings.

In [2]:
from torchtext import data
from torchtext import datasets
from torchtext.vocab import GloVe

We import torch, nn and functional modules to create our models! 

In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F

SEED = 1234
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)

`Ignite` is a High-level library to help with training neural networks in PyTorch. It comes with an Engine to setup a training loop, various metrics, handlers and a helpful contrib section! 

Below we import the following:
* **Engine**: Runs a given process_function over each batch of a dataset, emitting events as it goes.
* **Events**: Allows users to attach functions to an Engine to fire functions at a specific event. Eg: EPOCH_COMPLETED, ITERATION_STARTED, etc.
* **Accuracy**: Metric to calculate accuracy over a dataset, for binary, multiclass, multilabel cases. 
* **Loss**: General metric that takes a loss function as a parameter, calculate loss over a dataset.
* **RunningAverage**: General metric to attach to Engine during training. 
* **ModelCheckpoint**: Handler to checkpoint models. 
* **EarlyStopping**: Handler to stop training based on a score function. 
* **ProgressBar**: Handler to create a tqdm progress bar.

In [4]:
from ignite.engine import Engine, Events
from ignite.metrics import Accuracy, Loss, RunningAverage
from ignite.handlers import ModelCheckpoint, EarlyStopping
from ignite.contrib.handlers import ProgressBar

## Processing Data

The code below first sets up TEXT and LABEL as general data objects. 

* TEXT converts any text to lowercase and produces tensors with the batch dimension first. 


* LABEL is a data object that will convert any labels to floats.

Next IMDB training and test datasets are downloaded, the training data is split into training and validation datasets, default for torchtext is 70% and 30% split. It takes TEXT and LABEL as inputs so that the data is processed as specified. 

This datasets are now converted to iterators using torchtext's BucketIterator. BucketIterator generates batches, and pads each element of the batch to the length of the longest sequence in that batch.

Along with this, we download the pretrained 100 dimensional GloVe vectors and create the word embedding matrix with the using the vocabulary of the training data only. These word vectors are downloaded and saved as an attribute to TEXT. 

Similarly, label vocublary is created for the training data, which is a mapping of the classes to numbers. IMDB dataset is a binary problem, so that labels are mapped to 0 and 1 only. 

In [5]:
TEXT = data.Field(lower=True, batch_first=True)
LABEL = data.LabelField(dtype=torch.float)

train_data, test_data = datasets.IMDB.splits(TEXT, LABEL, root='/tmp/imdb/')
train_data, valid_data = train_data.split(random_state=random.seed(SEED))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data), 
    batch_size=64, 
    device=device)


TEXT.build_vocab(train_data, vectors=GloVe(name='6B', dim=100, cache='/tmp/glove/'))
LABEL.build_vocab(train_data)

## TextCNN Model

Here is the replication of the model, here are the operations of the model:
* **Embedding**: Embeds a batch of text of shape (N, L) to (N, L, D), where N is batch size, L is maximum length of the batch, D is the embedding dimension. 

* **Convolutions**: Runs parallel convolutions across the embedded words with kernel sizes of 3, 4, 5 to mimic trigrams, four-grams, five-grams. This results in outputs of (N, L - k + 1, D) per convolution, where k is the kernel_size. 

* **Activation**: ReLu activation is applied to each convolution operation.

* **Pooling**: Runs parallel maxpooling operations on the activated convolutions with window sizes of L - k + 1, resulting in 1 value per channel i.e. a shape of (N, 1, D) per pooling. 

* **Concat**: The pooling outputs are concatenated and squeezed to result in a shape of (N, 3D). This is a single embedding for a sentence.

* **Dropout**: Dropout is applied to the embedded sentence. 

* **Fully Connected**: The dropout output is passed through a fully connected layer of shape (3D, 1) to give a single output for each example in the batch. sigmoid is applied to the output of this layer.

* **load_embeddings**: This is a method defined for TextCNN to load embeddings based on user input. There are 3 modes - rand which results in randomly initialized weights, static which results in frozen pretrained weights, nonstatic which results in trainable pretrained weights. 

In [6]:
class TextCNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, kernel_sizes, num_filters, num_classes, d_prob):
        super(TextCNN, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.kernel_sizes = kernel_sizes
        self.num_filters = num_filters
        self.num_classes = num_classes
        self.d_prob = d_prob
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.conv = nn.ModuleList([nn.Conv1d(in_channels=embedding_dim,
                                             out_channels=num_filters,
                                             kernel_size=k, stride=1) for k in kernel_sizes])
        self.dropout = nn.Dropout(d_prob)
        self.fc = nn.Linear(len(kernel_sizes) * num_filters, num_classes)

    def forward(self, x):
        batch_size, sequence_length = x.shape
        x = self.embedding(x).transpose(1, 2)
        x = [F.relu(conv(x)) for conv in self.conv]
        x = [F.max_pool1d(c, c.size(-1)).squeeze(dim=-1) for c in x]
        x = torch.cat(x, dim=1)
        x = self.fc(self.dropout(x))
        return torch.sigmoid(x).squeeze()

    def load_embeddings(self, mode):
        if 'static' in mode:
            self.embedding.weight.data.copy_(TEXT.vocab.vectors)
            if 'non' not in mode:
                self.embedding.weight.data.requires_grad = False
            else:
                self.embedding.weight.data.requires_grad = True
        elif mode == 'rand':
            pass
        else:
            raise ValueError('Unexpected value of mode. Please choose from static, nonstatic, rand.')

## Creating Model, Optimizer and Loss

Below we create an instance of the TextCNN model and load embeddings in **static** mode. The model is placed on a device and then a loss function of Binary Cross Entropy and Adam optimizer are setup. 

In [7]:
vocab_size, embedding_dim = TEXT.vocab.vectors.shape

model = TextCNN(vocab_size=vocab_size,
                embedding_dim=embedding_dim,
                kernel_sizes=[3, 4, 5],
                num_filters=100,
                num_classes=1, d_prob=0.3)
model.load_embeddings(mode='static')
model.to(device)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.BCELoss()

## Training and Evaluating using Ignite

### Trainer Engine - process_function

Ignite's Engine allows user to define a process_function to process a given batch, this is applied to all the batches of the dataset. This is a general class that can be applied to train and validate models! A process_function has two parameters engine and batch. 


Let's walk through what the function of the trainer does:

* Sets model in train mode. 
* Sets the gradients of the optimizer to zero.
* Generate x and y from batch.
* Performs a forward pass to calculate y_pred using model and x.
* Calculates loss using y_pred and y.
* Performs a backward pass using loss to calculate gradients for the model parameters.
* model parameters are optimized using gradients and optimizer.
* Returns scalar loss. 

Below is a single operation during the trainig process. This process_function will be attached to the training engine.

In [8]:
def process_function(engine, batch):
    model.train()
    optimizer.zero_grad()
    x, y = batch.text, batch.label
    y_pred = model(x)
    loss = criterion(y_pred, y)
    loss.backward()
    optimizer.step()
    return loss.item()

### Evaluator Engine - process_function

Similar to the training process function, we setup a function to evaluate a single batch. Here is what the eval_function does:

* Sets model in eval mode.
* Generates x and y from batch.
* With torch.no_grad(), no gradients are calculated for any succeding steps.
* Performs a forward pass on the model to calculate y_pred based on model and x.
* Returns y_pred and y.

Ignite suggests attaching metrics to evaluators and not trainers because during the training the model parameters are constantly changing and it is best to evaluate model on a stationary model. This information is important as there is a difference in the functions for training and evaluating. Training returns a single scalar loss. Evaluating returns y_pred and y as that output is used to calculate metrics per batch for the entire dataset.

All metrics in Ignite require y_pred and y as outputs of the function attached to the Engine. 

In [9]:
def eval_function(engine, batch):
    model.eval()
    with torch.no_grad():
        x, y = batch.text, batch.label
        y_pred = model(x)
        return y_pred, y

### Instantiating Training and Evaluating Engines

Below we create 3 engines, a trainer, a training evaluator and a validation evaluator. You'll notice that train_evaluator and validation_evaluator use the same function, we'll see later why this was done! 

In [10]:
trainer = Engine(process_function)
train_evaluator = Engine(eval_function)
validation_evaluator = Engine(eval_function)

### Metrics - RunningAverage, Accuracy and Loss

To start, we'll attach a metric of Running Average to track a running average of the scalar loss output for each batch. 

In [11]:
RunningAverage(output_transform=lambda x: x).attach(trainer, 'loss')

Now there are two metrics that we want to use for evaluation - accuracy and loss. This is a binary problem, so for Loss we can simply pass the Binary Cross Entropy function as the loss_function. 

For Accuracy, Ignite requires y_pred and y to be comprised of 0's and 1's only. Since our model outputs from a sigmoid layer, values are between 0 and 1. We'll need to write a function that transforms engine.state.output which is comprised of y_pred and y. 

Below thresholded_output_transform does just that, it rounds y_pred to convert y_pred to 0's and 1's, and then returns rounded y_pred and y. This function is the output_transform function used to transform the engine.state.output to achieve Accuracy's desired purpose.

Now, we attach Loss and Accuracy (with thresholded_output_transform) to train_evaluator and validation_evaluator. 

To attach a metric to engine, the following format is used:
* Metric(output_transform=output_transform, ...).attach(engine, 'metric_name')


In [12]:
def thresholded_output_transform(output):
    y_pred, y = output
    y_pred = torch.round(y_pred)
    return y_pred, y

In [13]:
Accuracy(output_transform=thresholded_output_transform).attach(train_evaluator, 'accuracy')
Loss(criterion).attach(train_evaluator, 'bce')

In [14]:
Accuracy(output_transform=thresholded_output_transform).attach(validation_evaluator, 'accuracy')
Loss(criterion).attach(validation_evaluator, 'bce')

### Progress Bar

Next we create an instance of Ignite's progess bar and attach it to the trainer and pass it a key of engine.state.metrics to track. In this case, the progress bar will be tracking engine.state.metrics['loss']

In [15]:
pbar = ProgressBar(persist=True)
pbar.attach(trainer, ['loss'])

### EarlyStopping - Tracking Validation Loss

Now we'll setup a Early Stopping handler for this training process. EarlyStopping requires a score_function that allows the user to define whatever criteria to stop trainig. In this case, if the loss of the validation set does not decrease in 5 epochs, the training process will stop early.  

In [16]:
def score_function(engine):
    val_loss = engine.state.metrics['bce']
    return -val_loss

handler = EarlyStopping(patience=5, score_function=score_function, trainer=trainer)
validation_evaluator.add_event_handler(Events.COMPLETED, handler)

### Attaching Custom Functions to Engine at specific Events

Below you'll see ways to define your own custom functions and attaching them to various Events of the training process.

The functions below both achieve similar tasks, they print the results of the evaluator run on a dataset. One function does that on the training evaluator and dataset, while the other on the validation. Another difference is how these functions are attached in the trainer engine.

The first method involves using a decorator, the syntax is simple - @trainer.on(Events.EPOCH_COMPLETED), means that the decorated function will be attached to the trainer and called at the end of each epoch. 

The second method involves using the add_event_handler method of trainer - trainer.add_event_handler(Events.EPOCH_COMPLETED, custom_function). This achieves the same result as the above. 

In [17]:
@trainer.on(Events.EPOCH_COMPLETED)
def log_training_results(engine):
    train_evaluator.run(train_iterator)
    metrics = train_evaluator.state.metrics
    avg_accuracy = metrics['accuracy']
    avg_bce = metrics['bce']
    pbar.log_message(
        "Training Results - Epoch: {}  Avg accuracy: {:.2f} Avg loss: {:.2f}"
        .format(engine.state.epoch, avg_accuracy, avg_bce))
    
def log_validation_results(engine):
    validation_evaluator.run(valid_iterator)
    metrics = validation_evaluator.state.metrics
    avg_accuracy = metrics['accuracy']
    avg_bce = metrics['bce']
    pbar.log_message(
        "Validation Results - Epoch: {}  Avg accuracy: {:.2f} Avg loss: {:.2f}"
        .format(engine.state.epoch, avg_accuracy, avg_bce))
    pbar.n = pbar.last_print_n = 0

trainer.add_event_handler(Events.EPOCH_COMPLETED, log_validation_results)

### ModelCheckpoint

Lastly, we want to checkpoint this model. It's important to do so, as training processes can be time consuming and if for some reason something goes wrong during training, a model checkpoint can be helpful to restart training from the point of failure.

Below we'll use Ignite's ModelCheckpoint handler to checkpoint models at the end of each epoch. 

In [18]:
checkpointer = ModelCheckpoint('/tmp/models', 'textcnn', save_interval=1, n_saved=2, create_dir=True, save_as_state_dict=True)
trainer.add_event_handler(Events.EPOCH_COMPLETED, checkpointer, {'textcnn': model})

### Run Engine

Next, we'll run the trainer for 10 epochs and monitor results. Below we can see that progess bar prints the loss per iteration, and prints the results of training and validation as we specified in our custom function. 

In [19]:
trainer.run(train_iterator, max_epochs=10)

Epoch [1/10]: [274/274] 100%|██████████, loss=3.98e-01 [00:14<00:00]


Training Results - Epoch: 1  Avg accuracy: 0.87 Avg loss: 0.31


Epoch [2/10]: [1/274]   0%|          , loss=3.17e-01 [00:00<00:13]

Validation Results - Epoch: 1  Avg accuracy: 0.84 Avg loss: 0.36


Epoch [2/10]: [274/274] 100%|██████████, loss=2.72e-01 [00:14<00:00]


Training Results - Epoch: 2  Avg accuracy: 0.95 Avg loss: 0.15


Epoch [3/10]: [1/274]   0%|          , loss=1.86e-01 [00:00<00:16]

Validation Results - Epoch: 2  Avg accuracy: 0.87 Avg loss: 0.30


Epoch [3/10]: [274/274] 100%|██████████, loss=1.51e-01 [00:14<00:00]


Training Results - Epoch: 3  Avg accuracy: 0.99 Avg loss: 0.05


Epoch [4/10]: [1/274]   0%|          , loss=4.67e-02 [00:00<00:12]

Validation Results - Epoch: 3  Avg accuracy: 0.88 Avg loss: 0.29


Epoch [4/10]: [274/274] 100%|██████████, loss=4.77e-02 [00:14<00:00]


Training Results - Epoch: 4  Avg accuracy: 1.00 Avg loss: 0.01


Epoch [5/10]: [1/274]   0%|          , loss=1.32e-02 [00:00<00:15]

Validation Results - Epoch: 4  Avg accuracy: 0.88 Avg loss: 0.31


Epoch [5/10]: [274/274] 100%|██████████, loss=1.35e-02 [00:14<00:00]


Training Results - Epoch: 5  Avg accuracy: 1.00 Avg loss: 0.00


Epoch [6/10]: [1/274]   0%|          , loss=8.02e-03 [00:00<00:15]

Validation Results - Epoch: 5  Avg accuracy: 0.88 Avg loss: 0.34


Epoch [6/10]: [274/274] 100%|██████████, loss=5.12e-03 [00:14<00:00]


Training Results - Epoch: 6  Avg accuracy: 1.00 Avg loss: 0.00


Epoch [7/10]: [1/274]   0%|          , loss=3.06e-03 [00:00<00:16]

Validation Results - Epoch: 6  Avg accuracy: 0.88 Avg loss: 0.36


Epoch [7/10]: [274/274] 100%|██████████, loss=2.86e-03 [00:14<00:00]


Training Results - Epoch: 7  Avg accuracy: 1.00 Avg loss: 0.00


Epoch [8/10]: [1/274]   0%|          , loss=1.55e-03 [00:00<00:11]

Validation Results - Epoch: 7  Avg accuracy: 0.88 Avg loss: 0.38


Epoch [8/10]: [274/274] 100%|██████████, loss=1.56e-03 [00:14<00:00]


Training Results - Epoch: 8  Avg accuracy: 1.00 Avg loss: 0.00
Validation Results - Epoch: 8  Avg accuracy: 0.88 Avg loss: 0.40


<ignite.engine.engine.State at 0x7f88919f5940>

That's it! We have successfully trained and evaluated a Convolutational Neural Network for Text Classification. 