In this notebook we implement a simple (yet efficient) neural network for supervised text classification.

The input sequence of words is first passed to an embedding layer. The vectors at the output of the embedding layer are then averaged into a single text representation vector. Finally, the text representation vector is linearly projected to the output vector, which is followed by a softmax activation function.
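
In symbols (the notation here is introduced for illustration and is not taken from the notebook): for a text with token indices $w_1, \dots, w_T$, an embedding matrix $E \in \mathbb{R}^{V \times d}$, and a linear layer with weights $W \in \mathbb{R}^{C \times d}$ and bias $b \in \mathbb{R}^{C}$, the model computes

$$
h = \frac{1}{T} \sum_{t=1}^{T} E_{w_t}, \qquad p = \mathrm{softmax}(W h + b),
$$

where $V$ is the vocabulary size, $d$ the embedding dimension, $C$ the number of classes, and $E_{w_t}$ the row of $E$ for token $w_t$.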

__For more details:__ A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, "Bag of Tricks for Efficient Text Classification", arXiv preprint arXiv:1607.01759, 2016
https://arxiv.org/pdf/1607.01759.pdf


In [1]:
import torch
from torch import nn
from torch import optim
from data_loader import get_loader
import time

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Let's start by implementing the model as described above.

In [None]:
class Model(nn.Module):
    def __init__(self, embedding_size, vocab_size, number_classes):
        super(Model, self).__init__()
        self.embedding_size = embedding_size # embedding space dimension
        self.vocab_size = vocab_size
        self.embedding = nn.Embedding(vocab_size, embedding_size)
        self.linear = nn.Linear(embedding_size, number_classes)
        self.softmax = nn.LogSoftmax(dim=-1) # log-softmax, to be paired with nn.NLLLoss

    def forward(self, input_tensor):
        # input_tensor holds token indices with shape seq_len * batch_size
        x = self.embedding(input_tensor) # seq_len * batch_size * embedding_size
        x = torch.mean(x, 0) # average over the sequence: batch_size * embedding_size
        logits = self.linear(x) # batch_size * number_classes
        return self.softmax(logits) # log-probabilities: batch_size * number_classes
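
To make the tensor shapes in the forward pass concrete, here is a minimal, self-contained sketch that repeats the same three operations on random token indices. The sizes are arbitrary values chosen for illustration, and the input is assumed to be laid out as seq_len * batch_size, which is what averaging over dimension 0 implies.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Arbitrary toy sizes, chosen only for illustration.
seq_len, batch_size = 7, 4
vocab_size, embedding_size, number_classes = 20, 5, 3

embedding = nn.Embedding(vocab_size, embedding_size)
linear = nn.Linear(embedding_size, number_classes)

# Random token indices, assumed layout: seq_len * batch_size.
tokens = torch.randint(0, vocab_size, (seq_len, batch_size))

embedded = embedding(tokens)        # seq_len * batch_size * embedding_size
averaged = embedded.mean(dim=0)     # batch_size * embedding_size
log_probs = torch.log_softmax(linear(averaged), dim=-1)  # batch_size * number_classes

print(embedded.shape, averaged.shape, log_probs.shape)
```

Exponentiating each row of `log_probs` gives a probability distribution over the classes, so each row sums to 1.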

Now we write a function that trains the model by applying one update. One update consists of:
* One forward propagation.
* Computing the gradients through back-propagation.
* Updating the weights using a specific optimizer (Adam in this case).

In [3]:
def trainOneBatch(model, batch_input, optimizer, criterion):
    optimizer.zero_grad() # reset the gradients left over from the previous update
    sequences = batch_input[0] # input sequences of shape: sequence_len * batch_size
    targets = batch_input[1] # targets of shape: batch_size
    out = model(sequences) # log-probabilities of shape: batch_size * number_classes
    loss = criterion(out, targets)
    loss.backward() # compute the gradients
    optimizer.step() # update network parameters
    return loss.item() # return loss value
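
As a quick sanity check of the three steps, the sketch below applies a single update to a tiny stand-in classifier on random data and confirms that the weights actually move. The toy model, its sizes, and the learning rate are assumptions made for illustration; they are not the notebook's model or data.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical stand-in model: a linear layer followed by log-softmax.
toy_model = nn.Sequential(nn.Linear(4, 3), nn.LogSoftmax(dim=-1))
optimizer = optim.Adam(toy_model.parameters(), lr=0.01)
criterion = nn.NLLLoss()

inputs = torch.randn(8, 4)             # 8 random examples with 4 features each
targets = torch.randint(0, 3, (8,))    # 8 random class labels in {0, 1, 2}

weights_before = toy_model[0].weight.detach().clone()

optimizer.zero_grad()                           # clear old gradients
loss = criterion(toy_model(inputs), targets)    # forward propagation
loss.backward()                                 # back-propagation
optimizer.step()                                # Adam update

weights_changed = not torch.equal(weights_before, toy_model[0].weight.detach())
print(loss.item(), weights_changed)
```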

To evaluate the model we need a function that computes the accuracy of the predictions.

In [9]:
def evaluate(model, data_loader):
    n_correct = 0
    n_examples = 0
    with torch.no_grad(): # gradients are not needed for evaluation
        for batch in data_loader:
            sequences = batch[0]
            target = batch[1]
            out = model(sequences)
            predicted = torch.argmax(out, -1) # most probable class for each example
            n_correct += torch.sum(predicted == target).item()
            n_examples += target.shape[0]
    # fraction of correct predictions over the whole data set
    return n_correct / n_examples
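
The accuracy computation can be checked by hand on a small example. The probabilities and targets below are made up for illustration.

```python
import torch

# Hand-made log-probabilities for 4 examples and 3 classes.
log_probs = torch.log(torch.tensor([[0.7, 0.2, 0.1],
                                    [0.1, 0.8, 0.1],
                                    [0.3, 0.3, 0.4],
                                    [0.6, 0.3, 0.1]]))
targets = torch.tensor([0, 1, 0, 0])

predicted = torch.argmax(log_probs, dim=-1)        # classes 0, 1, 2, 0
n_correct = (predicted == targets).sum().item()    # third example is wrong
accuracy = n_correct / targets.shape[0]
print(accuracy)  # 0.75
```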

It's time to train the model. The model is trained for $n$ epochs (the n_epochs argument). In each epoch, a new data loader is created to generate the mini-batches used for training.

After each epoch, the model is evaluated on the training and the validation sets to verify that no overfitting is occurring.

In [18]:
def trainModel(model, path_documents_train, path_labels_train, path_documents_valid,
               path_labels_valid, word2ind, n_epochs=5, batch_size=16, printEvery=100):
    data_loader_train_params = (path_documents_train, path_labels_train, word2ind, str(device), batch_size)
    data_loader_valid_params = (path_documents_valid, path_labels_valid, word2ind, str(device), batch_size)
    loss = 0 # running loss since the last print
    count_iter = 0
    optimizer = optim.Adam(model.parameters(), lr=0.0015)
    # negative log-likelihood, applied to the log-softmax output of the model
    criterion = nn.NLLLoss()
    time1 = time.time()
    training_accuracy_epochs = [] # training accuracy for each epoch
    validation_accuracy_epochs = [] # validation accuracy for each epoch
    for i in range(n_epochs):
        loader = get_loader(*data_loader_train_params) # new loader for each epoch
        for batch in loader:
            loss += trainOneBatch(model, batch, optimizer, criterion)
            count_iter += 1
            if count_iter % printEvery == 0:
                time2 = time.time()
                print("Iteration: {0}, Time: {1:.4f} s, training loss: {2:.4f}".format(count_iter,
                                                                          time2 - time1, loss/printEvery))
                loss = 0
        training_accuracy = evaluate(model, get_loader(*data_loader_train_params))
        validation_accuracy = evaluate(model, get_loader(*data_loader_valid_params))
        training_accuracy_epochs.append(training_accuracy)
        validation_accuracy_epochs.append(validation_accuracy)
        print('Epoch {0} done: training_accuracy = {1:.3f}, validation_accuracy = {2:.3f}'.format(i+1, training_accuracy, validation_accuracy))
    return training_accuracy_epochs, validation_accuracy_epochs
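
One simple way to read the per-epoch accuracies is to look at the gap between training and validation accuracy: a gap that keeps widening while validation accuracy stalls or drops would suggest overfitting. The sketch below does this for the first two epochs, with the values copied from the training log printed further down in this notebook. The variable names `gaps` and `best_epoch` are illustrative.

```python
# Accuracies of epochs 1 and 2, copied from the training log in this notebook.
training_accuracy_epochs = [0.841, 0.936]
validation_accuracy_epochs = [0.803, 0.902]

# Gap between training and validation accuracy for each epoch.
gaps = [round(train - valid, 3)
        for train, valid in zip(training_accuracy_epochs, validation_accuracy_epochs)]

# Epoch (1-indexed) with the highest validation accuracy.
best_epoch = 1 + max(range(len(validation_accuracy_epochs)),
                     key=lambda i: validation_accuracy_epochs[i])

print(gaps, best_epoch)  # [0.038, 0.034] 2
```

Here the gap stays roughly constant while validation accuracy rises, so these two epochs show no sign of overfitting.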

In [19]:
path_cat2ind = 'data/cat2ind.csv'
path_word_count = 'data/word2count.txt'

#load index to category mapping
ind2category = {}
word2ind = {'PAD':0, 'OOV':1}
with open(path_cat2ind, encoding='utf-8') as f:
    for line in f:
        mapping = line.split(',')
        ind2category[int(mapping[1])] = mapping[0]

#load word to index mapping (indices 0 and 1 are reserved for PAD and OOV)
count = 2
with open(path_word_count, encoding='utf-8') as f:
    for line in f:
        mapping = line.split('\t')
        word2ind[mapping[0]] = count
        count+=1
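
The two reserved entries, PAD and OOV, are typically used when turning a tokenized document into a fixed-length list of indices. The `data_loader` module is not shown in this notebook, so the helper below is only a hypothetical sketch of that step: the function name `encode`, the `max_len` parameter, and the toy vocabulary are all assumptions.

```python
# Toy vocabulary with the same reserved entries as word2ind.
toy_word2ind = {'PAD': 0, 'OOV': 1, 'the': 2, 'cat': 3, 'sat': 4}

def encode(tokens, word2ind, max_len):
    """Map tokens to indices, using OOV for unknown words and PAD to fill up to max_len."""
    ids = [word2ind.get(token, word2ind['OOV']) for token in tokens[:max_len]]
    return ids + [word2ind['PAD']] * (max_len - len(ids))

encoded = encode(['the', 'dog', 'sat'], toy_word2ind, max_len=5)
print(encoded)  # [2, 1, 4, 0, 0]
```

The word 'dog' is not in the toy vocabulary, so it maps to the OOV index 1, and the sequence is padded with index 0 up to length 5.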

In [24]:
my_model = Model(50, len(word2ind), len(ind2category)).to(device)

In [25]:
path_documents_train = 'data/train_documents.txt'
path_labels_train = 'data/train_labels.txt'
path_documents_valid = 'data/valid_documents.txt'
path_labels_valid = 'data/valid_labels.txt'
trainModel(my_model, path_documents_train, path_labels_train, path_documents_valid,
           path_labels_valid, word2ind, n_epochs=10, printEvery=300)

Iteration: 300, Time: 1.4388 s, training loss: 1.5830
Iteration: 600, Time: 2.7316 s, training loss: 1.4717
Iteration: 900, Time: 4.0555 s, training loss: 1.2740
Iteration: 1200, Time: 5.3446 s, training loss: 1.0362
Iteration: 1500, Time: 6.6182 s, training loss: 0.8424
Epoch 1 done: training_accuracy = 0.841, validation_accuracy = 0.803
Iteration: 1800, Time: 10.9914 s, training loss: 0.6899
Iteration: 2100, Time: 12.2999 s, training loss: 0.5625
Iteration: 2400, Time: 13.6255 s, training loss: 0.4971
Iteration: 2700, Time: 14.9367 s, training loss: 0.4236
Iteration: 3000, Time: 16.2334 s, training loss: 0.3873
Epoch 2 done: training_accuracy = 0.936, validation_accuracy = 0.902
Iteration: 3300, Time: 20.4897 s, training loss: 0.3551
Iteration: 3600, Time: 21.7920 s, training loss: 0.3050
Iteration: 3900, Time: 23.0869 s, training loss: 0.2888
Iteration: 4200, Time: 24.4169 s, training loss: 0.2621
Iteration: 4500, Time: 25.7423 s, training loss: 0.2461
Epoch 3 done: training_accurac