<a href="https://colab.research.google.com/github/m-wallner/nlp-document-classification-lstm/blob/main/nlp-document-classification-lstm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Natural language processing: Document classification using LSTM

## Project overview

The goal of this project was to create a complete NLP pipeline for document classification using the pre-trained vectors of a word embedding model combined with an LSTM model.

### Preprocessing
First, the dataset is loaded, all documents are tokenized, and a vocabulary is built from the tokenized text. torchtext's `build_vocab` can drop rare tokens through its `min_freq` argument; this notebook keeps the default of 1, so every token is retained. Next, an embedding matrix is created that maps the ID of each vocabulary word to its pre-trained vector from the embedding model, in this case 300-dimensional GloVe (`glove.6B.300d`). Words not found in GloVe are given randomly initialized vectors instead. The preprocessed and embedded data is then pickled to save time in future runs, and a torchtext `Dataset` object is created for each of the training, validation and test sets for efficient data loading during training and inference.
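The vocabulary step can be sketched in plain Python. This is a minimal illustration, not torchtext's implementation: it assumes torchtext-style conventions (`<unk>` at index 0, `<pad>` at index 1, most frequent tokens first, ties sorted alphabetically), and `build_vocab` and `encode` are hypothetical helpers. `min_freq` plays the role of the frequency threshold.

```python
from collections import Counter


def build_vocab(tokenized_splits, min_freq=1, specials=('<unk>', '<pad>')):
    """Map each token to an integer ID (hypothetical torchtext-like helper).

    Counts tokens over all splits, drops tokens seen fewer than `min_freq`
    times and reserves the first IDs for the special tokens.
    """
    counter = Counter()
    for split in tokenized_splits:
        for doc in split:
            counter.update(doc)

    itos = list(specials)
    # Most frequent first; ties broken alphabetically for a deterministic order
    for token, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and token not in specials:
            itos.append(token)
    stoi = {token: idx for idx, token in enumerate(itos)}
    return itos, stoi


def encode(doc, stoi, unk='<unk>'):
    """Convert tokens to IDs, falling back to the <unk> ID for unknown tokens."""
    return [stoi.get(token, stoi[unk]) for token in doc]


train_docs = [['cholera', 'cases', 'rise'], ['food', 'prices', 'rise']]
valid_docs = [['cholera', 'spreads']]
itos, stoi = build_vocab([train_docs, valid_docs], min_freq=2)
print(itos)                               # ['<unk>', '<pad>', 'cholera', 'rise']
print(encode(['cholera', 'food'], stoi))  # [2, 0]
```

Counting over all splits at once yields a single shared vocabulary; with `min_freq=2`, tokens seen only once fall back to `<unk>`.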

### Training the model
For each batch, the tokens of every document are converted into word IDs using the vocabulary, and the corresponding pre-trained word embeddings are looked up in the embedding matrix. The LSTM then computes the hidden states of each document and uses the last hidden state as the document embedding. This embedding passes through a dropout layer (dropout probability 0.5) and a linear decoder layer that outputs one score (logit) per class. The softmax that turns these scores into a probability distribution over the output classes is applied inside the cross-entropy loss rather than as a separate layer.

This baseline model is then used for further experiments, each applying a single change to the baseline architecture: the number of hidden dimensions is increased from 128 to 512, dropout is increased from 0.5 to 0.8, and finally a bidirectional LSTM is used. Each experiment reports its document classification accuracy.
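A back-of-the-envelope parameter count helps compare the cost of the experiments. This sketch assumes PyTorch's `nn.LSTM` layout (per layer and direction: `weight_ih` of shape 4H x I, `weight_hh` of shape 4H x H, and two bias vectors of length 4H) with an input size of 300, the GloVe dimension. It ignores the embedding and linear layers, and `lstm_param_count` is a hypothetical helper. Experiment 2 only changes dropout and therefore adds no parameters.

```python
def lstm_param_count(input_size, hidden_size, num_layers=1, bidirectional=False):
    """Count trainable parameters of a PyTorch-style nn.LSTM."""
    directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Layers above the first receive the concatenated outputs of all directions
        in_size = input_size if layer == 0 else hidden_size * directions
        # 4 gates x (input weights + recurrent weights) + two bias vectors of 4H
        per_direction = 4 * hidden_size * (in_size + hidden_size) + 8 * hidden_size
        total += per_direction * directions
    return total


configs = {
    'baseline (H=128)': dict(hidden_size=128),
    'experiment 1 (H=512)': dict(hidden_size=512),
    'experiment 3 (H=128, bidirectional)': dict(hidden_size=128, bidirectional=True),
}
for name, kwargs in configs.items():
    print(f'{name}: {lstm_param_count(300, **kwargs):,}')
# baseline (H=128): 220,160
# experiment 1 (H=512): 1,667,072
# experiment 3 (H=128, bidirectional): 440,320
```

Going from 128 to 512 hidden units multiplies the LSTM's parameters by roughly 7.6, while the bidirectional variant doubles them; this is consistent with the longer epoch times logged for those experiments.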


## 1 Imports and data loading


In [None]:
!pip install torchtext==0.8.1
!nvidia-smi

In [None]:
import os
import time
import math
import pickle

import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torch.utils.data.dataset import random_split
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

import torchtext
from torchtext.datasets import text_classification
from torchtext.data import Field, Dataset, Example, BucketIterator

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import spacy

from IPython.display import clear_output

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

# Define data paths
data_path = '/content/gdrive/My Drive/Colab Notebooks/data/NLP/A1'
labels_path = '/content/gdrive/My Drive/Colab Notebooks/data/NLP/A1/thedeep/thedeep.labels.txt'

Mounted at /content/gdrive


### 1.1 Loading thedeep dataset

In [None]:
# Load thedeep training dataset into Pandas dataframe
thedeep_df_train = pd.read_csv(
  os.path.join(data_path, 'thedeep/thedeep.medium.train.txt'),
  sep=',',
  names=['sentence_id', 'text', 'label'],
  index_col=0,
  skiprows=[0]
)

# Load thedeep validation dataset into Pandas dataframe
thedeep_df_valid = pd.read_csv(
  os.path.join(data_path, 'thedeep/thedeep.medium.validation.txt'),
  sep=',',
  names=['sentence_id', 'text', 'label'],
  index_col=0,
  skiprows=[0]
)

# Load thedeep test dataset into Pandas dataframe
thedeep_df_test = pd.read_csv(
  os.path.join(data_path, 'thedeep/thedeep.medium.test.txt'),
  sep=',',
  names=['sentence_id', 'text', 'label'],
  index_col=0,
  skiprows=[0]
)

# Show structure of thedeep dataset
thedeep_df_train.head()

| sentence_id | text | label |
|---|---|---|
| 28291 | The primary reported needs for IDPs across the... | 4 |
| 9695 | Some 602 000 IDPs are now spread across the co... | 3 |
| 7781 | South Sudanese soldiers accused of raping at l... | 9 |
| 31382 | Since the beginning of 2017, 18 882 suspected/... | 11 |
| 19919 | The number of new suspected cholera cases in 2... | 11 |


### 1.2 Basic information about thedeep

In [None]:
# Load label captions
labelcaptions = {}
with open(labels_path) as fr:
  for label in fr:
    vals = label.strip().split(',')
    labelcaptions[vals[1]] = int(vals[0])
    
# Show labels and corresponding numbers
labelcaptions

{'Agriculture': 0,
 'Cross': 1,
 'Education': 2,
 'Food': 3,
 'Health': 4,
 'Livelihood': 5,
 'Logistic': 6,
 'NFI': 7,
 'Nutrition': 8,
 'Protection': 9,
 'Shelter': 10,
 'WASH': 11}

In [None]:
# Show number of training samples per label
thedeep_df_train['label'].value_counts()

4     5419
9     4618
3     4341
10    2553
11    2178
5     1712
2     1278
8     1207
1     1066
7     1054
0      743
6      430
Name: label, dtype: int64

In [None]:
# Show number of validation samples per label
thedeep_df_valid['label'].value_counts()

4     1196
9      960
3      954
10     474
11     463
5      378
2      300
8      264
1      232
7      229
0      168
6       81
Name: label, dtype: int64

In [None]:
# Show number of test samples per label
thedeep_df_test['label'].value_counts()

4     1181
9      957
3      944
10     509
11     484
5      382
2      283
8      272
1      223
7      193
0      177
6       94
Name: label, dtype: int64
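The counts above show a clearly imbalanced label distribution. A useful reference point for the accuracies reported later is the majority-class baseline, which always predicts the most frequent training label. The dictionaries below are copied from the `value_counts()` outputs above; `majority_baseline` is a hypothetical helper.

```python
# Label counts copied from the value_counts() outputs above
train_counts = {4: 5419, 9: 4618, 3: 4341, 10: 2553, 11: 2178, 5: 1712,
                2: 1278, 8: 1207, 1: 1066, 7: 1054, 0: 743, 6: 430}
test_counts = {4: 1181, 9: 957, 3: 944, 10: 509, 11: 484, 5: 382,
               2: 283, 8: 272, 1: 223, 7: 193, 0: 177, 6: 94}


def majority_baseline(train, test):
    """Return the most frequent training label and its accuracy on the test counts."""
    majority = max(train, key=train.get)
    return majority, test[majority] / sum(test.values())


label, acc = majority_baseline(train_counts, test_counts)
print(label, round(acc, 4))  # 4 0.2072
```

Always predicting label 4 (Health) would thus reach roughly 21% test accuracy.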

## 2 Data preprocessing, word embedding and saving

These steps only need to be executed once: the results are pickled in Section 2.4 and reloaded in Section 3.

### 2.1 Define torchtext.Field and apply preprocessing steps to thedeep dataset

In [None]:
# Define torchtext.Field objects for Tensor representation of data
text_field = Field(tokenize='spacy', lower=True, batch_first=True)
label_field = Field(sequential=False, use_vocab=False, batch_first=True)

# Apply preprocessing to training, validation and test set
text_train_pre = thedeep_df_train['text'].apply(lambda x: text_field.preprocess(x))
text_valid_pre = thedeep_df_valid['text'].apply(lambda x: text_field.preprocess(x))
text_test_pre = thedeep_df_test['text'].apply(lambda x: text_field.preprocess(x))

text_train_pre



sentence_id
28291    [the, primary, reported, needs, for, idps, acr...
9695     [some, 602,  , 000, idps, are, now, spread, ac...
7781     [south, sudanese, soldiers, accused, of, rapin...
31382    [since, the, beginning, of, 2017, ,, 18, 882, ...
19919    [the, number, of, new, suspected, cholera, cas...
                               ...                        
36292    [cholera, continues, to, spread, in, yemen, ,,...
5566     [an, estimated, 165,000, children, are, expect...
19676    [on, 3, march, 2017, ,, tropical, storm, enawo...
29831    [the, presence, of, uxo, was, reported, in, 15...
27747    [as, at, week, 27, (, july, 1, -, 7, ,, 2017, ...
Name: text, Length: 26599, dtype: object

### 2.2 Load GloVe.6B.300d word embeddings, create dictionary and word embeddings


In [None]:
# Build one vocabulary over all three splits and attach the GloVe.6B.300d vectors
# (the first run downloads and caches the embeddings, which takes a long time).
# Calling build_vocab once per split would overwrite the previous vocabulary.
text_field.build_vocab(text_train_pre, text_valid_pre, text_test_pre, vectors='glove.6B.300d')

100%|█████████▉| 399998/400000 [00:37<00:00, 10858.42it/s]

In [None]:
# Checking total number of different words in corpus after preprocessing
text_pre = [text_train_pre, text_valid_pre, text_test_pre]
dictionary = {}
for text in text_pre:
  for doc in text:
    for word in doc:
      if word not in dictionary: dictionary[word] = 1
      else: dictionary[word] += 1

print(f'Length of dictionary: {len(dictionary)} words')

Length of dictionary: 48817 words


### 2.3 Initialize words not found in GloVe with random values from a standard normal distribution

In [None]:
# Inspect the embedding matrix of the torchtext Vocab: one 300-dimensional row
# per vocabulary entry. Entries without a GloVe vector are still all-zero rows.
text_field.vocab.vectors

tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0466,  0.2132, -0.0074,  ...,  0.0091, -0.2099,  0.0539],
        ...,
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000]])

In [None]:
# Replace the zero vectors of all words not contained in the GloVe vocabulary
# with random vectors drawn from a standard normal distribution

# Define zero vector to compare other vectors to
zero_tensor = torch.zeros_like(text_field.vocab.vectors[0])

# Turn zero vectors in vocabulary.vectors to random vectors with std = 1
counter = 0
for i, vector in enumerate(text_field.vocab.vectors):
  if torch.all(torch.eq(vector, zero_tensor)):
    text_field.vocab.vectors[i] = torch.randn_like(zero_tensor)
    counter += 1

print(f'{counter} new words initialized randomly with normally distributed values \n')

# Show updated tensor without zero-vectors
text_field.vocab.vectors

4538 new words initialized randomly with normally distributed values 



tensor([[-1.3601, -0.4453,  1.6286,  ..., -0.9567,  0.9818,  0.1765],
        [-2.4346, -0.3123, -0.9448,  ..., -2.1006, -0.7532,  0.8999],
        [ 0.0466,  0.2132, -0.0074,  ...,  0.0091, -0.2099,  0.0539],
        ...,
        [ 1.0273,  0.0028, -0.3037,  ..., -2.4012, -0.5784, -0.6563],
        [ 0.3459,  0.4757,  0.1960,  ...,  0.8434,  2.1771,  0.0535],
        [-0.3181, -1.0090,  0.6965,  ..., -0.9082,  0.0988, -1.5894]])
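The same idea, building an embedding matrix in which words without a pre-trained vector get random normal values, can be sketched in plain Python. This is an illustration under simplifying assumptions: a small dictionary stands in for GloVe, and `build_embedding_matrix` is a hypothetical helper. It checks membership in the pre-trained vocabulary directly instead of searching for all-zero rows.

```python
import random


def build_embedding_matrix(itos, pretrained, dim, seed=0):
    """Return one vector per vocabulary entry plus the number of random rows.

    Tokens found in `pretrained` reuse that vector; all others (including
    specials such as <unk> and <pad>) get values drawn from N(0, 1).
    """
    rng = random.Random(seed)
    matrix, n_random = [], 0
    for token in itos:
        if token in pretrained:
            matrix.append(list(pretrained[token]))
        else:
            matrix.append([rng.gauss(0.0, 1.0) for _ in range(dim)])
            n_random += 1
    return matrix, n_random


# Toy 3-dimensional "pretrained" vectors standing in for GloVe
glove_like = {'cholera': [0.1, 0.2, 0.3], 'rise': [0.0, -0.5, 0.4]}
itos = ['<unk>', '<pad>', 'cholera', 'rise', 'idps']
matrix, n_random = build_embedding_matrix(itos, glove_like, dim=3)
print(n_random)   # 3
print(matrix[2])  # [0.1, 0.2, 0.3]
```

In legacy torchtext, passing `unk_init=torch.Tensor.normal_` to `build_vocab` should achieve a similar effect when the vocabulary is built, which avoids the separate loop over all rows.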

### 2.4 Save preprocessed data

In [None]:
# Pickle preprocessed and embedded data
with open(os.path.join(data_path, 'text_field.pickle'), 'wb') as f:
    pickle.dump(text_field, f)
with open(os.path.join(data_path, 'label_field.pickle'), 'wb') as f:
    pickle.dump(label_field, f)

## 3 Load preprocessed and embedded data and construct Dataset object from pandas dataframe

In [None]:
# Load preprocessed and embedded data
with open(os.path.join(data_path, 'text_field.pickle'), 'rb') as f:
    text_field = pickle.load(f)
with open(os.path.join(data_path, 'label_field.pickle'), 'rb') as f:
    label_field = pickle.load(f)

# Define torchtext Dataset class to load pandas DataFrame
class DataFrameDataset(Dataset):
    def __init__(self, df:pd.DataFrame, fields:list):
        super(DataFrameDataset, self).__init__(
            [Example.fromlist(list(r), fields) for i, r in df.iterrows()], fields
        )

# Construct DataFrameDataset for all datasets
fields = (('text', text_field), ('label', label_field))
train_dataset = DataFrameDataset(df=thedeep_df_train, fields=fields)
valid_dataset = DataFrameDataset(df=thedeep_df_valid, fields=fields)
test_dataset = DataFrameDataset(df=thedeep_df_test, fields=fields)



In [None]:
# Example sentence in torchtext.data.Example object
train_dataset[0].text

['the',
 'primary',
 'reported',
 'needs',
 'for',
 'idps',
 'across',
 'the',
 'whole',
 'of',
 'libya',
 'were',
 'access',
 'to',
 'food',
 ',',
 'health',
 'services',
 'and',
 'shelter',
 '.',
 'the',
 'main',
 'issues',
 'related',
 'to',
 'the',
 'above',
 '-',
 'mentioned',
 'needs',
 'are',
 'that',
 'goods',
 'are',
 'too',
 'expensive',
 'and',
 'therefore',
 'idps',
 'have',
 'limit',
 'access',
 '.',
 'other',
 'issues',
 'cited',
 'for',
 'access',
 'to',
 'health',
 'included',
 'irregular',
 'supply',
 'of',
 'medicines',
 'and',
 'low',
 'quality',
 'of',
 'available',
 'health',
 'services',
 'due',
 'to',
 'overcrowded',
 'facilities',
 ',',
 'lack',
 'of',
 'medical',
 'staff',
 'and',
 'a',
 'diminished',
 'availability',
 'of',
 'female',
 'doctors',
 '.']

## 4 Definition of LSTM model and training loop

### 4.1 Define LSTM model architecture

In [None]:
class ClassificationRNNModel(nn.Module):
  def __init__(self, vocab_size, embed_dim, hidden_dim, num_class, dropout, bidirectional=False):
    super().__init__()

    self.vocab_size = vocab_size
    self.embed_dim = embed_dim
    self.num_class = num_class
    self.bidirectional = bidirectional

    self.embedding = nn.Embedding(vocab_size, embed_dim)
    self.lstm = nn.LSTM(input_size=embed_dim,
                        hidden_size=hidden_dim,
                        num_layers=1,
                        batch_first=True,
                        bidirectional=bidirectional)
    self.dropout = nn.Dropout(dropout)
    # A bidirectional LSTM yields one final hidden state per direction,
    # which are concatenated before the linear layer
    num_directions = 2 if bidirectional else 1
    self.fc = nn.Linear(hidden_dim * num_directions, num_class)

  def forward(self, text, lengths):
    embedded = self.embedding(text)
    # Pack the padded batch so the LSTM skips the padding positions
    packed_embedded = pack_padded_sequence(embedded, lengths.to('cpu'), batch_first=True, enforce_sorted=False)
    packed_output, (h_t, c_t) = self.lstm(packed_embedded)
    if self.bidirectional:
      # h_t[-2]: final forward state, h_t[-1]: final backward state
      hidden = torch.cat((h_t[-2], h_t[-1]), dim=1)
    else:
      hidden = h_t[-1]
    # Use the LSTM's last hidden state as input for the FC layer
    return self.fc(self.dropout(hidden))
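To make the forward pass more concrete, here is a toy pure-Python sketch of what an LSTM computes for a single document: the standard gate equations are applied token by token and only the final hidden state is kept as the document embedding. For the bidirectional case, a second set of weights reads the sequence in reverse; if the two final states are concatenated, the linear layer that follows needs `2 * hidden_dim` inputs. Weights are random, sizes are tiny and all names are illustrative; this is not how `nn.LSTM` is implemented internally.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_params(input_dim, hidden_dim, seed):
    """Random weights (W_x, W_h, b) for the input, forget, cell and output gates."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for gate in 'ifgo'}


def lstm_last_hidden(sequence, params, hidden_dim):
    """Run one LSTM direction over `sequence` and return the final hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in sequence:
        pre = {}
        for gate, (w_x, w_h, b) in params.items():
            pre[gate] = [sum(a * v for a, v in zip(w_x[j], x))
                         + sum(a * v for a, v in zip(w_h[j], h)) + b[j]
                         for j in range(hidden_dim)]
        i = [sigmoid(v) for v in pre['i']]    # input gate
        f = [sigmoid(v) for v in pre['f']]    # forget gate
        g = [math.tanh(v) for v in pre['g']]  # candidate cell state
        o = [sigmoid(v) for v in pre['o']]    # output gate
        c = [f[j] * c[j] + i[j] * g[j] for j in range(hidden_dim)]
        h = [o[j] * math.tanh(c[j]) for j in range(hidden_dim)]
    return h


# Toy document: 4 tokens with 3-dimensional embeddings
doc = [[0.1, 0.0, -0.2], [0.3, 0.1, 0.0], [-0.1, 0.2, 0.4], [0.0, -0.3, 0.1]]
HIDDEN = 2
forward_h = lstm_last_hidden(doc, make_params(3, HIDDEN, seed=1), HIDDEN)
# The backward direction has its own weights and reads the tokens in reverse order
backward_h = lstm_last_hidden(doc[::-1], make_params(3, HIDDEN, seed=2), HIDDEN)
doc_embedding = forward_h + backward_h  # list concatenation: length 2 * HIDDEN
print(len(doc_embedding))  # 4
```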

### 4.2 collate_fn function for PyTorch DataLoader

In [None]:
# Data batching for the RNN:
# text entries have different lengths, so this custom function builds each batch
# (texts, lengths, labels) and is passed to the PyTorch DataLoader as collate_fn

def generate_batch_rnn(batch):
  texts, lengths, labels = [], [], []

  for sample in batch:
    # Map tokens to vocabulary IDs (unknown tokens map to the <unk> ID)
    sample_list = [text_field.vocab.stoi[word] for word in sample.text]
    texts.append(torch.LongTensor(sample_list))
    lengths.append(len(sample_list))
    labels.append(sample.label)

  # Pad all texts in the batch to the length of the longest one
  texts = pad_sequence(texts, batch_first=True, padding_value=text_field.vocab.stoi['<pad>'])
  lengths = torch.tensor(lengths)
  labels = torch.tensor(labels)

  return texts, lengths, labels
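What padding and packing do to a batch can be sketched with plain lists. The sketch assumes the `<pad>` token sits at index 1 (torchtext's default position); `pad_batch` and `packed_batch_sizes` are hypothetical helpers. The second function reproduces the per-time-step batch sizes that a packed sequence stores, which is why the LSTM's final hidden state belongs to each document's last real token rather than to padding.

```python
def pad_batch(sequences, pad_idx=1):
    """Right-pad ID sequences to the longest one and record their true lengths."""
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [seq + [pad_idx] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


def packed_batch_sizes(lengths):
    """Number of sequences still active at each time step."""
    return [sum(1 for n in lengths if n > t) for t in range(max(lengths))]


batch = [[5, 8, 2], [7], [3, 9]]
padded, lengths = pad_batch(batch)
print(padded)                       # [[5, 8, 2], [7, 1, 1], [3, 9, 1]]
print(lengths)                      # [3, 1, 2]
print(packed_batch_sizes(lengths))  # [3, 2, 1]
```

Because packing skips the padded positions, the value used for padding does not change the LSTM's final hidden states in this model.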

### 4.3 Train and test functions

In [None]:
def train(model, optimizer, criterion, train_data, valid_data, test_data, epochs):
  best_val_loss = float('inf')
  counter = 0  # number of validation rounds without improvement

  for epoch in range(epochs):
    model.train()  # enable dropout during training
    start_time = time.time()
    train_loss, train_acc = 0, 0

    for text, lengths, labels in train_data:
      optimizer.zero_grad()
      text, lengths, labels = text.to(device), lengths.to(device), labels.to(device)
      output = model(text, lengths)
      loss = criterion(output, labels)
      train_loss += loss.item()
      loss.backward()
      optimizer.step()
      train_acc += (output.argmax(1) == labels).sum().item() / len(text)

    train_loss /= len(train_data)
    train_acc /= len(train_data)
    secs = time.time() - start_time
    print(f'Epoch: {epoch+1} | Loss: {train_loss:.4f} | Acc: {train_acc:.4f} | {secs:.1f} sec')

    # Validate every 5 epochs; stop after 3 validations without improvement
    if valid_data and (epoch+1) % 5 == 0:
      valid_loss = test(model, criterion, valid_data, mode='valid')[0]
      if valid_loss < best_val_loss:
        best_val_loss = valid_loss
        torch.save(model.state_dict(), 'model.pt')  # save current best weights
        counter = 0
      else:
        counter += 1
      if counter == 3:
        print('Early stopping triggered.')
        break

  # Restore the best weights, if a checkpoint was saved during this run
  if best_val_loss < float('inf'):
    model.load_state_dict(torch.load('model.pt', map_location=device))

  # Test best model on test dataset
  if test_data: test(model, criterion, test_data, mode='test')

  return model


def test(model, criterion, data, mode='test'):
  model.eval()  # disable dropout for evaluation
  loss, acc = 0, 0
  if mode == 'test':
    print('Testing best model...')

  with torch.no_grad():
    for text, lengths, labels in data:
      text, lengths, labels = text.to(device), lengths.to(device), labels.to(device)
      output = model(text, lengths)
      loss += criterion(output, labels).item()  # accumulate the batch loss
      acc += (output.argmax(1) == labels).sum().item() / len(text)

  loss /= len(data)
  acc /= len(data)
  name = 'Test' if mode == 'test' else 'Validation'
  print(f'\n{name} | Loss: {loss:.4f} | Acc: {acc:.4f}\n')

  return loss, acc
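The early-stopping rule in `train` (validation every 5 epochs, stop after 3 validation rounds without a new best loss) can be checked in isolation. The loss values below are made up for illustration, and `early_stopping_epoch` is a hypothetical helper that simulates the rule on a list of validation losses.

```python
def early_stopping_epoch(valid_losses, patience=3):
    """Simulate patience-based early stopping on a list of validation losses.

    Returns (index of the best check, index of the check that stops training,
    or None if training runs to the end).
    """
    best_loss, best_idx, bad_checks = float('inf'), None, 0
    for idx, loss in enumerate(valid_losses):
        if loss < best_loss:
            best_loss, best_idx, bad_checks = loss, idx, 0  # new best: save a checkpoint
        else:
            bad_checks += 1
            if bad_checks == patience:
                return best_idx, idx
    return best_idx, None


# Check k happens after epoch 5 * (k + 1)
best, stop = early_stopping_epoch([1.20, 1.05, 1.10, 1.08, 1.07])
print(best, stop)  # 1 4
```

Here the best checkpoint comes from epoch 10 and training stops after epoch 25; with validation every 5 epochs and a patience of 3, training can continue for up to 15 epochs past the best checkpoint.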

## 5 Training and evaluation

### 5.1 Standard parameters

In [None]:
# Hyperparameters
BATCH_SIZE = 64
VOCAB_SIZE = len(text_field.vocab.vectors)
EMBED_DIM = 300
HIDDEN_DIM = 128
DROPOUT = 0.5
N_CLASSES = len(labelcaptions)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

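model = ClassificationRNNModel(VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, N_CLASSES, DROPOUT).to(device)
# Initialize the embedding layer with the GloVe-based embedding matrix
model.embedding.weight.data.copy_(text_field.vocab.vectors)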
criterion = torch.nn.CrossEntropyLoss().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001) 

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=generate_batch_rnn)
valid_loader = DataLoader(valid_dataset, batch_size=BATCH_SIZE, collate_fn=generate_batch_rnn)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, collate_fn=generate_batch_rnn)

model_trained = train(model, optimizer, criterion, train_loader, valid_loader, test_loader, epochs=50)

Epoch: 1 | Loss: 2.1578 | Acc: 0.2773 | 8.0 sec
Epoch: 2 | Loss: 1.7314 | Acc: 0.4696 | 8.0 sec
Epoch: 3 | Loss: 1.6218 | Acc: 0.4994 | 8.0 sec
Epoch: 4 | Loss: 1.4128 | Acc: 0.5732 | 8.0 sec
Epoch: 5 | Loss: 1.2711 | Acc: 0.6147 | 8.0 sec

Validation | Loss: 0.0614 | Acc: 0.6062

Epoch: 6 | Loss: 1.1772 | Acc: 0.6387 | 8.0 sec
Epoch: 7 | Loss: 1.0967 | Acc: 0.6561 | 8.0 sec
Epoch: 8 | Loss: 1.0628 | Acc: 0.6638 | 8.0 sec
Epoch: 9 | Loss: 0.9951 | Acc: 0.6808 | 8.0 sec
Epoch: 10 | Loss: 0.9449 | Acc: 0.6950 | 8.0 sec

Validation | Loss: 0.0364 | Acc: 0.6190

Epoch: 11 | Loss: 0.8934 | Acc: 0.7078 | 8.0 sec
Epoch: 12 | Loss: 0.8476 | Acc: 0.7216 | 8.0 sec
Epoch: 13 | Loss: 0.8080 | Acc: 0.7289 | 8.0 sec
Epoch: 14 | Loss: 0.7633 | Acc: 0.7383 | 8.0 sec
Epoch: 15 | Loss: 0.7419 | Acc: 0.7450 | 8.0 sec

Validation | Loss: 0.0365 | Acc: 0.6171

Epoch: 16 | Loss: 0.7032 | Acc: 0.7549 | 8.0 sec
Epoch: 17 | Loss: 0.6726 | Acc: 0.7607 | 8.0 sec
Epoch: 18 | Loss: 0.6404 | Acc: 0.7687 | 8.0 sec
E
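The model returns raw scores (logits), and `torch.nn.CrossEntropyLoss` combines log-softmax with the negative log-likelihood. No explicit Softmax layer is therefore needed for training, and `output.argmax(1)` gives the same prediction with or without softmax. A pure-Python sketch of that computation for a single example (names and logit values are illustrative):

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target class, computed from raw logits."""
    return -log_softmax(logits)[target]


logits = [2.0, 0.5, -1.0]  # raw scores for 3 classes
probs = [math.exp(v) for v in log_softmax(logits)]
print(round(sum(probs), 6))                              # 1.0
print(max(range(len(logits)), key=logits.__getitem__))  # 0: softmax does not change the argmax
print(cross_entropy(logits, 0), cross_entropy(logits, 2))
```

Subtracting the maximum logit before exponentiating keeps `math.exp` from overflowing for large scores without changing the result.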

### 5.2 Experiment 1: Increased number of LSTM's hidden dimensions: 128 => 512

In [None]:
# Hyperparameters
BATCH_SIZE = 64
VOCAB_SIZE = len(text_field.vocab.vectors)
EMBED_DIM = 300
HIDDEN_DIM = 512 # Increased from 128 to 512
DROPOUT = 0.5
N_CLASSES = len(labelcaptions)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

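model = ClassificationRNNModel(VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, N_CLASSES, DROPOUT).to(device)
# Initialize the embedding layer with the GloVe-based embedding matrix
model.embedding.weight.data.copy_(text_field.vocab.vectors)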
criterion = torch.nn.CrossEntropyLoss().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001) 

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=generate_batch_rnn)
valid_loader = DataLoader(valid_dataset, batch_size=BATCH_SIZE, collate_fn=generate_batch_rnn)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, collate_fn=generate_batch_rnn)

model_trained = train(model, optimizer, criterion, train_loader, valid_loader, test_loader, epochs=50)

cuda
Epoch: 1 | Loss: 2.1112 | Acc: 0.3182 | 19.0 sec
Epoch: 2 | Loss: 1.9899 | Acc: 0.3541 | 19.0 sec
Epoch: 3 | Loss: 1.5493 | Acc: 0.5318 | 19.0 sec
Epoch: 4 | Loss: 1.2765 | Acc: 0.6111 | 19.0 sec
Epoch: 5 | Loss: 1.1465 | Acc: 0.6413 | 19.0 sec

Validation | Loss: 0.0350 | Acc: 0.6281

Epoch: 6 | Loss: 1.0410 | Acc: 0.6657 | 19.0 sec
Epoch: 7 | Loss: 0.9442 | Acc: 0.6894 | 19.0 sec
Epoch: 8 | Loss: 0.8625 | Acc: 0.7089 | 19.0 sec
Epoch: 9 | Loss: 0.7935 | Acc: 0.7257 | 19.0 sec
Epoch: 10 | Loss: 0.7180 | Acc: 0.7456 | 19.0 sec

Validation | Loss: 0.0253 | Acc: 0.6234

Epoch: 11 | Loss: 0.6587 | Acc: 0.7572 | 19.0 sec
Epoch: 12 | Loss: 0.6048 | Acc: 0.7725 | 19.0 sec
Epoch: 13 | Loss: 0.5553 | Acc: 0.7816 | 19.0 sec
Epoch: 14 | Loss: 0.5291 | Acc: 0.7854 | 19.0 sec
Epoch: 15 | Loss: 0.5068 | Acc: 0.7897 | 19.0 sec

Validation | Loss: 0.0168 | Acc: 0.6069

Epoch: 16 | Loss: 0.4840 | Acc: 0.7924 | 19.0 sec
Epoch: 17 | Loss: 0.4615 | Acc: 0.7958 | 19.0 sec
Epoch: 18 | Loss: 0.4505 | A

### 5.3 Experiment 2: Increased dropout rate: 0.5 => 0.8

In [None]:
# Hyperparameters
BATCH_SIZE = 64
VOCAB_SIZE = len(text_field.vocab.vectors)
EMBED_DIM = 300
HIDDEN_DIM = 128
DROPOUT = 0.8 # Increased dropout rate from 0.5 to 0.8
N_CLASSES = len(labelcaptions)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

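model = ClassificationRNNModel(VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, N_CLASSES, DROPOUT).to(device)
# Initialize the embedding layer with the GloVe-based embedding matrix
model.embedding.weight.data.copy_(text_field.vocab.vectors)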
criterion = torch.nn.CrossEntropyLoss().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001) 

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=generate_batch_rnn)
valid_loader = DataLoader(valid_dataset, batch_size=BATCH_SIZE, collate_fn=generate_batch_rnn)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, collate_fn=generate_batch_rnn)

model_trained = train(model, optimizer, criterion, train_loader, valid_loader, test_loader, epochs=50)

cuda
Epoch: 1 | Loss: 2.2211 | Acc: 0.2560 | 8.0 sec
Epoch: 2 | Loss: 1.9976 | Acc: 0.3633 | 8.0 sec
Epoch: 3 | Loss: 1.7547 | Acc: 0.4683 | 8.0 sec
Epoch: 4 | Loss: 1.5591 | Acc: 0.5379 | 8.0 sec
Epoch: 5 | Loss: 1.4916 | Acc: 0.5624 | 8.0 sec

Validation | Loss: 0.0395 | Acc: 0.5581

Epoch: 6 | Loss: 1.3432 | Acc: 0.6111 | 8.0 sec
Epoch: 7 | Loss: 1.2752 | Acc: 0.6259 | 8.0 sec
Epoch: 8 | Loss: 1.2070 | Acc: 0.6390 | 8.0 sec
Epoch: 9 | Loss: 1.1719 | Acc: 0.6446 | 8.0 sec
Epoch: 10 | Loss: 1.1240 | Acc: 0.6563 | 8.0 sec

Validation | Loss: 0.0472 | Acc: 0.6027

Epoch: 11 | Loss: 1.0733 | Acc: 0.6676 | 8.0 sec
Epoch: 12 | Loss: 1.0327 | Acc: 0.6751 | 8.0 sec
Epoch: 13 | Loss: 1.0042 | Acc: 0.6855 | 8.0 sec
Epoch: 14 | Loss: 0.9715 | Acc: 0.6913 | 8.0 sec
Epoch: 15 | Loss: 0.9434 | Acc: 0.6985 | 8.0 sec

Validation | Loss: 0.0566 | Acc: 0.6038

Epoch: 16 | Loss: 0.9150 | Acc: 0.7051 | 8.0 sec
Epoch: 17 | Loss: 0.8878 | Acc: 0.7096 | 8.0 sec
Epoch: 18 | Loss: 0.8758 | Acc: 0.7163 | 8.0 

### 5.4 Experiment 3: Bidirectional LSTM

In [None]:
# Hyperparameters
BATCH_SIZE = 64
VOCAB_SIZE = len(text_field.vocab.vectors)
EMBED_DIM = 300
HIDDEN_DIM = 128
DROPOUT = 0.5
N_CLASSES = len(labelcaptions)
BIDIRECTIONAL = True # Bidirectional LSTM

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

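model = ClassificationRNNModel(VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, N_CLASSES, DROPOUT, BIDIRECTIONAL).to(device)
# Initialize the embedding layer with the GloVe-based embedding matrix
model.embedding.weight.data.copy_(text_field.vocab.vectors)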
criterion = torch.nn.CrossEntropyLoss().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001) 

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=generate_batch_rnn)
valid_loader = DataLoader(valid_dataset, batch_size=BATCH_SIZE, collate_fn=generate_batch_rnn)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, collate_fn=generate_batch_rnn)

model_trained = train(model, optimizer, criterion, train_loader, valid_loader, test_loader, epochs=50)

cuda
Epoch: 1 | Loss: 2.0724 | Acc: 0.3193 | 12.0 sec
Epoch: 2 | Loss: 1.6556 | Acc: 0.4987 | 12.0 sec
Epoch: 3 | Loss: 1.3980 | Acc: 0.5826 | 12.0 sec
Epoch: 4 | Loss: 1.2619 | Acc: 0.6196 | 12.0 sec
Epoch: 5 | Loss: 1.1575 | Acc: 0.6451 | 12.0 sec

Validation | Loss: 0.0469 | Acc: 0.6020

Epoch: 6 | Loss: 1.0970 | Acc: 0.6577 | 12.0 sec
Epoch: 7 | Loss: 1.0361 | Acc: 0.6700 | 12.0 sec
Epoch: 8 | Loss: 0.9758 | Acc: 0.6861 | 12.0 sec
Epoch: 9 | Loss: 0.9252 | Acc: 0.6964 | 12.0 sec
Epoch: 10 | Loss: 0.8771 | Acc: 0.7100 | 12.0 sec

Validation | Loss: 0.0459 | Acc: 0.6138

Epoch: 11 | Loss: 0.8336 | Acc: 0.7214 | 12.0 sec
Epoch: 12 | Loss: 0.8317 | Acc: 0.7243 | 12.0 sec
Epoch: 13 | Loss: 0.7590 | Acc: 0.7447 | 12.0 sec
Epoch: 14 | Loss: 0.7232 | Acc: 0.7505 | 12.0 sec
Epoch: 15 | Loss: 0.6880 | Acc: 0.7585 | 12.0 sec

Validation | Loss: 0.0284 | Acc: 0.6041

Epoch: 16 | Loss: 0.6618 | Acc: 0.7672 | 12.0 sec
Epoch: 17 | Loss: 0.6397 | Acc: 0.7696 | 12.0 sec
Epoch: 18 | Loss: 0.6148 | A

**Main sources:**

https://towardsdatascience.com/lstm-text-classification-using-pytorch-2c6c657f8fc0

https://www.kaggle.com/swarnabha/pytorch-text-classification-torchtext-lstm
