# Text Classification Example

This example reproduces the sentiment analysis with the fast text classifier described in [Bag of Tricks for Efficient Text Classification](https://arxiv.org/abs/1607.01759). In the paper, sentences are represented as a bag of words (BoW) and used to train a linear classifier. Several text classification datasets have recently been added to PyTorch/torchtext and can be loaded with a single command:

- AG_NEWS
- SogouNews
- DBpedia
- YelpReviewPolarity
- YelpReviewFull
- YahooAnswers
- AmazonReviewPolarity
- AmazonReviewFull

This example shows the application of TextClassificationDataset and reproduces the results of the paper.

## Load data with ngrams
A bag of ngrams is used as features in the paper to capture some partial information about the local word order. In practice, bi-grams or tri-grams are applied because groups of words carry more information than single words. An example:

    "load data with ngrams"
    Bi-grams results: "load data", "data with", "with ngrams"
    Tri-grams results: "load data with", "data with ngrams"

TextClassificationDataset supports ngrams. With ngrams set to 2, each example text in the dataset becomes a list of single words plus bi-gram strings.
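
To make the "single words plus bi-grams" representation concrete, here is a minimal, dependency-free sketch. The helper `bag_of_ngrams` is a hypothetical name used only for illustration; it is not the torchtext implementation, just an assumption about what the resulting token list looks like.

```python
def bag_of_ngrams(tokens, n):
    """Return the single tokens followed by all contiguous ngrams of size 2..n."""
    features = list(tokens)
    for size in range(2, n + 1):
        for start in range(len(tokens) - size + 1):
            # Join neighbouring tokens into one space-separated ngram string.
            features.append(" ".join(tokens[start:start + size]))
    return features


tokens = "load data with ngrams".split()
print(bag_of_ngrams(tokens, 2))
```

With `n=2` the list holds the four words followed by the three bi-grams from the example above; with `n=3` the two tri-grams are appended as well.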

Data iterators are loaded via the iters() function of the instance with a batch size (set through BATCH_SIZE in the cell below) and a computation device. At the same time, the word strings are numericalized (i.e. converted from a list of tokens to a list of indices).
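
Numericalization can be illustrated without torchtext. The sketch below is an assumption-laden stand-in: `build_vocab`, `numericalize`, and the `<unk>`/`<pad>` token strings are hypothetical names chosen for this example, and the index order (most frequent first) is only one plausible convention.

```python
from collections import Counter

UNK, PAD = "<unk>", "<pad>"  # assumed special tokens for this sketch


def build_vocab(token_lists):
    """Map every token (single words and ngrams alike) to an integer index."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Special tokens first, then tokens by descending frequency (ties alphabetical).
    itos = [UNK, PAD] + sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: idx for idx, tok in enumerate(itos)}


def numericalize(tokens, stoi):
    """Convert tokens to indices; unseen tokens fall back to the <unk> index."""
    return [stoi.get(tok, stoi[UNK]) for tok in tokens]


corpus = [["load", "data", "load data"], ["load", "ngrams"]]
stoi = build_vocab(corpus)
print(numericalize(["load", "data", "unseen"], stoi))
```

The vocabulary size in this sketch is `len(stoi)`, which counts the special tokens, the single words, and the ngram strings together.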

In [None]:
import torch
import torchtext
from torchtext.datasets import AG_NEWS
NGRAMS = 2
txt_cls = AG_NEWS(ngrams=NGRAMS)
BATCH_SIZE = 512
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## Define the model

The model is composed of an embedding layer and a linear layer. Between the two layers, a 2D average pooling function (see [avg_pool2d](https://pytorch.org/docs/stable/nn.html?highlight=avg_pool2d#torch.nn.AvgPool2d)) averages the token embeddings over the sentence length. The log-softmax function then computes the log-probabilities over the classes.
<img src="./pictures/text_sentiment_ngrams_model.png" width="600" height="360">
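
The three stages (average the embeddings, apply a linear layer, take the log-softmax) can be traced by hand on tiny numbers. The following is a plain-Python sketch under stated assumptions: `mean_pool`, `linear`, and `log_softmax` are hypothetical helpers that mimic the arithmetic, not the PyTorch functions themselves, and the weights are made up.

```python
import math


def mean_pool(embedded):
    """Average token vectors (Sent_Len x Embed_Dim) into one Embed_Dim vector."""
    sent_len = len(embedded)
    return [sum(column) / sent_len for column in zip(*embedded)]


def linear(vector, weight, bias):
    """Apply a Num_Class x Embed_Dim weight matrix plus a bias vector."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weight, bias)]


def log_softmax(scores):
    """Numerically stable log-softmax over a list of class scores."""
    peak = max(scores)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_norm for s in scores]


embedded = [[1.0, 2.0], [3.0, 4.0]]              # two tokens, Embed_Dim = 2
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # made-up weights, Num_Class = 3
bias = [0.0, 0.0, 0.0]

pooled = mean_pool(embedded)
log_probs = log_softmax(linear(pooled, weight, bias))
print(pooled)
print(log_probs)
```

Exponentiating the log-probabilities gives values that sum to one, and the largest entry marks the predicted class.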

In [None]:
import torch.nn as nn
import torch.nn.functional as F
class TextSentiment(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_class, pad_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.fc = nn.Linear(embed_dim, num_class)
        self.init_weights()

    def init_weights(self):
        initrange = 0.5
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.fc.weight.data.uniform_(-initrange, initrange)
        self.fc.bias.data.zero_()
        
    def forward(self, text):
        embedded = self.embedding(text) # Sent_Len * Batch_Size * Embed_Dim
        embedded = embedded.transpose(0, 1) # Batch_Size * Sent_Len * Embed_Dim
        pooled = F.avg_pool2d(embedded, (embedded.shape[1], 1)).squeeze(1) # Batch_Size * Embed_Dim
        out = self.fc(pooled) # Batch_Size * Num_Class
        return F.log_softmax(out, dim=1)

## Create an instance

The AG_NEWS dataset has four labels and therefore the number of classes is four.

    1 : World
    2 : Sports
    3 : Business
    4 : Sci/Tec

The vocab size is equal to the length of the vocab (including single words and ngrams).

In [None]:
VOCAB_SIZE = len(txt_cls.fields['text'].vocab)
EMBED_DIM = 128
NUN_CLASS = 4
PAD_IDX = txt_cls.fields['text'].vocab.stoi[txt_cls.fields['text'].pad_token]
UNK_IDX = txt_cls.fields['text'].vocab.stoi[txt_cls.fields['text'].unk_token]
model = TextSentiment(VOCAB_SIZE, EMBED_DIM, NUN_CLASS, PAD_IDX).to(device)
model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBED_DIM)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBED_DIM)

## Add optimizer and loss function

The [Adam](https://pytorch.org/docs/stable/optim.html?highlight=adam#torch.optim.Adam) algorithm is used to optimize the model.
The [NLLLoss](https://pytorch.org/docs/stable/nn.html?highlight=nll_loss#torch.nn.NLLLoss) function computes the negative log-likelihood loss, which is used to train a classification problem with C classes. It expects log-probabilities as input, which is why the model's forward pass ends with log-softmax.
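
For a single example, the negative log-likelihood is simply the negated log-probability assigned to the correct class. A minimal sketch, assuming zero-based class indices and a hypothetical helper named `nll_loss` (plain Python, not `F.nll_loss` itself):

```python
import math


def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class for one example."""
    return -log_probs[target]


# Log-probabilities over four classes (the probabilities are made up).
uniform = [math.log(0.25)] * 4
confident = [math.log(p) for p in (0.97, 0.01, 0.01, 0.01)]

print(nll_loss(uniform, 0))      # no preference between classes
print(nll_loss(confident, 0))    # confident and correct: small loss
print(nll_loss(confident, 1))    # confident and wrong: large loss
```

A uniform prediction over four classes costs log(4), a confident correct prediction costs less, and a confident wrong one costs considerably more.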

In [None]:
import torch.optim as optim
optimizer = optim.Adam(model.parameters())
loss_func = F.nll_loss

## Define a function to train the model and evaluate results

In [None]:
def train_evaluate_func(model, examples, fields, loss_func, device, batch_size=64, status='train', optimizer=None):
    
    _loss = 0
    _acc = 0
    
    if status == 'train':
        model.train()
    else:
        model.eval()

    import random
    random.shuffle(examples)
    loss = torch.Tensor([0.0]).to(device) 
    for ex_idx, ex in enumerate(examples):
        if status == 'train':
            optimizer.zero_grad()

        text_input = fields['text'].numericalize(fields['text'].pad([ex.text])).to(device)
        target = torch.Tensor([ex.label - 1]).long().to(device)
        output = model(text_input)
        loss += loss_func(output, target)

        if output.max(1)[1].item() == target.item():
            _acc += 1
                
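        # Update once every batch_size examples, and once more for a final partial batch.
        if (1 + ex_idx) % batch_size == 0 or ex_idx == len(examples) - 1:
            if status == 'train':
                loss.backward()  # loss holds the sum over the accumulated examples
                optimizer.step()

            _loss += loss.item()
            loss = torch.Tensor([0.0]).to(device)  # reset the accumulator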

    return _loss / len(examples), _acc / len(examples)
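
The function above processes one example at a time and only updates the weights when a batch boundary is reached. The boundary condition can be checked in isolation; `update_steps` below is a hypothetical helper written for this illustration.

```python
def update_steps(num_examples, batch_size):
    """Return the example indices at which an optimizer step would be taken."""
    return [idx for idx in range(num_examples)
            if (1 + idx) % batch_size == 0 or idx == num_examples - 1]


print(update_steps(10, 4))  # [3, 7, 9]
print(update_steps(8, 4))   # [3, 7]
```

With 10 examples and a batch size of 4, the last step covers a partial batch of two examples; when the batch size divides the number of examples evenly, the final index triggers only one step.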

## Run the model

In [None]:
import time
N_EPOCHS = 6
min_valid_loss = float('inf')
for epoch in range(N_EPOCHS):

    start_time = time.time()

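    # Note: the train/valid split is re-drawn every epoch, so an example held out in
    # one epoch can be trained on in another; validation scores may be optimistic.
    rnd = torchtext.data.dataset.RandomShuffler(None)
    train_examples, _test_examples, valid_examples = \
        torchtext.data.dataset.rationed_split(txt_cls.train_examples, 0.7, 0.0, 0.3, rnd)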
    
    train_loss, train_acc = train_evaluate_func(model, train_examples, txt_cls.fields, loss_func, device=device, status='train', optimizer=optimizer)
    with torch.no_grad():
        valid_loss, valid_acc = train_evaluate_func(model, valid_examples, txt_cls.fields, loss_func, device=device, status='valid')
    
    end_time = time.time()
    _mins = int((end_time - start_time) / 60)
    _secs = int(end_time - start_time - _mins * 60)

    if valid_loss < min_valid_loss:
        min_valid_loss = valid_loss
        torch.save(model, 'text_classification.pt')
    
    print('Epoch: %d' %(epoch + 1), " | time in %d minutes, %d seconds" %(_mins, _secs))
    print(f'\tLoss: {train_loss:.3f}(train)\t|\t{valid_loss:.3f}(valid)')
    print(f'\tAcc: {train_acc * 100:.1f}%(train)\t|\t{valid_acc * 100:.1f}%(valid)')

Running the model on a GPU prints the following information:

Epoch: 1  | time in 12 minutes, 33 seconds

        Loss: 0.529(train)      |       0.283(valid)
        Acc: 85.9%(train)       |       91.2%(valid)
        
Epoch: 2  | time in 18 minutes, 31 seconds

        Loss: 0.178(train)      |       0.164(valid)
        Acc: 94.6%(train)       |       94.8%(valid)
        
Epoch: 3  | time in 10 minutes, 11 seconds

        Loss: 0.094(train)      |       0.088(valid)
        Acc: 97.2%(train)       |       97.3%(valid)
        
Epoch: 4  | time in 10 minutes, 55 seconds          

        Loss: 0.049(train)      |       0.051(valid)
        Acc: 98.5%(train)       |       98.4%(valid)
        
Epoch: 5  | time in 10 minutes, 57 seconds         

        Loss: 0.028(train)      |       0.030(valid)
        Acc: 99.1%(train)       |       99.1%(valid)
        
Epoch: 6  | time in 18 minutes, 19 seconds          

        Loss: 0.018(train)      |       0.020(valid)
        Acc: 99.4%(train)       |       99.2%(valid)
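
Because the training loop above calls `rationed_split` inside the epoch loop, each epoch sees a different 70/30 partition. The toy sketch below compares that with a split drawn once; `split` is a hypothetical stand-in for `rationed_split`, and the integers stand in for examples. It only illustrates how held-out examples can end up in training across epochs, which would tend to make the validation numbers optimistic.

```python
import random


def split(examples, train_ratio, rng):
    """Shuffle a copy of the examples and cut it into train and valid parts."""
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(round(train_ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


examples = list(range(100))
rng = random.Random(0)

# Variant A: one split, reused for every epoch.
train, valid = split(examples, 0.7, rng)
trained_on = set()
for epoch in range(6):
    trained_on.update(train)
leak_fixed = len(trained_on & set(valid))

# Variant B: a fresh split in every epoch, as in the loop above.
trained_on = set()
for epoch in range(6):
    train_b, valid_b = split(examples, 0.7, rng)
    trained_on.update(train_b)
leak_resplit = len(trained_on & set(valid_b))

print("validation examples seen in training (fixed split):", leak_fixed)
print("validation examples seen in training (re-split):   ", leak_resplit)
```

With a fixed split the count is zero by construction; with re-splitting, validation examples from the last epoch have usually been trained on in earlier epochs.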

## Evaluate the model with test dataset

In [None]:
print('Checking the results of test dataset...')
with torch.no_grad():
    test_loss, test_acc = train_evaluate_func(model, txt_cls.test_examples, txt_cls.fields, loss_func, device=device, status='test')
print(f'\tLoss: {test_loss:.3f}(test)\t|\tAcc: {test_acc * 100:.1f}%(test)')

Checking the results of test dataset...

        Loss: 0.274(test)       |       Acc: 92.4%(test)
        
The results are consistent with the reference paper [Bag of Tricks for Efficient Text Classification](https://arxiv.org/abs/1607.01759).

## Test on a random news item

Use the best model so far and test it on a piece of golf news.

In [None]:
import re
from torchtext.data.utils import generate_ngrams

ag_news_label = {1 : "World",
                 2 : "Sports",
                 3 : "Business",
                 4 : "Sci/Tec"}

def predict_label(text_string, ngrams):
    model.eval()
    text_string = re.sub(r'[^a-zA-Z0-9\s]', ' ', text_string)  # replace punctuation with spaces
    tokens = text_string.split()  # split on any run of whitespace; avoids empty-string tokens
    ngrams_tokens = generate_ngrams(tokens, ngrams)
    indexed = [txt_cls.fields['text'].vocab.stoi[item] for item in ngrams_tokens]
    indexed_tensor = torch.LongTensor(indexed).to(device).unsqueeze(1)
    result = model(indexed_tensor)
    label_index = result.max(1, keepdim=True)[1].item() + 1 # label starts from 1
    return ag_news_label[label_index]

example_text_string = "Defending champion Bryson DeChambeau and 2017 \
                       champion Dustin Johnson have committed to play \
                       in THE NORTHERN TRUST this August for the first \
                       event of the FedExCup Playoffs."

news_label = ""
with open("text_classification.pt", 'rb') as f:
    model = torch.load(f)  # reload the checkpoint with the lowest validation loss
    news_label = predict_label(example_text_string, 2)

print("This is a %s news" %news_label)

This is a Sports news
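
The text-cleaning step of `predict_label` can be tried on its own. This sketch uses the same regular expression and contrasts splitting on a single space with splitting on any run of whitespace; `clean_and_tokenize` is a hypothetical helper name, and the sample sentence is shortened for illustration.

```python
import re


def clean_and_tokenize(text_string):
    """Replace non-alphanumeric characters with spaces, then split on whitespace."""
    cleaned = re.sub(r"[^a-zA-Z0-9\s]", " ", text_string)
    return cleaned.split()


text = "Defending champion,   Bryson DeChambeau."
cleaned = re.sub(r"[^a-zA-Z0-9\s]", " ", text)

print(cleaned.split(" "))         # consecutive spaces produce empty-string tokens
print(clean_and_tokenize(text))   # whitespace runs are collapsed
```

Empty-string tokens matter here because they would be joined into ngrams and then looked up in the vocabulary, where they are unlikely to match anything seen in training.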

## Run the model with TorchText Iterator/Batch (optional)

There is an option to use TorchText Iterator/Batch. Training the model in batches significantly reduces the computation time on a GPU.
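
Batching requires every sequence in a batch to have the same length, so shorter sequences are filled with a padding index. A minimal plain-Python sketch of that idea follows; `pad_batch` and `transpose` are hypothetical helpers, and the padding index of 1 is an assumption made for this example.

```python
def pad_batch(sequences, pad_idx):
    """Right-pad every index list to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_idx] * (max_len - len(seq)) for seq in sequences]


def transpose(rows):
    """Swap the two axes of a rectangular list of lists."""
    return [list(col) for col in zip(*rows)]


batch = pad_batch([[5, 8, 2], [7], [4, 9]], pad_idx=1)
print(batch)             # Batch_Size x Sent_Len
print(transpose(batch))  # Sent_Len x Batch_Size, the layout forward() receives
```

The transposed layout matches the shape comments in the model's `forward` method, where the input is indexed as Sent_Len by Batch_Size.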

### Define a simple dataset for data examples

In [None]:
class TextDataset(torch.utils.data.Dataset):
    def __init__(self, examples, fields):
        self.examples = examples
        self.fields = fields

    def __getitem__(self, i):
        return self.examples[i]

    def __len__(self):
        try:
            return len(self.examples)
        except TypeError:
            return 2**32

    def __iter__(self):
        for x in self.examples:
            yield x

### Generate train/test/valid iterators

In [None]:
from torchtext.data import dataset
from torchtext.data.iterator import Iterator

split_ratio = 0.7
rnd = dataset.RandomShuffler(None)
train_ratio, valid_ratio = split_ratio, 1 - split_ratio
train_examples, _test_examples, valid_examples = \
    dataset.rationed_split(txt_cls.train_examples,
                           train_ratio, 0.0,
                           valid_ratio, rnd)

train = TextDataset(train_examples, txt_cls.fields)
train.sort_key = txt_cls.sort_key
train_iterator = Iterator(train, batch_size=BATCH_SIZE, train=True, device=device)

test = TextDataset(txt_cls.test_examples, txt_cls.fields)
test.sort_key = txt_cls.sort_key
test_iterator = Iterator(test, batch_size=BATCH_SIZE, train=False, device=device)

valid = TextDataset(valid_examples, txt_cls.fields)
valid.sort_key = txt_cls.sort_key
valid_iterator = Iterator(valid, batch_size=BATCH_SIZE, train=False, device=device)


### Reset the model

In [None]:
model = TextSentiment(VOCAB_SIZE, EMBED_DIM, NUN_CLASS, PAD_IDX).to(device)
model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBED_DIM)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBED_DIM)
optimizer = optim.Adam(model.parameters())

### Train the model with batch/iterators

In [None]:
def train_evaluate_func(model, iterator, loss_func, status='train', optimizer=None):
    
    _loss = 0
    _acc = 0
    
    if status == 'train':
        model.train()
    else:
        model.eval()

    for batch in iterator:
        if status == 'train':
            optimizer.zero_grad()
        
        output = model(batch.text)
        target = batch.label.long()
        
        loss = loss_func(output, target)
                
        if status == 'train':
            loss.backward()
            optimizer.step()
        
        _loss += loss.item()
            
        pred = output.max(1, keepdim=True)[1]
        acc = pred.eq(target.view_as(pred)).sum().item() / len(target)
        _acc += acc 
    return _loss / len(iterator), _acc / len(iterator)
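
The function above reports the mean of the per-batch accuracies. When the last batch is smaller than the others, this can differ slightly from the accuracy over all examples. The sketch below shows the difference on made-up counts; both helper names are hypothetical.

```python
def mean_of_batch_accuracies(correct_per_batch, size_per_batch):
    """Average the accuracy of each batch, giving every batch equal weight."""
    accs = [c / n for c, n in zip(correct_per_batch, size_per_batch)]
    return sum(accs) / len(accs)


def overall_accuracy(correct_per_batch, size_per_batch):
    """Total correct predictions divided by total examples."""
    return sum(correct_per_batch) / sum(size_per_batch)


correct = [400, 400, 10]   # made-up counts of correct predictions
sizes = [512, 512, 10]     # the last batch is much smaller

print(mean_of_batch_accuracies(correct, sizes))
print(overall_accuracy(correct, sizes))
```

The small final batch gets the same weight as a full one in the first number, so the two values diverge; with equally sized batches they coincide.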

### Run the model

In [None]:
import time
N_EPOCHS = 6
min_valid_loss = float('inf')
for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = train_evaluate_func(model, train_iterator, loss_func, status='train', optimizer=optimizer)
    with torch.no_grad():
        valid_loss, valid_acc = train_evaluate_func(model, valid_iterator, loss_func, status='valid')
    
    end_time = time.time()
    _mins = int((end_time - start_time) / 60)
    _secs = int(end_time - start_time - _mins * 60)

    if valid_loss < min_valid_loss:
        min_valid_loss = valid_loss
        torch.save(model, 'text_classification.pt')
    
    print('Epoch: %d' %(epoch + 1), " | time in %d minutes, %d seconds" %(_mins, _secs))
    print(f'\tLoss: {train_loss:.3f}(train)\t|\t{valid_loss:.3f}(valid)')
    print(f'\tAcc: {train_acc * 100:.1f}%(train)\t|\t{valid_acc * 100:.1f}%(valid)')

Running the model on a GPU prints the following information:

Epoch: 1  | time in 0 minutes, 16 seconds

        Loss: 0.963(train)      |       0.362(valid)
        Acc: 80.5%(train)       |       88.5%(valid)
        
Epoch: 2  | time in 0 minutes, 16 seconds

        Loss: 0.375(train)      |       0.349(valid)
        Acc: 91.7%(train)       |       90.6%(valid)
        
Epoch: 3  | time in 0 minutes, 16 seconds

        Loss: 0.228(train)      |       0.353(valid)
        Acc: 94.5%(train)       |       91.4%(valid)
        
Epoch: 4  | time in 0 minutes, 16 seconds

        Loss: 0.151(train)      |       0.359(valid)
        Acc: 96.4%(train)       |       91.8%(valid)
        
Epoch: 5  | time in 0 minutes, 16 seconds

        Loss: 0.102(train)      |       0.373(valid)
        Acc: 97.6%(train)       |       91.9%(valid)
        
Epoch: 6  | time in 0 minutes, 16 seconds

        Loss: 0.069(train)      |       0.394(valid)
        Acc: 98.5%(train)       |       91.9%(valid)