## Assignment 2.4: Text classification via CNN (20 points)

In this assignment you will perform sentiment analysis of IMDB reviews using a CNN architecture. Carefully read [Convolutional Neural Networks for Sentence Classification](https://arxiv.org/pdf/1408.5882.pdf) by Yoon Kim.

In [1]:
import numpy as np
import torch

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torchtext import datasets
from torchtext.data import Field, LabelField
from torchtext.data import Iterator

### Preparing Data

In [2]:
TEXT = Field(sequential=True, lower=True, batch_first=True)
LABEL = LabelField(batch_first=True)

In [3]:
train, tst = datasets.IMDB.splits(TEXT, LABEL)
trn, vld = train.split()

In [4]:
# %%time
TEXT.build_vocab(trn)

In [5]:
LABEL.build_vocab(trn)

### Creating the Iterator (2 points)

Define an iterator here

In [6]:
from torchtext.data import BucketIterator

train_iter, val_iter, test_iter = BucketIterator.splits(
        (trn, vld, tst),
        batch_sizes=(64, 64, 64),
        sort=True,
        sort_key=lambda x: len(x.text),
        sort_within_batch=False,
        device='cuda',
        repeat=False
)
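
To make the effect of `sort_key=lambda x: len(x.text)` more concrete, here is a minimal pure-Python sketch of length-based bucketing. It is not torchtext's implementation; `bucket_batches` and `pad_id` are hypothetical names used only for illustration, and the padding index is an arbitrary placeholder.

```python
from typing import List


def bucket_batches(seqs: List[List[int]], batch_size: int, pad_id: int = 1) -> List[List[List[int]]]:
    """Group sequences of similar length and pad each batch to its own maximum length."""
    # Sort by length so that neighbours in the list have similar lengths.
    ordered = sorted(seqs, key=len)
    batches = []
    for start in range(0, len(ordered), batch_size):
        chunk = ordered[start:start + batch_size]
        width = max(len(s) for s in chunk)
        # Pad on the right, only up to the longest sequence in this batch.
        batches.append([s + [pad_id] * (width - len(s)) for s in chunk])
    return batches


toy = [[5, 6, 7, 8, 9], [2], [3, 4], [10, 11, 12]]
batches = bucket_batches(toy, batch_size=2)
for b in batches:
    print(b)
# [[2, 1], [3, 4]]
# [[10, 11, 12, 1, 1], [5, 6, 7, 8, 9]]
```

Grouping reviews of similar length keeps the amount of padding per batch small, which matters for IMDB because review lengths vary widely.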

### Define CNN-based text classification model (8 points)

In [7]:
class CNN(nn.Module):
    def __init__(self, V, D, kernel_sizes, dropout=0.5):
        # NOTE: `dropout` is accepted but not used by any layer below.
        super(CNN, self).__init__()
        self.emb = nn.Embedding(V, D)
        # Stacked convolutions: block i maps D // (i + 1) channels to D // (i + 2).
        self.convs = nn.Sequential(
            *[nn.Sequential(
                nn.Conv1d(D // (i + 1), D // (i + 2), kernel_size),
                nn.ReLU(),
            ) for i, kernel_size in enumerate(kernel_sizes)],
            nn.AdaptiveAvgPool1d(1)  # average over the time dimension
        )
        self.fc = nn.Linear(D // (len(kernel_sizes) + 1), 1)

    def forward(self, x):
        embedded = self.emb(x)                # (batch, seq_len, D)
        embedded = embedded.permute(0, 2, 1)  # (batch, D, seq_len), as Conv1d expects
        conv_out = self.convs(embedded).squeeze(-1)  # (batch, channels)
        logits = self.fc(conv_out)            # raw scores, no sigmoid applied
        return logits
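
For comparison, the architecture described in Kim's paper applies several filter widths in parallel to the same embeddings, takes the maximum of each feature map over time, concatenates the results, and applies dropout before the final linear layer. The block below is a minimal sketch of that idea, not the model used in this notebook; `KimCNN` and `n_filters` are hypothetical names, and the toy sizes at the bottom are chosen only so the example runs quickly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KimCNN(nn.Module):
    """Parallel convolutions with max-over-time pooling (illustrative sketch)."""

    def __init__(self, vocab_size, emb_dim, kernel_sizes, n_filters=100, dropout=0.5):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # One convolution per filter width, all reading the same embeddings.
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), 1)

    def forward(self, x):
        # x: (batch, seq_len) token ids -> (batch, emb_dim, seq_len)
        embedded = self.emb(x).permute(0, 2, 1)
        # Max-over-time pooling keeps the strongest response of every filter.
        pooled = [F.relu(conv(embedded)).max(dim=2)[0] for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(features)  # raw logits, suitable for BCEWithLogitsLoss


sketch = KimCNN(vocab_size=50, emb_dim=16, kernel_sizes=[3, 4, 5], n_filters=8)
tokens = torch.randint(0, 50, (2, 12))
print(sketch(tokens).shape)  # torch.Size([2, 1])
```

Because every branch sees the full embedding, each filter width contributes its own n-gram features, and the `dropout` argument is actually used.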

In [8]:
kernel_sizes = [3,4,5]
vocab_size = len(TEXT.vocab)
dropout = 0.5
dim = 300

model = CNN(vocab_size, dim, kernel_sizes, dropout)

In [9]:
model.cuda()

CNN(
  (emb): Embedding(201569, 300)
  (convs): Sequential(
    (0): Sequential(
      (0): Conv1d(300, 150, kernel_size=(3,), stride=(1,))
      (1): ReLU()
    )
    (1): Sequential(
      (0): Conv1d(150, 100, kernel_size=(4,), stride=(1,))
      (1): ReLU()
    )
    (2): Sequential(
      (0): Conv1d(100, 75, kernel_size=(5,), stride=(1,))
      (1): ReLU()
    )
    (3): AdaptiveAvgPool1d(output_size=1)
  )
  (fc): Linear(in_features=75, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)
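
Because the three convolutions are stacked without padding, each one shortens the sequence by `kernel_size - 1` positions. The short sketch below works through that arithmetic; `conv_out_len` is a hypothetical helper, and it assumes stride 1 and no padding, as in the printout above.

```python
def conv_out_len(seq_len, kernel_sizes):
    """Sequence length after stacked, unpadded, stride-1 1-D convolutions."""
    for k in kernel_sizes:
        seq_len = seq_len - k + 1
    return seq_len


kernel_sizes = [3, 4, 5]
print(conv_out_len(100, kernel_sizes))  # 91

# Shortest input that still leaves one position after the last convolution.
min_len = sum(k - 1 for k in kernel_sizes) + 1
print(min_len)  # 10
```

A batch whose padded length is below 10 tokens would leave the last convolution with too few positions. IMDB reviews are usually much longer than that, but the limit is worth keeping in mind when experimenting with larger kernels.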

### The training loop (3 points)

Define the optimizer and the loss function.

In [10]:
opt = optim.Adam(model.parameters(), lr=3e-4)
loss_func = nn.BCEWithLogitsLoss()
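
`nn.BCEWithLogitsLoss` combines the sigmoid and the binary cross-entropy in one numerically stable operation, which is why the model can return raw logits. The following plain-Python sketch shows the per-example computation; it is an illustration of the formula rather than PyTorch's actual implementation, and `bce_naive` and `bce_with_logits` are hypothetical names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_naive(z, y):
    """Sigmoid followed by binary cross-entropy."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bce_with_logits(z, y):
    """Algebraically the same loss, written so that it never evaluates log(0)."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


# The two forms agree on moderate logits.
for z, y in [(2.0, 1.0), (-1.5, 0.0), (0.3, 0.0)]:
    assert abs(bce_naive(z, y) - bce_with_logits(z, y)) < 1e-9

print(bce_with_logits(0.0, 1.0))  # log(2), about 0.6931
```

A logit of 0 corresponds to a predicted probability of 0.5, so an untrained model is expected to start near a loss of log 2.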

Think carefully about the stopping criteria. 
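
One common stopping criterion is early stopping on the validation loss. The sketch below is one possible way to express it, not a prescribed solution; `EarlyStopping`, `patience`, and `min_delta` are hypothetical names, and the example losses are the validation losses from the training log of this notebook, rounded to four decimals.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch and return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


val_losses = [0.0083, 0.0068, 0.0148, 0.0066]
stopper = EarlyStopping(patience=1)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at)  # 3
```

With `patience=1` the loop stops at epoch 3 and never reaches epoch 4, which has the lowest validation loss in this run; with `patience=2` it would continue. In practice this kind of check is usually paired with saving the best model so far.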

In [11]:
epochs = 4

In [12]:
%%time
for epoch in range(1, epochs + 1):
    running_loss = 0.0
    model.train()
    for batch in train_iter:

        x = batch.text
        y = batch.label.view(-1, 1).type(torch.float)

        opt.zero_grad()
        preds = model(x)
        loss = loss_func(preds, y)
        loss.backward()
        opt.step()
        running_loss += loss.item()

    # loss_func returns a per-batch mean, so dividing the sum by the number of
    # examples gives roughly the mean loss divided by the batch size.
    epoch_loss = running_loss / len(trn)

    val_loss = 0.0
    model.eval()
    with torch.no_grad():  # no gradients are needed for validation
        for batch in val_iter:

            x = batch.text
            y = batch.label.view(-1, 1).type(torch.float)

            preds = model(x)
            loss = loss_func(preds, y)
            val_loss += loss.item()

    val_loss /= len(vld)

    print('Epoch: {}, Training Loss: {}, Validation Loss: {}'.format(epoch, epoch_loss, val_loss))

Epoch: 1, Training Loss: 0.009209398308822087, Validation Loss: 0.008331149073441823
Epoch: 2, Training Loss: 0.006368021609101977, Validation Loss: 0.006764837757746379
Epoch: 3, Training Loss: 0.00552402412210192, Validation Loss: 0.014815388214588165
Epoch: 4, Training Loss: 0.003656219011545181, Validation Loss: 0.006566977659861247
CPU times: user 30.3 s, sys: 78.7 ms, total: 30.4 s
Wall time: 30.6 s


### Calculate performance of the trained model (2 points)

In [13]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = np.zeros(len(tst))
y_pred = np.zeros(len(tst))

model.eval()

with torch.no_grad():
    for i, batch in enumerate(test_iter):
        x = batch.text
        y = batch.label
        # NOTE: the model returns logits, and torch.sigmoid is the usual way to
        # turn them into probabilities. torch.exp is what produced the metrics
        # recorded below: exp(z) > 0.5 is the same as z > log(0.5), not z > 0.
        y_batch_pred = torch.exp(model(x))
        # The slice width must match the batch size of test_iter (64).
        y_true[i * 64 : (i + 1) * 64] = y.cpu().numpy()
        y_pred[i * 64 : (i + 1) * 64] = y_batch_pred.cpu().numpy().flatten() > 0.5

print(f'accuracy: {round(accuracy_score(y_true, y_pred), 3)}')
print(f'precision: {round(precision_score(y_true, y_pred), 3)}')
print(f'recall: {round(recall_score(y_true, y_pred), 3)}')
print(f'f1: {round(f1_score(y_true, y_pred), 3)}')

accuracy: 0.826
precision: 0.788
recall: 0.894
f1: 0.837
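
Since the model is trained with `BCEWithLogitsLoss`, its outputs are logits, and a probability threshold of 0.5 corresponds to a logit threshold of 0. The evaluation cell uses `torch.exp` instead of a sigmoid, which amounts to a threshold of log 0.5 (about -0.693) and may explain why recall is higher than precision. The sketch below shows the threshold equivalence and computes the four metrics by hand on made-up toy data; `binary_metrics` is a hypothetical helper, and the numbers are not IMDB results.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels, from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


logits = [2.1, -0.4, 0.3, -1.7, 0.9, -0.2]  # toy values
labels = [1, 0, 0, 0, 1, 1]

# Thresholding the logit at 0 is the same as thresholding sigmoid(logit) at 0.5.
preds = [1 if z > 0 else 0 for z in logits]
assert preds == [1 if 1.0 / (1.0 + math.exp(-z)) > 0.5 else 0 for z in logits]

accuracy, precision, recall, f1 = binary_metrics(labels, preds)
print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
# 0.667 0.667 0.667 0.667
```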


Write down the calculated performance

### Accuracy: 0.826
### Precision: 0.788
### Recall: 0.894
### F1: 0.837

### Experiments (5 points)

Experiment with the model to achieve better results. Implement and describe your experiments in detail, and mention what was helpful.

### 1. ?
### 2. ?
### 3. ?