# RTML Final 2021

In this exam, we'll have some practical exercises using RNNs and some short answer questions regarding the Transformer/attention
and reinforcement learning.

Consider the AGNews text classification dataset:

In [60]:
from torchtext.datasets import AG_NEWS
from torchtext.data.utils import get_tokenizer
from collections import Counter
from torchtext.vocab import Vocab
import pandas as pd
import string
import re

#train_iter = pd.read_csv("./data/.data/train.csv")
#train_iter['summary'] = train_iter['Title'] + ' ' + train_iter['Description']
train_iter = AG_NEWS(root='.data',split='train')
tokenizer = get_tokenizer('basic_english')
counter = Counter()


def clean(line):
    line = line.replace('\\', ' ')
    line = re.sub('(\s+)(a|an|and|the)(\s+)', '\1\3', line)
    line = re.sub('[%s]' % re.escape(string.punctuation), '', line)

    return line

labels = {}
for (label, line) in train_iter:
    if label in labels:
        labels[label] += 1
    else:
        labels[label] = 1
    counter.update(tokenizer(clean(line)))

vocab = Vocab(counter, min_freq=0, max_size=1000)

print('Label frequencies:', labels)
print('A few token frequencies:', vocab.freqs.most_common(5))
print('Label meanings: 1: World news, 2: Sports news, 3: Business news, 4: Sci/Tech news')

Label frequencies: {3: 30000, 4: 30000, 2: 30000, 1: 30000}
A few token frequencies: [('to', 106167), ('of', 71310), ('in', 64953), ('on', 47273), ('for', 36960)]
Label meanings: 1: World news, 2: Sports news, 3: Business news, 4: Sci/Tech news


Here's how we can get a sequence of tokens for a sentence with the cleaner, tokenizer, and vocabulary:

In [54]:
[vocab[token] for token in tokenizer(clean('Bangkok, or The Big Mango, is one of the great cities of Asia'))]

[3914, 96, 8, 291, 0, 11, 45, 7438, 1990, 3, 1057]

Let's make pipelines for processing a news story and a label:

In [55]:
text_pipeline = lambda x: [vocab[token] for token in tokenizer(clean(x))]
label_pipeline = lambda x: int(x) - 1

Here's how to create dataloaders for the training and test datasets:

In [56]:
import torch
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def collate_batch(batch):
    label_list, text_list, length_list = [], [], []
    for (_label, _text) in batch:
        label_list.append(label_pipeline(_label))
        processed_text = torch.tensor(text_pipeline(_text), dtype=torch.int64)
        length_list.append(processed_text.shape[0])
        text_list.append(processed_text)
    label_list = torch.tensor(label_list, dtype=torch.int64)
    text_list = pad_sequence(text_list, padding_value=0)
    length_list = torch.tensor(length_list, dtype=torch.int64)
    return label_list.to(device), text_list.to(device), length_list.to(device)

train_iter = AG_NEWS(split='train')
train_dataset = list(train_iter)
test_iter = AG_NEWS(split='test')
test_dataset = list(test_iter)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True, collate_fn=collate_batch)
test_dataloader = DataLoader(test_dataset, batch_size=8, shuffle=False, collate_fn=collate_batch)

Here's how to get a batch from one of these dataloaders. The first entry is a 1D tensor of labels for the batch
(8 values between 0 and 3), then a 2D tensor representing the stories with dimension T x B (number of tokens x batch size). 

In [57]:
batch = next(enumerate(train_dataloader))
print(batch)

(0, (tensor([3, 1, 0, 0, 0, 2, 3, 1], device='cuda:0'), tensor([[   491,    136,   1145,  59356,   5454,   3052,   1388,   9830],
        [  3516,    895,    272,   1124,   1435,      2,   1352, 231994],
        [  4962,   2025,    424,   5895,   1637,    251,   3800,    242],
        [ 50357,   2308,     16,  54052,   1083,   9010,    220,      6],
        [   491,     23,  49994,  13942,    846,      6,    457,     22],
        [    16,   5530,     19,     39,     72,  18237,      6,   2812],
        [213605,  41516,     19,  12709,     72,     60,  96872,   9830],
        [    65,   7562,    744,  39289,   1486,    177,     19, 140479],
        [   201,  40739,    424,     25,     56,     17,     19, 227501],
        [  7628,    540,  72416,  29607,   2217,   3052, 165581,    634],
        [ 50357,  39511,  71440,    843,   1435,     84,    363,      4],
        [ 29398,  11110,   1145,     72,   8047,   1913,    162,   1191],
        [   129,  12040,    272,     72,  57174,     24,

## Question 1, 10 points

The vocabulary currently is too large for a simple one-hot embedding. Let's reduce the vocabulary size
so that we can use one-hot. First, add a step that removes tokens from a list of "stop words" to the `text_pipeline` function.
You probably want to remove punctuation ('.', ',', '-', etc.) and articles ("a", "the").

Once you've removed stop words, modify the vocabulary to include only the most frequent 1000 tokens (including 0 for an unknown/infrequent word).

Write your revised code in the cell below and output the 999 top words with their frequencies:

In [6]:
# Place code for Question 1 here
def clean(line):
    line = line.replace('\\', ' ')
    #Add for question one
    line = re.sub('(\s+)(a|an|and|the)(\s+)', '\1\3', line)
    line = re.sub('[%s]' % re.escape(string.punctuation), '', line)
    return line



## Question 2, 30 points

Next, let's build a simple RNN for classification of the AGNews dataset. Use a one-hot embedding of the vocabulary
entries and the basic RNN from Lab 10. Use the lengths tensor (the third element in the batch returned by the dataloaders)
to determine which output to apply the loss to.

Place your training code below, and plot the training and test accuracy as a
function of epoch. Finally, output a confusion matrix for the test set.

*Do not spend a lot of time on the training! A few minutes is enough. The point is to show that the model is
learning, not to get the best possible performance.*

In [103]:
# Place code for Question 2 here
import torch
import torch.nn as nn
import torch.optim as optim
num_class = len(set([label for (label, text) in train_iter]))
vocab_size = len(vocab)
emsize = 64
n_hidden = 128 
n_tag = 4

class RNN(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_class):
        super(RNN, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, sparse=True)
        self.fc = nn.Linear(embed_dim, num_class)
        self.init_weights()

    def init_weights(self):
        initrange = 0.5
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.fc.weight.data.uniform_(-initrange, initrange)
        self.fc.bias.data.zero_()

    def forward(self, text, offsets):
        embedded = self.embedding(text, offsets)
        return self.fc(embedded)
#train
net = RNN(vocab_size,emsize,num_class).to(device)
criterion = nn.NLLLoss()
opt = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

    


In [104]:
import time

def train(dataloader):
    net.train()
    total_acc, total_count = 0, 0
    log_interval = 500
    start_time = time.time()

    for idx, (label, text, offsets) in enumerate(dataloader):
        opt.zero_grad()
        predited_label = net(text, offsets)
        loss = criterion(predited_label, label)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
        opt.step()
        total_acc += (predited_label.argmax(1) == label).sum().item()
        total_count += label.size(0)
        if idx % log_interval == 0 and idx > 0:
            elapsed = time.time() - start_time
            print('| epoch {:3d} | {:5d}/{:5d} batches '
                  '| accuracy {:8.3f}'.format(epoch, idx, len(dataloader),
                                              total_acc/total_count))
            total_acc, total_count = 0, 0
            start_time = time.time()

def evaluate(dataloader):
    net.eval()
    total_acc, total_count = 0, 0

    with torch.no_grad():
        for idx, (label, text, offsets) in enumerate(dataloader):
            predited_label = net(text, offsets)
            loss = criterion(predited_label, label)
            total_acc += (predited_label.argmax(1) == label).sum().item()
            total_count += label.size(0)
    return total_acc/total_count

In [105]:
EPOCHS = 10 # epoch
LR = 5  # learning rate
BATCH_SIZE = 64 # batch size for training

for epoch in range(1, EPOCHS + 1):
    epoch_start_time = time.time()
    train(train_dataloader)
    accu_val = evaluate(valid_dataloader)
    if total_accu is not None and total_accu > accu_val:
      scheduler.step()
    else:
       total_accu = accu_val
    print('-' * 59)
    print('| end of epoch {:3d} | time: {:5.2f}s | '
          'valid accuracy {:8.3f} '.format(epoch,
                                           time.time() - epoch_start_time,
                                           accu_val))
    print('-' * 59)

TypeError: forward() takes 2 positional arguments but 3 were given

## Question 3, 10 points

Next, replace the SRNN from Question 2 with a single-layer LSTM. Give the same output (training and testing accuracy as a function of epoch, as well as confusion
matrix for the test set). Comment on the differences you observe between the two models.

In [8]:
# Place code for Question 3 here

## Question 4, 10 points

Explain how you could use the Transformer model to perform the same task you explored in Questions 2 and 3.
How would attention be useful for this text classification task? Give a precise and detailed answer. Be sure to discuss what
parts of the original Transformer you would use and what you would have to remove.

Answer:
Transformer model can use a sequence of text as a input of model. This model can do as parallelization of sequence. For this task, there are many tag or many word in sentence. self-Attention machanism could be help for AgNews dataset which it try to looks at an input sequence and decides at each step which other parts of the sequence are important. To modify code for self-Attention, we need to separate part into docoder and encoder,and add more linear layer as a V, Q, K (multihead attention) after that we use same softmax function for output.

## Question 5, 10 points

In Lab 13, you implemented a DQN model for tic-tac-toe. You method learned to play against a fairly dumb `expert_action` opponent, however.  Also,
DQN has proven to be less stable than other methods such as Double DQN, also discussed in Lab 13.

Explain below how you would apply double DQN and self-play to improve your tic-tac-toe agent.
Provide pseudocode for the algorithm below.

Answer:
Double DQN is part of RL model which contain 2 neural network model.First, model learn during the experience play as same as DQN. Second, copy last episode of the first model to compare for Q-value. If values of the second model are lower that the main model, we use second model to attain Q-value. Sometime DQN overestimate the reward so double DQN decoupling the actions selection from the action evaluation. To apply double DQN for tic-tac-toe, the input is as same as DQN but we need to create 2 neural network. First neural network will decides which one is the best next action and then second network evaluates this action to know Q-value.




In [None]:
#Pseudo code
def select_greedy_actions(states: torch.Tensor, q_network: nn.Module) -> torch.Tensor:
    
    _, actions = q_network(states).max(dim=1, keepdim=True)
    return actions


def evaluate_selected_actions(states: torch.Tensor,
                              actions: torch.Tensor,
                              rewards: torch.Tensor,
                              dones: torch.Tensor,
                              gamma: float,
                              q_network: nn.Module) -> torch.Tensor:
    next_q_values = q_network(states).gather(dim=1, index=actions)        
    q_values = rewards + (gamma * next_q_values * (1 - dones))
    return q_values
def double_q_learning_update(states: torch.Tensor,
                             rewards: torch.Tensor,
                             dones: torch.Tensor,
                             gamma: float,
                             q_network_1: nn.Module,
                             q_network_2: nn.Module) -> torch.Tensor:
  
    actions = select_greedy_actions(states, q_network_1)
    q_values = evaluate_selected_actions(states, actions, rewards, dones, gamma, q_network_2)
    return q_values

#Pseudo code

q_network1 is first neural network

q_network2 is secone neural network



Select_greedy_actions(state,q_network)

        Selection action for q_network

evaluate_action(state,action,reward,done,q_network)

        findding a next q value from q_network
                
        evaluate q value

DDQN_update(state,reward,done,q_network1,q_network2)

        selection from Select_greedy_actions by using state and q q_network1

        evaluate action from evaluate_action which using q_network2 which return q value from seconde neural network



## Question 6, 30 points

Based on your existing DQN implementation, implement the double DQN and self-play training method
you just described. After some training (don't spend too much time on training -- again, we just want to see that the model can
learn), show the result you playing a game against your learned agent.

In [9]:
# Code for training and playing goes here