# COLX 563 Lab Assignment 4: Slot filling
## Assignment Objectives

In this lab, you will build an end-to-end system for basic (binary) intent recognition and slot filling in the context of a dialogue system. It is a team assignment, and you have nearly complete freedom with regards to your solution, with a few restrictions mentioned below. For this lab, you will work with your capstone team.

## Getting Started

Add imports below.

In [1]:
! pip install transformers
! pip install sentencepiece



In [2]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from sklearn.metrics import accuracy_score
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from tqdm import tqdm, trange
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score, classification_report, confusion_matrix
from transformers import *
import pandas as pd
from itertools import cycle
from collections import defaultdict
from tqdm import tqdm, trange


In [3]:
#from google.colab import drive
#drive.mount('/content/drive')

In [4]:
manual_seed = 11
torch.manual_seed(manual_seed)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
n_gpu = torch.cuda.device_count()
if n_gpu > 0:
    torch.cuda.manual_seed(manual_seed)

print(torch.cuda.get_device_name(0))

cuda
Tesla V100-SXM2-16GB


For this lab, you'll be working with the MultiWOZ dataset of goal-oriented dialogues (2.2). You can look at the full corpus [here](https://github.com/budzianowski/multiwoz/tree/master/data/MultiWOZ_2.2). It has an impressively detailed annotation involving multiple turns and multiple goals which we have simplified to just the initiating request (first turn) and involving two possible intents and the corresponding slots for those intents. Download the data from [github](https://github.ubc.ca/jungyeul/COLX_563_adv-semantics_lab_students/raw/master/Multiwoz.zip), unzip it into a directory outside of your lab repo and change the path below.

## Set up directories and uploead data. Ignore this cell if you are using data in the drive

In [5]:
import os
woz_directory ="../content/data/Multiwoz/"
data_directory = "../content/data/"
if os.path.exists(woz_directory) == False:
    os.makedirs(woz_directory)
if os.path.exists(data_directory) == False:
    os.makedirs(data_directory)

# train, dev, test data dir
train_path = data_directory + "train/"
if os.path.exists(train_path) == False:
    os.makedirs(train_path)
dev_path = data_directory + "dev/"
if os.path.exists(dev_path) == False:
    os.makedirs(dev_path)
test_path = data_directory + "test/"
if os.path.exists(test_path) == False:
    os.makedirs(test_path)

# output dir
output_dir = data_directory + 'mtl-rb/'
if os.path.exists(output_dir) == False:
    os.makedirs(output_dir)

## Tidy Submission
rubric={mechanics:1}

To get the marks for tidy submission:
- Submit the assignment by filling in this Jupyter notebook with your answers embedded
- Be sure to follow the instructions

## Inspecting the data

Let's look at corresponding pairs of utterances and answers from the training portion of our corpus

In [6]:
count = 0
with open(woz_directory + "WOZ_train_utt.txt") as f1:
    with open(woz_directory + "WOZ_train_ans.txt") as f2:
        while count < 20:
            print(f1.readline().strip())
            print(f2.readline().strip())
            print("------")
            count += 1

Guten Tag, I am staying overnight in Cambridge and need a place to sleep. I need free parking and internet.
find_hotel|hotel-area=centre|hotel-internet=yes|hotel-parking=yes
------
Hi there! Can you give me some info on Cityroomz?
find_hotel|hotel-name=cityroomz
------
I am looking for a hotel named alyesbray lodge guest house.
find_hotel|hotel-name=alyesbray lodge guest house
------
I am looking for a restaurant. I would like something cheap that has Chinese food.
find_restaurant|restaurant-food=chinese|restaurant-pricerange=cheap
------
I'm looking for an expensive restaurant in the centre if you could help me.
find_restaurant|restaurant-area=centre|restaurant-pricerange=expensive
------
I'm looking for a places to go and see during my upcoming trip to Cambridge.
find_hotel
------
Yeah, could you recommend a good gastropub?
find_restaurant|restaurant-food=gastropub
------
I want to find an expensive restaurant and serves european food. Can i also have the address, phone number and it

The utterances consists of a request for information about either hotels or restaurants. The first part of the answer starts with the intent (either find_restaurant or find_hotel) and then lists the slots that have been filled in based on the utterance. Your goal is to generate this string of intents and slots based purely on the utterance. A few things to note:

* Not all slots are filled in, and sometimes there are no slots filled in at all (but there is always an intent).
* There are a fixed number of slots for each intent, and they always appear in a particular order, when they are filled in
* The slot values sometimes but do not always correspond to what appears in the utterance. For example, a mention of wanting wifi in the request becomes hotel-internet=yes.

We will be evaluating based on exact duplication of the entire output string, so before you start coding a solution, you should look carefully at examples in the training set and make sure you understand all the different components of the output, and how they related to the input utterance. In particular, you should identify the various constituent parts of the task, and judge which are likely to be easy, and which are likely to be more difficult.

In [7]:
## Preprocessing

## Preprocessing: extract sub-aspects and generate tsv file for each task

In [8]:
def get_aspects(aspects):
    result = []
    for aspect in aspects:
        result.append(aspect.split("=")[0])
    return result   


def get_X(utt_file):
    X = []
    with open(woz_directory + utt_file) as f1:
        for line in f1:
            line = line.strip()
            X.append(line)
    return X


def get_full_aspect_set(ans_file):
    full_aspect_set = set()
    y = []
    with open(woz_directory + ans_file) as f2:
        for line in f2:
            line = line.strip()
            line_lst = line.split('|')
            aspects = get_aspects(line_lst[1:])
            full_aspect_set.update(aspects)
            y.append(aspects)
    return y, full_aspect_set


def get_sub_aspect_set(full_aspect_set):
    sub_aspect_set = set()
    for aspect in full_aspect_set:
        sub_aspect_set.add(aspect.split("-")[1])
    return sub_aspect_set


def get_sub_y(y, sub_aspect_lst):
    result = []
    for sub_aspect in sub_aspect_lst:
        tmp = []
        for tags in y:
            not_found = True
            for tag in tags:
                if sub_aspect in tag:
                    tmp.append(sub_aspect)
                    not_found = False
                    break
            if not_found: 
                tmp.append("not")
        result.append(tmp)
    return result


def write_tsv(utt_file, ans_file="", split_type = "train"):
    X = get_X(utt_file)
    print(len(X))
    # write contexts file
    with open(f'./data/{split_type}/contents.txt', 'w') as file:
        for context in X:
            file.write('%s\n' % context)
    if split_type!="test":
        y, full_aspect_set = get_full_aspect_set(ans_file)
        sub_aspect_set = get_sub_aspect_set(full_aspect_set)
        sub_aspect_lst = sorted(list(sub_aspect_set))
        print(sub_aspect_lst)
        all_sub_y = get_sub_y(y, sub_aspect_lst)
        
        for i, y_sub_aspect in enumerate(all_sub_y):
            # write tag files
            with open(f'./data/{split_type}/{sub_aspect_lst[i]}.txt', 'w') as file:
                for tag in y_sub_aspect:
                    file.write('%s\n' % tag)
            # write tsv files
            with open(f"./data/{split_type}/content_{sub_aspect_lst[i]}.tsv", "w") as fout:
                fout.write("content\tlabel\n")
                for content, tag in zip(X, y_sub_aspect):
                    fout.write(content + "\t" + tag + "\n")
    
        return X, all_sub_y
    else:
        with open(f"./data/{split_type}/content.tsv", "w") as fout:
            fout.write("content\tlabel\n")
            for content in X:
                fout.write(content + "\t" + "not" + "\n")

        return X

In [9]:
write_tsv("WOZ_train_utt.txt", "WOZ_train_ans.txt")
write_tsv("WOZ_dev_utt.txt", "WOZ_dev_ans.txt", "dev")
write_tsv("WOZ_test_utt.txt", "", "test");

3760
['area', 'food', 'internet', 'name', 'parking', 'pricerange', 'stars', 'type']
413
['area', 'food', 'internet', 'name', 'parking', 'pricerange', 'stars', 'type']
400


## Dataset, Encoder, Dataloader

In [10]:
class CustomDataset(Dataset):
    # initialization
    def __init__(self, dataframe, tokenizer, max_len, lab2ind):
        """
          dataframe: pandas DataFrame.
          tokenizer: Hugginfance BERT/RoBERTa tokenizer
          max_len: maximal length of input sequence
          lab2ind: dictionary of label classes
        """
        self.tokenizer = tokenizer
        self.data = dataframe
        self.comment_text = self.data.content
        self.labels = self.data.label
        self.max_len = max_len
        self.lab2ind = lab2ind

    # get the size of the dataset
    def __len__(self):
        return len(self.comment_text)

    # generate sample by index
    def __getitem__(self, index):
        # get ith sample and label
        comment_text = str(self.comment_text[index])
        label = str(self.labels[index])

        label = self.lab2ind[label]
        # use encode_plus() of Transformers to tokenize and vectorize input seuqnce and covert it to tensors. 
        # this method truncate or pad sequence to the maximal length and then return pytorch tensors. 
        inputs = self.tokenizer.encode_plus(
            comment_text,
            None,
            add_special_tokens=True,
            padding="max_length",
            truncation=True,
            max_length=self.max_len,
            return_token_type_ids=False,
            return_tensors = "pt"
        )
        return {
            'ids': inputs['input_ids'].squeeze(0),  # shape of input_ids: [1, max_length]
            'masks': inputs['attention_mask'].squeeze(0), # shape of attention_mask: [1, max_length]
            'targets': torch.tensor(label, dtype=torch.long)
        }

In [11]:
def regular_encode(file_path, tokenizer, lab2ind, shuffle=True, num_workers = 2, batch_size=64, maxlen = 32, mode = 'train'): 
    '''
      file_path: path to your dataset file
      tokenizer: tokenizer method
      lab2ind: label-to-index dictionary
      shuffle: shuffle the dataset or not
      num_workers: a number of data processors
      batch_size: the number of batch size
      maxlen: maximal sequence length
      mode: the type of dataset
    '''
    # if we are in train mode, we will load two columns (i.e., text and label).
    if mode == 'train':
        # Use pandas to load dataset, the dataset should be a tsv file where the first line is the header.
        df = pd.read_csv(file_path, delimiter='\t',header=0, names=['content','label'], encoding='utf-8', quotechar=None, quoting=3)
    
    # if we are in predict mode, we will load one column (i.e., text).
    elif mode == 'predict':
        df = pd.read_csv(file_path, delimiter='\t',header=0, names=['content', 'label'])
        
    else:
        print("the type of mode should be either 'train' or 'predict'. ")
        return
        
    print("{} Dataset: {}".format(file_path, df.shape))
    # instantiate the dataset instance 
    custom_set = CustomDataset(df, tokenizer, maxlen,lab2ind)
    num_samples = len(custom_set)
    num_labels = len(lab2ind)
    dataset_params = {'batch_size': batch_size, 'shuffle': shuffle, 'num_workers': num_workers}

    batch_data_loader = DataLoader(custom_set, **dataset_params)
    # return a data iterator
    return batch_data_loader, num_samples, num_labels

In [12]:
model_name_path = "roberta-base"

max_seq_length = 64
train_batch_size = 32
eval_batch_size = 128
hidden_size = 768

lr = 2e-5
max_grad_norm = 1.0
warmup_proportion = 0.1
num_train_epochs = 5

In [13]:
tokenizer = RobertaTokenizerFast.from_pretrained(model_name_path)

In [14]:
task_names = ['area', 'food', 'internet', 'name', 'parking', 'pricerange', 'stars', 'type']
all_lab2ind = [{'area': 0, 'not': 1},{'food': 0, 'not': 1},
               {'internet': 0, 'not': 1},{'name': 0, 'not': 1},
               {'parking': 0, 'not': 1},{'pricerange': 0, 'not': 1},
               {'stars': 0, 'not': 1},{'type': 0, 'not': 1}]

In [15]:
train_loaders = []
valid_loaders = []
test_loaders = []
data_sizes = []
total_training_batch = 0
for i, task in enumerate(task_names):
    lab2ind = all_lab2ind[i]
    ##############################
    train_loader, num_samples, num_label  = regular_encode(os.path.join(train_path, f"content_{task}.tsv"), tokenizer, lab2ind, shuffle=True, batch_size=train_batch_size, maxlen = max_seq_length)
    
    data_sizes.append(num_samples)
    total_training_batch += len(train_loader)
    train_loaders.append(iter(train_loader))
    
    ##############################
    valid_loader, _, _  = regular_encode(os.path.join(dev_path, f"content_{task}.tsv"), tokenizer, lab2ind, shuffle=False, batch_size=eval_batch_size, maxlen = max_seq_length)
    valid_loaders.append(valid_loader)
    
    ###############################
    test_loader, _, _  = regular_encode(os.path.join(test_path, "content.tsv"), tokenizer, lab2ind, shuffle=False, batch_size=eval_batch_size, maxlen = max_seq_length, mode="predict")
    test_loaders.append(test_loader)

../content/data/train/content_area.tsv Dataset: (3760, 2)
../content/data/dev/content_area.tsv Dataset: (413, 2)
../content/data/test/content.tsv Dataset: (400, 2)
../content/data/train/content_food.tsv Dataset: (3760, 2)
../content/data/dev/content_food.tsv Dataset: (413, 2)
../content/data/test/content.tsv Dataset: (400, 2)
../content/data/train/content_internet.tsv Dataset: (3760, 2)
../content/data/dev/content_internet.tsv Dataset: (413, 2)
../content/data/test/content.tsv Dataset: (400, 2)
../content/data/train/content_name.tsv Dataset: (3760, 2)
../content/data/dev/content_name.tsv Dataset: (413, 2)
../content/data/test/content.tsv Dataset: (400, 2)
../content/data/train/content_parking.tsv Dataset: (3760, 2)
../content/data/dev/content_parking.tsv Dataset: (413, 2)
../content/data/test/content.tsv Dataset: (400, 2)
../content/data/train/content_pricerange.tsv Dataset: (3760, 2)
../content/data/dev/content_pricerange.tsv Dataset: (413, 2)
../content/data/test/content.tsv Dataset:

## CLS layer, train, evaluation, optimizer, scheduler

In [16]:
class CLS_LAYER(nn.Module):
    def __init__(self, label_num, hidden_size):
        super(CLS_LAYER, self).__init__()
        self.hidden_size = hidden_size
        self.label_num = label_num
        
        self.dense = nn.Linear(self.hidden_size, self.hidden_size)
        self.dropout = nn.Dropout(0.1)

        # the output dimention is the number of classes in the task. 
        self.fc = nn.Linear(self.hidden_size, self.label_num)
        # initialization
        initial_module(self.dense)
        initial_module(self.fc)
  
    def forward(self, pooler_output):
        
        x = self.dense(pooler_output)
        x = torch.tanh(x)
        x = self.dropout(x)
        logits = self.fc(x)

        return logits

In [17]:
def initial_module(module):
    torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)
    torch.nn.init.constant_(module.bias, 0)

In [18]:
class MT_BERT(nn.Module):
    def __init__(self, model_name_path, classifier_layers):
        super(MT_BERT, self).__init__()

        self.bert_model = RobertaModel.from_pretrained(model_name_path)
        self.classifiers = nn.ModuleList(classifier_layers)

    def forward(self, input_ids, input_mask, task_id):
        outputs = self.bert_model(input_ids = input_ids, attention_mask = input_mask)
        pooler_output = outputs['pooler_output']
        
        # select classification module according to the task index
        logits = self.classifiers[task_id](pooler_output)
        
        return logits

In [19]:
def create_model(model_name_path, label_list, hidden_size):
    # create a classification module for each task
    classification_layers = [CLS_LAYER(len(task_label2ind), hidden_size) for task_label2ind in label_list]

    model = MT_BERT(model_name_path, classification_layers)
    return model

In [20]:
print("total number of training batches:", total_training_batch)
train_loaders = [cycle(it) for it in train_loaders]

total number of training batches: 944


In [21]:
def train(model, optimizer, scheduler, loss_func, data_sizes, num_per_epoch, train_loaders):
    '''
    model: multi-task model
    optimizer: AdamW optimizer
    scheduler: learning rate scheduler
    loss_func: loss funtion
    data_sizes: a list of sizes of training sets
    num_per_epoch: training steps of each epoch
    train_loaders: a list of training dataloaders
    '''
    model.train()

    # record training losses of all the tasks
    tr_loss = [0. for i in range(len(data_sizes))]

    # At each step, we sample a training dataloader to generate a batch. 
    # The sampling probability is based on the size of training set of each task. 
    total_sample = sum(data_sizes)
    probs = [p/total_sample for p in data_sizes]

    task_id = 0
    epoch = 0

    for step in range(num_per_epoch):
        # Select a training dataloader by the sampling probability. 
        task_id = np.random.choice(int(len(data_sizes)), p=probs)

        # Generate batch of selected task.
        batch = next(train_loaders[task_id])
        
         # load data batch
        input_ids = batch['ids'].to(device)
        input_mask = batch['masks'].to(device)
        labels = batch['targets'].to(device)
        
        # forward
        outputs = model(input_ids, input_mask, task_id)
        loss = loss_func(outputs, labels)

        # delete used variables to free GPU memory
        del batch, input_ids, input_mask, labels
        optimizer.zero_grad()
            
        loss.backward()

        optimizer.step()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm) 
        scheduler.step()
    
        # free GPU memory
        if device == 'cuda':
            torch.cuda.empty_cache()

    return tr_loss

In [22]:
def evaluate(model, iterator, loss_func, task_id):
    
    model.eval()
    
    epoch_loss = 0
    all_pred=[]
    all_label = []
    
    with torch.no_grad():
        
        for i, batch in enumerate(iterator):

            input_ids = batch['ids'].to(device)
            input_mask = batch['masks'].to(device)
            labels = batch['targets'].to(device)

            outputs = model(input_ids, input_mask, task_id)

            loss = loss_func(outputs, labels)
            # delete used variables to free GPU memory
            del batch, input_ids, input_mask

            # identify the predicted class for each example in the batch
            probabilities, predicted = torch.max(outputs.cpu().data, 1)
            # put all the true labels and predictions to two lists
            #print(predicted)
            #print(labels)
            all_pred.extend(predicted)
            all_label.extend(labels.cpu())
    
    accuracy = accuracy_score(all_label, all_pred)
    f1score = f1_score(all_label, all_pred, average='macro') 

    return epoch_loss / len(iterator), accuracy, f1score

In [23]:
def create_optimizer_and_scheduler(model, num_training_steps, warmup_steps, learning_rate):
    """
    Setup the optimizer and the learning rate scheduler.
    num_training_steps: the number of training steps
    warmup_steps: the number of warm-up steps
    learning_rate: the peak learning rate
    """
    optimizer = AdamW(
    model.parameters(),
    lr=learning_rate
    )
    
    lr_scheduler = get_linear_schedule_with_warmup(
        optimizer, 
        num_warmup_steps=warmup_steps, 
        num_training_steps=num_training_steps
    )

    return optimizer, lr_scheduler

In [24]:
model = create_model(model_name_path, all_lab2ind, hidden_size).to(device)

In [25]:
num_training_steps  = total_training_batch * num_train_epochs
num_warmup_steps = num_training_steps * warmup_proportion

In [26]:
optimizer, scheduler = create_optimizer_and_scheduler(model, num_training_steps, num_warmup_steps, lr)
#loss_func = nn.NLLLoss()
loss_func = nn.CrossEntropyLoss()

## Train the model for each sub aspect 

In [27]:
all_result_acc_dev = defaultdict(list)
all_result_loss_dev = defaultdict(list)
all_result_f1_dev = defaultdict(list)

if os.path.isdir(output_dir) == False:
    os.mkdir(output_dir)

for epoch in trange(num_train_epochs, desc="Epoch"):
    text_file = open(os.path.join(output_dir,"results.txt"), "a")
    _ = train(model, optimizer, scheduler, loss_func, data_sizes, total_training_batch, train_loaders)  
    
    # Evaluate at end of each epoch and save the evaluation results to a txt file.
    text_file.write(' Epoch [{}/{}]\n'.format(epoch+1, num_train_epochs))

    for i, task in enumerate(task_names): 
        val_loss, val_acc, val_f1 = evaluate(model, valid_loaders[i], loss_func, i)
        
        all_result_acc_dev[task].append(val_acc)
        all_result_loss_dev[task].append(val_loss)
        all_result_f1_dev[task].append(val_f1)


        text_file.write(' Task {}:\n Validation Accuracy: {:.6f}, Validation F1: {:.6f}\n'.format(task, val_acc, val_f1))
        print(' Task {}:\n Validation Accuracy: {:.6f}, Validation F1: {:.6f}\n'.format(task, val_acc, val_f1))

    text_file.write("\n\n")
    text_file.close()

    final_result = {}
    final_result["all_result_acc_dev"] = all_result_acc_dev
    final_result["all_result_loss_dev"] = all_result_loss_dev
    final_result["all_result_f1_dev"] = all_result_f1_dev

    torch.save(final_result, os.path.join(output_dir, "all_res.pt"))
    
    # Create a model checkpoint at end of each epoch
    if torch.cuda.device_count() <= 1:
        state_dict_model = model.state_dict()
    else:
        state_dict_model = model.module.state_dict()

    state = {
    'epoch': epoch,
    'state_dict': state_dict_model,
    'optimizer': optimizer.state_dict(),
    'scheduler': scheduler.state_dict()
    }
    
    torch.save(state, os.path.join(output_dir,"mt{}_{}.pt".format(len(task_names),str(epoch+1))))

Epoch:   0%|          | 0/5 [00:00<?, ?it/s]

 Task area:
 Validation Accuracy: 0.987893, Validation F1: 0.984895

 Task food:
 Validation Accuracy: 0.992736, Validation F1: 0.990267

 Task internet:
 Validation Accuracy: 0.997579, Validation F1: 0.993443

 Task name:
 Validation Accuracy: 0.990315, Validation F1: 0.987065

 Task parking:
 Validation Accuracy: 0.995157, Validation F1: 0.987021

 Task pricerange:
 Validation Accuracy: 0.990315, Validation F1: 0.988727

 Task stars:
 Validation Accuracy: 1.000000, Validation F1: 1.000000

 Task type:
 Validation Accuracy: 0.924939, Validation F1: 0.847824



Epoch:  20%|██        | 1/5 [02:18<09:12, 138.03s/it]

 Task area:
 Validation Accuracy: 0.990315, Validation F1: 0.987884

 Task food:
 Validation Accuracy: 0.987893, Validation F1: 0.983884

 Task internet:
 Validation Accuracy: 0.997579, Validation F1: 0.993443

 Task name:
 Validation Accuracy: 0.992736, Validation F1: 0.990267

 Task parking:
 Validation Accuracy: 0.997579, Validation F1: 0.993576

 Task pricerange:
 Validation Accuracy: 0.995157, Validation F1: 0.994339

 Task stars:
 Validation Accuracy: 1.000000, Validation F1: 1.000000

 Task type:
 Validation Accuracy: 0.929782, Validation F1: 0.859598



Epoch:  40%|████      | 2/5 [04:31<06:50, 136.75s/it]

 Task area:
 Validation Accuracy: 0.987893, Validation F1: 0.984895

 Task food:
 Validation Accuracy: 0.992736, Validation F1: 0.990267

 Task internet:
 Validation Accuracy: 0.997579, Validation F1: 0.993443

 Task name:
 Validation Accuracy: 0.987893, Validation F1: 0.983562

 Task parking:
 Validation Accuracy: 0.997579, Validation F1: 0.993576

 Task pricerange:
 Validation Accuracy: 0.995157, Validation F1: 0.994339

 Task stars:
 Validation Accuracy: 0.997579, Validation F1: 0.992093

 Task type:
 Validation Accuracy: 0.917676, Validation F1: 0.846716



Epoch:  60%|██████    | 3/5 [06:45<04:31, 135.81s/it]

 Task area:
 Validation Accuracy: 0.987893, Validation F1: 0.984895

 Task food:
 Validation Accuracy: 0.992736, Validation F1: 0.990267

 Task internet:
 Validation Accuracy: 0.997579, Validation F1: 0.993443

 Task name:
 Validation Accuracy: 0.990315, Validation F1: 0.986894

 Task parking:
 Validation Accuracy: 0.995157, Validation F1: 0.987021

 Task pricerange:
 Validation Accuracy: 0.992736, Validation F1: 0.991527

 Task stars:
 Validation Accuracy: 1.000000, Validation F1: 1.000000

 Task type:
 Validation Accuracy: 0.932203, Validation F1: 0.859592



Epoch:  80%|████████  | 4/5 [08:59<02:15, 135.18s/it]

 Task area:
 Validation Accuracy: 0.987893, Validation F1: 0.984895

 Task food:
 Validation Accuracy: 0.992736, Validation F1: 0.990267

 Task internet:
 Validation Accuracy: 0.997579, Validation F1: 0.993443

 Task name:
 Validation Accuracy: 0.990315, Validation F1: 0.986894

 Task parking:
 Validation Accuracy: 0.995157, Validation F1: 0.987021

 Task pricerange:
 Validation Accuracy: 0.995157, Validation F1: 0.994339

 Task stars:
 Validation Accuracy: 1.000000, Validation F1: 1.000000

 Task type:
 Validation Accuracy: 0.920097, Validation F1: 0.848420



Epoch: 100%|██████████| 5/5 [11:12<00:00, 134.44s/it]


## Predict and output sub_aspect_pred.txt

In [29]:
model = create_model(model_name_path, all_lab2ind, hidden_size)

In [30]:
checkpoint = torch.load(output_dir+"mt8_4.pt", map_location='cpu')
model.load_state_dict(checkpoint['state_dict'])
model = model.to(device)

(0.0, 0.9322033898305084, 0.8595920349684313)

In [33]:
# task_names = ['area', 'food', 'internet', 'name', 'parking', 'pricerange', 'stars', 'type']
def predict(model, iterator, task_id):
    
    model.eval()
    
    epoch_loss = 0
    all_pred=[]
    #all_label = []
    
    with torch.no_grad():
        
        for i, batch in enumerate(iterator):

            input_ids = batch['ids'].to(device)
            
            input_mask = batch['masks'].to(device)
            labels = batch['targets'].to(device)

            outputs = model(input_ids, input_mask, task_id)

            #loss = loss_func(outputs, labels)
            # delete used variables to free GPU memory
            del batch, input_ids, input_mask
            
            # identify the predicted class for each example in the batch
            probabilities, predicted = torch.max(outputs.cpu().data, 1)
            # put all the true labels and predictions to two lists
            all_pred.extend(predicted)
            #all_label.extend(labels.cpu())
    
    #accuracy = accuracy_score(all_label, all_pred)
    #f1score = f1_score(all_label, all_pred, average='macro') 

    return all_pred

In [34]:
all_ind2lab = []
for task in all_lab2ind:
    tmp = {}
    for key,value in task.items():
        tmp[value] = key
    all_ind2lab.append(tmp)
all_ind2lab

[{0: 'area', 1: 'not'},
 {0: 'food', 1: 'not'},
 {0: 'internet', 1: 'not'},
 {0: 'name', 1: 'not'},
 {0: 'parking', 1: 'not'},
 {0: 'pricerange', 1: 'not'},
 {0: 'stars', 1: 'not'},
 {0: 'type', 1: 'not'}]

In [35]:
def write_output(output,fn):
    with open(data_directory + f'pred_{fn}.txt', 'w') as file:
        for sub_aspect in output:
            file.write('%s\n' % sub_aspect)

In [36]:
for i, task in enumerate(task_names):
    output = []
    preds = predict(model, test_loaders[i], i)
    for pred in preds:
        output.append(all_ind2lab[i][pred.item()])
    write_output(output, task)

In [37]:
task_names

['area', 'food', 'internet', 'name', 'parking', 'pricerange', 'stars', 'type']

## Combine step1 and step2

In [38]:
constrains = {"find_hotel":["area", "internet", "name", "parking", "pricerange", "stars", "type"], "find_restaurant":["area", "food", "name", "pricerange"]}

In [39]:
result = []
with open(data_directory + "hotel_res.txt", "r") as f:
    lines = f.readlines()
    for line in lines:
      result.append([line.strip()])

for i, task in enumerate(task_names):
    with open(data_directory + f"pred_{task}.txt", "r") as f:
        lines = f.readlines()
        for i, line in enumerate(lines):
            sub_aspect = line.strip()
            domain = result[i][0]
            if sub_aspect in constrains[domain]:
                result[i].append(domain[5:]+"-"+sub_aspect)

In [40]:
with open(data_directory+"domain_aspect_pred.txt", "w") as f:
    f.writelines("%s\n" % i for i in ["|".join(i) for i in result])

## Solution
rubric={accuracy:10,quality:5,efficiency:3}

You will build a system that, when provided with an utterance, predicts the appropriate intent and slots in the format used in the provided answers. This is an open-ended problem and you may solve it however you like, with the following restrictions:

* Your solution should include at least one of token-level prediction models used in Labs 1-3 of this course, i.e. you should make use of a CRF, an LSTM, or a BERT model. You may use multiple models.
* You may use basic NLP tools (tokenizer, POS, parser) and unsupervised resources such as word embeddings, but you should NOT use an existing NER system, or any additional labeled data for this task.
* Your solution should be appropriately decomposed into parts, and documented. This is a complex enough problem that you should have several functions. You may wrap things up into a single class if you like, but you don't have to.
* Use the provided assert to test `dev_predicted`, the output of your complete model on the dev set, you will need to pass the assert to get full accuracy points. 
* Though you may use dev *accuracy* to guide the development of your model, you should not look at either utterances or answers for the dev (or the test) when developing your model. Limit your inspection of the data (e.g. for the purposes of error analysis) to the training set.

Other things to consider:

* You may want to build "standard" (non-sequential) ML classifiers for some aspects of this problem, but you don't have to!
* You may want to use appropriate lexicons. You can build them yourself, or find some.
* Rather than using statistical classifiers, you may want to use rule-based methods to solve some of the problems you're facing.
* You should probably do regular error analysis, some kind of crossvalidation in the training set is a good approach for this, or you can create another (inspectable) internal dev set by splitting up the training set.
* If you're looking for just a little bit more performance, don't forget to tune your hyperparameters!

## Report
rubric={raw:2,reasoning:3,writing:1}

Describe your system, and discuss what your thinking about particular choices and any experiments you tried. Please talk about things you tried but didn't work, or things you thought of doing but didn't. Finally, discuss how each group member contributed to the project. As usual, there is an expectation that every group member will have made some significant contribution to the project. 

## Submit to Kaggle 
rubric={accuracy:2}

Run your system over the test data, and submit the result (in the same format as the train/dev answers) to the Kaggle competition. The competition is hosted [here](https://www.kaggle.com/c/mds-cl-2020-21-colx-563-lab-assignment-4). To get full points, you need to beat the public baseline. Use your capstone partner as your team name please!


## Exercise: Kaggle competition (Optional)
rubric={raw:2}

As a team, compete to get the best result in the task. Since there are only 8 teams, the distribution of marks is a bit different than usual, only the top 3 groups will get bonus points. As usual, the rankings will be based on the score on the private leaderboard:


- 1st place: 2
- 2nd place: 1
- 3rd place: 0.5