# COLX 563 Lab Assignment 4: Slot filling
## Assignment Objectives

In this lab, you will build an end-to-end system for basic (binary) intent recognition and slot filling in the context of a dialogue system. This is a team assignment that you will complete with your capstone team, and you have nearly complete freedom with regard to your solution, subject to the few restrictions listed below.

## Getting Started

Add imports below.

In [1]:
import numpy as np
import pandas as pd
from collections import defaultdict
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

In [2]:
import os 

For this lab, you'll be working with version 2.2 of the MultiWOZ dataset of goal-oriented dialogues. You can look at the full corpus [here](https://github.com/budzianowski/multiwoz/tree/master/data/MultiWOZ_2.2). The original annotation is impressively detailed, covering multiple turns and multiple goals; we have simplified it to just the initiating request (first turn), with two possible intents and the corresponding slots for those intents. Download the data from [github](https://github.ubc.ca/jungyeul/COLX_563_adv-semantics_lab_students/raw/master/Multiwoz.zip), unzip it into a directory outside of your lab repo, and change the path below.

In [3]:
#provided code
woz_directory ="./data/"

## Tidy Submission
rubric={mechanics:1}

To get the marks for tidy submission:
- Submit the assignment by filling in this Jupyter notebook with your answers embedded
- Be sure to follow the instructions

## Inspecting the data

Let's look at corresponding pairs of utterances and answers from the training portion of our corpus.

In [4]:
count = 0
with open(woz_directory + "WOZ_train_utt.txt") as f1:
    with open(woz_directory + "WOZ_train_ans.txt") as f2:
        while count < 20:
            print(f1.readline().strip())
            print(f2.readline().strip())
            print("------")
            count += 1

Guten Tag, I am staying overnight in Cambridge and need a place to sleep. I need free parking and internet.
find_hotel|hotel-area=centre|hotel-internet=yes|hotel-parking=yes
------
Hi there! Can you give me some info on Cityroomz?
find_hotel|hotel-name=cityroomz
------
I am looking for a hotel named alyesbray lodge guest house.
find_hotel|hotel-name=alyesbray lodge guest house
------
I am looking for a restaurant. I would like something cheap that has Chinese food.
find_restaurant|restaurant-food=chinese|restaurant-pricerange=cheap
------
I'm looking for an expensive restaurant in the centre if you could help me.
find_restaurant|restaurant-area=centre|restaurant-pricerange=expensive
------
I'm looking for a places to go and see during my upcoming trip to Cambridge.
find_hotel
------
Yeah, could you recommend a good gastropub?
find_restaurant|restaurant-food=gastropub
------
I want to find an expensive restaurant and serves european food. Can i also have the address, phone number and it

In [5]:
def data_processor(split):
    
    with open(woz_directory + f"WOZ_{split}_utt.txt") as f1:
        data_utts = [s.strip() for s in f1.readlines()]
    
    with open(woz_directory + f"WOZ_{split}_ans.txt") as f2:
        data_ans = [s.strip() for s in f2.readlines()]
    
    data_ans_first = []
    data_ans_second = []
    for ans in data_ans:
        ans_split = ans.split("|")
        type_dict = {}
        data_ans_first.append(ans_split[0])
        for type_ in ans_split[1:]:
            # Split only on the first "-" and "=" so values containing them stay intact
            type_split = type_.split("-", 1)[1].split("=", 1)
            type_dict[type_split[0]] = type_split[1]
        data_ans_second.append(type_dict)
    
    data_find_hotel_ids = [idx for idx, ans in enumerate(data_ans) if ans.split("|")[0] == "find_hotel"]
    data_hotel_ans = np.array(data_ans_second)[data_find_hotel_ids]
    data_hotel_utts = np.array(data_utts)[data_find_hotel_ids]
    hotel_df = pd.DataFrame.from_dict(list(data_hotel_ans)).fillna("")
    hotel_df['utts'] = data_hotel_utts
    
    data_find_rest_ids = [idx for idx, ans in enumerate(data_ans) if ans.split("|")[0] == "find_restaurant"]
    data_rest_ans = np.array(data_ans_second)[data_find_rest_ids]
    data_rest_utts = np.array(data_utts)[data_find_rest_ids]
    rest_df = pd.DataFrame.from_dict(list(data_rest_ans)).fillna("")
    rest_df['utts'] = data_rest_utts
    
    hotel_df.to_csv(f"./data/{split}_hotel.csv", index=False)
    rest_df.to_csv(f"./data/{split}_restaurant.csv", index=False)
    
    h = ["find_hotel" for _ in range(hotel_df.shape[0])]
    r = ["find_restaurant" for _ in range(rest_df.shape[0])]
    df = pd.DataFrame({"utts": list(data_hotel_utts) + list(data_rest_utts), "labels": h + r})
    df.to_csv(f"./data/{split}.tsv", sep='\t', index=False)

data_processor("train")
data_processor("dev")

The utterances consist of requests for information about either hotels or restaurants. The answer starts with the intent (either find_restaurant or find_hotel) and then lists the slots that have been filled in based on the utterance. Your goal is to generate this string of intent and slots based purely on the utterance. A few things to note:

* Not all slots are filled in, and sometimes there are no slots filled in at all (but there is always an intent).
* There is a fixed number of slots for each intent, and when they are filled in, they always appear in a particular order.
* The slot values sometimes, but not always, correspond to what appears in the utterance. For example, a mention of wanting wifi in the request becomes hotel-internet=yes.
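
The answer format described in the list above can be made concrete with a small round-trip helper. This is a minimal sketch, not part of the provided code: `parse_answer` and `format_answer` are hypothetical names, and the alphabetical slot order is an assumption that matches the training examples shown earlier but is not stated anywhere in the assignment.

```python
def parse_answer(answer):
    """Split an answer line into (intent, {slot_name: value})."""
    intent, *slot_parts = answer.strip().split("|")
    slots = {}
    for part in slot_parts:
        name, value = part.split("=", 1)       # e.g. 'hotel-area', 'centre'
        slots[name.split("-", 1)[1]] = value   # drop the 'hotel-' / 'restaurant-' prefix
    return intent, slots


def format_answer(intent, slots):
    """Rebuild the answer line. ASSUMPTION: slots are ordered alphabetically by name."""
    domain = intent.split("_", 1)[1]           # 'find_hotel' -> 'hotel'
    parts = [f"{domain}-{name}={slots[name]}" for name in sorted(slots)]
    return "|".join([intent] + parts)


line = "find_hotel|hotel-area=centre|hotel-internet=yes|hotel-parking=yes"
intent, slots = parse_answer(line)
print(intent, slots)
print(format_answer(intent, slots) == line)
```

Working with a dictionary of slots and serializing only at the very end keeps ordering and formatting concerns in one place.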

We will be evaluating based on exact duplication of the entire output string, so before you start coding a solution, you should look carefully at examples in the training set and make sure you understand all the different components of the output, and how they relate to the input utterance. In particular, you should identify the various constituent parts of the task, and judge which are likely to be easy, and which are likely to be more difficult.
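
To illustrate how strict exact-match evaluation is, here is a small sketch of the metric. The function name `exact_match_accuracy` is hypothetical and this is not the official scoring script; it only assumes that gold and predicted answers are lists of strings in the same order.

```python
def exact_match_accuracy(gold, predicted):
    """Fraction of predictions identical to the gold answer string."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(g.strip() == p.strip() for g, p in zip(gold, predicted))
    return correct / len(gold)


gold = [
    "find_hotel|hotel-internet=yes",
    "find_restaurant|restaurant-food=chinese",
]
predicted = [
    "find_hotel|hotel-internet=yes",
    # Correct intent and food slot, but one spurious slot: the whole line counts as wrong.
    "find_restaurant|restaurant-food=chinese|restaurant-pricerange=cheap",
]
print(exact_match_accuracy(gold, predicted))  # 0.5
```

A single extra, missing, or misordered slot costs the full point for that utterance, so per-slot quality compounds across the whole string.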

## Solution
rubric={accuracy:10,quality:5,efficiency:3}

You will build a system that, when provided with an utterance, predicts the appropriate intent and slots in the format used in the provided answers. This is an open-ended problem and you may solve it however you like, with the following restrictions:

* Your solution should include at least one of the token-level prediction models used in Labs 1-3 of this course, i.e. you should make use of a CRF, an LSTM, or a BERT model. You may use multiple models.
* You may use basic NLP tools (tokenizer, POS, parser) and unsupervised resources such as word embeddings, but you should NOT use an existing NER system, or any additional labeled data for this task.
* Your solution should be appropriately decomposed into parts, and documented. This is a complex enough problem that you should have several functions. You may wrap things up into a single class if you like, but you don't have to.
* Use the provided assert to test `dev_predicted`, the output of your complete model on the dev set. You will need to pass the assert to get full accuracy points.
* Though you may use dev *accuracy* to guide the development of your model, you should not look at either the utterances or the answers for the dev (or test) set when developing your model. Limit your inspection of the data (e.g. for the purposes of error analysis) to the training set.
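
The token-level models named in the first restriction need one label per token, which the answer strings do not provide directly. One possible way to derive such labels is sketched below, under loud assumptions: the helper `bio_tags` is hypothetical, tokens are lowercased and whitespace-split, and a slot is tagged only when its value appears verbatim as a contiguous span. Values such as `yes` for `internet` never match and would need a separate classifier or rules.

```python
def bio_tags(tokens, slots):
    """Label tokens with B-/I-<slot> where a slot value appears as a contiguous span."""
    tags = ["O"] * len(tokens)
    for name, value in slots.items():
        value_tokens = value.split()
        n = len(value_tokens)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            window_matches = tokens[start:start + n] == value_tokens
            window_free = all(t == "O" for t in tags[start:start + n])
            if window_matches and window_free:
                tags[start] = f"B-{name}"
                for i in range(start + 1, start + n):
                    tags[i] = f"I-{name}"
                break  # tag only the first occurrence of each slot value
    return tags


# Lowercased training-style utterance with punctuation removed (for illustration only)
tokens = "i am looking for a hotel named alyesbray lodge guest house".split()
print(bio_tags(tokens, {"name": "alyesbray lodge guest house"}))
```

Requiring a contiguous match avoids tagging scattered tokens that merely share words with the slot value.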

Other things to consider:

* You may want to build "standard" (non-sequential) ML classifiers for some aspects of this problem, but you don't have to!
* You may want to use appropriate lexicons. You can build them yourself, or find some.
* Rather than using statistical classifiers, you may want to use rule-based methods to solve some of the problems you're facing.
* You should probably do regular error analysis. Some kind of cross-validation on the training set is a good approach for this, or you can create another (inspectable) internal dev set by splitting up the training set.
* If you're looking for just a little bit more performance, don't forget to tune your hyperparameters!
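
The cross-validation suggestion above can be sketched with the standard library alone. The generator `k_fold_indices` is a hypothetical helper, and the fold count and seed are arbitrary choices; `sklearn.model_selection.KFold` offers the same idea as a library call.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_indices, heldout_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)          # fixed seed keeps folds reproducible
    folds = [indices[i::k] for i in range(k)]     # k roughly equal, disjoint folds
    for held_out in range(k):
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, folds[held_out]


# Toy example with 10 training items: each fold holds out 2 and trains on 8
for train_idx, heldout_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(heldout_idx))
```

Because every held-out fold comes from the training set, its utterances and answers can be inspected freely for error analysis without touching the dev or test data.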

## Report
rubric={raw:2,reasoning:3,writing:1}

Describe your system, and discuss your thinking about particular choices and any experiments you tried. Please talk about things you tried that didn't work, or things you thought of doing but didn't. Finally, discuss how each group member contributed to the project. As usual, there is an expectation that every group member will have made some significant contribution to the project.

In [6]:
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score, classification_report
import sklearn_crfsuite
from sklearn_crfsuite import metrics
import csv
from nltk import pos_tag, word_tokenize

In [7]:
import pycrfsuite

In [10]:
path_to_data = "./data/"

In [130]:
def train_binary_clf(train_split, dev_split, test=False):
    """Trains a binary decision tree classifier to separate hotel and restaurant utterances"""
    X_train = [x.strip() for x in open(f"./data/WOZ_{train_split}_utt.txt", "r").readlines()]
    y_train = [x.strip().split("|")[0] for x in open(f"./data/WOZ_{train_split}_ans.txt", "r").readlines()]
    X_dev = [x.strip() for x in open(f"./data/WOZ_{dev_split}_utt.txt", "r").readlines()]
    if not test:
        y_dev = [x.strip().split("|")[0] for x in open(f"./data/WOZ_{dev_split}_ans.txt", "r").readlines()]
    preprocessor = CountVectorizer()
    pipe_line = make_pipeline(
        preprocessor, DecisionTreeClassifier()
    )
    pipe_line.fit(X_train, y_train)

    predictions = pipe_line.predict(X_dev).tolist()
    if not test:
        print(classification_report(y_dev, predictions, digits=6))
    return predictions

predictions = train_binary_clf("train", "dev")

                 precision    recall  f1-score   support

     find_hotel   0.994898  0.994898  0.994898       196
find_restaurant   0.995392  0.995392  0.995392       217

       accuracy                       0.995157       413
      macro avg   0.995145  0.995145  0.995145       413
   weighted avg   0.995157  0.995157  0.995157       413



In [13]:
REST_LEX = defaultdict(list)
HOTEL_LEX = defaultdict(list)

def generate_lexicon(split):
    """Adds keys and values to the global hotel and restaurant lexicons"""
    with open(path_to_data + 'WOZ_'+split+'_ans.txt', encoding='utf-8') as inF:
        for line in inF:
            line = line.strip('\n')
            split_line = line.split('|')
            if split_line[0] == 'find_hotel':
                for ele in split_line[1:]:
                    key, value = ele.split('=')
                    _, key = key.split('-')
                    if value not in HOTEL_LEX[key]:
                        HOTEL_LEX[key].append(value)
            
            elif split_line[0] == 'find_restaurant':
                for ele in split_line[1:]:
                    key, value = ele.split('=')
                    _, key = key.split('-')
                    if value not in REST_LEX[key]:
                        REST_LEX[key].append(value)


generate_lexicon('train')
generate_lexicon('dev')

In [14]:
HOTEL_LEX['internet']

['yes', 'dontcare', 'no']

In [15]:
REST_LEX['pricerange']

['cheap', 'expensive', 'moderate', 'dontcare']

In [16]:
def word2features(sentence, idx):
    
    word_features = {}
    word = sentence[idx]
    sentence_length_m1 = len(sentence) - 1 
    sentence_length = len(sentence)

    word_features['word_uncased'] = word
    word_features['word_lowercase'] = word.lower()
    word_features['word_cased'] = word.istitle() 

    word_features['POS'] = pos_tag([word])[0][1]

    word_features['word_in_hotel_area'] = True if word in HOTEL_LEX['area'] else False
    word_features['word_in_hotel_internet'] = True if word in HOTEL_LEX['internet'] else False
    word_features['word_in_hotel_parking'] = True if word in HOTEL_LEX['parking'] else False
    word_features['word_in_hotel_names'] = True if word in HOTEL_LEX['name'] else False
    word_features['word_in_hotel_pricerange'] = True if word in HOTEL_LEX['pricerange'] else False
    word_features['word_in_hotel_type'] = True if word in HOTEL_LEX['type'] else False
    word_features['word_in_hotel_stars'] = True if word in HOTEL_LEX['stars'] else False

    word_features['word_in_rest_food'] = True if word in REST_LEX['food'] else False
    word_features['word_in_rest_pricerange'] = True if word in REST_LEX['pricerange'] else False
    word_features['word_in_rest_area'] = True if word in REST_LEX['area'] else False
    word_features['word_in_rest_name'] = True if word in REST_LEX['name'] else False
       
    word_features['idxMinusOne'] = sentence[idx - 1] if idx - 1 >= 0 else ''
    word_features['idxMinusTwo'] = sentence[idx - 2] if idx - 2 >= 0 else ''
    word_features['idxMinusThree'] = sentence[idx - 3] if idx - 3 >= 0 else ''

    word_features['idxPlusOne'] = sentence[idx + 1] if idx + 1 <= sentence_length-1  else ''
    word_features['idxPlusTwo'] = sentence[idx + 2] if idx + 2 <= sentence_length-1 else ''
    word_features['idxPlusThree'] = sentence[idx + 3] if idx + 3 <= sentence_length-1 else ''
    
    return word_features
    
    
def sentence2features(sentence):
    return [word2features(sentence, idx) for idx in range(len(sentence))]

In [17]:
REST = pd.read_csv(path_to_data+'train_restaurant.csv')
HOTEL = pd.read_csv(path_to_data+'train_hotel.csv')

In [18]:
HOTEL.head()

| | area | internet | parking | name | pricerange | type | stars | utts |
|---|---|---|---|---|---|---|---|---|
| 0 | centre | yes | yes | | | | | Guten Tag, I am staying overnight in Cambridge... |
| 1 | | | | cityroomz | | | | Hi there! Can you give me some info on Cityroomz? |
| 2 | | | | alyesbray lodge guest house | | | | I am looking for a hotel named alyesbray lodge... |
| 3 | | | | | | | | I'm looking for a places to go and see during ... |
| 4 | | yes | | | | | | I need a place to stay that has free wifi. |


In [103]:
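def get_bies(sentence, tags, type_, curr_tags):
    """Mark tokens of `sentence` that match the slot value `tags` with B-/I- labels.

    Tokens are matched in order but need not be adjacent. Despite the name,
    the labels follow the BIO scheme (no E- or S- tags are produced).
    """
    tags = tags.split()
    idx = 0
    for ix, word in enumerate(sentence):
        if (idx < len(tags)) and (word == tags[idx]):
            if idx == 0:
                curr_tags[ix] = "B-"+type_
            else:
                curr_tags[ix] = "I-"+type_
            idx += 1
    return curr_tags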


def get_labels_dict(label):
    label_dict = {}
    label_splits = label.split("|")[1:]
    for s in label_splits:
        # Split only on the first "-" and "=" so values containing them stay intact
        key, tag = s.split("-", 1)[1].split("=", 1)
        label_dict[key] = tag
    return label_dict


def sentence2iob(sentence, label_dict):
    sentence = word_tokenize(sentence.lower())
    curr_tags = ["O" for _ in range(len(sentence))]
    for key, value in label_dict.items():
        if isinstance(value, str):
            curr_tags = get_bies(sentence, value, key, curr_tags)
    return sentence, curr_tags


def prepare_cmd_crf_feature_dicts(split):
    lol_of_dicts = []
    lol_of_token_tags = []
    with open(f"./data/WOZ_{split}_utt.txt", "r") as f1, open(f"./data/WOZ_{split}_ans.txt", "r") as f2:
        sentences = f1.readlines()
        labels = f2.readlines()
        for sentence, label in zip(sentences, labels):
            sentence = sentence.strip()
            label_dict = get_labels_dict(label.strip())
            tokens, tags = sentence2iob(sentence, label_dict)
            sentence_dicts = []
            for jdx, token in enumerate(tokens):
                feature_dict = word2features(tokens, jdx)
                sentence_dicts.append(feature_dict)
            
            lol_of_dicts.append(sentence_dicts)
            lol_of_token_tags.append(tags)
    lol_of_sents = [word_tokenize(x.lower()) for x in sentences]
    return lol_of_sents, lol_of_dicts, lol_of_token_tags


In [138]:
def prepare_test_feature_dicts(path):
    lol_of_dicts = []
    lol_of_token_tags = []
    with open(f"./data/WOZ_{path}_utt.txt", "r") as f1:
        sentences = f1.readlines()
        for sentence in sentences:
            sentence = sentence.strip()
            tokens, tags = sentence2iob(sentence, {})
            sentence_dicts = []
            for jdx, token in enumerate(tokens):
                feature_dict = word2features(tokens, jdx)
                sentence_dicts.append(feature_dict)
            
            lol_of_dicts.append(sentence_dicts)
            lol_of_token_tags.append(tags)
    lol_of_sents = [word_tokenize(x.lower()) for x in sentences]
    return lol_of_sents, lol_of_dicts, lol_of_token_tags

In [104]:
train_sents, train_dicts, train_tags = prepare_cmd_crf_feature_dicts('train')

In [105]:
dev_sents, dev_dicts, dev_tags = prepare_cmd_crf_feature_dicts('dev')

In [139]:
test_sents, test_dicts, _ = prepare_test_feature_dicts("test")

In [191]:
# Fit the CRF on the training features; `crf` must exist before the predict call below
crf = sklearn_crfsuite.CRF(verbose=True, num_memories=12, c1=.1, c2=.1, max_linesearch=30, min_freq=2, max_iterations=200)
crf.fit(train_dicts, train_tags)
y_pred = crf.predict(dev_dicts)

print(f"Micro F1-Score: {metrics.flat_f1_score(dev_tags, y_pred, average='micro'):.2%}")
print(f"Macro F1-Score: {metrics.flat_f1_score(dev_tags, y_pred, average='macro'):.2%} \n")
print(f"Classification Report: \n {metrics.flat_classification_report(dev_tags, y_pred, digits=4)}")

Micro F1-Score: 98.72%
Macro F1-Score: 83.02%

Classification Report: 
               precision    recall  f1-score   support

      B-area     0.9720    0.9905    0.9811       105
      B-food     0.9417    0.9700    0.9557       100
      B-name     0.9667    0.8969    0.9305        97
   B-parking     0.0000    0.0000    0.0000         1
B-pricerange     0.9725    1.0000    0.9860       106
     B-stars     0.9615    1.0000    0.9804        25
      B-type     0.7347    0.6792    0.7059        53
      I-food     0.8571    0.8571    0.8571        14
      I-name     0.9268    0.8976    0.9120       127
           O     0.9928    0.9941    0.9935      5723

    accuracy                         0.9872      6351
   macro avg     0.8326    0.8285    0.8302      6351
weighted avg     0.9869    0.9872    0.9870      6351





In [192]:
predictions = train_binary_clf("train", "dev")

                 precision    recall  f1-score   support

     find_hotel   0.994898  0.994898  0.994898       196
find_restaurant   0.995392  0.995392  0.995392       217

       accuracy                       0.995157       413
      macro avg   0.995145  0.995145  0.995145       413
   weighted avg   0.995157  0.995157  0.995157       413



In [197]:
predictions = train_binary_clf("train", "test", test=True)
y_pred = crf.predict(test_dicts)

In [198]:
pred_dicts = []
for idx, (sent, tags) in enumerate(zip(test_sents, y_pred)):
    dict_ = defaultdict(list)
    sent_string = " ".join(sent)
    for word, tag in zip(sent, tags):
        if tag != "O":
            dict_[tag.split("-")[1]].append(word)
        if word == "internet":
            dict_["internet"].append("yes")
        if word == "parking":
            dict_["parking"].append("yes")
    pred_dicts.append(dict_)

In [199]:
final = []
for intent, pred_dict in zip(predictions, pred_dicts):
    output = [intent]
    # "find_hotel" -> "hotel"; used as the prefix of every slot name
    type_ = intent.split("_")[1]
    # Join multi-token values and sort the "slot=value" strings alphabetically
    slot_strings = sorted(f"{key}={' '.join(value)}" for key, value in pred_dict.items())
    output.extend(f"{type_}-{x}" for x in slot_strings)
    final.append("|".join(output))

In [195]:
dev_ans = [x.strip() for x in open("./data/WOZ_dev_ans.txt", "r").readlines()]

In [202]:
df = pd.DataFrame({"ID": list(range(len(final))), "Expected": final})
df.to_csv('kaggle_submission.csv', index=False)

In [203]:
final

['find_restaurant|restaurant-name=golden wok',
 'find_hotel|hotel-type=hotel hotel',
 'find_hotel|hotel-area=north|hotel-stars=4|hotel-type=hotel',
 'find_restaurant|restaurant-food=asian oriental|restaurant-pricerange=cheap',
 'find_hotel',
 'find_hotel|hotel-area=east|hotel-pricerange=cheap',
 'find_restaurant|restaurant-food=modern european|restaurant-pricerange=moderate',
 'find_restaurant|restaurant-area=south|restaurant-food=indian|restaurant-pricerange=expensive',
 'find_hotel',
 'find_hotel|hotel-pricerange=moderate|hotel-type=guesthouse',
 'find_hotel|hotel-name=carolina bed and breakfast',
 'find_hotel|hotel-area=north|hotel-type=guesthouse',
 'find_restaurant|restaurant-area=north|restaurant-food=french',
 'find_hotel|hotel-parking=yes',
 'find_restaurant|restaurant-food=indian|restaurant-pricerange=expensive',
 'find_restaurant|restaurant-name=chiquito restaurant bar',
 'find_restaurant|restaurant-name=charlie chan',
 'find_restaurant|restaurant-food=chinese|restaurant-pric

In [196]:
accuracy_score(dev_ans, final)

0.6246973365617433

## Submit to Kaggle 
rubric={accuracy:2}

Run your system over the test data, and submit the result (in the same format as the train/dev answers) to the Kaggle competition. The competition is hosted [here](https://www.kaggle.com/c/mds-cl-2020-21-colx-563-lab-assignment-4). To get full points, you need to beat the public baseline. Please use your capstone partner's name as your team name!


## Exercise: Kaggle competition (Optional)
rubric={raw:2}

As a team, compete to get the best result in the task. Since there are only 8 teams, the distribution of marks is a bit different than usual: only the top 3 groups will get bonus points. As usual, the rankings will be based on the score on the private leaderboard:


- 1st place: 2
- 2nd place: 1
- 3rd place: 0.5

In [44]:
with open(woz_directory + "WOZ_dev_ans.txt") as f:
    dev_correct = [answer.strip() for answer in f.readlines()]

We began by trying to build an all-in-one approach with an LSTM and a Hugging Face zero-shot classification model. We also experimented with BERT Q&A. Our one-shot LSTM approach came close to success, but ultimately could not beat the baseline threshold.

Our final solution consisted of a three-stage pipeline. First, we built a decision tree classifier to classify the intent of the sentence as either `find_hotel` or `find_restaurant`. Next, we developed a BIO-notation dataset and used a CRF with word, token-index, and lexicon features to find the positions of arguments in the sentence. Lastly, we built a simple mapping that fills in the slots from the token positions labelled by the CRF.

Kishan built our first-attempt one-shot LSTM, the data-processing pipeline, and the BIO notation formatting, and implemented the final mapping to fill in the slots. Kristian built the binary classifier, the `word2features` function, and the CRF classifier (including hyperparameter tuning). Celine and Sheena explored the BERT Q&A system.
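
The last pipeline stage described above, turning token-level tags into the answer string, could look roughly like the sketch below. It is an illustration rather than the team's exact code: `tags_to_slots` and `build_answer` are hypothetical names, slots are assumed to be ordered alphabetically, and keeping only the first span per slot is an added assumption meant to avoid repeated values such as `hotel-type=hotel hotel`.

```python
def tags_to_slots(tokens, tags):
    """Collect the first tagged span for each slot name from BIO tags."""
    slots = {}
    current = None                       # slot whose span is currently being collected
    for token, tag in zip(tokens, tags):
        if tag == "O":
            current = None
            continue
        prefix, name = tag.split("-", 1)
        if prefix == "B" and name not in slots:
            slots[name] = [token]        # start a new span for an unfilled slot
            current = name
        elif prefix == "I" and current == name:
            slots[name].append(token)    # continue the open span
        else:
            current = None               # repeated B- or stray I- tag: ignore it
    return {name: " ".join(words) for name, words in slots.items()}


def build_answer(intent, tokens, tags):
    """Combine the predicted intent and tagged tokens into one answer line."""
    domain = intent.split("_", 1)[1]     # 'find_restaurant' -> 'restaurant'
    slots = tags_to_slots(tokens, tags)
    parts = [f"{domain}-{name}={slots[name]}" for name in sorted(slots)]
    return "|".join([intent] + parts)


tokens = "find me the golden wok restaurant".split()
tags = ["O", "O", "O", "B-name", "I-name", "O"]
print(build_answer("find_restaurant", tokens, tags))
# find_restaurant|restaurant-name=golden wok
```

Slots whose values do not appear in the utterance, such as `internet=yes`, would still need the kind of keyword rule used in the notebook.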