# TP5 (2) - Dialogue Systems

Group members:
- Thanh Gia Hieu KHUONG
- Ragi BHATT
- Benedictus Kent RACHMAT 

In [2]:
!pip install -q datasets pandas matplotlib scikit-learn transformers rouge evaluate tqdm

In [1]:
from tqdm.notebook import trange, tqdm # The progress bar

import torch # DeepLearning Framework
from torch import optim
from torch import nn
from torch.utils.data import Dataset, DataLoader

import numpy as np
import pandas as pd

from transformers import AutoTokenizer, AutoModelForCausalLM # Model repository
from datasets import load_dataset # Dataset Repository

# Generation for a task-oriented chatbot

<img src="media/dialogue_patient.png" style="width: 400px;"/>

The objective of this small project is to develop a small chatbot that uses information from the corpus.

## I. Getting started : Try a naive generative model
<div style={width:10%}> In this first part we will try a naive model and "play" with it. The model is a simple transformer (based on the GPT-2 model); its objective is to answer a user query in natural language.</div><div><img src="media/transformer-block.png" alt="transformer architecture" style="width: 400px;"/></div>


**Let's start by loading the model:**

In [3]:
model = AutoModelForCausalLM.from_pretrained("ThomasGerald/wozchitchat")
tokenizer = AutoTokenizer.from_pretrained("ThomasGerald/wozchitchat")


Now we can generate from an input text with the model (try your own input) : 

In [None]:
text = "I would" # input text
tokenized_text = tokenizer(text, return_tensors='pt') # we tokenize the text
generated_token_ids = model.generate(**tokenized_text, do_sample=True,
                                     max_length=200, pad_token_id=model.config.eos_token_id) # we generate the text (sampled)
print(f'GENERATED_TEXT : {tokenizer.decode(generated_token_ids[0])}')

GENERATED_TEXT : I would be nice. The restaurant should serve fusion food.[BOT]I'm sorry, but there are no fusion attractions at all. Will you be still contacting the TownInfo Centre?<|endoftext|>


Notice that the model has been adapted (fine-tuned) using the following format:

**[USER] {user_input} [BOT] {answer_of_the_system}**

The model was trained to generate **{answer_of_the_system}**

### I.1 : Create an interactive interface following the previous format

Modify the following class to make an interactive chatbot using the previous model

In [None]:
class InteractiveChat(object):
    def __init__(self, model, tokenizer):
        raise NotImplementedError("")

    def answer(self, current_input):
        ''' return the answer of the chatbot
        '''
        raise NotImplementedError("")

    def start(self):
        current_answer = "Start dialogue"
        current_input = ""
        while(current_input != 'exit'):
            current_input = input("Bot: "+current_answer + " \nUser: ")
            current_answer = self.answer(current_input)

In [None]:
ichat = InteractiveChat(model, tokenizer)
ichat.start() # type exit if you want to stop the conversation

You should obtain a dialogue like the following (not exactly the same):
```
User:  I'm looking for an hotel in center of cambridge for tonight
Bot: Might I suggest the the University Arms Hotel, is rated 4 stars and has an excellent reputation and is rated 3 stars. 
User:  How much is it?
Bot: The price range isn't listed. Is there another type of cuisine you might like?
```
However, not all answers are relevant!
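
One possible way to fill in the `InteractiveChat` skeleton above is sketched here. It assumes the `[USER] ... [BOT] ...` format described earlier and the `<|endoftext|>` marker visible in the generated sample; the `max_length` argument and the `EOS` attribute are illustrative additions, not part of the original skeleton.

```python
class InteractiveChat:
    """Minimal sketch of a chatbot wrapper for the [USER]/[BOT] format."""

    EOS = "<|endoftext|>"  # end-of-text marker seen in the generated sample

    def __init__(self, model, tokenizer, max_length=200):
        self.model = model
        self.tokenizer = tokenizer
        self.max_length = max_length  # illustrative default, same as above

    def answer(self, current_input):
        """Return the answer of the chatbot for one user utterance."""
        # Wrap the user text in the format the model was adapted to.
        prompt = "[USER]" + current_input + "[BOT]"
        inputs = self.tokenizer(prompt, return_tensors="pt")
        token_ids = self.model.generate(
            **inputs,
            do_sample=True,
            max_length=self.max_length,
            pad_token_id=self.model.config.eos_token_id,
        )[0]
        decoded = self.tokenizer.decode(token_ids)
        # Keep what follows the first [BOT] tag (the one from the prompt).
        _, _, reply = decoded.partition("[BOT]")
        # Cut at the first marker that signals the end of the bot turn.
        for marker in (self.EOS, "[USER]", "[BOT]"):
            reply = reply.split(marker)[0]
        return reply.strip()

    def start(self):
        current_answer = "Start dialogue"
        while True:
            current_input = input("Bot: " + current_answer + " \nUser: ")
            if current_input == "exit":
                break
            current_answer = self.answer(current_input)
```

Since generation is sampled, two runs with the same input will usually give different answers.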

**In the following, let's evaluate the model.**

## II. The MultiWOZ corpus

The Multi-domain Wizard-of-Oz (MultiWOZ) dataset is a large-scale human-human conversational corpus spanning over seven domains, containing 8438 multi-turn dialogues, with each dialogue averaging 14 turns. Different from existing standard datasets like WOZ and DSTC2, which contain less than 10 slots and only a few hundred values, MultiWOZ has 30 (domain, slot) pairs and over 4,500 possible values. The dialogues span seven domains: restaurant, hotel, attraction, taxi, train, hospital and police. 

### Objective 
* Look at the data ([link here](https://github.com/budzianowski/multiwoz) for the original repository)
* Evaluate the generative model
* Discuss what is missing for a complete chatbot
* Improve the generation (for this last part, you are free to use any model you can run)

In [66]:
# woz_dataset
woz_dataset = load_dataset("multi_woz_v22", trust_remote_code=True)
training_set = woz_dataset['train']
validation_set = woz_dataset['validation']
test_set = woz_dataset['test']

In [67]:
test_set[0]

{'dialogue_id': 'MUL0484.json',
 'services': ['attraction', 'train'],
 'turns': {'turn_id': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
  'speaker': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
  'utterance': ['I need train reservations from norwich to cambridge',
   'I have 133 trains matching your request. Is there a specific day and time you would like to travel?',
   "I'd like to leave on Monday and arrive by 18:00.",
   'There are 12 trains for the day and time you request. Would you like to book it now?',
   'Before booking, I would also like to know the travel time, price, and departure time please.',
   'There are 12 trains meeting your needs with the first leaving at 05:16 and the last one leaving at 16:16. Do you want to book one of these?',
   'No hold off on booking for now. Can you help me find an attraction called cineworld cinema?',
   'Yes it is a cinema located in the south part of town what information would you like on it?',
   'Yes, that was all I needed. Thank you very 

### II.1 Get all the (user query, bot answer) pairs of the test set

Create a DataFrame with two columns: one containing the user queries and the other containing the corresponding bot answers.

In [None]:
user_query = []
bot_answer = []

raise NotImplementedError("")

pd.DataFrame({'user_query': user_query, 'bot_answer':bot_answer})
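
A minimal sketch of the pair extraction, assuming (as in the `test_set[0]` sample above) that speaker `0` is the user and speaker `1` is the system. The helper name `extract_pairs` and the toy dialogue are illustrative only; the two returned lists can be passed to `pd.DataFrame` as in the cell above.

```python
def extract_pairs(dialogues):
    """Collect (user query, bot answer) pairs from MultiWOZ-style dialogues."""
    user_query, bot_answer = [], []
    for dial in dialogues:
        speakers = dial["turns"]["speaker"]
        utterances = dial["turns"]["utterance"]
        for j in range(1, len(speakers)):
            # A pair is a system turn directly preceded by a user turn.
            if speakers[j] == 1 and speakers[j - 1] == 0:
                user_query.append(utterances[j - 1])
                bot_answer.append(utterances[j])
    return user_query, bot_answer


# Toy dialogue with the same layout as one entry of the dataset.
toy_dialogues = [
    {
        "turns": {
            "speaker": [0, 1, 0, 1],
            "utterance": [
                "I need a train to cambridge",
                "Which day would you like to travel?",
                "On Monday please",
                "There are 12 trains on Monday.",
            ],
        }
    }
]

queries, answers = extract_pairs(toy_dialogues)
for q, a in zip(queries, answers):
    print(q, "->", a)
```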

### II.2 Generate the outputs for the user queries
Select the first 50 rows (if you have access to GPUs, you can try generating all the answers) and generate a bot answer for each user_query.

In [None]:
raise NotImplementedError("")

### II.3 Evaluate the performance of the system
You can now evaluate the performance of the system on the generated sample. **You will try two metrics:**
* A first approach based on the words shared by the ground truth and the generation
* You are free to choose the second approach (BERTScore, ROUGE, BLEU)
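
As an illustration of the first approach, the sketch below scores a generation by the F1 of the words it shares with the ground truth. The tokenisation (lowercased `\w+` matches) and the function names are assumptions made for this example; other definitions of "common words" are equally valid.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def overlap_f1(reference, hypothesis):
    """F1 over the multiset of words shared by reference and hypothesis."""
    ref = Counter(tokenize(reference))
    hyp = Counter(tokenize(hypothesis))
    # Counter intersection keeps the minimum count of each shared word.
    common = sum((ref & hyp).values())
    if common == 0:
        return 0.0
    precision = common / sum(hyp.values())
    recall = common / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


score = overlap_f1("The train leaves at five", "the train leaves at six")
print(f"overlap F1 = {score:.2f}")
```

Averaging `overlap_f1` over all (ground truth, generation) pairs gives one number for the whole sample.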

In [None]:
import evaluate

In [None]:
raise NotImplementedError("")

In [None]:
raise NotImplementedError("")
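
For the second metric, ROUGE is one of the listed options. The sketch below is a simplified, from-scratch ROUGE-L (F-measure based on the longest common subsequence of words), written only to show what the metric computes; the tokenisation and function names are assumptions, and a library implementation such as `evaluate.load("rouge")` should be preferred for reported results.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, hypothesis):
    """Simplified ROUGE-L: F1 of LCS-based precision and recall."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref or not hyp:
        return 0.0
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("the cat sat on the mat", "the cat is on the mat")
print(f"ROUGE-L F1 = {score:.3f}")
```

Unlike a bag-of-words overlap, the LCS rewards words that appear in the same order in both texts.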

## III. Improving performances

**It is now up to you to improve the previous model!**
* You are free to choose any architecture/model (even a pretrained one) to improve performance
* You can add additional information to the input of the model
* The annex shows how the model was trained
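
As one illustration of adding information to the input, the prompt could contain several previous turns instead of only the last user utterance. The sketch below is an assumption, not the method used to train the provided model: `build_context` is a hypothetical helper that reuses the `[USER]`/`[BOT]` tags, and a model would need to be fine-tuned on prompts built this way to benefit from them.

```python
def build_context(utterances, speakers, j, window_size=3):
    """Build the prompt for the system turn j from the previous turns.

    Speaker 0 is assumed to be the user and speaker 1 the system.
    At most `window_size` turns before position j are kept.
    """
    start = max(0, j - window_size)
    parts = []
    for k in range(start, j):
        tag = "[USER]" if speakers[k] == 0 else "[BOT]"
        parts.append(tag + utterances[k])
    # The trailing [BOT] tag asks the model for the next system turn.
    return "".join(parts) + "[BOT]"


utterances = ["hi", "hello", "train?", "which day?"]
speakers = [0, 1, 0, 1]
print(build_context(utterances, speakers, j=3, window_size=3))
print(build_context(utterances, speakers, j=3, window_size=1))
```

A larger window gives the model more context, at the cost of longer inputs.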


# ANNEX: Training/Fine-Tuning Material

In [28]:

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
tokenizer.add_special_tokens({'pad_token': '<|endoftext|>'})
model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
model.resize_token_embeddings(len(tokenizer))


## Implement the dataset module

Create a class that inherits from `torch.utils.data.Dataset` and returns, for each system turn of the dataset, the previous turn and the answer.

In [8]:
from torch.utils.data import Dataset

class WoZGenerationDataset(Dataset):
    def __init__(self, dataset, window_size=3):
        self.dataset = dataset
        self.window_size = window_size
        self.index = []
        for i, dial in enumerate(dataset):
            for j, speaker in enumerate(dial['turns']['speaker']):
                if speaker == 1:
                    self.index.append((i,j))
    def __len__(self):
        return len(self.index)

    def __getitem__(self, index):
        i, j = self.index[index]
        dial = self.dataset[i]['turns']['utterance']

        turns = dial[j-1] if(j!= 0) else ''
        answer = dial[j]
        return {'turns': turns,
                'answer': answer}



In [27]:
class DialogueCollator:  # a plain callable; it does not need to be a Dataset
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
    def __call__(self, data):
        input_tokens = self.tokenizer(['[USER]' + d['turns'] + "[BOT]" + d['answer'] for d in data],
                                 return_tensors='pt', return_length=True, padding=True)
        return {
            'input_ids': input_tokens.input_ids,
            'attention_mask': input_tokens.attention_mask
        }


In [10]:
from tqdm.notebook import trange, tqdm
from torch import optim
from torch import nn


class Trainer():
    def __init__(self, model, padding_idx=100):
        self.model = model
        self.optimizer = None

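    def at_training_start(self, learning_rate = 1e-3):
        self.optimizer = optim.Adam(self.model.parameters(), lr=learning_rate)
        # Note: the pad token is '<|endoftext|>' (id 50256), so no target ever equals
        # 50257 and padded positions are not actually excluded from the loss.
        self.criterion = nn.CrossEntropyLoss(ignore_index=50257)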

    def validation_step(self, data):
        pass

    def training_step(self, data):
        y_pred = self.model(**data)
        y_truth = data["input_ids"][:, 1:].flatten()

        loss_reconstruction = self.criterion(y_pred.logits[:,:-1].reshape(y_truth.shape[0], -1), y_truth)
        (loss_reconstruction).backward()
        return loss_reconstruction.item()

    def on_validation_end(self, resp):
        pass

    def validation(self, validation_dl):
        pass

    def fit(self,
            training_dl,
            validation_dl,
            learning_rate = 1e-3,
            validation_frequency = 8,
            max_iter = 10000,
            use_gpu=False,

        ):
        if use_gpu:
            self.model = self.model.cuda()
        self.at_training_start(learning_rate)

        iter_count = 0
        loss_buffer = []
        pbar = trange(max_iter)

        while(iter_count < max_iter):
            for data in training_dl:
                if use_gpu:
                    data = {k:v.cuda() for k, v in data.items()}
                self.optimizer.zero_grad()
                loss_buffer += [self.training_step(data)]
                self.optimizer.step()

                if(iter_count  % validation_frequency == 0):
                    print("Loss at iteration %s is %s"%(iter_count, np.mean(loss_buffer)))
                    self.validation(validation_dl)
                    loss_buffer = []
                iter_count += 1
                pbar.update(1)
                if iter_count >= max_iter:
                    break
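
The `training_step` above implements the usual next-token objective: the logits at position $t$ are scored against the token at position $t+1$. As a sketch, writing $B$ for the number of sequences in a batch and $T$ for their padded length (notation introduced here for illustration), the averaged cross-entropy is:

$$
\mathcal{L}(\theta) = -\frac{1}{B\,(T-1)} \sum_{b=1}^{B} \sum_{t=1}^{T-1} \log p_\theta\left(x_{b,t+1} \mid x_{b,1}, \dots, x_{b,t}\right)
$$

In the code, `y_pred.logits[:, :-1]` provides the predictions for positions $1, \dots, T-1$ and `data["input_ids"][:, 1:]` provides the shifted targets $x_{b,2}, \dots, x_{b,T}$.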

In [11]:
training_set = WoZGenerationDataset(woz_dataset['train'])  # `woz_dataset` is the corpus loaded in section II
collator = DialogueCollator(tokenizer)
training_dl = DataLoader(training_set, batch_size=32, shuffle=True, collate_fn=collator, num_workers=2)

In [12]:
my_trainer = Trainer(model)
my_trainer.fit(training_dl, None, validation_frequency=250, use_gpu=True, max_iter=1000)

Loss at iteration 0 is 7.432340621948242
Loss at iteration 250 is 1.271711953163147
Loss at iteration 500 is 1.0407882208824157
Loss at iteration 750 is 0.9919485347270965
Loss at iteration 1000 is 0.9869995164871216


In [21]:
class Chatbot(object):
  def __init__(self):
    pass

  def answer(self, current_input):
    return "Not Implemented"

  def start(self):
    current_answer = "Start dialogue"
    current_input = ""
    while(current_input != 'exit'):
      current_input = input("Bot: "+current_answer + " \nUser: ")
      current_answer = self.answer(current_input)

class ChitChat(Chatbot):
  def __init__(self, model, tokenizer, collator, history_len = 1):
    self.model = model
    self.tokenizer = tokenizer
    self.utterance = []
    self.hlen = history_len

  def answer(self, current_input):
    self.utterance.append('[USER]'+current_input)
    tokenized_text = self.tokenizer(''.join(self.utterance[max(0, len(self.utterance) - self.hlen): ]), return_tensors='pt')
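    # use self.model (not the global `model`) to look up the EOS id
    generated_token_ids = self.model.generate(**tokenized_text, do_sample=True, max_length=200, pad_token_id=self.model.config.eos_token_id)[0]
    # keep the text after the last [BOT] tag and cut at the EOS marker, if present
    answer = self.tokenizer.decode(generated_token_ids).split('[BOT]')[-1].split('<|endoftext|>')[0]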
    self.utterance.append('[BOT]'+answer)
    return answer


In [22]:
cb = ChitChat(model.cpu(), tokenizer, collator, history_len=1)

In [43]:
model = AutoModelForCausalLM.from_pretrained("ThomasGerald/wozchitchat")
tokenizer = AutoTokenizer.from_pretrained("ThomasGerald/wozchitchat")

In [47]:
class ChitChat(Chatbot):
  def __init__(self, model, tokenizer, history_len = 1):
    self.model = model
    self.tokenizer = tokenizer
    self.utterance = []
    self.hlen = history_len

  def answer(self, current_input):
    self.utterance.append('[USER]'+current_input)
    tokenized_text = self.tokenizer(''.join(self.utterance[max(0, len(self.utterance) - self.hlen): ]), return_tensors='pt')
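    # use self.model (not the global `model`) to look up the EOS id
    generated_token_ids = self.model.generate(**tokenized_text, do_sample=True, max_length=200, pad_token_id=self.model.config.eos_token_id)[0]
    # keep the text after the last [BOT] tag, cut at the EOS marker (if present) and at any following [USER] tag
    answer = self.tokenizer.decode(generated_token_ids).split('[BOT]')[-1].split('<|endoftext|>')[0].split('[USER]')[0]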
    self.utterance.append('[BOT]'+answer)
    return answer

In [50]:
cb = ChitChat(model.cpu(), tokenizer, history_len=1)

In [51]:
cb.start()

Bot: Start dialogue 
User:  I'm looking for an hotel in center of cambridge for tonight
Bot: Might I suggest the the University Arms Hotel, is rated 4 stars and has an excellent reputation and is rated 3 stars. 
User:  How much is it?
Bot: The price range isn't listed. Is there another type of cuisine you might like? 
User:  exit
