# Conversation experiments for Blenderbot
Blenderbot is a Facebook model trained for open-domain chat. It takes a text sequence as input (at most 128 tokens) and outputs a reply.

The input can be formed in different ways to give it contextual information. In this notebook we experiment with different ways of constructing this input:
#### 1. Dialogue history: the model gets as much of the dialogue history as fits in its input
    - input : [user_input_1, bot_reply_1, user_input_2, bot_reply_2.... new_user_input]
#### 2. Subject: the model gets a string describing a subject (e.g. weather), the last bot reply, and the user's new input at each step
    - input : [subject_string, last_bot_reply, new_user_input]
#### 3. Subject and dialogue: the model gets the subject string followed by the dialogue history
    - input : [subject_string, user_input_X, bot_reply_X... new_user_input]
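
The three input layouts above can be sketched as plain string builders. This is only an illustration, not the notebook's implementation: the function names are hypothetical, and the newline separator simply mirrors what the helper functions further down use to join turns.

```python
SEP = '\n'  # turn separator (assumption: same separator as the helpers below)

def history_input(turns, new_user_input):
    # Layout 1: turns is a list of (user_input, bot_reply) pairs, oldest first
    parts = [text for pair in turns for text in pair]
    return SEP.join(parts + [new_user_input])

def subject_input(subject, last_bot_reply, new_user_input):
    # Layout 2: subject string, the previous bot reply (if any), then the new input
    parts = [subject] + ([last_bot_reply] if last_bot_reply else []) + [new_user_input]
    return SEP.join(parts)

def subject_history_input(subject, turns, new_user_input):
    # Layout 3: subject string followed by the dialogue history
    return SEP.join([subject, history_input(turns, new_user_input)])

turns = [('Hi', 'Hello, how are you?')]
print(repr(history_input(turns, 'Fine, thanks')))
print(repr(subject_input('weather', None, 'Hi')))
print(repr(subject_history_input('weather', turns, 'Fine, thanks')))
```

Keeping the layouts as separate small functions makes it easy to compare the exact strings the model sees under each strategy.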
    
## How to use this notebook:
1. Run the setup cells (translation helpers, model loading, `BlenderConversation`) to load the model and define the functions
2. Jump to section 1, 2 or 3 to chat with the model in different ways

## Google translate for translating text back and forth

In [69]:
from googletrans import Translator

translator = Translator()
def sv_to_en(text):
    # Translates text from Swedish to English
    out = translator.translate(text, src='sv', dest='en')
    return out.text

def en_to_sv(text):
    # Translates text from English to Swedish
    out = translator.translate(text, src='en', dest='sv')
    return out.text


# Blenderbot

In [70]:
from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration
from torch import no_grad
# Other checkpoints: 'facebook/blenderbot-400M-distill', 'facebook/blenderbot-3B'.
# 'facebook/blenderbot_small-90M' is also available, but it is loaded with the separate BlenderbotSmall* classes.
mname = 'facebook/blenderbot-1B-distill'
model = BlenderbotForConditionalGeneration.from_pretrained(mname)
tokenizer = BlenderbotTokenizer.from_pretrained(mname)

## BlenderConversation is a class for storing the conversation with blenderbot

In [51]:
class BlenderConversation:
    
    def __init__(self,lang,description='No description'):
        self.lang = lang
        self.description = description
        self.bot_text = []
        self.user_text = []
        self.user_turn = True
        
        
    def add_user_text(self,text):
        if self.user_turn:
            self.user_text.append(text)
            self.user_turn = False
        else:
            raise ValueError("It's the bot's turn to add a reply to the conversation")
        return
    
    def add_bot_text(self,text):
        if not self.user_turn:
            self.bot_text.append(text)
            self.user_turn = True
        else:
            raise ValueError("It's the user's turn to add an input to the conversation")
        return
        
    def get_dialogue_history(self,max_len=110):
        # Returns the dialogue history as a string, with user and bot turns separated by '\n'.
        # max_len defaults to 110 because the model's max input length is 128 and we leave room for new input.
        # Token counts are approximated by whitespace splitting; the model tokenizer may count differently.
        history = ''
        tokens_left = max_len
        if self.user_turn:
            # Every user input has a reply: walk backwards over all exchanges
            nbr_exchanges = len(self.user_text)
        else:
            # The latest user input has no reply yet: put it last, then walk backwards over the rest
            history = self.user_text[-1]
            tokens_left -= len(self.user_text[-1].split())
            nbr_exchanges = len(self.user_text) - 1
        for i in reversed(range(nbr_exchanges)):
            bot_text = self.bot_text[i]
            user_text = self.user_text[i]
            nbr_tokens = len(bot_text.split()) + len(user_text.split())
            if nbr_tokens >= tokens_left:
                break # Stop at the first exchange that does not fit so the kept history stays contiguous
            history = user_text + '\n' + bot_text + '\n' + history
            tokens_left -= nbr_tokens
        return history
        
    def to_txt(self,file='blender_conversation.txt'):
        # Writes the dialogue to txt file in subdirectory
        text = '####################################\n' + 'Conversation description: ' + self.description + '\n\n'
        if self.user_turn:
            for i in range(len(self.user_text)):
                text = text + 'User>>> '+ self.user_text[i] + '\n Bot>>> ' + self.bot_text[i] + '\n'
        else:
            for i in range(len(self.bot_text)):
                text = text + 'User>>> '+ self.user_text[i] + '\n Bot>>> ' + self.bot_text[i] + '\n'
            text = text + 'User>>> ' + self.user_text[-1]
        
        text = text + '\n\n'
        file_path = '01_conversation_output/' + file
        with open(file_path,'a') as f:
            f.write(text)
        return
         
    
    def print_dialogue(self):
        # Prints the dialogue 
        text = ''
        if self.user_turn:
            for i in range(len(self.user_text)):
                text = text + 'User>>> '+ self.user_text[i] + '\n Bot>>> ' + self.bot_text[i] + '\n'
        else:
            for i in range(len(self.bot_text)):
                text = text + 'User>>> '+ self.user_text[i] + '\n Bot>>> ' + self.bot_text[i] + '\n'
            text = text + 'User>>> ' + self.user_text[-1]
        print(text)
        return


def strip_token(line):
    # Removes the SOS and EOS tokens (and surrounding whitespace) from a blenderbot reply
    line = line.replace('<s>','')
    line = line.replace('</s>','')
    return line.strip()


# 1 - Dialogue history
#### First, a helper function that handles one turn of the conversation

In [66]:
def blender_history_chatting(user_input,convo_sv,convo_en):
    assert convo_sv.lang == 'sv' and convo_en.lang == 'en'
    
    translated_user_input = sv_to_en(user_input)
    
    # Add inputs to conversations
    convo_sv.add_user_text(user_input)
    convo_en.add_user_text(translated_user_input)
    
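    # get_dialogue_history() already ends with the newly added user input,
    # so it is used as the model input as-is (appending the input again would duplicate it)
    model_input = convo_en.get_dialogue_history()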

    with no_grad():
        inputs = tokenizer([model_input], return_tensors='pt')
        reply_ids = model.generate(**inputs)

        reply = strip_token(tokenizer.batch_decode(reply_ids)[0]) # Decodes tokens and strips <s> and </s>  
    
    convo_en.add_bot_text(reply)
    convo_sv.add_bot_text(en_to_sv(reply))
    convo_sv.print_dialogue()
    return
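
The flow of one turn (translate to English, build the model input, generate, translate back) can be checked without a network connection or a model by injecting the components as callables. The sketch below is an assumption-laden illustration: `chat_turn` and the stub functions are hypothetical and stand in for `sv_to_en`, the `model.generate` call, and `en_to_sv`.

```python
def chat_turn(user_input, history, to_en, generate, from_en, sep='\n'):
    # history: list of English strings (alternating user/bot turns), oldest first.
    # to_en, generate, from_en: callables standing in for the translator and the model.
    user_en = to_en(user_input)
    model_input = sep.join(history + [user_en])
    reply_en = generate(model_input).strip()
    history.extend([user_en, reply_en])
    return from_en(reply_en)

# Stub components so the flow can be exercised offline
fake_sv_en = {'Hej': 'Hello'}
fake_en_sv = {'Hi there': 'Hej där'}
seen_inputs = []

def fake_generate(text):
    seen_inputs.append(text)  # record what the "model" was given
    return ' Hi there'

history = []
reply = chat_turn('Hej', history, fake_sv_en.get, fake_generate, fake_en_sv.get)
print(reply)
print(history)
```

Recording the strings passed to the generator makes it straightforward to spot problems such as the newest user input appearing twice in the model input.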

#### Conversation objects for storing the conversation in Swedish and English

In [67]:
history_conversation_sv = BlenderConversation('sv','Model gets entire dialogue history as context')
history_conversation_en = BlenderConversation('en','Model gets entire dialogue history as context')



## Use the cell below to chat

In [74]:
user_input = 'Tycker du inte om att vara ute?'

blender_history_chatting(user_input,history_conversation_sv,history_conversation_en)

User>>> Jag ska ut och springa 
 Bot>>> Vilken typ av körning gör du? Jag gillar att springa, men jag är inte särskilt bra på det.
User>>> Jag ska springa 3km. Det är kallt ute så jag vill inte springa längre än så
 Bot>>> Jag gillar att springa också, men jag är inte särskilt bra på det. Jag springer vanligtvis på ett löpband.
User>>> Tycker du inte om att vara ute?
 Bot>>> Jag gillar att vara ute, men jag är inte så bra på det. Jag föredrar att springa på ett löpband.



### Print conversation in English

In [72]:
history_conversation_en.print_dialogue()

User>>> I'm going out for a run
 Bot>>>  What kind of run are you doing? I like to run, but I'm not very good at it.



### Save convo to text file

In [None]:
history_conversation_sv.to_txt('dialogue_history_convo.txt')
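
`to_txt` appends to a file inside the `01_conversation_output/` subdirectory, so the call fails if that directory does not exist yet. A minimal sketch of a writer that creates the directory first is shown below; `append_to_file` is a hypothetical helper, not part of the class above, and the explicit UTF-8 encoding is an assumption made to keep characters such as å, ä and ö consistent across platforms.

```python
import tempfile
from pathlib import Path

def append_to_file(text, file='blender_conversation.txt', out_dir='01_conversation_output'):
    # Create the output directory if it is missing, then append the text
    out_path = Path(out_dir) / file
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, 'a', encoding='utf-8') as f:
        f.write(text)
    return out_path

# Demonstration in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    target_dir = Path(tmp) / 'nested'
    path = append_to_file('User>>> Hej\n', 'demo.txt', out_dir=target_dir)
    append_to_file(' Bot>>> Hello\n', 'demo.txt', out_dir=target_dir)
    print(path.read_text(encoding='utf-8'))
```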

# 2 - Subject
#### Helper function for subject-conditioned chatting

In [93]:
def blender_subject_chatting(user_input,convo_sv,convo_en):
    assert convo_sv.lang == 'sv' and convo_en.lang == 'en'
    
    translated_user_input = sv_to_en(user_input)
    
    # Add inputs to conversations
    convo_sv.add_user_text(user_input)
    convo_en.add_user_text(translated_user_input)
    
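    # NOTE: `subject` is a global variable; run the "Set subject for conversation" cell before chatting
    if len(convo_en.bot_text) == 0: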
        model_input = subject + '\n' +  translated_user_input
    else:
        model_input = subject + '\n' + convo_en.bot_text[-1] + '\n' +  translated_user_input

    with no_grad():
        inputs = tokenizer([model_input], return_tensors='pt')
        reply_ids = model.generate(**inputs)

        reply = strip_token(tokenizer.batch_decode(reply_ids)[0]) # Decodes tokens and strips <s> and </s>  
    
    convo_en.add_bot_text(reply)
    convo_sv.add_bot_text(en_to_sv(reply))
    convo_sv.print_dialogue()
    return



### Set subject for conversation

In [102]:
kontext = 'Vad heter du förresten?'
subject = sv_to_en(kontext)

subject_conversation_sv = BlenderConversation('sv','Conversation subject: ' + kontext)
subject_conversation_en = BlenderConversation('en', 'Conversation subject: ' + subject)

## Use cell below to chat 

In [101]:
user_input= 'Åker du snowboard'
blender_subject_chatting(user_input,subject_conversation_sv,subject_conversation_en)

User>>> Hej
 Bot>>> Hej hur mår du? Jag älskar att prata om vädret, det är en av mina favorit saker att göra.
User>>> Hur är vädret hos dig?
 Bot>>> Det är lite kyligt, men inte så illa. Hur är vädret där du är?
User>>> Det är kallt och snöar
 Bot>>> Det är lite kyligt här också, men inte så illa. Jag älskar snön.
User>>> Varför gillar du snö?
 Bot>>> Jag älskar snön eftersom det påminner mig om att vara barn och leka i snön. Och du då?
User>>> Jag gillar att åka skidor
 Bot>>> Jag har aldrig åkt skidor, men jag har alltid velat åka. Det ser ut som så kul!
User>>> Det är jättekul faktiskt
 Bot>>> Jag har aldrig åkt skidor heller, men det ser ut som att det skulle vara jättekul.
User>>> Åker du snowboard
 Bot>>> Jag har aldrig snowboard, men jag skulle vilja prova det. Jag tror att det skulle vara jättekul.



### Print conversation in English

In [None]:
subject_conversation_en.print_dialogue()

### Save conversation to text file 

In [103]:
subject_conversation_sv.to_txt('subject_convo.txt')

# 3 - Dynamic input context (TODO)