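# Conversation between DialoGPT and Leolani

In this notebook, we create a loop in which a generative chatbot has a conversation with Leolani. The chatbot was originally DialoGPT; the cells below load Blenderbot (`facebook/blenderbot-400M-distill`) and keep the DialoGPT set-up as commented-out code. We send Leolani's initial prompt to the chatbot to start the conversation.
Next, we capture Leolani's response from the brain and the chatbot's response continuously until we meet the stop condition. Extracted triples are posted to the brain, which the code below stores in the GraphDB repository `blender`.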

In principle, this conversation can go on forever; the loop in this notebook stops after one hour. At the end, we save the scenario in EMISSOR.
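
The turn-taking described above can be sketched with two stand-in agents. This is only an illustration: `leolani_reply`, `chatbot_reply`, and `converse` are hypothetical names, not functions from the cltl or transformers libraries, and the stop condition is assumed to be a time budget combined with a turn limit.

```python
import time
from typing import Callable, List, Tuple


def leolani_reply(utterance: str) -> str:
    """Hypothetical stand-in for Leolani's brain-based reply."""
    return f"You said: {utterance}. Tell me more."


def chatbot_reply(prompt: str) -> str:
    """Hypothetical stand-in for the generative chatbot."""
    return f"Interesting that you say '{prompt[:20]}'"


def converse(first_prompt: str,
             agent: Callable[[str], str],
             bot: Callable[[str], str],
             max_seconds: float = 1.0,
             max_turns: int = 3) -> List[Tuple[str, str]]:
    """Alternate between the two speakers until a stop condition is met."""
    history = [("agent", first_prompt)]
    start = time.monotonic()
    prompt = first_prompt
    turns = 0
    while time.monotonic() - start <= max_seconds and turns < max_turns:
        utterance = bot(prompt)            # chatbot answers the agent's prompt
        history.append(("bot", utterance))
        prompt = agent(utterance)          # agent replies to the chatbot
        history.append(("agent", prompt))
        turns += 1
    return history


dialogue = converse("Talk to me!", leolani_reply, chatbot_reply)
for speaker, text in dialogue:
    print(f"{speaker}: {text}")
```

In the notebook itself, the agent side is the brain plus a replier and the bot side is a transformer model; the control flow is the same alternation.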

Before running, start GraphDB and make sure that the repository the brain connects to exists (the code below uses a repository named `blender`).
GraphDB can be downloaded from:

https://graphdb.ontotext.com
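
Whether the repository exists can be checked before running the notebook. The sketch below assumes that GraphDB listens on port 7200 and that its REST endpoint `/rest/repositories` returns a JSON list of objects with an `id` field; please verify both against your GraphDB version. The helper names are hypothetical.

```python
import json
from typing import Iterable, List
from urllib.request import urlopen


def list_repository_ids(base_url: str = "http://localhost:7200") -> List[str]:
    """Fetch repository ids from a running GraphDB (assumed endpoint and response shape)."""
    with urlopen(f"{base_url}/rest/repositories", timeout=5) as response:
        return [repo["id"] for repo in json.load(response)]


def has_repository(repository_ids: Iterable[str], wanted: str) -> bool:
    """Return True if the wanted repository id is in the listing."""
    return wanted in set(repository_ids)


# Offline example with a hand-written listing; with GraphDB running, use
# has_repository(list_repository_ids(), "blender") instead.
example_ids = ["sandbox", "blender"]
print(has_repository(example_ids, "blender"))
print(has_repository(example_ids, "missing"))
```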

## Import the necessary modules

In [1]:
import json
import os
import time
import uuid
from datetime import date
from datetime import datetime
from random import getrandbits, choice
import pathlib
import pprint
import spacy

# general imports for EMISSOR and the BRAIN
import emissor as em
import requests
from cltl import brain
from cltl.brain.long_term_memory import LongTermMemory
from cltl.brain.utils.helper_functions import brain_response_to_json
from cltl.combot.backend.api.discrete import UtteranceType
from cltl.reply_generation.data.sentences import GREETING, ASK_NAME, ELOQUENCE, TALK_TO_ME
from cltl.reply_generation.lenka_replier import LenkaReplier
from cltl.triple_extraction.api import Chat, UtteranceHypothesis
from emissor.persistence import ScenarioStorage
from emissor.representation.annotation import AnnotationType, Token, NER
from emissor.representation.container import Index
from emissor.representation.scenario import Modality, ImageSignal, TextSignal, Mention, Annotation, Scenario

[nltk_data] Downloading package punkt to /Users/piek/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [2]:
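# spaCy 3 no longer accepts the "en" shortcut; download the model that is loaded further down
#!python -m spacy download en_core_web_sm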

### Import the chatbot utility functions

In [3]:
import sys
import os

src_path = os.path.abspath(os.path.join('..'))
if src_path not in sys.path:
    sys.path.append(src_path)

#### The next utils are needed for the interaction and creating triples and capsules
import chatbots.util.driver_util as d_util
import chatbots.util.capsule_util as c_util
import chatbots.intentions.talk as talk
import chatbots.intentions.get_to_know_you as friend

## Import a conversation agent pipeline

In [4]:
#from transformers import AutoModelForCausalLM, AutoTokenizer, AutoModel, AutoModelWithLMHead
#import torch

#tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-medium')
#model = AutoModelForCausalLM.from_pretrained('microsoft/DialoGPT-medium')

#tokenizer = AutoTokenizer.from_pretrained('gpt2')
#model = AutoModelForCausalLM.from_pretrained('gpt2')

#tokenizer = AutoTokenizer.from_pretrained("manueltonneau/bert-base-cased-conversational-nli")
#model = AutoModel.from_pretrained("manueltonneau/bert-base-cased-conversational-nli")

#tokenizer = AutoTokenizer.from_pretrained("xlnet-large-cased")
#model = AutoModelForCausalLM.from_pretrained("xlnet-large-cased")

#tokenizer = AutoTokenizer.from_pretrained("t5-small")

#model = AutoModelWithLMHead.from_pretrained("t5-small")

#### Needed to suppress messages from DialoGPT
#import os
#os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [None]:
from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration
mname = 'facebook/blenderbot-400M-distill'
model = BlenderbotForConditionalGeneration.from_pretrained(mname)
tokenizer = BlenderbotTokenizer.from_pretrained(mname)
UTTERANCE = "My friends are cool but they eat too many carbs."
inputs = tokenizer([UTTERANCE], return_tensors='pt')
reply_ids = model.generate(**inputs)
print(tokenizer.batch_decode(reply_ids))


## Standard initialisation of a scenario

We initialise a scenario in the standard way by creating a unique folder and setting the AGENT, HUMAN_NAME, and HUMAN_ID variables. Throughout this scenario, HUMAN_NAME and HUMAN_ID identify the source of the chatbot's utterances.
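
The folder layout of a scenario can be sketched with `pathlib`. The helper `scenario_folders` is hypothetical and only mirrors the naming scheme `<data>/<scenario_id>/image` and `<data>/<scenario_id>/rdf`; it is not part of EMISSOR. A temporary directory stands in for the real data folder, and, unlike the notebook cell, the timestamp here avoids colons as an assumption about portability.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Dict


def scenario_folders(data_dir: Path, scenario_id: str) -> Dict[str, Path]:
    """Create <data_dir>/<scenario_id>/{image,rdf} and return the paths."""
    root = data_dir / scenario_id
    folders = {"root": root, "image": root / "image", "rdf": root / "rdf"}
    for folder in folders.values():
        folder.mkdir(parents=True, exist_ok=True)
    return folders


# A timestamp without colons, which some file systems do not accept in names.
scenario_id = datetime.today().strftime("%Y-%m-%d-%H-%M-%S")
with tempfile.TemporaryDirectory() as tmp:
    folders = scenario_folders(Path(tmp), scenario_id)
    created = sorted(p.name for p in folders["root"].iterdir())
    print(created)
```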

In [None]:
from random import getrandbits
import requests
##### Setting the location
place_id = getrandbits(8)
location = None
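try:
    location = requests.get("https://ipinfo.io", timeout=10).json()
except (requests.RequestException, ValueError):
    # no network or an unparsable response: continue without a location
    print("failed to get the IP location")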
    
##### Setting the agents
AGENT = "Leolani2"
HUMAN_NAME = "BLENDERBOT"
HUMAN_ID = "BLENDERBOT"

### The name of your scenario
scenario_id = datetime.today().strftime("%Y-%m-%d-%H:%M:%S")

### Specify the path to an existing data folder where your scenario is created and saved as a subfolder
# Find the repository root dir
parent, dir_name = (d_util.__file__, "_")
while dir_name and dir_name != "src":
    parent, dir_name = os.path.split(parent)
root_dir = parent
scenario_path = os.path.abspath(os.path.join(root_dir, 'data'))

if not os.path.exists(scenario_path) :
    os.mkdir(scenario_path)
    print("Created a data folder for storing the scenarios", scenario_path)

### Define the folders where the images and rdf triples are saved
imagefolder = os.path.join(scenario_path, scenario_id, "image")
rdffolder = os.path.join(scenario_path, scenario_id, "rdf")

### Create the scenario folder, the json files and a scenarioStorage and scenario in memory
scenarioStorage = d_util.create_scenario(scenario_path, scenario_id)
scenario_ctrl = scenarioStorage.create_scenario(scenario_id, int(time.time() * 1e3), None, AGENT)

## Specifying the BRAIN

We specify the BRAIN in GraphDB and use the scenario path just defined for storing the RDF triples produced in EMISSOR.

If you set *clear_all* to *True*, the sandbox triple store is emptied (memory erased) and the basic ontological models are reloaded. Setting it to *False* means you add things to the current memory.

In [None]:
log_path = pathlib.Path(rdffolder)
my_brain = brain.LongTermMemory(address="http://localhost:7200/repositories/blender",
                                log_dir=log_path,
                                clear_all=True)

## Create an instance of a replier

In [None]:
replier = LenkaReplier()

## Initialise a chat with the HUMAN_ID to keep track of the dialogue history

In [None]:
chat = Chat(HUMAN_ID)

In [None]:
nlp = spacy.load("en_core_web_sm")
#nlp= spacy.load('en') # other languages: de, es, pt, fr, it, nl

## Start the interaction

In [None]:
print_details=True

max_context=500
t1 = datetime.now()

#### Initial prompt by the system from which we create a TextSignal and store it
leolani_prompt = f"{choice(TALK_TO_ME)}"
print('\n\t'+AGENT + ": " + leolani_prompt)
textSignal = d_util.create_text_signal_with_speaker_annotation(scenario_ctrl, leolani_prompt, AGENT)
scenario_ctrl.append_signal(textSignal)


#BLENDERBOT
bot_input_ids = tokenizer(leolani_prompt, return_tensors='pt')
chat_history_ids = model.generate(**bot_input_ids)

# DialoGPT variant (disabled):
# encode the new Leolani input, add the eos_token and return a PyTorch tensor
##bot_input_ids = tokenizer.encode(leolani_prompt + tokenizer.eos_token, return_tensors='pt')
# append the new user input tokens to the chat history
#bot_input_ids = torch.cat([chat_history_ids, new_input_ids], dim=-1) if step > 0 else new_input_ids
# generate a response while limiting the total chat history to max_context tokens
##chat_history_ids = model.generate(bot_input_ids, max_length=max_context, pad_token_id=tokenizer.eos_token_id)
repetition = []
utterance = ""
response_json = None
#### Get input and loop
# stop condition: end the conversation after one hour
while (datetime.now() - t1).total_seconds() <= 3600:
    ###### Getting the next input signals
    #utterance = format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True))
    # BLENDER
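    # skip_special_tokens removes markers such as <s> and </s> from the decoded text
    utteranceList = tokenizer.batch_decode(chat_history_ids, skip_special_tokens=True)
    # without skip_special_tokens the output looks like:
    #["<s> Sure, what do you do for a living? I'm an accountant, what about you?</s>"]
    # note: str.strip('</s>') strips the characters <, /, s and >, so it can also eat a final "s" of the text
    answer = utteranceList[0].strip()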
    #print("ANSWER", answer)
    doc = nlp(answer)
    # split the answer into sentences; each sentence is its tokens joined by spaces (with a trailing space)
    sentences = []
    for s in doc.sents:
        sentences.append(" ".join(token.text for token in s) + " ")
    #utterance = sentences[len(sentences)-1]
    #print(sentences)
    #print("UTTERANCE", utterance)
 
    for utterance in sentences:
        if utterance in repetition:
            print('Repeating', utterance)
            utterance = None
        else:
            repetition.append(utterance)
            
        if not utterance:
            if response_json:
                leolani_prompt =  replier.reply_to_statement(response_json, proactive=True, persist=True)
            else:
                leolani_prompt = f"{choice(TALK_TO_ME)}"

            print('\n\t'+AGENT + ": " + leolani_prompt)
            textSignal = d_util.create_text_signal_with_speaker_annotation(scenario_ctrl, leolani_prompt, AGENT)
            scenario_ctrl.append_signal(textSignal)
            
            #BLENDERBOT
            bot_input_ids = tokenizer(leolani_prompt, return_tensors='pt')
            chat_history_ids = model.generate(**bot_input_ids)


            # DialoGPT variant (disabled):
            # encode the new Leolani input, add the eos_token and return a PyTorch tensor
            ##new_input_ids = tokenizer.encode(leolani_prompt + tokenizer.eos_token, return_tensors='pt')
            # append the new user input tokens to the chat history
            ##bot_input_ids = torch.cat([chat_history_ids, new_input_ids], dim=-1)
            # generate a response while limiting the total chat history to max_context tokens
            ##chat_history_ids = model.generate(bot_input_ids, max_length=max_context, pad_token_id=tokenizer.eos_token_id)
        else:
            print('\n\t'+HUMAN_NAME + ": " + utterance)
            textSignal = d_util.create_text_signal_with_speaker_annotation(scenario_ctrl, utterance, HUMAN_ID)
            scenario_ctrl.append_signal(textSignal)

            #### Process input and generate reply

            capsule, leolani_prompt, response_json = talk.process_text_and_reply(scenario_ctrl,
                                   place_id,
                                   location,
                                   HUMAN_ID,
                                   textSignal,
                                   chat,
                                   replier,
                                   my_brain,
                                   print_details)

            if not capsule:
                replies, response_json = talk.process_text_spacy_and_reply(scenario_ctrl,
                                   place_id,
                                   location,
                                   HUMAN_ID,
                                   textSignal,
                                   chat,
                                   replier,
                                   my_brain,
                                   nlp,
                                   print_details)
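                # the first call may not have produced a reply; in that case start from an empty string
                leolani_prompt = leolani_prompt or ""
                for reply in replies:
                    leolani_prompt += reply + "."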

            print('\n\t'+AGENT + ": " + leolani_prompt)
            textSignal = d_util.create_text_signal_with_speaker_annotation(scenario_ctrl, leolani_prompt, AGENT)
            scenario_ctrl.append_signal(textSignal)
            
            #BLENDERBOT
            bot_input_ids = tokenizer(leolani_prompt, return_tensors='pt')
            chat_history_ids = model.generate(**bot_input_ids)


        # DialoGPT variant (disabled):
        # encode the new Leolani input, add the eos_token and return a PyTorch tensor
        ##new_input_ids = tokenizer.encode(leolani_prompt + tokenizer.eos_token, return_tensors='pt')
        # append the new user input tokens to the chat history
        ##bot_input_ids = torch.cat([chat_history_ids, new_input_ids], dim=-1)
        # generate a response while limiting the total chat history to max_context tokens
        ##chat_history_ids = model.generate(bot_input_ids, max_length=max_context, pad_token_id=tokenizer.eos_token_id)
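
Two details of the interaction loop can be isolated as small pure functions: removing `<s>` and `</s>` markers from a decoded reply, and skipping sentences that were already said. Both helpers are hypothetical illustrations, not part of the chatbots utilities. The sketch assumes Python 3.9 or later for `removeprefix` and `removesuffix`, which remove an exact string, whereas `str.strip` treats its argument as a set of characters.

```python
from typing import List, Set


def remove_markers(decoded: str) -> str:
    """Drop a leading <s> and a trailing </s> without touching the text itself."""
    return decoded.strip().removeprefix("<s>").removesuffix("</s>").strip()


def drop_repeats(sentences: List[str], seen: Set[str]) -> List[str]:
    """Return the sentences not said before and remember them in `seen`."""
    fresh = []
    for sentence in sentences:
        if sentence in seen:
            continue
        seen.add(sentence)
        fresh.append(sentence)
    return fresh


raw = "<s> I like cats</s>"
print(repr(raw.strip("</s>")))   # character-set stripping also eats the final "s"
print(repr(remove_markers(raw)))

seen: Set[str] = set()
first = drop_repeats(["Hello there.", "How are you?"], seen)
second = drop_repeats(["How are you?", "I like cats."], seen)
print(first, second)
```

Keeping the seen sentences in a set makes the membership test constant-time on average, which matters little for short dialogues but keeps the check cheap over an hour-long run.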


## Save the scenario data

In [None]:
scenario_ctrl.scenario.ruler.end = int(time.time() * 1e3)
scenarioStorage.save_scenario(scenario_ctrl)

## End of notebook