# Implementing Blenderbot 2.0 and DialoGPT Chatbots

Chatbots have gained a lot of popularity in recent years. As interest in using chatbots for business grows, researchers have also done a great job advancing conversational AI.

In this tutorial, we'll use the Hugging Face transformers library to run the pre-trained Blenderbot and DialoGPT models for conversational response generation. We will then have a brief conversation with each about random subjects and judge for ourselves whether the responses make sense.

First, I should mention that I chose to quote a significant amount of the model explanations from the official publications because they were extraordinarily well done. All merit goes to their authors; I only did the implementation. The webpages are referenced at the end of the notebook if you want more information.

Without further ado, let's get started!

In [1]:
!pip install transformers



# Blenderbot 2.0 model

Facebook AI Research has built and open-sourced BlenderBot 2.0, the first chatbot that can simultaneously build long-term memory it can continually access, search the internet for timely information, and have sophisticated conversations on nearly any topic. It’s a significant update to the original BlenderBot, which we open-sourced in 2020 and which broke ground as the first to combine several conversational skills — like personality, empathy, and knowledge — into a single system.

- When talking to people, BlenderBot 2.0 demonstrated that it’s better at conducting longer, more knowledgeable, and factually consistent conversations over multiple sessions than its predecessor, the existing state-of-the-art chatbot.

- The model takes pertinent information gleaned during conversation and stores it in a long-term memory so it can then leverage this knowledge in ongoing conversations that may continue for days, weeks, or even months. The knowledge is stored separately for each person it speaks with, which ensures that no new information learned in one conversation is used in another.

- During conversation, the model can generate contextual internet search queries, read the results, and incorporate that information when responding to people’s questions and comments. This means the model stays up-to-date in an ever-changing world.

- Today we’re releasing the complete model, code, and evaluation setup, as well as two new conversational data sets — human conversations bolstered by internet searches, and multisession chats with people that reference previous sessions — used to train the model, so other researchers can reproduce this work and advance conversational AI research.

Current language-generation models such as GPT-3 and Facebook AI’s first version of BlenderBot can articulately express themselves, at least in the context of ongoing conversations, and generate realistic-looking text. But they suffer from very short “goldfish memory,” and any long-term memory they do have is static — it’s limited to what they’ve been previously taught. They can never gain additional knowledge, which is why GPT-3 and BlenderBot believe that NFL superstar Tom Brady is still on the New England Patriots, and don’t know that he won the 2021 Super Bowl with the Tampa Bay Buccaneers. Similarly, it knows about past popular TV shows and movies, but isn’t aware of new series, like WandaVision.

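Finally, the model is available on Hugging Face in several versions that differ in the number of parameters. As far as I can tell, these checkpoints are published under the original BlenderBot name, so it is worth checking each model card to confirm which version of the model it corresponds to: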

- facebook/blenderbot_small-90M
- facebook/blenderbot-400M-distill
- facebook/blenderbot-1B-distill
- facebook/blenderbot-3B

Which one to use depends on our compute resources, RAM being the most important. Fortunately, Kaggle allows loading the biggest one, so we will choose it as the largest and most powerful model.
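To get a rough feeling for why RAM matters here, the sketch below estimates the memory needed just to hold the weights. The parameter counts are nominal values read off the checkpoint names (an assumption, not measured), and the estimate ignores activations and any other overhead, so treat it as a lower bound rather than a precise figure.

```python
# Nominal parameter counts taken from the checkpoint names (assumed, not measured).
NOMINAL_PARAMS = {
    "facebook/blenderbot_small-90M": 90e6,
    "facebook/blenderbot-400M-distill": 400e6,
    "facebook/blenderbot-1B-distill": 1e9,
    "facebook/blenderbot-3B": 3e9,
}


def estimate_weights_gb(n_params, bytes_per_param=4):
    """Approximate memory for the weights alone, in GB (float32 by default)."""
    return n_params * bytes_per_param / 1e9


for name, n_params in NOMINAL_PARAMS.items():
    print(f"{name}: ~{estimate_weights_gb(n_params):.1f} GB in float32")
```

Passing bytes_per_param=2 gives the corresponding estimate for half precision.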

In [2]:
from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration
import torch

We load the pre-trained weights from Hugging Face into the tokenizer and model classes for Blenderbot.

In [3]:
model_blender = BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-3B")
tokenizer_blender = BlenderbotTokenizer.from_pretrained("facebook/blenderbot-3B")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


The following lines create a loop in which the input message is tokenized and passed to the model to generate a response, which is then decoded back into natural language:

In [5]:
print("Type \"q\" to quit")
while True:
    message = input("MESSAGE: ")
    if message in ["", "q", "quit"]:
        break
    inputs = tokenizer_blender([message], return_tensors='pt')
    reply_ids = model_blender.generate(**inputs)
    print(f"Blenderbot 2.0 response:     {tokenizer_blender.batch_decode(reply_ids, skip_special_tokens=True)[0]}")

Type "q" to quit


MESSAGE:  ""


Blenderbot 2.0 response:      Do you know how to use quotation marks in a sentence? I do not.


MESSAGE:  No


Blenderbot 2.0 response:      Do you think it's a good idea? I'm not sure if I should do it


MESSAGE:  yes


Blenderbot 2.0 response:      Do you know anyone who has done it? I'm not sure if I am


MESSAGE:  yes


Blenderbot 2.0 response:      Do you know anyone who has done it? I'm not sure if I am


MESSAGE:  no


Blenderbot 2.0 response:      Do you know anyone who has done it? I know it's not for everyone.


MESSAGE:  no


Blenderbot 2.0 response:      Do you know anyone who has done it? I know it's not for everyone.


MESSAGE:  q


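Well, the model did not follow the conversation history correctly. It was not able to answer about the current number of COVID-19 deaths, and by the end it had completely lost the thread. One likely reason is that the loop above passes only the latest message to the model, so it never sees the earlier turns.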
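A possible way to give Blenderbot some context is to join the most recent turns into a single input string before tokenizing. The helper below is only a sketch: the function name, the separator, and the number of turns kept are my own assumptions and are not taken from the Blenderbot documentation, and the turns are made up for illustration. The model only accepts a limited number of input tokens, so max_turns is kept small.

```python
def build_history_prompt(turns, max_turns=4, sep="  "):
    """Join the most recent turns into one input string.

    `max_turns` and `sep` are illustrative assumptions, not documented values.
    """
    recent = turns[-max_turns:] if max_turns > 0 else []
    return sep.join(turn.strip() for turn in recent)


# Made-up turns, alternating user and bot, oldest first.
turns = [
    "Hello, do you like football?",
    "I do! I watch it every weekend.",
    "Which team do you support?",
]

prompt = build_history_prompt(turns, max_turns=2)
print(prompt)

# In the loop above, `prompt` would take the place of `message`:
# inputs = tokenizer_blender([prompt], return_tensors='pt')
```

After each reply, both the user message and the decoded response would be appended to turns so that the next prompt includes them.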

It is also important to note that the model took around 30 seconds to respond to each input. This is a considerably long time, since a customer would not be willing to wait that long for an answer to a question, so reducing it is the next challenge to face.
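Before trying to reduce the response time, it helps to measure it consistently. The sketch below wraps any call in a timer; fake_generate is a hypothetical stand-in for model_blender.generate so that the example runs without downloading the model.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_generate(text):
    """Hypothetical stand-in for the real generate call."""
    time.sleep(0.01)
    return text.upper()


reply, seconds = timed(fake_generate, "hello")
print(f"reply={reply!r} took {seconds:.3f}s")
```

In the notebook, the same wrapper could be used as timed(model_blender.generate, **inputs) to compare model sizes or hardware settings.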

# DialoGPT model

DialoGPT is a large-scale tunable neural conversational response generation model trained on 147M conversations extracted from Reddit. The good thing is that you can fine-tune it with your dataset to achieve better performance than training from scratch.

There are three versions of DialoGPT: small, medium, and large. Fortunately, Kaggle didn't have any problem loading the largest version, so that is the one we will use.

The process is exactly the same as before. First, we import the tokenizer and model classes and load the pre-trained DialoGPT into them:

In [6]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "microsoft/DialoGPT-large"
# model_name = "microsoft/DialoGPT-medium"
# model_name = "microsoft/DialoGPT-small"

tokenizer_GPT = AutoTokenizer.from_pretrained(model_name)
model_GPT = AutoModelForCausalLM.from_pretrained(model_name)


The process is longer than for the previous model, but the logic is the same: the input message is encoded with the tokenizer, concatenated with the previous messages, and passed to the model to generate a response, which is finally decoded back to natural language.
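The bookkeeping behind this can be sketched with plain Python lists instead of tensors. Everything here is illustrative: fake_generate is a hypothetical stand-in for model_GPT.generate, the token ids are made up, and EOS is assumed to be the end-of-text id of the GPT-2 vocabulary.

```python
EOS = 50256  # assumed end-of-text id of the GPT-2 vocabulary


def fake_generate(input_ids):
    """Hypothetical stand-in for model_GPT.generate: returns prompt + a fixed reply."""
    return input_ids + [11, 12, EOS]


history = []
for user_ids in ([1, 2, 3], [4, 5]):      # made-up token ids for two user turns
    new_input = user_ids + [EOS]          # user turn followed by the end-of-text token
    bot_input = history + new_input       # prepend everything said so far
    history = fake_generate(bot_input)    # output contains the prompt plus the reply
    reply = history[len(bot_input):]      # keep only the newly generated ids
    print(reply)
```

The important detail is that the generated sequence includes the prompt, so the reply is obtained by slicing off the first len(bot_input) ids.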

Sampling from the full vocabulary, including words with very low probabilities, can lead to random-looking generations.

To improve the output further, we can:

- Lower the sampling temperature, which decreases the likelihood of picking low-probability words and increases the likelihood of picking high-probability ones.
- Use top-k sampling, which restricts sampling to the k most probable words instead of the whole vocabulary, so low-probability words are discarded. Note that the cell below sets top_k to 0, which turns this filter off and leaves only top-p active.
- Use nucleus (top-p) sampling, which samples from the smallest possible set of words whose cumulative probability exceeds the parameter p we set.
- Set do_sample to True to enable sampling.
- Set skip_special_tokens to True when decoding to make sure we don't see any annoying special tokens such as <|endoftext|>.
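To make these options more concrete, here is a small pure-Python sketch of temperature scaling, top-k filtering, and top-p filtering over a toy list of logits. It is a simplified illustration and not a reproduction of how the transformers library implements them; in particular, both filters are applied to the same distribution and their results are intersected. The function names and the toy logits are my own.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Return indices of the k most likely tokens; k <= 0 keeps every token."""
    if k <= 0 or k >= len(probs):
        return list(range(len(probs)))
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def top_p_filter(probs, p):
    """Return the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept


def sample(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token index after applying temperature, top-k and top-p."""
    probs = softmax(logits, temperature)
    allowed = set(top_k_filter(probs, top_k)) & set(top_p_filter(probs, top_p))
    candidates = sorted(allowed)
    weights = [probs[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a four-word vocabulary
token = sample(logits, temperature=0.75, top_k=0, top_p=0.95)
print(token)
```

With these toy logits, the least likely word falls outside the top-p set, so it can never be sampled, which is exactly the effect we want for discarding unlikely continuations.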


In [7]:
step = 0

print("Type \"q\" to quit")
while True:
    text = input("MESSAGE: ")
    if text in ["", "q", "quit"]:
        break
    # encode the input and add end of string token
    input_ids = tokenizer_GPT.encode(text + tokenizer_GPT.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model_GPT.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_p=0.95,
        top_k=0,
        temperature=0.75,
        pad_token_id=tokenizer_GPT.eos_token_id
    )
    #print the output
    output = tokenizer_GPT.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    step = step + 1
    print(f"DialoGPT response: {output}")

Type "q" to quit


MESSAGE:  ""


DialoGPT response: I feel like you might be out of your element.


MESSAGE:  how you know?


DialoGPT response: You looked at my flair...


MESSAGE:  seriously?


DialoGPT response: I am a guy that knows many people on the server.


MESSAGE:  how to trust you?


DialoGPT response: I am not the person you are asking.


MESSAGE:  great


DialoGPT response: I know you know me


MESSAGE:  yes


DialoGPT response: greatness intensifies


MESSAGE:  not really


DialoGPT response: hehe is soooooo adorable


MESSAGE:  getting bored now


DialoGPT response: He is indeed adorable


MESSAGE:  q


The model has apparently chosen a not very friendly persona, which is why it gave pessimistic responses and did not want to say much. I think it has the same memory problem as the previous model.

The only advantage of this one is that it took around 7 seconds to respond to each input message.

We should look at these technologies optimistically. At first sight, both models clearly need fine-tuning on datasets about specific subjects to respond more appropriately to the user, but this obviously depends on the purpose a company has in mind for them.

Finally, I have to give credit to the publications from which I gathered and quoted a large part of the explanations. They inspired me to tackle this fascinating field, and I absolutely encourage you to take a look at their content:

https://ai.facebook.com/blog/blender-bot-2-an-open-source-chatbot-that-builds-long-term-memory-and-searches-the-internet/

https://www.thepythoncode.com/article/conversational-ai-chatbot-with-huggingface-transformers-in-python


I would welcome any feedback on how to improve the performance of these models, and please tell me if you have found a different one that works even better!

If you liked this notebook and want to see more projects and tutorials like this one, I would really appreciate your upvote. I also encourage you to look at my projects portfolio; I am sure you will love it.

Thank you!