# Creating a simple chatbot with open-source LLMs using Python and Hugging Face

In this notebook, we will create a very simple but functional chatbot!

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX04ESEN/images/DALL%C2%B7E%202023-06-06%2009.38.20%20-%20a%20robot%20driving%20a%20convertible%20sports%20car%20towards%20the%20sunset%2C%20digital%20art.png" width="400" alt="robot driving car">

## Learning outcomes:

By the end of this lab, you will understand:

- the main components of a chatbot
- what an LLM is
- how to choose an LLM for your application
- how a transformer essentially works
- how to feed input into a transformer (tokenization)
- how to program your own simple chatbot in Python

## Introduction: Under the hood of a chatbot


### Intro: How does a chatbot work?

A chatbot is a computer program that takes a text input, and returns a corresponding text output.

Chatbots use a special kind of computer program called a transformer, which acts like their brain. Inside this brain, there is something called a large language model (LLM), which helps the chatbot understand and generate human-like responses. It draws on the many examples of human text it was trained on to respond in a way that makes sense.

Transformers and LLMs work together within a chatbot to enable conversation. Here's a simplified explanation of how they interact:

1. **Input Processing**: When you send a message to the chatbot, the transformer helps process your input. It breaks down your message into smaller parts and represents them in a way that the chatbot can understand. Each part is called a token.
2. **Understanding Context**: The transformer passes these tokens to the LLM, which is a language model trained on lots of text data. The LLM has learned patterns and meanings from this data, so it tries to understand the context of your message based on what it has learned.
3. **Generating Response**: Once the LLM understands your message, it generates a response based on its understanding. The transformer then takes this response and converts it into a format that can be easily sent back to you.
4. **Iterative Conversation**: As the conversation continues, this process repeats. The transformer and LLM work together to process each new input message, understand the context, and generate a relevant response.

The key is that the LLM learns from a large amount of text data to understand language patterns and generate meaningful responses. The transformer helps with the technical aspects of processing and representing the input/output data, allowing the LLM to focus on understanding and generating language.

Once the chatbot understands your message, it uses the language model to generate a response that it thinks will be helpful or interesting to you. The response is sent back to you, and the process continues as you have a back-and-forth conversation with the chatbot.
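
To make the flow above concrete, here is a minimal sketch in plain Python. It is not how the `transformers` library works internally: `VOCAB`, `encode`, `decode`, and `toy_model` are hypothetical names, the vocabulary is a made-up whitespace vocabulary, and the "model" is a stub that always returns the same reply. It only illustrates the round trip from text to tokens, through a model, and back to text.

```python
# Toy illustration of the chatbot flow: text -> token IDs -> model -> token IDs -> text.
# Everything here is hypothetical; real tokenizers use learned subword vocabularies.

VOCAB = ["<unk>", "hello", "how", "are", "you", "i", "am", "fine"]
TOKEN_TO_ID = {token: idx for idx, token in enumerate(VOCAB)}


def encode(text: str) -> list[int]:
    """Split on whitespace and map each word to its ID (0 for unknown words)."""
    return [TOKEN_TO_ID.get(word, 0) for word in text.lower().split()]


def decode(token_ids: list[int]) -> str:
    """Map IDs back to words and join them with spaces."""
    return " ".join(VOCAB[i] for i in token_ids)


def toy_model(token_ids: list[int]) -> list[int]:
    """Stand-in for a language model: always answers 'i am fine'."""
    return encode("i am fine")


prompt_ids = encode("Hello how are you")
reply = decode(toy_model(prompt_ids))
print(prompt_ids)  # [1, 2, 3, 4]
print(reply)       # i am fine
```

In the real chatbot we build below, the tokenizer and the model downloaded from Hugging Face take the place of these toy functions.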

### Intro: Hugging Face

Hugging Face is an organization that focuses on natural language processing (NLP) and AI. They provide a variety of tools, resources, and services to support NLP tasks.

We'll be making use of their Python library `transformers`, as you'll see soon.

Alright! Now that we know how a chatbot works at a high-level, let's get started with implementing a simple chatbot!

## Step 1: Installing Requirements

For this example, we will be using the `transformers` library, which is an open-source natural language processing (NLP) toolkit with many useful features.


In [1]:
pip install transformers



## Step 2: Import our required tools from the transformers library

For this example, we will be using `AutoTokenizer` and `AutoModelForSeq2SeqLM` from the `transformers` library.


In [2]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

## Step 3: Choosing a model

Choosing the right model for your purposes is an important part of building chatbots! You can read about the different types of models available on the Hugging Face website: https://huggingface.co/models.

LLMs differ from each other in how they are trained. Let's look at some examples to see how different models fit better in various contexts.

- **Text Generation**:
    If you need a general-purpose text generation model, consider using the GPT-2 or GPT-3 models. They are known for their impressive language generation capabilities.
    Example: You want to build a chatbot that generates creative and coherent responses to user input.

- **Sentiment Analysis**:
    For sentiment analysis tasks, models like BERT or RoBERTa are popular choices. Once fine-tuned on labeled examples, they can classify the sentiment and emotional tone of text.
    Example: You want to analyze customer feedback and determine whether it is positive or negative.

- **Named Entity Recognition**:
    LLMs such as BERT, GPT-2, or RoBERTa can be used for Named Entity Recognition (NER) tasks. They perform well in understanding and extracting entities like person names, locations, organizations, etc.
    Example: You want to build a system that extracts names of people and places from a given text.

- **Question Answering**:
    Models like BERT, GPT-2, or XLNet can be effective for question answering tasks. They can comprehend questions and provide accurate answers based on the given context.
    Example: You want to build a chatbot that can answer factual questions from a given set of documents.

- **Language Translation**:
    For language translation tasks, you can consider models like MarianMT or T5. They are designed specifically for translating text between different languages.
    Example: You want to build a language translation tool that translates English text to French.

However, these examples are very limited and the fit of an LLM may depend on many factors such as data availability, performance requirements, resource constraints, and domain-specific considerations. It's important to explore different LLMs thoroughly and experiment with them to find the best match for your specific application.

Other important factors that should be taken into consideration when choosing an LLM include (but are not limited to):
- Licensing: Ensure you are allowed to use your chosen model the way you intend
- Model size: Larger models may be more accurate, but might also come at the cost of greater resource requirements
- Training data: Ensure that the model's training data aligns with the domain or context you intend to use the LLM for
- Performance and accuracy: Consider factors like accuracy, runtime, or any other metrics that are important for your specific use case
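
As a rough illustration of the model size point, the memory needed just to hold a model's weights is approximately the number of parameters times the bytes used per parameter. The sketch below is a back-of-the-envelope estimate only: `estimate_weight_memory_gb` is a hypothetical helper, the parameter counts are illustrative rather than measurements of any specific model, and real usage is higher once activations and framework overhead are included.

```python
def estimate_weight_memory_gb(num_parameters: int, bytes_per_parameter: int = 4) -> float:
    """Rough memory needed just to hold the weights, in gigabytes (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Illustrative parameter counts, not measurements of any specific model.
for params in (400_000_000, 3_000_000_000):
    fp32 = estimate_weight_memory_gb(params)     # 4 bytes per weight (float32)
    fp16 = estimate_weight_memory_gb(params, 2)  # 2 bytes per weight (float16)
    print(f"{params:>13,} parameters: ~{fp32:.1f} GB (float32), ~{fp16:.1f} GB (float16)")
```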

To explore all the different options, check out the available [models on the Hugging Face website](https://huggingface.co/models).

For this example, we'll be using "facebook/blenderbot-400M-distill" because it has an open-source license and runs relatively fast.


In [3]:
model_name = "facebook/blenderbot-400M-distill"

## Step 4: Fetch the model and initialize a tokenizer

When running this code for the first time, the host machine will download the model from the Hugging Face Hub.
After that first run, the script will not re-download the model and will instead load it from the local cache.

We'll be looking at two terms here: `model` and `tokenizer`.

In this script, we initiate variables using two handy classes from the `transformers` library:
- `model` is an instance of the class `AutoModelForSeq2SeqLM`, which allows us to interact with our chosen language model.
- `tokenizer` is an instance of the class `AutoTokenizer`, which prepares our input for the language model. It does so by converting our text input to "tokens", the numeric form in which the model reads text.


In [4]:
# Load model (downloaded on the first run, loaded from the local cache on subsequent runs)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


## Step 5: Chat

Now that we're all set up, let's start chatting!

There are several things we'll do to have an effective conversation with our chatbot.

Before interacting with our model, we need to initialize an object where we can store our conversation history:
1. Initialize object to store conversation history

Afterwards, we'll do the following for each interaction with the model:
2. Encode conversation history as a string
3. Fetch prompt from user
4. Tokenize prompt and history
5. Generate output from model using prompt and history
6. Decode output
7. Update conversation history

### Step 5.1: Keeping track of conversation history

The conversation history is important when interacting with a chatbot because the chatbot will also reference the previous conversations when generating output.

For our simple implementation in Python, we may simply use a list. Per the Hugging Face implementation, we will use this list to store the conversation history as follows:

```
conversation_history

>> [input_1, output_1, input_2, output_2, ...]
```

Let's initialize this list before any conversations occur.


In [38]:
conversation_history = []
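
Because inputs and outputs alternate in this list, even positions hold user messages and odd positions hold chatbot replies. The short sketch below shows one way to read the list back as turns. `as_turns` is a hypothetical helper and `example_history` contains made-up messages; neither is part of the `transformers` library.

```python
def as_turns(history: list[str]) -> list[tuple[str, str]]:
    """Group [input_1, output_1, input_2, output_2, ...] into (input, output) pairs."""
    return list(zip(history[0::2], history[1::2]))


# Made-up messages, used only to show the layout of the list.
example_history = [
    "hello, how are you doing?",
    "I'm doing well.",
    "what do you like to do?",
    "I like reading.",
]
for user_message, bot_reply in as_turns(example_history):
    print(f"user: {user_message}")
    print(f"bot:  {bot_reply}")
```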

### Step 5.2: Encoding the conversation history

During each interaction, we will pass our conversation history to the model along with our input so that it may also reference the previous conversation when generating the next answer.

The `transformers` library function we are using expects to receive the conversation history as a string, with each element separated by the newline character `'\n'`. Thus, we create such a string.

We'll use the `join()` method in Python to do exactly that. (Initially, our `history_string` will be an empty string, which is okay; it will grow as the conversation goes on.)


In [39]:
history_string = "\n".join(conversation_history)
history_string

''
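
To see what the string looks like once the conversation is under way, here is the same `join()` call applied to a non-empty list. The messages in `sample_history` are made up for illustration.

```python
# Made-up history with one exchange, used only to show the joined format.
sample_history = ["hello, how are you doing?", "I'm doing well."]
sample_history_string = "\n".join(sample_history)

# repr() makes the newline separator visible.
print(repr(sample_history_string))
# "hello, how are you doing?\nI'm doing well."
```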

### Step 5.3: Fetch prompt from user

Before we start building a simple terminal chatbot, let's use a fixed prompt as an example. The input will be:


In [40]:
input_text = "hello, how are you doing?"
input_text

'hello, how are you doing?'

### Step 5.4: Tokenization of User Prompt and Chat History


Tokens in NLP are the individual units that text or sentences are divided into. Tokenization splits text into these tokens, and each token is then mapped to a numeric ID so that the model can process it. In NLP tasks, we often use the `encode_plus` method from the `tokenizer` object to perform both steps. Let's encode our inputs (prompt & chat history) as tokens so that we may pass them to the model.


In [41]:
inputs = tokenizer.encode_plus(history_string, input_text, return_tensors="pt")
inputs

{'input_ids': tensor([[1710,   86,   19,  544,  366,  304,  929,   38]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}

In doing so, we've now created a Python dictionary-like object whose keys, `input_ids` and `attention_mask`, are the names the model expects for its inputs.
To learn more about tokens and their associated pretrained vocabulary files, you can explore the `pretrained_vocab_files_map` attribute, which maps pretrained models to their corresponding vocabulary files. Depending on the tokenizer and library version, it may be empty, as in the output below.


In [42]:
tokenizer.pretrained_vocab_files_map

{}
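
The `attention_mask` in our encoded inputs is all ones because we encoded a single sequence, so every position holds a real token. The mask matters when several sequences of different lengths are batched together: shorter ones are padded, and the mask marks padding positions with 0 so the model can ignore them. The sketch below illustrates the idea in plain Python. `pad_batch` is a hypothetical helper, and the padding ID of 0 is an assumption for illustration.

```python
def pad_batch(sequences: list[list[int]], pad_id: int = 0) -> dict[str, list[list[int]]]:
    """Pad token-ID lists to the same length and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = max_len - len(seq)
        input_ids.append(seq + [pad_id] * padding)             # real tokens, then padding
        attention_mask.append([1] * len(seq) + [0] * padding)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[1710, 86, 19], [544, 366]])
print(batch["input_ids"])       # [[1710, 86, 19], [544, 366, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```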



### Step 5.5: Generate output from model

Now that we have our inputs ready, both past and present inputs, we can pass them to the model and generate a response. According to the documentation, we can use the `generate()` function and pass the inputs as keyword arguments ([kwargs](https://www.freecodecamp.org/news/args-and-kwargs-in-python/)).


In [43]:
outputs = model.generate(**inputs)
outputs

tensor([[   1,  281,  476,  929,  731,   21,  281,  632,  929,  712,  731,   21,
          855,  366,  304,   38,  946,  304,  360,  463, 5459, 7930,   38,    2]])

Great - now we have our outputs! However, `outputs` is a tensor of token IDs, not words in plaintext.
Therefore, we just need to decode the first sequence in `outputs` to see the response in plaintext.
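
Conceptually, text generation produces one token at a time, feeding what has been generated so far back in until an end-of-sequence token appears or a length limit is reached. The sketch below shows that loop in its simplest form. It is a simplification under stated assumptions, not the library's implementation: `generate_tokens` and `scripted_next_token` are hypothetical, the start and end IDs are assumed values for this toy example, and the real `generate()` may use more elaborate strategies such as beam search or sampling.

```python
from typing import Callable

BOS_ID, EOS_ID = 1, 2  # assumed start/end-of-sequence IDs for this toy example


def generate_tokens(next_token: Callable[[list[int]], int], max_new_tokens: int = 10) -> list[int]:
    """Append one token at a time until EOS is produced or the budget is spent."""
    generated = [BOS_ID]
    for _ in range(max_new_tokens):
        token_id = next_token(generated)
        generated.append(token_id)
        if token_id == EOS_ID:
            break
    return generated


def scripted_next_token(generated: list[int]) -> int:
    """Hypothetical stand-in for a model: replays a fixed script, then emits EOS."""
    script = [281, 476, 929, 731]
    position = len(generated) - 1  # number of tokens produced after BOS
    return script[position] if position < len(script) else EOS_ID


print(generate_tokens(scripted_next_token))  # [1, 281, 476, 929, 731, 2]
```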

### Step 5.6: Decode output

We may decode the output using `tokenizer.decode()`. This is known as "detokenization" or "reconstruction". It is the process of combining or merging individual tokens back into their original form, typically to reconstruct the original text or sentence.


In [44]:
response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
response

"I'm doing well. I am doing very well. How are you? Do you have any hobbies?"

Alright! We've successfully had an interaction with our chatbot! We've given it a prompt, and we received its response.

Now, all that's left to do is to update our conversation history, so that we may pass it with the next iteration.

### Step 5.7: Update Conversation History

All we need to do here is add both the input and response to `conversation_history` in plaintext.


In [45]:
conversation_history.append(input_text)
conversation_history.append(response)
conversation_history

['hello, how are you doing?',
 "I'm doing well. I am doing very well. How are you? Do you have any hobbies?"]

## Step 6: Repeat

We have gone through all the steps of interacting with our chatbot. Now, we can put everything in a loop and run a whole conversation! (Please note that the model takes some time to respond.)


In [46]:
while True:
    # Create conversation history string
    history_string = "\n".join(conversation_history)

    # Get the input data from the user
    input_text = input("> ")

    # If the user types 'exit', 'quit', 'stop', or 'bye', leave the loop
    if input_text.lower() in ["exit", "quit", "stop", "bye"]:
        print("👋 Conversation ended.")
        break

    # Tokenize the input text and history
    inputs = tokenizer.encode_plus(history_string, input_text, return_tensors="pt")

    # Generate the response from the model
    outputs = model.generate(**inputs)

    # Decode the response
    response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
    print(response)

    # Add interaction to conversation history
    conversation_history.append(input_text)
    conversation_history.append(response)

> hi
That's great! I'm doing pretty well as well. What hobbies do you have?
> running and play football
I love running, but I'm not very good at it. I'm more of an outdoorsy person.
> so you like traveling
Yes, I love traveling. What is your favorite place you have visited so far?
> mount Bromo, that was beautiful place in Indonesia
I have never been there. I would love to go someday. My favorite place I have been to is Hawaii.
> bye
👋 Conversation ended.


Voila! We have built a simple, functional chatbot that we can interact with through our terminal!
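
One limitation of the loop above is that `conversation_history` grows with every exchange, while models can only accept a limited number of input tokens. A simple mitigation is to keep only the most recent messages. The sketch below shows that idea in plain Python so it can run without downloading a model. `chat_turn`, `echo_reply`, and the `max_messages` limit of 6 are hypothetical choices for illustration; in the real chatbot, the tokenize, generate, and decode steps from Step 5 would take the place of `echo_reply`.

```python
from typing import Callable


def chat_turn(
    history: list[str],
    user_text: str,
    generate_reply: Callable[[str, str], str],
    max_messages: int = 6,
) -> str:
    """Run one exchange and keep only the most recent messages in history."""
    history_string = "\n".join(history)
    reply = generate_reply(history_string, user_text)
    history.extend([user_text, reply])
    # Drop the oldest messages in place so the prompt cannot grow without bound.
    if len(history) > max_messages:
        del history[: len(history) - max_messages]
    return reply


def echo_reply(history_string: str, user_text: str) -> str:
    """Hypothetical stand-in for tokenize -> model.generate -> decode."""
    return f"you said: {user_text}"


history: list[str] = []
for text in ["hi", "how are you?", "tell me more", "bye"]:
    chat_turn(history, text, echo_reply)
print(len(history))  # 6
print(history[0])    # how are you?
```

Trimming by message count is the simplest option; trimming by token count would track the model's actual input limit more closely.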


### Authors

J.C.(Junxing) Chen  

Sina Nazeri


### Modified by
Luthfi Krisna Bayu
