
## Create a Simple Chatbot with Open Source LLMs using Python and Hugging Face ##

### Learning outcomes: ###

At the end of this lab, you will be able to:

* Describe the main components of a chatbot
* Explain what an LLM is
* Select an LLM for your application
* Describe how a transformer essentially works
* Feed input into a transformer (tokenization)
* Program your own simple chatbot in Python

### Step 1: Installing requirements ###

Follow these steps to create a Python virtual environment and install the necessary libraries.

Set up your virtual environment:

In [None]:
!pip3 install virtualenv

In [None]:
!virtualenv my_env # create a virtual environment my_env

In [7]:
!source my_env/bin/activate # activate my_env (each ! command runs in its own subshell, so the activation does not carry over to later cells)
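
To confirm which Python environment the notebook kernel is actually running in, you can inspect the interpreter from Python itself. This is a minimal sketch using only the standard library; the variable name in_virtual_env is illustrative, and the check assumes an environment created with venv or a recent version of virtualenv.

```python
import sys

# sys.prefix points at the active environment, while sys.base_prefix points at
# the base Python installation. They differ inside a virtual environment.
in_virtual_env = sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_env)
```

If this prints False, the packages you install with pip go into the kernel's default environment rather than my_env.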

### Step 2: Import our required tools from the transformers library ###

For this example, you will be using AutoTokenizer and AutoModelForSeq2SeqLM from the transformers library.

In [8]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

### Step 3: Choosing a model ###

Choosing the right model for your purposes is an important part of building chatbots! You can read about the different types of models available on the Hugging Face website: https://huggingface.co/models.

LLMs differ from each other in how they are trained. Let's look at some examples to see how different models fit better in various contexts.

* Text generation: If you need a general-purpose text generation model, consider a model from the GPT family, such as GPT-2. (GPT-3 is a well-known larger successor, but it is offered through OpenAI's API rather than as downloadable open weights.)
Example: You want to build a chatbot that generates creative and coherent responses to user input.

* Sentiment analysis: For sentiment analysis tasks, models like BERT or RoBERTa are popular choices. When fine-tuned on labeled examples, they can classify the sentiment and emotional tone of text.
Example: You want to analyze customer feedback and determine whether it is positive or negative.

* Named entity recognition: LLMs such as BERT, GPT-2, or RoBERTa can be used for Named Entity Recognition (NER) tasks. They perform well in understanding and extracting entities like person names, locations, organizations, etc.
Example: You want to build a system that extracts names of people and places from a given text.

* Question answering: Models like BERT, GPT-2, or XLNet can be effective for question-answering tasks. They can comprehend questions and provide accurate answers based on the given context.
Example: You want to build a chatbot that can answer factual questions from a given set of documents.

* Language translation: For language translation tasks, you can consider models like MarianMT or T5. MarianMT is designed specifically for translation, while T5 is a general text-to-text model that can also translate between some languages.
Example: You want to build a language translation tool that translates English text to French.

However, these examples are very limited and the fit of an LLM may depend on many factors such as data availability, performance requirements, resource constraints, and domain-specific considerations. It's important to explore different LLMs thoroughly and experiment with them to find the best match for your specific application.

Other important factors to consider when choosing an LLM include (but are not limited to):

* Licensing: Ensure you are allowed to use your chosen model the way you intend

* Model size: Larger models may be more accurate, but might also come at the cost of greater resource requirements

* Training data: Ensure that the model's training data aligns with the domain or context you intend to use the LLM for

* Performance and accuracy: Consider factors like accuracy, runtime, or any other metrics that are important for your specific use case
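
To make the model-size trade-off concrete, here is a rough back-of-the-envelope sketch that estimates the memory needed just to hold a model's weights. The function name estimate_weight_memory_gb is hypothetical, and the parameter counts are nominal round numbers chosen for illustration, not measured values. Actual memory use is higher once activations and framework overhead are included.

```python
def estimate_weight_memory_gb(num_parameters: int, bytes_per_parameter: int = 4) -> float:
    """Rough memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Nominal parameter counts (assumptions for illustration only)
for label, params in [("~400M parameters", 400_000_000), ("~3B parameters", 3_000_000_000)]:
    fp32 = estimate_weight_memory_gb(params, bytes_per_parameter=4)  # 32-bit floats
    fp16 = estimate_weight_memory_gb(params, bytes_per_parameter=2)  # 16-bit floats
    print(f"{label}: {fp32:.1f} GB in float32, {fp16:.1f} GB in float16")
```

An estimate like this helps you judge whether a model is likely to fit on the machine you plan to run it on.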

To explore all the different options, check out the available models on the Hugging Face website.

For this example, you'll be using facebook/blenderbot-400M-distill because it has an open-source license and runs relatively fast.

In [9]:
model_name = "facebook/blenderbot-400M-distill"

### Step 4: Fetch the model and initialize a tokenizer ###
When you run this code for the first time, the host machine downloads the model from the Hugging Face Hub.
On later runs, the library loads the model from the local cache instead of downloading it again.

You'll be looking at two terms here: model and tokenizer.

In this script, you initialize two variables using two handy classes from the transformers library:

* model is an instance of the class AutoModelForSeq2SeqLM, which allows you to interact with your chosen language model.

* tokenizer is an instance of the class AutoTokenizer, which prepares your input for the language model. It does so by converting your text input to “tokens” (represented as numeric IDs), which is the form the model can process.



In [None]:
# Load model (downloaded on the first run, loaded from the local cache on subsequent runs)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

### Step 5: Chat ###
Now that you're all set up, let's start chatting!

There are several things you'll do to have an effective conversation with your chatbot.

Before interacting with your model, you need to initialize an object where you can store your conversation history.

* Initialize object to store conversation history

Afterward, you'll do the following for each interaction with the model:

* Encode conversation history as a string
* Fetch prompt from user
* Tokenize prompt and history
* Generate output from the model using prompt and history
* Decode output
* Update conversation history

### Step 5.1: Keeping track of conversation history ###

The conversation history is important when interacting with a chatbot because the chatbot also references earlier turns of the conversation when generating output.

For your simple implementation in Python, you may use a list. Per the Hugging Face implementation, you will use this list to store the conversation history as follows:

conversation_history

```python
[input_1, output_1, input_2, output_2, …]
```

Let's initialize this list before any conversations occur.

In [11]:
conversation_history = []

### Step 5.2: Encoding the conversation history ###

During each interaction, you will pass your conversation history to the model along with your input so that it may also reference the previous conversation when generating the next answer.

The transformers library function you are using expects to receive the conversation history as a single string, with each element separated by the newline character '\n'.

You'll use the join() method in Python to build that string. (Initially, your history_string will be an empty string, which is fine; it grows as the conversation goes on.)

In [12]:
history_string = "\n".join(conversation_history)
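
To see what this string looks like once the conversation has started, here is a small sketch with a made-up history. The example messages and the names example_history and example_history_string are illustrative only.

```python
# Hypothetical history after two exchanges: [input_1, output_1, input_2, output_2]
example_history = [
    "hello, how are you doing?",
    "I'm doing well. How are you?",
    "I'm fine, thanks. What do you do?",
    "I like to read books.",
]

# An empty list joins to an empty string, so the first turn needs no special case.
assert "\n".join([]) == ""

# Each message ends up on its own line of a single string.
example_history_string = "\n".join(example_history)
print(example_history_string)
```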

### Step 5.3: Fetch prompt from user ###

Before you start building a simple terminal chatbot, let's look at an example of the input:

In [13]:
input_text = "hello, how are you doing?"

### Step 5.4: Tokenization of user prompt and chat history ###

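Tokens in NLP are the individual units (words, subwords, or characters) that text is divided into. Tokenization is the process of splitting text into tokens; each token is then mapped to an integer ID from the model's vocabulary so that the text can be passed to the model as numbers.

Here, you use the encode_plus method of the tokenizer object to tokenize the text and convert it to IDs in one step. (In recent versions of transformers, calling the tokenizer object directly is the recommended equivalent.) Let's encode your inputs (prompt & chat history) as tokens so that you may pass them to the model.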

In [14]:
inputs = tokenizer.encode_plus(history_string, input_text, return_tensors="pt")
print(inputs)

{'input_ids': tensor([[1710,   86,   19,  544,  366,  304,  929,   38]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}
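
The sketch below illustrates the idea of mapping text to integer IDs with a deliberately simplified tokenizer that splits on whitespace and punctuation. It is not how the Blenderbot tokenizer works internally, since real tokenizers use a learned subword vocabulary. The vocabulary, the unknown-token entry, and the function name toy_encode are all assumptions made for illustration.

```python
# Toy vocabulary: each known token maps to an integer ID (made up for this example).
toy_vocab = {"<unk>": 0, "hello": 1, ",": 2, "how": 3, "are": 4, "you": 5, "doing": 6, "?": 7}


def toy_encode(text: str) -> dict:
    """Split text into tokens, then look up each token's ID."""
    # Separate punctuation from words so that it becomes its own token.
    for mark in ",?":
        text = text.replace(mark, f" {mark} ")
    tokens = text.lower().split()
    # Tokens missing from the vocabulary fall back to the unknown-token ID.
    input_ids = [toy_vocab.get(token, toy_vocab["<unk>"]) for token in tokens]
    # An attention mask of 1s marks every position as real input (no padding here).
    return {"tokens": tokens, "input_ids": input_ids, "attention_mask": [1] * len(input_ids)}


encoded = toy_encode("hello, how are you doing?")
print(encoded["tokens"])
print(encoded["input_ids"])
```

As in the real output above, the result pairs a list of IDs with an attention mask of the same length.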


To learn more about tokens and their associated pretrained vocabulary files, you can explore the pretrained_vocab_files_map attribute. This attribute provides a mapping of pretrained models to their corresponding vocabulary files. For this tokenizer, the mapping is empty.

In [15]:
tokenizer.pretrained_vocab_files_map

{}

### Step 5.5: Generate output from the model ###

Now that you have your inputs ready, both past and present, you can pass them to the model and generate a response. According to the documentation, you can call the generate() method and pass the inputs as keyword arguments (kwargs).

In [16]:
outputs = model.generate(**inputs)
print(outputs)

tensor([[   1,  281,  476,  929,  731,   21,  281,  632,  929,  712,  731,   21,
          855,  366,  304,   38,  946,  304,  360,  463, 5459, 7930,   38,    2]])


Great, now you have your outputs! However, outputs is a tensor of token IDs, not words in plaintext.

Therefore, you need to decode the first sequence in outputs (outputs[0]) to see the response in plaintext.

Please note that the model used in this project is a basic, lightweight version, not intended for handling complex queries. For more advanced and robust LLMs, you can explore a wide range of options at huggingface.co.

### Step 5.6: Decode output ###

You may decode the output using tokenizer.decode(). This is known as "detokenization" or "reconstruction". It is the process of converting token IDs back into tokens and merging them into a readable text or sentence.

In [17]:
response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
print(response)

I'm doing well. I am doing very well. How are you? Do you have any hobbies?
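
As a rough illustration of what skip_special_tokens=True does, the sketch below decodes a list of IDs with a made-up vocabulary and drops the start and end markers. The ID-to-token table, the special-token names, and the function toy_decode are hypothetical. The real tokenizer also merges subword pieces, which this sketch does not attempt.

```python
# Made-up ID-to-token table; 1 and 2 play the role of start/end markers here.
toy_id_to_token = {1: "<s>", 2: "</s>", 10: "i", 11: "am", 12: "doing", 13: "well"}
toy_special_ids = {1, 2}


def toy_decode(token_ids, skip_special_tokens=False):
    """Map IDs back to tokens and join them with spaces."""
    if skip_special_tokens:
        token_ids = [i for i in token_ids if i not in toy_special_ids]
    return " ".join(toy_id_to_token[i] for i in token_ids)


generated = [1, 10, 11, 12, 13, 2]
print(toy_decode(generated))                            # keeps the markers
print(toy_decode(generated, skip_special_tokens=True))  # text only
```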


### Step 5.7: Update conversation history ###

All you need to do here is add both the input and response to conversation_history in plaintext.

In [18]:
conversation_history.append(input_text)
conversation_history.append(response)
print(conversation_history)

['hello, how are you doing?', "I'm doing well. I am doing very well. How are you? Do you have any hobbies?"]


### Step 6: Repeat ###

You have gone through all the steps of interacting with your chatbot. Now, you can put everything in a loop and run a whole conversation!

In [None]:
while True:
    # Create conversation history string
    history_string = "\n".join(conversation_history)
    # Get the input data from the user
    input_text = input("> ")
    # Tokenize the input text and history
    inputs = tokenizer.encode_plus(history_string, input_text, return_tensors="pt")
    # Generate the response from the model
    outputs = model.generate(**inputs)
    # Decode the response
    response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

    print(response)
    # Add interaction to conversation history
    conversation_history.append(input_text)
    conversation_history.append(response)

> How are you?
That's great! I'm doing pretty well as well. What hobbies do you have?
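
The loop above runs until you interrupt it and lets the history grow without bound. One possible variation, sketched below, wraps a single turn in a function, keeps only the most recent messages, and stops on an exit command. The names chat_turn, run_chat, MAX_HISTORY_MESSAGES, and echo_reply are hypothetical, and echo_reply is a stand-in so that the sketch runs without downloading a model. In the notebook, you would substitute a function that tokenizes, calls model.generate(), and decodes as in Step 5.

```python
from typing import Callable, List

MAX_HISTORY_MESSAGES = 6  # assumption: keep the last 3 exchanges; tune for your model


def chat_turn(history: List[str], user_text: str, reply_fn: Callable[[str, str], str]) -> str:
    """Run one exchange and update history in place."""
    history_string = "\n".join(history)
    response = reply_fn(history_string, user_text)
    history.extend([user_text, response])
    # Drop the oldest messages so the input stays within a fixed budget.
    del history[:-MAX_HISTORY_MESSAGES]
    return response


def echo_reply(history_string: str, user_text: str) -> str:
    """Stand-in for the model: not a real chatbot, it just echoes the input."""
    return f"You said: {user_text}"


def run_chat(reply_fn: Callable[[str, str], str]) -> None:
    """Read prompts until the user types quit or exit."""
    history: List[str] = []
    while True:
        user_text = input("> ")
        if user_text.strip().lower() in {"quit", "exit"}:
            break
        print(chat_turn(history, user_text, reply_fn))


# run_chat(echo_reply)  # uncomment to chat interactively
```

Passing the reply function as an argument keeps the loop logic separate from the model, which makes it easy to try the loop with a different model later.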
