# Chatbot - Prompt Engineering

In this notebook, you'll build chatbot functionality: an AI bot that can maintain conversation history.

## Learning Objectives

By the end of this notebook, you will be able to:

- Build chatbot functionality on top of the Phi-3 model that preserves conversation history.

# <FONT COLOR="purple">Verify that the Colab runtime type is set to GPU!</FONT>

## Install Dependencies

In [None]:
# The 'device_map' parameter requires the Accelerate package.
# Restart the workspace after the install!
!pip install accelerate flash_attn

## Create Microsoft [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) Pipeline

In [None]:
import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
from transformers import TextStreamer

# Microsoft Phi-3-mini-4k-instruct model
model_name = "microsoft/Phi-3-mini-4k-instruct"

# The tokenizer converts text into token IDs that the model can process.
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model in half precision; device_map="auto" places it on the available device(s)
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             torch_dtype=torch.float16,
                                             device_map="auto",
                                             trust_remote_code=True,
                                             attn_implementation="eager")

# The streamer prints the response as it is being generated, so you do not have to wait for the full answer.
streamer = TextStreamer(tokenizer, skip_prompt=True)

## Generate Functions

In this notebook, we will use the `generate` function defined below to interact with the LLM. It builds its prompts according to the model's default prompt template:

```python
# Microsoft Phi-3-mini-4k-instruct default prompt template

<|system|>
{system}<|end|>
<|user|>
{question}<|end|>
<|assistant|> 
{response}<|end|>
<|user|>
{question}<|end|>
<|assistant|> 
```
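To make the template concrete, here is a minimal sketch that assembles it as a plain string, without loading the model. The helper name `build_phi3_prompt` and the example texts are assumptions made for illustration; they are not part of the notebook's code.

```python
def build_phi3_prompt(question, system, history=()):
    """Assemble a Phi-3 style prompt string (hypothetical helper for illustration)."""
    # The system block comes first.
    prompt = f"<|system|>\n{system}<|end|>\n"
    # Each earlier exchange becomes a user block followed by an assistant block.
    for prev_question, prev_response in history:
        prompt += f"<|user|>\n{prev_question}<|end|>\n"
        prompt += f"<|assistant|>\n{prev_response}<|end|>\n"
    # The new question goes last, ending with an open assistant tag for the model to complete.
    prompt += f"<|user|>\n{question}<|end|>\n<|assistant|>\n"
    return prompt


example = build_phi3_prompt(
    question="What is my name?",
    system="You are a helpful assistant.",
    history=[("Hello, I'm Peter.", "Nice to meet you, Peter!")],
)
print(example)
```

Printing `example` shows one system block, one completed exchange, and the new question followed by an open `<|assistant|>` tag.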

In [None]:
def generate(question, system=None, history=(), model=model, max_new_tokens=512, do_sample=False, temperature=1.0):
    """
    Generates a text response with the given large language model (LLM).
    It builds a prompt from the system context, the chat history, and the new question, then passes it to the model.

    - question (str): The user question or any other instruction.
    - system (str, optional): Contextual information provided to the language model for all conversations. A default is used if None.
    - history (list of tuples, optional): The chat history. Each tuple comprises a question and the corresponding assistant response.
    - model (object): The model used for generation.
    - max_new_tokens (int, optional, defaults to 512): The maximum number of tokens to generate, ignoring the number of tokens in the prompt.
    - do_sample (bool, optional, defaults to False): Whether or not to use sampling; use greedy decoding otherwise.
    - temperature (float, optional, defaults to 1.0): The value used to modulate the next token probabilities.
    """

    if system is None:
        system = """This is a chat between a user and an artificial intelligence assistant.
        The assistant gives helpful, detailed, and polite answers to the user's questions based on the context.
        The assistant should also indicate when the answer cannot be found in the context."""

    prompt = f"<|system|>\n{system}<|end|>\n"

    # Add each previous exchange from the history to the prompt (format follows the template above)
    for prev_question, prev_response in history:
        prompt += f"<|user|>\n{prev_question}<|end|>\n<|assistant|>\n{prev_response}<|end|>\n"

    # Add the new user question at the end
    prompt += f"<|user|>\n{question}<|end|>\n<|assistant|>\n"
    tokenized_prompt = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(input_ids=tokenized_prompt.input_ids,
                             attention_mask=tokenized_prompt.attention_mask,
                             max_new_tokens=max_new_tokens,
                             streamer=streamer,
                             temperature=temperature,
                             do_sample=do_sample)

    # Return the decoded text from outputs
    return tokenizer.decode(outputs[0][tokenized_prompt.input_ids.shape[-1]:], skip_special_tokens=True).strip()
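The `do_sample` and `temperature` parameters can be illustrated without a model. The sketch below is a simplified picture of the idea, not the actual decoding code inside `model.generate`: the helper names `softmax_with_temperature` and `pick_token` and the logit values are made up for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    # Divide the logits by the temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, do_sample=False, temperature=1.0, rng=random):
    if not do_sample:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Sampling: draw a token index according to the temperature-scaled probabilities.
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
print(pick_token(logits))                                  # greedy choice
print(softmax_with_temperature(logits, temperature=0.5))   # sharper distribution
print(softmax_with_temperature(logits, temperature=2.0))   # flatter distribution
```

A temperature below 1 concentrates probability on the top-scoring token, while a temperature above 1 spreads it out. With `do_sample=False` the temperature has no effect on which token is chosen.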

## Chat Assistant Class

In [None]:
class ChatAssitant:
    """
    This is a Chat Assistant interface designed to generate conversational responses utilizing the Phi-3 language model.
    """

    def __init__(self, system_context=None, tokenizer=None, max_tokens=0):
        """
        Constructor of the Chat Assistant class.

        Parameters:
        - system_context (str): Contextual information provided to the language model for all conversations.
        - tokenizer (object): The tokenizer used to count the tokens in the history when enforcing the history limit.
        - max_tokens (int): The maximum number of tokens allowed in the history. A value of 0 disables trimming.

        Attributes:
        - history (list of tuples): The chat history. Each tuple comprises a user message and the corresponding assistant response.
        """
        self.tokenizer = tokenizer
        self.max_tokens = max_tokens
        self.system_context = system_context
        self.history = []

    def chat(self, user_message):
        """
        This function generates a response from the chatbot in response to the user's message.
        This method creates a prompt using the current system context and conversation history.
        It then sends this prompt to the language model.
        Finally, it stores the new user message and the model's response in the conversation history.

        Parameters:
        - user_message (str): The user's input for which the chatbot will generate a response.

        Returns:
        - str: The generated response from the Chat Assistant.
        """

        # Generate a single response from the new message, the system context, and the history so far
        response = generate(user_message, self.system_context, self.history)

        # Store this interaction in the conversation history
        self.history.append((user_message, response))

        if self.tokenizer is not None and self.max_tokens > 0:
            self._history_trimmer()

        return response

    def _history_trimmer(self):
        """
        Trims the conversation history to maintain the number of tokens below the specified limit.
        """
        def count_tokens():
            # Combine the conversation history into one string and count its tokens.
            history_string = ''.join(user + assistant for user, assistant in self.history)
            return len(self.tokenizer.encode(history_string))

        # Remove the oldest exchange until the history fits the limit or is empty.
        while self.history and count_tokens() > self.max_tokens:
            self.history.pop(0)

    def reset(self):
        """
        Clears the conversation history.
        """
        self.history = []

---

## Without Memory

Let's start a conversation with our LLM by calling `generate` directly, which uses its default system context. We'll introduce ourselves by name and then check whether the model remembers it.

In [None]:
prompt = "Hello, I'm Peter. It's a pleasure to make your acquaintance!"

_ = generate(prompt)

The model certainly appears to be eager to show that it remembers who we are. Let's see what happens when we actually put its name-retention to the test.

In [None]:
prompt = "Could you kindly remind me of my name, please?"

_ = generate(prompt)

It's not surprising that the model doesn't recall our name: each call to `generate` is independent, and we haven't given the model any way to retain details from previous exchanges, however confident its first reply sounded. The model seems to assert that our name is "Emily," which is evidently incorrect. When models produce fabricated responses, often with unwavering confidence, we refer to this as hallucination.

---

## With Memory

To build a chatbot that retains information from past interactions, we'll use the `ChatAssitant` class defined earlier. Below is the `help` output generated from our class definition.

In [None]:
help(ChatAssitant)

---

Most relevant to our current objective is the `history` list, which is updated each time we call the `chat` method. We apply familiar logic from earlier notebooks, notably the Phi-3 **prompt template**. This ensures that each user/model interaction is formatted properly before it is added to the prompt for subsequent exchanges.

It's apt to describe our process as conducting **few-shot learning**, where the instructive examples consist of the previous interactions.
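The bookkeeping described above can be sketched without a model. In this illustration the model call is replaced by a stub; the names `MemorySketch` and `fake_reply` are hypothetical and exist only to show how the history grows and is passed along on each turn.

```python
class MemorySketch:
    """Hypothetical stand-in for the chat class; the model call is injected as reply_fn."""

    def __init__(self, reply_fn):
        self.reply_fn = reply_fn
        self.history = []

    def chat(self, user_message):
        # The reply function sees the new message plus all earlier exchanges.
        response = self.reply_fn(user_message, list(self.history))
        # The exchange is stored only after the reply has been produced.
        self.history.append((user_message, response))
        return response

    def reset(self):
        # Forget all earlier exchanges.
        self.history = []


def fake_reply(message, history):
    # Stub "model": reports how many earlier exchanges it was shown.
    return f"seen {len(history)} earlier exchange(s)"


bot = MemorySketch(fake_reply)
first = bot.chat("Hello, my name is Peter.")
second = bot.chat("What is my name?")
bot.reset()
third = bot.chat("What is my name?")
print(first, second, third, sep="\n")
```

The second call is shown one earlier exchange, which is what lets a real model "recall" the name. After `reset`, the third call starts from an empty history again.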

In [None]:
system = """This is a chat between a user and an artificial intelligence assistant.
Always try to keep your answers very short and concise!
Do not explain your answers."""

assistant = ChatAssitant(system_context=system)

In [None]:
_ = assistant.chat("Hello, my name is Peter. It's a pleasure to make your acquaintance!")

So far so good. Let's see now if the model is able to "recall" our name.

In [None]:
_ = assistant.chat("Could you tell me what my name is?")

Success! Let's take a look at the model's conversation history.

In [None]:
assistant.history

Since `history` is inserted into each new prompt, after the system context and before the new message, it's logical that the model can generate responses based on past interactions. The `reset` method clears `history`.

In [None]:
assistant.reset()

In [None]:
_ = assistant.chat("Could you tell me what my name is?")

As expected, the model is unable to "recall" details from our previous exchanges.

---

## Exercise: Task Tracker

TODO: Develop an assistant capable of managing your daily tasks. It should have the ability to add and remove items from your list based on your conversation. Additionally, it should accurately remind you of the tasks remaining on your list at any given time.

Refer to the solution provided below if you encounter any difficulties.

### Your Work Here

### Solution

In [None]:
system = """This is a chat between a user and an artificial intelligence assistant.
Always try to keep your answers very short and concise!
Do not explain your answers."""

assistant = ChatAssitant(system_context=system)

In [None]:
_ = assistant.chat("Today's agenda includes: having breakfast, lunch, and dinner, going to work, exercising, and cleaning the house.")

In [None]:
_ = assistant.chat("Additionally, I plan to spend some time with friends.")

In [None]:
_ = assistant.chat("Breakfast and exercise are completed.")

In [None]:
_ = assistant.chat("Lunch is done. I should call the bike shop sometime today.")

In [None]:
_ = assistant.chat("Work is finished, along with spending time with friends, cleaning the house, and calling the bike shop.")

In [None]:
_ = assistant.chat("Dinner is eaten. Now, it's time to head to bed.")

In [None]:
assistant.history

In [None]:
assistant.reset()

---

## Review

The following key concepts were introduced in this notebook:

- **Hallucination:** When a model generates, often with some expressed confidence, untrue or inaccurate responses.

---

## Optional Advanced Exercises

Please note: The upcoming notebook explores how much conversation the model can retain before running into problems. With that in mind, if you observe the model producing only empty responses while experimenting, proceed to the next section to learn more about this phenomenon.
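As a preview of that limitation, the sketch below shows one way to keep a history within a token budget by dropping the oldest exchanges, similar in spirit to the `max_tokens` option of the chat class. The whitespace-based `count_tokens` is a stand-in assumption for a real tokenizer, and `trim_history` is a hypothetical helper.

```python
def count_tokens(text):
    # Stand-in tokenizer (assumption): one token per whitespace-separated word.
    return len(text.split())


def trim_history(history, max_tokens):
    """Drop the oldest exchanges until the remaining ones fit within max_tokens."""
    trimmed = list(history)
    while trimmed and sum(count_tokens(q) + count_tokens(a) for q, a in trimmed) > max_tokens:
        trimmed.pop(0)
    return trimmed


history = [
    ("Hello, my name is Peter.", "Nice to meet you, Peter!"),
    ("What is the capital of France?", "Paris."),
    ("And of Hungary?", "Budapest."),
]
kept = trim_history(history, max_tokens=12)
print(kept)
```

With this budget the first exchange, the one containing the name, is the one that gets dropped. This is the trade-off of trimming: the prompt stays short, but the oldest details are forgotten.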

For those interested in delving deeper into the course material, below are additional open-ended exercises to consider:

- **Develop a Helper Bot:** Create a bot designed to offer support to individuals during challenging times. This bot should provide encouragement, praise, and empathy, while also discerning when to respond and when to prompt the user for more information about their situation.

---

## Restart the Kernel

In [None]:
from IPython import get_ipython

get_ipython().kernel.do_shutdown(restart=True)

## <FONT COLOR="red">This notebook is licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)](https://creativecommons.org/). This means that you can freely copy, distribute, and modify this notebook by the authors ([Balázs Harangi](https://inf.unideb.hu/dr-harangi-balazs), [András Hajdu](https://inf.unideb.hu/munkatars/4250), and [Róbert Lakatos](https://inf.unideb.hu/lakatos-robert-tanarseged)), but not for commercial purposes. Additionally, if you modify the notebook, you must credit them as the original creators and share the modified version under the same terms.
</FONT>