<a href="https://colab.research.google.com/github/jchen8000/DemystifyingLLMs/blob/main/6_Deployment/Chatbot_HuggingFace_Hosted_LLM_ipynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 6.11 Chatbot: An Example of an LLM-Powered Application

## Chatbot built on a Hugging Face-hosted model

Running inference with an LLM can be compute-intensive. To make testing and evaluation easier, Hugging Face hosts many publicly accessible LLMs on its own infrastructure and lets users run inference on them for free.

The `huggingface_hub` library provides a simple way to call the inference service for these hosted models. Here we use its *InferenceClient* class to send requests to the model.

The [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) model is used in this example. Because each inference request is independent and the model keeps no memory between calls, we pass the chat history together with the new prompt every time.
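
This notebook stores the history as a list of `[user_message, bot_reply]` pairs and turns it into a prompt string by hand. As an alternative, `huggingface_hub` also offers `InferenceClient.chat_completion`, which takes a list of `{"role": ..., "content": ...}` messages and lets the server apply the model's chat template. The sketch below only shows how the pair-based history could be converted into that message format; the helper name `history_to_messages` is hypothetical, and the network call appears only as a comment because it needs a token.

```python
from typing import Dict, List, Sequence

def history_to_messages(message: str, history: Sequence[Sequence[str]]) -> List[Dict[str, str]]:
    """Convert [user, bot] pairs plus a new user message into role/content dicts (hypothetical helper)."""
    messages: List[Dict[str, str]] = []
    for user_prompt, bot_response in history:
        messages.append({"role": "user", "content": user_prompt})
        messages.append({"role": "assistant", "content": bot_response})
    # The new user message always goes last
    messages.append({"role": "user", "content": message})
    return messages

# Hypothetical usage with the hosted model (requires network access and a token):
# response = client.chat_completion(history_to_messages("How are you?", history), max_tokens=256)
# answer = response.choices[0].message.content

example = history_to_messages("How are you?", [["Hello", "Hi! How can I help?"]])
print(example)
```

Either way, the whole conversation is sent with each request; only the place where the Mistral prompt format is applied changes.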

A [Hugging Face access token](https://huggingface.co/docs/hub/en/security-tokens) is required to run this example.
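
Pasting the token directly into a notebook makes it easy to leak when the notebook is shared. One common alternative is to read it from an environment variable. The sketch below assumes the variable is named `HF_TOKEN`, which is the name `huggingface_hub` itself recognizes; the helper `get_hf_token` is hypothetical.

```python
import os

def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read a Hugging Face access token from an environment variable (hypothetical helper)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Please set the {env_var} environment variable to your access token.")
    return token

# Hypothetical usage:
# client = InferenceClient("mistralai/Mistral-7B-Instruct-v0.3", token=get_hf_token())
```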


In [None]:
from huggingface_hub import InferenceClient

# Replace the placeholder with your own Hugging Face access token (keep it private)
HuggingFaceToken = 'Huggingface Access Token'
client = InferenceClient("mistralai/Mistral-7B-Instruct-v0.3", token=HuggingFaceToken)

In [None]:
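def format_prompt(message, history):
    # Build a prompt in the Mistral instruction format:
    # <s>[INST] user [/INST] answer</s> [INST] user [/INST] ...
    prompt = "<s>"
    for user_prompt, bot_response in history:
        prompt += f"[INST] {user_prompt} [/INST]"
        prompt += f" {bot_response}</s> "
    # The new message goes last; the model continues generating from here
    prompt += f"[INST] {message} [/INST]"
    return prompt

def ask_model(prompt, history, temperature=0.9, max_new_tokens=256, top_p=0.95, repetition_penalty=1.0):

    # Generation settings sent to the hosted text-generation endpoint
    generate_kwargs = dict(
        temperature=temperature,        # < 1.0 sharpens, > 1.0 flattens the token distribution
        max_new_tokens=max_new_tokens,  # upper bound on the length of the reply
        top_p=top_p,                    # sample only from the most likely tokens covering top_p of the mass
        repetition_penalty=repetition_penalty,  # 1.0 means no penalty
        do_sample=True,                 # sample instead of always picking the most likely token
        seed=42,                        # fixed random seed for sampling
    )

    formatted_prompt = format_prompt(prompt, history)

    # Send the full prompt to the hosted model; the generated reply comes back as a string
    output = client.text_generation(formatted_prompt, **generate_kwargs)
    return output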
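
To see the exact string the model receives, the formatting step can be run on its own with a short, made-up history. The function below is a copy of `format_prompt` from the cell above, repeated so the snippet runs without a token or network access.

```python
def format_prompt(message, history):
    # Same logic as the notebook: each past turn becomes "[INST] user [/INST] answer</s> "
    prompt = "<s>"
    for user_prompt, bot_response in history:
        prompt += f"[INST] {user_prompt} [/INST]"
        prompt += f" {bot_response}</s> "
    prompt += f"[INST] {message} [/INST]"
    return prompt

history = [["Hello", "Hi! How can I help?"]]
print(format_prompt("How are you?", history))
# <s>[INST] Hello [/INST] Hi! How can I help?</s> [INST] How are you? [/INST]
```

Because every earlier turn is re-sent on each call, the prompt grows with every exchange in the conversation.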

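`ask_model` samples with `temperature=0.9` and `top_p=0.95`. The pure-Python sketch below illustrates what these two settings do to a next-token distribution: temperature rescales the logits before the softmax, and top-p (nucleus) filtering keeps only the most likely tokens whose combined probability reaches `top_p`. It is a simplified illustration with made-up numbers; the server-side implementation may differ in details such as tie handling or the order in which filters are applied.

```python
import math
from typing import Dict, List

def apply_temperature(logits: List[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature: lower temperature -> sharper distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs: List[float], top_p: float) -> Dict[int, float]:
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
print(apply_temperature(logits, 1.0))
print(apply_temperature(logits, 0.5))  # lower temperature: more weight on token 0
print(top_p_filter([0.5, 0.3, 0.2], top_p=0.7))  # keeps only tokens 0 and 1
```

With the notebook's `top_p=0.95`, only the least likely tail of the distribution is cut off, so replies stay varied while very unlikely tokens are excluded.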

In [None]:
def chatbot():

    print("Chatbot initialized. You can start chatting now (type 'quit' to stop)!\n")
    history = []

    while True:
        # Get user input
        user_input = input("You: ")

        # Check if the user wants to quit
        if user_input.lower() == "quit":
            break

        answer = ask_model(user_input, history)
        # Store this turn so the next prompt includes the whole conversation
        history.append([user_input, answer])

        print(f"Chatbot: {answer}\n")

# Run the chatbot
chatbot()


Chatbot initialized. You can start chatting now (type 'quit' to stop)!

You: Hello
Chatbot: Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask. I'm here to help.

You: How are you?
Chatbot: I'm just a computer program, so I don't have feelings or emotions like humans do. I'm here to provide information and help you with your questions to the best of my ability. How can I assist you today?

You: What are the top 5 biggest cities in Canada?
Chatbot: The top 5 largest cities in Canada by population (as of 2021) are:

1. Toronto (Toronto-Durham Region CMA) - 6,417,516
2. Montreal (CMA) - 4,391,574
3. Vancouver (CMA) - 2,642,811
4. Calgary (CMA) - 1,388,988
5. Ottawa-Gatineau (CMA) - 1,463,891 (split between Ontario and Quebec)

These population numbers are for the entire metropolitan areas, not just the city proper. The cities themselves have smaller populations.

You: What is the next biggies city?
Chatbot: The city that is projected
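
Because `chatbot()` calls `input()` and the hosted model directly, the loop is hard to try without a token. One possible refactoring, sketched here under the assumption that you only want to check the loop logic, passes the user inputs and the model call in as parameters. `run_chat` and `echo_model` are hypothetical names, and the stub model simply echoes the message instead of calling Mistral.

```python
from typing import Callable, Iterable, List

def run_chat(
    inputs: Iterable[str],
    ask: Callable[[str, List[List[str]]], str],
) -> List[List[str]]:
    """Run the chatbot loop over scripted inputs and return the history (hypothetical helper)."""
    history: List[List[str]] = []
    for user_input in inputs:
        # Same stop condition as chatbot()
        if user_input.lower() == "quit":
            break
        answer = ask(user_input, history)
        history.append([user_input, answer])
        print(f"You: {user_input}")
        print(f"Chatbot: {answer}\n")
    return history

def echo_model(message: str, history: List[List[str]]) -> str:
    # Stand-in for ask_model: reports how many earlier turns it was given
    return f"(turn {len(history) + 1}) You said: {message}"

history = run_chat(["Hello", "How are you?", "quit", "ignored"], echo_model)
```

Passing `ask_model` as the `ask` argument would send the same scripted turns to the hosted model instead.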