# Chatbot Demo 

## Description

This notebook demonstrates how to create a simple chatbot interface using Llama models via Hugging Face's `transformers` library and `gradio` for the user interface. 

### Main Features:
- Select from a list of pre-trained Llama models.
- Input text and receive responses from the selected model.
- Cache models to avoid reloading them multiple times.

## Log in to Hugging Face Hub

Log in to your Hugging Face account to access the models.

In [None]:
from huggingface_hub import login
login()

## Import Required Libraries

In [None]:
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [None]:
device = 0 if torch.cuda.is_available() else -1

## Model Selection

Below is a list of available Llama models to choose from. The `load_model` function loads the selected model and its tokenizer with Hugging Face's `AutoModelForCausalLM` and `AutoTokenizer`, then wraps them in a text-generation pipeline, which it returns. Note that access is granted per Llama version: you need to request access to each version separately before you can download its models.

In [None]:
llama_models = {
    "Llama 3 70B Instruct": "meta-llama/Meta-Llama-3-70B-Instruct",
    "Llama 3 8B Instruct": "meta-llama/Meta-Llama-3-8B-Instruct",
    "Llama 3.1 70B Instruct": "meta-llama/Llama-3.1-70B-Instruct",
    "Llama 3.1 8B Instruct": "meta-llama/Llama-3.1-8B-Instruct",
    "Llama 3.2 3B Instruct": "meta-llama/Llama-3.2-3B-Instruct",
    "Llama 3.2 1B Instruct": "meta-llama/Llama-3.2-1B-Instruct",
}

In [None]:
def load_model(model_name):
    """Load the specified Llama model."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    generator = pipeline('text-generation', model=model, tokenizer=tokenizer, device=device)
    return generator

## Model Caching

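The `model_cache` dictionary maps each selected model name to its loaded pipeline, so a model is loaded only the first time it is chosen and reused on every later request.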

In [None]:
model_cache = {}
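
A plain dictionary keeps every model that has ever been selected, which can exhaust memory when switching between several large models. The sketch below shows one possible alternative: a small cache that keeps only the most recently used entries. It is not part of the original notebook; `BoundedModelCache`, `max_models`, and `fake_loader` are hypothetical names, and the loader is a stand-in so the example runs without downloading anything.

```python
from collections import OrderedDict


class BoundedModelCache:
    """Keep at most `max_models` loaded entries, evicting the least recently used."""

    def __init__(self, loader, max_models=2):
        self._loader = loader            # callable: name -> loaded pipeline
        self._max_models = max_models
        self._items = OrderedDict()      # insertion order tracks recency of use

    def get(self, name):
        if name in self._items:
            self._items.move_to_end(name)    # mark as most recently used
            return self._items[name]
        model = self._loader(name)           # cache miss: load once
        self._items[name] = model
        if len(self._items) > self._max_models:
            self._items.popitem(last=False)  # drop the least recently used entry
        return model

    def __contains__(self, name):
        return name in self._items

    def __len__(self):
        return len(self._items)


# Hypothetical stand-in for load_model; it only records which names were loaded.
load_calls = []


def fake_loader(name):
    load_calls.append(name)
    return "pipeline for " + name


cache = BoundedModelCache(fake_loader, max_models=2)
cache.get("A")
cache.get("B")
cache.get("A")  # cache hit: "A" becomes the most recently used entry
cache.get("C")  # cache is full, so "B" (least recently used) is evicted
```

In the notebook, `load_model` would take the place of `fake_loader`. Evicting an entry only drops the cache's own reference to the pipeline; memory is reclaimed only once no other references to it remain.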

## Chat Generation Function

The `generate_chat` function generates chatbot responses using the selected Llama model. It checks if the model is cached and loads it if necessary. The function builds the conversation context from the chat history and adds the user's input. This context is passed to the model for response generation. The response is extracted, appended to the history, and the updated conversation history is returned.

In [None]:
def generate_chat(user_input, history, model_choice):
    """Generate chatbot responses using the selected Llama model."""
    if model_choice not in model_cache:
        model_cache[model_choice] = load_model(llama_models[model_choice])
    generator = model_cache[model_choice]

    if history is None:
        history = []

    # Build the conversation history
    conversation = ''
    for user_msg, assistant_msg in history:
        conversation += "User: " + user_msg + "\n"
        conversation += "Assistant: " + assistant_msg + "\n"

    # Add the latest user input
    conversation += "User: " + user_input + "\n"
    conversation += "Assistant:"

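    # Generate the assistant's response.
    # max_new_tokens limits only the generated text; max_length would also
    # count the prompt, so a long conversation would leave no room for a reply.
    output = generator(
        conversation,
        max_new_tokens=512,
        pad_token_id=generator.tokenizer.eos_token_id,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )[0]["generated_text"]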

    # Extract the assistant's response
    assistant_response = output[len(conversation):].strip()

    history.append((user_input, assistant_response))

    return history
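
Because the prompt is a plain `User:`/`Assistant:` transcript, a model may keep writing past its own turn and invent further `User:` lines. The sketch below isolates the prompt-building and extraction steps as pure functions and adds one possible safeguard: cutting the reply at the first invented `User:` turn. The helper names `build_prompt` and `extract_reply` and the sample output are hypothetical, introduced only for illustration; no model is called.

```python
def build_prompt(history, user_input):
    """Flatten (user, assistant) pairs plus the new message into one prompt string."""
    lines = []
    for user_msg, assistant_msg in history:
        lines.append("User: " + user_msg)
        lines.append("Assistant: " + assistant_msg)
    lines.append("User: " + user_input)
    lines.append("Assistant:")  # left open for the model to complete
    return "\n".join(lines)


def extract_reply(prompt, generated_text):
    """Drop the echoed prompt, then cut at the first invented 'User:' turn."""
    reply = generated_text[len(prompt):]
    reply = reply.split("\nUser:", 1)[0]
    return reply.strip()


history = [("Hi", "Hello! How can I help?")]
prompt = build_prompt(history, "What is 2 + 2?")

# Made-up example of raw pipeline output: the prompt echoed back, followed by
# the reply and an extra turn the model invented on its own.
fake_output = prompt + " 2 + 2 equals 4.\nUser: And 3 + 3?\nAssistant: 6."

reply = extract_reply(prompt, fake_output)
print(reply)  # prints "2 + 2 equals 4."
```

Keeping these steps separate from the model call makes them easy to check without loading a model.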

## Gradio Interface

The cell below defines a simple Gradio interface. The interface consists of a dropdown menu for users to select a specific Llama model, a textbox to enter queries, and a window that displays the conversation between the user and the assistant.

In [None]:
with gr.Blocks() as demo:
    gr.Markdown("<h1><center>Chat with Llama Models</center></h1>")

    model_choice = gr.Dropdown(list(llama_models.keys()), label="Select Llama Model")

    chatbot = gr.Chatbot(label="Chatbot Interface")
    txt_input = gr.Textbox(show_label=False, placeholder="Type your message here...")

    def respond(user_input, chat_history, model_choice):
        if model_choice is None:
            # Fall back to the first model in the list when nothing is selected
            model_choice = list(llama_models.keys())[0]
        updated_history = generate_chat(user_input, chat_history, model_choice)
        return "", updated_history

    txt_input.submit(respond, [txt_input, chatbot, model_choice], [txt_input, chatbot])

    submit_btn = gr.Button("Submit")
    submit_btn.click(respond, [txt_input, chatbot, model_choice], [txt_input, chatbot])

## Launch the Interface

In [None]:
demo.launch()
