# Chatbot Demo 

## Description

This notebook demonstrates how to create a simple chatbot interface using Llama models via Hugging Face's `transformers` library and `gradio` for the user interface. 

### Main Features:
- Select from a list of pre-trained Llama models.
- Input text and receive responses from the selected model.
- Cache models to avoid reloading them multiple times.

## Log in to Hugging Face Hub

This cell imports the `login` function from the `huggingface_hub` library and calls it to authenticate with your Hugging Face account. Authentication is required to download gated models such as Llama, as well as private models, datasets, and other resources hosted on Hugging Face.

### Directions to Generate an Access Token:
1. Go to the [Hugging Face website](https://huggingface.co/).
2. Log in to your Hugging Face account.
3. Navigate to your **profile icon** on the top right, and click **Settings**.
4. Under **Access Tokens** (on the left sidebar), click **New Token** to generate a new access token.
5. Copy the generated token.

### Usage:
When you run this cell, you'll be prompted to paste the access token, which grants access to your Hugging Face resources.

> **Do not share your Access Tokens with anyone**

In [None]:
from huggingface_hub import login
login()
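
In non-interactive settings (for example, a scheduled job) the prompt can be avoided, since `login` also accepts a `token` argument. The sketch below assumes the token is stored in an environment variable named `HF_TOKEN`; that choice and the helper names `read_hf_token` and `login_from_env` are illustrative and not part of the original notebook.

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Return the access token stored in the given environment variable (assumed name)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your access token.")
    return token


def login_from_env(env_var="HF_TOKEN"):
    """Authenticate without the interactive prompt (hypothetical helper)."""
    # Imported lazily so the token-reading helper works without the library installed.
    from huggingface_hub import login
    login(token=read_hf_token(env_var))
```

Reading the token from the environment keeps it out of the notebook itself, which matters if the notebook is shared or committed to version control.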

## Import Required Libraries

In [None]:
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [None]:
device = 0 if torch.cuda.is_available() else -1

## Basic Usage

Below is a list of available Llama models to choose from. The `load_model` function loads the selected Llama model along with its tokenizer, using Hugging Face's `AutoModelForCausalLM` and `AutoTokenizer`, and returns a text-generation pipeline. Note that you need to request access separately for each version of Llama before you can use it.

In [None]:
llama_models = {
    "Llama 3 70B Instruct": "meta-llama/Meta-Llama-3-70B-Instruct",
    "Llama 3 8B Instruct": "meta-llama/Meta-Llama-3-8B-Instruct",
    "Llama 3.1 70B Instruct": "meta-llama/Llama-3.1-70B-Instruct",
    "Llama 3.1 8B Instruct": "meta-llama/Llama-3.1-8B-Instruct",
    "Llama 3.2 3B Instruct": "meta-llama/Llama-3.2-3B-Instruct",
    "Llama 3.2 1B Instruct": "meta-llama/Llama-3.2-1B-Instruct",
}

In [None]:
def load_model(model_name):
    """Load the specified Llama model."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    generator = pipeline('text-generation', model=model, tokenizer=tokenizer, device=device)
    return generator

Cache loaded models in a dictionary so that each model is loaded only once per session.

In [None]:
model_cache = {}
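
The caching pattern is a plain dictionary lookup: load on the first request, reuse afterwards. The sketch below illustrates it with a stand-in loader (`fake_load_model`, hypothetical) instead of a real model so that it runs instantly; `demo_cache` and `get_generator` are likewise illustrative names.

```python
load_calls = []  # records each time the stand-in loader actually runs
demo_cache = {}  # maps model name -> loaded generator


def fake_load_model(model_name):
    """Stand-in for load_model: pretend to load and return a dummy generator."""
    load_calls.append(model_name)
    return f"<pipeline for {model_name}>"


def get_generator(model_name):
    """Return the cached generator, loading it only on the first request."""
    if model_name not in demo_cache:
        demo_cache[model_name] = fake_load_model(model_name)
    return demo_cache[model_name]


first = get_generator("meta-llama/Llama-3.2-1B-Instruct")
second = get_generator("meta-llama/Llama-3.2-1B-Instruct")
print(first is second)  # True: the second call reuses the cached object
print(len(load_calls))  # 1: the loader ran only once
```

Every cached model stays in memory, so the cache trades memory for load time; loading several large models in one session can exhaust GPU or CPU memory.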

The `chatbot_interface` function generates responses from the Llama models. It loads the selected model if it is not already cached, builds the conversation context from the chat history, and calls the model's text-generation pipeline with the user's input appended. Finally, it extracts the assistant's reply from the generated text and appends it to the conversation history.

In [None]:
def chatbot_interface(user_input, history, model_choice):
    """Generate chatbot responses using the selected Llama model."""
    if model_choice not in model_cache:
        model_cache[model_choice] = load_model(llama_models[model_choice])
    generator = model_cache[model_choice]
    
    if history is None:
        history = []
    
    conversation = ""
    for user_msg, assistant_msg in history:
        conversation += f"User: {user_msg}\nAssistant: {assistant_msg}\n"
    conversation += f"User: {user_input}\nAssistant:"

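    response = generator(
        conversation,
        max_new_tokens=256,  # limits the reply only; max_length would also count the growing prompt
        return_full_text=False,  # return only the newly generated text, not the echoed prompt
        pad_token_id=generator.tokenizer.eos_token_id,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )[0]['generated_text']

    # Keep only the assistant's turn: drop any further "User:" turn the model invents.
    assistant_reply = response.split("User:")[0].strip()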
    
    history.append((user_input, assistant_reply))
    
    return history
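
To make the prompt format concrete, the sketch below reproduces the `User:` / `Assistant:` layout as a standalone function and shows one way to pull a reply out of generated text, whether or not the pipeline echoes the prompt. The names `build_prompt` and `extract_reply` are illustrative, and the generated text is simulated rather than produced by a model.

```python
def build_prompt(history, user_input):
    """Flatten (user, assistant) pairs plus the new message into one prompt string."""
    lines = []
    for user_msg, assistant_msg in history:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {assistant_msg}")
    lines.append(f"User: {user_input}")
    lines.append("Assistant:")  # left open for the model to complete
    return "\n".join(lines)


def extract_reply(generated_text, prompt):
    """Return the assistant's turn, tolerating an echoed prompt and extra turns."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]  # drop the echoed prompt
    # Cut at the next "User:" marker in case the model continues the dialogue.
    return generated_text.split("User:")[0].strip()


history = [("Hi", "Hello! How can I help?")]
prompt = build_prompt(history, "What is Gradio?")
print(prompt)

# Simulated pipeline output: the prompt, a completion, and an invented extra turn.
simulated = prompt + " Gradio is a Python library for building web UIs.\nUser: Thanks"
print(extract_reply(simulated, prompt))
```

This plain-text layout is simple to follow, but it is not the chat template the instruct-tuned Llama models were trained with, so reply quality may differ from what a template-formatted prompt would give.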

The cell below defines a simple Gradio interface. The interface consists of a dropdown menu for selecting a specific Llama model, a textbox for entering queries, and a window that displays the conversation between the user and the assistant.

In [None]:
with gr.Blocks() as demo:
    gr.Markdown("<h1><center>Chat with Llama Models</center></h1>")
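    # Default to the smallest model so a choice is always set before the first message.
    model_choice = gr.Dropdown(list(llama_models.keys()), value="Llama 3.2 1B Instruct", label="Select Llama Model")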
    chatbot = gr.Chatbot()
    txt_input = gr.Textbox(show_label=False, placeholder="Type your message here...")

    def respond(user_input, chat_history, model_choice):
        updated_history = chatbot_interface(user_input, chat_history, model_choice)
        return "", updated_history  
    
    txt_input.submit(respond, [txt_input, chatbot, model_choice], [txt_input, chatbot])
    submit_btn = gr.Button("Submit")
    submit_btn.click(respond, [txt_input, chatbot, model_choice], [txt_input, chatbot])
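
The handler's return values map positionally onto the `outputs` list: the first clears the textbox and the second replaces the chat history. The sketch below shows that contract with a stand-in reply function (`echo_reply`, hypothetical) so it runs without Gradio or a model. As an added assumption, `respond_sketch` also ignores blank submissions, which the notebook's `respond` does not do.

```python
def echo_reply(user_input, history, model_choice):
    """Stand-in for chatbot_interface: append a canned reply to the history."""
    history = list(history or [])
    history.append((user_input, f"[{model_choice}] you said: {user_input}"))
    return history


def respond_sketch(user_input, chat_history, model_choice):
    """Return (new textbox value, new chat history), matching [txt_input, chatbot]."""
    if not user_input.strip():
        # Blank submission: leave the history untouched and keep the box empty.
        return "", chat_history or []
    return "", echo_reply(user_input, chat_history, model_choice)


box, history = respond_sketch("Hello", None, "Llama 3.2 1B Instruct")
print(repr(box))
print(history)
```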

Launch the interface

In [None]:
demo.launch()