<a href="https://colab.research.google.com/github/peremartra/FinLLMOpt/blob/FinChat-XS-Instruct/Gradio_Interface_Test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# FinChat-XS

Example usage of FinChat. This notebook uses a Gradio interface to chat with FinChat and with the other models offered in the model dropdown.

In [1]:
!pip install -q gradio

In [2]:
import gradio as gr
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Detect device: prefer CUDA, then MPS (Apple Silicon), then fall back to CPU.
if torch.cuda.is_available():
    device = "cuda"
elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

# Cache for loaded models to avoid reloading on every request.
model_cache = {}
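
The device preference order used in this cell can be written as a small pure function, which makes it easy to check without a GPU. This is only a sketch: `pick_device` is a hypothetical helper that is not part of the notebook, and the two boolean flags stand in for the values `torch` would report.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name given which backends are available."""
    if cuda_available:
        return "cuda"   # NVIDIA GPU takes priority
    if mps_available:
        return "mps"    # Apple Silicon GPU
    return "cpu"        # fallback when no accelerator is present


# In the notebook the flags would come from torch, for example:
#   pick_device(
#       torch.cuda.is_available(),
#       hasattr(torch.backends, "mps") and torch.backends.mps.is_available(),
#   )
print(pick_device(False, True))  # prints "mps"
```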


In [3]:
def load_model(model_name):
    """
    Loads and caches the tokenizer and model from Hugging Face.
    """
    if model_name not in model_cache:
        print(f"Loading {model_name} on {device} ...")
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)
        model.to(device)
        model_cache[model_name] = (tokenizer, model)
    return model_cache[model_name]

def load_model_action(model_choice):
    """
    Action triggered by the 'Load Model' button.
    Loads the selected model and returns a status message.
    """
    try:
        load_model(model_choice)
        return (f"Model '{model_choice}' loaded successfully.", [])
    except Exception as e:
        return (f"Error loading model '{model_choice}': {str(e)}", [])
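
The caching pattern in `load_model` can be illustrated without downloading anything. The sketch below assumes a hypothetical `make_cached_loader` helper and a stub `fake_loader` that records how often it is called; neither is part of the notebook.

```python
from typing import Callable, Dict, List, Tuple


def make_cached_loader(loader: Callable[[str], Tuple[str, str]]):
    """Wrap `loader` so each model name is loaded at most once."""
    cache: Dict[str, Tuple[str, str]] = {}

    def load(name: str) -> Tuple[str, str]:
        if name not in cache:
            cache[name] = loader(name)  # expensive call happens only on a miss
        return cache[name]

    return load, cache


calls: List[str] = []


def fake_loader(name: str) -> Tuple[str, str]:
    """Stand-in for the real tokenizer/model download; records each call."""
    calls.append(name)
    return (f"tokenizer:{name}", f"model:{name}")


load, cache = make_cached_loader(fake_loader)
load("model-a")
load("model-a")  # served from the cache, fake_loader is not called again
load("model-b")
print(calls)  # prints ['model-a', 'model-b']
```

As in the notebook's `model_cache`, nothing is ever evicted, so every model that has been selected stays in memory for the lifetime of the session.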



In [4]:
def chat(model_choice, message, history):
    """
    Appends the new message to the conversation history, builds the chat input
    using the tokenizer's apply_chat_template method, generates a response, and returns
    the updated conversation.
    """
    # Initialize history as list of (user, assistant) tuples if not provided
    if history is None:
        history = []

    # Convert the history (stored as tuples) into a list of message dictionaries
    conversation = []
    conversation.append({"role": "system", "content": """You are FinChat, a specialized finance AI assistant. Provide accurate,
concise information on markets, investments, accounting, and personal finance.
Always clarify when financial information may vary by jurisdiction."""})
    for user_text, bot_text in history:
        conversation.append({"role": "user", "content": user_text})
        conversation.append({"role": "assistant", "content": bot_text})
    # Append the new user message
    conversation.append({"role": "user", "content": message})

    tokenizer, model = load_model(model_choice)

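    # Format the conversation with the tokenizer's chat template, which converts the
    # list of message dicts into a model-specific prompt string.
    # add_generation_prompt=True appends the assistant header, so the model answers
    # as the assistant instead of first emitting the role name itself.
    input_text = tokenizer.apply_chat_template(
        conversation, tokenize=False, add_generation_prompt=True
    )

    # Encode the formatted prompt. The template already contains the special tokens,
    # so they must not be added a second time.
    inputs = tokenizer.encode(input_text, add_special_tokens=False, return_tensors="pt").to(device)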
    output_ids = model.generate(
        inputs,
        max_new_tokens=150,  # adjust as needed
        do_sample=True,
        temperature=0.2,
        top_p=0.9,
    )

    # Decode only the newly generated tokens
    output_text = tokenizer.decode(output_ids[0][inputs.shape[1]:], skip_special_tokens=True).strip()
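    # Remove a leading "assistant" role label if the model emitted one.
    # str.lstrip("assistant") strips any of those characters rather than the prefix,
    # so it can eat the start of a real answer (e.g. "assets" -> "ets").
    for prefix in ("assistant:", "assistant\n"):
        if output_text.lower().startswith(prefix):
            output_text = output_text[len(prefix):].strip()
            break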

    # Append the assistant's reply to history (as a tuple for display)
    history.append((message, output_text))
    return history, history
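
Two steps of `chat` are plain Python and can be tried without a model: turning the `(user, assistant)` tuples into role/content messages, and removing a leading role label from the decoded text. The following is a minimal sketch under stated assumptions: `build_conversation` and `strip_role_prefix` are hypothetical helper names that do not exist in the notebook, and the system prompt is shortened. Note that `str.lstrip` takes a set of characters, not a prefix, which is why the sketch uses a regular expression.

```python
import re

# Shortened system prompt, used only for this sketch.
SYSTEM_PROMPT = "You are FinChat, a specialized finance AI assistant."


def build_conversation(history, message, system_prompt=SYSTEM_PROMPT):
    """Convert (user, assistant) tuples plus a new message into chat messages."""
    conversation = [{"role": "system", "content": system_prompt}]
    for user_text, bot_text in history or []:
        conversation.append({"role": "user", "content": user_text})
        conversation.append({"role": "assistant", "content": bot_text})
    conversation.append({"role": "user", "content": message})
    return conversation


def strip_role_prefix(text, role="assistant"):
    """Remove a leading role label followed by ':' or a newline, if present."""
    pattern = rf"^\s*{re.escape(role)}\s*[:\n]\s*"
    return re.sub(pattern, "", text, count=1, flags=re.IGNORECASE).strip()


messages = build_conversation([("Hi", "Hello!")], "What is a bond?")
print([m["role"] for m in messages])  # prints ['system', 'user', 'assistant', 'user']

print(strip_role_prefix("assistant\nA bond is a debt security."))  # prints "A bond is a debt security."
print(strip_role_prefix("assets and stocks"))                      # prints "assets and stocks"
print("assets and stocks".lstrip("assistant"))                     # prints "ets and stocks"
```

The last line shows the character-set behaviour of `lstrip`: the letters `a`, `s`, `s` are all in `"assistant"`, so they are removed even though the text does not start with that word.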

In [5]:
# Build the Gradio interface using Blocks.
with gr.Blocks() as demo:
    gr.Markdown("# Hugging Face Model Chatbot")
    # Row for model selection and load button.
    with gr.Row():
        model_choice = gr.Dropdown(
            choices=["oopere/FinChat5-XS", "HuggingFaceTB/SmolLM2-360M-Instruct", "meta-llama/Llama-3.2-1B-Instruct"],
            value="oopere/FinChat5-XS",
            label="Select Model"
        )
        load_button = gr.Button("Load Model")
        load_status = gr.Textbox(label="Model Status", interactive=False)

    chatbot = gr.Chatbot(label="Chat Conversation", height=200)

    # Row for user input and send button.
    with gr.Row():
        message = gr.Textbox(label="Your Message", placeholder="Type your message here...", lines=1)
        send_button = gr.Button("Send")

    # State to store conversation history.
    state = gr.State([])

    # Link the "Load Model" button to load the model and update the status.
    load_button.click(fn=load_model_action, inputs=model_choice, outputs=[load_status, state])

    # Link both the Send button and pressing Enter in the textbox to send a message.
    send_button.click(fn=chat, inputs=[model_choice, message, state], outputs=[chatbot, state])
    message.submit(fn=chat, inputs=[model_choice, message, state], outputs=[chatbot, state])
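
With the wiring above, the textbox keeps its content after a message is sent. One possible variation, sketched here as an assumption rather than as part of the notebook, is to wrap the chat function so that it also returns an empty string, and to list the `message` textbox as an additional output. `make_respond` and `echo_chat` are hypothetical names; `echo_chat` is a stub that replaces the real model call so the sketch runs on its own.

```python
def make_respond(chat_fn):
    """Wrap chat_fn so it also returns "" as a new value for the input textbox."""

    def respond(model_choice, message, history):
        if not message or not message.strip():
            # Ignore empty submissions: display and state stay unchanged.
            history = history or []
            return "", history, history
        chatbot_value, new_state = chat_fn(model_choice, message, history)
        return "", chatbot_value, new_state

    return respond


def echo_chat(model_choice, message, history):
    """Stub with the same signature and return shape as the notebook's chat()."""
    history = (history or []) + [(message, f"echo: {message}")]
    return history, history


respond = make_respond(echo_chat)
print(respond("any-model", "hi", []))
# prints ('', [('hi', 'echo: hi')], [('hi', 'echo: hi')])

# Assumed wiring inside the gr.Blocks() context, with the textbox as first output:
#   send_button.click(fn=respond, inputs=[model_choice, message, state],
#                     outputs=[message, chatbot, state])
#   message.submit(fn=respond, inputs=[model_choice, message, state],
#                  outputs=[message, chatbot, state])
```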



In [6]:
# Launch the interface.
demo.launch()

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://8942a238c35721da58.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


