<a href="https://colab.research.google.com/github/manojmandal27/LLM_chatbots_and_memory/blob/main/Multi_LLM_interface_Chatbot_OpenAi_Hugging_face_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This code creates a Gradio chat interface with the following features:

**Model Selection:**

- Users can choose between gpt-3.5-turbo, gpt-4, gpt-4o-mini, or zephyr-7b-beta using radio buttons
- More models can be added by extending the choices list

**Chat Functionality:**

- Uses Gradio's ChatInterface for a clean, chat-like experience
- Maintains conversation history
- Displays both user and AI messages in a threaded format

**Memory:**

- Implements ConversationBufferMemory from LangChain
- Records every user message and AI response across multiple exchanges
- Memory is cleared when using the Clear Chat button

**Error Handling:**

- Both API functions include error handling
- User-friendly error messages are displayed if API calls fail

# Install Dependencies

In [2]:
!pip install gradio transformers
!pip install langchain
!pip install openai==0.28

In [3]:
import openai
import gradio as gr
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from langchain.memory import ConversationBufferMemory
import os
from google.colab import userdata
import requests

OpenAI API and HF Key Setup: The code reads the OpenAI and Hugging Face API keys from Google Colab secrets and stores them in environment variables, which are then used to authenticate requests to the OpenAI and Hugging Face APIs.

In [5]:
# Use Google Colab secrets to set API keys
os.environ['OPENAI_API_KEY'] = userdata.get('OA_API')
os.environ['HF_API_KEY'] = userdata.get('HF_TOKEN')
# Set OpenAI API key
openai.api_key = os.getenv('OPENAI_API_KEY')
# Read the Hugging Face API key back from the environment variable set above
hf_api_key = os.getenv('HF_API_KEY')
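
A missing key otherwise only shows up later as an authentication error from the API. The sketch below fails early with a clear message instead. The helper name `require_env` and the variable `EXAMPLE_API_KEY` are hypothetical and not part of the notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Demonstration with a placeholder value (not a real key)
os.environ["EXAMPLE_API_KEY"] = "placeholder-value"
example_key = require_env("EXAMPLE_API_KEY")
```

In the notebook, the same check could be applied to `OPENAI_API_KEY` and `HF_API_KEY` right after they are set.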

**Purpose:**

- ConversationBufferMemory is a class from LangChain that stores conversation history
- Acts like a simple buffer that maintains the order and context of messages

**Parameters:**

return_messages=True:

- When set to True, returns the entire conversation history as a list of messages
- Each message includes both the role (user/AI) and content
- When not specified (as in the commented-out alternative in the initialization cell), it returns the history as a single string

**Key Functions:**

- chat_memory.add_user_message(): Stores messages from the user
- chat_memory.add_ai_message(): Stores responses from the AI
- clear(): Clears all stored conversation history

**Use Cases:**

- Maintains context across multiple exchanges
- Allows the AI to reference previous parts of the conversation
- Enables more coherent and contextual responses



# Conceptual view of the stored history (LangChain keeps these as HumanMessage and AIMessage objects rather than plain dicts)
[
    {"role": "user", "content": "What is machine learning?"},
    {"role": "ai", "content": "Machine learning is..."},
    {"role": "user", "content": "Can you give an example?"},
    {"role": "ai", "content": "Here's an example..."}
]

In [8]:
# Initialize memory; return_messages=True returns the history as a list of messages
memory = ConversationBufferMemory(return_messages=True)
# Alternative: return the history as a single string
#memory = ConversationBufferMemory()
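
To make the effect of `return_messages` concrete, here is a small pure-Python stand-in for a conversation buffer. It is not LangChain's implementation. The class `SimpleBufferMemory`, its `load_history` method, and the "Human"/"AI" prefixes in the string rendering are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleBufferMemory:
    """Hypothetical stand-in for a conversation buffer (not LangChain code)."""

    return_messages: bool = False
    messages: list = field(default_factory=list)

    def add_user_message(self, content: str) -> None:
        self.messages.append({"role": "user", "content": content})

    def add_ai_message(self, content: str) -> None:
        self.messages.append({"role": "ai", "content": content})

    def load_history(self):
        # A list of messages when return_messages is True, otherwise one string
        if self.return_messages:
            return list(self.messages)
        prefixes = {"user": "Human", "ai": "AI"}
        return "\n".join(
            f"{prefixes[m['role']]}: {m['content']}" for m in self.messages
        )

    def clear(self) -> None:
        self.messages.clear()


as_list = SimpleBufferMemory(return_messages=True)
as_text = SimpleBufferMemory()
for mem in (as_list, as_text):
    mem.add_user_message("What is machine learning?")
    mem.add_ai_message("Machine learning is...")
```

Calling `load_history()` on `as_list` yields two role/content entries, while `as_text` yields a single two-line string.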

In [9]:
# OpenAI API function
def openai_completion(model_name, prompt):
    try:
        response = openai.ChatCompletion.create(
            model=model_name,
            messages=[{"role": "user", "content": prompt}]
        )
        return response['choices'][0]['message']['content']
    except Exception as e:
        return f"Error with OpenAI API: {str(e)}"
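
The function above sends a single user message, so the model sees no earlier turns. One way to include them is to expand prior exchanges into the `messages` list before calling the API. The sketch below shows only that conversion. The helper `build_openai_messages` and its `(user, assistant)` tuple input are hypothetical and not part of the notebook.

```python
def build_openai_messages(turns, new_message, system_prompt=None):
    """Convert prior (user, assistant) turns plus a new message into chat messages."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The newest user message always goes last
    messages.append({"role": "user", "content": new_message})
    return messages


example = build_openai_messages(
    [("What is machine learning?", "Machine learning is...")],
    "Can you give an example?",
)
```

The resulting list could be passed as the `messages` argument in place of the single-element list used above.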

In [10]:
# Hugging Face API function
def hf_completion(model_name, prompt):
    api_url = f"https://api-inference.huggingface.co/models/{model_name}"
    headers = {"Authorization": f"Bearer {os.getenv('HF_API_KEY')}"}
    payload = {
        "inputs": prompt,
        "parameters": {
            "max_length": 1000,
            "top_p": 0.95,
            "temperature": 0.7
        }
    }
    try:
        response = requests.post(api_url, headers=headers, json=payload)
        response.raise_for_status()
        return response.json()[0]['generated_text']
    except Exception as e:
        return f"Error with Hugging Face API: {str(e)}"
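
Text-generation responses may repeat the prompt at the start of `generated_text`, and a failed request may carry an error payload instead of a list. The sketch below handles both on already-parsed JSON. The helper name `extract_generated_text` is hypothetical, and the shape of the error payload (a dict with an `error` key) is an assumption.

```python
def extract_generated_text(data, prompt):
    """Pull the completion out of a parsed JSON response, dropping an echoed prompt."""
    # Assumption: error responses arrive as a dict with an "error" key
    if isinstance(data, dict):
        return f"Error with Hugging Face API: {data.get('error', 'unexpected response')}"
    if not data or "generated_text" not in data[0]:
        return "Error with Hugging Face API: empty response"
    text = data[0]["generated_text"]
    # Remove the prompt if the response echoes it at the beginning
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()
```

In `hf_completion`, this could be applied to `response.json()` in place of indexing the result directly.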

- Uses Hugging Face's transformers library
- Initializes a pre-trained speech recognition model (wav2vec2)
- wav2vec2 is Facebook's model trained on 960 hours of speech data
- Creates a pipeline for converting speech to text

**Audio Transcription Function:**

- Purpose: Converts spoken audio to written text
- Input: Takes an audio file or recording
- Output: Returns transcribed text or an error message
- Includes error handling for robustness

In [20]:
# Initialize speech recognition
speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
# Audio transcription function
def transcribe_audio(audio):
    try:
        # Transcribe audio using the pipeline
        transcription = speech_recognizer(audio)
        return transcription["text"]
    except Exception as e:
        return f"Error transcribing audio: {str(e)}"
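
The wav2vec2-base-960h checkpoint typically returns uppercase text without punctuation, which can look odd in a chat window. A minimal sketch of tidying the transcript before display follows. The helper name `tidy_transcript` is hypothetical and not part of the notebook.

```python
def tidy_transcript(text):
    """Collapse whitespace and convert an all-caps transcript to sentence case."""
    collapsed = " ".join(text.split())
    return collapsed.lower().capitalize()


tidied = tidy_transcript("WHAT IS  MACHINE LEARNING")
```

This only changes casing and spacing. It does not restore punctuation or proper nouns.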

**Purpose:** Processes voice input and gets an AI response.

Takes 3 parameters:

- audio: The recorded voice input
- history: Previous chat history
- model_choice: Selected AI model (e.g., gpt-3.5-turbo, gpt-4)

**Step-by-Step Process:**

Step 1: Speech-to-Text Conversion

- Calls transcribe_audio() to convert voice to text
- Example: "What is machine learning?" (spoken) → "WHAT IS MACHINE LEARNING" (text; this model outputs uppercase without punctuation)

Step 2: Getting AI Response

- Uses the transcribed text as input for the chatbot
- Sends it to the chosen AI model through chat_with_llm()
- Passes history through to chat_with_llm(), which records the exchange in memory

Step 3: Formatting Output

- Combines both transcription and AI response
- Shows what was understood from the voice input
- Shows what the AI responded

**Key Benefits:**

- Provides transparency by showing the transcription
- Helps users verify their voice was understood correctly
- Maintains conversation flow like text chat
- Integrates with the existing chat memory system

In [23]:
# Audio input processing function
def process_audio(audio, history, model_choice):
    # Nothing to transcribe if no audio was recorded or uploaded
    if audio is None:
        return "No audio provided."

    # Transcribe audio to text
    text = transcribe_audio(audio)

    # Use the transcribed text as input for the chat function
    response = chat_with_llm(text, history, model_choice)

    # Return both transcription and response
    return f"Transcribed: {text}\nResponse: {response}"

The chat_with_llm function facilitates interaction with a large language model (LLM). It takes three parameters: message (the user's input), history (previous chat history), and model_choice (the selected model).

- The function first adds the user's message to the memory.
- It then generates a response based on the selected model:
  - If the model is "gpt-3.5-turbo", "gpt-4", or "gpt-4o-mini", it uses the openai_completion function.
  - For other models, it uses the hf_completion function.
- Finally, the AI's response is added to the memory and returned to the caller.

Note that only the current message is sent to the model; the history argument and the stored memory are not included in the request.

In [24]:
# Main chat function
def chat_with_llm(message, history, model_choice):
    # Add user message to memory
    memory.chat_memory.add_user_message(message)

    # Generate response based on selected model
    if model_choice in ("gpt-3.5-turbo", "gpt-4", "gpt-4o-mini"):
        response = openai_completion(model_choice, message)
    else:
        response = hf_completion(model_choice, message)

    # Add AI response to memory
    memory.chat_memory.add_ai_message(response)

    return response
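
The routing step can also be written against a single set of OpenAI model names, so that adding a model means editing one place. The sketch below uses stub backends in place of the real API calls. The names `OPENAI_MODELS`, `route_completion`, `fake_openai`, and `fake_hf` are hypothetical and not part of the notebook.

```python
OPENAI_MODELS = frozenset({"gpt-3.5-turbo", "gpt-4", "gpt-4o-mini"})


def route_completion(model_choice, message, openai_fn, hf_fn):
    """Send the message to the backend that serves the selected model."""
    backend = openai_fn if model_choice in OPENAI_MODELS else hf_fn
    return backend(model_choice, message)


# Stub backends stand in for the real API calls
def fake_openai(model_name, prompt):
    return f"openai:{model_name}:{prompt}"


def fake_hf(model_name, prompt):
    return f"hf:{model_name}:{prompt}"


routed = route_completion("gpt-4", "hi", fake_openai, fake_hf)
```

In the notebook, `openai_completion` and `hf_completion` would take the place of the stubs.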

- Provides a way to reset/clear the entire conversation history
- Useful when users want to start a fresh conversation
- Helps manage memory usage in long-running chat sessions

In [16]:
# Function to clear chat history
def clear_chat_history():
    memory.clear()
    return None

# Launch the Gradio chatbot interface

# Gradio interface chatbot with audio

In [27]:
#Create Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# Multi-LLM Chat Interface with Audio Support")

    with gr.Row():
        # Model selection
        model_choice = gr.Radio(
            choices=["gpt-3.5-turbo", "gpt-4", "gpt-4o-mini", "HuggingFaceH4/zephyr-7b-beta"],
            value="gpt-3.5-turbo",
            label="Select Language Model"
        )

    with gr.Tab("Text Chat"):
        # Text chat interface
        chatbot = gr.ChatInterface(
            fn=chat_with_llm,
            additional_inputs=[model_choice],
            title=""
        )

    with gr.Tab("Audio Chat"):
        # Audio components
        # Gradio 4+ replaced the 'source' argument with 'sources' (a list),
        # so microphone recording and file upload can both be enabled explicitly
        audio_input = gr.Audio(
            sources=["microphone", "upload"],
            type="filepath",
            label="Record your message"
        )
        audio_output = gr.Textbox(
            label="Transcription and Response",
            lines=5
        )
        audio_button = gr.Button("Process Audio")

        # Set up audio processing
        audio_button.click(
            fn=process_audio,
            inputs=[audio_input, chatbot.chatbot, model_choice],
            outputs=audio_output
        )

    # Clear button (affects both text and audio chat)
    clear_btn = gr.Button("Clear Chat")
    clear_btn.click(
        # Return one value per output: None resets the chatbot, "" empties the textbox
        fn=lambda: (clear_chat_history(), ""),
        inputs=None,
        outputs=[chatbot.chatbot, audio_output]
    )

# Launch the app
if __name__ == "__main__":
    demo.launch()




# Gradio interface without audio

In [18]:
# Create Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# Multi-LLM Chat Interface")

    # Model selection
    model_choice = gr.Radio(
        choices=["gpt-3.5-turbo", "gpt-4", "HuggingFaceH4/zephyr-7b-beta", "gpt-4o-mini"],
        value="gpt-3.5-turbo",
        label="Select Language Model"
    )

    # Chat interface
    chatbot = gr.ChatInterface(
        fn=chat_with_llm,
        additional_inputs=[model_choice],
        title="",
    )

    # Clear button
    clear_btn = gr.Button("Clear Chat")
    clear_btn.click(
        fn=clear_chat_history,
        inputs=None,
        outputs=chatbot.chatbot
    )

# Launch the app
if __name__ == "__main__":
    demo.launch()



