# Building a Custom AI Chatbot Interface using LM Studio API and Gradio

In this cell, we import the necessary libraries:
- `os`: Provides functions for interacting with the operating system.
- `json`: Encodes and decodes JSON data.
- `openai`: The OpenAI client library, used here to talk to LM Studio's OpenAI-compatible API.
- `gradio`: A Python library for building and launching web-based applications, used here to build the chatbot interface.

In [1]:
import os
import json
from openai import OpenAI
import gradio as gr

## Installing LM Studio and Loading Models

1. **Download and Install LM Studio**: Follow the instructions provided by LM Studio to install it locally.
2. **Download Models**: Choose and download the required language models from the LM Studio repository.
3. **Start the LM Studio Server**: Run the LM Studio server on `http://localhost:1234` to handle API requests.

Here, the OpenAI client is initialized with the base URL where the LM Studio API is running (on localhost). An `api_key` is also passed because the client requires one; `"lm-studio"` is a placeholder value. This client will be used to make API requests for the chatbot's responses.


In [2]:
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
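
Before sending chat requests, it can help to confirm that the server is reachable and see which models it reports. The sketch below uses only the standard library and assumes the server exposes an OpenAI-style `GET /v1/models` endpoint that returns a JSON object with a `data` list; `list_local_models` is a hypothetical helper name, not part of the notebook.

```python
import json
import urllib.request

def list_local_models(base_url="http://localhost:1234/v1", timeout=2.0):
    """Return the model ids reported by the server, or None if the check fails."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            payload = json.load(resp)
        # Assumed response shape: {"data": [{"id": "..."}, ...]}
        return [entry["id"] for entry in payload.get("data", [])]
    except (OSError, ValueError, KeyError, AttributeError, TypeError):
        # Connection refused, timeout, invalid JSON, or an unexpected shape
        return None

models = list_local_models()
if models is None:
    print("LM Studio server not reachable; start it before running the chat cells.")
else:
    print("Models reported by the server:", models)
```

Returning `None` rather than raising keeps the check usable as a simple guard at the top of a notebook.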



This cell defines the system message: a predefined system prompt that guides the chatbot's behavior, here "You are a helpful assistant." The model is not selected in this cell; its name is hardcoded in the `chat` function in the next cell.


In [None]:
system_message = "You are a helpful assistant. "
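
If a selectable set of system prompts and models is wanted, one option is a pair of dictionaries. This is a sketch only: the internal model names are the ones used in this notebook's `chat` functions, while the display names, the second prompt, and the helper `resolve_model` are hypothetical.

```python
# Display label -> system prompt (the second entry is illustrative)
system_messages = {
    "Helpful assistant": "You are a helpful assistant.",
    "Concise assistant": "You are a helpful assistant. Keep answers under 200 words.",
}

# Internal name (sent in API requests) -> user-friendly display name (hypothetical labels)
models = {
    "phi-3.1-mini-128k-instruct": "Phi 3.1 Mini (128k)",
    "llama-3.2-3b-instruct": "Llama 3.2 3B",
    "gemma-2-2b-it": "Gemma 2 2B",
}

def resolve_model(display_name):
    """Map a display name chosen in the UI back to the internal model name."""
    for internal_name, label in models.items():
        if label == display_name:
            return internal_name
    raise KeyError(f"Unknown model: {display_name}")

print(resolve_model("Gemma 2 2B"))
```

The display names could feed a dropdown in the interface, with `resolve_model` supplying the `model` argument for the request.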


The `chat` function handles interaction with the LM Studio API. It prepares a conversation history, which includes user inputs and assistant responses. 

* The `chat` function constructs a list of messages, including the system message, previous conversation history, and the user's current message.
* It then sends this list to a chatbot model (`phi-3.1-mini-128k-instruct`) using the `client.chat.completions.create` method.
* The function returns the chatbot's response.
* The `gr.ChatInterface` function is used to launch a chat interface with the `chat` function as its backend.


In [3]:
def chat(message, history):
    messages = [{"role": "system", "content": system_message}]
    for human, assistant in history:
        messages.append({"role": "user", "content": human})
        messages.append({"role": "assistant", "content": assistant})
    messages.append({"role": "user", "content": message})
    response = client.chat.completions.create(model="phi-3.1-mini-128k-instruct", messages=messages)
    return response.choices[0].message.content

gr.ChatInterface(fn=chat).launch()
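
To make the message list concrete, the sketch below isolates the list-building step of `chat` as a pure function that can be run without a server. `build_messages` is a hypothetical helper name. It also accepts history entries that are role/content dictionaries, on the assumption that some Gradio versions pass history in that form rather than as pairs.

```python
def build_messages(system_message, history, message):
    """Build the chat-completions message list from prior turns and the new message."""
    messages = [{"role": "system", "content": system_message}]
    for turn in history:
        if isinstance(turn, dict):
            # Dict-style history: already a role/content entry
            messages.append({"role": turn["role"], "content": turn["content"]})
        else:
            # Pair-style history: [user_text, assistant_text]
            human, assistant = turn
            messages.append({"role": "user", "content": human})
            if assistant is not None:  # skip a reply that has not been generated yet
                messages.append({"role": "assistant", "content": assistant})
    messages.append({"role": "user", "content": message})
    return messages

example = build_messages(
    "You are a helpful assistant.",
    [["Hi", "Hello! How can I help?"]],
    "What is LM Studio?",
)
for entry in example:
    print(entry["role"], "->", entry["content"])
```

The system message always comes first and the newest user message last, so the model sees the conversation in order.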


# Image generator model


In [None]:
import torch
from diffusers import StableDiffusionPipeline
from PIL import Image
import IPython.display as display
def artist(prompt, output_path=None, model_id="CompVis/stable-diffusion-v1-1"):
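    # Use a GPU if one is available, otherwise fall back to the CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Load the pre-trained Stable Diffusion model (half precision only on GPU)
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)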

    # Generate the image
    with torch.autocast(device):
        image = pipe(prompt).images[0]  # Generate image based on the prompt

    # Save the generated image if output path is specified
    if output_path:
        image.save(output_path)
        #print(f"Image saved as '{output_path}'")

    # Display the image in Jupyter Notebook
    display.display(image)

    return image


artist("Create a new alien city")

# Add the image model to the chatbot

In [None]:
import os
import torch
from openai import OpenAI
import gradio as gr
from diffusers import StableDiffusionPipeline
from PIL import Image

# Initialize OpenAI client for chat
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Define system message for chat
system_message = "You are a helpful assistant."

# Global variable for the Stable Diffusion model
pipe = None

# List of keywords that trigger image generation
image_generation_keywords = ['create', 'generate', 'draw', 'image', 'picture', 'illustrate']

# Chat function using OpenAI
def chat(message, history):
    messages = [{"role": "system", "content": system_message}]
    for human, assistant in history:
        messages.append({"role": "user", "content": human})
        messages.append({"role": "assistant", "content": assistant})
    messages.append({"role": "user", "content": message})

    # Send the chat request to the local LM Studio server
    response = client.chat.completions.create(
        model="phi-3.1-mini-128k-instruct",
        messages=messages
    )
    
    # Return assistant's reply
    return response.choices[0].message.content

# Image generation function using Stable Diffusion
def artist(prompt, model_id="CompVis/stable-diffusion-v1-1"):
    global pipe
    device = "cuda" if torch.cuda.is_available() else "cpu"
    
    if pipe is None:  # Load the model only once
        pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16 if device == "cuda" else torch.float32)
        pipe = pipe.to(device)

    try:
        with torch.autocast(device):
            image = pipe(prompt).images[0]  # Generate image based on the prompt
    except Exception as e:
        print(f"Error generating image: {str(e)}")
        image = Image.new('RGB', (512, 512), color='white')  # Placeholder image

    return image

# Combine chat and image generation process
def chat_and_generate_image(user_message, history):
    # Check if the message contains any keywords for image generation
    if any(keyword in user_message.lower() for keyword in image_generation_keywords):
        # Generate image based on user prompt
        generated_image = artist(user_message)  # Use the user message as the prompt for image generation
    else:
        generated_image = Image.new('RGB', (512, 512), color='white')  # Placeholder image if no image is generated

    # Chat response
    chat_response = chat(user_message, history[:-1])
    
    return chat_response, generated_image

# Build Gradio interface with Blocks
with gr.Blocks() as ui:
    with gr.Row():
        # Chatbot display
        chatbot = gr.Chatbot(height=500)
    
    with gr.Row():
        # Input textbox for user message
        msg = gr.Textbox(label="Chat with our AI Assistant:")
    
    with gr.Row():
        # Clear button
        clear = gr.Button("Clear")
    
    # Output for image display
    image_output = gr.Image(type="pil", height=512)

    # User function to update history
    def user(user_message, history):
        return "", history + [[user_message, None]]

    # Bot function to handle chat and image generation
    def bot(history):
        user_message = history[-1][0]
        bot_message, image = chat_and_generate_image(user_message, history)

        # Append bot message to the history
        history[-1][1] = bot_message
        
        return history, image  # Return history and generated image

    # When user submits a message, update chatbot and then generate the bot response and image
    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
        bot, chatbot, [chatbot, image_output]  # Update chatbot and image output
    )

    # Clear chat history on clicking 'Clear'
    clear.click(lambda: (None, Image.new('RGB', (512, 512), color='white')), None, [chatbot, image_output], queue=False)

# Launch the UI
ui.launch()
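
`chat_and_generate_image` triggers image generation with a substring test, so a message such as "Who created Python?" matches the keyword `create`. Below is a minimal sketch of a whole-word match, assuming that behavior is preferable; `wants_image` is a hypothetical helper name, not part of the notebook.

```python
import re

image_generation_keywords = ['create', 'generate', 'draw', 'image', 'picture', 'illustrate']

# Match keywords only as whole words, case-insensitively
_keyword_pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, image_generation_keywords)) + r")\b",
    re.IGNORECASE,
)

def wants_image(user_message):
    """Return True if the message contains an image keyword as a whole word."""
    return _keyword_pattern.search(user_message) is not None

print(wants_image("Draw a new alien city"))  # True
print(wants_image("Who created Python?"))    # False
```

Whole-word matching only reduces false positives. A request like "create a list in Python" would still trigger image generation.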

# Audio model

In [6]:
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
from io import BytesIO
from pydub import AudioSegment
from pydub.playback import play

# Check if CUDA is available and set device
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")

def talker(prompt):
    try:
        # Description of the voice characteristics (optional, can be customized)
        description = "Jon's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."
        
        # Tokenize the description and the prompt
        input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
        prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
        
        # Generate the audio using the model
        generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
        
        # Convert the numpy array to a BytesIO buffer
        audio_arr = generation.cpu().numpy().squeeze()
        audio_buffer = BytesIO()
        sf.write(audio_buffer, audio_arr, model.config.sampling_rate, format='WAV')
        audio_buffer.seek(0)  # Seek to the start of the buffer
        
        # Load the audio into an AudioSegment and play it
        audio_segment = AudioSegment.from_file(audio_buffer, format="wav")
        play(audio_segment)
    
    except Exception as e:
        print(f"An error occurred: {e}")

talker("Hello, how are you?")

Flash attention 2 is not installed
  WeightNorm.apply(module, name, dim)
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
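
The `talker` function above uses `soundfile` to write the generated samples into an in-memory WAV buffer. For illustration, the same step can be sketched with the standard library's `wave` module instead, a swapped-in technique rather than what the notebook does. The sketch assumes mono audio as floats in the range -1.0 to 1.0; `floats_to_wav_bytes` and the 16000 Hz rate are hypothetical choices.

```python
import io
import struct
import wave

def floats_to_wav_bytes(samples, sampling_rate):
    """Encode float samples in [-1.0, 1.0] as 16-bit mono PCM WAV bytes."""
    pcm = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against out-of-range values
        pcm += struct.pack("<h", int(clipped * 32767))

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes = 16-bit samples
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(pcm))
    return buffer.getvalue()

wav_bytes = floats_to_wav_bytes([0.0, 0.5, -0.5, 1.0], 16000)
print(len(wav_bytes), "bytes of WAV data")
```

In the notebook itself, the sampling rate would come from `model.config.sampling_rate`, as in the `sf.write` call.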


# Add image generation and audio generation (Parler-TTS) models to the chatbot


In [7]:
import os
import torch
from openai import OpenAI
import gradio as gr
from diffusers import StableDiffusionPipeline
from PIL import Image
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
from io import BytesIO
from pydub import AudioSegment
import tempfile

# Initialize OpenAI client for chat
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Define system message for chat
system_message = "You are a helpful assistant."
system_message += " Images are generated with stable-diffusion-v1-1 and audio with parler-tts-mini-v1."

# Global variables for models
pipe = None
tts_model = None
tokenizer = None

# List of keywords that trigger image generation
image_generation_keywords = ['create', 'generate', 'draw', 'image', 'picture', 'illustrate']

# Check if CUDA is available and set device
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load TTS model and tokenizer
tts_model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")

# Chat function using OpenAI
def chat(message, history):
    messages = [{"role": "system", "content": system_message}]
    for human, assistant in history:
        messages.append({"role": "user", "content": human})
        messages.append({"role": "assistant", "content": assistant})
    messages.append({"role": "user", "content": message})

    # Send the chat request to the local LM Studio server
    response = client.chat.completions.create(
        model="llama-3.2-3b-instruct",
        messages=messages
    )
    
    # Return assistant's reply
    reply = response.choices[0].message.content
    return reply

# Image generation function using Stable Diffusion
def artist(prompt, model_id="CompVis/stable-diffusion-v1-1"):
    global pipe
    device = "cuda" if torch.cuda.is_available() else "cpu"
    
    if pipe is None:  # Load the model only once
        pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16 if device == "cuda" else torch.float32)
        pipe = pipe.to(device)

    try:
        with torch.autocast(device):
            image = pipe(prompt).images[0]  # Generate image based on the prompt
    except Exception as e:
        print(f"Error generating image: {str(e)}")
        image = Image.new('RGB', (512, 512), color='white')  # Placeholder image

    return image

# Talker function for generating audio
def talker(prompt):
    try:
        # Description of the voice characteristics
        description = "Jon's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."

        # Tokenize the description and the prompt
        input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
        prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

        # Generate the audio using the model
        generation = tts_model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
        audio_arr = generation.cpu().numpy().squeeze()

        # Reserve a temporary .wav path, then write the audio once the handle is closed
        with tempfile.NamedTemporaryFile(delete=False, suffix='.wav') as temp_audio_file:
            audio_file_path = temp_audio_file.name
        sf.write(audio_file_path, audio_arr, tts_model.config.sampling_rate, format='WAV')

    except Exception as e:
        print(f"An error occurred: {e}")
        audio_file_path = None  # Handle error case

    return audio_file_path  # Path to the WAV file, or None on failure

# Combine chat and image generation process
def chat_and_generate_image(user_message, history):
    # Check if the message contains any keywords for image generation
    if any(keyword in user_message.lower() for keyword in image_generation_keywords):
        # Generate image based on user prompt
        generated_image = artist(user_message)  # Use the user message as the prompt for image generation
    else:
        generated_image = Image.new('RGB', (512, 512), color='white')  # Placeholder image if no image is generated

    # Chat response
    chat_response = chat(user_message, history[:-1])
    
    # Generate audio for the chat response
    audio_buffer = talker(chat_response)
    
    return chat_response, generated_image, audio_buffer

# Build Gradio interface with Blocks
with gr.Blocks() as ui:
    # Main title for the UI
    gr.Markdown("<h1 style='text-align: center;'>AI Assistant with Image Generation and Voice</h1>")

    with gr.Row():
        # Left column for chatbot display
        with gr.Column(scale=1):  # Adjust the scale for desired width
            # Chatbot display
            chatbot = gr.Chatbot(height=500)

            # Input textbox for user message
            msg = gr.Textbox(label="Chat with our AI Assistant:", placeholder="Type your message here...")

            # Clear button
            clear = gr.Button("Clear")

        # Right column for image display
        with gr.Column(scale=1):  # Adjust the scale for desired width
            # Output for image display
            image_output = gr.Image(type="pil", height=512, label="Generated Image")

            # Output for audio playback
            audio_output = gr.Audio(label="Generated Audio")

    # User function to update history
    def user(user_message, history):
        return "", history + [[user_message, None]]

    # Bot function to handle chat and image generation
    def bot(history):
        user_message = history[-1][0]
        bot_message, image, audio = chat_and_generate_image(user_message, history)

        # Append bot message to the history
        history[-1][1] = bot_message
        
        return history, image, audio  # Return history, generated image, and audio buffer

    # When user submits a message, update chatbot and then generate the bot response and image
    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
        bot, chatbot, [chatbot, image_output, audio_output]  # Update chatbot, image output, and audio output
    )

    # Clear chat history on clicking 'Clear'
    clear.click(lambda: (None, Image.new('RGB', (512, 512), color='white'), None), None, [chatbot, image_output, audio_output], queue=False)

# Launch the UI
ui.launch()




* Running on local URL:  http://127.0.0.1:7862

To create a public link, set `share=True` in `launch()`.
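
The `artist` function above keeps the loaded pipeline in a global `pipe` so that the model is loaded only once. One side effect is that a later call with a different `model_id` still reuses the first pipeline. The sketch below shows the same load-once idea keyed by model id, using `functools.lru_cache`. The loader is a stand-in that returns a placeholder object; `get_model` and `load_calls` are hypothetical names, and a real version would call `StableDiffusionPipeline.from_pretrained` where indicated.

```python
from functools import lru_cache

load_calls = []  # records each real load, for illustration only

@lru_cache(maxsize=None)
def get_model(model_id):
    """Load a model once per model_id and reuse it on later calls."""
    load_calls.append(model_id)
    # Stand-in for an expensive load such as from_pretrained(model_id)
    return {"model_id": model_id}

first = get_model("CompVis/stable-diffusion-v1-1")
second = get_model("CompVis/stable-diffusion-v1-1")
print(first is second, len(load_calls))  # True 1
```

Caching every requested model keeps all of them in memory, so a small `maxsize` may be the safer choice on a GPU with limited memory.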




# Add image generation and audio generation (Microsoft SpeechT5 TTS, 600-token input limit) to the chatbot

In [9]:
import os
import torch
from openai import OpenAI
import gradio as gr
from diffusers import StableDiffusionPipeline
from PIL import Image
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
from io import BytesIO
from pydub import AudioSegment
import tempfile
from transformers import pipeline
from datasets import load_dataset

# Initialize OpenAI client for chat
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Define system message for chat
system_message = "You are a helpful assistant. Your responses should be clear and informative, limited to a maximum of 200 words. Focus on providing accurate and relevant information to user inquiries while maintaining a friendly and professional tone."
system_message += " Images are generated with stable-diffusion-v1-1 and audio with microsoft/speecht5_tts."

# Global variable for the Stable Diffusion model
pipe = None

# List of keywords that trigger image generation
image_generation_keywords = ['create', 'generate', 'draw', 'image', 'picture', 'illustrate']

# Check if CUDA is available and set device
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Parler-TTS is not loaded in this cell: talker() below uses SpeechT5 instead

# Chat function using OpenAI
def chat(message, history):
    messages = [{"role": "system", "content": system_message}]
    for human, assistant in history:
        messages.append({"role": "user", "content": human})
        messages.append({"role": "assistant", "content": assistant})
    messages.append({"role": "user", "content": message})

    # Send the chat request to the local LM Studio server
    response = client.chat.completions.create(
        model="gemma-2-2b-it",
        messages=messages,
        max_tokens= 200
    )
    
    # Return assistant's reply
    reply = response.choices[0].message.content
    return reply

# Image generation function using Stable Diffusion
def artist(prompt, model_id="CompVis/stable-diffusion-v1-1"):
    global pipe
    device = "cuda" if torch.cuda.is_available() else "cpu"
    
    if pipe is None:  # Load the model only once
        pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16 if device == "cuda" else torch.float32)
        pipe = pipe.to(device)

    try:
        with torch.autocast(device):
            image = pipe(prompt).images[0]  # Generate image based on the prompt
    except Exception as e:
        print(f"Error generating image: {str(e)}")
        image = Image.new('RGB', (512, 512), color='white')  # Placeholder image

    return image

# Cached SpeechT5 synthesiser and speaker-embedding dataset (loaded on first use)
synthesiser = None
embeddings_dataset = None

# Talker function for generating audio with SpeechT5
def talker(prompt, speaker_index=7306, device=device):
    global synthesiser, embeddings_dataset
    try:
        if synthesiser is None:  # Load the TTS pipeline and speaker embeddings only once
            synthesiser = pipeline("text-to-speech", "microsoft/speecht5_tts", device=device)
            embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")

        # Select the speaker embedding for the requested voice
        speaker_embedding = torch.tensor(embeddings_dataset[speaker_index]["xvector"]).unsqueeze(0)

        # Generate speech with the specified prompt and speaker embedding
        speech = synthesiser(prompt, forward_params={"speaker_embeddings": speaker_embedding})

        # Reserve a temporary .wav path, then write the audio once the handle is closed
        with tempfile.NamedTemporaryFile(delete=False, suffix='.wav') as temp_audio_file:
            audio_file_path = temp_audio_file.name
        sf.write(audio_file_path, speech["audio"], samplerate=speech["sampling_rate"])

    except Exception as e:
        print(f"An error occurred: {e}")
        audio_file_path = None  # Handle error case

    return audio_file_path  # Path to the WAV file, or None on failure



# Combine chat and image generation process
def chat_and_generate_image(user_message, history):
    # Check if the message contains any keywords for image generation
    if any(keyword in user_message.lower() for keyword in image_generation_keywords):
        # Generate image based on user prompt
        generated_image = artist(user_message)  # Use the user message as the prompt for image generation
    else:
        generated_image = Image.new('RGB', (512, 512), color='white')  # Placeholder image if no image is generated

    # Chat response
    chat_response = chat(user_message, history[:-1])
    
    # Generate audio for the chat response
    audio_buffer = talker(chat_response)
    
    return chat_response, generated_image, audio_buffer

# Build Gradio interface with Blocks
with gr.Blocks() as ui:
    # Main title for the UI
    gr.Markdown("<h1 style='text-align: center;'>AI Assistant with Image Generation and Voice</h1>")

    with gr.Row():
        # Left column for chatbot display
        with gr.Column(scale=1):  # Adjust the scale for desired width
            # Chatbot display
            chatbot = gr.Chatbot(height=500)

            # Input textbox for user message
            msg = gr.Textbox(label="Chat with our AI Assistant:", placeholder="Type your message here...")

            # Clear button
            clear = gr.Button("Clear")

        # Right column for image display
        with gr.Column(scale=1):  # Adjust the scale for desired width
            # Output for image display
            image_output = gr.Image(type="pil", height=512, label="Generated Image")

            # Output for audio playback
            audio_output = gr.Audio(label="Generated Audio")

    # User function to update history
    def user(user_message, history):
        return "", history + [[user_message, None]]

    # Bot function to handle chat and image generation
    def bot(history):
        user_message = history[-1][0]
        bot_message, image, audio = chat_and_generate_image(user_message, history)

        # Append bot message to the history
        history[-1][1] = bot_message
        
        return history, image, audio  # Return history, generated image, and audio buffer

    # When user submits a message, update chatbot and then generate the bot response and image
    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
        bot, chatbot, [chatbot, image_output, audio_output]  # Update chatbot, image output, and audio output
    )

    # Clear chat history on clicking 'Clear'
    clear.click(lambda: (None, Image.new('RGB', (512, 512), color='white'), None), None, [chatbot, image_output, audio_output], queue=False)

# Launch the UI
ui.launch()


  WeightNorm.apply(module, name, dim)


* Running on local URL:  http://127.0.0.1:7863

To create a public link, set `share=True` in `launch()`.




unet\diffusion_pytorch_model.safetensors not found


Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

An error occurred while trying to fetch C:\Users\iei1\.cache\huggingface\hub\models--CompVis--stable-diffusion-v1-1\snapshots\574555a657f0c8d6e970a02c7e095e9d4220dcbf\vae: Error no file named diffusion_pytorch_model.safetensors found in directory C:\Users\iei1\.cache\huggingface\hub\models--CompVis--stable-diffusion-v1-1\snapshots\574555a657f0c8d6e970a02c7e095e9d4220dcbf\vae.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.
An error occurred while trying to fetch C:\Users\iei1\.cache\huggingface\hub\models--CompVis--stable-diffusion-v1-1\snapshots\574555a657f0c8d6e970a02c7e095e9d4220dcbf\unet: Error no file named diffusion_pytorch_model.safetensors found in directory C:\Users\iei1\.cache\huggingface\hub\models--CompVis--stable-diffusion-v1-1\snapshots\574555a657f0c8d6e970a02c7e095e9d4220dcbf\unet.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.


  0%|          | 0/50 [00:00<?, ?it/s]
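
The heading of this section notes a 600-token limit for the SpeechT5 model, while `talker` receives the full chat reply. One way to stay under such a limit is to split the reply into sentence-aligned chunks and synthesize each chunk separately. The sketch below assumes a word count is an acceptable rough stand-in for the tokenizer's token count, although the two are not the same; `split_into_chunks` and `max_words` are hypothetical names.

```python
import re

def split_into_chunks(text, max_words=60):
    """Split text into chunks of at most max_words words, breaking at sentence ends where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = sentence.split()
        # Hard-split any single sentence that is longer than the budget
        for start in range(0, len(words), max_words):
            piece = words[start:start + max_words]
            # Start a new chunk when this piece would exceed the budget
            if current and count + len(piece) > max_words:
                chunks.append(" ".join(current))
                current, count = [], 0
            current.extend(piece)
            count += len(piece)
    if current:
        chunks.append(" ".join(current))
    return chunks

reply = "One two three. Four five six seven. Eight."
for chunk in split_into_chunks(reply, max_words=4):
    print(chunk)
```

Each chunk could then be passed to `talker` in turn and the resulting audio files played or concatenated in order.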