# Notebook 01: Testing and Validation

**Purpose:** This notebook will cover testing and validation of the different AI components (text, image, and audio) in the multi-modal chatbot system.

# Project Setup

Before we begin, we need to load the necessary libraries and initialize the models.

In [43]:
import os
from pathlib import Path

from dotenv import load_dotenv

# Import the required classes from the swarmauri library
from swarmauri.conversations.concrete.Conversation import Conversation
from swarmauri.llms.concrete.DeepInfraImgGenModel import DeepInfraImgGenModel
from swarmauri.llms.concrete.DeepInfraModel import DeepInfraModel
from swarmauri.llms.concrete.FalAIVisionModel import FalAIVisionModel
from swarmauri.llms.concrete.OpenAIAudioTTS import OpenAIAudioTTS
from swarmauri.llms.concrete.OpenAIModel import OpenAIModel as LLM
from swarmauri.messages.concrete.HumanMessage import HumanMessage
from swarmauri.utils.base64_to_img_url import base64_to_img_url


# Load environment variables and API key
load_dotenv()

# Fetch the API keys from environment variables
DEEPINFRA_API_KEY = os.getenv("DEEPINFRA_API_KEY")

IMGBB_API_KEY = os.getenv("IMGBB_API_KEY")

API_KEY = os.getenv("FAL_KEY")

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")


# Initialize the OpenAI Model
llm = LLM(api_key=OPENAI_API_KEY)

# Initialize the DeepInfra Image Generation Model
llm_img_gen = DeepInfraImgGenModel(api_key=DEEPINFRA_API_KEY)

# Initialize the FAL AI Vision Model (only when FAL_KEY is set)
falai_vision_model = FalAIVisionModel(api_key=API_KEY) if API_KEY else None

# Initialize Text-to-Speech Model
tts = OpenAIAudioTTS(api_key=OPENAI_API_KEY)
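Only the FAL vision model above is guarded against a missing key, so a misconfigured `.env` file may surface later as a confusing API error. A fail-fast check can report missing keys up front. This is a minimal sketch assuming the four variable names read in the setup cell; `missing_env_vars` is a hypothetical helper, not part of swarmauri.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Environment variables read by the setup cell of this notebook
REQUIRED_KEYS = ("OPENAI_API_KEY", "DEEPINFRA_API_KEY", "IMGBB_API_KEY", "FAL_KEY")


def missing_env_vars(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or blank in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not (env.get(name) or "").strip()]


# Report missing keys now instead of failing later inside an API call
missing = missing_env_vars(REQUIRED_KEYS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```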

# Text Generation

The first capability is text generation: each user prompt is sent to the OpenAI model in a fresh conversation, and the model's reply is returned as text. This covers answering questions, writing stories, and other open-ended tasks.

In [44]:
# Send a prompt to the OpenAI model, display the response and token usage, and return the text
def generate_text_response(prompt: str) -> str:
    # Create a new conversation instance for each prompt so earlier turns do not leak into the context
    conversation = Conversation()
    conversation.add_message(HumanMessage(content=prompt))

    # Generate the model response; the reply is appended to the conversation
    llm.predict(conversation=conversation)
    response = conversation.get_last()

    # Display prompt, model response, and usage data
    print(f"Prompt: {prompt}")
    print("Response:", response.content)
    print("=" * 50)
    print("Usage Data:", response.usage)
    print("=" * 50)

    # Return the text so callers (and tests) can inspect it
    return response.content
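Because `generate_text_response` depends on a live OpenAI call, one way to test its plumbing offline is to inject a test double. In this sketch, `FakeMessage`, `FakeConversation` and `EchoLLM` are hypothetical stand-ins that only mimic the `add_message`, `get_last` and `predict` calls used above; they are not swarmauri classes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FakeMessage:
    # Hypothetical stand-in for a swarmauri message: content plus usage data
    content: str
    usage: Optional[dict] = None


@dataclass
class FakeConversation:
    # Hypothetical stand-in exposing only add_message() and get_last()
    messages: List[FakeMessage] = field(default_factory=list)

    def add_message(self, message: FakeMessage) -> None:
        self.messages.append(message)

    def get_last(self) -> FakeMessage:
        return self.messages[-1]


class EchoLLM:
    # Hypothetical model: appends a canned reply instead of calling an API
    def predict(self, conversation: FakeConversation) -> FakeConversation:
        prompt = conversation.get_last().content
        conversation.add_message(FakeMessage(content=f"echo: {prompt}", usage={"total_tokens": 0}))
        return conversation


def generate_text_response_with(
    llm,
    make_conversation: Callable[[], FakeConversation],
    make_message: Callable[[str], FakeMessage],
    prompt: str,
) -> str:
    # Same flow as generate_text_response, with the model and classes injected
    conversation = make_conversation()
    conversation.add_message(make_message(prompt))
    llm.predict(conversation=conversation)
    return conversation.get_last().content


reply = generate_text_response_with(EchoLLM(), FakeConversation, FakeMessage, "hello")
print(reply)
```

With the real objects, the same function could be called as `generate_text_response_with(llm, Conversation, lambda p: HumanMessage(content=p), prompt)`, so the offline and online paths share one code path.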

In [45]:
# Example text prompt
example_prompt = "Tell me a short story about a scientist discovering new planets."

text_response = generate_text_response(example_prompt)

Prompt: Tell me a short story about a scientist discovering new planets.
Response: Dr. Elena Vega had spent years studying the stars and searching for new planets. She had always dreamed of making a groundbreaking discovery that would change the course of history. And finally, that day had come.

One night, while studying the data from her telescope, Dr. Vega noticed something peculiar. There seemed to be a series of unexplained gravitational anomalies in a distant part of the galaxy. Intrigued, she decided to focus her attention on this area and investigate further.

After weeks of meticulous research and analysis, Dr. Vega finally made a breakthrough. She had discovered not one, but three new planets orbiting a distant star. These planets were unlike anything she had ever seen before - they were teeming with life and had unique geological formations that defied all known scientific principles.

Excited by her discovery, Dr. Vega quickly published her findings and soon the entire scie
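Generated text differs from run to run, so exact-match assertions are brittle. A looser check can verify that a response is long enough and mentions the key topics of the prompt. This is a sketch under our own assumptions: the word threshold, the required terms, and the helper name `check_text_response` are all hypothetical choices.

```python
import re
from typing import Iterable, List


def check_text_response(text: str, required_terms: Iterable[str], min_words: int = 20) -> List[str]:
    """Return a list of problems found in a generated response (an empty list means it passed)."""
    problems = []
    words = re.findall(r"\b\w+\b", text or "")
    if len(words) < min_words:
        problems.append(f"too short: {len(words)} words (< {min_words})")
    lowered = (text or "").lower()
    for term in required_terms:
        # Case-insensitive substring match, so "planet" also matches "planets"
        if term.lower() not in lowered:
            problems.append(f"missing term: {term!r}")
    return problems


sample = "Dr. Elena Vega had spent years studying the stars and searching for new planets."
print(check_text_response(sample, ["planet", "scientist"], min_words=10))
```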

# Image Generation

In addition to text responses, the chatbot can generate images from text descriptions. The pipeline first asks a DeepInfra text model to expand the user's prompt into a detailed scene description, passes that description to DeepInfraImgGenModel, and uploads the resulting image to ImgBB to obtain a shareable URL.

In [46]:
# Generate an image from a text prompt and return a hosted image URL (or None on upload failure)
def generate_image_from_text(prompt: str):
    # Create a conversation instance and add the user prompt
    conversation = Conversation()
    conversation.add_message(HumanMessage(content=prompt))

    # Ask a DeepInfra text model to expand the prompt into a detailed scene description.
    # A distinct name keeps it from being confused with the global OpenAI `llm`.
    description_llm = DeepInfraModel(api_key=DEEPINFRA_API_KEY)
    description_llm.predict(conversation=conversation, temperature=0)
    detailed_description = conversation.get_last().content

    # Generate the image (base64-encoded) from the detailed description
    image_base64 = llm_img_gen.generate_image_base64(detailed_description)

    # Upload the generated image to ImgBB and return its URL
    try:
        return base64_to_img_url(image_base64, IMGBB_API_KEY)
    except Exception as e:
        print("Error uploading the image:", e)
        return None
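Before uploading, it can help to confirm that the base64 string actually decodes to an image. This sketch checks the decoded bytes for the PNG or JPEG file signature. Which format DeepInfra returns, and whether the string carries a `data:` URL prefix, are not shown in this notebook, so the sketch accepts both forms; `image_format_from_base64` is a hypothetical helper.

```python
import base64
import binascii
from typing import Optional

# File signatures ("magic numbers") for two common image formats
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def image_format_from_base64(data: str) -> Optional[str]:
    """Return 'png' or 'jpeg' if `data` decodes to one of those formats, else None."""
    # Strip an optional data-URL prefix such as "data:image/png;base64,"
    if data.startswith("data:") and "," in data:
        data = data.split(",", 1)[1]
    try:
        raw = base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError):
        return None
    for signature, fmt in SIGNATURES.items():
        if raw.startswith(signature):
            return fmt
    return None


png_header = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8).decode()
print(image_format_from_base64(png_header))
```

In the notebook, a check such as `image_format_from_base64(image_base64) is not None` could run just before the upload step.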

In [47]:
# Example usage:
image_prompt = "A futuristic city skyline at night with neon lights and flying cars"
image_url = generate_image_from_text(image_prompt)

if image_url:
    print("Generated Image URL:", image_url)

Generated Image URL: https://i.ibb.co/7vdtyyB/761db0c955f8.jpg
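The upload step returns either a URL or `None`. A lightweight structural check can catch empty or malformed results without downloading the file. This sketch assumes URLs shaped like the ImgBB link above (absolute https URL ending in an image extension); `looks_like_image_url` is a hypothetical helper.

```python
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp", ".gif")


def looks_like_image_url(url: object) -> bool:
    """True if `url` is an absolute http(s) URL whose path ends in a common image extension."""
    if not isinstance(url, str):
        return False
    parsed = urlparse(url)
    return (
        parsed.scheme in ("http", "https")
        and bool(parsed.netloc)
        and parsed.path.lower().endswith(IMAGE_EXTENSIONS)
    )


print(looks_like_image_url("https://i.ibb.co/7vdtyyB/761db0c955f8.jpg"))
```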


# Audio Generation

For audio generation, our chatbot can convert text responses into speech using a Text-to-Speech (TTS) model. It can also create sound effects based on user descriptions.

In [48]:
# Setup output directory
output_dir = Path("output")
output_dir.mkdir(exist_ok=True)

def generate_audio_from_text(
    text: str,
    voice: str = "alloy",
    model: str = "tts-1",
    output_filename: str = "output.mp3",
) -> str:
    """
    Generates speech from input text and saves it as an audio file.

    Parameters:
        text (str): The text to be converted into speech.
        voice (str): The voice to be used for TTS. Default is "alloy".
        model (str): The TTS model to be used. Default is "tts-1".
        output_filename (str): The name of the output audio file. Default is "output.mp3".

    Returns:
        str: The file path of the generated audio.
    """
    # Configure the shared TTS instance with the requested model and voice
    tts.name = model
    tts.voice = voice

    # Define output path
    output_path = output_dir / output_filename

    # Generate speech and write it to output_path
    print(f"Generating speech for text: {text}")
    tts.predict(text=text, audio_path=str(output_path))

    # Return path to the generated audio file
    return str(output_path)

# Example usage
sample_text = "Welcome to the text-to-speech demonstration using OpenAI's TTS service."
audio_file_path = generate_audio_from_text(sample_text, voice="shimmer", model="tts-1", output_filename="sample_output.mp3")
print(f"Generated audio saved to: {audio_file_path}")

Generating speech for text: Welcome to the text-to-speech demonstration using OpenAI's TTS service.
Generated audio saved to: output\sample_output.mp3
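To validate the TTS step without listening to the result, one can check that the output file exists, is non-empty, and starts like an MP3: either with an ID3 tag or with an MPEG audio frame header, whose first 11 bits are all set. This is a heuristic sketch, not a decoder, and `looks_like_mp3` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def looks_like_mp3(path) -> bool:
    """Heuristic: file exists, is non-empty, and starts with an ID3 tag or an MPEG frame sync."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False
    with p.open("rb") as f:
        head = f.read(3)
    if head.startswith(b"ID3"):
        return True
    # MPEG frame sync: byte 0 is 0xFF and the top 3 bits of byte 1 are set
    return len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0


# Demo on a tiny synthetic file that starts with an MPEG frame header
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.mp3"
    demo.write_bytes(b"\xff\xfb\x90\x00")
    print(looks_like_mp3(demo))
```

In the notebook, `looks_like_mp3(audio_file_path)` would apply the same check to the generated file.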


# Computer Vision with Image Inputs

The chatbot can also handle images as input. It will process the images using computer vision models and respond with descriptive captions, tags, or detailed analysis.

In [49]:
def analyze_image(image_url: str) -> str:
    """Describe the content of an image using FalAIVisionModel."""
    # The vision model is only created when FAL_KEY is set
    if falai_vision_model is None:
        return "Error processing image: FAL_KEY is not set"
    try:
        return falai_vision_model.process_image(
            image_url=image_url, prompt="Describe the content of this image."
        )
    except Exception as e:
        return f"Error processing image: {e}"

In [50]:
# Example image URL (can be replaced with user-uploaded images)
new_image_url = "https://llava-vl.github.io/static/images/monalisa.jpg"
image_analysis_result = analyze_image(new_image_url)
print("Image URL: ", new_image_url)
print(f"Image Analysis Result: {image_analysis_result}")

Image URL:  https://llava-vl.github.io/static/images/monalisa.jpg
Image Analysis Result: The image you've provided is a representation of the famous painting "Mona Lisa" by Leonardo da Vinci. The painting is renowned for its depiction of a woman with a subtle smile, set against a distant landscape. The artwork is characterized by its use of sfumato, a technique that creates a
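`analyze_image` reports failures as strings rather than raising, so a test has to inspect the returned text. This sketch mirrors that function with the vision model passed in and a guard for a missing model, then exercises the success, exception, and missing-model paths offline. `StubVision`, `analyze_image_with` and `is_error` are hypothetical names, not swarmauri classes.

```python
from typing import Optional


class StubVision:
    # Hypothetical stand-in exposing only process_image(image_url=..., prompt=...)
    def __init__(self, reply: Optional[str] = None, error: Optional[Exception] = None):
        self.reply, self.error = reply, error

    def process_image(self, image_url: str, prompt: str) -> str:
        if self.error is not None:
            raise self.error
        return self.reply


def analyze_image_with(model, image_url: str) -> str:
    # Mirrors analyze_image, with the vision model injected
    if model is None:
        return "Error processing image: vision model not configured"
    try:
        return model.process_image(image_url=image_url, prompt="Describe the content of this image.")
    except Exception as e:
        return f"Error processing image: {e}"


def is_error(result: str) -> bool:
    # Failures are signalled by this fixed prefix rather than by an exception
    return result.startswith("Error processing image:")


print(analyze_image_with(StubVision(error=TimeoutError("timed out")), "https://example.com/a.jpg"))
```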


# Summary

- **Text Generation:** The chatbot generates text responses for queries or prompts.
- **Image Generation:** Based on text descriptions, the chatbot can generate corresponding images.
- **Audio Generation:** The chatbot can convert text into speech for audio responses.
- **Computer Vision:** The chatbot can analyze images and provide descriptions or tags based on visual input.
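One way to turn the four capabilities above into a repeatable validation run is a small harness that executes one check per component and records pass or fail without stopping at the first failure. This is a minimal sketch: the checks below are placeholders, and in the notebook each would wrap a call such as `generate_text_response` or `analyze_image` plus a validation on its result. `run_checks` is a hypothetical helper.

```python
from typing import Callable, Dict, Tuple


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, Tuple[bool, str]]:
    """Run each named check; record (passed, detail) and keep going on failure."""
    results = {}
    for name, check in checks.items():
        try:
            passed = bool(check())
            results[name] = (passed, "ok" if passed else "check returned False")
        except Exception as e:  # an exception counts as a failure, not a crash
            results[name] = (False, f"{type(e).__name__}: {e}")
    return results


def _unavailable() -> bool:
    raise RuntimeError("API unavailable")


# Placeholder checks standing in for the real text, image, audio, and vision calls
report = run_checks({
    "Text Generation": lambda: True,
    "Image Generation": lambda: False,
    "Audio Generation": _unavailable,
    "Computer Vision": lambda: True,
})
for name, (passed, detail) in report.items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}: {detail}")
```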

# Notebook Metadata

In [52]:
import platform
import sys
from datetime import datetime

# Display author information
author_name = "Huzaifa Irshad" 
github_username = "irshadhuzaifa"  

print(f"Author: {author_name}")
print(f"GitHub Username: {github_username}")

# Last modified datetime (file's metadata)
notebook_file = "Notebook_01_Testing_and_Validation.ipynb"
try:
    last_modified_time = os.path.getmtime(notebook_file)
    last_modified_datetime = datetime.fromtimestamp(last_modified_time)
    print(f"Last Modified: {last_modified_datetime}")
except Exception as e:
    print(f"Could not retrieve last modified datetime: {e}")

# Display platform, Python version, and Swarmauri version
print(f"Platform: {platform.system()} {platform.release()}")
print(f"Python Version: {sys.version}")

import swarmauri

try:
    version = swarmauri.__version__
except AttributeError:
    # Fall back to a hardcoded version string when the package does not define __version__
    version = "0.5.1"

print(f"Swarmauri Version: {version}")

Author: Huzaifa Irshad
GitHub Username: irshadhuzaifa
Last Modified: 2024-11-06 18:04:16.976657
Platform: Windows 11
Python Version: 3.12.7 | packaged by Anaconda, Inc. | (main, Oct  4 2024, 13:17:27) [MSC v.1929 64 bit (AMD64)]
Swarmauri Version: 0.5.1
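Reading `__version__` only works when a package defines it, and the output above shows the hardcoded fallback being used. An alternative sketch reads the installed distribution's version from package metadata with the standard library's `importlib.metadata` (Python 3.8+). It assumes the distribution is installed under the name `swarmauri`; `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str, default: str = "unknown") -> str:
    """Return the installed version of a distribution, or `default` if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return default


print(f"Swarmauri Version: {installed_version('swarmauri')}")
```

Unlike a hardcoded fallback, this reports the version that is actually installed in the running environment.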
