# **Real-World Use Cases with Swarmauri**

## **Introduction**
In this final notebook, we shift our focus to applying swarmauri's multimodal capabilities in real-world scenarios. The use cases below show how integrating text, image, and audio can improve user experiences and content creation workflows.

Throughout this notebook, we'll delve into three distinct use cases that showcase the versatility of the swarmauri library's multimodal capabilities. From automated course content generation to multilingual content localization and interactive storytelling, you'll learn how to apply these powerful tools to address real-world challenges.

Each use case is accompanied by a short, working code example that you can build upon. Understanding the motivation, requirements, and implementation details of these use cases will help you identify and address similar challenges in your own projects.

## **Setup**

In [1]:
import os
from pathlib import Path
from dotenv import load_dotenv
from swarmauri.llms.concrete.OpenAIImgGenModel import OpenAIImgGenModel
from swarmauri.llms.concrete.OpenAIAudioTTS import OpenAIAudioTTS
from swarmauri.llms.concrete.GroqAIAudio import GroqAIAudio

**Load environment variables**

In [2]:
load_dotenv()

True
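
The classes below read their API keys with `os.getenv`, which returns `None` when a variable is unset, so a missing key may only surface later as a failed API call. A minimal sketch of an early check, assuming the two variable names used in this notebook; `missing_keys` is a hypothetical helper, not part of swarmauri:

```python
import os

# Variable names taken from the cells in this notebook.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GROQ_API_KEY")

def missing_keys(required=REQUIRED_KEYS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

missing = missing_keys()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Running this right after `load_dotenv()` makes a misconfigured `.env` file visible before any model is constructed.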

## **Use Case 1: Automated Course Content Creation**

In [3]:
class CourseContentGenerator:
    def __init__(self):
        self.img_model = OpenAIImgGenModel(api_key=os.getenv("OPENAI_API_KEY"))
        self.tts_model = OpenAIAudioTTS(api_key=os.getenv("OPENAI_API_KEY"))
    
    def create_lesson(self, lesson_text, title):
        # Generate illustration
        image_url = self.img_model.generate_image(
            prompt=f"Educational illustration for: {title}"
        )[0]
        
        # Create audio narration
        audio_path = f"lesson_{title.lower().replace(' ', '_')}.mp3"
        audio_file = self.tts_model.predict(
            text=lesson_text,
            audio_path=audio_path
        )
        
        return {
            "title": title,
            "text": lesson_text,
            "illustration": image_url,
            "narration": audio_file
        }

In [4]:
# Example usage
generator = CourseContentGenerator()
lesson = generator.create_lesson(
    "Photosynthesis is the process by which plants convert sunlight into energy.",
    "Introduction to Photosynthesis"
)
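
`create_lesson` builds the audio file name by lowercasing the title and replacing spaces, which works for the example title but would produce an awkward or invalid path for titles containing characters such as `/` or `:`. A sketch of a stricter alternative; `slugify` is a hypothetical helper introduced here for illustration, not a swarmauri function:

```python
import re

def slugify(title):
    """Lowercase `title` and replace each run of characters outside a-z and 0-9 with '_'."""
    # Note: this also replaces accented and other non-ASCII letters.
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    # Fall back to a fixed name when nothing usable remains.
    return slug or "untitled"

audio_path = f"lesson_{slugify('Light/Dark Reactions: Part 1')}.mp3"
print(audio_path)
```

For the title used in this notebook, the helper gives the same name as the original expression.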

**Print the course content output**

In [13]:
# Extract just the file name from audio_path
audio_name = Path(lesson['narration']).name

In [14]:
# Print output with keys
print(f"\nCourse Title: {lesson['title']}")
print(f"\nScript: {lesson['text']}")
print(f"Image URL: {lesson['illustration']}")
print(f"Audio Name: {audio_name}")


Course Title: Introduction to Photosynthesis

Script: Photosynthesis is the process by which plants convert sunlight into energy.
Image URL: https://oaidalleapiprodscus.blob.core.windows.net/private/org-apgARqqdlfy55Yko1fPIICVn/user-Xo2ejY1sCkk0iPxHhDLqVevG/img-FBMwXBCahj73JpRtL1h5KF0C.png?st=2024-11-05T12%3A29%3A30Z&se=2024-11-05T14%3A29%3A30Z&sp=r&sv=2024-08-04&sr=b&rscd=inline&rsct=image/png&skoid=d505667d-d6c1-4a0a-bac7-5c84a87759f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-11-04T19%3A10%3A43Z&ske=2024-11-05T19%3A10%3A43Z&sks=b&skv=2024-08-04&sig=IC6SwlqawWZ/K1h6YkNB227fQUFTcCCSYc5xTL18tpA%3D
Audio Name: lesson_introduction_to_photosynthesis.mp3
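
The image URL printed above carries `st` and `se` query parameters two hours apart, which suggests the link is time-limited. A minimal sketch for reading that timestamp, assuming `se` marks the expiry time (as it does in Azure shared access signatures); `url_expiry` is a hypothetical helper and the example URL is a shortened stand-in:

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

def url_expiry(url):
    """Return the `se` query parameter as a UTC datetime, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("se")
    if not values:
        return None
    # parse_qs has already decoded %3A to ':'; the timestamp ends in a literal 'Z'.
    parsed = datetime.strptime(values[0], "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)

example = "https://example.com/img.png?st=2024-11-05T12%3A29%3A30Z&se=2024-11-05T14%3A29%3A30Z&sp=r"
print(url_expiry(example))
```

If the illustration is needed beyond that time, it is safer to download and store the image than to keep only the URL.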


## **Use Case 2: Multilingual Content Creation**

In [16]:
## Use Case 2: Multilingual Content Creation
class ContentLocalizer:
    def __init__(self):
        self.tts_model = OpenAIAudioTTS(api_key=os.getenv("OPENAI_API_KEY"))
        self.audio_model = GroqAIAudio(api_key=os.getenv("GROQ_API_KEY"))
    
    def process_content(self, audio_path, output_path):
        # Transcribe the audio and translate it into English
        transcript = self.audio_model.predict(
            audio_path=audio_path,
            task="translation"
        )
        
        # Generate translated audio
        audio_file = self.tts_model.predict(
            text=transcript,
            audio_path=output_path
        )
        
        return {
            "transcript": transcript,
            "audio_file": audio_file
        }


**Example usage**

In [18]:
# Example usage
localizer = ContentLocalizer()
result = localizer.process_content(
    "French.mp3",
    "english_output.mp3"
)
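
`process_content` passes the whole transcript to the TTS model in a single call, and a long recording can yield more text than a speech endpoint accepts per request. A sketch of splitting text at sentence boundaries, assuming a per-request character limit; `chunk_text` and `max_chars` are hypothetical, and the default of 4000 is illustrative rather than a documented swarmauri value:

```python
import re

def chunk_text(text, max_chars=4000):
    """Split `text` into chunks of at most `max_chars`, breaking at sentence ends where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the limit is cut at fixed offsets.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("One. Two. Three.", max_chars=9))
```

Each chunk could then be narrated into its own audio file, for example `english_output_1.mp3`, `english_output_2.mp3`, and so on.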

**Print the localized content output**

In [20]:
# Extract just the file name from audio_path
audio_name = Path(result['audio_file']).name

In [21]:
# Print output with keys
print(f"\nScript: {result['transcript']}")
print(f"Audio Name: {audio_name}")


Script:  Hello. Hello. May I? Yes, of course. Oh, my God. Sorry. It's all right. I'm just a little bit tired. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. Sorry. It's all right. What's your name? Cécile. And you? My name is François. I'm delighted to meet you. You come from here? Yes, I live here. And you? I come from La Rochelle. What do you do in life? I'm a gallerist. I have a small gallery in the 10th. I'm a gallerist. I have a small gallery in the 10th. I'm a gallerist. I have a small gallery in the 10th. I'm a gallerist. I'm a gallerist. I'm 
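
The transcript above contains sentences that repeat many times in a row, and passing it to the TTS model unchanged would carry those loops into the narration. A minimal sketch of one possible cleanup, which collapses immediately repeated runs of up to `max_span` sentences; `collapse_repeats` is a hypothetical helper, and it will also collapse intentional repetition such as "Hello. Hello.":

```python
import re

def collapse_repeats(text, max_span=3):
    """Drop blocks of 1..max_span sentences that immediately repeat the preceding block."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    for sentence in sentences:
        kept.append(sentence)
        for span in range(1, max_span + 1):
            # If the last `span` sentences equal the `span` before them, drop the copy.
            if len(kept) >= 2 * span and kept[-span:] == kept[-2 * span:-span]:
                del kept[-span:]
                break
    return " ".join(kept)

sample = (
    "I'm just a little bit tired. I'm going to go and get some rest. "
    "I'm going to go and get some rest. Sorry."
)
print(collapse_repeats(sample))
```

Applying a filter like this to `transcript` before the `tts_model.predict` call keeps the generated audio closer to the length of the source recording.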

## **Use Case 3: Interactive Story Creation**

In [22]:
## Use Case 3: Interactive Story Creation
class StoryCreator:
    def __init__(self):
        self.img_model = OpenAIImgGenModel(api_key=os.getenv("OPENAI_API_KEY"))
        self.tts_model = OpenAIAudioTTS(api_key=os.getenv("OPENAI_API_KEY"))
    
    async def create_story_scene(self, scene_text, scene_number):
        # Generate scene illustration
        image_url = await self.img_model.agenerate_image(
            prompt=f"Story scene: {scene_text}"
        )
        
        # Create scene narration
        audio_path = f"scene_{scene_number}.mp3"
        audio_file = await self.tts_model.apredict(
            text=scene_text,
            audio_path=audio_path
        )
        
        return {
            "scene_number": scene_number,
            "text": scene_text,
            "illustration": image_url[0],
            "narration": audio_file
        }



**Example usage**

In [23]:
# Example usage
creator = StoryCreator()
scene = await creator.create_story_scene(
    "The magical tree glowed with a soft blue light", 1)
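
Because `create_story_scene` is a coroutine, several scenes can be requested concurrently instead of one after another. A sketch using `asyncio.gather`; `create_scenes` is a hypothetical helper, and `fake_create_scene` is a stand-in for `StoryCreator.create_story_scene` so that the example runs without API keys:

```python
import asyncio

async def create_scenes(create_scene, scene_texts):
    """Call `create_scene(text, number)` for every scene concurrently, preserving order."""
    tasks = [
        create_scene(text, number)
        for number, text in enumerate(scene_texts, start=1)
    ]
    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*tasks)

async def fake_create_scene(scene_text, scene_number):
    # Stand-in coroutine: a real call would generate an image and narration.
    await asyncio.sleep(0)
    return {"scene_number": scene_number, "text": scene_text}

scenes = asyncio.run(
    create_scenes(fake_create_scene, ["The tree glowed.", "The fox arrived."])
)
print([scene["scene_number"] for scene in scenes])
```

Inside a notebook, where an event loop is already running, the call would be `scenes = await create_scenes(creator.create_story_scene, texts)` rather than `asyncio.run(...)`.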


**Print the interactive story output**

In [25]:
# Extract just the file name from audio_path
audio_name = Path(scene['narration']).name

In [26]:
# Print output with keys
print(f"\nscene_number: {scene['scene_number']}")
print(f"\nScript: {scene['text']}")
print(f"Image URL: {scene['illustration']}")
print(f"Audio Name: {audio_name}")


scene_number: 1

Script: The magical tree glowed with a soft blue light
Image URL: https://oaidalleapiprodscus.blob.core.windows.net/private/org-apgARqqdlfy55Yko1fPIICVn/user-Xo2ejY1sCkk0iPxHhDLqVevG/img-oSG3VMcXnwp5lXlHYZp5BmOt.png?st=2024-11-05T13%3A00%3A51Z&se=2024-11-05T15%3A00%3A51Z&sp=r&sv=2024-08-04&sr=b&rscd=inline&rsct=image/png&skoid=d505667d-d6c1-4a0a-bac7-5c84a87759f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-11-04T19%3A12%3A43Z&ske=2024-11-05T19%3A12%3A43Z&sks=b&skv=2024-08-04&sig=uaLah69GxCNjwlcBD5urmXjsBnJNLNstfWWPVgccamo%3D
Audio Name: scene_1.mp3


## **Conclusion**
**This notebook has explored three real-world use cases that highlight the potential of multimodal AI.**

**You've learned how to:**
- Automate the creation of educational content, including text, images, and audio narration, to streamline content generation.
- Localize multilingual content by translating recorded speech into English text and generating new audio from it.
- Develop interactive storytelling experiences that combine text, images, and audio into engaging narratives.


These use cases represent just a fraction of the countless possibilities that emerge when we harness the power of integrated text, image, and audio processing. 
As you venture forth, we encourage you to continue exploring the potential of multimodal AI, identifying new challenges and innovative solutions that can transform industries and enrich the lives of users.