# **Complete Multimodal Pipeline**
## **Introduction**
This notebook demonstrates how to create a complete pipeline combining text, image, and audio processing using swarmauri.
By using the swarmauri library's integration with various AI models, we'll demonstrate how to create a robust system that can handle diverse types of content and data.

The ability to integrate multiple modalities opens up a wide range of possibilities, from automated content creation and localization to interactive storytelling and multimedia presentations. In this notebook, we'll explore practical use cases that illustrate these multimodal techniques.

You'll learn how to orchestrate a pipeline that generates images from textual descriptions, converts text to audio narration, and processes several pieces of content concurrently for improved efficiency. By the end of this notebook, you'll have a solid understanding of how to build your own end-to-end multimodal solutions.

## **Setup**

In [4]:
import asyncio  # used by the parallel pipeline further down
import os
from pathlib import Path
from dotenv import load_dotenv
from swarmauri.llms.concrete.OpenAIImgGenModel import OpenAIImgGenModel
from swarmauri.llms.concrete.OpenAIAudioTTS import OpenAIAudioTTS
from swarmauri.llms.concrete.GroqAIAudio import GroqAIAudio

**Load environment variables**

In [2]:
load_dotenv()

# Initialize all models
img_model = OpenAIImgGenModel(api_key=os.getenv("OPENAI_API_KEY"))
tts_model = OpenAIAudioTTS(api_key=os.getenv("OPENAI_API_KEY"))
audio_model = GroqAIAudio(api_key=os.getenv("GROQ_API_KEY"))
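
`os.getenv` returns `None` when a variable is missing, so a forgotten key in `.env` only surfaces later as an authentication error from the provider. A minimal sketch of failing early instead is shown below; `require_env` is a hypothetical helper introduced here for illustration and is not part of swarmauri.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Missing environment variable: {name}. Add it to your .env file."
        )
    return value


# Hypothetical usage with the models from the cell above:
# img_model = OpenAIImgGenModel(api_key=require_env("OPENAI_API_KEY"))
# audio_model = GroqAIAudio(api_key=require_env("GROQ_API_KEY"))
```

With this in place, a missing key stops the notebook at the setup cell rather than at the first API call.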

## **Basic Pipeline**

In [5]:
def process_story(story_text):
    """Process a story through multiple modalities"""
    
    # Generate image from story
    image_prompt = f"Illustration for story: {story_text[:100]}"
    image_urls = img_model.generate_image(prompt=image_prompt)
    
    # Convert story to speech
    audio_path = "story_narration.mp3"
    audio_file = tts_model.predict(text=story_text, audio_path=audio_path)
    
    return {
        "image_url": image_urls[0],
        "audio_path": audio_file
    }
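
The prompt above takes the first 100 characters of the story, which can cut a word in half and keeps the newlines from the triple-quoted string. One possible alternative, sketched with the standard library's `textwrap.shorten`, collapses whitespace and truncates at a word boundary. The helper name `build_image_prompt` is an assumption made for this example.

```python
import textwrap


def build_image_prompt(story_text: str, max_chars: int = 100) -> str:
    """Build an image prompt from a story excerpt of at most `max_chars` characters."""
    # shorten() collapses runs of whitespace (including newlines) into single
    # spaces and drops whole words from the end until the text fits.
    excerpt = textwrap.shorten(story_text, width=max_chars, placeholder="...")
    return f"Illustration for story: {excerpt}"


story = """
A small robot discovered a garden filled with mechanical flowers.
Each flower played a different musical note when touched.
"""

prompt = build_image_prompt(story)
print(prompt)
```

Inside `process_story`, the call `build_image_prompt(story_text)` could replace the f-string that slices `story_text[:100]`.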



**Example usage**

In [6]:
# Example usage
story = """
A small robot discovered a garden filled with mechanical flowers.
Each flower played a different musical note when touched.
"""

results = process_story(story)

**Print output**

In [12]:
# Extract just the file name from audio_path
audio_name = Path(results['audio_path']).name

# Print output with keys
print(f"Image URL: {results['image_url']}")
print(f"\nAudio Name: {audio_name}")

Image URL: https://oaidalleapiprodscus.blob.core.windows.net/private/org-apgARqqdlfy55Yko1fPIICVn/user-Xo2ejY1sCkk0iPxHhDLqVevG/img-XC6Jb9JPHpBKvDXtq8ltRheT.png?st=2024-11-05T11%3A38%3A02Z&se=2024-11-05T13%3A38%3A02Z&sp=r&sv=2024-08-04&sr=b&rscd=inline&rsct=image/png&skoid=d505667d-d6c1-4a0a-bac7-5c84a87759f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-11-04T20%3A26%3A37Z&ske=2024-11-05T20%3A26%3A37Z&sks=b&skv=2024-08-04&sig=CqqE4HcY/X41aBqGgyV9K%2B2ah70YZqpJQTe/bIxYy94%3D

Audio Name: story_narration.mp3
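
The image URL in the output is a time-limited signed link: its query string carries `st` (start) and `se` (expiry) timestamps, here two hours apart. The sketch below reads the expiry from such a URL so the image can be downloaded before the link stops working. It assumes the URL keeps the `se=<UTC timestamp>` format seen above; `sas_expiry` and `is_expired` are hypothetical helper names.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def sas_expiry(url: str) -> Optional[datetime]:
    """Return the expiry time encoded in the URL's `se` parameter, if present."""
    params = parse_qs(urlparse(url).query)  # parse_qs also decodes %3A to ":"
    values = params.get("se")
    if not values:
        return None
    # Assumed format, as seen in the output above: 2024-11-05T13:38:02Z
    parsed = datetime.strptime(values[0], "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """Return True if the URL carries an expiry time that has already passed."""
    expiry = sas_expiry(url)
    if expiry is None:
        return False  # no expiry information in the URL
    current = now or datetime.now(timezone.utc)
    return current >= expiry
```

Calling `sas_expiry(results["image_url"])` right after generation shows how long the link remains usable.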


## **Parallel Processing Pipeline**

In [13]:
async def process_content_parallel(text_contents):
    """Process multiple content pieces in parallel"""
    
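    async def process_single(text):
        # agenerate_image returns a list of URLs (see the printed output below)
        image_url = await img_model.agenerate_image(prompt=text)
        # Derive a per-text file name so concurrent tasks do not overwrite each other
        audio_path = f"content_{hash(text)}.mp3"
        audio_file = await tts_model.apredict(text=text, audio_path=audio_path)
        return {"text": text, "image": image_url, "audio": audio_file}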
    
    tasks = [process_single(text) for text in text_contents]
    results = await asyncio.gather(*tasks)
    return results
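
The file names above come from Python's built-in `hash()`, which is salted per interpreter process for strings and can return negative numbers, so the same text gets a different file name after a kernel restart. If stable names matter, a content digest is one option. This is a minimal sketch using `hashlib`; the helper name `audio_filename` and the 16-character digest length are choices made for this example.

```python
import hashlib


def audio_filename(text: str, prefix: str = "content") -> str:
    """Return a file name that is the same for the same text on every run."""
    # SHA-256 of the UTF-8 bytes; the first 16 hex characters keep the name short.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}_{digest}.mp3"


print(audio_filename("A peaceful mountain lake at sunrise"))
```

In `process_single`, `audio_path = audio_filename(text)` would be the drop-in replacement for the `hash(text)` expression.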



**Example usage**

In [18]:
import asyncio
contents = [
    "A peaceful mountain lake at sunrise",
    "A busy city street during rush hour",
    "A quiet forest path in autumn"
]

async_results = await process_content_parallel(contents)
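
The top-level `await` above works in a notebook because the kernel already runs an event loop. In a plain Python script the coroutine has to be started with `asyncio.run`, and for larger batches it can help to cap how many requests run at once. The sketch below shows both, using a stand-in worker (`fake_process`) that sleeps instead of calling any API; `process_bounded` and `max_concurrency` are names assumed for this example.

```python
import asyncio


async def fake_process(text: str) -> dict:
    """Stand-in for process_single: sleeps briefly instead of calling an API."""
    await asyncio.sleep(0.01)
    return {"text": text, "length": len(text)}


async def process_bounded(texts, worker, max_concurrency: int = 2):
    """Run `worker` on every text, with at most `max_concurrency` running at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(text):
        async with semaphore:
            return await worker(text)

    # gather() returns results in input order; return_exceptions=True makes a
    # failed item show up as an exception object instead of raising out of gather().
    return await asyncio.gather(
        *(guarded(text) for text in texts), return_exceptions=True
    )


if __name__ == "__main__":
    contents = [
        "A peaceful mountain lake at sunrise",
        "A busy city street during rush hour",
        "A quiet forest path in autumn",
    ]
    for item in asyncio.run(process_bounded(contents, fake_process)):
        print(item)
```

Because `return_exceptions=True` is set, each result should be checked with `isinstance(item, Exception)` before it is used.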

**Print output**

In [23]:

for result in async_results:
    # Extract just the file name from audio_path
    audio_name = Path(result['audio']).name
    # Print output with keys
    print(f"\nScript: {result['text']}")
    print(f"Image URL: {result['image']}")
    print(f"Audio Name: {audio_name}")


Script: A peaceful mountain lake at sunrise
Image URL: ['https://oaidalleapiprodscus.blob.core.windows.net/private/org-apgARqqdlfy55Yko1fPIICVn/user-Xo2ejY1sCkk0iPxHhDLqVevG/img-jBBnjEoDNz2QK21KWd8EcOdc.png?st=2024-11-05T12%3A01%3A09Z&se=2024-11-05T14%3A01%3A09Z&sp=r&sv=2024-08-04&sr=b&rscd=inline&rsct=image/png&skoid=d505667d-d6c1-4a0a-bac7-5c84a87759f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-11-05T03%3A01%3A08Z&ske=2024-11-06T03%3A01%3A08Z&sks=b&skv=2024-08-04&sig=b1O59vJssSG/MW0ENGjAUHOFWfzJU3h7BdVs8yUqNeE%3D']
Audio Name: content_3573223842926064870.mp3

Script: A busy city street during rush hour
Image URL: ['https://oaidalleapiprodscus.blob.core.windows.net/private/org-apgARqqdlfy55Yko1fPIICVn/user-Xo2ejY1sCkk0iPxHhDLqVevG/img-Yv0Ti5a6rnIbVp0vkuMAJlT0.png?st=2024-11-05T12%3A01%3A07Z&se=2024-11-05T14%3A01%3A07Z&sp=r&sv=2024-08-04&sr=b&rscd=inline&rsct=image/png&skoid=d505667d-d6c1-4a0a-bac7-5c84a87759f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-11-04T20%3A15

## **Conclusion**
This notebook has walked through a complete multimodal pipeline built with the swarmauri library. You've learned how to:

- Integrate multiple AI modalities (text, image, and audio) in a single workflow.

- Implement sequential and parallel processing pipelines to handle diverse content types.

- Apply these pipelines to practical use cases, such as interactive storytelling and automated course content creation.


By mastering the techniques presented in this notebook, you'll be equipped to tackle a wide range of multimodal challenges, from enhancing user experiences to streamlining content creation workflows. 

We encourage you to continue exploring the boundaries of what's possible with these powerful AI-driven tools, and to innovate new and exciting multimodal applications.