# **Audio and Text Integration using Swarmauri**
## **Introduction**
This notebook explores audio processing capabilities using swarmauri, including text-to-speech with OpenAIAudioTTS and audio transcription with GroqAIAudio.

As language models and speech recognition technologies continue to advance, the ability to bridge the gap between spoken and written communication has become increasingly important.

We'll start with text-to-speech (TTS) generation, where we convert text into audio output. This capability opens the door to a wide range of applications, from automated narration and audiobook creation to assistive technologies and voice interfaces.

Next, we'll cover speech-to-text (STT) transcription, which converts audio recordings into text. This functionality is useful for meeting transcripts, video subtitling, and language translation.

Throughout the notebook, we use swarmauri's OpenAIAudioTTS and GroqAIAudio classes, which wrap hosted speech models from OpenAI and Groq.
## **Setup**

In [1]:
import os
from pathlib import Path
from dotenv import load_dotenv
from swarmauri.llms.concrete.OpenAIAudioTTS import OpenAIAudioTTS
from swarmauri.llms.concrete.GroqAIAudio import GroqAIAudio

**Load environment variables**

In [2]:

load_dotenv()

True

**Initialize models**

In [3]:
tts_model = OpenAIAudioTTS(api_key=os.getenv("OPENAI_API_KEY"))
transcribe_model = GroqAIAudio(api_key=os.getenv("GROQ_API_KEY"))
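
`os.getenv` returns `None` when a variable is missing, so a typo in the `.env` file may only surface later as an authentication error. A minimal sketch of failing early instead is shown here; `require_env` is a hypothetical helper, not part of swarmauri.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check your .env file.")
    return value


# Hypothetical usage with the models from this notebook:
# tts_model = OpenAIAudioTTS(api_key=require_env("OPENAI_API_KEY"))
# transcribe_model = GroqAIAudio(api_key=require_env("GROQ_API_KEY"))
```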

## **Text-to-Speech Generation**
**Basic TTS generation**

In [18]:
text = "Hello, this is a test of text-to-speech output."
audio_path = "output.mp3"
result = tts_model.predict(text=text, audio_path=audio_path)
print(f"Generated audio saved to: {Path(result).name}")

Generated audio saved to: output.mp3


**Stream audio chunks**

In [5]:
chunks = []
for chunk in tts_model.stream(text=text):
    chunks.append(chunk)
full_audio = b"".join(chunks)
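
The cell above keeps the streamed audio in memory. If the goal is a file on disk, the chunks can be written as they arrive. The sketch below assumes each chunk is a `bytes` object (consistent with the `b"".join` call above) and uses a hard-coded list as a stand-in for `tts_model.stream(text=text)`; `save_audio_chunks` and the file name are illustrative.

```python
from pathlib import Path
from typing import Iterable


def save_audio_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write audio chunks to a file as they arrive and return the total byte count."""
    total = 0
    with Path(path).open("wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total


# Stand-in for tts_model.stream(text=text): any iterable of bytes works.
fake_stream = [b"ID3", b"\x00\x01", b"\x02"]
written = save_audio_chunks(fake_stream, "streamed_output.mp3")
print(f"Wrote {written} bytes")
```

Writing chunk by chunk avoids holding the whole clip in memory, which matters more for long texts than for the short sentence used here.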

## **Speech-to-Text Transcription**
**Basic transcription**



In [8]:
# Basic transcription
audio_file = "Recording (2).m4a"
transcription = transcribe_model.predict(audio_path=audio_file)
print(f"Transcription: {transcription}")



Transcription:  Thank you.


**Translation task**

In [9]:
# Translation task
french_audio = "French.mp3"
translation = transcribe_model.predict(
    audio_path=french_audio,
    task="translation"
)
print(f"Translation: {translation}")

Translation:  Hello. Hello. May I? Yes, of course. Oh, my God. Sorry. It's all right. I'm just a little bit tired. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. I'm going to go and get some rest. Sorry. It's all right. What's your name? Cécile. And you? My name is François. I'm delighted to meet you. You come from here? Yes, I live here. And you? I come from La Rochelle. What do you do in life? I'm a gallerist. I have a small gallery in the 10th. I'm a gallerist. I have a small gallery in the 10th. I'm a gallerist. I have a small gallery in the 10th. I'm a gallerist. I'm a gallerist. 
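
The output above contains runs where the same sentence repeats many times in a row. One possible post-processing step, sketched here under the assumption that sentences end in `.`, `!`, or `?`, is to drop any sentence identical to the one immediately before it. `collapse_repeats` is a hypothetical helper and is not part of swarmauri.

```python
import re


def collapse_repeats(text: str) -> str:
    """Drop sentences that exactly repeat the sentence immediately before them."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    for sentence in sentences:
        if sentence and (not kept or sentence != kept[-1]):
            kept.append(sentence)
    return " ".join(kept)


sample = "Sorry. I'm going to go and get some rest. I'm going to go and get some rest. Sorry."
print(collapse_repeats(sample))
```

This only handles single-sentence loops. It would also merge a genuine repetition such as "Hello. Hello.", and it leaves alternating pairs of repeated sentences untouched, so treat it as a rough cleanup rather than a fix for the underlying transcription.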

## **Batch Processing**

**Batch TTS processing**

In [10]:
# Batch TTS processing
text_paths = {
    "Hello": "hello.mp3",
    "Good morning": "morning.mp3",
    "Welcome": "welcome.mp3"
}

In [15]:
results = tts_model.batch(text_path_dict=text_paths)
for path in results:
    file_name = Path(path).name
    print(f"Generated: {file_name}")

Generated: hello.mp3
Generated: morning.mp3
Generated: welcome.mp3
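
For longer lists of texts, writing the text-to-path dictionary by hand becomes tedious. A minimal sketch of generating it is shown below; `build_text_paths` and its naming scheme are assumptions for illustration, and the result has the same shape as the `text_paths` dictionary passed to `batch` above.

```python
import re


def build_text_paths(texts: list[str], suffix: str = ".mp3") -> dict[str, str]:
    """Map each text to a file name derived from its first few words."""
    mapping = {}
    for index, text in enumerate(texts):
        # Keep lowercase letters and digits, join the first four words with underscores.
        words = re.findall(r"[a-z0-9]+", text.lower())[:4]
        stem = "_".join(words) or "clip"
        # Prefix with the index so two different texts never share a file name.
        mapping[text] = f"{index:02d}_{stem}{suffix}"
    return mapping


text_paths = build_text_paths(["Hello", "Good morning", "Welcome"])
print(text_paths)
```

Because the texts are dictionary keys, identical texts in the input list collapse into a single entry.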


## **Async Processing**
**Async TTS**

In [21]:
import asyncio

# Async TTS (Jupyter supports top-level await, so no event loop setup is needed here)
text = "This is an async test"
result = await tts_model.apredict(text=text, audio_path="async_test.mp3")
print(Path(result).name)

async_test.mp3
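
Top-level `await` works in a notebook cell but not in a plain Python script, where the call has to run inside a coroutine started with `asyncio.run`. The sketch below shows that pattern; `fake_apredict` is a hypothetical stand-in for `tts_model.apredict` so the example runs without an API key.

```python
import asyncio


async def fake_apredict(text: str, audio_path: str) -> str:
    """Hypothetical stand-in for tts_model.apredict; returns the output path."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return audio_path


async def main() -> str:
    # In a real script this line would call tts_model.apredict instead.
    return await fake_apredict(text="This is an async test", audio_path="async_test.mp3")


if __name__ == "__main__":
    print(asyncio.run(main()))
```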


**Async batch**

In [19]:

# Async batch
text_paths = {
    "First message": "first.mp3",
    "Second message": "second.mp3"
}
batch_results = await tts_model.abatch(text_path_dict=text_paths)
for path in batch_results:
    file_name = Path(path).name
    print(f"Generated: {file_name}")

Generated: first.mp3
Generated: second.mp3
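
How `abatch` schedules its requests internally is not shown in this notebook. If you need explicit control over how many requests run at once, for example to stay under a rate limit, one option is to combine individual async calls with `asyncio.gather` and a semaphore. This is a sketch under that assumption; `fake_apredict` is a hypothetical stand-in for `tts_model.apredict`, and `synthesize_all` is an illustrative name.

```python
import asyncio


async def fake_apredict(text: str, audio_path: str) -> str:
    """Hypothetical stand-in for tts_model.apredict; returns the output path."""
    await asyncio.sleep(0.01)
    return audio_path


async def synthesize_all(text_paths: dict[str, str], limit: int = 2) -> list[str]:
    """Run one request per entry, with at most `limit` in flight at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def one(text: str, path: str) -> str:
        async with semaphore:
            return await fake_apredict(text=text, audio_path=path)

    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(one(t, p) for t, p in text_paths.items()))


paths = asyncio.run(synthesize_all({"First message": "first.mp3", "Second message": "second.mp3"}))
print(paths)
```

Inside a notebook, replace the `asyncio.run(...)` call with `await synthesize_all(...)`, since an event loop is already running there.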


## **Conclusion**
**In this notebook, you learned how to integrate audio and text using swarmauri. You should now be able to:**

- Generate speech from text with OpenAIAudioTTS, either saved to a file or streamed in chunks.
- Transcribe audio and translate it into English with GroqAIAudio.
- Run batch text-to-speech over several inputs in a single call.
- Use the asynchronous methods (`apredict` and `abatch`) so that long-running requests do not block your application.

This notebook lays the foundation for incorporating multimodal capabilities into your projects. We encourage you to build on the concepts and techniques presented here as you continue to explore audio and text integration.
