# **Understanding Swarmauri Audio Processing Classes**
---

In this notebook, we’ll explore the various **Audio Processing Models** provided by Swarmauri.  

We’ll also cover:  
- How to view the allowed/supported models for each Audio Processing class.  
- The methods available in these classes and their purposes.  

This notebook is a guide to Swarmauri’s Audio Processing classes: which classes exist, which models each one supports, and what each method is for.  

---

## **What Are Audio Processing Models?**  

Audio processing models are AI-driven systems that process and transform audio content. They use deep learning to perform tasks such as converting audio to text or synthesizing natural-sounding speech from text.  

### **Use Cases of Audio Processing Models**  
Audio processing models have a wide range of applications across various industries. Below are some of the most notable use cases:  

1. **Speech Recognition**  
   - Transcribing podcasts, interviews, or meetings into text.  
   - Assisting in closed captioning for videos and live broadcasts.  
   - Enabling voice-controlled applications and devices.  

2. **Text-to-Speech (TTS)**  
   - Generating natural-sounding voiceovers for videos or presentations.  
   - Assisting visually impaired users by reading text content aloud.  
   - Creating interactive voice responses (IVRs) for customer service systems.  

3. **Education and Training**  
   - Providing audio-based learning tools for language acquisition.  
   - Converting eBooks or written materials into audio formats for accessibility.  
   - Enhancing virtual learning platforms with speech synthesis and transcription.  

4. **Gaming and Entertainment**  
   - Adding voiceovers to video games for characters or storytelling.  
   - Creating realistic and expressive voices for animated characters.  
   - Producing audiobooks or narrated content for streaming platforms.  

5. **Accessibility**  
   - Empowering individuals with disabilities by enabling voice-to-text or text-to-speech functionalities.  
   - Assisting those with hearing impairments through real-time transcriptions.  
   - Offering language translation and conversion for inclusivity.  


These applications give businesses and individuals new ways to communicate, improve accessibility, and enhance user experiences.  

With this understanding, let’s dive deeper into the specific **Audio Processing Models** provided by Swarmauri and the functionalities they offer.  

## List of Audio Processing Classes in Swarmauri
---

Swarmauri provides the following Audio classes, named after the providers of the models:

1. **GroqAIAudio**: Converts audio to text.
2. **HyperbolicAudioTTS**: Converts text to audio.
3. **OpenAIAudio**: Converts audio to text.
4. **OpenAIAudioTTS**: Converts text to audio.
5. **PlayHTModel**: Converts text to audio.
6. **WhisperLargeModel**: Converts audio to text.

## How to See the Allowed Models in Any Audio Class
---

1. ### Import the class 

- Here we use `GroqAIAudio`, but you can use any other class from the List of Audio Processing Classes in Swarmauri.

In [3]:
from swarmauri.llms.concrete.GroqAIAudio import GroqAIAudio

2. ### Instantiate the Model Class

In [4]:
# Note: a real API key is not needed to view allowed_models; the placeholder can stay as it is.
model = GroqAIAudio(api_key="put your api key here")

3. ### List all available models
- To list the allowed models, we use the `allowed_models` class attribute, just as we did when working with LLMs.

In [5]:
available_models = model.allowed_models
print(available_models)

['distil-whisper-large-v3-en', 'whisper-large-v3']


As you can see, the allowed models for the `GroqAIAudio` class have been printed.  

This approach is similar to how we worked with LLMs earlier. Swarmauri ensures consistency by allowing you to use the same methods and attributes across different classes in a unified manner. This design simplifies the process, making it more intuitive and efficient to build with Swarmauri.
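
Because `allowed_models` is a plain list of names, it can also be used to validate a model name before making a request. The following is a minimal sketch that relies only on the list printed above; the helper `choose_model` is hypothetical and not part of Swarmauri.

```python
# Hypothetical helper, not part of Swarmauri: accept a model name only if
# the class reports it in `allowed_models`.
def choose_model(allowed_models, preferred):
    """Return `preferred` if it is allowed, else fall back to the first allowed name."""
    if not allowed_models:
        raise ValueError("the class reports no allowed models")
    if preferred in allowed_models:
        return preferred
    return allowed_models[0]


# The list printed by GroqAIAudio in the cell above.
allowed = ["distil-whisper-large-v3-en", "whisper-large-v3"]

print(choose_model(allowed, "whisper-large-v3"))
print(choose_model(allowed, "not-a-real-model"))
```

With a real class instance, `allowed` would simply be `model.allowed_models`.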

## Methods Available in Each Audio Processing Class  
---

Each **Audio Processing** class in **Swarmauri** provides a set of methods to interact with the models. These methods are designed to offer flexibility for both **synchronous** and **asynchronous** workflows, as well as batch processing. Additionally, **speech-to-text** classes support streaming functionalities, enabling real-time transcription.  

### **Speech-to-Text (STT) Classes**  
The following methods are available in classes like `GroqAIAudio`, `OpenAIAudio`, and `WhisperLargeModel`, which convert audio to text:  

1. **`generate`**:  
   - **Description**: Converts audio to text synchronously (blocking).  
   - **Use Case**: When you need immediate transcription of an audio file.  

2. **`agenerate`**:  
   - **Description**: The asynchronous counterpart of `generate`. It allows you to transcribe audio without blocking the program.  
   - **Use Case**: When running multiple transcription tasks concurrently using `asyncio`.  

3. **`stream`**:  
   - **Description**: Enables real-time transcription of audio input synchronously, providing partial text as audio is processed.  
   - **Use Case**: When you require live transcription, such as for meetings, lectures, or broadcasts.  

4. **`astream`**:  
   - **Description**: Asynchronous version of the `stream` method for real-time transcription in an asynchronous environment.  
   - **Use Case**: When you need live transcription while running other asynchronous tasks.  

5. **`batch`**:  
   - **Description**: Processes multiple audio-to-text requests in a synchronous manner.  
   - **Use Case**: When you need to transcribe multiple audio files efficiently.  

6. **`abatch`**:  
   - **Description**: Asynchronous version of the `batch` method for transcribing multiple audio files concurrently.  
   - **Use Case**: When you need to process a batch of transcription tasks in an asynchronous environment.  
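
How the four non-streaming methods relate to each other can be illustrated without calling any provider. The sketch below uses a stand-in class, `StubTranscriber`, whose method names mirror the list above. Its signatures, its return values, and the way it builds `batch` and `abatch` on top of `generate` and `agenerate` are assumptions for illustration, not Swarmauri’s implementation.

```python
import asyncio


class StubTranscriber:
    """Stand-in for a speech-to-text class; no network calls are made."""

    def generate(self, audio_path: str) -> str:
        # Blocking call: returns only when the "transcription" is complete.
        return f"transcript of {audio_path}"

    async def agenerate(self, audio_path: str) -> str:
        # Non-blocking call: hands control back to the event loop while "waiting".
        await asyncio.sleep(0)
        return f"transcript of {audio_path}"

    def batch(self, audio_paths: list[str]) -> list[str]:
        # Synchronous batch: files are handled one after another.
        return [self.generate(path) for path in audio_paths]

    async def abatch(self, audio_paths: list[str]) -> list[str]:
        # Asynchronous batch: all requests are scheduled concurrently,
        # and results come back in input order.
        return await asyncio.gather(*(self.agenerate(p) for p in audio_paths))


stub = StubTranscriber()
files = ["meeting.wav", "lecture.wav"]

sync_results = stub.batch(files)
async_results = asyncio.run(stub.abatch(files))
print(sync_results)
print(async_results)
```

Inside a notebook an event loop is already running, so `asyncio.run(...)` raises an error there; use `await stub.abatch(files)` directly in the cell instead.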

---

### **Text-to-Speech (TTS) Classes**  
The following methods are available in classes like `HyperbolicAudioTTS`, `OpenAIAudioTTS`, and `PlayHTModel`, which convert text to audio:  

1. **`generate`**:  
   - **Description**: Converts text to audio synchronously (blocking).  
   - **Use Case**: When you need immediate synthesis of a single text input into audio.  

2. **`agenerate`**:  
   - **Description**: The asynchronous counterpart of `generate`. It allows you to synthesize audio without blocking the program.  
   - **Use Case**: When running multiple text-to-speech tasks concurrently using `asyncio`.  

3. **`batch`**:  
   - **Description**: Processes multiple text-to-audio requests in a synchronous manner.  
   - **Use Case**: When you need to generate audio for a batch of text inputs efficiently.  

4. **`abatch`**:  
   - **Description**: Asynchronous version of the `batch` method for processing multiple text-to-speech requests concurrently.  
   - **Use Case**: When you need to process a batch of TTS tasks in an asynchronous environment.  

---

### **Note on Streaming**  
- **Speech-to-Text Classes**: Support `stream` and `astream` methods, enabling real-time transcription for live audio.  
- **Text-to-Speech Classes**: Do not support `stream` or `astream` methods because audio synthesis is delivered as a complete output rather than in incremental chunks.  
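
Consuming a streaming method typically means iterating over partial results as they arrive. The sketch below assumes that `stream` behaves like a generator and `astream` like an async generator; the notebook does not show their signatures, so the stand-in classes `StubStreamer` and `StubSynthesizer` and the helper `supports_streaming` are all hypothetical.

```python
import asyncio


class StubStreamer:
    """Stand-in for a speech-to-text class with streaming; no audio is read."""

    _chunks = ["hello", "hello world", "hello world again"]

    def stream(self, audio_path: str):
        # Synchronous generator: yields partial transcripts in order.
        for partial in self._chunks:
            yield partial

    async def astream(self, audio_path: str):
        # Async generator: same partials, but other tasks can run in between.
        for partial in self._chunks:
            await asyncio.sleep(0)
            yield partial


class StubSynthesizer:
    """Stand-in for a text-to-speech class, which has no streaming methods."""

    def generate(self, text: str) -> bytes:
        return text.encode("utf-8")


def supports_streaming(model) -> bool:
    # Duck-typed capability check: both streaming methods must be callable.
    return all(callable(getattr(model, name, None)) for name in ("stream", "astream"))


async def collect(model, audio_path: str) -> list[str]:
    # Gather every partial result produced by the async stream.
    return [partial async for partial in model.astream(audio_path)]


stt, tts = StubStreamer(), StubSynthesizer()
sync_partials = list(stt.stream("live.wav"))
async_partials = asyncio.run(collect(stt, "live.wav"))
print(supports_streaming(stt), supports_streaming(tts))
```

A capability check like `supports_streaming` lets calling code fall back to `generate` when a class offers no streaming methods.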

Together, these methods cover blocking, asynchronous, and batch workflows for both speech-to-text and text-to-speech, plus streaming for speech-to-text.

# Notebook Metadata

In [1]:
from swarmauri.utils import print_notebook_metadata

# The function prints the metadata itself and returns None, so call it directly.
print_notebook_metadata.print_notebook_metadata("Victory Nnaji", "3rd-Son")

Author: Victory Nnaji
GitHub Username: 3rd-Son
Notebook File: Notebook_01_Understanding_Swarmauri_Audio_Processing_Classes.ipynb
Last Modified: 2024-12-30 12:17:33.997472
Platform: Darwin 24.1.0
Python Version: 3.11.11 (main, Dec 11 2024, 10:25:04) [Clang 14.0.6 ]
Swarmauri Version: 0.5.2
