## **Notebook_01_Intro_to_Audio_Generation**
## **Introduction**

Audio generation and processing have become central to how people interact with machines. Speech synthesis, audio translation, and transcription are now used in customer support, accessibility services, entertainment, and education. This notebook introduces the basics of working with audio in the Swarmauri SDK, focusing on the GroqAIAudio module. The SDK is designed to integrate audio functionality with minimal setup, so developers can create, modify, and manage audio tasks in a few lines of code.

We’ll walk through the core steps of using GroqAIAudio: loading an API key, inspecting the model and its allowed models, transcribing an audio file, and serializing the model to JSON and back. The SDK offers multiple pre-trained models and supports tasks including audio transcription, translation, and speech synthesis, which matter for interactive and multilingual applications. By the end of this notebook, you will know how a Swarmauri audio model is structured and how to run a first transcription with it.

**Import dependencies**

In [6]:
import os
from swarmauri.llms.concrete.GroqAIAudio import GroqAIAudio
from pathlib import Path
from dotenv import load_dotenv

**Set up environment**


In [7]:
load_dotenv()
API_KEY = os.getenv("GROQ_API_KEY")
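
`os.getenv` returns `None` when the variable is missing, and that only surfaces later as an authentication error. A minimal sketch of a fail-fast check follows; the helper name `require_env` is hypothetical and not part of the Swarmauri SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell."
        )
    return value
```

Calling `API_KEY = require_env("GROQ_API_KEY")` after `load_dotenv()` would then stop the notebook at this cell instead of at the first API call.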

**Initialize the GroqAIAudio model**

In [9]:
llm = GroqAIAudio(api_key=API_KEY)

**Check model information**

In [10]:

print(f"Model Resource: {llm.resource}")
print(f"Model Type: {llm.type}")
print(f"Default Model Name: {llm.name}")


Model Resource: LLM
Model Type: GroqAIAudio
Default Model Name: distil-whisper-large-v3-en


**Get list of allowed models**

In [11]:
available_models = llm.allowed_models
print("\nAvailable Models:")
for model in available_models:
    print(f"- {model}")


Available Models:
- distil-whisper-large-v3-en
- whisper-large-v3
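
Before switching models, it can help to check the requested name against the allowed list. The sketch below uses the two model names printed above; the helper `pick_model` is hypothetical and not part of the SDK.

```python
def pick_model(requested: str, allowed: list[str]) -> str:
    """Return `requested` if it is an allowed model name, otherwise raise ValueError."""
    if requested not in allowed:
        raise ValueError(
            f"Unknown model {requested!r}; choose one of: {', '.join(allowed)}"
        )
    return requested


# Model names copied from the output above.
allowed = ["distil-whisper-large-v3-en", "whisper-large-v3"]
chosen = pick_model("whisper-large-v3", allowed)
print(chosen)
```

In the notebook, `allowed` would come from `llm.allowed_models`. How the chosen name is then applied to the model (for example through the `name` field shown by `llm.name`) is an assumption to confirm against the SDK.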


**Basic audio file setup**

In [22]:
root_dir = Path.cwd()
# Build the path with pathlib, then pass it on as a plain string
audio_path = str(root_dir / "Recording.m4a")
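
A missing or misnamed recording is easier to diagnose before the API call than after it. The following sketch checks that the file exists and has a plausible audio extension. The helper name and the extension set are assumptions; consult the Groq documentation for the authoritative list of accepted formats.

```python
from pathlib import Path

# Assumed set of audio extensions, not taken from the SDK.
ASSUMED_AUDIO_EXTENSIONS = {
    ".flac", ".m4a", ".mp3", ".mp4", ".mpeg", ".mpga", ".ogg", ".wav", ".webm",
}


def check_audio_file(path) -> Path:
    """Return `path` as a Path if it is an existing file with an assumed-valid extension."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Audio file not found: {p}")
    if p.suffix.lower() not in ASSUMED_AUDIO_EXTENSIONS:
        raise ValueError(f"Unexpected audio extension {p.suffix!r} for {p.name}")
    return p
```

For example, `check_audio_file(audio_path)` could run just before the transcription cell.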

**Simple transcription example**

In [19]:
print("\nTranscribing audio file...")
transcription = llm.predict(audio_path=audio_path)
print(f"Transcription result: {transcription}")



Transcribing audio file...
Transcription result:  And we're going to be able to be ha'amu'all.
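
The same call extends naturally to several recordings. The sketch below assumes only what the cell above shows, namely that the model object has a `predict(audio_path=...)` method returning text. The helper `transcribe_folder` is hypothetical.

```python
from pathlib import Path


def transcribe_folder(llm, folder, pattern="*.m4a"):
    """Transcribe every file in `folder` matching `pattern`.

    Returns (results, errors): two dicts keyed by file name, holding the
    transcription text or the error message for files that failed.
    """
    results, errors = {}, {}
    for path in sorted(Path(folder).glob(pattern)):
        try:
            results[path.name] = llm.predict(audio_path=str(path))
        except Exception as exc:  # keep going if a single file fails
            errors[path.name] = str(exc)
    return results, errors
```

With the objects defined in this notebook, `results, errors = transcribe_folder(llm, root_dir)` would transcribe every `.m4a` file in the working directory.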


**Model serialization example**

In [23]:


model_json = llm.model_dump_json()
print("\nModel serialization:")
print(f"Model JSON: {model_json}")




Model serialization:
Model JSON: {"name":"distil-whisper-large-v3-en","id":"9c5a94ca-08ca-450e-900f-444f0b09d763","members":[],"owner":null,"host":null,"resource":"LLM","version":"0.1.0","type":"GroqAIAudio","allowed_models":["distil-whisper-large-v3-en","whisper-large-v3"],"api_key":"[REDACTED]"}
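
Note that the serialized JSON includes the `api_key` field, so printing or committing it can expose the credential. A minimal sketch of masking that field before display, using only the standard library, follows. The helper `redact_json` is hypothetical, and the sample string stands in for `model_json`.

```python
import json


def redact_json(model_json: str, secret_keys=("api_key",)) -> str:
    """Return the JSON string with the values of top-level secret keys masked."""
    data = json.loads(model_json)
    for key in secret_keys:
        if data.get(key):
            data[key] = "***REDACTED***"
    return json.dumps(data)


# Stand-in for the model_json string produced above.
sample = '{"name": "distil-whisper-large-v3-en", "api_key": "example-key"}'
print(redact_json(sample))
```

The unredacted `model_json` is still needed for `model_validate_json`; only the copy that gets printed or logged should be masked.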


**Validate model serialization**

In [24]:

validated_model = GroqAIAudio.model_validate_json(model_json)
print(f"Validated Model ID: {validated_model.id}")

Validated Model ID: 9c5a94ca-08ca-450e-900f-444f0b09d763


## **Conclusion**

This notebook introduced audio handling in the Swarmauri SDK through the GroqAIAudio module: loading credentials, inspecting the model and its allowed models, transcribing a recording, and round-tripping the model through JSON. With this foundation, you can move on to more specific functionality, such as transcription, translation, and parameter tuning, which subsequent notebooks explore. Used well, Swarmauri’s tools can improve user engagement, promote accessibility, and open new possibilities for human-computer interaction through audio.



In [None]:
from swarmauri.utils.print_notebook_metadata import print_notebook_metadata

print_notebook_metadata(author_name="Dominion John", github_username="DOMINION-JOHN1")