# Audio Processing with MLX Server

This notebook demonstrates how to process audio files with MLX Server through its OpenAI-compatible API.


## What You'll Learn

- Connect to MLX Server
- Load and encode audio files for processing
- Send audio to the model for analysis
- Get text descriptions of audio content

## Prerequisites

- MLX Server running on localhost:8000
- Audio file in the `audios/` directory
- OpenAI Python library installed
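
The two local prerequisites can be checked before running any cells. The following is a minimal sketch, not part of the original notebook: `check_prerequisites` is a hypothetical helper, and it only verifies that the `openai` package is importable and that the audio directory exists. It does not check whether the server is running.

```python
import importlib.util
from pathlib import Path


def check_prerequisites(audio_dir: str = "audios") -> dict:
    """Report which local prerequisites are satisfied (hypothetical helper)."""
    return {
        # True if the `openai` package can be imported in this environment
        "openai_installed": importlib.util.find_spec("openai") is not None,
        # True if the audio directory exists relative to the working directory
        "audio_dir_exists": Path(audio_dir).is_dir(),
    }


for name, ok in check_prerequisites().items():
    print(f"{'OK' if ok else 'MISSING'}: {name}")
```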


## Step 1: Setup and Connection


In [1]:
# Import required libraries
from openai import OpenAI
import base64
import os

# Initialize the OpenAI client to point at MLX Server
# The MLX Server runs locally and provides OpenAI-compatible endpoints
# Note: creating the client does not send a request; the first network call happens in Step 3
client = OpenAI(
    base_url="http://localhost:8000/v1",  # MLX Server address
    api_key="fake-api-key",               # Any string works for local server
)

print("✅ Connected to MLX Server")


✅ Connected to MLX Server
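
Constructing the client does not contact the server, so a quick reachability probe can catch a stopped server early. The sketch below uses only the standard library and assumes the server answers `GET /v1/models`, as OpenAI-compatible servers commonly do; that route is not confirmed by this notebook. `models_url` and `server_is_reachable` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL from the client's base URL."""
    return base_url.rstrip("/") + "/models"


def server_is_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at base_url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status (for example, the
        # route is not implemented), so the process itself is up
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, and similar
        return False


print(server_is_reachable("http://localhost:8000/v1"))
```

A `False` result means nothing is listening at that address, which is worth fixing before sending audio.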


## Step 2: Audio File Processing


In [2]:
def load_audio_file(audio_path: str) -> str:
    """
    Load an audio file and encode it as base64 for API transmission.
    
    Args:
        audio_path (str): Path to the audio file
        
    Returns:
        str: Base64 encoded audio data
    """
    if not os.path.exists(audio_path):
        raise FileNotFoundError(f"Audio file not found: {audio_path}")
    
    with open(audio_path, "rb") as audio_file:
        audio_data = audio_file.read()
        encoded_audio = base64.b64encode(audio_data).decode('utf-8')
        
    print(f"✅ Loaded audio file: {audio_path}")
    print(f"   File size: {len(audio_data)} bytes")
    print(f"   Encoded size: {len(encoded_audio)} characters")
    
    return encoded_audio

# Load the sample audio file
audio_path = "audios/audio.wav"
audio_base64 = load_audio_file(audio_path)


✅ Loaded audio file: audios/audio.wav
   File size: 372698 bytes
   Encoded size: 496932 characters
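
The encoded size is larger than the file because base64 maps every 3 bytes to 4 characters, padding the final group. A short sketch of that arithmetic, using a hypothetical helper `base64_length`, reproduces the numbers printed above:

```python
import base64


def base64_length(num_bytes: int) -> int:
    """Length of the padded base64 encoding of num_bytes bytes."""
    # Each group of up to 3 input bytes becomes exactly 4 output characters
    return 4 * ((num_bytes + 2) // 3)


# Sanity check against the real encoder on a small input
assert base64_length(10) == len(base64.b64encode(bytes(10)))

# File size reported for audios/audio.wav
print(base64_length(372698))  # → 496932
```

This roughly 33% overhead applies to every request, so large audio files produce proportionally larger payloads.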


## Step 3: Audio Analysis


In [3]:
def analyze_audio(audio_base64: str, prompt: str = "Describe what you hear in this audio.") -> str:
    """
    Send audio to MLX Server for analysis.
    
    Args:
        audio_base64 (str): Base64 encoded audio data
        prompt (str): Text prompt for the model
        
    Returns:
        str: Model's response
    """
    try:
        response = client.chat.completions.create(
            model="local-multimodal",
            messages=[
                {
                    "role": "user", 
                    "content": [
                        {
                            "type": "input_audio",
                            "input_audio": {
                                "data": audio_base64,
                                "format": "wav"
                            }
                        },
                        {
                            "type": "text",
                            "text": prompt
                        }
                    ]
                }
            ],
            max_tokens=1024
        )
        
        return response.choices[0].message.content
        
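    except Exception as e:
        # Broad catch keeps the notebook running; the error comes back as a string,
        # so callers cannot tell it apart from a model response
        return f"Error analyzing audio: {e}"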

# Analyze the audio with a descriptive prompt
result = analyze_audio(audio_base64, "Describe the audio in detail.")
print("🎵 Audio Analysis Result:")
print(f"   {result}")


🎵 Audio Analysis Result:
   Dogs are sitting by the door.
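
The request in Step 3 hard-codes `"format": "wav"`. As a sketch of one way to generalize it, the message can be built by a small pure function that derives the format from the file extension. `build_audio_message` and `SUPPORTED_FORMATS` are hypothetical names, and the set of accepted formats is taken from OpenAI's `input_audio` content part; whether MLX Server accepts both is an assumption.

```python
from pathlib import Path

# Formats accepted by OpenAI's `input_audio` content part; whether MLX Server
# accepts both is an assumption
SUPPORTED_FORMATS = {"wav", "mp3"}


def build_audio_message(audio_base64: str, audio_path: str, prompt: str) -> dict:
    """Build the user message sent in Step 3 (hypothetical helper)."""
    # Derive the format from the extension, e.g. "audios/audio.wav" -> "wav"
    audio_format = Path(audio_path).suffix.lstrip(".").lower()
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio format: {audio_format!r}")
    return {
        "role": "user",
        "content": [
            {
                "type": "input_audio",
                "input_audio": {"data": audio_base64, "format": audio_format},
            },
            {"type": "text", "text": prompt},
        ],
    }


# "AAAA" is placeholder data, not real audio
message = build_audio_message("AAAA", "audios/audio.wav", "Describe the audio in detail.")
print(message["content"][0]["input_audio"]["format"])  # → wav
```

The returned dictionary can be passed as `messages=[message]` to `client.chat.completions.create`, and because the function makes no network call it can be inspected or tested without a running server.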


## Conclusion

This notebook demonstrated audio processing with MLX Server through its OpenAI-compatible API endpoints. Key highlights include:

- **Audio Input Support**: Encoded an audio file as base64 and sent it through the `input_audio` content type
- **Multimodal Integration**: Combined audio input with a text prompt in a single user message
- **OpenAI Compatibility**: Used the standard OpenAI Python client and the chat completions request format
- **Error Handling**: Checked that the audio file exists and caught request errors, returning them as a readable message

MLX Server's audio processing can support applications such as:
- Audio transcription and analysis
- Voice-controlled interfaces
- Audio content summarization
- Accessibility features for audio-based content

This provides a foundation for building audio-enabled AI applications with MLX on Apple Silicon.
