# GPT-Audio Sample

This minimal demo generates a spoken announcement from a text prompt using the `gpt-audio` model on Azure OpenAI. Configure your Azure OpenAI credentials, set the prompt, voice, and format, then run the notebook cells to produce an audio file and a printed transcript. The notebook also plays the audio inline for quick verification.

- Key outputs: `output.wav` (generated audio), transcript printed to the notebook.
- Quick steps: set `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_API_ENDPOINT` in your environment or a `.env` file, adjust `prompt`, `audio_voice`, and `audio_format`, then run the cells in order.
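
Before running the cells, it can help to see exactly which required variable is missing. The following is a minimal sketch, not part of the notebook: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names introduced here, and the example uses a plain dictionary in place of the real environment.

```python
import os

# The two variables the notebook requires; the others have defaults
REQUIRED_VARS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_API_ENDPOINT")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment
example_env = {"AZURE_OPENAI_API_KEY": "placeholder-key"}
print(missing_env_vars(env=example_env))  # ['AZURE_OPENAI_API_ENDPOINT']
```

Calling `missing_env_vars()` with no arguments after `load_dotenv()` checks the actual process environment.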

In [78]:
# Import libraries and load environment variables
import base64
import os
from openai import AzureOpenAI
from IPython.display import Audio, display
from dotenv import load_dotenv
load_dotenv()

True

In [79]:
# Load configuration from environment
api_key = os.getenv("AZURE_OPENAI_API_KEY")
endpoint = os.getenv("AZURE_OPENAI_API_ENDPOINT")
api_version = os.getenv("AZURE_OPENAI_API_VERSION", "2025-01-01-preview")
model = os.getenv("AZURE_OPENAI_AUDIO_MODEL", "gpt-audio")

In [80]:
# Optional runtime configuration for the audio/chat request
modalities = ["text", "audio"]
audio_voice = "ballad"  # options: alloy, ash, ballad, cedar, coral, echo, marin, sage, shimmer, verse
audio_format = "wav"
prompt = "Announce the Grey Matter Tech Summit, hosted at Prospero House in London. Be enthusiastic, and don't include any preamble."

# Map to the shapes used later in the notebook
audio = {"voice": audio_voice, "format": audio_format}
messages = [{"role": "user", "content": prompt}]

# Basic validation
if not api_key or not endpoint:
    raise EnvironmentError("Environment variables required: AZURE_OPENAI_API_KEY and AZURE_OPENAI_API_ENDPOINT")
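
A typo in `audio_voice` would otherwise surface only when the request is sent. As a rough sketch, the voice can be checked locally against the names listed in the comment above. `KNOWN_VOICES` and `build_audio_options` are hypothetical names, the set may not match what a given deployment accepts, and the format is passed through unchecked because the notebook does not list the supported formats.

```python
# Voice names copied from the comment next to `audio_voice` in the notebook
KNOWN_VOICES = {
    "alloy", "ash", "ballad", "cedar", "coral",
    "echo", "marin", "sage", "shimmer", "verse",
}


def build_audio_options(voice, audio_format):
    """Return the `audio` dict for the request, rejecting unknown voice names."""
    voice = voice.strip().lower()
    if voice not in KNOWN_VOICES:
        raise ValueError(
            f"Unknown voice {voice!r}; expected one of {sorted(KNOWN_VOICES)}"
        )
    # The format is not validated here (assumption: the service rejects bad values)
    return {"voice": voice, "format": audio_format}


print(build_audio_options("Ballad", "wav"))  # {'voice': 'ballad', 'format': 'wav'}
```

The returned dictionary has the same shape as the `audio` variable built in the cell above, so it could be used in its place.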

In [81]:
# Initialize the Azure OpenAI client
client = AzureOpenAI(
    api_key=api_key,
    azure_endpoint=endpoint,
    api_version=api_version
)

In [82]:
# Make the audio chat completions request
completion = client.chat.completions.create(
    model=model,
    modalities=modalities,
    audio=audio,
    messages=messages,
)

In [83]:
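# Write the output audio data to a file
audio_response = completion.choices[0].message.audio
if audio_response is None:
    raise RuntimeError("The response contained no audio; check that modalities includes 'audio'.")
wav_bytes = base64.b64decode(audio_response.data)
out_path = "output.wav"
with open(out_path, "wb") as f:
    f.write(wav_bytes)
    # Ensure the bytes are fully flushed to disk on Windows before reading
    f.flush()
    os.fsync(f.fileno())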
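
To confirm that the decoded bytes really are WAV data before playing them, the header can be read with the standard-library `wave` module. This is a sketch under stated assumptions: `describe_wav` is a hypothetical helper, and the example builds a stand-in clip of silence locally (the 24 kHz rate is arbitrary, not a statement about what the service returns) so that it runs without calling the API.

```python
import base64
import io
import wave


def describe_wav(wav_bytes):
    """Return channel count, sample width, and sample rate from a WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": wav_file.getframerate(),
        }


# Stand-in for `message.audio.data`: one second of 16-bit mono silence
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(24000)
    wav_file.writeframes(b"\x00\x00" * 24000)
fake_b64 = base64.b64encode(buffer.getvalue()).decode("ascii")

print(describe_wav(base64.b64decode(fake_b64)))
# {'channels': 1, 'sample_width_bytes': 2, 'sample_rate_hz': 24000}
```

In the notebook, `describe_wav(wav_bytes)` would inspect the real response. If `audio_format` is set to something other than `"wav"`, `wave.open` is expected to raise an exception (typically `wave.Error`) because the data has no RIFF/WAVE header.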

In [84]:
# In-notebook playback via HTML5 audio
display(Audio(filename=out_path))

In [85]:
# Print the transcript
print(completion.choices[0].message.audio.transcript)

Get ready for the Grey Matter Tech Summit, happening at Prospero House in London! Join innovators, thought leaders, and trailblazers for an unforgettable day of cutting-edge technology and groundbreaking ideas. Don’t miss it!
