<a href="https://colab.research.google.com/github/mapsguy/programming-gemini/blob/main/generating_audio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
#import the genai library
from google import genai

In [2]:
#Step 2 (AI Studio): read the API key from the Colab user data (secrets)
from google.colab import userdata
client = genai.Client(api_key=userdata.get("GEMINI_API_KEY"))

#Alternatively, read the key from an environment variable
#import os
#client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
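
The two ways of reading the key shown above can be folded into one lookup that fails with a clear message when the key is missing. This is only a minimal sketch: `resolve_api_key` is a hypothetical helper, not part of the `google-genai` library, and it covers the environment-variable path only.

```python
import os

def resolve_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the API key from a mapping (defaults to os.environ).

    Hypothetical helper for illustration; raises if the key is absent or empty.
    """
    env = os.environ if env is None else env
    key = env.get(name)
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it as a Colab secret or an environment variable"
        )
    return key

# Assumed usage in the notebook:
# client = genai.Client(api_key=resolve_api_key())
```

Passing the mapping as an argument keeps the function easy to check without touching the real environment.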

In [3]:
model_name = "models/gemini-2.0-flash-live-001"

In [19]:
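#Basic workflow: text to audio with the Live API

import asyncio
import wave

config = {"response_modalities": ["AUDIO"]}

async def main():
  text_prompt = "What is your favorite color?"
  print(f"Attempting to generate audio for: {text_prompt}")
  async with client.aio.live.connect(model=model_name, config=config) as session:
    #Open the WAV file in a with-block so it is closed even if the session fails
    with wave.open("audio.wav", "wb") as wf:
      wf.setnchannels(1)      #mono
      wf.setsampwidth(2)      #2 bytes per sample (16-bit PCM)
      wf.setframerate(24000)  #24 kHz

      await session.send_client_content(
          turns={"role": "user", "parts": [{"text": text_prompt}]}, turn_complete=True)

      #Append each audio chunk as it arrives; messages without audio are skipped
      async for response in session.receive():
        if response.data is not None:
          wf.writeframes(response.data)
        #Transcription? input_transcription covers audio sent by the user.
        #For the generated speech, the Live API docs describe output_transcription,
        #which is assumed here to require "output_audio_transcription": {} in config:
        #if response.server_content and response.server_content.output_transcription:
          #print('Transcript:', response.server_content.output_transcription.text)

    print("Audio generation complete")

await main()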

Attempting to generate audio for: What is your favorite color?
Audio generation complete
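
Before uploading the file for transcription, it can help to confirm that the WAV header matches the values set in the cell above (1 channel, 2 bytes per sample, 24000 Hz) and that some audio was actually written. The sketch below uses only the standard-library `wave` module. `describe_wav` and `write_silence` are hypothetical helpers, and the demo runs on a synthetic silent file so that it does not need the API.

```python
import os
import tempfile
import wave

def write_silence(path, seconds, frame_rate=24000):
    """Write mono 16-bit silence; a stand-in for audio returned by the model."""
    n_frames = int(seconds * frame_rate)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
        wf.setframerate(frame_rate)
        wf.writeframes(b"\x00\x00" * n_frames)

def describe_wav(path):
    """Return the header fields and the duration of a WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "frame_rate_hz": rate,
            "duration_s": frames / rate,
        }

# Demo on a synthetic file so the sketch runs without calling the API.
demo_path = os.path.join(tempfile.gettempdir(), "silence_demo.wav")
write_silence(demo_path, seconds=0.5)
print(describe_wav(demo_path))
```

In the notebook, `describe_wav("audio.wav")` would report on the file written by the cell above; a duration of 0 would suggest that no audio chunks were received.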


In [14]:
#Transcribe the generated audio
audio_file = client.files.upload(file="/content/audio.wav")

response = client.models.generate_content(
    model="models/gemini-2.5-flash-preview-05-20",
    contents=["Please transcribe this audio.", audio_file])
print(response.text)


As a language model, I don't have personal preferences like favorite colors. Is there anything else I can help you with?


In [15]:
#Generating audio after configuring the speech output (specifying a voice/language)
from google.genai import types

gen_config = {
    "response_modalities": ["AUDIO"],
    "speech_config": {
        "voice_config": {"prebuilt_voice_config": {"voice_name": "Kore"}},
        "language_code": "en-US"
    }
}

async def main():
  text_prompt = "Tell us a short story."
  print(f"Attempting to generate audio for: {text_prompt}")
  async with client.aio.live.connect(model=model_name, config=gen_config) as session:
    #Open the WAV file in a with-block so it is closed even if the session fails
    with wave.open("audio.wav", "wb") as wf:
      wf.setnchannels(1)      #mono
      wf.setsampwidth(2)      #2 bytes per sample (16-bit PCM)
      wf.setframerate(24000)  #24 kHz

      await session.send_client_content(
          turns={"role": "user", "parts": [{"text": text_prompt}]}, turn_complete=True)

      #Append each audio chunk as it arrives; messages without audio are skipped
      async for response in session.receive():
        if response.data is not None:
          wf.writeframes(response.data)

    print("Audio generation complete")

await main()


Attempting to generate audio for: Tell us a short story.
Audio generation complete
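
Because the speech configuration is a plain dict, trying a different voice or language only means changing two strings. A small builder makes that explicit. This is a sketch: `build_audio_config` is a hypothetical helper that reproduces the dict layout used in the cell above, and it does not check that a voice name or language code is actually supported by the model.

```python
def build_audio_config(voice_name="Kore", language_code="en-US"):
    """Build a Live API config dict with the same layout as gen_config above."""
    return {
        "response_modalities": ["AUDIO"],
        "speech_config": {
            "voice_config": {"prebuilt_voice_config": {"voice_name": voice_name}},
            "language_code": language_code,
        },
    }

# With the defaults this matches the gen_config dict defined in the notebook.
example_config = build_audio_config()
print(example_config)
```

The result can be passed as `config=` to `client.aio.live.connect`, exactly as `gen_config` is in the cell above.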


In [16]:
#Transcribe the short story generated above
audio_file = client.files.upload(file="/content/audio.wav")

response = client.models.generate_content(
    model="models/gemini-2.5-flash-preview-05-20",
    contents=["Please transcribe this audio.", audio_file])

print(response.text)

Okay, here's a short story for you. A firefly named Flicker was born without a light. The other fireflies teased him, but Flicker didn't give up. He tried everything to spark his light, but nothing worked. One day, feeling dejected, he sat alone in a field of flowers. A little girl, seeing his sadness, gently touched him. Suddenly, Flicker burst into a brilliant glow. The girl's kindness had ignited his inner light. From that day on, Flicker shone brighter than any other firefly, teaching everyone that kindness can spark the light within us all. Would you like to hear another story?
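
The two transcription cells repeat the same upload-then-generate steps, so they could be wrapped in one function. The sketch below uses only the calls already shown in this notebook (`client.files.upload` and `client.models.generate_content`); `transcribe_audio` itself is a hypothetical helper, and the default model name is simply the one used above.

```python
def transcribe_audio(client, path,
                     model="models/gemini-2.5-flash-preview-05-20",
                     prompt="Please transcribe this audio."):
    """Upload an audio file and ask the model for a transcript.

    `client` is expected to be the genai.Client created earlier in the notebook.
    """
    # Upload the file first, then pass the returned handle alongside the prompt
    audio_file = client.files.upload(file=path)
    response = client.models.generate_content(
        model=model, contents=[prompt, audio_file])
    return response.text

# Assumed usage in the notebook:
# print(transcribe_audio(client, "/content/audio.wav"))
```

Taking the client as an argument keeps the function free of global state, which also makes it possible to exercise it with a stand-in client.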
