##### Copyright 2025 Google LLC.

## Setup

### Setup your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](../quickstarts/Authentication.ipynb) for an example.
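If you run this notebook outside Colab, `google.colab.userdata` is not available. The following is a minimal sketch that assumes you export the key as an environment variable that is also named `GOOGLE_API_KEY`. It tries the Colab Secret first and falls back to that variable. `get_api_key` is a hypothetical helper and is not part of the Gemini SDK.

```python
import os


def get_api_key(name="GOOGLE_API_KEY"):
    """Return the API key from a Colab Secret, falling back to an environment variable.

    Hypothetical helper: assumes the key is stored under the same name in both places.
    """
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
        key = userdata.get(name)
        if key:
            return key
    except Exception:
        # Not running in Colab, the secret is missing, or notebook access was not granted.
        pass
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set a Colab Secret or environment variable named {name}.")
    return key
```

With this helper defined, `GOOGLE_API_KEY = get_api_key()` can stand in for the `userdata.get` call in the next cell.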

In [None]:
from google.colab import userdata
from google.colab import files  # used later to upload text files and download audio

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')

### Install and initialize the SDK


In [None]:
!pip install -U -q "google-genai>=1.16.0" # 1.16 is needed for multi-speaker audio


In [None]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

### Select a model

Audio output is only supported by the TTS models: `gemini-2.5-flash-preview-tts` and `gemini-2.5-pro-preview-tts`.

For more information about all Gemini models, see the [documentation](https://ai.google.dev/gemini-api/docs/models/gemini).


In [33]:
MODEL_ID = "gemini-2.5-flash-preview-tts" # @param ["gemini-2.5-flash-preview-tts","gemini-2.5-pro-preview-tts"] {"allow-input":true, isTemplate: true}

Next, create helper functions that save the audio returned by the model to a WAV file and play it back in the notebook:

In [None]:
# @title Helper functions (just run this cell)

import contextlib
import wave
from IPython.display import Audio

file_index = 0

@contextlib.contextmanager
def wave_file(filename, channels=1, rate=24000, sample_width=2):
    """Open a WAV file for writing; defaults assume 24 kHz, 16-bit, mono PCM."""
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        yield wf

def play_audio_blob(blob):
    """Save a raw PCM audio blob to a numbered WAV file and return a player for it."""
    global file_index
    file_index += 1

    fname = f'audio_{file_index}.wav'
    with wave_file(fname) as wav:
        wav.writeframes(blob.data)

    return Audio(fname, autoplay=True)

def play_audio(response):
    """Play the audio from the first part of a generate_content response."""
    return play_audio_blob(response.candidates[0].content.parts[0].inline_data)
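The helpers above assume that the model returns raw 16-bit mono PCM at 24 kHz (the defaults of `wave_file`). They only add a WAV header around that data. The stdlib-only sketch below uses the same assumed format to show how to compute a clip's duration and how to do the WAV wrapping in memory. The in-memory version is useful when you need the bytes rather than a file on disk. `pcm_duration_seconds` and `pcm_to_wav_bytes` are hypothetical helpers and are not part of the SDK.

```python
import io
import wave

# Assumed output format (matches the defaults of `wave_file` above):
RATE = 24000       # frames per second
SAMPLE_WIDTH = 2   # bytes per sample, i.e. 16-bit PCM
CHANNELS = 1       # mono


def pcm_duration_seconds(pcm, rate=RATE, sample_width=SAMPLE_WIDTH, channels=CHANNELS):
    """Duration = number of bytes / (bytes per frame * frames per second)."""
    return len(pcm) / (sample_width * channels * rate)


def pcm_to_wav_bytes(pcm, rate=RATE, sample_width=SAMPLE_WIDTH, channels=CHANNELS):
    """Prepend a WAV header to raw PCM and return the result without touching disk."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm)
    return buf.getvalue()


# One second of silence: 24000 frames * 2 bytes per frame = 48000 bytes.
silence = bytes(48000)
print(pcm_duration_seconds(silence))  # 1.0
with wave.open(io.BytesIO(pcm_to_wav_bytes(silence)), "rb") as wf:
    print(wf.getframerate(), wf.getnframes())  # 24000 24000
```

For a generated clip, `pcm_duration_seconds(response.candidates[0].content.parts[0].inline_data.data)` gives its length in seconds, provided the format assumption holds.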

## Control how the model speaks

There are 30 built-in voices and 24 supported languages, which gives you plenty of combinations to try.

### Choose a voice

Choose one of the 30 voices. Their characteristics are described in the [documentation](https://ai.google.dev/gemini-api/docs/speech-generation#voices).

In [None]:
voice_name = "Charon" # @param ["Zephyr", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Aoede", "Callirhoe", "Autonoe", "Enceladus", "Iapetus", "Umbriel", "Algieba", "Despina", "Erinome", "Algenib", "Rasalgethi", "Laomedeia", "Achernar", "Alnilam", "Schedar", "Gacrux", "Pulcherrima", "Achird", "Zubenelgenubi", "Vindemiatrix", "Sadachbia", "Sadaltager", "Sulafar"]
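You select the voice through `speech_config.voice_config.prebuilt_voice_config.voice_name` in the request config. The following minimal sketch uses a hypothetical helper, `make_speech_config`. It checks the name against the voice list in the cell above (assumed to be current) and builds the same config dictionary that this notebook passes to `generate_content`.

```python
# Voice names as listed in this notebook's voice picker (assumed to be the current set).
PREBUILT_VOICES = [
    "Zephyr", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Aoede",
    "Callirhoe", "Autonoe", "Enceladus", "Iapetus", "Umbriel", "Algieba",
    "Despina", "Erinome", "Algenib", "Rasalgethi", "Laomedeia", "Achernar",
    "Alnilam", "Schedar", "Gacrux", "Pulcherrima", "Achird", "Zubenelgenubi",
    "Vindemiatrix", "Sadachbia", "Sadaltager", "Sulafar",
]


def make_speech_config(voice_name):
    """Build the audio-output config dict used in this notebook for one prebuilt voice."""
    if voice_name not in PREBUILT_VOICES:
        raise ValueError(f"Unknown voice {voice_name!r}; choose one of {PREBUILT_VOICES}")
    return {
        "response_modalities": ["Audio"],
        "speech_config": {
            "voice_config": {
                "prebuilt_voice_config": {"voice_name": voice_name},
            },
        },
    }
```

The result can be passed as `config=make_speech_config(voice_name)` to `client.models.generate_content`. A typo in the voice name then fails locally, before any request is sent.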

### Change the language

Just tell the model which language to speak in, and it will. The [documentation](https://ai.google.dev/gemini-api/docs/speech-generation#languages) lists all supported languages.
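The language and delivery style are plain natural-language instructions in the prompt. The cells below put the instruction first and the text to read aloud in quotes. The following sketch uses a hypothetical `build_tts_prompt` helper that keeps this pattern in one place, so you only need to change the language or style argument.

```python
def build_tts_prompt(text, language, style="warm, experienced"):
    """Wrap `text` in an instruction naming the delivery style and language."""
    # Delivery instructions go first; the quoted part is the text to be read aloud.
    return f'Speak in a {style} voice, in {language}:\n"{text.strip()}"'


print(build_tts_prompt("Bonjour à tous !", "French"))
```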

In [34]:
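# English: "The franchise restaurant owner did something that an independent restaurant owner could not."
response = client.models.generate_content(
  model=MODEL_ID,
  contents="""
    Speak in a warm, experienced voice, in Vietnamese:
    "Người chủ nhà hàng nhượng quyền đã làm một điều mà người chủ nhà hàng đơn lẻ không thể."

  """,
  config={
      "response_modalities": ['Audio'],
      "speech_config": {
          "voice_config": {
              "prebuilt_voice_config": {
                  "voice_name": voice_name
              }
          }
      }
  },
)

# Save the returned raw PCM audio to a WAV file, then download it.
fname = "vietnamese_sample.wav"
with wave_file(fname) as wav:
    wav.writeframes(response.candidates[0].content.parts[0].inline_data.data)
files.download(fname)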

### Download audio
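The next cell splits the uploaded text into fixed 1,429-word chunks, so a sentence can be cut in half at a chunk boundary. The following sketch is an optional alternative and is not the notebook's own method. It groups whole sentences into chunks of at most `max_words` words, using a naive regex split on `.`, `!`, `?` or `…` followed by whitespace. `split_into_sentence_chunks` is a hypothetical name.

```python
import re


def split_into_sentence_chunks(text, max_words=1429):
    """Group whole sentences into chunks of at most `max_words` words.

    A single sentence longer than `max_words` becomes its own (oversized) chunk.
    """
    # Naive sentence split: break after ., !, ? or … followed by whitespace.
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if n == 0:
            continue
        # Close the current chunk if this sentence would push it over the limit.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

To use it, replace `chunks = split_text_into_chunks(text)` with `chunks = split_into_sentence_chunks(text)` in the next cell. The regex does not handle abbreviations such as "Dr.", so a few chunks may still break in odd places.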

In [None]:
# prompt: take a text file as input, split it into chunks of 1429 words each, then convert the text to speech. Name the output files after the text file.

# Function to split text into chunks
def split_text_into_chunks(text, chunk_size=1429):
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

# Upload the text file
uploaded = files.upload()

for filename in uploaded.keys():
    with open(filename, 'r', encoding='utf-8') as f:
        text = f.read()

    chunks = split_text_into_chunks(text)
    base_filename = filename.rsplit('.', 1)[0]

    for i, chunk in enumerate(chunks):
        print(f"Processing chunk {i+1}/{len(chunks)}")
        response = client.models.generate_content(
            model=MODEL_ID,
            contents=f"""
              Speak in a warm, experienced voice, in Vietnamese:
              "{chunk}"

            """,
            config={
                "response_modalities": ['Audio'],
                "speech_config": {
                    "voice_config": {
                        "prebuilt_voice_config": {
                            "voice_name": voice_name
                        }
                    }
                }
            },
        )

        # Get the audio blob
        audio_blob = response.candidates[0].content.parts[0].inline_data

        # Define the output filename
        output_fname = f'{base_filename}_part_{i+1}.wav'

        # Write the audio data to a wave file
        with wave_file(output_fname) as wav:
            wav.writeframes(audio_blob.data)

        print(f"Saved chunk {i+1} as {output_fname}")

        # Download the audio file
        files.download(output_fname)
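The loop above produces one WAV file per chunk. If you want a single file instead, the following stdlib-only sketch appends the frames of several WAV files in order. It assumes that all parts share the same channel count, sample width, and frame rate, which should hold because they are written with the same `wave_file` defaults. `concat_wav_files` is a hypothetical helper.

```python
import wave


def concat_wav_files(inputs, output):
    """Append the audio frames of several WAV files into one WAV file.

    `inputs` and `output` may be paths or binary file objects; every input
    must share the same channel count, sample width and frame rate.
    """
    inputs = list(inputs)
    if not inputs:
        raise ValueError("inputs is empty")
    expected = None  # (channels, sample width, frame rate) of the first input
    with wave.open(output, "wb") as out:
        for src in inputs:
            with wave.open(src, "rb") as wf:
                fmt = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate())
                if expected is None:
                    # Copy the first file's format into the output header.
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"{src!r} has format {fmt}, expected {expected}")
                out.writeframes(wf.readframes(wf.getnframes()))
```

For example, after the loop, `concat_wav_files([f'{base_filename}_part_{i+1}.wav' for i in range(len(chunks))], f'{base_filename}_full.wav')` writes all parts into one file, which you can then pass to `files.download`.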