google-gemini · Giom-V · May 21, 2025 · May 20, 2025 · May 21, 2025 · May 21, 2025
diff --git a/README.md b/README.md
@@ -21,7 +21,7 @@ Here are the recent additions and updates to the Gemini API and the Cookbook:
 
 * **Gemini 2.5 models:** Explore the capabilities of the latest Gemini 2.5 models (Flash and Pro)! See the [Get Started Guide](./quickstarts/Get_started.ipynb) and the [thinking guide](./quickstarts/Get_started_thinking.ipynb) as they'll all be thinking ones.
 * **Imagen and Veo**: Get started with our media generation model with this brand new [Veo guide](./quickstarts/Get_started_Veo.ipynb) and [Imagen guide](./quickstarts/Get_started_imagen.ipynb)!
-* **Lyria**: Get started and music generation with the [Lyria RealTime](./quickstarts/Get_started_LyriaRealTime.ipynb) model.
+* **Lyria and TTS**: Get started with podcast and music generation with the [TTS](./quickstarts/Get_started_TTS.ipynb) and [Lyria RealTime](./quickstarts/Get_started_LyriaRealTime.ipynb) models.
 * **LiveAPI**: Get started with the [multimodal Live API](./quickstarts/Get_started_LiveAPI.ipynb) and unlock new interactivity with Gemini.
 * **Recently Added Guides:**
   * [Browser as a tool](./examples/Browser_as_a_tool.ipynb): Use a web browser for live and internal (intranet) web interactions

diff --git a/quickstarts/Get_started_LiveAPI.ipynb b/quickstarts/Get_started_LiveAPI.ipynb
@@ -57,15 +57,19 @@
       "source": [
         "**Preview**: The Live API is in preview.\n",
         "\n",
-        "This notebook demonstrates simple usage of the Gemini 2.0 Multimodal Live API. For an overview of new capabilities refer to the [Gemini 2.0 docs](https://ai.google.dev/gemini-api/docs/models/gemini-v2).\n",
+        "This notebook demonstrates simple usage of the Gemini Multimodal Live API. For an overview of new capabilities refer to the [Gemini Live API docs](https://ai.google.dev/gemini-api/docs/live).\n",
         "\n",
         "This notebook implements a simple turn-based chat where you send messages as text, and the model replies with audio. The API is capable of much more than that. The goal here is to demonstrate with **simple code**.\n",
         "\n",
-        "Some features of the API are not working in Colab, to try them it is recommended to have a look at this [python script](./Get_started_LiveAPI.py) and run it locally.\n",
+        "Some features of the API do not work in Colab. To try them, have a look at this [Python script](./Get_started_LiveAPI.py) and run it locally.\n",
         "\n",
         "If you aren't looking for code, and just want to try multimedia streaming use [Live API in Google AI Studio](https://aistudio.google.com/app/live).\n",
         "\n",
-        "The [Next steps](#next_steps) section at the end of this tutorial provides links to additional resources."
+        "The [Next steps](#next_steps) section at the end of this tutorial provides links to additional resources.\n",
+        "\n",
+        "#### Native audio output\n",
+        "\n",
+        "**Info**: Gemini 2.5 introduces [native audio generation](https://ai.google.dev/gemini-api/docs/live#native-audio-output), which generates audio output directly, providing more natural-sounding audio, more expressive voices, greater awareness of additional context (e.g., tone), and more proactive responses. You can try a native audio example in this [script](./Get_started_LiveAPI_NativeAudio.py)."
       ]
     },
     {
@@ -92,7 +96,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 1,
+      "execution_count": null,
       "metadata": {
         "id": "46zEFO2a9FFd"
       },
@@ -123,7 +127,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 2,
+      "execution_count": null,
       "metadata": {
         "id": "A1pkoyZb9Jm3"
       },
@@ -148,7 +152,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 3,
+      "execution_count": null,
       "metadata": {
         "id": "HghvVpbU0Uap"
       },
@@ -172,7 +176,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 4,
+      "execution_count": null,
       "metadata": {
         "id": "27Fikag0xSaB"
       },
@@ -194,7 +198,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 5,
+      "execution_count": null,
       "metadata": {
         "id": "Yd1vs3cP8EmS"
       },
@@ -228,7 +232,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 37,
+      "execution_count": null,
       "metadata": {
         "id": "dDfslcyIOqgI"
       },
@@ -284,7 +288,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 10,
+      "execution_count": null,
       "metadata": {
         "id": "7mEDGwJfLRrm"
       },
@@ -312,7 +316,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 36,
+      "execution_count": null,
       "metadata": {
         "id": "VFD4VleVKj1-"
       },
@@ -413,7 +417,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 13,
+      "execution_count": null,
       "metadata": {
         "id": "bWTaU8j-X3AJ"
       },
@@ -436,7 +440,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 16,
+      "execution_count": null,
       "metadata": {
         "id": "3zAjMOZXFuxI"
       },
@@ -579,7 +583,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 17,
+      "execution_count": null,
       "metadata": {
         "id": "WxdwgTKIGIlY"
       },
@@ -669,7 +673,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 18,
+      "execution_count": null,
       "metadata": {
         "id": "cbkoDa1ve_C5"
       },
@@ -768,7 +772,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 19,
+      "execution_count": null,
       "metadata": {
         "id": "yqBTtKvGmKI4"
       },
@@ -872,7 +876,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 20,
+      "execution_count": null,
       "metadata": {
         "id": "Y5ZVUQ5vJrEJ"
       },
@@ -906,7 +910,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 21,
+      "execution_count": null,
       "metadata": {
         "id": "xH_iZhTxKFtF"
       },

diff --git a/quickstarts/Get_started_LiveAPI_NativeAudio.py b/quickstarts/Get_started_LiveAPI_NativeAudio.py
@@ -0,0 +1,163 @@
+# -*- coding: utf-8 -*-
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+## Setup
+
+To install the dependencies for this script, run:
+
+```
+brew install portaudio
+pip install -U google-genai pyaudio
+```
+
+## API key
+
+Ensure the `GOOGLE_API_KEY` environment variable is set to the API key
+you obtained from Google AI Studio.
+
+## Run
+
+To run the script:
+
+```
+python Get_started_LiveAPI_NativeAudio.py
+```
+
+Then start talking to Gemini.
+"""
+
+import asyncio
+import sys
+import traceback
+
+import pyaudio
+
+from google import genai
+
+if sys.version_info < (3, 11, 0):
+    import taskgroup, exceptiongroup
+
+    asyncio.TaskGroup = taskgroup.TaskGroup
+    asyncio.ExceptionGroup = exceptiongroup.ExceptionGroup
+
+FORMAT = pyaudio.paInt16
+CHANNELS = 1
+SEND_SAMPLE_RATE = 16000
+RECEIVE_SAMPLE_RATE = 24000
+CHUNK_SIZE = 1024
+
+pya = pyaudio.PyAudio()
+
+
+client = genai.Client()  # GOOGLE_API_KEY must be set as env variable
+
+MODEL = "gemini-2.5-flash-preview-native-audio-dialog"
+CONFIG = {"response_modalities": ["AUDIO"]}
+
+
+class AudioLoop:
+    def __init__(self):
+        self.audio_in_queue = None
+        self.out_queue = None
+
+        self.session = None
+
+        self.audio_stream = None
+
+        self.receive_audio_task = None
+        self.play_audio_task = None
+
+
+    async def listen_audio(self):
+        mic_info = pya.get_default_input_device_info()
+        self.audio_stream = await asyncio.to_thread(
+            pya.open,
+            format=FORMAT,
+            channels=CHANNELS,
+            rate=SEND_SAMPLE_RATE,
+            input=True,
+            input_device_index=mic_info["index"],
+            frames_per_buffer=CHUNK_SIZE,
+        )
+        if __debug__:
+            kwargs = {"exception_on_overflow": False}
+        else:
+            kwargs = {}
+        while True:
+            data = await asyncio.to_thread(self.audio_stream.read, CHUNK_SIZE, **kwargs)
+            await self.out_queue.put({"data": data, "mime_type": "audio/pcm"})
+
+    async def send_realtime(self):
+        while True:
+            msg = await self.out_queue.get()
+            await self.session.send_realtime_input(audio=msg)
+
+    async def receive_audio(self):
+        """Background task that reads from the websocket and writes PCM chunks to the playback queue (`audio_in_queue`)."""
+        while True:
+            turn = self.session.receive()
+            async for response in turn:
+                if data := response.data:
+                    self.audio_in_queue.put_nowait(data)
+                    continue
+                if text := response.text:
+                    print(text, end="")
+
+            # If you interrupt the model, it sends a turn_complete.
+            # For interruptions to work, we need to stop playback.
+            # So empty out the audio queue because it may have loaded
+            # much more audio than has played yet.
+            while not self.audio_in_queue.empty():
+                self.audio_in_queue.get_nowait()
+
+    async def play_audio(self):
+        stream = await asyncio.to_thread(
+            pya.open,
+            format=FORMAT,
+            channels=CHANNELS,
+            rate=RECEIVE_SAMPLE_RATE,
+            output=True,
+        )
+        while True:
+            bytestream = await self.audio_in_queue.get()
+            await asyncio.to_thread(stream.write, bytestream)
+
+    async def run(self):
+        try:
+            async with (
+                client.aio.live.connect(model=MODEL, config=CONFIG) as session,
+                asyncio.TaskGroup() as tg,
+            ):
+                self.session = session
+
+                self.audio_in_queue = asyncio.Queue()
+                self.out_queue = asyncio.Queue(maxsize=5)
+
+                tg.create_task(self.send_realtime())
+                tg.create_task(self.listen_audio())
+                tg.create_task(self.receive_audio())
+                tg.create_task(self.play_audio())
+        except asyncio.CancelledError:
+            pass
+        except ExceptionGroup as EG:
+            if self.audio_stream:
+                self.audio_stream.close()
+            traceback.print_exception(EG)
+
+
+if __name__ == "__main__":
+    loop = AudioLoop()
+    asyncio.run(loop.run())
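
The interruption handling in `receive_audio` above comes down to emptying an `asyncio.Queue` without awaiting, so that audio buffered for playback is discarded once a turn ends. Below is a minimal standalone sketch of that pattern. It makes no API calls and opens no audio device; the helper names `drain` and `demo` are hypothetical and do not appear in the script, and the silent byte chunks are stand-ins for real PCM data.

```python
import asyncio


def drain(queue: asyncio.Queue) -> int:
    """Discard everything currently buffered in `queue`.

    Returns the number of items dropped. Uses get_nowait() so it never
    blocks, mirroring the loop at the end of receive_audio.
    """
    dropped = 0
    while not queue.empty():
        queue.get_nowait()
        dropped += 1
    return dropped


async def demo() -> tuple:
    # Stand-in for audio_in_queue: four chunks of silent 16-bit PCM
    # that were received but have not been played yet.
    playback = asyncio.Queue()
    for _ in range(4):
        playback.put_nowait(b"\x00\x00" * 1024)

    # Simulate the end of an interrupted turn: drop the unplayed audio.
    dropped = drain(playback)
    return dropped, playback.qsize()


result = asyncio.run(demo())
print(result)
```

Because `drain` never awaits, no other task can enqueue new chunks while it runs, which is why the script can safely clear the queue between turns.
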
diff --git a/quickstarts/Get_started_LyriaRealTime.ipynb b/quickstarts/Get_started_LyriaRealTime.ipynb
@@ -664,6 +664,7 @@
         "# What's next?\n",
         "\n",
         "Now that you know how to generate music, here are other cool things to try:\n",
+        "*   Instead of music, learn how to generate multi-speaker conversations using the [TTS models](./Get_started_TTS.ipynb),\n",
         "*   Discover how to generate [images](./Get_started_imagen.ipynb) or [videos](./Get_started_Veo.ipynb),\n",
         "*   Instead of generation music or audio, find out how to Gemini can [understand Audio files](./Audio.ipynb),\n",
         "*   Have a real-time conversation with Gemini using the [Live API](./Get_started_LiveAPI.ipynb)."