Merged
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ Here are the recent additions and updates to the Gemini API and the Cookbook:

* **Gemini 2.5 models:** Explore the capabilities of the latest Gemini 2.5 models (Flash and Pro)! See the [Get Started Guide](./quickstarts/Get_started.ipynb) and the [thinking guide](./quickstarts/Get_started_thinking.ipynb), since all of them are thinking models.
* **Imagen and Veo**: Get started with our media generation models with this brand new [Veo guide](./quickstarts/Get_started_Veo.ipynb) and [Imagen guide](./quickstarts/Get_started_imagen.ipynb)!
* **Lyria**: Get started and music generation with the [Lyria RealTime](./quickstarts/Get_started_LyriaRealTime.ipynb) model.
* **Lyria and TTS**: Get started with podcast and music generation with the [TTS](./quickstarts/Get_started_TTS.ipynb) and [Lyria RealTime](./quickstarts/Get_started_LyriaRealTime.ipynb) models.
* **LiveAPI**: Get started with the [multimodal Live API](./quickstarts/Get_started_LiveAPI.ipynb) and unlock new interactivity with Gemini.
* **Recently Added Guides:**
* [Browser as a tool](./examples/Browser_as_a_tool.ipynb): Use a web browser for live and internal (intranet) web interactions
40 changes: 22 additions & 18 deletions quickstarts/Get_started_LiveAPI.ipynb
Expand Up @@ -57,15 +57,19 @@
"source": [
"**Preview**: The Live API is in preview.\n",
"\n",
"This notebook demonstrates simple usage of the Gemini 2.0 Multimodal Live API. For an overview of new capabilities refer to the [Gemini 2.0 docs](https://ai.google.dev/gemini-api/docs/models/gemini-v2).\n",
"This notebook demonstrates simple usage of the Gemini Multimodal Live API. For an overview of new capabilities refer to the [Gemini Live API docs](https://ai.google.dev/gemini-api/docs/live).\n",
"\n",
"This notebook implements a simple turn-based chat where you send messages as text, and the model replies with audio. The API is capable of much more than that. The goal here is to demonstrate with **simple code**.\n",
"\n",
"Some features of the API are not working in Colab, to try them it is recommended to have a look at this [python script](./Get_started_LiveAPI.py) and run it locally.\n",
"Some features of the API are not working in Colab, to try them it is recommended to have a look at this [Python script](./Get_started_LiveAPI.py) and run it locally.\n",
"\n",
"If you aren't looking for code, and just want to try multimedia streaming use [Live API in Google AI Studio](https://aistudio.google.com/app/live).\n",
"\n",
"The [Next steps](#next_steps) section at the end of this tutorial provides links to additional resources."
"The [Next steps](#next_steps) section at the end of this tutorial provides links to additional resources.\n",
"\n",
"#### Native audio output\n",
"\n",
"**Info**: Gemini 2.5 introduces [native audio generation](https://ai.google.dev/gemini-api/docs/live#native-audio-output), which generates audio output directly, providing more natural-sounding audio, more expressive voices, greater awareness of additional context (e.g., tone), and more proactive responses. You can try a native audio example in this [script](./Get_started_LiveAPI_NativeAudio.py)."
]
},
{
Expand All @@ -92,7 +96,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {
"id": "46zEFO2a9FFd"
},
Expand Down Expand Up @@ -123,7 +127,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {
"id": "A1pkoyZb9Jm3"
},
Expand All @@ -148,7 +152,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {
"id": "HghvVpbU0Uap"
},
Expand All @@ -172,7 +176,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {
"id": "27Fikag0xSaB"
},
Expand All @@ -194,7 +198,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {
"id": "Yd1vs3cP8EmS"
},
Expand Down Expand Up @@ -228,7 +232,7 @@
},
{
"cell_type": "code",
"execution_count": 37,
"execution_count": null,
"metadata": {
"id": "dDfslcyIOqgI"
},
Expand Down Expand Up @@ -284,7 +288,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {
"id": "7mEDGwJfLRrm"
},
Expand Down Expand Up @@ -312,7 +316,7 @@
},
{
"cell_type": "code",
"execution_count": 36,
"execution_count": null,
"metadata": {
"id": "VFD4VleVKj1-"
},
Expand Down Expand Up @@ -413,7 +417,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {
"id": "bWTaU8j-X3AJ"
},
Expand All @@ -436,7 +440,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {
"id": "3zAjMOZXFuxI"
},
Expand Down Expand Up @@ -579,7 +583,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {
"id": "WxdwgTKIGIlY"
},
Expand Down Expand Up @@ -669,7 +673,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": null,
"metadata": {
"id": "cbkoDa1ve_C5"
},
Expand Down Expand Up @@ -768,7 +772,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {
"id": "yqBTtKvGmKI4"
},
Expand Down Expand Up @@ -872,7 +876,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"metadata": {
"id": "Y5ZVUQ5vJrEJ"
},
Expand Down Expand Up @@ -906,7 +910,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": null,
"metadata": {
"id": "xH_iZhTxKFtF"
},
163 changes: 163 additions & 0 deletions quickstarts/Get_started_LiveAPI_NativeAudio.py
@@ -0,0 +1,163 @@
# -*- coding: utf-8 -*-
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
## Setup

To install the dependencies for this script, run:

```
brew install portaudio
pip install -U google-genai pyaudio
```

## API key

Ensure the `GOOGLE_API_KEY` environment variable is set to the API key
you obtained from Google AI Studio.

## Run

To run the script:

```
python Get_started_LiveAPI_NativeAudio.py
```

Start talking to Gemini
"""

import asyncio
import sys
import traceback

import pyaudio

from google import genai

if sys.version_info < (3, 11, 0):
    import taskgroup, exceptiongroup

    asyncio.TaskGroup = taskgroup.TaskGroup
    asyncio.ExceptionGroup = exceptiongroup.ExceptionGroup

FORMAT = pyaudio.paInt16
CHANNELS = 1
SEND_SAMPLE_RATE = 16000
RECEIVE_SAMPLE_RATE = 24000
CHUNK_SIZE = 1024

pya = pyaudio.PyAudio()


client = genai.Client() # GOOGLE_API_KEY must be set as env variable

MODEL = "gemini-2.5-flash-preview-native-audio-dialog"
CONFIG = {"response_modalities": ["AUDIO"]}


class AudioLoop:
    def __init__(self):
        self.audio_in_queue = None
        self.out_queue = None

        self.session = None

        self.audio_stream = None

        self.receive_audio_task = None
        self.play_audio_task = None

    async def listen_audio(self):
        # Open the default microphone and push raw PCM chunks to out_queue.
        mic_info = pya.get_default_input_device_info()
        self.audio_stream = await asyncio.to_thread(
            pya.open,
            format=FORMAT,
            channels=CHANNELS,
            rate=SEND_SAMPLE_RATE,
            input=True,
            input_device_index=mic_info["index"],
            frames_per_buffer=CHUNK_SIZE,
        )
        if __debug__:
            kwargs = {"exception_on_overflow": False}
        else:
            kwargs = {}
        while True:
            data = await asyncio.to_thread(self.audio_stream.read, CHUNK_SIZE, **kwargs)
            await self.out_queue.put({"data": data, "mime_type": "audio/pcm"})

    async def send_realtime(self):
        # Forward queued microphone chunks to the live session.
        while True:
            msg = await self.out_queue.get()
            await self.session.send_realtime_input(audio=msg)

    async def receive_audio(self):
        "Background task that reads from the websocket and writes PCM chunks to audio_in_queue for playback"
        while True:
            turn = self.session.receive()
            async for response in turn:
                if data := response.data:
                    self.audio_in_queue.put_nowait(data)
                    continue
                if text := response.text:
                    print(text, end="")

            # If you interrupt the model, it sends a turn_complete.
            # For interruptions to work, we need to stop playback.
            # So empty out the audio queue because it may have loaded
            # much more audio than has played yet.
            while not self.audio_in_queue.empty():
                self.audio_in_queue.get_nowait()

    async def play_audio(self):
        # Play received PCM chunks on the default output device.
        stream = await asyncio.to_thread(
            pya.open,
            format=FORMAT,
            channels=CHANNELS,
            rate=RECEIVE_SAMPLE_RATE,
            output=True,
        )
        while True:
            bytestream = await self.audio_in_queue.get()
            await asyncio.to_thread(stream.write, bytestream)

    async def run(self):
        try:
            async with (
                client.aio.live.connect(model=MODEL, config=CONFIG) as session,
                asyncio.TaskGroup() as tg,
            ):
                self.session = session

                self.audio_in_queue = asyncio.Queue()
                self.out_queue = asyncio.Queue(maxsize=5)

                tg.create_task(self.send_realtime())
                tg.create_task(self.listen_audio())
                tg.create_task(self.receive_audio())
                tg.create_task(self.play_audio())
        except asyncio.CancelledError:
            pass
        except ExceptionGroup as EG:
            if self.audio_stream:
                self.audio_stream.close()
            traceback.print_exception(EG)


if __name__ == "__main__":
    loop = AudioLoop()
    asyncio.run(loop.run())
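
Reviewer sketch (not part of this PR): the script's interruption handling depends on draining an `asyncio.Queue` once a turn ends, and its constants determine how much audio each microphone chunk carries. The snippet below illustrates both ideas in isolation, with no network or audio device involved. The helper names `chunk_duration_ms`, `chunk_bytes`, `drain_queue`, and `demo` are hypothetical and introduced only for illustration; the constants are copied from the script, and the 2-byte sample width is assumed from `pyaudio.paInt16` being 16-bit PCM.

```python
import asyncio

# Values copied from the script above.
SEND_SAMPLE_RATE = 16000  # Hz
CHUNK_SIZE = 1024         # frames per microphone read
CHANNELS = 1
BYTES_PER_SAMPLE = 2      # assumption: paInt16 means 16-bit samples


def chunk_duration_ms(frames: int, sample_rate: int) -> float:
    """Duration in milliseconds of a chunk holding `frames` frames."""
    return 1000.0 * frames / sample_rate


def chunk_bytes(frames: int, channels: int, bytes_per_sample: int) -> int:
    """Size in bytes of one raw PCM chunk."""
    return frames * channels * bytes_per_sample


def drain_queue(queue: asyncio.Queue) -> int:
    """Discard everything currently queued; return the number of dropped items."""
    dropped = 0
    while not queue.empty():
        queue.get_nowait()
        dropped += 1
    return dropped


async def demo() -> tuple:
    """Queue three silent chunks, then drain them as an interruption would."""
    queue = asyncio.Queue()
    silence = b"\x00" * chunk_bytes(CHUNK_SIZE, CHANNELS, BYTES_PER_SAMPLE)
    for _ in range(3):
        queue.put_nowait(silence)
    dropped = drain_queue(queue)
    return dropped, queue.qsize()


if __name__ == "__main__":
    print(chunk_duration_ms(CHUNK_SIZE, SEND_SAMPLE_RATE), "ms per chunk")
    print(asyncio.run(demo()))
```

Under these assumptions, each microphone read covers 64 ms of audio, so an `out_queue` with `maxsize=5` buffers at most about 320 ms of input before `listen_audio` has to wait for `send_realtime` to catch up.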
1 change: 1 addition & 0 deletions quickstarts/Get_started_LyriaRealTime.ipynb
Expand Up @@ -664,6 +664,7 @@
"# What's next?\n",
"\n",
"Now that you know how to generate music, here are other cool things to try:\n",
"* Instead of music, learn how to generate multi-speaker conversations using the [TTS models](./Get_started_TTS.ipynb),\n",
"* Discover how to generate [images](./Get_started_imagen.ipynb) or [videos](./Get_started_Veo.ipynb),\n",
"* Instead of generating music or audio, find out how Gemini can [understand audio files](./Audio.ipynb),\n",
"* Have a real-time conversation with Gemini using the [Live API](./Get_started_LiveAPI.ipynb)."