Copyright 2024 Google, LLC. This software is provided as-is,
without warranty or representation for any use or purpose. Your
use of it is subject to your agreement with Google.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

# Gemini 2.0 Chat Example

This notebook provides a simple example for interacting with Google's new Gemini 2.0 models using the unified Gen AI SDK. For more information please visit https://cloud.google.com/vertex-ai/generative-ai/docs/gemini-v2

Install the needed dependencies for the notebook.

In [None]:
!pip install -r requirements.txt

Import the Python libraries. Note the new "genai" library, which you can install using 'pip install google-genai'.

In [None]:
import asyncio
import base64
import contextlib
import datetime
import os
import json
import wave
import itertools

import nest_asyncio

from IPython.display import display, Audio, clear_output
from google import genai

Set your project variables. Change "YOUR_PROJECT_ID" to your GCP project ID.

In [None]:
project_id = "YOUR_PROJECT_ID"
location = "global"
region = "us-central1"
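
If you would rather not hard-code these values, the sketch below reads them from environment variables and falls back to the defaults used in this notebook. The helper name `load_project_settings` is hypothetical, and the use of `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` as variable names is an assumption about your environment rather than something this notebook requires.

```python
import os


def load_project_settings(env=None):
    """Return (project_id, region), preferring environment variables.

    Hypothetical helper: the variable names below are assumptions about
    how your environment is configured.
    """
    env = os.environ if env is None else env
    project_id = env.get("GOOGLE_CLOUD_PROJECT", "YOUR_PROJECT_ID")
    region = env.get("GOOGLE_CLOUD_LOCATION", "us-central1")
    if project_id == "YOUR_PROJECT_ID":
        # The placeholder was never replaced, so later API calls would fail.
        print("Warning: project ID is still the placeholder value.")
    return project_id, region


# Example with an explicit mapping instead of the real environment.
settings = load_project_settings({"GOOGLE_CLOUD_PROJECT": "demo-project"})
print(settings)
```

Passing a plain dictionary, as in the last lines, makes it easy to check the fallback behavior without touching your real environment.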

Initialize the client, specifying 'vertexai=True'. The new Google Gen AI SDK provides a unified interface to Gemini 2.0 through both the Gemini Developer API and the Gemini API on Vertex AI.

In [None]:
client = genai.Client(
    vertexai=True, project=project_id, location=region
)

Instead of running an interactive chat session, you can request a single-turn generated response. Uncomment the cell below to try it.

In [None]:
#response = client.models.generate_content(
#    model='gemini-2.0-flash-exp', contents='Hi Gemini!'
#)
#print(response.text)

## Start with a simple chat session

First, we'll select the model we want to use and then configure the session to return text-only responses.

In [None]:
model_id = "gemini-2.0-flash-exp"
config = {"response_modalities": ["TEXT"]}

We need nest_asyncio because we call asyncio.run from within the notebook's already-running event loop. asyncio.run starts a new event loop to run a coroutine and raises an error if a loop is already running in the current thread, which is always the case inside a notebook. You probably will not need this outside of a Jupyter notebook environment.

In [None]:
nest_asyncio.apply()
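
To see the problem that nest_asyncio works around, the minimal sketch below calls asyncio.run from inside a coroutine that is already running on an event loop, which is the situation a notebook cell is in. It uses only the standard library; the function names `inner` and `outer` are illustrative.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # At this point an event loop is already running, as in a notebook cell.
    coro = inner()
    try:
        asyncio.run(coro)  # Nested call: not allowed by default.
    except RuntimeError as exc:
        coro.close()  # Avoid a "coroutine was never awaited" warning.
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

Run as a plain script, without nest_asyncio applied, the nested call raises a RuntimeError and the message is printed instead of a result.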

Define an interactive text chat session using the Gen AI API.

In [None]:
async def interactive_chat(client, model_id, config):
    """
    Starts an interactive chat session with a specified model.

    Args:
        client: The client to use for communication.
        model_id: The ID of the model to interact with.
        config: The configuration to use for the session.
    """
    async with client.aio.live.connect(model=model_id, config=config) as session:
        while True:
            prompt = input("Enter your prompt here (or 'End' to quit): ")
            if prompt.lower() == "end":
                break

            await session.send(prompt, end_of_turn=True)
            print("> You: ", prompt)  # Print the user's prompt

            async for response in session.receive():
                if response.text:
                    print(response.text, end="", flush=True)
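
The send-then-receive loop above can be exercised without a network connection. The sketch below is an illustration only: `FakeSession` is a hypothetical stand-in that mimics just the `send` and `receive` calls used in this notebook, and `run_turns` replaces `input()` with a scripted list of prompts.

```python
import asyncio
from types import SimpleNamespace


class FakeSession:
    """Hypothetical stand-in for a live session; echoes the prompt in chunks."""

    def __init__(self):
        self._pending = ""

    async def send(self, prompt, end_of_turn=True):
        self._pending = prompt

    async def receive(self):
        # Yield the reply one word at a time, like streamed response chunks.
        for word in f"You said: {self._pending}".split():
            yield SimpleNamespace(text=word + " ")


async def run_turns(session, prompts):
    """Same control flow as interactive_chat, driven by a list of prompts."""
    transcript = []
    for prompt in prompts:
        if prompt.lower() == "end":
            break
        await session.send(prompt, end_of_turn=True)
        chunks = []
        async for response in session.receive():
            if response.text:
                chunks.append(response.text)
        transcript.append("".join(chunks).strip())
    return transcript


transcript = asyncio.run(run_turns(FakeSession(), ["Hi Gemini!", "End", "ignored"]))
print(transcript)
```

Each turn sends one prompt and then drains the response stream before asking for the next prompt, and the loop stops at the first 'End'.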

Start the chat session by running the interactive_chat function. Simply type 'End' to stop the session.

In [None]:
asyncio.run(interactive_chat(client, model_id, config))

 __________________________________________

That's how easy it is to use the new Gen AI SDK.

## Text to Audio

Next we'll change the response modality to Audio. This will allow Gemini 2.0 to respond using audio instead of text.

Define a context manager that creates a temporary local WAV file. This file will be overwritten on each turn of the conversation.

In [None]:
@contextlib.contextmanager
def wave_file(filename, channels=1, rate=24000, sample_width=2):
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        yield wf
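
As a quick offline check of this helper, the sketch below writes one second of a synthetic tone through the same context manager and reads the header back. The `write_test_tone` function, the 440 Hz tone, and the file name are illustrative assumptions; the format defaults (mono, 16-bit, 24000 Hz) are the ones defined above.

```python
import contextlib
import math
import os
import struct
import tempfile
import wave


@contextlib.contextmanager
def wave_file(filename, channels=1, rate=24000, sample_width=2):
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        yield wf


def write_test_tone(filename, seconds=1.0, freq_hz=440.0, rate=24000):
    """Write a sine tone as 16-bit little-endian PCM; return the frame count."""
    n_frames = int(seconds * rate)
    pcm = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n_frames)
    )
    with wave_file(filename, rate=rate) as wav:
        wav.writeframes(pcm)
    return n_frames


path = os.path.join(tempfile.gettempdir(), "tone_check.wav")
written = write_test_tone(path)

# Reopen the closed file and read back (channels, sample width, rate, frames).
with wave.open(path, "rb") as wf:
    info = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), wf.getnframes())
print(info)
```

The header is read only after the `with wave_file(...)` block has exited, because the wave module finalizes the frame count when the file is closed.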

Now set the response modality to audio in the session configuration.

In [None]:
config = {"generation_config": {"response_modalities": ["AUDIO"]}}

Define an interactive chat session with Gemini 2.0. We'll use the same format as our last interactive session, only this time Gemini will respond to us with audio instead of text.

In [None]:
async def interactive_chat_audio_response(client, model_id, config):
    """
    Starts an interactive chat session with audio responses.

    Args:
        client: The client to use for communication.
        model_id: The ID of the model to interact with.
        config: The configuration to use for the session.
    """
    async with client.aio.live.connect(model=model_id, config=config) as session:
        file_name = 'audio.wav'
        while True:
            prompt = input("Enter your prompt here (or 'End' to quit): ")
            if prompt.lower() == "end":
                break

            await session.send(prompt, end_of_turn=True)
            print("> You: ", prompt)  # Print the user's prompt

            first = True
            # Reopen the file each turn so it holds only the latest response.
            with wave_file(file_name) as wav:
                async for response in session.receive():
                    if response.data is not None:
                        model_turn = response.server_content.model_turn
                        mime_type = model_turn.parts[0].inline_data.mime_type
                        # Print the audio MIME type once per turn.
                        if first and mime_type.startswith('audio/pcm'):
                            print(mime_type)
                            first = False
                        print('.', end='')  # One progress dot per audio chunk
                        wav.writeframes(response.data)

            # Play the audio after the file is closed, so the WAV header is final.
            display(Audio(file_name, autoplay=True))
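
The function above inspects the MIME type of each audio chunk. As a hedged sketch, the helper below splits such a string into its base type and sample rate. The helper name `parse_audio_mime` is hypothetical, and the idea that the service may append a parameter such as `rate=24000` is an assumption; when no rate is present it falls back to the 24000 Hz default used by `wave_file`.

```python
def parse_audio_mime(mime_type, default_rate=24000):
    """Split a MIME string such as 'audio/pcm;rate=24000' into (base, rate).

    Hypothetical helper: the exact strings returned by the service are
    an assumption, so unknown or missing parameters are ignored.
    """
    base, *params = [part.strip() for part in mime_type.split(";")]
    rate = default_rate
    for param in params:
        key, _, value = param.partition("=")
        if key.strip().lower() == "rate" and value.strip().isdigit():
            rate = int(value)
    return base.lower(), rate


print(parse_audio_mime("audio/pcm;rate=24000"))
print(parse_audio_mime("audio/pcm"))
```

The parsed rate could then be passed to `wave_file(file_name, rate=rate)` so the WAV header matches the audio that was actually received.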

Start the chat session, only this time we'll run the interactive_chat_audio_response function. Just like before, simply type 'End' to stop the session.

In [None]:
asyncio.run(interactive_chat_audio_response(client, model_id, config))