##### Copyright 2024 Google LLC.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemini 2.0 - Multimodal live API: Tool use

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Get_started_LiveAPI_tools.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

This notebook provides examples of how to use tools with the Multimodal Live API in [Gemini 2.0](https://ai.google.dev/gemini-api/docs/models/gemini-v2).

The API provides Google Search, Code Execution, and Function Calling tools. Earlier Gemini models supported versions of these tools. The biggest change with Gemini 2.0 (in the Live API) is that essentially all of the tools are handled through Code Execution. As a result, you can use **multiple tools** in a single API call, and the model can use multiple tools in a single code execution block.

This tutorial assumes you are familiar with the Live API, as described in [this tutorial](../quickstarts/Get_started_LiveAPI.ipynb).

## Setup

### Install SDK

The new **[Google Gen AI SDK](https://ai.google.dev/gemini-api/docs/sdks)** provides programmatic access to Gemini 2.0 (and previous models) using both the [Google AI for Developers](https://ai.google.dev/gemini-api/docs) and [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/overview) APIs. With a few exceptions, code that runs on one platform will run on both. This means that you can prototype an application using the Developer API and then migrate the application to Vertex AI without rewriting your code.

You can find more details about this new SDK in the [documentation](https://ai.google.dev/gemini-api/docs/sdks) or in the [Getting started](../gemini-2/get_started.ipynb) notebook.

In [2]:
# %pip install -U -q google-genai

### Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](../quickstarts/Authentication.ipynb) for an example.

In [3]:
# from google.colab import userdata
# import os

# os.environ['GOOGLE_API_KEY']=userdata.get('GOOGLE_API_KEY')

### Initialize SDK client

The client will pick up your API key from the `GOOGLE_API_KEY` environment variable.
To use the Live API, you need to set the client's API version to `v1alpha`.

In [1]:
from google import genai

client = genai.Client(http_options={'api_version': 'v1alpha'})

### Select a model

The Multimodal Live API is a new capability introduced with the [Gemini 2.0](https://ai.google.dev/gemini-api/docs/models/gemini-v2) model. It won't work with previous generation models.

In [2]:
model_name = "gemini-2.0-flash-exp"

### Imports

In [3]:
import asyncio
import contextlib
import json
import wave

from IPython import display


from google import genai
from google.genai import types

### Utilities

You're going to use the Live API's audio output. The easiest way to hear it in Colab is to write the `PCM` data out as a `WAV` file:

In [4]:
@contextlib.contextmanager
def wave_file(filename, channels=1, rate=24000, sample_width=2):
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        yield wf
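As a quick offline check of the helper, the sketch below writes a short synthetic tone through `wave_file` and reads the header back. The `sine_pcm` function, the tone parameters, and the temporary file name are illustrative assumptions; the only thing taken from the notebook is the helper and its defaults (mono, 16-bit samples, 24000 Hz).

```python
import contextlib
import math
import os
import struct
import tempfile
import wave


@contextlib.contextmanager
def wave_file(filename, channels=1, rate=24000, sample_width=2):
    # Same helper as above, repeated so this sketch runs on its own.
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        yield wf


def sine_pcm(freq_hz=440.0, seconds=0.5, rate=24000):
    """Return 16-bit little-endian mono PCM bytes for a sine tone (hypothetical helper)."""
    n_samples = int(seconds * rate)
    samples = (
        int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


pcm = sine_pcm()
path = os.path.join(tempfile.gettempdir(), "wave_file_demo.wav")
with wave_file(path) as wf:
    wf.writeframes(pcm)

# Read the header back to confirm the parameters the helper set.
with wave.open(path, "rb") as rf:
    wav_info = (rf.getnchannels(), rf.getsampwidth(), rf.getframerate(), rf.getnframes())
print(wav_info)
```

Raw PCM has no header, so a player cannot know the sample rate or width unless they are written into the `WAV` container, which is what the three `set*` calls do.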

Use a logger so it's easier to switch debugging messages on and off.

In [5]:
import logging
logger = logging.getLogger('Live')
#logger.setLevel('DEBUG')  # Switch between "INFO" and "DEBUG" to toggle debug messages.
logger.setLevel('INFO')
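The effect of the level switch can be seen without calling the API. The sketch below uses a separate, hypothetical logger name (`LiveDemo`) and an in-memory handler so that it does not interfere with the `Live` logger. Note that a logger with no handler anywhere in its hierarchy falls back to Python's last-resort handler, which only emits `WARNING` and above, so if debug messages don't appear in your environment, attaching a handler (for example with `logging.basicConfig()`) may be needed.

```python
import io
import logging

# Hypothetical logger name, kept separate from the tutorial's 'Live' logger.
demo_logger = logging.getLogger("LiveDemo")
demo_logger.propagate = False  # keep the demo output out of the root logger

stream = io.StringIO()
demo_logger.addHandler(logging.StreamHandler(stream))

demo_logger.setLevel("INFO")
demo_logger.debug("raw response 1")  # dropped: DEBUG is below INFO

demo_logger.setLevel("DEBUG")
demo_logger.debug("raw response 2")  # emitted

captured = stream.getvalue().splitlines()
print(captured)
```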

## Get started

Most of the Live API setup is similar to the [starter tutorial](../gemini-2/live_api_starter.ipynb). Since this tutorial doesn't focus on the real-time interactivity of the API, the code has been simplified: it uses the Live API, but it only sends a single text prompt and listens for a single turn of replies.

You can set `modality="AUDIO"` on any of the examples to get the spoken version of the output.

In [6]:
n = 0
async def run(prompt, modality="TEXT", tools=None):
  global n
  if tools is None:
    tools=[]

  config = {
          "tools": tools,
          "response_modalities": [modality]}

  async with client.aio.live.connect(model=model_name, config=config) as session:
    display.display(display.Markdown(prompt))
    display.display(display.Markdown('-------------------------------'))
    await session.send(input=prompt, end_of_turn=True)

    audio = False
    filename = f'audio_{n}.wav'
    with wave_file(filename) as wf:
      async for response in session.receive():
        logger.debug(str(response))
        if text:=response.text:
          display.display(display.Markdown(text))
          continue

        if data:=response.data:
          print('.', end='')
          wf.writeframes(data)
          audio = True
          continue

        server_content = response.server_content
        if server_content is not None:
          handle_server_content(wf, server_content)
          continue

        tool_call = response.tool_call
        if tool_call is not None:
          await handle_tool_call(session, tool_call)


  if audio:
    display.display(display.Audio(filename, autoplay=True))
    n = n+1

Since this tutorial demonstrates several tools, you'll need more code to handle the different types of objects the API returns.

- The `code_execution` tool can return `executable_code` and `code_execution_result` parts.
- The `google_search` tool may attach a `grounding_metadata` object.

In [7]:
def handle_server_content(wf, server_content):
  model_turn = server_content.model_turn
  if model_turn:
    for part in model_turn.parts:
      executable_code = part.executable_code
      if executable_code is not None:
        display.display(display.Markdown('-------------------------------'))
        display.display(display.Markdown(f'``` python\n{executable_code.code}\n```'))
        display.display(display.Markdown('-------------------------------'))

      code_execution_result = part.code_execution_result
      if code_execution_result is not None:
        display.display(display.Markdown('-------------------------------'))
        display.display(display.Markdown(f'```\n{code_execution_result.output}\n```'))
        display.display(display.Markdown('-------------------------------'))

  grounding_metadata = getattr(server_content, 'grounding_metadata', None)
  if grounding_metadata is not None:
    display.display(
        display.HTML(grounding_metadata.search_entry_point.rendered_content))

  return
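To experiment with this handler's branching without a live session, one option is to feed it stand-in objects. The sketch below is a display-free variant under that assumption: `summarize_server_content` is a hypothetical helper, and the `SimpleNamespace` objects only mimic the attribute names used above (`model_turn`, `parts`, `executable_code`, `code_execution_result`, `grounding_metadata`); they are not SDK types.

```python
from types import SimpleNamespace


def summarize_server_content(server_content):
    """Return (kind, payload) tuples for the tool-related parts of a message."""
    events = []
    model_turn = getattr(server_content, "model_turn", None)
    if model_turn:
        for part in model_turn.parts:
            code = getattr(part, "executable_code", None)
            if code is not None:
                events.append(("code", code.code))
            result = getattr(part, "code_execution_result", None)
            if result is not None:
                events.append(("result", result.output))
    if getattr(server_content, "grounding_metadata", None) is not None:
        events.append(("grounding", None))
    return events


# Stand-in message: one part carrying code, one carrying its result.
fake = SimpleNamespace(
    model_turn=SimpleNamespace(parts=[
        SimpleNamespace(executable_code=SimpleNamespace(code="print(1 + 1)"),
                        code_execution_result=None),
        SimpleNamespace(executable_code=None,
                        code_execution_result=SimpleNamespace(output="2\n")),
    ]),
    grounding_metadata=None,
)
events = summarize_server_content(fake)
print(events)
```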

- Finally, with the `function_declarations` tool, the API may return `tool_call` objects. To keep this code minimal, the `tool_call` handler just replies to every function call with a response of `"ok"`.

In [8]:
async def handle_tool_call(session, tool_call):
  for fc in tool_call.function_calls:
    tool_response = types.LiveClientToolResponse(
        function_responses=[types.FunctionResponse(
            name=fc.name,
            id=fc.id,
            response={'result':'ok'},
        )]
    )

    print('\n>>> ', tool_response)
    await session.send(input=tool_response)
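In a real application you would probably run something instead of always answering `"ok"`. The following is a minimal sketch of one way to do that, assuming a name-to-callable registry. The `HANDLERS` table, the `lights` state, and `build_function_responses` are all hypothetical, and the responses are plain dicts so the example runs without the SDK; in the notebook each one would be wrapped in `types.FunctionResponse` as shown above.

```python
from types import SimpleNamespace

lights = {"on": False}  # hypothetical device state


def turn_on_the_lights():
    lights["on"] = True
    return {"result": "lights on"}


def turn_off_the_lights():
    lights["on"] = False
    return {"result": "lights off"}


# Registry mapping declared function names to local Python callables.
HANDLERS = {
    "turn_on_the_lights": turn_on_the_lights,
    "turn_off_the_lights": turn_off_the_lights,
}


def build_function_responses(function_calls):
    """Run each requested function and return one plain-dict response per call."""
    responses = []
    for fc in function_calls:
        handler = HANDLERS.get(fc.name)
        if handler is None:
            payload = {"error": f"unknown function: {fc.name}"}
        else:
            # `args` is assumed to be a dict of keyword arguments (or None).
            payload = handler(**(getattr(fc, "args", None) or {}))
        responses.append({"name": fc.name, "id": fc.id, "response": payload})
    return responses


# Stand-in function calls; real ones arrive in `tool_call.function_calls`.
calls = [
    SimpleNamespace(name="turn_on_the_lights", id="call-1", args={}),
    SimpleNamespace(name="dim_the_lights", id="call-2", args={}),
]
responses = build_function_responses(calls)
print(responses)
```

Echoing the call's `id` back matters because it lets the API match each response to the call that produced it.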

Try running it for the first time:

In [9]:
await run(prompt="Hello?", tools=None, modality = "TEXT")



Hello?

-------------------------------

Hello

 there! How can I help you today?


## Simple function call

The function calling feature of the API can handle a wide variety of functions. Support in the SDK is still under construction, so keep this simple and send a minimal function definition: just the function's name.

Note that in the Live API, function calls are independent of the chat turns. The conversation can continue while a function call is being processed.

In [13]:
turn_on_the_lights = {'name': 'turn_on_the_lights'}
turn_off_the_lights = {'name': 'turn_off_the_lights'}
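For comparison, a declaration can also carry a description and a parameter schema. The sketch below assumes the Gemini API's usual function declaration fields (`name`, `description`, `parameters`); `set_light_brightness` is a hypothetical function that is not used elsewhere in this tutorial, and since SDK support was still evolving when this notebook was written, treat it as illustrative rather than guaranteed to work with the Live API.

```python
turn_on_the_lights = {'name': 'turn_on_the_lights'}
turn_off_the_lights = {'name': 'turn_off_the_lights'}

# Hypothetical function, shown only to illustrate a fuller declaration.
set_light_brightness = {
    'name': 'set_light_brightness',
    'description': 'Set the brightness of the lights as a percentage.',
    'parameters': {
        'type': 'OBJECT',
        'properties': {
            'percent': {
                'type': 'INTEGER',
                'description': 'Brightness from 0 to 100.',
            },
        },
        'required': ['percent'],
    },
}

tools = [{
    'function_declarations': [
        turn_on_the_lights,
        turn_off_the_lights,
        set_light_brightness,
    ]
}]

# Names must be unique so each tool call can be routed unambiguously.
declared_names = [d['name'] for d in tools[0]['function_declarations']]
print(declared_names)
```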

In [14]:
prompt = "Turn on the lights"

tools = [
    {'function_declarations': [turn_on_the_lights, turn_off_the_lights]}
]

await run(prompt, tools=tools, modality = "TEXT")

Turn on the lights

-------------------------------

-------------------------------

``` python
print(default_api.turn_on_the_lights())

```

-------------------------------


>>>  function_responses=[FunctionResponse(id='function-call-5599993286997869208', name='turn_on_the_lights', response={'result': 'ok'})]


-------------------------------

```
{'result': 'ok'}

```

-------------------------------

OK




## Code execution

The `code_execution` tool lets the model write and run Python code. Try it on a math problem the model can't solve from memory:

In [15]:
prompt="Can you compute the largest prime palindrome under 100000."

tools = [
    {'code_execution': {}}
]

await run(prompt, tools=tools, modality="TEXT")

Can you compute the largest prime palindrome under 100000.

-------------------------------

Okay

, I understand. I need to find the largest prime palindrome that is less than

 100,000. Here's how I'll

 approach this:

1. **Generate Palindromes:** I'll start by generating palindromes in descending order, starting from the largest possible palindrome under 

100,000 (which is 99999), down to the smallest (which is 2).

2. **Prim

ality Test:** For each generated palindrome, I'll perform a primality test to see if it is a prime.

3. **Return Largest:** The first palindrome that is also a prime will be the answer.

Here's

 the implementation using python.



-------------------------------

``` python
import sympy

def is_palindrome(n):
    return str(n) == str(n)[::-1]

def find_largest_prime_palindrome(limit):
    for i in range(limit - 1, 1, -1):
        if is_palindrome(i):
            if sympy.isprime(i):
                return i
    return None

limit = 100000
largest_prime_palindrome = find_largest_prime_palindrome(limit)
print(f'{largest_prime_palindrome=}')

```

-------------------------------




-------------------------------

```
largest_prime_palindrome=98689

```

-------------------------------

The

 largest prime palindrome under 100,000 is 98

689.
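If you want to double-check the model's answer locally, the same search can be written without `sympy`. This is an independent sketch using plain trial division, not the code the model ran; the function names are chosen here for illustration.

```python
def is_palindrome(n: int) -> bool:
    s = str(n)
    return s == s[::-1]


def is_prime(n: int) -> bool:
    """Trial division by odd numbers up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def largest_prime_palindrome_below(limit: int):
    # Scan downward so the first hit is the largest one.
    for candidate in range(limit - 1, 1, -1):
        if is_palindrome(candidate) and is_prime(candidate):
            return candidate
    return None


answer = largest_prime_palindrome_below(100_000)
print(answer)
```

The result should agree with the value the model printed above.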


## Compositional Function Calling

Compositional function calling refers to the ability to combine user-defined functions with the `code_execution` tool. The model writes the function calls into larger blocks of code, then pauses execution while it waits for you to send back a response for each call.


In [16]:
prompt="Can you turn on the lights wait 10s and then turn them off?"

tools = [
    {'code_execution': {}},
    {'function_declarations': [turn_on_the_lights, turn_off_the_lights]}
]

await run(prompt, tools=tools, modality="TEXT")

Can you turn on the lights wait 10s and then turn them off?

-------------------------------

-------------------------------

``` python
import time

default_api.turn_on_the_lights()
time.sleep(10)
default_api.turn_off_the_lights()

```

-------------------------------


>>>  function_responses=[FunctionResponse(id='function-call-4932162932389447579', name='turn_on_the_lights', response={'result': 'ok'})]

>>>  function_responses=[FunctionResponse(id='function-call-15573996242429362886', name='turn_off_the_lights', response={'result': 'ok'})]


```tool_outputs
{}
```
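To get a feel for the order in which calls arrive, you can mimic the generated block locally. The sketch below is only a stand-in: `default_api` is the name the model's code uses in the output above, but `RecordingApi`, `always_ok`, and the recorded `sleeps` list are hypothetical and do not reproduce the real sandbox, which pauses at each call until the client sends a response.

```python
class RecordingApi:
    """Local stand-in for the sandbox's `default_api`; records each call."""

    def __init__(self, responder):
        self._responder = responder
        self.calls = []

    def __getattr__(self, name):
        # Only reached for names that are not real attributes,
        # so every unknown method becomes a recorded function call.
        def call(**kwargs):
            self.calls.append(name)
            # In the real flow, execution would pause here until the client replies.
            return self._responder(name, kwargs)
        return call


def always_ok(name, args):
    # Mirrors the tutorial's handler, which answers every call with "ok".
    return {'result': 'ok'}


default_api = RecordingApi(always_ok)
sleeps = []

# Same shape as the model-written block, with the sleep recorded instead of run.
default_api.turn_on_the_lights()
sleeps.append(10)
result = default_api.turn_off_the_lights()

print(default_api.calls, sleeps, result)
```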




## Google search

The `google_search` tool lets the model conduct Google searches. For example, try asking it about events that are too recent to be in the training data.

The search will still execute in `AUDIO` mode, but you won't see the detailed results:

In [17]:
prompt="Can you use google search tell me about the largest earthquake in california the week of Dec 5 2024?"

tools = [
   {'google_search': {}}
]

await run(prompt, tools=tools, modality="TEXT")

Can you use google search tell me about the largest earthquake in california the week of Dec 5 2024?

-------------------------------

-------------------------------

``` python
print(google_search.search(queries=["largest earthquake california week of December 5 2024", "California earthquakes December 2024"]))

```

-------------------------------




-------------------------------

```
Looking up information on Google Search.

```

-------------------------------

The

 largest earthquake in California during the week of December 5, 202

4, was a magnitude 7.0 that occurred offshore of Cape Mendoc

ino on December 5, 2024, at 10:44 a.m. local time. Here's a breakdown of

 what's known about the earthquake:

**Key Details:**

*   **Magnitude:** 7.0
*   **Date:** December 5

, 2024
*   **Time:** 10:44 a.m. local time (18:44 UTC)
*   **Location:** Offshore of Cape Mendocino, approximately 7

0 km (about 40-50 miles) southwest of Ferndale, in Northern California
*   **Depth:** About 10 km (6 miles)
*   **Type:** Strike-slip earthquake, where tectonic

 plates slide past each other horizontally.
*   **Tectonic Setting:** The earthquake occurred near the Mendocino Triple Junction, where the Pacific, North America, and Juan de Fuca/Gorda plates meet.

**Impact and Aftermath:**
*   **Tsunami Warning:** A tsunami warning was issued

 for parts of the California and Oregon coast but was lifted about an hour later as no major tsunami occurred. The warning extended from Santa Cruz in central California to central Oregon.
*   **ShakeAlert:** People throughout Oregon and California received alerts on their phones via the ShakeAlert system.
*   **Felt Area

:** The earthquake was felt from San Jose, north to Grants Pass, Oregon and as far inland as Sacramento.
*   **Aftershocks:** A series of aftershocks followed the main earthquake, clustered near the epicenter. Some extended eastward towards the coast and southward, with at least 15 aftershocks

 east of the main rupture zone.
* **Other Seismic Activity:** A magnitude 5.8 earthquake struck near Yerington, Nevada, on December 9, 2024. Also there were two foreshocks with magnitude 4.4 and 4.2 occurring hours before the main shock

.
*   **Damage:** The earthquake caused little immediate damage due to its remote location.
*   **Largest Event in the Region:** This was the largest earthquake to impact the North Coast since the 1992 magnitude 7.2 Cape Mendocino earthquake.
*   **Seismic Gap

:** The earthquake occurred in a previously identified seismic gap along the Mendocino Fault.

**Additional Notes:**

*   The earthquake occurred in a seismically active region, so it was not entirely unexpected.
*   The event has prompted further analysis of the area and its potential impact on the San Andreas Fault.


*   The event did not trigger a tsunami due to the nature of a strike-slip earthquake not causing vertical movement.

In summary, the magnitude 7.0 earthquake off the coast of Cape Mendocino on December 5, 2024, was the largest earthquake in California during that week

, and while it triggered a brief tsunami warning and was widely felt, it caused little damage due to its remote location.


## Multi-tool


The biggest difference with the new API, however, is that you're no longer limited to one tool per request. Try combining the tasks from the previous sections:

In [18]:
prompt = """\
  Hey, I need you to do three things for me.

  1. Then compute the largest prime plaindrome under 100000.
  2. Then use google search to lookup unformation about the largest earthquake in california the week of Dec 5 2024?
  3. Turn on the lights

  Thanks!
  """

tools = [
    {'google_search': {}},
    {'code_execution': {}},
    {'function_declarations': [turn_on_the_lights, turn_off_the_lights]}
]

await run(prompt, tools=tools, modality="TEXT")

  Hey, I need you to do three things for me.

  1. Then compute the largest prime plaindrome under 100000.
  2. Then use google search to lookup unformation about the largest earthquake in california the week of Dec 5 2024?
  3. Turn on the lights

  Thanks!
  

-------------------------------

Okay

, I can help with that. Let's break this down into steps.



First, I'll compute the largest prime palindrome under 100

,000.


-------------------------------

``` python
def is_palindrome(n):
    return str(n) == str(n)[::-1]

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

largest_prime_palindrome = 0
for i in range(99999, 1, -1):
    if is_palindrome(i) and is_prime(i):
        largest_prime_palindrome = i
        break

print(largest_prime_palindrome)

```

-------------------------------

-------------------------------

```
98689

```

-------------------------------

Okay

, the largest prime palindrome under 100,000 is 

98689.

Now, let's search for information about

 the largest earthquake in California the week of December 5, 2024.


-------------------------------

``` python
concise_search("largest earthquake california week of December 5 2024", max_num_results=3)

```

-------------------------------




-------------------------------

```
Looking up information on Google Search.

```

-------------------------------

Based

 on the search results, the largest earthquake in California the week of December 5

, 2024 was a magnitude 7.0 earthquake that occurred

 offshore of Cape Mendocino on December 5th at 10:44 AM local time. It triggered a tsunami warning which was later lifted.

 The quake was felt throughout much of California and Oregon.

Finally, I'll turn on the lights.


-------------------------------

``` python
default_api.turn_on_the_lights()

```

-------------------------------





>>>  function_responses=[FunctionResponse(id='function-call-6019797377698983616', name='turn_on_the_lights', response={'result': 'ok'})]


```tool_outputs
{'message': 'The lights are now on.'}
```


Okay, the lights are now on. I've completed all three

 of your requests: calculated the largest prime palindrome under 100,000 (98689), searched for information on the largest earthquake

 in California the week of Dec 5th 2024, and turned on the lights. Is there anything else I can do for you?


## Next Steps

- For more information about the SDK, see the [SDK docs](https://googleapis.github.io/python-genai/).
- This tutorial uses the high-level SDK. If you're interested in the lower-level details, try the [Websocket version of this tutorial](../gemini-2/websockets/live_api_tool_use.ipynb).
- This tutorial only covers _basic_ usage of these tools. For a deeper (and more fun) example, see the [Search tool tutorial](./Search_Grounding.ipynb).

Or check out the other Gemini 2.0 capabilities in the [Cookbook](../gemini-2/), in particular this other [multi-tool](../examples/LiveAPI_plotting_and_mapping.ipynb) example and the one about Gemini [spatial capabilities](../quickstarts/Spatial_understanding.ipynb).