##### Copyright 2024 Google LLC.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemini 2.0 - Multimodal live API: Tool use

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/gemini-2/live_api_tool_use.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

This notebook shows how to use tools with the Multimodal Live API and [Gemini 2.0](https://ai.google.dev/gemini-api/docs/models/gemini-v2).

The API provides Google Search, Code Execution, and Function Calling tools. Earlier Gemini models supported versions of these tools. The biggest change with Gemini 2.0 (in the Live API) is that essentially all the tools are handled by Code Execution. As a result, you can use **multiple tools** in a single API call, and the model can use multiple tools in a single code execution block.

This tutorial assumes you are familiar with the Live API, as described in [this tutorial](https://github.com/google-gemini/cookbook/blob/main/gemini-2/live_api_starter.ipynb).

## Setup

### Install SDK

The new **[Google Gen AI SDK](https://ai.google.dev/gemini-api/docs/sdks)** provides programmatic access to Gemini 2.0 (and previous models) using both the [Google AI for Developers](https://ai.google.dev/gemini-api/docs) and [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/overview) APIs. With a few exceptions, code that runs on one platform will run on both. This means that you can prototype an application using the Developer API and then migrate the application to Vertex AI without rewriting your code.

More details about this new SDK are available in the [documentation](https://ai.google.dev/gemini-api/docs/sdks) or in the [Getting started](../gemini-2/get_started.ipynb) notebook.

In [1]:
!pip install -U -q google-genai

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/110.3 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━[0m [32m102.4/110.3 kB[0m [31m3.8 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m110.3/110.3 kB[0m [31m2.5 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/168.2 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m168.2/168.2 kB[0m [31m6.9 MB/s[0m eta [36m0:00:00[0m
[?25h

### Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](../quickstarts/Authentication.ipynb) for an example.

In [2]:
from google.colab import userdata
import os

os.environ['GOOGLE_API_KEY']=userdata.get('GOOGLE_API_KEY')
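
Outside Colab, `google.colab` is not importable, so the cell above fails. A minimal sketch of a fallback, assuming that in that case the key is exported as a regular `GOOGLE_API_KEY` environment variable; `load_api_key` is a hypothetical helper, not part of the SDK:

```python
import os


def load_api_key(name="GOOGLE_API_KEY"):
    """Return the API key from a Colab Secret, or from the environment."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        return userdata.get(name)

    # Assumption: outside Colab the key was exported as an environment variable.
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key
```

You could then write `os.environ['GOOGLE_API_KEY'] = load_api_key()` in place of the direct `userdata.get` call.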

### Initialize SDK client

The client will pick up your API key from the environment variable.
To use the Live API, you need to set the client's API version to `v1alpha`.

In [3]:
from google import genai

client = genai.Client(http_options= {
      'api_version': 'v1alpha'
})

### Select a model

The Multimodal Live API is a new capability introduced with the [Gemini 2.0](https://ai.google.dev/gemini-api/docs/models/gemini-v2) model. It won't work with previous-generation models.

In [4]:
model_name = "gemini-2.0-flash-exp"

### Imports

In [5]:
import asyncio
import contextlib
import json
import wave

from IPython import display

from google import genai
from google.genai import types

### Utilities

You're going to use the Live API's audio output. The easiest way to hear it in Colab is to write the `PCM` data out as a `WAV` file:

In [6]:
@contextlib.contextmanager
def wave_file(filename, channels=1, rate=24000, sample_width=2):
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        yield wf
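
The helper's defaults describe 16-bit mono PCM at 24 kHz. To see what the helper does without calling the API, here is a small sketch that writes a synthetic sine tone and reads the header back. The helper is repeated so the example is self-contained; `sine_pcm` and the file name `tone_demo.wav` are made up for this illustration:

```python
import contextlib
import math
import struct
import wave


@contextlib.contextmanager
def wave_file(filename, channels=1, rate=24000, sample_width=2):
    # Same helper as above, repeated so this example runs on its own.
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        yield wf


def sine_pcm(freq_hz=440.0, seconds=0.5, rate=24000, amplitude=0.3):
    """Return 16-bit little-endian mono PCM bytes for a sine tone."""
    n_samples = int(seconds * rate)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


with wave_file("tone_demo.wav") as wf:
    wf.writeframes(sine_pcm())

# Read the file back to confirm the header matches what was written.
with wave.open("tone_demo.wav", "rb") as rf:
    params = (rf.getnchannels(), rf.getsampwidth(), rf.getframerate(), rf.getnframes())

print(params)  # (1, 2, 24000, 12000)
```

If the audio you receive used a different sample rate or width, the file would still be written but would play back at the wrong speed or as noise, so the header values need to match the data.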

Use a logger so it's easier to switch on/off debugging messages.

In [7]:
import logging
logger = logging.getLogger('Live')
#logger.setLevel('DEBUG')  # Switch between "INFO" and "DEBUG" to toggle debug messages.
logger.setLevel('INFO')
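
A short sketch of what the level switch does. It uses a separate, hypothetical logger name (`LiveDemo`) and an explicit handler so the result is visible regardless of how logging is configured in your environment; without any handler configured, Python's last-resort handler only emits warnings and above.

```python
import io
import logging

demo_logger = logging.getLogger("LiveDemo")  # hypothetical name for this demo
stream = io.StringIO()
demo_logger.addHandler(logging.StreamHandler(stream))
demo_logger.propagate = False  # keep the demo output away from the root logger

demo_logger.setLevel("INFO")
demo_logger.debug("hidden at INFO")  # dropped: DEBUG is below INFO

demo_logger.setLevel("DEBUG")
demo_logger.debug("visible at DEBUG")  # emitted

print(stream.getvalue())
```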

## Get started

Most of the Live API setup is similar to the [starter tutorial](../gemini-2/live_api_starter.ipynb). Since this tutorial doesn't focus on the realtime interactivity of the API, the code has been simplified: it uses the Live API, but only sends a single text prompt and listens for a single turn of replies.

In [8]:
n = 0
async def run(prompt, modality='TEXT', tools=None):
  global n
  if tools is None:
    tools=[]

  config = {
          "tools": tools,
          "generation_config": {
              "response_modalities": [modality]}}

  async with client.aio.live.connect(model=model_name, config=config) as session:
    display.display(display.Markdown(prompt))
    display.display(display.Markdown('-------------------------------'))
    await session.send(prompt, end_of_turn=True)

    audio = False
    filename = f'audio_{n}.wav'
    with wave_file(filename) as wf:
      async for response in session.receive():
        logger.debug(str(response))

        server_content = response.server_content
        if server_content is not None:
          a = handle_server_content(wf, server_content)
          audio = audio or a

        tool_call = response.tool_call
        if tool_call is not None:
          await handle_tool_call(session, tool_call)


  if audio:
    display.display(display.Audio(filename, autoplay=True))
    n = n+1

Since this tutorial demonstrates several tools, you'll need more code to handle the different types of objects the API returns.

- The `code_execution` tool can return `executable_code` and `code_execution_result` parts.
- The `google_search` tool may attach a `grounding_metadata` object.

In [11]:
def handle_server_content(wf, server_content):
  audio = False
  model_turn = server_content.model_turn
  if model_turn:
    for part in model_turn.parts:
      text = part.text
      if text is not None:
        display.display(display.Markdown(text))

      inline_data = part.inline_data
      if inline_data is not None:
        print('.', end='')
        pcm_data = inline_data.data
        wf.writeframes(pcm_data)
        audio = True

      executable_code = part.executable_code
      if executable_code is not None:
        display.display(display.Markdown('-------------------------------'))
        display.display(display.Markdown(f'``` python\n{executable_code.code}\n```'))
        display.display(display.Markdown('-------------------------------'))

      code_execution_result = part.code_execution_result
      if code_execution_result is not None:
        display.display(display.Markdown('-------------------------------'))
        display.display(display.Markdown(f'```\n{code_execution_result.output}\n```'))
        display.display(display.Markdown('-------------------------------'))

  grounding_metadata = getattr(server_content, 'grounding_metadata', None)
  if grounding_metadata is not None:
    display.display(
        display.HTML(grounding_metadata.search_entry_point.rendered_content))

  return audio

- Finally, with the `function_declarations` tool, the API may return `tool_call` objects. To keep this code minimal, the `tool_call` handler just replies to every function call with a response of `"ok"`.

In [10]:
async def handle_tool_call(session, tool_call):
  for fc in tool_call.function_calls:
    tool_response = types.LiveClientToolResponse(
        function_responses=[types.FunctionResponse(
            name=fc.name,
            id=fc.id,
            response={'result':'ok'},
        )]
    )

    print('>>> ', tool_response)
    await session.send(tool_response)
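
In a real application you would probably route each call to a Python function instead of always replying `"ok"`. Below is a minimal sketch of that routing. It uses a plain stand-in object rather than the SDK's function call type so it runs without a session; `FakeFunctionCall`, `FUNCTIONS`, and `dispatch` are hypothetical names, and the `args` field is assumed to mirror the arguments the model supplies.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFunctionCall:
    """Stand-in with the fields the handler above reads, plus assumed `args`."""
    name: str
    id: str
    args: dict = field(default_factory=dict)


lights = {"on": False}


def turn_on_the_lights():
    lights["on"] = True
    return {"result": "lights on"}


def turn_off_the_lights():
    lights["on"] = False
    return {"result": "lights off"}


# Map declared function names to local implementations.
FUNCTIONS = {
    "turn_on_the_lights": turn_on_the_lights,
    "turn_off_the_lights": turn_off_the_lights,
}


def dispatch(fc):
    """Run the matching local function and build a response payload."""
    func = FUNCTIONS.get(fc.name)
    if func is None:
        result = {"error": f"unknown function: {fc.name}"}
    else:
        result = func(**fc.args)
    return {"name": fc.name, "id": fc.id, "response": result}


print(dispatch(FakeFunctionCall(name="turn_on_the_lights", id="call-1")))
```

Inside `handle_tool_call`, the three fields of the returned dict would go into `types.FunctionResponse(name=..., id=..., response=...)` in place of the fixed `{'result': 'ok'}`.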

Try running it for the first time:

In [None]:
await run(prompt="Hello?", tools=None, modality = "TEXT")

Hello?

-------------------------------

Hello

 there! How can I help you today?
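
The reply shows up in pieces ("Hello", then " there! How can I help you today?") because `handle_server_content` renders each streamed part as soon as it arrives. If you would rather display one block per turn, one option is to collect the text first. A minimal sketch, using `SimpleNamespace` objects as stand-ins for the response parts (they are not SDK types), with `collect_text` as a hypothetical helper:

```python
from types import SimpleNamespace


def collect_text(parts):
    """Join the text of streamed parts, skipping parts that carry no text."""
    return "".join(p.text for p in parts if getattr(p, "text", None) is not None)


# Stand-ins for two streamed text parts and one audio-only part.
parts = [
    SimpleNamespace(text="Hello"),
    SimpleNamespace(text=None, inline_data=b"\x00\x00"),
    SimpleNamespace(text=" there! How can I help you today?"),
]

print(collect_text(parts))  # Hello there! How can I help you today?
```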


## Simple function call

The function calling feature of the API can handle a wide variety of functions. Support in the SDK is still under construction, so to keep this simple, just send a minimal function definition: only the function's name.

Note that in the Live API, function calls are independent of the chat turns. The conversation can continue while a function call is being processed.

In [None]:
turn_on_the_lights = {'name': 'turn_on_the_lights'}
turn_off_the_lights = {'name': 'turn_off_the_lights'}
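
For comparison, a fuller declaration usually also carries a description and a parameter schema. The sketch below assumes the Live API accepts the same `description` and `parameters` fields as regular Gemini function calling, which this notebook does not verify; `set_light_brightness` and its `level` parameter are hypothetical.

```python
# Hypothetical declaration with one required integer argument.
set_light_brightness = {
    "name": "set_light_brightness",
    "description": "Set the brightness of the lights.",
    "parameters": {
        "type": "OBJECT",
        "properties": {
            "level": {
                "type": "INTEGER",
                "description": "Brightness from 0 (off) to 100 (full).",
            },
        },
        "required": ["level"],
    },
}

# Declarations are grouped under `function_declarations`, as in the cells below.
tools_with_params = [{"function_declarations": [set_light_brightness]}]
```

Note that the `handle_tool_call` function above ignores any arguments, so it would need to read them from each function call before a declaration like this became useful.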

In [None]:
prompt = "Turn on the lights"

tools = [
    {'function_declarations': [turn_on_the_lights, turn_off_the_lights]}
]

await run(prompt, tools=tools, modality = "TEXT")

Turn on the lights

-------------------------------

-------------------------------

``` python
print(default_api.turn_on_the_lights())

```

-------------------------------

>>>  function_responses=[FunctionResponse(id='function-call-9277519295107782492', name='turn_on_the_lights', response={'result': 'ok'})]


-------------------------------

```
{'result': 'ok'}

```

-------------------------------

OK




## Code execution

The `code_execution` tool lets the model write and run Python code. Try it on a math problem the model can't solve from memory:

In [None]:
prompt="What is the largest prime palindrome under 100000."

tools = [
    {'code_execution': {}}
]

await run(prompt, tools=tools, modality='TEXT')

What is the largest prime palindrome under 100000.

-------------------------------

Okay

, I understand. You're asking for the largest prime number that is also

 a palindrome (reads the same forwards and backward) and is less than 1

00,000.

Here's my plan:

1. **Generate Palindromes:** I'll need to create a list of

 palindromic numbers under 100,000. I'll start from the top, and work my way down since I need the *

largest*.
2. **Check for Primality:** I'll then test each of these palindromes for primality.
3. **Return the Largest Prime:** The largest prime palindrome encountered will be the answer.

Let's

 start by generating and checking the numbers using python.



-------------------------------

``` python
def is_palindrome(n):
    return str(n) == str(n)[::-1]

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

largest_prime_palindrome = 0
for i in range(99999, 1, -1):
    if is_palindrome(i):
        if is_prime(i):
          largest_prime_palindrome = i
          break

print(f'{largest_prime_palindrome=}')

```

-------------------------------

-------------------------------

```
largest_prime_palindrome=98689

```

-------------------------------

Okay

, I have found the largest prime palindrome under 100,00

0.

The code generated a list of palindromes by checking every number from

 99999 downwards. It then tested each of the numbers to see if it was prime. The first prime palindrome found going downwards was 9

8689, so that is the largest one.

Therefore, the answer is 98689.


## Compositional Function Calling

Compositional function calling refers to the ability to combine user-defined functions with the `code_execution` tool. The model writes the function calls into larger blocks of code, and execution pauses at each call while it waits for you to send back a response.


In [None]:
prompt="Can you turn on the lights wait 10s and then turn them off?"

tools = [
    {'code_execution': {}},
    {'function_declarations': [turn_on_the_lights, turn_off_the_lights]}
]

await run(prompt, tools=tools, modality='TEXT')

Can you turn on the lights wait 10s and then turn them off?

-------------------------------

-------------------------------

``` python
import time
default_api.turn_on_the_lights()
time.sleep(10)
default_api.turn_off_the_lights()

```

-------------------------------

>>>  function_responses=[FunctionResponse(id='function-call-6184362039141552989', name='turn_on_the_lights', response={'result': 'ok'})]
>>>  function_responses=[FunctionResponse(id='function-call-17968678311537051', name='turn_off_the_lights', response={'result': 'ok'})]
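
The two `>>>` lines show the handler answering one call at a time while the generated code waits. The toy model below illustrates that pause-and-resume pattern with plain `asyncio`. It is only an analogy with hypothetical names (`call_tool`, `generated_code`, `client`), not a description of how the service is implemented.

```python
import asyncio


async def main():
    events = []
    pending = asyncio.Queue()

    async def call_tool(name):
        # Hand the call to the client and suspend until it answers.
        reply = asyncio.get_running_loop().create_future()
        await pending.put((name, reply))
        return await reply

    async def generated_code():
        # Plays the role of the code block the model wrote above.
        events.append(("result", await call_tool("turn_on_the_lights")))
        await asyncio.sleep(0.01)  # shortened stand-in for time.sleep(10)
        events.append(("result", await call_tool("turn_off_the_lights")))

    async def client():
        # Plays the role of handle_tool_call: answer each call with "ok".
        for _ in range(2):
            name, reply = await pending.get()
            events.append(("call", name))
            reply.set_result({"result": "ok"})

    await asyncio.gather(generated_code(), client())
    return events


# In a notebook, use `events = await main()` instead of asyncio.run().
events = asyncio.run(main())
for event in events:
    print(event)
```

Each `result` entry appears only after the matching `call` entry, which is the same ordering the notebook output shows: the second function call is not issued until the first has been answered and the wait has elapsed.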


## Google search

The `google_search` tool lets the model conduct Google searches. For example, try asking it about events that are too recent to be in the training data.

The search still executes in `AUDIO` mode, but you don't see the detailed results, so switch to `TEXT` mode to see the full output:

In [None]:
prompt="Can you use google search tell me about the largest earthquake in california the week of Dec 5 2024?"

tools = [
   {'google_search': {}}
]

await run(prompt, tools=tools, modality='TEXT')

Can you use google search tell me about the largest earthquake in california the week of Dec 5 2024?

-------------------------------

-------------------------------

``` python
print(google_search.search(queries=["largest earthquake in California week of December 5 2024", "California earthquakes week of December 5 2024"]))

```

-------------------------------




-------------------------------

```
Looking up information on Google Search.

```

-------------------------------

The

 largest earthquake in California during the week of December 5, 202

4, was a magnitude 7.0 that occurred offshore of Cape Mendoc

ino on December 5th, 2024 at 10:44 a.m. PT. The earthquake was located approximately 6

0 miles southwest of Ferndale, California and about 45 miles southwest of Eureka.

Here's a summary of what happened:

*   **

Magnitude:** The earthquake was measured at a magnitude of 7.0, making it a major seismic event.
*   **Location:** It was centered offshore, about 60 miles west of Ferndale in Humboldt County, Northern California

. This region is where three tectonic plates meet, making it one of the most seismically active areas in California.
*   **Tsunami Warning:** A tsunami warning was issued for the coast of Northern California and Southern Oregon following the quake

. The warning extended from Davenport, California, to south of Florence, Oregon and included more than 4.6 million people. People were urged to move to higher ground, and Oregon State Parks closed access to its state park beaches.
*   **Tsunami Cancelled:** Fortunately, the tsunami warning was cancelled within

 a couple hours as no significant waves were reported.
*   **Aftershocks:** There were numerous aftershocks, including several that were magnitude 4.0 and greater. The strongest aftershock was a magnitude 4.7.
*   **Impact:** The earthquake caused some shaking as far south as

 the Bay Area. There were reports of minor damage including broken windows, ruptured water pipes, and items knocked off store shelves.

This earthquake was the strongest in the region since 2005 when a magnitude 7.2 earthquake occurred.


## Multi-tool


The biggest difference with the new API, however, is that you're no longer limited to one tool per request. Try combining the tasks from the previous sections:

In [None]:
prompt = """\
  Hey, I need you to do three things for me.

  1. Then compute the largest prime plaindrome under 100000.
  2. Then use google search to lookup unformation about the largest earthquake in california the week of Dec 5 2024?
  3. Turn on the lights

  Thanks!
  """

tools = [
    {'google_search': {}},
    {'code_execution': {}},
    {'function_declarations': [turn_on_the_lights, turn_off_the_lights]}
]

await run(prompt, tools=tools, modality="TEXT")

  Hey, I need you to do three things for me.

  1. Then compute the largest prime plaindrome under 100000.
  2. Then use google search to lookup unformation about the largest earthquake in california the week of Dec 5 2024?
  3. Turn on the lights

  Thanks!
  

-------------------------------

Okay

, I will perform those tasks for you. First, let's find the

 largest prime palindrome under 100000.


-------------------------------

``` python
def is_palindrome(n):
  return str(n) == str(n)[::-1]

def is_prime(n):
  if n <= 1:
    return False
  if n <= 3:
    return True
  if n % 2 == 0 or n % 3 == 0:
    return False
  i = 5
  while i * i <= n:
    if n % i == 0 or n % (i + 2) == 0:
      return False
    i += 6
  return True

largest_prime_palindrome = 0
for i in range(100000 - 1, 1, -1):
  if is_palindrome(i) and is_prime(i):
    largest_prime_palindrome = i
    break

print(largest_prime_palindrome)

```

-------------------------------

-------------------------------

```
98689

```

-------------------------------

Okay

, the largest prime palindrome under 100000 is 9

8689.

Next, I will search for the largest earthquake in

 California the week of December 5, 2024.


-------------------------------

``` python
concise_search("largest earthquake California week of December 5 2024", max_num_results=3)

```

-------------------------------

-------------------------------

```
Looking up information on Google Search.

```

-------------------------------

Based

 on the search results, it appears that a magnitude 7.0 earthquake occurred

 off the coast of Cape Mendocino, California on December 5, 

2024. This earthquake triggered a brief tsunami warning for Northern California and Southern Oregon, which was later cancelled. There is also mention of a preliminary magnitude

 6.6 quake in some reports but the 7.0 appears to be more accurate for the largest.

Finally, I will turn on the lights

.


-------------------------------

``` python
default_api.turn_on_the_lights()

```

-------------------------------

>>>  function_responses=[FunctionResponse(id='function-call-16138398244891385862', name='turn_on_the_lights', response={'result': 'ok'})]


## Next Steps

- For more information about the SDK, see the [SDK docs](https://googleapis.github.io/python-genai/).
- This tutorial uses the high-level SDK. If you're interested in the lower-level details, try the [Websocket version of this tutorial](../gemini-2/websocket/search_tool.ipynb).
- This tutorial only covers _basic_ usage of these tools. For deeper (and more fun) examples, see the [Search tool tutorial](../gemini-2/search_tool.ipynb).

Or check out the other Gemini 2.0 capabilities in the [Cookbook](https://github.com/google-gemini/cookbook/blob/main/gemini-2/), in particular this other [multi-tool](../gemini-2/plotting_and_mapping.ipynb) example and the one about Gemini [spatial capabilities](../gemini-2/spatial_understanding.ipynb).