<a href="https://colab.research.google.com/github/jimmyliao/BwAI2025/blob/main/03_Get_started_OpenAI_Compatibility.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2025 Google LLC.

In [1]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Calling the Gemini API through OpenAI compatibility
<!-- ### Getting started with the Gemini API OpenAI compatibility -->

### Source: [Get_started_OpenAI_Compatibility.ipynb](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Get_started_OpenAI_Compatibility.ipynb)

### Modified by: [Jimmy Liao](https://g.dev/jimmyliao)


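This example shows how to use the [OpenAI Python library](https://github.com/openai/openai-python) to interact with the [Gemini API](https://ai.google.dev/gemini-api/docs).

This notebook walks you through the following step:

* Use multimodal input to transcribe a Chinese audio file to text and summarize it.

For more details about OpenAI compatibility, see the [Gemini API documentation](https://ai.google.dev/gemini-api/docs/openai).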


## Setup

### Install the required modules

While running this notebook, you will need to install the following requirements:
- The [OpenAI python library](https://pypi.org/project/openai/)

In [2]:
%pip install -U -q openai pillow pdf2image pdfminer.six

## Get your Gemini API key

You need a Gemini API key to run the cells in this notebook. You can generate one on the [Get API key](https://aistudio.google.com/app/apikey) page in AI Studio.

In [3]:
from openai import OpenAI


try:
  # On Google Colab, read the API key saved in Colab secrets
  from google.colab import userdata

  GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
except Exception:
  # Outside Colab (or if the secret is unavailable), enter your API key manually
  GOOGLE_API_KEY = "--enter-your-API-key-here--"

# OpenAI client
client = OpenAI(
    api_key=GOOGLE_API_KEY,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
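
When running outside Colab, leaving the placeholder string in place only fails later, at the first request. One possible alternative is to read the key from an environment variable and fail early. The sketch below is an assumption rather than part of the original notebook: `resolve_api_key` is a hypothetical helper, and the variable name `GOOGLE_API_KEY` simply mirrors the Colab secret name used above.

```python
import os


def resolve_api_key(env=None, placeholder="--enter-your-API-key-here--"):
    """Return the API key from a mapping (os.environ by default).

    Hypothetical helper: raises ValueError if the key is missing,
    empty, or still equal to the notebook's placeholder text.
    """
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key or key == placeholder:
        raise ValueError("GOOGLE_API_KEY is not set; generate one in AI Studio.")
    return key


# Demonstration with an explicit mapping, so no real key is required
print(resolve_api_key({"GOOGLE_API_KEY": "dummy-key-for-illustration"}))
```

The returned value could then be passed as `api_key` when constructing the `OpenAI` client.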

## List the available Gemini models

You can start by listing the available models using the OpenAI library.

In [4]:
models = client.models.list()
for model in models:
  if 'gemini-2.0' in model.id or 'gemini-2.5' in model.id:
    print(model.id)


models/gemini-2.5-pro-exp-03-25
models/gemini-2.0-flash-exp
models/gemini-2.0-flash
models/gemini-2.0-flash-001
models/gemini-2.0-flash-exp-image-generation
models/gemini-2.0-flash-lite-001
models/gemini-2.0-flash-lite
models/gemini-2.0-flash-lite-preview-02-05
models/gemini-2.0-flash-lite-preview
models/gemini-2.0-pro-exp
models/gemini-2.0-pro-exp-02-05
models/gemini-2.0-flash-thinking-exp-01-21
models/gemini-2.0-flash-thinking-exp
models/gemini-2.0-flash-thinking-exp-1219
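
The listing above returns IDs with a `models/` prefix, while the `MODEL` value set later in this notebook uses the bare name. A minimal sketch for filtering a list of IDs and stripping that prefix might look like this; `normalize_model_id` and `filter_models` are hypothetical helper names, and `models/some-other-model` is a made-up ID used only to show the filter.

```python
def normalize_model_id(model_id):
    """Strip a leading 'models/' prefix, if present."""
    prefix = "models/"
    return model_id[len(prefix):] if model_id.startswith(prefix) else model_id


def filter_models(model_ids, families=("gemini-2.0", "gemini-2.5")):
    """Keep IDs containing any of the family substrings, without the prefix."""
    return [
        normalize_model_id(m)
        for m in model_ids
        if any(family in m for family in families)
    ]


# Two IDs taken from the output above, plus one made-up ID
listed = [
    "models/gemini-2.5-pro-exp-03-25",
    "models/gemini-2.0-flash",
    "models/some-other-model",
]
print(filter_models(listed))
```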


## Define the Gemini model to be used

In this example, you will use the `gemini-2.5-pro-exp-03-25` model. For more details about the available models, check the [Gemini models](https://ai.google.dev/gemini-api/docs/models/gemini) page in the Gemini API documentation.

In [5]:
MODEL="gemini-2.5-pro-exp-03-25" # @param ['gemini-2.5-pro-exp-03-25']

## Multimodal interactions

Gemini models can process different data modalities, such as unstructured files, images, audio, and video. This lets you experiment with multimodal scenarios in which you ask the model to describe, explain, derive insights from, or extract information from the media included in your prompts. In this section you will work through scenarios that use multimedia input.

**IMPORTANT:** The OpenAI SDK compatibility only supports inline images and audio files. For video support, use the [Gemini API's Python SDK](https://ai.google.dev/gemini-api/docs/sdks).
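
"Inline" here means the file bytes travel inside the request body as base64 text. For images, the OpenAI chat format expresses this as an `image_url` content part whose URL is a `data:` URL. The sketch below shows one way to build such a part; `image_content_part` is a hypothetical helper, and the three bytes passed to it are only the JPEG magic number, not a real image.

```python
import base64


def image_content_part(image_bytes, mime_type="image/jpeg"):
    """Build an OpenAI-style inline image content part from raw bytes."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Placeholder bytes for illustration; a real call would read an image file
part = image_content_part(b"\xff\xd8\xff")
print(part["image_url"]["url"])  # data:image/jpeg;base64,/9j/
```

A part built this way would sit next to a `{"type": "text", ...}` part inside the `content` list of a user message.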

### Working with audio files

You can also send audio files in your prompt. Audio provides richer input than text alone and can be used for tasks such as transcription, or as a direct prompt, as with a voice assistant.

First you need to download the audio you want to use.

In [6]:
!pip install -q gdown

In [7]:
# from IPython.display import Audio


# audio_url = "https://storage.googleapis.com/generativeai-downloads/data/Apollo-11_Day-01-Highlights-10s.mp3" # @param
# audio_filename = audio_url.split("/")[-1]

# # download the audio
# !wget -q $audio_url

# # listen to the downloaded audio
# display(Audio(audio_filename, autoplay=False))

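#### You can place your audio file on Google Shared Drive
##### - Or use the sample Chinese audio file from this example directly
###### Source: Executive Yuan public service broadcast audio download area, [April 2025 (ROC year 114) monthly package of public service radio announcements](https://www.ey.gov.tw/Page/463789EEBA7377FC/a7b8daef-7cfc-4d03-88ca-a71e70f0d8d0) `性別平等教育日華語` (Gender Equity Education Day, Mandarin)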


In [8]:
from IPython.display import Audio, display
import gdown


# Shareable Google Drive link to the sample audio file
google_drive_url = "https://drive.google.com/file/d/1KHMjyFCmtqcs1oYThKra1S5AKuuhg9aW/view?usp=sharing" # @param

output_file = "test.mp3" # @param

# Download the file (this can take a moment). fuzzy=True lets gdown
# extract the file ID from a share link; the saved path is returned.
audio_filename = gdown.download(google_drive_url, output=output_file, fuzzy=True)

# listen to the downloaded audio
display(Audio(audio_filename, autoplay=False))

Downloading...
From: https://drive.google.com/uc?id=1KHMjyFCmtqcs1oYThKra1S5AKuuhg9aW
To: /content/test.mp3
100%|██████████| 1.20M/1.20M [00:00<00:00, 35.5MB/s]
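
A download can fail or produce an empty file, for example when the share link is not public, and the error then only surfaces at request time. A small sanity check before encoding can catch this earlier. The sketch below is an assumption, not part of the original notebook: `check_audio_file` is a hypothetical helper, and the 20 MB default is an illustrative limit that should be confirmed against the Gemini API documentation.

```python
import os
import tempfile


def check_audio_file(path, max_bytes=20 * 1024 * 1024):
    """Return the file size in bytes, or raise if the file is unusable.

    max_bytes is an assumed limit for inline data, not a documented value
    taken from this notebook.
    """
    if path is None or not os.path.isfile(path):
        raise FileNotFoundError(f"Audio file not found: {path!r}")
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError(f"Audio file is empty: {path!r}")
    if size > max_bytes:
        raise ValueError(f"Audio file is {size} bytes, above the {max_bytes} byte limit")
    return size


# Demonstration with a temporary stand-in file instead of a real download
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
    tmp.write(b"not real audio, just ten thousand bytes" * 250)
    demo_path = tmp.name

demo_size = check_audio_file(demo_path)
print(demo_size)
os.remove(demo_path)
```

In the notebook, the same check could be applied to `audio_filename` right after the download.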


Now you will encode the audio in `base64` and send it as part of your request prompt.

In [9]:
from IPython.display import Markdown
import base64

# define a helper function to encode the audio file in base64 format
def encode_audio(audio_path):
  with open(audio_path, 'rb') as audio_file:
    audio_content = audio_file.read()
    return base64.b64encode(audio_content).decode('utf-8')

base64_audio = encode_audio(audio_filename)

# Prompt (in Chinese): "Transcribe this audio file. After transcribing,
# tell me what it may be related to."
prompt = "轉錄這個音訊檔案。轉錄完成後，告訴我這可能與什麼相關。" # @param
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt,
                },
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": base64_audio,
                        "format": "mp3",
                    },
                },
            ],
        }
    ],
)

Markdown(response.choices[0].message.content)

Okay, here is the transcription of the audio file:

```
00:00 [男]: 你還記得去年
00:01 [男]: 4月20號性別平等教育日那天，
00:04 [男]: 我們有去看展覽嗎？
00:05 [女]: 記得啊，
00:06 [女]: 今年又有展覽嗎？
00:07 [男]: 有，
00:08 [男]: 今年的展覽是「元宇宙下的性別光影展：
00:11 [男]: 穿越虛實之境，反家啟程」。
00:14 [男]: 4月19號到5月11號，
00:15 [男]: 在國立台灣科學教育館七樓喔。
00:18 [女]: 哇，聽起來很吸引人欸。
00:19 [男]: 沒錯，
00:20 [男]: 有光影互動展區、
00:21 [男]: 手作活動、
00:22 [男]: 專題演講等等，
00:23 [男]: 帶領大家探討性別議題。
00:25 [女]: 那我們趕快揪朋友去看！
00:27 [旁白]: 以上廣告是由教育部提供。
```

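**This audio is likely related to the following:**

This is an **advertisement** sponsored by **Taiwan's Ministry of Education**, intended to **promote an exhibition titled 「元宇宙下的性別光影展：穿越虛實之境，反家啟程」**.

*   **Theme:** The exhibition focuses on **gender issues** and incorporates the emerging technology concept of the **metaverse**.
*   **Content:** The exhibition includes interactive light-and-shadow areas, hands-on craft activities, and keynote talks.
*   **Time and place:** The exhibition runs from April 19 to May 11 on the seventh floor of the National Taiwan Science Education Center.
*   **Background:** The dialogue mentions last year's "Gender Equity Education Day" (usually April 20), suggesting the exhibition may be connected to this commemorative day or to related educational outreach.
*   **Purpose:** To attract the public (especially people who may be interested in technology, social issues, or education) to visit, and to use the occasion to explore gender issues.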
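
The audio request above hard-codes the `mp3` format and builds the message inline. To reuse it with other files, the payload construction could be factored out roughly as follows. This is a sketch under stated assumptions: `audio_format_from_path` and `build_audio_messages` are hypothetical helpers, and the accepted formats are limited to `mp3` and `wav`, the two values defined for `input_audio` in the OpenAI chat format.

```python
import base64
import os


def audio_format_from_path(path):
    """Infer the input_audio format from the file extension."""
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    if ext not in ("mp3", "wav"):
        raise ValueError(f"Unsupported audio extension: {ext!r}")
    return ext


def build_audio_messages(prompt, audio_bytes, audio_format):
    """Build a one-turn user message with a text part and an inline audio part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": base64.b64encode(audio_bytes).decode("utf-8"),
                        "format": audio_format,
                    },
                },
            ],
        }
    ]


# Placeholder bytes for illustration; a real call would read the audio file
messages = build_audio_messages(
    "Transcribe this audio file.", b"abc", audio_format_from_path("test.mp3")
)
print(messages[0]["content"][1]["input_audio"]["format"])  # mp3
```

The resulting list could be passed as `messages` to `client.chat.completions.create` together with `model=MODEL`, as in the cell above.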