## Audio Transcription with Partial Ordering Mode in BigQuery DataFrames

This notebook demonstrates how to use the `audio_transcribe` method of the BigQuery DataFrames blob accessor in partial ordering mode. In partial ordering mode, BigQuery DataFrames does not maintain a total ordering over rows or a default sequential index, which can let it generate more efficient queries.

### 1. Setup

In [1]:
PROJECT = "bigframes-dev"  # Replace with your project.
# See https://cloud.google.com/bigquery/docs/multimodal-data-dataframes-tutorial#required_roles for the required permissions.

OUTPUT_BUCKET = "bigframes_blob_test"  # Replace with your GCS bucket.
# The connection (or the project's bigframes-default-connection) must have read/write permission on the bucket.
# See https://cloud.google.com/bigquery/docs/multimodal-data-dataframes-tutorial#grant-permissions to set up permissions for the connection's service account.
# This notebook uses bigframes-default-connection by default. You can also pass your own connection to each method.

import bigframes
# Setup project
bigframes.options.bigquery.project = PROJECT

# Display options
bigframes.options.display.blob_display_width = 300
bigframes.options.display.progress_bar = None

import bigframes.pandas as bpd

### 2. Set to Partial Ordering Mode

In [2]:
# ordering_mode is a BigQuery session option; set it before the first query starts the session.
bpd.options.bigquery.ordering_mode = "partial"
print(f"Ordering mode set to: {bpd.options.bigquery.ordering_mode}")

Ordering mode set to: partial
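
To see why an explicit sort matters once total ordering is relaxed, here is a small plain-Python sketch. It does not call BigQuery or bigframes. It assumes a hypothetical `fetch_rows` helper that returns the same rows in an arbitrary order on each call, the way an engine without a guaranteed row order may, and the row contents are made up for illustration.

```python
import random

# Hypothetical rows; the URIs are placeholders, not real objects.
ROWS = [
    {"uri": "gs://example-bucket/audio/a.mp3", "size": 120},
    {"uri": "gs://example-bucket/audio/b.mp3", "size": 340},
    {"uri": "gs://example-bucket/audio/c.mp3", "size": 95},
]


def fetch_rows(seed):
    """Return the rows in an arbitrary order (stand-in for an unordered scan)."""
    rows = list(ROWS)
    random.Random(seed).shuffle(rows)
    return rows


def first_by_size(rows):
    """Sort on an explicit key so that 'first' is well defined."""
    return sorted(rows, key=lambda r: r["size"])[0]


# Positional access on unordered results may differ between calls,
# but sorting on an explicit key gives the same answer every time.
smallest = {first_by_size(fetch_rows(seed))["uri"] for seed in range(10)}
print(smallest)
```

The same idea carries over loosely to partial ordering mode: operations that rely on row position generally need an explicit sort first.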


### 3. Prepare Audio Data

We will use a publicly available audio file stored in a Google Cloud Storage bucket. `from_glob_path` loads every object that matches the glob pattern into a DataFrame with a blob column, here named `audio`.

In [3]:
audio_gcs_path = "gs://bigframes_blob_test/audio/*"
df = bpd.from_glob_path(
    audio_gcs_path, name="audio"
)
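
As a rough local illustration of what the trailing `*` in the path selects, the sketch below matches a glob pattern against a list of object URIs with Python's `fnmatch`. The bucket and object names are placeholders, and BigQuery's own wildcard rules may differ in detail from `fnmatch`, so treat this as an approximation and not as the service's behavior.

```python
from fnmatch import fnmatchcase


def match_uris(pattern, uris):
    """Return the URIs that match a glob pattern, preserving input order."""
    return [uri for uri in uris if fnmatchcase(uri, pattern)]


# Placeholder object names for illustration only.
candidates = [
    "gs://example-bucket/audio/clip1.mp3",
    "gs://example-bucket/audio/clip2.wav",
    "gs://example-bucket/images/photo.jpg",
]

matched = match_uris("gs://example-bucket/audio/*", candidates)
print(matched)
```

Only the two objects under `audio/` match; the image under `images/` is skipped.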



### 4. Run Audio Transcription

Now we use the `audio_transcribe` method with the `gemini-2.0-flash-001` model to transcribe the audio file. With `verbose=False`, the result contains only the transcribed text. Because partial ordering mode was configured earlier, the operation runs in that mode.

In [4]:
transcribed_series = df['audio'].blob.audio_transcribe(model_name="gemini-2.0-flash-001", verbose=False)
transcribed_series



0    Now, as all books, not primarily intended as p...
Name: transcribed_content, dtype: string
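
The preview above truncates the long string with `...`. Once a transcript has been brought into local Python as a plain `str` (for example through the Series' `to_pandas()` method, which is not run here), the standard library's `textwrap` can make it easier to read. This is a minimal sketch: `format_transcript` is a hypothetical helper, and the sample text is a placeholder, not the actual transcription.

```python
import textwrap


def format_transcript(text, width=60):
    """Collapse runs of whitespace and wrap the transcript to a fixed width."""
    return textwrap.fill(" ".join(text.split()), width=width)


# Placeholder text standing in for a real transcript.
sample = "This is placeholder text standing in for a long transcript. " * 4
wrapped = format_transcript(sample)
print(wrapped)
```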