In [1]:
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Struct Data Types

In BigQuery, a [STRUCT](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#struct_type) (also known as a `record`) is an ordered collection of fields, each with a required data type and an optional field name. BigQuery DataFrames maps BigQuery `STRUCT` types to the pandas equivalent, `pandas.ArrowDtype(pa.struct())`.

This notebook illustrates how to work with `STRUCT` columns in BigQuery DataFrames. First, let's import the required packages and perform the necessary setup below.

In [2]:
import bigframes.pandas as bpd
import bigframes.bigquery as bbq
import pandas as pd
import pyarrow as pa

In [3]:
REGION = "US"  # @param {type: "string"}

bpd.options.display.progress_bar = None
bpd.options.bigquery.location = REGION

## Create DataFrames with struct columns

**Example 1: Creating from a list of objects**

In [4]:
names = ["Alice", "Bob", "Charlie"]
addresses = [
    {'City': 'New York', 'State': 'NY'},
    {'City': 'San Francisco', 'State': 'CA'},
    {'City': 'Seattle', 'State': 'WA'}
]
df = bpd.DataFrame({'Name': names, 'Address': addresses})
df

Unnamed: 0,Name,Address
0,Alice,"{'City': 'New York', 'State': 'NY'}"
1,Bob,"{'City': 'San Francisco', 'State': 'CA'}"
2,Charlie,"{'City': 'Seattle', 'State': 'WA'}"


In [5]:
df.dtypes

Name                                    string[pyarrow]
Address    struct<City: string, State: string>[pyarrow]
dtype: object

**Example 2: Defining schema explicitly**

In [6]:
bpd.Series(
    data=addresses, 
    dtype=bpd.ArrowDtype(pa.struct([('City', pa.string()), ('State', pa.string())]))
)

0         {'City': 'New York', 'State': 'NY'}
1    {'City': 'San Francisco', 'State': 'CA'}
2          {'City': 'Seattle', 'State': 'WA'}
dtype: struct<City: string, State: string>[pyarrow]

**Example 3: Reading from a source**

In [7]:
bpd.read_gbq("bigquery-public-data.ml_datasets.credit_card_default", max_results=5)["predicted_default_payment_next_month"]

0    [{'tables': {'score': 0.9349926710128784, 'val...
1    [{'tables': {'score': 0.9690881371498108, 'val...
2    [{'tables': {'score': 0.8667634129524231, 'val...
3    [{'tables': {'score': 0.9351968765258789, 'val...
4    [{'tables': {'score': 0.8572560548782349, 'val...
Name: predicted_default_payment_next_month, dtype: list<item: struct<tables: struct<score: double, value: string>>>[pyarrow]

## Operate on `STRUCT` data

BigQuery DataFrames provides three main approaches for operating on `STRUCT` data:

1. **[The `Series.struct` accessor](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.operations.structs.StructAccessor)**: Provides pandas-like methods for manipulating a single `STRUCT` column.
2. **The `DataFrame.struct` accessor**: Provides pandas-like methods for manipulating the `STRUCT` columns of a DataFrame.
3. **[BigQuery built-in functions](https://cloud.google.com/bigquery/docs/reference/standard-sql/array_functions)**: Functions that mirror BigQuery SQL operations, available through the `bigframes.bigquery` module (abbreviated as `bbq` below), such as [`struct`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery#bigframes_bigquery_struct).

### View Data Types of Struct Fields

In [8]:
df['Address'].struct.dtypes()

City     string[pyarrow]
State    string[pyarrow]
dtype: object

### Access a Struct Field by Name

In [9]:
df['Address'].struct.field("City")

0         New York
1    San Francisco
2          Seattle
Name: City, dtype: string

### Extract Struct Fields into a DataFrame

**Example 1: Using Series `.struct` accessor**

In [10]:
df['Address'].struct.explode()

Unnamed: 0,City,State
0,New York,NY
1,San Francisco,CA
2,Seattle,WA


**Example 2: Using DataFrame `.struct` accessor while keeping other columns**

In [11]:
df.struct.explode("Address")

Unnamed: 0,Name,Address.City,Address.State
0,Alice,New York,NY
1,Bob,San Francisco,CA
2,Charlie,Seattle,WA
