google-cloud-bigquery: support column-level data policies (dataPolicies) in SchemaField #17617

Description

@shayaansaiyed

Feature request

SchemaField supports policy_tags (Data Catalog column-level security) but has no equivalent for column-level data policies (dataPolicies), the v2 data-masking / raw-data-access policies bound directly to a column. This makes it impossible to read, set, or modify a column's data policies through the ergonomic client — callers must drop down to the raw REST API (tables.get / tables.patch) plus DDL.

Background

The BigQuery REST API exposes dataPolicies on TableFieldSchema as List<DataPolicyOption>: "Optional. Data policies attached to this field, used for field-level access control" (see the tables resource reference). This is distinct from policyTags. The separate bigquery-datapolicies client manages the policy resources themselves, but not the binding of a policy to a column, which is a table-schema operation.

What's missing in SchemaField

In packages/google-cloud-bigquery/google/cloud/bigquery/schema.py:

  • __init__ accepts policy_tags but has no data_policies parameter.
  • There is a policy_tags property getter but no data_policies getter.
  • _key() does not account for data policies.

from_api_repr stores the whole API dict in _properties, so a pure read round-trip retains dataPolicies as an opaque key — but there is no supported way to read it (no getter) or to set/modify it (no constructor param / setter) without reaching into the private _properties.

Repro

from google.cloud import bigquery  # 3.38.0

f = bigquery.SchemaField("ssn", "STRING")
# No way to attach a data policy:
# bigquery.SchemaField("ssn", "STRING", data_policies=[...])  # unsupported
# f.data_policies  # AttributeError

# Reading an existing field:
client = bigquery.Client()
table = client.get_table("proj.ds.tbl")   # column has a data policy bound out-of-band
field = table.schema[0]
# field.data_policies -> no such attribute; only field._properties.get("dataPolicies") (private)
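Until a getter exists, one possible workaround is to read the policies through the public to_api_repr() method instead of the private _properties. The sketch below is only an illustration: read_data_policies and FakeField are hypothetical names, it assumes each dataPolicies entry is a dict with a "name" key, and it assumes to_api_repr() passes the dataPolicies key through unchanged (which the round-trip behaviour described above suggests, but which is not verified here). FakeField stands in for SchemaField so the snippet runs without the library installed.

```python
from typing import Any, Dict, List, Protocol


class HasApiRepr(Protocol):
    """Anything exposing to_api_repr(), for example bigquery.SchemaField."""

    def to_api_repr(self) -> Dict[str, Any]: ...


def read_data_policies(field: HasApiRepr) -> List[str]:
    """Return the data policy resource names found on a field, if any.

    Hypothetical helper. Assumes each "dataPolicies" entry is a dict
    carrying a "name" key; entries without one are skipped.
    """
    resource = field.to_api_repr()
    entries = resource.get("dataPolicies") or []
    return [entry["name"] for entry in entries if "name" in entry]


class FakeField:
    """Stand-in for SchemaField so this sketch runs without the library."""

    def __init__(self, resource: Dict[str, Any]) -> None:
        self._resource = resource

    def to_api_repr(self) -> Dict[str, Any]:
        return dict(self._resource)
```

With the real client, the call would presumably look like read_data_policies(table.schema[0]).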

Requested

Add data_policies to SchemaField mirroring policy_tags:

  • constructor param + property getter,
  • serialized into _properties["dataPolicies"],
  • included in _key() for equality/hashing,
  • unit coverage in packages/google-cloud-bigquery/tests/unit/test_schema.py.
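To make the request concrete, here is a rough, self-contained sketch of the behaviour being asked for. SketchSchemaField is a hypothetical stand-in, not the real SchemaField, and it covers only the pieces listed above: a data_policies constructor parameter, a property getter, storage under _properties["dataPolicies"], and inclusion in _key(). Representing each policy as a resource-name string wrapped in {"name": ...} is an assumption about DataPolicyOption; the actual implementation may prefer a dedicated class, as policy_tags does.

```python
import copy
from typing import Any, Dict, Iterable, Optional, Tuple


class SketchSchemaField:
    """Hypothetical, stripped-down stand-in for SchemaField."""

    def __init__(
        self,
        name: str,
        field_type: str,
        mode: str = "NULLABLE",
        data_policies: Optional[Iterable[str]] = None,
    ) -> None:
        self._properties: Dict[str, Any] = {
            "name": name,
            "type": field_type,
            "mode": mode,
        }
        if data_policies is not None:
            # Assumed wire shape: a list of {"name": <policy resource name>}.
            self._properties["dataPolicies"] = [{"name": p} for p in data_policies]

    @property
    def data_policies(self) -> Optional[Tuple[str, ...]]:
        """Policy resource names bound to this field, or None if unset."""
        raw = self._properties.get("dataPolicies")
        if raw is None:
            return None
        return tuple(entry["name"] for entry in raw)

    @classmethod
    def from_api_repr(cls, api_repr: Dict[str, Any]) -> "SketchSchemaField":
        field = cls(api_repr["name"], api_repr["type"])
        field._properties = copy.deepcopy(api_repr)
        field._properties.setdefault("mode", "NULLABLE")
        return field

    def to_api_repr(self) -> Dict[str, Any]:
        return copy.deepcopy(self._properties)

    def _key(self) -> Tuple[Any, ...]:
        # data_policies takes part in equality and hashing.
        return (
            self._properties["name"],
            self._properties["type"].upper(),
            self._properties["mode"].upper(),
            self.data_policies,
        )

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SketchSchemaField):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self) -> int:
        return hash(self._key())
```

Two fields that differ only in their bound policies compare unequal here, which is the property the _key() bullet is after.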

Related gotcha (worth documenting either way)

When manipulating dataPolicies via tables.patch/tables.update, an empty dataPolicies array is silently ignored (treated as "no change"), so a column's last data policy cannot be removed through the schema API; only DDL (ALTER TABLE ... ALTER COLUMN <col> SET OPTIONS (data_policies=[])) clears it. Non-empty updates (adding a policy, or reducing the list while at least one policy remains) do work via patch.
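A caller working around this today might branch on whether the desired policy list is empty. The following is a minimal sketch of that decision only; plan_data_policy_update is a hypothetical helper, the {"name": ...} entry shape is an assumption, and the backtick quoting of the table and column identifiers is this sketch's choice rather than something the API requires. It builds the payload but sends nothing.

```python
import copy
from typing import Any, Dict, Sequence, Tuple, Union


def plan_data_policy_update(
    table_id: str,
    field_resource: Dict[str, Any],
    policies: Sequence[str],
) -> Tuple[str, Union[Dict[str, Any], str]]:
    """Decide how to apply a desired policy list to one column.

    Returns ("patch", updated_field_resource) when at least one policy
    remains, or ("ddl", statement) when the list is empty, because an
    empty dataPolicies array is reported to be ignored by tables.patch.
    """
    if policies:
        updated = copy.deepcopy(field_resource)
        updated["dataPolicies"] = [{"name": p} for p in policies]
        # The caller still has to place this dict into the table's full
        # schema.fields list before patching.
        return "patch", updated

    column = field_resource["name"]
    statement = (
        f"ALTER TABLE `{table_id}` ALTER COLUMN `{column}` "
        "SET OPTIONS (data_policies=[])"
    )
    return "ddl", statement
```

The "ddl" branch returns a statement string that could then be run as a query; the "patch" branch leaves the original field dict untouched and returns a modified copy.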

Environment

  • google-cloud-bigquery==3.38.0 (also confirmed against the latest reference docs, which list the same SchemaField params — no data_policies).
