fix(bigquery): Prefer query over table in get_table_query_string#6360

Merged
ntkathole merged 2 commits into feast-dev:master from Jwrede:fix/bigquery-source-query-priority
May 3, 2026

Conversation

@Jwrede
Contributor

@Jwrede Jwrede commented May 2, 2026

Summary

Fixes #6200

When both table and query are set on a BigQuerySource, get_table_query_string() silently ignores query and always returns the table reference. This makes it impossible to use a custom read query (e.g., for deduplication via QUALIFY) on a PushSource, since PushSource requires table for offline writes.

Root cause: get_table_query_string() checks if self.table first — since table is always truthy when set, query is never reached.

Fix: Invert the priority — prefer query when present (it's more specific and intentionally provided), fall back to table. The write path (offline_write_batch()) accesses .table directly and is unaffected.

Also applies the same fix to get_table_column_names_and_types() so schema inference uses the query when both are set, matching the actual read path.
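The priority inversion described above can be sketched as follows. This is a minimal illustration, not the Feast source: `SourceSketch`, its two methods, the table name, and the column names in the query are all hypothetical, and the backtick wrapping of the table reference is an assumption (the commit message only confirms that the query is wrapped in parentheses).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceSketch:
    """Hypothetical stand-in for BigQuerySource; not the Feast class."""

    table: Optional[str] = None
    query: Optional[str] = None

    def query_string_before(self) -> str:
        # Old priority: table wins whenever it is set, so query is never reached.
        if self.table:
            return f"`{self.table}`"  # backtick quoting is an assumption
        return f"({self.query})"

    def query_string_after(self) -> str:
        # New priority: an explicitly provided query wins; table is the fallback.
        if self.query:
            return f"({self.query})"
        return f"`{self.table}`"


# Hypothetical deduplicating read query; column names are made up.
DEDUP_QUERY = (
    "SELECT * FROM `project.dataset.events` WHERE TRUE "
    "QUALIFY ROW_NUMBER() OVER "
    "(PARTITION BY entity_id ORDER BY event_timestamp DESC) = 1"
)

both = SourceSketch(table="project.dataset.events", query=DEDUP_QUERY)
```

With both fields set, `both.query_string_before()` yields the bare table reference, while `both.query_string_after()` yields the parenthesized deduplicating query. `both.table` is untouched either way, which is why a write path that reads `.table` directly would not change.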

Changes

  • bigquery_source.py: Swap condition order in get_table_query_string() and get_table_column_names_and_types() to prefer query over table
  • test_bigquery.py: Add 4 unit tests covering table-only, query-only, both-set, and write-path-unaffected scenarios
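As a rough idea of what the four scenarios check, here is a sketch written against a hypothetical stand-in class rather than the real `BigQuerySource`. The class name, test names, and sample values are assumptions and do not reproduce the tests added in test_bigquery.py.

```python
from typing import Optional


class SourceSketch:
    """Hypothetical stand-in for BigQuerySource, used only to illustrate the scenarios."""

    def __init__(self, table: Optional[str] = None, query: Optional[str] = None):
        self.table = table
        self.query = query

    def get_table_query_string(self) -> str:
        # Read path after the fix: query first, table as fallback.
        if self.query:
            return f"({self.query})"
        return f"`{self.table}`"  # backtick quoting is an assumption


def test_table_only():
    assert SourceSketch(table="p.d.t").get_table_query_string() == "`p.d.t`"


def test_query_only():
    assert SourceSketch(query="SELECT 1").get_table_query_string() == "(SELECT 1)"


def test_both_set_prefers_query():
    src = SourceSketch(table="p.d.t", query="SELECT 1")
    assert src.get_table_query_string() == "(SELECT 1)"


def test_write_path_still_sees_table():
    # A write path that reads .table directly is unaffected by the read priority.
    src = SourceSketch(table="p.d.t", query="SELECT 1")
    assert src.table == "p.d.t"
```

The functions follow pytest naming so they could be collected as-is, but they can also be called directly.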

Test plan

  • All 11 existing + new unit tests pass (pytest sdk/python/tests/unit/infra/offline_stores/test_bigquery.py)
  • Integration test with BigQuery PushSource (requires GCP credentials)

@Jwrede Jwrede requested review from a team and sudohainguyen as code owners May 2, 2026 22:40
@ntkathole ntkathole changed the title fix(bigquery): prefer query over table in get_table_query_string fix(bigquery): Prefer query over table in get_table_query_string May 3, 2026
@Jwrede Jwrede force-pushed the fix/bigquery-source-query-priority branch from fc290f7 to af72355 Compare May 3, 2026 05:33
@Jwrede
Contributor Author

Jwrede commented May 3, 2026

The failing test (test_online_write_batch_async_skip_dedup_single_pipeline in test_redis.py) is unrelated to this PR — it's an event-loop issue introduced in 2e50da0 that affects macOS and Python 3.12. My changes only touch bigquery_source.py and test_bigquery.py.

The 3.10 and 3.11 Ubuntu runs pass cleanly.

@ntkathole
Member

@Jwrede __init__ in bigquery_source.py says "Exactly one of 'table' and 'query' must be specified"?

@Jwrede
Contributor Author

Jwrede commented May 3, 2026

Good catch, the docstring says "Exactly one of 'table' and 'query' must be specified" but the actual validation (line 67) only enforces at least one:

if table is None and query is None:
    raise ValueError('No "table" or "query" argument provided.')

It has never rejected both being set. I'll push an update to fix the docstring to match reality.
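To make the gap between the docstring and the validation concrete, here is a small sketch contrasting the two rules. `at_least_one` reproduces the quoted check; `exactly_one` is a hypothetical stricter check that the old docstring implied but, per the comment above, the code never performed. Both function names are illustrative only.

```python
from typing import Optional


def at_least_one(table: Optional[str], query: Optional[str]) -> None:
    # Same condition as the quoted validation: only the "neither" case is rejected.
    if table is None and query is None:
        raise ValueError('No "table" or "query" argument provided.')


def exactly_one(table: Optional[str], query: Optional[str]) -> None:
    # Hypothetical stricter rule: rejects "neither" and "both".
    if (table is None) == (query is None):
        raise ValueError("Exactly one of 'table' and 'query' must be specified.")


def passes(check, table: Optional[str], query: Optional[str]) -> bool:
    """Return True if the given check accepts the arguments."""
    try:
        check(table, query)
        return True
    except ValueError:
        return False
```

Under these definitions, `passes(at_least_one, "p.d.t", "SELECT 1")` is true while `passes(exactly_one, "p.d.t", "SELECT 1")` is false, which is the case this PR relies on.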

Jwrede added 2 commits May 3, 2026 06:03
When both `table` and `query` are set on a BigQuerySource,
`get_table_query_string()` now returns the query (wrapped in parens)
instead of the table reference. This allows PushSource users to
provide a custom read query (e.g. for deduplication) while keeping
`table` for offline writes via `offline_write_batch()`.

Also applies the same priority inversion to
`get_table_column_names_and_types()` so schema inference matches the
actual read path.

Closes feast-dev#6200

Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>

The validation only enforces at least one of table/query, not exactly
one. Update the docstring to document the supported behavior when both
are set.

Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>
@Jwrede Jwrede force-pushed the fix/bigquery-source-query-priority branch from 39f2047 to 7db33e0 Compare May 3, 2026 06:03
@ntkathole ntkathole merged commit 77ed779 into feast-dev:master May 3, 2026
23 of 27 checks passed
franciscojavierarceo pushed a commit that referenced this pull request May 4, 2026
# [0.63.0](v0.62.0...v0.63.0) (2026-05-04)

### Bug Fixes

* Add project filter to apply_data_source and delete_data_source (closes [#6206](#6206)) ([#6322](#6322)) ([96562c4](96562c4))
* Add project_id filter to SnowflakeRegistry UPDATE path ([#6243](#6243)) ([6658b71](6658b71)), closes [#6208](#6208) [#6208](#6208)
* Add subprocess timeouts to prevent test_e2e_local hanging on Dask atexit handler ([3de6556](3de6556))
* Ambiguous truth value of array during materialization ([#6259](#6259)) ([d0c8984](d0c8984))
* Auto-detect GCS/S3 registry store when registry is passed as string ([#6260](#6260)) ([7ebcf03](7ebcf03))
* **bigquery:** Prefer query over table in get_table_query_string ([#6360](#6360)) ([77ed779](77ed779)), closes [#6200](#6200)
* correct project_id scoping in get_user_metadata and delete_project ([0c469a7](0c469a7))
* disable Redis RDB persistence in test deployments ([44cd682](44cd682))
* Disable snowflake tests temporarily in CI ([#6356](#6356)) ([31d5a98](31d5a98))
* Filter empty SQL commands at execute_snowflake_statement call sites ([#6249](#6249)) ([92ffbb9](92ffbb9))
* Fix five bugs in milvus online store ([#6275](#6275)) ([212504b](212504b))
* Fix issue with apply feature view ([835cda8](835cda8))
* Fix streaming materialization for exotic sources with lazy UDF pipelines ([c07972d](c07972d))
* Handle missing features gracefully instead of panicking ([7d00b3a](7d00b3a))
* Harden informer cache with label selectors and memory optimizations ([#6242](#6242)) ([3f11356](3f11356))
* **helm:** Avoid nil pointer for metrics.enabled inside podAnnotations ([#6251](#6251)) ([c833f1a](c833f1a))
* Include git in feast server image ([fb03c46](fb03c46))
* Include StreamFeatureView in freshness metric ([#6269](#6269)) ([463f16c](463f16c))
* Pre-create S3A event log dir before SparkContext init ([#6317](#6317)) ([9feca77](9feca77))
* Remote Online Store Type Inference Error with All-NULL Columns ([#6063](#6063)) ([de67bdd](de67bdd))
* Remove selector with kustomize overlay using a JSON 6902 patch ([9107a43](9107a43))
* Resolve multiple bugs in SnowflakeRegistry and Snowflake connection handling ([#6315](#6315)) ([7e66a2e](7e66a2e))
* **spark:** BatchFeatureView with TransformationMode.PYTHON now reads all source columns ([a310eaf](a310eaf))
* **spark:** Use SELECT * when feature_name_columns is empty in pull_all_from_table_or_query ([e1b1d2d](e1b1d2d))
* Support pandas mode in feature builder and fix dask column extraction ([863315e](863315e))
* support SQL string as entity_df in RemoteOfflineStore.get_historical_features ([c559889](c559889))
* Wrap LocalOutputNode return value in ArrowTableValue for consist… ([#6286](#6286)) ([a16cd55](a16cd55))

### Features

* Add agent skills and Cursor/Claude rules for Feast development ([312eea3](312eea3))
* Add feature view versioning support to FAISS online store ([b36acb7](b36acb7))
* Add feature view versioning support to Redis and DynamoDB online stores ([#6257](#6257)) ([edf25af](edf25af)), closes [#6164](#6164) [#6163](#6163)
* Add optional 'org' in feature view ([#6288](#6288)) ([#6301](#6301)) ([608b105](608b105))
* Add RaySource, to_ray_dataset first-class method, docs, and tests ([1c98157](1c98157))
* Add TLS support for Go Feature Server ([#6229](#6229)) ([28a58d0](28a58d0))
* Add Vector Search support to MongoDBOnlineStore ([#6344](#6344)) ([c102738](c102738))
* Add versioning support to Milvus online store ([#6330](#6330)) ([3268ced](3268ced))
* Addresses performance issues in the Redis online store ([2e50da0](2e50da0))
* Allow to set gpu for ray ([5580ab4](5580ab4))
* Bump redis-py version cap from <5 to <8 ([#6339](#6339)) ([9538180](9538180))
* Expose feature_server, materialization, and openlineage configuration via FeatureStore CRD ([ec6ecfd](ec6ecfd))
* Make online_write_batch_size configurable in MaterializationConfig ([#6268](#6268)) ([d41becf](d41becf))
* Make udf optional if agg defined ([#5689](#5689)) ([#6328](#6328)) ([f630056](f630056))
* MongoDB offline store ([#6138](#6138)) ([8eebad7](8eebad7))
* Optional input_schema for ODFV ([#6308](#6308)) ([#6312](#6312)) ([f08b4e8](f08b4e8))
* Provision minimal TokenReview RBAC for OIDC auth and add SSL error logging in token parser ([#6240](#6240)) ([dca57e8](dca57e8))
* **spark:** Add compute-on-read support for BatchFeatureView in get_… ([#6357](#6357)) ([630d9f8](630d9f8))

Development

Successfully merging this pull request may close these issues.

BigQuerySource.get_table_query_string() silently ignores query when table is also set
