fix(compute-engine/local): Honor field_mapping on join keys in dedup + join nodes by 1fanwang · Pull Request #6395 · feast-dev/feast

1fanwang · 2026-05-12T08:36:04Z

What this PR does

The bug isn't actually Snowflake-specific despite the issue title — it lives in the local compute engine's DAG nodes.

Path:

_get_column_names() returns the reverse-mapped join keys (e.g. ["USERID"]).
pull_latest_from_table_or_query is called with those raw names — correct, the offline store needs them.
LocalSourceReadNode.execute renames the result columns via field_mapping.get(col, col) — the table now has user_id.
LocalDedupNode and LocalJoinNode then look up column_info.join_keys (still ["USERID"]) on a df whose columns are user_id. pandas raises KeyError: Index(['USERID'], dtype='object') — the exact error in the issue.

The existing ColumnInfo class already has mapped properties for timestamp_column and created_timestamp_column (precedent: PR #4886, which fixed the analogous bug for get_historical_features). This PR adds the missing join_keys_columns mapped property and uses it in LocalDedupNode and LocalJoinNode.

How was this tested?

New regression test test_local_dedup_node_with_field_mapping_on_join_key in tests/unit/infra/compute_engines/local/test_nodes.py. Reproduces the exact KeyError(['USERID']) on master; passes with this PR.
All 9 existing tests in test_nodes.py still pass.
All 62 tests under tests/unit/infra/compute_engines/ still pass.
ruff check, ruff format --check, mypy all clean.

Scope

Local-engine only. The Spark, Ray, AWS Lambda, k8s, and Snowflake compute engines have the same column_info.join_keys lookup pattern in their nodes and likely the same latent bug. Each is a separate <50 LoC follow-up — happy to file those if maintainers want them in this PR or as stacked.

When a batch source defines a `field_mapping` that renames an entity join key (e.g. `USERID` -> `user_id`), the source-read node renames the columns on the pulled Arrow table to their mapped names. Downstream `LocalDedupNode` and `LocalJoinNode` then look up the *pre-mapping* names from `column_info.join_keys`, which raises `KeyError: Index(['USERID'])` during materialization (or returns an empty join). Add a `join_keys_columns` property on `ColumnInfo` that mirrors the existing `timestamp_column` / `created_timestamp_column` properties — returning join keys translated through `field_mapping` — and use it from the dedup and join nodes. Fixes feast-dev#5942. Signed-off-by: 1fanwang <1fannnw@gmail.com>

HaoXuAI · 2026-05-12T08:52:05Z

LGTM
The initial work didn't handle the field mapping because I was thinking to unify the field mapping interface. But this never happened.

Signed-off-by: 1fanwang <1fannnw@gmail.com>

…+ join nodes (feast-dev#6395) * fix: Apply field mapping to join keys in local compute engine nodes When a batch source defines a `field_mapping` that renames an entity join key (e.g. `USERID` -> `user_id`), the source-read node renames the columns on the pulled Arrow table to their mapped names. Downstream `LocalDedupNode` and `LocalJoinNode` then look up the *pre-mapping* names from `column_info.join_keys`, which raises `KeyError: Index(['USERID'])` during materialization (or returns an empty join). Add a `join_keys_columns` property on `ColumnInfo` that mirrors the existing `timestamp_column` / `created_timestamp_column` properties — returning join keys translated through `field_mapping` — and use it from the dedup and join nodes. Fixes feast-dev#5942. Signed-off-by: 1fanwang <1fannnw@gmail.com> * test: also cover LocalJoinNode field_mapping case Signed-off-by: 1fanwang <1fannnw@gmail.com> --------- Signed-off-by: 1fanwang <1fannnw@gmail.com> Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com>

@ntkathole

* feat: Add enabled/disabled toggle for feature views Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * feat: Add demo noteboooks for users Signed-off-by: ntkathole <nikhilkathole2683@gmail.com> Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * feat: Add CLI enable/disable commands and registry metadata support Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * Added features Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * fix(compute-engine/local): Honor field_mapping on join keys in dedup + join nodes (#6395) * fix: Apply field mapping to join keys in local compute engine nodes When a batch source defines a `field_mapping` that renames an entity join key (e.g. `USERID` -> `user_id`), the source-read node renames the columns on the pulled Arrow table to their mapped names. Downstream `LocalDedupNode` and `LocalJoinNode` then look up the *pre-mapping* names from `column_info.join_keys`, which raises `KeyError: Index(['USERID'])` during materialization (or returns an empty join). Add a `join_keys_columns` property on `ColumnInfo` that mirrors the existing `timestamp_column` / `created_timestamp_column` properties — returning join keys translated through `field_mapping` — and use it from the dedup and join nodes. Fixes #5942. Signed-off-by: 1fanwang <1fannnw@gmail.com> * test: also cover LocalJoinNode field_mapping case Signed-off-by: 1fanwang <1fannnw@gmail.com> --------- Signed-off-by: 1fanwang <1fannnw@gmail.com> Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * feat: Add Prometheus gauges for FeatureStore installation telemetry (#6354) Signed-off-by: ntkathole <nikhilkathole2683@gmail.com> Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * docs: Rename Atlas Vector Search to MongoDB Vector Search and fix code examples Signed-off-by: jvincent-mongodb <jeffrey.vincent@mongodb.com> Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * feat(dynamodb): Use ProjectionExpression when requested_features is set The requested_features parameter was accepted by online_read and online_read_async but never used -- DynamoDB always fetched all features stored in the values map regardless. Add a ProjectionExpression to BatchGetItem requests when requested_features is provided, reducing data transfer, latency, and read costs. Fixes #6058 Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * fix(dynamodb): Fix mypy type for _build_projection_expression return The return dict contains both str and Dict[str, str] values, so the return type must be Dict[str, Any] not Dict[str, str]. Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * fix(bigquery): Enable list inference for parquet loads in offline_write_batch When pushing features with array/list types (e.g. STRING_LIST) to BigQuery via offline_write_batch, the data arrives as empty arrays because BigQuery's parquet loader does not infer list structure by default. Set parquet_options.enable_list_inference = True on the LoadJobConfig so array columns are written correctly. Fixes #5845 Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * fix(trino): Clean up temporary entity tables after retrieval (#6381) * fix(trino): Clean up temporary entity tables after retrieval TrinoOfflineStore.get_historical_features() creates a temporary table for the entity DataFrame but never drops it, leaking tables indefinitely. Apply the same context manager pattern used by BigQuery, Redshift, and Athena offline stores: wrap the query in a generator that issues DROP TABLE IF EXISTS in a finally block. Fixes #6306 Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> * fix: sort imports for ruff compliance Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> * fix: decouple temp table cleanup from query access Avoid dropping the temporary entity table on to_sql() calls. Previously, every method used a context manager that dropped the table on exit, so calling to_sql() before to_df() would destroy the table and cause subsequent queries to fail. Now the query is stored as a plain string and cleanup is handled by a dedicated _drop_temp_table() method called only after query execution (to_df, to_trino). A __del__ fallback ensures cleanup if execution methods are never called. The _cleaned_up flag makes the drop idempotent. Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> --------- Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * feat(bigquery): Support DATE-type event timestamp columns (#6362) * feat(bigquery): Support DATE-type event timestamp columns When the event_timestamp column in BigQuery is a DATE type, the generated SQL wraps comparison values in TIMESTAMP(), causing a type mismatch error. This adds a timestamp_field_type parameter to BigQuerySource that, when set to "DATE", generates DATE() comparisons instead. Closes #2530 (part 2) Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> * fix(bigquery): Use protobuf 4.25.x compatible generated code The proto files were regenerated with protobuf 6.31.1 / grpcio-tools 1.80.0, which imports runtime_version -- a module that does not exist in protobuf 4.25.x used by the project. Revert generated code to 4.25.1 format while keeping the new timestamp_field_type field. Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> * fix(bigquery): Add Literal type annotation for cast_style Mypy infers str from the ternary expression; annotate with the exact Literal union so the call to get_timestamp_filter_sql passes type checking. Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> * fix: Make timestamp_field_type default to None in FeatureViewQueryContext Callers that do not use DATE-typed timestamp fields (e.g. Spark offline store tests) should not be forced to pass timestamp_field_type. Adding a default keeps the new field backward-compatible. Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> * fix: Keep timestamp_field_type required in FeatureViewQueryContext A default value on timestamp_field_type breaks the SparkFeatureViewQueryContext subclass because its non-default fields (min_date_partition, max_date_partition) would follow a field with a default. Instead, keep it required and update the Spark test to pass it. Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> * fix: regenerate protos matching upstream mypy-protobuf style Reset all non-DataSource generated files to match master. Only DataSource_pb2.py and DataSource_pb2.pyi contain our timestamp_field_type additions (field 28). The .pyi stub is hand-edited to match the existing import style used on master. Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> --------- Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * fix: Fixes for ray source Signed-off-by: ntkathole <nikhilkathole2683@gmail.com> Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * feat: Expose registry endpoints on feature server for MCP access Mount the existing REST registry routers under /registry on the feature server so that fastapi_mcp automatically exposes registry introspection (list/get for entities, feature views, data sources, feature services, permissions, projects, saved datasets, lineage, search) as MCP tools. The RegistryServer is created in-process from store.registry — no external registry server is required. Auth is enforced via inject_user_details on every mounted router. Made-with: Cursor Signed-off-by: Chaitany patel <patelchaitany93@gmail.com> Made-with: Cursor Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * fix: Revert state propagation to always update in _update_metadata_fields Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * fix: Recompile protos for protobuf 4.x compatibility and fix state machine to be opt-in Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * feat: Add unit tests for state machine and clean up lazy imports in registry Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * fix: Address review comments for feature view state management Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * fix: Resolve integration test failures in apply loop Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * fix: Resolve integration test failures in apply loop Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * Apply suggestion from @ntkathole Co-authored-by: Nikhil Kathole <nikhilkathole2683@gmail.com> Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * fix: Resolve review comments for feature_store Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * fix: Resolve review comments for feature_views.py Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * feat: Add FeatureStore methods and update describe for enabled/state Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * fix: Add type: ignore comments for mypy on BaseFeatureView attr access Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> * fix: Remove REST API endpoints for enable/disable/set-state (deferred to follow-up PR) Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> --------- Signed-off-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> Signed-off-by: ntkathole <nikhilkathole2683@gmail.com> Signed-off-by: 1fanwang <1fannnw@gmail.com> Signed-off-by: jvincent-mongodb <jeffrey.vincent@mongodb.com> Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> Signed-off-by: Chaitany patel <patelchaitany93@gmail.com> Co-authored-by: RutujaPathade <73137503+RutujaPathade@users.noreply.github.com> Co-authored-by: ntkathole <nikhilkathole2683@gmail.com> Co-authored-by: Stefan Wang <1fannnw@gmail.com> Co-authored-by: jvincent-mongodb <jeffrey.vincent@mongodb.com> Co-authored-by: Jonathan Wrede <wrede.jonathan00@gmail.com> Co-authored-by: Jwrede <62910358+Jwrede@users.noreply.github.com> Co-authored-by: Chaitany patel <patelchaitany93@gmail.com>

1fanwang requested a review from a team as a code owner May 12, 2026 08:36

HaoXuAI approved these changes May 12, 2026

View reviewed changes

HaoXuAI added the ok-to-test label May 12, 2026

1fanwang changed the title ~~fix(compute-engine/local): honor field_mapping on join keys in dedup + join nodes~~ fix(compute-engine/local): Honor field_mapping on join keys in dedup + join nodes May 12, 2026

1fanwang added 2 commits May 12, 2026 02:01

test: also cover LocalJoinNode field_mapping case

61e440f

Signed-off-by: 1fanwang <1fannnw@gmail.com>

Merge branch 'master' into fix/materialize-field-mapping-on-join-keys

2262af5

franciscojavierarceo merged commit bd01824 into feast-dev:master May 13, 2026
26 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(compute-engine/local): Honor field_mapping on join keys in dedup + join nodes#6395

fix(compute-engine/local): Honor field_mapping on join keys in dedup + join nodes#6395
franciscojavierarceo merged 3 commits into
feast-dev:masterfrom
1fanwang:fix/materialize-field-mapping-on-join-keys

1fanwang commented May 12, 2026

Uh oh!

HaoXuAI commented May 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

1fanwang commented May 12, 2026

Uh oh!

HaoXuAI commented May 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants