Add OneFlow OpenTab enrichment backend support (#5465) by EddyLXJ · Pull Request #5465 · pytorch/FBGEMM

EddyLXJ · 2026-03-10T02:16:03Z

Summary:

X-link: https://github.com/facebookresearch/FBGEMM/pull/2440

CONTEXT: The enrichment framework currently only supports IGR/Laser as a data source. We need to support OneFlow's OpenTab/Maple as an additional enrichment backend to fetch side info (e.g., SID payloads) from the Maple distributed KV store.

WHAT:

- Add `ONEFLOW_OPENTAB_SID` enrichment type enum in both Python and C++
- Create `oneflow_enrichment.h` implementing OpenTab/Maple reader initialization, batch fetching with timeout, and int64 payload → tensor preparation
- Extend `EnrichmentConfig` with OpenTab-specific parameters (tier_name, payload_ids, payload_types, column_group_ids, vec_payload_indexes, timeout_ms, batch_size)
- Wire OpenTab parameters through Python `EnrichmentPolicy` → `training.py` → TorchScript registration
- Add dispatch branch in `dram_kv_embedding_cache.h` for `ONEFLOW_OPENTAB_SID` enrichment type
- Fix typo: "becuase" → "because" in error message
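
For illustration only, here is a minimal Python sketch of what "batch fetching with timeout" and "int64 payload → tensor preparation" could look like. It is not the code in `oneflow_enrichment.h`: the names `fetch_in_batches`, `to_rows`, and `fetch_fn` are hypothetical, the deadline is only checked between batches, and zero-filling missing or short payloads is an assumption.

```python
import time
from typing import Callable, Dict, List, Sequence

# Hypothetical signature for a backend reader: takes a batch of IDs and
# returns the int64 payload for every ID it could find.
FetchFn = Callable[[Sequence[int]], Dict[int, List[int]]]


def fetch_in_batches(
    ids: Sequence[int], fetch_fn: FetchFn, batch_size: int, timeout_ms: int
) -> Dict[int, List[int]]:
    """Fetch payloads in chunks of batch_size under one overall deadline."""
    deadline = time.monotonic() + timeout_ms / 1000.0
    results: Dict[int, List[int]] = {}
    for start in range(0, len(ids), batch_size):
        # Stop issuing new batches once the deadline has passed
        # (assumption: a batch already in flight is not interrupted).
        if time.monotonic() >= deadline:
            break
        results.update(fetch_fn(ids[start : start + batch_size]))
    return results


def to_rows(
    ids: Sequence[int], payloads: Dict[int, List[int]], dim: int
) -> List[List[int]]:
    """Lay payloads out as fixed-width rows, one row per requested ID."""
    rows: List[List[int]] = []
    for key in ids:
        values = list(payloads.get(key, []))[:dim]
        # Assumption: missing or short payloads are padded with zeros.
        rows.append(values + [0] * (dim - len(values)))
    return rows
```

The resulting rows have a fixed shape of `len(ids)` by `dim`, so they could be handed to a tensor constructor in one call; IDs that were not fetched before the deadline simply come back as all-zero rows in this sketch.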

Reviewed By: emlin

Differential Revision: D95888898

meta-codesync · 2026-03-10T02:16:11Z

@EddyLXJ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95888898.

Summary:

CONTEXT: `cudaStreamAddCallback` holds the CUDA driver mutex during callback execution, blocking other threads from making CUDA API calls (e.g., NCCL kernel launches on other streams). This causes latency when multiple CUDA streams operate concurrently.

WHAT: Migrate all `cudaStreamAddCallback` call sites to `cudaLaunchHostFunc` which does not hold the CUDA driver mutex during execution.

- Renamed `cuda_callback_func` to `cuda_host_func` with simplified signature (removed stream and status params)
- Updated all 7 call sites across `kv_db_table_batched_embeddings.cpp` and `ssd_scratch_pad_indices_queue.cpp`
- Updated documentation to reflect the new API and its concurrency benefits

Differential Revision: D95879076

Summary:

CONTEXT: DRAM KV embedding cache currently only stores cached embeddings without enrichment from external sources. This limits the cache's ability to leverage pre-trained embeddings or external knowledge sources like IGR (Identity Graph Resolution) data from Laser, resulting in cold-start issues and lower quality for sparse features.

WHAT: This diff introduces a configurable enrichment system that asynchronously fetches embeddings from external sources (e.g., IGR via Laser) to populate the DRAM cache for missing IDs.

- Added Python configuration layer: `EnrichmentPolicy` NamedTuple in `split_table_batched_embeddings_ops_common.py` with fields for enrichment_type, provider_name, client_id, enrichment_dim, and response_format
- Added C++ configuration layer: `EnrichmentConfig` TorchScript custom class in `enrichment_config.h` with `EnrichmentType` enum and `EnrichmentResponseFormat` enum
- Implemented IGR enrichment infrastructure in new `igr_enrichment.h`: LaserClient initialization, async coroutine-based Laser queries, generic thrift parsing, and tensor preparation utilities
- Added enrichment orchestration in `DramKVEmbeddingCache`: dedicated `laser_executor_` thread pool (4 threads), pause/resume mechanism for laser writes to yield to forward/backward passes, rate limiting via `pending_laser_requests_` counter
- Implemented `enrichment_query_id()` in `training.py`: runs on dedicated `enrichment_query_stream` CUDA stream, performs deduplication on linearized indices, async D2H copy of query IDs
- Added zero-weight detection and invalidation logic: detects cache slots with all-zero weights during prefetch/flush, invalidates L1 cache state for zero-weight rows to trigger re-fetch
- Updated BUCK dependencies: added laser/client, thrift protocol, and folly/io:iobuf

Differential Revision: D95873233
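
To make three of the mechanisms above concrete (deduplication of query IDs, rate limiting via a pending-request counter, and zero-weight detection), here is a small pure-Python sketch. It only approximates the ideas: the real logic lives in `training.py` and `DramKVEmbeddingCache` and operates on tensors and CUDA streams. The names `dedup_ids`, `PendingLimiter`, and `zero_weight_rows` are hypothetical, and rejecting requests at the limit (rather than queueing them) is an assumption.

```python
import threading
from typing import List, Sequence


def dedup_ids(ids: Sequence[int]) -> List[int]:
    """Drop repeated IDs, keeping the first occurrence of each."""
    return list(dict.fromkeys(ids))


class PendingLimiter:
    """Counter-based rate limiter: at most max_pending requests in flight."""

    def __init__(self, max_pending: int) -> None:
        self._max = max_pending
        self._pending = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # Assumption: callers skip the request when the limit is reached.
        with self._lock:
            if self._pending >= self._max:
                return False
            self._pending += 1
            return True

    def release(self) -> None:
        # Call once per successful try_acquire, when the request completes.
        with self._lock:
            self._pending -= 1


def zero_weight_rows(weights: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of rows whose weights are all exactly zero."""
    return [i for i, row in enumerate(weights) if all(w == 0.0 for w in row)]
```

In this sketch, the rows reported by `zero_weight_rows` would be the candidates for invalidation and re-fetch, and each enrichment request would be wrapped in a `try_acquire` / `release` pair.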

Summary:

CONTEXT: The enrichment configuration in `EnrichmentPolicy` uses raw strings for enrichment_type and response_format, which is error-prone and lacks type safety. Additionally, there is no utility to extract unhashed IDs from KJT features for enrichment queries.

WHAT: Strengthen enrichment configuration with Python enums and add a KJT builder utility.

- Added `EnrichmentType` enum (IGR_LASER_EMBEDDING, IGR_LASER_SID) and `EnrichmentResponseFormat` enum (JSON, THRIFT_FLOAT, THRIFT_INT64) in `split_table_batched_embeddings_ops_common.py`
- Updated `EnrichmentPolicy` to use enum types instead of strings
- Added enrichment_policy field to `KVZCHTBEConfig` for config propagation
- Convert enum values to int when passing to C++ TorchScript layer in `training.py`
- Added `build_embedding_cache_write_kjt()` in `kvzch_utils.py` to extract hashed/unhashed feature pairs from KJT and encode unhashed IDs as float32 weights for enrichment queries
- Wired enrichment_policy through `batched_embedding_kernel.py` to `KVZCHParams`

Differential Revision: D95883280
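
As a rough sketch of the enum-based configuration described above, the following shows a policy that uses enum members instead of raw strings and is converted to plain ints at the boundary. The member and field names come from the summaries in this PR; the use of `IntEnum`, the numeric values, the `str` field types, and the helper `to_cpp_args` are assumptions made for illustration, not the definitions in `split_table_batched_embeddings_ops_common.py`.

```python
from enum import IntEnum
from typing import NamedTuple, Tuple


class EnrichmentType(IntEnum):
    # Numeric values are placeholders, not the real ones.
    IGR_LASER_EMBEDDING = 0
    IGR_LASER_SID = 1


class EnrichmentResponseFormat(IntEnum):
    # Numeric values are placeholders, not the real ones.
    JSON = 0
    THRIFT_FLOAT = 1
    THRIFT_INT64 = 2


class EnrichmentPolicy(NamedTuple):
    enrichment_type: EnrichmentType
    provider_name: str
    client_id: str
    enrichment_dim: int
    response_format: EnrichmentResponseFormat


def to_cpp_args(policy: EnrichmentPolicy) -> Tuple[int, str, str, int, int]:
    """Flatten a policy into plain values, with enums converted to int."""
    return (
        int(policy.enrichment_type),
        policy.provider_name,
        policy.client_id,
        policy.enrichment_dim,
        int(policy.response_format),
    )
```

With enums, a misspelled value fails when the policy is built (for example, `EnrichmentType("igr_laser")` raises `ValueError`) instead of surfacing later inside the C++ layer.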

meta-codesync · 2026-03-16T21:42:52Z

This pull request has been merged in ad6f6ce.

facebook-github-tools · 2026-03-17T02:52:27Z

This pull request has been reverted by 7a26e26.

meta-cla Bot added the cla signed label Mar 10, 2026

meta-codesync Bot added fb-exported meta-exported labels Mar 10, 2026

EddyLXJ force-pushed the export-D95888898 branch from f6eab5b to 0ff871c Compare March 11, 2026 00:40

EddyLXJ force-pushed the export-D95888898 branch from 0ff871c to e37287f Compare March 11, 2026 00:41

EddyLXJ force-pushed the export-D95888898 branch from e37287f to dcaefbc Compare March 11, 2026 00:44

EddyLXJ force-pushed the export-D95888898 branch 2 times, most recently from 3b93afb to e8f8fb5 Compare March 12, 2026 20:30

EddyLXJ force-pushed the export-D95888898 branch from e8f8fb5 to 82247b8 Compare March 12, 2026 20:31

meta-codesync Bot changed the title ~~Add OneFlow OpenTab enrichment backend support~~ Add OneFlow OpenTab enrichment backend support (#5465) Mar 13, 2026

EddyLXJ force-pushed the export-D95888898 branch from 82247b8 to 2fe7742 Compare March 13, 2026 18:22

EddyLXJ force-pushed the export-D95888898 branch from 2fe7742 to fdded51 Compare March 13, 2026 18:24

EddyLXJ force-pushed the export-D95888898 branch 2 times, most recently from d0efc85 to 19fbb8e Compare March 13, 2026 18:28

EddyLXJ force-pushed the export-D95888898 branch from 19fbb8e to ca8841a Compare March 13, 2026 21:32

EddyLXJ force-pushed the export-D95888898 branch from ca8841a to 59f49ee Compare March 13, 2026 23:54

EddyLXJ added 2 commits March 14, 2026 16:21

EddyLXJ force-pushed the export-D95888898 branch from 59f49ee to 01e24ce Compare March 16, 2026 18:03

EddyLXJ force-pushed the export-D95888898 branch from 01e24ce to fbe59b7 Compare March 16, 2026 18:03

EddyLXJ force-pushed the export-D95888898 branch from fbe59b7 to 42f3e92 Compare March 16, 2026 18:08

meta-codesync Bot closed this in ad6f6ce Mar 16, 2026

facebook-github-tools Bot added the Merged label Mar 16, 2026

facebook-github-tools Bot added the Reverted label Mar 17, 2026

gchalump added category:new contributor:Meta feature:tbessd labels May 14, 2026

EddyLXJ wants to merge 4 commits into pytorch:main from EddyLXJ:export-D95888898
