Add OneFlow OpenTab enrichment backend support (#5465)
Closed
EddyLXJ wants to merge 4 commits into
f6eab5b to
0ff871c
Compare
EddyLXJ
added a commit
to EddyLXJ/FBGEMM-1
that referenced
this pull request
Mar 11, 2026
Summary:
X-link: facebookresearch/FBGEMM#2440
CONTEXT: The enrichment framework currently only supports IGR/Laser as a data source. We need to support OneFlow's OpenTab/Maple as an additional enrichment backend to fetch side info (e.g., SID payloads) from the Maple distributed KV store.
WHAT:
- Add `ONEFLOW_OPENTAB_SID` enrichment type enum in both Python and C++
- Create `oneflow_enrichment.h` implementing OpenTab/Maple reader initialization, batch fetching with timeout, and int64 payload → tensor preparation
- Extend `EnrichmentConfig` with OpenTab-specific parameters (tier_name, payload_ids, payload_types, column_group_ids, vec_payload_indexes, timeout_ms, batch_size)
- Wire OpenTab parameters through Python `EnrichmentPolicy` → `training.py` → TorchScript registration
- Add dispatch branch in `dram_kv_embedding_cache.h` for `ONEFLOW_OPENTAB_SID` enrichment type
- Fix typo: "becuase" → "because" in error message
Differential Revision: D95888898
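The "batch fetching with timeout" step could look roughly like the sketch below. It is written in Python for brevity and is not the code in `oneflow_enrichment.h`: `fetch_in_batches`, the `Fetcher` callable, and `demo_fetch` are hypothetical names, and only `batch_size` and `timeout_ms` come from the parameter list above. It assumes a batch that misses its timeout is simply left unenriched.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Dict, List, Sequence

# Hypothetical fetcher type: takes a batch of keys, returns {key: int64 payload values}.
Fetcher = Callable[[Sequence[int]], Dict[int, List[int]]]


def fetch_in_batches(
    keys: Sequence[int],
    fetch: Fetcher,
    batch_size: int,
    timeout_ms: int,
) -> Dict[int, List[int]]:
    """Fetch payloads batch by batch; a batch that times out is skipped."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = [keys[i : i + batch_size] for i in range(0, len(keys), batch_size)]
    results: Dict[int, List[int]] = {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(batches)))
    try:
        # All batches are submitted up front so they run concurrently.
        futures = [pool.submit(fetch, batch) for batch in batches]
        for future in futures:
            try:
                # Wait up to timeout_ms, measured from when we start waiting on this batch.
                results.update(future.result(timeout=timeout_ms / 1000.0))
            except FutureTimeout:
                # Assumption: keys in a late batch stay unenriched and may be retried later.
                continue
    finally:
        # Do not block the caller on stragglers.
        pool.shutdown(wait=False)
    return results


def demo_fetch(batch: Sequence[int]) -> Dict[int, List[int]]:
    """Stand-in backend: the batch containing key 3 is artificially slow."""
    if 3 in batch:
        time.sleep(0.5)
    return {k: [k * 10, k * 10 + 1] for k in batch}


if __name__ == "__main__":
    print(fetch_in_batches([1, 2, 3, 4, 5], demo_fetch, batch_size=2, timeout_ms=100))
```

With `batch_size=2` the keys split into `[1, 2]`, `[3, 4]`, and `[5]`; the middle batch exceeds the 100 ms timeout, so only keys 1, 2, and 5 come back.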
0ff871c to
e37287f
Compare
EddyLXJ
added a commit
to EddyLXJ/FBGEMM-1
that referenced
this pull request
Mar 11, 2026
EddyLXJ
added a commit
to EddyLXJ/FBGEMM-1
that referenced
this pull request
Mar 11, 2026
e37287f to
dcaefbc
Compare
EddyLXJ
added a commit
to EddyLXJ/FBGEMM-1
that referenced
this pull request
Mar 11, 2026
3b93afb to
e8f8fb5
Compare
EddyLXJ
added a commit
to EddyLXJ/FBGEMM-1
that referenced
this pull request
Mar 12, 2026
e8f8fb5 to
82247b8
Compare
EddyLXJ
added a commit
to EddyLXJ/FBGEMM-1
that referenced
this pull request
Mar 12, 2026
82247b8 to
2fe7742
Compare
EddyLXJ
added a commit
to EddyLXJ/FBGEMM-1
that referenced
this pull request
Mar 13, 2026
2fe7742 to
fdded51
Compare
EddyLXJ
added a commit
to EddyLXJ/FBGEMM-1
that referenced
this pull request
Mar 13, 2026
EddyLXJ
added a commit
to EddyLXJ/FBGEMM-1
that referenced
this pull request
Mar 13, 2026
d0efc85 to
19fbb8e
Compare
EddyLXJ
added a commit
to EddyLXJ/FBGEMM-1
that referenced
this pull request
Mar 13, 2026
19fbb8e to
ca8841a
Compare
EddyLXJ
added a commit
to EddyLXJ/FBGEMM-1
that referenced
this pull request
Mar 13, 2026
ca8841a to
59f49ee
Compare
EddyLXJ
added a commit
to EddyLXJ/FBGEMM-1
that referenced
this pull request
Mar 13, 2026
Summary:
CONTEXT: `cudaStreamAddCallback` holds the CUDA driver mutex during callback execution, blocking other threads from making CUDA API calls (e.g., NCCL kernel launches on other streams). This causes latency when multiple CUDA streams operate concurrently.
WHAT: Migrate all `cudaStreamAddCallback` call sites to `cudaLaunchHostFunc`, which does not hold the CUDA driver mutex during execution.
- Renamed `cuda_callback_func` to `cuda_host_func` with simplified signature (removed stream and status params)
- Updated all 7 call sites across `kv_db_table_batched_embeddings.cpp` and `ssd_scratch_pad_indices_queue.cpp`
- Updated documentation to reflect the new API and its concurrency benefits
Differential Revision: D95879076
Summary:
CONTEXT: DRAM KV embedding cache currently only stores cached embeddings without enrichment from external sources. This limits the cache's ability to leverage pre-trained embeddings or external knowledge sources like IGR (Identity Graph Resolution) data from Laser, resulting in cold-start issues and lower quality for sparse features.
WHAT: This diff introduces a configurable enrichment system that asynchronously fetches embeddings from external sources (e.g., IGR via Laser) to populate the DRAM cache for missing IDs.
- Added Python configuration layer: `EnrichmentPolicy` NamedTuple in `split_table_batched_embeddings_ops_common.py` with fields for `enrichment_type`, `provider_name`, `client_id`, `enrichment_dim`, and `response_format`
- Added C++ configuration layer: `EnrichmentConfig` TorchScript custom class in `enrichment_config.h` with `EnrichmentType` enum and `EnrichmentResponseFormat` enum
- Implemented IGR enrichment infrastructure in new `igr_enrichment.h`: LaserClient initialization, async coroutine-based Laser queries, generic thrift parsing, and tensor preparation utilities
- Added enrichment orchestration in `DramKVEmbeddingCache`: dedicated `laser_executor_` thread pool (4 threads), pause/resume mechanism for laser writes to yield to forward/backward passes, rate limiting via `pending_laser_requests_` counter
- Implemented `enrichment_query_id()` in `training.py`: runs on dedicated `enrichment_query_stream` CUDA stream, performs deduplication on linearized indices, async D2H copy of query IDs
- Added zero-weight detection and invalidation logic: detects cache slots with all-zero weights during prefetch/flush, invalidates L1 cache state for zero-weight rows to trigger re-fetch
- Updated BUCK dependencies: added laser/client, thrift protocol, and folly/io:iobuf
Differential Revision: D95873233
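As an illustration of "rate limiting via `pending_laser_requests_` counter", here is a minimal Python sketch of a guarded in-flight counter. The summary only names the counter; the cap (`max_pending`), the class name `PendingRequestLimiter`, and the drop-when-full behaviour are assumptions for illustration, and the C++ version would more likely use an atomic.

```python
import threading


class PendingRequestLimiter:
    """Caps the number of in-flight enrichment requests with a lock-guarded counter."""

    def __init__(self, max_pending: int) -> None:
        if max_pending <= 0:
            raise ValueError("max_pending must be positive")
        self._max_pending = max_pending  # hypothetical cap, not taken from the PR
        self._pending = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve a slot; returns False when the cap is reached so the caller can defer."""
        with self._lock:
            if self._pending >= self._max_pending:
                return False
            self._pending += 1
            return True

    def release(self) -> None:
        """Free a slot once a request finishes, whether it succeeded or failed."""
        with self._lock:
            if self._pending == 0:
                raise RuntimeError("release() without a matching try_acquire()")
            self._pending -= 1

    @property
    def pending(self) -> int:
        """Current number of in-flight requests."""
        with self._lock:
            return self._pending
```

A caller would check `try_acquire()` before submitting a query to the executor and call `release()` in the completion path, so a slow backend cannot queue an unbounded number of requests.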
Summary:
CONTEXT: The enrichment configuration in `EnrichmentPolicy` uses raw strings for `enrichment_type` and `response_format`, which is error-prone and lacks type safety. Additionally, there is no utility to extract unhashed IDs from KJT features for enrichment queries.
WHAT: Strengthen enrichment configuration with Python enums and add a KJT builder utility.
- Added `EnrichmentType` enum (`IGR_LASER_EMBEDDING`, `IGR_LASER_SID`) and `EnrichmentResponseFormat` enum (`JSON`, `THRIFT_FLOAT`, `THRIFT_INT64`) in `split_table_batched_embeddings_ops_common.py`
- Updated `EnrichmentPolicy` to use enum types instead of strings
- Added `enrichment_policy` field to `KVZCHTBEConfig` for config propagation
- Convert enum values to int when passing to C++ TorchScript layer in `training.py`
- Added `build_embedding_cache_write_kjt()` in `kvzch_utils.py` to extract hashed/unhashed feature pairs from KJT and encode unhashed IDs as float32 weights for enrichment queries
- Wired `enrichment_policy` through `batched_embedding_kernel.py` to `KVZCHParams`
Differential Revision: D95883280
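The enum-based policy and the "convert enum values to int" step might be sketched as follows. The enum member names and the `EnrichmentPolicy` field names come from the summaries above. The integer values, the use of `IntEnum`, the `str` field types, and the `to_cpp_args` helper are assumptions for illustration, not the definitions in `split_table_batched_embeddings_ops_common.py`.

```python
from enum import IntEnum, unique
from typing import NamedTuple, Tuple


@unique
class EnrichmentType(IntEnum):
    # Placeholder values; the real ones would have to match the C++ enum.
    IGR_LASER_EMBEDDING = 0
    IGR_LASER_SID = 1


@unique
class EnrichmentResponseFormat(IntEnum):
    # Placeholder values, same caveat as above.
    JSON = 0
    THRIFT_FLOAT = 1
    THRIFT_INT64 = 2


class EnrichmentPolicy(NamedTuple):
    enrichment_type: EnrichmentType
    provider_name: str  # assumed to be a string
    client_id: str  # assumed to be a string
    enrichment_dim: int
    response_format: EnrichmentResponseFormat


def to_cpp_args(policy: EnrichmentPolicy) -> Tuple[int, str, str, int, int]:
    """Hypothetical helper: flatten enums to plain ints before crossing into C++."""
    return (
        int(policy.enrichment_type),
        policy.provider_name,
        policy.client_id,
        policy.enrichment_dim,
        int(policy.response_format),
    )
```

Compared with raw strings, a misspelled member such as `EnrichmentType.IGR_LASER_SIDD` fails at attribute lookup instead of silently reaching the C++ layer.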
59f49ee to
01e24ce
Compare
EddyLXJ
added a commit
to EddyLXJ/FBGEMM-1
that referenced
this pull request
Mar 16, 2026
01e24ce to
fbe59b7
Compare
EddyLXJ
added a commit
to EddyLXJ/FBGEMM-1
that referenced
this pull request
Mar 16, 2026
fbe59b7 to
42f3e92
Compare
This pull request has been merged in ad6f6ce.
This pull request has been reverted by 7a26e26.
Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2440
CONTEXT: The enrichment framework currently only supports IGR/Laser as a data source. We need to support OneFlow's OpenTab/Maple as an additional enrichment backend to fetch side info (e.g., SID payloads) from the Maple distributed KV store.
WHAT:
- Add `ONEFLOW_OPENTAB_SID` enrichment type enum in both Python and C++
- Create `oneflow_enrichment.h` implementing OpenTab/Maple reader initialization, batch fetching with timeout, and int64 payload → tensor preparation
- Extend `EnrichmentConfig` with OpenTab-specific parameters (tier_name, payload_ids, payload_types, column_group_ids, vec_payload_indexes, timeout_ms, batch_size)
- Wire OpenTab parameters through Python `EnrichmentPolicy` → `training.py` → TorchScript registration
- Add dispatch branch in `dram_kv_embedding_cache.h` for `ONEFLOW_OPENTAB_SID` enrichment type
Reviewed By: emlin
Differential Revision: D95888898
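To make the "int64 payload → tensor preparation" step concrete, the sketch below packs fetched payloads into a row-major buffer that could back a 2-D int64 tensor. It is an assumption-laden illustration, not the logic in `oneflow_enrichment.h`: `prepare_payload_rows`, its arguments, and the choice to skip missing keys are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def prepare_payload_rows(
    queried_ids: Sequence[int],
    fetched: Dict[int, List[int]],
    dim: int,
) -> Tuple[List[int], List[int]]:
    """Return (found_ids, flat_values); flat_values is row-major [len(found_ids), dim]."""
    found_ids: List[int] = []
    flat_values: List[int] = []
    for key in queried_ids:
        payload = fetched.get(key)
        if payload is None:
            # Assumption: IDs missing from the store are skipped, not zero-filled.
            continue
        if len(payload) != dim:
            raise ValueError(
                f"payload for id {key} has {len(payload)} values, expected {dim}"
            )
        found_ids.append(key)
        flat_values.extend(payload)
    return found_ids, flat_values


if __name__ == "__main__":
    ids, flat = prepare_payload_rows([7, 8, 9], {7: [1, 2], 9: [5, 6]}, dim=2)
    print(ids, flat)
```

Keeping the found IDs alongside the flat buffer preserves the row-to-ID mapping when some keys miss; in PyTorch the buffer could then be wrapped with `torch.tensor(flat, dtype=torch.int64).view(-1, dim)`.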