
Add OneFlow OpenTab enrichment backend support (#5465) #5465

Closed
EddyLXJ wants to merge 4 commits into pytorch:main from EddyLXJ:export-D95888898

EddyLXJ commented Mar 10, 2026

Summary:

X-link: https://github.com/facebookresearch/FBGEMM/pull/2440

CONTEXT: The enrichment framework currently only supports IGR/Laser as a data source. We need to support OneFlow's OpenTab/Maple as an additional enrichment backend to fetch side info (e.g., SID payloads) from the Maple distributed KV store.

WHAT:

  • Add ONEFLOW_OPENTAB_SID enrichment type enum in both Python and C++
  • Create oneflow_enrichment.h implementing OpenTab/Maple reader initialization, batch fetching with timeout, and int64 payload → tensor preparation
  • Extend EnrichmentConfig with OpenTab-specific parameters (tier_name, payload_ids, payload_types, column_group_ids, vec_payload_indexes, timeout_ms, batch_size)
  • Wire OpenTab parameters through Python EnrichmentPolicy → training.py → TorchScript registration
  • Add dispatch branch in dram_kv_embedding_cache.h for ONEFLOW_OPENTAB_SID enrichment type
  • Fix typo: "becuase" → "because" in error message

Reviewed By: emlin

Differential Revision: D95888898
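
The "batch fetching with timeout" and "int64 payload → tensor preparation" steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions: `fetch_payload_rows` and the `fetch_batch` callable are hypothetical names standing in for the OpenTab/Maple reader, nested lists stand in for tensors, and the choice to zero-fill missing or timed-out IDs is an assumption rather than something the summary states.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for an OpenTab/Maple batch read: returns the int64
# payload values for each ID it found and omits IDs it did not find.
FetchFn = Callable[[Sequence[int]], Dict[int, List[int]]]


def fetch_payload_rows(
    ids: Sequence[int],
    fetch_batch: FetchFn,
    batch_size: int,
    timeout_ms: int,
    payload_dim: int,
) -> List[List[int]]:
    """Fetch payloads in batches and return one fixed-width row per input ID."""
    fetched: Dict[int, List[int]] = {}
    batches = [ids[i : i + batch_size] for i in range(0, len(ids), batch_size)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(fetch_batch, batch) for batch in batches]
        for future in futures:
            try:
                fetched.update(future.result(timeout=timeout_ms / 1000.0))
            except FutureTimeout:
                # Assumption: a timed-out batch is skipped and its IDs are
                # treated as missing. The pool still waits for the straggler
                # when the `with` block exits.
                continue
    rows = []
    for key in ids:
        values = fetched.get(key, [])[:payload_dim]
        # Assumption: missing or short payloads are padded with zeros.
        rows.append(values + [0] * (payload_dim - len(values)))
    return rows
```

Keeping one output row per input ID, in input order, means the result can be copied into a 2-D int64 tensor without a separate index map.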

meta-cla Bot added the cla signed label Mar 10, 2026
meta-codesync Bot commented Mar 10, 2026

@EddyLXJ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95888898.

EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Mar 11, 2026
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Mar 11, 2026
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Mar 11, 2026
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Mar 11, 2026
EddyLXJ force-pushed the export-D95888898 branch 2 times, most recently from 3b93afb to e8f8fb5 Compare March 12, 2026 20:30
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Mar 12, 2026
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Mar 12, 2026
meta-codesync Bot changed the title from "Add OneFlow OpenTab enrichment backend support" to "Add OneFlow OpenTab enrichment backend support (#5465)" Mar 13, 2026
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Mar 13, 2026
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Mar 13, 2026
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Mar 13, 2026
EddyLXJ force-pushed the export-D95888898 branch 2 times, most recently from d0efc85 to 19fbb8e Compare March 13, 2026 18:28
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Mar 13, 2026
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Mar 13, 2026
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Mar 13, 2026
Summary:
CONTEXT: cudaStreamAddCallback holds the CUDA driver mutex during callback execution, blocking other threads from making CUDA API calls (e.g., NCCL kernel launches on other streams). This causes latency when multiple CUDA streams operate concurrently.

WHAT: Migrate all cudaStreamAddCallback call sites to cudaLaunchHostFunc, which does not hold the CUDA driver mutex during execution.
- Renamed cuda_callback_func to cuda_host_func with simplified signature (removed stream and status params)
- Updated all 7 call sites across kv_db_table_batched_embeddings.cpp and ssd_scratch_pad_indices_queue.cpp
- Updated documentation to reflect the new API and its concurrency benefits

Differential Revision: D95879076
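
The contention described in this summary can be modelled with a toy lock in Python. This is an analogy only, not CUDA code: `driver_lock` is a hypothetical stand-in for the driver mutex, and the two `run_*` helpers model the behaviour the summary attributes to each API. For reference, the CUDA runtime declares the old callback type as `void (*)(cudaStream_t, cudaError_t, void*)` and the host-function type as `void (*)(void*)`, which matches the simplified signature mentioned above.

```python
import threading

# Toy stand-in for the CUDA driver mutex (an assumption for illustration).
driver_lock = threading.Lock()


def other_thread_can_enter() -> bool:
    """Report whether a different thread could take the toy lock right now."""
    result = []

    def probe() -> None:
        got = driver_lock.acquire(blocking=False)
        if got:
            driver_lock.release()
        result.append(got)

    worker = threading.Thread(target=probe)
    worker.start()
    worker.join()
    return result[0]


def run_callback_holding_lock(host_fn):
    # Models the cudaStreamAddCallback behaviour described in the summary:
    # the lock is held for the whole duration of the callback.
    with driver_lock:
        return host_fn()


def run_host_func_without_lock(host_fn):
    # Models the cudaLaunchHostFunc behaviour described in the summary:
    # the host function runs without the lock held.
    return host_fn()
```

Calling `run_callback_holding_lock(other_thread_can_enter)` shows the probe thread being locked out for as long as the callback runs, which is the latency source the summary describes for work on other streams.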
EddyLXJ added 2 commits March 14, 2026 16:21
Summary:
CONTEXT: DRAM KV embedding cache currently only stores cached embeddings without enrichment from external sources. This limits the cache's ability to leverage pre-trained embeddings or external knowledge sources like IGR (Identity Graph Resolution) data from Laser, resulting in cold-start issues and lower quality for sparse features.

WHAT: This diff introduces a configurable enrichment system that asynchronously fetches embeddings from external sources (e.g., IGR via Laser) to populate the DRAM cache for missing IDs.
- Added Python configuration layer: EnrichmentPolicy NamedTuple in split_table_batched_embeddings_ops_common.py with fields for enrichment_type, provider_name, client_id, enrichment_dim, and response_format
- Added C++ configuration layer: EnrichmentConfig TorchScript custom class in enrichment_config.h with EnrichmentType enum and EnrichmentResponseFormat enum
- Implemented IGR enrichment infrastructure in new igr_enrichment.h: LaserClient initialization, async coroutine-based Laser queries, generic thrift parsing, and tensor preparation utilities
- Added enrichment orchestration in DramKVEmbeddingCache: dedicated laser_executor_ thread pool (4 threads), pause/resume mechanism for laser writes to yield to forward/backward passes, rate limiting via pending_laser_requests_ counter
- Implemented enrichment_query_id() in training.py: runs on dedicated enrichment_query_stream CUDA stream, performs deduplication on linearized indices, async D2H copy of query IDs
- Added zero-weight detection and invalidation logic: detects cache slots with all-zero weights during prefetch/flush, invalidates L1 cache state for zero-weight rows to trigger re-fetch
- Updated BUCK dependencies: added laser/client, thrift protocol, and folly/io:iobuf

Differential Revision: D95873233
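
Two of the steps above, deduplicating query IDs and detecting all-zero cache rows, can be sketched in plain Python. This is a minimal illustration, not the code in training.py or the C++ cache: both function names are hypothetical, lists stand in for tensors, and preserving first-seen order during deduplication is an assumption.

```python
from typing import List, Sequence


def dedup_query_ids(linearized_indices: Sequence[int]) -> List[int]:
    """Drop repeated IDs so each one is queried at most once.

    Assumption: first-seen order is kept; the real code may sort instead.
    """
    return list(dict.fromkeys(linearized_indices))


def zero_weight_rows(weights: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of rows whose weights are all exactly zero.

    In the design described above, such rows would be invalidated in the
    L1 cache so that a later prefetch triggers a re-fetch.
    """
    return [
        row_index
        for row_index, row in enumerate(weights)
        if all(value == 0.0 for value in row)
    ]
```

For example, `dedup_query_ids([5, 3, 5, 7, 3])` keeps only `5`, `3` and `7`, and `zero_weight_rows` flags a row only when every element is zero, so a partially populated row is left alone.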
Summary:
CONTEXT: The enrichment configuration in EnrichmentPolicy uses raw strings for enrichment_type and response_format, which is error-prone and lacks type safety. Additionally, there is no utility to extract unhashed IDs from KJT features for enrichment queries.

WHAT: Strengthen enrichment configuration with Python enums and add a KJT builder utility.
- Added EnrichmentType enum (IGR_LASER_EMBEDDING, IGR_LASER_SID) and EnrichmentResponseFormat enum (JSON, THRIFT_FLOAT, THRIFT_INT64) in split_table_batched_embeddings_ops_common.py
- Updated EnrichmentPolicy to use enum types instead of strings
- Added enrichment_policy field to KVZCHTBEConfig for config propagation
- Convert enum values to int when passing to C++ TorchScript layer in training.py
- Added build_embedding_cache_write_kjt() in kvzch_utils.py to extract hashed/unhashed feature pairs from KJT and encode unhashed IDs as float32 weights for enrichment queries
- Wired enrichment_policy through batched_embedding_kernel.py to KVZCHParams

Differential Revision: D95883280
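
A minimal sketch of the enum-based configuration described above, assuming the enum and field names from these summaries. The integer values, the field types, and the `to_cpp_args` helper are assumptions made for illustration and may not match split_table_batched_embeddings_ops_common.py.

```python
import enum
from typing import NamedTuple, Tuple


class EnrichmentType(enum.IntEnum):
    # Integer values are assumed; only the names come from the summary.
    IGR_LASER_EMBEDDING = 0
    IGR_LASER_SID = 1


class EnrichmentResponseFormat(enum.IntEnum):
    JSON = 0
    THRIFT_FLOAT = 1
    THRIFT_INT64 = 2


class EnrichmentPolicy(NamedTuple):
    enrichment_type: EnrichmentType
    provider_name: str
    client_id: str
    enrichment_dim: int
    response_format: EnrichmentResponseFormat


def to_cpp_args(policy: EnrichmentPolicy) -> Tuple[int, str, str, int, int]:
    """Hypothetical helper: convert enum fields to plain ints, as the summary
    describes for values passed to the C++ TorchScript layer."""
    return (
        int(policy.enrichment_type),
        policy.provider_name,
        policy.client_id,
        policy.enrichment_dim,
        int(policy.response_format),
    )
```

Compared with raw strings, a misspelled member such as `EnrichmentType.IGR_LASER_SIDD` fails immediately with an `AttributeError` instead of surfacing later in the C++ layer.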
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Mar 16, 2026
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Mar 16, 2026
Summary:
Pull Request resolved: pytorch#5465

X-link: https://github.com/facebookresearch/FBGEMM/pull/2440

CONTEXT: The enrichment framework currently only supports IGR/Laser as a data source. We need to support OneFlow's OpenTab/Maple as an additional enrichment backend to fetch side info (e.g., SID payloads) from the Maple distributed KV store.

WHAT:
- Add `ONEFLOW_OPENTAB_SID` enrichment type enum in both Python and C++
- Create `oneflow_enrichment.h` implementing OpenTab/Maple reader initialization, batch fetching with timeout, and int64 payload → tensor preparation
- Extend `EnrichmentConfig` with OpenTab-specific parameters (tier_name, payload_ids, payload_types, column_group_ids, vec_payload_indexes, timeout_ms, batch_size)
- Wire OpenTab parameters through Python `EnrichmentPolicy` → `training.py` → TorchScript registration
- Add dispatch branch in `dram_kv_embedding_cache.h` for `ONEFLOW_OPENTAB_SID` enrichment type
- Fix typo: "becuase" → "because" in error message

Reviewed By: emlin

Differential Revision: D95888898
meta-codesync Bot commented Mar 16, 2026

This pull request has been merged in ad6f6ce.

@facebook-github-tools

This pull request has been reverted by 7a26e26.
