feat: add R3/v1 router replay deserialization support #450
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e769ac1e1a
f10a6a9 to 6be8056
  # Build final model base URL with tracing metadata
  final_model_base_url = model_base_url
- if model_base_url and ("tracing.fireworks.ai" in model_base_url or model_base_url.startswith("http://localhost")):
+ if model_base_url and ("tracing.fireworks.ai" in model_base_url or model_base_url.startswith("http://localhost") or "litellm-gateway" in model_base_url):
Why do we need the check for tracing.fireworks.ai or litellm-gateway? Which one is it? Are there cases where it's one and not the other, and vice versa?
The litellm-gateway check is for dev testing, since that goes through litellm-gateway.
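One way to make the intent of that condition easier to review is to pull it into a named predicate. This is only a sketch: the helper name should_attach_tracing_metadata is hypothetical and not part of this PR, and the three checks are copied from the diff above.

```python
from typing import Optional


def should_attach_tracing_metadata(model_base_url: Optional[str]) -> bool:
    """Hypothetical helper mirroring the condition in the diff above."""
    if not model_base_url:
        return False
    return (
        "tracing.fireworks.ai" in model_base_url
        or model_base_url.startswith("http://localhost")
        # Added in this PR; used for dev testing according to the reply above.
        or "litellm-gateway" in model_base_url
    )
```

Each check is independent, so a URL only has to match one of them for the tracing metadata to be added.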
Mirrors the gateway-side r3_serializer change: the per-token matrix shape (num_moe_layers, top_k) is no longer required and is no longer written into the r3/v1 binary header. Per-token matrix byte size is recovered as matrix_byte_length / replayed_token_count.
- HEADER_FORMAT: "<4sBBBBIIHHIIQ" (36 bytes) -> "<4sBBBBIIIIQ" (32 bytes).
- Drop num_moe_layers/top_k from _parse_header() and the metadata dict returned by decompress_and_parse_r3().
- Compute matrix_elem_size from matrix_byte_length / replayed_token_count with a divisibility check that surfaces malformed payloads early.
- Update unit tests to use matrix_elem_size as the parameter and drop assertions on the removed header fields; round-trip test no longer passes num_moe_layers/top_k to RouterReplayData.
Co-authored-by: Cursor <cursoragent@cursor.com>
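The header sizes and the divisibility check described in this commit can be sanity-checked with a short sketch. The two format strings are the ones quoted above; the helper per_token_matrix_size, its handling of an empty payload, and its error messages are assumptions for illustration, not the code in this PR.

```python
import struct

# Format strings as quoted in the commit message ("<" = little-endian, no padding).
OLD_HEADER_FORMAT = "<4sBBBBIIHHIIQ"
NEW_HEADER_FORMAT = "<4sBBBBIIIIQ"

old_size = struct.calcsize(OLD_HEADER_FORMAT)
new_size = struct.calcsize(NEW_HEADER_FORMAT)


def per_token_matrix_size(matrix_byte_length: int, replayed_token_count: int) -> int:
    """Hypothetical helper: bytes of routing data per replayed token."""
    if replayed_token_count < 0 or matrix_byte_length < 0:
        raise ValueError("lengths and counts must be non-negative")
    if replayed_token_count == 0:
        # Assumption: an empty replay is only valid with an empty matrix.
        if matrix_byte_length != 0:
            raise ValueError("matrix bytes present but no replayed tokens")
        return 0
    size, remainder = divmod(matrix_byte_length, replayed_token_count)
    if remainder:
        raise ValueError(
            f"matrix_byte_length={matrix_byte_length} is not divisible by "
            f"replayed_token_count={replayed_token_count}"
        )
    return size
```

Dropping the two H (uint16) fields accounts for the 4-byte difference between the two header sizes.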
ZstdCompressor.compress() (used by the gateway-side r3_serializer) embeds the uncompressed size in the frame header, so passing max_output_size=len(compressed)*20 was both unnecessary and incorrect: highly compressible router-replay payloads (e.g. tokens routing to a small subset of experts) routinely exceed a 20:1 ratio, and would have failed deserialization with ZstdError. Removing the cap lets the library auto-allocate from the embedded content size. Verified locally: a 64 KiB zero-filled matrix payload compresses to ~35 bytes (>1800x ratio) and now deserializes cleanly. Adds a regression test covering the high-compression case.
Co-authored-by: Cursor <cursoragent@cursor.com>
_RoutingDtype(int) and _SelectorMode(int) raise ValueError for any value not in the enum, so the .get() fallback was unreachable: a future routing_dtype=3 in the header would crash metadata construction before str(int) could run. Look up names by raw int instead: IntEnum keys hash-equal their int values, so known modes resolve to their lowercase name and unknown ones fall back to str(int) without ever constructing the enum. Adds a regression test exercising routing_dtype=99.
Co-authored-by: Cursor <cursoragent@cursor.com>
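The lookup-by-raw-int pattern from this commit can be illustrated with a small sketch. The enum RoutingDtype, its member values, and the helper dtype_name are hypothetical stand-ins; the actual _RoutingDtype members in the PR are not shown in this conversation.

```python
from enum import IntEnum


class RoutingDtype(IntEnum):
    # Hypothetical members and values, for illustration only.
    UINT8 = 1
    UINT16 = 2


# Keys are enum members; because IntEnum members hash and compare equal to
# their int values, a plain int can be used to look them up.
_DTYPE_NAMES = {member: member.name.lower() for member in RoutingDtype}


def dtype_name(raw: int) -> str:
    """Return the lowercase name for known values, else the int as a string."""
    return _DTYPE_NAMES.get(raw, str(raw))


def dtype_name_via_enum(raw: int) -> str:
    """The pattern the commit replaces: the constructor raises before .get()."""
    return _DTYPE_NAMES.get(RoutingDtype(raw), str(raw))
```

With an unknown value such as 99, dtype_name falls back to the string form, while dtype_name_via_enum raises ValueError from the enum constructor before the fallback can apply.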
decompress_and_parse_r3 now derives matrix_elem_size from matrix_byte_length / replayed_token_count, so the dtype's per-element byte width is no longer referenced anywhere. Removing dead code.
Co-authored-by: Cursor <cursoragent@cursor.com>
6be8056 to 6639c01
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit f1e393d.

Summary
- Adds an r3_deserializer module that decompresses and unpacks R3/v1 binary router-replay payloads (base64-encoded, zstd-compressed) into per-token routing matrices. Supports ALL, SUFFIX, and BITMAP selector modes with uint8/uint16 dtypes.
- Threads an include_payloads parameter through FireworksTracingAdapter.get_evaluation_rows(), RemoteRolloutProcessor, DataLoaderConfig, and update_row_with_remote_trace() so callers can opt in to fetching and extracting router replay data from traces.
- When include_payloads=True, convert_trace_dict_to_evaluation_row automatically decompresses any payloads.router_replay.data blob and attaches routing_matrices and routing_metadata to execution_metadata.extra.
- Adds zstandard>=0.19.0 as a dependency.

Test plan
- tests/adapters/test_r3_deserializer.py covering:
  - round trip with r3_serializer (skips if serializer not available)
  - convert_trace_dict_to_evaluation_row (with payload, without, empty data)
Note
Medium Risk
Adds zstd decompression and binary parsing of trace payloads (potentially large/untrusted data) and threads a new opt-in flag through rollout/tracing code paths, which could impact performance and error handling when enabled.
Overview
Adds opt-in router replay extraction from Fireworks/Langfuse traces. A new include_payloads flag is threaded through FireworksTracingAdapter.get_evaluation_rows(), remote rollout processing, and DataLoaderConfig, so callers can request trace payloads from the gateway.

When payloads are present, convert_trace_dict_to_evaluation_row now attempts to decompress and deserialize payloads.router_replay.data (R3/v1) and attaches routing_matrices plus routing_metadata onto execution_metadata.extra.

Introduces a new adapters/r3_deserializer.py implementing the R3/v1 zstd+base64 binary format (ALL/SUFFIX/BITMAP selectors; uint8/uint16) with comprehensive unit and integration tests, and adds the zstandard dependency.