Skip to content

feat(taskbroker): Dual-write new parameters_bytes#602

Merged
untitaker merged 5 commits intomainfrom
taskbroker-parameter-bytes
Apr 20, 2026
Merged

feat(taskbroker): Dual-write new parameters_bytes#602
untitaker merged 5 commits intomainfrom
taskbroker-parameter-bytes

Conversation

@untitaker
Copy link
Copy Markdown
Member

@untitaker untitaker commented Apr 20, 2026

Introduces the new parameters_bytes field on TaskActivation so tasks can
carry raw bytes via msgpack, and adds the worker-side reader that prefers
parameters_bytes with a fallback to the legacy JSON parameters field.

The producer dual-writes both fields by default so this rolls out in a
single commit regardless of worker/broker upgrade order. This, I think,
is a good default for self-hosted, as that effectively makes workers and
clients forwards and backwards-compatible, and we can delete parameters
one month from now.

In prod, I think this will however bloat activations too much, particularly
if those are already at the brink of being too large for kafka.
So the actual rollout plan there would be:

  1. set TASKBROKER_CLIENT_PARAMETERS_FORMAT=json envvar for all getsentry and launchpad pods
  2. roll out this PR everywhere
  3. set TASKBROKER_CLIENT_PARAMETERS_FORMAT=msgpack a few days after

ref https://linear.app/getsentry/issue/STREAM-882

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

…ameters (JSON)

Introduces the new parameters_bytes field on TaskActivation so tasks can
carry raw bytes via msgpack, and adds the worker-side reader that prefers
parameters_bytes with a fallback to the legacy JSON parameters field.

The producer dual-writes both fields by default so this rolls out in a
single commit regardless of worker/broker upgrade order. The
TASKBROKER_CLIENT_PARAMETERS_FORMAT env var (both|json|msgpack) narrows
this once everything is on the new reader.

First step toward STREAM-882 (taskbroker passthrough mode for arbitrary
Kafka topics): https://linear.app/getsentry/issue/STREAM-882

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@linear-code
Copy link
Copy Markdown

linear-code Bot commented Apr 20, 2026

…bytes

# Conflicts:
#	clients/python/tests/worker/test_worker.py
@untitaker untitaker marked this pull request as ready for review April 20, 2026 12:27
@untitaker untitaker requested a review from a team as a code owner April 20, 2026 12:27
@untitaker untitaker changed the title feat(taskbroker): Dual-write parameters_bytes (msgpack) alongside parameters (JSON) feat(taskbroker): Dual-write new parameters_bytes Apr 20, 2026
Comment thread clients/python/src/taskbroker_client/task.py
untitaker and others added 2 commits April 20, 2026 14:33
- upkeep.rs: move #[allow(deprecated)] from individual rstest-expanded
  tests (where it gets swallowed by the macro expansion) to the test
  module as a whole.
- task.py: black formatting.
- pyproject.toml: tell mypy to ignore missing msgpack stubs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the module-wide allow with three narrow block-scoped allows
exactly where the tests touch the legacy parameters field, so we keep
deprecation warnings active for everything else in the test module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread clients/python/src/taskbroker_client/task.py

should_compress = (
self.compression_type == CompressionType.ZSTD
or (len(msgpack_bytes) or len(json_bytes or b""))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want the sum of the bytes in both mode?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that makes sense.

"taskname": self.name,
"topic": self._namespace.topic,
},
len(parameters_bytes_val) or len(parameters_str),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The parameter size metric len(parameters_bytes_val) or len(parameters_str) incorrectly reports only the size of parameters_bytes_val in BOTH mode, ignoring parameters_str.
Severity: MEDIUM

Suggested Fix

The metric calculation should be changed to sum the lengths of both fields to accurately reflect the total size. Modify the line to be len(parameters_bytes_val) + len(parameters_str).

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: clients/python/src/taskbroker_client/task.py#L256

Potential issue: When `parameters_format` is `BOTH` (the default), both `msgpack` and
`json` parameter representations are sent. The metric intended to track the size of
these parameters is calculated with `len(parameters_bytes_val) or len(parameters_str)`.
Because `parameters_bytes_val` is non-empty in this mode, Python's `or` operator causes
the expression to always evaluate to `len(parameters_bytes_val)`, completely ignoring
the size of the `parameters_str` field. This results in the metric underreporting the
actual wire size by approximately 50%, which undermines the stated goal of monitoring
for message bloat.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that would be more correct, but I think changing this would cause too many artifacts in metrics, for no obvious gain.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, metrics will spike up when the both mode is used. I think what you have is fine as we mostly use this metric to understand the shape production traffic.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit f549b2d. Configure here.

self.compression_type == CompressionType.ZSTD
or (len(msgpack_bytes) + len(json_bytes or b""))
> MAX_PARAMETER_BYTES_BEFORE_COMPRESSION
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-compression threshold effectively halved in default mode

Medium Severity

In the default BOTH mode, should_compress checks len(msgpack_bytes) + len(json_bytes or b"") against MAX_PARAMETER_BYTES_BEFORE_COMPRESSION (3 MB). Since both serializations are roughly the same size, the sum is ~2× a single field, so compression now triggers at ~1.5 MB per field instead of the original 3 MB. This changes the auto-compression behavior for every task using the default ParametersFormat.BOTH, adding compression CPU overhead for payloads that previously didn't need it. The threshold likely needs to compare each field individually, or use max() instead of +.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit f549b2d. Configure here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're trying to bring down the overall size of the activation so I think the new version makes more sense. on that note, I think we should just unconditionally compress everything, I think zstd is probably fast enough.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, compression being enabled earlier is desirable as we want to keep the overall size of activations low.

Copy link
Copy Markdown
Member

@markstory markstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

self.compression_type == CompressionType.ZSTD
or (len(msgpack_bytes) + len(json_bytes or b""))
> MAX_PARAMETER_BYTES_BEFORE_COMPRESSION
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, compression being enabled earlier is desirable as we want to keep the overall size of activations low.

"taskname": self.name,
"topic": self._namespace.topic,
},
len(parameters_bytes_val) or len(parameters_str),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, metrics will spike up when the both mode is used. I think what you have is fine as we mostly use this metric to understand the shape production traffic.

@untitaker untitaker merged commit 6637d7f into main Apr 20, 2026
23 checks passed
@untitaker untitaker deleted the taskbroker-parameter-bytes branch April 20, 2026 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants