Skip to content

feat(taskbroker): Support multiple topics#668

Merged
untitaker merged 10 commits into
mainfrom
multi-topic-impl
Jun 3, 2026
Merged

feat(taskbroker): Support multiple topics#668
untitaker merged 10 commits into
mainfrom
multi-topic-impl

Conversation

@untitaker
Copy link
Copy Markdown
Member

@untitaker untitaker commented Jun 3, 2026

Continuing from #663, this is the bare minimum for multi-topic to be useful. We spawn one consumer per topic, and then each consumer has its own pipeline, activationbatcher, but shares an activationstore.

What is deliberately left out of this PR and left for follow-up:

  • Postgres support is entirely unimplemented. We could add a topic column now, but actually due to other reasons (slicing) we're reconsidering whether even partition should even be there. The main reason partition exists is to avoid (row-level?) lock contention when many brokers share an alloydb instance. This, however, makes draining and migration of topics more complicated, so we are considering using another "sharding key" in pg entirely.
  • The batching is still per-topic. An activationbatcher is created per-consumer. This is not what we want in the long-term, because postgres perf hinges on that. but without postgres support, it doesn't make sense to do it in this PR
  • Metrics are not fully fixed yet. I fixed everything but realistically there will be huge churn for our dashboards.

Other things fixed:

  • bugfix: Raw mode was not properly wired up, the legacy fields were used exclusively.
  • bugfix: kafka_retry_topic must be set explicitly when a consumed topic uses raw mode. before it was too easy to accidentally send retries back to the (raw) main topic (and indeed devenv has that problem right now)

This, to me, is the bare minimum that is useful for consolidating existing low-traffic/low-cost pools. I want to use this PR as-is to achieve that and come back to the other points async.

ref STREAM-1042
ref /STREAM-1096

@untitaker untitaker changed the title multi topic impl feat(taskbroker): Support multiple topics Jun 3, 2026
@linear-code
Copy link
Copy Markdown

linear-code Bot commented Jun 3, 2026

STREAM-1042

@untitaker untitaker marked this pull request as ready for review June 3, 2026 11:19
@untitaker untitaker requested a review from a team as a code owner June 3, 2026 11:20
Comment thread src/kafka/deserialize_raw.rs
untitaker and others added 2 commits June 3, 2026 16:34
With one consumer per topic, the consumer/pipeline metrics raced (gauges)
or merged across topics (counters/histograms). Add a `topic` tag to the
rebalance gauges/counters, the activation writer and batcher metrics, and
the deserialize payload-size histograms. Store-level metrics are left
untagged. Also demote sqlite's per-consumer assign_partitions warn to debug.

ref STREAM-1042

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A topic with raw mode but missing namespace/application/taskname/
processing_deadline_duration passed config validation and only panicked
later when the consumer built its deserializer (via .expect()). Validate
completeness (and that the application is in worker_map, and the retry
topic differs from the raw topic) in normalize_and_validate so it's a
clean config error instead.

ref STREAM-1042

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread src/upkeep.rs Outdated
The forwarding producer authenticates against the deadletter cluster and
only overrides bootstrap.servers, so the deadletter cluster is the only
target where its credentials reliably work. The consumer-side batcher and
the upkeep path disagreed on the default forward cluster when
demoted_topic_cluster is unset: the batcher used each topic's own cluster
while upkeep used the deadletter cluster for multi-topic. Since the
demoted topic is a single global topic, default both to the deadletter
cluster. Unchanged for legacy single-topic, where the deadletter cluster
address defaults to the consumed cluster's address.

ref STREAM-1042

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread src/kafka/activation_batcher.rs
Follow-up to the forward-cluster fix: the upkeep comment overstated that
legacy behavior is "unchanged" (it only is when kafka_deadletter_cluster
is unset), and the demoted_topic_cluster doc still claimed it defaults to
the consumed cluster. Correct both to say it defaults to the deadletter
cluster.

ref STREAM-1042

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 2724130. Configure here.

};
metrics::counter!(
"consumer.inflight_activation_writer.backpressure",
"topic" => self.config.topic.clone(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shared store backpressure race

Medium Severity

With multiple consumable topics, each topic pipeline has its own ActivationWriter on the same store, but flush still decides backpressure from a single count_depths snapshot with no coordination. Two writers can both pass the pending, delay, processing, or DB size checks and insert together, so shared max_* limits can be exceeded under concurrent consumption.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 2724130. Configure here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the kind of thing i'd like to scope into a followup pr. the current way we manage db connections (proportional to the amount of topics we connect to) is fundamentally leading to bad performance. in my mind the fix is to reuse the activationwriter and -batcher across topics, but this seemed like a more invasive change as it would require me to break them out of the consumer-specific pipeline.

reusing those services across topics would lower the amount of db transactions and also fix this race.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this. Also, there isn't a "hard" cap on the DB size, we just want to make sure the DB can't grow unbounded. So this race might let one extra write in, but then things will still backpressure after that.

Copy link
Copy Markdown
Member

@evanh evanh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that in the future we should have ActivationBatcher and ActivationWriter be standalone components that are shared across all consumers.

};
metrics::counter!(
"consumer.inflight_activation_writer.backpressure",
"topic" => self.config.topic.clone(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this. Also, there isn't a "hard" cap on the DB size, we just want to make sure the DB can't grow unbounded. So this race might let one extra write in, but then things will still backpressure after that.

@untitaker untitaker merged commit c58a161 into main Jun 3, 2026
26 checks passed
@untitaker untitaker deleted the multi-topic-impl branch June 3, 2026 15:53
untitaker added a commit to getsentry/sentry that referenced this pull request Jun 3, 2026
…116758)

See getsentry/taskbroker#667 for details.

Same kafka config migration as the self-hosted change, but for the
`ingest-profiles` taskbroker in devservices. Deprecation warnings land
as per getsentry/taskbroker#663

`ingest-profiles` runs in raw mode, and raw mode now requires an
explicit retry topic: raw payloads aren't activations, so retries can't
loop back into the `profiles` topic. Retries now go to the main
`taskworker` topic so the existing taskbroker picks them up instead of
running another broker.

Depends on getsentry/taskbroker#668, which wires
up per-topic raw mode and enforces the retry-topic requirement. Once
that ships, the current devenv config breaks without this change.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants