RFC: High-rate runtime telemetry and status-write offload path #78
lanycrost started this conversation in Feature Requests.
Summary
High-rate runtime events currently flow directly into Kubernetes status updates while Bobravoz also emits hot-path, high-cardinality metrics. The platform needs an explicit boundary between durable workflow state, operator-facing summaries, and operational telemetry before ingress/runtime workloads scale further.
Why this needs a discussion
This is bigger than a single SDK optimization.
bobrapet already makes StepRun.status a bounded summary surface, while the SDK and Bobravoz still push high-rate activity directly into CRD status and default metrics, respectively. That is an ecosystem contract problem, not just a local performance bug.
Current evidence
- bubu-sdk-go/signals.go, effects.go, and k8s/impulse_stats.go patch StepRun or Impulse status on each runtime event.
- bubu-sdk-go#66 already tracks the impulse-trigger-stats slice of this write-amplification problem.
- bobravoz-grpc/pkg/metrics/hub_metrics.go and connector_metrics.go use high-cardinality labels, such as StoryRun and binding identifiers, on hot-path metrics.
- bobrapet already gives us controller-owned status on StoryTrigger, while StepRun.status.signals/signalEvents/effects are explicitly bounded, small surfaces rather than a scalable event log.
Proposed decisions
StepRun.status and Impulse.status are summary surfaces only. They are not the sink for high-rate runtime telemetry.
Cross-repo scope
- bobrapet: owns the CRD/status boundaries, plus any future history/outbox resource or controller-facing summary contract.
- bubu-sdk-go: must batch or offload runtime events instead of patching status on every signal, effect, or trigger.
- bobravoz-grpc: must narrow default metric cardinality and align ingress/runtime telemetry with the same boundary.
Desired outcome
A cross-repo decision on runtime telemetry boundaries, so that follow-up issues can split cleanly across bubu-sdk-go, bobrapet, and Bobravoz without re-arguing the scaling model in each repo.
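For the Bobravoz follow-up, narrowing default cardinality could be as simple as applying an allowlist of low-cardinality labels before a sample is recorded. The sketch below is library-agnostic. Its names (`DefaultMetricLabels`, `BoundLabels`, `SeriesKey`) and the label keys are hypothetical, not the labels hub_metrics.go or connector_metrics.go define today.

```go
package main

import (
	"sort"
	"strings"
)

// DefaultMetricLabels is a hypothetical allowlist for hot-path metrics. Each
// label takes values from a small, fixed domain, never per-run identifiers.
var DefaultMetricLabels = map[string]bool{
	"namespace": true,
	"direction": true,
	"result":    true,
}

// BoundLabels keeps only allowlisted labels, dropping high-cardinality ones
// such as a StoryRun name or a binding identifier.
func BoundLabels(labels map[string]string, allow map[string]bool) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		if allow[k] {
			out[k] = v
		}
	}
	return out
}

// SeriesKey renders a label set in a deterministic (sorted) order. Each
// distinct key corresponds to one time series in a Prometheus-style backend.
func SeriesKey(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		b.WriteString(k)
		b.WriteByte('=')
		b.WriteString(labels[k])
		b.WriteByte(';')
	}
	return b.String()
}
```

With this filter, 100 distinct StoryRun values produce 100 series when the StoryRun label is kept, but only one series after `BoundLabels`. Per-run identifiers could then move to logs, traces, or an opt-in debug metric set. Which of those options to adopt is part of the boundary decision above.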