feat: flush blocks to backend storage from livestore in monolithic mode #6941
javiermolinar merged 25 commits into main
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
This PR updates Tempo’s single-binary (monolithic) deployment path so that livestore can flush locally completed blocks directly to the configured storage backend, eliminating the need for Kafka ingest and the block-builder in that mode.
Changes:
- Introduces a `completeBlockLifecycle` abstraction with a local-mode implementation that flushes completed blocks asynchronously via a provided `WriteBlock` flusher.
- Updates single-binary wiring so the Distributor pushes spans in-process (no Kafka) and LiveStore receives a storage-backed flusher.
- Updates integration harness/config overlays and examples to remove Kafka/block-builder dependencies in single-binary scenarios, and adds an integration test for flush-to-backend.
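
The first bullet above can be illustrated with a minimal sketch. It is not the PR's code: the method names `start`, `stop`, and `onBlockCompleted`, the types `localLifecycle`, `blockFlusher`, and `memoryFlusher`, and the helper `flushAll` are all assumptions made for illustration, and the `WriteBlock` signature here is simplified to take string IDs rather than a real block.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

var errQueueFull = errors.New("flush queue is full")

// blockFlusher is a hypothetical, simplified stand-in for the storage-backed
// flusher handed to the livestore in single-binary mode.
type blockFlusher interface {
	WriteBlock(ctx context.Context, tenantID, blockID string) error
}

// completeBlockLifecycle (method set assumed) decides what happens to a block
// once it has been completed locally.
type completeBlockLifecycle interface {
	start(ctx context.Context)
	stop()
	onBlockCompleted(tenantID, blockID string) error
}

type flushOp struct{ tenantID, blockID string }

// localLifecycle owns its own queue and a single worker goroutine, so the
// caller never blocks on backend writes.
type localLifecycle struct {
	flusher blockFlusher
	queue   chan flushOp
	wg      sync.WaitGroup
}

var _ completeBlockLifecycle = (*localLifecycle)(nil)

func newLocalLifecycle(f blockFlusher, queueSize int) *localLifecycle {
	return &localLifecycle{flusher: f, queue: make(chan flushOp, queueSize)}
}

// start launches the worker that drains the queue until stop closes it.
func (l *localLifecycle) start(ctx context.Context) {
	l.wg.Add(1)
	go func() {
		defer l.wg.Done()
		for op := range l.queue {
			if err := l.flusher.WriteBlock(ctx, op.tenantID, op.blockID); err != nil {
				fmt.Println("flush failed:", op.blockID, err)
			}
		}
	}()
}

// onBlockCompleted enqueues without blocking; a full queue is reported so the
// caller can retry later.
func (l *localLifecycle) onBlockCompleted(tenantID, blockID string) error {
	select {
	case l.queue <- flushOp{tenantID, blockID}:
		return nil
	default:
		return errQueueFull
	}
}

// stop closes the queue and waits for pending flushes. Callers must not
// enqueue after stop.
func (l *localLifecycle) stop() {
	close(l.queue)
	l.wg.Wait()
}

// memoryFlusher records writes in memory instead of a real backend.
type memoryFlusher struct {
	mu      sync.Mutex
	written []string
}

func (m *memoryFlusher) WriteBlock(_ context.Context, tenantID, blockID string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.written = append(m.written, tenantID+"/"+blockID)
	return nil
}

// flushAll runs the full start, enqueue, stop cycle and returns what was written.
func flushAll(tenantID string, blockIDs []string) []string {
	f := &memoryFlusher{}
	lc := newLocalLifecycle(f, len(blockIDs))
	lc.start(context.Background())
	for _, id := range blockIDs {
		if err := lc.onBlockCompleted(tenantID, id); err != nil {
			fmt.Println("enqueue failed:", id, err)
		}
	}
	lc.stop()
	return f.written
}

func main() {
	fmt.Println(flushAll("tenant-a", []string{"block-1", "block-2"}))
}
```

Keeping the queue and worker behind an interface means a non-local mode could plug in a different implementation without touching the completion path; whether the PR structures it exactly this way is not confirmed by this page.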
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| modules/livestore/complete_block_lifecycle.go | Adds lifecycle abstraction and local-mode flush queue/worker implementation with metrics. |
| modules/livestore/live_store.go | Extends constructor to accept a flusher and wires lifecycle into LiveStore/instances. |
| modules/livestore/live_store_background.go | Hooks lifecycle start/stop and invokes lifecycle on completion/reload paths. |
| modules/livestore/instance.go | Returns completed block from completeBlock, adds lookup helper, and uses lifecycle for deletion decisions. |
| modules/livestore/*_test.go | Updates tests for new APIs and adds focused lifecycle unit tests. |
| cmd/tempo/app/modules.go | Single-binary wiring: disable Kafka push, pass store as flusher, adjust module DAG (remove BlockBuilder). |
| cmd/tempo/app/modules_test.go | Adds local tempodb-backed store setup for single-binary LiveStore init tests. |
| modules/distributor/distributor.go | Makes push path exclusive: Kafka or local consumers, not both. |
| modules/distributor/distributor_test.go | Removes test that depended on “Kafka then local” dual-push behavior. |
| integration/util/harness.go | Avoids starting Kafka/block-builder components in single-binary mode. |
| integration/util/harness_config_overlay.go | Treats null overlay values as deletions during config merge. |
| integration/util/config-single-binary.yaml | Removes ingest and block_builder sections via null overlay. |
| integration/operations/single_binary_flush_test.go | Adds integration coverage for ingest + flush-to-backend in single-binary mode. |
| integration/operations/config-single-binary-flush.yaml | Test overlay to speed up flush cadence. |
| example/docker-compose/** | Removes Kafka/Redpanda wiring from single-binary-focused compose examples/configs. |
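
The `harness_config_overlay.go` row describes treating null overlay values as deletions during config merge. A minimal sketch of that idea follows, assuming the config has already been decoded into nested `map[string]any` values; the function name `mergeOverlay` and the sample keys are hypothetical and not taken from the PR.

```go
package main

import "fmt"

// mergeOverlay merges overlay into base and returns base.
// A nil overlay value deletes the key, nested maps are merged recursively,
// and any other value replaces the base value.
func mergeOverlay(base, overlay map[string]any) map[string]any {
	if base == nil {
		base = map[string]any{}
	}
	for key, value := range overlay {
		if value == nil {
			// A YAML null in the overlay means "remove this section".
			delete(base, key)
			continue
		}
		overlayMap, overlayIsMap := value.(map[string]any)
		baseMap, baseIsMap := base[key].(map[string]any)
		if overlayIsMap && baseIsMap {
			base[key] = mergeOverlay(baseMap, overlayMap)
			continue
		}
		base[key] = value
	}
	return base
}

func main() {
	base := map[string]any{
		"ingest":        map[string]any{"enabled": true},
		"block_builder": map[string]any{"consume_cycle_duration": "5s"},
		"server":        map[string]any{"http_listen_port": 3200},
	}
	// Overlay in the spirit of config-single-binary.yaml: null out two sections.
	overlay := map[string]any{"ingest": nil, "block_builder": nil}
	fmt.Println(mergeOverlay(base, overlay))
}
```

With this rule an overlay can drop a whole section such as `ingest` without having to restate the rest of the base config.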

Review comment on `func (s *LiveStore) processCompleteOp(op *completeOp) error` (hunk `@@ -151,8 +173,6 @@`):

    _ = level.Error(s.logger).Log("msg", "failed to requeue block for flushing", "tenant", op.tenantID, "block", op.blockID, "err", err)

The error log inside `retryCompleteOp` says "failed to requeue block for flushing", but this path is requeuing a completion op (and now may also retry lifecycle application). Could we update the log message to reflect the operation being requeued? This will make on-call debugging clearer now that there is a separate local flush lifecycle.

Suggested change:

    - _ = level.Error(s.logger).Log("msg", "failed to requeue block for flushing", "tenant", op.tenantID, "block", op.blockID, "err", err)
    + _ = level.Error(s.logger).Log("msg", "failed to requeue block completion", "tenant", op.tenantID, "block", op.blockID, "err", err)

What this PR does:
It makes the livestore flush blocks to backend storage in monolithic mode.
This is the last step toward a Kafka-less single binary.
With this PR, the single binary no longer requires Kafka or the block-builders.
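
As a rough illustration of what dropping Kafka means for the distributor, the file summary describes the push path as exclusive: Kafka or local consumers, not both. The sketch below shows that branching only; the `pusher` type, its fields, and the function signatures are hypothetical and do not mirror the real distributor.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoKafkaWriter = errors.New("kafka push enabled but no writer configured")

// pushFunc is a hypothetical signature for anything that accepts spans.
type pushFunc func(tenantID string, spans []string) error

// pusher chooses exactly one destination per push.
type pusher struct {
	kafkaEnabled   bool
	kafkaWrite     pushFunc
	localConsumers []pushFunc
}

// push sends spans either to Kafka or to the in-process consumers, never both.
func (p *pusher) push(tenantID string, spans []string) error {
	if p.kafkaEnabled {
		if p.kafkaWrite == nil {
			return errNoKafkaWriter
		}
		return p.kafkaWrite(tenantID, spans)
	}
	for _, consume := range p.localConsumers {
		if err := consume(tenantID, spans); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Single-binary style wiring: Kafka off, one in-process consumer.
	p := &pusher{
		localConsumers: []pushFunc{
			func(tenantID string, spans []string) error {
				fmt.Println("local consumer received", len(spans), "spans for", tenantID)
				return nil
			},
		},
	}
	if err := p.push("tenant-a", []string{"span-1", "span-2"}); err != nil {
		fmt.Println("push failed:", err)
	}
}
```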
One of the requirements was to encapsulate all the flushing logic so that it does not add more complexity to the livestore. To do so, I created `completeBlockLifecycle`. This struct holds its own queue and logic. There are three insertion points in `live_store_background`, but they are minimal.
Example of the new metrics. Notice the use of `local` in the metric name to indicate that this mode is only possible for the single binary.
Which issue(s) this PR fixes:
Fixes #
Checklist
`CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`