
[Rhythm] Full RF1 read path #4857

Merged
mapno merged 13 commits into grafana:main from mapno:rhythm-rf1-read-path
Mar 24, 2025

Conversation

Contributor

@mapno mapno commented Mar 14, 2025

What this PR does:

Updates #4478 to the current state of main. Original text:

These are the changes needed to migrate the backend read path to RF1. It adds a new config option, rf1_after, a timestamp after which only blocks with rf == 1 are included in searches and trace-by-ID lookups, and an option to discontinue flushes to object storage from the ingesters.

How each API is handled:

Search: the frontend determines the blocks:

query_frontend:
  search:
    rf1_after: "2024-12-18T00:00:00Z"

Trace lookup: the querier determines the blocks:

querier:
  trace_by_id:
    rf1_after: "2024-12-18T00:00:00Z"

Tags: this is actually not filtering on replication factor, so no changes needed
Metrics: already limited to rf1 blocks
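
For illustration, here is a rough sketch of the block-selection predicate that the rf1_after cutoff implies. This is not the PR's actual code: the helper name and the exact BlockMeta field comparisons are assumptions, loosely based on the config above and the blockMetasForSearch diff further down.

import (
	"time"

	"github.com/grafana/tempo/tempodb/backend"
)

// rf1BlockFilter is a hypothetical helper: before the cutoff (or when the
// cutoff is unset) keep the ingester-flushed blocks; after it, keep only
// blocks written by the RF1 (block-builder) path.
func rf1BlockFilter(rf1After time.Time) func(m *backend.BlockMeta) bool {
	return func(m *backend.BlockMeta) bool {
		if rf1After.IsZero() || m.StartTime.Before(rf1After) {
			return m.ReplicationFactor == backend.DefaultReplicationFactor
		}
		return m.ReplicationFactor == 1
	}
}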

The migration path:

  • Enable Kafka ingest, block-builder, etc. It is now flushing RF1 blocks.
  • Grace period for monitoring and validation
  • Set rf1_after to move the read path over to the new RF1 blocks (recent searches/tags/trace lookups are still directed at ingesters)
  • Grace period for monitoring and validation
  • Disable flushes on ingesters

Note: Global config is non-trivial due to Tempo's modular setup, so this adds redundant config in multiple places. Not ideal.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@mapno mapno marked this pull request as ready for review March 17, 2025 14:00
@mapno mapno changed the title from WIP: [Rhythm] Full RF1 read path to [Rhythm] Full RF1 read path Mar 17, 2025
mapno added 2 commits March 17, 2025 18:01
Introduced a new `RF1After` field in the search request to allow filtering blocks by replication factor and time. Updated backend logic and tests to handle this functionality. Adjusted max search duration to enhance usability and clarity in search tests.

Signed-off-by: Mario <mariorvinas@gmail.com>
Signed-off-by: Mario <mariorvinas@gmail.com>

type TraceByIDConfig struct {
	QueryTimeout time.Duration `yaml:"query_timeout"`
	RF1After     time.Time     `yaml:"rf1_after"`
}
Contributor

Rethinking the separate settings for search and trace by id rf1. I don't think there is a case to enable them separately - but I'm remembering now this was done because we don't have a great way to share a config option between frontend and querier modules (frontend has knowledge of blocks during search, and querier has knowledge of blocks for trace by id). Can you think of a better approach?

Contributor Author

Now rf1After is passed via query params to the querier.
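
Roughly, that handoff could look like the sketch below; the parameter name, helper names, and RFC 3339 encoding are assumptions, not necessarily what the PR uses.

import (
	"net/http"
	"time"
)

// setRF1After (hypothetical name): the frontend attaches the cutoff to the
// querier request as a query parameter.
func setRF1After(req *http.Request, rf1After time.Time) {
	q := req.URL.Query()
	q.Set("rf1After", rf1After.Format(time.RFC3339))
	req.URL.RawQuery = q.Encode()
}

// parseRF1After (hypothetical name): the querier reads the cutoff back; an
// absent or unparsable value means the cutoff is not set.
func parseRF1After(r *http.Request) time.Time {
	if v := r.URL.Query().Get("rf1After"); v != "" {
		if t, err := time.Parse(time.RFC3339, v); err == nil {
			return t
		}
	}
	return time.Time{}
}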

mapno added 3 commits March 20, 2025 11:00
Added RF1After field to trace query validation logic and related structures. Updated request parsing, test cases, and query configuration to support RF1After. Removed deprecated RF1After config from querier module, ensuring runtime determination.

Signed-off-by: Mario <mariorvinas@gmail.com>
	endT := time.Unix(int64(end), 0)
-	blocks := blockMetasForSearch(s.reader.BlockMetas(tenantID), startT, endT, backend.DefaultReplicationFactor)
+	blocks := blockMetasForSearch(s.reader.BlockMetas(tenantID), startT, endT, func(m *backend.BlockMeta) bool {
+		return m.ReplicationFactor == backend.DefaultReplicationFactor
+	})
Contributor

Can we not use the same logic here as in the search sharder?

Contributor Author

Actually, we need to. Otherwise these requests will fail when we stop flushing ingester blocks. Great catch!
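
In other words, the trace-by-ID call site would reuse the sharder's rf1After-aware predicate rather than hard-coding backend.DefaultReplicationFactor. A sketch in the style of the diff above, where rf1BlockFilter is a hypothetical shared helper, not the PR's actual function name:

// Sketch only: share one rf1After-aware predicate between the search sharder
// and the trace-by-ID path.
blocks := blockMetasForSearch(s.reader.BlockMetas(tenantID), startT, endT, rf1BlockFilter(rf1After))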

mapno added 2 commits March 20, 2025 11:49
Signed-off-by: Mario <mariorvinas@gmail.com>

// validate start and end parameter
-	_, _, _, _, _, reqErr := api.ValidateAndSanitizeRequest(req)
+	_, _, _, _, _, _, reqErr := api.ValidateAndSanitizeRequest(req)
Contributor

This function signature is getting lengthy. Do we want to introduce a TraceByIDRequest object and use it as the output from api.ParseTraceID(req)?

Contributor Author

Yes, I thought of that as well. What didn't convince me is that it would also contain extra params, start and end, that are not part of TraceByIDRequest, and it requires extra changes.

I don't have a strong opinion, but initially favored introducing only the necessary changes.
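
For context, the suggested (but not adopted) request object would bundle the values that api.ValidateAndSanitizeRequest currently returns individually, roughly like the sketch below; the struct name, field names, and types are assumptions.

// traceByIDReq is a hypothetical bundling of the parsed parameters; the PR
// kept the individual return values instead.
type traceByIDReq struct {
	traceID              []byte
	blockStart, blockEnd string
	queryMode            string
	start, end           int64 // the extra time-range params mentioned above
	rf1After             time.Time
}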

@mapno mapno merged commit 4e29421 into grafana:main Mar 24, 2025
15 checks passed