New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speedup startup time for TSI #9147
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like the simplest approach right now. 👍
tsdb/engine/tsm1/engine.go
Outdated
// can skip scanning all the TSM files. | ||
if e.index.Type() != inmem.IndexName && !e.fieldset.IsEmpty() { | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/and and/and/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
A general question though—looking through this PR and the previous work on the field set stuff, it seemed to me like a more natural location for NewMeasurementFieldSet
and the MeasurementFields
would be under tsdb/index
. The profobuf stuff would live under tsdb/index/internal
.
Shard
wouldn't access the MeasurementFields
directly, instead it could just call engine. CreateFieldIfNotExists
.
Just thought anyway.
repeated Tag Tags = 2; | ||
} | ||
|
||
message Tag { | ||
required string Key = 1; | ||
required string Value = 2; | ||
string Key = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an advantage to making these no longer required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These protobufs were dead code. I removed them since I was re-purposing the code.
tsdb/shard.go
Outdated
@@ -63,6 +66,13 @@ var ( | |||
// ErrShardDisabled is returned when a the shard is not available for | |||
// queries or writes. | |||
ErrShardDisabled = errors.New("shard is disabled") | |||
|
|||
// ErrUnknownFieldsFormat is returned when the fields index file is not identifiable by | |||
// the files magic number. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/files/file's
@e-dard Yeah, I held off of moving stuff around while @benbjohnson is still making changes to the index. I agree, moving them to |
This persists the field types in a shard to avoid having to scan all the TSM files at startup.
It's not used anymore.
e65bf76
to
279f82a
Compare
This fixes the startup time issue with TSI indexes. Startup time is slow because the field types for each measurement need to be reloaded at startup. These types are not part of the TSI index and the only way to find them is to scan all the TSM file indexes. Where cardinality is high or there are a lot of TSM files on disk, this can be slow.
This PR allows the field types to be persisted to a shard local
fields.idx
file that is updated when new fields are added/removed from a measurement. If the file does not exists, we fallback to the old method and scan all the TSM index. If the file exists, we just load this smaller file and skip scanning the index.This file only affects TSI startup times as we still need to re-populate the index with
inmem
which requires scanning all the TSM indexes.Local benchmarking of 32m series (32M values total) improved from about
1m6s
to2s
. Other cardinalities started up in about2s
as well. This is just with a single shard.Required for all non-trivial PRs