Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup startup time for TSI #9147

Merged
merged 6 commits into from Nov 22, 2017
Merged

Speedup startup time for TSI #9147

merged 6 commits into from Nov 22, 2017

Conversation

jwilder
Copy link
Contributor

@jwilder jwilder commented Nov 22, 2017

This fixes the startup time issue with TSI indexes. Startup time is slow because the field types for each measurement need to be reloaded at startup. These types are not part of the TSI index and the only way to find them is to scan all the TSM file indexes. Where cardinality is high or there are a lot of TSM files on disk, this can be slow.

This PR allows the field types to be persisted to a shard local fields.idx file that is updated when new fields are added/removed from a measurement. If the file does not exists, we fallback to the old method and scan all the TSM index. If the file exists, we just load this smaller file and skip scanning the index.

This file only affects TSI startup times as we still need to re-populate the index with inmem which requires scanning all the TSM indexes.

Local benchmarking of 32m series (32M values total) improved from about 1m6s to 2s. Other cardinalities started up in about 2s as well. This is just with a single shard.

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated
  • Sign CLA (if not already signed)

@jwilder jwilder added this to the 1.5.0 milestone Nov 22, 2017
@ghost ghost assigned jwilder Nov 22, 2017
@ghost ghost added the review label Nov 22, 2017
Copy link
Contributor

@benbjohnson benbjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the simplest approach right now. 👍

// can skip scanning all the TSM files.
if e.index.Type() != inmem.IndexName && !e.fieldset.IsEmpty() {
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/and and/and/

Copy link
Contributor

@e-dard e-dard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

A general question though—looking through this PR and the previous work on the field set stuff, it seemed to me like a more natural location for NewMeasurementFieldSet and the MeasurementFields would be under tsdb/index. The profobuf stuff would live under tsdb/index/internal.

Shard wouldn't access the MeasurementFields directly, instead it could just call engine. CreateFieldIfNotExists.

Just thought anyway.

repeated Tag Tags = 2;
}

message Tag {
required string Key = 1;
required string Value = 2;
string Key = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an advantage to making these no longer required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These protobufs were dead code. I removed them since I was re-purposing the code.

tsdb/shard.go Outdated
@@ -63,6 +66,13 @@ var (
// ErrShardDisabled is returned when a the shard is not available for
// queries or writes.
ErrShardDisabled = errors.New("shard is disabled")

// ErrUnknownFieldsFormat is returned when the fields index file is not identifiable by
// the files magic number.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/files/file's

@jwilder
Copy link
Contributor Author

jwilder commented Nov 22, 2017

@e-dard Yeah, I held off of moving stuff around while @benbjohnson is still making changes to the index. I agree, moving them to index makes sense.

@jwilder jwilder merged commit efe582e into master Nov 22, 2017
@ghost ghost removed the review label Nov 22, 2017
@jwilder jwilder deleted the jw-field-set branch November 22, 2017 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants