Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loki: Allow configuring query_store_max_look_back_period when running a filesystem store and boltdb-shipper #2073

Merged
merged 7 commits into from
May 14, 2020

Conversation

slim-bean
Copy link
Collaborator

This should allow horizontal scaling of a filesystem store on separate nodes as the ingesters can query the store

Signed-off-by: Ed Welch edward.welch@grafana.com

…esystem store, which allows for boltdb-shipper to horizontally scale filesystem store.

Signed-off-by: Ed Welch <edward.welch@grafana.com>
Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the code looks good to me. Just some minor nits and questions.

docs/operations/storage/boltdb-shipper.md Outdated Show resolved Hide resolved

Scaling up is as easy as adding more loki instances and letting them talk to the same ring.

Scaling down is possible but manual, you would need to shutdown the loki instance and then physically copy the chunks directory and its index files in their entirety to another Loki instance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the other instance would also have index files with same name so it would not be just copying of files. I guess we would have to provide a tool to merge boltdb files. We will have to make it clear in the docs until we have that tool.
What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each ingester should be writing files with a unique name such that you should be able to copy ones file to another without conflicts

uploader := fmt.Sprintf("%s-%d", s.cfg.IngesterName, time.Now().Unix())

However, I think I may have found a bug here, this is what I see for my current test ingesters, it looks like the lifecycler ID was empty when they created their files

ed@ed-VirtualBox:/tmp$ ls loki1/chunks/index/index_2628
-1589422114
ed@ed-VirtualBox:/tmp$ ls loki2/chunks/index/index_2628
-1589422113

These still wouldn't collide because the timestamps were different but they should also have the unique lifecycler ID in front

My suspicion here is they created the files before joining the ring? Can you take a look at this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at how this works, there is a real chicken and egg problem here when initializing the shipper and trying to get the ingester ID to use in the name file.

I'm not sure it's possible to do this, but maybe you have some ideas?

I'm wondering instead if we should try a different strategy and use the hostname or allow for the name to be configured in the yaml directly?

@@ -84,6 +84,7 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
f.Float64Var(&cfg.SyncMinUtilization, "ingester.sync-min-utilization", 0, "Minimum utilization of chunk when doing synchronization.")
f.IntVar(&cfg.MaxReturnedErrors, "ingester.max-ignored-stream-errors", 10, "Maximum number of ignored stream errors to return. 0 to return all errors.")
f.DurationVar(&cfg.MaxChunkAge, "ingester.max-chunk-age", time.Hour, "Maximum chunk age before flushing.")
f.DurationVar(&cfg.QueryStoreMaxLookBackPeriod, "ingester.query-store-max-look-back-period", 0, "How far back should an ingester be allowed to query the store for data, for use only with boltdb-shipper index and filesystem object store. -1 for infinite.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the requirement should be boltdb(not boltdb_shipper) as index store and filesystem as object store because there is no use of having boltdb_shipper without shared object store otherwise you would be copying files somewhere on the same filesystem which is essentially same as not using boltdb_shipper at all. I might be wrong here or might be missing your point. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my above assumption is true, that the file names are unique, I specified boltdb_shipper so that you could copy the files between ingesters to allow scaling down, otherwise what you said is true that there isn't much reason to use boltdb_shipper for this rather than boltdb.

With the exception of one other reason I could think of, it would be easier to take backups or even move the Loki to another host because the index files can be directly copied with the chunks directory while Loki is still running, I don't think this is true for boltdb I don't think you can safely copy an open boltdb file.

@@ -64,7 +64,7 @@ type Config struct {
ingesterClientFactory func(cfg client.Config, addr string) (client.HealthAndIngesterClient, error)

QueryStore bool `yaml:"-"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can maybe get rid of this because now we mostly rely on QueryStoreMaxLookBackPeriod to enable/disable querying the store from ingesters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems correct what @sandeepsukhani is saying ?

slim-bean and others added 2 commits May 14, 2020 07:07
Co-authored-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com>
… just to make sure there are not collisions if for some reason the ID is empty (which can happen if the hostname doesn't resolve for some reason)

Signed-off-by: Ed Welch <edward.welch@grafana.com>
…abled.

Signed-off-by: Ed Welch <edward.welch@grafana.com>
@pull-request-size pull-request-size bot added size/L and removed size/M labels May 14, 2020
…ore so they don't duplicate query efforts

Signed-off-by: Ed Welch <edward.welch@grafana.com>
Signed-off-by: Ed Welch <edward.welch@grafana.com>
return cfg.Configs[i]
}

func calculateMaxLookBack(pc chunk.PeriodConfig, maxLookBackConfig, maxChunkAge time.Duration) (time.Duration, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That deserve a test and it seems simple.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, yeah, i thought i put in the description or somewhere I wanted to test this but hadn't done it yet

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think you should remove QueryStore and add more tests.

But otherwise LGTM

Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't finished review yet (dont wait for me), but thanks for the doc.

directory: /tmp/loki/
```

A folder is created for every tenant all the chunks for one tenant are stored in that directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A folder is created for every tenant all the chunks for one tenant are stored in that directory.
A folder is created for every tenant and all the chunks for one tenant are stored in that directory.


A folder is created for every tenant all the chunks for one tenant are stored in that directory.

If loki is run in single-tenant mode, all the chunks are put in a folder named `fake` which is the synthesized tenant name used for single tenant mode.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If loki is run in single-tenant mode, all the chunks are put in a folder named `fake` which is the synthesized tenant name used for single tenant mode.
If loki is run in single-tenant mode, all the chunks are put in a folder named `fake` which is the placeholder tenant name used for single tenant mode.

Signed-off-by: Ed Welch <edward.welch@grafana.com>
@slim-bean
Copy link
Collaborator Author

I will do a follow up to remove QueryStore as well as refactor the Select method to be able to test the logic within. If I don't do this in 30 days please fire me from the project

@slim-bean slim-bean merged commit 9803eab into master May 14, 2020
@slim-bean slim-bean deleted the boltdb-shipper-separate-filesystem branch May 14, 2020 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants