Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store/cache: fix broken metric and current index cache size handling #873

Merged
merged 5 commits into from Mar 4, 2019

Conversation

Projects
None yet
3 participants
@GiedriusS
Copy link
Collaborator

commented Feb 27, 2019

Potpourri of fixes to the cache:

  • Do not forget to increase the c.current metric with relevant labels when
    adding items to the cache. Without this the metric thanos_store_index_cache_items is unavailable.

  • Properly adjust c.curSize when adding items to the cache.

  • Switch the order of operands in a check to avoid a potential uint64 overflow and thus not removing enough items to satisfy our size constraints.

Without these last two fixes essentially the metric thanos_store_index_cache_items_evicted_total is always zero together with thanos_store_index_cache_items_overflowed_total (most of the time). This is because we never enter the loop, and we never increase c.curSize before this.

Verification:

  • New smoke tests pass which check if curSize is properly adjusted.

@GiedriusS GiedriusS changed the title store/cache: do not forget to increase c.current on adding new items store/cache: fix broken metric and current index cache size handling Feb 28, 2019

store/cache: prevent uint64 overflow by switching operands
Adding uint64(len(b)) to c.curSize might potentially overflow uint64 if
the numbers are big enough and then we might not remove enough items
from the LRU to satisfy the request. On the other hand, switching the
operands avoids this problem because we check before if uint64(len(b))
is bigger than c.maxSize so subtracting uint64(len(b)) will *never*
overflow because we know that it is less or equal to c.maxSize.
for c.curSize+uint64(len(b)) > c.maxSize {
c.lru.RemoveOldest()
for c.curSize > c.maxSize-uint64(len(b)) {
if _, val, ok := c.lru.RemoveOldest(); ok {

This comment has been minimized.

Copy link
@adrien-f

adrien-f Feb 28, 2019

Contributor

Should we handle something if that does not ok ?

This comment has been minimized.

Copy link
@GiedriusS

GiedriusS Feb 28, 2019

Author Collaborator

How we could inform the users about such case? AFAICT there is no way ok can be false due to the check before-hand.

@bwplotka
Copy link
Collaborator

left a comment

Hm . Isn't currSize updated in "onEvict"?

@GiedriusS

This comment has been minimized.

Copy link
Collaborator Author

commented Mar 1, 2019

Ah yes, that's true but it is never increased. Will revert fbaae7c.

GiedriusS added some commits Mar 1, 2019

store/cache: revert ensureFits() changes
c.curSize is lowered in onEvict.
store/cache: add smoke tests
Add smoke tests for the index cache which check if we set curSize
properly, and if removal works.
@bwplotka
Copy link
Collaborator

left a comment

LGTM, thanks!

@bwplotka

This comment has been minimized.

Copy link
Collaborator

commented Mar 4, 2019

2 flakes on tests, but now is green: https://circleci.com/gh/improbable-eng/thanos/2353

@bwplotka bwplotka merged commit b3d7d15 into improbable-eng:master Mar 4, 2019

1 check passed

ci/circleci: test Your tests passed on CircleCI!
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.