Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store/cache: remove iteration limit and reset on internal inconsistency #1274

Merged
merged 1 commit into from
Jun 25, 2019
Merged

store/cache: remove iteration limit and reset on internal inconsistency #1274

merged 1 commit into from
Jun 25, 2019

Conversation

abursavich
Copy link
Contributor

@abursavich abursavich commented Jun 24, 2019

Changes

This affects adding new items to the cache:

  1. It removes the max iterations restriction. See Store: Hundreds of errors in logs #1257 (comment) for reasoning.
  2. If the item is small enough to fit but doesn't fit even after the cache has been emptied, the cache is reset to recover from an internal accounting error (this should never occur).

Verification

Updated tests.

Notes

Please let me know if you'd like a CHANGELOG entry.

Fixes #1257

@bwplotka bwplotka requested a review from GiedriusS June 24, 2019 20:55
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I remember I wanted to just leave the _, _, ok := c.lru.RemoveOldest() initially, but I found the reason why it is not enough. However I double checked and don't see this reason anymore... maybe it's too late ^^ Will double check tomorrow.

Also CI is not happy exactly in this point (Looks like endless loop), https://circleci.com/gh/improbable-eng/thanos/3967?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link Exactly in TestIndexCache_AvoidsDeadlock (:

@abursavich
Copy link
Contributor Author

abursavich commented Jun 25, 2019

That test replaced the LRU and changed its logic such that nothing could be evicted. I changed it such that it only breaks the current size accounting. However, I wouldn't normally suggest writing tests that reach into internal state and break invariants. That code should be unreachable.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's try it out. Indeed the test is weird, but we got this case a lot so let's keep it until we are certain it does not happen anymore.

One thing though. If you potentially had 500 ops for small items to remove - can we actually optimize cache for those? E.g Reducing ensureFit from O(n) to O(logN) in large N cases. Still 500 is easy for CPU, but we don't really know how many you have in total. Just some thoughts (:

@bwplotka bwplotka merged commit 2431318 into thanos-io:master Jun 25, 2019
@bwplotka
Copy link
Member

.. and we fotgot about CHANGELOG. Something to remember to add during release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Store: Hundreds of errors in logs
2 participants