-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
loading_cache: improve pollution resilience #8674
Comments
@avikivity Please, comment. |
Looks good. Note you can use a single set of links for both lists, since an item can be on one list at a time. |
what about distinguishing to what generation an item belongs? |
@sitano The heuristics is that if a statement has been used more than once this means that it's not a "pollution". The implementation is as follows:
The rest as I've described before. |
There are more complex cache eviction schemes, like the one used by ARC cache. |
…(LFRU) eviction policy This patch implements a simple variation of LFRU eviction policy: * We define 2 dynamic cache partitions which total size should not exceed the maximum cache size. * New cache entry is always added to the "new generation" partition. * After a cache entry is read more than PartitionHitThreshold times it moves to the second cache partition. * Both partitions' entries obey expiration and reload rules as before this patch. * When cache entries need to be evicted due to a size restriction "new generation" partition least recently used entries are evicted first. Fixes scylladb#8674 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
…(LFRU) eviction policy This patch implements a simple variation of LFRU eviction policy: * We define 2 dynamic cache sections which total size should not exceed the maximum cache size. * New cache entry is always added to the "unprivileged" section. * After a cache entry is read more than SectionHitThreshold times it moves to the second cache section. * Both sections' entries obey expiration and reload rules in the same way as before this patch. * When cache entries need to be evicted due to a size restriction "unprivileged" section's least recently used entries are evicted first. Note: With a 2 sections cache it's not enough for a new entry to have the latest timestamp in order not be evicted right after insertion: e.g. if all all other entries are from the privileged section. And obviously we want to allow new cache entries to be added to a cache. Therefore we can no longer first add a new entry and then shrink the cache. Switching the order of these two operations resolves the culprit. Fixes scylladb#8674 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
…(LFRU) eviction policy This patch implements a simple variation of LFRU eviction policy: * We define 2 dynamic cache sections which total size should not exceed the maximum cache size. * New cache entry is always added to the "unprivileged" section. * After a cache entry is read more than SectionHitThreshold times it moves to the second cache section. * Both sections' entries obey expiration and reload rules in the same way as before this patch. * When cache entries need to be evicted due to a size restriction "unprivileged" section's least recently used entries are evicted first. Note: With a 2 sections cache it's not enough for a new entry to have the latest timestamp in order not be evicted right after insertion: e.g. if all all other entries are from the privileged section. And obviously we want to allow new cache entries to be added to a cache. Therefore we can no longer first add a new entry and then shrink the cache. Switching the order of these two operations resolves the culprit. Fixes scylladb#8674 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
…tarov This series introduces a new version of a loading_cache class. The old implementation was susceptible to a "pollution" phenomena when frequently used entry can get evicted by an intensive burst of "used once" entries pushed into the cache. The new version is going to have a privileged and unprivileged cache sections and there's a new loading_cache template parameter - SectionHitThreshold. The new cache algorithm goes as follows: * We define 2 dynamic cache sections which total size should not exceed the maximum cache size. * New cache entry is always added to the "unprivileged" section. * After a cache entry is read more than SectionHitThreshold times it moves to the second cache section. * Both sections' entries obey expiration and reload rules in the same way as before this patch. * When cache entries need to be evicted due to a size restriction "unprivileged" section's least recently used entries are evicted first. More details may be found in #8674. In addition, during a testing another issue was found in the authorized_prepared_statements_cache: #9590. There is a patch that fixes it as well. Closes #9708 * github.com:scylladb/scylla: loading_cache: account unprivileged section evictions loading_cache: implement a variation of least frequent recently used (LFRU) eviction policy authorized_prepared_statements_cache: always "touch" a corresponding cache entry when accessed loading_cache::timestamped::lru_entry: refactoring loading_cache.hh: rearrange the code (no functional change) loading_cache: use std::pmr::polymorphic_allocator
The feature is broken by 3f2224a As a result of a regression if the amount of "old" queries is greater than half of the cache a "pollution" workload is going to evict "old" entries instead of entries in the unprivileged section of the cache. |
It's not broken. It changes the size of the reservation of the new part from 1 item to half the capacity. As a result the old section drops from n-1 to half. It behaves the same way, with smaller reservation - it still resists pollution, just in a different way. The previous reservation size of 1 item for the new part wasn't enough to execute a batch statement, since the members of the batch would compete with each other. |
@avikivity You are confused. 3f2224a, doesn't specify the sizes of the cache sections - it specifies the new condition of the eviction and this is not the same. As a result when the amount of "old" (good) prepared statements is greater than half of the cache the polluting workload will cause a constant eviction from the privileged section (as I described here #8674 (comment)). More than that - this will happen before evicting a possibly evictable entries from the non-privileged section. A proposal for a solution that ACTUALLY does what you described above (and I described last week) can be found here: #10674 (comment) |
@avelanarius please check |
Installation details
HEAD: 8480839
Description
If cache is being polluted by used-once entries it may possibly evict entries that are meant to be cached due to cache size restriction.
This is because we use LRU algorithm based on the creation and last usage timestamp when we decide which items to evict.
For instance, if the used-once entries are being pushed in a burst.
Proposal
Have additional LRU list (by creation time) that would hold used-once entries.
When we need to evict by size - evict items from that new list first and only when it's empty fall back to the current algorithm.
The text was updated successfully, but these errors were encountered: