refactor(lru cache): do some simple refactor and add some comments to improve code readability #2043
Conversation
… improve code readability
Codecov Report
@@ Coverage Diff @@
## main #2043 +/- ##
==========================================
- Coverage 70.99% 70.87% -0.13%
==========================================
Files 625 633 +8
Lines 80503 81485 +982
==========================================
+ Hits 57153 57750 +597
- Misses 23350 23735 +385
src/storage/src/hummock/cache.rs
h.key = key;
h.value = Some(value);
self.evict_from_lru(charge, last_reference_list);
if self.usage.load(Ordering::Relaxed) + charge > self.capacity && self.strict_capacity_limit
I wonder whether `strict_capacity_limit` is necessary, because if a piece of data does not enter the cache, it will still be held by some executor, and the next read will cause another remote storage request due to a cache miss.
I think it's a trade-off between using more memory and staying safe from OOM. For the block cache, I think it is necessary to have a `strict_capacity_limit` to ensure that we don't use an oversized amount of memory for the cache.
Maybe we can borrow the idea of a high-low threshold? On `insert` we just make sure usage won't go over the high threshold, and when we release a handle, we evict entries from the cache if the usage has gone over the low threshold.
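The high-low threshold idea above could be sketched as follows. This is a minimal illustration, not the PR's implementation; the names (`WatermarkCache`, `high`, `low`, `on_release`) are hypothetical, and the real cache tracks usage atomically and evicts in LRU order within a shard:

```rust
use std::collections::VecDeque;

// Hypothetical sketch of high/low watermark eviction.
struct WatermarkCache {
    entries: VecDeque<(u64, usize)>, // (key, charge); front = LRU victim
    usage: usize,
    high: usize, // inserts must not push usage above this
    low: usize,  // on release, evict back down to this
}

impl WatermarkCache {
    fn new(high: usize, low: usize) -> Self {
        Self { entries: VecDeque::new(), usage: 0, high, low }
    }

    /// Insert succeeds only if it keeps usage at or below the high watermark;
    /// otherwise the entry is rejected instead of triggering eviction here.
    fn insert(&mut self, key: u64, charge: usize) -> bool {
        if self.usage + charge > self.high {
            return false;
        }
        self.entries.push_back((key, charge));
        self.usage += charge;
        true
    }

    /// Called when a handle is released: evict LRU entries until usage
    /// drops back to or below the low watermark.
    fn on_release(&mut self) {
        while self.usage > self.low {
            match self.entries.pop_front() {
                Some((_key, charge)) => self.usage -= charge,
                None => break,
            }
        }
    }
}
```

The point of the two thresholds is that `insert` stays cheap (a single comparison) while the more expensive eviction work is deferred to handle release, where it can reclaim a batch of entries at once.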
I have removed the strict capacity limit in #1994, and this PR may cause a lot of conflicts...
debug_assert!((*ptr).is_same_key((*h).get_key()));
debug_assert!((*ptr).is_in_cache());
// The handle to be removed is set not in cache.
(*ptr).set_in_cache(false);
It's not necessary to call `set_in_cache` here, because this is the hash table rather than the LRU list. We would remove it in `remove_cache_handle`.
I think `set_in_cache` only reflects whether the handle is in the hash table or not, so it's fine to move all `set_in_cache` calls into the hash table so that maintaining the `in_cache` flag gets easier.
FYI, I have checked the current code in the main branch: every `set_in_cache(true)` follows a `table.insert`, and every `set_in_cache(false)` follows a `table.remove`.
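The invariant described above (flip the flag only alongside table membership changes) could look roughly like this. A simplified sketch with illustrative types; the real code works with raw handle pointers inside an open-addressing table, not `Rc` and `HashMap`:

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::rc::Rc;

// Illustrative stand-ins for the real handle and hash table.
struct Handle {
    in_cache: Cell<bool>,
}

struct Table {
    map: HashMap<u64, Rc<Handle>>,
}

impl Table {
    fn new() -> Self {
        Self { map: HashMap::new() }
    }

    /// `set_in_cache(true)` immediately follows the table insert,
    /// so the flag can never be true for a handle outside the table.
    fn insert(&mut self, key: u64, h: Rc<Handle>) {
        self.map.insert(key, h.clone());
        h.in_cache.set(true);
    }

    /// `set_in_cache(false)` immediately follows the table remove,
    /// so the flag can never be false for a handle still in the table.
    fn remove(&mut self, key: u64) -> Option<Rc<Handle>> {
        let h = self.map.remove(&key)?;
        h.in_cache.set(false);
        Some(h)
    }
}
```

With both mutations confined to the table's own methods, no other code path can desynchronize the `in_cache` flag from table membership, which is what makes the flag easier to maintain.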
Seems OK...
Running the leak / address / thread sanitizers on tests and benches may also help detect some potential bugs 🤣
LGTM
In this PR, we mainly finished two pieces of work:

1. Add some `debug_assert`s to ensure the correct usage of some unsafe methods.
2. Fix a panic when inserting over a referenced key. In the current code, in the `insert` of `LruCacheShard`

   https://github.com/singularity-data/risingwave/blob/2027b1f1686eee2da853af2ba1edc2edcf3f1d3b/src/storage/src/hummock/cache.rs#L328-L336

   we use `unwrap` after `self.remove_cache_handle(old)`. However, `remove_cache_handle` will return `None` when the handle is still referenced externally:

   https://github.com/singularity-data/risingwave/blob/2027b1f1686eee2da853af2ba1edc2edcf3f1d3b/src/storage/src/hummock/cache.rs#L385-L397

   Therefore, if we first look up a specific key and hold the handle, and then update the value of this key, we will call `unwrap` on `None` and trigger a panic. This bug is fixed, and the case is covered by a newly added unit test called `test_update_referenced_key`.
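The shape of the bug and its fix can be sketched as below. This is a simplified illustration, not the PR's code: `Shard` and the `refs` counter are hypothetical stand-ins for `LruCacheShard` and its handle refcounting, and the real implementation manages raw pointers rather than owned values:

```rust
use std::collections::HashMap;

struct Handle {
    value: String,
    refs: u32, // external references held by lookups still in flight
}

struct Shard {
    table: HashMap<u64, Handle>,
}

impl Shard {
    /// Detach the old handle from the table, but return it for
    /// reclamation only when nobody references it anymore. A handle
    /// that is still referenced yields `None`; in the real code its
    /// memory is freed later, when the last reference is released.
    fn remove_cache_handle(&mut self, key: u64) -> Option<Handle> {
        let h = self.table.remove(&key)?;
        if h.refs > 0 {
            None // still referenced externally: must not reclaim now
        } else {
            Some(h)
        }
    }

    /// Fixed insert: handle the `None` case instead of `unwrap`ing it.
    fn insert(&mut self, key: u64, value: String) {
        // Before the fix this was `self.remove_cache_handle(key).unwrap()`,
        // which panicked whenever the old handle was still held by a lookup.
        if let Some(_old) = self.remove_cache_handle(key) {
            // Old handle fully detached; it can be reclaimed here.
        }
        self.table.insert(key, Handle { value, refs: 0 });
    }
}
```

The scenario in the new `test_update_referenced_key` unit test corresponds to calling `insert` on a key whose handle is still held: the old code would panic on the `unwrap`, while the fixed code leaves the referenced handle to be freed on release.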
Checklist
Refer to a related PR or issue link (optional)
None