
refactor(lru cache): do some simple refactor and add some comments to improve code readability #2043

Merged
5 commits from yiming/lru_cache_doc merged into main on Apr 24, 2022

Conversation

wenym1 (Contributor) commented Apr 22, 2022

In this PR, we mainly did the following:

  • Added some comments and did some simple refactoring of the LRU cache code to improve readability.
  • Added some debug_assert checks to ensure the correct usage of some unsafe methods.
  • Fixed the following bug.

In the current code, in the insert method of LruCacheShard
https://github.com/singularity-data/risingwave/blob/2027b1f1686eee2da853af2ba1edc2edcf3f1d3b/src/storage/src/hummock/cache.rs#L328-L336
we call unwrap on the result of self.remove_cache_handle(old).

However, remove_cache_handle returns None when the handle is still referenced externally.
https://github.com/singularity-data/risingwave/blob/2027b1f1686eee2da853af2ba1edc2edcf3f1d3b/src/storage/src/hummock/cache.rs#L385-L397

Therefore, if we first look up a specific key and hold the returned handle, and then update the value of this key, we call unwrap on None and trigger a panic.

This bug is now fixed, and the case is covered by a newly added unit test called test_update_referenced_key.
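To make the failure mode concrete, below is a self-contained toy sketch of the pattern (illustrative only; it does not use the real cache.rs types, and the Entry struct and function signature here are made up for the example):

```rust
// Toy model of the pre-fix behavior; not the real LruCacheShard code.
struct Entry {
    external_refs: usize, // lookup handles still held outside the cache
}

// Mirrors the contract described above: the handle can only be recycled
// when nothing outside the cache still references it; otherwise None.
fn remove_cache_handle(entry: &Entry) -> Option<()> {
    if entry.external_refs == 0 {
        Some(())
    } else {
        None
    }
}

fn main() {
    // 1. Look up a key and keep holding the returned handle.
    let old = Entry { external_refs: 1 };

    // 2. Insert the same key again to update its value. The pre-fix code
    //    effectively did `remove_cache_handle(old).unwrap()`, which panics
    //    here because the entry is still referenced externally.
    assert!(remove_cache_handle(&old).is_none());

    // 3. The fix is to tolerate the None case and let the old entry be
    //    reclaimed later, once the external handle is finally released.
    match remove_cache_handle(&old) {
        Some(()) => { /* recycle the handle slot immediately */ }
        None => { /* defer cleanup until the last external reference is dropped */ }
    }
}
```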

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

None

codecov bot commented Apr 22, 2022

Codecov Report

Merging #2043 (f7639e5) into main (f846d9f) will decrease coverage by 0.12%.
The diff coverage is 96.86%.

@@            Coverage Diff             @@
##             main    #2043      +/-   ##
==========================================
- Coverage   70.99%   70.87%   -0.13%     
==========================================
  Files         625      633       +8     
  Lines       80503    81485     +982     
==========================================
+ Hits        57153    57750     +597     
- Misses      23350    23735     +385     
Flag Coverage Δ
rust 70.87% <96.86%> (-0.13%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/storage/src/hummock/cache.rs 96.51% <96.86%> (+0.57%) ⬆️
src/source/src/lib.rs 62.96% <0.00%> (-37.04%) ⬇️
src/stream/src/executor_v2/receiver.rs 46.42% <0.00%> (-32.15%) ⬇️
src/frontend/src/handler/mod.rs 39.28% <0.00%> (-20.18%) ⬇️
.../frontend/src/optimizer/plan_node/logical_limit.rs 41.66% <0.00%> (-15.00%) ⬇️
src/meta/src/stream/scheduler.rs 83.41% <0.00%> (-13.80%) ⬇️
src/stream/src/executor_v2/hop_window.rs 66.82% <0.00%> (-13.74%) ⬇️
src/meta/src/stream/stream_manager.rs 60.03% <0.00%> (-12.47%) ⬇️
src/stream/src/executor_v2/v1_compat.rs 30.56% <0.00%> (-6.89%) ⬇️
src/common/src/util/ordered/serde.rs 92.79% <0.00%> (-5.41%) ⬇️
... and 162 more


h.key = key;
h.value = Some(value);
self.evict_from_lru(charge, last_reference_list);
if self.usage.load(Ordering::Relaxed) + charge > self.capacity && self.strict_capacity_limit
Contributor commented on this snippet:

I was just wondering whether strict_capacity_limit is necessary, because if a piece of data does not enter the cache, it will still be held by some executor, and the next read will cause another remote storage request due to a cache miss.

wenym1 (Contributor, Author) replied:

I think it's a trade-off between using more memory and staying safe from OOM. For the block cache, I think it is necessary to have a strict_capacity_limit to ensure that we don't use an over-sized amount of memory for the cache.

Maybe we can borrow the idea of a high/low threshold? On insert, we just make sure usage won't go over the high threshold, and when we release a handle, we evict entries from the cache if usage has gone over the low threshold.
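For illustration only, here is a rough sketch of what that high/low watermark policy could look like; all names, fields, and thresholds below are hypothetical and not taken from cache.rs:

```rust
// Hypothetical high/low watermark policy for a single cache shard.
struct Shard {
    usage: usize,
    high_watermark: usize, // inserts must not push usage beyond this
    low_watermark: usize,  // releases evict back down to this level
}

impl Shard {
    /// On insert: admit the entry only if it keeps usage under the high watermark.
    fn admit(&mut self, charge: usize) -> bool {
        if self.usage + charge > self.high_watermark {
            return false; // caller keeps the value outside the cache
        }
        self.usage += charge;
        true
    }

    /// On handle release: if usage has drifted above the low watermark,
    /// evict unreferenced LRU entries until it drops back below it.
    fn on_release(&mut self) {
        while self.usage > self.low_watermark {
            match self.evict_one_unreferenced() {
                Some(charge) => self.usage -= charge,
                None => break, // everything left is still referenced
            }
        }
    }

    /// Pop one unreferenced entry from the LRU list and return its charge (elided here).
    fn evict_one_unreferenced(&mut self) -> Option<usize> {
        None
    }
}
```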

Contributor replied:

I have removed the strict capacity limit in #1994, and this PR may cause a lot of conflicts...

debug_assert!((*ptr).is_same_key((*h).get_key()));
debug_assert!((*ptr).is_in_cache());
// Mark the handle to be removed as no longer in the cache.
(*ptr).set_in_cache(false);
Contributor commented on this snippet:

It's not necessary to call set_in_cache here, because this is the hash table rather than the LRU list. We would remove it in remove_cache_handle.

wenym1 (Contributor, Author) replied:

I think set_in_cache only reflects whether the handle is in the hash table or not, so it's fine to move all set_in_cache calls into the hash table so that maintaining the in_cache flag becomes easier.

FYI, I have checked the current code on the main branch: every set_in_cache(true) follows a table.insert, and every set_in_cache(false) follows a table.remove.
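As an illustration of that suggestion (simplified, hypothetical types; not the actual cache.rs structures), the flag could be flipped only inside the hash table's own insert and remove, so the invariant is maintained in exactly one place:

```rust
// Simplified sketch: the table owns the in_cache flag.
struct Handle {
    in_cache: bool,
    // key, value, and hash-chain links elided
}

#[derive(Default)]
struct HandleTable {
    entries: Vec<*mut Handle>,
}

impl HandleTable {
    /// Safety: `handle` must point to a live `Handle` not already in the table.
    unsafe fn insert(&mut self, handle: *mut Handle) {
        self.entries.push(handle);
        (*handle).in_cache = true; // flag set only here
    }

    /// Safety: `handle` must point to a live `Handle` previously inserted.
    unsafe fn remove(&mut self, handle: *mut Handle) {
        self.entries.retain(|&h| h != handle);
        (*handle).in_cache = false; // flag cleared only here
    }
}
```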

Contributor replied:

seems OK...

skyzh (Contributor) commented Apr 22, 2022:

Running the leak / address / thread sanitizers on tests and benches may also help detect some potential bugs 🤣
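For reference, one way to do that on a nightly toolchain (assumption: an x86_64 Linux host; adjust the target as needed) is:

```sh
# AddressSanitizer; use -Zsanitizer=leak or -Zsanitizer=thread for the others.
RUSTFLAGS="-Zsanitizer=address" cargo +nightly test --target x86_64-unknown-linux-gnu
```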

Little-Wallace (Contributor) left a review:

LGTM

wenym1 merged commit c9b37d6 into main on Apr 24, 2022
wenym1 deleted the yiming/lru_cache_doc branch on April 24, 2022 at 02:30