
refactor(lru cache): do some simple refactor and add some comments to improve code readability #2043

Merged
5 commits from yiming/lru_cache_doc merged into main on Apr 24, 2022

Conversation

wenym1 (Contributor) commented Apr 22, 2022

In this PR, we mainly did the following:

  • Added some comments and did some simple refactoring of the LRU cache code to improve readability.
  • Added some debug_assert checks to ensure the correct usage of some unsafe methods.
  • Fixed the following bug.

In the current code, in the insert method of LruCacheShard
https://github.com/singularity-data/risingwave/blob/2027b1f1686eee2da853af2ba1edc2edcf3f1d3b/src/storage/src/hummock/cache.rs#L328-L336
we call unwrap on the result of self.remove_cache_handle(old).

However, remove_cache_handle returns None when the handle is still referenced externally.
https://github.com/singularity-data/risingwave/blob/2027b1f1686eee2da853af2ba1edc2edcf3f1d3b/src/storage/src/hummock/cache.rs#L385-L397

Therefore, if we first look up a specific key and hold the returned handle, and then update the value of this key, we call unwrap on None and trigger a panic.

This bug is now fixed, and the case is covered by a newly added unit test called test_update_referenced_key.
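To make the failure mode concrete, below is a self-contained toy sketch of the pattern (illustrative only; it does not use the real cache.rs types, and the Entry struct and function signature here are made up for the example):

```rust
// Toy model of the pre-fix behavior; not the real LruCacheShard code.
struct Entry {
    external_refs: usize, // lookup handles still held outside the cache
}

// Mirrors the contract described above: the handle can only be recycled
// when nothing outside the cache still references it; otherwise None.
fn remove_cache_handle(entry: &Entry) -> Option<()> {
    if entry.external_refs == 0 {
        Some(())
    } else {
        None
    }
}

fn main() {
    // 1. Look up a key and keep holding the returned handle.
    let old = Entry { external_refs: 1 };

    // 2. Insert the same key again to update its value. The pre-fix code
    //    effectively did `remove_cache_handle(old).unwrap()`, which panics
    //    here because the entry is still referenced externally.
    assert!(remove_cache_handle(&old).is_none());

    // 3. The fix is to tolerate the None case and let the old entry be
    //    reclaimed later, once the external handle is finally released.
    match remove_cache_handle(&old) {
        Some(()) => { /* recycle the handle slot immediately */ }
        None => { /* defer cleanup until the last external reference is dropped */ }
    }
}
```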

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

None

codecov bot commented Apr 22, 2022

Codecov Report

Merging #2043 (f7639e5) into main (f846d9f) will decrease coverage by 0.12%.
The diff coverage is 96.86%.

@@            Coverage Diff             @@
##             main    #2043      +/-   ##
==========================================
- Coverage   70.99%   70.87%   -0.13%     
==========================================
  Files         625      633       +8     
  Lines       80503    81485     +982     
==========================================
+ Hits        57153    57750     +597     
- Misses      23350    23735     +385     
Flag Coverage Δ
rust 70.87% <96.86%> (-0.13%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/storage/src/hummock/cache.rs 96.51% <96.86%> (+0.57%) ⬆️
src/source/src/lib.rs 62.96% <0.00%> (-37.04%) ⬇️
src/stream/src/executor_v2/receiver.rs 46.42% <0.00%> (-32.15%) ⬇️
src/frontend/src/handler/mod.rs 39.28% <0.00%> (-20.18%) ⬇️
.../frontend/src/optimizer/plan_node/logical_limit.rs 41.66% <0.00%> (-15.00%) ⬇️
src/meta/src/stream/scheduler.rs 83.41% <0.00%> (-13.80%) ⬇️
src/stream/src/executor_v2/hop_window.rs 66.82% <0.00%> (-13.74%) ⬇️
src/meta/src/stream/stream_manager.rs 60.03% <0.00%> (-12.47%) ⬇️
src/stream/src/executor_v2/v1_compat.rs 30.56% <0.00%> (-6.89%) ⬇️
src/common/src/util/ordered/serde.rs 92.79% <0.00%> (-5.41%) ⬇️
... and 162 more


h.key = key;
h.value = Some(value);
self.evict_from_lru(charge, last_reference_list);
if self.usage.load(Ordering::Relaxed) + charge > self.capacity && self.strict_capacity_limit
Contributor commented on this snippet:

I was just wondering whether strict_capacity_limit is necessary, because if a piece of data does not enter the cache, it will still be held by some executor, and the next read will cause another remote storage request due to a cache miss.

wenym1 (Contributor, Author) replied:

I think it's a trade-off between using more memory and staying safe from OOM. For the block cache, I think it is necessary to have a strict_capacity_limit to ensure that we don't use an over-sized amount of memory for the cache.

Maybe we can borrow the idea of a high/low threshold? On insert, we just make sure usage won't go over the high threshold, and when we release a handle, we evict entries from the cache if usage has gone over the low threshold.
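For illustration only, here is a rough sketch of what that high/low watermark policy could look like; all names, fields, and thresholds below are hypothetical and not taken from cache.rs:

```rust
// Hypothetical high/low watermark policy for a single cache shard.
struct Shard {
    usage: usize,
    high_watermark: usize, // inserts must not push usage beyond this
    low_watermark: usize,  // releases evict back down to this level
}

impl Shard {
    /// On insert: admit the entry only if it keeps usage under the high watermark.
    fn admit(&mut self, charge: usize) -> bool {
        if self.usage + charge > self.high_watermark {
            return false; // caller keeps the value outside the cache
        }
        self.usage += charge;
        true
    }

    /// On handle release: if usage has drifted above the low watermark,
    /// evict unreferenced LRU entries until it drops back below it.
    fn on_release(&mut self) {
        while self.usage > self.low_watermark {
            match self.evict_one_unreferenced() {
                Some(charge) => self.usage -= charge,
                None => break, // everything left is still referenced
            }
        }
    }

    /// Pop one unreferenced entry from the LRU list and return its charge (elided here).
    fn evict_one_unreferenced(&mut self) -> Option<usize> {
        None
    }
}
```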

Contributor replied:

I have removed the strict capacity limit in #1994, and this PR may cause a lot of conflicts...

debug_assert!((*ptr).is_same_key((*h).get_key()));
debug_assert!((*ptr).is_in_cache());
// Mark the handle to be removed as no longer in the cache.
(*ptr).set_in_cache(false);
Contributor commented on this snippet:

It's not necessary to call set_in_cache here, because this is the hash table rather than the LRU list. We would remove it in remove_cache_handle.

wenym1 (Contributor, Author) replied:

I think set_in_cache only reflects whether the handle is in the hash table or not, so it's fine to move all set_in_cache calls into the hash table so that maintaining the in_cache flag becomes easier.

FYI, I have checked the current code on the main branch: every set_in_cache(true) follows a table.insert, and every set_in_cache(false) follows a table.remove.
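As an illustration of that suggestion (simplified, hypothetical types; not the actual cache.rs structures), the flag could be flipped only inside the hash table's own insert and remove, so the invariant is maintained in exactly one place:

```rust
// Simplified sketch: the table owns the in_cache flag.
struct Handle {
    in_cache: bool,
    // key, value, and hash-chain links elided
}

#[derive(Default)]
struct HandleTable {
    entries: Vec<*mut Handle>,
}

impl HandleTable {
    /// Safety: `handle` must point to a live `Handle` not already in the table.
    unsafe fn insert(&mut self, handle: *mut Handle) {
        self.entries.push(handle);
        (*handle).in_cache = true; // flag set only here
    }

    /// Safety: `handle` must point to a live `Handle` previously inserted.
    unsafe fn remove(&mut self, handle: *mut Handle) {
        self.entries.retain(|&h| h != handle);
        (*handle).in_cache = false; // flag cleared only here
    }
}
```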

Contributor replied:

seems OK...

skyzh (Contributor) commented Apr 22, 2022:

Running the leak / address / thread sanitizers on tests and benches may also help detect some potential bugs 🤣
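For reference, one way to do that on a nightly toolchain (assumption: an x86_64 Linux host; adjust the target as needed) is:

```sh
# AddressSanitizer; use -Zsanitizer=leak or -Zsanitizer=thread for the others.
RUSTFLAGS="-Zsanitizer=address" cargo +nightly test --target x86_64-unknown-linux-gnu
```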

Little-Wallace (Contributor) left a review:

LGTM

wenym1 merged commit c9b37d6 into main on Apr 24, 2022
wenym1 deleted the yiming/lru_cache_doc branch on April 24, 2022 at 02:30