content-cache: general cleanup, small bug fixes, and test improvement #3645

garlick · 2021-05-09T04:45:40Z

Split from PR #3639 (this should go in first).

garlick · 2021-05-09T14:37:41Z

I don't see a lot of opportunity to increase the diff coverage here since most of it is unlikely error paths.

chu11

overall, LGTM, although one comment below

chu11 · 2021-05-09T16:27:54Z

t/t0012-content-sqlite.t

@@ -14,14 +14,14 @@ RPC=${FLUX_BUILD_DIR}/t/request/rpc

 HASHFUN=`flux getattr content.hash`

+test_expect_success 'load heartbeat module with fast rate to drive purge' '
+	flux module load heartbeat period=1s


skimming the heartbeat module, could we make the period < 1s? 1s seems long in the unit tests (especially w/ the sleep 1 below).

The sync callback rate is forced to be within sync_min=1 and sync_max=10 seconds, so it doesn't help to make the heartbeat < 1s. However, that min value was set to avoid triggering the heavyweight cache purge too frequently, and now with the LRU, the purge doesn't need to scan for eligible items and should exit immediately if there is nothing to do. So maybe we can just eliminate the min value and crank the heartbeat down as you say. I'll go ahead and do that.

garlick · 2021-05-09T19:08:50Z

OK, made that change. Did you want to have another look, @chu11? I see you already approved, and I'll set MWP if you're satisfied.

chu11 · 2021-05-09T19:09:56Z

@garlick LGTM

garlick · 2021-05-09T19:20:34Z

Thanks!

Problem: the content.dropcache rpc handler walks the entire cache, but it only needs to walk the LRU now that all LRU entries are neither dirty nor invalid. Simplify the content.dropcache RPC handler.

Problem: when a cache entry is used and moved to the front of the LRU, the lastused timestamp is also updated; however, if the entry is already at the front, it is not. Update entry->lastused when entry is already at the front as well.
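To illustrate the lastused fix described above, here is a minimal sketch of an LRU "touch" operation. The `struct entry`, `struct lru`, and `lru_*` names are hypothetical stand-ins for illustration, not the actual types or functions in content-cache.c. The point is only that the timestamp update sits outside the "already at the front" check.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the actual content-cache.c types. */
struct entry {
    struct entry *prev;
    struct entry *next;
    double lastused;        /* time of most recent use */
};

struct lru {
    struct entry *head;     /* most recently used */
    struct entry *tail;     /* least recently used */
};

/* Remove e from the list.  e must currently be on the list. */
static void lru_unlink (struct lru *lru, struct entry *e)
{
    if (e->prev)
        e->prev->next = e->next;
    else
        lru->head = e->next;
    if (e->next)
        e->next->prev = e->prev;
    else
        lru->tail = e->prev;
    e->prev = e->next = NULL;
}

/* Insert e at the front (most recently used end) of the list. */
static void lru_push_front (struct lru *lru, struct entry *e)
{
    e->prev = NULL;
    e->next = lru->head;
    if (lru->head)
        lru->head->prev = e;
    else
        lru->tail = e;
    lru->head = e;
}

/* Record a use of e at time 'now'.  The list is only reordered when e
 * is not already at the front, but lastused is updated in both cases.
 */
void lru_touch (struct lru *lru, struct entry *e, double now)
{
    if (lru->head != e) {
        lru_unlink (lru, e);
        lru_push_front (lru, e);
    }
    e->lastused = now;
}
```

With the assignment inside the `if` instead, an entry that stays at the front would keep a stale timestamp and could look older than it is to an age-based purge.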

Problem: if flux_future_aux_set() fails in cache_store(), a future is leaked. Simplify that function so there is only one error path and thereby stop the leak.

Problem: if flux_future_aux_set() fails in cache_load(), a future is leaked. Simplify that function so there is only one error path and thereby stop the leak.
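The single error path idea in the two leak fixes above can be sketched as follows. This is not the actual cache_store() or cache_load() code: `struct future`, `future_create()`, `future_aux_set()`, `future_destroy()`, and `start_op()` are hypothetical stand-ins for the flux future calls, and `live_futures` exists only so the sketch can show that nothing is leaked.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a future handle; this is not flux_future_t. */
struct future {
    void *aux;
};

static int live_futures = 0;    /* outstanding futures, for the demo only */

static struct future *future_create (void)
{
    struct future *f = calloc (1, sizeof (*f));
    if (f)
        live_futures++;
    return f;
}

/* Destroy a future.  Accepts NULL so error paths can call it blindly. */
static void future_destroy (struct future *f)
{
    if (f) {
        live_futures--;
        free (f);
    }
}

/* Stand-in for attaching context to a future; fails if aux is NULL. */
static int future_aux_set (struct future *f, void *aux)
{
    if (!aux) {
        errno = EINVAL;
        return -1;
    }
    f->aux = aux;
    return 0;
}

/* Start an operation.  Every failure funnels through the one 'error'
 * label, so the future is destroyed no matter which step failed.
 */
struct future *start_op (void *aux)
{
    struct future *f;
    int saved_errno;

    if (!(f = future_create ()))
        goto error;
    if (future_aux_set (f, aux) < 0)
        goto error;
    return f;
error:
    saved_errno = errno;
    future_destroy (f);         /* safe when f is NULL */
    errno = saved_errno;
    return NULL;
}
```

The leak pattern being avoided is an early `return NULL` after the future has been created. With one label doing all cleanup, adding a new failing step later cannot reintroduce it.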

Problem: comment misuses semicolon. Fix semicolon usage.

Problem: Some request handlers pass the request message through flux_request_decode() with NULL arguments, which accomplishes nothing since the message dispatcher will have already verified the message type and topic string. Drop the extra checks.

Problem: cache_load() contains handling for ENOSYS and ENOENT errors but those errors cannot occur until the continuation. Drop dead error handling code.

Problem: t0012-content-sqlite.t has inconsistent indent and use of tabs vs spaces. Convert to single tab indent.

Problem: the codecov report shows that the test meant to exercise content-cache store batching on rank 0 when a backing store is loaded is not actually exercising that code. Probably the synchronous stores from the test take about as long as the stores from the cache to sqlite, so no backlog is created by the test. Solution: overlap the content store RPCs from the test.

Problem: the content.flush request handler contains dead code and emits useless debug log messages. Drop the dead code and the logs.

Problem: content.dropcache has no test coverage. Drop the cache once in the t0012-content-sqlite sharness test.

Problem: cache entries are purged every heartbeat, with the period bounded by sync_min=1 and sync_max=10 seconds. In testing we would like to crank down the heartbeat period to make the test run faster, but setting it to less than sync_min doesn't help. sync_min was established to avoid triggering the heavyweight cache purge too frequently, for example when heartbeat messages "bunch up" in the message queue. Now that purging uses an LRU, it doesn't need to scan for eligible items, and exits immediately if there is nothing to do. Eliminate the sync_min lower bound on the cache purge period.
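A rough sketch of why an LRU-ordered purge is cheap enough to run on every heartbeat: the walk starts at the least recently used end and stops at the first entry that is still fresh. The `struct entry`, `struct cache`, and `cache_purge()` names and the simple age criterion are assumptions for illustration; the real purge in content-cache.c may apply different eligibility rules.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the actual content-cache.c types. */
struct entry {
    struct entry *prev;     /* next more recently used entry, or NULL */
    double lastused;
};

struct cache {
    struct entry *tail;     /* least recently used entry, or NULL */
    int count;              /* number of entries on the list */
};

/* Unlink entries idle for longer than max_age, starting at the least
 * recently used end.  Because the list is ordered by last use, the loop
 * stops at the first fresh entry, so a call with nothing to purge only
 * inspects the tail.  Returns the number of entries unlinked.
 * (Entries are not freed here; in this sketch the caller owns them.)
 */
int cache_purge (struct cache *c, double now, double max_age)
{
    int n = 0;

    while (c->tail && now - c->tail->lastused > max_age) {
        struct entry *e = c->tail;
        c->tail = e->prev;
        e->prev = NULL;
        c->count--;
        n++;
    }
    return n;
}
```

Under these assumptions, heartbeats that "bunch up" cost one comparison each after the first, which is the reasoning behind dropping the lower bound.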

Problem: test coverage for purging the content-cache in front of a backing store is minimal. Add some tests to t0012-content-sqlite.t that ought to improve coverage.

codecov · 2021-05-09T19:43:14Z

Codecov Report

Merging #3645 (c149c03) into master (642fa1a) will increase coverage by 0.13%.
The diff coverage is 86.20%.

@@            Coverage Diff             @@
##           master    #3645      +/-   ##
==========================================
+ Coverage   82.65%   82.78%   +0.13%     
==========================================
  Files         325      325              
  Lines       49076    49041      -35     
==========================================
+ Hits        40562    40597      +35     
+ Misses       8514     8444      -70

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/broker/content-cache.c | `83.70% <80.00%> (+12.93%)` | ⬆️ |
| src/cmd/flux-kvs.c | `81.58% <100.00%> (-0.09%)` | ⬇️ |
| src/modules/kvs/kvs.c | `67.78% <100.00%> (+0.07%)` | ⬆️ |
| src/broker/module.c | `76.30% <0.00%> (-1.25%)` | ⬇️ |
| src/broker/state_machine.c | `78.59% <0.00%> (-0.85%)` | ⬇️ |
| src/common/libflux/message.c | `84.04% <0.00%> (+0.12%)` | ⬆️ |
| src/modules/job-info/guest_watch.c | `76.61% <0.00%> (+0.61%)` | ⬆️ |
| src/broker/overlay.c | `88.86% <0.00%> (+0.75%)` | ⬆️ |
| src/cmd/builtin/content.c | `73.87% <0.00%> (+8.10%)` | ⬆️ |

garlick mentioned this pull request May 9, 2021

content-cache: avoid linear search for dirty blobs #3639

Merged

chu11 approved these changes May 9, 2021

garlick force-pushed the content_cleanup branch from 8e970b6 to a63d788 · May 9, 2021 17:15

garlick added the merge-when-passing label May 9, 2021

garlick added 13 commits May 9, 2021 19:20

content-cache: simplify dropcache

3454017

Problem: the content.dropcache rpc handler walks the entire cache, but it only needs to walk the LRU now that all LRU entries are neither dirty nor invalid. Simplify the content.dropcache RPC handler.

content-cache: fix memory leak on store error path

e7a30d9

Problem: if flux_future_aux_set() fails in cache_store(), a future is leaked. Simplify that function so there is only one error path and thereby stop the leak.

content-cache: fix memory leak on load error path

fc6149e

Problem: if flux_future_aux_set() fails in cache_load(), a future is leaked. Simplify that function so there is only one error path and thereby stop the leak.

content-cache: fix comment punctuation

a5b069e

Problem: comment misuses semicolon. Fix semicolon usage.

content-cache: remove dead code

13960a5

Problem: cache_load() contains handling for ENOSYS and ENOENT errors but those errors cannot occur until the continuation. Drop dead error handling code.

content-sqlite: fix whitespace in sharness test

846df8b

Problem: t0012-content-sqlite.t has inconsistent indent and use of tabs vs spaces. Convert to single tab indent.

content-cache: drop dead code from flush handler

5295581

Problem: the content.flush request handler contains dead code and emits useless debug log messages. Drop the dead code and the logs.

content-cache: add dropcache test

036dba8

Problem: content.dropcache has no test coverage. Drop the cache once in the t0012-content-sqlite sharness test.

content-cache: add coverage for backing store purge

c149c03

Problem: test coverage for purging the content-cache in front of a backing store is minimal. Add some tests to t0012-content-sqlite.t that ought to improve coverage.

SteVwonder force-pushed the content_cleanup branch from a63d788 to c149c03 · May 9, 2021 19:21

mergify bot merged commit 560293e into flux-framework:master May 9, 2021

garlick deleted the content_cleanup branch May 9, 2021 21:30

chu11 mentioned this pull request Jun 21, 2022

broker: content.flush does not force flush #4378

Closed
