New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
content-{sqlite,files,s3}: route checkpoint-get and checkpoint-put through broker #4463
content-{sqlite,files,s3}: route checkpoint-get and checkpoint-put through broker #4463
Conversation
2c5741d
to
049da09
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the KVS checkpiont functionality is being folded into the content service, does it make sense to prefix the functions with content_
instead of kvs_
and add a flags option to allow the cache to be bypassed like the content functions?
The dump/restore commands have a --no-cache
option for bypassing the broker cache. This would let them trivially follow suit with the checkpoint functions.
I'll stop short of suggesting that libcontent be merged into libkvs since it only contains non-public helpers and the content service now more or less exists only for the kvs, but maybe that's food for thought for a future cleanup.
if (e->dirty) | ||
cache->acct_dirty--; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, dead code there.
Note "decrement" is misspelled in the commit description.
src/common/libkvs/kvs_checkpoint.c
Outdated
"kvs-checkpoint.put", | ||
"content-backing.checkpoint-put", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good cleanup. This was initially created as a separate service since it seemed like it ought to be independent of the "content" service, but that's not how it has played out so this change make sense and in fact has been bugging me.
src/broker/content-cache.c
Outdated
|
||
if (flux_respond (cache->h, msg, s) < 0) { | ||
flux_log_error (cache->h, "%s: flux_respond", __FUNCTION__); | ||
goto error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't call flux_respond_error()
to handle a flux_respond()
error (it is likely to fail).
src/broker/content-cache.c
Outdated
/* flux_free_f type */ | ||
static void flux_msg_decref_wrapper (void *arg) | ||
{ | ||
const flux_msg_t *msg = arg; | ||
flux_msg_decref (msg); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just cast flux_msg_decref()
to flux_free_f instead?
src/broker/content-cache.c
Outdated
|| flux_future_then (f, -1, checkpoint_get_continuation, cache) < 0) { | ||
flux_log_error (h, "%s: checkpoint backing get", __FUNCTION__); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message would be "broker: content_checkpoint_get_request: checkpoint backing get: ...".
Maybe just have it be "error starting checkpoint-get RPC: ..."
Also: this leaks msg
if the rpc succeeds but the aux_set fails.
src/broker/content-cache.c
Outdated
if (flux_respond (cache->h, msg, s) < 0) { | ||
flux_log_error (cache->h, "%s: flux_respond", __FUNCTION__); | ||
goto error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When flux_respond()
fails, don't try to send an error response (it is likely to fail). Logging the unlikely error is sufficient.
src/broker/content-cache.c
Outdated
|
||
error: | ||
if (flux_respond_error (h, msg, errno, NULL) < 0) | ||
flux_log_error (h, "flux_respond_error"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "broker: flux_respond_error..." which would be infuriatingly non specific, maybe "error responding to checkpoint-get request" or similar? I realize there's a lot of examples of the former in the code base but I think in this case it's better to be inconsistent and start doing better rather rather than defer fixing these until we can fix all of them.
src/broker/content-cache.c
Outdated
if (!(f = flux_rpc (h, topic, s, 0, 0)) | ||
|| flux_future_aux_set (f, | ||
"msg", | ||
(void *)flux_msg_incref (msg), | ||
flux_msg_decref_wrapper) < 0 | ||
|| flux_future_then (f, -1, checkpoint_put_continuation, cache) < 0) { | ||
flux_log_error (h, "%s: checkpoint backing put", __FUNCTION__); | ||
goto error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as other function: leaks msg on error, tries to send an error response in response to a response error, unnecessarily vague log message if flux_respond_error()
fails.
You're referring to the Hmmm. I like the idea of adding the bypass flag, but wondering if we should keep them prefixed as |
I'm going to actually do this in a separate PR leaving this one isolated to what it currently does (and adding flags would break |
049da09
to
e47c80c
Compare
Problem: Several lines of code were > 80 characters, which violates RFC7. Correct the long lines.
Problem: The test output file names rootref3.{exp,out} is duplicated in two tests. On an error, we can't debug a failure in the first test using these filenames. Solution: Update filenames in tests to be unique.
Problem: In cache_entry_remove(), the `acct_dirty` variable is decremented if the entry is dirty. However, cache_entry_remove() should never be called if a cache entry is dirty. Solution: Remove the decrement of `acct_dirty` if a cache entry is dirty. Instead add an assert() check that dirty is false.
Problem: In the near future we would like the KVS checkpoint to content modules to be routed through the broker, instead of communicating directly to the modules. The topic names of kvs-checkpoint.get and kvs-checkpoint.put should be renamed similarly to other topics in the content modules so readers will know they all behave in similar fashion. Solution: Rename kvs-checkpoint.get and kvs-checkpoint.put to content-backing.checkpoint-get and content-backing.checkpoint-put respectively. Update all callers and tests accordingly. Due to the length of the topic name, adjust some text from "content-backing.checkpoint" to just "checkpoint" when appropriate.
Problem: In the future, we would like to cache checkpoint data in the broker in the event a content-backing module is not loaded. But this requires checkpoint data to be routed through the broker and not sent directly to the content-backing modules. Solution: Add new content.checkpoint-get and content-checkpoint-put handlers in the broker which will route to content-backing.checkpoint-get and content-backing.checkpoint-put if they are available. Update callers to call content.checkpoint-{get,put}. Update t0018-content-files.t tests that had previously assumed no registration with the broker. Update t2807-dump-cmd.t test that will get different error message as a result.
e47c80c
to
ff1da9e
Compare
re-pushed with fixes for comments from above, went ahead and squashed them since they were all relatively minor |
Codecov Report
@@ Coverage Diff @@
## master #4463 +/- ##
==========================================
+ Coverage 83.33% 83.35% +0.01%
==========================================
Files 400 400
Lines 67273 67342 +69
==========================================
+ Hits 56062 56130 +68
- Misses 11211 11212 +1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, should have approved last time! LGTM!
As per-requisite support for issue #4267, we will need to route checkpoints through the broker into the content backing modules.
There is a nice chunk of cleanup to do before this can happen, mostly surrounding topic names. So this is a bit of a pre-requisite PR I'm splitting off from the main PR since it doesn't really do much but touches a lot of lines of code.
Note that in the last commit, there is some code that is "common" and would look like I should add a common function. But they will be modified shortly so I left them separate. I could clean that up if desired.