etc: support content.backing-module=none #4492

Merged

Member

@chu11 chu11 commented Aug 13, 2022

Per discussion in #4267, always loading the backing module content-sqlite can lead to performance issues, especially for shorter-lived single-user instances. Default to not loading a backing module.

  • to enable this, cache checkpoint put/get in the broker even when a backing module does not exist. This allows a number of checkpoint operations to still work even if a backing module isn't loaded.
  • update rc1/rc3 to not load the checkpoint module by default
    • update systemd service file to default configure loading content-sqlite
    • update tests that require content-sqlite to specifically configure for it
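
As an illustration of the first bullet, caching checkpoint put/get in the broker can be sketched roughly as below. This is a toy model and not the code in this PR: `ckpt_cache`, `ckpt_put`, and `ckpt_get` are hypothetical names, and fixed-size string buffers stand in for whatever the broker actually stores. Writing through to a backing module, when one is loaded, is left out.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a broker-side checkpoint cache.
 * None of these names come from flux-core. */
#define CKPT_MAX 8
#define CKPT_STRLEN 64

struct ckpt_entry {
    char key[CKPT_STRLEN];
    char value[CKPT_STRLEN];
};

struct ckpt_cache {
    struct ckpt_entry entries[CKPT_MAX];
    int count;
};

static struct ckpt_entry *ckpt_find (struct ckpt_cache *c, const char *key)
{
    assert (c != NULL && key != NULL);
    for (int i = 0; i < c->count; i++) {
        if (strcmp (c->entries[i].key, key) == 0)
            return &c->entries[i];
    }
    return NULL;
}

/* Store a checkpoint.  The value is always cached, so the call succeeds
 * even when no backing module is loaded. */
int ckpt_put (struct ckpt_cache *c, const char *key, const char *value)
{
    struct ckpt_entry *e;

    if (strlen (key) >= CKPT_STRLEN || strlen (value) >= CKPT_STRLEN) {
        errno = EINVAL;
        return -1;
    }
    if (!(e = ckpt_find (c, key))) {
        if (c->count == CKPT_MAX) {
            errno = ENOSPC;
            return -1;
        }
        e = &c->entries[c->count++];
        strcpy (e->key, key);
    }
    strcpy (e->value, value);
    return 0;
}

/* Look up a checkpoint in the cache.  Returns NULL with errno = ENOENT
 * if the key was never stored. */
const char *ckpt_get (struct ckpt_cache *c, const char *key)
{
    struct ckpt_entry *e = ckpt_find (c, key);

    if (!e) {
        errno = ENOENT;
        return NULL;
    }
    return e->value;
}
```

With a cache like this in front, a put followed by a get returns the stored value whether or not a backing module is present, which is the property the checkpoint users need.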

One possible remaining gotcha is that if the user doesn't set a backing module, they are still allowed to load one, and that becomes the backing module. Dunno if we'd like to support a special backing module config of "none" (or equivalent word) to say "do not allow a backing module to be loaded"?

Member

garlick commented Aug 13, 2022

In #4267

I'm not sure about making this the default, since there is unexplained job throughput degradation in this mode. However it is not currently possible to even select this mode. Let's allow this issue to be closed once setting content.backing-module=none works as you'd expect.

Could we cut this PR there and save the discussion about making it the default for another time?

Member Author

chu11 commented Aug 13, 2022

Could we cut this PR there and save the discussion about making it the default for another time?

ahhh, I missed that. I'll tweak the PR. Should be simpler as a result as many testsuite updates don't need to happen now.

Member

garlick commented Aug 13, 2022

One other thought: would it make sense to split the broker checkpoint stuff off to another source file? content-cache.c is already long and complex and this seems to be somewhat disjoint.

@chu11 chu11 force-pushed the issue4267_content_backing_module_none branch from ee8fae5 to 7944d68 Compare August 16, 2022 04:35
Member Author

chu11 commented Aug 16, 2022

re-pushed

  • split out content checkpoint code into another file
    • the "context" that the content-checkpoint API creates is actually stored within struct content_cache and not struct broker. I didn't initially intend for this, but content-cache needs to call content-checkpoint at one point, so I felt it better to do it this way.
  • rc scripts do not load none by default now

Contributor

grondo commented Aug 16, 2022

rc scripts do not load none by default now

You are probably already on it, but just in case: don't forget to update PR description if it no longer describes the overall change.

@chu11 chu11 changed the title broker: do not load content backing module by default etc: support content.backing-module=none Aug 16, 2022
Member

@garlick garlick left a comment

Thanks!

I'm noting that while I can shut down a "none" instance with flux shutdown --dump=foo.tgz, I can't restart from that file:

$ flux start -o,-Scontent.backing-module=none,-Scontent.restore=foo.tgz
2022-08-21T14:05:21.398977Z broker.err[0]: rc1.0: flux-restore: error flushing content cache: Function not implemented

It might be a good idea to pause and question whether the "flush" operation should fail when there is no backing store. Maybe we should redefine it as "flush to backing store, if any"? I know that would require other content tests to be updated. I'm not sure if that's the right answer or not, but if it is, it could also enable the "startlog" stuff to remain in the rc files, and an instance that is restarted several times using dump/restore could have a startlog like one that restarts with a backing store.

Another thing that's probably worth testing is that you can reload the kvs module in a "none" instance, and its content remains valid.

Edit: just realized this checkpoint service is created on all ranks, but I think it should only be created on rank 0? If for some reason it is accessed from other ranks, it should just let the broker forward the requests to rank 0 (which is what happens if the message handlers are only registered on rank 0).

Comment on lines 152 to 195
return;
}
Member

Leaks msgcpy on this return path

@@ -55,24 +128,36 @@ void content_checkpoint_get_request (flux_t *h, flux_msg_handler_t *mh,
{
struct content_checkpoint *checkpoint = arg;
const char *topic = "content-backing.checkpoint-get";
const char *s = NULL;
const flux_msg_t *msgcpy = flux_msg_incref (msg);
Member

consider renaming msgcpy to msgref since it's only a reference, not a copy, and the code relies on the fact that key, which points to memory allocated within msg, remains valid after msg is destroyed.

Comment on lines 149 to 150
flux_log_error (checkpoint->h, "%s: flux_respond_pack", __FUNCTION__);
goto error;
Member

After respond fails, just log it, don't try to send an error response since that's likely to fail too.

Also log a better message like "error responding to checkpoint-get".

Comment on lines 71 to 163
if (!(f = flux_rpc (h, topic, s, 0, 0))
if (!(f = flux_rpc_pack (h, topic, 0, 0, "{s:s}", "key", key))
|| flux_future_aux_set (f,
"msg",
(void *)msgcpy,
(flux_free_f)flux_msg_decref) < 0
|| flux_future_aux_set (f, "key", (void *)key, NULL) < 0
|| flux_future_then (f,
-1,
checkpoint_get_continuation,
Member

If this fails after the aux_set of msgcpy succeeds, there is a double free when both f and msgcpy are freed.

Also, instead of logging "error starting checkpoint-get", can you just set errstr to that message and let the caller deal with it?

It would simplify things a bit if key were not stored directly in the aux hash, and instead just retrieved from msg again in the continuation.
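
To illustrate the ownership rule behind the double-free comment, here is a small self-contained sketch. It does not use the flux-core API: `toy_msg`, `toy_future`, and `start_request` are made-up stand-ins for a reference-counted message and a future that owns one aux item with a destructor. The point is only that once the aux_set step succeeds, the future owns the reference, so later error paths must destroy the future and nothing else.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins, NOT the flux-core types: a reference-counted message
 * and a "future" that can own one aux item with a destructor. */
struct toy_msg {
    int refcount;
};

typedef void (*toy_free_f)(void *arg);

struct toy_future {
    void *aux;
    toy_free_f aux_free;
};

struct toy_msg *toy_msg_incref (struct toy_msg *msg)
{
    msg->refcount++;
    return msg;
}

void toy_msg_decref (struct toy_msg *msg)
{
    assert (msg->refcount > 0);
    msg->refcount--;
}

static void toy_msg_decref_wrapper (void *arg)
{
    toy_msg_decref (arg);
}

struct toy_future *toy_future_create (void)
{
    return calloc (1, sizeof (struct toy_future));
}

/* On success the future owns 'aux' and will call 'aux_free' on it. */
int toy_future_aux_set (struct toy_future *f, void *aux, toy_free_f aux_free)
{
    if (!f || f->aux)
        return -1;
    f->aux = aux;
    f->aux_free = aux_free;
    return 0;
}

void toy_future_destroy (struct toy_future *f)
{
    if (f) {
        if (f->aux && f->aux_free)
            f->aux_free (f->aux);
        free (f);
    }
}

/* Start a request that keeps 'msg' alive until the future is destroyed.
 * 'fail_late' simulates a failure after the aux_set step (for example,
 * registering the continuation).  Returns NULL on failure. */
struct toy_future *start_request (struct toy_msg *msg, int fail_late)
{
    struct toy_future *f;

    if (!(f = toy_future_create ()))
        return NULL;
    if (toy_future_aux_set (f, toy_msg_incref (msg),
                            toy_msg_decref_wrapper) < 0) {
        /* Ownership was NOT transferred: drop the reference ourselves. */
        toy_msg_decref (msg);
        toy_future_destroy (f);
        return NULL;
    }
    if (fail_late) {
        /* Ownership WAS transferred: destroying the future drops the
         * reference.  Calling toy_msg_decref() here as well would
         * release the same reference twice. */
        toy_future_destroy (f);
        return NULL;
    }
    return f;
}
```

Splitting the chained `||` expression into separate steps like this is one way to make each failure path release exactly what it owns.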

Comment on lines 230 to 234
if (!(f = flux_rpc_pack (h, topic, 0, 0,
"{s:s s:O}",
"key", key,
"value", value))
|| flux_future_aux_set (f,
Member

If this fails after the aux_set, there is a double free when both f and msgcpy are freed.

Also, send textual error response back to requestor rather than logging it.

flux_log_error (checkpoint->h, "%s: flux_respond", __FUNCTION__);
goto error;
}
return;
Member

leaks msgcpy on this return path

If respond fails, log a better message and don't try to send an error response.

etc/rc1 Outdated
Comment on lines 16 to 17
backingmod=${backingmod:-content-sqlite}
echo ${backingmod}
Member

just echo ${backingmod:-content-sqlite} and skip the second assignment? (repeated in rc3)

Member Author

chu11 commented Aug 22, 2022

Edit: just realized this checkpoint service is created on all ranks, but I think it should only be created on rank 0? If for some reason it is accessed from other ranks, it should just let the broker forward the requests to rank 0 (which is what happens if the message handlers are only registered on rank 0).

Oh good catch, although I think we want to load the checkpoint service all of the time. We just don't want the checkpoint caching service used on rank != 0. This was actually a bug back in #4463, ENOSYS should only be returned when the backing module isn't loaded on rank == 0.

@chu11 chu11 force-pushed the issue4267_content_backing_module_none branch from 7944d68 to d866c69 Compare August 22, 2022 21:49
Member Author

chu11 commented Aug 22, 2022

re-pushed, addressing all of the comments above except the discussion about whether content.flush should be an error or not. Of particular note:

  • created a bunch of new functions to deal with the cleanup paths better. Those mem-leaks / double frees were embarrassing :P
  • add some more tests per comments above (reload kvs, forwarding from rank != 0 works as intended)

Member Author

chu11 commented Aug 22, 2022

It might be a good idea to pause and question whether the "flush" operation should fail when there is no backing store. Maybe we should redefine it as "flush to backing store, if any"? I know that would require other content tests to be updated. I'm not sure if that's the right answer or not, but if it is, it could also enable the "startlog" stuff to remain in the rc files, and an instance that is restarted several times using dump/restore could have a startlog like one that restarts with a backing store.

Hmmmm, my initial feeling is that when someone does content.flush there should be an error returned. Presumably you'd want to know that things weren't backed up properly?

Here's a thought, could content.flush take an argument that is something like --quiet? i.e. don't return an error if the backing isn't there?

Alternately, we could simply add some options to startlog, restore, etc. to not flush when the option is set, and we could set the option when backing module == none.

Edit: not necessarily for this PR, could be a follow up one

@chu11 chu11 force-pushed the issue4267_content_backing_module_none branch 3 times, most recently from 0b90df9 to 3c3c02a Compare August 23, 2022 00:13
Member Author

chu11 commented Aug 23, 2022

argh, re-pushed, fixed up a mem-leak and fixed some bash-isms in my tests that were affecting the CI

@chu11 chu11 force-pushed the issue4267_content_backing_module_none branch 2 times, most recently from e9cb6c8 to 08ffb5f Compare August 23, 2022 04:33

codecov bot commented Aug 23, 2022

Codecov Report

Merging #4492 (e9cb6c8) into master (11b0680) will decrease coverage by 0.02%.
The diff coverage is 77.43%.

❗ Current head e9cb6c8 differs from pull request most recent head 08ffb5f. Consider uploading reports for the commit 08ffb5f to get more accurate results

@@            Coverage Diff             @@
##           master    #4492      +/-   ##
==========================================
- Coverage   83.36%   83.34%   -0.03%     
==========================================
  Files         401      402       +1     
  Lines       67649    67771     +122     
==========================================
+ Hits        56397    56481      +84     
- Misses      11252    11290      +38     
Impacted Files Coverage Δ
src/broker/content-checkpoint.c 76.71% <76.71%> (ø)
src/broker/content-cache.c 85.83% <100.00%> (+0.09%) ⬆️
src/modules/content-files/content-files.c 77.43% <0.00%> (-1.83%) ⬇️
src/modules/job-archive/job-archive.c 62.62% <0.00%> (-0.70%) ⬇️
src/modules/job-info/guest_watch.c 76.21% <0.00%> (-0.55%) ⬇️
src/cmd/builtin/restore.c 87.50% <0.00%> (-0.38%) ⬇️
src/common/libsubprocess/subprocess.c 87.89% <0.00%> (-0.30%) ⬇️
src/modules/kvs/kvs.c 70.46% <0.00%> (-0.14%) ⬇️
src/broker/overlay.c 86.39% <0.00%> (-0.11%) ⬇️
src/cmd/flux-job.c 87.50% <0.00%> (+0.12%) ⬆️
... and 5 more

Member Author

chu11 commented Aug 23, 2022

and re-pushed, fixing an s3 CI issue

@chu11 chu11 force-pushed the issue4267_content_backing_module_none branch from 08ffb5f to 2846ca1 Compare August 23, 2022 05:24
Member

@garlick garlick left a comment

Much improved though I still had a couple of comments/questions

Comment on lines 146 to 150
if (flux_future_aux_set (f,
"msg",
(void *)msgref,
(flux_free_f)flux_msg_decref) < 0) {
flux_msg_decref (msgref);
Member

The msgref temporary variable isn't really accomplishing anything at this point. Might as well just use

    if (flux_future_aux_set (f,
                             "msg",
                             (void *)flux_msg_incref (msg),
                             (flux_free_f)flux_msg_decref) < 0) {
        flux_msg_decref (msg);

Same comment applies to put

Comment on lines 187 to 202
if (content_checkpoint_get_backing (checkpoint, msg, key, &errstr) < 0)
goto error;

Member

For rank 0, this is forcing the RPC to use the backing store, so it would get ENOSYS in the "none" case. The request should go to the broker RPC. Same comment applies to put.

It looks like t0028-content-backing-none.t includes a test that expects this behavior. Shouldn't it work?

Member

I hate to keep dragging this out. Do we have a use case for not returning ENOSYS to the checkpoint operations on rank > 0? This is primarily used internally by the kvs on rank 0 only, and by the dump/restore and startlog tools, all of which one would expect to run on rank0 I think. Up to you but we could just prune that case and call it good if it cuts down the code. I maybe shouldn't have brought it up, but I was thinking that we could just load the service on rank 0 and let requests be forwarded "naturally", forgetting that the service name is now shared with the content load/store ops that must work on all ranks.

Member Author

My initial thought upon reading your comments was that I screwed up. We should have content.checkpoint-{get,put} forward to rank 0's content.checkpoint-{get,put} on non-rank 0 brokers. But your comment makes sense that maybe we don't have to, we should just ENOSYS right off the bat for ranks != 0.

For myself, I tend to lean towards "consistency", b/c I just dislike seeing something different in the code.

Let me have rank > 0 forward appropriately to rank 0. I'll spin that off into another PR since it's sort of independent of this and really a mistake from #4463

Member Author

chu11 commented Aug 24, 2022

re-pushed, building on top of #4519, adjust some tests as a result of #4519

@chu11 chu11 force-pushed the issue4267_content_backing_module_none branch 2 times, most recently from 2a051fa to 5877028 Compare August 25, 2022 17:57
Member Author

chu11 commented Aug 25, 2022

rebased and re-pushed now that #4519 is merged, this PR is a lot smaller now :-)

Member

@garlick garlick left a comment

LGTM, thanks for sticking it out through all the review comments :-)

Member Author

chu11 commented Aug 27, 2022

re-started one builder that had a ton of failures unrelated to this PR. My assumption is that the workflow/container was borked and that affected a bunch of tests.

Problem: content checkpoints presently only work when
content backing modules are loaded.

Solution: Cache checkpoint data so that checkpoint put/get
work regardless of whether a backing module is loaded.

Update tests in t2807-dump-cmd that need to check for new
error messages.

Problem: There are presently no tests to ensure that
checkpoint get/put work correctly when backing modules
are loaded / unloaded.

Solution: Add tests to content-sqlite, content-files,
and content-s3 to ensure checkpoint get/put work as
expected when backing modules are loaded and unloaded.
Add additional tests in a new content "none" testfile.

Problem: By default, rc scripts always assume a content backing
module will be loaded.  There is no way to specify "no" backing
module.

Solution: Support "none" as a special input to not load a content
backing module.

Fixes flux-framework#4267

Problem: No test exists to ensure content.backing-module "none"
works in the rc scripts.

Solution: Add a test.

@chu11 chu11 force-pushed the issue4267_content_backing_module_none branch from aee7c40 to f3c2437 Compare August 27, 2022 03:20
Member Author

chu11 commented Aug 27, 2022

rebased & re-pushed, mergifyio seemed to get stuck on something

@mergify mergify bot merged commit 7deb4f3 into flux-framework:master Aug 27, 2022
@chu11 chu11 deleted the issue4267_content_backing_module_none branch August 29, 2022 18:56

3 participants