Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libflux: change flux_respond(), flux_respond_error() plus misc fixes #2120

Merged
merged 14 commits into from Apr 17, 2019

Conversation

@garlick
Copy link
Member

commented Apr 12, 2019

This PR knocks off some easy bugs, and makes some changes to flux_respond() and flux_respond_error() proposed in #2088 and #2089.

@garlick garlick force-pushed the garlick:fix_potpourri branch from ff20085 to fbe9f10 Apr 12, 2019
Copy link
Contributor

left a comment

LGTM, just some small things. Obviously lots of code changes, perhaps another glance from someone else would be good to see if I missed anything.

@@ -2720,7 +2730,7 @@ static void namespace_list_request_cb (flux_t *h, flux_msg_handler_t *mh,
rc = 0;
done:
if (rc < 0) {
if (flux_respond (h, msg, errno, NULL) < 0)
if (flux_respond_error (h, msg, errno, NULL) < 0)

This comment has been minimized.

Copy link
@chu11

chu11 Apr 12, 2019

Contributor

log msg should now be flux_respond_error

@@ -226,7 +223,7 @@ void xping_request_cb (flux_t *h, flux_msg_handler_t *mh,
free (hashkey);
return;
error:
if (flux_respond (h, msg, errno, NULL) < 0)
if (flux_respond_error (h, msg, errno, NULL) < 0)

This comment has been minimized.

Copy link
@chu11

chu11 Apr 12, 2019

Contributor

log msg flux_respond_error

@@ -49,11 +49,11 @@ static void rusage_request_cb (flux_t *h, flux_msg_handler_t *mh,
"nsignals", ru.ru_nsignals,
"nvcsw", ru.ru_nvcsw,
"nivcsw", ru.ru_nivcsw) < 0)
goto error;
flux_log_error (h, "%s: flux_respond_pack", __FUNCTION__);

This comment has been minimized.

Copy link
@chu11

chu11 Apr 12, 2019

Contributor

did you accidentally remove this goto error? Seems not right.

This comment has been minimized.

Copy link
@garlick

garlick Apr 12, 2019

Author Member

It didn't seem useful to handle a flux_respond_pack() error by calling flux_respond_error() since that is likely to fail also. However, maybe I should do that in a separate cleanup commit.

@@ -388,7 +388,7 @@ static void server_exec_cb (flux_t *h, flux_msg_handler_t *mh,
return;

error:
if (flux_respond (h, msg, errno, NULL) < 0)
if (flux_respond_error (h, msg, errno, NULL) < 0)

This comment has been minimized.

Copy link
@chu11

chu11 Apr 12, 2019

Contributor

log msg flux_respond_error

@@ -704,7 +710,7 @@ static void cron_sync_handler (flux_t *h, flux_msg_handler_t *w,
return;

error:
if (flux_respond (h, msg, errno, NULL) < 0)
if (flux_respond_error (h, msg, errno, NULL) < 0)

This comment has been minimized.

Copy link
@chu11

chu11 Apr 12, 2019

Contributor

log msg flux_respond_error

arguments. The error string may be used to provide a more
detailed error message than can be conveyed via _errnum_.
_errnum_ must be non-zero. If _errmsg_ is non-NULL, an error string
payload is included in the response The error string may be used to

This comment has been minimized.

Copy link
@chu11

chu11 Apr 12, 2019

Contributor

missing period after response

@garlick

This comment has been minimized.

Copy link
Member Author

commented Apr 12, 2019

@grondo
grondo approved these changes Apr 12, 2019
Copy link
Contributor

left a comment

Looks good to me! I had one concern about the code bloat being caused by removal of the errnum parameter from flux_respond, though I generally agree with the change. Perhaps it would be nice if we have some sort of convenience function/macro that would allow us to keep possible errno responses in a couple lines instead of 8 or more..

else {
if (flux_respond (h, msg, NULL) < 0)
flux_log_error (h, "aggregator.push: flux_respond");
}
}

This comment has been minimized.

Copy link
@grondo

grondo Apr 12, 2019

Contributor

I see a few places where this blob of code is required due to the removal of errnum parameter from flux_respond -- trading 2 lines of code for 8. The improvement to the function seems like a good idea, but I wonder if we need a convenience macro or function to do it the old way?

You went through and made all the changes, was it overall a net win?

This comment has been minimized.

Copy link
@garlick

garlick Apr 12, 2019

Author Member

I think it was. As discussed face to face, though, there's a trade off that may be stylistic preference for how request handlers are written.

The style in #2089 is primarily used in older flux code. Success response and error response take the same path through the code, resulting in better coverage, reduced duplication of any required teardown, and thus fewer LOC. However, care is needed to avoid assuming errno is zero on success, like the introduction of an rc variable and/or saved_errno. This style is favored by the old flux_respond() footprint.

The other style separates the two paths, with one ending in flux_resopnd_error() and one ending in one of the other flux_respond_*() variants. Teardown is duplicated. One path may not have coverage. However, IMHO the error path and the success path are more clearly delineated. This style is favored by the new flux_respond() footprint.

Sometimes a little refactoring can be done to reduce the amount of duplicated code in both legs in the latter style.

If it's agreeable generally that the newer style is the better choice, then I should probably tack on a commit to update the older stuff. I started out that way, but quickly broke something and decided to back off. Perhaps doing it a little more carefully will yield better results.

@chu11

This comment has been minimized.

Copy link
Contributor

commented Apr 15, 2019

Just skimmed changes. Looks good. Squash then we can merge?

@garlick

This comment has been minimized.

Copy link
Member Author

commented Apr 15, 2019

Thanks! I didn't quite finish my pass through src/modules updating the response error handling, so let me do that first.

@garlick garlick force-pushed the garlick:fix_potpourri branch from 8d5687b to 99a4029 Apr 15, 2019
@garlick

This comment has been minimized.

Copy link
Member Author

commented Apr 15, 2019

OK, I made it through. If @grondo could maybe have a peek at the aggregator and cron changes, that would be appreciated. Then if good, I'll squash.

Edit: sorry the diffs are irritating to try to follow, to me at least. It might be easier to look at the full source and just search for "flux_respond". If this seems like too much change for code that's not broken, LMK and I can lop off that portion.

@grondo

This comment has been minimized.

Copy link
Contributor

commented Apr 16, 2019

If @grondo could maybe have a peek at the aggregator and cron changes, that would be appreciated. Then if good, I'll squash.

Seemed good to me! A bit worrisome how much code had to be rewritten to change error handling 😨

@garlick garlick force-pushed the garlick:fix_potpourri branch from 99a4029 to 20362d2 Apr 16, 2019
@garlick

This comment has been minimized.

Copy link
Member Author

commented Apr 16, 2019

Rebased and also squashed the incremental development.

@garlick

This comment has been minimized.

Copy link
Member Author

commented Apr 16, 2019

It just occurred to me that it would be relatively simple to add a test that sends a malformed request to every one of the changed request handlers, to elicit an EPROTO response and send it down the common error leg. I should do that here.

@grondo

This comment has been minimized.

Copy link
Contributor

commented Apr 16, 2019

Great idea! That would be really useful in general.

garlick added 14 commits Apr 11, 2019
Problem: t2005-hwloc-basic.t fails the "hwloc: internal
aggregate-load cmd works" test if jq is not installed

Add missing HAVE_JQ.

Fixes #2116
Problem: when a service is not registered, the
ENOSYS (Function not implemented) response is a bit
unhelpful.

When the broker cannot find a service to route a request
to, add a textual error message to the ENOSYS response, e.g.
  No service matching foo.bar is registered
Problem: the job-manager.submit error path includes
code to fill in the future error string if missing,
but the future code already fills in with flux_strerror().

Simplify this code to pass through the error string
unconditionally.
Problem: if job-manager is not loaded, flux job submit
complains that the job-ingest is not loaded.

Pass through the error string without interpretation,
instead of special casing ENOSYS.

Fixes #2113
Problem: duplicate keys can appear in the KVS setroot
event message, needlessly increasing message size.

This was noted when multiple events to the same
job eventlog were batched by the job-manager.

Ensure key list is unique.

Fixes #2108
Problem: the varargs error string results in extra
lines of code in common situations where an error
string is only set conditionally.

Switch to a (const char *errstr) argument.

Fixes #2088
Problem: flux_respond() can return errors as well as
success, which sometimes leads to awkward and unsafe
code, such as code that depends on errno being set to zero,
when it is undefined *unless* there is an error.

Drop errnum from flux_respond().  Now it should only
be used to return success.

Update tests and users.

Fixes #2089
Update man page to match changes made to the function.
Update man page to match changes made to function
prototype.
Problem: When flux_respond_*() fails in a service's
request handler, the error should not be handled by
calling flux_respond_error(), since that is just as
likely to fail as the first call.

This unlikely error will probably be the result of a
coding error, or a fatal error on the flux_t handle.
Reasonable coping mechanisms are stopping the reactor,
and/or by logging it at LOG_ERR level.

This pattern occurs in the broker and the KVS.
Log the error instead of retrying.
Problem: the new footprint of flux_respond() and
flux_respond_error() favor a pattern for response
handlers that separates error and success code paths.
The error path ends in flux_respond_error() and the
success path ends in flux_respond().  However, some
services use a different pattern that combines error
and success code paths.  This pattern is now a poor
fit for the respond functions.

Replace this pattern in the content-cache, the broker
log service, kvs, sched-simple, cron, and aggregator.
Add a test helper that sends a single RPC, using topic
string from command line and payload from stdin.  Any
response payload is displayed on stdout.

$ jq -j -c -n  "{name:\"rank\"}" | t/request/rpc attr.get

If a numerical argument follows the topic string, it indicates
that a failure expected with errno == the number.  In that case,
a failure response with the expected errno is treated as success,
and a success response, or a failure response with the wrong
errno is treated as failure.
Add test coverage to some request handlers for
malformed request payloads, focusing on those
whose response handling logic was just reworked.
@garlick garlick force-pushed the garlick:fix_potpourri branch from 20362d2 to 379b360 Apr 17, 2019
@garlick

This comment has been minimized.

Copy link
Member Author

commented Apr 17, 2019

Rebased on current master and added some new tests to elicit EPROTO responses to changed code paths, where that works.

@codecov-io

This comment has been minimized.

Copy link

commented Apr 17, 2019

Codecov Report

Merging #2120 into master will increase coverage by 0.04%.
The diff coverage is 64.2%.

@@            Coverage Diff             @@
##           master    #2120      +/-   ##
==========================================
+ Coverage   80.22%   80.26%   +0.04%     
==========================================
  Files         201      201              
  Lines       31733    31709      -24     
==========================================
- Hits        25457    25451       -6     
+ Misses       6276     6258      -18
Impacted Files Coverage Δ
src/modules/connector-local/local.c 73.62% <0%> (ø) ⬆️
src/broker/module.c 82.47% <0%> (ø) ⬆️
src/broker/rusage.c 59.45% <0%> (-1.66%) ⬇️
src/modules/job-info/lookup.c 64.28% <100%> (ø) ⬆️
src/common/libflux/response.c 79.24% <100%> (-2.24%) ⬇️
src/modules/job-ingest/job-ingest.c 73.23% <100%> (-0.49%) ⬇️
src/cmd/flux-job.c 84.69% <100%> (-0.08%) ⬇️
src/modules/job-manager/submit.c 72.64% <100%> (-0.7%) ⬇️
src/broker/heaptrace.c 89.28% <100%> (ø) ⬆️
src/bindings/lua/flux-lua.c 87.34% <100%> (ø) ⬆️
... and 28 more
@grondo

This comment has been minimized.

Copy link
Contributor

commented Apr 17, 2019

Nice work! That increased coverage significantly. Is this ready to go in?

@garlick

This comment has been minimized.

Copy link
Member Author

commented Apr 17, 2019

Thanks! Yes please.

@grondo grondo merged commit 6648154 into flux-framework:master Apr 17, 2019
2 of 3 checks passed
2 of 3 checks passed
codecov/patch 64.2% of diff hit (target 80.22%)
Details
codecov/project 80.26% (+0.04%) compared to adce7f6
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@garlick garlick deleted the garlick:fix_potpourri branch Apr 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.