Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libflux: allow error string in RPC response #1538

Merged
merged 19 commits into from May 30, 2018

Conversation

Projects
None yet
4 participants
@garlick
Copy link
Member

garlick commented May 27, 2018

As discussed in #796, this PR adds the capability to return an error string in an RPC response, adding these API functions:

response.h

// mainly used in unit tests like other similar functions
int flux_response_decode_error (const flux_msg_t *msg, const char *errstr);
flux_msg_t *flux_response_encode_error (const char *topic, int errnum, const char *errstr);

// respond with error:  errnum != 0 required, errstr != NULL optional
int flux_respond_error (flux_t *h, const flux_msg_t *msg, int errnum, const char *errstr);

rpc.h

// Get error message from fulfilled RPC future
// Never NULL - if no error string in response, returns flux_strerror (errnum)
const char *flux_rpc_get_error (flux_future_t *f);

So one might write RPC client code that looks like this:

if (!(f = flux_rpc (h, ....)))
    fatal ("flux_rpc failed");
if (flux_rpc_get (f, "foo.method", ...) < 0)
    fatal ("foo.method: %s", flux_rpc_get_error (f))
...

In addition, relax some API calls that take "JSON strings" to allow any string, for flexibility (ostensibly to add error string payload to response, but this seemed like a good general improvement):

  • Rename flux_msg_set_json() to flux_msg_set_string()
  • Rename flux_msg_get_json() to flux_msg_get_string()
  • Drop FLUX_MSGFLAG_JSON flag as proposed in flux-framework/rfc#123
  • Change various message payload accessors to use "NULL terminated strings" instead of "JSON strings" (mostly cosmetic)

Finally, I thought now that we have flux_respond_error(), there was no need to have flux_respond() and flux_respond_raw() perform double duty, accepting both errnum and a payload, and ignoring the payload if errnum != 0. I did remove errnum from flux_respond_raw which has few users, but flux_respond() has many users and I thought I'd better open an issue and get feedback, and do that in a separate PR if we think that's a good idea.

Todo: add new functions to man pages, revisit testing, probably some PR comment improvement.

@garlick garlick changed the title allow error string in RPC response libflux: allow error string in RPC response May 27, 2018

@coveralls

This comment has been minimized.

Copy link

coveralls commented May 27, 2018

Coverage Status

Coverage increased (+0.02%) to 79.178% when pulling 7f7bf71 on garlick:error_response into ba1f064 on flux-framework:master.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented May 27, 2018

Codecov Report

❗️ No coverage uploaded for pull request base (master@bf59bee). Click here to learn what that means.
The diff coverage is 81.77%.

@@            Coverage Diff            @@
##             master    #1538   +/-   ##
=========================================
  Coverage          ?   78.89%           
=========================================
  Files             ?      166           
  Lines             ?    30598           
  Branches          ?        0           
=========================================
  Hits              ?    24140           
  Misses            ?     6458           
  Partials          ?        0
Impacted Files Coverage Δ
src/modules/wreck/job.c 71.12% <100%> (ø)
src/modules/cron/task.c 81.81% <100%> (ø)
src/bindings/lua/zmsg-lua.c 84.26% <100%> (ø)
src/broker/publisher.c 84.78% <100%> (ø)
src/common/libflux/event.c 79.71% <100%> (ø)
src/cmd/builtin/proxy.c 75.24% <100%> (ø)
src/bindings/lua/flux-lua.c 82.15% <100%> (ø)
src/common/libflux/mrpc.c 86.11% <100%> (ø)
src/common/libflux/request.c 88.46% <100%> (ø)
src/modules/content-sqlite/content-sqlite.c 56.29% <40%> (ø)
... and 5 more

@garlick garlick force-pushed the garlick:error_response branch from 6f965cd to cb52a8d May 28, 2018

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented May 29, 2018

This is an excellent improvement IMO! Awesome!
I'll poke at it a bit, but this should go in right away.

@garlick garlick force-pushed the garlick:error_response branch from cb52a8d to 23beb88 May 30, 2018

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented May 30, 2018

Rebased on master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented May 30, 2018

Is there a reason not to have a variadic flux_respond_error?

int flux_respond_error (flux_t *h, const flux_msg_t *msg, int errnum, const char *fmt, ...);

This might help in many RPC response error paths where it would be awkward to create formatted error messages (and the code would be difficult to test as well), but you may have already considered this and have some arguments against the implementation.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented May 30, 2018

Good idea! I didn't think of that actually.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented May 30, 2018

Just pushed that change - I'll squash if it looks OK.

I vsnprintf()ed the message into a 1K-byte static buffer, silently truncating. Any problem with that? I could go with vasprintf() if you think that's better. I thought since we recommend 80 chars or less in the RFC, truncating at 1K would be OK and another malloc saved (albeit in an uncommon case).

@garlick garlick force-pushed the garlick:error_response branch from 6c88b1d to 7f7bf71 May 30, 2018

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented May 30, 2018

Fixed a man page spelling error in that last change and forced a push.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented May 30, 2018

I vsnprintf()ed the message into a 1K-byte static buffer, silently truncating. Any problem with that?

That seems good to me!

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented May 30, 2018

One other quick idea would be to add __attribute__ ((format (printf, 4, 5))) to the prototype for flux_respond_error, but only since you are squashing anyway. Otherwise, LGTM! Is it ready for merge once squash is done?

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented May 30, 2018

Oh duh, right I'll add the attribute after I get in and squash.

garlick added some commits May 26, 2018

doc/flux_event_publish(3): [cleanup] fix whitespace
Problem: page contains lines with trailing whitespace.

Drop trailing whitespace.
libflux/message: drop FLUX_MSGFLAG_JSON
As discussed in flux-framework/rfc#123, FLUX_MSGFLAG_JSON
is no longer needed.

Drop the flag, and the flags argument from flux_msg_set_payload()
and flux_msg_get_payload(), which was only allowed to be zero
or FLUX_MSGFLAG_JSON.

Move the JSON payload check from flux_msg_set_payload()
to flux_msg_set_json().

Update unit tests and users, primarily in libflux.
libflux/message: rename set/get_json to _string
Problem: it's not convenient to send a C string
message payload.

Current code that accepts a 'json_str' does a cursory
check that the string looks vaguely like an
encoded JSON object.  Better to just drop this check
and allow any type of string in the payload.

Rename and drop the cursory check:
  flux_msg_set_json() -> flux_msg_set_string()
  flux_msg_get_json() -> flux_msg_get_string()

N.B. it's convenient to include the terminating \0
in the payload so accessor functions can return the
string as a (const char *) without copying it first
to add the \0 character.

Update users and tests.
doc/man3: update message API descriptions
Update section 3 descriptions of request, response,
event, and rpc functions that accept a string argument
to describe the payload as a NULL-terminated string
rather than a "JSON string" since that is no longer
assumed.
libflux/response: add flux_response_decode_error()
Add new function for decoding an error string returned
with a failed response.
libflux/response: drop errnum arg from encode_raw
Problem: Problem flux_response_encode_error() allows
error response to be encoded so there is no need
for errnum argument to flux_response_encode_raw().

Drop the 'errnum' parameter.

Update unit test.
libflux/response: add flux_response_encode_error()
Add a function that allows an error number and string
to be encoded in a response.
libflux/response: drop errnum arg from encode
Problem: flux_response_encode_error() allows errnum to
be set so there is no need for errnum argument in
flux_response_encode().

Drop errnum argument.

Update users.
libflux/response: add flux_respond_error()
Add a function that allows a custom, human readable
error message to be returned in a response message.
libflux/response: drop wrong FLUX_FATAL calls
Problem: several flux_response functions call FLUX_FATAL
on all errors, but FLUX_FATAL should only be used when
the flux_t handle has become unusable.

Fix error paths.
libflux/response: drop errnum arg from respond_raw
Problem: flux_respond_error() allows error response
to be sent so drop errnum arg in flux_respond_raw()
is not necessary.

Drop errnum arg from flux_respond_raw().

Update users.
libflux/rpc: add flux_rpc_get_error()
Add a function that allows an RPC user to retrieve
an error response payload string.  Ensure that the
function always returns a valid string by substituting
flux_strerror() if the payload string is unavailable.

This in combination with the addition of
flux_respond_error() fixes #796.

garlick added some commits May 27, 2018

sharness/rpc: [cleanup] fix handler error paths
Problem: control flow for message handlers in RPC
sharness test is a little hard to follow and has
inconsitent error handling.

Rework handlers to use the same error handling
pattern in each callback.
doc/flux_response_decode(3): add decode_error
Add entry and stub page for flux_response_decode_error()
doc/flux_respond(3): add flux_respond_error()
Add entry and stub page for flux_respond_error(3).
doc/flux_rpc(3): add flux_rpc_get_error()
Add an entry and stub page for flux_rpc_get_error()

@garlick garlick force-pushed the garlick:error_response branch from 7f7bf71 to 299097c May 30, 2018

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented May 30, 2018

Attribute added, incremental work squashed, and rebased on master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented May 30, 2018

Looks good! As a test I added a check for job->ntasks > 0 in job.create and added an informative error message in the response. Along with a requisite small change in the lua bindings, we now get this instead of Protocol error as an error message from submit/run:

$ flux wreckrun -n 0 hostnanme
wreckrun: flux.rpc: Invalid value for required job parameter ntasks (got 0)

@grondo grondo merged commit 48670b5 into flux-framework:master May 30, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls First build on master at 79.292%
Details
@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented May 30, 2018

Nice! Thanks!

@garlick garlick deleted the garlick:error_response branch Jun 11, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.