New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
testsuite: improve coverage of malformed RPC requests #2133
Conversation
Problem: flux message functions require NULL-termination of stringified JSON payloads, but t/request/rpc does not NULL-terminate. Add NULL terminator if request payload is not empty.
Problem: test client had a bug that would cause EPROTO for any valid request payload, but testing did not uncover this. Add some tests to exercise the test code!
Problem: barrier module does not respond if it receives a malformed request. Send an EPROTO response in that case.
src/modules/job-exec/job-exec.c
Outdated
if (!(job = jobinfo_new ())) | ||
return -1; | ||
|
||
if (!(job->req = flux_msg_copy (msg, true))) { | ||
flux_log_error (ctx->h, "start: flux_msg_copy"); | ||
jobinfo_decref (job); | ||
if (flux_respond_error (ctx->h, msg, errno, "flux_msg_copy failed") < 0) | ||
flux_log_error (ctx->h, "flux_respond_error"); | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we wanted to respond to job-manager with a "generic" (i.e. non-jobid response) when the job-exec module encounters a fatal error, so that the job-manager/exec protocol is "stopped" and the presumably internal error can be rectified before we encounter many failures.
For example, if the job-exec gets an invalid message from the job-manager, that probably isn't a one time thing, but may indicate a version incompatibility problem, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you are right, I was confused. The job manager is plumbed to tear down the exec system interface if it ever receives a "normal" error response to start
.
There is still an issue because both error and success responses are being sent in the EPROTO case. Let me rework, and fix the commit comment
Nice coverage increase! Besides the question about how to handle EPROTO error from job-manager above, I'm curious why the error handling code in job-exec.c that is supposedly being tested by this PR (EPROTO handling) isn't showing up as covered in the coverage report.. |
I had dropped the test from the final PR that covered the job-exec changes. I will re-add it. |
Problem: when a malformed start request is received, a failure response is sent that includes an uninitialized job id, since the id was unable to be parsed from the request: { "id":0, "type":"exception", "data":{ "severity":0, "type":"exec", "note":"job start failure: Protocol error" } } Skip sending this response. An EPROTO response is also sent, which causes the job manager to stop sending requests to the exec service. Add a comment about the non-obvious purpose of this response. Now a test can be written in a straightforward way to ensure that the EPROTO is returned.
Add more tests that ensure malformed requests fail with EPROTO.
Forced a push which added a job-exec.start EPROTO check to |
Codecov Report
@@ Coverage Diff @@
## master #2133 +/- ##
=========================================
+ Coverage 80.28% 80.38% +0.1%
=========================================
Files 201 201
Lines 31709 31710 +1
=========================================
+ Hits 25456 25489 +33
+ Misses 6253 6221 -32
|
100% diff coverage! I'm sold! |
This PR adds some more tests that send a malformed request to services and make sure they get an EPROTO response. In addition it addresses a problem with the encoding of request messages by the
t/request/rpc
test program and adds tests for that.A couple of response handlers needed some minor tweaks to behave properly on error paths.