Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testsuite: improve coverage of malformed RPC requests #2133

Merged
merged 5 commits into from Apr 19, 2019

Conversation

garlick
Copy link
Member

@garlick garlick commented Apr 18, 2019

This PR adds some more tests that send a malformed request to services and make sure they get an EPROTO response. In addition it addresses a problem with the encoding of request messages by the t/request/rpc test program and adds tests for that.

A couple of response handlers needed some minor tweaks to behave properly on error paths.

Problem: flux message functions require NULL-termination
of stringified JSON payloads, but t/request/rpc does
not NULL-terminate.

Add NULL terminator if request payload is not empty.
Problem: test client had a bug that would cause EPROTO
for any valid request payload, but testing did not
uncover this.

Add some tests to exercise the test code!
Problem: barrier module does not respond if it receives
a malformed request.

Send an EPROTO response in that case.
if (!(job = jobinfo_new ()))
return -1;

if (!(job->req = flux_msg_copy (msg, true))) {
flux_log_error (ctx->h, "start: flux_msg_copy");
jobinfo_decref (job);
if (flux_respond_error (ctx->h, msg, errno, "flux_msg_copy failed") < 0)
flux_log_error (ctx->h, "flux_respond_error");
return -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we wanted to respond to job-manager with a "generic" (i.e. non-jobid response) when the job-exec module encounters a fatal error, so that the job-manager/exec protocol is "stopped" and the presumably internal error can be rectified before we encounter many failures.

For example, if the job-exec gets an invalid message from the job-manager, that probably isn't a one time thing, but may indicate a version incompatibility problem, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you are right, I was confused. The job manager is plumbed to tear down the exec system interface if it ever receives a "normal" error response to start.

There is still an issue because both error and success responses are being sent in the EPROTO case. Let me rework, and fix the commit comment

@grondo
Copy link
Contributor

grondo commented Apr 18, 2019

Nice coverage increase!

Besides the question about how to handle EPROTO error from job-manager above, I'm curious why the error handling code in job-exec.c that is supposedly being tested by this PR (EPROTO handling) isn't showing up as covered in the coverage report..

@garlick
Copy link
Member Author

garlick commented Apr 18, 2019

Besides the question about how to handle EPROTO error from job-manager above, I'm curious why the error handling code in job-exec.c that is supposedly being tested by this PR (EPROTO handling) isn't showing up as covered in the coverage report..

I had dropped the test from the final PR that covered the job-exec changes. I will re-add it.

Problem: when a malformed start request is received,
a failure response is sent that includes an uninitialized
job id, since the id was unable to be parsed from the
request:

{ "id":0,
  "type":"exception",
  "data":{
    "severity":0,
    "type":"exec",
    "note":"job start failure: Protocol error"
  }
}

Skip sending this response.  An EPROTO response is also
sent, which causes the job manager to stop sending requests
to the exec service.  Add a comment about the non-obvious
purpose of this response.

Now a test can be written in a straightforward way to
ensure that the EPROTO is returned.
Add more tests that ensure malformed requests fail
with EPROTO.
@garlick
Copy link
Member Author

garlick commented Apr 18, 2019

Forced a push which added a job-exec.start EPROTO check to t2400-job-exec-test.t, and reworks the change to job-exec to avoid sending the specific job response before the EPROTO, since the jobid was uninitialized (0). Not so coincidentally, that made the EPROTO test easier to write.

@codecov-io
Copy link

Codecov Report

Merging #2133 into master will increase coverage by 0.1%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master    #2133     +/-   ##
=========================================
+ Coverage   80.28%   80.38%   +0.1%     
=========================================
  Files         201      201             
  Lines       31709    31710      +1     
=========================================
+ Hits        25456    25489     +33     
+ Misses       6253     6221     -32
Impacted Files Coverage Δ
src/modules/job-exec/job-exec.c 74.21% <100%> (+0.78%) ⬆️
src/modules/barrier/barrier.c 78.52% <100%> (-0.54%) ⬇️
src/modules/kvs-watch/kvs-watch.c 79.41% <0%> (+0.42%) ⬆️
src/modules/job-manager/start.c 73.45% <0%> (+0.88%) ⬆️
src/modules/job-manager/alloc.c 76.23% <0%> (+0.99%) ⬆️
src/modules/job-ingest/job-ingest.c 74.72% <0%> (+1.48%) ⬆️
src/modules/job-manager/submit.c 74.52% <0%> (+1.88%) ⬆️
src/modules/userdb/userdb.c 72.02% <0%> (+2.38%) ⬆️
src/modules/job-manager/raise.c 80.95% <0%> (+2.38%) ⬆️
... and 4 more

@grondo
Copy link
Contributor

grondo commented Apr 19, 2019

100% diff coverage! I'm sold!

@grondo grondo merged commit 9a8e6ef into flux-framework:master Apr 19, 2019
@garlick garlick deleted the request_test branch February 25, 2020 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants