Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testsuite: improve coverage of malformed RPC requests #2133

Merged
merged 5 commits into from Apr 19, 2019

Conversation

@garlick
Copy link
Member

commented Apr 18, 2019

This PR adds some more tests that send a malformed request to services and make sure they get an EPROTO response. In addition it addresses a problem with the encoding of request messages by the t/request/rpc test program and adds tests for that.

A couple of response handlers needed some minor tweaks to behave properly on error paths.

garlick added 3 commits Apr 18, 2019
Problem: flux message functions require NULL-termination
of stringified JSON payloads, but t/request/rpc does
not NULL-terminate.

Add NULL terminator if request payload is not empty.
Problem: test client had a bug that would cause EPROTO
for any valid request payload, but testing did not
uncover this.

Add some tests to exercise the test code!
Problem: barrier module does not respond if it receives
a malformed request.

Send an EPROTO response in that case.
@garlick garlick force-pushed the garlick:request_test branch from e4ab54c to d359756 Apr 18, 2019
if (!(job = jobinfo_new ()))
return -1;

if (!(job->req = flux_msg_copy (msg, true))) {
flux_log_error (ctx->h, "start: flux_msg_copy");
jobinfo_decref (job);
if (flux_respond_error (ctx->h, msg, errno, "flux_msg_copy failed") < 0)
flux_log_error (ctx->h, "flux_respond_error");
return -1;

This comment has been minimized.

Copy link
@grondo

grondo Apr 18, 2019

Contributor

I thought we wanted to respond to job-manager with a "generic" (i.e. non-jobid response) when the job-exec module encounters a fatal error, so that the job-manager/exec protocol is "stopped" and the presumably internal error can be rectified before we encounter many failures.

For example, if the job-exec gets an invalid message from the job-manager, that probably isn't a one time thing, but may indicate a version incompatibility problem, etc.

This comment has been minimized.

Copy link
@garlick

garlick Apr 18, 2019

Author Member

Oh you are right, I was confused. The job manager is plumbed to tear down the exec system interface if it ever receives a "normal" error response to start.

There is still an issue because both error and success responses are being sent in the EPROTO case. Let me rework, and fix the commit comment

@grondo

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2019

Nice coverage increase!

Besides the question about how to handle EPROTO error from job-manager above, I'm curious why the error handling code in job-exec.c that is supposedly being tested by this PR (EPROTO handling) isn't showing up as covered in the coverage report..

@garlick

This comment has been minimized.

Copy link
Member Author

commented Apr 18, 2019

Besides the question about how to handle EPROTO error from job-manager above, I'm curious why the error handling code in job-exec.c that is supposedly being tested by this PR (EPROTO handling) isn't showing up as covered in the coverage report..

I had dropped the test from the final PR that covered the job-exec changes. I will re-add it.

Problem: when a malformed start request is received,
a failure response is sent that includes an uninitialized
job id, since the id was unable to be parsed from the
request:

{ "id":0,
  "type":"exception",
  "data":{
    "severity":0,
    "type":"exec",
    "note":"job start failure: Protocol error"
  }
}

Skip sending this response.  An EPROTO response is also
sent, which causes the job manager to stop sending requests
to the exec service.  Add a comment about the non-obvious
purpose of this response.

Now a test can be written in a straightforward way to
ensure that the EPROTO is returned.
@garlick garlick force-pushed the garlick:request_test branch from d359756 to 2a11cb6 Apr 18, 2019
Add more tests that ensure malformed requests fail
with EPROTO.
@garlick garlick force-pushed the garlick:request_test branch from 2a11cb6 to c7cb3df Apr 18, 2019
@garlick

This comment has been minimized.

Copy link
Member Author

commented Apr 18, 2019

Forced a push which added a job-exec.start EPROTO check to t2400-job-exec-test.t, and reworks the change to job-exec to avoid sending the specific job response before the EPROTO, since the jobid was uninitialized (0). Not so coincidentally, that made the EPROTO test easier to write.

@codecov-io

This comment has been minimized.

Copy link

commented Apr 18, 2019

Codecov Report

Merging #2133 into master will increase coverage by 0.1%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master    #2133     +/-   ##
=========================================
+ Coverage   80.28%   80.38%   +0.1%     
=========================================
  Files         201      201             
  Lines       31709    31710      +1     
=========================================
+ Hits        25456    25489     +33     
+ Misses       6253     6221     -32
Impacted Files Coverage Δ
src/modules/job-exec/job-exec.c 74.21% <100%> (+0.78%) ⬆️
src/modules/barrier/barrier.c 78.52% <100%> (-0.54%) ⬇️
src/modules/kvs-watch/kvs-watch.c 79.41% <0%> (+0.42%) ⬆️
src/modules/job-manager/start.c 73.45% <0%> (+0.88%) ⬆️
src/modules/job-manager/alloc.c 76.23% <0%> (+0.99%) ⬆️
src/modules/job-ingest/job-ingest.c 74.72% <0%> (+1.48%) ⬆️
src/modules/job-manager/submit.c 74.52% <0%> (+1.88%) ⬆️
src/modules/userdb/userdb.c 72.02% <0%> (+2.38%) ⬆️
src/modules/job-manager/raise.c 80.95% <0%> (+2.38%) ⬆️
... and 4 more
@grondo

This comment has been minimized.

Copy link
Contributor

commented Apr 19, 2019

100% diff coverage! I'm sold!

@grondo grondo merged commit 9a8e6ef into flux-framework:master Apr 19, 2019
4 checks passed
4 checks passed
Mergify — Summary 1 potential rule
Details
codecov/patch 100% of diff hit (target 80.28%)
Details
codecov/project 80.38% (+0.1%) compared to 6648154
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.