testsuite: improve coverage of malformed RPC requests #2133

garlick · 2019-04-18T16:35:27Z

This PR adds some more tests that send a malformed request to services and make sure they get an EPROTO response. In addition it addresses a problem with the encoding of request messages by the t/request/rpc test program and adds tests for that.

A couple of response handlers needed some minor tweaks to behave properly on error paths.

Problem: flux message functions require NULL-termination of stringified JSON payloads, but t/request/rpc does not NULL-terminate. Add NULL terminator if request payload is not empty.

Problem: test client had a bug that would cause EPROTO for any valid request payload, but testing did not uncover this. Add some tests to exercise the test code!

Problem: barrier module does not respond if it receives a malformed request. Send an EPROTO response in that case.

grondo · 2019-04-18T17:44:51Z

src/modules/job-exec/job-exec.c

    if (!(job = jobinfo_new ()))
        return -1;

    if (!(job->req = flux_msg_copy (msg, true))) {
        flux_log_error (ctx->h, "start: flux_msg_copy");
        jobinfo_decref (job);
-        if (flux_respond_error (ctx->h, msg, errno, "flux_msg_copy failed") < 0)
-            flux_log_error (ctx->h, "flux_respond_error");
        return -1;


I thought we wanted to respond to job-manager with a "generic" (i.e. non-jobid response) when the job-exec module encounters a fatal error, so that the job-manager/exec protocol is "stopped" and the presumably internal error can be rectified before we encounter many failures.

For example, if the job-exec gets an invalid message from the job-manager, that probably isn't a one time thing, but may indicate a version incompatibility problem, etc.

Oh you are right, I was confused. The job manager is plumbed to tear down the exec system interface if it ever receives a "normal" error response to start.

There is still an issue because both error and success responses are being sent in the EPROTO case. Let me rework, and fix the commit comment

grondo · 2019-04-18T17:48:32Z

Nice coverage increase!

Besides the question about how to handle EPROTO error from job-manager above, I'm curious why the error handling code in job-exec.c that is supposedly being tested by this PR (EPROTO handling) isn't showing up as covered in the coverage report..

garlick · 2019-04-18T21:42:01Z

Besides the question about how to handle EPROTO error from job-manager above, I'm curious why the error handling code in job-exec.c that is supposedly being tested by this PR (EPROTO handling) isn't showing up as covered in the coverage report..

I had dropped the test from the final PR that covered the job-exec changes. I will re-add it.

Problem: when a malformed start request is received, a failure response is sent that includes an uninitialized job id, since the id was unable to be parsed from the request: { "id":0, "type":"exception", "data":{ "severity":0, "type":"exec", "note":"job start failure: Protocol error" } } Skip sending this response. An EPROTO response is also sent, which causes the job manager to stop sending requests to the exec service. Add a comment about the non-obvious purpose of this response. Now a test can be written in a straightforward way to ensure that the EPROTO is returned.

Add more tests that ensure malformed requests fail with EPROTO.

garlick · 2019-04-18T23:13:09Z

Forced a push which added a job-exec.start EPROTO check to t2400-job-exec-test.t, and reworks the change to job-exec to avoid sending the specific job response before the EPROTO, since the jobid was uninitialized (0). Not so coincidentally, that made the EPROTO test easier to write.

codecov-io · 2019-04-18T23:40:50Z

Codecov Report

Merging #2133 into master will increase coverage by 0.1%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master    #2133     +/-   ##
=========================================
+ Coverage   80.28%   80.38%   +0.1%     
=========================================
  Files         201      201             
  Lines       31709    31710      +1     
=========================================
+ Hits        25456    25489     +33     
+ Misses       6253     6221     -32

Impacted Files	Coverage Δ
src/modules/job-exec/job-exec.c	`74.21% <100%> (+0.78%)`	⬆️
src/modules/barrier/barrier.c	`78.52% <100%> (-0.54%)`	⬇️
src/modules/kvs-watch/kvs-watch.c	`79.41% <0%> (+0.42%)`	⬆️
src/modules/job-manager/start.c	`73.45% <0%> (+0.88%)`	⬆️
src/modules/job-manager/alloc.c	`76.23% <0%> (+0.99%)`	⬆️
src/modules/job-ingest/job-ingest.c	`74.72% <0%> (+1.48%)`	⬆️
src/modules/job-manager/submit.c	`74.52% <0%> (+1.88%)`	⬆️
src/modules/userdb/userdb.c	`72.02% <0%> (+2.38%)`	⬆️
src/modules/job-manager/raise.c	`80.95% <0%> (+2.38%)`	⬆️
... and 4 more

grondo · 2019-04-19T16:43:56Z

100% diff coverage! I'm sold!

garlick added 3 commits April 18, 2019 08:54

testsuite: fix json-encoding in rpc test client

90f4ed0

Problem: flux message functions require NULL-termination of stringified JSON payloads, but t/request/rpc does not NULL-terminate. Add NULL terminator if request payload is not empty.

testsuite: verify rpc test client operation

5a2811a

Problem: test client had a bug that would cause EPROTO for any valid request payload, but testing did not uncover this. Add some tests to exercise the test code!

modules/barrier: send EPROTO response on bad request

4465e4e

Problem: barrier module does not respond if it receives a malformed request. Send an EPROTO response in that case.

garlick force-pushed the request_test branch from e4ab54c to d359756 Compare April 18, 2019 16:36

grondo reviewed Apr 18, 2019

View reviewed changes

garlick force-pushed the request_test branch from d359756 to 2a11cb6 Compare April 18, 2019 23:05

testsuite: verify that malformed requests get EPROTO

c7cb3df

Add more tests that ensure malformed requests fail with EPROTO.

garlick force-pushed the request_test branch from 2a11cb6 to c7cb3df Compare April 18, 2019 23:06

grondo merged commit 9a8e6ef into flux-framework:master Apr 19, 2019

garlick deleted the request_test branch February 25, 2020 17:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testsuite: improve coverage of malformed RPC requests #2133

testsuite: improve coverage of malformed RPC requests #2133

garlick commented Apr 18, 2019

grondo Apr 18, 2019

garlick Apr 18, 2019

grondo commented Apr 18, 2019

garlick commented Apr 18, 2019

garlick commented Apr 18, 2019

codecov-io commented Apr 18, 2019

grondo commented Apr 19, 2019

testsuite: improve coverage of malformed RPC requests #2133

testsuite: improve coverage of malformed RPC requests #2133

Conversation

garlick commented Apr 18, 2019

grondo Apr 18, 2019

Choose a reason for hiding this comment

garlick Apr 18, 2019

Choose a reason for hiding this comment

grondo commented Apr 18, 2019

garlick commented Apr 18, 2019

garlick commented Apr 18, 2019

codecov-io commented Apr 18, 2019

Codecov Report

grondo commented Apr 19, 2019