Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libsubprocess: Remove invalid EOF calculation #2230

Merged
merged 2 commits into from Jul 15, 2019

Conversation

@chu11
Copy link
Contributor

commented Jul 13, 2019

As discussed in #2211 and #2228, there was an invalid EOF calculation / assumption in local processes in libsubprocess.

Unfortunately, I have been unable to create a reproducer that works solely against the libsubprocess library. I can't seem to find that magic combination that makes this trigger regularly.

Perhaps the reproducer with flux-shell in #2211 could be added as an additional test in that PR or a later PR (which I can add after #2211 is merged or I can try and add one on that branch).

Also fixed something stupid I found in t/t0005-rexec.t.

@garlick

This comment has been minimized.

Copy link
Member

commented Jul 14, 2019

At the moment I cherry-picked your fix in #2211, where there is a fairly reliable reproducer in travis. One option would be to just leave it there. O/w this should go in first. Either way is fine with me.

test_must_fail 'basic rexec functionality (process fail)' '
${FLUX_BUILD_DIR}/t/rexec/rexec /bin/false
test_expect_success 'basic rexec functionality (process fail)' '
! ${FLUX_BUILD_DIR}/t/rexec/rexec /bin/false

This comment has been minimized.

Copy link
@grondo

grondo Jul 14, 2019

Contributor

technically you want test_must_fail here, or maybe better yet test_expect_code 1 ...

Not a showstopper for a merge though

This comment has been minimized.

Copy link
@chu11

chu11 Jul 15, 2019

Author Contributor

I saw this error message pop up.

./sharness.sh: line 505: basic rexec functionality (process fail): command not found
test_must_fail: command not found: basic rexec functionality (process fail) 
	${FLUX_BUILD_DIR}/t/rexec/rexec /bin/false

It's when I realized, the mistake was I used test_must_fail as the primary uhhhh "expect" test call (i.e. test_expect_success, test_expect_failure). perhaps I should have done

test_expect_success 'basic rexec functionality (process fail)' '
	test_must_fail ${FLUX_BUILD_DIR}/t/rexec/rexec /bin/false

?

This comment has been minimized.

Copy link
@grondo

grondo Jul 15, 2019

Contributor

Yes, that is the suggested use of test_must_fail (and what I meant above. It is not suggested to use ! to negate exit status because it would register success even if the program crashed) Like I said though, not a huge deal.

test_expect_code 1 .. would also work here.

@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Jul 15, 2019

At the moment I cherry-picked your fix in #2211, where there is a fairly reliable reproducer in travis. One option would be to just leave it there. O/w this should go in first. Either way is fine with me.

Assuming @grondo is fine with my hack to t0005-rexec above, then we can just close this PR and merge it in via #2211.

@garlick

This comment has been minimized.

Copy link
Member

commented Jul 15, 2019

@grondo thinks this looks OK to go in if you're OK with it. Then I'll rebase the shell PR on top and the cherry picked commit will drop out.

@garlick

This comment has been minimized.

Copy link
Member

commented Jul 15, 2019

I went ahead and added merge-when-passing to get mergify to do the rebase. Hope that's OK @chu11!

@grondo
grondo approved these changes Jul 15, 2019
Copy link
Contributor

left a comment

LGTM. The use of ! in testing shouldn't hold this one up.

chu11 added 2 commits Jul 12, 2019
In local subprocesses, it was assumed that an empty output buffer
for an exited subprocesses was equivalent to an EOF being generated
on that output stream.

This assumption was invalid.  The assumption was based on the traditional
logic that when you read() zero bytes on a file descriptor, you've
read EOF on a stream.  However, we are reading from a buffer and not a
stream.  While the subprocess has exited, it's possible that more data may
be available for the buffer later.  Since data and child exit status
go through a reactor, it is possible the subprocess exit status could be
seen before some final data is available in the buffer.

Fixes #2228
@SteVwonder SteVwonder force-pushed the chu11:issue2228 branch from 90f0448 to 383144f Jul 15, 2019
@mergify mergify bot merged commit 5116393 into flux-framework:master Jul 15, 2019
3 checks passed
3 checks passed
Rule: rebase and merge when passing all checks (merge) The pull request has been merged automatically
Details
Summary 1 rule matches
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@codecov-io

This comment has been minimized.

Copy link

commented Jul 15, 2019

Codecov Report

Merging #2230 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2230      +/-   ##
==========================================
+ Coverage    80.7%   80.72%   +0.01%     
==========================================
  Files         202      202              
  Lines       32297    32288       -9     
==========================================
- Hits        26066    26064       -2     
+ Misses       6231     6224       -7
Impacted Files Coverage Δ
src/common/libsubprocess/local.c 76.61% <100%> (-0.04%) ⬇️
src/cmd/flux-module.c 83.72% <0%> (-0.24%) ⬇️
src/common/libflux/message.c 81% <0%> (+0.37%) ⬆️
src/common/libflux/mrpc.c 88.97% <0%> (+1.18%) ⬆️
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.