Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flux-exec: ensure stdin is restored to blocking mode on exit #1814

Merged
merged 21 commits into from Nov 8, 2018

Conversation

Projects
None yet
3 participants
@garlick
Copy link
Member

garlick commented Nov 8, 2018

This hopefully fixes #1803, a problem mainly seen with bash on Ubuntu based systems, where after running make check, sometimes the next commands run by the shell fail with EAGAIN. The culprit was narrowed down to flux-exec and its handling of the O_NONBLOCK flag on the sometimes inherited stdin file descriptor.

First, several sharness tests that use flux exec are modified so that if stdin is not required, the -n option is used.

Then buffer_read_watcher_create() no longer sets O_NONBLOCK on its file descriptor. Instead, the user must pass it in that way (or fail with EINVAL). Various, users such as in libsubprocess, flux exec, and unit tests, are then modified to set this flag before calling the function. Special care is taken in flux exec save and restore the original flags on stdin.

I'll open a bug to do the same for buffer_write_watcher_create() (no time for that now).

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 8, 2018

Actually it would probably be good to fix both functions in the same PR since they fix a similar issue (doesn't the same problem occur if shared stdout fd is set nonblocking and not reset).

I was also working on a fix for this, but I started by cleaning up all the various implementations of set_nonblocking throughout the code, so actually not much overlap. However, I want to make sure you aren't doing the same before I proceed wasting my time :(

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 8, 2018

Oops! I assigned #1803 to myself before starting but maybe I should have said something. You are more than welcome to go ahead and do whatever (push to this PR, have me withdraw it and cherry pick what you want for a new PR, ask me to finish it, ...)

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 8, 2018

Yeah my fault for not attempting the same (somehow I missed your self assignment, sorry!). I'll just continue my work on top of your PR and push it here (I only mentioned it in case you were already headed in the same direction)

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 8, 2018

Sounds good.

garlick and others added some commits Nov 7, 2018

testsuite: use flux exec with -n when stdin not used
Problem: since flux exec inherits stdin from the shell,
and it nees to put stdin into O_NONBLOCK mode to use
the buffer watchers, there is a risk that the shell's
stdin file descriptor may be left in that mode after
flux exec finishes.

As discussed in #1803, in all tests where flux exec is
used without the need for stdin, add the -n option to
minimize risk of this effect.
cmd/flux-exec: don't open /dev/null if -n
Instead of literally "redirecting from /dev/null" when
-n,--noinput is specified, simply close the read end
of the subprocess buffers immediately.  This avoids
the creating a buffer watcher for a file descriptor
that will immediately return EOF.
cmd/flux-exec: set stdin O_NONBLOCK flag and restore
Set the stdin file descriptor (if used) to O_NONBLOCK
mode, and arrange for it to be restored to its original
mode by an atexit(3) handler.

This prepares for the buffer_read_watcher implementation
to stop setting O_NONBLOCK internally per issue #1803.
libutil/fdutils: add utility functions for fds
Add utility functions for manipulating file descriptor flags,
most notably O_NONBLOCK and FD_CLOEXEC. There are a couple
implementations of these functions throughout the flux-core codebase,
and these functions are just meant as a convenience to collect common
and repeated code.
testsuite: add unit tests for libutil/fdutils
Add unit tests for the functions in libutil/fdutils.
testsuite: libflux: avoid use of fcntl(2) in reactor test
Switch to SOCK_NONBLOCK flag to socketpair instead of using
explicit call to fcntl(2) to set up nonblocking socketpair
for testing.
libflux/reactor: buffer read/write watchers require O_NONBLOCK
Problem: buffer_read/write_watcher_create() sets the O_NONBLOCK
on its file descriptor, but doesn't restore the original setting
when the buffer is destroyed.  When the fd is stdin or stdout and was
inherited from the shell, leaving it in O_NONBLOCK mode can cause
the next programs run by the shell to fail with EAGAIN.

Since the fd is not "owned" by the buffer_read_watcher, but
is required to be in O_NONBLOCK mode, simply check that the
mode is set on entry and leave management of its final state
to the calling program.

If the flag is not set, fail with EINVAL.

Fixes #1803.

Update unit tests.

Co-authored-by: Mark A. Grondona <mark.grondona@gmail.com>
libsubprocess: set O_NONBLOCK on fd
Set fd mode to O_NONBLOCK before calling
flux_buffer_read_watcher_create().
testsuite: set O_NONBLOCK on stdin/out in reactorcat
In test program, set O_NONBLOCK on stdin and stdout fds
prior to calling buffer_read/write_watcher_create() and
arrange for an atexit(3) handler to restore the original
flags on exit.

Co-authored-by: Mark A. Grondona <mark.grondona@gmail.com>
doc/flux-exec(1): reword -n description
Since we no longer literally redirect /dev/null
to stdin when -n is specified, rephrase the option
description so it fits what is actually happening now.
libutil/popen2: fix close-on-exec flag use
Problem: the popen2() call set close-on-exec flag on the parent
side of the socketpair was using F_SETFL and O_CLOEXEC instead of
F_SETFD and FD_CLOEXEC. Thus the intended result was not applied
to the file descriptor and the parent fd was most likely being
leaked into the child process.

Instead, use the new fd_set_cloexec() call, which is simpler, uses
the correct flags, and has a unit test to ensure the correct flag
is applied.
cmd/proxy: update to common fdutils code
Remove duplicated "set_nonblock" code in cmd/builtin/proxy.c in
favor of fd_set_nonblocking() from shared fdutils code.
subprocess/local: update to commonn fdutils code
Use fd_unset_cloexec() and fd_set_nonblocking() from fdutils instead
of calling fcntl(2) directly in libsubprocess/local.c.
libzio: update to common fdutils code
Update use of fcntl(2) to common fdutils code.
connectors/local: update to common libutils code
Update use of fcntl(2) to common fdutils functions.
connectors/ssh: update to common fdutils code
Update uses of fcntl(2) to common fdutils code.
modules/connector-local: update to common fdutils code
Update uses of fcntl(2) to common fdutils functions
libflux/reactor.c: update to common fdutils code
Update uses of fcntl(2) to common fdutils code from libutil.
testsuite: libflux/reactor: update buffer write watcher tests
Update buffered write watcher test to ensure creation returns
EINVAL if output fd isn't set nonblocking, and to set all
fds to nonblocking as necessary for the tests.

@grondo grondo force-pushed the garlick:stdin_nonblock branch from 5a84e64 to 008fdb5 Nov 8, 2018

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 8, 2018

Ok, @garlick, pushed on top of your PR here some related changes I was working on:

  • Various copies of fd_set_nonblock or set_nonbock and other uses of raw fcntl(2) interface were replaced with a new (very simple) libutil/fdutils.c shared convenience functions.

  • Went ahead and made the buffered write watcher match the changes for buffered read watcher you had implemented.

This branch has undergone creative rebase, so hopefully the commits make sense (and work!)

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Nov 8, 2018

Codecov Report

Merging #1814 into master will decrease coverage by <.01%.
The diff coverage is 82.81%.

@@            Coverage Diff             @@
##           master    #1814      +/-   ##
==========================================
- Coverage   79.69%   79.69%   -0.01%     
==========================================
  Files         187      188       +1     
  Lines       34577    34587      +10     
==========================================
+ Hits        27557    27564       +7     
- Misses       7020     7023       +3
Impacted Files Coverage Δ
src/common/libzio/zio.c 74.22% <ø> (-0.29%) ⬇️
src/modules/wreck/luastack.c 41.3% <ø> (ø) ⬆️
src/common/libutil/popen2.c 76% <100%> (-0.24%) ⬇️
src/common/libutil/fdutils.c 100% <100%> (ø)
src/connectors/local/local.c 89.28% <100%> (-0.08%) ⬇️
src/connectors/ssh/ssh.c 85.71% <100%> (-0.07%) ⬇️
src/modules/connector-local/local.c 73.77% <100%> (-0.17%) ⬇️
src/cmd/builtin/proxy.c 74.55% <50%> (-0.2%) ⬇️
src/common/libsubprocess/local.c 75.72% <50%> (-0.56%) ⬇️
src/common/libflux/reactor.c 91.17% <75%> (-0.23%) ⬇️
... and 3 more
@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 8, 2018

This turned into a really nice cleanup. Thanks for doing all that!

@garlick garlick merged commit 0e45e34 into flux-framework:master Nov 8, 2018

3 checks passed

codecov/patch 82.81% of diff hit (target 79.69%)
Details
codecov/project Absolute coverage decreased by -<.01% but relative coverage increased by +3.11% compared to e7a0be0
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@garlick garlick deleted the garlick:stdin_nonblock branch Nov 9, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.