Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

broker: emit error when running interactive shell without tty #4253

Merged
merged 7 commits into from
Apr 1, 2022

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Mar 31, 2022

While going through some old issues I found one that seemed easy and valuable to fix

This PR has the broker check for a tty on stdin when invoking an interactive shell. If there is no tty the broker exits with an error:

$ flux mini run flux start                                                                              
lt-flux-broker: stdin is not a tty - can't run interactive shell               
flux-job: task(s) exited with exit code 1                                      

This prevents wasting time when you think an interactive job is taking a long time to start, but really you've forgotten -o pty/-o pty.interactive or the --pty option with srun.

There was a little test fallout from this one, and most of my time was actually spent trying to fix Python encoding errors in the runpty test script (as well as working around timing issues getting EOF to the shell in the one case that is required for testing)

Problem: Some tests need to run under a tty but still want stdin
closed after starting, e.g. if running an interactive shell.

If --input=none in runpty.py, then immediately send eof (^D or 0x04)
to the child, which effectively closes stdin. This is as a convenience
over doing the equivalent using an asciicast file as input with
embedded eof.

E.g.:

 runpty.py --input=none cat

will exit instead of hanging.
Problem: Python has issues with utf-8 sequences when LC_ALL=C is set
in the environment (which it is for the testsuite). Even when we're
very careful about encoding data with utf-8 and errors=replace, when
the data is written to stdout/err, Python will senselessly raise an
exception and abort.

Re-open stdout and stderr with encoding="utf-8" and
errors="surrogateescape" to work around this problem, even when the
current LANG is set to C.
Problem: The test 'default initial program in $SHELL' in
t0014-runlevel.t executes the interactive shell without a tty,
which may break in future flux-core versions that issue an error in
this situation.

Run the test under runpty.py to avoid the potential failure. Use the
runpty --input=none option instead of redirecting from /dev/null.

In case of a hang due to problems with sending EOF over the pty, also
execute the test under run_timeout(), adding SHELL=/bin/sh to the
--env option of this program.

Since the runtpy.py script must be able to load the flux Python module,
execute this script under flux(1) so that paths are set up correctly,
otherwise the Python modules may not be found.
Problem: A test in t2212-job-manager-plugins.t runs an interactive
shell.  This works even without a tty since the test expects a failure,
but soon the broker may exit with a different error in this situation,
breaking the test.

Add /bin/true to the arguments in the test so that the shell is no
longer interactive. This doesn't change the semantics of the test,
but will prevent future failures if the broker checks for a tty.
Problem: It is a common folly to run `flux start` via flux-mini run
or foreign RM launcher and forget to allocate a pty to the job. This
can waste time as a user waits for a shell prompt that will never
come.

Since it is almost guaranteed that running an interactive shell without
a pty is a mistake, catch this condition in the broker and emit a useful
error message and exit.

Now, `flux mini run flux start` without `-o pty` or `-o pty.interactive`
will result in the error:

 flux-broker: stdin is not a tty - can't run interactive shell
 flux-job: task(s) exited with exit code 1

Fixes flux-framework#2032
Problem: No tests check that launching an interactive shell without
a tty generates an error.

Add a test to t0014-runlevel.t that ensures launching an interactive
session without a tty results in an error.
Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Sorry the tests were a nuisance.

@grondo
Copy link
Contributor Author

grondo commented Apr 1, 2022

Thanks, and no problem!

@mergify mergify bot merged commit d9d451f into flux-framework:master Apr 1, 2022
@grondo grondo deleted the issue#2032 branch April 1, 2022 01:30
@vsoch
Copy link
Member

vsoch commented Aug 19, 2022

Can I run it without interactive? E.g., I'm trying to run flux in a container base on github actions and I'm seeing this error - I can't figure out a way to disable or, or to run flux (and test with snakemake) with needing to start flux.

@grondo
Copy link
Contributor Author

grondo commented Aug 19, 2022

@vsoch: Yes, as discussed offline, this check and error for an interactive tty only occurs when there are no arguments provided to flux start or flux broker and thus an interactive shell is run. This can be a problem if a user doesn't allocate a tty to the instance, then the command appears to hang, when really the shell is just not displaying a prompt.

You can run without a tty by providing any argument to flux start or flux broker, even flux start bash if you really want to start an interactive shell without a tty.

@vsoch
Copy link
Member

vsoch commented Aug 19, 2022

For future us, the workflow that worked wound up being:

      - name: Start Flux and Test Workflow
        run: |
          su fluxuser
          cd examples/flux
          which snakemake
          flux start snakemake --flux --jobs=1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants