Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Job control in scripts is completely broken #119

Closed
McDutchie opened this issue Aug 26, 2020 · 2 comments
Closed

Job control in scripts is completely broken #119

McDutchie opened this issue Aug 26, 2020 · 2 comments
Labels
bug Something is not working

Comments

@McDutchie
Copy link

Very basic job control usage in scripts is broken in scripts on ksh 93u+m, ksh 93u+ 2012-08-01, as well as ksh2020, so also likely to be broken in the 93v- beta. It is also broken on pre-2012 ksh93 versions. It looks like this has simply always been broken in ksh93.

Reproducer (run as script):

set -m  # enable job control
(sleep 1; echo end) &
jobs    # this should show the bg job, but shows "<command unknown>"
fg %1   # this fails on ksh93, so 'sleep' stays in background
kill "$!" 2>/dev/null && echo BUG || echo ok

Output on ksh93:

[1] +  Running                 <command unknown>
BUG

Expected output (as on every other shell):

[1] + Running              ( sleep 1 ; echo end ) 
( sleep 1 ; echo end ) 
end
ok

The reproducer script does work on ksh93 if it is run like ksh -i test.sh. So the fix likely involves locating and removing some incorrect check(s) for an interactive shell.

@McDutchie McDutchie added bug Something is not working blocker This had better be fixed before releasing labels Aug 26, 2020
McDutchie added a commit that referenced this issue Sep 18, 2020
This patch from Red Hat fixes the following:

1. ksh was ignoring the -m (-o monitor) option when specified on
   the invocation command line.

2. Scripts did not properly terminate their background processes
   on Ctrl+C if the -m option was turned off. Reproducer:
	xterm &
	read junk
   When run as a script without turning on -m, pressing Ctrl+C
   should terminate the xterm, and now does.

3. Scripts no longer attempt to set the terminal foreground process
   group ID, as only interactive shells should be doing that.

This makes some progress on #119
but we're a long way from fixing all of that.

src/cmd/ksh93/sh/main.c: exfile():
- On non-interactive shells, do not turn off the monitor option.
  Instead, if it was turned on, turn on the SH_MONITOR state flag.

src/cmd/ksh93/edit/edit.c: ed_getchar():
- On Ctrl+C, issue SIGINT to the current process group using
  killpg(2) instead of going via sh_fault(), which handles a
  signal only for the current shell process.

src/cmd/ksh93/sh/jobs.c: job_reap(), job_reset(),
src/cmd/ksh93/sh/xec.c: sh_exec():
- Only attempt to set the terminal foreground process group ID
  using tcsetpgrp(3) if the shell is interactive.

Original patch: https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20120801-kshmfix.patch
This was applied to Red Hat's ksh 93u+ on 8 July 2013.
@McDutchie McDutchie removed the blocker This had better be fixed before releasing label Jan 14, 2021
McDutchie added a commit that referenced this issue Feb 11, 2021
Another longstanding whopper of a bug in basic ksh93 functionality:
run a ${ shared-state; } command substitution twice and job control
promptly loses track of all your running jobs. New jobs are tracked
again until you run another two shared-state command substitutions.
This is in at least 93t+, 93u-, 93u+, 93v- and ksh2020.

$ sleep 300 &
[1]	56883
$ jobs						# OK
[1] +  Running                 sleep 300 &
$ v=${ echo hi1; }
$ jobs						# OK
[1] +  Running                 sleep 300 &
$ v=${ echo hi2; }
$ jobs						# Nothing!
$ fg
ksh: fg: no such job

src/cmd/ksh93/sh/subshell.c: sh_subshell():
- The current environment number shp->curenv (a.k.a. sh.curenv) was
  not being restored if the virtual subshell we're leaving is of
  the shared-state command substitution variety as it was wrongly
  considered to be part of the environment that didn't need
  restoring. This caused it to be out of sync with shp->jobenv
  (a.k.a. sh.jobenv) which did get restored from savedcurenv.
  Restore both from savedcurenv at the same time for any subshell.
  (How these numbers are used exactly remains to be discovered.)

src/cmd/ksh93/tests/jobs.sh:
- Added, with a test for this bug to start it off. There is no
  other test script where job control fits, and a lot more related
  fixes are anticipated: #119
@McDutchie
Copy link
Author

McDutchie commented Feb 11, 2021

The <command unknown> issue is because the current code takes the job's command from the history list -- and in scripts of course there is no command history. So that's quite broken.

It also takes the entire command line as the job's command. This assumes two things:

  1. that at most one background job is launched from any particular command line;
  2. that the entire command line is dedicated to that background job.

Both assumptions are bogus, as the following shows:

$ echo test; sleep 20 & sleep 21 &
test
[1]	27153
[2]	27154
$ jobs
[2] +  Running                 echo test; sleep 20 & sleep 21 &
[1] -  Running                 echo test; sleep 20 & sleep 21 &

In the jobs output, you cannot tell which job is which and the echo test doesn't belong there at all.

The fix would be not to use the history list for this purpose. Presumably the parsed command tree should be available somehow, so this could be a new use for sh_deparse() which is currently only used in (disused and probably broken) code for systems that don't support fork(2).

McDutchie added a commit that referenced this issue Feb 12, 2021
If I haven't missed anything, this should make the non-interactive
aspects of job control in scripts work as expected, except for the
"<command unknown>" issue in the output of 'bg', 'fg' and 'jobs'
(which is not such a high priority as those commands are really
designed for interactive use).

Plus, I believe I now finally understand what these three are for:
* The job.jobcontrol variable is set to nonzero by job_init() in
  jobs.c if, and only if, the shell is interactive *and* managed to
  get control of the terminal. Therefore, any changing of terminal
  settings (tcsetpgrp(3), tty_set()) should only be done if
  job.jobcontrol is nonzero. This commit changes several checks for
  sh_isoption(SH_INTERACTIVE) to checks for job.jobcontrol for
  better consistency with this.
* The state flag, sh_isstate(SH_MONITOR), determines whether the
  bits of job control that are relevant for both scripts and
  interactive shells are active, which is mostly making sure that a
  background job gets its own process group (setpgid(3)).
* The shell option, sh_isoption(SH_MONITOR), is just that. When the
  user turns it on or off, the state flag is synched with it. It
  should usually not be directly checked for, as the state may be
  temporarily turned off without turning off the option.

Prior discussion:
https://www.mail-archive.com/austin-group-l@opengroup.org/msg06456.html

src/cmd/ksh93/bltins/typeset.c, src/cmd/ksh93/sh/args.c:
- Move synching the SH_MONITOR state flag with the SH_MONITOR
  shell option from b_set() (the 'set' builtin) to sh_applyopts()
  which is indirectly called from b_set() and is also used when
  parsing the shell invocation command line. This ensures -m is
  properly enabled in both scenarios.

src/cmd/ksh93/sh/jobs.c:
- job_init(): Do not refuse to initialise job control on
  non-interactive shells. Instead, skip everything that should only
  be done on interactive shells (i.e., everything to do with the
  terminal). This function is now even more of a mess than it was
  before, so refactoring may be desirabe at some point.
- job_close(), job_set(), job_reset(), job_wait(): Do not reset the
  terminal process group (tcsetpgrp()) if job.jobcontrol isn't on.

src/cmd/ksh93/sh/xec.c:
- sh_exec(): TFORK: For SIGINT handling, check the SH_MONITOR
  state flag, not the shell option.
- sh_exec(): TFORK: Do not turn off the SH_MONITOR state flag in
  forked children. The non-interactive part of job control should
  stay active. Instead, turn off the SH_INTERACTIVE state flag so
  we don't get interactive shell behaviour (i.e. job control noise
  on the terminal) in forked subshells.
- _sh_fork(), sh_ntfork(): Do not reset the terminal process group
  (tcsetpgrp()) if job.jobcontrol isn't on. Do not turn off the
  SH_MONITOR state flag in forked children.

src/cmd/ksh93/sh/subshell.c: sh_subfork():
- Do not turn off the monitor option and state in forked subshells.
  The non-interactive part of job control should stay active.

src/cmd/ksh93/bltins/misc.c: b_bg():
- Check isstate(SH_MONITOR) instead of sh_isoption(SH_MONITOR) &&
  job.jobcontrol before throwing a 'no job control' error.
  This fixes a minor bug: fg, bg and disown could quietly fail.

src/cmd/ksh93/tests/jobs.sh:
- Add tests for 'fg' with job control IDs (%%, %1) in scripts.
- Add test checking that a background job launched from a subsell
  with job control enabled correctly becomes the leader of its own
  process group.

Makes progress on: #119
@McDutchie
Copy link
Author

As of 41ebb55, job control is not completely broken. As far as I can tell, everything now functions correctly on scripts and interactive shells. The only remaining issue is that the job's command for list purposes is taken from the history list, causing <command unknown> in scripts and inaccurate job lists on interactive shells. That's really more of a cosmetic bug, though. It would be nice to fix it but it's really for a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working
Projects
None yet
Development

No branches or pull requests

1 participant