Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bin/shtests io leaves processes running #213

Closed
posguy99 opened this issue Mar 10, 2021 · 52 comments · Fixed by #214
Closed

bin/shtests io leaves processes running #213

posguy99 opened this issue Mar 10, 2021 · 52 comments · Fixed by #214
Labels
bug Something is not working

Comments

@posguy99
Copy link

Version AJMv 93u+m/1.0.0-alpha+d4adc8fc 2021-03-09
[427] mint $ cd ksh
/home/mwilson/src/ksh
[428] mint $ ps ax | grep -i ksh | grep -v grep | wc -l
1
[429] mint $ bin/shtests io
#### Regression-testing /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh ####
test io begins at 2021-03-09+18:20:16
test io passed at 2021-03-09+18:20:17 [ 123 tests 0 errors ]
test io(C.UTF-8) begins at 2021-03-09+18:20:17
test io(C.UTF-8) passed at 2021-03-09+18:20:19 [ 123 tests 0 errors ]
test io(shcomp) begins at 2021-03-09+18:20:19
test io(shcomp) passed at 2021-03-09+18:20:20 [ 123 tests 0 errors ]
Total errors: 0
CPU time       user:      system:
main:      0m00.011s    0m00.000s
tests:     0m00.667s    0m00.351s
[430] mint $ ps ax | grep -i ksh | grep -v grep | wc -l
58

Huh? Why are all these processes being left? What are they?

[431] mint $ ps ax | grep -i ksh | grep -v grep
  78454 pts/0    Ss     0:00 ksh
  78724 pts/0    SN     0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh -ic echo >(true) >/dev/null
  78738 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78739 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78740 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78741 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78742 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78743 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78744 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78745 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78746 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78747 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78748 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78749 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78750 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78751 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78752 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78753 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78754 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78755 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78958 pts/0    SN     0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh -ic echo >(true) >/dev/null
  78972 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78973 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78974 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78975 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78976 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78977 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78978 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78979 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78980 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78981 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78982 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78983 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78984 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78985 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78986 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78987 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78988 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  78989 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /home/mwilson/src/ksh/src/cmd/ksh93/tests/io.sh
  79193 pts/0    SN     0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh -ic echo >(true) >/dev/null
  79207 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79208 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79209 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79210 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79211 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79212 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79213 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79214 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79215 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79216 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79217 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79218 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79219 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79220 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79221 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79222 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79223 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh
  79224 pts/0    S      0:00 /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.78477.18846/shcomp-io.ksh

This is happening on Linux. I have a Mint VM, as well as Ubuntu and CentOS. They all do it. They hang around forever.

For once, macOS does not do it. So far, anyway.

@McDutchie
Copy link

McDutchie commented Mar 10, 2021

I'm not having any luck reproducing this. Tried Slackware and CentOS, both x86_64, on VirtualBox VMs. Tried replacing the login shell with ksh. Tried compiling with and without vmalloc. No change, no hanging processes, everything seems normal.

@JohnoKing, @hyenias, any ideas?

@JohnoKing
Copy link

I haven't been able to reproduce the hanging bug. The closest I've gotten is by running ps with a short delay after shtests (although the processes still don't hang around forever):

$ bin/shtests io; sleep .01; ps ax | grep -i ksh | grep -v grep
#### Regression-testing /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh ####
test io begins at 2021-03-09+19:55:49
test io passed at 2021-03-09+19:55:50 [ 123 tests 0 errors ]
test io(C.UTF-8) begins at 2021-03-09+19:55:50
test io(C.UTF-8) passed at 2021-03-09+19:55:51 [ 123 tests 0 errors ]
test io(shcomp) begins at 2021-03-09+19:55:51
test io(shcomp) passed at 2021-03-09+19:55:53 [ 123 tests 0 errors ]
Total errors: 0
CPU time       user:      system:
main:      0m00.005s    0m00.000s
tests:     0m00.401s    0m00.119s
28698 pts/0    SN     0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh -ic echo >(true) >/dev/null
28712 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28713 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28714 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28715 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28716 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28717 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28718 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28719 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28720 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28721 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28722 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28723 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28724 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28725 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28726 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28727 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28728 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
28729 pts/0    S      0:00 /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.27982.32406/shcomp-io.ksh
$ ps ax | grep -i ksh | grep -v grep | wc -l
1
$ ps ax | grep -i ksh | grep -v grep        
29257 pts/0    Ss     0:00 arch/linux.i386-64/bin/ksh

@posguy99
Copy link
Author

I'm open to anything about how to test for this. If it's the OS rather than ksh, wtf about the OS could it be?

All three VM's have ksh u+m as their login shell, all use the same rc files. That shouldn't matter, but I put it out there as data.

These are Fusion VM's.

A fresh checkout of the repo does this.

It's additive, each time shtests io runs, more processes are added.

I'm all the way back to d89ef0f and haven't found it to stop happening. I had not been running u+m on Linux at all until very recently, I was using 93u+ 20120801 on CentOS. I only noticed it because the tty assigned to the shell was increasing when I killed the terminal window and launched a new one.

@posguy99
Copy link
Author

I haven't been able to reproduce the hanging bug. The closest I've gotten is by running ps with a short delay after shtests (although the processes still don't hang around forever):

The only ones that hang around for you are the shcomp tests, for me it's all three posix/utf8/shcomp.

I let it sit for ten minutes, they're still there.

Version AJMv 93u+m/1.0.0-alpha+d4adc8fc 2021-03-09
[ 8:14 PM][0 +1][~]
[515] mint $ cd ksh
/home/mwilson/src/ksh
[ 8:14 PM][0 +1][~/src/ksh][master@d4adc8f]
[516] mint $ bin/shtests io 
#### Regression-testing /home/mwilson/src/ksh/arch/linux.i386-64/bin/ksh ####
test io begins at 2021-03-09+20:14:43
test io passed at 2021-03-09+20:14:44 [ 123 tests 0 errors ]
test io(C.UTF-8) begins at 2021-03-09+20:14:44
test io(C.UTF-8) passed at 2021-03-09+20:14:46 [ 123 tests 0 errors ]
test io(shcomp) begins at 2021-03-09+20:14:46
test io(shcomp) passed at 2021-03-09+20:14:47 [ 123 tests 0 errors ]
Total errors: 0
CPU time       user:      system:
main:      0m00.004s    0m00.004s
tests:     0m00.645s    0m00.295s
[ 8:14 PM][0 +1][~/src/ksh][master@d4adc8f]
[517] mint $ ps ax | grep -i ksh | grep -v grep | wc -l
58
[ 8:14 PM][0 +1][~/src/ksh][master@d4adc8f]
[518] mint $ 
[ 8:24 PM][0 +1][~/src/ksh][master@d4adc8f]
[518] mint $ ps ax | grep -i ksh | grep -v grep | wc -l
58
[ 8:24 PM][0 +1][~/src/ksh][master@d4adc8f]
[519] mint $

@posguy99
Copy link
Author

posguy99 commented Mar 10, 2021

Determined something else.

If I ssh into the VM, and do the tests, the processes do not appear to hang. I get the same behavior @JohnoKing does.

But running the test in gnome-terminal (apparently on all three of them, it's something called gnome-terminal-server, I don't know what's actually presenting the UI, it calls itself gnome-terminal in About, but the -server process is all I see) leads to the io tests hanging. I have zero idea how Gnome does things these days.

If I run the test in a simple xterm, the presentation is the same as in #213 (comment) again. Momentary processes, then they're gone.

@posguy99
Copy link
Author

posguy99 commented Mar 10, 2021

(sigh) Of course, /usr/bin/gnome-terminal is a python script. Of course it is. If Gnome can find a way to make something more baroque and incomprehensible, they will certainly do so.

I wonder what problem they think they're fixing with this client/server terminal architecture.

I suppose yes, this issue is a problem. Is it a ksh problem? Perhaps not. A Gio problem? Likely, but Gnome won't care.

Interesting: https://eklitzke.org/gnome-terminal-server

@avih
Copy link

avih commented Mar 10, 2021

As far as I know urxvt and mlterm are unrelated to gnome or VTE, and both can/do also run as client/server, though I don't have much experience with them and I don't know under which conditions. Perhaps it's worth trying them too.

@JohnoKing
Copy link

I've managed to reproduce the bug in an Ubuntu 20.04 virtual machine using gnome-terminal, urxvt and xterm:

$ bin/shtests io
#### Regression-testing /home/johno/ksh/arch/linux.i386-64/bin/ksh ####
test io begins at 2021-03-09+22:31:37
test io passed at 2021-03-09+22:31:38 [ 123 tests 0 errors ]
test io(C.UTF-8) begins at 2021-03-09+22:31:38
test io(C.UTF-8) passed at 2021-03-09+22:31:40 [ 123 tests 0 errors ]
test io(shcomp) begins at 2021-03-09+22:31:40
test io(shcomp) passed at 2021-03-09+22:31:41 [ 123 tests 0 errors ]
Total errors: 0
CPU time       user:      system:
main:      0m00.013s    0m00.000s
tests:     0m00.689s    0m00.292s
$ ps -ax | grep ksh | grep -v grep | wc -l
58
$ ps -ax | grep ksh | grep -v grep        
   1307 pts/0    S      0:00 arch/linux.i386-64/bin/ksh
   1560 pts/0    SN     0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh -ic echo >(true) >/dev/null
   1574 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1575 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1576 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1577 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1578 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1579 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1580 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1581 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1582 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1583 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1584 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1585 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1586 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1587 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1588 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1589 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1590 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1591 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1794 pts/0    SN     0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh -ic echo >(true) >/dev/null
   1808 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1809 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1810 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1811 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1812 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1813 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1814 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1815 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1816 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1817 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1818 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1819 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1820 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1821 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1822 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1823 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1824 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   1825 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /home/johno/ksh/src/cmd/ksh93/tests/io.sh
   2029 pts/0    SN     0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh -ic echo >(true) >/dev/null
   2043 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2044 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2045 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2046 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2047 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2048 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2049 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2050 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2051 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2052 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2053 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2054 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2055 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2056 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2057 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2058 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2059 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh
   2060 pts/0    S      0:00 /home/johno/ksh/arch/linux.i386-64/bin/ksh /tmp/ksh93.shtests.1313.26109/shcomp-io.ksh

The bug does not occur in ksh93u+ but it can be reproduced in ksh93u+m. I couldn't get the bug to occur on bare metal.

@JohnoKing
Copy link

This bug was introduced in commit ab5dedd. I can't get it to occur in the latest commit if SHOPT_DEVFD is enabled by removing these two lines:

#undef SHOPT_DEVFD
#define SHOPT_DEVFD 0

Also of note is that the bug behaves differently in ab5dedd. bin/shtests io locks up entirely when run in an Ubuntu VM and needs to be stopped with SIGINT. Commit 970069a changed the complete lockup to the current behavior.

@JohnoKing
Copy link

Using a process substitution triggers the bug:

$ cat ./procsub.sh
func() {
    true
}
/bin/true <(true) >(true)
true <(true) >(true)
func <(true) >(true)
ps ax | grep ksh | grep -v grep | wc -l

$ arch/*/bin/ksh ./procsub.sh
8
$ arch/*/bin/ksh ./procsub.sh
14
$ arch/*/bin/ksh ./procsub.sh
20
$ arch/*/bin/ksh ./procsub.sh
26

@McDutchie
Copy link

McDutchie commented Mar 10, 2021

Great find. So, this is likely the same bug as #67, just manifesting differently. I thought not using SHOPT_DEVFD was an effective workaround, but apparently this only masks the bug. That's not good news. I've done many attempts to trace #67 and they've all gone nowhere.

@JohnoKing
Copy link

JohnoKing commented Mar 10, 2021

After running strace -f arch/*/bin/ksh bin/shtests io, I found the orphaned processes from the process substitutions are running in loops repeating the following syscalls:

[pid 64493] setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=0, tv_usec=499643}}, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=0, tv_usec=0}}) = 0
[pid 64493] getppid()                   = 658
[pid 64493] rt_sigreturn({mask=[]})     = -1 EINTR (Interrupted system call)
[pid 64493] openat(AT_FDCWD, "/tmp/ksh.f1cd32g", O_WRONLY <unfinished ...>
[pid 64494] <... openat resumed>)       = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 64494] --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
[pid 64494] alarm(0)                    = 0
[pid 64494] rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
[pid 64494] rt_sigaction(SIGALRM, {sa_handler=0x55796b9a663a, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fd09d0b0210}, {sa_handler=0x55796b9a663a, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fd09d0b0210}, 8) = 0

The repeated getppid shows the fifo_check function is being called repeatedly:

static void fifo_check(void *handle)
{
Shell_t *shp = (Shell_t*)handle;
pid_t pid = getppid();
if(pid==1)
{
unlink(shp->fifo);
sh_done(shp,0);
}
}

fifo_check should remove the FIFO when init (PID 1) is the parent process of the process substitution. Normally, this prevents infinite loops as once the parent ksh script closes, PID 1 will adopt the orphaned process (from the process substitution) and cause getppid to return 1. On the Ubuntu VM this isn't the case; the orphaned process is adopted by a systemd process with PID 658. Since the PPID is never 1, the orphaned process loops:

$ ps -eH | grep 'systemd\|ksh'   
      1 ?        00:00:00 systemd
    222 ?        00:00:00   systemd-journal
    244 ?        00:00:00   systemd-udevd
    485 ?        00:00:00   systemd-resolve
    486 ?        00:00:00   systemd-timesyn
    560 ?        00:00:00   systemd-logind
    658 ?        00:00:00   systemd
  69150 pts/1    00:00:00         ksh93
  69116 pts/1    00:00:00     ksh
  69117 pts/1    00:00:00     ksh
  69119 pts/1    00:00:00     ksh
  69120 pts/1    00:00:00     ksh
  69121 pts/1    00:00:00     ksh
  69122 pts/1    00:00:00     ksh
  69128 pts/1    00:00:00     ksh
  69129 pts/1    00:00:00     ksh
  69131 pts/1    00:00:00     ksh
  69132 pts/1    00:00:00     ksh
  69133 pts/1    00:00:00     ksh
  69134 pts/1    00:00:00     ksh

In my case, if I modify the following check to look for PID 658 the bug goes away in my Ubuntu VM:

if(pid==1)

This indicates fifo_check shouldn't be checking for PID 1, but rather should use a better method of determining when the parent script closes.

@JohnoKing
Copy link

JohnoKing commented Mar 10, 2021

I can't reproduce the bug using the patch below. @posguy99 does this diff fix the bug on your end? (edit: diff updated multiple times):

diff --git a/src/cmd/ksh93/include/defs.h b/src/cmd/ksh93/include/defs.h
index 41b1729..f1b4310 100644
--- a/src/cmd/ksh93/include/defs.h
+++ b/src/cmd/ksh93/include/defs.h
@@ -138,6 +138,7 @@ struct shared
 	pid_t		pid;		/* $$, the main shell's PID (invariable) */
 	pid_t		ppid;		/* $PPID, the main shell's parent's PID */
 	pid_t		current_pid;	/* ${.sh.pid}, PID of current ksh process (updates when subshell forks) */
+	pid_t		parent_pid;	/* PID of the parent shell or subshell (used in process substitutions) */
 	int		realsubshell;	/* ${.sh.subshell}, actual subshell level (including virtual and forked) */
 	unsigned char	sigruntime[2];
 	Namval_t	*bltin_nodes;
diff --git a/src/cmd/ksh93/sh/init.c b/src/cmd/ksh93/sh/init.c
index bc226e9..5d5c3d2 100644
--- a/src/cmd/ksh93/sh/init.c
+++ b/src/cmd/ksh93/sh/init.c
@@ -1238,7 +1238,7 @@ Shell_t *sh_init(register int argc,register char *argv[], Shinit_f userinit)
 		sh_regress_init(shp);
 #endif
 		shgd = sh_newof(0,struct shared,1,0);
-		shgd->current_pid = shgd->pid = getpid();
+		shgd->parent_pid = shgd->current_pid = shgd->pid = getpid();
 		shgd->ppid = getppid();
 		shgd->userid=getuid();
 		shgd->euserid=geteuid();
@@ -1634,7 +1634,7 @@ int sh_reinit(char *argv[])
 	shp->errtrap = 0;
 	shp->end_fn = 0;
 	/* update ${.sh.pid}, $$, $PPID */
-	shgd->current_pid = shgd->pid = getpid();
+	shgd->parent_pid = shgd->current_pid = shgd->pid = getpid();
 	shgd->ppid = getppid();
 	return(1);
 }
diff --git a/src/cmd/ksh93/sh/subshell.c b/src/cmd/ksh93/sh/subshell.c
index 28d1c40..ef59cbc 100644
--- a/src/cmd/ksh93/sh/subshell.c
+++ b/src/cmd/ksh93/sh/subshell.c
@@ -215,6 +215,7 @@ void sh_subfork(void)
 		sp->subpid=0;
 		shp->st.trapcom[0] = (comsub==2 ? NULL : trap);
 		shp->savesig = 0;
+		shgd->parent_pid = shgd->current_pid;
 		/* sh_fork() increases ${.sh.subshell} but we forked an existing virtual subshell, so undo */
 		shgd->realsubshell--;
 	}
diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index c116e34..a11d6d9 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -85,8 +85,7 @@ struct funenv
     static void fifo_check(void *handle)
     {
 	Shell_t	*shp = (Shell_t*)handle;
-	pid_t pid = getppid();
-	if(pid==1)
+	if(kill(shp->gd->parent_pid,0) < 0)
 	{
 		unlink(shp->fifo);
 		sh_done(shp,0);
@@ -1894,7 +1893,7 @@ int sh_exec(register const Shnode_t *t, int flags)
 					_sh_fork(shp,pid,0,0);
 				if(pid==0)
 				{
-					shgd->current_pid = getpid();
+					shgd->parent_pid = shgd->current_pid = getpid();
 					sh_exec(t->par.partre,flags);
 					shp->st.trapcom[0]=0;
 					sh_done(shp,0);
diff --git a/src/cmd/ksh93/tests/io.sh b/src/cmd/ksh93/tests/io.sh
index ace60fe..c900b4f 100755
--- a/src/cmd/ksh93/tests/io.sh
+++ b/src/cmd/ksh93/tests/io.sh
@@ -715,5 +715,18 @@ got=$(export tmp; "$SHELL" -ec \
 (redirect {v}>$tmp/v.out; echo ok2 >&$v) 2>/dev/null
 [[ -r $tmp/v.out && $(<$tmp/v.out) == ok2 ]] || err_exit 'redirect {varname}>file not working in a subshell'
 
+# ======
+# Test for process substitution infinite looping
+procsub_pid="$(
+	ulimit -t unlimited  # fork the subshell
+	true >(true)
+	echo "$!"
+)"
+sleep 2 # wait for process to close (long wait to avoid false failures)
+
+kill -0 $procsub_pid 2> /dev/null &&
+	kill -KILL $procsub_pid && # don't leave around what is effectively a zombie process
+	err_exit "process substitutions loop after parent shell finishes"
+
 # ======
 exit $((Errors<125?Errors:125))

@McDutchie
Copy link

McDutchie commented Mar 10, 2021

Thank you for tracing that. Glad it's not the same bug as #67 after all. Instead, it's systemd breaking UNIX. No big shocker there…

Note that shp->gd->pid (a.k.a. shgd->pid) is $$, the main shell's PID. The main shell is not necessarily the parent shell of the process substitutions, because subshells that have forked can also fork process substitutions. The check in the patch is not valid in that case. (edit: that comment was to a previous, edited-out version of the patch)

JohnoKing added a commit to JohnoKing/ksh that referenced this issue Mar 10, 2021
src/cmd/ksh93/sh/xec.c:
- An infinite loop may occur if an orphaned process from a process
  substitution isn't adopted by PID 1. Rather than close the FIFO
  when the parent process is init, check for the existence of the
  the parent shell or parent forked subshell by using kill(2).

src/cmd/ksh93/include/defs.h,
src/cmd/ksh93/sh/init.c,
src/cmd/ksh93/sh/subshell.c:
- Add a new 'parent_pid' variable to store the correct PPID. The
  PPID cannot be obtained after creating the process substitution
  (as that causes intermittent looping) and 'shgd->current_pid'
  can't be used as it's the process substitution's PID. As it's
  only used for FIFO process substitutions, hide it behind
  !SHOPT_DEVFD.

src/cmd/ksh93/tests/io.sh:
- Test for the infinite looping bug by using a forked
  command substitution.

Fixes: ksh93#213
@posguy99
Copy link
Author

posguy99 commented Mar 10, 2021

The check in the patch is not valid in that case.

Given this comment, is the patch under test the one in #213 (comment) , or some other patch? I want to test the right one. :)

Edit: tested the patch in #213 (comment), the issue does not occur on Mint.

@JohnoKing
Copy link

The original patch I posted used shp->gd->pid to avoid the bug. The one under #213 (comment) uses shgd->parent_pid instead, which is valid for forked subshells.

@posguy99
Copy link
Author

Update: tested the patch from #213 (comment) in all three Linux VMs.

Mint, Ubuntu, CentOS... the issue does not occur in any of them.

@posguy99
Copy link
Author

posguy99 commented Mar 10, 2021

Instead, it's systemd breaking UNIX. No big shocker there…

Come on @McDutchie , aren't you a believer in LennartOS?

@hyenias
Copy link

hyenias commented Mar 10, 2021

Well, I went public web surfing looking up what a process substitution was and found the following test describing an issue returning spaces before and after its output which messed up calls to tar due to the extra spaces around it. I regret that I cannot find where I read it from as my history and further attempts to find it have failed. The following test creates an orphaned process 100% of the time on multiple terminal clients tried whether bare metal or ssh'ed across multiple boxes: Ubuntu 20.04, macOS, and FreeBSD 12.2.

$ echo $KSH_VERSION
Version AJMv 93u+m/1.0.0-alpha+37c9ac27 2021-03-01
$ echo foo<(:)bar
foo /tmp/ksh.f060mj5 bar
$ jobs -l
[1] + 109506	 Running                 echo foo<(:)bar
$ kill 109506
$ jobs -l
$ exit
~ $ /bin/ksh
$ echo $KSH_VERSION
Version AJM 93u+ 2012-08-01
$ echo foo<(:)bar
foo /dev/fd/4 bar
$ jobs -l
$ 

I will now work on updating and testing current #214.

@posguy99
Copy link
Author

Confirmed, even with #213 (comment) applied, the orphan process is created.

@JohnoKing
Copy link

JohnoKing commented Mar 10, 2021

Confirmed as well. The fix I applied isn't correct as it causes wait in the main shell to freeze after a process substitution:

$ true <(true)
$ wait
# Freezes

The patch below takes a different approach. Rather than wait for the parent shell to close, maybe the check itself is unnecessary (edit: the check is necessary, see #213 (comment)):

diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index c116e341..19f68bd3 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -85,12 +85,8 @@ struct funenv
     static void fifo_check(void *handle)
     {
 	Shell_t	*shp = (Shell_t*)handle;
-	pid_t pid = getppid();
-	if(pid==1)
-	{
-		unlink(shp->fifo);
-		sh_done(shp,0);
-	}
+	unlink(shp->fifo);
+	sh_done(shp,0);
     }
 #endif /* !SHOPT_DEVFD */
 

This fixes the wait freeze, fixes the orphaned processes (at least from my own testing) and doesn't have regression test failures. Note that it doesn't fix the issue with spaces detailed in att#1487.

@McDutchie
Copy link

McDutchie commented Mar 10, 2021

I was coming to the same conclusion. I'm not a fan of adding the extra global flag for this specific issue, either.

There should be no need for any timer or check. Once the system has a file descriptor handle on a file, the file can be unlinked and it will remain open and working until it is closed. That is fundamental to all UNIX-like systems. The same is true for a FIFO, which is not even really a file -- it's just a pipe with a directory entry in the file system. Once open(2) has succeeded, the directory entry is superfluous and the FIFO can be unlinked.

I think the whole fifo_check() function is unnecessary. Perhaps it was once needed to work around bugs on long-dead systems. It should not be needed now, if it ever was.

Also, there was no check for whether sh_open (which calls open(2)) succeeded. That should be added.

Please try this patch. It doesn't break anything on my end. (edit: never mind, wait breaks.)

diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index c116e341..14d4f59e 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -81,19 +81,6 @@ struct funenv
 
 /* ========	command execution	========*/
 
-#if !SHOPT_DEVFD
-    static void fifo_check(void *handle)
-    {
-	Shell_t	*shp = (Shell_t*)handle;
-	pid_t pid = getppid();
-	if(pid==1)
-	{
-		unlink(shp->fifo);
-		sh_done(shp,0);
-	}
-    }
-#endif /* !SHOPT_DEVFD */
-
 #if _lib_getrusage
 /* getrusage tends to have higher precision */
 static void get_cpu_times(Shell_t *shp, struct timeval *tv_usr, struct timeval *tv_sys)
@@ -1704,16 +1691,20 @@ int sh_exec(register const Shnode_t *t, int flags)
 #if !SHOPT_DEVFD
 				if(shp->fifo && (type&(FPIN|FPOU)))
 				{
-					int	fn,fd = (type&FPIN)?0:1;
-					void	*fifo_timer=sh_timeradd(500,1,fifo_check,(void*)shp);
+					int	fn, fd, save_errno;
+					fd = (type & FPIN) ? 0 : 1;
 					fn = sh_open(shp->fifo,fd?O_WRONLY:O_RDONLY);
-					timerdel(fifo_timer);
-					sh_iorenumber(shp,fn,fd);
-					sh_close(fn);
-					sh_delay(.001,0);
+					save_errno = errno;
 					unlink(shp->fifo);
 					free(shp->fifo);
 					shp->fifo = 0;
+					if(fn<0)
+					{
+						errno = save_errno;
+						errormsg(SH_DICT,ERROR_SYSTEM|ERROR_PANIC,"process substitution: FIFO open failed");
+					}
+					sh_iorenumber(shp,fn,fd);
+					sh_close(fn);
 					type &= ~(FPIN|FPOU);
 				}
 #endif /* !SHOPT_DEVFD */

@McDutchie
Copy link

McDutchie commented Mar 10, 2021

By the way, not that it matters if we are getting rid of fifo_check(), but the systemd behaviour of reverting to a parent-PID other than 1 is actually allowed by POSIX, as it turns out. It is certainly not traditional UNIX behaviour though.

@JohnoKing
Copy link

That patch causes the regression tests added in #214 to fail:

$ bin/shtests io
#### Regression-testing /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh ####
test io begins at 2021-03-10+14:13:05
/home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/tests/io.sh: line 722: panic: process substitution: FIFO open failed [Interrupted system call]
	io.sh[729]: process substitutions loop infinitely after parent shell finishes
	io.sh[742]: process substitutions freeze parent shell after running 'wait'
test io failed at 2021-03-10+14:13:09 with exit code 2 [ 125 tests 2 errors ]
test io(C.UTF-8) begins at 2021-03-10+14:13:09
/home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/tests/io.sh: line 722: panic: process substitution: FIFO open failed [Interrupted system call]
	io.sh[729]: process substitutions loop infinitely after parent shell finishes
	io.sh[742]: process substitutions freeze parent shell after running 'wait'
test io(C.UTF-8) failed at 2021-03-10+14:13:14 with exit code 2 [ 125 tests 2 errors ]
test io(shcomp) begins at 2021-03-10+14:13:14
/tmp/ksh93.shtests.25134.15198/shcomp-io.ksh: line 722: panic: process substitution: FIFO open failed [Interrupted system call]
	shcomp-io.ksh[729]: process substitutions loop infinitely after parent shell finishes
	shcomp-io.ksh[742]: process substitutions freeze parent shell after running 'wait'
test io(shcomp) failed at 2021-03-10+14:13:18 with exit code 2 [ 125 tests 2 errors ]
Total errors: 6

@McDutchie
Copy link

Confirmed as well. The fix I applied isn't correct as it causes wait in the main shell to freeze after a process substitution:

$ true <(true)
$ wait
# Freezes

Hmm. It does that with my patch, as well. Darn.

@McDutchie
Copy link

The error check I added does suggest the reason for the freeze:

$ wait
^Carch/darwin.i386-64/bin/ksh: panic: process substitution: FIFO open failed [Interrupted system call]

@McDutchie
Copy link

I think @JohnoKing's patch is correct. Works for me. However, I'd also like to add the error check to sh_open().

JohnoKing added a commit to JohnoKing/ksh that referenced this issue Mar 10, 2021
@hyenias
Copy link

hyenias commented Mar 10, 2021

As you all have already confirmed, it works as far as I know as well. That's two fixes for the price of one--excellent. BTW, I used the following method to acquire an updated patch file to use with git apply just by appending .diff to the end of the PR#: https://github.com/ksh93/ksh/pull/214.diff.

@McDutchie McDutchie added the bug Something is not working label Mar 10, 2021
@McDutchie
Copy link

The error check I added does suggest the reason for the freeze:

$ wait
^Carch/darwin.i386-64/bin/ksh: panic: process substitution: FIFO open failed [Interrupted system call]

sh_open() blocks infinitely because the command true does not use a file name argument, so does not ever read from the FIFO. So that is what the timer is for. Makes sense now.

@McDutchie
Copy link

McDutchie commented Mar 10, 2021

I found a way to break the current PR. Reproducer:

$ fn() { sleep 1; cat "$1"; }
$ fn <(echo hi)
cat: /var/folders/nx/k3vcpmhj2mv05p9mhrwf8_9c0000gr/T/ksh.f2tqq6n: No such file or directory

(expected output: hi after one second)

Because the parent PID check was removed, the FIFO was deleted after 500ms (half a second), so the function cannot access it after sleeping for one second.

Back to the drawing board. Sorry.

@McDutchie
Copy link

McDutchie commented Mar 10, 2021

OK, stepping back and analysing the situation... the failed attempts have given us useful info.

With my more radical patch, that removes the whole timer altogether, the reproducer above works fine. Which is not surprising because sh_open() will block and wait forever for the command to open the FIFO for reading.

And waiting forever is normally what we want, if the command actually uses the FIFO file name argument and opens it. It is allowed to take an indeterminate amount of time to get around to doing that, though. And it may not ever do it at all, such as when passing it as an argument to true. This is why we need all of the original code, including the check every 500ms to see if the process that is supposed to open the FIFO at the other end still exists.

The problem is, fifo_check() is not POSIX compliant because, to determine if the parent process still exists, it checks if the parent process ID (PPID) has changed to 1, which is what Unix systems traditionally do. But POSIX says the PPID can change to any implementation-defined system process.

Instead, then, maybe it will work to:

  • restore all of the original code
  • get and save the original PPID before calling sh_open()
  • add that as a new argument to fifo_check, and make that function check if the current PPID has changed from that to whatever else, instead of 1.

I think this should make it work while avoiding a global flag.

Thoughts?

@McDutchie
Copy link

add that as a new argument to fifo_check, and make that function check if the current PPID has changed from that to whatever else, instead of 1.

Ah yes, of course we cannot add a new argument to fifo_check, because it is passed as a function pointer to sh_timer() which calls it with one argument. But we can use a global static variable in xec.c instead.

@JohnoKing
Copy link

JohnoKing commented Mar 11, 2021

get and save the original PPID before calling sh_open()

I tried that already and it doesn't work correctly. It reintroduces the original bug (#213 (comment)) if placed right before sh_open, but makes the bug occur with less. It also doesn't fix #213 (comment):

$ bin/shtests io
#### Regression-testing /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh ####
test io begins at 2021-03-10+16:04:22
	io.sh[743]: process substitutions freeze parent shell after running 'wait'
test io failed at 2021-03-10+16:04:25 with exit code 1 [ 125 tests 1 error ]
test io(C.UTF-8) begins at 2021-03-10+16:04:25
	io.sh[743]: process substitutions freeze parent shell after running 'wait'
test io(C.UTF-8) failed at 2021-03-10+16:04:28 with exit code 1 [ 125 tests 1 error ]
test io(shcomp) begins at 2021-03-10+16:04:28
	shcomp-io.ksh[743]: process substitutions freeze parent shell after running 'wait'
test io(shcomp) failed at 2021-03-10+16:04:32 with exit code 1 [ 125 tests 1 error ]
Total errors: 3

@McDutchie
Copy link

McDutchie commented Mar 11, 2021

I tried that already and it doesn't work correctly. It reintroduces the original bug (#213 (comment)) if placed right before sh_open, but makes the bug occur with less. It also doesn't fix #213 (comment):

Confirmed.

Turns out this is not actually a new problem. It was not introduced by your original patch but was there ever since ksh 93u+m switched to the FIFO method in ab5dedd. So it's an original bug in that method. Of course it should still be fixed somehow, but at this point I don't know how to go about that.

@JohnoKing
Copy link

JohnoKing commented Mar 11, 2021

In the master branch FIFOs stay open after being opened by a process substitution. Reading from the FIFO closes it:

$ echo <(true)
/tmp/ksh.f2ikp4r
$ [ -e /tmp/ksh.f2ikp4r ]; echo $?
0
$ cat /tmp/ksh.f2ikp4r
$ [ -e /tmp/ksh.f2ikp4r ]; echo $?
1

Perhaps this bug could be fixed by removing the FIFO when it's no longer needed (i.e., when the command finishes running). fifo_check could then be modified to check for the existence of the FIFO and stop the process substitution when it's removed.

@JohnoKing
Copy link

get and save the original PPID before calling sh_open()

I'll note this is can be done without reintroducing #213 (comment) (although it can't fix the wait hangup). To avoid intermittent test failures the correct PPID can be set before the process substitution is forked. This is what I did in my patch that uses a shgd->parentpid variable.

@McDutchie
Copy link

I'll note this is can be done without reintroducing #213 (comment) (although it can't fix the wait hangup). To avoid intermittent test failures the correct PPID can be set before the process substitution is forked.

That would be progress, at least.

I've pushed a commit to the PR that I believe implements this in a more specific manner. The io.sh[743] wait hang test still fails as expected (i.e. the original 93u+ bug in the FIFO method) but everything else seems to work. Can you confirm?

@JohnoKing
Copy link

Everything is working correctly in the new commit (besides the wait regression test).

JohnoKing added a commit to JohnoKing/ksh that referenced this issue Mar 11, 2021
@McDutchie
Copy link

McDutchie commented Mar 11, 2021

When bash uses the FIFO method and the command doesn't use the filename argument passed by the command substitution, it hangs, too. This is on AIX, which does not have /dev/fd so bash falls back to using FIFOs:

0[aix7]home/m/md $ uname -a
AIX aix7 1 7 000ACFDE4C00
0[aix7]home/m/md $ bash
bash-5.0$ true <(true)
bash-5.0$ wait                       <== it hangs here
^C
bash: cannot open named pipe /tmp//sh-np.XZagya for writing: A system call received an interrupt.

@McDutchie
Copy link

I suspect the remaining regression is either not feasible to fix or not worth bothering with, as it is inherent to using FIFOs. If the parent process never uses the FIFO and never terminates either, then the child is going to block on opening the FIFO forever -- that is just how FIFOs work. Trying to get around that might get very hairy, particularly as you can pass any number of process substitutions to one command.

I'm going to sleep on it and come back with my thoughts tomorrow. The original bug is now fixed, so tomorrow I might decide to merge the PR minus the failing test.

@hlangeveld
Copy link

By the way, not that it matters if we are getting rid of fifo_check(), but the systemd behaviour of reverting to a parent-PID other than 1 is actually allowed by POSIX, as it turns out. It is certainly not traditional UNIX behaviour though.

I remember similar issues with the Solaris SMF which allows for different 'restarter' processes that take over part of the old job of init.

@McDutchie
Copy link

I may have been too pessimistic in my message yesterday. It was in fact rather hairy, but I think I've got something working that properly stops unused process substitutions from lingering around. I've added another commit to the PR. See the commit message for the explanation of the new code. Please test, everyone.

@McDutchie
Copy link

@hyenias wrote:

Well, I went public web surfing looking up what a process substitution was and found the following test describing an issue returning spaces before and after its output which messed up calls to tar due to the extra spaces around it. […]

With the latest commit added to PR #214, the reproducer in your comment no longer leaves a lingering process. Can you confirm?

As for the added spaces, that is indeed a difference from bash and zsh, but I'm not sure if it should be called a bug or not. Process substitutions are replaced by file names and it generally doesn't make much sense to concatenate those with other strings. OTOH I could also see an argument that if there isn't a space, ksh has no business inserting one.

@hyenias
Copy link

hyenias commented Mar 11, 2021

@JohnoKing found that reproducer I used at att#1487. I thought its case made sense to me but I have zero experience with process substitution. I go test now.

@McDutchie
Copy link

McDutchie commented Mar 11, 2021

@JohnoKing found that reproducer I used at att#1487. I thought its case made sense to me but I have zero experience with process substitution. I go test now.

It does make sense actually. What happens is not exactly that a space is inserted but that <(...) and >(...) always expand to separate arguments, even if not surrounded by whitespace. And that is incompatible with GNU-style long options that append the option-argument to the option string itself. However, that problem is for a different issue -- this one is long and complicated enough as it is.

edit: Now filed as #215.

@hyenias
Copy link

hyenias commented Mar 11, 2021

No trouble with any of my attempts including using the echo foo<(:)bar. I was attempting to somehow exceed the imposed shorten 50ms timer when running commands remotely from another machine via WiFi or using a local nfs share. Nothing failed for me on my attempts but I do not have a remote network to test with.

I was wondering what could make that 50ms timer to be exceeded during the process substitution launch before it uses the FIFO. Perhaps network latency and/or remote program startup? Maybe launching a larger program than just a little /bin/* command perhaps a python3 script or a java app just might take longer than 50ms to respond. If so, perhaps if that hard coded 50ms could be tunable via a variable would allow some more latency/startup cushion in a particular case.

@McDutchie
Copy link

The timer is not a timeout, it's the interval for the check if the child process forked by the process substitution should still be waiting for the parent to open the pipe. It does that by checking if the parent process still exists. That check was done every 500ms, it's now done every 50ms. I shortened the interval because computers are 1000s of times faster than when this was written and still won't even register the overhead on the radar, and it leaves any hanging process open for that much shorter.

@hyenias
Copy link

hyenias commented Mar 11, 2021

Thank you for the clarification. And with that, everything seems good to me.

@JohnoKing
Copy link

I've tested the latest commit and everything is working on my end. The wait builtin no longer causes a hangup after true <(true) and processes aren't lingering after bin/shtests io.

@posguy99
Copy link
Author

posguy99 commented Mar 12, 2021

I've tested here on Catalina, which doesn't break anything, and on Mint, which is no longer broken when running shtests io.

[ 5:59 PM][0 +1][~/src/ksh-213][master@d4adc8f]
[601] mint $ ps ax | grep ksh | grep -i grep | wc -l
1
[ 5:59 PM][0 +1][~/src/ksh-213][master@d4adc8f]
[602] mint $ bin/shtests io
#### Regression-testing /home/mwilson/src/ksh-213/arch/linux.i386-64/bin/ksh ####
test io begins at 2021-03-11+17:59:50
test io passed at 2021-03-11+17:59:52 [ 126 tests 0 errors ]
test io(C.UTF-8) begins at 2021-03-11+17:59:52
test io(C.UTF-8) passed at 2021-03-11+17:59:54 [ 126 tests 0 errors ]
test io(shcomp) begins at 2021-03-11+17:59:54
test io(shcomp) passed at 2021-03-11+17:59:56 [ 126 tests 0 errors ]
Total errors: 0
CPU time       user:      system:
main:      0m00.005s    0m00.005s
tests:     0m00.613s    0m00.284s
[ 5:59 PM][0 +1][~/src/ksh-213][master@d4adc8f]
[603] mint $ ps ax | grep ksh | grep -i grep | wc -l
1
[ 6:00 PM][0 +1][~/src/ksh-213][master@d4adc8f]
[604] mint $ 

The test from @hyenas also works now:

[611] mint $ arch/linux.i386-64/bin/ksh                                                                                
Version AJMv 93u+m/1.0.0-alpha+dev 2021-03-11
[ 6:09 PM][0 +2][~/src/ksh-213][master@d4adc8f]
[612] mint $ echo foo<(:)bar                                                     
foo /tmp/ksh.f0ac98h bar
[ 6:09 PM][0 +2][~/src/ksh-213][master@d4adc8f]
[613] mint $ jobs -l
[ 6:10 PM][0 +2][~/src/ksh-213][master@d4adc8f]
[614] mint $ 

Edit: Yes, the first time I posted, I thought the echo reproducer failed. It helps to run the right ksh binary. :(

@posguy99
Copy link
Author

The wait freeze no longer occurs as well.

[625] mint $ arch/linux.i386-64/bin/ksh                                                                                
Version AJMv 93u+m/1.0.0-alpha+dev 2021-03-11
[ 6:18 PM][0 +2][~/src/ksh-213][master@d4adc8f]
[626] mint $ true <(true)
[ 6:18 PM][0 +2][~/src/ksh-213][master@d4adc8f]
[627] mint $ wait
[ 6:18 PM][0 +2][~/src/ksh-213][master@d4adc8f]
[628] mint $ 

@McDutchie
Copy link

Sounds great, thanks for testing everyone. I'll merge the pull request now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants