Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rc.initial infinite loop (+ fix) #1240

Closed
speed47 opened this issue Oct 29, 2016 · 9 comments · Fixed by #1249
Closed

rc.initial infinite loop (+ fix) #1240

speed47 opened this issue Oct 29, 2016 · 9 comments · Fixed by #1249
Assignees
Labels
bug Production bug
Milestone

Comments

@speed47
Copy link
Contributor

speed47 commented Oct 29, 2016

I noticed that my up to date OPNsense was experiencing a constant loadavg of 8 after a few days of uptime without having any process in top eating cpu. The "last pid" indicator was however getting +10000 per /second/.

After some digging I found the culprit: a couple of lingering rc.initial processes (root's shell) from previous dead ssh connections were eating all cpu cycles.

Truss indicated that these processes were stuck in a loop trying to write to stdout, getting i/o errors, trying to read input from a closed stdin, and back again.

The fix is trivial: replace

while : ;

By

while test -t 1 ;

In the rc.initial script, which breaks the loop when stdout is closed (and we no longer have a tty).

How to reproduce:

  1. login with ssh to an unprivileged account
  2. get root rights using sudo su -
  3. you get the root shell (menu), type 8
  4. use ps to find the sshd process attached to your connection
  5. kill -9 its pid. You are now disconnected.
  6. check by reconnecting that there is an rc.initial process taking 100% cpu.
speed47 added a commit to speed47/core that referenced this issue Oct 29, 2016
if the tty is closed while the script is running,
it would previously go in an infinite loop trying
to read from and write to the now-defunct tty.

fixes: opnsense#1240
@fichtner fichtner self-assigned this Oct 29, 2016
@fichtner fichtner added the bug Production bug label Oct 29, 2016
@fichtner fichtner added this to the 17.1 milestone Oct 29, 2016
@fichtner
Copy link
Member

Hi @speed47,

Thanks for the analysis! I will have to test if this reliably exits under all circumstances.

Also, shouldn't we catch this inside the loop and exit 1 just to be safe?

Cheers,
Franco

@fichtner
Copy link
Member

maybe "set -e" works here too?

@speed47
Copy link
Contributor Author

speed47 commented Oct 29, 2016

I guess the shell should at least get a SIGHUP when it happens, so we could add a handler for that, didn't try it!
But set -e should theoretically work too indeed, even if for the future reader, the importance of its presence to fix this corner case might not appear obvious (unless it's commented of course!). However it could have adverse effects, as it's quite powerful/intrusive, so we should test thoroughly.
That's why I preferred to change juste the loop condition (which still needs to be tested with more cases too, you're right)

@fichtner
Copy link
Member

@speed47 how about protecting the command lauchers with "|| true"? these are the ones that could fail, but that's ok from the "shell" perspective.

In the far history of this code, the problem was that CTRL+C et al could allow escaping to a shell. In almost-far history the issue was we used this as a real shell so CTRL+C would log you out much to the annoyance of the user.

@speed47
Copy link
Contributor Author

speed47 commented Oct 29, 2016

|| true on all lines always sounds a bit like a "dirty" hack to me, but regarding CTRL+C and the history of the code, I understand why you feel set +e would be "safer".
I did'nt try to play around CTRL+C before/after the patch to see if the behaviour changed I admit!

@fichtner
Copy link
Member

set +e
$cmd
set -e

or

if $cmd; then :; fi

possibilities are endless, but I agree they all look weird :D

your suggestion to annotate with comments is probably in order whatever we do

@speed47
Copy link
Contributor Author

speed47 commented Oct 29, 2016

Yes :) Well || true is probably the less strange in the end! (and as it's a somewhat common hack, it should be more readable)

@speed47
Copy link
Contributor Author

speed47 commented Nov 3, 2016

Shall I propose a PR with what we've discussed, so we can start testing?

@fichtner
Copy link
Member

fichtner commented Nov 3, 2016

If you want please do, otherwise just let me know to take over. :)

speed47 added a commit to speed47/core that referenced this issue Nov 3, 2016
if the tty is closed while the script is running,
it would previously go in an infinite loop trying
to read from and write to the now-defunct tty.

fixes: opnsense#1240
speed47 added a commit to speed47/core that referenced this issue Nov 3, 2016
if the tty is closed while the script is running,
it would previously go in an infinite loop trying
to read from and write to the now-defunct tty.

fixes: opnsense#1240
fichtner pushed a commit that referenced this issue Nov 7, 2016
if the tty is closed while the script is running,
it would previously go in an infinite loop trying
to read from and write to the now-defunct tty.

PR: #1240

(cherry picked from commit a11850c)
(cherry picked from commit 8986792)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Production bug
Development

Successfully merging a pull request may close this issue.

2 participants