Serial ttyS0 respawns continuously on a VM without a serial console #1644

Closed
gdha opened this issue Dec 12, 2017 · 9 comments
Labels
blocker The next ReaR version is not released unless that issue is solved. bug The code does not do what it is meant to do fixed / solved / done

@gdha
Member

gdha commented Dec 12, 2017

  • rear version (/usr/sbin/rear -V): 2.3pre (frozen)
  • OS version (cat /etc/rear/os.conf or lsb_release -a): ubuntu 16 / fedora26
  • Are you using legacy BIOS or UEFI boot? BIOS
  • Brief description of the issue: after applying PR #1615 ("Solved issue #878 (Can't see login prompt on serial console)") we see problems with serial ttyS0 on Ubuntu 16 and on Fedora 26

Fedora 26, with a recovery VM that has a serial console (libvirt), showed the following issue (a timeout after a couple of minutes), but afterwards everything seemed normal:
[screenshot: screenshot_rear-recover-kvm_2017-12-12_16 02 53]

Fedora 26 without a serial console has the following problem:

Dec 12 15:23:37 fedora agetty[525]: /dev/ttyS0: not a tty
Dec 12 15:23:40 fedora kernel: random: crng init done
Dec 12 15:23:47 fedora systemd[1]: Received SIGCHLD from PID 525 (agetty).
Dec 12 15:23:47 fedora systemd[1]: Child 525 (agetty) died (code=exited, status=1/FAILURE)
Dec 12 15:23:47 fedora systemd[1]: serial-getty@ttyS0.service: Child 525 belongs to serial-getty@ttyS0.service
Dec 12 15:23:47 fedora systemd[1]: serial-getty@ttyS0.service: Main process exited, code=exited, status=1/FAILURE
Dec 12 15:23:47 fedora systemd[1]: serial-getty@ttyS0.service: Changed running -> dead
Dec 12 15:23:47 fedora systemd[1]: serial-getty@ttyS0.service: Changed dead -> auto-restart
Dec 12 15:23:47 fedora systemd[1]: serial-getty@ttyS0.service: Service has no hold-off time, scheduling restart.
Dec 12 15:23:47 fedora systemd[1]: serial-getty@ttyS0.service: Trying to enqueue job serial-getty@ttyS0.service/restart/fail
Dec 12 15:23:47 fedora systemd[1]: serial-getty@ttyS0.service: Installed new job serial-getty@ttyS0.service/restart as 39
Dec 12 15:23:47 fedora systemd[1]: serial-getty@ttyS0.service: Enqueued job serial-getty@ttyS0.service/restart as 39
Dec 12 15:23:47 fedora systemd[528]: serial-getty@ttyS0.service: Executing: /sbin/agetty -s ttyS0 115200,38400,9600
Dec 12 15:23:47 fedora agetty[528]: /dev/ttyS0: not a tty
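
To put a number on the respawn loop, the failures can be counted from a saved journal. The following is only a minimal sketch: `count_not_a_tty` is a hypothetical helper name, and it assumes the message format shown in the excerpt above.

```shell
#!/bin/sh
# Count agetty "not a tty" failures per device in journal text read from stdin.
# Assumes the line layout of the excerpt above, for example:
#   Dec 12 15:23:37 fedora agetty[525]: /dev/ttyS0: not a tty
count_not_a_tty() {
    awk '/agetty\[[0-9]+\]: .*: not a tty$/ {
             dev = $(NF - 3)          # field holding "/dev/ttyS0:"
             sub(/:$/, "", dev)       # strip the trailing colon
             n[dev]++
         }
         END { for (d in n) print d, n[d] }'
}
```

It could be fed from the journal, for example with `journalctl -b -u serial-getty@ttyS0.service | count_not_a_tty`.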

Ubuntu 16.04: the serial getty on ttyS0 is respawned continuously. The test VM (VirtualBox) has no serial port ttyS0 defined. We see the same messages as on Fedora 26.
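
A quick way to check from a shell whether a device node behaves like a terminal at all might look like the sketch below. `is_usable_tty` is a hypothetical helper, and it assumes an `stty` that supports `-F` (GNU coreutils or busybox). It also assumes that the "not a tty" message comes from the device not answering terminal ioctls, which is not confirmed in this thread.

```shell
#!/bin/sh
# Return 0 only if the path is a character device that answers terminal ioctls.
# Assumption: stty supports -F (GNU coreutils, busybox).
is_usable_tty() {
    [ -c "$1" ] && stty -F "$1" >/dev/null 2>&1
}
```

Usage could be as simple as `is_usable_tty /dev/ttyS0 || echo "ttyS0 is not usable"`.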

This is related to issue #878 and PR #1615. @didacog is aware of the issue and is looking into it.

@gdha gdha added the bug The code does not do what it is meant to do label Dec 12, 2017
@gdha gdha added this to the ReaR v2.3 milestone Dec 12, 2017
@gdha gdha added the blocker The next ReaR version is not released unless that issue is solved. label Dec 12, 2017
@gdha gdha changed the title Serial ttyS0 respawns contuously on a VM with a serial console Serial ttyS0 respawns contuously on a VM without a serial console Dec 12, 2017
@gdha
Member Author

gdha commented Dec 13, 2017

The thread at https://serverfault.com/questions/736624/systemd-service-automatic-restart-after-startlimitinterval contains some useful info that could improve the situation, IMHO.

@jsmeix jsmeix changed the title Serial ttyS0 respawns contuously on a VM without a serial console Serial ttyS0 respawns continuously on a VM without a serial console Dec 13, 2017
@didacog
Contributor

didacog commented Dec 13, 2017

@gdha

I've already tested with StartLimitBurst, but it did not work in my test. I'll put more effort into it to see whether it can work.
Nevertheless, I tested with Restart=no and it worked fine: with a serial port present the getty started OK, and without a serial port there were no errors in the journal and the service was not present in the systemctl status output (Ubuntu 16.04 in VirtualBox).
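
For anyone who wants to repeat the Restart=no experiment without editing the unit file itself, a drop-in is one option. This is just a sketch: `write_no_restart_dropin`, its `unit_dir` parameter and the file name `no-restart.conf` are made up for illustration, and it is not necessarily how the test above was done.

```shell
#!/bin/sh
# Write a drop-in that disables automatic restarts for serial-getty@.service.
# unit_dir is a hypothetical parameter; on a running system the usual place
# for administrator overrides is /etc/systemd/system.
write_no_restart_dropin() {
    unit_dir=$1
    dropin_dir="$unit_dir/serial-getty@.service.d"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nRestart=no\n' > "$dropin_dir/no-restart.conf"
}
```

On a live system a `systemctl daemon-reload` would be needed afterwards so that systemd picks up the drop-in.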

Tomorrow I will test it on real HW with RHEL7.2

On the other hand, I've noticed that during the recovery boot process the serial console shows no output in the last steps. With Before=getty.service configured, you get the serial console login prompt before the whole process ends. With After=getty.service instead, if you are connected only to the serial console, the wait is long and the system seems to have hung.

Which option seems better in your opinion?
IMHO I prefer getting a fast login prompt. Maybe it is possible to add a warning to check that the boot has finished before running rear recover or, better, if possible, to mirror the output from serial and console :P.

Regards,

@didacog
Contributor

didacog commented Dec 13, 2017

Hi again, I found a cleaner solution than Restart=no.

/usr/share/rear/skel/default/usr/lib/systemd/system/serial-getty@.service

#  This file is part of systemd.
#
[Unit]
Description=Serial Getty on %I
BindTo=dev-%i.device
After=dev-%i.device

# We must wait for the ReaR boot script to finish.
# This prevents the serial login prompt from
# being ready before the whole recovery boot process is done.
After=getty.target


[Service]
Environment=TERM=vt100
ExecStart=-/sbin/agetty -s %I 115200,38400,9600
Restart=on-failure
RestartSec=0
StartLimitAction=none
StartLimitBurst=3
StartLimitInterval=60
UtmpIdentifier=%I
KillMode=process


# Some login implementations ignore SIGTERM, so we send SIGHUP
# instead, to ensure that login terminates cleanly.
KillSignal=SIGHUP

[Install]
WantedBy=getty.target

The key is the Restart= setting, which is now on-failure (see: https://www.freedesktop.org/software/systemd/man/systemd.service.html); in case of failure it will use the StartLimit* settings (it finally worked after some attempts :P). This prevents endless restarts of the serial getty when no serial port is present, and also avoids weird messages in the console like "... Failed to start getty on ttyS0 ..." when the StartLimit* limit is reached and no more auto-restart attempts are made.
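
To illustrate the effect of StartLimitBurst=3 and StartLimitInterval=60, here is a rough model in shell. It is a simplification, not systemd's actual accounting: `simulate_start_limit` is a hypothetical name, and it assumes a fixed window that opens at the first counted start attempt.

```shell
#!/bin/sh
# Simplified model of start-rate limiting (an assumption, not systemd code):
# at most $burst starts are allowed within a window of $interval seconds
# that opens at the first counted start attempt.
# Arguments: burst interval t1 t2 ... (attempt times in seconds, ascending)
simulate_start_limit() {
    burst=$1
    interval=$2
    shift 2
    begin=""
    count=0
    for t in "$@"; do
        # Open a new window when none is active or the current one expired.
        if [ -z "$begin" ] || [ "$t" -gt $((begin + interval)) ]; then
            begin=$t
            count=0
        fi
        if [ "$count" -lt "$burst" ]; then
            count=$((count + 1))
            echo "$t allowed"
        else
            echo "$t refused"
        fi
    done
}
```

With attempts roughly 10 seconds apart, as in the journal excerpt earlier in this issue, `simulate_start_limit 3 60 0 10 20 30` lets the first three through and refuses the fourth, which in this model is the point where the respawning stops.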

On the other hand, this improvement also reduced some of the waiting time for the login prompt on the serial console with the After=getty.target clause set. I now think it is better to keep that clause, to avoid running rear recover by mistake from a serial connection before the ReaR boot script has finished.
A possible future improvement would be to show the ReaR boot script output on the console to give the user better feedback.

While debugging the issue I found that the getty.service clause DefautInstance=tty0 was not recognized because of a typo (my fault). I also changed it to DefaultInstance=%I (see: https://www.freedesktop.org/software/systemd/man/systemd.unit.html#DefaultInstance=) and now it is working OK.

All this was tested on VBox with Ubuntu 16.04 VMs with and without serial connected. All tests worked well, and there are no more infinite serial-getty respawns in journal.

Tomorrow I will test these changes on physical HW with RHEL7.2 and a serial console. If there are no issues, a PR will be ready by tomorrow evening, I guess.

Correct me if I'm wrong, but a new PR is needed, isn't it?

Regards,
Didac

@gdha
Member Author

gdha commented Dec 14, 2017

@didacog great news indeed! I think you have found the fix 👍 And, yes please prepare a fresh PR which will be accepted with great pleasure :-)

@gdha gdha self-assigned this Dec 14, 2017
@didacog
Contributor

didacog commented Dec 14, 2017

@gdha, the latest changes also worked on SuperdomeX servers with RHEL7.2 and a serial console.

Later today I will send the PR with the updated changes. This should solve all problems from issue #878 and PR #1615 without the infinite respawn problem. :-)

Regards,
Didac

@schabrolles
Member

@didacog did you have a look at /usr/lib/systemd/system-generators/systemd-getty-generator?
It should be responsible for serial console detection and automatically create serial-getty@ttyS0.service or serial-getty@hvc0.service. Have a look at #1442 and #1448.
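
As far as I understand it, the generator looks at the kernel's active consoles (/sys/class/tty/console/active) and skips virtual terminals. The selection step can be sketched like this. `serial_consoles` is a hypothetical helper, and the `tty[0-9]*` pattern is only an approximation of the generator's virtual-terminal check.

```shell
#!/bin/sh
# Print the non-VT consoles from a space-separated list such as the contents
# of /sys/class/tty/console/active. Names like tty0 or tty1 are treated as
# virtual terminals and skipped (an approximation of the generator's logic).
serial_consoles() {
    for con in $1; do
        case $con in
            tty[0-9]*) ;;             # virtual terminal, no serial getty
            *) echo "$con" ;;
        esac
    done
}
```

On a rescue system it could be called as `serial_consoles "$(cat /sys/class/tty/console/active)"` to see which serial getty instances one would expect.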

@didacog
Contributor

didacog commented Dec 14, 2017

@schabrolles yes, I've tested it as it is, with "/usr/lib/systemd/system-generators/systemd-getty-generator" in COPY_AS_IS without changes, but issue #878 still happens.

Maybe with some bigger changes/cleanups in the systemd getty services it could work with the generator in the end. For now, though, I prefer to solve these issues first; once they are solved, we could start a discussion about how to arrive at a better implementation.

@didacog
Contributor

didacog commented Dec 14, 2017

PR is ready! :-P

@jsmeix
Member

jsmeix commented Dec 15, 2017

With #1649 merged
I consider this issue to be fixed (if not it can be reopened).
