Not possible to shutdown the system (N900) #85

Closed
sicelo opened this issue Feb 23, 2018 · 30 comments

@sicelo
Collaborator

sicelo commented Feb 23, 2018

Attempts to halt/shutdown the system do not seem to work.
Both shutdown -h now and poweroff result in a reboot. Most of the time, the screen backlight also does not light up for uboot.

@dderby
Member

dderby commented Feb 23, 2018

I've seen this too on both the N900 and Droid 4.

@MerlijnWajer
Member

@sicelo reported that this might just be related to the kernel configuration. I think @parazyd said he'd look at it.

@sicelo
Collaborator Author

sicelo commented Feb 26, 2018

With latest image, http://maedevu.maemo.org/images/n900/20180224/, shutdown still does not work.

However, when I am in a recovery shell, with the same image/kernel, poweroff works just fine all the time. Maybe the problem is in userspace after all?

@MerlijnWajer
Member

MerlijnWajer commented Feb 26, 2018

I am thinking it might be watchdog related. What if DSME stops, and the watchdogs are not disabled properly? Then the device reboots.

Might be related to this:

CONFIG_WATCHDOG_NOWAYOUT=y

@MerlijnWajer
Member

Can either of you confirm that this is enabled or disabled in PMOS?
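For anyone who wants to check this on a running device, here is a minimal sketch. It assumes the kernel was built with CONFIG_IKCONFIG_PROC so that /proc/config.gz exists; the helper names `config_value` and `running_kernel_config` are made up for illustration.

```python
import gzip


def config_value(config_text, option):
    """Return the value of a kernel config option, or None if it is not set.

    Kconfig writes enabled options as `CONFIG_FOO=y` and disabled ones as
    `# CONFIG_FOO is not set`, so a disabled option yields None.
    """
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def running_kernel_config(path="/proc/config.gz"):
    """Read the running kernel's config (assumes CONFIG_IKCONFIG_PROC=y)."""
    with gzip.open(path, "rt") as f:
        return f.read()
```

Calling `config_value(running_kernel_config(), "CONFIG_WATCHDOG_NOWAYOUT")` on the device should return `"y"` if the option is enabled and `None` if it is not set.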

@dderby
Member

dderby commented Feb 27, 2018

@dderby
Member

dderby commented Feb 27, 2018

There is a possibly related issue where the watchdog/lifeguard kicks in and triggers a reboot when doing things like switching runlevels or sending a SIGHUP to Xorg. Fremantle has this behaviour too, but I always thought of these as bugs. My personal preference would be to fix these sorts of things. In my opinion, sending a SIGHUP to any daemon should cause it to restart itself, not restart the system. It really depends on the direction you want to take the project. Do we want to make Leste behave like Fremantle, or do we make it more Unix-like? If you agree with me, I'll open new issues for these.

@MerlijnWajer
Member

Well, they do not run dsme, so they likely do not even start a watchdog, so then the kernel always takes care of it. Meaning that the issue likely doesn't even occur. I wonder what happens if we remove (as a test) the watchdog, or have dsme not kick any watchdog at all, and then power off the system. Might try it later.

As for the general idea of your post, let's not make it reboot randomly. We can have named runlevels, OpenRC supports those, so in general it can be less hacky. You can make an issue for it - we also are working on cleaning up the general boot process, so there's a lot to be done and discussed there anyway.

@MerlijnWajer
Member

So I suppose the first obvious test would be to remove any watchdog kicking support from dsme (for the test only) and see if reboot and such works as expected.

@MerlijnWajer MerlijnWajer self-assigned this Dec 1, 2018
@MerlijnWajer
Member

It looks like dsme has no way to close a watchdog properly, even if the magic close is supported:

https://www.kernel.org/doc/Documentation/watchdog/watchdog-api.txt

When the device is closed, the watchdog is disabled, unless the "Magic
Close" feature is supported (see below).  This is not always such a
good idea, since if there is a bug in the watchdog daemon and it
crashes the system will not reboot.  Because of this, some of the
drivers support the configuration option "Disable watchdog shutdown on
close", CONFIG_WATCHDOG_NOWAYOUT.  If it is set to Y when compiling
the kernel, there is no way of disabling the watchdog once it has been
started.  So, if the watchdog daemon crashes, the system will reboot
after the timeout has passed. Watchdog devices also usually support
the nowayout module parameter so that this option can be controlled at
runtime.

Magic Close feature:

If a driver supports "Magic Close", the driver will not disable the
watchdog unless a specific magic character 'V' has been sent to
/dev/watchdog just before closing the file.  If the userspace daemon
closes the file without sending this special character, the driver
will assume that the daemon (and userspace in general) died, and will
stop pinging the watchdog without disabling it first.  This will then
cause a reboot if the watchdog is not re-opened in sufficient time.

So we can try to disable NOWAYOUT and see if this still happens. The right way to do this would be to implement dsme watchdog stopping support, I think? But right now it looks like dsme doesn't implement stop() at all in the initscript, but that may be by design?

@freemangordon - any comments?

@MerlijnWajer
Member

I disabled NOWAYOUT, but 'poweroff' still seems to reboot the device (although mine is in development mode; not sure if that matters).

@MerlijnWajer
Member

When I boot without dsme (and thus no h-d, etc), 'poweroff' works as expected.

@MerlijnWajer
Member

MerlijnWajer commented Dec 13, 2018

When I implement a basic dsme kill/stop() in the openrc init script, I see this while shutting down:

watchdog0: watchdog did not stop!
watchdog1: watchdog did not stop!

Code in question is here:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/watchdog/watchdog_dev.c?#n882

This seems to indicate that dsme MUST send the magic close command. Otherwise the kernel will not stop the watchdog, and the shutdown process will fail because it gets interrupted by the watchdog that is still running.
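To make the close-time decision easier to follow, here is a toy Python model of it. It is paraphrased from a reading of the quoted documentation and the linked watchdog_dev.c, heavily simplified, and is not kernel code; the class and attribute names are invented for illustration.

```python
class WatchdogModel:
    """Toy model of /dev/watchdog close handling (illustration only)."""

    def __init__(self, nowayout=False, magic_close=True):
        self.nowayout = nowayout        # CONFIG_WATCHDOG_NOWAYOUT or nowayout param
        self.magic_close = magic_close  # driver supports "Magic Close"
        self.running = True             # assume the daemon already started it
        self.allow_release = False
        self.log = []

    def write(self, data):
        # Each write resets the release flag first, so only a 'V' in the
        # most recent write before close() counts.
        self.allow_release = b"V" in data

    def close(self):
        """Close the device; return True if the watchdog is still running."""
        may_stop = self.allow_release or not self.magic_close
        if may_stop and not self.nowayout:
            self.running = False
        else:
            # Matches the message seen in the shutdown log above.
            self.log.append("watchdog did not stop!")
        self.allow_release = False
        return self.running
```

In this model a daemon that is killed without writing 'V' leaves the watchdog running even with NOWAYOUT disabled, which is consistent with the reboots reported earlier in this thread.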

@MerlijnWajer
Member

When dsme sends magic close command on quit, then this works fine. I will prepare a new version.
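For reference, the magic close itself is tiny. The sketch below shows the idea in Python rather than dsme's C, and it is not the actual dsme change; `close_watchdog` is a hypothetical helper that takes the file descriptor the daemon already holds, since opening /dev/watchdog is what starts the watchdog in the first place.

```python
import os

MAGIC_CLOSE = b"V"  # magic character described in the kernel's watchdog-api.txt


def close_watchdog(fd):
    """Send the magic character on an open watchdog fd, then close it.

    The 'V' has to be the last thing written before close(); a later write
    without it would make the close look like a daemon crash again.
    """
    try:
        os.write(fd, MAGIC_CLOSE)
    finally:
        os.close(fd)
```

With two watchdogs on the device (watchdog0 and watchdog1 in the log above), this would presumably have to be done for each open descriptor. It only helps when NOWAYOUT is off.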

@MerlijnWajer
Member

MerlijnWajer commented Dec 13, 2018

Version 0.61.1 of dsme (now in leste-devel repo) should have this fix. Please let me know if it works for you, and then I will also make sure it ends up in leste repo.

It's possible that it doesn't work for you yet, because I disabled NOWAYOUT.

@MerlijnWajer
Member

See maemo-leste/dsme@a9bf82e

@MerlijnWajer
Member

The alternative, which I don't like as much, is to set a very high watchdog timeout before dsme stops.

@spinal84

Version 0.61.1 of dsme (now in leste-devel repo) should have this fix. Please let me know if it works for you, and then I will also make sure it ends up in leste repo.

It's possible that it doesn't work for you yet, because I disabled NOWAYOUT.

Installed update. Doesn't work for me. When I issue sudo poweroff, device (N900) reboots.

@MerlijnWajer
Member

Right, you also need to disable CONFIG_WATCHDOG_NOWAYOUT in the kernel (there might be a cmdline param for it). @freemangordon is working on another (possibly preferred) solution, and he said he'd try to get it done this weekend.

@MerlijnWajer
Member

@freemangordon found the likely reason dsme exits too early:

10:23 < freemangordon> ok, that's it
10:24 < Wizzup> ?
10:24 < freemangordon> https://pastebin.com/07ZRnLa1
10:24 < Wizzup> ...
10:24 < freemangordon> seems dbus is acting
10:25 < freemangordon> I don't have dbus debug symbols installed
10:25 < freemangordon> but at least we know what happens
10:25 < freemangordon> bye...
10:25 < Wizzup> ciao
10:25 < Wizzup> good work
10:26 < spiiroin> dbus exits & dsme does not handle system bus disconnect well?
10:26 < Wizzup> it looks like libdbus just calls _exit()
10:26 < Wizzup> which is immediate termination
10:26 < spiiroin> it is default behavior for some things
10:27 < Wizzup> ah, in case there is no handler
10:27 < spiiroin> like unhandled disconnect

Paste contains:

Thread 1 "dsme-server" hit Breakpoint 1, __GI__exit (status=1) at ../sysdeps/unix/sysv/linux/_exit.c:27
27	../sysdeps/unix/sysv/linux/_exit.c: No such file or directory.
(gdb) bt
#0  0xb6dec6b8 in __GI__exit (status=1) at ../sysdeps/unix/sysv/linux/_exit.c:27
#1  0xb62ddaae in  () at /lib/arm-linux-gnueabihf/libdbus-1.so.3
#2  0xb62c9058 in  () at /lib/arm-linux-gnueabihf/libdbus-1.so.3
(gdb)

To be continued later today.

@MerlijnWajer
Member

10:28 < spiiroin> then IIRC also things like passing invalid / not utf8 cleans strings / making method calls on diconnected but existing connection
10:28 < Wizzup> *nod*
10:29 < spiiroin> https://git.merproject.org/mer-core/dsme/commit/ae1b8ba
10:30 < Wizzup> ... we *REALLY* need to go through those commits ;)
10:30 < Wizzup> thanks

@MerlijnWajer
Member

@sicelo - I think it should now work with the latest dsme (install the upgrade, sync, reboot any way you like, and then try to shut down using poweroff).

@spinal84

The problem remains after update.

@spinal84

spinal84 commented Jan 14, 2019

I propose setting nowayout to "no". That should fix the issue (taking into account PowerVR oops fix).
No need to mess with openrc/dsme internals in the current state of development.

@spinal84

spinal84 commented Jan 14, 2019

We need to implement the PowerVR kernel module oops fix, either in userspace or in the kernel.
Until we have a proper fix in the kernel, I propose fixing it in userspace.

@MerlijnWajer
Member

With the latest kernel from leste devel repo, poweroff seems to do the right thing for me. Please confirm.

If you run into issues installing the new package, ensure that /boot (p1) is 'ext2' and not vfat (as in current images).
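A quick way to check the filesystem type of /boot before installing is to look it up in /proc/mounts. The helper below is a hypothetical sketch that parses text in that layout; it assumes /boot is actually mounted.

```python
def mount_fstype(mounts_text, mountpoint):
    """Return the filesystem type mounted at `mountpoint`, or None.

    `mounts_text` uses the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mountpoint:
            fstype = fields[2]  # a later entry shadows an earlier one
    return fstype
```

On the device, `mount_fstype(open("/proc/mounts").read(), "/boot")` should report `ext2` rather than `vfat` before the new kernel package is installed.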

@MerlijnWajer
Member

With latest dsme and n900 kernel from leste devel, poweroff now seems to work.

@MerlijnWajer
Member

@spinal84 or @sicelo - please confirm and then we can close this issue.

@MerlijnWajer
Member

BTW: when we do close it, let's open a new one for enabling NOWAYOUT and fixing it properly. I think one of the issues is that openrc runs all of its own scripts as well as the devuan ones, and the latter take a long time to stop, longer than the watchdog timeout. This might be better in Devuan ceres already.

@MerlijnWajer
Member

See #219
