Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

phosphor-fan-control-init@0.service will never stop on failures #23

Closed
geissonator opened this issue Jul 10, 2020 · 5 comments
Closed

Comments

@geissonator
Copy link
Contributor

This service was hitting a failure recently on a system (missing hwmon interface). It was noticed that this service just kept restarting over and over....thousands of times.

OpenBMC has built in systemd thresholds to catch and stop these types of services but it appears this service overrides these settings with the following:

cat phosphor-fan-control-init\@0.service.d/fan-watchdog-monitor.conf 
[Unit]
#These overrides allow the fan watchdog to take over when this service dies
OnFailure=fan-watchdog-monitor@%n.service
StartLimitIntervalSec=5
StartLimitBurst=3

The issue is that the AST2500 is not fast enough to ever hit this (service must fail 4(StartLimitBurst+1) times within 5 seconds). Here's a journal showing that not happening:

Jul 10 21:07:28 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 152.
Jul 10 21:07:29 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Jul 10 21:07:29 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Jul 10 21:07:31 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 153.
Jul 10 21:07:32 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Jul 10 21:07:32 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Jul 10 21:07:33 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 154.
Jul 10 21:07:34 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Jul 10 21:07:34 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Jul 10 21:07:35 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 155.
Jul 10 21:07:36 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Jul 10 21:07:36 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Jul 10 21:07:37 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 156.
Jul 10 21:07:37 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Jul 10 21:07:37 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Jul 10 21:07:38 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 157.
Jul 10 21:07:39 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Jul 10 21:07:39 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Jul 10 21:07:40 witherspoon-YL30UF74T02X systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 158.

I think we need to either reduce StartLimitBurst to 1 or 2 or increase StartLimitIntervalSec to something like 10.

@msbarth
Copy link
Contributor

msbarth commented Aug 6, 2020

This use to work, so I can only think that additional workloads have been added since this was configured. I've also found that with the current StartLimit... values, the service is stopped and the fan watchdog monitor trips on Witherspoon hardware using a 2.9.0-dev build. That would make it seem, whatever build this issue was created against must have even more workload on the BMC during a poweron where the phosphor-fan-control-init@0.service could not be restarted 3 times within 5 seconds.

I've verified setting StartLimitIntervalSec=10 works fine, so I'll bump it to that limit interval.

@msbarth
Copy link
Contributor

msbarth commented Aug 6, 2020

Also, not sure where the "service must fail 4(StartLimitBurst+1)" part came from...from the definition of StartLimitBurst, its the number of starts allowed within the StartLimitIntervalSec where the initial start of the service is included in that number (i.e. not number of restarts).
https://www.freedesktop.org/software/systemd/man/systemd.unit.html#StartLimitIntervalSec=interval

@msbarth
Copy link
Contributor

msbarth commented Aug 6, 2020

Testing my bump to 10sec has produced sporadic results on the number of restarts that occur even getting up to 12 restarts total which included 4 back-to-back

Aug 06 20:21:53 witherspoon systemd[1]: Started Phosphor Fan Control Initialization.
Aug 06 20:21:53 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/lib being skipped.
Aug 06 20:21:53 witherspoon phosphor-fan-control[2549]: Uncaught DBus service lookup failure exception
Aug 06 20:21:53 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Aug 06 20:21:53 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Aug 06 20:21:54 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 9.
Aug 06 20:21:55 witherspoon systemd[1]: Stopped Phosphor Fan Control Initialization.
Aug 06 20:21:55 witherspoon systemd[1]: Started Phosphor Fan Control Initialization.
Aug 06 20:21:55 witherspoon systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Aug 06 20:21:55 witherspoon phosphor-fan-control[2550]: Uncaught DBus service lookup failure exception
Aug 06 20:21:55 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Aug 06 20:21:55 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Aug 06 20:21:56 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 10.
Aug 06 20:21:56 witherspoon systemd[1]: Stopped Phosphor Fan Control Initialization.
Aug 06 20:21:56 witherspoon systemd[1]: Started Phosphor Fan Control Initialization.
Aug 06 20:21:56 witherspoon systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Aug 06 20:21:56 witherspoon phosphor-fan-control[2551]: Uncaught DBus service lookup failure exception
Aug 06 20:21:56 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Aug 06 20:21:56 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Aug 06 20:21:57 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 11.
Aug 06 20:21:58 witherspoon systemd[1]: Stopped Phosphor Fan Control Initialization.
Aug 06 20:21:58 witherspoon systemd[1]: Started Phosphor Fan Control Initialization.
Aug 06 20:21:58 witherspoon systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Aug 06 20:21:58 witherspoon phosphor-fan-control[2552]: Uncaught DBus service lookup failure exception
Aug 06 20:21:58 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Aug 06 20:21:58 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Aug 06 20:21:59 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 12.

Something else is going on here...

@msbarth
Copy link
Contributor

msbarth commented Aug 7, 2020

I am completely dumbfounded on how systemd uses the StartLimitIntervalSec and StartLimitBurst as when I have

StartLimitIntervalSec=10
StartLimitBurst=2

the service shows a "restart" count of 2 (but is only started twice) and the service is no longer attempted to be started as expected.

However, if I use

StartLimitIntervalSec=10
StartLimitBurst=3

the service shows a total "restart" count of 6 times until the service is no longer attempted to be started...but, the journal shows that 3 starts failed within 10sec

Aug 07 14:20:26 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 1.
Aug 07 14:20:26 witherspoon vrm-control.sh[8191]: vdna        0.898V        0.898V        1.000A
Aug 07 14:20:27 witherspoon systemd[1]: Finished Scan FSI devices.
Aug 07 14:20:27 witherspoon max31785-hwmon-helper[8188]: Job for xyz.openbmc_project.Hwmon@-ahb-apb-bus\x401e78a000-i2c\x2dbus\x40100-max31785\x4052.service failed because the control process exited with error code.
Aug 07 14:20:27 witherspoon max31785-hwmon-helper[8188]: See "systemctl status "xyz.openbmc_project.Hwmon@-ahb-apb-bus\\x401e78a000-i2c\\x2dbus\\x40100-max31785\\x4052.service"" and "journalctl -xe" for details.
Aug 07 14:20:27 witherspoon systemd[1]: max31785-hwmon-helper@ahb-apb-bus\x401e78a000-i2c\x2dbus\x40100-max31785\x4052.service: Main process exited, code=exited, status=1/FAILURE
Aug 07 14:20:27 witherspoon systemd[1]: max31785-hwmon-helper@ahb-apb-bus\x401e78a000-i2c\x2dbus\x40100-max31785\x4052.service: Failed with result 'exit-code'.
Aug 07 14:20:27 witherspoon systemd[1]: Failed to start MAX31785 hwmon helper.
Aug 07 14:20:27 witherspoon systemd[1]: Dependency failed for Phosphor Fan Control Daemon.
Aug 07 14:20:27 witherspoon systemd[1]: Dependency failed for Chassis0 (Power On).
Aug 07 14:20:27 witherspoon phosphor-systemd-target-monitor[213]: Monitored systemd unit has hit an error
Aug 07 14:20:27 witherspoon systemd[1]: obmc-chassis-poweron@0.target: Job obmc-chassis-poweron@0.target/start failed with result 'dependency'.
Aug 07 14:20:27 witherspoon systemd[1]: phosphor-fan-control@0.service: Job phosphor-fan-control@0.service/start failed with result 'dependency'.
Aug 07 14:20:27 witherspoon systemd[1]: phosphor-fan-control@0.service: Triggering OnFailure= dependencies.
Aug 07 14:20:27 witherspoon phosphor-fru-fault-monitor[340]: /xyz/openbmc_project/logging/entry/51 created
Aug 07 14:20:28 witherspoon vrm-control.sh[8191]: vdnb        0.898V        0.898V        0.000A
Aug 07 14:20:28 witherspoon systemd[1]: Finished Apply voltage overrides to VRMs.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in Journal Audit Socket being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in Huge Pages File System being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in POSIX Message Queue File System being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Stopped Phosphor Fan Control Initialization.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in Load Kernel Modules being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in FUSE Control File System being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in Create System Users being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/cache being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/lib being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/spool being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in Rebuild Journal Catalog being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in Update is Completed being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Starting Enable the AVS bus on VRMs...
Aug 07 14:20:28 witherspoon systemd[1]: Condition check resulted in Applies user CFAM register overrides from file being skipped.
Aug 07 14:20:28 witherspoon systemd[1]: Starting Fan Watchdog Failure Monitor...
Aug 07 14:20:28 witherspoon systemd[1]: Started Phosphor Fan Control Initialization.
Aug 07 14:20:29 witherspoon systemd[1]: fan-watchdog-monitor@phosphor-fan-control@0.service.service: Succeeded.
Aug 07 14:20:29 witherspoon systemd[1]: Finished Fan Watchdog Failure Monitor.
Aug 07 14:20:29 witherspoon phosphor-fan-control[8244]: Uncaught DBus service lookup failure exception
Aug 07 14:20:29 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Aug 07 14:20:29 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Aug 07 14:20:29 witherspoon systemd[1]: Finished Enable the AVS bus on VRMs.
Aug 07 14:20:29 witherspoon systemd[1]: Starting Power on bind ir35221 device driver devs...
Aug 07 14:20:31 witherspoon systemd[1]: Finished Power on bind ir35221 device driver devs.
Aug 07 14:20:31 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 2.
Aug 07 14:20:31 witherspoon systemd[1]: Reached target Start Host0 (Pre).
Aug 07 14:20:31 witherspoon systemd[1]: Condition check resulted in Journal Audit Socket being skipped.
Aug 07 14:20:31 witherspoon systemd[1]: Condition check resulted in Huge Pages File System being skipped.
Aug 07 14:20:31 witherspoon systemd[1]: Condition check resulted in POSIX Message Queue File System being skipped.
Aug 07 14:20:31 witherspoon systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.
Aug 07 14:20:31 witherspoon systemd[1]: Stopped Phosphor Fan Control Initialization.
Aug 07 14:20:31 witherspoon systemd[1]: Condition check resulted in Load Kernel Modules being skipped.
Aug 07 14:20:31 witherspoon systemd[1]: Condition check resulted in FUSE Control File System being skipped.
Aug 07 14:20:32 witherspoon systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Aug 07 14:20:32 witherspoon systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Aug 07 14:20:32 witherspoon systemd[1]: Condition check resulted in Create System Users being skipped.
Aug 07 14:20:32 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/cache being skipped.
Aug 07 14:20:32 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/lib being skipped.
Aug 07 14:20:32 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/spool being skipped.
Aug 07 14:20:32 witherspoon systemd[1]: Condition check resulted in Rebuild Journal Catalog being skipped.
Aug 07 14:20:32 witherspoon systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
Aug 07 14:20:32 witherspoon systemd[1]: Condition check resulted in Update is Completed being skipped.
Aug 07 14:20:32 witherspoon systemd[1]: Started Phosphor Fan Control Initialization.
Aug 07 14:20:32 witherspoon systemd[1]: Started Phosphor Hwmon Poller.
Aug 07 14:20:32 witherspoon systemd[1]: Started Phosphor Hwmon Poller.
Aug 07 14:20:32 witherspoon systemd[1]: Started Phosphor Hwmon Poller.
Aug 07 14:20:32 witherspoon systemd[1]: Started Phosphor Hwmon Poller.
Aug 07 14:20:33 witherspoon phosphor-fan-control[8278]: Uncaught DBus service lookup failure exception
Aug 07 14:20:33 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Aug 07 14:20:33 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Aug 07 14:20:35 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 3.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in Journal Audit Socket being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in Huge Pages File System being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in POSIX Message Queue File System being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Stopped Phosphor Fan Control Initialization.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in Load Kernel Modules being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in FUSE Control File System being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in Create System Users being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/cache being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/lib being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/spool being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in Rebuild Journal Catalog being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Condition check resulted in Update is Completed being skipped.
Aug 07 14:20:35 witherspoon systemd[1]: Started Phosphor Fan Control Initialization.
Aug 07 14:20:36 witherspoon phosphor-fan-control[8284]: Uncaught DBus service lookup failure exception
Aug 07 14:20:36 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Aug 07 14:20:36 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Aug 07 14:20:37 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 4.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in Journal Audit Socket being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in Huge Pages File System being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in POSIX Message Queue File System being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Stopped Phosphor Fan Control Initialization.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in Load Kernel Modules being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in FUSE Control File System being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in Create System Users being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/cache being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/lib being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/spool being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in Rebuild Journal Catalog being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Condition check resulted in Update is Completed being skipped.
Aug 07 14:20:37 witherspoon systemd[1]: Started Phosphor Fan Control Initialization.
Aug 07 14:20:38 witherspoon phosphor-fan-control[8285]: Uncaught DBus service lookup failure exception
Aug 07 14:20:38 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Aug 07 14:20:38 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Aug 07 14:20:39 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 5.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in Journal Audit Socket being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in Huge Pages File System being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in POSIX Message Queue File System being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Stopped Phosphor Fan Control Initialization.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in Load Kernel Modules being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in FUSE Control File System being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in Create System Users being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/cache being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/lib being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/spool being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in Rebuild Journal Catalog being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Condition check resulted in Update is Completed being skipped.
Aug 07 14:20:39 witherspoon systemd[1]: Started Phosphor Fan Control Initialization.
Aug 07 14:20:40 witherspoon phosphor-fan-control[8286]: Uncaught DBus service lookup failure exception
Aug 07 14:20:40 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Main process exited, code=exited, status=1/FAILURE
Aug 07 14:20:40 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Aug 07 14:20:41 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Scheduled restart job, restart counter is at 6.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in Journal Audit Socket being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in Huge Pages File System being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in POSIX Message Queue File System being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: Stopped Phosphor Fan Control Initialization.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in Load Kernel Modules being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in FUSE Control File System being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in Create System Users being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/cache being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/lib being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in Bind mount volatile /var/spool being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in Rebuild Journal Catalog being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: Condition check resulted in Update is Completed being skipped.
Aug 07 14:20:41 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Start request repeated too quickly.
Aug 07 14:20:41 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Failed with result 'exit-code'.
Aug 07 14:20:41 witherspoon systemd[1]: Failed to start Phosphor Fan Control Initialization.
Aug 07 14:20:41 witherspoon systemd[1]: phosphor-fan-control-init@0.service: Triggering OnFailure= dependencies.

Using

StartLimitIntervalSec=10
StartLimitBurst=2

seems to be sufficient and stops restarting after the correct number of times? So I'm going to go with this I guess.

@msbarth
Copy link
Contributor

msbarth commented Aug 7, 2020

Resolved by https://gerrit.openbmc-project.xyz/35438

geissonator pushed a commit to openbmc/openbmc that referenced this issue Sep 17, 2020
It was found that the fan control services were constantly getting
restarted due to not failing within the previous start limits. After
experimenting with different combinations of limits, using the default
values for StartLimitBurst and StartLimitIntervalSec is sufficient.

Tested:
    Changed limits, powered on, watched service fails in journal until
fan watchdog monitor started

Resolves: openbmc/phosphor-fan-presence#23

(From meta-ibm rev: b8a65368cb39d6d82c4b025b25fdbe868dbbfe89)

Change-Id: Ibcb35028e8dbc67d7df70dfeee25d098e6041fe8
Signed-off-by: Matthew Barth <msbarth@us.ibm.com>
Signed-off-by: Andrew Geissler <geissonator@yahoo.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants