phosphor-fan-control-init@0.service will never stop on failures #23
Comments
This used to work, so I can only think that additional workloads have been added since this was configured. I've also found that with the current I've verified setting
Also, I'm not sure where the "service must fail 4(StartLimitBurst+1)" part came from... from the definition of
Testing my bump to 10 sec has produced sporadic results in the number of restarts that occur, even reaching 12 restarts total, which included 4 back-to-back
Something else is going on here...
I am completely dumbfounded on how systemd uses the
the service shows a "restart" count of 2 (but is only started twice) and systemd no longer attempts to start the service, as expected. However, if I use
the service shows a total "restart" count of 6 before systemd stops attempting to start it... but the journal shows that 3 starts failed within 10 sec
Using
seems to be sufficient and stops restarting after the correct number of times? So I'm going to go with this I guess.
Resolved by https://gerrit.openbmc-project.xyz/35438
It was found that the fan control services were constantly getting restarted due to not failing within the previous start limits. After experimenting with different combinations of limits, using the default values for StartLimitBurst and StartLimitIntervalSec is sufficient.

Tested: Changed limits, powered on, watched service fails in journal until fan watchdog monitor started

Resolves: openbmc/phosphor-fan-presence#23

(From meta-ibm rev: b8a65368cb39d6d82c4b025b25fdbe868dbbfe89)
Change-Id: Ibcb35028e8dbc67d7df70dfeee25d098e6041fe8
Signed-off-by: Matthew Barth <msbarth@us.ibm.com>
Signed-off-by: Andrew Geissler <geissonator@yahoo.com>
This service was hitting a failure recently on a system (missing hwmon interface). It was noticed that this service just kept restarting over and over, thousands of times.
OpenBMC has built-in systemd thresholds to catch and stop these types of services, but it appears this service overrides these settings with the following:
The issue is that the AST2500 is not fast enough to ever hit this (the service must fail 4 (StartLimitBurst+1) times within 5 seconds). Here's a journal showing that not happening:
I think we need to either reduce StartLimitBurst to 1 or 2 or increase StartLimitIntervalSec to something like 10.
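To make the reasoning above concrete, here is a rough Python sketch of a windowed start counter. It is a simplified model and not systemd's actual implementation: it assumes the limit works as a fixed window that resets once more than `interval_sec` has passed since the window began. The names `StartRateLimit` and `starts_until_blocked` are hypothetical, and the 2-second gap between start attempts is an assumed figure for a slow BMC, not a measurement from this system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartRateLimit:
    """Simplified fixed-window start limiter (an assumption, not systemd code)."""
    interval_sec: float          # models StartLimitIntervalSec
    burst: int                   # models StartLimitBurst
    window_begin: Optional[float] = None
    count: int = 0

    def allow(self, now: float) -> bool:
        # Open a fresh window on the first start, or once the old one expired.
        if self.window_begin is None or now - self.window_begin > self.interval_sec:
            self.window_begin = now
            self.count = 1
            return True
        # Inside the window: allow up to `burst` starts, refuse the next one.
        if self.count < self.burst:
            self.count += 1
            return True
        return False


def starts_until_blocked(interval_sec: float, burst: int,
                         seconds_between_starts: float,
                         max_attempts: int = 1000) -> Optional[int]:
    """Return how many starts were allowed before one was refused,
    or None if no start was refused within max_attempts."""
    limiter = StartRateLimit(interval_sec, burst)
    for attempt in range(max_attempts):
        if not limiter.allow(attempt * seconds_between_starts):
            return attempt
    return None


if __name__ == "__main__":
    # Burst of 3 in 5 s, one start every 2 s (assumed): the window keeps resetting.
    print(starts_until_blocked(5, 3, 2.0))
    # Wider window of 10 s: the fourth start lands inside the window.
    print(starts_until_blocked(10, 3, 2.0))
    # Smaller burst of 2 with the original 5 s window.
    print(starts_until_blocked(5, 2, 2.0))
```

Under these assumptions, both proposed changes (a longer interval or a smaller burst) cause the limit to trigger, while the 3-in-5-seconds setting never does when starts are 2 seconds apart. The model does not explain the inconsistent restart counts reported in the comments, so it should be read only as an illustration of the "not fast enough" argument.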