-
-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Repeat error message: "PWM of ${FAN_NAME} was changed by third party! Last set PWM value was" #64
Comments
I'm getting this warning too. I know with the corsair commander fan curves are stored on the device, do you think its the case that the commander is fighting with fan2go for control of the fan's curve? |
As @elais already said, its not an error, its a warning and if you know what you are doing you can ignore it. Not sure why you would want to have an external software (fan2go) control your fans if you also have a big (and probably expensive) hardware controller in your system to do the same thing 🤔 If the "corsair commander pro" does not set a PWM value of f.ex. In most cases this is due to some other software also trying to control a fan (like f.ex. the BIOS/UEFI). In others it seems like a "feature"... hmm... I am not fond of adding a config option to disable this message, as it should not be ignored in most cases. Adding an override option for |
I'm already ignoring it, but it's the kind of edge case compatibility quirk that turns into an actual bug later, so I wanted to go ahead and confirm that it was not unique to Elias's specific hardware and that it's consistent across systems with this device.
Unlike older fan controllers that are standalone devices, the commander pro is a purely software controlled device with thermal probes and PWM headers that are exposed directly to the software. (as well as a few other ports) With the iCue software, that allows for a lot of extremely detailed micromanaging of fan RPM and noise levels/frequencies that you can't do without building and programming custom hardware or spending hundreds on a pwm controller. Since the commander pro driver was only added to the kernel a couple of revisions ago, I'm not surprised that there would be a few quirks, but I really want to be able to get reliable results with this hardware on linux.
I'm pretty sure that the driver for the commander is doing some kind of rounding, since I can reliably see the values are off by one or two when I set them from the command line using echo.
The numbers that are "true' to what was set are also VERY consistent.
Agreed. The message itself being so persistent is a minor issue though, as it effectively floods any other messages off screen.
If by "initialization sequence" you are referring to determining the start and stop values, that seems to have ran fine despite the warnings.
The only real concern I have with the warnings is that the software may be trying to disable PWM control when it closes and is failing because the fan controller doesn't have a no-pwm mode at all. I have been noticing that the PWM[1-6] files are maintaining the value set through reboots as long as the system doesn't completely lose power, which means that the fans are stuck at whatever speed they were at prior to the restart until Fan2go is restarted. The software gracefully detecting that a controller doesn't have a pwm_enable setting and falling back to explicitly setting a sane value when the program exits would be greatly appreciated. |
This should already be the case, but maybe the logic can be improved, see here. fan2go remembers the state of the fan and tries to restore it when stopping. If this fails, it tries to set the fan to max speed. |
In my situation I’m watercooling and the hardware controller allows me to attach multiple sensors and devices (fans, pumps, a flow indicator) that I do not have enough headers for fine grained control on my itx board. What fan2go gives me in this situation is the ability to combine multiple sensors for use in controlling the fans. I want to base my fan curve on the delta between water temp and ambient temp of the room. It’s pure vanity to want this but Fan2go is the best chance I have at getting this*. As for the warning when it appears it floods out any other message. Especially when running four fans because that means it appears four times at each adjustment interval. I’d rather not flood the logs especially since it means a quick As always thank you for your patience. * pull request incoming 😉 I’m not sure how this created a new issue sorry for flooding your board. I’m posting from mobile and may have fat fingered something. |
Thx for the explanation, my bad, I thought the commander pro was more than a cable "hub" 😄
I understand. However, before changing anything about this log, I would really like to know if this rounding behaviour is "ok" (from a standards point of view), or not. If it is, then fan2go needs to be able to detect each and every working PWM value beforehand (during initialization f.ex.) and ignore all other values. Implementing this would complicate the logic massively though... If it is not, the warning is hinting at a faulty driver implementation and should also not be ignored. (but I guess thats not the case here)
No worries, we are all just a very big team 🤓 |
Maybe introduce log levels? That way we can run the program at a different log level that would filter out warnings. it’s a stopgap but at least we can solve this in the short term while leaving the door open for more fine grained logging if a user/developer chooses it. |
From what I can tell, when manually run from a terminal or via init script; Stopping ffan2gan2go is currently not setting the PWM to max when stopped (nor, seemingly, setting PWM_Enable to false) . The code for responding to SIGTERM is here Unless there's other code I'm missing; That looks like Ctrl+C/SIGTERM only results in the INFO text "Exiting...", and the software stops immediately, leaving the PWM values set to whatever they happened to be at. I can also 100% confirm that the pwm[1-6] files are not edited on exit, and those values persist until the software is restarted, and remain even through restart on my hardware. (so the exhibited behavior matches how I interpret that code) please note: I'm a sysadmin, not a programmer, and this is only like the third or fourth time I've ever had to deal with GO, so I apologize for any errors or misunderstanding.
That seems very complicated. Mangled example: 2 points is only a 0.8% error. It might be a better idea to make that value optional or user specified, but that code isn't really verifying the correct value, it's just verifying that something else isn't making conflicting changes.
I agree with elais here. That said, "Something else is making changes" IS a very important error, that we don't really want to ignore. I would prefer a solution that corrects the false positive rather than an option to ignore things that are actually potentially serious. |
So looking more into it I believe the cpro is rounding to the nearest multiple of 3 - 1 (due to pwm values being index 0-255, making 0 the first number). So if your pwm is 90 it rounds down to 89, if it is 91 it rounds up to 92, so on and so forth. My guess is this is something unique to the cmdr pro. I believe the solution to this would be to add a step size configuration where I could tell fan2go to only move in step sizes I define. In @eldorel and my case, we'd want to only move in step sizes of 3 to keep up with the pwm set by our controller. Others may run into similar issues where they have a controller that moves in steps of 2, 3, 5, etc. The algorithm here would be something like @markusressel what do you say to adding a configuration option like EDIT: Tested more values and it isn't as straightforward as I thought. I'm now in favor of thresholds. |
Still, I wouldn't want either. fan2go already tries to set each and every PWM value in the range [0..255] during initialization of a new fan. We just have to store which values actually work and which don't. Since setting a PWM value of 89 would result in a PWM value of 89, 89 is a usable PWM value. 90 on the other hand is not, as setting it does not result in the expected PWM value. I already saw another behaviour of a GPU fan, where it was simply impossible (i think it printed an error?) to set a PWM value below a certain value (f.ex. This might lead to a situation, where a (stupid) controller always sets Step size is too specific, it seems like it wouldn't solve your problem Again, I am fine with fan2go "figuring out" which values work during initialization. I don't like just assuming that everything is probably most certainly maybe fine because asking for PWM 90 yields 89 which is not off enough to count it as an error... Log levels technically already exist in fan2go, but currently there is only the |
A developer trying to get a software controlled microcontroller that communicates in percentages to emulate a well supported legacy device spec that was originally designed to us an 8bit register to communicate with a discrete logic loop.
Looking at the kernel driver code, the commander's "PWM" setting on the hardware is a percentage value from 1-100%, and the driver is rounding that.
I would argue that this is a completely different use case. Establishing the lower (or upper) limits of a controller is a one-time assessment to set a limit vs wasting cycles referencing a list of 'valid' values for every PWM change. Having looked at the code for the driver, I would like to suggest a flag for fan config that enables a little rounding logic for devices that are percentage based. Note: The commander pro isn't the only USB device on the market that works like this, it's just the first one that someone took the time to reverse engineer the USB communications for. |
So what you're saying here is if we were to map pwm value to duty somehow all our problems would be solved? It looks like the corsair's driver converts a long representing the pwm value using this macro #define DIV_ROUND_CLOSEST_ULL(x, divisor)( \
{ \
typeof(divisor) __d = divisor; \
unsigned long long _tmp = (x) + (__d) / 2; \
do_div(_tmp, __d); \
_tmp; \
} \
) |
Interesting, thx for looking it up.
If you are looking for a fan controller that uses the least amount of CPU cycles, I am not sure fan2go is the best option 😛 Figuring out which PWM values "work" is also a one-time task, the interesting part would be how this knowledge would be applied when updating the PWM value. But again, although I obviously don't try to make fan2go waste resources, I also didn't start fan2go to built the smallest and fastest fan controller ever. That would be Auto: Off, PWM: 255 😆 I just want to avoid implementing multiple flags/workarounds/things for different fan controllers.
I don't want the user to tell fan2go about quirks of their hardware. It should just work.
And there probably are (or will be) devices that use yet another range of values and the |
Figuring that out for systems with multiple fans and pumps could take hours depending on the settle time for the device.
I'm not really suggesting any device specific (or even permanent) workarounds. There are a few things that we know.
and
A flag can be implemented in a small amount of time to account for this new hardware class and deprecated later if a more robust solution is implemented.
The software could tell them during the INIT process as soon as it encounters one fan doing this and add the flag itself for each fan that it detects doing this, or alternately just add it to the help docs.
Having an extremely detailed test for "does this device use percentages" is fine.
As I said above, No matter what the hardware is, the LM_Sensors drivers are going to target the 1-255 value range and have a fanX_input file to report the current RPM. Allowing for some minor variation in specified PWM values (and verifying the actual RPM matches the approximate value expected) would not only allow for fan2go to gracefully adapt to those different rounding errors, it would also catch hardware errors like a dead or blocked fan running at the wrong RPM. And seriously, modern and future hardware programmers are more likely to aim for 1-100% to represent duty cycle than they are to make up some other system, and even if they DO decided to use another system, the kernel drivers are still going to attempt to map it to the standard with the tiniest rounding error they can, but there will be rounding errors. TL;DR Writing and executing code to test every single possible PWM value and create a blacklist for 'invalid' values for every single fan for a new type of hardware with an extremely predictable rounding error is an inefficient use of both programmer effort and the user's time. |
For the sake of argument I just want to say if a build is using water its on to run the fan tests in parallel because the water won't heat up fast enough to cause damaging problems. I ran my fans tests in parallel and cpu remained fine. The water temp rose to only 7°-8° above ambient. |
How long a system can run without heating up really depends on the setup and the environment (including the user). Extra note: It looks like DIV_ROUND_CLOSEST_ULL and DIV_ROUND_CLOSEST are both already used for a handful of other lm_sensors compatible hardware drivers as well for the same reason. The are both also defined in the kernel itself, so this isn't some 'random' rounding code that someone pulled out of their rear end (recently at least). At a quick glance, I found a handful of drivers for fan headers on intel server boards and intel gpus that I have available to test on, and with a bit of poking, they do apparently use a 1-100% duty_cycle. |
Do you have an example / a suggestion how an algorithm that detects the use of percentage values might work? An algorithm that could automatically turn on "the switch" in fan2go which tells it to use 1-100 instead of 1-255? |
You are right, this functionality apparently got lost in the recent rewrite, I will open a new issue for that. |
This is where I wish I spent more time coding things rather than installing them, I'd rather just submit code, but that'll take me weeks to get right... My initial though is that the round_closest results are predictable, and we can use that. So adding a sampling routine during the existing init that intentionally checks the actual PWM value for a handful of test values that would be rounded up or down to see if they return the result that was set by fan2go or the predicted rounded results. So instead of just verifying RPM values you also check the actual PWM that was set by the driver, and then compare that list to a pre-generated list of rounding error tuples. If the actual results match that list all the way down, then it's safe to assume a percentage device and apply the +-2 on the validation or alternately use the same rounding code as the driver to round the curve-value before passing the PWM value to the driver. However without doing more math to verify, that only guarantees matches the 0-100% values, and would possibly fail to spot 1-100 or 0-99. (possibly. Those results might result in the rounding exactly matching, in which case we've covered an entire group of hardware.) My second thought was to look at the deviation between expected and actual results across the test, and if every 'incorrect' value is within ~0.5% deviation and close to the expected results, then you can safely establish that this system is using some form of rounding. Then it's possible to use that devices actual upper and lower deviation amounts as the 'wiggle room' for the Warning test code. In the absence of more clever and math heavy ideas that involve plotting the function of the entire rounded entire results list to figure out what 255 and 0 are *actually mapped to, I think a combination of those two may be the most reliable and least painful option. Adding the 'Is this a 100% device' code would be a functional stopgap, finding the deviation and allowing for that much "error" resolves the log spam for most devices, and then the clever solution can be implemented in a later revision after it's been found. (and the clever solution may not be that difficult to work out, since you should be able to find the function from 5 or so values since the function should still be almost perfectly linear. |
Looking at the kernel driver mapping one thing I noticed is the pwm value is taken as a long rather than an integer in that div operarion. I'll look through `fancontrol` and `liquidctl` to see how they submit pwm values. Liquidctl may bypass the driver and talk directly to the USB device though.
It may be the case that we would get more accurate results using duty values instead of pwm directly but I will have to play around with it.
…On Dec 8, 2021, 00:41 -0700, eldorel ***@***.***>, wrote:
> > TL;DR Writing and executing code to test every single possible PWM value and create a blacklist for 'invalid' values for every single fan for a new type of hardware with an extremely predictable rounding error is an inefficient use of both programmer effort and the user's time.
> Do you have an example / a suggestion how an algorithm that detects the use of percentage values might work? An algorithm that could automatically turn on "the switch" in fan2go which tells it to use 1-100 instead of 1-255?
This is where I wish I spent more time coding things rather than installing them, I'd rather just submit code, but that'll take me weeks to get right...
My initial though is that the round_closest results are predictable, and we can use that.
So adding a sampling routine during the existing init that intentionally checks the actual PWM value for a handful of test values that would be rounded up or down to see if they return the result that was set by fan2go or the predicted rounded results.
So instead of just verifying RPM values you also check the actual PWM that was set by the driver, and then compare that list to a pre-generated list of rounding error tuples.
If the actual results match that list all the way down, then it's safe to assume a percentage device and apply the +-2 on the validation or alternately use the same rounding code as the driver to round the curve-value before passing the PWM value to the driver.
However without doing more math to verify, that only guarantees matches the 0-100% values, and would possibly fail to spot 1-100 or 0-99. (possibly. Those results might result in the rounding exactly matching, in which case we've covered an entire group of hardware.)
My second thought was to look at the deviation between expected and actual results across the test, and if every 'incorrect' value is within ~0.5% deviation and close to the expected results, then you can safely establish that this system is using some form of rounding.
That allows for setting a "percentage" flag.
Then it's possible to use that devices actual upper and lower deviation amounts as the 'wiggle room' for the Warning test code.
In the absence of more clever and math heavy ideas that involve plotting the function of the entire rounded entire results list to figure out what 255 and 0 are *actually mapped to, I think a combination of those two may be the most reliable and least painful option.
Adding the 'Is this a 100% device' code would be a functional stopgap, finding the deviation and allowing for that much "error" resolves the log spam for most devices, and then the clever solution can be implemented in a later revision after it's been found.
(and the clever solution may not be that difficult to work out, since the function should still be almost perfectly linear no matter what the hardware's version of 0-100 happens to be, and once you have that function it should be trivial to find out what 255 and 0 are mapping from... )
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications on the go with GitHub Mobile for iOS or Android.
|
I've figured out a solution but @markusressel may not like it. So here's our situation: many controllers like the corsair commander pro or hydro platinum normalize their pwms, mapping to a value between 0 and 100. The function to do something like that is along the lines of // Round rounds the pwm value to a percentage of the pwm between 0 and 1 rounded
// to two decimal places.
func Round(pwm int) int {
duty := math.Round((float64(pwm) / 255.0) * 100) / 100
return int(math.Round(duty * 255))
} This takes the pwm value and rounds it to a fraction of 255 that has no more than two decimal places behind zero and then rounds again right before the percentage is converted back into an integer for a nice integer between 0 and 255. The main places to add this function are in the two func (f *fanController) UpdateFanSpeed() error {
fan := f.fan
target := util.Round(f.calculateTargetPwm()) // <- we round here
if target >= 0 {
err := f.setPwm(target)
if err != nil {
ui.Error("Error setting %s: %v", fan.GetId(), err)
trySetManualPwm(fan)
}
}
return nil
} func (fan *HwMonFan) SetPwm(pwm int) (err error) {
ui.Debug("Setting Fan PWM of '%s' to %d ...", fan.GetId(), util.Round(pwm))
err = util.WriteIntToFile(util.Round(pwm), fan.PwmOutput)
return err
} Now the problem here is that devices that want this are only a subset of devices out there and we don't want to test for that. So to please everybody my proposed solution is to normalize all of the pwm values before they're written with There's good reasons to just round everything
Reasons to not do it In my opinion, normalizing the values handles that warning for people who have a device with a duty of 0 to 100 and doesn't hurt those who have a device that doesn't. I tested this on my system but I'd rather get your opinions before I submit another PR @markusressel @eldorel |
No worries 😄 . I just cannot invest as much time into this than I would like, and this issue isn't really that straight forward to solve in a good way. I have started to work on this already, but I am currently very limited in my ability to test things on real hardware.
Indeed I did think about using percentage values for curves, instead of However, the range of numbers that are computed by the curves and what value is later applied to the fans should be independent from one another. If I were to change the curve implementation to use percentage values, I would also give it more accuracy by using float values, which means it really only changes the range of numbers, but doesn't (negatively) affect accuracy. With this it would still be possible to map
I bet you they will 😝 . I very much appreciate the continued interest in fan2go. I just need some time to evaluate the options we have discussed. |
That makes sense, and I have time if there's anything you want me to take a stab at. The codebase is well organized and easy to navigate so I think I can help lighten the load. Until then I believe I'm going to work off my fork just to suppress those warnings and finish up my service. I'll still track this repo as upstream and offer any help if you need any. |
Just to followup on the logspam for later reference: and JournalD shows the total line count in the log on the bottom left corner. That's an increase of over 2000 lines in approximately 20 minutes, with me having the tickrate and poll rates all set to 1 or 2 seconds vs the defaults... I'm also only actively controlling 3 of 6 fans on that controller at the moment. |
@eldorel not to endorse a fork but I "fixed" this in the main branch of my fork. I round the values like I explained in the comment up above. So if you want a stop gap until we figure out a general solution just build for my fork and use that. |
I usually try to stay with the main project on tools I use so that I can provide direct feedback without trying to ID if it's an issue from the fork. That said, I REALLY appreciate how quickly both of you are working to get these USB hid devices to work without major headaches. @markusressel @elais Thank you both. |
Well... fan2go thinks that something is really wrong with multiple fans, so thats expected. As soon as fan2go knows how to handle fans with percentages, the log shouldn't even pop up. So to summarize: |
@markusressel would you be open to creating a feature issue for using percentages in addition to raw pwm values? I have some ideas that I'd like to run by you. |
@elais If you are unsure about some idea, just open a new discussion (or issue, I don't really care) and create a writeup of your proposal. We can always figure out together if the idea makes sense or not. |
@elais @eldorel I have worked on this issue again. Could one (or both) of you check out the feature/#64_support_rounding_hwmon_drivers branch and test if it works for you? Make sure to delete the fan curve database before updating, as the old data will still contain every PWM value in
|
Hey I will be traveling out of town and thus away from my pc over the Christmas holidah. I will leave it on for ssh but I can't guarantee I'll be able to do much testing (because if it gets shut down or frozen I won't be able to get to it for 2 weeks) |
No worries, take a break and happy holidays 🎄 🎅 🤶 |
…_hwmon_drivers Feature/#64 support rounding hwmon drivers
I went ahead and released this in |
My logs were being flooded with |
This error seems to be related to a file that LM_Sensors doesn't create for my fan controller (a corsair commander pro), but it seems to also prevent the PWM values from being set to sane values when fan2go is closed or the system reboots.
Here are all of the errors, for just one fan. I have 6 fans on this controller, but the errors are all identical. :
INFO Using configuration file at: /etc/fan2go/fan2go.yaml
WARNING Cannot read pwm_enable value of Front-02
INFO Gathering sensor data for Front-02...
INFO Loading fan curve data for fan 'Front-02'...
INFO Start PWM of Front-02: 0
INFO Max PWM of Front-02: 255
WARNING Could not enable fan control on Front-02, trying to continue anyway...
INFO Starting controller loop for fan 'Front-02'
This error only occurs when the fans are not at 255 PWM or 0 PWM.
The text was updated successfully, but these errors were encountered: