Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repeat error message: "PWM of ${FAN_NAME} was changed by third party! Last set PWM value was" #64

Closed
eldorel opened this issue Dec 5, 2021 · 34 comments · Fixed by #77
Closed
Assignees
Labels
enhancement New feature or request

Comments

@eldorel
Copy link

eldorel commented Dec 5, 2021

This error seems to be related to a file that LM_Sensors doesn't create for my fan controller (a corsair commander pro), but it seems to also prevent the PWM values from being set to sane values when fan2go is closed or the system reboots.

Here are all of the errors, for just one fan. I have 6 fans on this controller, but the errors are all identical. :

INFO Using configuration file at: /etc/fan2go/fan2go.yaml

WARNING Cannot read pwm_enable value of Front-02
INFO Gathering sensor data for Front-02...
INFO Loading fan curve data for fan 'Front-02'...
INFO Start PWM of Front-02: 0
INFO Max PWM of Front-02: 255
WARNING Could not enable fan control on Front-02, trying to continue anyway...
INFO Starting controller loop for fan 'Front-02'

WARNING PWM of Front-02 was changed by third party! Last set PWM value was: 249 but is now: 250
WARNING PWM of Front-02 was changed by third party! Last set PWM value was: 249 but is now: 250

This error only occurs when the fans are not at 255 PWM or 0 PWM.

@elais
Copy link
Contributor

elais commented Dec 6, 2021

I'm getting this warning too. I know with the corsair commander fan curves are stored on the device, do you think its the case that the commander is fighting with fan2go for control of the fan's curve?

@markusressel
Copy link
Owner

As @elais already said, its not an error, its a warning and if you know what you are doing you can ignore it.

Not sure why you would want to have an external software (fan2go) control your fans if you also have a big (and probably expensive) hardware controller in your system to do the same thing 🤔

If the "corsair commander pro" does not set a PWM value of f.ex. 50 when told to set a value of 50, but instead sets a value of 48, then fan2go will detect that when trying to update the PWM value and notice that the PWM value set on that fan is not the one that fan2go has set the last time.

In most cases this is due to some other software also trying to control a fan (like f.ex. the BIOS/UEFI). In others it seems like a "feature"... hmm...

I am not fond of adding a config option to disable this message, as it should not be ignored in most cases. Adding an override option for maxPWM in addition to startPWM should be enough to prevent the initialization sequence from running, the "last PWM" problem probably isn't really fixable though.

@eldorel
Copy link
Author

eldorel commented Dec 6, 2021

its a warning and if you know what you are doing you can ignore it.

I'm already ignoring it, but it's the kind of edge case compatibility quirk that turns into an actual bug later, so I wanted to go ahead and confirm that it was not unique to Elias's specific hardware and that it's consistent across systems with this device.

Not sure why you would want to have an external software (fan2go) control your fans if you also have a big (and probably expensive) hardware controller in your system to do the same thing

Unlike older fan controllers that are standalone devices, the commander pro is a purely software controlled device with thermal probes and PWM headers that are exposed directly to the software. (as well as a few other ports)

With the iCue software, that allows for a lot of extremely detailed micromanaging of fan RPM and noise levels/frequencies that you can't do without building and programming custom hardware or spending hundreds on a pwm controller.

Since the commander pro driver was only added to the kernel a couple of revisions ago, I'm not surprised that there would be a few quirks, but I really want to be able to get reliable results with this hardware on linux.

If the "corsair commander pro" does not set a PWM value of f.ex. 50 when told to set a value of 50, but instead sets a value of 48, then fan2go will detect that when trying to update the PWM value and notice that the PWM value set on that fan is not the one that fan2go has set the last time.
In most cases this is due to some other software also trying to control a fan (like f.ex. the BIOS/UEFI). In others it seems like a "feature"... hmm...

I'm pretty sure that the driver for the commander is doing some kind of rounding, since I can reliably see the values are off by one or two when I set them from the command line using echo.
That said, the new value is instantaneous, and always within 1 or 2 of the intended value.

echo 90 > /sys/class/hwmon/hwmon7/pwm1 && cat /sys/class/hwmon/hwmon7/pwm1
89
echo 91 > /sys/class/hwmon/hwmon7/pwm1 && cat /sys/class/hwmon/hwmon7/pwm1
92
echo 100 > /sys/class/hwmon/hwmon7/pwm1 && cat /sys/class/hwmon/hwmon7/pwm1
99

The numbers that are "true' to what was set are also VERY consistent.
Setting the PWM to any of [...]97,99,102,105,107,110[...] does return the assigned number, but any other values in those ranges round to the nearest "true" number.

I am not fond of adding a config option to disable this message, as it should not be ignored in most cases.

Agreed. The message itself being so persistent is a minor issue though, as it effectively floods any other messages off screen.
Perhaps a safe compromise would be an option similar to the "maxRpmDiffForSettledFan" to allow for the software to accept values within 5 points of the requested PWM value as 'good enough'?

Adding an override option for maxPWM in addition to startPWM should be enough to prevent the initialization sequence from running

If by "initialization sequence" you are referring to determining the start and stop values, that seems to have ran fine despite the warnings.

the "last PWM" problem probably isn't really fixable though.

The only real concern I have with the warnings is that the software may be trying to disable PWM control when it closes and is failing because the fan controller doesn't have a no-pwm mode at all.

I have been noticing that the PWM[1-6] files are maintaining the value set through reboots as long as the system doesn't completely lose power, which means that the fans are stuck at whatever speed they were at prior to the restart until Fan2go is restarted.

The software gracefully detecting that a controller doesn't have a pwm_enable setting and falling back to explicitly setting a sane value when the program exits would be greatly appreciated.

@markusressel
Copy link
Owner

markusressel commented Dec 6, 2021

The software gracefully detecting that a controller doesn't have a pwm_enable setting and falling back to explicitly setting a sane value when the program exits would be greatly appreciated.

This should already be the case, but maybe the logic can be improved, see here. fan2go remembers the state of the fan and tries to restore it when stopping. If this fails, it tries to set the fan to max speed.

@elais
Copy link
Contributor

elais commented Dec 6, 2021

Not sure why you would want to have an external software (fan2go) control your fans if you also have a big (and probably expensive) hardware controller in your system to do the same thing

In my situation I’m watercooling and the hardware controller allows me to attach multiple sensors and devices (fans, pumps, a flow indicator) that I do not have enough headers for fine grained control on my itx board.

What fan2go gives me in this situation is the ability to combine multiple sensors for use in controlling the fans. I want to base my fan curve on the delta between water temp and ambient temp of the room. It’s pure vanity to want this but Fan2go is the best chance I have at getting this*.

As for the warning when it appears it floods out any other message. Especially when running four fans because that means it appears four times at each adjustment interval. I’d rather not flood the logs especially since it means a quick tail command to find a problem from within five minutes ago would involve tailing the last ~1200 log entries and hoping I can grep the right one.

As always thank you for your patience.

* pull request incoming 😉

I’m not sure how this created a new issue sorry for flooding your board. I’m posting from mobile and may have fat fingered something.

@markusressel
Copy link
Owner

Thx for the explanation, my bad, I thought the commander pro was more than a cable "hub" 😄

As for the warning when it appears it floods out any other message. Especially when running four fans because that means it appears four times at each adjustment interval. I’d rather not flood the logs especially since it means a quick tail command to find a problem from within five minutes ago would involve tailing the last ~1200 log entries and hoping I can grep the right one.

I understand. However, before changing anything about this log, I would really like to know if this rounding behaviour is "ok" (from a standards point of view), or not.

If it is, then fan2go needs to be able to detect each and every working PWM value beforehand (during initialization f.ex.) and ignore all other values. Implementing this would complicate the logic massively though...

If it is not, the warning is hinting at a faulty driver implementation and should also not be ignored. (but I guess thats not the case here)

As always thank you for your patience.

No worries, we are all just a very big team 🤓

@elais
Copy link
Contributor

elais commented Dec 6, 2021

Maybe introduce log levels? That way we can run the program at a different log level that would filter out warnings.

it’s a stopgap but at least we can solve this in the short term while leaving the door open for more fine grained logging if a user/developer chooses it.

@eldorel
Copy link
Author

eldorel commented Dec 7, 2021

The software gracefully detecting that a controller doesn't have a pwm_enable setting and falling back to explicitly setting a sane value when the program exits would be greatly appreciated.

This should already be the case, but maybe the logic can be improved, see here. fan2go remembers the state of the fan and tries to restore it when stopping. If this fails, it tries to set the fan to max speed.

From what I can tell, when manually run from a terminal or via init script; Stopping ffan2gan2go is currently not setting the PWM to max when stopped (nor, seemingly, setting PWM_Enable to false) .

The code for responding to SIGTERM is here

Unless there's other code I'm missing; That looks like Ctrl+C/SIGTERM only results in the INFO text "Exiting...", and the software stops immediately, leaving the PWM values set to whatever they happened to be at.

I can also 100% confirm that the pwm[1-6] files are not edited on exit, and those values persist until the software is restarted, and remain even through restart on my hardware. (so the exhibited behavior matches how I interpret that code)

please note: I'm a sysadmin, not a programmer, and this is only like the third or fourth time I've ever had to deal with GO, so I apologize for any errors or misunderstanding.

I would really like to know if this rounding behaviour is "ok" (from a standards point of view), or not.

If it is, then fan2go needs to be able to detect each and every working PWM value beforehand (during initialization f.ex.) and ignore all other values. Implementing this would complicate the logic massively though...

That seems very complicated.
Would it not be less complicated to adjust the PWM monitor logic to consider a range of results as acceptable instead of looking for a single exact value?

Mangled example: if lastSetPWM != ( currentPWM +2 || currentPWM -2)

2 points is only a 0.8% error. It might be a better idea to make that value optional or user specified, but that code isn't really verifying the correct value, it's just verifying that something else isn't making conflicting changes.

Maybe introduce log levels? That way we can run the program at a different log level that would filter out warnings.
it’s a stopgap but at least we can solve this in the short term while leaving the door open for more fine grained logging if a user/developer chooses it.

I agree with elais here.
loglevels would allow people who "know what they're doing" to reduce the log spam and make it much easier to test configs.

That said, "Something else is making changes" IS a very important error, that we don't really want to ignore.

I would prefer a solution that corrects the false positive rather than an option to ignore things that are actually potentially serious.

@elais
Copy link
Contributor

elais commented Dec 7, 2021

So looking more into it I believe the cpro is rounding to the nearest multiple of 3 - 1 (due to pwm values being index 0-255, making 0 the first number). So if your pwm is 90 it rounds down to 89, if it is 91 it rounds up to 92, so on and so forth. My guess is this is something unique to the cmdr pro.

I believe the solution to this would be to add a step size configuration where I could tell fan2go to only move in step sizes I define. In @eldorel and my case, we'd want to only move in step sizes of 3 to keep up with the pwm set by our controller. Others may run into similar issues where they have a controller that moves in steps of 2, 3, 5, etc.

The algorithm here would be something like currentPwm = pwmValue - ( pwmValue mod stepSize - 1) where we use the values of 0 or 255 exclusively to handle edge cases.

@markusressel what do you say to adding a configuration option like pwmStepSize? My guess is we would want to use it under the fans directive rather than curves.

EDIT: Tested more values and it isn't as straightforward as I thought. I'm now in favor of thresholds.

@markusressel
Copy link
Owner

Still, I wouldn't want either.

fan2go already tries to set each and every PWM value in the range [0..255] during initialization of a new fan. We just have to store which values actually work and which don't. Since setting a PWM value of 89 would result in a PWM value of 89, 89 is a usable PWM value. 90 on the other hand is not, as setting it does not result in the expected PWM value.

I already saw another behaviour of a GPU fan, where it was simply impossible (i think it printed an error?) to set a PWM value below a certain value (f.ex. 100). This is also a case where only "trial and error" and a memory of which PWM values worked would really always work. There is no step size, no threshold, just a lower boundary.

This might lead to a situation, where a (stupid) controller always sets x+1 instead of x and not a single PWM value looks "right" for fan2go. But its still the most correct way of handling it imho.

Step size is too specific, it seems like it wouldn't solve your problem
and it is nothing I would want to encourage hardware makers to do. A fan should be controllable with a "normal" range of numbers, like 0.00 - 1.00 or 0-255 or whatever. Not 0,2,5,8... who thought this was a good idea?!

Again, I am fine with fan2go "figuring out" which values work during initialization. I don't like just assuming that everything is probably most certainly maybe fine because asking for PWM 90 yields 89 which is not off enough to count it as an error...

Log levels technically already exist in fan2go, but currently there is only the --verbose flag which enables DEBUG logs. We could add more verbosity levels, but honestly I don't want people to debug fan2go using the logs. There should be a (terminal or other) UI that is able to give the required information at a glance.

@eldorel
Copy link
Author

eldorel commented Dec 7, 2021

Step size is too specific, it seems like it wouldn't solve your problem
and it is nothing I would want to encourage hardware makers to do. A fan should be controllable with a "normal" range of numbers, like 0.00 - 1.00 or 0-255 or whatever. Not 0,2,5,8... who thought this was a good idea?!

A developer trying to get a software controlled microcontroller that communicates in percentages to emulate a well supported legacy device spec that was originally designed to us an 8bit register to communicate with a discrete logic loop.

So looking more into it I believe the cpro is rounding to the nearest multiple of 3 - 1 (due to pwm values being index 0-255, making 0 the first number).

Looking at the kernel driver code, the commander's "PWM" setting on the hardware is a percentage value from 1-100%, and the driver is rounding that.
Which makes the actual mapping 1:2.55, round closest.

already saw another behaviour of a GPU fan, where it was simply impossible (i think it printed an error?) to set a PWM value below a certain value (f.ex. 100). This is also a case where only "trial and error" and a memory of which PWM values worked would really always work. There is no step size, no threshold, just a lower boundary.

I would argue that this is a completely different use case. Establishing the lower (or upper) limits of a controller is a one-time assessment to set a limit vs wasting cycles referencing a list of 'valid' values for every PWM change.

Having looked at the code for the driver, I would like to suggest a flag for fan config that enables a little rounding logic for devices that are percentage based.

Note: The commander pro isn't the only USB device on the market that works like this, it's just the first one that someone took the time to reverse engineer the USB communications for.
I have a pile of asus 'ROG fan controller' units that come with some of their motherboards that have basically the same hardware as the commander, and supposedly they also communicate with the ArmoryCrate software using a percentage.

@elais
Copy link
Contributor

elais commented Dec 7, 2021

Looking at the kernel driver code, the commander's "PWM" setting on the hardware is a percentage value from 1-100%, and the driver is rounding that.
Which makes the actual mapping 1:2.55, round closest.

So what you're saying here is if we were to map pwm value to duty somehow all our problems would be solved? It looks like the corsair's driver converts a long representing the pwm value using this macro

#define DIV_ROUND_CLOSEST_ULL(x, divisor)(		\
{							\
	typeof(divisor) __d = divisor;			\
	unsigned long long _tmp = (x) + (__d) / 2;	\
	do_div(_tmp, __d);				\
	_tmp;						\
}							\
)

@markusressel
Copy link
Owner

Looking at the kernel driver code, the commander's "PWM" setting on the hardware is a percentage value from 1-100%, and the driver is rounding that. Which makes the actual mapping 1:2.55, round closest.

Interesting, thx for looking it up.

I would argue that this is a completely different use case. Establishing the lower (or upper) limits of a controller is a one-time assessment to set a limit vs wasting cycles referencing a list of 'valid' values for every PWM change.

If you are looking for a fan controller that uses the least amount of CPU cycles, I am not sure fan2go is the best option 😛 Figuring out which PWM values "work" is also a one-time task, the interesting part would be how this knowledge would be applied when updating the PWM value. But again, although I obviously don't try to make fan2go waste resources, I also didn't start fan2go to built the smallest and fastest fan controller ever. That would be Auto: Off, PWM: 255 😆

I just want to avoid implementing multiple flags/workarounds/things for different fan controllers.
If devices don't behave all the same, I would rather implement a generic solution (like checking each and every PWM value) than some special code that detects the Commander Pro, or similar devices, and applies a "percentage rounding special sauce".

Having looked at the code for the driver, I would like to suggest a flag for fan config that enables a little rounding logic for devices that are percentage based.

I don't want the user to tell fan2go about quirks of their hardware. It should just work.
All configuration required by the user should be kept to a minimum, giving options where possible.
So, if possible, I would like to avoid a flag like that and have fan2go figure it out by itself.
I am not even sure how a user would be able to tell that he needs this flag enabled. I wouldn't know that I need it without googling about it. A Q&A in the README specifically mentioning each and every controller that uses percentages? Sounds like a maintenance nightmare to me 😅

Note: The commander pro isn't the only USB device on the market that works like this, it's just the first one that someone took the time to reverse engineer the USB communications for. I have a pile of asus 'ROG fan controller' units that come with some of their motherboards that have basically the same hardware as the commander, and supposedly they also communicate with the ArmoryCrate software using a percentage.

And there probably are (or will be) devices that use yet another range of values and the DIV_ROUND_CLOSEST_ULL method isn't applicable. I would rather built fan2go in a way that allows it to work with all of them, not only the variants we have seen.

@eldorel
Copy link
Author

eldorel commented Dec 8, 2021

Figuring out which PWM values "work" is also a one-time task, the interesting part would be how this knowledge would be applied when updating the PWM value.

Figuring that out for systems with multiple fans and pumps could take hours depending on the settle time for the device.
The current initialization routine is already a little stressful for a system with only one pump or cpu fan, but it may get into 'potentially damaging' if you have to add a validation cycle on top of that.

I just want to avoid implementing multiple flags/workarounds/things for different fan controllers.

I'm not really suggesting any device specific (or even permanent) workarounds.

There are a few things that we know.

  • Software based USB fan (and aRGB) controllers are becoming more common
  • Many of the ones on the market are using the same microcontroller as the corsair, which uses a 1-100% value
  • Even ones not using the exact same chip are likely to use atmel or cortex based CPUs, which have build in PWM timers that allow for setting the duty cycle as a percentage.
  • No matter what the hardware does, The LM_sensors kernel driver devs are going to be aiming to maintain compatibility with the long-standing 1-255 values that most linux/BSD/etc based fan/thermal controllers are expecting.

and

  • Any effective solution that can adapt to the rounded PWM values returned by a 1-100% based system should apply gracefully to any other devices of this type or similar.

I don't want the user to tell fan2go about quirks of their hardware. It should just work.
All configuration required by the user should be kept to a minimum, giving options where possible.
So, if possible, I would like to avoid a flag like that and have fan2go figure it out by itself.

A flag can be implemented in a small amount of time to account for this new hardware class and deprecated later if a more robust solution is implemented.
Elias and I are not the only people running into this, there are issues open on the fancontrol, and thinkfan issue trackers because people with windows systems are currently switching to linux thanks to windows 11 and steam's Proton making the transition slightly less painful.

I am not even sure how a user would be able to tell that he needs this flag enabled

The software could tell them during the INIT process as soon as it encounters one fan doing this and add the flag itself for each fan that it detects doing this, or alternately just add it to the help docs.
As you said yourself, you expect users to know what they're doing.

A Q&A in the README specifically mentioning each and every controller that uses percentages? Sounds like a maintenance nightmare to me

Having an extremely detailed test for "does this device use percentages" is fine.
It's just a bit inconsiderate to not allow the user a chance to go "ok, yeah fan number one requires this to be enabled, so I should just add that setting to the other 5+ fans on that controller and avoid having to wait to test all of them".

And there probably are (or will be) devices that use yet another range of values and the DIV_ROUND_CLOSEST_ULL method isn't applicable. I would rather built fan2go in a way that allows it to work with all of them, not only the variants we have seen.

As I said above, No matter what the hardware is, the LM_Sensors drivers are going to target the 1-255 value range and have a fanX_input file to report the current RPM.
That standard consistency despite different hardware implementations is the point of LM_sensors.

Allowing for some minor variation in specified PWM values (and verifying the actual RPM matches the approximate value expected) would not only allow for fan2go to gracefully adapt to those different rounding errors, it would also catch hardware errors like a dead or blocked fan running at the wrong RPM.

And seriously, modern and future hardware programmers are more likely to aim for 1-100% to represent duty cycle than they are to make up some other system, and even if they DO decided to use another system, the kernel drivers are still going to attempt to map it to the standard with the tiniest rounding error they can, but there will be rounding errors.

TL;DR Writing and executing code to test every single possible PWM value and create a blacklist for 'invalid' values for every single fan for a new type of hardware with an extremely predictable rounding error is an inefficient use of both programmer effort and the user's time.

@elais
Copy link
Contributor

elais commented Dec 8, 2021

Figuring that out for systems with multiple fans and pumps could take hours depending on the settle time for the device.

For the sake of argument I just want to say if a build is using water its on to run the fan tests in parallel because the water won't heat up fast enough to cause damaging problems. I ran my fans tests in parallel and cpu remained fine. The water temp rose to only 7°-8° above ambient.

@eldorel
Copy link
Author

eldorel commented Dec 8, 2021

Figuring that out for systems with multiple fans and pumps could take hours depending on the settle time for the device.

For the sake of argument I just want to say if a build is using water its on to run the fan tests in parallel because the water won't heat up fast enough to cause damaging problems. I ran my fans tests in parallel and cpu remained fine. The water temp rose to only 7°-8° above ambient.

How long a system can run without heating up really depends on the setup and the environment (including the user).
Additionally, there's a pretty good chance that fan2go will be able to directly control usb connected pumps soon via LM_Sensors, so I'd still argue that maintaining or reducing the initialization time should be a design concern.

Extra note:

It looks like DIV_ROUND_CLOSEST_ULL and DIV_ROUND_CLOSEST are both already used for a handful of other lm_sensors compatible hardware drivers as well for the same reason.

The are both also defined in the kernel itself, so this isn't some 'random' rounding code that someone pulled out of their rear end (recently at least).

At a quick glance, I found a handful of drivers for fan headers on intel server boards and intel gpus that I have available to test on, and with a bit of poking, they do apparently use a 1-100% duty_cycle.

@markusressel
Copy link
Owner

markusressel commented Dec 8, 2021

TL;DR Writing and executing code to test every single possible PWM value and create a blacklist for 'invalid' values for every single fan for a new type of hardware with an extremely predictable rounding error is an inefficient use of both programmer effort and the user's time.

Do you have an example / a suggestion how an algorithm that detects the use of percentage values might work? An algorithm that could automatically turn on "the switch" in fan2go which tells it to use 1-100 instead of 1-255?

@markusressel
Copy link
Owner

From what I can tell, when manually run from a terminal or via init script; Stopping ffan2gan2go is currently not setting the PWM to max when stopped (nor, seemingly, setting PWM_Enable to false) .

The code for responding to SIGTERM is here

You are right, this functionality apparently got lost in the recent rewrite, I will open a new issue for that.

@eldorel
Copy link
Author

eldorel commented Dec 8, 2021

TL;DR Writing and executing code to test every single possible PWM value and create a blacklist for 'invalid' values for every single fan for a new type of hardware with an extremely predictable rounding error is an inefficient use of both programmer effort and the user's time.

Do you have an example / a suggestion how an algorithm that detects the use of percentage values might work? An algorithm that could automatically turn on "the switch" in fan2go which tells it to use 1-100 instead of 1-255?

This is where I wish I spent more time coding things rather than installing them, I'd rather just submit code, but that'll take me weeks to get right...

My initial though is that the round_closest results are predictable, and we can use that.

So adding a sampling routine during the existing init that intentionally checks the actual PWM value for a handful of test values that would be rounded up or down to see if they return the result that was set by fan2go or the predicted rounded results.

So instead of just verifying RPM values you also check the actual PWM that was set by the driver, and then compare that list to a pre-generated list of rounding error tuples.

If the actual results match that list all the way down, then it's safe to assume a percentage device and apply the +-2 on the validation or alternately use the same rounding code as the driver to round the curve-value before passing the PWM value to the driver.

However without doing more math to verify, that only guarantees matches the 0-100% values, and would possibly fail to spot 1-100 or 0-99. (possibly. Those results might result in the rounding exactly matching, in which case we've covered an entire group of hardware.)

My second thought was to look at the deviation between expected and actual results across the test, and if every 'incorrect' value is within ~0.5% deviation and close to the expected results, then you can safely establish that this system is using some form of rounding.
That allows for setting a "percentage" flag.

Then it's possible to use that devices actual upper and lower deviation amounts as the 'wiggle room' for the Warning test code.

In the absence of more clever and math heavy ideas that involve plotting the function of the entire rounded entire results list to figure out what 255 and 0 are *actually mapped to, I think a combination of those two may be the most reliable and least painful option.

Adding the 'Is this a 100% device' code would be a functional stopgap, finding the deviation and allowing for that much "error" resolves the log spam for most devices, and then the clever solution can be implemented in a later revision after it's been found.

(and the clever solution may not be that difficult to work out, since you should be able to find the function from 5 or so values since the function should still be almost perfectly linear.
Once the program plots that function it should be trivial to find out what 255 and 0 are mapping from... )

@elais
Copy link
Contributor

elais commented Dec 8, 2021 via email

@elais
Copy link
Contributor

elais commented Dec 10, 2021

I've figured out a solution but @markusressel may not like it.

So here's our situation: many controllers like the corsair commander pro or hydro platinum normalize their pwms, mapping to a value between 0 and 100. The function to do something like that is along the lines of

// Round rounds the pwm value to a percentage of the pwm between 0 and 1 rounded
// to two decimal places.
func Round(pwm int) int {
	duty := math.Round((float64(pwm) / 255.0) * 100) / 100
	return int(math.Round(duty * 255))
}

This takes the pwm value and rounds it to a fraction of 255 that has no more than two decimal places behind zero and then rounds again right before the percentage is converted back into an integer for a nice integer between 0 and 255. The main places to add this function are in the two SetPwm functions found in file.go and hwmon.go and the updateFanSpeed function found in controller.go

func (f *fanController) UpdateFanSpeed() error {
	fan := f.fan
	target := util.Round(f.calculateTargetPwm()) // <- we round here
	if target >= 0 {
		err := f.setPwm(target)
		if err != nil {
			ui.Error("Error setting %s: %v", fan.GetId(), err)
			trySetManualPwm(fan)
		}
	}

	return nil
}
func (fan *HwMonFan) SetPwm(pwm int) (err error) {
	ui.Debug("Setting Fan PWM of '%s' to %d ...", fan.GetId(), util.Round(pwm)) 
	err = util.WriteIntToFile(util.Round(pwm), fan.PwmOutput)
	return err
}

Now the problem here is that devices that want this are only a subset of devices out there and we don't want to test for that. So to please everybody my proposed solution is to normalize all of the pwm values before they're written with SetPwm

There's good reasons to just round everything

  1. ) The PWM values will always be within 1 to 2 integers of the values calculated by our curves. In practice this will not change their shapes and even if they aren't exact there are few instances where a user will notice
    2.) This saves us the work of having to make special cases for some controllers because in practice they will almost always use either a percentage or a direct pwm value.
    3.) Users will not be able to tell that their curve has been normalized. In practice we're talking about the difference between 106 (the result of just converting the float to an integer without rounding) and 107 (the result of rounding before turning the float into an integer) in pwm values for most cases.
    4.) we can introduce a config option for users to just use the percentage instead of pwm values. I think most would rather give a percentage from 0 to 100 than an actual pwm value. I know in my case I'm usually busting out the calculator to figure out what pwm will run the fans at 30%, 50%, 75% etc.

Reasons to not do it
1.) it may mean that step configuration won't accurately reflect a number on the curve (being off by one or two values in either direction).
2.) it may break some tests.
3.) I personally am afraid I've been adding a lot of changes and you may be tired of me.

In my opinion, normalizing the values handles that warning for people who have a device with a duty of 0 to 100 and doesn't hurt those who have a device that doesn't.

I tested this on my system but I'd rather get your opinions before I submit another PR @markusressel @eldorel

@markusressel
Copy link
Owner

markusressel commented Dec 10, 2021

3.) I personally am afraid I've been adding a lot of changes and you may be tired of me.

No worries 😄 . I just cannot invest as much time into this than I would like, and this issue isn't really that straight forward to solve in a good way. I have started to work on this already, but I am currently very limited in my ability to test things on real hardware.

we can introduce a config option for users to just use the percentage instead of pwm values.

Indeed I did think about using percentage values for curves, instead of [0..255], mainly to make it easier for people to specify custom curves in the config (without the need for a calculator 😛 ).

However, the range of numbers that are computed by the curves and what value is later applied to the fans should be independent from one another. If I were to change the curve implementation to use percentage values, I would also give it more accuracy by using float values, which means it really only changes the range of numbers, but doesn't (negatively) affect accuracy. With this it would still be possible to map [0.00 - 100.00] to all values in [0..255].

Users will not be able to tell that their curve has been normalized.

I bet you they will 😝 .
It just feels wrong to always use a rounded value on an API that is supposed to work with [0..255], just because some device category does it differently. And in all seriousness, someone, somewhere at some point will open an issue about fan2go skipping some PWM values for no apparent reason 😄

I very much appreciate the continued interest in fan2go. I just need some time to evaluate the options we have discussed.

@markusressel markusressel added the enhancement New feature or request label Dec 10, 2021
@elais
Copy link
Contributor

elais commented Dec 10, 2021

That makes sense, and I have time if there's anything you want me to take a stab at. The codebase is well organized and easy to navigate so I think I can help lighten the load. Until then I believe I'm going to work off my fork just to suppress those warnings and finish up my service. I'll still track this repo as upstream and offer any help if you need any.

@eldorel
Copy link
Author

eldorel commented Dec 11, 2021

Just to followup on the logspam for later reference:
I was just opening a different ticket, and can show just how fast the logs accrue thanks to this error with screenshots now.

image

and

image

JournalD shows the total line count in the log on the bottom left corner.

That's an increase of over 2000 lines in approximately 20 minutes, with me having the tickrate and poll rates all set to 1 or 2 seconds vs the defaults...
It comes out to ~30 lines per minute, per fan

I'm also only actively controlling 3 of 6 fans on that controller at the moment.
(the other three are parked at 0 PWM since my case is <35C.)
that 3 lines per second makes it rather hard to see other errors without grep --line-buffered.

@elais
Copy link
Contributor

elais commented Dec 11, 2021

@eldorel not to endorse a fork but I "fixed" this in the main branch of my fork. I round the values like I explained in the comment up above. So if you want a stop gap until we figure out a general solution just build for my fork and use that.

@eldorel
Copy link
Author

eldorel commented Dec 11, 2021

@eldorel not to endorse a fork but I "fixed" this in the main branch of my fork. I round the values like I explained in the comment up above. So if you want a stop gap until we figure out a general solution just build for my fork and use that.

I usually try to stay with the main project on tools I use so that I can provide direct feedback without trying to ID if it's an issue from the fork.
The error messages are a minor issue that I can work around now that a few of the other issues have been addressed so quickly, but I did want to provide more context on them since they could be a problem for people who don't know how to filter journald output.

That said, I REALLY appreciate how quickly both of you are working to get these USB hid devices to work without major headaches. @markusressel @elais

Thank you both.

@markusressel
Copy link
Owner

That's an increase of over 2000 lines in approximately 20 minutes, with me having the tickrate and poll rates all set to 1 or 2 seconds vs the defaults...
It comes out to ~30 lines per minute, per fan

Well... fan2go thinks that something is really wrong with multiple fans, so thats expected. As soon as fan2go knows how to handle fans with percentages, the log shouldn't even pop up.

So to summarize:
In a working environment, there should be absolutely no log spam.
In a failing environment, things go nuts.

@elais
Copy link
Contributor

elais commented Dec 13, 2021

@markusressel would you be open to creating a feature issue for using percentages in addition to raw pwm values? I have some ideas that I'd like to run by you.

@markusressel
Copy link
Owner

markusressel commented Dec 13, 2021

@elais If you are unsure about some idea, just open a new discussion (or issue, I don't really care) and create a writeup of your proposal. We can always figure out together if the idea makes sense or not.

@markusressel
Copy link
Owner

markusressel commented Dec 17, 2021

@elais @eldorel I have worked on this issue again. Could one (or both) of you check out the feature/#64_support_rounding_hwmon_drivers branch and test if it works for you?

Make sure to delete the fan curve database before updating, as the old data will still contain every PWM value in [0..255]:

sudo systemctl stop fan2go
sudo rm /etc/fan2go/fan2go.db
sudo cp ./fan2go /usr/bin/fan2go
sudo systemctl restart fan2go

@elais
Copy link
Contributor

elais commented Dec 18, 2021

Hey I will be traveling out of town and thus away from my pc over the Christmas holidah. I will leave it on for ssh but I can't guarantee I'll be able to do much testing (because if it gets shut down or frozen I won't be able to get to it for 2 weeks)

@markusressel
Copy link
Owner

No worries, take a break and happy holidays 🎄 🎅 🤶

markusressel added a commit that referenced this issue Dec 19, 2021
…_hwmon_drivers

Feature/#64 support rounding hwmon drivers
@markusressel
Copy link
Owner

I went ahead and released this in 0.3.0. Feel free to create a new issue (or reopen this one) if the implementation isn't sufficient.

@forestofrain
Copy link

My logs were being flooded with changed by third party and it appears to be related to using OpenRGB. Since, I don't have any lights connected to the commander pro, I disabled that in OpenRGB and no longer see the log message. I have the commander core and pro in my system. The core drives the H100i elite capillex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants