CPU stuck at 400MHz with thermald enabled with kernel 5.8.16 #280
I suggest the following.
First disable thermald and reboot.
#systemctl disable thermald
reboot
#thermald --no-daemon --loglevel=debug --adaptive
and do whatever you do to get this issue.
Then copy the log from the terminal and attach it.
On Wed, 2020-11-04 at 02:16 -0800, Marc Hanisch wrote:
Hello,
I'm running into a strange case: my processor gets throttled to 400 MHz when I enable thermald. This has been happening since I upgraded to a new release of Fedora (Fedora 33), which uses kernel version 5.8.16-300.x86_64.
When I run
$ cpupower frequency-info
with thermald enabled, I get the following output:
analyzing CPU 0:
driver: intel_pstate
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: Cannot determine or is not supported.
hardware limits: 400 MHz - 2.00 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 400 MHz and 2.00 GHz.
The governor "powersave" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 400 MHz (asserted by call to kernel)
boost state support:
Supported: no
Active: no
The system becomes unresponsive and very slow.
When I disable thermald with
$ sudo systemctl disable thermald
everything works fine again:
$ cpupower frequency-info
analyzing CPU 0:
driver: intel_pstate
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: Cannot determine or is not supported.
hardware limits: 400 MHz - 2.00 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 400 MHz and 2.00 GHz.
The governor "powersave" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 2.00 GHz (asserted by call to kernel)
boost state support:
Supported: no
Active: no
My system is an ASUS laptop (model no. X541UAK) with an Intel Core i3 processor. thermald hasn't caused me any problems in previous years. To make sure that no other service is causing the problem, I've removed powertop as well as tlp.
Any suggestions? Thanks in advance!
I've done exactly that and the laptop was slow again immediately after starting thermald. I've attached the output, many thanks! :-)
Very strange. Thermald didn't take any action here.
Is this the full log?
When the slowdown happens, what are the values of the following?
#rdmsr -a 0x774
#grep -r . /sys/class/powercap/intel-rapl/intel-rapl\:0/*
#grep -r . /sys/class/powercap/intel-rapl-mmio/intel-rapl-mmio\:0/*
Also when you disable and reboot, what are the values?
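For readers comparing these values: MSR 0x774 is IA32_HWP_REQUEST, the per-CPU hardware P-state request that intel_pstate programs when HWP is active. Its low four bytes hold the minimum, maximum and desired performance levels and the energy-performance preference (EPP). The bash sketch below decodes those fields; the function name `hwp_decode` is hypothetical, and it assumes `rdmsr -a 0x774` prints one hex value per line (the msr-tools default). The performance values are abstract ratios, not MHz.

```shell
#!/usr/bin/env bash
# Decode IA32_HWP_REQUEST (MSR 0x774) values read from stdin.
# Assumption: one hex value per line, as printed by `rdmsr -a 0x774`;
# an optional leading "0x" is tolerated. Lines are numbered in the order
# read, which matches CPU numbers only if every CPU is online.
hwp_decode() {
    local raw val idx=0
    while read -r raw; do
        [ -z "$raw" ] && continue
        val=$(( 16#${raw#0x} ))
        # Bits 7:0 = min perf, 15:8 = max perf, 23:16 = desired perf, 31:24 = EPP
        printf 'line%d min=%d max=%d desired=%d epp=%d\n' \
            "$idx" \
            $(( val & 0xff )) \
            $(( (val >> 8) & 0xff )) \
            $(( (val >> 16) & 0xff )) \
            $(( (val >> 24) & 0xff ))
        idx=$(( idx + 1 ))
    done
}
```

Usage, after loading the msr module with `sudo modprobe msr`: `sudo rdmsr -a 0x774 | hwp_decode`. Running it once while slow and once after disabling thermald and rebooting makes a change in the max field easy to spot.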
Thanks,
Srinivas
On Wed, 2020-11-04 at 12:28 -0800, Marc Hanisch wrote:
I've done exactly that and the laptop was slow again immediately after starting thermald. I've attached the output - many thanks! :-)
thermald-output.txt<https://github.com/intel/thermal_daemon/files/5490335/thermald-output.txt>
Any update?
Sorry, I will post more details later this week! :-)
So I finally ran the commands you mentioned. I've attached two files: one with the output when... There are some differences between the outputs, but unfortunately I can't interpret them. Thank you so much for your help :-)
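One way to compare such dumps is to pull out only the RAPL power limits. In the powercap sysfs interface, each `constraint_N_power_limit_uw` file holds a limit in microwatts (constraint 0 is usually the long-term limit, PL1, and constraint 1 the short-term limit, PL2), and `enabled` shows whether the zone is active. The bash sketch below, with the hypothetical name `rapl_limits`, filters `grep -r .` output and prints each limit in watts. It assumes every line has the form `<path>:<value>`, which is how grep prefixes file names when it reads several files.

```shell
#!/usr/bin/env bash
# Print RAPL power limits (in watts) and zone "enabled" flags from
# `grep -r . /sys/class/powercap/...` output read on stdin.
# Assumption: each line is "<path>:<value>"; the path itself may contain
# colons (e.g. intel-rapl:0), so split on the LAST colon.
rapl_limits() {
    local line path value
    while IFS= read -r line; do
        path=${line%:*}     # everything before the last colon
        value=${line##*:}   # everything after the last colon
        case $path in
            */constraint_*_power_limit_uw)
                # Convert microwatts to watts with three decimals.
                printf '%s = %d.%03d W\n' "${path#/sys/class/powercap/}" \
                    $(( value / 1000000 )) $(( (value % 1000000) / 1000 ))
                ;;
            */enabled)
                printf '%s = %s\n' "${path#/sys/class/powercap/}" "$value"
                ;;
        esac
    done
}
```

Usage: `grep -r . /sys/class/powercap/intel-rapl/intel-rapl:0/* 2>/dev/null | rapl_limits`, run once in the slow state and once in the normal state, then diff the two results.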
In one window:
echo 0 > /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/enabled
while true; do echo /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/enabled; sleep 1; done
And check whether you still get into this state. I guess you have to run something to get into this state.
Mistake in the above while command: change echo to "cat".
If the above doesn't fix it, then also do this step:
echo 0 > /sys/class/powercap/intel-rapl-mmio/intel-rapl-mmio:0/enabled
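Putting the two steps above together with the `cat` correction, the bash sketch below disables each RAPL zone passed to it and then prints every zone's `enabled` value once per second, so you can see whether something (for example thermald) turns a zone back on. The function name `rapl_disable_watch` and its round-count argument are hypothetical; writing these sysfs files requires root.

```shell
#!/usr/bin/env bash
# Write 0 to each given RAPL "enabled" file, then print every file's value
# once per second. First argument: number of rounds (0 = run until Ctrl-C).
# Remaining arguments: paths such as
#   /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/enabled
rapl_disable_watch() {
    local rounds=$1 i=0 f
    shift
    for f in "$@"; do
        echo 0 > "$f" || return 1
    done
    while [ "$rounds" -eq 0 ] || [ "$i" -lt "$rounds" ]; do
        for f in "$@"; do
            printf '%s: %s\n' "$f" "$(cat "$f")"
        done
        i=$(( i + 1 ))
        # Skip the final sleep when a fixed number of rounds was requested.
        if [ "$rounds" -eq 0 ] || [ "$i" -lt "$rounds" ]; then
            sleep 1
        fi
    done
}
```

Usage from a root shell: `rapl_disable_watch 0 /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/enabled /sys/class/powercap/intel-rapl-mmio/intel-rapl-mmio:0/enabled`.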
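I have added a fix. Please help to verify. This is on a special branch:
https://github.com/intel/thermal_daemon/commits/asus_pl2_fix
After checking out thermald, change the branch to
$ git checkout remotes/origin/asus_pl2_fix -b asus_pl2_fix
Then follow README.txt for the build procedure.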
Hey @spandruvada, I am having the same issue on a Lenovo S730-13IWL with an Intel Core i7-8565U (4 x 1.8 - 4.6 GHz), 16 GB RAM, and Intel UHD Graphics 620. I see that the branch is tagged for Asus, but I thought I'd give it a shot and see if it worked for me. After building and installing it, I am still having the same issue as @dubst3pp4.
With thermald stopped and disabled:
With thermald started and enabled (both the latest for F33 and the one from this branch):
Thanks for looking into this!
Let's start with:
Disable thermald
#rdmsr -a 0x774
#thermald --no-daemon --loglevel=info --adaptive
#rdmsr -a 0x774
Send the thermald log and also the sysfs dumps from before and after.
We can address this ASAP.
For the sake of clarity, I ran this with thermald built from the asus_pl2_fix branch. Logs are attached as requested. Let me know if there is anything else you need.
Attached is another thermald log captured during sustained throttling at 400 MHz.
Edit: I hadn't noticed the kernel version in this issue. My packages are fully updated on F33 and I am running 5.9.8-200.fc33.x86_64.
Please unzip the attached patch and apply it with git apply:
git apply test_lenovo_patch.diff
Thanks @spandruvada, I've just built thermald and started it. The system runs without problems now! When I run
cpupower frequency-info
the output varies depending on the load (and is not stuck at 400 MHz).
Do you need any additional info? Thanks for your support!
After applying this patch I was not able to replicate my issue and it appears to be fixed. I gave it as much load as it could take and the thermald performance was great. I have included the log in case it yields any more useful info. Thank you so much for your efforts!
Thanks. I will merge this as part of the 2.4 release, so please check whether you still have the issue after the release.
On Wed, 2020-11-18 at 05:34 -0800, Marc Hanisch wrote:
I have added a fix. Please help to verify. This is on a special branch.
https://github.com/intel/thermal_daemon/commits/asus_pl2_fix
After checking out thermald, change the branch to
$ git checkout remotes/origin/asus_pl2_fix -b asus_pl2_fix
Then follow README.txt for the build procedure.
Thanks @spandruvada<https://github.com/spandruvada>, I've just built thermald and started it. The system runs without problems now! When I run
cpupower frequency-info
the output varies depending on the load (and is not stuck at 400MHz):
analyzing CPU 0:
driver: intel_pstate
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: Cannot determine or is not supported.
hardware limits: 400 MHz - 2.00 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 400 MHz and 2.00 GHz.
The governor "powersave" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 1.66 GHz (asserted by call to kernel)
boost state support:
Supported: no
Active: no
Do you need any additional info? Thanks for your support!
Thanks for the quick turnaround. I will roll out this fix in the 2.4 release.
I see in the log that it is fixed.
On Wed, 2020-11-18 at 06:13 -0800, Rob Musial wrote:
Please unzip the attached patch and apply it with git apply:
git apply test_lenovo_patch.diff
test_lenovo_patch.zip<https://github.com/intel/thermal_daemon/files/5559976/test_lenovo_patch.zip>
After applying this patch I was not able to replicate my issue and it appears to be fixed. I gave it as much load as it could take and the thermald performance was great. I have included the log in case it yields any more useful info.
Thank you so much for your efforts!
thermald.1.log<https://github.com/intel/thermal_daemon/files/5560537/thermald.1.log>
@dubst3pp4
I know you addressed this to @dubst3pp4 but I gave it a run as well and was not able to replicate my previous errors. To confirm, this had the Lenovo fix as well? Thanks!
On Sat, 2020-11-21 at 21:29 -0800, Rob Musial wrote:
@dubst3pp4<https://github.com/dubst3pp4>
Please test the 2.4-pre release. This has been pushed to the master branch. I see that we can improve further on your platform.
Check the version with "thermald -v".
Please attach a log.
I know you addressed this to @dubst3pp4<https://github.com/dubst3pp4> but I gave it a run as well and was not able to replicate my previous errors. To confirm, this had the Lenovo fix as well?
Yes. This has all the fixes which were high priority.
Thanks,
Srinivas
Thanks!
Excellent, thanks. I just wanted to make sure I wasn't getting a false result and that my testing was correct. I've been running it for about 24 hours and it has been great. Thanks again for working on this so quickly. Really impressed.
Thanks for your kind words.
On Sun, 2020-11-22 at 11:30 -0800, Rob Musial wrote:
Excellent, thanks. I just wanted to make sure I wasn't getting a false result and that my testing was correct. I've been running it for about 24 hours and it has been great. Thanks again for working on this so quickly. Really impressed.
I will test this tomorrow! Sorry once again for the slow response time ;-)
Perhaps it's you, and not your PC, that's being throttled down to 400MHz @dubst3pp4 ? 😉
@ferdnyc Yes, indeed, I'm a little bit throttled because of some stack overflows ;-)
@spandruvada I've just built the new version, checked that I'm using the correct binary version, and ran thermald with the debug log level. Everything seems to work very well 👍 See my logfile attached.
I can only second @robmusial's statement: thank you so much for your work! I have rarely had so much support from the maintainers of an OSS project! 😀
Thanks for the test and compliments. I will release this version soon.
On Tue, 2020-11-24 at 23:30 -0800, Marc Hanisch wrote:
@ferdnyc<https://github.com/ferdnyc> Yes, indeed, I'm a little bit throttled because of some stack overflows ;-)
@spandruvada<https://github.com/spandruvada> I've just built the new version, checked that I'm using the correct binary version, and ran thermald with the debug log level. Everything seems to work very well 👍 See my logfile attached.
I can only second @robmusial<https://github.com/robmusial>'s statement: thank you so much for your work! I have rarely had so much support from the maintainers of an OSS project! 😀
log_thermald.txt<https://github.com/intel/thermal_daemon/files/5595038/log_thermald.txt>
log_version.txt<https://github.com/intel/thermal_daemon/files/5595039/log_version.txt>
Released version 2.4 with changes.