Dell Latitude 5420 (TGL) throttled to 400MHz CPU / 100MHz GPU after 30s #293
Comments
I neglected to mention: running an identical workload on windows completes with no degradation in power. The system gets much warmer and the fans run at a clearly higher speed. Based on this observation, it seems clear that the system is not being limited by some physical thermal problem. |
On Fri, 2021-03-05 at 12:34 -0800, Mark Janes Intel wrote:
Kernel: 5.11.3
Debian: Testing
thermald: 2.4.3 (debian unstable)
processor: i7-1185G7 -- 28 W TDP
After running power-intensive workloads for a short amount of time,
the CPU and/or GPU will be throttled down drastically to ~10% of
peak.
Running turbostat reveals that the peak power is ~16W, far below
the TDP limit.
Running lm-sensors shows that the peak temp is ~50C, far below the
limit.
After reading #291 and #280, I enabled debug logs for thermald.
thermald.log
@spandruvada let me know if more information is needed. I can also
bring the system to you in JF1. Mesa team will be using this laptop
model for perf analysis.
If this is the complete log, then, as you observed, none of the
temperatures triggered any throttling.
Bring the system, we can take a look.
Thanks.
|
On Fri, 2021-03-05 at 12:37 -0800, Mark Janes Intel wrote:
I neglected to mention: running an identical workload on windows
completes with no degradation in power. The system gets much warmer
and the fans run at a clearly higher speed. Based on this
observation, it seems clear that the system is not being limited by
some physical thermal problem.
Probably some setting that Windows is aware of.
We can't compare with Windows, as we don't support several of the
conditions in the table on this system, so we are using best effort.
In particular, the power slider and probably the fan control.
So we need to find out what else we can do within these limitations.
Thanks.
|
I see different behavior between version 2.4.3 built from this repository and the version in Debian. The power limit is not getting set in the Debian version. So is Debian backporting patches? If that is the case, they should use a different private version. |
I'll sort that out first thing Tuesday. |
@spandruvada thanks for the work to figure out why this system was turning the GPU down to 100MHz. Your test branch improves the situation substantially, although it looks like there is still a long way to go. lm-sensors reports that the package max temp is 100 degrees Celsius. Is that accurate/realistic? If so, then it seems like thermald should wait longer before cutting power. If not, then it seems like thermald could settle on a much higher power level for the package: at 12W, the package temperature drops lower than necessary and performance suffers. I used Unigine Heaven for this data point. I took a look at the Thermal Analysis Tool on Windows, but I couldn't see how to get similar data from that platform. If you can give me some pointers, I should be able to at least understand what frequency/power levels Windows achieves, and what the stable max temp is. |
When I booted to the windows partition, I noticed that updates were running in the background, which can perturb performance measurements. I let the system complete a full software update, which updated the firmware on the device. |
So, the power slider condition we could support (either using a default value or pulling a value from However, that would only help if we can resolve the |
I've hit the same issue. @benzea does thermald 2.4.4 resolve the issue? I've built 2.4.4 for Fedora 34, installed it, and now have an almost constant 1700MHz instead of 400MHz. That's fine, but my CPU temp is still too low (~54C), so I am sure that the CPU can reach a higher clock speed. Is that possible somehow? |
If it will help - that's a log from
Laptop: Dell Latitude 5420 with 11th Gen Intel(R) Core(TM) i7-1165G7 CPU. |
Oh, a newer thermald for Fedora would help? Sorry about that. I thought I had picked up the important patches downstream already (even if I had an older version). I can update the package so that others benefit from that. |
Yeah, 2.4.4 helps somewhat on Fedora but does not completely resolve the issue. Without thermald 2.4.4 (with an older thermald version or without thermald at all) the CPU is still downclocked to 400MHz after ~30 secs. With thermald 2.4.4 the highest clock is 1700MHz. It would be awesome if you built thermald 2.4.4 for Fedora :) It's still too low, since the usual clock for the CPU is 2800MHz. And I have no idea how it can be fixed :( |
@benzea any news about Fedora updates? |
On its way now. |
I am not familiar with modern Linux CPU scheduling but I think the real root of the issue is some bugs in Maybe anyone from thermald team can provide more information. I will try to test another Dell Latitude 5420. Also in a few days I'll test Dell Latitude 5410 (hope it'll work better). By the way - with modern Intel CPUs is using Thermald necessary or not? |
Please don't jump to such conclusions. The problem is that we need to do thermal management in userspace. To do so, we need to parse data from ACPI which we are not fully implementing because Intel is not publishing the specification. And, on top of that, there may also be vendor specific things. i.e. probably
Yes. |
@benzea Thanks! Can you please describe to me a little bit more, what is the real difference in thermal management between the
Do you have any suggestions, how can I debug it? Maybe there is some already existing guide for it. I am ready to invest some time into it and assist you as much as I can. |
Not really. You can enable debug logging for thermald and it'll dump more detailed information. It might be possible to guess what the condition is based on by looking at the values and the various limits that are being applied. At the end, if we can just emulate a sane default value, we might not even need to know the exact meaning. For power-slider we just assume a "balanced" performance right now for example. |
I pushed another change to fix the performance gap once you update BIOS on this system. |
Absolutely the same issue with the Latitude 7520 |
If the issue is the same on the 7520, does the latest thermald fix it? |
The same as for @zamazan4ik
With the latest version the CPU is stuck at 1800MHz; without thermald, at 400MHz |
Since now I have Dell Latitude 5410 - I cannot test the latest thermald on 5420. I'll try to test the latest thermald on the 5410. I hope @benzea ported latest changes to the Fedora version. |
Fedora 34 and 35 both have thermald 2.4.6 currently. |
I've attached a debug log with the latest (2.4.6) version of thermald. Not sure if it's helpful. In the log we can see the frequency dropping to 1800MHz (temp down to 55 from 73) after a few seconds of
|
Is this log with --adaptive option?
…On Tue, 2021-06-29 at 04:56 -0700, Dmitry Rubtsov wrote:
I've attached debug log with the latest(2.4.6) version of thermald.
Not sure if it's helpful
In the log we can see dropping frequency to 1800mhz(temp down to 55
from 73) after a few seconds of stress -c 8
thermald.log
|
No, I attached a new one with the adaptive option |
The concern is that there are no sensors:
[WARN]Unable to find a zone for TSKN
[1624968504][WARN]Unable to find a zone for NGFF
[1624968504][WARN]Unable to find a zone for TMEM
[1624968504][WARN]Unable to find a zone for TMEM
[1624968504][WARN]Unable to find a zone for TMEM
[1624968504][WARN]Unable to find a zone for TMEM
[1624968504][WARN]Unable to find a zone for TSSD
[1624968504][DEBUG]check trip zone:0:0
What is the kernel version?
Check
/sys/class/thermal/thermal_zone*/type
if these sensors exist.
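
The check suggested above can be scripted. This is only a rough sketch: it reads every `/sys/class/thermal/thermal_zone*/type` file and reports which of the sensor names from the warnings (TSKN, NGFF, TMEM, TSSD) have no matching zone. The function names and the `root` parameter are made up for illustration and are not part of thermald.

```python
from pathlib import Path

# Sensor names copied from the "Unable to find a zone" warnings in the log.
EXPECTED = ("TSKN", "NGFF", "TMEM", "TSSD")


def thermal_zone_types(root="/sys/class/thermal"):
    """Return the contents of every thermal_zone*/type file under root."""
    types = []
    for type_file in sorted(Path(root).glob("thermal_zone*/type")):
        types.append(type_file.read_text().strip())
    return types


def missing_sensors(zone_types, expected=EXPECTED):
    """Return the expected sensor names that no thermal zone exposes."""
    present = set(zone_types)
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    zones = thermal_zone_types()
    print("zones found:", zones)
    print("missing:", missing_sensors(zones))
```

If the list of missing names is not empty, the kernel is not exposing the sensors that the thermal configuration refers to.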
…On Tue, 2021-06-29 at 05:10 -0700, Dmitry Rubtsov wrote:
no, I attached a new with adaptive option
thermald-adaptive.log
|
|
Maybe try updating to the latest BIOS.
This doesn't show the sensors described in the thermal configuration.
Do you see the driver loaded?
lsmod | grep -i int3
What is the output of
ls /sys/bus/platform/devices/
Thanks.
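
For anyone who wants to collect both answers in one go, here is a small sketch of the same two checks. It reads `/proc/modules` (the file `lsmod` gets its list from) and lists entries under `/sys/bus/platform/devices/`. The default prefixes `INT34` and `INTC10` are my assumption about how the relevant device IDs start; adjust them if your system names the devices differently. The function names are hypothetical.

```python
from pathlib import Path


def loaded_modules_matching(needle, proc_modules="/proc/modules"):
    """Return names of loaded kernel modules containing needle (case-insensitive)."""
    path = Path(proc_modules)
    if not path.exists():
        return []
    lines = [line for line in path.read_text().splitlines() if line.strip()]
    names = [line.split()[0] for line in lines]  # first column is the module name
    return [name for name in names if needle.lower() in name.lower()]


def platform_devices_matching(prefixes=("INT34", "INTC10"),
                              root="/sys/bus/platform/devices"):
    """Return platform device names starting with any prefix (prefixes are an assumption)."""
    path = Path(root)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.name.startswith(tuple(prefixes)))


if __name__ == "__main__":
    print("modules:", loaded_modules_matching("int3"))
    print("devices:", platform_devices_matching())
```

An empty module list would mean that none of the int340x thermal drivers are loaded.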
…On Tue, 2021-06-29 at 05:35 -0700, Dmitry Rubtsov wrote:
***@***.*** ~]# cat /sys/class/thermal/thermal_zone*/type
INT3400 Thermal
TCPU
iwlwifi_1
x86_pkg_temp
***@***.*** ~]# uname -a
Linux dell 5.12.13-arch1-2 #1 SMP PREEMPT Fri, 25 Jun 2021 22:56:51
+0000 x86_64 GNU/Linux
|
How long did you run the stress test and at what were the temperatures? |
@sebastianha 2 minutes. 67C. It's very unstable. Just now I got 3700 MHz at 97C for one minute, and then it dropped to 2300 MHz. The result is different each time. Sometimes it gets stuck at 400 MHz |
Thanks, that is really strange behaviour. Will test it as soon as I have time. |
Has anyone tried a solution like https://www.ultrabookreview.com/14875-fix-throttling-xps-15/ on the 5420 or 7420? |
As it works correctly for me when using Windows, I don't assume a hardware problem here. With Windows and ThrottleStop I get constant high frequencies at high temperatures. Of course, with new thermal paste you will hold the higher speeds a little longer, but currently my problem is the extreme throttling, which only occurs on Linux. |
On Linux on the 7320, no matter what I do, I can't reach 60°C or more on the CPU, and the cooling fan won't hit its fastest speed even in ultra performance mode. So I believe that it has nothing to do with temperature. I also played with throttled: there is some perfmon output which says that the system is throttled because of power. It sounds more probable that something is throttling to keep power consumption under some level. The issue is that it's not possible to move that level (and it doesn't matter if the laptop is powered from a dock, a travel charger or the battery; the behaviour is the same). |
Open new ticket as current was closed. |
I've just upgraded the BIOS on the 7320 to 1.14.1 and there is no change. It still downscales to 1800MHz under load, with the CPU temperature at ~50°C. Anything new from the Dell guys? :-) |
Same here, sadly nothing new from Dell :( |
I do have a slight update from Dell, looks like the remote Linux engineering team has agreed to take a look at this issue, but they don't currently have a local engineer assigned to interface with this team yet, so I don't have any further information. I have been advised that there is no guarantee that this will be fixed, and even hoping for a fix should be cautioned. |
Dell is collecting information: https://www.dell.com/community/Latitude/Latitude-5420-7420-7520-CPU-Throttling-Issue-on-Linux/m-p/8129749/highlight/true#M39458 |
@sebastianha since I have no Dell account, can you please re-post information from me to this Dell thread? Thanks in advance! Model: Dell Latitude 5410 |
done |
A finding from my side: When booting without thermald the throttling goes down to 400MHz. When starting systemd it is higher and throttles to 1800MHz. I have the same effect when not using thermald but only disabling "intel-rapl:0":
Then the CPU does not go down to 400MHz anymore! I did a strace on thermald and checked for changes in /sys/. Then I did it step by step and broke it down to the line above. I still think that thermald is not in control of the CPU at all on these notebooks. My hypothesis: Currently system is only triggering that the CPU is no longer limited to 400MHz but everything else has no effect. |
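
The exact command used above was not preserved in this thread, so the following does not try to reproduce it. It is only a read-only sketch for inspecting the state of the `intel-rapl:0` powercap zone before and after such a change. It assumes the standard powercap sysfs layout (`enabled`, `constraint_N_name`, `constraint_N_power_limit_uw`); the function name is hypothetical.

```python
from pathlib import Path


def read_rapl_zone(zone="/sys/class/powercap/intel-rapl:0"):
    """Return the enabled flag and the power limits (in watts) of one powercap zone."""
    base = Path(zone)
    info = {"enabled": None, "limits_w": {}}
    enabled = base / "enabled"
    if enabled.exists():
        info["enabled"] = enabled.read_text().strip() == "1"
    for limit_file in sorted(base.glob("constraint_*_power_limit_uw")):
        index = limit_file.name.split("_")[1]  # "constraint_0_..." -> "0"
        name_file = base / f"constraint_{index}_name"
        name = name_file.read_text().strip() if name_file.exists() else index
        # The sysfs value is in microwatts.
        info["limits_w"][name] = int(limit_file.read_text()) / 1_000_000
    return info


if __name__ == "__main__":
    try:
        print(read_rapl_zone())
    except OSError as err:
        # Some systems restrict access to powercap files.
        print("could not read powercap zone:", err)
```

Comparing the output with and without thermald running should show whether the zone is enabled and which limits are in effect in each case.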
#318 (comment) With enabled 2.4.8 thermald.service I got stable 1500MHz. Current BIOS version: 1.15.1 |
All commenters need to start mentioning their Linux kernel version. Many TOPower features were incorporated into the Linux kernel over revisions up to 5.15. |
Sorry for being a bit off topic, but did Dell just silently delete this post? I see message not found when I go there and it vanished from my subscriptions in account settings. |
Same for me, the post vanished and is not appearing in the subscriptions in my account settings. |
That's ridiculous!! They also deleted related post here: |
They could collect all our posts in one thread, or we all just create a new thread each 😇 |
Well, people recommending Lenovo on Dell's forum probably didn't help. Still, this is totally ridiculous. I'd get it if they just deleted the Lenovo promotion posts (which were totally justified, IMO), but nuking a whole conversation where their own community manager asked the community for help, which the community happily provided, is just insane. |
From Dell support:
(Edit) And a second one:
|
I am on 5.16.8, issue still persists. Seems Dell does not play nicely with Linux kernels and updates. |
5.16.9, issue exists. |
5.16.10-1.el8.elrepo.x86_64 it still exists here as well. There's one other thing that just came to mind: the fan sensors don't work. In all of my previous Dell laptops, the system is able to read fan speed from lm_sensors. In this case the fan speed is not available. I wonder if the lack of fan speed data is causing thermald to make some assumptions that aren't correct. |
I have a Dell Latitude 5420 running Ubuntu 20.04 with kernel 5.14.0-1024-oem and BIOS 1.15.1. Running stress all cores start at 4,3GHz and then stabilize at ~3,8GHz until I hit exactly 5 minutes of stress test. At this point, all cores go down to 2,5GHz and the cores stay at ~60ºC. |
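
To make reports like this one comparable, the time of the drop can be measured instead of eyeballed. Below is a minimal sketch, not a thermald tool: it reads per-core frequencies from `cpufreq/scaling_cur_freq` (in kHz) and, given a list of timestamped samples, finds when the frequency first stays below a fraction of the observed peak. The `fraction` and `hold` defaults are arbitrary values of mine.

```python
from pathlib import Path


def read_cpu_mhz(root="/sys/devices/system/cpu"):
    """Return the current frequency of each CPU in MHz (sysfs reports kHz)."""
    freqs = []
    for f in sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq")):
        freqs.append(int(f.read_text()) / 1000)
    return freqs


def first_sustained_drop(samples, fraction=0.75, hold=3):
    """Find when the frequency first stays below fraction * peak for `hold` samples.

    samples is a list of (seconds, mhz) pairs in time order.
    Returns the timestamp of the first sample of that run, or None.
    """
    if not samples:
        return None
    peak = max(mhz for _, mhz in samples)
    threshold = fraction * peak
    run_start = None
    run_len = 0
    for t, mhz in samples:
        if mhz < threshold:
            if run_len == 0:
                run_start = t
            run_len += 1
            if run_len >= hold:
                return run_start
        else:
            run_len = 0  # a recovery resets the run
    return None


if __name__ == "__main__":
    print("current MHz per CPU:", read_cpu_mhz())
```

Sampling `read_cpu_mhz()` once per second during a stress run and passing the averaged values to `first_sustained_drop()` gives a number that can be compared across kernel and BIOS versions.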
That is definitely different behavior than on my 7320. I can see similar behavior when using Windows. There, too, there is top speed at high temperatures for a few minutes, and then it throttles and goes down to ~70°C. |
To clean up things I opened a new ticket only for the Latitude 7320 with all my current findings: #341 |
Kernel: 5.11.3
Debian: Testing
thermald: 2.4.3 (debian unstable)
processor: i7-1185G7 -- 28 W TDP
After running power-intensive workloads for a short amount of time, the CPU and/or GPU will be throttled down drastically to ~10% of peak.
Running turbostat reveals that the peak power is ~16W, far below the TDP limit.
Running lm-sensors shows that the peak temp is ~50C, far below the limit.
After reading #291 and #280, I enabled debug logs for thermald.
thermald.log
@spandruvada let me know if more information is needed. I can also bring the system to you in JF1. Mesa team will be using this laptop model for perf analysis.