Pi3B thermal throttling #1337

Closed
clivem opened this issue Mar 10, 2016 · 30 comments


@clivem
Contributor

commented Mar 10, 2016

You might want to consider throttling the Pi3B lower than 600MHz from the VC firmware.

This is with a 10mm high Pimoroni aluminium heatsink fitted to the Pi3B, in the official case, with all 4 CPU cores being stressed. It appears that even though the temp is just shy of 91degC, the firmware won't throttle any lower than the base 600MHz.

$ vcgencmd measure_temp && vcgencmd measure_clock arm
temp=90.8'C
frequency(45)=600000000
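
For anyone wanting to script the check above, here is a minimal sketch that turns the two readings into plain numbers. The function names `parse_temp` and `parse_clock_mhz` are made up for illustration; only `vcgencmd measure_temp` and `vcgencmd measure_clock arm` come from this thread, and the parsing assumes the output format shown above.

```sh
# Convert raw vcgencmd output into plain numbers.

# "temp=90.8'C" -> "90.8"
parse_temp() {
    local raw="${1#temp=}"    # strip the "temp=" prefix
    raw=${raw%%\'*}           # strip everything from the ' onwards
    printf '%s\n' "$raw"
}

# "frequency(45)=600000000" -> "600" (whole MHz, integer division)
parse_clock_mhz() {
    local hz="${1#*=}"        # keep everything after the "="
    printf '%s\n' "$(( hz / 1000000 ))"
}

# Only query the hardware when vcgencmd is actually available.
if command -v vcgencmd >/dev/null 2>&1; then
    echo "temp: $(parse_temp "$(vcgencmd measure_temp)") C"
    echo "arm:  $(parse_clock_mhz "$(vcgencmd measure_clock arm)") MHz"
fi
```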

@popcornmix

Collaborator

commented Mar 10, 2016

Are you testing with latest rpi-update firmware? There have been some tweaks since launch.
Can you show output of vcgencmd get_config int and say what you are running to stress it?

@clivem

Contributor Author

commented Mar 10, 2016

cpuburn-a53 generates the most heat!

$ vcgencmd version
Mar 7 2016 17:08:55
Copyright (c) 2012 Broadcom
version 552adf40d2c18ab95fbfbbca990d303a170f3d74 (clean) (release)

$ vcgencmd get_config int
arm_freq=1200
audio_pwm_mode=1
audio_sdm_mod_order=2
config_hdmi_boost=5
core_freq=250
desired_osc_freq=0x36ee80
disable_audio_dither=1
disable_commandline_tags=2
disable_l2cache=1
emmc_pll_core=1
force_eeprom_read=1
force_pwm_open=1
framebuffer_ignore_alpha=1
framebuffer_swap=1
gpu_freq=300
hdmi_drive=2
hdmi_force_cec_address=65535
init_uart_clock=0x2dc6c00
lcd_framerate=60
max_usb_current=1
over_voltage_avs=0x1cfde
overscan_bottom=47
overscan_left=47
overscan_right=47
overscan_top=47
pause_burst_frames=1
program_serial_random=1
sdram_freq=450
sdtv_aspect=1
temp_limit=85
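
The `temp_limit=85` entry above is the threshold the rest of this thread keeps referring to. As a rough sketch, it can be pulled out of the `vcgencmd get_config int` output and compared against the current reading. The helper names `config_value` and `over_limit` are hypothetical, and the code assumes the `key=value` layout shown in the dump above.

```sh
# Print the value of one key from "key=value" lines on stdin.
# $1 is the key to look up, e.g. temp_limit.
config_value() {
    awk -F= -v key="$1" '$1 == key { print $2 }'
}

# Succeeds (exit status 0) when the temperature exceeds the limit.
# Both arguments are in degrees C; awk handles the decimal comparison.
over_limit() {
    awk -v t="$1" -v lim="$2" 'BEGIN { exit !(t > lim) }'
}

# Only query the hardware when vcgencmd is actually available.
if command -v vcgencmd >/dev/null 2>&1; then
    limit=$(vcgencmd get_config int | config_value temp_limit)
    temp=$(vcgencmd measure_temp | sed "s/^temp=//; s/'C//")
    if over_limit "$temp" "$limit"; then
        echo "over limit: ${temp} C > ${limit} C"
    fi
fi
```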

@clivem

Contributor Author

commented Mar 10, 2016

$ vcgencmd measure_temp && vcgencmd measure_clock arm
temp=90.8'C
frequency(45)=600000000

$ vcgencmd version
Mar 9 2016 18:12:03
Copyright (c) 2012 Broadcom
version 3a754304b032a5298ee7889b179c667bbc75dec5 (clean) (release)

@popcornmix

Collaborator

commented Mar 10, 2016

Can you confirm that vcgencmd measure_volts is returning 1.2V

@clivem

Contributor Author

commented Mar 10, 2016

$ vcgencmd measure_volts
volt=1.2000V

@robingroppe


commented Mar 11, 2016

Wow. That thing is getting really hot. I cannot get it any hotter than 70deg C with the powersave governor selected.

@clivem

Contributor Author

commented Mar 11, 2016

Yep, after leaving it overnight, it's now > 100degC, while throttled to 600MHz. I really do think the firmware should throttle lower than 600MHz if temp > 85degC.

Fri 11 Mar 11:16:04 GMT 2016
temp=101.0'C
frequency(45)=600000000

@popcornmix

Collaborator

commented Mar 11, 2016

On my pi3 with your set up, the temperature is constrained to ~85'C (86'C is the highest I see). The arm is mostly held at 600MHz but occasionally gets to about 800MHz.
This is without a heat sink and with no case. What effect does removing the case lid and/or removing the whole case have?

Can you test with this firmware: https://dl.dropboxusercontent.com/u/3669512/temp/firmware_thermal.zip
It should allow the arm to be limited down to 300MHz. Would be interesting to know how low it gets for you.

@robingroppe


commented Mar 11, 2016

I don't think that would help. I have tested a few things over the last few days. Reducing the clock to 400MHz had no impact on temperature, nor did I notice a difference with over_voltage=-2.

@robingroppe


commented Mar 11, 2016

Cool! Hopefully ;) I will try that too.

@popcornmix

Collaborator

commented Mar 11, 2016

@clivem can you report output of:

for i in 1 2 8 10 11 14; do vcgencmd read_ring_osc $((100+$i)); done
for i in 1 2 8 10 11 14; do vcgencmd read_ring_osc $((200+$i)); done
@clivem

Contributor Author

commented Mar 11, 2016

Will try your test firmware next.....

$ for i in 1 2 8 10 11 14; do vcgencmd read_ring_osc $((100+$i)); done
read_ring_osc(101)=16.150MHz (@1.2000V)
read_ring_osc(102)=2.836MHz (@1.2000V)
read_ring_osc(108)=16.669MHz (@1.2000V)
read_ring_osc(110)=2.818MHz (@1.2000V)
read_ring_osc(111)=2.926MHz (@1.2000V)
read_ring_osc(114)=3.106MHz (@1.2000V)

$ for i in 1 2 8 10 11 14; do vcgencmd read_ring_osc $((200+$i)); done
read_ring_osc(201)=19.588MHz (@1.3000V)
read_ring_osc(202)=3.280MHz (@1.3000V)
read_ring_osc(208)=20.194MHz (@1.3000V)
read_ring_osc(210)=3.259MHz (@1.3000V)
read_ring_osc(211)=3.373MHz (@1.3000V)
read_ring_osc(214)=3.600MHz (@1.3000V)

@clivem

Contributor Author

commented Mar 11, 2016

OK, running your test firmware... Will look again in 10 mins. (cpuburn-a53 running on 4 cores)

$ while true; do date && vcgencmd measure_temp && vcgencmd measure_clock arm && vcgencmd measure_clock core; sleep 1; done
Fri 11 Mar 13:19:09 GMT 2016
temp=82.7'C
frequency(45)=600000000
frequency(1)=250000000
Fri 11 Mar 13:19:10 GMT 2016
temp=83.8'C
frequency(45)=874000000
frequency(1)=400000000
Fri 11 Mar 13:19:11 GMT 2016
temp=86.0'C
frequency(45)=710000000
frequency(1)=400000000
Fri 11 Mar 13:19:12 GMT 2016
temp=85.4'C
frequency(45)=600000000
frequency(1)=250000000
Fri 11 Mar 13:19:13 GMT 2016
temp=84.4'C
frequency(45)=600000000
frequency(1)=250000000
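
A log like the one above is easier to compare across boards when each sample is a single line. The following is a sketch only: `format_sample` and `log_samples` are hypothetical helper names, the CSV layout is an assumption, and the parsing relies on the `vcgencmd` output format shown in this thread.

```sh
# Turn one set of raw readings into "timestamp,temp_c,arm_mhz,core_mhz".
# $1 timestamp, $2 measure_temp output, $3 and $4 measure_clock output.
format_sample() {
    local ts="$1" temp="${2#temp=}" arm="${3#*=}" core="${4#*=}"
    temp=${temp%%\'*}         # drop the trailing 'C
    printf '%s,%s,%s,%s\n' "$ts" "$temp" \
        "$(( arm / 1000000 ))" "$(( core / 1000000 ))"
}

# Take $1 samples, one per second, and print them as CSV lines.
log_samples() {
    local n="$1"
    while [ "$n" -gt 0 ]; do
        format_sample "$(date +%Y-%m-%dT%H:%M:%S)" \
            "$(vcgencmd measure_temp)" \
            "$(vcgencmd measure_clock arm)" \
            "$(vcgencmd measure_clock core)"
        sleep 1
        n=$(( n - 1 ))
    done
}
```

On a Pi, something like `log_samples 600 > thermal.csv` would capture ten minutes of readings while the stress test runs.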

@clivem

Contributor Author

commented Mar 11, 2016

OK, starting to see CPU speed drop below 600MHz.

temp=86.0'C
frequency(45)=593000000
frequency(1)=250000000
Fri 11 Mar 13:22:28 GMT 2016
temp=86.0'C
frequency(45)=589000000
frequency(1)=250000000
Fri 11 Mar 13:22:30 GMT 2016
temp=86.0'C
frequency(45)=593000000
frequency(1)=250000000
Fri 11 Mar 13:22:31 GMT 2016
temp=86.0'C
frequency(45)=583000000
frequency(1)=250000000
Fri 11 Mar 13:22:32 GMT 2016
temp=86.5'C
frequency(45)=563000000
frequency(1)=250000000
Fri 11 Mar 13:22:33 GMT 2016
temp=86.5'C
frequency(45)=588000000
frequency(1)=250000000
Fri 11 Mar 13:22:34 GMT 2016
temp=86.0'C
frequency(45)=582000000
frequency(1)=250000000
Fri 11 Mar 13:22:35 GMT 2016
temp=86.5'C
frequency(45)=588000000
frequency(1)=250000000

@clivem

Contributor Author

commented Mar 11, 2016

With 10mm high passive heatsink...
Fri 11 Mar 13:46:26 GMT 2016
temp=86.5'C
frequency(45)=559000000
frequency(1)=250000000

Without heatsink ....
Fri 11 Mar 13:47:12 GMT 2016
temp=87.0'C
frequency(45)=498000000
frequency(1)=250000000

@clivem

Contributor Author

commented Mar 11, 2016

The main point being, that I'm not seeing temps >=90degC, now that the firmware will drop speed lower than 600MHz, when temp >85degC. ;)

@clivem

Contributor Author

commented Mar 11, 2016

Ok, all 3 of my test units have had time to settle now. All 3 in cases, only the first (with 10mm high passive Pimoroni heatsink) has >500MHz clock speed now, other 2 sans heatsink, ~470MHz.

Fri 11 Mar 15:24:55 GMT 2016
temp=86.5'C
frequency(45)=511000000
frequency(1)=250000000

Fri 11 Mar 15:26:06 GMT 2016
temp=88.1'C
frequency(45)=470000000
frequency(1)=250000000

Fri 11 Mar 15:26:23 GMT 2016
temp=87.0'C
frequency(45)=471000000
frequency(1)=250000000

Interestingly, I tried to add a 4th Pi3B to this test. Took a new unit from RS out of box, booted.... Anyway, to cut a long story short, it hard-locks completely trying to run any NEON code on it. "stress -c 4", just fine, but try to run cpuburn-{a7,a9,a53}, and kaboom as soon as you hit return.... LOL.

@clivem

Contributor Author

commented Mar 12, 2016

@popcornmix One thing I just noticed.... Let's say it throttles below 600MHz, then load is reduced, temp is reduced, but it stays at the reduced frequency. I guess I was expecting, with a drop in temp, to less than 85degC, speed would rise back to 600MHz, but as you can see below it remains at 500MHz.

Fri 11 Mar 23:57:05 GMT 2016
temp=67.1'C
frequency(45)=507000000
frequency(1)=250000000
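
To help spot samples like the one above in a longer log, a small check along these lines might be useful. It is only a sketch: `stuck_throttle` is a hypothetical helper, and the default thresholds (85 degC and the 600MHz base clock) are simply the figures quoted in this comment, not values read from the firmware.

```sh
# Succeeds (exit status 0) when a sample looks like a throttle that
# failed to lift: temperature under the limit, ARM clock still under
# the base speed.
# $1 temperature in degrees C, $2 ARM clock in MHz,
# $3 limit in degrees C (default 85), $4 base clock in MHz (default 600).
stuck_throttle() {
    awk -v t="$1" -v f="$2" -v lim="${3:-85}" -v base="${4:-600}" \
        'BEGIN { exit !(t < lim && f < base) }'
}

# Example with the reading above: 67.1 degC at 507MHz.
if stuck_throttle 67.1 507; then
    echo "cool but still capped below base clock"
fi
```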

@robingroppe


commented Mar 12, 2016

Oh, that's not good. I opened my case yesterday and after that it was barely throttling: sometimes a drop to 1.1 or 1.0GHz, where with the case closed it was around 700MHz. I have a little aluminium heatsink on it.
So it is a matter of where the heat should go. For me, with an open case everything was half the disaster it was before.
Yesterday I ordered an S-Case. Everything will be alright ;)

@popcornmix

Collaborator

commented Mar 12, 2016

@clivem for any given stress test, a given frequency will reach a certain temperature, and that will be quite stable.
So it makes sense that if you limit the temperature to 85'C, the frequency will settle at a certain value (e.g. 500MHz). I don't know why you'd expect the frequency to be able to return to 600MHz if it couldn't maintain the desired temperature at 600MHz before.

Anyway, we'll be including the lower frequency cap you've tested in the next firmware update. I feel it won't affect anyone except users of cpuburn-a53. Even fairly heavy NEON usage, like multi-threaded software video decode, doesn't get anywhere near the temperature of cpuburn-a53.

@amtssp


commented Mar 12, 2016

@popcornmix I agree with clivem that the frequency should go up again once the temperature has dropped below the set point (in clivem's example the actual temp is 67.1C but the frequency is still reduced).
The load on the processor might be lower now so there could be room to increase the frequency. I don't see why the frequency should stay reduced forever, just because it once needed to be reduced.

@clivem

Contributor Author

commented Mar 12, 2016

@popcornmix At the point the load on the processor is reduced and the temperature has dropped to <80degC, I'd expect an increase back to the default base speed of 600MHz. (Or actually, if the last requested speed from the kernel governor was 1.2GHz, I'd expect it to be able to rise back to that if the temperature is now low enough to permit it. Gradual, like on the way down.) We are no longer talking about needing thermal throttling to prevent the temperature reaching unsafe levels.

As to your second point about only affecting cpuburn-a53: I did not cite this as an example, as the code is not open source and I cannot share it with you, but I have a transcoding suite that is heavily optimised for NEON. That is a "real world" use case, and it causes >85degC temperatures just transcoding an album's worth of music from one sample rate to another.
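
The gradual recovery asked for in the first paragraph could be pictured roughly like this. This is not the firmware's algorithm, just a sketch of the behaviour being requested: `next_cap` is a hypothetical function, the 50MHz step is an arbitrary assumption, and the 85/80 degC thresholds and 300/1200MHz bounds are taken from figures mentioned in this thread.

```sh
# Hypothetical policy: compute the next frequency cap in MHz.
# $1 current cap in MHz, $2 temperature in whole degrees C.
next_cap() {
    local cap="$1" temp="$2"
    local step=50 min=300 max=1200 hot=85 cool=80   # assumed values
    if [ "$temp" -gt "$hot" ]; then
        cap=$(( cap - step ))     # too hot: lower the cap one step
    elif [ "$temp" -lt "$cool" ]; then
        cap=$(( cap + step ))     # cooled down: raise the cap one step
    fi
    # Between cool and hot the cap is left unchanged.
    if [ "$cap" -lt "$min" ]; then cap=$min; fi
    if [ "$cap" -gt "$max" ]; then cap=$max; fi
    printf '%s\n' "$cap"
}
```

Called once per sample, a policy of this shape would let a board sitting at 500MHz and 67 degC climb back step by step instead of staying capped.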

@clivem

Contributor Author

commented Mar 12, 2016

Massive variation between silicon.... Same case, same software image, everything the same, except the actual Pi3B board. Both have been running "stress -c4" for 2 hours....

One unit above 80degC and throttled back to 854MHz, the other below 80degC, not throttled, still at 1.2GHz.

** Test subject 1 **
Sat 12 Mar 14:36:15 GMT 2016
temp=82.7'C
frequency(45)=854000000
frequency(1)=400000000

** Test subject 2 **
Sat 12 Mar 14:36:15 GMT 2016
temp=77.4'C
frequency(45)=1200000000
frequency(1)=400000000

@clivem

Contributor Author

commented Mar 12, 2016

Out of the 8 Pi3B boards I purchased (ignoring the one that locks up solid as soon as any NEON code is run on it), another board, the one I "soak tested" for 24 hours at over 100degC@600MHz before testing the modified firmware, is now suffering from random lock-ups. I wonder whether, mid to long term, the PiF may end up wondering why they shipped a board without any proper thermal solution, either physical or software based, when it is capable of running at some of the temperatures I have seen.

Based on several hundred B+ and 2B boards passing through my hands, to the best of my recall I only ever had one factory-fresh "faulty" board, or infant mortality. That two out of the 8 3Bs purchased now reside in the junk drawer after a week doesn't bode well for long-term reliability.

@popcornmix

Collaborator

commented Mar 12, 2016

> @popcornmix One thing I just noticed.... Let's say it throttles below 600MHz, then load is reduced, temp is reduced, but it stays at the reduced frequency. I guess I was expecting, with a drop in temp, to less than 85degC, speed would rise back to 600MHz, but as you can see below it remains at 500MHz.

I don't see any reason why frequency would be throttled if temperature is below 80'C. That's not the intended behaviour and the code doesn't appear to be doing that. I'll have to try to reproduce it.

Also we wouldn't expect any long term harm from briefly running over temperature. We have really stress tested devices - running in an oven at 200'C as well as being overvoltaged and overclocked and running a stress test. The Pi is still fine and we couldn't measure any difference in behaviour afterwards (e.g. current drawn or max overclockability).

Does adding over_voltage=1 (or 2 or 3 or 4) bring either of the bad Pi3's back to life?
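
For reference, `over_voltage` is a `config.txt` setting, so trying the values above means editing that file and rebooting. Below is a small sketch of a helper that sets one option in a `key=value` file; `set_config_option` is a hypothetical name, and `/boot/config.txt` as the file location is an assumption based on the usual Raspbian layout.

```sh
# Set key=value in a config.txt-style file, replacing an existing line
# for that key or appending one if none exists.
# $1 file path, $2 key, $3 value.
set_config_option() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$file"; then
        # Rewrite via a temporary file to avoid relying on sed -i.
        sed "s/^${key}=.*/${key}=${value}/" "$file" > "${file}.tmp" &&
            mv "${file}.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

Run as root, `set_config_option /boot/config.txt over_voltage 2` followed by a reboot would apply the second of the suggested values.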

@ali1234

Contributor

commented Mar 13, 2016

Question: does the Pi also underclock if an undervoltage warning condition occurs?

I plugged my Pi 3 into a slightly weak power supply and observed the red LED going out occasionally and also noticed that my benchmark/stress test was running at half the usual speed.

@popcornmix

Collaborator

commented Mar 13, 2016

> Question: does the Pi also underclock if an undervoltage warning condition occurs?

Yes.
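
Firmware builds newer than the ones discussed in this thread expose this state through `vcgencmd get_throttled`, which prints a hex bitmask such as `throttled=0x50005`. That command is not mentioned anywhere above, so treat its availability as an assumption about your firmware version. The sketch below decodes the bits as described in the Raspberry Pi documentation; `decode_throttled` and `flag` are hypothetical helper names.

```sh
# Print a label when the given bit mask is set in the value.
# $1 value, $2 mask, $3 label.
flag() {
    if [ $(( $1 & $2 )) -ne 0 ]; then
        echo "$3"
    fi
}

# Decode a "throttled=0x50005" style value into readable flags.
# Bits other than the six listed here are ignored.
decode_throttled() {
    local v
    v=$(( ${1#throttled=} ))      # shell arithmetic accepts the 0x prefix
    if [ "$v" -eq 0 ]; then
        echo "no flags set"
        return 0
    fi
    flag "$v" 0x1     "under-voltage now"
    flag "$v" 0x2     "ARM frequency capped now"
    flag "$v" 0x4     "throttled now"
    flag "$v" 0x10000 "under-voltage has occurred"
    flag "$v" 0x20000 "ARM frequency capping has occurred"
    flag "$v" 0x40000 "throttling has occurred"
}

# Only query the hardware when vcgencmd is actually available.
if command -v vcgencmd >/dev/null 2>&1; then
    decode_throttled "$(vcgencmd get_throttled)"
fi
```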

@clivem

Contributor Author

commented Mar 13, 2016

@popcornmix

> Does adding over_voltage=1 (or 2 or 3 or 4) bring either of the bad Pi3's back to life?

I will test.....

popcornmix added a commit to raspberrypi/firmware that referenced this issue Mar 15, 2016
kernel: BCM270X_DT: rpi-display overlay - add swapxy param
See: #564

kernel: Remove I2S config from bt_pins
See: raspberrypi/linux#1321

kernel: bcm2835-sdhost: Workaround for slow sectors
See: raspberrypi/linux@20fe468

firmware: pwm_sdm: first pass at optimisation
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=136445

firmware: arm_loader: Allow frequency cap to go down to 300MHz when over temperature
See: raspberrypi/linux#1337

firmware: dtoverlay: Remove support for space/tab separators
firmware: host_applications: Add dtoverlay app
firmware: dtoverlay: Several small improvements
See: https://www.raspberrypi.org/forums/viewtopic.php?f=107&t=139732

firmware: Add gpioman changes to correctly handle dt-blob.bin for gpio_expander gpios

firmware: Updated dt-blob.dts to include Pi 3
See: https://www.raspberrypi.org/forums/viewtopic.php?f=107&t=140125
popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Mar 15, 2016
@Ruffio


commented Aug 17, 2016

@clivem has your issue been resolved? If so, please close this issue. Thanks.

neuschaefer pushed a commit to neuschaefer/raspi-binary-firmware that referenced this issue Feb 27, 2017
@JamesH65

Contributor

commented May 18, 2017

Closing due to lack of activity. Reopen if you feel this issue is still relevant.

@JamesH65 JamesH65 closed this May 18, 2017
