Raspberry Pi 5 cannot overclock beyond 3.0GHz due to firmware limit(?) #1876

Open
youmukonpaku1337 opened this issue Mar 9, 2024 · 84 comments

Comments

@youmukonpaku1337

Is this the right place for my bug report?
This issue seems to be firmware-related, as the clocking is done through it.

Describe the bug
Setting arm_freq beyond 3000 works fine, but vcgencmd measure_clock arm reports 3000 MHz, while software like Geekbench and btop detect it as the clock set by arm_freq, e.g. 3.1GHz

To reproduce

  1. Set arm_freq beyond 3000, with a corresponding over_voltage_delta
  2. Reboot, and run vcgencmd measure_clock arm
  3. Check with something else, like btop or Geekbench
  4. Clocks will be mismatched and vcgencmd will only report 3.0GHz
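
A minimal sketch of the comparison in the steps above, assuming vcgencmd measure_clock arm prints a single line of the form frequency(N)=<Hz>. The sample line and the helper names are made up for illustration; they are not output captured from this Pi.

```python
import re


def parse_measure_clock(output: str) -> float:
    """Return the frequency in MHz from a line like 'frequency(0)=3000000000'."""
    match = re.fullmatch(r"frequency\((\d+)\)=(\d+)", output.strip())
    if match is None:
        raise ValueError(f"unexpected vcgencmd output: {output!r}")
    return int(match.group(2)) / 1_000_000


def clock_mismatch_mhz(configured_mhz: int, vcgencmd_output: str) -> float:
    """Difference between the arm_freq setting and what the firmware measured."""
    return configured_mhz - parse_measure_clock(vcgencmd_output)


# Hypothetical sample line, not captured from real hardware.
sample = "frequency(0)=3000000000"
print(clock_mismatch_mhz(3100, sample))
```

A non-zero result means the firmware did not apply the arm_freq value that the cpufreq driver (and tools reading it) report.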

Expected behaviour
The Pi is actually clocked beyond 3.0GHz and both vcgencmd and other software report it as such

Actual behaviour
The Pi is only clocked to 3.0GHz, and vcgencmd reports it as such, but software sees it as set in arm_freq

System
https://pastebin.com/U2KCBBnD

  • Which model of Raspberry Pi?
    Pi 5
  • Which OS and version (cat /etc/rpi-issue)?
    Raspberry Pi reference 2023-12-05 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 70cd6f2a1e34d07f5cba7047aea5b92457372e05, stage4
  • Which firmware version (vcgencmd version)?
    2024/02/16 15:28:41 Copyright (c) 2012 Broadcom version 4c845bd3 (release) (embedded)
  • Which kernel version (uname -a)?
    Linux q-raspi5 6.6.17-v8-16k+ #1735 SMP PREEMPT Wed Feb 21 14:45:17 GMT 2024 aarch64 GNU/Linux

Logs
dmesg output is in the raspinfo paste

Additional context
If this is relevant, I used rpi-update to update to latest kernel and firmware version, no change
I have also set debian sources in sources.list to testing/trixie

@youmukonpaku1337
Author

I'm pretty sure the Pi 5 can handle clocks beyond 3.0GHz as it's extremely stable at that clock, so that as well

@popcornmix
Contributor

That's what we've been told is the limit of the PLL by Broadcom.

I've got a todo item to investigate what happens when this is exceeded, but it's not high on the priority list.

@youmukonpaku1337
Author

That's what we've been told is the limit of the PLL by Broadcom.

I've got a todo item to investigate what happens when this is exceeded, but it's not high on the priority list.

ah i see, that's sad, hope it gets fixed soon! would love to run my pi at absurd clocks

@popcornmix
Contributor

rpi-eeprom-recovery.zip

I've removed the 3GHz limit, and attached a zip file (you can flash it to an sdcard with rpi-imager) you can test.

Make sure you have no critical (unbacked up) data on the Pi you are testing.
Let me know if you succeed in going above 3GHz.

I could boot at 3.1GHz (and vcgencmd measure_clock arm confirmed that) but my Pi would crash when stressed.

@youmukonpaku1337
Author

that's actually awesome, tysm, i assume i just flash it to an sd card that isnt the one i have raspbian on and boot?

@popcornmix
Contributor

Yes - use a spare card.

@youmukonpaku1337
Author

alright, thanks, will test asap

@geerlingguy

Dangit how did I not see this issue before now :)

Going to see if I accidentally nuke my 'blessed' Pi 5 (the only one I've been able to get to 3.0 GHz so far).

@youmukonpaku1337
Author

Dangit how did I not see this issue before now :)

Going to see if I accidentally nuke my 'blessed' Pi 5 (the only one I've been able to get to 3.0 GHz so far).

LMAO

@Mauker1

Mauker1 commented Mar 14, 2024

Petition to make 3.14GHz the new upper limit in the firmware.

@geerlingguy

geerlingguy commented Mar 14, 2024

So I've been trying to get a Geekbench 6 run to complete at 3.14 GHz, testing higher and higher over_voltage_delta (with force_turbo off and on), and so far can't quite hack it.

I wound up capturing this from dmesg:

[  326.258634] ------------[ cut here ]------------
[  326.258637] Firmware transaction timeout
[  326.258646] WARNING: CPU: 3 PID: 31 at drivers/firmware/raspberrypi.c:67 rpi_firmware_property_list+0x204/0x270
[  326.258654] Modules linked in: algif_hash algif_skcipher af_alg bnep vc4 snd_soc_hdmi_codec binfmt_misc aes_ce_blk drm_display_helper cec aes_ce_cipher drm_dma_helper drm_kms_helper hci_uart ghash_ce snd_soc_core btbcm gf128mul snd_compress sha2_ce snd_pcm_dmaengine brcmfmac sha256_arm64 sha1_ce bluetooth snd_pcm brcmutil snd_timer snd rpivid_hevc(C) cfg80211 v4l2_mem2mem pisp_be videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 ecdh_generic fb_sys_fops ecc syscopyarea sysfillrect sysimgblt rfkill libaes videobuf2_common v3d videodev raspberrypi_hwmon mc gpu_sched drm_shmem_helper raspberrypi_gpiomem rp1_adc pwm_fan nvmem_rmem uio_pdrv_genirq uio fuse drm drm_panel_orientation_quirks backlight dm_mod ip_tables x_tables ipv6 spidev i2c_brcmstb spi_bcm2835 gpio_keys
[  326.258699] CPU: 3 PID: 31 Comm: kworker/3:0 Tainted: G         C         6.1.0-rpi7-rpi-2712 #1  Debian 1:6.1.63-1+rpt1
[  326.258702] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
client_loop: send disconnect: Broken pipe

And the cursor blinks on the external display, but SSH goes away.

Checking the actual voltage:

$ vcgencmd measure_volts
volt=1.0000V

So I'm wondering if there's any way to boost that further, or if 1.0000V is the hard limit for the cores?
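
For scripting this check, a small sketch that parses the volt=...V line shown above and reports whether the reading sits at 1.0 V. The parsing assumes exactly that one-line format; the function name is made up.

```python
import re


def parse_measure_volts(output: str) -> float:
    """Return the voltage in volts from a line like 'volt=1.0000V'."""
    match = re.fullmatch(r"volt=([0-9]+(?:\.[0-9]+)?)V", output.strip())
    if match is None:
        raise ValueError(f"unexpected vcgencmd output: {output!r}")
    return float(match.group(1))


reading = parse_measure_volts("volt=1.0000V")  # the value quoted above
print(reading, "at or above 1.0 V" if reading >= 1.0 else "below 1.0 V")
```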

I should note I have an Argon THRML 60-RC and an additional giant 140mm Noctua fan blowing over everything, fan set to full blast (pinctrl FAN_PWM op dl — for some reason my custom setting for fan_temp0 through 4 to run the speed at 255 doesn't seem to make a difference).

@geerlingguy

geerlingguy commented Mar 14, 2024

Yay! Got one run in at 3.14 GHz with:

over_voltage=8
arm_freq=3140
force_turbo=1

Honestly not sure if the over_voltage vs over_voltage_delta made a difference or if I just got lucky on this run and unlucky on the other runs.

Here's the result: https://browser.geekbench.com/v6/cpu/5314274

And a video! https://www.youtube.com/watch?v=TTIkZBsVJyA

@youmukonpaku1337
Author

HELL YEAH!!!

@senothechad

I've tried flashing, but I just see the boot menu. I tried two different drives with the bootloader but nothing happens?

@Cgamess

Cgamess commented Mar 14, 2024

rpi-eeprom-recovery.zip

I've removed the 3GHz limit, and attached a zip file (you can flash it to an sdcard with rpi-imager) you can test.

Make sure you have no critical (unbacked up) data on the Pi you are testing. Let me know if you succeed in going above 3GHz.

I could boot at 3.1GHz (and vcgencmd measure_clock arm confirmed that) but my Pi would crash when stressed.

How was it made?

@pelwell
Contributor

pelwell commented Mar 14, 2024

With a C compiler, mostly. @popcornmix is a Raspberry Pi engineer.

@youmukonpaku1337
Author

Yay! Got one run in at 3.14 GHz with:

over_voltage=8
arm_freq=3140
force_turbo=1

Honestly not sure if the over_voltage vs over_voltage_delta made a difference or if I just got lucky on this run and unlucky on the other runs.

Here's the result: https://browser.geekbench.com/v6/cpu/5314274

And a video! https://www.youtube.com/watch?v=TTIkZBsVJyA

i might try breaking 1k singlecore >:)

@ThomasKaiser

ThomasKaiser commented Mar 15, 2024

Here's the result: https://browser.geekbench.com/v6/cpu/5314274

That's your single/multi scores: 967 / 1793 (does really nobody notice how wrong this benchmark is when a quad core CPU scores multi-threaded not even twice as much as single-threaded?)

And here's one outperforming your setup at 972 / 1847 clocking the cores only at 3.0 GHz: https://browser.geekbench.com/v6/cpu/5312673

Forget about the displayed 3.2 GHz, that's just what the cpufreq driver thinks and on any RPi it has no clue about real clockspeeds. Geekbench on ARM with Linux starting from v4.2 on measures and also reports the clockspeeds in the warmup phase: https://browser.geekbench.com/v6/cpu/5312673.gb6 (you need a GB browser account to access these raw data files)

  "processor_frequency": {
    "frequencies": [
      2994,
      2992,
      2992,
      ...    
      2993,
      2997,
      2991
    ]

So what's different on that system? Maybe simply the user switched to performance cpufreq governor prior to firing up Geekbench? Maybe memory access is faster compared to @geerlingguy's run where the CPU cores were being measured at ~3133 MHz which is to be expected at the configured 3140 MHz?
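
To make the two checks above concrete, here is a short sketch that computes the multi/single ratio for both runs and the mean of the measured clockspeeds. Only the six frequency samples visible in the excerpt are used (the rest are elided in the raw file), so the mean is indicative only; the function name is made up.

```python
from statistics import mean

# Only the six samples shown in the excerpt; the elided ones are not guessed.
measured_mhz = [2994, 2992, 2992, 2993, 2997, 2991]

# (single-core, multi-core) Geekbench 6 scores quoted in this comment.
runs = {
    "arm_freq=3140 run": (967, 1793),
    "3.0 GHz run": (972, 1847),
}


def multi_core_scaling(single: int, multi: int) -> float:
    """How many times higher the multi-core score is than the single-core one."""
    return multi / single


print(f"mean measured clock: {mean(measured_mhz):.1f} MHz")
for name, (single, multi) in runs.items():
    print(f"{name}: multi/single = {multi_core_scaling(single, multi):.2f}")
```

On four identical cores a ratio this far below 4 is what the remark about the benchmark refers to.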

@ThomasKaiser

ThomasKaiser commented Mar 15, 2024

BTW: when talking about overclocking it's also a lot about DVFS since 'usually' higher clockspeeds need significantly higher supply voltages. One would expect to see a curve like this (but more exponentially growing at the right side in case of 'overclocking'):

[Image: sun50i-h6-5 4 20-OrangePi_Lite2_(worse_silicon)]

With RPi 5 (at least with latest ThreadX/firmware 30cc5f37 / 2024/01/05 15:57:40) it looks either linear or funny:

arm_freq=3000:

bcm2712-30cc5f37-Raspberry_Pi_5B_(arm_freq=3000)

  1500 MHz    720.0 mV
  1600 MHz    760.0 mV
  1700 MHz    775.0 mV
  1800 MHz    790.0 mV
  1900 MHz    800.0 mV
  2000 MHz    815.0 mV
  2100 MHz    830.0 mV
  2200 MHz    845.0 mV
  2300 MHz    855.0 mV
  2400 MHz    870.0 mV
  2500 MHz    885.0 mV
  2600 MHz    900.0 mV
  2700 MHz    910.0 mV
  2800 MHz    925.0 mV
  2900 MHz    940.0 mV
  3000 MHz    955.0 mV

arm_freq=3000 combined with over_voltage=4:

bcm2712-30cc5f37-Raspberry_Pi_5B_(arm_freq=3000_over_voltage=4)

  1500 MHz    720.0 mV
  1600 MHz    860.0 mV
  1700 MHz    875.0 mV
  1800 MHz    885.0 mV
  1900 MHz    900.0 mV
  2000 MHz    915.0 mV
  2100 MHz    930.0 mV
  2200 MHz    940.0 mV
  2300 MHz    955.0 mV
  2400 MHz    970.0 mV
  2500 MHz    970.0 mV
  2600 MHz    970.0 mV
  2700 MHz    970.0 mV
  2800 MHz    970.0 mV
  2900 MHz    970.0 mV
  3000 MHz    970.0 mV

arm_freq=3000 combined with over_voltage_delta=50000:

bcm2712-30cc5f37-Raspberry_Pi_5B_(arm_freq=3000_over_voltage_delta=50000)

  1500 MHz    720.0 mV
  1600 MHz    805.0 mV
  1700 MHz    820.0 mV
  1800 MHz    835.0 mV
  1900 MHz    850.0 mV
  2000 MHz    860.0 mV
  2100 MHz    875.0 mV
  2200 MHz    890.0 mV
  2300 MHz    905.0 mV
  2400 MHz    915.0 mV
  2500 MHz    930.0 mV
  2600 MHz    945.0 mV
  2700 MHz    960.0 mV
  2800 MHz    970.0 mV
  2900 MHz    985.0 mV
  3000 MHz   1000.0 mV

It always starts at 720 mV for the lowest OPP (even when you move the lowest OPP, for example with arm_freq_min=1000), and then some algorithm 'draws' a straight line up to the highest OPP. The exception is the over_voltage setting, where things get really weird: overvolting the low OPP while keeping the same supply voltage for the 'overclocked' OPP is quite the opposite of what's expected with silicon behaviour in mind.
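
The 'straight line' observation can be checked against the plain arm_freq=3000 table above. This sketch draws a line between the 1600 MHz and 3000 MHz points and reports the largest deviation of the points in between. The 1500 MHz idle point is left out because it is pinned at 720 mV; the function name is made up.

```python
# arm_freq=3000 table from above, without the 1500 MHz idle point (MHz -> mV).
baseline_mv = {
    1600: 760.0, 1700: 775.0, 1800: 790.0, 1900: 800.0, 2000: 815.0,
    2100: 830.0, 2200: 845.0, 2300: 855.0, 2400: 870.0, 2500: 885.0,
    2600: 900.0, 2700: 910.0, 2800: 925.0, 2900: 940.0, 3000: 955.0,
}


def max_line_residual(points: dict[int, float]) -> float:
    """Largest distance (mV) between the points and the line through the endpoints."""
    freqs = sorted(points)
    f_lo, f_hi = freqs[0], freqs[-1]
    slope = (points[f_hi] - points[f_lo]) / (f_hi - f_lo)
    return max(abs(points[f] - (points[f_lo] + slope * (f - f_lo))) for f in freqs)


slope_mv_per_100mhz = (baseline_mv[3000] - baseline_mv[1600]) / 14
print(f"slope: {slope_mv_per_100mhz:.1f} mV per 100 MHz")
print(f"max deviation from the line: {max_line_residual(baseline_mv):.2f} mV")
```

The deviation stays below the 5 mV step visible in the table, which is consistent with a linear ramp rounded to a 5 mV grid.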

Also the 'line drawing' behaviour starting at the lowest OPP ends up with strange behaviour. When not adjusting any of the arm_freq ThreadX settings the 1500 MHz OPP gets 720 mV. But when setting arm_freq_min=1000 the line gets drawn with a similar algorithm but now the 1500 MHz OPP is at 775 mV and not 720 mV any more:

  1000 MHz    720.0 mV
  1100 MHz    730.0 mV
  1200 MHz    740.0 mV
  1300 MHz    750.0 mV
  1400 MHz    765.0 mV
  1500 MHz    775.0 mV
  1600 MHz    785.0 mV
  1700 MHz    795.0 mV
  ...

One would expect that

  • the supply voltage of a certain clockspeed being a HW property and based on silicon testings with some safety headroom. Each cpufreq OPP has an ideal supply voltage (as low as possible to save energy and also as high as needed to allow for stable operation) that shouldn't 'move around' when adjusting arm_freq settings
  • ideally the SoC manufacturer allows for AVS due to 'silicon lottery' and as such 'lower quality' chips will automagically be driven with slightly higher supply voltages and optionally top cpufreq OPP denied if the supply voltage would exceed a critical limit
  • the DVFS OPP painting a curve (linear/flat on the left side and then exponentially growing for the higher/highest OPP) and not a straight line

@popcornmix are the 1st and 3rd issue somewhat addressed with the new ThreadX/firmware version you provided above?

@popcornmix
Contributor

A few comments. The idle point of 0.72V is fixed for all boards.
Below that voltage internal RAMs become unreliable, so you can't go lower than that however low the clock goes.

Each chip has a unique base voltage (vpred), determined by querying ring oscillators.
This is added to a fixed slope that increases with frequency.
True, the real curve may not be flat, but over the non-overclocked range, it's pretty close to flat.

We don't characterise the overclocked range - you are on your own and can manually adjust with over_voltage_delta.

over_voltage is deprecated. It doesn't take into account vpred.
over_voltage_delta is preferred.

There is a 1V ceiling that currently can't be exceeded.
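
A rough sketch of the scheme described in this comment, for illustration only. The real firmware code is not shown in this thread, so the exact form of the slope term, the parameter names, and the vpred and slope numbers below are all assumptions. Only the 0.72 V idle point, the 1 V ceiling, and over_voltage_delta being an additive offset (taken here as microvolts) come from the discussion.

```python
def core_voltage_mv(
    freq_mhz: float,
    vpred_mv: float,            # per-chip base voltage (hypothetical value below)
    slope_mv_per_mhz: float,    # fixed slope (hypothetical value below)
    min_freq_mhz: float = 1500.0,
    over_voltage_delta_uv: int = 0,
    idle_mv: float = 720.0,
    ceiling_mv: float = 1000.0,
) -> float:
    """Assumed model: idle point for the lowest OPP, otherwise a clamped line."""
    if freq_mhz <= min_freq_mhz:
        return idle_mv
    requested = vpred_mv + slope_mv_per_mhz * freq_mhz + over_voltage_delta_uv / 1000.0
    # Never below the idle voltage, never above the 1 V ceiling.
    return min(max(requested, idle_mv), ceiling_mv)


# Hypothetical chip: vpred = 540 mV, slope = 0.14 mV/MHz.
for freq in (1500, 2000, 3000, 3100):
    print(freq, core_voltage_mv(freq, 540.0, 0.14, over_voltage_delta_uv=100000))
```

With these made-up numbers the top OPPs all land on the ceiling, so raising arm_freq further buys clock without any extra voltage.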

@youmukonpaku1337
Author

interesting. anyway, gotta break 1k in geekbench for the sillies, will see if i can manage maaaaybe 3.3ghz?

@ThomasKaiser

ThomasKaiser commented Mar 15, 2024

Edit: haven't seen the answers above before firing up this comment

@popcornmix are the 1st and 3rd issue somewhat addressed with the new ThreadX/firmware version you provided above?

Nope, just tested with arm_freq=3100 and over_voltage_delta=100000. The upper voltage limit with ThreadX build 4d574a2e is (still?) 1000mV and the algorithm 'drawing' a linear line is also still in place:

bcm2712-4d574a2e-Raspberry_Pi_5B_(arm_freq=3100_over_voltagee_delta=100000)

  1500 MHz    720.0 mV
  1600 MHz    855.0 mV
  1700 MHz    870.0 mV
  1800 MHz    885.0 mV
  1900 MHz    900.0 mV
  2000 MHz    910.0 mV
  2100 MHz    925.0 mV
  2200 MHz    940.0 mV
  2300 MHz    955.0 mV
  2400 MHz    965.0 mV
  2500 MHz    980.0 mV
  2600 MHz    995.0 mV
  2700 MHz   1000.0 mV
  2800 MHz   1000.0 mV
  2900 MHz   1000.0 mV
  3000 MHz   1000.0 mV
  3100 MHz   1000.0 mV

As such results as expected: when 1.0V can't be exceeded (most probably for a good reason) allowing for higher clockspeeds is just asking for trouble :)

@geerlingguy

@ThomasKaiser - Yeah, sadly it looks like 1V is the upper limit, but maybe some fancy (highly destructive) hacking around can surpass it.

Regarding that other 3.0 GHz score beating my Geekbench 6 run, I wonder if it running the 4GB RAM part has anything to do with it (I think that was a 4 GB model). I know in some microbenchmarks, at least at some point the 4 GB boards ran faster than the 8 GB boards, when memory was important to the run.

I only have two 4 GB Pi 5s, and neither goes beyond 2.8 GHz reliably, so I can't confirm much.

@youmukonpaku1337
Author

i wonder, is there no way to OC memory on a pi? i saw some firmware opts for it but havent checked much

@ThomasKaiser

ThomasKaiser commented Mar 15, 2024

Regarding that other 3.0 GHz score beating my Geekbench 6 run, I wonder if it running the 4GB RAM part has anything to do with it (I think that was a 4 GB model)

I'm currently testing around with my own RPi 5B (also 4 GB) and got an even better score at 3050 MHz: 975/2022. Since I'm also starting GB only through sbc-bench -G there's always memory latency measurement also done and I see here variations. My first try with 3050 MHz showed these ramlat scores:

Executing ramlat on cpu0 (Cortex-A76), results in ns:
   size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
     4k: 1.350 1.313 1.312 1.312 1.312 1.312 1.312 2.498 
     8k: 1.312 1.312 1.312 1.312 1.312 1.312 1.316 2.561 
    16k: 1.319 1.313 1.314 1.315 1.312 1.313 1.314 2.556 
    32k: 1.312 1.312 1.312 1.312 1.312 1.315 1.312 2.577 
    64k: 1.312 1.313 1.315 1.312 1.312 1.312 1.313 2.558 
   128k: 3.935 3.935 3.944 3.935 3.938 4.412 5.722 9.945 
   256k: 4.247 3.961 4.250 3.952 4.110 4.489 5.622 9.955 
   512k: 7.360 7.278 7.052 7.280 7.034 8.428 9.196 13.87 
  1024k: 13.22 12.90 12.87 12.88 13.05 13.45 15.19 22.17 
  2048k: 17.04 15.88 16.77 15.88 25.23 16.79 18.92 26.45 
  4096k: 67.59 61.33 67.46 61.59 68.93 68.06 80.18 101.3 
  8192k: 93.54 106.5 97.65 87.53 94.22 90.86 101.5 126.8 
 16384k: 104.1 101.3 103.9 101.8 103.6 104.9 126.2 127.4 
 32768k: 116.2 113.8 114.5 113.6 115.3 114.5 115.7 119.3 
 65536k: 118.7 117.5 127.3 117.3 118.6 117.7 118.7 121.1 
131072k: 120.0 118.9 119.9 118.8 120.0 128.6 119.4 120.2 

While now again testing with 3050 MHz I'm getting both worse latency and GB scores (955/1948 on average):

Executing ramlat on cpu0 (Cortex-A76), results in ns:
   size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
     4k: 1.324 1.314 1.312 1.312 1.312 1.312 1.312 2.500 
     8k: 1.312 1.312 1.312 1.312 1.312 1.312 1.312 2.555 
    16k: 1.318 1.312 1.313 1.312 1.312 1.312 1.312 2.555 
    32k: 1.312 1.312 1.312 1.312 1.312 1.312 1.312 2.579 
    64k: 1.312 1.312 1.312 1.312 1.313 1.313 1.312 2.558 
   128k: 3.935 3.936 3.936 3.935 3.935 4.468 5.765 9.939 
   256k: 3.939 3.936 3.962 3.976 4.007 4.437 5.633 9.944 
   512k: 7.443 7.492 7.521 7.490 7.456 8.786 8.990 13.97 
  1024k: 14.53 13.29 13.76 13.28 13.25 13.76 15.46 23.43 
  2048k: 26.47 27.03 26.91 26.98 41.58 25.73 27.83 34.22 
  4096k: 69.95 61.53 68.19 61.56 70.88 68.41 79.23 100.5 
  8192k: 94.90 107.5 100.6 88.62 96.38 96.22 105.3 135.3 
 16384k: 105.1 101.9 104.2 101.5 104.8 105.9 117.9 144.4 
 32768k: 117.5 115.9 116.9 115.7 117.1 115.6 119.2 129.1 
 65536k: 119.7 118.7 128.3 118.5 119.9 118.9 120.4 125.6 
131072k: 121.9 121.1 121.8 121.1 121.9 128.1 120.9 123.8 

Some of the individual benchmarks are rather sensitive to memory speed, some not (I tested this with a RK3588 board where one can easily adjust memory clock between 528 and 2112 MHz from userspace though forgot where I documented the results – maybe on your site somewhere in the comments). At least with my tests it looks like this comparing both runs:
https://browser.geekbench.com/v6/cpu/compare/5326061?baseline=5324484

Aside from different temperatures (in the first run my 'monster cooler' kept temperatures below/around 40°C and then I tried higher temps as per your recommendation wrt stability) I don't see any settings that might have changed and affect the behaviour...

[Image: IMG_2819 klein]

As such it would be interesting if you could compare memory bandwidth between 4GB/8GB models (an sbc-bench run will already do it). And as a side note: with more recent ThreadX versions memory access seems to be faster than in the beginning.

Edit: my conclusions wrt memory speed affecting fluctuating GB scores were BS since the first run ('with lower memory latency') also produced scores that vary substantially: https://browser.geekbench.com/v6/cpu/compare/5324361?baseline=5324484 – unfortunately GB also has some sort of random number generator in place when generating scores.

Edit 2: confirmed. Another run at 3050 MHz with sbc-bench -G (always executing GB twice for a reason) ends up with the same picture: standard deviation way too high or in other words: Geekbench 6 on ARM and especially RISC-V sucks:

First run:

   Single-Core Score     949              
   Multi-Core Score      1960              

Second run:

   Single-Core Score     973              
   Multi-Core Score      2010              

https://browser.geekbench.com/v6/cpu/compare/5326624?baseline=5326716
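
To put a number on the run-to-run variation, this sketch computes the relative gap between the two back-to-back runs quoted above. Two samples are too few for a meaningful standard deviation, so it only reports the spread; the function name is made up.

```python
def spread_percent(first: float, second: float) -> float:
    """Gap between two runs, as a percentage of the lower score."""
    return abs(first - second) / min(first, second) * 100.0


single_core = (949, 973)   # first run, second run
multi_core = (1960, 2010)

print(f"single-core spread: {spread_percent(*single_core):.1f}%")
print(f"multi-core spread:  {spread_percent(*multi_core):.1f}%")
```

Both gaps come out at roughly 2.5%, at identical settings.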

@youmukonpaku1337
Author

interesting..

@popcornmix
Contributor

with more recent ThreadX versions memory access seems to be faster than in the beginning.

Actually there is no ThreadX on a Pi5. The bootloader code runs with no RTOS.
ThreadX is used by start*.elf on pi0-4.

SDRAM performance has been improved recently by scaling back refresh with (sdram) temperature
See: #1854 (comment)

@youmukonpaku1337
Author

with more recent ThreadX versions memory access seems to be faster than in the beginning.

Actually there is no ThreadX on a Pi5. The bootloader code runs with no RTOS. ThreadX is used by start*.elf on pi0-4.

SDRAM performance has been improved recently by scaling back refresh with (sdram) temperature See: #1854 (comment)

is it possible to use both the test bootloader and the patched firmware?

@pelwell
Contributor

pelwell commented Mar 15, 2024

@ThomasKaiser You keep referring to ThreadX, but there is no ThreadX running on a Pi 5.

@youmukonpaku1337
Author

nevermind, looks like the sdram change was merged

@ThomasKaiser

will have to test with sbcbench i guess

Pro tip: wait a bit after rebooting and do it in a loop w/o display connected. Curious whether you're able to 'beat' https://browser.geekbench.com/v6/cpu/5389240 :)

BTW: score generated passively cooled at ~21°C ambient temp standing upright w/o any throttling after running several benchmarks in a row:

[Image: IMG_3179 klein]

@geerlingguy

Very nice; looks like you have the score to beat! I'm still wondering why the multicore multiplier is a bit low, would be nice to unlock a little more performance when running all four cores hard.

@ThomasKaiser

ThomasKaiser commented Mar 26, 2024

looks like you have the score to beat!

@geerlingguy currently all GB6 scores above 1000/2000 single/multi are my setup. I call it 'setup' and not board for a number of reasons:

  • running a lightweight and freshly debootstrapped minimal Ubuntu Jammy instead of Raspberry Pi OS
  • executing geekbench6 in a row since highest scores need either a few GB6 executions or an +20 min uptime (I now believe the last time I tried to check this I made a mistake and currently I'm not interested any more in this 'overclocking' nonsense anyway)
  • setting cpufreq governor to performance prior to executing the benchmark

I'm still wondering why the multicore multiplier is a bit low, would be nice to unlock a little more performance when running all four cores hard

Easy, just use another benchmark! For example GB5! Or check only the A76 cores online on an RK3588 with DRAM clock 528 vs. 2112 MHz.

Now checking whether connecting a display makes a difference or not (finally found my 'Micro-HDMI adapter' that is in fact a cable, as such I searched for the wrong thing for quite some time): another run of for i in 1 2 3 4 5 ; do sbc-bench.sh -G ; done with arm_freq=3080/over_voltage_delta=100000 settings (that are ofc nonsense since BCM2712 is nowhere near stable with these DVFS OPP): https://browser.geekbench.com/user/tkaiser (the 10 entries above 'Radxa ROCK 5B (only A76 cores)')

One execution failed, uptime when starting the whole thing was 3 minutes and as expected the 1st two GB6 executions scored lower and then nice high scores in a row. I currently don't see connecting a display making that much of a difference especially since the most recent score is from the same loop but with display powered off and a 'sleep 1500' in between.

As for the question whether high GB6 scores are the domain of 4GB boards I made a different test with Rock 5B since there we can adjust the DRAM clockspeed simply from userspace (adjustable between 528 and 2112 MHz). sbc-bench when being executed on hybrid designs (big.LITTLE on ARM or Intel's x86_64 designs combining 'Atom' and 'Core-i' cores) tests each CPU cluster individually and as such we have numbers only for the quad-core A76 on the RK3588 one time at lowest and one time at highest RAM speed:

[Image: GB6-comparison]

On the left comparing RK3588 at 528/2112 MHz DRAM clock (with the A76 clocking in at ~2340 MHz and all A55 killed) and on the right my 4GB RPi 5 at 3080 MHz vs. your 8GB board at 3140 MHz.

The 'bigger picture' seems to hint at differences in DRAM access speed between the 4GB and 8GB model with GB6 (as explained by @popcornmix already) as such I once again ask for sbc-bench (w/o -G switch) output with current bootloader/kernel from an 8GB board to be able to compare whether there's a general difference or this is only the result of GB6 using not that common RAM access patterns.

@geerlingguy

geerlingguy commented Apr 1, 2024

A new player has entered the chat: https://browser.geekbench.com/v6/cpu/5556526

I'd like to see if I can apply a few different tricks to a 4GB model to see if it can be nudged further. It does seem the RAM makes a difference there.

@ThomasKaiser - I can run it on my Pi 5 8GB in a bit...

@ThomasKaiser

A new player has entered the chat: https://browser.geekbench.com/v6/cpu/5556526

Really funny how one setup at 3250 MHz is 'as fast' (as in 'generates same scores') as my setup at 3000 MHz: https://browser.geekbench.com/v6/cpu/compare/5365900?baseline=5556526

I'd like to see if I can apply a few different tricks to a 4GB model to see if it can be nudged further.

I hope you don't mean by that tweaking the clockspeeds further since with the 1000mV ceiling I highly doubt any BCM2712 is stable at 3000 MHz (at least mine is not while being able to generate silly GB6 scores at up to 3080 MHz). And it's easy to test for reliability as explained above: ThomasKaiser/sbc-bench@8183b18#commitcomment-139927301

Still interested in getting a plain sbc-bench result of an 8GB board with latest firmware/kernel at either 2400 or 3000 MHz to be able to compare with a 4GB board (other clockspeeds aside those are IMO nonsense).

@geerlingguy

geerlingguy commented Apr 1, 2024

I hope you don't mean by that tweaking the clockspeeds further since with the 1000mV ceiling I highly doubt any BCM2712 is stable at 3000 MHz.

There's a way to go beyond 1V, though I haven't tried it yet, and am going to test a few different cooling / thermal control setups (try to get it chilled, then also try to hold temps at around 50 or 60°C to see if that fares better than cold).

Still interested in getting a plain sbc-bench result of an 8GB board with latest firmware/kernel at either 2400 or 3000 MHz to be able to compare with a 4GB board (other clockspeeds aside those are IMO nonsense).

I'll be doing it at 2400 soon, if not today then tomorrow—just have been busy today doing Monday stuff :)

@jonatron

jonatron commented Apr 1, 2024

I managed to get a couple of runs at 3300 MHz, the better one was https://browser.geekbench.com/v6/cpu/5559357 . Run 5556526 was at 3250 MHz . I haven't shared the voltage limit remover publicly because I have no idea at what point it'll blow up. I have an explosion containment pi fridge and geerlingguy said he wasn't concerned about fire or breaking chips. I can't iterate very fast with geekbench, I'll look at switching to something else to push further.

@geerlingguy

@ThomasKaiser - Default clocks Pi 5 https://sprunge.us/7JUWFT

It gave me one of those "Too much other background activity: 0% avg, 6% max" warnings, but it's literally running nothing else and I rebooted twice after running all updates. Not sure what's up there.

And I am concerned about breaking chips, but I set a budget for broken Pis per quarter, and so far I'm only up to 1 this quarter of the 2 I normally allocate :)

@ThomasKaiser

ThomasKaiser commented Apr 2, 2024

Default clocks Pi 5 https://sprunge.us/7JUWFT

Thank you! So let's compare 4GB and 8GB boards at exactly same software versions:

Bandwidth comparison:

tinymembench check                                    4GB           8GB
C copy backwards                              5651.4 MB/s   4879.5 MB/s
C copy backwards (32 byte blocks)             5656.1 MB/s   4911.7 MB/s
C copy backwards (64 byte blocks)             5658.1 MB/s   4894.7 MB/s
C copy                                        6155.2 MB/s   5557.4 MB/s
C copy prefetched (32 bytes step)             6136.2 MB/s   5522.4 MB/s
C copy prefetched (64 bytes step)             6140.3 MB/s   5511.5 MB/s
C 2-pass copy                                 1871.4 MB/s   1321.2 MB/s
C 2-pass copy prefetched (32 bytes step)      1717.1 MB/s   1206.2 MB/s
C 2-pass copy prefetched (64 bytes step)      1725.1 MB/s   1208.6 MB/s
C scan 8                                      1197.9 MB/s   1195.7 MB/s
C scan 16                                     2395.2 MB/s   2395.1 MB/s
C scan 32                                     4738.2 MB/s   4608.4 MB/s
C scan 64                                     9407.4 MB/s   9122.2 MB/s
C fill                                       14904.3 MB/s  14315.1 MB/s
C fill (shuffle within 16 byte blocks)       14892.7 MB/s  14329.6 MB/s
C fill (shuffle within 32 byte blocks)       14850.9 MB/s  14421.9 MB/s
C fill (shuffle within 64 byte blocks)       14888.4 MB/s  14324.8 MB/s
---
libc memcpy copy                              6119.4 MB/s   5509.1 MB/s
libc memchr scan                             14861.6 MB/s  14391.1 MB/s
libc memset fill                             14900.2 MB/s  14293.0 MB/s
---
NEON LDP/STP copy                             6118.6 MB/s   5504.6 MB/s
NEON LDP/STP copy pldl2strm (32 bytes step)   6114.2 MB/s   5504.4 MB/s
NEON LDP/STP copy pldl2strm (64 bytes step)   6121.8 MB/s   5332.0 MB/s
NEON LDP/STP copy pldl1keep (32 bytes step)   6110.6 MB/s   5507.6 MB/s
NEON LDP/STP copy pldl1keep (64 bytes step)   6110.2 MB/s   5504.5 MB/s
NEON LD1/ST1 copy                             6116.8 MB/s   5518.5 MB/s
NEON LDP load                                14874.8 MB/s  14371.9 MB/s
NEON LDNP load                               14828.6 MB/s  14369.4 MB/s
NEON STP fill                                14823.0 MB/s  14335.7 MB/s
NEON STNP fill                               14859.9 MB/s  14288.6 MB/s
ARM LDP/STP copy                              6129.2 MB/s   5509.0 MB/s
ARM LDP load                                 14872.4 MB/s  14367.5 MB/s
ARM LDNP load                                14803.6 MB/s  14325.0 MB/s
ARM STP fill                                 14890.4 MB/s  14416.5 MB/s
ARM STNP fill                                14917.6 MB/s  14404.7 MB/s
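
To summarise the bandwidth numbers, a small sketch that turns three rows copied from the comparison above into a percentage advantage for the 4GB board. The choice of rows is arbitrary and the function name is made up.

```python
# (4GB, 8GB) bandwidth in MB/s, copied from three rows of the comparison above.
rows = {
    "C 2-pass copy": (1871.4, 1321.2),
    "libc memcpy copy": (6119.4, 5509.1),
    "libc memset fill": (14900.2, 14293.0),
}


def advantage_percent(faster: float, slower: float) -> float:
    """How much higher the first value is than the second, in percent."""
    return (faster / slower - 1.0) * 100.0


for name, (gb4, gb8) in rows.items():
    print(f"{name}: 4GB ahead by {advantage_percent(gb4, gb8):.1f}%")
```

The gap is small for fills, around 11% for plain copies, and largest for the 2-pass copy.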

Latency (tinymembench):

4GB

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    0.1 ns          /     0.0 ns 
    131072 :    1.2 ns          /     1.5 ns 
    262144 :    2.0 ns          /     2.0 ns 
    524288 :    2.2 ns          /     2.2 ns 
   1048576 :    9.7 ns          /    11.2 ns 
   2097152 :   15.7 ns          /    15.7 ns 
   4194304 :   50.1 ns          /    75.6 ns 
   8388608 :   79.4 ns          /   107.3 ns 
  16777216 :   92.6 ns          /   116.7 ns 
  33554432 :  102.1 ns          /   123.4 ns 
  67108864 :  106.6 ns          /   127.6 ns 

8GB

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    0.1 ns          /     0.0 ns 
    131072 :    1.2 ns          /     1.5 ns 
    262144 :    2.0 ns          /     2.0 ns 
    524288 :    3.0 ns          /     2.9 ns 
   1048576 :    9.8 ns          /    11.2 ns 
   2097152 :   17.9 ns          /    20.0 ns 
   4194304 :   52.5 ns          /    79.9 ns 
   8388608 :   83.6 ns          /   113.3 ns 
  16777216 :   98.2 ns          /   125.2 ns 
  33554432 :  107.7 ns          /   131.8 ns 
  67108864 :  113.0 ns          /   135.7 ns 

Latency (ramlat):

4GB

   size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
     4k: 1.679 1.669 1.667 1.667 1.668 1.669 1.667 3.171 
     8k: 1.668 1.678 1.667 1.667 1.667 1.667 1.668 3.250 
    16k: 1.674 1.667 1.668 1.667 1.667 1.667 1.667 3.248 
    32k: 1.667 1.668 1.667 1.668 1.667 1.668 1.667 3.277 
    64k: 1.668 1.668 1.669 1.668 1.668 1.669 1.676 3.252 
   128k: 5.105 5.107 5.106 5.106 5.105 5.719 7.392 12.64 
   256k: 5.002 5.002 5.002 5.002 5.002 5.704 7.161 12.64 
   512k: 7.364 6.727 7.007 6.721 6.950 8.018 8.550 14.35 
  1024k: 18.45 17.96 18.34 18.01 17.85 18.64 21.25 30.06 
  2048k: 20.57 20.08 20.40 20.09 32.85 21.11 24.57 32.27 
  4096k: 73.70 65.95 71.84 65.79 73.22 72.47 85.67 109.3 
  8192k: 98.67 98.44 116.3 94.75 99.00 96.01 111.1 141.3 
 16384k: 109.3 109.4 108.9 109.5 110.6 111.0 129.8 128.4 
 32768k: 122.8 120.5 122.3 119.8 122.2 120.4 121.6 125.7 
 65536k: 125.9 133.8 126.1 124.2 126.1 124.1 124.9 127.9 
131072k: 127.8 126.4 127.9 126.3 136.1 126.3 126.7 128.8 

8GB

   size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
     4k: 1.692 1.673 1.670 1.671 1.668 1.667 1.670 3.175 
     8k: 1.667 1.667 1.667 1.670 1.668 1.669 1.668 3.249 
    16k: 1.679 1.667 1.667 2.009 1.667 1.670 1.670 3.249 
    32k: 1.671 1.669 1.670 1.669 1.668 1.667 1.670 3.282 
    64k: 1.668 1.668 1.668 1.669 1.669 1.668 1.668 3.251 
   128k: 5.001 5.002 5.001 5.002 5.001 5.580 7.368 12.64 
   256k: 5.015 5.012 5.015 5.009 5.004 5.550 7.158 12.65 
   512k: 7.334 8.822 7.210 8.827 7.748 10.05 9.814 15.58 
  1024k: 17.45 16.88 17.03 16.87 17.28 17.67 19.65 29.05 
  2048k: 26.89 25.64 26.44 27.45 36.09 26.03 28.38 36.92 
  4096k: 75.73 70.14 78.52 69.06 77.10 75.65 88.55 112.6 
  8192k: 104.3 123.6 106.7 101.1 107.2 103.7 116.0 167.0 
 16384k: 115.5 113.4 115.6 111.9 115.6 131.0 128.5 146.3 
 32768k: 130.3 128.1 129.2 127.7 129.9 127.1 129.7 138.8 
 65536k: 142.3 130.3 132.1 130.3 131.8 130.4 131.9 137.5 
131072k: 133.8 132.4 143.8 132.5 133.8 132.2 133.0 134.1 

It gave me one of those "Too much other background activity: 0% avg, 6% max" warnings, but it's literally running nothing else

Well, it is running something else in the background: with only single-threaded benchmarks on an otherwise idle system, %cpu would not exceed 25%, but this is what your setup reports:

Time        fake/real   load %cpu %sys %usr %nice %io %irq   Temp    VCore    PMIC   DC(V)
17:39:04: 2400/2400MHz  1.02  29%   0%  28%   0%   0%   0%  50.7°C  0.8945V   4.0W   5.09V 
17:39:51: 2400/2400MHz  1.01  26%   0%  24%   0%   0%   0%  49.0°C  0.8945V   2.9W   5.08V 
17:42:20: 2400/2400MHz  1.01  31%   1%  24%   0%   5%   0%  49.6°C  0.8937V   3.4W   5.09V 

...and most probably that's the reason why you can't get the highest Geekbench scores with the official Bookworm image (this applies to both single and multi scores, since with the former there's a cpu0 bottleneck)
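
To make the 25% argument concrete: on a four-core Pi 5, one fully busy thread should account for at most about 100/4 = 25% of total CPU. The sketch below parses monitoring rows in the format pasted above and flags samples over that ceiling. The function names and the assumed column order (time, clocks, load, %cpu) are mine, inferred only from the pasted output.

```python
import re

# Assumed row layout: "HH:MM:SS: <fake>/<real>MHz <load> <cpu>% ..."
ROW = re.compile(r"^(\d{2}:\d{2}:\d{2}):\s+\S+MHz\s+([\d.]+)\s+(\d+)%")

def single_thread_ceiling(cores: int) -> float:
    """Highest total %cpu one fully busy thread can cause on `cores` cores."""
    return 100.0 / cores

def flag_busy_samples(lines, cores=4):
    """Return (time, %cpu) for rows whose %cpu exceeds the single-thread ceiling."""
    ceiling = single_thread_ceiling(cores)
    flagged = []
    for line in lines:
        m = ROW.match(line.strip())
        if m and int(m.group(3)) > ceiling:
            flagged.append((m.group(1), int(m.group(3))))
    return flagged

sample = [
    "17:39:04: 2400/2400MHz  1.02  29%   0%  28%   0%   0%   0%  50.7°C  0.8945V   4.0W   5.09V",
    "17:39:51: 2400/2400MHz  1.01  26%   0%  24%   0%   0%   0%  49.0°C  0.8945V   2.9W   5.08V",
    "17:42:20: 2400/2400MHz  1.01  31%   1%  24%   0%   5%   0%  49.6°C  0.8937V   3.4W   5.09V",
]
print(flag_busy_samples(sample))
```

All three pasted samples sit above 25%, which is consistent with something else consuming CPU time alongside the benchmark.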

timg236 added a commit to timg236/rpi-eeprom that referenced this issue Apr 5, 2024
…install and enable over-clocking to > 3GHz (latest)

* bootloader: clock_2712: Remove restriction on arm_freq <= 3000
  See: raspberrypi/firmware#1876
* arm_dt: Update max_current to match HAT value
* arm_dt: Remove unused legacy parameters (core_freq, arm_freq, uart0_clkrate and cache_line_size)
* Add support for custom CA cert for network install
    You need to specify
    HTTP_HOST=myhost.com
    HTTP_PATH=/path/to/files
    HTTP_CACERT_HASH=<hash>

    where <hash> is a sha256 hash of the der encoded ca certificate.
    CA cert is added using rpi-eeprom-config.
* Optimise Vbat current draw with charging disabled
* Display OTP boot status in UART log messages.
* Preliminary support for secure-boot OTP provisioning.
* Update PCIE DET_WAKE pinmux for D0 products
timg236 added a commit to raspberrypi/rpi-eeprom that referenced this issue Apr 5, 2024
…install and enable over-clocking to > 3GHz (latest)

* bootloader: clock_2712: Remove restriction on arm_freq <= 3000
  See: raspberrypi/firmware#1876
* arm_dt: Update max_current to match HAT value
* arm_dt: Remove unused legacy parameters (core_freq, arm_freq, uart0_clkrate and cache_line_size)
* Add support for custom CA cert for network install
    You need to specify
    HTTP_HOST=myhost.com
    HTTP_PATH=/path/to/files
    HTTP_CACERT_HASH=<hash>

    where <hash> is a sha256 hash of the der encoded ca certificate.
    CA cert is added using rpi-eeprom-config.
* Optimise Vbat current draw with charging disabled
* Display OTP boot status in UART log messages.
* Preliminary support for secure-boot OTP provisioning.
* Update PCIE DET_WAKE pinmux for D0 products
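
For the custom CA cert item above, the commit message only says that `<hash>` is a sha256 hash of the DER-encoded CA certificate. A minimal sketch of computing such a value follows. It assumes the hash is written as lowercase hex (the commit message does not state the encoding), and the helper names are hypothetical.

```python
import hashlib
import ssl

PEM_HEADER = b"-----BEGIN CERTIFICATE-----"

def der_sha256_hex(der_bytes: bytes) -> str:
    """SHA-256 of the raw DER bytes, as lowercase hex (encoding is an assumption)."""
    return hashlib.sha256(der_bytes).hexdigest()

def pem_to_der(pem_text: str) -> bytes:
    """Convert a single PEM certificate to DER using the standard library."""
    return ssl.PEM_cert_to_DER_cert(pem_text.strip())

def cacert_hash_from_file(path: str) -> str:
    """Hash a certificate file, accepting either PEM or DER input."""
    with open(path, "rb") as f:
        data = f.read()
    if data.lstrip().startswith(PEM_HEADER):
        data = pem_to_der(data.decode("ascii"))
    return der_sha256_hex(data)
```

Calling `cacert_hash_from_file("ca.pem")` (hypothetical file name) would give a candidate value for `HTTP_CACERT_HASH`; check the rpi-eeprom documentation for the exact format the bootloader expects.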
timg236 added a commit to timg236/rpi-eeprom that referenced this issue Apr 18, 2024
Interesting changes since the last automatic update:
* Enable network install
* Enable over-clocking frequencies > 3GHz
  See: https://github.com/raspberrypi/firmware/issues/1876
* Adjust SDRAM refresh rate according to temperature and address a performance
  gap between 4GB and 8GB parts in benchmarks.
  See: raspberrypi/firmware#1854
* Support custom CA certs with HTTPS boot
* Move non Kernel ARM stages back to 512KB
  raspberrypi/firmware#1868
* Assorted HAT+ and NVMe interop improvements.
* Fix TRYBOOT if secure-boot is enabled.
* Preliminary support for D0 and CM5.
timg236 added a commit to raspberrypi/rpi-eeprom that referenced this issue Apr 18, 2024
Interesting changes since the last automatic update:
* Enable network install
* Enable over-clocking frequencies > 3GHz
  See: https://github.com/raspberrypi/firmware/issues/1876
* Adjust SDRAM refresh rate according to temperature and address a performance
  gap between 4GB and 8GB parts in benchmarks.
  See: raspberrypi/firmware#1854
* Support custom CA certs with HTTPS boot
* Move non Kernel ARM stages back to 512KB
  raspberrypi/firmware#1868
* Assorted HAT+ and NVMe interop improvements.
* Fix TRYBOOT if secure-boot is enabled.
* Preliminary support for D0 and CM5.
@wtarreau

@youmukonpaku1337 - There are a few power supplies capable of supplying the clean 5V/5A the Pi 5 requires. Having a cable soldered straight into the power adapter reduces a tiny amount of insertion loss you get with any adapter with a detachable cable.

BTW Jeff, the real benefit of a fixed cable is that the feedback wires can be connected to the outer plug and measure the voltage at the USB connector (compensating for cable losses) instead of measuring it before the cable. This requires passing two extra thin wires (+ and -), and the power adapter needs to be properly designed so that it takes the reference ground from that thin wire instead of the internal ground.
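
A rough sketch of why sensing at the plug matters, using plain Ohm's law. The cable resistance and voltages are made-up illustrative numbers, not measurements of any real adapter.

```python
def voltage_at_load(v_source: float, current_a: float, cable_ohm: float) -> float:
    """Voltage left at the device when the supply regulates before the cable."""
    return v_source - current_a * cable_ohm

def source_voltage_with_remote_sense(v_target: float, current_a: float, cable_ohm: float) -> float:
    """Voltage the supply must output so the sensed point (the plug) sees v_target."""
    return v_target + current_a * cable_ohm

# Hypothetical values: 5.1 V supply, 5 A load, 50 milliohm round-trip cable resistance.
local = voltage_at_load(5.1, 5.0, 0.05)
remote = source_voltage_with_remote_sense(5.1, 5.0, 0.05)
print(f"local sense: {local:.2f} V at the plug")
print(f"remote sense: supply outputs {remote:.2f} V to hold 5.10 V at the plug")
```

With local sensing the drop scales with current, so the Pi sees the lowest voltage exactly when it draws the most; remote sensing moves that error into the supply's output instead.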

@wtarreau

There's a way to go beyond 1V, though I haven't tried it yet, and am going to test a few different cooling / thermal control setups (try to get it chilled, then also try to hold temps at around 50 or 60°C to see if that fares better than cold).

In the old days of Cyrix 6x86, the right way to offset the regulator voltage was to draw with a pencil on one of the resistors of the feedback divider. That would deposit some graphite that lowered the resistor value. It very likely continues to work, provided that you find either the divider that sets the reference voltage or find the feedback divider. If there's a direct measurement, graphite between the feedback pin and GND may help but it's tricky since you don't know how much is needed, and it's easy to fry everything.
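
To illustrate the pencil trick numerically: for a generic regulator with a resistive feedback divider, Vout = Vref * (1 + R_top / R_bottom), and graphite across the bottom resistor (feedback pin to GND) acts as a parallel resistance that lowers R_bottom and so raises Vout. This is a sketch with invented component values; nothing here is known about the Pi 5 PMIC, which may not use this topology at all.

```python
def parallel(r1: float, r2: float) -> float:
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

def regulator_vout(v_ref: float, r_top: float, r_bottom: float) -> float:
    """Output of a generic regulator with a feedback divider (assumed topology)."""
    return v_ref * (1.0 + r_top / r_bottom)

# Hypothetical values, for illustration only.
V_REF = 0.6          # volts
R_TOP = 10_000.0     # ohms, output to feedback pin
R_BOTTOM = 20_000.0  # ohms, feedback pin to GND
R_GRAPHITE = 200_000.0  # ohms, the pencil trace across R_BOTTOM

before = regulator_vout(V_REF, R_TOP, R_BOTTOM)
after = regulator_vout(V_REF, R_TOP, parallel(R_BOTTOM, R_GRAPHITE))
print(f"before: {before:.3f} V, after: {after:.3f} V")
```

The difficulty mentioned above shows up in the numbers: the graphite resistance is unknown and hard to control, so the resulting offset is too.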

@geerlingguy

Btw, for anyone following this issue, @jonatron posted a nice summary of his testing up to 3.3 GHz, hitting a voltage of 1.0437V at a clock of 3300034816 Hz: Beating Jeff's 3.14 Ghz Raspberry Pi 5
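
For reading figures like 3300034816, here is a small sketch that converts the raw value to GHz. It assumes `vcgencmd measure_clock arm` prints a line of the form `frequency(N)=<Hz>`; the function names are mine.

```python
import re

def parse_measure_clock(output: str) -> int:
    """Extract the frequency in Hz from text like 'frequency(0)=3300034816'."""
    m = re.search(r"frequency\(\d+\)=(\d+)", output)
    if m is None:
        raise ValueError(f"unexpected measure_clock output: {output!r}")
    return int(m.group(1))

def hz_to_ghz(hz: int) -> float:
    """Convert Hz to GHz."""
    return hz / 1e9

hz = parse_measure_clock("frequency(0)=3300034816")
print(f"{hz_to_ghz(hz):.3f} GHz")
```

Comparing this value against the `arm_freq` setting in config.txt is the check the original report describes.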

@laur2010

hello can you overclock your raspberry pi 4?

@JamesH65
Contributor

hello can you overclock your raspberry pi 4?

Please use the forum for this sort of question. forums.raspberrypi.com

@geerlingguy

FYI I've been able to run stable beyond 1.1V... though running into some stability issues at 1.2V. https://browser.geekbench.com/v6/cpu/7058700 - more to come :)

@geerlingguy

geerlingguy commented Jul 31, 2024

Here's a blog post and a video going through the overclock. I published a GitHub repo, pi-overvolt, collaborating with jonatron on the code.

And in the course of today, I also found out SkatterBencher has an even more in-depth OC/overvolt guide!

Next question, of course: is there any way to hack the PMIC to allow even greater voltages? ;)

@ThomasKaiser

https://browser.geekbench.com/v6/cpu/7058700 - more to come :)

Quite impressive... since when looking at https://browser.geekbench.com/v6/cpu/search?dir=desc&q=Raspberry+Pi+5+Model+B&sort=multicore_score it seems the most important part of you getting better GB6 scores was adopting the 'NUMA emulation' patch.

In a quick conversation with the patch author I mentioned the 'usual' GB6 behaviour (on RPi 5) of producing lower scores directly after boot but higher ones after e.g. a sleep 1200. He wrote 'You can try my series and see if removes this sleep 1200 variance when run with numactl --interleave.'

Since I kinda lost interest in this GB6 score tweaking nonsense I didn't give it a try so the question is: did you see this variance with the NUMA emulation patch?

@wtarreau

Also keep in mind that the memory bus is only 32-bit wide, so scores (especially multi-threaded ones) will not continue to grow indefinitely with CPU frequency as memory is a bottleneck for many multi-threaded/multi-proc workloads.
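
A back-of-the-envelope sketch of that ceiling: theoretical peak bandwidth is the transfer rate times the bus width in bytes, and it does not change with CPU clock. The 4267 MT/s figure is my assumption (the commonly quoted LPDDR4X speed for the Pi 5), not something stated in this thread.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

ASSUMED_MT_S = 4267  # assumption, see the note above

print(f"32-bit bus: {peak_bandwidth_gb_s(ASSUMED_MT_S, 32):.1f} GB/s")
print(f"64-bit bus: {peak_bandwidth_gb_s(ASSUMED_MT_S, 64):.1f} GB/s (for comparison)")
```

Real sustained bandwidth is lower than this peak, but the point stands: four cores share the same fixed budget no matter how high arm_freq goes.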

@geerlingguy

geerlingguy commented Jul 31, 2024

@ThomasKaiser - I tried replicating what you noticed with and without the NUMA patch (firmware and OS both at latest release), and couldn't. I ran one geekbench6 run immediately after reboot, then ran four more at 5-minute intervals; they were all within about 1% of each other. (These runs were about a week ago.)

@popcornmix
Contributor

Yeah - I also fact checked that comment. Let me find the email:

    Here are some results (with a 20min sleep between each run)

      Score              773/1509
      Score              775/1517
      Score              774/1517
      Score              776/1511
      Score              776/1512
      Score              773/1510
      Score              772/1513
      Score              773/1508

    And this morning, after an uptime of 21 hours:
       Score              774/1508

    So, no, I'd say waiting (without doing some specific things) does not make the geekbench score go up.
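
To put a number on "does not go up", the scores quoted above can be summarised as a spread relative to the mean. This is only a sketch over the nine pasted results; the spread metric (max minus min, as a percentage of the mean) is my choice.

```python
from statistics import mean

# Single/multi scores copied from the runs listed above (eight runs plus the 21-hour one).
single = [773, 775, 774, 776, 776, 773, 772, 773, 774]
multi = [1509, 1517, 1517, 1511, 1512, 1510, 1513, 1508, 1508]

def spread_percent(scores):
    """(max - min) as a percentage of the mean score."""
    return (max(scores) - min(scores)) / mean(scores) * 100.0

print(f"single-core spread: {spread_percent(single):.2f}%")
print(f"multi-core spread:  {spread_percent(multi):.2f}%")
```

Both spreads come out well under 1%, in line with the run-to-run variation reported earlier in the thread.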

@Admin-sator

rpi-eeprom-recovery.zip

I've removed the 3GHz limit, and attached a zip file (you can flash it to an sdcard with rpi-imager) you can test.

Make sure you have no critical (unbacked up) data on the Pi you are testing. Let me know if you succeed in going above 3GHz.

I could boot at 3.1GHz (and vcgencmd measure_clock arm confirmed that) but my Pi would crash when stressed.

bootloader not installing
