bcm2835-power: Timeout waiting for grafx power OK #3046

Open
lategoodbye opened this issue Jul 3, 2019 · 34 comments

Comments

@lategoodbye
Contributor

lategoodbye commented Jul 3, 2019

Describe the bug
Starting with Linux 5.1 there is a new power driver for BCM2835. The idea behind it is to have better control over the V3D power domain. After the rollout I was informed that some RPi boards (currently a handful) have issues when enabling the V3D power domain. The ramp-up runs into a timeout (20 us) because we never get PM_POWOK. I don't have a clue what causes this issue (timing, hardware tolerance, ...). Currently I don't have an affected board.

To reproduce
Start the RPi with mainline kernel 5.1.

Expected behaviour
bcm2835-power succeeds in enabling the V3D power domain

Actual behaviour
bcm2835-power fails to enable the V3D power domain because PM_POWOK stays off

System

  • Which models of Raspberry Pi?
    RPI 2, RPI 3B and RPI 3B+
  • Which firmware version (vcgencmd version)?
    2019-02-12, 2019-03-27
  • Which kernel version (uname -a)?
    Mainline Kernel / DTB 5.1

Logs

[   13.913771] bcm2835-power bcm2835-power: Timeout waiting for grafx power OK
[   13.918555] bcm2835-power bcm2835-power: Timeout waiting for grafx power OK
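
For anyone checking whether a board is affected, the log lines above can be picked out of a saved kernel log with a small script. This is only a sketch: the regular expression is derived from the two lines quoted here, and the function name `find_power_timeouts` is made up for illustration.

```python
import re

# Pattern built from the message quoted in this issue; the bracketed
# timestamp follows the usual dmesg format.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+bcm2835-power\b.*"
    r"Timeout waiting for (?P<domain>\w+) power OK"
)


def find_power_timeouts(log_text):
    """Return (timestamp_seconds, domain) for each bcm2835-power timeout line."""
    hits = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((float(match.group("ts")), match.group("domain")))
    return hits


# The two lines from the report above, used as sample input.
SAMPLE = """\
[   13.913771] bcm2835-power bcm2835-power: Timeout waiting for grafx power OK
[   13.918555] bcm2835-power bcm2835-power: Timeout waiting for grafx power OK
"""

if __name__ == "__main__":
    for ts, domain in find_power_timeouts(SAMPLE):
        print(f"{ts:.6f}s: {domain} power domain timed out")
```

To use it on a real board, the output of `dmesg` can be saved to a file and its contents passed in as `log_text`.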

More info:
anholt#153


@warpme
Contributor

warpme commented Jul 14, 2019

Just another data-point: I built https://github.com/raspberrypi/linux/tree/rpi-5.2.y and I'm getting bcm2835-power: Timeout waiting for grafx power OK on my rpi2-b.

@redchenjs
Contributor

redchenjs commented Jul 23, 2019

model:

RPI 3B

firmware version:

2019-07-15 17:34

kernel version:

5.2.2-1-ARCH #1 SMP Sun Jul 21 19:53:44 UTC 2019 aarch64 GNU/Linux

kernel logs:

[    6.514813] bcm2835-power bcm2835-power: Timeout waiting for grafx power OK
[    6.524622] bcm2835-power bcm2835-power: Timeout waiting for grafx power OK

The VC4 driver was loaded but no GPU hardware was detected.

@xnorbt

xnorbt commented Aug 29, 2019

I'm getting the same issue with RPi 3B+, Arch Linux aarch64, Kernel 5.2.10-1-ARCH.
No GPU hardware detected and dmesg shows
bcm2835-power: Timeout waiting for grafx power OK.

However, I have several Pi 3B+ and it is NOT happening on all of them (using the same SD card with the same image). Some of them detect the VC4 GPU during boot just fine.

With the other boards, it appears to be temperature related. When the board is at room temperature (having been unpowered for some time) the GPU is detected normally, also over a couple of reboots. But after some minutes, when the temperature rises above about 50 °C, the GPU is no longer detected on reboot and the bcm2835-power log message appears.

Maybe that additional piece of information helps to track down the issue.

@lategoodbye
Contributor Author

Thanks for your report. I built the mainline kernel 5.3-rc6 with multi_v7_defconfig (Raspbian rootfs) for my RPi 3B+. Then I caused enough load to reach ~ 54 °C (no cpufreq enabled) and triggered a reboot. "Unfortunately" I wasn't able to reproduce the timeout.

@xnorbt

xnorbt commented Aug 30, 2019

Thanks for looking into it. I have seven Pi3B+ boards and I am currently testing them all under the same conditions to see how many of them are affected (so far 2 out of 4 fail when warm, fully reproducible; the others never fail). Maybe some chips are more 'sensitive' to the power-up ramp than others. Could changing the current ramp (lower initial, lower step size, more time between steps) help? I'd try playing around with bcm2835-power.c but I have no experience integrating a custom kernel for the RPi and don't know if it is as simple as 'replace the ARCH kernel with the selfmade one'.
Let me know if I can do any useful tests with the affected boards.

@xnorbt

xnorbt commented Aug 30, 2019

One update: I started building the (mainline) kernel using your defconfig (arm64/configs/defconfig). I interrupted it when I realized that it is going to take some time... I'll do it at home overnight ;-).
But: I started the compilation on one of the boards which were not affected. Then, during the compile, the temperature rose to 65°C, and I rebooted -> no VC4 and the bcm2835-power timeout occurred.
After cooling down back to 50°C the GPU was again recognized normally during several reboots.

So it is definitely a matter of temperature, but the cut between good and bad varies from device to device. Maybe you can stress your board to higher temperatures and see if the timeout appears as well.
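
To make such temperature observations comparable between boards, the SoC temperature could be recorded right before each reboot. The sketch below assumes the sensor is exposed through the generic Linux thermal sysfs interface as `thermal_zone0`, which reports millidegrees Celsius; the zone index and the helper names are assumptions, not something confirmed in this thread.

```python
from pathlib import Path

# Assumption: the SoC sensor is thermal_zone0 of the generic Linux thermal
# sysfs interface; the zone index may differ depending on kernel and config.
THERMAL_ZONE_PATH = Path("/sys/class/thermal/thermal_zone0/temp")


def millidegrees_to_celsius(raw):
    """Convert a sysfs temperature string (millidegrees Celsius) to degrees."""
    return int(raw.strip()) / 1000.0


def read_soc_temperature(path=THERMAL_ZONE_PATH):
    """Return the SoC temperature in degrees Celsius, or None if unavailable."""
    try:
        return millidegrees_to_celsius(Path(path).read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    temp = read_soc_temperature()
    if temp is None:
        print("temperature sensor not available")
    else:
        print(f"SoC temperature before reboot: {temp:.1f} °C")
```

Logging this value together with whether the timeout message shows up after the reboot would give a rough per-board threshold.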

@vianpl
Contributor

vianpl commented Aug 30, 2019

FYI I'm seeing the timeouts on my RPI3b+ with 5.3.0-rc4. Can't really say whether it's temperature related as it always fails. I can run some debugging if needed.

@lategoodbye
Contributor Author

lategoodbye commented Aug 30, 2019

After enabling the mainline cpufreq driver I'm seeing the timeouts, too.

@vianpl
Contributor

vianpl commented Aug 30, 2019

IIRC The main functional difference between the downstream cpufreq driver and upstream is that we're disabling turbo mode when changing the clocks.

What about no cpufreq and setting arm's clock @ 1.2GHz in config.txt?

@lategoodbye
Contributor Author

I don't think there is an issue with the cpufreq driver. Since my default governor is ondemand, this causes much more CPU stress during boot.

I will try to test your suggestion.

@lategoodbye
Contributor Author

My test results:
arm_freq=1200, no cpufreq => no timeout
force_turbo=1, no cpufreq => timeout

@lategoodbye
Contributor Author

@popcornmix Any idea how to analyze this further? Without documentation I don't have a clue what's going on in the new bcm2835 pm driver.

@lategoodbye
Contributor Author

lategoodbye commented Aug 31, 2019

I made a register dump of the PM addresses for the following cases:

  1. Linux 5.3 without e1dc2b2 (this should be similar to pre Linux 5.1)
  2. Linux 5.3 with e1dc2b2 (this should be similar to Linux 5.1 or newer), without the timeout occurring

Comparing both dumps showed only 1 difference:

  1. PM_RSTS (Addr 0x3F100020) = 0x00001000
  2. PM_RSTS (Addr 0x3F100020) = 0x00000000

Note: without e1dc2b2 and with force_turbo enabled, I'm not able to reproduce the timeout

@anholt Is this expected?

@popcornmix
Collaborator

@lategoodbye the difference in PM_RSTS registers is just:

Bit 12 | HADPOR | Had a power-on reset

so I guess first was captured after a power cycle, and second after a sudo reboot
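
The comparison of the two dumps can be reproduced with a few lines of Python. This is a minimal sketch that only uses the facts given here (bit 12 of PM_RSTS is HADPOR, and the two dumped values); the function names are hypothetical.

```python
# Bit position taken from the comment above: bit 12 of PM_RSTS is HADPOR.
HADPOR_BIT = 12


def had_power_on_reset(pm_rsts):
    """Return True if the HADPOR bit (bit 12) is set in a PM_RSTS value."""
    return bool((pm_rsts >> HADPOR_BIT) & 1)


def differing_bits(a, b):
    """Return the bit positions at which two 32-bit register values differ."""
    diff = (a ^ b) & 0xFFFFFFFF
    return [bit for bit in range(32) if (diff >> bit) & 1]


if __name__ == "__main__":
    dump_1 = 0x00001000  # PM_RSTS from case 1 in the register dump
    dump_2 = 0x00000000  # PM_RSTS from case 2 in the register dump
    print("differing bits:", differing_bits(dump_1, dump_2))
    print("case 1 HADPOR:", had_power_on_reset(dump_1))
    print("case 2 HADPOR:", had_power_on_reset(dump_2))
```

The only differing bit is 12, which matches the explanation that one dump was taken after a power cycle and the other after a warm reboot.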

@lategoodbye
Contributor Author

Okay, thanks. So the difference is unrelated.

I will wait for suggestions to narrow down this issue until the release of Linux 5.4-rc1. After that I will revert e1dc2b2 according to the no-regression policy.

op-mirror pushed a commit to OpenPhoenux/gta04-kernel that referenced this issue Sep 11, 2019
goldelico added a commit to goldelico/letux-kernel that referenced this issue Sep 11, 2019
@satmandu
Contributor

For what it is worth I am seeing this error pop up multiple times with 5.3.0 on a 3b+ running arm64/ubuntu using a mainline kernel from here: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3/
with config.txt using this dtb: device_tree=bcm2837-rpi-3-b-plus.dtb
dmesg:
https://paste.ubuntu.com/p/sKT7KyJdSc/

I'm noticing that a warm reboot using sudo reboot fails (or is very very very delayed), but power cycling allows the device to come up just fine.

(My setup is currently headless, so I'm not seeing what comes up on the screen when this situation arises.)

It seems this might be connected? (Or I can open another issue if it seems unconnected.)

@pelwell
Contributor

pelwell commented Sep 16, 2019

The error message is the same, and the fact that upstream code shows the same issue is a useful data point.

By the way, you should be able to replace device_tree=bcm2837-rpi-3-b-plus.dtb with the more general upstream_kernel=1.

ffainelli pushed a commit to Broadcom/stblinux that referenced this issue Sep 23, 2019
…of firmware."

Since release of the new BCM2835 PM driver there has been several reports
of V3D probing issues. This is caused by timeouts during powering-up the
GRAFX PM domain:

  bcm2835-power: Timeout waiting for grafx power OK

I was able to reproduce this reliable on my Raspberry Pi 3B+ after setting
force_turbo=1 in the firmware configuration. Since there are no issues
using the firmware PM driver with the same setup, there must be an issue
in the BCM2835 PM driver.

Unfortunately there hasn't been much progress in identifying the root cause
since June (mostly in the lack of documentation), so i decided to switch
back until the issue in the BCM2835 PM driver is fixed.

Link: raspberrypi/linux#3046
Fixes: e1dc2b2 (" ARM: bcm283x: Switch V3D over to using the PM driver instead of firmware.")
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
starnight pushed a commit to endlessm/linux that referenced this issue Sep 25, 2019
ffainelli pushed a commit to Broadcom/stblinux that referenced this issue Sep 30, 2019
@lategoodbye
Contributor Author

Yesterday, I tested the revert against current mainline Linux 5.4 + Raspbian Buster with a Raspberry Pi 3 B+. Unfortunately X hangs completely during boot, so I asked Florian to drop this patch :-(

op-mirror pushed a commit to OpenPhoenux/gta04-kernel that referenced this issue Oct 21, 2019
op-mirror pushed a commit to OpenPhoenux/gta04-kernel that referenced this issue Oct 27, 2019
starnight added a commit to endlessm/linux that referenced this issue Nov 1, 2019
…instead of firmware.""

Because both upstream [1] and Raspbian downstream [2] kernels drops this
patch.
This reverts commit 655c3ca.

https://phabricator.endlessm.com/T28448

[1]: https://patchwork.kernel.org/patch/11136979/#22928901
[2]: raspberrypi/linux#3046 (comment)

Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
@lategoodbye
Contributor Author

Add these lines to the dts file, compile it, replace the dtb with the newly compiled one, then the gpu will start working.

&v3d {
	power-domains = <&power RPI_POWER_DOMAIN_V3D>;
};

bcm2837-rpi-3-b.dtb.zip

This was the reason behind the revert. But the revert causes a hang during boot of Raspbian, so I decided to drop the revert.

https://patchwork.kernel.org/patch/11136979/

@redchenjs
Contributor

It seems that without these reverts, the GPU will also work, so maybe these reverts cause the X hang?

diff --git a/arch/arm/boot/dts/bcm283x.dtsi b/arch/arm/boot/dts/bcm283x.dtsi
index 2d191fc..b238567 100644
--- a/arch/arm/boot/dts/bcm283x.dtsi
+++ b/arch/arm/boot/dts/bcm283x.dtsi
@@ -3,7 +3,6 @@ 
 #include <dt-bindings/clock/bcm2835-aux.h>
 #include <dt-bindings/gpio/gpio.h>
 #include <dt-bindings/interrupt-controller/irq.h>
-#include <dt-bindings/soc/bcm2835-pm.h>

 /* firmware-provided startup stubs live here, where the secondary CPUs are
  * spinning.
@@ -121,7 +120,7 @@ 
 			#interrupt-cells = <2>;
 		};

-		pm: watchdog@7e100000 {
+		watchdog@7e100000 {
 			compatible = "brcm,bcm2835-pm", "brcm,bcm2835-pm-wdt";
 			#power-domain-cells = <1>;
 			#reset-cells = <1>;
@@ -641,7 +640,6 @@ 
 			compatible = "brcm,bcm2835-v3d";
 			reg = <0x7ec00000 0x1000>;
 			interrupts = <1 10>;
-			power-domains = <&pm BCM2835_POWER_DOMAIN_GRAFX_V3D>;
 		};

 		vc4: gpu {

@redchenjs
Contributor

Add these lines to the dts file, compile it, replace the dtb with the newly compiled one, then the gpu will start working.

&v3d {
	power-domains = <&power RPI_POWER_DOMAIN_V3D>;
};

bcm2837-rpi-3-b.dtb.zip

@lategoodbye
Contributor Author

It seems that without these reverts, the GPU will also work, so maybe these reverts cause the X hang?

Devicetree changes usually don't cause hangs; it's more likely a driver issue. With your change you combine the "best" of both power drivers. Unfortunately it's unsafe to handle the same register ranges with two Linux drivers. Currently I only see two options:

  1. Revert most of the BCM2835 power series
  2. Port parts of firmware power driver into the BCM2835 power driver

starnight added a commit to endlessm/linux that referenced this issue Nov 13, 2019
goldelico added a commit to goldelico/letux-kernel that referenced this issue Nov 18, 2019
@sankayop

sankayop commented Nov 23, 2019

Add these lines to the dts file, compile it, replace the dtb with the newly compiled one, then the gpu will start working.

&v3d {
	power-domains = <&power RPI_POWER_DOMAIN_V3D>;
};

bcm2837-rpi-3-b.dtb.zip

This solves it for me.
I just replaced the old /boot/dtbs/broadcom/bcm2837-rpi-b.dtb with yours and it worked.
Thanks @redchenjs
(cfg: raspberry pi 3b + manjaro)

jprvita pushed a commit to endlessm/linux that referenced this issue Nov 27, 2019
jprvita pushed a commit to endlessm/linux that referenced this issue Dec 3, 2019
starnight added a commit to endlessm/linux that referenced this issue Dec 4, 2019
@maggu2810

maggu2810 commented Dec 17, 2019

A RPi3B+ of mine has not been used for a while.
Yesterday I started with a new project and I setup the RPi.

I used a new SD card and prepared it with Arch Linux ARM AArch64.
I ran into the problem reported here.

I turned the RPi off yesterday evening and turned it on this morning.
Same problem. As the RPi has been turned off for hours I don't think mine has been too hot on its first power up this morning.

So, if you need another board to get some diagnostic information, I can try to provide.

@pelwell
Contributor

pelwell commented Dec 17, 2019

The consensus above is that this is caused by an incompatibility in the upstream/mainline 3B+ DTB. Edit the source file as described by @sankayop above and rebuild it (or download the prebuilt version they link to) and try with that.

@maggu2810

Thank you for your reply, will give it a try later...

Will the specific change that seems to be applied to all the fedora kernel versions make it upstream?
I did not find it here https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/bcm2835-rpi.dtsi

@lategoodbye
Contributor Author

Currently for upstream i only see two "options":

  1. revert Eric's complete bcm2835-power series
  2. merge the working parts from raspberrypi-power into bcm2835-power

I'm not happy with either of them. @sankayop's patch will enable both power drivers for the same power domain. I consider this a path to hell ...

@pelwell
Contributor

pelwell commented Dec 17, 2019

@lategoodbye Do you have a preference between 1 and 2? Is there something we can do to help?

@lategoodbye
Contributor Author

Number 1 isn't a real option, because we need this driver for Raspberry Pi 4. Number 2 should be doable for downstream, but would more likely result in a merge of both drivers for upstream.

The best option would be to ask someone with deeper understanding of BCM2835 why the rampup causes these random timeouts (timing issue, missing requirements, wrong order of power domain handling) and fix the bcm2835 power driver.

@menteb

menteb commented Jan 18, 2020

Any updates on this?

@lategoodbye
Contributor Author

lategoodbye commented Apr 2, 2020

In the upstream kernel the suggested patch to revert has been applied. The hanging X issue was unrelated.
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20200402&id=e7b7daeb48e0bf5d8412d77f11069750ee7032bb

@CodingKoopa

When booting up my Raspberry Pi 2 Model B with Arch Linux ARM, it seems one of two things happens:

  • The system boots up with the proper resolution I have specified in my config.txt. When I run startx to start LXDE, the DE launches, then almost immediately hangs.
    • The timeout log message is not present in my system journal.
  • The system boots up with a lower resolution. When I run startx, LXDE functions as expected.
    • The timeout log message is present multiple times.
    • There seems to be no GPU acceleration present.

In my testing, it does seem that the first occurrence is more likely when the Pi is cooled down, rather than right after rebooting. Most of these are things that have already been pointed out, but I wanted to provide a test case for anyone else having the issue.

Is the issue with X hanging being tracked anywhere?

@lategoodbye
Contributor Author

Is the issue with X hanging being tracked anywhere?

Here is the accepted fix:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20200504&id=b1e7396a1d0e6af6806337fdaaa44098d6b3343c

@sixtyfive

Seems the Pi 3 A+ has the same problem :-/
