Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

latest firmware version breaks boot on Raspberry Pi 3 (and others) with upstream Linux when no HDMI is connected #1726

Closed
stapelberg opened this issue May 8, 2022 · 13 comments

Comments

@stapelberg
Copy link

Describe the bug

Commit 949c898 broke booting on my Raspberry Pi 3B, Pi 3B+ and Pi Zero 2 W. It seems to not have affected my Pi 4B.

The last line when booting Linux 5.17.5 (latest upstream, not the Raspberry Pi foundation version) is:

[    2.373010] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops)

The commit description mentions HDMI clocks, so I think that’s related.

Note that I’m running my Raspberry Pis without an HDMI display connected. As soon as I connect an HDMI display, the device boots again.

@popcornmix
Copy link
Contributor

I'm trying to reproduce. I have confirmed that using rpi-update (i.e. firmware you indicate with rpi 5.15.38-v7+ kernel) does boot okay on a Pi3+ with hdmi disconnected.

I'll see if I can test with upstream kernel.

@popcornmix
Copy link
Contributor

I managed to get upstream kernel working, but it booted okay for me without hdmi connected:

pi@a:~ $ uname -a
Linux a 5.17.6-multi #606 SMP Mon May 9 19:55:57 BST 2022 armv7l GNU/Linux

This is with latest rpi-update firmware.

@stapelberg
Copy link
Author

Weird. Are you using the upstream DTBs, too?

I just double-checked on my Raspberry Pi 3 B+, and I upgraded to 5.17.6, just like you’re using, but I still see the same symptom: with HDMI cable connected, it boots, without HDMI cable connected, it hangs.

You can find the full disk image I’m using at https://t.zekjur.net/2022-05-09-raspberry-pi-boot-hang.img.bz2 (57 MB). You can find my kernel / dtb in the first partition (losetup -Pf 2022-05-09-raspberry-pi-boot-hang.img)

@popcornmix
Copy link
Contributor

Yes I was using upstream dtbs.

Can you try with this patch added?
https://www.spinics.net/lists/arm-kernel/msg975291.html

I think currently firmware and upstream kernel disagree with the number of clocks (which is passed through a mailbox call), so undefined behaviour is possible.

@stapelberg
Copy link
Author

Thanks for the tip!

I applied the following patches:

  1. torvalds/linux@12c90f3
  2. torvalds/linux@542acfe
  3. torvalds/linux@e9d6cea
  4. https://www.spinics.net/lists/arm-kernel/msg975291.html

And that indeed makes the new firmware version work.

Is there any chance you can fix this on the firmware side until the above changes are in a released Linux kernel version?

@popcornmix
Copy link
Contributor

Is there any chance you can fix this on the firmware side until the above changes are in a released Linux kernel version?

No, I don't think so. I don't believe it was broken by the firmware version you identified.
(it still worked for me with same kernel version, but that's what you get with undefined behaviour.)
There are two issues with the API the kernel uses for querying clocks:

  1. The firmware and kernel must agree on the number of clocks
  2. There must be a zero in the location following the clocks structure

There's some more info here: #1688

A fix for 2 is suggested in the linked issue (but it doesn't fix 1).
Now for 1, I think it's happy if the kernel has at least as many clocks.

So the only fix I can see for this is on the kernel side.
I'm thinking if the arm sets RPI_FIRMWARE_NUM_CLK_ID to a number sure to be bigger than the firmware uses, and zeros the array before passing it to firmware, the firmware will only populate the actual number of clocks and the kernel can see the zero at the end of the clock to determine the number of clocks.

A cleaner alternative is to change the API. Either a two step get_number_of_clocks, followed by get clocks data.
Or a get_first_clock, get_next_clock API that indicates when it reaches the end of the array.
But both of these also require a kernel side change.

While typing this, I believe firmware has access to the buffersize. In your case where I believe buffersize is N-1 clocks (when firmware wants to fill in N) we could do something very hacky. Fill in N-2 clocks and then set last entry to zero as wanted by kernel code. It does mean your kernel won't have access to CLOCK_PIXEL_BVB and CLOCK_VEC but I'm not sure if upstream currently makes use of that.

Just truncating at the size given is less hacky, but not guaranteed to fix the issue. It depends on whether there is a convenient 0 following the block - but we have been getting away with that for quite some time, so it may work. I'll test when I have a chance.

stapelberg added a commit to gokrazy/kernel that referenced this issue May 12, 2022
@stapelberg
Copy link
Author

Thanks for the details. I’ll carry the kernel patches for now.

@popcornmix
Copy link
Contributor

There is a firmware update that I believe resolves this issue. Are you able to test it seems okay for you?

@stapelberg
Copy link
Author

Can confirm: with the current kernel and firmware, the issue is still reproducible. With the firmware update you linked to, the issue is fixed.

Thanks for resolving this!

popcornmix added a commit that referenced this issue May 19, 2022
…y have been boosted

See: #1726

firmware: arm_dt: Try upstream DTB files if downstream absent

firmware: arm_loader: Delay the USB controller switchover

firmware: Fix for vc_image YUYV family to YUV422 planar conversion function
popcornmix added a commit that referenced this issue May 19, 2022
…y have been boosted

See: #1726

firmware: arm_dt: Try upstream DTB files if downstream absent

firmware: arm_loader: Delay the USB controller switchover

firmware: Fix for vc_image YUYV family to YUV422 planar conversion function
popcornmix added a commit to raspberrypi/rpi-firmware that referenced this issue May 19, 2022
…y have been boosted

See: raspberrypi/firmware#1726

firmware: arm_dt: Try upstream DTB files if downstream absent

firmware: arm_loader: Delay the USB controller switchover

firmware: Fix for vc_image YUYV family to YUV422 planar conversion function
@popcornmix
Copy link
Contributor

The test firmware is now in rpi-update repo.

@colemickens
Copy link

@popcornmix Is there a tag that corresponds to "test firmware" "in rpi-update"?

A branch would be super helpful as it would just work with my update/testing automation.

@pelwell
Copy link
Contributor

pelwell commented May 19, 2022

The default branch ("master") is essentially a "test" branch - only the "stable" branch is likely to have had any significant usage.

@timg236
Copy link

timg236 commented Nov 28, 2022

It sounds like the issue was resolved. Closing as fixed.

@timg236 timg236 closed this as completed Nov 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants