Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Firmware transaction timeout - raspistill hang #1552

Closed
do-dat-thang opened this issue Mar 16, 2021 · 53 comments
Closed

Firmware transaction timeout - raspistill hang #1552

do-dat-thang opened this issue Mar 16, 2021 · 53 comments

Comments

@do-dat-thang
Copy link

Is this the right place for my bug report?
I'm pretty sure.

Describe the bug
raspistill hangs on Firmware transaction timeout. After that point Xorg also hangs up. On this latest recreate, raspinfo is also hung when running after the problem to collect data.

To reproduce
Run raspistill repeatedly until the error occurs.

Expected behaviour
The kernel & firmware should remain stable at all times. Running a camera capture should always create an image and return or gracefully fail leaving the system stable.

Actual behaviour
Running raspistill repeatedly will eventually get the firmware transaction timeout.

System

raspinfo is hung after displaying the following:

$ raspinfo

System Information

Raspberry Pi 4 Model B Rev 1.2
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"

Raspberry Pi reference 2021-01-11
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 21090519d85bdaa1615d5d5057d37b09368ea5d2, stage5

Linux projector1 5.4.83-v7l+ #1379 SMP Mon Dec 14 13:11:54 GMT 2020 armv7l GNU/Linux
Revision : c03112
Serial : 10000000e4a3851c
Model : Raspberry Pi 4 Model B Rev 1.2

I will continue to try to get the raspinfo or, failing that, will try to get the equivalent parts.

  • Which model of Raspberry Pi?

Pi4B, Pi3B+

  • Which OS and version (cat /etc/rpi-issue)?

Raspberry Pi reference 2021-01-11
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 21090519d85bdaa1615d5d5057d37b09368ea5d2, stage5

  • Which firmware version (vcgencmd version)? vcgencmd version hangs and cannot be interrupted by ^C
  • Which kernel version (uname -a)?

Linux projector1 5.4.83-v7l+ #1379 SMP Mon Dec 14 13:11:54 GMT 2020 armv7l GNU/Linux

and all subsequent levels.

Logs

dmesg.txt

Additional context
This recreate is the first time raspinfo hung.
I displaying to 2 HDMI monitors.
There are 2 USB to serial adapter cables connected but not in use during recreate.
This problem occurs in all levels following and including the Dec 14, 2020 commit.
I am running a C++ application which snaps 2 images every few seconds using these 2 commands for the 2 invocations (via system()) of raspistill:

raspistill -e jpg -w 2592 -h 1944 -t 1 -n -rot 180 -st -o /home/pi/tgt.jpg 2>&1
raspistill -e jpg -w 2592 -h 1944 -t 1 -n -rot 180 -ag 8.0 -dg 0.0 -st -o /home/pi/img.jpg 2>&1

When this problem occurs I have to either power cycle or run the following to reboot (sudo reboot seems to start rebooting, but never actually reboots):

echo s | sudo tee /proc/sysrq-trigger
echo u | sudo tee /proc/sysrq-trigger
echo b | sudo tee /proc/sysrq-trigger

No success with raspinfo - I'll recreate & see if I can collect & post the additional info.

@do-dat-thang
Copy link
Author

IDK why raspinfo hangs now (perhaps I never tried raspinfo after hitting this on 5.4.83 ??).
Attaching raspinfo from 5.1.17 from previous recreation at that level.

raspinfo.txt

@do-dat-thang
Copy link
Author

do-dat-thang commented Mar 16, 2021

On 5.4.83

vcdbg log msg

vcdbg_log_msg.txt

@feacluster
Copy link

Confirming I have the same issue using 5.4.83-v8+ and 5.10.17-v7l+ but no issue with 4.19.97-v7l+ .

@6by9
Copy link

6by9 commented Mar 19, 2021

Please try upgrading just the firmware and not the kernel.
sudo SKIP_KERNEL=1 rpi-update

Almost all of the camera stack is on the firmware side, so I'm expecting the issue to be there rather than in the kernel.

If you have a reliable way of reproducing this, then you can use sudo SKIP_KERNEL=1 rpi-update <hash> with commit hashes https://github.com/Hexxeh/rpi-firmware/commits/master to try and isolate when the issue first appeared.

@do-dat-thang
Copy link
Author

Thanks, @6x9, for the tip. I will put both kernel and firmware from 5.4.81 (problem not seen) on my rpi4 & then will do the above to update just firmware to 5.4.83 stepwise using your link, above. Is there any helpful debug information I can collect?

@do-dat-thang
Copy link
Author

Ok, did rpi-update c91312ae1df5011bd77826e49d2b914b6d0e9b08 to get back to 5.4.81. Then sudo SKIP_KERNEL=1 rpi-update fa9d0f5299d132240c94fe6e7065b9c53897d725 to get the 1st firmare update.

It appears to be running ok (no FW hang yet), but I ran raspinfo just now and was surprised to see some of info in the vcdbg log messages. I suspect the EDID error is because I didn't have my displays turned on when I started but some the stuff at the top I wouldn't expect(?). Attaching:

raspinfo.nohang-5_4_81-fa9d0f5299d132240c94fe6e7065b9c53897d725.txt

@do-dat-thang
Copy link
Author

Ran through these in order, letting them run for 1/2-1 hr:

sudo SKIP_KERNEL=1 rpi-update fa9d0f5299d132240c94fe6e7065b9c53897d725
sudo SKIP_KERNEL=1 rpi-update 942756073c6970afc17afb6a34ff217d9831e09f
sudo SKIP_KERNEL=1 rpi-update dea7234943c604462e476a8afc13c587418e8709
sudo SKIP_KERNEL=1 rpi-update 453e49bdd87325369b462b40e809d5f3187df21d

The last one (453e49bdd87325369b462b40e809d5f3187df21d) failed in 11 minutes.

Tried raspinfo but hangs after

$ raspinfo
System Information
------------------

Raspberry Pi 4 Model B Rev 1.2
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"

Raspberry Pi reference 2021-01-11
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 21090519d85bdaa1615d5d5057d37b09368ea5d2, stage5

Linux projector1 5.4.81-v7l+ #1378 SMP Mon Dec 7 18:43:09 GMT 2020 armv7l GNU/Linux
Revision        : c03112
Serial          : 10000000e4a3851c
Model           : Raspberry Pi 4 Model B Rev 1.2

dmesg_453e49bdd87325369b462b40e809d5f3187df21d.txt

@6by9
Copy link

6by9 commented Mar 20, 2021

Linking to the forum thread https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=304989

Thank you for testing.
I'm reading from that that you're using the HQ camera, even though your raspistill line is for a 5MPix capture.

453e49bdd87325369b462b40e809d5f3187df21d does have an update to the IMX477 HQ camera driver, but that should only be for the high framerate mode to avoid image quality issues. It shouldn't affect stills captures at all, but at least it's something to investigate.

@do-dat-thang
Copy link
Author

Interesting - I thought I have the V2 camera. Isn't that the 5MPix device? The PCB has Raspberry Pi Camera (B) Rev 2.0 silk-screened on it. Could the whole problem we're seeing be an issue with HW detection/recognition incorrectly identifying it?

RpiCam

@6by9
Copy link

6by9 commented Mar 20, 2021

That is not a camera module made by Raspberry Pi. It is a third party clone of the v1 module (Omnivision OV5647 5MPix).

The assert logs that had been provided on the forum thread imply that the camera module isn't shutting down cleanly, and is locking up the camera subsystem (and possibly other parts) on the GPU.
Kernel logs are largely useless on this issue. sudo vcdbg log msg, and preferably to be running with start_debug=1```` in config.txt and therefore sudo vcdbg log assert``` will tell us where the firmware has hit issues.

@do-dat-thang
Copy link
Author

so, put start_debug=1 into config.txt (assume at end of file in [all] section) and then reboot & run the vcdbg commands after recreate?

I will try that, but I believe the vcdbg commands hang...

@do-dat-thang
Copy link
Author

Nice! This time the problem hit really fast. And the vcdbg seems to have worked:

vcdbg_log_msg.txt
vcdbg_log_assert.txt

@do-dat-thang
Copy link
Author

I don't see any of the ../../../../../middleware/ISP/tuner/*.c files referenced in the vcdbg log assert in the Raspberry Pi repo. Are these available elsewhere or are they not open?

Are the "dump_stack failed" expected? Or are these also caused by this problem?

Anything else I can capture from the failure that would help debug?

@JamesH65
Copy link
Contributor

vcdbg passes back debug information from the videocore firmware, so all the debug printed is from that. And its all closed source.

@feacluster
Copy link

feacluster commented Mar 22, 2021

I too have the same issue and am using the V2 camera from Amazon.

image

@do-dat-thang
Copy link
Author

OK. Please let me know if there is anything else I can provide for debug or if you'd like me to try a potential fix/debug code.

@elzorrofxt
Copy link

elzorrofxt commented Mar 23, 2021

I'm the original poster on the

Linking to the forum thread https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=304989

Thank you for testing.
I'm reading from that that you're using the HQ camera, even though your raspistill line is for a 5MPix capture.

453e49bdd87325369b462b40e809d5f3187df21d does have an update to the IMX477 HQ camera driver, but that should only be for the high framerate mode to avoid image quality issues. It shouldn't affect stills captures at all, but at least it's something to investigate.

I'm the original poster (PiPiParty) of that thread, lemme know if there is any testing I can do, I've got the HQ camera.

@do-dat-thang
Copy link
Author

So... I'm wondering: I see multiple forum discussions and git issues on very similar problems to this and all those just stop with no resolution. With the firmware being a closed component, there is no way to assist in debug and fix. Is there any way to know if there is anyone is even working on this and/or any progress? I believe this is the right place to ask this - if not, can anyone here point me in the right direction?

I'm asking because we have many rpi's running our application which is dependent on networking as well as the camera. The wireless networking appears to be much more stable on the latest releases so we want to be on the latest. We however don't feel comfortable with a mismatch of kernel/firmware/everything else.

@naushir
Copy link

naushir commented Mar 26, 2021

I've been trying unsuccessfully to reproduce this for most of today. With a freshly run rpi-update and on both imx477 and imx219, I cannot get it to lockup if I run the following loop:

while true; do raspistill -t 1000 -o test.jpg; sleep 1; done

Edit: Perhaps I ought to remove the sleep from the loop. And without the sleep, it has failed!

@naushir
Copy link

naushir commented Mar 26, 2021

Last line of vcdbg log msg:

4255553.603: mmal: fail_destroy: Timed out waiting to destroy ril.camera

I see no asserts or exceptions on the VC side.

dmesg error stack:

[ 4241.692259] ------------[ cut here ]------------
[ 4241.692297] WARNING: CPU: 0 PID: 12711 at drivers/firmware/raspberrypi.c:64 rpi_firmware_transaction+0xec/0x128
[ 4241.692309] Firmware transaction timeout
[ 4241.692319] Modules linked in: rfcomm bnep hci_uart btbcm bluetooth ecdh_generic ecc fuse 8021q garp stp llc rpivid_mem brcmfmac brcmutil sha256_generic vc4 v3d gpu_sched cfg80211 cec rfkill drm_kms_helper raspberrypi_hwmon bcm2835_codec(C) drm v4l2_mem2mem bcm2835_v4l2(C) bcm2835_isp(C) bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common drm_panel_orientation_quirks videodev snd_bcm2835(C) mc vc_sm_cma(C) snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd syscopyarea sysfillrect sysimgblt fb_sys_fops backlight uio_pdrv_genirq nvmem_rmem uio i2c_dev ip_tables x_tables ipv6
[ 4241.692957] CPU: 0 PID: 12711 Comm: kworker/0:0 Tainted: G         C        5.10.25-v7l+ #1408
[ 4241.692965] Hardware name: BCM2711
[ 4241.692981] Workqueue: events dbs_work_handler
[ 4241.692994] Backtrace: 
[ 4241.693019] [<c0b72ce8>] (dump_backtrace) from [<c0b7307c>] (show_stack+0x20/0x24)
[ 4241.693032]  r7:ffffffff r6:00000000 r5:60000013 r4:c12e6a3c
[ 4241.693047] [<c0b7305c>] (show_stack) from [<c0b7740c>] (dump_stack+0xcc/0xf8)
[ 4241.693063] [<c0b77340>] (dump_stack) from [<c0220bac>] (__warn+0xfc/0x114)
[ 4241.693075]  r10:dec05008 r9:00000009 r8:c09acd34 r7:00000040 r6:00000009 r5:c09acd34
[ 4241.693083]  r4:c0e9f12c r3:c1205094
[ 4241.693096] [<c0220ab0>] (__warn) from [<c0b73820>] (warn_slowpath_fmt+0xa4/0xd8)
[ 4241.693106]  r7:00000040 r6:c0e9f12c r5:c1205048 r4:c0e9f14c
[ 4241.693120] [<c0b73780>] (warn_slowpath_fmt) from [<c09acd34>] (rpi_firmware_transaction+0xec/0x128)
[ 4241.693131]  r9:c1a85340 r8:00000018 r7:00000000 r6:ffffff92 r5:c1a85340 r4:c1205048
[ 4241.693145] [<c09acc48>] (rpi_firmware_transaction) from [<c09ace2c>] (rpi_firmware_property_list+0xbc/0x170)
[ 4241.693155]  r7:c1205048 r6:dec05000 r5:00001000 r4:dec05024
[ 4241.693168] [<c09acd70>] (rpi_firmware_property_list) from [<c09acf50>] (rpi_firmware_property+0x70/0x118)
[ 4241.693179]  r10:c466934c r9:00030002 r8:00000018 r7:c1a85340 r6:c4f97d48 r5:0000000c
[ 4241.693186]  r4:c4669340
[ 4241.693204] [<c09acee0>] (rpi_firmware_property) from [<c07e6b78>] (raspberrypi_clock_property+0x54/0x7c)
[ 4241.693215]  r10:00000000 r9:00000000 r8:c22cfc40 r7:00000000 r6:2faf0800 r5:c4f97d70
[ 4241.693222]  r4:c1205048 r3:0000000c
[ 4241.693237] [<c07e6b24>] (raspberrypi_clock_property) from [<c07e6bec>] (raspberrypi_fw_get_rate+0x4c/0x70)
[ 4241.693245]  r5:00000000 r4:c1205048
[ 4241.693258] [<c07e6ba0>] (raspberrypi_fw_get_rate) from [<c07dd4c0>] (clk_recalc+0x44/0x84)
[ 4241.693266]  r4:c20ba000
[ 4241.693277] [<c07dd47c>] (clk_recalc) from [<c07dd7b0>] (__clk_recalc_rates+0x30/0x94)
[ 4241.693285]  r5:00000000 r4:c20ba000
[ 4241.693296] [<c07dd780>] (__clk_recalc_rates) from [<c07dd86c>] (clk_core_get_rate_recalc+0x58/0x5c)
[ 4241.693306]  r7:00000000 r6:c1205048 r5:23c34600 r4:c20ba000
[ 4241.693317] [<c07dd814>] (clk_core_get_rate_recalc) from [<c07dd898>] (clk_get_rate+0x28/0x38)
[ 4241.693325]  r5:23c34600 r4:c22cfc40
[ 4241.693341] [<c07dd870>] (clk_get_rate) from [<c0978188>] (dev_pm_opp_set_rate+0x70/0x588)
[ 4241.693349]  r5:23c34600 r4:c19a5400
[ 4241.693363] [<c0978118>] (dev_pm_opp_set_rate) from [<c0983078>] (set_target+0x38/0x3c)
[ 4241.693374]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:c1406e74 r5:c1205048
[ 4241.693381]  r4:c19a5600
[ 4241.693394] [<c0983040>] (set_target) from [<c097dd7c>] (__cpufreq_driver_target+0x1bc/0x540)
[ 4241.693406] [<c097dbc0>] (__cpufreq_driver_target) from [<c098110c>] (od_dbs_update+0xc4/0x170)
[ 4241.693417]  r10:00000000 r9:c338ba00 r8:c401bb40 r7:c338b200 r6:c338ba00 r5:c338b200
[ 4241.693424]  r4:c19a5600
[ 4241.693438] [<c0981048>] (od_dbs_update) from [<c0982370>] (dbs_work_handler+0x3c/0x64)
[ 4241.693448]  r9:00000000 r8:c131d3dc r7:c19a5600 r6:c338b204 r5:00000000 r4:c338b238
[ 4241.693463] [<c0982334>] (dbs_work_handler) from [<c023dc18>] (process_one_work+0x250/0x5a0)
[ 4241.693473]  r9:00000000 r8:00000000 r7:eff15a00 r6:eff12680 r5:c4649680 r4:c338b238
[ 4241.693486] [<c023d9c8>] (process_one_work) from [<c023dfc8>] (worker_thread+0x60/0x5c4)
[ 4241.693496]  r10:eff12680 r9:c1203d00 r8:eff12698 r7:00000008 r6:eff12680 r5:c4649694
[ 4241.693504]  r4:c4649680
[ 4241.693517] [<c023df68>] (worker_thread) from [<c0245a50>] (kthread+0x170/0x174)
[ 4241.693528]  r10:c4f1be74 r9:c4649680 r8:c023df68 r7:c4f96000 r6:00000000 r5:c4669900
[ 4241.693535]  r4:c4641e40
[ 4241.693547] [<c02458e0>] (kthread) from [<c02000ec>] (ret_from_fork+0x14/0x28)
[ 4241.693555] Exception stack(0xc4f97fb0 to 0xc4f97ff8)
[ 4241.693565] 7fa0:                                     00000000 00000000 00000000 00000000
[ 4241.693575] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 4241.693585] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 4241.693595]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c02458e0
[ 4241.693602]  r4:c4669900
[ 4241.693612] ---[ end trace 57eb65a4a490a110 ]---
``

@pelwell
Copy link
Contributor

pelwell commented Mar 26, 2021

What state is the VPU in at that point? Do vcgencmds work? What do "sudo vcdbg malloc" and "sudo vcdbg reloc" report?

@naushir
Copy link

naushir commented Mar 26, 2021

The VPU was in a locked state, but curiously only after I managed to do a vcdbg log msg.

@6by9
Copy link

6by9 commented Mar 26, 2021

I've now reproduced too on a CM4 with 477.
vcdbg and vcgencmd are working for me. Heaps look normal, although we still have some remnants of camplus/cdi still allocated.
Naush hasn't got an XJTAG with him. I have, but no debugger set up due to changing laptop.

Something in camplus_close is likely taking longer for some odd reason, and we're hitting the 2s component destroy timeout.

@pelwell
Copy link
Contributor

pelwell commented Mar 26, 2021

"log msg". "malloc" and "reloc" don't rely on cooperation from the VPU.

@naushir
Copy link

naushir commented Mar 26, 2021

"log msg". "malloc" and "reloc" don't rely on cooperation from the VPU.

My whole system locked up at that point. Had to power cycle to get back.

@6by9
Copy link

6by9 commented Mar 26, 2021

Reproduced with debugger.
Looks like we have a potential deadlock with restoring the I2C pinmuxing on close in power_pad_control.

@pelwell
Copy link
Contributor

pelwell commented Mar 26, 2021

Nice work - surely we are but moments away from a fix...

@6by9
Copy link

6by9 commented Mar 26, 2021

Hmm, don't hold your breath there. Internal issue raised. (vc4#46)

@6by9
Copy link

6by9 commented Mar 26, 2021

Patch merged to the internal repo - it should be in the next rpi-update release.

@do-dat-thang
Copy link
Author

Way awesome guys! Assuming I can just rpi-update to grab it, I'll do that later today & exercise it a bit. Thanks!!

@elzorrofxt
Copy link

Great news, thanks all!!

@do-dat-thang
Copy link
Author

Last night, I did the rpi-update and it picked up c0cf93a133dab106439c208f03d32155bc19e432 plus 5.10.25-v7l+... still running fine after 12 hours. Before, the failure usually occurred after just a few minutes. This IS great news!

After a bit more run time, do I close this issue or is that done by someone else?

@pelwell
Copy link
Contributor

pelwell commented Mar 27, 2021

Close it when you feel ready.

@feacluster
Copy link

When attempting to run "sudo rpi-update" on my Pi4+, I get this message:

Partition size 255M may not be sufficient for new Pi4 files This could result in a system that will not boot. 256M FAT partition is recommended. Ensure you have a backup if continuing.

Not sure if I can safely ignore this warning? The machine is located a half hour drive away from me. So I would rather not have to go there if it fails to boot :-)

@pelwell
Copy link
Contributor

pelwell commented Mar 28, 2021

The warning is aimed at old Raspbian images with 56MB boot partitions. I'm sure with 255MB you'll be fine, but you can always check the free space with the command df /boot.

@elzorrofxt
Copy link

I'm at 36 hours, looking good so far.

@do-dat-thang
Copy link
Author

Been running great for 3 days now. Thanks again for the great work!
Closing issue.

@feacluster
Copy link

feacluster commented Mar 31, 2021

I did the firmware update and got 5.10.25-v8+ . However, the camera still failed after one day after reboot. The error I get is:

error: failed to set camera to idle state
Perhaps another program is using camera, otherwise you need to reboot this pi

This might be a different issue from what was discussed here.

The messages I get from " sudo vcdbg log assert" are:


2387546.232: assert( retcode == 0 ) failed; ../../../../../middleware/openmaxil/                                                                              components/camera.c::load_and_open_cdi line 12008 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2387546.293: assert( id->cdi!=NULL && id->cdi_handle!=NULL ) failed; ../../../..                                                                              /../middleware/openmaxil/components/camera.c::open_camplus line 11343 rev edf2e9                                                                              c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2447709.600: assert( retcode == 0 ) failed; ../../../../../middleware/camplus/cd                                                                              i/cdi_camera.c::open_camera_peripheral line 645 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2447709.675: assert( success == 0 ) failed; ../../../../../middleware/camplus/cd                                                                              i/cdi_camera.c::cdi_camera_change_state line 1401 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2447709.765: assert( *retcode == 0 ) failed; ../../../../../middleware/camplus/c                                                                              di/cdi_camera.c::cdi_camera_open line 601 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2447710.940: assert( retcode == 0 ) failed; ../../../../../middleware/openmaxil/                                                                              components/camera.c::load_and_open_cdi line 12008 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2447711.045: assert( id->cdi!=NULL && id->cdi_handle!=NULL ) failed; ../../../..                                                                              /../middleware/openmaxil/components/camera.c::open_camplus line 11343 rev edf2e9                                                                              c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2507860.963: assert( retcode == 0 ) failed; ../../../../../middleware/camplus/cd                                                                              i/cdi_camera.c::open_camera_peripheral line 645 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2507861.074: assert( success == 0 ) failed; ../../../../../middleware/camplus/cd                                                                              i/cdi_camera.c::cdi_camera_change_state line 1401 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2507861.153: assert( *retcode == 0 ) failed; ../../../../../middleware/camplus/c                                                                              di/cdi_camera.c::cdi_camera_open line 601 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2507862.399: assert( retcode == 0 ) failed; ../../../../../middleware/openmaxil/                                                                              components/camera.c::load_and_open_cdi line 12008 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2507862.884: assert( id->cdi!=NULL && id->cdi_handle!=NULL ) failed; ../../../..                                                                              /../middleware/openmaxil/components/camera.c::open_camplus line 11343 rev edf2e9                                                                              c
vcdbg_ctx_get_dump_stack: dump_stack failed


@6by9
Copy link

6by9 commented Mar 31, 2021

2507860.963: assert( retcode == 0 ) failed; ../../../../../middleware/camplus/cd                                                                              i/cdi_camera.c::open_camera_peripheral line 645 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed

is almost certainly that something else is using the CSI2 receiver. It's possible that a previous instance has got "stuck", but that shouldn't be the case with the fix created for this issue.
Please raise a new issue with all the standard information.

@do-dat-thang
Copy link
Author

@feacluster , If you had already hit this problem before you put the new firmware on, a regular reboot attempt (reboot/shutdown -r) seems to not really reboot the pi. You may need to power cycle or see the reboot sequence in the link @6x9 just posted (#4047). That worked for me. "uname -r" will let you know if the new kernel was loaded - IDK how to tell the currently loaded firmware level.

@simonyangme
Copy link

I'm still seeing this issue on a Pi 3B+ with the HQ camera. I'm running kernel 5.10.25-v7+ and used rpi-update with the hash listed upthread. vcgencmd version shows me the following:

Feb 25 2021 12:13:12 
Copyright (c) 2012 Broadcom
version 564e5f9b852b23a330b1764bcf0b2d022a20afd0 (clean) (release) (start_x)

Just wanted to check that this is correct and isn't due to an outdated firmware as the date it shows seems old. Running rpi-update c0cf93a133dab106439c208f03d32155bc19e432 gives me *** Your firmware is already up to date

@feacluster
Copy link

feacluster commented Apr 19, 2021

@simonyangme I am just confirming that I too still have this issue on one of my seven cameras despite upgrading to latest kernel . I also tried doing a full power cycle by unplugging as suggested earlier. I am using a Pi 4 , not 3B+.

I don't believe it is a hardware issue because the camera works fine for several hours. Often upto a full day. As a workaround I have a cronjob to reboot everyday at midnight. I suppose I can try replacing camera. But it is in a somewhat inaccessible location so would rather not :-)

@popcornmix
Copy link
Contributor

@simonyangme that firmware is too old.
The suggested hash Hexxeh/rpi-firmware@c0cf93a should be around March 26.
I assume you've rebooted after running rpi-update?

To force an rpi-update use:

sudo rm /boot/.firmware_revision
sudo rpi-update <hash>

although there's no need to specify a specific older firmware, all subsequent versions will contain this fix.

@simonyangme
Copy link

@popcornmix That seemed to do the trick!

Is there any reason this is required? Simply doing rpi-update c0cf93a133dab106439c208f03d32155bc19e432 from an older version (kernel+firmware) did not seem to pull in the right firmware. Is this expected behavior?

@popcornmix
Copy link
Contributor

/boot/.firmware_revision just contains the hash from the last run on rpi-update and if you ask for the same thing again it says it's up to date.
Only reason I can think of it failing is if you updated then ran it again without rebooting, or updated and then downgraded firmware (e.g. by running apt upgrade) in a way rpi-update is unaware of.

@simonyangme
Copy link

I'm now running:

Mar 26 2021 16:35:37 
Copyright (c) 2012 Broadcom
version edf2e9c318863999c97c50cdb74eee235ede3af5 (clean) (release) (start_x)

and still see the same issue when I switch to full frame mode and capture an image.
rpi-issue.txt

@simonyangme
Copy link

With a bit more testing, the failure seems to happen randomly on the first capture after boot on my system. If it happens, then I am forced to power cycle to recover. If it does not happen, then capturing 5+ images does not seem to trigger it.

@simonyangme
Copy link

FWIW I've tried going back to dea7234943c604462e476a8afc13c587418e8709 kernel and firmware, and have not seen the issue since. I have not had a chance to further narrow down what might be contributing to the problem, but got the following from vcdbg log assert when it does fail:

096238.052: assert( IS_ALIAS_PERIPHERAL(c->dst_addr) || (c->xfer_info & DMA0_TI_DEST_IGNORE_SET) || IS_ALIAS_NOT_L1_ALLOCATING(c->dst_addr) ) failed; ../../../../../helpers/dmalib/dmalib.c::dma_chain_start line 482 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed

Please let me know if I should open a new issue for this and if there's any other info I can provide to help track this down.

@feacluster
Copy link

Unfortunately the issue seems to be happening again. I have seven of these v2.1 cameras but two of them have started showing the same errors reported in this thread. See code snippets below: As a workaround I have a cron job that reboots the pi every day at midnight. The camera will work ok for 1-2 days but then fail with:

pi@court5:~ $  raspistill -w 640 -h 480 -q 50 -o /var/www/html/share/court5.jpg
tvservice-client: Failed to connect to TV service: -1
mmalipc: mmal_vc_init_fd: could not open vchiq service
mmal: mmal_vc_component_create: failed to initialise mmal ipc for 'vc.camera_info' (7:EIO)
mmal: mmal_component_create_core: could not create component 'vc.camera_info' (7)
mmal: Failed to create camera_info component
mmalipc: mmal_vc_init_fd: could not open vchiq service
mmal: mmal_vc_component_create: failed to initialise mmal ipc for 'vc.ril.camera' (7:EIO)
mmal: mmal_component_create_core: could not create component 'vc.ril.camera' (7)
mmal: Failed to create camera component
mmal: main: Failed to create camera component
mmal: Camera is not enabled in this build. Try running "sudo raspi-config" and ensure that "camera" has been enabled
pi@court5:~ $ uname -r
5.10.17-v8+
pi@court5:~ $ sudo vcdbg log assert
debug_sym: vc_mem_copy: Unable to open '/dev/fb0': No such file or directory(2)

110926.893: assert( success == 0 ) failed; ../../../../../middleware/camplus/cdi/cdi_camera.c::cdi_camera_change_state line 1401 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
110926.967: assert( *retcode == 0 ) failed; ../../../../../middleware/camplus/cdi/cdi_camera.c::cdi_camera_open line 601 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
110928.112: assert( retcode == 0 ) failed; ../../../../../middleware/openmaxil/components/camera.c::load_and_open_cdi line 12008 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
110928.188: assert( id->cdi!=NULL && id->cdi_handle!=NULL ) failed; ../../../../../middleware/openmaxil/components/camera.c::open_camplus line 11343 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
111006.231: assert( retcode == 0 ) failed; ../../../../../middleware/camplus/cdi/cdi_camera.c::open_camera_peripheral line 645 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
111006.304: assert( success == 0 ) failed; ../../../../../middleware/camplus/cdi/cdi_camera.c::cdi_camera_change_state line 1401 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
111006.367: assert( *retcode == 0 ) failed; ../../../../../middleware/camplus/cdi/cdi_camera.c::cdi_camera_open line 601 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
111007.479: assert( retcode == 0 ) failed; ../../../../../middleware/openmaxil/components/camera.c::load_and_open_cdi line 12008 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
111007.536: assert( id->cdi!=NULL && id->cdi_handle!=NULL ) failed; ../../../../../middleware/openmaxil/components/camera.c::open_camplus line 11343 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
115859.429: assert( temps->last_tone_mapping_params_valid ) failed; ../../../../../middleware/ISP/tuner/isp_tuner_brcm_agc.c::isp_tuner_brcm_agc_switch_mode line 2745 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
115946.712: assert( pass == ISP_TUNER_PASS_NONE ) failed; ../../../../../middleware/ISP/tuner/isp_tuner_brcm_focus.c::master_focus_post_isp_update line 1912 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
116092.199: assert( pass == ISP_TUNER_PASS_NONE ) failed; ../../../../../middleware/ISP/tuner/isp_tuner_brcm_focus.c::master_focus_post_isp_update line 1912 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------

@CuriousDran
Copy link

I seem to be having the same issue mentioned in these threads. I am running Model 4B 8GB with RPi HQ camera. Raspistill calls seem to be locking up my raspberry pi, the only solution is a hard reboot. I have tried sudo rpi-update and rebooted, now the system tends to hang instead of freezing entirely. I've posted some system and debugging information below. Please let me what other information you need me to provide.

uname -a
Linux raspberrypi 5.10.46-v7l+ #1430 SMP Mon Jun 28 17:14:36 BST 2021 armv7l GNU/Linux

Raspinfo hangs after printing model:

Raspberry Pi 4 Model B Rev 1.4
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"

Raspberry Pi reference 2021-03-04
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 461109415073d2eb67083709662ba983cc191f14, stage4

Linux raspberrypi 5.10.46-v7l+ #1430 SMP Mon Jun 28 17:14:36 BST 2021 armv7l GNU/Linux
Revision	: d03114
Serial		: 10000000886927fb
Model		: Raspberry Pi 4 Model B Rev 1.4
cat /etc/rpi-issue
Raspberry Pi reference 2021-03-04
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 461109415073d2eb67083709662ba983cc191f14, stage4```

When running raspistill with the --verbose output, this gets printed in the feed at each camera call:
"raspistill" Camera App (commit 4a0a19b88b43 Tainted)

vcdbg log output:

sudo vcdbg log assert
1774957.911: assert( *p==0xa55a5aa5 ) failed; ../../../../../vcfw/rtos/common/rtos_common_malloc.c::rtos_pool_aligned_free line 205 rev 7f92cea
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------

Any help would be greatly appreciated!

@CuriousDran
Copy link

The rpi-update solution doesn't seem to be working for me.

pi@raspberrypi:~ $ sudo vcgencmd version
Mar 26 2021 16:33:56 
Copyright (c) 2012 Broadcom
version edf2e9c318863999c97c50cdb74eee235ede3af5 (clean) (release) (start_db)

I added start_debug=1 into config.txt, rebooted, and re-encountered the issue. See vcdbg log msg/assert below.

pi@raspberrypi:~ $ sudo vcdbg log msg
005845.113: arasan: arasan_emmc_open
006276.336: brfs: File read: /mfs/sd/config.txt
006277.231: brfs: File read: 1921 bytes
006394.094: HDMI1:EDID error reading EDID block 0 attempt 0
006400.118: HDMI1:EDID error reading EDID block 0 attempt 1
006406.143: HDMI1:EDID error reading EDID block 0 attempt 2
006412.165: HDMI1:EDID error reading EDID block 0 attempt 3
006418.190: HDMI1:EDID error reading EDID block 0 attempt 4
006424.213: HDMI1:EDID error reading EDID block 0 attempt 5
006430.237: HDMI1:EDID error reading EDID block 0 attempt 6
006436.259: HDMI1:EDID error reading EDID block 0 attempt 7
006442.284: HDMI1:EDID error reading EDID block 0 attempt 8
006448.307: HDMI1:EDID error reading EDID block 0 attempt 9
006449.322: HDMI1:EDID giving up on reading EDID block 0
006450.804: brfs: File read: /mfs/sd/config.txt
007747.592: gpioman: gpioman_get_pin_num: pin DISPLAY_DSI_PORT not defined
007750.213: *** Restart logging
007750.234: brfs: File read: 1921 bytes
007814.400: hdmi: HDMI1:EDID error reading EDID block 0 attempt 0
007820.427: hdmi: HDMI1:EDID error reading EDID block 0 attempt 1
007826.933: hdmi: HDMI1:EDID error reading EDID block 0 attempt 2
007840.868: hdmi: HDMI1:EDID error reading EDID block 0 attempt 3
007846.921: hdmi: HDMI1:EDID error reading EDID block 0 attempt 4
007852.975: hdmi: HDMI1:EDID error reading EDID block 0 attempt 5
007859.027: hdmi: HDMI1:EDID error reading EDID block 0 attempt 6
007865.082: hdmi: HDMI1:EDID error reading EDID block 0 attempt 7
007871.135: hdmi: HDMI1:EDID error reading EDID block 0 attempt 8
007877.189: hdmi: HDMI1:EDID error reading EDID block 0 attempt 9
007878.223: hdmi: HDMI1:EDID giving up on reading EDID block 0
007883.319: hdmi: HDMI1:EDID error reading EDID block 0 attempt 0
007889.373: hdmi: HDMI1:EDID error reading EDID block 0 attempt 1
007895.425: hdmi: HDMI1:EDID error reading EDID block 0 attempt 2
007901.479: hdmi: HDMI1:EDID error reading EDID block 0 attempt 3
007907.532: hdmi: HDMI1:EDID error reading EDID block 0 attempt 4
007913.586: hdmi: HDMI1:EDID error reading EDID block 0 attempt 5
007919.638: hdmi: HDMI1:EDID error reading EDID block 0 attempt 6
007925.692: hdmi: HDMI1:EDID error reading EDID block 0 attempt 7
007931.745: hdmi: HDMI1:EDID error reading EDID block 0 attempt 8
007937.799: hdmi: HDMI1:EDID error reading EDID block 0 attempt 9
007938.832: hdmi: HDMI1:EDID giving up on reading EDID block 0
007938.871: hdmi: HDMI:hdmi_get_state is deprecated, use hdmi_get_display_state instead
007938.901: HDMI0: hdmi_pixel_encoding: 300000000
007938.926: HDMI1: hdmi_pixel_encoding: 300000000
007944.698: dtb_file 'bcm2711-rpi-4-b.dtb'
007956.713: brfs: File read: /mfs/sd/bcm2711-rpi-4-b.dtb
007956.744: Loading 'bcm2711-rpi-4-b.dtb' to 0x100 size 0xc230
007988.731: brfs: File read: 49712 bytes
008004.003: brfs: File read: /mfs/sd/overlays/overlay_map.dtb
008162.978: brfs: File read: 1559 bytes
008166.112: brfs: File read: /mfs/sd/config.txt
008166.845: dtparam: i2c_arm=on
008188.076: dtparam: audio=on
008213.074: brfs: File read: 1921 bytes
008230.131: brfs: File read: /mfs/sd/overlays/vc4-fkms-v3d.dtbo
008273.738: Loaded overlay 'vc4-fkms-v3d'
008273.887: dtparam: i2c_vc=0
008407.556: brfs: File read: 1446 bytes
008410.685: brfs: File read: /mfs/sd/cmdline.txt
008410.767: Read command line from file 'cmdline.txt':
008410.798: 'console=serial0,115200 console=tty1 root=PARTUUID=f9644e31-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait quiet splash plymouth.ignore-serial-consoles'
009709.743: brfs: File read: 166 bytes
011040.567: brfs: File read: /mfs/sd/kernel7l.img
011040.607: Loading 'kernel7l.img' to 0x8000 size 0x675ab0
011040.657: Device tree loaded to 0x2eff3700 (size 0xc860)
011052.212: gpioman: gpioman_get_pin_num: pin SDCARD_CONTROL_POWER not defined
016069.312: vchiq_core: vchiq_init_state: slot_zero = 0xded80000, is_master = 1
016072.895: hdmi: HDMI:hdmi_get_state is deprecated, use hdmi_get_display_state instead
016080.608: TV service:host side not connected, dropping notification 0x00000002, 0x00000001, 0x00000010
019541.005: camsubs: Looking for camera 0: i2c_port = 0, led gpio = -1, power enable gpio = 133
019841.651: camsubs: Camera not found
019841.827: camsubs: Looking for camera 0: i2c_port = 0, led gpio = -1, power enable gpio = 133
020142.467: camsubs: Camera found OK
021232.676: TV service:host side not connected, dropping notification 0x00000008, 0x00000004, 0x00000000
026738.219: TV service:host side not connected, dropping notification 0x00000008, 0x00000004, 0x00000000
105209.282: imx477_open: read id 1143 succes 0!
119083.945: imx477_open: read id 1143 succes 0!
135221.734: imx477_open: read id 1143 succes 0!
148889.969: imx477_open: read id 1143 succes 0!
165282.377: imx477_open: read id 1143 succes 0!
178932.040: imx477_open: read id 1143 succes 0!
195327.450: imx477_open: read id 1143 succes 0!
208998.754: imx477_open: read id 1143 succes 0!
225391.197: imx477_open: read id 1143 succes 0!
239062.029: imx477_open: read id 1143 succes 0!
255464.986: imx477_open: read id 1143 succes 0!
269124.885: imx477_open: read id 1143 succes 0!
281917.824: vcos_abort: Halting
283913.377: mmal: fail_destroy: Timed out waiting to destroy ril.camera
pi@raspberrypi:~ $ sudo vcdbg log assert
005840.941: assert( timeout != 0 ) failed; ../../../../../vcfw/drivers/chip/vciv/2708/clock_2711.c::clock_wait_busy line 2014 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
007669.008: assert( uV >= platform_get_min_core_voltage() * 1000 && uV <= platform_get_max_core_voltage() * 1000 + (BCM2836 ? 12500:0) ) failed; ../../../../../vcfw/drivers/chip/vciv/2708/power.c::power_set_voltage line 3459 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
007670.095: assert( uV >= platform_get_min_core_voltage() * 1000 && uV <= platform_get_max_core_voltage() * 1000 + (BCM2836 ? 12500:0) ) failed; ../../../../../vcfw/drivers/chip/vciv/2708/power.c::power_set_voltage line 3459 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
026573.491: assert( eptr ) failed; ../../../../../middleware/dispmanx/dispmanx.c::dispmanx_execute_update line 968 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
281917.772: assert( *p==0xa55a5aa5 ) failed; ../../../../../vcfw/rtos/common/rtos_common_malloc.c::rtos_pool_aligned_free line 205 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
283913.459: assert( Timed out waiting to destroy ril.camera ) failed; ../../../../../interface/mmal/vc/ril/mmal_ril.c::fail_destroy line 1471 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
703821.630: assert( eptr ) failed; ../../../../../middleware/dispmanx/dispmanx.c::dispmanx_execute_update line 968 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------

@CuriousDran
Copy link

I don't know what information is useful, so I'm going try to be thorough and also attach vcdbg_malloc printout.

vcdbg_malloc.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants