OpenMAX hangs in OMX_EmptyThisBuffer #449
Comments
I experience this issue on many running Pis too; the only difference is that I cannot restart the application. Also, when trying to run the app again, strace says:
... and waits. Even with my application killed, /opt/vc/bin/vcgencmd cannot start; strace says it is waiting for the same futex. |
Can you describe how to reproduce the problem? |
Thank you for answering. Unfortunately it won't be simple... the backtrace I provided was related to a run that lasted some days playing a single video file. As you can see there is one thread for audio and one for video. I wasn't able to increase the frequency of the locks, but I noticed that using multiple players seems to make it very very simple to lock. It just takes a few seconds. This is the resulting backtrace: http://pastebin.com/CmjRJAgm. In this case you can see two video/audio players and one audio player. Seems pretty similar to me. If you consider this reasonable, I can provide what is needed to reproduce this in seconds/minutes. If we are lucky it is the same issue. I'm assuming you start from a clean Raspbian (I'm using 2015-05-05) using a Pi1. Pi2 seems to work a little better.
/usr/local/Qt-rasp-5.5.0/bin/qmlscene video_audio_multi.qml file:///home/pi/big_buck_bunny_720p_h264.mov file:///home/pi/sintel_trailer_720p.mp4 file:///home/pi/master_of_puppets.mp3 So provide two video files and an audio file (like wave or mp3), in this order. Use a URI, not a path. I just tried this procedure and it seems to work. I hope I didn't forget anything. Please let me know if something goes wrong. I could even reproduce with this QML: https://github.com/carlonluca/pi/blob/master/piomxtextures_samples/video_position.qml. One video file only and a timer frequently asking for the current position. But it really requires days to lock with the backtrace of the first message. |
Hello, I caught another lockup and created a coredump from GDB. Is it of any use (I can send it with the Qt app)? |
@popcornmix I read a comment you wrote some time ago in #377: "I believe mmal and openmax are thread safe, so you don't need additional locking." And in fact in omxplayer omx calls seem not to be guarded. Can you confirm that OpenMAX libs in raspberry are thread-safe? |
Yes, openmax calls are thread safe. |
By doing a large amount of seeks I can now reproduce a similar situation "quickly". This backtrace is a little clearer:
This situation required around 10,000 seeks. Before calling openmax I acquire a mutex now. What I see here is that Thread 4 is waiting on a semaphore inside libopenmax and is keeping my mutex locked. According to the openmax specification for OMX_EmptyThisBuffer: "This call is a non-blocking call since the component will queue the buffer and return immediately. The buffer will be emptied later at the proper time. If the parameter nInputPortIndex in the buffer header does not specify a valid input port, the component returns OMX_ErrorBadPortIndex. The component should return from this call within five milliseconds." So, as I see that OMX_EmptyThisBuffer call is not returning, do you know of any possible reason why OMX_EmptyThisBuffer may behave like this? If I knew a possible reason for this I may be able to somehow fix my implementation. In this situation the Pi is stuck. I won't be able to run applications like omxplayer or vcgencmd. If I try to debug vcgencmd with parameter "version" I get this:
The process never exits. Probably useless but I try any way. |
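The five-millisecond expectation quoted from the specification above can be checked from application code. What follows is only a minimal sketch under stated assumptions: `call_watch`, `call_begin`, `call_end` and `call_overdue` are hypothetical names, not part of OpenMAX or the userland libraries, and timestamps are passed in by the caller so the sketch does not depend on any particular clock API. In a multi-threaded player the fields would also need atomic or mutex-protected access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one in-flight OpenMAX call.
 * None of these names exist in the OpenMAX or userland APIs. */
typedef struct {
    bool     in_flight;  /* true between call_begin() and call_end() */
    uint64_t start_ms;   /* caller-supplied monotonic timestamp */
} call_watch;

/* Record that a call (e.g. OMX_EmptyThisBuffer) is about to be made. */
static void call_begin(call_watch *w, uint64_t now_ms)
{
    assert(w != NULL);
    w->in_flight = true;
    w->start_ms = now_ms;
}

/* Record that the call has returned. */
static void call_end(call_watch *w)
{
    assert(w != NULL);
    w->in_flight = false;
}

/* Return true if a call is still outstanding and has been outstanding
 * for longer than budget_ms (5 would match the figure in the spec). */
static bool call_overdue(const call_watch *w, uint64_t now_ms,
                         uint64_t budget_ms)
{
    assert(w != NULL);
    return w->in_flight && (now_ms - w->start_ms) > budget_ms;
}
```

A monitoring thread could poll `call_overdue` periodically and log or dump a backtrace when it fires. That would not unblock the stuck call, but it would at least timestamp the moment the hang started.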
It sounds like you've managed to get the GPU stuck such that it can't process the EmptyThisBuffer call. NB There is a difference between blocking such that buffer queues can be made thread safe, rather than blocking for the entire buffer to be processed. In the Pi case, the buffer queues are on the GPU, so have to go via |
@carlonluca If you can provide a test app I can run (e.g. a tarball I can unpack and run an executable, I may be able to see how the GPU is stuck). But, get the assert log suggested by @6by9 first, as that may contain some clues. |
This is what I had before starting the test:
this is what I got after the lock:
I made a couple of tests and the second time I got nothing at all, just the first three lines. The procedure is similar to what I described above, unfortunately running requires some libs, plugins etc... I can provide the newer version, I can lock even with one player only now, but if you want something quicker I could provide a raspbian image with everything in place. You just run the binary. Would that make things simpler for you? |
I don't think any of the asserts are necessarily relevant. |
Hello, I prepared the image (https://goo.gl/cL3fld). It is intended for armv6 as the issue seems to manifest quicker on Pi1. I built the code in debug so you can get a backtrace if you need it. What you have to do is simply flash it, boot it and, from the home of the pi user run the proper command.
I suggest you use a 720p video instead of the 1080p I provided in the home, because this player plays both videos concurrently. What you should see is something like this: https://youtu.be/MuPGAWp7vag. The framerate is going to be far worse as the one in the video is a Pi2 and without all those debug logs. It should hang pretty soon always with the same backtraces I already reported.
this uses just a single player, like omxplayer, but still seems to hang in ilcs_execute_function_ex as I reported above. This typically requires from 10,000 to 100,000 seeks to hang (500ms for a seek so it takes hours). I understand it is difficult to reproduce something like this, so the first option may be the best to start with. Running a single player without further interaction with openmax seems to be ok (I had a run of over 45 days). But trying to get the current position often using OMX_GetConfig or similar may result in these situations I reported after hours or days of playback. These are the best codes I found to let you reproduce quickly. |
@popcornmix were you able to reproduce the issue? Is there anything else I can do to help you with this? |
I've had a look. I don't see anything suspicious on gpu side. No memory corruption. All tasks call stacks are valid with no sign of deadlock. Everything is idle waiting for a command. The gencmd hanging was suspicious. Quit the piomxtextures_pocplayer process and the gencmd completes. I wondered if this was due to too many gencmd clients (we only support 3). I tried rebuilding the firmware to support 10 gencmd clients and it still hangs. Checking, I am only getting one VCHI_CALLBACK_SERVICE_OPENED and a few VCHI_CALLBACK_MSG_AVAILABLE when launching piomxtextures_pocplayer. After piomxtextures_pocplayer has stalled, then gencmd causes a VCHI_CALLBACK_SERVICE_OPENED message to get though but no VCHI_CALLBACK_MSG_AVAILABLE which is why it hangs. When piomxtextures_pocplayer is quit I see the VCHI_CALLBACK_MSG_AVAILABLE and the two VCHI_CALLBACK_SERVICE_CLOSED messages. So, this feels like a vchiq deadlock issue, rather than an openmax issue. I think I need to call in @pelwell for vchiq help here. Phil is it possible you can look at this? I have an sdcard image that provokes the hang in a small number of seconds. |
OK. Leave it where I can look at it in the morning. |
I think I understand this now. In the logging I captured there were a number of places where VCHIQ - the driver, the user library or the queue utility that the ILCS client uses - was stalling because a queue was full, but I now think this was a symptom of the real problem, which is an ILCS limit on the number of concurrent outstanding requests coupled with a complication of allowing application callbacks to make further requests. When a message is sent that expects a response, the calling thread blocks until the response comes back. This is implemented by associating a semaphore with a transaction ID, and signalling the semaphore when a response with a matching transaction ID is received. For reasons of either convenience or resource limiting, there is a fixed number of these semaphores available (4). If a thread finds that all semaphores are in use then it must wait for one to become available due to the completion of an outstanding request. This mechanism would be simple and safe were it not for the fact that callbacks into applications are allowed to make requests. Such callbacks are made in the context of the server thread, and if the server thread is blocked then no responses are processed, thus the consequence of the server thread failing to get a slot is deadlock. The function in question (ilcs_execute_function_ex) pays lip service to the problem without actually doing enough to avoid it. My workaround is for the server thread to try to claim the event in a non-blocking fashion, and on failure to process some messages before trying again. If the second attempt fails then it sleeps for 1 millisecond before going round the loop again. This logic has been enough to stop the test from stalling, without the need to increase the concurrency above 4. You can find the patch in this gist, to be applied against the userland repo. Build using the buildme script, and copy the build products from build/lib to /opt/vc/lib. |
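To make the mechanism described above easier to follow, here is a small single-threaded model of it. This is not the patch from the gist and not the code in ilcs_execute_function_ex: `slot_pool`, `try_claim_slot`, `process_one_reply` and `server_claim_slot` are hypothetical names invented for illustration, the semaphores are reduced to boolean flags, and the 1 millisecond sleep is only indicated by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SLOTS 4  /* fixed pool size quoted in the comment above */

/* Hypothetical model: one wait slot per outstanding request.
 * The real code pairs a semaphore with a transaction ID instead. */
typedef struct {
    bool in_use[NUM_SLOTS];
    int  pending_replies;  /* replies received but not yet processed */
} slot_pool;

/* Non-blocking claim: returns a slot index, or -1 if all are busy. */
static int try_claim_slot(slot_pool *p)
{
    assert(p != NULL);
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Process one queued reply, which releases the slot that was waiting
 * for it. Returns false if there was nothing to process. */
static bool process_one_reply(slot_pool *p)
{
    assert(p != NULL);
    if (p->pending_replies <= 0)
        return false;
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (p->in_use[i]) {
            p->in_use[i] = false;
            p->pending_replies--;
            return true;
        }
    }
    return false;
}

/* Server-thread path: never block on the pool. On failure, process a
 * message and try again; if that also fails, go round the loop. */
static int server_claim_slot(slot_pool *p, int max_rounds)
{
    for (int round = 0; round < max_rounds; round++) {
        int slot = try_claim_slot(p);
        if (slot >= 0)
            return slot;
        (void)process_one_reply(p);   /* may free a slot */
        slot = try_claim_slot(p);
        if (slot >= 0)
            return slot;
        /* a real server thread would sleep for about 1 ms here */
    }
    return -1;  /* gave up: nothing completed within max_rounds */
}
```

The point of the model is that the server thread is the only one able to process replies, so it must keep doing that while it waits for a slot. If it blocked on the pool like an ordinary caller, `pending_replies` would never be drained and every waiter would stay stuck.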
With that patch applied the test has been running for 15 hours without stalling. Since there are going to be performance penalties caused by the server thread waiting and retrying, I'll also double the thread limit to 8 - the additional memory and search time is negligible. |
Thank you @pelwell for your work! I'll test your patch asap. This may explain why I was able to partly prevent the lockup by calling omx only inside a mutex-protected section. |
I would say that the seek lockup isn't another manifestation of the same problem - I would expect to see at least four threads waiting in ilcs_execute_function_ex, when in the traces above there is only one. |
Do you have any advice on how I may be able to investigate that problem further? Are there other possible reasons why that thread may be stuck in ilcs_execute_function_ex? Any other idea off the top of your head? |
Sorry, no - yesterday was my first real skirmish with the OpenMAX code. My suggestion would be to fetch the userland repo, modify buildme or the CMake files to build a debug version (this is easily buildable on the Pi), and copy the contents of build/lib into /opt/vc/lib. That way, the backtrace should at least tell you which line of the source it was on. |
…m edges See: #463 firmware: di_adv: Add config setting to add nop delays to shader See: http://forum.kodi.tv/showthread.php?tid=231092&pid=2150605#pid2150605 firmware: vcilcs: Avoid a potential deadlock when very threaded See: #449 firmware: vrf: Add spinlock around vrf acquire/release calls to avoid restoring an invalid p10 from ISR context firmware: rpi_display: only ratelimit if the backlight is actually changed See: raspberrypi/linux#1179 firmware: di_adv: Support multiple instances of qpu deinterlace at SD resolution See: popcornmix/omxplayer#386 linux: rpi-ft5406: Use interruptible sleep to avoid high load reported See: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=125034 linux: dts: Added overlay for Adafruit PiTFT 2.8 capacitive touch screen See: raspberrypi/linux#1192 linux: config: Add MCP320X See: raspberrypi/linux#1189 linux: Build i2c_gpio module and add a device tree overlay to configure it See: raspberrypi/linux#1183
@pelwell's patch is in latest rpi-update libs. |
Thanks @popcornmix, I'm already testing it and I see there is an improvement in the second situation, but just like @pelwell said, the "seek test" still seems to manifest issues. But something may have changed, I'm following @pelwell's advice and I'll try to provide some more info assuming you are interested. |
We're always interested, it's just a matter of time and priorities. |
I perfectly understand time is a limited resource and that this issue may not be affecting many users as not exactly a "regular" use. I really appreciate your help. However I'm running more tests and what I can see is that the first of the two tests is now far better, but I can confirm I can lock the "seek test". This is the backtrace after a lockup that includes more info related to the userland libs: http://pastebin.com/Ek5P4y5q. What I see is that the situation is somehow similar, but we are probably in another case (vcos_thread_current() != &st->thread). Please note that in this case there is no mutex on my side before calling openmax. Not sure if this is relevant, but as you can see the main thread is itself also calling openmax to get the current position, in addition to the video and audio decoders. Also the media decoding thread (thread 13 in the backtrace), which is more or less what omxplayer.cpp does, may be calling openmax concurrently. Maybe I should be synchronising all these calls? As you probably read above, my code somehow integrates into omxplayer code and is built over it. I therefore enabled the logs from omxplayer code and this is the last portion of 130MB of logs: http://pastebin.com/Mv6idTHA. Not sure if these can be of any use. Of course this again can be a bug in my code, but it is difficult to say. Do you guys have any other advice? |
That backtrace looks like the ARM is waiting for responses from the VPU, while the server thread is ready to receive such responses. In other words, I think the problem is likely to be in the VPU. |
Thanks for your help. Is there anything I can do to somehow collect more information about this? May be a bug in my code and more information may help finding the bug... Can I debug the issue in any way? Any other advice? |
Increasing the limit is easy and can probably go in today. Apart from using slightly more memory it should have no negative consequences, and you've already tested it fairly thoroughly. Adding some sort of diagnostic output when the limit is reached needs more care. If I post a small patch, as before, would you be happy to test it with a reduced limit? |
ok. |
Try this patch:
You will get a series of messages to stderr as you reach and exceed each warning level, starting at the halfway mark:
The |
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=144087#p950902 bootcode: Ensure LED is switched off after halt on Pi3 firmware: vcilcs: Increase ILCS queue size to prevent(?) deadlock See: #449
Hi @pelwell, So I am ok with both improvements you made
Whenever you can, pls release firmware with both improvements. THANKS. Case 1 Case 2 |
@pelwell, thanks for adding logging to show usage state of vchiu_queue. THANKS. |
It will probably go out this weekend. |
kernel: Add Support for BoomBerry Audio boards See: raspberrypi/linux#1397 kernel: Add support for the Digital Dreamtime Akkordion music player See: raspberrypi/linux#1406 kernel: Add support for mcp7940x family of RTC See: raspberrypi/linux#1397 firmware: vcilcs: Warn as message queue approaches fullness See: #449 firmware: dtoverlay: Copy overrides before applying firmware: dtmerge: Pack the merged DTB before writing firmware: arm_ldconfig: Fix detection of kernel8.img firmware: arm_loader: Enable DT by default, read addresses back from stub See: #579 firmware: ldconfig: Add [none] section as a convenience as config.txt filter firmware: pwm_sdm: Bugfixes See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=136445 firmware: gencmd: Add command to read current and historical throttled state
Latest rpi-update optionally includes the warnings. |
@carlonluca Okay to close this? |
Yes, I think you can close this, thanks. |
I'm getting exactly these symptoms right now, even with the latest firmware. |
Many different bugs in the calling code can cause this symptom. Yours is probably not related to this github issue. |
You mention bugs in the calling code, can you give any hints so I can check ? |
Hello, what configs are necessary to activate the "ILCS queue full" logs in the latest Debian Jessie? Is setting env var ILCS_WARN=1 enough? The firmware version I have is:
|
Yes, ILCS_WARN should be supported with that firmware |
Thanks for the answer. We are having a very similar issue (backtrace attached) but it's very hard to reproduce. It just happens once in 2-3 days on a random Pi (we are running about 30 Pis). However, I enabled ILCS_WARN but there is no sign of the queue being full. When this happens, the application cannot be restarted and a full system restart is needed (the second backtrace is taken after the restart when the app is stuck). Any idea how to debug this? What extra logs can I activate in order to gather more info about it? |
Not sure if this is a fault of my code or not, but I'm experiencing a hang in OMX_EmptyThisBuffer. It seems very similar to #134.
My project is open: https://github.com/carlonluca/pi. Hangs have been reported to me as happening very rarely when playing a single video file (including audio); it takes many days of continuous playback to reproduce. I had runs of more than 40 days without issues.
As you'll see in the backtrace, one thread is decoding video to texture using EGLImage, one thread is decoding audio and one thread asks OpenMAX for the current position in the media here. Part of the code is taken from omxplayer.
What seems to be the same issue can be reproduced in seconds when trying to decode multiple video files at the same time on Pi1. Pi2 instead seems to work better. The firmware version on Pi2 is:
The kernel version on Pi1:
I can't determine the GPU firmware version on Pi1 because I'm keeping the locked "scene" under gdb and vcgencmd is not returning (it would take more days to reproduce again), but the firmware should be the same as on Pi2 (latest Raspbian image available on the Raspberry web site, 2015-05-05).
Do you think this is a fault of my code or something firmware related?
This is a backtrace I got from the locked Pi1.