VideoCore(?) crash with more than eight windowed instances on Raspberry Pi #63

Open
stewiem2000 opened this issue Sep 26, 2013 · 9 comments

@stewiem2000

Using a 512MB Raspberry Pi Model B with a 256MB GPU allocation and an HDMI-connected screen, I can easily run eight omxplayer instances, each using the --win parameter and pulling in a 320x240, 5-FPS, h264 RTSP stream. Launching the ninth, or sometimes the tenth, instance will invariably cause some kind of VideoCore lockup: either the previous streams freeze, or the display goes black. If the ninth instance fails, the rest of the Pi seems to work fine, but I need to reboot it to do anything with the VideoCore. If the tenth instance fails, the whole Pi is inaccessible.
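
For illustration, a minimal sketch of how the windows for such a setup could be laid out. It is not the reporter's actual launch script. The 1280x720 screen size and the 4x4 grid are hypothetical, and the sketch assumes that --win accepts the rectangle as a single "x1 y1 x2 y2" argument. It only builds the argument lists and does not start any player.

```python
def tile_windows(screen_w, screen_h, cols, rows):
    """Return one (x1, y1, x2, y2) rectangle per grid cell, row by row."""
    cell_w, cell_h = screen_w // cols, screen_h // rows
    return [
        (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


def omxplayer_command(url, rect):
    """Build the argument list for one windowed instance (not executed here)."""
    # Assumption: --win takes "x1 y1 x2 y2" as one space-separated argument.
    return ["omxplayer", "--win", " ".join(str(v) for v in rect), url]


# Hypothetical 1280x720 display split into a 4x4 grid of 320x180 tiles.
commands = [
    omxplayer_command("rtsp://server/stream", rect)
    for rect in tile_windows(1280, 720, 4, 4)
]
```

Each entry in `commands` could then be handed to a process launcher, one instance at a time, which makes it easy to see at which instance count the lockup appears.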

I'm up-to-date with the latest Pi firmware/software and have tried using prebuilt binaries from omxplayer.sconde.net and even cross-compiling the latest GitHub version (3e8d718).

I am aware that this could be a hardware limitation, but would have hoped for a setup-failure rather than lock-up/crash if that were the case.

Here is what's logged from the process before it crashes.

DEBUG: DllOMX: Using omx system library
DEBUG: Previous line repeats 1 times.
DEBUG: DllAvFormat: Using libavformat system library
DEBUG: DllAvUtilBase: Using libavutil system library
DEBUG: DllAvCodec: Using libavcodec system library
DEBUG: DllAvFormat: Using libavformat system library
DEBUG: COMXPlayer::OpenFile - avformat_open_input rtsp://server/stream
DEBUG: COMXPlayer::OpenFile - avformat_open_input enabled SEEKING
DEBUG: COMXCoreComponent::Initialize : OMX.broadcom.clock handle 0x1498760 dllopen : 1
DEBUG: COMXCoreComponent::Initialize OMX.broadcom.clock input port 80 output port 81
DEBUG: DllAvUtilBase: Using libavutil system library
DEBUG: DllAvCodec: Using libavcodec system library
DEBUG: DllAvFormat: Using libavformat system library
DEBUG: DllOMX: Using omx system library
DEBUG: Previous line repeats 9 times.
DEBUG: COMXCoreComponent::Initialize : OMX.broadcom.video_decode handle 0x14672b0 dllopen : 1
DEBUG: COMXCoreComponent::Initialize OMX.broadcom.video_decode input port 130 output port 131
DEBUG: COMXCoreComponent::AllocInputBuffers component(OMX.broadcom.video_decode) - port(130), nBufferCountMin(1), nBufferCountActual(60), nBufferSize(81920), nBufferAlignmen(16)

@popcornmix
Owner

It may not be the last one that locks it up. I'm guessing one of the instances may have returned an OMX_ErrorInsufficientResources. You may need to check the logs from all processes.
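
A minimal sketch of checking every instance's log for that error, assuming each omxplayer process writes its own log and that the error appears either by name or by its numeric value (0x80001000 in the OpenMAX IL headers). The instance names used as dictionary keys are hypothetical.

```python
# Assumption: a log reports the error either by name or by its numeric
# value, which is 0x80001000 in the OpenMAX IL headers.
MARKERS = ("omx_errorinsufficientresources", "0x80001000")


def find_resource_errors(logs):
    """logs maps an instance name to its log text; return the matching lines."""
    hits = {}
    for name, text in logs.items():
        matched = [
            line
            for line in text.splitlines()
            if any(marker in line.lower() for marker in MARKERS)
        ]
        if matched:
            hits[name] = matched
    return hits
```

Reading each log file into the dictionary first keeps the scan independent of where the logs are stored.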

The display going black may well have been caused by a memory bandwidth shortage leading to underflow in the composition engine.
You can mitigate that with "dispmanx_offline=1" in config.txt at some extra memory and processing cost.

But in reality you are pushing the Pi to the limit. While the intention is that everything fails gracefully, it is very difficult to guarantee that in all circumstances once memory is exhausted.

We can add a limit (e.g. to two video players) if that is desired.

@stewiem2000
Author

Thanks for your response!

That's a good point about it not necessarily being the last process that's "crashed"; I'll see what I can do to log all of them. I'll also try adding the "dispmanx_offline=1" config line.

Mind you, would an ErrorInsufficientResources result in the VideoCore being inaccessible once this has happened, even after all omxplayer instances have been (gracefully) stopped? Or is it that something's crashed and so not being cleanly torn-down to a usable state?

Limits: I'm basically trying to get as many as 16 streams flowing. I didn't know exactly what/where the limits lie, so this is proving interesting finding out :) I must admit that I had (optimistically) hoped that since the Pi could decode a 1080p file OK, it'd be able to cope with several smaller resolution files, at lower frame-rates - though I appreciate there's overhead involved in multiple processing too.

I have had a look through the code to see if I could, in some way, push two decoders' output through one renderer as a way to reduce the number of "blocks" in the flow-diagram. Do you think this is even viable?

Thanks.

@perrypoint

The way h.264 works is that each frame is stored as deltas (changes) compared to previous frames and/or following frames. The hardware h.264 decoder is thus storing a pipeline of multiple frames at once that may be in different stages of completeness of decoding - referencing other frames in the pipeline. That number of frames varies by the nature of the content and depends on many factors, including the GOP size (group of pictures) for the encoded stream. The way OMX works is as a pipeline of components - one component is the h.264 decoder and the last component is the renderer. The renderer would likely hold a relatively small number of completed frames waiting to be presented compared to everything else going on - so my guess is that even if you composited a bunch of those smaller frames onto a single large frame for rendering, it wouldn't save a lot. You may get an extra stream or two, but you would also have more work compositing the output in software prior to rendering - so you could also get fewer streams. The final render is a trivial operation compared to decoding h.264.

@popcornmix
Owner

It's the buffers rather than the components that use the memory, so I don't think trying to share a renderer would help much (or even work).

For your use case setting more than 256M (e.g. 384M) may be sensible. At least if that supports more components then it will prove it is gpu memory you are exceeding.

ErrorInsufficientResources means an allocation has failed and the host has been informed.
Shutting down from there should be safe, in theory.
However when really low on memory, you may find that the act of shutting down requires an allocation and then things can go wrong.
When running multiple instances you are more likely to still be trying to allocate when other components are shutting down and it's hard to protect against all possible failure conditions.
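
To put a rough number on the buffers, here is a small worked example using the AllocInputBuffers line from the log in the original report. It counts only the decoder's input buffers; decoded frames and any other allocations are not included, so the real footprint per instance is presumably higher.

```python
import re

# Copied from the debug log in the original report.
LOG_LINE = (
    "DEBUG: COMXCoreComponent::AllocInputBuffers "
    "component(OMX.broadcom.video_decode) - port(130), nBufferCountMin(1), "
    "nBufferCountActual(60), nBufferSize(81920), nBufferAlignmen(16)"
)


def input_buffer_bytes(line):
    """Total bytes of input buffers announced by one AllocInputBuffers line."""
    count = int(re.search(r"nBufferCountActual\((\d+)\)", line).group(1))
    size = int(re.search(r"nBufferSize\((\d+)\)", line).group(1))
    return count * size


per_instance = input_buffer_bytes(LOG_LINE)  # 60 * 81920 = 4915200 bytes
nine_instances = 9 * per_instance            # 44236800 bytes, about 42 MiB
```

So the input buffers alone come to a little under 5 MiB per decoder, which adds up quickly across nine or more instances.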

@stewiem2000
Author

perrypoint: That's a fair point. My thought was that it might have been a sync'ing issue between the (too) many rendering pipelines - i.e. that decoding wasn't the problem, but that trying to get nine or more streams' frames into the video buffer was causing locking contention/issues.

popcornmix: I had previously had 128MB before the 256MB, but will try 384MB. Is there any way to find out how much buffer space has been allocated / is left? Would they all be allocated from the 'GPU slice'?

@popcornmix
Owner

vcgencmd get_mem gpu shows the total. For current info:
sudo vcdbg reloc
sudo vcdbg malloc
shows info on gpu memory allocations (and what is free).
Be aware that these calls are purely for debug info, and unsafely parse the malloc/reloc lists through a non-cache coherent memory view. They will sometimes report that memory is corrupted or otherwise give an inconsistent state.
Running "vcgencmd cache_flush" beforehand will improve the chances of a successful read.
Two similar reads in a row should be fairly convincing.
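
A minimal sketch of that flush-then-read-twice routine, under the assumption that "similar" can be approximated by two identical consecutive outputs (which is stricter than necessary on a system whose allocations are still changing). The `runner` parameter and the `attempts` limit are hypothetical additions so the logic can be exercised without a Pi.

```python
import subprocess


def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def consistent_read(cmd, runner=run, attempts=5):
    """Flush and read repeatedly until two consecutive reads match.

    Returns the matching output, or None if no two consecutive reads agree.
    """
    previous = None
    for _ in range(attempts):
        runner(["vcgencmd", "cache_flush"])
        current = runner(cmd)
        if current == previous:
            return current
        previous = current
    return None
```

On the Pi this would be called as `consistent_read(["sudo", "vcdbg", "reloc"])`; a result of `None` suggests the reads never settled and should not be trusted.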

@stewiem2000
Author

Hello. Sorry for the radio-silence; Real Life getting in the way! ;)

I tried bumping the GPU allocation to 384MB as suggested. It still crashed with nine feeds, though.

If I run 'vcgencmd get_mem gpu' before launching omxplayer instances, I get the expected 'gpu=384M'; however, after I've launched them, it fails with:
vc_gencmd_send returned -1
vchi_msg_dequeue -> -1(22)

'vcdbg reloc' ran OK (no errors reported) even with several runs; it seems to suggest there's a good chunk of RAM available:
Relocatable heap version 4 found at 0x8000000
total space allocated is 364M, with 362M relocatable, 2.3M legacy and 0 offline
1 legacy blocks of size 2359296
free list at 0x90e7ba0
269M free memory in 7 free block(s)
largest free block is 269M bytes
....
0xdc95360: free 269M
0x1e9c0000: legacy block 2.3M
small allocs not requested

I've only pasted the top and tail above; if you need more, please say. However, 1527 entries were along the lines of the following, mostly using 80K or 1.0K:
[1698] 0xd9a8060: used 80K (refcount 1 lock count 0, size 81920, align 32, data 0xd9a8080, d1ruAl) 'RIL buffer'
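
A rough sketch of pulling the headline figures out of such a 'vcdbg reloc' listing and totalling the used entries per label. The regular expressions are written against the lines quoted in this comment only, so other firmware versions may need adjustments. The K and M suffixes are treated as binary units, which matches the listing (size 81920 is shown as 80K).

```python
import re

UNITS = {"": 1, "K": 1024, "M": 1024 ** 2}


def to_bytes(text):
    """Convert a figure such as '269M', '80K' or '72' to bytes."""
    number, unit = re.fullmatch(r"([\d.]+)([KM]?)", text).groups()
    return int(float(number) * UNITS[unit])


def parse_reloc_summary(output):
    """Extract total, free and largest-free-block figures from the listing."""
    total = re.search(r"total space allocated is ([\d.]+[KM]?)", output)
    free = re.search(r"([\d.]+[KM]?) free memory in (\d+) free block", output)
    largest = re.search(r"largest free block is ([\d.]+[KM]?)", output)
    return {
        "total_bytes": to_bytes(total.group(1)),
        "free_bytes": to_bytes(free.group(1)),
        "free_blocks": int(free.group(2)),
        "largest_free_bytes": to_bytes(largest.group(1)),
    }


def used_bytes_by_label(output):
    """Sum the exact 'size' field of every used entry, grouped by its label."""
    totals = {}
    pattern = r"used \S+ \(.*?size (\d+),.*?\) '([^']*)'"
    for size, label in re.findall(pattern, output):
        totals[label] = totals.get(label, 0) + int(size)
    return totals
```

Summing by label shows at a glance how much of the relocatable heap the 'RIL buffer' entries account for across all running instances.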

'vcdbg malloc' top and tail below, with the same number, 1527, of entries similar to '0x1f4ab080 = malloc(72) [RIL buffer header]'
Pool 0x9f2aefd8 (1ee3c140)
Malloc pool size=13M (pool=0x1f2aefd8-0x1ff7ff2c)
....
Malloced:3.7M Remaining:9.1M

So, if I'm reading this right, there's plenty of GPU-RAM available and I could probably easily scale the allocation back to 256MB or even 128MB...?

Interestingly, I left the processes running whilst composing this email and am seeing periodic batches of the following on the console:
smsc95xx 101.1:1.0: eth0: kevent 2 may have been dropped

Also, there's the occasional omxplayer.bin termination due to insufficient memory and often after blocking:
INFO: task omxplayer.bin:1769: blocked for more than 120 seconds.
omxplayer.bin D c03a0214 1849 1745 0x000000005
...
Killed process 1769 (omxplayer.bin) total-vm:104740kB, anon-rss:10332kB, file-rss:616kB.

@stewiem2000
Author

As a quick follow-up: having decreased the GPU allocation back down to 192MB, I've seen neither the kevent-2 errors nor the omxplayer.bin out-of-memory errors.
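
One plausible reading of this, which the thread does not confirm: the "Killed process" messages come from Linux running out of memory on the ARM side, and on a 512MB board every megabyte given to the GPU is taken away from Linux. The arithmetic for the GPU allocations mentioned in this thread:

```python
TOTAL_MB = 512  # the 512MB Model B from the original report


def arm_memory_mb(gpu_mb, total_mb=TOTAL_MB):
    """Memory left for Linux once the GPU allocation is carved out."""
    return total_mb - gpu_mb


# GPU allocations tried in this thread, mapped to what remains for Linux.
splits = {gpu: arm_memory_mb(gpu) for gpu in (128, 192, 256, 384)}
# splits == {128: 384, 192: 320, 256: 256, 384: 128}
```

At 384MB for the GPU only 128MB remains for Linux and all omxplayer processes, whereas 192MB leaves 320MB.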

@Ruffio

Ruffio commented Jun 17, 2015

@popcornmix I believe that this issue can be closed as a solution has been found.
