mmal: image_fx: Rendering delay #287

Closed
julianscheel opened this issue Jun 12, 2014 · 52 comments

@julianscheel

We stumbled into a problem with our mmal video_render plugin for VLC, which I'd like to discuss.
The concept of the VLC video_output system is that the VLC core times the output of frames and calls display() at the exact point where the frame is to be shown. Optionally a pre-ordered prepare() call can be used to preprocess buffers (upload to GPU or whatever).
Now in our mmal plugin we use the mmal video_render element with the approach to render immediately what we feed into it through mmal_port_send_buffer.
Most of the time this works pretty well and not a single frame is lost. But sometimes it seems that the delta between our mmal_port_send_buffer call and the actual display of the frame can be several milliseconds or even tens of milliseconds. This causes frames to be dropped, because more than one frame then falls into a single display refresh cycle. So we're thinking about approaches to avoid this issue.

The following options come to my mind:
a) Find a way to increase priority of handling image_fx input buffers, so that the delta between sending a buffer and rendering of the picture is constantly below ~10ms. I don't know if this is possible from GPU side at all, it just came to my mind as I recall there is some priority option for dispmanx update calls...
b) Prepend a scheduler to the video_render and pass some frames in advance to the video_render element along with their desired pts, so that the GPU can schedule them on its own. I assume this is the desired approach from your side? It's just a little bit sub-optimal as we need to add some artificial delay into rendering to fit this into VLC's core concept.

Could you maybe comment on the approaches or propose further options if there are any I could not think of yet?

@julianscheel
Author

To provide some statistics, here is the distribution of the difference between the targeted display time and the actual point in time where mmal_port_send_buffer is called:

<  1ms: 93.77%
<  2ms: 3.44%
<  5ms: 2.68%
< 10ms: 0.11%
< 15ms: 0.00%
< 20ms: 0.00%
> 20ms: 0.00%

The statistic is calculated from the timestamps of 766498 displayed frames (roughly 255 minutes).

The video is 50i content which was deinterlaced by image_fx, so that the frame duration is 20ms. Hence I would not expect a delay below 10ms to cause a frame drop.
While watching the video there are periods where everything is smooth, but then there are sequences where a few frames seem to be dropped every one or two seconds.
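
For reference, a table like the one above can be produced with a few lines of C. This is only a minimal sketch: the bucket edges mirror the table, while the function names (`delay_histogram`, `bucket_percent`) and the idea of passing the deltas as an array of microsecond values are assumptions for illustration, not the actual VLC instrumentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Upper bucket edges in microseconds, matching the table above:
 * <1ms, <2ms, <5ms, <10ms, <15ms, <20ms, plus a final ">= 20ms" bucket. */
#define DELAY_BUCKETS 7
static const int64_t bucket_edges_us[DELAY_BUCKETS - 1] = {
    1000, 2000, 5000, 10000, 15000, 20000
};

/* Count how many deltas (send time minus target time, in us) fall into
 * each bucket. The sign is ignored, so early and late calls are treated
 * alike. counts[] must have DELAY_BUCKETS entries. */
static void delay_histogram(const int64_t *delay_us, size_t n,
                            size_t counts[DELAY_BUCKETS])
{
    for (size_t b = 0; b < DELAY_BUCKETS; b++)
        counts[b] = 0;

    for (size_t i = 0; i < n; i++) {
        int64_t d = delay_us[i] < 0 ? -delay_us[i] : delay_us[i];
        size_t b = 0;
        while (b < DELAY_BUCKETS - 1 && d >= bucket_edges_us[b])
            b++;
        counts[b]++;
    }
}

/* Percentage of samples in one bucket; 0.0 for an empty input. */
static double bucket_percent(size_t count, size_t total)
{
    return total ? 100.0 * (double)count / (double)total : 0.0;
}
```

Each bucket is exclusive of the previous one, which is consistent with the percentages in the table adding up to 100%.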

@popcornmix
Contributor

image_fx runs on second core of VPU with priority 8 (priority 0 is highest).
video_decode runs on first core of VPU with priority 9.
video_scheduler runs on first core of VPU with priority 1.
video_render runs on first core of VPU with priority 1.

So I wouldn't expect it to be a thread priority issue. The VPU has much better interrupt response/task switch time than Linux. It is more likely things are happening late on the ARM side.

Yes, I would recommend using a video_schedule component and having a few frames queued up to protect against frame drops when ARM is busy.

I imagine that by using vcdbg you can extract a log with timestamps that shows when the video frame is received and rendered which may give some clues.
If you had a video_schedule component then you would also get messages when frames were late.

You may have to explain exactly how your media clock is generated, and how you schedule video frames compared to where vsync is. Is audio or video the master of the clock?

@julianscheel
Author

Actually my logs show that the ARM has not dropped a single frame in that time. So I don't think this is the problem.
I will try to read through the vcdbg logs to figure things out.

What might be the key though is what you write in your last sentence: The relation to vsync. In VLC the display of frames is scheduled related to the so-called wallclock, which on a linux-system is just gettimeofday(). The primary timing in VLC is the video timing, audio gets resampled to fit.

So the wallclock will be the clock which the Pi uses to provide the kernel time. If this one is not in sync with the HDMI clock it is likely that it will drift, and at some point in time we will probably schedule frames right at the vsync boundaries. This can probably cause two frames to be rendered in one vsync period, and hence one of them to be dropped. Would you agree?
The question is if there is any chance to get to know when vsync happens in userspace?

If this is not possible there's probably no other chance than using the scheduler element.
Has the scheduler actually been tested with mmal? Seems there's no public code using it yet.
One generic question about it: In OMX we explicitly created a clock component and linked it to the scheduler input. In MMAL there seems to be no publicly accessible clock component at all. Is this auto-generated internally? And if so how could I control the clock? (i.e. setting the now-time related to our timestamps...)

@popcornmix
Contributor

Yes, if wallclock is purely generated by gettimeofday then it will be asynchronous to HDMI clock (and hence vsyncs) as it comes from a different PLL.

The phase of submitted frames to vsync will slowly drift. As you start submitting frames close to vsync you will get dropped/duplicated frames. A small amount of jitter (e.g. a few ms) on present times will mean that you will get dropped/duplicated frames across tens of frames. You should notice periods of perfect video, alternating with periods of jittery video.

I believe the frames returned from video_schedule will be synchronous to (and just after) the vsync where that frame was replaced. I think plotting timestamps of this event would be illuminating.

I would suggest that VLC's wall clock should be derived from these callbacks.
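
The drift described above can be illustrated with a small simulation. This is a sketch under stated assumptions: vsync k is taken to occur at exactly k times the vsync period, the submit jitter is ignored, and the 20001 us vsync period used in the usage note is an invented mismatch, not a measured value. The function names are hypothetical.

```c
#include <stdint.h>

/* Phase of a submit time relative to the preceding vsync, in us.
 * Assumes vsync k occurs at exactly k * vsync_period_us. */
static int64_t phase_us(int64_t t_us, int64_t vsync_period_us)
{
    int64_t p = t_us % vsync_period_us;
    return p < 0 ? p + vsync_period_us : p;
}

/* Number of frames until the submit phase first crosses a vsync
 * boundary, detected as a jump of more than half a vsync period between
 * two consecutive frames. Assumes the per-frame drift is much smaller
 * than half a period. Returns -1 if no crossing occurs within
 * max_frames. */
static int64_t frames_until_wrap(int64_t first_submit_us,
                                 int64_t frame_period_us,
                                 int64_t vsync_period_us,
                                 int64_t max_frames)
{
    int64_t prev = phase_us(first_submit_us, vsync_period_us);

    for (int64_t n = 1; n <= max_frames; n++) {
        int64_t cur = phase_us(first_submit_us + n * frame_period_us,
                               vsync_period_us);
        int64_t step = cur - prev;
        if (step < 0)
            step = -step;
        if (step > vsync_period_us / 2)
            return n;
        prev = cur;
    }
    return -1;
}
```

With a 20000 us frame period, a 20001 us vsync period and a start in the middle of the interval, `frames_until_wrap(10000, 20000, 20001, 100000)` returns 10001, i.e. roughly 200 seconds of clean playback before the submit times reach the vsync boundary. Near that boundary even small jitter decides on which side of the vsync a frame lands.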

@popcornmix
Contributor

@6by9 may be able to answer the mmal/scheduler/clock questions.

@julianscheel
Author

Yes, this is exactly how it behaves. Periods where it is absolutely perfect in sync and then periods where it drops/duplicates frames. Let me try analyzing the callback timings. I will post the results later.

@julianscheel
Author

@popcornmix
The input_port_cb from video_render which returns the displayed pictures seems not to be related to vsync. It actually seems to always appear very quickly after sending the picture to the port.
I just recognized that you actually wrote that video_schedule should be synchronous to vsync not video_render. So probably there's no way to recognise when vsync happens without adding a scheduler element to the pipe?

@6by9

6by9 commented Jun 13, 2014

If you're using ENCODING_OPAQUE, then as per image_fx, the sink component actually acquires the image underlying the opaque buffer, and returns the buffer immediately. So looking at the input_port_cb will be fairly meaningless.
AFAIK the scheduler components all work off a wall clock - none of them interface to VSYNC.

video_render automatically synchronises updates to the VSYNC to avoid tearing (it just updates with the last frame provided, throwing away any extras provided during the frame period). But if your updates are almost exactly hitting the VSYNC, then some updates will be a frame early, and others a frame late.
I'm having email conversations with popcornmix and others internally to get some ideas on VSYNC synchronisation.
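
To make the "one frame early, one frame late" effect concrete, here is a minimal model of a renderer that latches only the last frame submitted in each vsync interval. It is a sketch, not the firmware logic: it assumes sorted, non-negative submit times in microseconds and vsyncs at exact multiples of the period, and `simulate_latch` and `struct latch_stats` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct latch_stats {
    size_t shown;     /* frames that reached the display */
    size_t dropped;   /* frames replaced before the next vsync */
    size_t repeated;  /* vsync intervals that received no new frame */
};

/* Model: at each vsync the display takes the last frame submitted
 * during the preceding interval and discards any earlier ones.
 * submit_us[] must be sorted ascending and non-negative. */
static struct latch_stats simulate_latch(const int64_t *submit_us, size_t n,
                                         int64_t vsync_period_us)
{
    struct latch_stats s = {0, 0, 0};
    if (n == 0)
        return s;

    int64_t first = submit_us[0] / vsync_period_us;
    int64_t last = submit_us[n - 1] / vsync_period_us;
    size_t i = 0;

    for (int64_t k = first; k <= last; k++) {
        size_t in_interval = 0;
        while (i < n && submit_us[i] / vsync_period_us == k) {
            in_interval++;
            i++;
        }
        if (in_interval == 0) {
            s.repeated++;
        } else {
            s.shown++;
            s.dropped += in_interval - 1;
        }
    }
    return s;
}
```

With a 20000 us period, submits at 10000, 30000, 50000, 70000, 90000 are all shown. Submits at 19900, 40100, 59900, 80100, 99900 (the same 20 ms cadence with only 100 us of jitter, but sitting on the vsync boundary) give 3 shown, 2 dropped and 2 repeated intervals.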

@julianscheel
Author

@6by9 Thank you for the details, VSYNC synchronisation really seems to be what we are missing. I grabbed more timing data and there is clearly no difference in the timing behaviour between the phase with no frame drops at all and the phase where it starts to get sluggish. If we just had some kind of vsync event we could wait for after we draw a frame, this would probably avoid these issues.

@6by9

6by9 commented Jun 13, 2014

Or a wall clock that is accurately synchronised to the VSYNC.
If the VSYNC rate is not an exact multiple of your frame rate (e.g. 24fps on a 50Hz display), then some frames have to be either duplicated an extra time (as in the 24fps on 50Hz case), or sometimes dropped. I know there are mechanisms for altering the VSYNC rate on HDMI, but I'm guessing those aren't being used.

A GPU-side scheduler will have much better timing accuracy than sending buffers across from the ARM. Even if we add an event that is produced on VSYNC, then you're still at the mercy of the Linux scheduler to pass that message to you for processing, so it won't necessarily help that much.
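
The 24fps on 50Hz case can be worked through numerically. The sketch below assumes idealised clocks and that a frame becomes visible at the first vsync at or after its nominal presentation time; the function names are hypothetical.

```c
#include <stdint.h>

/* Index of the first vsync at or after the nominal presentation time of
 * the given frame, i.e. ceil(frame * refresh_hz / fps). */
static int64_t first_vsync_for_frame(int64_t frame, int64_t fps,
                                     int64_t refresh_hz)
{
    return (frame * refresh_hz + fps - 1) / fps;
}

/* Number of vsync intervals during which the given frame is on screen.
 * 0 means the frame is never shown (dropped). */
static int64_t vsyncs_shown(int64_t frame, int64_t fps, int64_t refresh_hz)
{
    return first_vsync_for_frame(frame + 1, fps, refresh_hz) -
           first_vsync_for_frame(frame, fps, refresh_hz);
}
```

For 24fps on 50Hz, most frames are shown for 2 vsyncs, and under this model frames 0 and 12 of every 24 are shown for 3, which adds up to the 50 vsyncs per second. For 60fps on a 24Hz display the same function returns 0 for some frames, which corresponds to drops.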

@popcornmix
Contributor

@6by9 won't the video_render callback returning a frame be synced to vsync?
I assume it gets released from the dispmanx_update_submit callback (which is triggered by vsync).

@6by9

6by9 commented Jun 13, 2014

@popcornmix mmal_port_send_buffer will deliver an opaque buffer to the input port.
The component then does a pool_image_acquire on the underlying image from the image pool. The buffer gets returned at this point as there is no need to hang on to it.
On the dispmanx_update_submit callback, pool_image_release will get called, but there is no interaction with a buffer.

If we were using ENCODING_I420, then yes the buffer would be returned after the dispmanx callback (but it would also involve a load of VCHIQ transfers).

@julianscheel
Author

Just a thought: Can't I trigger a dispmanx update from my side manually to get the sync time?
Something like this: Every n-th frame I do a dispmanx_update_sync for some empty surface and wait for it to finish which should be right after vsync?
Not exactly a pretty solution, but if it gives us access to the vsync time it would be worth a try.

@popcornmix
Contributor

@6by9 makes sense. A vsync related callback message would be useful.

@julianscheel
omxplayer/xbmc uses an additional mechanism for vsync.
You can enable a mode in video_render where, if the period of frames submitted (averaged over a couple of seconds) is close enough to the vsync rate (~0.1%), we adjust the HDMI pixel clock up/down to ensure scheduled frames occur in the middle of vsync intervals.

See: https://github.com/popcornmix/omxplayer/blob/master/OMXVideo.cpp#L243

You might want to try this, but I'm not sure if it will work with your setup, and I suspect deriving wall clock from vsync is actually the scheme that will fit VLC better.

Obviously it only makes sense when hdmi mode matches video framerate.
(also are you controlling setting PAL/NTSC framerates to get correct 24/23.97?)

@popcornmix
Contributor

@julianscheel
Yes, using dispmanx to find occurrence of vsync should work.
A bit ugly, but may be okay if you are using dispmanx anyway (e.g. for subs).

@julianscheel
Author

@popcornmix
We're using dispmanx for subtitles already, so this sounds doable.

Regarding the latency target stuff: This seems like a good option too, but will it work with video_render standalone, without a video_scheduler? And secondly, can it be configured through MMAL? I haven't found matching definitions in the headers.

@6by9

6by9 commented Jun 13, 2014

OMX_IndexConfigLatencyTarget (which maps to MMAL_PARAMETER_AUDIO_LATENCY_TARGET just to be a little confusing!) is supported by the IL audio_render, video_render, and clock components.
So yes it should be supported by video_render when used via MMAL and it doesn't rely on a scheduler.

@julianscheel
Author

@6by9 Ah, thank you.
The latency is automatically relative to HDMI vsync when applying MMAL_PARAMETER_AUDIO_LATENCY_TARGET to video_render port?
And the values from https://github.com/popcornmix/omxplayer/blob/master/OMXVideo.cpp#L243 are sane? :)

@6by9

6by9 commented Jun 13, 2014

@popcornmix is going to know more about what the parameter does (I'm mainly a camera, video encode, and image encode person!), but yes, the structures are just copied a field at a time from the MMAL to the OMX structures, into the fields with the most obviously linked names.

@popcornmix
Contributor

Yes, the values specified are sane.
If you add "enable_hdmi_status=1" to config.txt and run:
vcgencmd hdmi_status_show 1
you should see an overlay with HDMI debug info.
The bars along bottom describe:
match (how close period is correlated to vsync - should be above green marker for good sync),
period (measure of period of frames)
phase (phase of rendered frames compared to vsync - should be in 25%-50% range)
clock (hdmi pixel clock. green marker is nominal, but may increase or decrease to manage phase)

Try with "omxplayer -r" to see what it should look like when working.

@julianscheel
Author

@popcornmix Thanks, the hdmi_status_show seems really useful!
I have it enabled now and the values look pretty much exactly like I'd expect after your explanations :)
I will let it run for a while now and notify you if the frame drops are solved with this.

Thank you very much for all your efforts!

@julianscheel
Author

Things work very well so far with the latency target. So I'll close the bug for now. Thank you again!

@julianscheel
Author

@popcornmix I have one more question about the latency target. When we play streams with an http source instead of the rtsp/udp sources we normally use, the phase seems to start at a random point of the bar.
If this one is close to the end of the bar we get frame drops/dups.

I expected the phase to basically be what is configured through the target parameter. Is that correct?
Is there some initial sync happening to try to hit that target parameter? Or will it start at a random point and then try to smoothly shift towards hitting the target?

julianscheel reopened this Jun 18, 2014

@popcornmix
Contributor

Afraid the initial phase is random, so when it is right on the vsync you will get frame drops/dupes for a number of seconds at start.

When video_schedule and video_render are tunnelled together we do detect this and request video_schedule to adjust its schedule times by half a vsync until the phase is in a safe area. That means you get one frame drop when adjusting one way, and no frame drops the other way.

If you add the video_schedule component you should get this behaviour.

I'll have a think about whether video_render can handle this half vsync adjustment itself, but I think the reason it was done by video_schedule was that video_schedule needed to know it had happened (can't quite remember what the issue was - possibly a danger of frames being reordered when the adjustment is being added or removed, e.g. with a 24Hz display and 60fps video).
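
As an illustration of the half-vsync idea only (this is not the firmware's logic), the decision can be sketched as below. The "safe area" of 25% to 75% of the vsync period is an assumption chosen so that a single half-period shift always lands inside it; the function names are hypothetical and the phase is assumed to be in the range 0 to period-1.

```c
#include <stdint.h>

/* Assumed safe window: the middle half of the vsync interval. */
static int phase_is_safe(int64_t phase_us, int64_t period_us)
{
    return phase_us >= period_us / 4 && phase_us <= (3 * period_us) / 4;
}

/* Offset to add to all future schedule times: nothing if the phase is
 * already safe, otherwise half a vsync period. */
static int64_t schedule_adjust_us(int64_t phase_us, int64_t period_us)
{
    return phase_is_safe(phase_us, period_us) ? 0 : period_us / 2;
}

/* Phase that results once the adjustment has been applied. */
static int64_t phase_after_adjust_us(int64_t phase_us, int64_t period_us)
{
    return (phase_us + schedule_adjust_us(phase_us, period_us)) % period_us;
}
```

With a 20000 us period, a phase of 500 us moves to 10500 us and a phase of 19500 us wraps around to 9500 us; both end up near the middle of the interval. This sketch always delays; it does not model the choice of direction mentioned above.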

@julianscheel
Author

Ok, I'll try to achieve an initial sync with the approach of using vc_dispmanx_update_submit_sync.
But having this integrated into video_render automatically would be great. If there are corner cases (i.e. frequency mismatch) where it would not work, we could enable it only explicitly when we can ensure that all preconditions are met.

@julianscheel
Author

It turns out doing an initial sync with vsync in VLC is much more complicated than I had thought. So if you see a chance to get this handled in video_render it would be highly appreciated.

@julianscheel
Author

Is the algorithm dynamically deciding which direction to shift phase to hit the target?
Actually it seems as if in a case where the initial phase is really close to vsync, the bar is just jumping between almost full and almost empty without evolving towards the desired target at all.

@julianscheel
Author

@popcornmix I think I managed to implement a proper initial sync in VLC now. It's a bit hackish, so I'd still prefer if mmal did it magically on its own, but at first glance it seems to work this way. I will test it today and give an update with test results here.
Regarding the case where the filter did not manage to work towards the target at all: It seems this was caused by a fault in my code which under rare circumstances caused all frames to be drawn immediately instead of waiting for their desired display time.

@julianscheel
Author

Thanks, this would be awesome

popcornmix added a commit that referenced this issue Jun 26, 2014
effects: Initial code for 3d sbs to anaglyph

video_decode: Fix nBufferSize setting getting lost
See: #240 (comment)

video_render: Add command for querying render statistics
See: #287 (comment)

hvs: Add gencmd to configure only updating display list on odd/even field
See: #292
@popcornmix
Contributor

This is added in latest firmware.
Most elements in the OMX_CONFIG_BRCMRENDERSTATSTYPE are from the hdmi status overlay.
nHvsStatus is a HVS status register. Bit 27 is the active field when last frame was rendered (0=even, 1=odd).
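
Extracting that bit is a one-liner. A minimal sketch, where the macro and function names are hypothetical and only the bit position and its meaning come from the description above:

```c
#include <stdint.h>

/* Bit 27 of the HVS status word: active field when the last frame was
 * rendered (0 = even, 1 = odd), as described above. */
#define HVS_STATUS_FIELD_BIT 27u

/* Returns 1 if the odd field was active, 0 if the even field was. */
static int hvs_active_field_is_odd(uint32_t hvs_status)
{
    return (int)((hvs_status >> HVS_STATUS_FIELD_BIT) & 1u);
}
```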

@julianscheel
Author

This sounds promising. Is it possible to query OMX_CONFIG_BRCMRENDERSTATSTYPE via mmal as well?

@popcornmix
Contributor

My understanding was yes, mmal will behave the same. (But I've not tested it).

@julianscheel
Author

How would I send an OMX_IndexConfig type to an MMAL element? For OMX I assume I would use OMX_SetConfig, but I don't see how I could use that on an MMAL element.

@popcornmix
Contributor

I'm not sure, I'll need to check with @6by9.

@6by9

6by9 commented Jun 27, 2014

I'm pointing @popcornmix to the appropriate place in the GPU source. There are a bunch of translation functions and the relevant entries need to be added.

@6by9

6by9 commented Jun 27, 2014

@popcornmix is out of the office today, so I've pushed a test firmware and the updated header to https://github.com/6by9/RPiTest/tree/master/render_stats. I'm afraid I haven't been able to do any testing on it though.

@julianscheel
Author

@6by9 Thanks a lot, I'll give it a try. One more question: Could you give some description on the units of the parameters? Especially of the phase parameter.
Is it time in us between sync and frame draw? Or some relative measurement? I need to understand that to properly translate it into a pts offset to apply in VLC.

@popcornmix
Contributor

I think match was out of 1000000 (with 1000000 being perfectly matched)
Period is HDMI vsync period in microseconds
Phase is offset from vsync in microseconds
PixelClockNominal is the standard pixel frequency going out to display (in Hz)
PixelClock is zero if there has been no adjustment. After adjustment it is adjusted frequency in Hz
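
Given those units, translating the reported phase into a timestamp offset could look roughly like the sketch below. It assumes that adding a positive value to a frame's display time moves its phase later by the same amount, and it picks the shortest shift to a caller-chosen target phase. `pts_correction_us` is a hypothetical helper, not part of MMAL or VLC.

```c
#include <stdint.h>

/* Signed correction in microseconds to add to future display times so
 * that the phase moves from phase_us to target_phase_us. The result is
 * wrapped into [-period/2, +period/2] so the smaller of the two possible
 * shifts is chosen. Returns 0 if the period is unknown (0). */
static int64_t pts_correction_us(uint32_t phase_us, uint32_t period_us,
                                 uint32_t target_phase_us)
{
    if (period_us == 0)
        return 0;

    int64_t period = (int64_t)period_us;
    int64_t delta = ((int64_t)target_phase_us - (int64_t)phase_us) % period;

    if (delta > period / 2)
        delta -= period;
    else if (delta < -period / 2)
        delta += period;
    return delta;
}
```

For example, with a 20000 us period and a target of 10000 us, a reported phase of 19500 us gives a correction of -9500 us (show frames 9.5 ms earlier), while a phase of 500 us gives +9500 us.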

@julianscheel
Author

Thank you, that's perfect. Will let you know if it works well.

@julianscheel
Author

@6by9 Shouldn't the MMAL_PARAMETER_VIDEO_RENDER_STATS_T struct start with a header element: MMAL_PARAMETER_HEADER_T hdr; ?
mmal_parameter_get requires a pointer to a MMAL_PARAMETER_HEADER.

@6by9

6by9 commented Jun 27, 2014

Oops, yes. Sorry about that. Will try to get the test firmware and headers updated tomorrow - trying them now will almost certainly crash things.

@julianscheel
Author

Alright. Thank you :)

@6by9

6by9 commented Jun 28, 2014

Having had another think, the GPU doesn't actually use that structure directly (there's a reason that the parameter layout and sizing is the same as the IL parameter!), so you should be able to just add MMAL_PARAMETER_HEADER_T hdr to MMAL_PARAMETER_VIDEO_RENDER_STATS_T in your headers and it should work. I'll update the official headers too, but that shouldn't block you testing.
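
The reason the header has to come first can be shown with mock types. Everything in this sketch is invented for illustration (`param_header_t`, `render_stats_t`, `PARAM_RENDER_STATS`, `param_get` and the values it fills in); it only mirrors the general pattern of a getter that receives a pointer to a header carrying an id and a size, and is not the MMAL implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock generic parameter header: parameter id plus total struct size. */
typedef struct {
    uint32_t id;
    uint32_t size;
} param_header_t;

/* Mock parameter struct. The header must be the first member so that a
 * pointer to the header is also a pointer to the whole struct. */
typedef struct {
    param_header_t hdr;
    uint32_t valid;
    uint32_t match;
    uint32_t period;
    uint32_t phase;
} render_stats_t;

#define PARAM_RENDER_STATS 0x1234u /* invented id */

/* Mock getter: only sees the header, checks id and size, then fills in
 * the rest of the struct behind it. Returns 0 on success, -1 on error. */
static int param_get(param_header_t *hdr)
{
    if (hdr->id != PARAM_RENDER_STATS || hdr->size < sizeof(render_stats_t))
        return -1;

    render_stats_t *stats = (render_stats_t *)hdr; /* hdr is member 0 */
    stats->valid = 1;
    stats->match = 1000000;
    stats->period = 20000;
    stats->phase = 7500;
    return 0;
}
```

A caller would initialise `.hdr` with the id and `sizeof` of the whole struct and pass `&stats.hdr`. If the size in the header does not cover what the getter wants to write, the call has to fail rather than write past the caller's buffer.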

@julianscheel
Author

Sorry had some interruption on this. Back on it now. On which port shall I be able to query the render stats? I tried it on input and control port of the video_render component but only get a status 3, EINVAL back.

This is what my code looks like:

```c
static void detect_vsync(vout_display_t *vd)
{
    /* The header carries the parameter id and the size of the whole struct. */
    MMAL_PARAMETER_VIDEO_RENDER_STATS_T render_stats = {
        .hdr = { MMAL_PARAMETER_VIDEO_RENDER_STATS, sizeof(render_stats) },
    };
    vout_display_sys_t *sys = vd->sys;
    MMAL_STATUS_T status;

    /* Query the stats on the input port of the video_render component. */
    status = mmal_port_parameter_get(sys->input, &render_stats.hdr);
    if (status != MMAL_SUCCESS) {
        msg_Err(vd, "Failed to read render stats on input port %s (status=%"PRIx32" %s)",
                        sys->input->name, status, mmal_status_to_string(status));
        return;
    }

    if (render_stats.valid) {
        msg_Dbg(vd, "render_stats: match: %u, period: %u us, phase: %u us, hvs: %u",
                render_stats.match, render_stats.period, render_stats.phase,
                render_stats.hvs_status);
    } else {
        msg_Warn(vd, "could not read valid render_stats");
    }
}
```

And it leads to Failed to read render stats on control port vc.ril.video_render:in:0(OPQV) (status=3 EINVAL)

@popcornmix
Contributor

From openmax it is on the input port (90).

@julianscheel
Author

@6by9 Have you had a chance to give it a try with your testing firmware yourself? I can't seem to get it working.

popcornmix added a commit that referenced this issue Jul 5, 2014
kernel: conifg: Add CONFIG_DEVPTS_MULTIPLE_INSTANCES
See: raspberrypi/linux#603

firmware: dispmanx: Fix snapshot scaling when a fullscreen dest rect is used
See: #295

firmware: dispmanx: allow rotations into 24-bit memory displays
See: #267

firmware: Add MMAL parameter for render stats
See: #287

firmware: image_decode: Move decode thread to second core
This improves jpeg decode time a little

firmware: imagefx: Separate out the fast 1080i deinterlace algorithm
As the 1080i deinterlace doesn't require the 3 frames of context we can save ~6MB by requesting it explicitly
@6by9

6by9 commented Jul 7, 2014

I was out of the office all last week - sorry.
Just tried it. Error visible with "vcdbg log msg" - "204456.932: mmal: mmal_ril_param_get_generic: RIL data larger than MMAL data (-12 > 36)"
The OMX structure hadn't been properly completed by the component and it was writing an incorrect size and version. Just amending that now and testing.

@6by9

6by9 commented Jul 7, 2014

Seems to return sensible numbers, so I've updated the test image on my github account. The changes have been pushed to the internal server, so should be in the next release.

@julianscheel
Author

Thank you, this works well now. Will leave the ticket open until merged into the official firmware.
