
MMAL image_fx deinterlacer with Full HD video #1031

Closed
luiscgalo opened this issue Sep 3, 2018 · 19 comments

@luiscgalo

luiscgalo commented Sep 3, 2018

Hi,
I'm trying to build an HDMI video recorder application with B101 module connected to Raspberry Pi3+.
The intended video format to be recorded, from a video camera, is 1080i50 (25 fps interlaced).
My prototype source code is almost working with the help of 6by9.
The prototype is available at https://github.com/luiscgalo/rpi-video-recorder

However, I'm currently facing some issues related to the deinterlacer functionality of the image_fx MMAL component, where I see choppy motion (back-and-forth movements) in the produced video.
Basically, I'm trying to convert 25fps interlaced video into 50fps progressive, encoding it with the H264 encoder component.
At the moment, I'm using the following configuration on image_fx:

img_fx_param.effect = MMAL_PARAM_IMAGEFX_DEINTERLACE_ADV;
img_fx_param.num_effect_params = 4;
img_fx_param.effect_parameter[0] = 3; // interlaced input frame with both fields / top field first
img_fx_param.effect_parameter[1] = 0; // frame period (1000000 * 1 / 25);
img_fx_param.effect_parameter[2] = 0; // half framerate ?
img_fx_param.effect_parameter[3] = 1; // use QPU ?

Theoretically, this means that the output frame rate is 50fps, and indeed I see 2 output frames for each interlaced frame sent to the image_fx input port. However, the output frames seem choppy, still representing a 25fps video feed (not a doubled frame rate).
One important note: if I set "img_fx_param.effect_parameter[2]" to one (half frame rate), the output video is a clean stream at 25fps. However, I want to double the frame rate (25i to 50p) to have smooth motion in the recorded video.
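For reference, the four values above map onto MMAL's MMAL_PARAMETER_IMAGEFX_PARAMETERS_T. A minimal sketch of applying them follows; note that which port the parameter should be set on varies between examples (some use the component's output port, some the input), so the use of output[0] here is an assumption, not this project's exact code:

```c
#include "interface/mmal/mmal.h"
#include "interface/mmal/mmal_parameters_video.h"

/* Sketch: request the advanced deinterlacer on an image_fx component.
 * Field meanings follow the comments in the snippet above. */
static MMAL_STATUS_T request_adv_deinterlace(MMAL_COMPONENT_T *image_fx)
{
   MMAL_PARAMETER_IMAGEFX_PARAMETERS_T img_fx_param = {
      { MMAL_PARAMETER_IMAGE_EFFECT_PARAMETERS, sizeof(img_fx_param) },
      MMAL_PARAM_IMAGEFX_DEINTERLACE_ADV,
      4,
      { 3,    /* interleaved fields, top field temporally first   */
        0,    /* frame interval in us; 0 = derive from the port   */
        0,    /* 0 = full (field) output rate, 1 = half rate      */
        1 }   /* 1 = use the QPUs for the advanced algorithm      */
   };
   /* Assumption: parameter applied to the output port. */
   return mmal_port_parameter_set(image_fx->output[0], &img_fx_param.hdr);
}
```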
Please refer to my last 2 posts on Raspberry Pi forum for more technical info: https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=218928&sid=f2b3ac8939840c1b21ea3365931ae2b2&start=25#p1358983

Based on my tests, it seems that this issue is related to the deinterlacer functionality of image_fx, or to an inappropriate configuration on my side.
I've discussed this a little with 6by9, but it seems he is quite busy, without time to investigate this issue.

  1. The image_fx deinterlacer supports conversion of Full HD video from 25i to 50p, right? If so, can you please help me understand why the deinterlacer is struggling when I request conversion from 25i to 50p?

  2. One additional question regarding "img_fx_param.effect_parameter[0]" parameter since it seems to support multiple values:

0 - The data is not interlaced, it is progressive scan
1 - The data is interlaced, fields sent separately in temporal order, with upper field first 
2 - The data is interlaced, fields sent separately in temporal order, with lower field first
3 - The data is interlaced, two fields sent together line interleaved, with the upper field temporally earlier
4 - The data is interlaced, two fields sent together line interleaved, with the lower field temporally earlier
5 - The stream may contain a mixture of progressive and interlaced frames (all bets are off).

Since I'm able to receive each top/bottom field individually, can I set the parameter to "1" in order to avoid building a temporary frame myself containing both top and bottom fields? Currently I'm setting it to "3" in my prototype app.

Thanks in advance for your help.

@popcornmix
Contributor

I think you are just trying to do too much on the pi.
I believe the encode hardware is only designed for 1080p30.
The advanced deinterlacer is very expensive in processing and sdram bandwidth.
Switching to the simpler bob deinterlace will help, but I suspect that will also be too much.
I think it's very unlikely you'll be able to do both at once at 1080p.

I think you'll need to lower the framerate or resolution.

@luiscgalo
Author

luiscgalo commented Sep 3, 2018

Hi popcornmix,
Thanks for your quick feedback. ;)

Well, in my prototype the RPi is only executing this application, running a Raspbian Stretch image (with no graphical applications running, which means the GPU is dedicated to this task).
Describing briefly my data processing chain:
Rawcam (BGR24 top/bottom field capture from the B101 module) --> top/bottom field merge in software to create a temporary Full HD interlaced frame --> ISP component (BGR24 to I420 conversion) --> image_fx (deinterlacer) --> H264 video encode block
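The software field merge in that chain amounts to interleaving the two field buffers line by line. A minimal sketch, assuming packed BGR24 rows with stride equal to width*3 (the function name and stride handling are mine, not from the project):

```c
#include <string.h>
#include <stdint.h>

/* Weave two fields into one interlaced frame: the top field fills the
 * even rows (0, 2, 4, ...) and the bottom field fills the odd rows.
 * row_bytes is the length of one row in bytes, field_rows the number
 * of rows in each field (frame height / 2). */
static void weave_fields(uint8_t *frame, const uint8_t *top, const uint8_t *bot,
                         int row_bytes, int field_rows)
{
    for (int y = 0; y < field_rows; y++) {
        memcpy(frame + (size_t)(2 * y) * row_bytes,
               top + (size_t)y * row_bytes, row_bytes);
        memcpy(frame + (size_t)(2 * y + 1) * row_bytes,
               bot + (size_t)y * row_bytes, row_bytes);
    }
}
```

Swapping the `top` and `bot` arguments is all it takes to flip the field order, which matters later in this thread.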

This setup is properly working, excluding the deinterlacer issue (image_fx component).
From my tests, both ISP and H264 video encode blocks are able to handle 1080p50.
Please note that my RPi 3B+ is overclocked and I have the GPU core running at 500MHz (with heat sink on it).

My current suspicion is that, for some unknown reason, the image_fx component is applying MMAL_PARAM_IMAGEFX_DEINTERLACE_FAST mode even when I request MMAL_PARAM_IMAGEFX_DEINTERLACE_ADV (please refer to the settings in my previous comment).

From previous conversations with 6by9 on the RPi forum, it seems that "Advanced" mode with QPUs enabled is the only one capable of handling 1080p50 video.
However, in my case, with the settings referred to above, I'm seeing exactly the same problematic output as when requesting "Fast" deinterlace mode.
I don't know exactly why this is happening, and since there is almost no information about how the deinterlacer works, it is quite difficult to debug...

That's why I would appreciate your expertise in helping me understand whether I'm configuring anything wrong, or whether there is a limitation in the image_fx component for this task.

@6by9

6by9 commented Sep 4, 2018

Try halving the input frame rate to produce 1080i25 by dropping (returning early) every other completed frame after the RGB merge. That would confirm if the deinterlacing is actually doing something wrong or is just overwhelmed.

You really are stressing memory bandwidth. Each 1920x1080 image is 3MB (24Mbit) for YUV or 6MB (48Mbit) for RGB. So:

  • rawcam is writing 1080i50, or 155MB/s (1.2Gbit/s)
  • The ARM is reading and then writing the same.
  • The ISP is then reading 155MB/s and writing 77MB/s (622Mbit/s)
  • image_fx then needs to read at least 3*77MB/s, and write 155MB/s (doubled frame rate - 1.2Gbit/s)
  • Video encode then has to format convert the data first, so that's another 155MB/s read and write, and then the encode process, which will be a read of at least 310MB/s (reading the new frame and reference frames once each).

Adding that up comes to around 697MB/s (5.5Gbit/s) of writes, and 1GB/s (8Gbit/s) of reading. I can't remember the spec for LPDDR2, but that is a lot of data being thrown around, fortunately most of it to/from dedicated hardware blocks. I think the only block other than memory that you have contention on is the ISP, as video_encode uses it for the format conversion as well as you using it for the RGB to I420 conversion.
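The per-stage figures above follow directly from frame size times frame rate. A small helper reproduces them (a sketch; `mb_per_sec` and `print_bandwidth_estimate` are hypothetical names, and the printed values round slightly differently from the figures quoted above):

```c
#include <stdio.h>

/* Bandwidth for one stream: bytes per frame times frames per second. */
static double mb_per_sec(double bytes_per_frame, double frames_per_sec)
{
    return bytes_per_frame * frames_per_sec / 1e6;
}

static void print_bandwidth_estimate(void)
{
    const double rgb = 1920.0 * 1080.0 * 3.0;  /* BGR24 frame: ~6.2 MB */
    const double yuv = 1920.0 * 1080.0 * 1.5;  /* I420 frame:  ~3.1 MB */

    /* 1080i50 = 50 fields/s = 25 full frames/s at the merge output. */
    printf("rawcam write, 25 fps RGB:          %.1f MB/s\n", mb_per_sec(rgb, 25));
    printf("ISP write, 25 fps I420:            %.1f MB/s\n", mb_per_sec(yuv, 25));
    printf("image_fx write, 50 fps I420:       %.1f MB/s\n", mb_per_sec(yuv, 50));
    printf("encoder reads, current+reference:  %.1f MB/s\n", mb_per_sec(yuv, 100));
}
```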

@6by9

6by9 commented Sep 4, 2018

I should say that I've only had 1080p50 encoding working on an almost otherwise idle Pi.
IIRC Rawcam throwing UYVY at video_encode, with perhaps the HVS displaying the frame as well.

@luiscgalo
Author

luiscgalo commented Sep 4, 2018

Hi 6by9,
Thanks for your tip. I'll try your suggestion tonight (dropping half of the interlaced frames to reduce frame rate, memory and GPU usage) and I'll post the results here.
You could be right that the problem here is indeed related to memory bandwidth usage...

I don't know if it helps, but if the image_fx deinterlacer supports "img_fx_param.effect_parameter[0]" set to "1" instead of "3" (sending individual top/bottom fields instead of a full interlaced frame), I could remove the top/bottom merge mechanism, which also consumes memory and processing time...
However I don't know how I can send individual top/bottom fields to the image_fx component.
At least I'm not able to find any example available on the Internet...

What do you think: is it a reasonable idea, or does it not make sense?
If it is, how can I send individual top/bottom fields to image_fx instead of a full frame?

@popcornmix
Contributor

sdram_freq=450 => 900MHz DDR rate, 32 bits wide.
This is theoretical 3.6GB/s sdram bandwidth (but that includes refresh, commands, page open/close etc so usable bandwidth is lower).

And the advanced deinterlace uses 5 input fields to produce an output frame.
There's also likely to be display of the framebuffer console (and possibly the video frame).

But the real issue is when there are multiple heavy users of sdram, the latency to memory goes up and so does the processing time. Video encode and deinterlace are already jobs that take significant time. They will both take much longer when done together.

@luiscgalo
Author

Ok.
So that means this effect could result from the combined usage of image_fx and the H264 video encoder.
I should also try modifying my prototype to not use the H264 video encode block (a simple test could be saving some I420 output frames from the deinterlacer to the SD card) to check whether the produced result is correct.

And what about sending individual top/bottom fields to image_fx (the deinterlacer)?
According to the available documentation, it seems that if I set "img_fx_param.effect_parameter[0]" to "1" I'm able to send individual fields. However, I haven't seen any example on the Internet using this configuration...

Since the rawcam component receives top/bottom fields individually from the CSI-2 bus, I could send them directly to the ISP --> image_fx instead of creating a temporary Full HD interlaced frame, which consumes more memory and processing time.

@popcornmix
Contributor

The deinterlace component expects the two fields to be woven into a single frame.
I don't believe any other scheme is supported.

@luiscgalo
Author

Hum... Ok
Thanks for the clarification.

I asked because the documentation lists the following possible values for "img_fx_param.effect_parameter[0]" (the first parameter of the deinterlacer configuration):

0 - The data is not interlaced, it is progressive scan
1 - The data is interlaced, fields sent separately in temporal order, with upper field first 
2 - The data is interlaced, fields sent separately in temporal order, with lower field first
3 - The data is interlaced, two fields sent together line interleaved, with the upper field temporally earlier
4 - The data is interlaced, two fields sent together line interleaved, with the lower field temporally earlier
5 - The stream may contain a mixture of progressive and interlaced frames (all bets are off).

On the Internet I've only found examples using values "3" and "5".
That's why I was wondering if I could use value "1" for my application.

@luiscgalo
Author

luiscgalo commented Sep 4, 2018

Another question, just out of curiosity: how do applications like Kodi handle deinterlacing?
They support Full HD video deinterlacing in realtime, right? (H264 video decode plus image_fx as the deinterlacer)

@popcornmix
Contributor

Yes, but decode is simpler than encode + ISP.
And it needs care that everything is done with zero copy.
Any extra copies of pixel buffers are enough to make 50/60fps impossible.
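The zero-copy requirement popcornmix mentions corresponds to a real MMAL port parameter. A minimal sketch; applying it per port, before the port is enabled, is my assumption about usage rather than a detail from this thread:

```c
#include "interface/mmal/mmal.h"
#include "interface/mmal/util/mmal_util_params.h"

/* Sketch: ask MMAL to share buffers between the GPU and ARM sides
 * instead of copying them across. Must be set before enabling the port. */
static MMAL_STATUS_T enable_zero_copy(MMAL_PORT_T *port)
{
   return mmal_port_parameter_set_boolean(port, MMAL_PARAMETER_ZERO_COPY, MMAL_TRUE);
}
```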

@luiscgalo
Author

luiscgalo commented Sep 4, 2018

Yes, it's true, decoding is simpler than encoding...
Tonight I will run some tests with just Rawcam --> top/bottom field merge --> image_fx --> storage of some I420 frames to the SD card (without the H264 encode block).
I will also test reducing the frame rate to see if it helps.
I will post the results here.

@6by9

6by9 commented Sep 4, 2018

The video decoder already writes out interlaced content as the two interleaved fields hence avoiding any conversions. Deinterlace was generally intended to handle the video decoder, hence only supporting the interleaved format.
I don't know the history of the modules as to why the other formats had enums assigned to them, but I haven't seen any evidence that they're supported.

Passing the deinterlaced image to video_render and sticking it on the screen is probably the quickest and easiest way to assess whether the deinterlacing is doing something sensible. I think I did that on my branch of your project.
As another simple test, reduce the resolution as the frame goes through the ISP, though that may change things too much to be a useful comparison.

@luiscgalo
Author

luiscgalo commented Sep 4, 2018

I've modified 6by9's branch of my prototype application a little to present the deinterlaced video (the image_fx output) directly on an HDMI screen connected to my RPi, i.e. without the H264 encoder.
After that change, I noticed again some back-and-forth movements (it looks like flickering; since it is realtime video, it is quite hard to distinguish what's happening).
Then I've tried inverting the order of top/bottom fields on rawcam part.
And... ta-da! The output video on screen is now perfectly smooth, at a 50fps output frame rate :)
I've pushed the changes to https://github.com/luiscgalo/rpi-video-recorder/tree/6by9/src in case you want to test this.

Based on this, I can conclude that one of my root problems was the order of the top/bottom fields sent to image_fx.
Unfortunately, I've also corrected the top/bottom field order in the version of my app which records video using the H264 encoding block, and I'm still getting choppy video (movements are not fluid).

The challenge now will be to check/test if the RPi is able to do deinterlacing and H264 encoding at the same time.
Do you have any tips/recommendations to improve performance on this (overclock, optimizations, etc)?
I know that I'm pushing the RPi to its limits, but in this case the system is 100% dedicated to the video recording task (if needed, we can cut/disable functionality that impacts performance and is not needed for this application).

Thanks again for your valuable help 6by9 and popcornmix. ;)

@popcornmix
Contributor

In terms of overclock sdram_freq is likely the most significant bottleneck and therefore the most important part of the overclock. After that probably core_freq, then v3d_freq (if using advanced deinterlace) and then h264_freq.
arm_freq probably won't have much effect on this (I'm hoping when this is running the arm is fairly lightly loaded - check with top).

Note: overclocking is not guaranteed. You'll probably get crashes when experimenting, which if you are unlucky could corrupt the sdcard. Backup before trying.

An extreme overclock would look like:

sdram_freq=600
sdram_schmoo=0x0200020
over_voltage_sdram=2
gpu_freq=500
over_voltage=4

But that will be unstable on many Pis.
Perhaps start with sdram_freq=500 and run memtester (available from apt) to confirm it is stable. Then bump to 550 and test again. Then try 600. If you get failures back off the frequency a bit.

@luiscgalo
Author

Ok,
I'll try to fine tune the overclock settings to check if it helps.
And yes, when running my application the CPU is idle almost all the time (less than 4% usage, looking at "htop" output).
The majority of the work here is handled by the GPU (color conversion, deinterlacing and H264 encoding)...

@luiscgalo
Author

luiscgalo commented Sep 6, 2018

I ran some more tests yesterday, and my prototype "recorder" is now almost working.
The movements are now more fluid and at the expected 50fps.
The latest version of my prototype code is available at https://github.com/luiscgalo/rpi-video-recorder

I think that I'm almost there and it is just a question of fine tuning the things on my source code...
I'm facing two final issues:

  1. The movements in the produced video are now smooth and at the expected frame rate (50fps). However, I'm seeing a strange problem where the image always has an up-down movement, toggling one pixel up and down in each produced frame. It is quite similar to what is seen with a simple BOB algorithm. I don't know exactly where the problem is, but I guess it could be related to incorrect top/bottom field order and/or a defect in my "merge" mechanism, where I create a full frame from the individual top/bottom fields. I have to investigate what's happening here.

  2. My second issue is related to the performance of the pipeline... Basically, I'm experiencing situations where the ISP input rejects data (no buffers available), dropping some frames. The obvious conclusion is that the RPi is not achieving the performance needed to process all the data.
    During my tests, I saw a single frame drop for every 10 to 15 processed ones.
    I'll have to check what I can improve here in order to achieve realtime processing of the video feed.
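One generic mitigation for "no buffers available" drops is to give the starved port more buffer headroom than its recommended minimum, so a temporarily slow downstream stage doesn't immediately stall the upstream one. A sketch; the +2 headroom is an arbitrary illustrative choice, not a tuned value:

```c
#include "interface/mmal/mmal.h"

/* Sketch: size a port's buffer pool above the recommended minimum.
 * Apply before enabling the port and creating its pool. */
static void add_buffer_headroom(MMAL_PORT_T *port)
{
   port->buffer_num = port->buffer_num_recommended + 2;  /* +2 is arbitrary */
   if (port->buffer_num < port->buffer_num_min)
      port->buffer_num = port->buffer_num_min;
   port->buffer_size = port->buffer_size_recommended;
}
```

The trade-off is extra GPU memory per buffer (about 3MB per 1080p I420 frame), which matters on a system already stressing SDRAM.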

If you are interested, you can check out the latest version of my prototype code to see if you face the same issues. Maybe something will come to your expert minds that could help me improve the performance of the application...
Excluding that, the application is working perfectly. :)
Thanks again in advance for all your tips and support.

@JamesH65
Contributor

JamesH65 commented Jan 9, 2019

@luiscgalo can this now be closed?

@luiscgalo
Author

Hi JamesH65,
Yes, I think we can consider this issue closed.

Summarising the results for this topic:
From my experiments, and in my particular situation (with the H264 encoder running), I'm able to achieve the following results with the image_fx component acting as a deinterlacer:
- 1080i50 to 1080p25 -> OK
- 1080i50 to 1080p50 -> NOK (not enough processing power)
- 720i50 to 720p25 -> OK
- 720i50 to 720p50 -> OK (requires overclock)
