Fast-Path performance issue for YUV! Need "packed" GL_RGBA instead of GL_LUMINANCE #427

tschesnok · 2017-09-29T01:57:54Z

Pardon me if this is the wrong place since this is not a bug but rather an important performance suggestion. All other sites seem to have very old discussion on them so I'll try here:

The problem with EGL_IMAGE_BRCM_MULTIMEDIA_Y is that the fast path (direct GPU access to the GL texture) links it to a GL_LUMINANCE texture. GL_LUMINANCE, however, is not all that useful from a performance perspective. It would be much more useful to get the YUV component channels delivered as a GL_RGBA texture at 1/4 the x dimension (i.e. pack 4 luminance values into a single RGBA pixel). That way I can load 4 values in a shader with a single texture read call. That is 4x fewer texture reads! (On the Pi, a GL_LUMINANCE read is no cheaper than a GL_RGBA read on a per-texture-read basis.)

Put another way: I get about the same performance by saving the YUV buffer to CPU memory and packing it into a GL_RGBA texture that is 1/4 the width for further processing. Imagine running a Sobel filter with 1/4 of the pixel reads required in a shader.
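To make the packing idea concrete, here is a minimal sketch in C of the CPU-side variant described above. It is not taken from the firmware or any Pi library: the function names `pack_luma_as_rgba` and `packed_luma_at`, and the `stride` parameter, are hypothetical and only illustrate the layout. The assumption is that the Y plane is 8 bits per sample and that its width is a multiple of 4, so four consecutive luminance bytes map directly onto the R, G, B and A bytes of one texel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a luminance (Y) plane into a tightly packed buffer that can be
 * uploaded as a GL_RGBA / GL_UNSIGNED_BYTE texture of (width / 4) x height.
 * Texel (px, py) then holds Y[4*px .. 4*px+3] of row py in R, G, B, A.
 * `stride` is the distance in bytes between source rows (hypothetical
 * parameter; it allows for row padding in the source buffer). */
static void pack_luma_as_rgba(const uint8_t *y_plane, int width, int height,
                              int stride, uint8_t *rgba_out)
{
    assert(width % 4 == 0);   /* four luminance samples per RGBA texel */
    assert(stride >= width);
    for (int row = 0; row < height; ++row) {
        memcpy(rgba_out + (size_t)row * (size_t)width,
               y_plane + (size_t)row * (size_t)stride,
               (size_t)width);
    }
}

/* Read one luminance sample back out of the packed buffer, mirroring what
 * a shader does when it picks a channel of the fetched texel. */
static uint8_t packed_luma_at(const uint8_t *rgba, int width, int x, int y)
{
    int texel = x / 4;     /* which RGBA texel in the row */
    int channel = x % 4;   /* 0 = R, 1 = G, 2 = B, 3 = A */
    return rgba[((size_t)y * (size_t)(width / 4) + (size_t)texel) * 4
                + (size_t)channel];
}
```

The packed buffer would then be uploaded with something like `glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width / 4, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, rgba_out)`. Note that the packing itself is just a row copy; the cost being complained about is the round trip through CPU memory, not the rearrangement.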

So the fast path seems to offer no benefit for YUV, since you have to make a copy anyway to pack the values into an RGBA texture for better performance down the line.

Or perhaps I'm a moron and don't know how to trick ES 2.0 into using a GL_LUMINANCE texture as a GL_RGBA texture? I would be super thankful if someone had some ideas for a workaround on this...

In any case I would think that this is an easy feature to implement and it will increase FPS - especially on a Sobel type of filter.

And feel free to yell at me if this is not the right place for this feature request. I'll remove it.

popcornmix · 2017-09-29T08:58:12Z

We are moving from using the firmware side graphics driver to the arm side graphics driver.
It might be interesting to test the performance of the ARM side driver and what features are supported.
This is enabled through dtoverlay=vc4-fkms-v3d (or using the menu option in raspi-config).
As such we are very unlikely to add new features to the firmware driver, but it may be possible with the arm side driver.

tschesnok · 2017-09-29T11:12:02Z

Wow. I had no idea. Where do I go for information on this? All I can read is that once it is turned on, everything 3D breaks. I'm using GLES 2.0. What is the migration process for developers? What changes are coming for MMAL and camera access? (Sorry, I know you are all busy; any help is much appreciated and not required.)

popcornmix · 2017-09-29T13:40:25Z

The ARM side driver is a standard Mesa driver. Any standard Linux app that uses OpenGL or OpenGL ES should work without specific Pi porting (e.g. you can apt-get install neverball and the standard Debian package will work).
So, in general you don't want the Pi-specific adaptations (the dispmanx calls), and you should link with /usr/lib/arm-linux-gnueabihf/libEGL.so and /usr/lib/arm-linux-gnueabihf/libGLESv2.so rather than the firmware versions in /opt/vc/lib.

The initial announcement was long ago. Searching for vc4-kms-v3d (and vc4-fkms-v3d which is probably a better choice for now) should get lots of info.

Currently integration with mmal/camera is in progress but not released.

lagurus · 2018-03-06T15:19:03Z

You can also "pack" it yourself in a "compress" shader and then call glReadPixels with a width of (width / 4).

The shader should look like this:


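// GLSL ES 1.00 fragment shaders have no default float precision, so declare one.
precision mediump float;

varying vec2 tcoord;       // unused in this shader
uniform sampler2D tex;     // source luminance texture
uniform vec2 fTexelSize;   // size of one source texel, presumably (1.0 / width, 1.0 / height)

void main(void)
{
    // Each output pixel covers four horizontally adjacent source texels;
    // coord points at the centre of the first of them.
    vec2 coord = vec2((4.0 * floor(gl_FragCoord.x) + 0.5) * fTexelSize.x,
                      (floor(gl_FragCoord.y) + 0.5) * fTexelSize.y);

    // Fetch the four neighbouring luminance values.
    float pt_x    = texture2D(tex, coord).r;
    float pt_xx   = texture2D(tex, coord + vec2(fTexelSize.x, 0.0)).r;
    float pt_xxx  = texture2D(tex, coord + vec2(2.0 * fTexelSize.x, 0.0)).r;
    float pt_xxxx = texture2D(tex, coord + vec2(3.0 * fTexelSize.x, 0.0)).r;

    // Pack them into the four channels of one RGBA pixel.
    gl_FragColor = vec4(pt_x, pt_xx, pt_xxx, pt_xxxx);
}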
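
As a way to check the coordinate arithmetic in the shader above, here is a rough CPU emulation in C of what one output fragment computes. It is only a sketch under stated assumptions: `fTexelSize` is taken to be (1 / width, 1 / height) of the source texture, the source is sampled with nearest filtering, and the render target is (width / 4) x height. The function names `sample_nearest` and `compress_fragment` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Nearest-neighbour lookup at normalized coordinates (u, v); a stand-in
 * for texture2D() on a texture using GL_NEAREST filtering. */
static uint8_t sample_nearest(const uint8_t *luma, int width, int height,
                              float u, float v)
{
    int x = (int)(u * (float)width);    /* u, v are non-negative here */
    int y = (int)(v * (float)height);
    if (x >= width)  x = width - 1;     /* clamp to the edge */
    if (y >= height) y = height - 1;
    return luma[y * width + x];
}

/* Emulate the "compress" shader for output pixel (fx, fy), where fx and fy
 * play the role of floor(gl_FragCoord.x) and floor(gl_FragCoord.y). */
static void compress_fragment(const uint8_t *luma, int width, int height,
                              int fx, int fy, uint8_t out[4])
{
    assert(width % 4 == 0);
    float texel_x = 1.0f / (float)width;    /* assumed fTexelSize.x */
    float texel_y = 1.0f / (float)height;   /* assumed fTexelSize.y */
    float u = (4.0f * (float)fx + 0.5f) * texel_x;
    float v = ((float)fy + 0.5f) * texel_y;
    for (int i = 0; i < 4; ++i) {
        /* out[0..3] correspond to gl_FragColor.r, .g, .b, .a */
        out[i] = sample_nearest(luma, width, height,
                                u + (float)i * texel_x, v);
    }
}
```

With these assumptions, output pixel (fx, fy) ends up holding source samples 4*fx through 4*fx+3 of row fy, so the readback would be something like `glReadPixels(0, 0, width / 4, height, GL_RGBA, GL_UNSIGNED_BYTE, buf)`, and `buf` then contains the luminance plane in its original byte order.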

JamesH65 · 2018-10-29T12:46:25Z

Closing due to lack of activity. Please request to be reopened if you feel this issue is still relevant.

JamesH65 added the "Close within 30 days" label (issue will be closed within 30 days unless requested to stay open) on Jul 2, 2018

JamesH65 closed this as completed Oct 29, 2018
