Fast-Path performance issue for YUV! Need "packed" GL_RGBA instead of GL_LUMINANCE #427

Closed
tschesnok opened this issue Sep 29, 2017 · 5 comments

Comments

@tschesnok

Pardon me if this is the wrong place, since this is not a bug but rather an important performance suggestion. Other sites seem to have only very old discussions, so I'll try here:

The problem with EGL_IMAGE_BRCM_MULTIMEDIA_Y is that the fast-path (direct GPU access to the GL texture) binds it to a GL_LUMINANCE texture. GL_LUMINANCE, however, is not very useful from a performance perspective. It would be much more useful to get the YUV component channels delivered as a GL_RGBA texture at 1/4 of the x dimension (i.e. 4 luminance values packed into a single RGBA pixel). That way I can load 4 values in a shader with a single texture read, which means 4x fewer reads. (On the Pi, a GL_LUMINANCE texture read is no cheaper than a GL_RGBA texture read.)

Put another way: I get about the same performance by copying the YUV buffer to CPU memory and packing it into a GL_RGBA texture that is 1/4 the width for further processing. Imagine running a Sobel filter in a shader with 1/4 of the pixel reads.
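To make the CPU-side packing concrete, here is a minimal sketch in C. It assumes an 8-bit, row-major Y plane whose width is a multiple of 4; under that assumption four consecutive luminance bytes already have the same memory layout as one RGBA texel, so packing is just a row-by-row copy that drops any row padding. The names `pack_luma_rows`, `packed_luma_at` and `stride` are hypothetical and do not come from any Pi API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy an 8-bit Y plane (row pitch `stride` bytes)
 * into a tightly packed buffer that can be treated as RGBA texels of
 * size (width / 4) x height.
 * Luminance sample x of a row lands in texel x / 4, channel x % 4
 * (0 = R, 1 = G, 2 = B, 3 = A). */
void pack_luma_rows(const uint8_t *y_plane, size_t stride,
                    size_t width, size_t height, uint8_t *rgba_out)
{
    assert(width % 4 == 0);   /* each texel holds exactly 4 samples */
    assert(stride >= width);  /* rows may carry trailing padding    */
    for (size_t row = 0; row < height; ++row) {
        memcpy(rgba_out + row * width, y_plane + row * stride, width);
    }
}

/* Look up luminance sample (x, y) in the packed buffer. */
uint8_t packed_luma_at(const uint8_t *rgba, size_t width,
                       size_t x, size_t y)
{
    size_t texel   = y * (width / 4) + x / 4;  /* RGBA texel index  */
    size_t channel = x % 4;                    /* R, G, B or A      */
    return rgba[texel * 4 + channel];
}
```

The packed buffer could then be uploaded with something like `glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width / 4, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, rgba_out)`, so a single `texture2D` read in the shader returns four neighbouring luminance values.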

So the fast-path seems to offer no benefit for YUV, since you have to make a copy anyway to pack the values into an RGBA texture for better performance down the line.

Or perhaps I'm a moron and don't know how to trick ES 2.0 into using a GL_LUMINANCE texture as a GL_RGBA texture? I would be super thankful if someone had ideas for a workaround on this...

In any case I would think that this is an easy feature to implement and it will increase FPS - especially on a Sobel type of filter.

And feel free to yell at me if this is not the right place for this feature request. I'll remove it.

@popcornmix
Contributor

popcornmix commented Sep 29, 2017

We are moving from the firmware-side graphics driver to the ARM-side graphics driver.
It might be interesting to test the performance of the ARM-side driver and see which features it supports.
This is enabled through dtoverlay=vc4-fkms-v3d (or using the menu option in raspi-config).
As such, we are very unlikely to add new features to the firmware driver, but it may be possible with the ARM-side driver.

@tschesnok
Author

Wow. I had no idea. Where do I go for information on this? All I can find says that once it is turned on, everything 3D breaks. I'm using GLES 2.0. What is the migration process for developers? What changes are coming for MMAL and camera access? (Sorry, I know you are all busy; any help is much appreciated and not required.)

@popcornmix
Contributor

The ARM-side driver is a standard Mesa driver. Any standard Linux app that uses OpenGL or OpenGL ES should work without specific Pi porting (e.g. you can apt-get install neverball and the standard Debian package will work).
So, in general, you don't want the Pi-specific adaptations (the dispmanx calls), and you should link with /usr/lib/arm-linux-gnueabihf/libEGL.so and /usr/lib/arm-linux-gnueabihf/libGLESv2.so rather than the firmware versions in /opt/vc/lib.

The initial announcement was long ago. Searching for vc4-kms-v3d (and vc4-fkms-v3d which is probably a better choice for now) should get lots of info.

Currently integration with mmal/camera is in progress but not released.

@lagurus

lagurus commented Mar 6, 2018

You can also "pack" it yourself in a "compress" shader and then call glReadPixels with a width of (width / 4).

The shader should look like this:


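precision mediump float;   // ES 2.0 fragment shaders have no default float precision

varying vec2 tcoord;       // not used in this shader
uniform sampler2D tex;     // source GL_LUMINANCE texture
uniform vec2 fTexelSize;   // assumed to be (1.0 / source width, 1.0 / source height)

void main(void)
{
    // Centre of the first of the four source texels covered by this output pixel.
    vec2 coord = vec2((4.0 * floor(gl_FragCoord.x) + 0.5) * fTexelSize.x,
                      (floor(gl_FragCoord.y) + 0.5) * fTexelSize.y);

    // Four horizontally adjacent luminance samples.
    float pt_x    = texture2D(tex, coord).r;
    float pt_xx   = texture2D(tex, coord + vec2(fTexelSize.x, 0.0)).r;
    float pt_xxx  = texture2D(tex, coord + vec2(2.0 * fTexelSize.x, 0.0)).r;
    float pt_xxxx = texture2D(tex, coord + vec2(3.0 * fTexelSize.x, 0.0)).r;

    // Pack them into one RGBA output pixel.
    gl_FragColor = vec4(pt_x, pt_xx, pt_xxx, pt_xxxx);
}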
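
As a sanity check on the coordinate math in that shader, here is a small CPU-side model in C. It is only a sketch: it assumes `fTexelSize.x` is 1.0 / source width and that the source texture uses nearest filtering, and the function names `compress_sample_s` and `compress_source_column` are hypothetical.

```c
#include <assert.h>

/* Hypothetical CPU model of the "compress" shader's x coordinate math.
 * frag_x is gl_FragCoord.x (pixel centre, i.e. output column + 0.5),
 * src_width is the width of the luminance texture, and channel is
 * 0..3 for R, G, B, A. Returns the normalised s coordinate sampled. */
double compress_sample_s(double frag_x, int src_width, int channel)
{
    assert(channel >= 0 && channel < 4);
    double texel_size_x = 1.0 / (double)src_width;   /* fTexelSize.x */
    /* Truncation equals floor() for non-negative coordinates. */
    double out_column = (double)(int)frag_x;
    double base = (4.0 * out_column + 0.5) * texel_size_x;
    return base + (double)channel * texel_size_x;
}

/* Source column hit by that coordinate under nearest filtering. */
int compress_source_column(double frag_x, int src_width, int channel)
{
    double s = compress_sample_s(frag_x, src_width, channel);
    return (int)(s * (double)src_width);
}
```

Under those assumptions, output column k reads source columns 4k to 4k + 3, each at its texel centre, so rendering into a (width / 4) x height target and reading it back with `glReadPixels(0, 0, width / 4, height, GL_RGBA, GL_UNSIGNED_BYTE, buf)` should return the luminance bytes in their original order.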

JamesH65 added the "Close within 30 days" label (issue will be closed within 30 days unless requested to stay open) Jul 2, 2018
@JamesH65
Collaborator

Closing due to lack of activity. Please request to be reopened if you feel this issue is still relevant.
