Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve glBufferSubData usage where safe #33527

Merged

Conversation

@clayjohn
Copy link
Contributor

clayjohn commented Nov 10, 2019

Fixes: #23956

Special thanks to @oeleo1 for testing out and identifying the best candidates.

Tagged as ios, but this may benefit other platforms as well as certain drivers handle glBufferData better than glBufferSubData.

There may be other places where this optimization may be appropriate, however, these are the places @oeleo1 identified as being safest (minus a couple that I removed). For more information about the rationale see this excellent comment #23956 (comment)

This PR now also has the changes from #23956 (comment). This uses a glBufferData() orphaning trick to vastly speed up the usage of glBufferSubData on mobile and webgl

@clayjohn clayjohn added this to the 3.2 milestone Nov 10, 2019
@clayjohn clayjohn requested a review from reduz as a code owner Nov 10, 2019
@clayjohn clayjohn force-pushed the clayjohn:GLES2-bufferdata_optimization branch from 842134a to 1253a33 Nov 12, 2019
@clayjohn clayjohn changed the title Use BufferData instead of BufferSubData where safe Improve glBufferSubData usage where safe Nov 12, 2019
@@ -3099,7 +3099,7 @@ void RasterizerStorageGLES3::_update_material(Material *material) {
}

glBindBuffer(GL_UNIFORM_BUFFER, material->ubo_id);
glBufferSubData(GL_UNIFORM_BUFFER, 0, material->ubo_size, local_ubo);
glBufferData(GL_UNIFORM_BUFFER, material->ubo_size, local_ubo, GL_STATIC_DRAW);

This comment has been minimized.

Copy link
@akien-mga

akien-mga Nov 12, 2019

Member

I noticed that this is the only call using GL_STATIC_DRAW instead of GL_DYNAMIC_DRAW, is this on purpose or an oversight?

This comment has been minimized.

Copy link
@oeleo1

oeleo1 Nov 12, 2019

On purpose. The buffer was allocated a few lines above with GL_STATIC_DRAW. We match the access type at the time of creation. This is only a hint anyway.

@reduz

This comment has been minimized.

Copy link
Member

reduz commented Nov 19, 2019

This is a strange optimization, but if it works faster I guess its fine. Does it make sense only on GL or would desktop also take advantage of it too?

@clayjohn

This comment has been minimized.

Copy link
Contributor Author

clayjohn commented Nov 19, 2019

@reduz the benefit only applies to iOS and some Android devices. On desktop there is no known benefit.

I prefer to keep it in the ifdefs because of the risk of this interacting weird with bad drivers.

@akien-mga

This comment has been minimized.

Copy link
Member

akien-mga commented Nov 19, 2019

Let's give it a spin in 3.2 beta 2 today and see whether users report any regression on GLES/WebGL.

@akien-mga akien-mga merged commit 6536105 into godotengine:master Nov 19, 2019
2 checks passed
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@akien-mga

This comment has been minimized.

Copy link
Member

akien-mga commented Nov 19, 2019

Thanks!

@oeleo1

This comment has been minimized.

Copy link

oeleo1 commented Nov 19, 2019

Great. Thanks to you -- this makes a huge difference on mobile. On desktop, the improvement is barely noticeable so I share @clayjohn conservatism - I had to VSync off to go beyong the 60fps mark and I get around 900-950 fps in both cases (Win 10, OpenGL ES 2.0 Renderer: Quadro K2100M/PCIe/SSE2).

@lawnjelly

This comment has been minimized.

Copy link
Contributor

lawnjelly commented Nov 25, 2019

I may be a couple of decades out of date on this, but if you are getting pipeline stalls as a result of trying to write to data that is still being used, an alternative is to create 3 buffers and use them in round robin fashion.

That said, if you mark a buffer as dynamic you would assume it was doing this behind the scenes, but maybe not on some drivers.

https://www.bfilipek.com/2015/01/persistent-mapped-buffers-in-opengl.html

@oeleo1

This comment has been minimized.

Copy link

oeleo1 commented Nov 25, 2019

@lawnjelly You are absolutely right. Good pointer - thanks. I think we got 3 problems here in decreasing priority:

  • Problem 1: Godot's status quo is that it uses a single big fixed max size poly buffer preallocated at startup. So it writes on this buffer over and over again at every poly draw request. That mildly sucks and we addressed that problem with the least obtrusive code changes (code readability counts, indeed).
  • Problem 2: Round-robin is cool but requires 3 max size buffers with the status quo.
  • Problem 3: No "optimal" size buffer management, optimal being subjective (needed size, power of 2, a mix of that, or something else).
    I guess great minds will join forces to fix all that in 4.0 and beyond by rewriting the whole buffer allocation stuff...
@lawnjelly

This comment has been minimized.

Copy link
Contributor

lawnjelly commented Nov 25, 2019

It sounds as though glBufferData is avoiding the stall because it is effectively saying 'I don't care about the old contents of the buffer, allocate me a completely new buffer'. So internally it is likely using the same amount of memory internally as if you explicitly allocated these yourself, so it may not be 'saving memory' versus 2 / 3 buffers as much as expected.

Ok, you in theory you can save a bit by only allocating what is needed, but on the other hand this is traded off against the risk of potentially thrashing memory around to achieve this variable size allocation (or an allocation failure worse case!). So deciding on a maximum (or at least only changing it infrequently) would probably be my preference, as it is constant very small time. This is the usual divergence between general app memory allocation schemes and game allocation schemes.

The optimal size to use sounds doable, via the methods you suggest.

@oeleo1

This comment has been minimized.

Copy link

oeleo1 commented Nov 25, 2019

Agreed again. We are definitely not 'saving memory' by replacing glBufferSubData with glBufferData -- quite the contrary. And allocating on-demand size is indeed suboptimal versus fixed cap size. The focus here is to find the trade-off between spending more "reasonably" to avoid the stalls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.