Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 3D MSAA and scaling support to GLES3 #83976

Merged
merged 1 commit into from Dec 7, 2023

Conversation

BastiaanOlij
Copy link
Contributor

@BastiaanOlij BastiaanOlij commented Oct 26, 2023

This PR adds MSAA and scaling support to the compatibility renderer.

Implemented in this PR are:

  • 3D MSAA
  • Bilinear upscaling
  • Making back buffers work when upscaling is used (may not work in stereo)

Moved to followup PR:

  • 2D MSAA

I did a big rewrite of this so GL_EXT_multisampled_render_to_texture and GL_OVR_multiview_multisampled_render_to_texture are used if supported. These are OpenGLES extensions so only available on Android. The advantage of using these techniques is that MSAA data is never written out of tile memory but resolved when written out to texture.
The added advantage for VR is that this can be combined with foveated rendering.

On other devices separate MSAA buffers are being used, just like in Godot 3. This can not be combined with foveated rendering (just doesn't have a performance improvement).

Stupid little test project ;)
MSAA test.zip

@BastiaanOlij
Copy link
Contributor Author

cc @Calinou , you might enjoy testing this :)

doc/classes/ProjectSettings.xml Outdated Show resolved Hide resolved
doc/classes/ProjectSettings.xml Outdated Show resolved Hide resolved
modules/openxr/doc_classes/OpenXRInterface.xml Outdated Show resolved Hide resolved
modules/openxr/doc_classes/OpenXRInterface.xml Outdated Show resolved Hide resolved
Copy link
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, it works as expected. Downsampling is also possible with a resolution scale set above 1.0, and it can be combined with MSAA.

Performance

Tested on https://github.com/godotengine/godot-demo-projects/tree/master/3d/truck_town in 3840×2160 fullscreen.
Largely academic because a RTX 4090 is hardly kept busy when using the Compatibility rendering method, but I figure I'd include these nonetheless. Expect the MSAA impact to be greater on slower GPUs.

Similar test on integrated graphics: #83976 (comment)

MSAA

No MSAA 2× MSAA 4× MSAA 8× MSAA
1274 FPS (0.78 mspf) 1246 FPS (0.80 mspf) 1245 FPS (0.80 mspf) 1241 FPS (0.81 mspf)

Resolution scaling

0.25 0.99 1.0 1.5 2.0 2.0 + 8× MSAA
1240 FPS (0.80 mspf) 1244 FPS (0.80 mspf) 1274 FPS (0.78 mspf) 988 FPS (1.01 mspf) 654 FPS (1.52 mspf) 336 FPS (2.97 mspf)

654 FPS in an internal resolution of 8K is nothing to scoff at, if you ask me. 🙂

Performance is slightly lower when scaling kicks in at sub-native resolutions, but this is likely because the cost of additional copies/framebuffers outweighs the cost of actually rendering the scene on such a fast GPU.

Issues encountered

I noticed the FSR2 error is printed 4 times when you set the scaling 3D mode to FSR2. This doesn't occur with FSR1. Also, this should be a warning rather than an error for consistency with the FSR1 warning:

image

@Calinou
Copy link
Member

Calinou commented Oct 26, 2023

Tests done on https://github.com/godotengine/godot-demo-projects/tree/master/3d/truck_town, in 2560×1600 fullscreen on i9-13900H's Intel Iris Xe (DDR5-5600 C40 dual channel). Laptop on battery in power-saving power profile – this is a worst-case scenario, but I made sure it was consistent across all tests.

MSAA

Forward+

No MSAA 2× MSAA 4× MSAA 8× MSAA
61 FPS 39 FPS 31 FPS 22 FPS

Forward Mobile

No MSAA 2× MSAA 4× MSAA 8× MSAA
82 FPS 51 FPS 41 FPS 29 FPS

Compatibility

No MSAA 2× MSAA 4× MSAA 8× MSAA
122 FPS 76 FPS 58 FPS 44 FPS

Resolution scaling

0.5 (= 1280×800) 0.99 (= 2534×1584) 1.0 (= 2560×1600) 2.0 (= 5120×3200)
124 FPS 61 FPS 61 FPS 21 FPS

Forward Mobile

0.5 0.99 1.0 2.0
141 FPS 82 FPS 82 FPS 30 FPS

Compatibility

0.5 0.99 1.0 2.0
152 FPS 105 FPS 122 FPS 42 FPS

It seems the Compatibility renderer's scaling method is much more expensive, despite bilinear scaling being used in all tests here (also in Forward+/Mobile). This is also something you could notice on the RTX 4090 tests above (albeit it's much less significant there). Notice how 0.99 is much slower than 1.0 despite the pixel count being nearly identical (2534×1584 instead of 2560×1600).

@BastiaanOlij
Copy link
Contributor Author

@Calinou great testing, thanks!

The FSR2 error was already there, indeed agree we should make this a warning and ensure it's outputted once.

Note when looking at scaling, if you set scaling to 1.0 and method to bilinear, scaling is disabled (this was actually broken when FSR2 was introduced), so the difference between 0.9 and 1.0 is that with 1.0 scaling is skipped, while 0.9 adds an extra blit pass. I suspect the difference in performance will be more extreme on TBDR.

As I mentioned when fixing things in the Vulkan renderer a few days ago, I believe that we should actually have a proper "OFF" mode for scaling, instead of relying on scaling being set to 1.0, but that would be a breaking change due to us wanting OFF to be mode 0 instead of Bilinear being mode 0.

@BastiaanOlij
Copy link
Contributor Author

BastiaanOlij commented Oct 27, 2023

Ok, so turns out MSAA is different on OpenGLES than it is on OpenGL. Working on resolving that...

edit did a first pass of this but haven't tested yet, so may not work.

@BastiaanOlij
Copy link
Contributor Author

Ok, after talking to Clay I'm going to change things around to using glRenderbufferStorageMultisample, which should work on both Android and Desktop

@BastiaanOlij BastiaanOlij self-assigned this Oct 30, 2023
@BastiaanOlij
Copy link
Contributor Author

BastiaanOlij commented Nov 5, 2023

@clayjohn

So.... diving more into MSAA I've run into a snag.
From what I've found there are four extensions that implement MSAA differently that we're dealing with.

  • GL_EXT_framebuffer_multisample is the extension we use in Godot 3. Here you create multisample render buffers and render into them, then resolve them to normal textures in a separate step.
  • GL_EXT_multiview_texture_multisample is an extension that allows multisample rendering in multiview, but unlike the above we don't use render buffers but allow creating multisample layered texture objects. The problem is that this is an OpenGL 4 feature.
  • GL_EXT_multisampled_render_to_texture is an extension we also check, but which actually works very differently, you create normal textures, but create a special type of framebuffer. This is really ended to TBDR, basically the tile becomes a multisampled tile, but it is resolved as it is written to texture. Kinda what we do with subpasses in Vulkan I guess.
  • GL_OVR_multiview_multisampled_render_to_texture is an extension that builds on the previous one and adds multiview support to this. As far as I can tell it's the only approach that supports MSAA and multiview

Now these last two extensions actually make a lot of sense on mobile hardware, especially considering the warning ARM gave that on mobile GPUs you should never write out MSAA data to texture, it kills all your bandwidth. This is also something that Meta goes into some detail in in their materials (I think Meta took the lead on defining GL_OVR_multiview_multisampled_render_to_texture).

So it really would make sense to base our MSAA approach on these later two extensions with maybe a fallback to the first if the others are not supported (that would only apply to normal older devices anyway) and thus disabling MSAA in multiview unless GL_OVR_multiview_multisampled_render_to_texture is supported.

I have some ideas for this and will try them out, I'm writing this mostly to get my thoughts down, but it fits well with our single buffer approach. Our biggest issue is the split we have in render buffers (2D) and scene buffers (3D), with configure being called to setup 3D rendering. I think I can get away with that though..

The idea I have is to base rendering purely on these two extensions, with maybe a fallback to using the first but just not making that available if multiview is used.

@dsnopek
Copy link
Contributor

dsnopek commented Nov 5, 2023

I haven't ever implemented this, but if you use the Oculus multiview extension for WebXR (as opposed to the standard one), you can implement MSAA in a way that appears to work similar to the latter two extension that you link. You setup the FBO with framebufferTextureMultisampleMultiviewOVR() and I think the resolve is all handled automatically by the driver. So, if we supported those two extension for MSAA on desktop OpenGL, I think it'd make it fairly simple to add support for MSAA on WebGL/WebXR as well, and so I'm all for it. :-)

@BastiaanOlij
Copy link
Contributor Author

@dsnopek I think it makes a lot of sense, I'll be busy for a few days yet but I think I have a pretty good picture of making it work.

@BastiaanOlij
Copy link
Contributor Author

@clayjohn @dsnopek so it turns out that the render_to_texture extensions are GLES only (so Android only), that leaves an issue with multisample on multiview. There is an extension for that as well:

GL_EXT_multiview_texture_multisample

This extension is however an OpenGL 4 extension that removes a limitation on glFramebufferTextureMultiviewOVR so multisample multilayer texture objects can be used.

Funky how all these work differently.

I'm rewriting stuff so we use the correct extension depending on whether we're on mobile hardware or not...

@BastiaanOlij
Copy link
Contributor Author

Ok, I think I have most of 3D MSAA working now, I'm still having an issue when combining MSAA and upscaling on mobile hardware using the render_to_texture approach, but on desktop using old fashioned MSAA works.

Also still hunting down a few stray errors and still need to test multiview.

Once thats all confirmed to be working I'll implement 2D MSAA which I think will mostly be copy/paste.

@BastiaanOlij BastiaanOlij force-pushed the gles3_msaa_scaler branch 2 times, most recently from 6791410 to 638a85e Compare November 8, 2023 02:31
@dsnopek
Copy link
Contributor

dsnopek commented Nov 9, 2023

I just posted draft PR #84686 which adds support for MSAA with multiview on the web (via WebXR). It depends on the changes in this PR, and the diff that I put in my previous comment.

@BastiaanOlij
Copy link
Contributor Author

@dsnopek your changes are merged in so you can rebase your PR, thanks!

I've also removed all the 2D MSAA stuff and moving that into a separate PR, so that we can move to merging this once 4.2 is released

@BastiaanOlij BastiaanOlij changed the title Add MSAA and scaling support to GLES3 Add 3D MSAA and scaling support to GLES3 Nov 10, 2023
@MrZak-dev
Copy link

@dsnopek your changes are merged in so you can rebase your PR, thanks!

I've also removed all the 2D MSAA stuff and moving that into a separate PR, so that we can move to merging this once 4.2 is released

Thank you guys for the great progress. With @dsnopek changes, is it ready to test some MSAA for 3D for Web Export ?

@dsnopek
Copy link
Contributor

dsnopek commented Nov 10, 2023

@MrZak-dev:

Thank you guys for the great progress. With @dsnopek changes, is it ready to test some MSAA for 3D for Web Export ?

Yep! It worked in my testing, but more testing would certainly be welcome :-)

@MrZak-dev
Copy link

@BastiaanOlij @dsnopek
I did some testing for web export and the MSAA doesn't look like its working for my scenario .

image

@dsnopek
Copy link
Contributor

dsnopek commented Nov 10, 2023

@MrZak-dev Ah, I think I know the solution to this! The key error in your screenshot is "Samples must not be greater than the maximum supported value for the format". In my WebXR PR, I added some code to limit the samples to the reported maximum, which should probably be brought over into this PR. Until then, you could try setting a lower MSAA setting, maybe 2x, and see if that works?

@BastiaanOlij See 63dd028#diff-f6e1f37382eb13c287a1e289ce2eb6f83c2301827e2c2b94cc8ad2ac2ba47571R244 - I can make a diff of just that later

@MrZak-dev
Copy link

@dsnopek Yes it did work on 2x and 4x .

@dsnopek
Copy link
Contributor

dsnopek commented Nov 10, 2023

@MrZak-dev:

@dsnopek Yes it did work on 2x and 4x .

Great, thanks!

@BastiaanOlij Merge this diff into this PR to fix the problem on the web that @MrZak-dev encountered:

diff --git a/drivers/gles3/storage/render_scene_buffers_gles3.cpp b/drivers/gles3/storage/render_scene_buffers_gles3.cpp
index c096f0ec6e..65f7eaed1f 100644
--- a/drivers/gles3/storage/render_scene_buffers_gles3.cpp
+++ b/drivers/gles3/storage/render_scene_buffers_gles3.cpp
@@ -215,6 +215,12 @@ void RenderSceneBuffersGLES3::configure(const RenderSceneBuffersConfiguration *p
                const GLsizei samples[] = { 1, 2, 4, 8 };
                msaa3d.samples = samples[msaa3d.mode];
 
+               GLint max_samples;
+               glGetIntegerv(GL_MAX_SAMPLES, &max_samples);
+               if (msaa3d.samples > max_samples) {
+                       msaa3d.samples = max_samples;
+               }
+
                if (!use_multiview && !config->rt_msaa_supported) {
                        // Render to texture extensions not supported? fall back to MSAA framebuffer through GL_EXT_framebuffer_multisample.
                        // Note, if 2D MSAA matches 3D MSAA and we're not scaling, it would be ideal if we reuse our 2D MSAA buffer here.

@BastiaanOlij
Copy link
Contributor Author

@dsnopek I did it slightly differently so we retrieve this once in GLES3::Config and then re-use the value, we'll later need to embed this for 2D MSAA as well.

Let met know if it works fine for you guys.

@dsnopek
Copy link
Contributor

dsnopek commented Nov 11, 2023

@BastiaanOlij The code looks good, and works in my testing!

@rafa985
Copy link

rafa985 commented Nov 13, 2023

will we see this implemented on the next beta update for Godot ?

@Calinou
Copy link
Member

Calinou commented Nov 13, 2023

will we see this implemented on the next beta update for Godot ?

4.2 is in feature freeze, so any new features will be for 4.3 at the earliest.

@BastiaanOlij
Copy link
Contributor Author

will we see this implemented on the next beta update for Godot ?

4.2 is in feature freeze, so any new features will be for 4.3 at the earliest.

Indeed, and I think these changes are too substantial to cherry pick for a 4.2.1 release.
Shame because it's a hell of an improvement :)

Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally and this works great. Left a suggestion to fix a harmless OpenGL error.

Code seems fine to me.

I can't test all the edge cases with the multisample extensions. So I have to trust that you tested them all sufficiently.

drivers/gles3/storage/render_scene_buffers_gles3.cpp Outdated Show resolved Hide resolved
Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now!

@akien-mga akien-mga merged commit a311a4b into godotengine:master Dec 7, 2023
15 checks passed
@akien-mga
Copy link
Member

Thanks!

@dsnopek
Copy link
Contributor

dsnopek commented Dec 7, 2023

Huzzah! 🚀 Now that this one is merged, the WebXR/multiview MSAA PR is ready for review - see #84686

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants