
Implement custom mip map generation & raise maximum texture atlas size to 4096x4096 #5508

Merged
merged 58 commits into from
Apr 17, 2023


@Tom94 (Collaborator) commented Nov 12, 2022

Undoes #2585

Larger texture atlases drastically reduce the number of draw calls by avoiding texture switching.

The previous limit of 1024x1024 is not even enough to hold the Latin font (as used in the font tests), let alone CJK. In osu! song select, the number of draw calls is reduced from ~900 to ~400.
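To illustrate why atlas size affects draw calls: in a batching renderer, consecutive quads that sample the same texture can share a draw call, but every texture switch forces a new one. The model below is a minimal sketch under that assumption (`DrawCallModel` and `CountDrawCalls` are hypothetical names, not osu-framework API). It counts draw calls for quads drawn in order, given the ID of the atlas texture each quad samples from.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model: quads sharing the currently bound texture are batched
// together; switching textures flushes the batch and costs a new draw call.
public static class DrawCallModel
{
    // textureIds[i] is the texture (e.g. atlas page) sampled by the i-th quad in draw order.
    public static int CountDrawCalls(IReadOnlyList<int> textureIds)
    {
        int drawCalls = 0;
        int? bound = null; // nothing bound before the first quad

        foreach (int id in textureIds)
        {
            if (bound != id)
            {
                // Texture switch: the previous batch is flushed and a new one starts.
                drawCalls++;
                bound = id;
            }
        }

        return drawCalls;
    }
}
```

For example, six glyphs alternating between two atlas pages (`[0, 1, 0, 1, 0, 1]`) cost six draw calls, while the same six glyphs on a single page cost one. A real renderer typically also breaks batches on other state changes and when a vertex batch fills up, which this model ignores.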

Large texture atlases were previously not used because their mipmap generation (then based on GL.GenerateMipmap) was slow, causing stutters. This PR introduces a custom mipmap generation pipeline that only touches the regions of the texture atlas that are being updated, thereby avoiding the stutters.
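One way to restrict regeneration to the updated region is to propagate the dirty rectangle down the mip chain. The sketch below assumes each mip level is a 2x2 box-filter downsample of the level above it; `MipRegion` and `DirtyRegions` are hypothetical names, and this is not necessarily how the PR computes its regions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type: the rectangle to regenerate at one mip level.
public readonly record struct MipRegion(int Level, int X, int Y, int Width, int Height);

public static class MipRegionSketch
{
    // Given a non-empty rectangle uploaded to level 0 of a texWidth x texHeight texture,
    // returns the rectangle that must be regenerated at every level >= 1.
    public static List<MipRegion> DirtyRegions(int texWidth, int texHeight, int x, int y, int width, int height)
    {
        var regions = new List<MipRegion>();

        // Half-open texel bounds [left, right) x [top, bottom) at the current level.
        int left = x, top = y, right = x + width, bottom = y + height;
        int levelWidth = texWidth, levelHeight = texHeight;

        for (int level = 1; levelWidth > 1 || levelHeight > 1; level++)
        {
            levelWidth = Math.Max(1, levelWidth / 2);
            levelHeight = Math.Max(1, levelHeight / 2);

            // Texel i at this level reads texels 2i and 2i+1 of the previous level,
            // so round the start down and the end up, clamping to the level size.
            left /= 2;
            top /= 2;
            right = Math.Min(levelWidth, (right + 1) / 2);
            bottom = Math.Min(levelHeight, (bottom + 1) / 2);

            regions.Add(new MipRegion(level, left, top, right - left, bottom - top));
        }

        return regions;
    }
}
```

For a 10x10 upload at (100, 100) into a 4096x4096 atlas, level 1 needs the 5x5 region at (50, 50), level 2 the 3x3 region at (25, 25), and so on down to the single texel of level 12. Each region could then be filled by a draw that samples the level above, so the cost scales with the upload size rather than the atlas size.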

@peppy (Member) commented Nov 12, 2022

The only apprehension I have here is our usage of glGenerateMipmap, which could potentially be much more expensive for every texture upload to an atlas as a result of this change. This could of course be optimised away by being smarter about mipmap generation (not regenerating the full atlas mipmap every upload), but would require further work.

GL.GenerateMipmap(TextureTarget.Texture2D);

Before merging, this should be tested on every platform we have access to, especially lower powered ones.

peppy self-requested a review November 12, 2022 06:57
@peppy (Member) commented Nov 12, 2022

Would be good to have a test scene to test this precise scenario (which uploads a new texture each test step).

@peppy (Member) commented Nov 15, 2022

With a focus on the spikes above average:

1024 with glGenerateMipmap:

[screenshot]

8192 with glGenerateMipmap:

[screenshot]

8192 with glGenerateMipmap but without Nicest hinting:

[screenshot]

8192 without glGenerateMipmap:

[screenshot]

Pretty safe to say there is overhead which we need to address, but a path forward is probably very simple.

I'd want to see similar testing done on an M1 GPU, as @smoogipoo has mentioned overheads with texture uploads.

Diff to disable mipmapping:

diff --git a/osu.Framework/Graphics/Textures/TextureAtlas_BackingAtlasTexture.cs b/osu.Framework/Graphics/Textures/TextureAtlas_BackingAtlasTexture.cs
index 6233a9950..abdd41544 100644
--- a/osu.Framework/Graphics/Textures/TextureAtlas_BackingAtlasTexture.cs
+++ b/osu.Framework/Graphics/Textures/TextureAtlas_BackingAtlasTexture.cs
@@ -32,7 +32,7 @@ private class BackingAtlasTexture : Texture
             private static readonly Rgba32 initialisation_colour = default;
 
             public BackingAtlasTexture(IRenderer renderer, int width, int height, bool manualMipmaps, TextureFilteringMode filteringMode = TextureFilteringMode.Linear, int padding = 0)
-                : this(renderer.CreateTexture(width, height, manualMipmaps, filteringMode, initialisationColour: initialisation_colour))
+                : this(renderer.CreateTexture(width, height, true, TextureFilteringMode.Linear, initialisationColour: initialisation_colour))
             {
                 this.padding = padding;
                 atlasBounds = new RectangleI(0, 0, Width, Height);

Testing can be done using #5517. Note that the screenshots above were taken with one upload per test step, while I've changed the test to upload every 50ms, so the spikes will be more frequent on that branch. This can be adjusted as required.

@peppy (Member) left a comment

Requires further consideration

@frenzibyte (Member) commented Nov 16, 2022

Since testing on an M1 GPU was requested, I gave this a whirl. The results are interesting:

1024 with glGenerateMipmap:

[screenshot]

8192 with glGenerateMipmap:

[screenshot]

8192 with glGenerateMipmap but without Nicest hinting:

[screenshot]

8192 without glGenerateMipmap:

[screenshot]

Tom94 changed the title from "Raise maximum texture atlas size to 8192x8192" to "Implement custom mip map generation & raise maximum texture atlas size to 8192x8192" on Nov 17, 2022
@peppy (Member) commented Nov 17, 2022

Performance-wise, the new changes bring this back in line with master. I have not yet gone over the code changes with a fine-tooth comb.

@Tom94 (Collaborator, Author) commented Nov 17, 2022

Should be good for review now -- I'm done making implementation changes.

@Morilli commented Nov 17, 2022

Is memory usage not a concern here?
Testing with osu!(lazer), memory allocation appears to be around 1GB higher at startup (2 8192x8192 texture atlases used), and about 1.5GB higher when starting to play a beatmap (4 texture atlases used) compared to without this change, which seems like a considerable amount.
The texture atlas allocations themselves seem to also create a big lagspike at the start of the first beatmap and when starting the game itself.

@peppy (Member) commented Nov 17, 2022

> Is memory usage not a concern here?
> Testing with osu!(lazer), memory allocation appears to be around 1GB higher at startup (2 8192x8192 texture atlases used), and about 1.5GB higher when starting to play a beatmap (4 texture atlases used) compared to without this change, which seems like a considerable amount.

If it's an issue, we can feasibly drop back down to 4096x4096. But going forward we would be moving towards using a single atlas game-wide. I am currently working on combining the main atlas with the font store one, which will reduce the startup overhead to just one atlas texture.

For gameplay, I'm not sure what is making the atlases but it will need further investigation. The eventual plan is to add support for evicting textures from atlases so we can potentially have one shared global atlas across the whole game.

> The texture atlas allocations themselves seem to also create a big lagspike at the start of the first beatmap and when starting the game itself.

Can you show frame graphs of this happening, and provide hardware specs?

@Morilli commented Nov 17, 2022

> going forward we would be moving towards using a single atlas game-wide

That sounds good and would probably solve that issue completely once implemented.

> Can you show frame graphs of this happening, and provide hardware specs?

My CPU is an Intel i5-7600K (4 cores, ~4GHz) and my GPU is an NVIDIA GTX 1060 (6GB). Here's a video showing the first map start:

[video: lazer.map.lag.textureatlas.mp4]

@Tom94 (Collaborator, Author) commented Nov 17, 2022

@Morilli thanks for testing! Could you pull the latest commit and see if the lag spike got better?

@Morilli commented Nov 17, 2022

Indeed, this is much better! The tiny lag remaining is basically unnoticeable now 👍

@peppy (Member) commented Nov 17, 2022

Also see #5521, which will reduce memory usage by some factor.

Gameplay requires a bit of further consideration, but on an initial check we can probably remove one of the two atlases there as well.

peppy self-requested a review November 18, 2022 02:07
@peppy (Member) commented Nov 18, 2022

From a memory overhead angle, we may want to consider bumping this up to 4096x4096 to start with (67MB per atlas texture vs. 268MB for 8192x8192), at least until we have atlas eviction support and a better algorithm to allow sharing a single atlas game-wide.
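For reference, the quoted sizes correspond to 4 bytes per texel (assuming an RGBA8 format) with no mip levels counted. A complete mip chain adds roughly another third, since each level has a quarter of the texels of the level above it:

```latex
\begin{aligned}
4096^2 \times 4\,\mathrm{B} &= 67{,}108{,}864\,\mathrm{B} \approx 67\,\mathrm{MB} \\
8192^2 \times 4\,\mathrm{B} &= 268{,}435{,}456\,\mathrm{B} \approx 268\,\mathrm{MB} \\
\textstyle\sum_{k=0}^{\infty} 4^{-k} &= \tfrac{4}{3}
  \;\Rightarrow\; \approx 89\,\mathrm{MB} \text{ and } \approx 358\,\mathrm{MB} \text{ with full mip chains}
\end{aligned}
```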

For example, right now during gameplay there are 4-5 TextureStores which each create their own atlas. Even if they don't consume anywhere near the full size of the atlas texture they will currently incur the above overhead at a GPU/driver level.

frenzibyte self-requested a review November 20, 2022 13:02
@frenzibyte (Member) left a comment

Flags look a bit darker than usual, and resizing the game down to 640x480 makes them even darker.

Normal size:

[screenshot]

640x480:

[screenshot]

(notice the colour of the flag in the leaderboard card)


That being said, I haven't delved into the details of the mipmap generation logic, but overall it looks pretty interesting, and I can see it eventually turning into its own class instantiated in Renderer and exposed to textures via a GenerateMipmap method that accepts a list of quads. But that can definitely be left for a follow-up.

Comment on lines 448 to 453
// Initialize texture to solid color
int frameBuffer = GL.GenFramebuffer();
GL.BindFramebuffer(FramebufferTarget.Framebuffer, frameBuffer);
GL.FramebufferTexture2D(FramebufferTarget.Framebuffer, FramebufferAttachment.ColorAttachment0, TextureTarget2d.Texture2D, TextureId, level);
Renderer.Clear(new ClearInfo(initialisationColour));
GL.DeleteFramebuffer(frameBuffer);
Member:

It sounds like scissoring and the blending mask (technically the colour write mask) should be reset here as well; otherwise I believe they will affect the initialisation.

Contributor:

Scissoring, depth, and colour write mask are reset internally by Renderer.Clear(). Am I missing something here? Blending is not reset (probably bad), and depth still performs a depth test (probably bad), but neither seems to have any effect on resolving the issue I pointed out in #5508 (comment).

Member:

Scissoring and depth test, sure, but colour write mask? I can't find any line that resets it.

Contributor:

You're right about that: the blend mask isn't set. It doesn't seem to have any effect, though (disabled FTB).

@peppy (Member) commented Dec 16, 2022

@Tom94 since you know this best, can you give the flag issue above a quick check and provide your thoughts? I hope to get to reviewing this in the next few days.

@frenzibyte (Member) commented

I revisited this PR in an effort to improve performance on mobile devices (iOS specifically), and have now spent a good few hours getting it to work fully on Veldrid. I've only tested this on Metal; checking other backends would be appreciated.

@frenzibyte (Member) commented

@smoogipoo I can't reproduce what you're seeing above, maybe because I'm testing on M1.

@EVAST9919

This comment was marked as resolved.

Review thread on osu.Framework/Resources/Shaders/sh_mipmap.vs (outdated, resolved)
@frenzibyte (Member) commented Apr 16, 2023

Turns out D3D11 doesn't like using the same texture as a sampling resource and a render target at the same time, so I've separated them for now. This makes me wonder whether we can replace all of this with a compute shader at some point, but I haven't looked into whether it can sample from and write to different mipmap levels of the texture (it may be feasible with read-write TextureView for the target mip level and a Sampler instance that reads the previous mip level).

@EVAST9919 would appreciate confirming whether it works on your end now.

@frenzibyte (Member) commented

@peppy we need a Veldrid package update that includes ppy/veldrid#12 for iOS to work (it broke with my latest changes above).

@EVAST9919 (Contributor) commented

> @EVAST9919 would appreciate confirming whether it works on your end now.

Yep, all good now.

peppy enabled auto-merge April 17, 2023 07:31
peppy merged commit a7aacf8 into ppy:master Apr 17, 2023
@smoogipoo (Contributor) commented Apr 17, 2023

Legacy GL: This has broken some storyboards. Here's one where random text is rendered black: https://osu.ppy.sh/beatmapsets/499488#osu/1073964.

Legacy GL: My texture corruption issue still hasn't been fixed, but I haven't been able to reproduce any real-world issues.

Veldrid-VK:

[screenshot]

@Memresable commented

Uh... apparently this pull request (I think? Textures usually take up a lot of memory if they are huge) seems to have caused a massive memory usage spike in the latest release (2023.4.19, Linux), and now it's unplayable unless I roll back to an older version (this is on OpenGL, not the legacy one).
[screenshot]

@peppy (Member) commented Apr 20, 2023

This should not have increased memory usage in any meaningful way (at most 50mb or so). I can't reproduce.

Please continue discussion in the new issue you opened rather than taking a shot in the dark at the cause. It's quite possibly not related.

Projects
Archived in project

8 participants