Dear Imgui Windows slow down render on Emscripten builds... #399
Comments
Digging in a little bit more, it looks like there is still a slowdown regardless of which timer I use. The Sokol timer just reports it more correctly. I've got an app generating approx. 15 windows (blank right now...). The 4th and 5th windows each seem to degrade performance by 10fps. If 3 windows are open it runs at just under 60fps. If 15 windows are open it runs at a little under 10fps. I'm running this in Emscripten without WebGPU because you don't have it compiling on Linux yet. Is that what is causing things to slow down so much...? I've been talking with the crew on the ImGui Discord and they think I'm nuts for the times I'm reporting. Any ideas? These are blank windows here. It's really weird... |
No idea to be honest, does it "feel" like 10 fps when moving windows around, or is rendering itself smooth and only the displayed frame rate is off? Does the slowdown also happen when opening all the UI windows here: https://floooh.github.io/tiny8bit/bombjack-ui.html I haven't managed to slow down rendering with ImGui yet no matter how many windows I've thrown at it, but there are some cases where special care must be taken (for instance when rendering large lists, those must use a clipper to not render list items outside the visible list section). Do you have a native version of that app working, and does the slowdown happen there as well? The recommended way to measure frame time with sokol_time.h is calling stm_laptime() once per frame, and optionally run the result through stm_round_to_common_refresh_rate() to eliminate any microstutter. PS: this is the sokol_time.h code on emscripten which takes the current time: Lines 201 to 205 in d5e6190
This should run once per frame (e.g. in stm_laptime()), if this would cause performance problems it would be really strange. |
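For readers following along, here is a minimal sketch of the idea behind rounding a measured frame time to a common refresh rate. In sokol_time.h the measurement itself would come from calling stm_laptime() once per frame and passing the result to stm_round_to_common_refresh_rate(). The helper below is a hypothetical stand-in, not the sokol_time.h implementation: the name `snap_frame_ns`, the list of refresh rates and the 0.5 ms tolerance are all assumptions chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical helper (not the sokol_time.h code): snap a measured frame
   duration in nanoseconds to a common display refresh interval when it lies
   within a tolerance of that interval, otherwise return it unchanged. */
static uint64_t snap_frame_ns(uint64_t frame_ns) {
    /* Refresh intervals in ns for 144, 120, 60 and 30 Hz (rounded). */
    static const uint64_t common_ns[] = { 6944444, 8333333, 16666667, 33333333 };
    static const uint64_t tolerance_ns = 500000; /* 0.5 ms, an assumed value */
    for (int i = 0; i < 4; i++) {
        uint64_t lo = common_ns[i] - tolerance_ns;
        uint64_t hi = common_ns[i] + tolerance_ns;
        if (frame_ns > lo && frame_ns < hi) {
            return common_ns[i];
        }
    }
    /* Not close to any common interval: report the raw measurement. */
    return frame_ns;
}
```

With this kind of snapping, a frame measured at 16.5 ms is reported as a steady 60 Hz frame, while a genuinely slow frame (say 100 ms) passes through untouched, so a real slowdown like the one reported here would still show up.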
I've been able to debug enough that I'm sure it's not the timer code.
Yes!! I opened every debug window I could find and it got really sluggish.
Haven't tested that yet...
I noticed that. My code is actually built out from the HDPI example... I've tested HDPI or not and that doesn't seem to matter too much. It seems like something with ImGui vs Emscripten in the way that Sokol compiles things. You can easily get into a pass the buck sort of scenario trying to blame the actual code doing this... I'm running this on Chromebook right now. Don't know if GPU acceleration might be turned off and I'm running everything from CPU. I'll have to dig into my chrome flags. I'll also try to test my code on a faster computer and see if that changes things... If I can generate an APK I could install that on the same computer to test Android vs ChromeOS+Emscripten. The whole thing is baffling... |
Hmm, the only thing I can think of is overdraw. ImGui windows are rendered with alpha-blending, so they cannot benefit from the z-buffer reducing overdraw. If you have a slow GPU combined with a high-resolution display, combined with big windows that overlap, this stuff can add up quickly. If you open the same amount of windows, but arrange them so that they are small and don't overlap, do you also see this extreme slowdown?
...this would actually fit my "overdraw theory" ;) PS: another way to test this overdraw theory: a single window covering the entire screen should be just as slow as any number of smaller non-overlapping windows which taken together cover the same screen area. |
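The overdraw test proposed above rests on simple arithmetic: with alpha blending and no z-rejection, the fill cost should scale with the total number of blended pixels, not with the number of windows. A rough sketch of that estimate follows. The `WinSize` type and `blended_pixels` function are hypothetical names for illustration, and the model deliberately ignores everything except window area.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical window size in pixels. */
typedef struct { int w, h; } WinSize;

/* Sum of window areas. Because alpha-blended windows cannot be rejected by
   the z-buffer, overlapping windows are counted in full, once per window. */
static uint64_t blended_pixels(const WinSize* wins, size_t count) {
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += (uint64_t)wins[i].w * (uint64_t)wins[i].h;
    }
    return total;
}
```

Under this model one 1200x800 window costs the same as four 600x400 windows, and three stacked 1200x800 windows cost three times as much. If frame time tracks window count instead of this number, overdraw is probably not the cause.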
See frame rate on menu bar. It's calculated two ways. The first frame rate is the average frame rate from ImGui and the second is the actual delta time for that particular frame... Here is one with overlapping windows... Size does not seem to matter. Nor does overlap. I also tested one large window and it didn't seem to have any effect on the frame rate... I even tried turning the background fade on and off, which also had no effect on the frame rate. The only thing that seems to matter is the number of windows. I've been able to duplicate the problem both with your app BombJack and with mine... Furthermore, I've tested the ImGui Manual by opening many windows... (which uses Hello ImGui to build for Emscripten) See frame rate on the bottom right... This leads me to believe that this really is a Sokol-specific problem. Not sure how or why though... Still trying to dig deeper into this. I've been able to experience slowdowns with both ImGui and cImGui (I'm using the latter...) with Sokol examples. I experience the same problems with your hosted examples and the same ones when I compiled them myself... Could this be something related to sokol_imgui.h? I'm reading through it and don't see anything that would suggest a separate render pass for each window. Seems pretty straightforward. Any ideas? |
Hmm only difference I'm seeing where I think it could explain the different performance: The sokol_imgui.h renderer uses Lines 1929 to 1934 in 497fbd2
...in the sokol_gfx.h GL backend this calls glBufferSubData(): Line 6821 in 497fbd2
HelloImGui seems to use the GL backend code from the ImGui repository, and this uses glBufferData(): If for some reason the WebGL implementation on your machine does a complete roundtrip between the render process and browser process each time glBufferSubData() and glDraw...() is called, then this should be the most expensive operation that's happening during ImGui rendering... One common thing that causes such roundtrips is glGetError(), so for instance if you are compiling in debug mode (e.g. NDEBUG is not defined), then you will get really bad WebGL performance since the sokol_gfx.h code is littered with _SG_GL_CHECK_ERROR calls, and each one causes a pipeline flush and a roundtrip to the Chrome rendering process. But this doesn't explain why you are also seeing slow performance in the demos on https://floooh.github.io/sokol-html5/ and in the emulators (for a second I thought that this must be the reason though). |
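To illustrate the debug-mode cost described above, here is a small self-contained sketch of an error check that is compiled out when NDEBUG is defined. It is not the sokol_gfx.h code: `CHECK_GL_ERROR`, `fake_get_error` and `fake_draw` are hypothetical stand-ins, and the counters only simulate how many error queries would be issued per frame.

```c
static int error_queries = 0; /* counts simulated glGetError() calls */
static int draw_calls = 0;    /* counts simulated draw calls */

/* Stand-in for glGetError(). In a browser, each real call may force a
   synchronous roundtrip to the GPU process. */
static int fake_get_error(void) { error_queries++; return 0; }

/* Debug-only check: expands to nothing when NDEBUG is defined. */
#if defined(NDEBUG)
#define CHECK_GL_ERROR() ((void)0)
#else
#define CHECK_GL_ERROR() ((void)fake_get_error())
#endif

/* Stand-in for one draw call followed by an error check. */
static void fake_draw(void) { draw_calls++; CHECK_GL_ERROR(); }

/* Issue a number of draws, as a UI with many windows would. */
static void render_frame(int num_draws) {
    for (int i = 0; i < num_draws; i++) {
        fake_draw();
    }
}
```

In a debug build every draw triggers one error query, so the number of potential roundtrips grows with the number of draw calls. Building with -DNDEBUG removes the queries entirely, which is why checking the build mode is a cheap first test.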
PS: this is a very old version of an ImGui render loop which uses a single glBufferData (via sg_update_buffer()) instead of glBufferSubData(): ...this is not in the sokol_imgui.h header, but it should be trivial to copy that code over and test how it behaves. The downside of this code is that it doesn't support UIs with more than 64k vertices per frame. |
...the screenshots look like you're not running in the browser though, does the slowdown also happen in native code? Maybe the GL driver just has a particularly bad glBufferSubData() implementation? |
...hmm as for the glGetError() theory: I don't see a noticeable slowdown when opening many windows on macOS when compiled in debug mode when running in Chrome, so at least on that config glGetError() wouldn't explain the behaviour you're seeing. Would've been too easy anyway... |
...is your code available somewhere so that I could try how it behaves on a different machine and generally could have a look at the code? |
It's a selected area screenshot on ChromeOS.
I'll post a gist for you... On a personal note, thanks for spending so much time on this... It's really perplexing. The ImGui community hasn't heard of such a slowdown either. They point me to examples with hundreds of windows running above 60fps. I don't know if this is only a problem with my test target (it is a Chromebook after all...) or a general problem with Sokol + ImGui. I'd love to figure this out for your sake and make sure that Sokol can render hundreds of windows without slowdown. I REALLY LIKE the simplicity of the Sokol framework. It's obviously written by a minimalist looking to help others get the most bang for their buck with minimal code... - ❤️ I LOVE IT!!! |
Yeah I'd really like to figure this out too. E.g. if it turns out that sokol_imgui.h is triggering some sort of slow-path on some WebGL configs then I'd rather work around that problem even if it leads to "non-obvious" code. It'll also be helpful for the dynamic-data-update API changes I have in mind (e.g. replacing the somewhat crude sg_update_*() and sg_append_buffer() functions) -- assuming that glBufferSubData() is the culprit. |
I like the |
I don't have a chromebook and I can't reproduce the slowdown on my PCs, but I can on my phone. The number of imgui windows does affect the framerate, dropping to below 1fps with many windows. That's Chrome on an Android phone. Sokol demos do have the problem. The HelloImgui ones don't. Again that's only on my Android phone. With Chrome on Windows and Linux Sokol demos work with 60 fps with tens of windows open. |
That's the SAME thing I'm experiencing... You can duplicate it!!! YAY!!! I also tested on the desktop at the office. It did deteriorate, but only by 4fps. Interestingly the desktop (Arch Linux) is running at 30fps while the Chromebook starts out at 60fps. The Chromebook actually has a better GPU. I don't do gaming on the dev machine anyway so that's fine. I think we should try your append buffer idea and see what that does... |
@iboB that's good info thanks! I guess one possible solution for now (assuming the problem is indeed glBufferSubData), is to have a runtime and/or compile-time config-toggle in sokol_imgui.h which selects between the "new" method using sg_append_buffer() and the "old" method with sg_update_buffer(). |
I would try a test with some hundred imgui windows on a desktop. I'm not sure whether it's a phone/chrome OS problem or a general one which is being compensated by the much more powerful PC configurations |
I wouldn't be surprised if the problem goes down to the GLES2/3 driver layer, or is at least GL API specific. sg_append_buffer() works quite differently in the D3D11 and Metal backends, while with glBufferData() and glBufferSubData() one has to trust the GL implementation to not do stupid things (and that's a bit much to expect especially from mobile GL drivers). |
I have created a little test here: http://floooh.github.io/oryol-sticky-tests/imgui-perf-old/imgui-perf-sapp.html It works perfectly fine on my Windows machine in Chrome, but I see pretty bad frame rate fluctuations on macOS Big Sur both in Chrome and Safari (up to 50ms frame times). And on my low-end Android phone performance is absolutely terrible, going into second-long frame times, same behaviour as you guys are seeing. I'll try to come up with solutions and/or workarounds in the next few days and might upload new versions of this test. |
Yup. That blows up pretty good on my end. Makes the whole computer almost unusable... (And yes. Non-HDPI ImGui really does look that bad on my Chromebook...) UPDATE: I found the reason for the non-HDPI stuff looking terrible. I had the display oversampling. 1200x800 is native but I have it running at 1422x889 to get a little more screen real estate. When I go back to native resolution it looks fine. Interesting anecdote, but nothing to hijack the thread about... |
Just curious if you've had time to stub out the buffer fix you mentioned above. Now that I've got my project tooling pretty much fixed I'm in a better place to start testing this. |
Not yet, I first want to work through a couple of pending PRs. One thing I wanted to try was checking if buffer orphaning at the start of a frame helps by adding some code here: Lines 6825 to 6828 in 612755e
So that it looks like this:

    _sg_gl_cache_store_buffer_binding(gl_tgt);
    _sg_gl_cache_bind_buffer(gl_tgt, gl_buf);
    if (new_frame) {
        glBufferData(gl_tgt, buf->cmn.size, 0, _sg_gl_usage(buf->cmn.usage));
    }
    glBufferSubData(gl_tgt, buf->cmn.append_pos, data_size, data_ptr);
    _sg_gl_cache_restore_buffer_binding(gl_tgt);

...that's just a shot in the dark though because last time I checked (a couple of years ago), buffer orphaning didn't have any effect on WebGL. |
Okay. really appreciate all your hard work on this. Excellent lib!!!! |
Hmm, the buffer-orphaning trick seems to do nothing: https://floooh.github.io/oryol-sticky-tests/imgui-perf-orphaning/imgui-perf-sapp.html I guess I need to do more fundamental changes in the sokol_imgui.h rendering code... |
I can confirm the same thing... I'm afraid I don't know enough GL to be helpful digging in at that level. Sorry this isn't an easy fix. |
P.S. I've been using the Docking branch of ImGui lately and I can confirm that docking does not improve anything... |
yeah, in the meantime I've tried a couple more things (without success): https://floooh.github.io/oryol-sticky-tests/imgui-perf-noscissor/imgui-perf-sapp.html https://floooh.github.io/oryol-sticky-tests/imgui-perf-preorphan/imgui-perf-sapp.html ...following a discussion on the WebGL dev mailing list, and the Chrome WebGL team also has found a difference between running on ANGLE versus running on native GL: https://bugs.chromium.org/p/chromium/issues/detail?id=1145248 ...but nevertheless I think I will rewrite the ImGui-Rendering code so that it will only do a single buffer update per frame. That'll be the safest solution. |
I think that's what Hello ImGui is doing. BTW - haven't forgotten about contributing the instructions for a manual Emscripten build on the samples. I've got everything working from a Makefile. But as I reported in that thread last week there are a lot of flags I had to pass to get a comparable result to what you were doing. I haven't taken the time to test and pare them down yet, which is what I need to do to give you concise documentation... |
NOTE I just rewrote the sokol_imgui.h render code to only do one buffer update for the vertex- and index-data per frame each (so total two calls to sg_update_buffer() per frame, regardless of the UI complexity). Hopefully this fixes the performance problems (if not, please reopen this ticket) |
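A rough sketch of the "one buffer update per frame" approach described above, for vertex data only. This is not the actual sokol_imgui.h code: `Vertex`, `DrawList`, `staging`, `MAX_VERTS`, `fake_upload` and `upload_frame` are hypothetical names, and `fake_upload` merely counts calls where a real renderer would call something like sg_update_buffer().

```c
#include <stddef.h>
#include <string.h>

typedef struct { float x, y; } Vertex;                          /* hypothetical vertex layout */
typedef struct { const Vertex* verts; size_t count; } DrawList; /* hypothetical draw list */

#define MAX_VERTS 1024
static Vertex staging[MAX_VERTS]; /* CPU-side staging area for one frame */
static int upload_calls = 0;      /* counts simulated GPU buffer updates */

/* Stand-in for a GPU buffer update such as sg_update_buffer(). */
static void fake_upload(const void* data, size_t bytes) {
    (void)data; (void)bytes;
    upload_calls++;
}

/* Copy every draw list into the staging array, then upload once.
   Returns the number of vertices uploaded, or 0 if they do not fit. */
static size_t upload_frame(const DrawList* lists, size_t num_lists) {
    size_t total = 0;
    for (size_t i = 0; i < num_lists; i++) {
        if (lists[i].count == 0) {
            continue; /* nothing to copy for an empty list */
        }
        if (total + lists[i].count > MAX_VERTS) {
            return 0; /* frame exceeds the staging capacity */
        }
        memcpy(&staging[total], lists[i].verts, lists[i].count * sizeof(Vertex));
        total += lists[i].count;
    }
    if (total > 0) {
        fake_upload(staging, total * sizeof(Vertex)); /* single update per frame */
    }
    return total;
}
```

The point of the design is that the number of GPU buffer updates stays constant no matter how many windows (draw lists) exist. A real renderer would also have to remember each list's starting offset in the merged buffer so that its draw calls index the right vertices, and would do the same for index data.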
Using the Sokol Timer from the cimgui example, it looks like it significantly slows down the FPS as more widgets are added - especially ImGui windows. However, when I use a fixed FPS (1.0f/60.0f) it never changes. Is the timer really slowing things down or am I missing something?
I guess I'm not understanding what's the difference between the two.
Can you explain what stm is doing?