Dear Imgui Windows slow down render on Emscripten builds... #399

Closed
frink opened this issue Oct 5, 2020 · 29 comments

@frink

frink commented Oct 5, 2020

Using the Sokol Timer from the cimgui example, it looks like it significantly slows down the FPS as more widgets are added - especially ImGui windows. However, when I use a fixed FPS (1.0f/60.0f) it never changes. Is the timer really slowing things down or am I missing something?

I guess I'm not understanding what's the difference between the two.
Can you explain what stm is doing?

@frink frink changed the title Sokol Timer vs Dear Imgui Sokol Timer / Dear Imgui FPS / Window Drawing... Oct 5, 2020
@frink
Author

frink commented Oct 5, 2020

Digging in a little bit more, it looks like there is still a slowdown regardless of which timer I use. The Sokol Timer just reports it more accurately. But I've got an app generating approx 15 windows. (blank right now...) The 4th and 5th windows each seem to degrade performance by 10fps. If 3 windows are open it runs at just under 60fps. If 15 windows are open it runs at a little under 10fps.

I'm running this in Emscripten without WebGPU because you don't have it compiling on Linux yet. Is that what is causing things to slow down so much...?

I've been talking with the crew on the ImGui Discord and they think I'm nuts for the times I'm reporting. Any ideas?

These are blank windows here. It's really weird...

@frink frink changed the title Sokol Timer / Dear Imgui FPS / Window Drawing... Sokol Timer / FPS / Dear Imgui Windows Slowdown... Oct 5, 2020
@floooh
Owner

floooh commented Oct 6, 2020

No idea to be honest, does it "feel" like 10 fps when moving windows around, or is rendering itself smooth and only the displayed frame rate is off?

Does the slowdown also happen when opening all the UI windows here:

https://floooh.github.io/tiny8bit/bombjack-ui.html

I haven't managed to slow down rendering with ImGui yet no matter how many windows I've thrown at it, but there are some cases where special care must be taken (for instance when rendering large lists, those must use a clipper to not render list items outside the visible list section).
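The clipping idea mentioned above can be sketched as plain index arithmetic. This is not the Dear ImGui clipper API; `visible_rows` is a hypothetical helper, and it assumes every row has the same positive height.

```c
/* Hypothetical helper: which rows of a uniform-height list can intersect
   the visible region? Writes a half-open range [*first, *last).
   The upper bound is conservative: it may include one row that only
   touches the bottom edge. Assumes item_height > 0. */
static void visible_rows(int item_count, float item_height,
                         float scroll_y, float view_height,
                         int *first, int *last) {
    int lo = (int)(scroll_y / item_height);
    int hi = (int)((scroll_y + view_height) / item_height) + 1;
    if (lo < 0) lo = 0;
    if (hi > item_count) hi = item_count;
    if (lo > hi) lo = hi;
    *first = lo;
    *last = hi;
}
```

With 100000 rows of height 20, a scroll offset of 400 and a 300 pixel tall view, only rows 20 to 35 would need to be submitted, so the per-frame cost no longer depends on the list length.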

Do you have a native version of that app working, and does the slowdown happen there as well?

The recommended way to measure frame time with sokol_time.h is calling stm_laptime() once per frame, and optionally run the result through stm_round_to_common_refresh_rate() to eliminate any microstutter.

PS: this is the sokol_time.h code on emscripten which takes the current time:

sokol/sokol_time.h

Lines 201 to 205 in d5e6190

#if defined(__EMSCRIPTEN__)
EM_JS(double, stm_js_perfnow, (void), {
    return performance.now();
});
#endif

This should run once per frame (e.g. in stm_laptime()), if this would cause performance problems it would be really strange.
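For illustration, a minimal sketch of the once-per-frame measurement described above, written in plain C so that it stands alone. It does not call sokol_time.h: `lap_ms` and `snap_to_common_refresh_ms` are hypothetical stand-ins for the roles played by stm_laptime() and stm_round_to_common_refresh_rate(), and the refresh-rate table and tolerance are assumptions, not values taken from the library.

```c
#include <stddef.h>

/* Hypothetical lap timer: returns the milliseconds elapsed since the
   previous call and stores the new timestamp. A stored value of 0 is
   treated as "not started yet", so the first call returns 0. */
static double lap_ms(double now_ms, double *last_ms) {
    double dt = (*last_ms > 0.0) ? (now_ms - *last_ms) : 0.0;
    *last_ms = now_ms;
    return dt;
}

/* Snap a measured frame duration to the nearest common display interval
   when it lies within tol_ms of it; otherwise return it unchanged.
   The rate table is an assumption made for this example. */
static double snap_to_common_refresh_ms(double frame_ms, double tol_ms) {
    static const double hz[] = { 144.0, 120.0, 90.0, 75.0, 60.0, 30.0 };
    double best = frame_ms;
    double best_diff = tol_ms;
    for (size_t i = 0; i < sizeof(hz) / sizeof(hz[0]); i++) {
        double interval = 1000.0 / hz[i];
        double diff = frame_ms - interval;
        if (diff < 0.0) diff = -diff;
        if (diff <= best_diff) {
            best = interval;
            best_diff = diff;
        }
    }
    return best;
}
```

Called once per frame with a timestamp such as the one performance.now() returns, `lap_ms` yields the raw frame time, and the snapping step turns a jittery 16.5 ms into the 60 Hz interval while leaving a genuinely slow 100 ms frame untouched.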

@frink
Author

frink commented Oct 13, 2020

I've been able to debug enough that I'm sure it's not the timer code.
Dear ImGui definitely feels sluggish as more windows open...

Does the slowdown also happen when opening all the UI windows here...

Yes!! I opened every debug window I could find and it got really sluggish.
Seemed like it was down to around 1-3fps by the time I was done.
Takes about a full second for a click to register. Really bad!!!

Do you have a native version of that app working, and does the slowdown happen there as well?

Haven't tested that yet...

The recommended way to measure frame time with sokol_time.h is calling stm_laptime() once per frame,

I noticed that. My code is actually built out from the HDPI example... I've tested HDPI or not and that doesn't seem to matter too much. It seems like something with ImGui vs Emscripten in the way that Sokol compiles things. You can easily get into a pass the buck sort of scenario trying to blame the actual code doing this...

I'm running this on Chromebook right now. Don't know if GPU acceleration might be turned off and I'm running everything from CPU. I'll have to dig into my chrome flags. I'll also try to test my code on a faster computer and see if that changes things...

If I can generate an APK I could install that on the same computer to test Android vs ChromeOS+Emscripten.

The whole thing is baffling...

@floooh
Owner

floooh commented Oct 13, 2020

Hmm, the only thing I can think of is overdraw. ImGui windows are rendered with alpha-blending, so they cannot benefit from the z-buffer reducing overdraw. If you have a slow GPU combined with a high-resolution display, combined with big windows that overlap, this stuff can add up quickly.

If you open the same amount of windows, but arrange them so that they are small and don't overlap, do you also see this extreme slowdown?

Don't know if GPU acceleration might be turned off and I'm running everything from CPU.

...this would actually fit my "overdraw theory" ;)

PS: another way to test this overdraw theory: a single window covering the entire screen should be just as slow as any number of smaller non-overlapping windows which taken together cover the same screen area.
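To make the overdraw argument concrete, here is a small back-of-the-envelope sketch. `Rect` and `blended_pixels` are hypothetical names, and the model assumes every window pixel costs exactly one blended fragment with no early rejection, which is a simplification of real GPU behaviour.

```c
#include <stddef.h>

/* Hypothetical window size in pixels. */
typedef struct { int w, h; } Rect;

/* Total alpha-blended fragments for a set of windows: each window pays
   for its full area regardless of what it overlaps, because a depth test
   cannot discard blended pixels hidden behind other windows. */
static long blended_pixels(const Rect *wins, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (long)wins[i].w * (long)wins[i].h;
    }
    return total;
}
```

Under this model four non-overlapping 600x400 windows cost the same 960000 fragments as one 1200x800 window, while three stacked 1200x800 windows cost three times as much. If frame time does not follow this number, overdraw is unlikely to be the bottleneck.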

@frink
Author

frink commented Oct 16, 2020

If you open the same amount of windows, but arrange them so that they are small and don't overlap, do you also see this extreme slowdown?

Nope still it deteriorates...
Screenshot 2020-10-15 at 8 48 01 PM

See frame rate on menu bar. It's calculated two ways. The first frame rate is the average frame rate from ImGui and the second is the actual delta time for that particular frame...

Here is one with overlapping windows...
Screenshot 2020-10-15 at 10 35 56 PM

Size does not seem to matter. Nor does overlap.

I also tested one large window and it didn't seem to have any effect on the frame rate...
Screenshot 2020-10-15 at 10 57 00 PM

I even tried turning the background fade on and off, which also had no effect on the frame rate. The only thing that seems to matter is the number of windows. I've been able to duplicate the problem both with your app BombJack and with mine...

Furthermore, I've tested the ImGui Manual by opening many windows... (which uses Hello Imgui to build Emscripten)

See frame rate on the bottom right...
Screenshot 2020-10-15 at 11 01 27 PM

This leads me to believe that this really is a Sokol specific problem. Not sure how or why though...

Still trying to dig deeper into this. I've been able to experience slowdowns with both ImGui and cImGui (I'm using the latter...) with Sokol examples. I experience the same problems with your hosted examples and with the same ones when I compile them myself...

Could this be something related to sokol_imgui.h? I'm reading through and don't see anything that would suggest a separate render pass for each window. Seems pretty straightforward.

Any ideas?

@floooh
Owner

floooh commented Oct 16, 2020

Hmm only difference I'm seeing where I think it could explain the different performance: The sokol_imgui.h renderer uses sg_append_buffer() in the ImGui render loop:

sokol/util/sokol_imgui.h

Lines 1929 to 1934 in 497fbd2

if (vtx_ptr) {
    vb_offset = sg_append_buffer(bind.vertex_buffers[0], vtx_ptr, vtx_size);
}
if (idx_ptr) {
    ib_offset = sg_append_buffer(bind.index_buffer, idx_ptr, idx_size);
}

...in the sokol_gfx.h GL backend this calls glBufferSubData():

sokol/sokol_gfx.h

Line 6821 in 497fbd2

glBufferSubData(gl_tgt, buf->cmn.append_pos, data_size, data_ptr);

HelloImGui seems to use the GL backend code from the ImGui repository, and this uses glBufferData():

https://github.com/ocornut/imgui/blob/b1a18d82e32f13a2ae62df70d08ee46bc8ee6a76/backends/imgui_impl_opengl3.cpp#L348-L350

If for some reason the WebGL implementation on your machine does a complete roundtrip between the render process and browser process each time glBufferSubData() and glDraw...() is called, then this should be the most expensive operation that's happening during ImGui rendering...

One common thing that causes such roundtrips is glGetError(), so for instance if you are compiling in debug mode (e.g. NDEBUG is not defined), then you will get really bad WebGL performance since the sokol_gfx.h code is littered with _SG_GL_CHECK_ERROR calls, and each causes a pipeline flush and a roundtrip to the Chrome rendering process. But this doesn't explain why you are also seeing slow performance in the demos on https://floooh.github.io/sokol-html5/ and in the emulators (for a second I thought that this must be the reason though).
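A rough sketch of how a debug-only error check of this kind typically compiles away once NDEBUG is defined. `MY_GL_CHECK_ERROR`, `fake_get_error` and `upload_draw_lists` are hypothetical names for illustration; this is neither the sokol_gfx.h macro nor a real GL call.

```c
/* Counts how often the (expensive) error query runs. */
static int g_error_queries = 0;

/* Hypothetical stand-in for glGetError(); always reports "no error". */
static inline int fake_get_error(void) {
    g_error_queries++;
    return 0;
}

#if defined(NDEBUG)
    /* Release build: the check disappears entirely. */
    #define MY_GL_CHECK_ERROR() ((void)0)
#else
    /* Debug build: every call site pays for a synchronous query. */
    #define MY_GL_CHECK_ERROR() ((void)fake_get_error())
#endif

/* Hypothetical per-draw-list upload with a check after each GL-style call. */
static void upload_draw_lists(int num_lists) {
    for (int i = 0; i < num_lists; i++) {
        /* ... the buffer upload for draw list i would go here ... */
        MY_GL_CHECK_ERROR();
    }
}
```

In a debug build the number of queries grows with the number of draw lists, which is why the build mode is worth ruling out before looking elsewhere.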

@floooh
Owner

floooh commented Oct 16, 2020

PS: this is a very old version of an ImGui render loop which uses a single glBufferData() (via sg_update_buffer()) instead of glBufferSubData():

https://github.com/floooh/sokol-samples/blob/5704cffef8f5d6529e760c394d558013faccb07d/html5/imgui-emsc.cc#L291-L355

...this is not in the sokol_imgui.h header, but it should be trivial to copy that code over and test how it behaves. The downside of this code is that it doesn't support UIs with more than 64k vertices per frame.

@floooh
Owner

floooh commented Oct 16, 2020

...the screenshots look like you're not running in the browser though, does the slowdown also happen in native code? Maybe the GL driver just has a particularly bad glBufferSubData() implementation?

@floooh
Owner

floooh commented Oct 16, 2020

...hmm as for the glGetError() theory: I don't see a noticeable slowdown when opening many windows on macOS when compiled in debug mode when running in Chrome, so at least on that config glGetError() wouldn't explain the behaviour you're seeing. Would've been too easy anyway...

@floooh
Owner

floooh commented Oct 16, 2020

...is your code available somewhere so that I could try how it behaves on a different machine and generally could have a look at the code?

@frink
Author

frink commented Oct 16, 2020

...the screenshots look like you're not running in the browser though

It's a selected area screenshot on ChromeOS.

...is your code available somewhere so that I could try how it behaves

I'll post a gist for you...


On a personal note,

Thanks for spending so much time on this...

It's really perplexing. The ImGui community hasn't heard of such a slowdown either. They point me to examples with hundreds of windows running above 60fps. I don't know if this is only a problem with my test target (it is a Chromebook after all...) or a general problem with Sokol + ImGui. I'd love to figure this out for your sake and make sure that Sokol can render hundreds of windows without slowdown.

I REALLY LIKE the simplicity of the Sokol framework. It's obviously written by a minimalist looking to help others get the most bang for their buck with minimal code... - ❤️ I LOVE IT!!!

@floooh
Owner

floooh commented Oct 16, 2020

Yeah I'd really like to figure this out too. E.g. if it turns out that sokol_imgui.h is triggering some sort of slow-path on some WebGL configs then I'd rather work around that problem even if it leads to "non-obvious" code.

It'll also be helpful for the dynamic-data-update API changes I have in mind (e.g. replacing the somewhat crude sg_update_*() and sg_append_buffer() functions) -- assuming that glBufferSubData() is the culprit.

@frink
Author

frink commented Oct 16, 2020

I like the sg_append_buffer() idea anyway...

@iboB
Contributor

iboB commented Oct 16, 2020

I don't have a chromebook and I can't reproduce the slowdown on my PCs, but I can on my phone. The number of imgui windows does affect the framerate, dropping to below 1fps with many windows. That's Chrome on an Android phone.

Sokol demos do have the problem. The HelloImgui ones don't.

Again that's only on my Android phone. With Chrome on Windows and Linux Sokol demos work with 60 fps with tens of windows open.

@frink
Author

frink commented Oct 16, 2020

That's the SAME thing I'm experiencing... You can duplicate it!!! YAY!!!

I also tested on the desktop at the office. It did deteriorate, but only by 4fps. Interestingly the desktop (Arch Linux) is running at 30fps while the Chromebook starts out at 60fps. The Chromebook actually has a better GPU. I don't do gaming on the dev machine anyway so that's fine.

I think we should try your append buffer idea and see what that does...

@floooh
Owner

floooh commented Oct 17, 2020

@iboB that's good info thanks! I guess one possible solution for now (assuming the problem is indeed glBufferSubData), is to have a runtime and/or compile-time config-toggle in sokol_imgui.h which selects between the "new" method using sg_append_buffer() and the "old" method with sg_update_buffer().

@iboB
Contributor

iboB commented Oct 17, 2020

I would try a test with some hundred imgui windows on a desktop. I'm not sure whether it's a phone/chrome OS problem or a general one which is being compensated by the much more powerful PC configurations

@floooh
Owner

floooh commented Oct 17, 2020

I wouldn't be surprised if the problem goes down to the GLES2/3 driver layer, or is at least GL API specific. sg_append_buffer() works quite differently in the D3D11 and Metal backends, while with glBufferData() and glBufferSubData() one has to trust the GL implementation to not do stupid things (and that's a bit much to expect especially from mobile GL drivers).

@floooh
Owner

floooh commented Oct 18, 2020

I have created a little test here:

http://floooh.github.io/oryol-sticky-tests/imgui-perf-old/imgui-perf-sapp.html

It works perfectly fine on my Windows machine in Chrome, but I see pretty bad frame rate fluctuations on macOS Big Sur both in Chrome and Safari (up to 50ms frame times)

And on my low-end Android phone performance is absolutely terrible, going into second-long frame times, same behaviour as you guys are seeing.

I'll try to come up with solutions and/or workarounds in the next few days and might upload new versions of this test.

@frink
Author

frink commented Oct 18, 2020

Yup. That blows up pretty good on my end. Makes the whole computer almost unusable...

No HDPI

(And yes. Non-HDPI ImGui really does look that bad on my Chromebook...)

UPDATE: I found the reason for the non-hdpi stuff looking terrible. I had the display oversampling. 1200x800 is native but I have it running 1422x889 to get a little more screen real estate. When I go back to native resolution it looks fine. Interesting anecdote, but nothing to hijack the thread about...

No HDPI

@frink frink changed the title Sokol Timer / FPS / Dear Imgui Windows Slowdown... Dear Imgui Windows slow down render on Emscripten builds... Oct 29, 2020
@frink
Author

frink commented Oct 29, 2020

Just curious if you've had time to stub out the buffer fix you mentioned above. Now that I've got my project tooling pretty much fixed I'm in a better place to start testing this.

@floooh
Owner

floooh commented Oct 30, 2020

Not yet, I first want to work through a couple of pending PRs.

One thing I wanted to try was checking if buffer orphaning at the start of a frame helps by adding some code here:

sokol/sokol_gfx.h

Lines 6825 to 6828 in 612755e

_sg_gl_cache_store_buffer_binding(gl_tgt);
_sg_gl_cache_bind_buffer(gl_tgt, gl_buf);
glBufferSubData(gl_tgt, buf->cmn.append_pos, data_size, data_ptr);
_sg_gl_cache_restore_buffer_binding(gl_tgt);

So that it looks like this:

    _sg_gl_cache_store_buffer_binding(gl_tgt);
    _sg_gl_cache_bind_buffer(gl_tgt, gl_buf);
    if (new_frame) {
        glBufferData(gl_tgt, buf->cmn.size, 0, _sg_gl_usage(buf->cmn.usage));
    }
    glBufferSubData(gl_tgt, buf->cmn.append_pos, data_size, data_ptr);
    _sg_gl_cache_restore_buffer_binding(gl_tgt);

...that's just a shot in the dark though because last time I checked (a couple of years ago), buffer orphaning didn't have any effect on WebGL.

@frink
Author

frink commented Nov 1, 2020

Okay. really appreciate all your hard work on this. Excellent lib!!!!

@floooh
Owner

floooh commented Nov 4, 2020

Hmm, the buffer-orphaning trick seems to do nothing:

https://floooh.github.io/oryol-sticky-tests/imgui-perf-orphaning/imgui-perf-sapp.html

I guess I need to do more fundamental changes in the sokol_imgui.h rendering code...

@frink
Author

frink commented Nov 17, 2020

I can confirm the same thing...

I'm afraid I don't know enough GL to be helpful digging in at that level. Sorry this isn't an easy fix.

@frink
Author

frink commented Nov 17, 2020

P.S. I've been using the Docking branch of ImGui lately and I can confirm that docking does not improve anything...

@floooh
Owner

floooh commented Nov 17, 2020

yeah, in the meantime I've tried a couple more things (without success):

https://floooh.github.io/oryol-sticky-tests/imgui-perf-noscissor/imgui-perf-sapp.html

https://floooh.github.io/oryol-sticky-tests/imgui-perf-preorphan/imgui-perf-sapp.html

...following a discussion on the WebGL dev mailing list, and the Chrome WebGL team also has found a difference between running on ANGLE versus running on native GL:

https://bugs.chromium.org/p/chromium/issues/detail?id=1145248

...but nevertheless I think I will rewrite the ImGui-Rendering code so that it will only do a single buffer update per frame. That'll be the safest solution.

@frink
Author

frink commented Nov 17, 2020

I think that's what Hello ImGui is doing.

BTW - haven't forgotten about contributing the instructions for a manual Emscripten build on the samples. I've got everything working from a Makefile. But as I reported in that thread last week there are a lot of flags I had to pass to get a comparable result to what you were doing. I haven't taken the time to test and pare them down yet, which is what I need to do to give you concise documentation...

@floooh
Owner

floooh commented Feb 14, 2021

NOTE I just rewrote the sokol_imgui.h render code to only do one buffer update for the vertex- and index-data per frame each (so total two calls to sg_update_buffer() per frame, regardless of the UI complexity).

Hopefully this fixes the performance problems (if not, please reopen this ticket)
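The single-update-per-frame approach can be sketched roughly as follows. This is not the sokol_imgui.h code: `DrawList`, `upload_buffer`, `upload_frame` and the 4096 byte staging size are hypothetical, and `upload_buffer` only counts calls instead of talking to a GPU. Only vertex data is shown; index data would presumably go through a second staging area in the same way.

```c
#include <stddef.h>
#include <string.h>

#define MAX_STAGING_BYTES 4096

/* Hypothetical draw list: a pointer to vertex bytes plus their size. */
typedef struct {
    const void *vtx_data;
    size_t      vtx_size;
} DrawList;

static unsigned char g_staging[MAX_STAGING_BYTES];
static int g_upload_calls = 0;

/* Hypothetical stand-in for one GPU buffer update; it only counts calls. */
static void upload_buffer(const void *data, size_t size) {
    (void)data;
    (void)size;
    g_upload_calls++;
}

/* Copy every draw list into one CPU-side staging area, record the byte
   offset where each list starts, then issue a single upload for the frame.
   Returns the number of lists that fit, which is also the number of valid
   entries in offsets[]. */
static size_t upload_frame(const DrawList *lists, size_t n, size_t *offsets) {
    size_t used = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        if (used + lists[i].vtx_size > MAX_STAGING_BYTES) {
            break; /* staging area full: remaining lists are skipped */
        }
        memcpy(g_staging + used, lists[i].vtx_data, lists[i].vtx_size);
        offsets[i] = used;
        used += lists[i].vtx_size;
    }
    if (used > 0) {
        upload_buffer(g_staging, used);
    }
    return i;
}
```

The recorded offsets take over the role of the values that the per-draw-list append calls used to return, so each draw call can still find its data while the number of buffer updates stays constant no matter how many windows are open.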
