noVNC eats CPU and hangs Firefox #431

Closed
abligh opened this issue Jan 21, 2015 · 12 comments

@abligh

abligh commented Jan 21, 2015

noVNC running in Firefox and connected to a qemu instance causes extreme CPU usage and essentially hangs Firefox. A lot of RFB output appears to be what triggers the hang.

Here's the easiest way to replicate:

  • Launch a cirros VM (I don't think the OS matters but that's the easiest to download) using qemu, specifying a VNC port to bind it to (therefore qemu is the rfb server)
  • Use websockify to forward to this port (no SSL needed)
  • Open the VNC console in Firefox
  • Log in to the Cirros VM and type ls -lR / or indeed anything else that produces a lot of screen output
  • Watch firefox die

The 'death' involves one page of output, a multisecond delay, then a second page of output, then a complete hang with the CPU pegged. Firefox is then unresponsive until the window containing VNC is closed (by hitting the close button on the window), at which point after a few seconds it recovers.

I am using the following:

  • noVNC 0.5.1 straight from git on Linux. 0.5.0 did the same thing.
  • Firefox 35.0 (current) on Mac OS-X. Firefox 31.0 does the same thing
  • websockify 0.5.1+dfsg1-3ubuntu0.14.04.1 (Ubuntu Trusty current). My own websockets proxy which shares no code does the same thing.
  • qemu 2.0.0+dfsg-2ubuntu1.10 (Ubuntu Trusty current). Guest OS tried: Cirros, Ubuntu.
  • no SSL (have tried SSL using my own websockets proxy and it is also broken)

There is no problem using Chrome.

I conclude the problem is thus somewhere between Firefox and novnc, possibly when qemu is the server, when there's a lot of screen output and scrolling.

@samhed
Member

samhed commented Jan 21, 2015

I am not able to reproduce using TigerVnc as the vnc server. Tested Firefox 33 on Mac OS-X and Firefox 34 on Fedora 20.

@abligh
Author

abligh commented Jan 27, 2015

Definitely can be replicated with qemu as the vnc server :-)

@DirectXMan12
Member

I'm able to reproduce the cases of both @samhed and @abligh. There appears to be something with the way qemu is sending the update frames. I'll investigate further.

DirectXMan12 self-assigned this Jan 28, 2015

@DirectXMan12
Member

(i.e. qemu causes Firefox, but not Chrome, to start eating CPU, but a traditional VNC server (e.g. tightvnc) does not)

@DirectXMan12
Member

Ok, so, I did some quick debugging and comparison, and here's what I've found so far:

Both TightVNC and qemu use the TIGHT encoding. However, they each make different uses of the various compression modes.

I inserted some quick metrics code into the TIGHT encoding handler, and here's what I got: TightVNC uses a mix of copy, fill, jpeg, and palette filter operations, and tends to send larger chunks of data per operation. On the other hand, qemu tends to use copy and fill operations (with a small number of filter operations, per the table below), with small chunks of data.

I did two sets of runs to capture some numbers. The first set of runs consisted of 5 second bursts, and captured data about copy, fill, and jpeg operations. The second set consisted of 10s bursts, and captured data about copy, fill, jpeg, and filter operations (note that this operation was a bit "slower" since the computations and logging for the filter average also had to be done).

The commands used were (sleep 5s; pkill ls) &; ls -lR / and (sleep 10s; pkill ls) &; ls -lR /.
Debug logging was enabled, and extra logging was inserted as well, so these throughputs are somewhat lower than normal operation.

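| Server | Duration | Total Ops | Copy Ops | Avg Copy Size | Fill Ops | JPEG Ops | Avg JPEG Size | Filter Ops | Avg Filter Size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| tightvnc | 5s | 1273 | 455 | 1040 | 627 | 191 | 5222 | | |
| tightvnc | 5s | 1170 | 413 | 1154 | 581 | 166 | 4777 | | |
| qemu | 5s | 34150 | 24484 | 98 | 9666 | 0 | 0 | | |
| qemu | 5s | 33495 | 23986 | 101 | 9509 | 0 | 0 | | |
| tightvnc | 10s | 2283 | 774 | 960 | 1068 | 291 | 5227 | 150 | 1120 |
| tightvnc | 10s | 2424 | 813 | 1036 | 1135 | 298 | 5277 | 178 | 1220 |
| qemu | 10s | 60667 | 43017 | 86 | 16435 | 0 | 0 | 1215 | 8 |
| qemu | 10s | 63986 | 45297 | 86 | 17498 | 0 | 0 | 1191 | 8 |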

As you can see from the table above, qemu uses many more (30x!) operations with much smaller amounts of data. I suspect that this is somehow overwhelming our code in Firefox. We probably didn't notice it earlier because it really only becomes an issue with high-frequency refreshes with large amounts of change (such as rapidly scrolling text).
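
As a rough check on the "30x" figure, the ratios can be computed directly from the "Total Ops" column. The sketch below is in Python purely for illustration (noVNC itself is JavaScript); the numbers are copied from the table, and the variable and function names are made up for this example.

```python
# Total operations per run, copied from the "Total Ops" column of the table.
# Keys are (server, burst length in seconds); each list holds the two runs.
total_ops = {
    ("tightvnc", 5): [1273, 1170],
    ("qemu", 5): [34150, 33495],
    ("tightvnc", 10): [2283, 2424],
    ("qemu", 10): [60667, 63986],
}


def mean(values):
    return sum(values) / len(values)


def ops_ratio(duration):
    """Ratio of mean qemu operations to mean tightvnc operations."""
    return mean(total_ops[("qemu", duration)]) / mean(total_ops[("tightvnc", duration)])


def ops_per_second(server, duration):
    """Mean number of operations handled per second of burst."""
    return mean(total_ops[(server, duration)]) / duration


for duration in (5, 10):
    print(
        f"{duration}s bursts: qemu/tightvnc ratio = {ops_ratio(duration):.1f}, "
        f"qemu ops/s = {ops_per_second('qemu', duration):.0f}, "
        f"tightvnc ops/s = {ops_per_second('tightvnc', duration):.0f}"
    )
```

With these numbers the ratio comes out at roughly 26x to 28x, in line with the "30x" estimate, and qemu drives more than six thousand Tight operations per second through the client even with debug logging enabled.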

An important thing to note here is that the display in chrome seems to slow down, but Chrome doesn't freeze like Firefox.

I'll investigate further and see if I can pin down if there's a specific factor that's causing the slowdown.

@abligh
Author

abligh commented Jan 28, 2015

Interesting. Out of interest I connected with guacamole+vnc and it is fine with both chrome and firefox. This may of course be because they are handling things server side.

@kanaka
Member

kanaka commented Jan 28, 2015

@abligh yeah, guacamole is completely different. The question would be if the Java VNC client on the server starts to grow memory use or slow down when connecting to QEMU.

@DirectXMan12 Any chance you could do similar tests with the memory profiler in Firefox or Chrome? There might be a memory leak/cycle that QEMU vnc traffic is exacerbating and profiling would probably tell us where. We might need to self-manage a memory pool for something. Actually, I suspect switching everything to use typed arrays throughout would probably address this issue too (partly because, with typed arrays everywhere we would probably be managing more of our own memory too).
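
One way to picture the "self-managed memory pool" idea is a receive queue backed by a single preallocated buffer, so that handling each small update does not allocate a fresh array. The sketch below uses a Python bytearray as a stand-in for a JavaScript Uint8Array; the class and method names (ReceiveQueue, push, shift) are hypothetical and not taken from noVNC.

```python
class ReceiveQueue:
    """Byte queue backed by one preallocated buffer (hypothetical sketch)."""

    def __init__(self, capacity):
        self._buf = bytearray(capacity)  # allocated once, then reused
        self._start = 0  # index of the first unread byte
        self._end = 0    # index one past the last unread byte

    def __len__(self):
        return self._end - self._start

    def push(self, data):
        """Append incoming bytes, compacting first if the tail is full."""
        if self._end + len(data) > len(self._buf):
            self._compact()
        if self._end + len(data) > len(self._buf):
            raise BufferError("receive queue is full")
        # Same-length slice assignment: the buffer is never resized.
        self._buf[self._end:self._end + len(data)] = data
        self._end += len(data)

    def shift(self, count):
        """Consume `count` bytes and return a view onto them (no copy).

        The view is only meaningful until the next push, which may
        overwrite the underlying bytes.
        """
        if count > len(self):
            raise ValueError("not enough buffered data")
        view = memoryview(self._buf)[self._start:self._start + count]
        self._start += count
        return view

    def _compact(self):
        """Copy unread bytes to the front so the tail can be reused."""
        unread = len(self)
        self._buf[:unread] = self._buf[self._start:self._end]
        self._start = 0
        self._end = unread


queue = ReceiveQueue(capacity=16)
queue.push(b"abcdefgh")
header = bytes(queue.shift(4))         # copy out before the buffer is reused
queue.push(b"ijklmnopqrst")            # 12 bytes: forces a compaction
rest = bytes(queue.shift(len(queue)))
```

The trade-off in a design like this is a fixed capacity and views that expire on the next push, in exchange for not creating garbage on every message.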

@DirectXMan12
Member

@kanaka: that's what I've been investigating this afternoon -- I suspected it was either leaks or GC pauses (or both) that were doing us in. I'll let you know if I find conclusive data, but preliminary results indicate that we have a lot of array allocations (unsurprising).

@DirectXMan12
Member

@kanaka: here's a couple of memory timelines that you can open up in Chrome/Chromium (open up the Dev Tools, go to "Timeline", right click -> "Load Timeline Data", and then make sure that up top the whole range is selected (not in grey)): https://gist.github.com/DirectXMan12/6cbec585cfe23679ae06. The sharp drop at the end is me triggering a forced GC. Since it goes all the way down, it looks like (at least in Chrome) we don't have a leak. However, it looks to me like we're triggering GC fairly frequently.

@DirectXMan12
Member

I've added Firefox profile data to the above multi-file gist as well. From a peek at the Firefox timeline, it doesn't seem like it's GCing frequently like Chromium.

@DirectXMan12
Member

@kanaka: So, after a bit more investigating, it turns out that one major issue is that our zlib decompressor is slow. I attempted to replace it with pako (https://github.com/nodeca/pako), but encountered some difficulties where a couple of messages result in outputs that are much larger than what the old library generated (additionally, you have to tell it to use a suitably big buffer, since it uses fixed-sized Uint8Arrays). Ignoring, for the moment, a small bit of graphical distortion initially, the resulting output is quite fast, and does not seem to crash Firefox.

The main issue that I've found with other zlib implementations in javascript is that they assume the use case of "decompress this whole object" and not "decompress this next part of a stream", and thus get unreasonably fussy when you try to use them with VNC.
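
The distinction matters because the Tight encoding keeps a zlib stream open across many messages, and each message carries only the next piece of that stream. The sketch below illustrates the difference using Python's standard zlib module rather than any JavaScript library; the payload contents and the use of Z_SYNC_FLUSH to delimit messages are assumptions made for the illustration.

```python
import zlib

# Server side: ONE compressor whose state persists across messages.
compressor = zlib.compressobj()
payloads = [b"first update " * 8, b"second update " * 8, b"third update " * 8]
messages = [
    # Z_SYNC_FLUSH emits everything compressed so far without ending the stream.
    compressor.compress(payload) + compressor.flush(zlib.Z_SYNC_FLUSH)
    for payload in payloads
]

# Client side, streaming: one decompressor fed each message in order.
decompressor = zlib.decompressobj()
streamed = [decompressor.decompress(message) for message in messages]

# Client side, one-shot: treating each message as a complete object fails,
# because no single message contains a finished zlib stream.
one_shot_failures = 0
for message in messages:
    try:
        zlib.decompress(message)
    except zlib.error:
        one_shot_failures += 1

print(streamed == payloads, one_shot_failures)
```

A library that only offers the one-shot form has no way to carry the decompressor state from one message to the next, which is the kind of fussiness described above.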

I may be a bit delayed in following up on this, so if someone else wants to do the research, go ahead.

@DirectXMan12
Member

Fixes implemented in #488
