noVNC sudden disconnects #567
Comments
Just a quick update - I had been using the latest master branch download for all my testing, so I decided to try using the last official release version ( |
Thought I would send another update. Since I happen to work in the datacenter my server is hosted at, I thought I might try connecting from there, thinking perhaps latency between my apartment and the server was contributing to this. The connection was visibly faster, as expected, but the random, inconsistent disconnects persisted, effectively ruling out network issues. I had already more or less ruled that out, since I had been using guacamole beforehand with no such problems, but I figured it was worth a shot anyway. |
This sounds similar to what I was seeing today with our iframe implementation of the noVNC viewer.
|
This seems very odd, it's nothing I've encountered. |
Very odd indeed. I can't imagine this is super-widespread, but there do seem to be a few issues that are similar. I tried installing Firebug to see if that might give me any more verbose output, but it didn't give me anything more than what I noted in my original post. I also thought I might be on to something yesterday when I noticed my firewall blocking some traffic from my VNC connection. I was sure that was it, but unfortunately it made no difference. I'm still sifting through logs and what-not, but I'm afraid I'm not getting anywhere. |
May have something. I turned on debugging logging for my libvirtd backend, and it appears that it is receiving a
I get the same exact messages in the debug log if I just close my window while the display is running. So from what I can tell from this, something is instructing the backend to end the session prematurely. I wish I could blame this on a timeout of some variety, but due to the extreme variability of this (sometimes disconnects in mere seconds, other times several minutes), I find it very unlikely to be a timeout. For what it's worth, I've also boosted any related timeouts to extreme figures to no effect. |
I've encountered a similar issue with Firefox 43. noVNC wasn't the only thing impacted; long-running AJAX requests were affected as well. |
Given that I experience this issue on all browsers/OS's, I didn't think it would work. I did try it anyway, but the disconnects are still occurring. I'm quite frankly about out of ideas about what might be causing this. |
@Ziris85 any chance you could capture a tcpdump between the browser and websockify/libvirt? That might give more indication of where the problem is. Note: it would need to be ws:// or manually unencrypted to be of much use. |
Sure @kanaka . I have a dump, but I don't see a way of attaching a file to this. So here is a giant ugly comment:
Worth mentioning that this dump is not from the beginning of the connection, but shortly after it started. The bottom is immediately after the disconnect occurred. |
Can you add one with the first few bytes of the payload included, so we can see the series of messages that noVNC is sending? Alternatively, a websockify recording would also work (in fact, that would be preferable, as it would enable us to reproduce the issue on our end). I would recommend sticking it in a gist and then linking to the gist here, or uploading it elsewhere (pastebin, etc). |
Oh hey, fancy. Here's a gist of the initial connection: |
I see this was labelled as a bug a bit ago. Coding is hardly my strong suit, so let me know how I can help debug/test further. |
@Ziris85 without payload I can't really determine what's going on. A file capture (tcpdump -w FILE) would be best. You can clone a gist and add a binary file that way. Or just add a file to your fork of noVNC on a branch or something like that. |
I made a tcpdump (see attached zip). You can ignore the "health checks" from our load-balancer, relevant ip is 2001:7b8:3:1000:201:80ff:fe7c:2f35 Error message logged by "websockify" is:
|
@hydro-b does your problem occur across all browsers and intermittently like for @Ziris85? If you have a load balancer in-between then that's what I suspect is mucking with the network traffic (especially given the invalid packet that websockify receives). The pcap file you sent isn't much use to me because it looks like it's TLS (SSL) encrypted data. Can you reproduce this without encryption and post a new capture? |
@kanaka Since this issue was opened, I've seen a similar message in websockify several times. It seems to mainly occur when connections get terminated unexpectedly in noVNC. |
@kanaka What I had been providing were tcpdumps (using the command That is a complete dump, from just prior to the connection starting, to just after it dropped. That is also a dump from the web server's perspective (where novnc is running) to the backend (where libvirt is running). |
@Ziris85 Okay, so here is what is happening in your dump.
So for some reason the server stopped responding to the client frame buffer requests. And then the server closes the connection when the client sends a ping. Previously you said you were connecting directly from the noVNC client to the libvirt/QEMU VNC server without using websockify. Was that the case for this dump? If so, then it looks to me like something is happening on the server side. For some reason it stops responding to frame buffer requests, and then the next time the client tries to talk to it with a websocket ping, it closes the connection. |
@kanaka I wonder if this dump might show anything different. That is one with constant activity, where it disconnected while I was doing stuff in there (as opposed to the other one, where I was just letting it sit until it timed out). I'm honestly just about out of ideas for this. I had been using websockify before this and was having the same problem; I've exhausted every timeout/keepalive I can find in relation to qemu/libvirtd; I've ruled out firewalls by temporarily disabling them on both sides; and I just now tried testing from a remote machine using the novnc/websockify launch.sh quickstart, which errors out with the following message:
I'm open to suggestions on things to try, because I'm out of them myself. |
I wanted to see if I could prove to myself that libvirt was to blame for these disconnects. Going with the idea that libvirt is unable to sustain a VNC connection, I figured it would not matter what client I used: libvirt would eventually just stop responding. I first started with a desktop client, ssvnc. I used it to connect to the same VNC server as I had been using. However, where the novnc connection would drop, the ssvnc client retained its connection through the entirety of my testing. I next tried another browser-based solution. Since I already know that guacamole works, I tried using spice-html5. This (while it appears to have a bug of its own) also did not produce any sudden disconnects. As far as I've been able to tell, libvirt/qemu seems to be fine. I'd really like to see novnc working in this situation since it feels like the right tool for the job, but I don't know what else to do. @DirectXMan12 , you mentioned that a websockify recording would be helpful. I can try to make one if we think it might still be useful. Thanks for all your help so far with this, folks. |
@Ziris85 it's certainly possible that noVNC is doing something to trigger the server to disconnect, but it's definitely the server doing the visible disconnect on the wire and not the client. In the latest pcap file you sent I see the following leading up to the disconnect:
It all looks quite normal to me right up to the point where the connection is closed by the server-side SYN-ACK. So my suspicion is that there might be a bug in the WebSocket implementation/listener in QEMU or libvirt (probably QEMU since libvirt should just be doing a passthrough of the options to QEMU/kvm). @Ziris85 I don't think the websockify capture will reveal any more than the tcpdump since it's really just a subset of that. If it was the client doing the close, then the websockify capture would allow us to reproduce the condition in noVNC but since noVNC appears to be operating normally and it's the server initiating the close, a websockify capture probably won't reveal anything new. Perhaps the next step would be to run a qemu/KVM instance directly and see if it reproduces without libvirt in the mix. You can do a |
@kanaka , Thanks for that info. I had had a similar thought as well, only instead of invoking qemu by hand, to instead try newer and older versions of libvirtd, respectively, by installing the latest version of Fedora, and CentOS 6. I unfortunately am not in a position where I have another web-facing server at my disposal to try that however. Going to see what I can do about that though, unless someone else comes along who can say they've already tried one or both of those and knows how that story ends. |
Any luck on the bug so far? I think I have the same issue. I'm trying to connect to a VNC server running in a container deployed in OpenNebula. If I click the VNC button, the connection opens and everything works fine, but after a while the server goes down. I used vncterm to create the server, along with the noVNC used by OpenNebula, which differs from the standard one in how it looks. |
Actually, I believe my issue to be resolved. For a variety of reasons, I switched my webserver from nginx to Apache, and once I translated the environment over, the issue disappeared. Where I had been thinking that the issue lay with the qemu version on the backend I was connecting to, I now believe the issue is actually with the way nginx handles its websocket connections. It's possible that nginx doesn't keep the connection fresh in the same way Apache does, to the point that the backend just assumes the connection has gone idle and times it out. Just a thought, since I honestly have no real answer. I guess my suggestion to anyone encountering the same issue is: if you're using nginx, try using Apache for that connection instead. |
Hello all,
|
Thank you @Ziris85 and @karinepires for sharing your solutions. |
I wonder if a more robust solution would be to incorporate a heartbeat into the websocket connection. There are 2 reasons I can think of:
@samhed what do you think, and is it possible without affecting the VNC traffic? If you agree, I'll make an issue and try to do a PR. |
@naggie what part of the connection is sensitive in your use case?
If it's between noVNC and the websocket proxy (1), I believe that the websocket protocol has something built in. Perhaps the browsers are using it already? With regard to option 2, implementing it in websockify, for example, wouldn't be possible, since websockify is a dumb websocket->TCP proxy that isn't aware of what kind of data is being sent to the server. The third option would mean sending data over VNC. It would require writing an extension to the RFB protocol: some sort of message that would be supported in both noVNC and the VNC server. |
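For readers unfamiliar with the built-in mechanism mentioned above: RFC 6455 defines ping (opcode 0x9) and pong (opcode 0xA) control frames. The following is a minimal sketch of that framing in Python, not code from noVNC or websockify; the function names `build_client_frame` and `parse_server_frame` are hypothetical and exist only for illustration.

```python
import os

# Control-frame opcodes defined by RFC 6455.
OPCODE_PING = 0x9
OPCODE_PONG = 0xA


def build_client_frame(opcode, payload=b"", mask_key=None):
    """Build one masked (client-to-server) WebSocket frame.

    Hypothetical helper for illustration; handles only short payloads.
    """
    if len(payload) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    if mask_key is None:
        mask_key = os.urandom(4)       # clients must mask every frame
    first = 0x80 | opcode              # FIN bit set, no fragmentation
    second = 0x80 | len(payload)       # MASK bit set, 7-bit length
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes([first, second]) + mask_key + masked


def parse_server_frame(frame):
    """Return (opcode, payload) for a short, unmasked server frame."""
    opcode = frame[0] & 0x0F
    length = frame[1] & 0x7F
    if frame[1] & 0x80 or length > 125:
        raise ValueError("only short unmasked frames are handled here")
    return opcode, frame[2:2 + length]
```

A peer that receives a ping is expected to answer with a pong carrying the same payload, which is what would let either side notice a dead connection.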
Hi @samhed It's (1) between noVNC and the proxy. Indeed -- websockets have a PING/PONG mechanism built in but unfortunately none of the browsers actually expose that API. However your comment has made me look further. It appears Firefox and Chrome now implement TCP keepalive which means this is not an issue for those browsers. However, it seems MS Edge does not implement such keep-alives. A heartbeat between noVNC and the proxy could be an option. Thanks! |
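To illustrate what TCP keepalive means at the socket level, here is a minimal Python sketch. It is not what any browser or websockify does internally; the helper name `enable_keepalive` and the timing values are assumptions chosen for the example, and the tuning options are platform-specific, so the sketch skips any that are missing.

```python
import socket


def enable_keepalive(sock, idle_s=30, interval_s=10, probes=3):
    """Turn on TCP keepalive for a socket (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These tuning options are not available on every platform,
    # so each one is applied only if the socket module exposes it.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

Note that keepalive probes carry no application data, so an intermediary that measures idleness at the application layer may not count them. That is one reason a heartbeat inside the WebSocket stream could still be worth considering.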
Afternoon,
Similar to the discussion found here, I'm seeing noVNC suddenly disconnecting from the backend on a regular basis. I had thought I was hitting some sort of server timeout, but that idea was dashed when I witnessed it dropping the connection as I was actively typing in the window. It also (perhaps unfortunately) doesn't happen with any regularity: sometimes it will last for several minutes before dropping, other times it will drop mere seconds after showing the display. That said, I've never had a session last longer than a half hour or so, and a connection drop always happens.
For a while I had been using websockify between the client and the server. However, after discovering that libvirt has support for websocket VNC connections, I cut out the middle-man and now connect directly to that. The issue has remained, which suggests that noVNC has the connection issue, rather than websockify.
Since it was requested in the aforementioned issue, I'm providing the browser console results here. This is the entire session in Firefox:
It is perhaps worth mentioning that Firefox seems to suffer the issue the worst - other browsers like Opera, Chrome, even IE fare better, though they ALL will eventually have the same problem.
Any thoughts on what this might be? Let me know how I can help troubleshoot this further.
Thanks in advance.