Webfrontend freezes after some time #1523
What were you doing?
I was browsing the web and had a tab with OctoPrint open.
What did you expect to happen?
The tab keeps updating information (like temperature, print status, etc) on a regular basis.
What happened instead?
On a longer print (1 h or more, I guess), the site simply stalls. No information is updated anymore (neither temperature nor printing time, which should clearly increase).
In this situation I also cannot use any of the GUI. This can be critical, for example if I need to stop a misbehaving printer and have no control over it.
Only a reload of the browser page reactivates the GUI and updates the information again. However, this takes some time (depending on hardware and network connection), which is bad in case of trouble.
Branch & Commit or Version of OctoPrint
Version: 1.2.16 (master branch): 354042b
Browser and Version of Browser, Operating System running Browser
Firefox 49.0 under Ubuntu 14.04 and under 15.10
Link to octoprint.log
The issue arose at 15:50 for sure, with the machine 10.1.100.6.
Link to contents of terminal tab or serial.log
There was nothing logged as far as I can tell. Potentially it was deactivated but I am unsure here.
Screenshot(s) showing the problem:
This is a screenshot from just now, but the printer should be further along in the process.
I have read the FAQ and I did not find anything related to my problem.
Also, please clarify: while you had the tab open, did you also have it focused? OctoPrint detects whether the tab is focused and disables e.g. the gcode viewer updates when it isn't (since those are quite expensive client-side). That wouldn't explain the lack of general updates, but it would still be interesting to know whether that was the case.
Looking at your log:
This looks like something went seriously wrong when sending data from the server via the websocket to your browser. This message is produced when an exception is triggered in the underlying sending code, and usually the message of said exception should be logged right after the
How reliably can you reproduce this? And how are your client and server connected to your LAN, cable or wifi?
OK, just for clarification: you suggest using the web developer tools in Firefox, right? I will let it run once my Pi is online again and I am printing. This might happen this evening or tomorrow.
The tab was in the background. I had it open to be able to abort in case of a problem with the printer. I was browsing the web while it was running.
I would guess that a closed websocket will not lead to a complete breakdown in Python, will it? I would rather expect an exception with the possibility to handle (and log) it. Or am I mistaken?
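That expectation is basically right; here is a minimal sketch (not OctoPrint's actual code, all names are made up for illustration) of how a server can wrap a per-client send so that one broken socket is logged rather than breaking anything else:

```python
import logging

logger = logging.getLogger("octoprint.server.util.sockjs")

def safe_send(connection, payload):
    # Try to push one payload to one client; on failure, log the
    # exception (including the stack trace) and report it instead
    # of letting it propagate and take down the sender.
    try:
        connection.send(payload)
        return True
    except Exception:
        logger.exception("Could not send message to client")
        return False

class BrokenConnection:
    # Stand-in for a client whose socket has gone away.
    def send(self, payload):
        raise IOError("socket closed")

print(safe_send(BrokenConnection(), {"temp": 200.0}))  # False, plus a log entry
```

This is the pattern the earlier log message hints at: the exception is caught and logged per client, which is why the server keeps running while one client's updates fail.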
The server is in my LAN (cable). The named IP is a connection from outside using OpenVPN. But the same problem also arises purely within the LAN (I have to admit, using my laptop over WiFi; I could connect the laptop by cable). I should also mention that most of the time I have an SSH connection to the server open (through the VPN), and it is configured to disconnect on ping errors. Thus the network seems reliable to me (also no lag or anything).
No, but since the websocket is the transport used to push updates to the frontend it will lead to the updates stopping. You also should get an offline dialog when the client realizes that the socket was closed, and since you didn't get that apparently something happened causing the communication to stop getting through but with the socket still appearing open to both sides.
So something is fishy here and any logs or network traces from the client side might help to clear it up.
As a side note, I have never seen this happen myself, and I have tabs of OctoPrint instances that are busy printing open pretty much constantly. Hence I need as much data from your side as possible to have even a remote chance of understanding what's happening here :)
@christianlupus Forgot to mention, I also pushed a commit to the
```yaml
loggers:
  octoprint.server.util.sockjs:
    level: DEBUG
```
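For anyone wondering what that YAML does: it raises the level of one named logger to DEBUG. Expressed with the Python stdlib's `dictConfig` (illustrative only; on a real install you edit the config file rather than call this yourself), it amounts to:

```python
import logging
import logging.config

# Equivalent of the YAML snippet above: set only the named logger
# to DEBUG, leaving every other logger's configuration alone.
logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "loggers": {
        "octoprint.server.util.sockjs": {"level": "DEBUG"},
    },
})

logger = logging.getLogger("octoprint.server.util.sockjs")
print(logger.getEffectiveLevel() == logging.DEBUG)  # True
```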
If you could reproduce the same issue on
First, here is a Wireshark capture from the client, filtered to only the relevant IP.
At the end you can see three clicks to load an OUTDOOR file. This was not possible / was not reflected in the GUI. Nevertheless, these POST requests to the webserver were successful (?). For example, the first of them led to the response header, which indicates at least that the OctoPrint server is still responding, doesn't it?
The traffic is routed through haproxy as suggested by the wiki page. In the haproxy logs I do not find anything interesting.
The recent octoprint.log can be found here.
OK, I think this is all the information I can give you directly. If you need more, please tell me.
PS: My version is 6c21ee3.
Hm. In the recent octoprint.log there isn't the same error sending a message to the client that we saw earlier (and for which I upped the logging in the first place, to get details in the form of a stack trace). But the updates still stalled? If so, that would be bad, because it would mean the errors we saw in the earlier log when sending a message to the stalled client are unrelated to the issue after all. Which would put us back at square one.
As a side note, it looks like you modified
I'm not surprised that you could still issue POST requests: what's breaking down here appears to be only the push connection (which is what pushes the progress data and all that to the frontend). Clicking buttons and the like goes over the regular API; the websocket is only used for pushing "real time" (more or less) data from the backend to the frontend.
I guess the next step will be to up the logging on the frontend to try to see what is happening there. But first I'll check whether your PCAP file (thanks for that) contains some hint.
I sadly was not able to glean anything from the pcap file so far, but I made a branch with some extra logging in the client for you,
Should look somewhat like this:
Please test whether you can reproduce the issue with that branch (the commit linked above is the only thing that makes it differ from 1.2.17rc1, btw) and if so, provide a similar screenshot. It's important that the line "Didn't get a message from the backend for the past 5 seconds, that's unusual!" is on the screenshot (it's not part of my own screenshot above). Well, and if that line is not to be found, that's interesting info as well! What I want to see is what happens right before (any weird message/heartbeat/close behaviour) and also what happens after (are there still heartbeats, etc.).
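The check behind that warning line boils down to a simple watchdog: remember when the last backend message arrived and flag a stall once nothing has come in for five seconds. A minimal sketch of that idea (class and method names are made up for illustration, not the branch's actual code):

```python
import time

class MessageWatchdog:
    """Flags a stall once no backend message (including heartbeats)
    has arrived for `timeout` seconds. The clock is injectable so the
    behaviour can be exercised without waiting in real time."""

    def __init__(self, timeout=5.0, clock=time.monotonic):
        self.timeout = timeout
        self._clock = clock
        self._last = clock()

    def on_message(self):
        # Called for every message received from the backend.
        self._last = self._clock()

    def stalled(self):
        # True once the backend has been silent longer than the
        # timeout; this is where the warning line would be logged.
        return (self._clock() - self._last) > self.timeout
```

With such a probe in place, the interesting part is exactly what the comment above asks for: what the socket traffic looks like immediately before and after `stalled()` first returns True.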
I updated to the named branch and made one more configuration change: I added the webcam. I then had the web frontend open for more than 2 hours and no problem arose. Now I have gone back and unconfigured the webcam; this way I can see whether the webcam makes any difference. Unfortunately I cannot give any more information today, as I have to leave soon and it is unlikely that the problem will arise again in the remaining time.
@christianlupus I've since seen this too (once or twice) but haven't been able to figure out much more. @Salandora appears to be able to reproduce it somewhat reliably, but so far it vanishes once debug probes are inserted.
One thing we noticed in your pcap file is that the socket runs in
PS: Your log files on paste.debian.net have vanished; my guess is they were auto-deleted. In the future, please use a more reliable pasting service such as gist.github.com.
Regarding the setup: first, I do not use OctoPi. I use a plain Raspbian image and installed OctoPrint on it. The server is proxied via the proposed haproxy. The OctoPrint server listens on port 5000, and haproxy is configured to listen on port 80 and redirect everything except for a few path prefixes to port 5000 on localhost.
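For readers following along, a minimal haproxy sketch of the setup just described (listen on 80, route a webcam prefix elsewhere, send everything else to OctoPrint on localhost:5000). The section names, the `/webcam/` prefix, and the port 8080 backend are illustrative assumptions, not the wiki's exact configuration:

```
frontend http-in
    bind *:80
    mode http
    acl is_webcam path_beg /webcam/
    use_backend webcam if is_webcam
    default_backend octoprint

backend octoprint
    mode http
    server octoprint1 127.0.0.1:5000

backend webcam
    mode http
    server webcam1 127.0.0.1:8080
```

A real config also needs a `defaults` section with timeouts; this fragment only shows the routing relevant to the thread.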
I am sorry about the vanished logs. I was not aware of this. I used the tool
Ok, that's the tornado version that should be installed, just as expected. So it's not that.
And your setup (btw, that description up there is sufficient, thanks; no need for a full-fledged network plan with IP ranges :)) doesn't sound that weird either.
What somewhat troubles me is the
In any case, considering that @Salandora is seeing the same (or at least a virtually indistinguishable) behaviour but does not have that
Of course, it's still impossible for me to reproduce this issue on a whim. I only saw it a couple of weeks ago during a semi-long print, and once I tried to look into it further it of course stopped showing up again. That makes pinning this down extremely hard. It might be a timing issue/race condition somewhere.
Small update. It might actually be the
I've pinned tornado to 4.0.2 (soon to be pushed to
It would be interesting, though, once 1.3.0rc2 is out, if you could test whether you can still observe the issue with that.
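For reference, pinning a dependency to an exact release the way described above is done with a pip version specifier; a sketch of what such a pin looks like (the file name is just the usual pip convention):

```
# requirements.txt: pin tornado to the known-good release
tornado==4.0.2
```

With the pin in place, `pip install -r requirements.txt` will refuse to pick up a newer tornado that might reintroduce the behaviour.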
Not sure if this is 100% related or it's the actual same issue. I just posted this on the TouchUI GitHub:
"Not sure who to blame here, but I'll try it here first since it mainly happens on TouchUI LOCAL pi+touch display....
Thing is, when prints finish overnight and the printer stays idle for a long, long time, the raspi is laggish as hell and takes ages to respond to touch commands...
Happens on both pi2 and pi3. Restarting the service seems to solve the issue, so I wonder if there's some sort of memory hog increased by the idle status, like "terminal" log buildup in MEMORY instead of "disk" cache, that makes the Pi go slow as hell??
Obviously, the Pi reacts fine when accessed remotely via a computer in the LAN."
And I'm on the latest stable 1.3.0 master :(
EDIT: Just to add, once again, that I notice this only on the LOCAL Pi + touch display + TouchUI. And it does not completely freeze; it just lags big time.
OK, I finally got my printer running again. Yay!
I will close this issue and will reopen it in case the problem reappears.