New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Webfrontend freezes after some time #1523
Comments
Nothing at all in the Javascript console? That sounds wrong, it will at the very least log framework initialization messages there. Could you please take another look when it reproduces? The Javascript console and also the network tab will probably be crucial to getting to the ground of this. Also please clarify: while you did have the tab open, did you also have it focused? OctoPrint will detect if the tab is focused or not and at disable e.g. the gcode viewer updates from happening when it isn't (since those are quite expensive client side). That wouldn't explain the lack of general updates, but it would still be interesting to know if that was the case or not. Looking at your log:
This looks like something went seriously wrong in sending data from the server via the websocket to your browser. This message gets produced when an exception is triggered in the underlying sending code and usually the message of said exception should be logged right after the How reliably can you reproduce this? And how's your client and server connected to your LAN, cable or wifi? |
OK, just for clarification: You suggest to use the Webdeveloper tools in firefox, right? I will let it run once my pi is online again and I am printing. This might happen this evening or tomorrow. The tab was in background. I had it open to be able to abort in case of a problem with the printer. I was surfing the net while it was running. I would guess that a closed websocket will not lead to a complete breakdown in python, will it? I would rather expect to get an exception with the possibility to handle (and log) it. Or am I mislead? The server is in my LAN (cable). The named IP is a connection from outside using openvpn there. But the same problem arises only in the LAN (I have to admit using my laptop with WiFi). I could go with a cable to the laptop. Further I wand to tell that most of the time I have a SSH connection to the server open (through the VPN) and it is configured to disconnect on ping errors. Thus the network seems reliable to me (also no lag or something). |
Yes
No, but since the websocket is the transport used to push updates to the frontend it will lead to the updates stopping. You also should get an offline dialog when the client realizes that the socket was closed, and since you didn't get that apparently something happened causing the communication to stop getting through but with the socket still appearing open to both sides. Usually it goes like this: socket goes down, both sides realize that, python server logs a message to the log ("connection closed"), javascript client triggers the "Server is offline" overlay (with a short delay to keep it from flickering on unreliable connections) and starts attempting to reconnect. What appears to have happened here is that the server ran into some hard trouble sending the usual updates to the client at first but didn't detect anything wrong with the connection, the client still stopped receiving updates though but didn't detect anything wrong with the connection either. Since the server logged neither "client closed" nor further warnings or even errors after the two quoted ones, I'm leaning towards the issue lying with the client in your browser here. I remember someone else running into similar issues a while back, but they appear to have resolved with 91b5fc6 - that's present in 1.2.16 however, so it's probably not the same reason. So something is fishy here and any logs or network traces from the client side might help to clear it up. |
As a side note, I have never seen that happen myself, and I have OctoPrint tabs whose istances are busy printing open pretty much constantly. Hence I need as much data from your side as possible to have even a remote chance to maybe understand what's happening here :) |
If you like I can also let wireshark sniff all the traffic if the data is interesting for you... |
Might give us some additional insights :) Right now anything might give the final clue as to what is happening. I just sincerely hope it's not another case of "browser decided to do weird stuff with the web socket connection after an update" :/ |
Might help bring some clarity for #1523
@christianlupus Forgot to mention, I also pushed a commit to the loggers:
octoprint.server.util.sockjs:
level: DEBUG If you could reproduce the same issue on |
Hello, |
OK, I tried my best but I was unable to do an export of the javascript console. First here you can have a wireshark sniff from the client filtered for only the relevant IP. Further a plain text copy of the logger in firefox with network and javascript enabled. In the ending you see three clicks to load a OUTDOOR file. This was not possible/not reflected by the GUI. Nevertheless these POST requests for the webserver were successful (?). For example the first of these led to the answering header, which indicates at least that the octoprint server is still responding, isn't it? The information is routed through haproxy as suggested by the wiki page. In the logs of haproxy I do not find anything interesting. Here you see, that the javascript logging is enabled and nothing is shown: The recent octoprint.log can be found here. OK, I think this all information I can directly give to you. If you need more, please tell me. PS: My version is 6c21ee3. |
Hm. In the recent octoprint.log, there's not the same error sending a message to the client that we saw earlier (and which I upped the logging for to see details on in the first place in the form of a stack trace). But the updates still stalled? If so that would be bad because it means that the errors when sending a message to the stalled client in question that we saw in the earlier log are unrelated to the issue after all. Which means we'd be back at square one. As a side note, it looks like you modified I'm not surprised that you could still issue POST requests - what's going over the board here appears to be only the push connection (which is what pushes the progress data and all that stuff to the frontend). Clicking buttons and all that goes over the regular API however, the websocket is only used for pushing "real time" (more or less) data from the backend to the frontend. I guess the next step will be to up the logging on the frontend, to try to see what is happening there. But first I'll try if your PCAP file (thanks for that) doesn't contain some hint. |
* Log heartbeat messages * Log disconnects including error code * Log when messages are received * Log when messages are NOT received after 5s
I sadly was not able to glimpse anything from the pcap file so far, but I made a branch with some extra logging in the client for you, That branch logs to the javascript console when
Should look somewhat like this: Please test if you can reproduce the issue with that branch (the commit linked above is the only thing that makes that branch differ from 1.2.17rc1 btw) and if so provide a similar screenshot. It's important that the line "Didn't get a message from the backend for the past 5 seconds, that's unusual!" is on the screenshot then (it's not part of my own screenshot above) - well, and if that's not to be found, that's some interesting info as well! What I want to see is what happens right before (any weird message/heartbeat/close behaviour or something) and also what happens after (are there still heartbeats etc). |
Hello again, I updated to the named branch and added one more configuration update: I inserted the webcam. Now I had the web frontend open for more than 2 hours and no problem arose. Now I go back and unconfigured the webcam. This way I can see if the webcam made any difference. Unfortunately I cannot give any more information for today as I have to leave soon and it is unlikely that the problem arises again in the remaining time. |
@christianlupus I've since seen this too (once or twice) but haven't been able to figure out much more. @Salandora appears to be able to reproduce it somewhat reliable, but it so far vanishes once debug probes are inserted. Once thing we noticed in your pcap file is that the socket runs in PS: Your log files on paste.debian.net vanished - my guess is they were auto deleted. In the future better use a more reliable pasting service such as gist.github.com |
Hello again, Regarding the setup: First, I do not use OctoPi. I use a plain raspbian image and installed octoprint inside. The server is proxied via the proposed haproxy. The octoprint server listens on port 5000 and haproxy is configured to listen on port 80 and redirecting everything except for some single path prefixes to the port 5000 on localhost. I am sorry about the vanished logs. I was not aware of this. I used the tool |
Ok, that's the tornado version that should be installed, just as expected. So it's not that. And your setup (btw, that description up there is sufficient, thanks, no need for a full fledged network plan with ip ranges :)) doesn't sound that weird either. What somewhat troubles me is the In any case, considering that @Salandora is seeing the same (or at least a virtually indistinguishable) behaviour but does not have that Of course, it's still impossible for me to reproduce this issue on a whim. I only saw it a couple weeks ago during a semi-long print, and once I tried to look further into it it of course stopped showing up again. That makes pinning this down extremely hard. Might be a timing issue/race condition somewhere |
I mainly walked along the wiki of octoprint to configure haproxy. I only added one additional prefix for lighttpd for other printer-related issues. This should not be the difference here. haproxy is version 1.5.8-3+deb8u2 in dpkg. |
I agree. |
Small update. It might actually be the I've pinned tornado to 4.0.2 (soon to be pushed to It would be interesting though if once 1.3.0rc2 is out if you could test with that if you can still observe the issue. |
Not sure if 100% related to this or it's the acual same issue. I just posted this on TouchUI github: "Not sure who to blame here, but I'll try it here first since it mainly happens on TouchUI LOCAL pi+touch display.... Thing is, when prints finish overnight and printer stays idle for a long long time, raspi is lagggish as hell, and takes ages to respond to touch commands... Happens on both pi2 and pi3. Restarting the service seems to solve the issue, so i wonder if there's some sort of memory hog increased by the idle status, like "terminal" log buildup on MEMORY instead than on "disk" cache, that makes Pi go slow as hell?? Obviously, Pi reacts fine when accessed remotelly via a computer in the LAN." And I'm on the latest stable 1.3.0 master :( EDIT: Just to add, once again, that i notice this only in LOCAL Pi+Touch display+TouchUI. And it does not completely freeze, it just laggggs big time. |
Not related 👍 But after reading this issue, I am wondering, @christianlupus is this resolved for you in 1.3.0?? |
@BillyBlaze I am not sure I have to rebuild most of my printer and need to wait for spare parts (yet ordered but not arrived). I broke the head due to a wrong assembly. Until then it does not make much sense to use Octoprint... I will report soon I hope. |
@christianlupus any chance to check? :) |
Not yet, sorry. Cable had wrong pinning. Reordered... |
OK, I finally got my printer running again. Yay! I will close this issue and will reopen it in case the problem reappears. |
What were you doing?
I was surfing in the net and had a tab with octoprint open.
What did you expect to happen?
The tab keeps updating information (like temperature, print status, etc) on a regular basis.
What happened instead?
On a longer print (1h or more I guess), the site simple stalls. No information is updated anymore. (Neither temp nor printing time, which should clearly increase.)
In this situation I also cannot use any of the GUI. This can be critical for example if I need to stop a wild printer from getting crazy and I have no control.
Only a reload of the browser page can reactivate the GUI and updates the information again. However this takes some time (depending on hardware and network connection) which is bad in case of trouble.
Branch & Commit or Version of OctoPrint
Version: 1.2.16 (master branch): 354042b
Browser and Version of Browser, Operating System running Browser
Firefox 49.0 under Ubuntu 14.04 and under 15.10
Link to octoprint.log
http://paste.debian.net/844935/
The issue aruse at 15:50 for sure with the machine 10.1.100.6
Link to contents of terminal tab or serial.log
http://paste.debian.net/844947/
Link to contents of Javascript console in the browser
There was nothing logged as far as I can tell. Potentially it was deactivated but I am unsure here.
Screenshot(s) showing the problem:
This is a screenshot of just now but the printer should be further in the process.
This is a screenshot a few minutes later after reload of the page:
You see the jump of about 15% and 45 min printing time.
I have read the FAQ and a I did not find anything related to my problem.
The text was updated successfully, but these errors were encountered: