Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Endless Resends after End of Print (perhaps related to a misunderstood command?) #470

Closed
cakeller98 opened this issue May 16, 2014 · 30 comments
Labels
needs information More information is needed to further process this issue or PR

Comments

@cakeller98
Copy link

commit: e7088ef
on Windows 8.1 Tablet

Something is happening at the end of my prints that is causing the endless resend loop, until the connect button is pressed 2x.

The first press returns this error:

just messing arond at the terminal I found that any malformed command results in Octoprint thinking it's not connected
The Disconnect turns to a Connect Button - though it has to be pressed 2x to actually reconnect as the first press will fail with the following error:

Changing monitoring state from 'Error: Format error
' to 'Closed'
Unexpected error while reading serial port: SerialException: 'Attempting to use a port that is not open' @ comm.py:_readline:986
Connection closed, closing down monitor
Changing monitoring state from 'Offline' to 'Opening serial port'
Connecting to: COM3
Unexpected error while connecting to serial port: COM3 SerialException: 'could not open port 'COM3': WindowsError(5, 'Access is denied.')' @ comm.py:_openSerial:949
Changing monitoring state from 'Opening serial port' to 'Error: Failed to open seria...'

After the second press of the connect button:

Changing monitoring state from 'Offline' to 'Opening serial port'
Connecting to: COM3
Connected to: Serial<id=0x23455d0, open=True>(port='COM3', baudrate=115200, bytesize=8, parity='N', stopbits=1, timeout=2.0, xonxoff=False, rtscts=False, dsrdtr=False), starting monitor
Changing monitoring state from 'Opening serial port' to 'Connecting'
Recv: start

...etc and back to normal operation

I don't know what's happening at the end of a print that's causing this, or how long AFTER the print has finished this is occurring, however you can initiate this exact error behavior by submitting any malformed command into the terminal (e.g. simply send a single character like 'a' or a space.)

My repetier firmware commit (before customization): 2b1c85411069154b71d7aa580b4185837516d126
My octoprint is at commit: e7088ef

@cakeller98
Copy link
Author

It appears, and this may be incidental, but appears the status changing to:
'Error: Format error
'
seems like there's something wrong... but maybe I'm wrong.

To manually recreate this behavior follow the steps demarcated with >>>

    >>> CLICKED THE CONNECT BUTTON 
    ...
Changing monitoring state from 'Offline' to 'Opening serial port'
Connecting to: COM3
Connected to: Serial<id=0x21a36f0, open=True>(port='COM3', baudrate=115200, bytesize=8, parity='N', stopbits=1, timeout=2.0, xonxoff=False, rtscts=False, dsrdtr=False), starting monitor
Changing monitoring state from 'Opening serial port' to 'Connecting'
Recv: start
Recv: Free RAM:1764
Recv: X:0.00 Y:0.00 Z:0.00 E:0.00
Send: N1 M105*38
Recv: SD init fail
Recv: ok 1
Changing monitoring state from 'Connecting' to 'Operational'
Send: N2 M21*18
Recv: T:0.00 /0 B:0.00 /0 B@:0 @:0
Recv: ok 2
Send: N3 M105*36
Recv: SD init fail
Recv: ok 3
    ...
    >>> many m105 commands removed for brevity <<<
    ...
Recv: ok 9
Recv: T:26.40 /0 B:27.44 /0 B@:0 @:0
    ...
    >>> ENTERED 'a' AND CLICKED SEND  
    ...
Send: N10 a*14
Recv: Error:Format error
Changing monitoring state from 'Operational' to 'Error: Format error
'
    ...
    ABOUT NOW "Disconnect" becomes a "Connect" button
    ...
Recv: 
Recv: Resend:10
Recv: ok
Recv: 
    ...
    >>> repeated over and over <<<
    ...
Recv: Resend:10
Recv: ok
Recv: 
Recv: Resend:10
Recv: ok
    ...
    >>> CLICKED THE CONNECT BUTTON
    ...
Changing monitoring state from 'Error: Format error
' to 'Closed'
Unexpected error while reading serial port: SerialException: 'Attempting to use a port that is not open' @ comm.py:_readline:986
Connection closed, closing down monitor
Changing monitoring state from 'Offline' to 'Opening serial port'
Connecting to: COM3
Unexpected error while connecting to serial port: COM3 SerialException: 'could not open port 'COM3': WindowsError(5, 'Access is denied.')' @ comm.py:_openSerial:949
Changing monitoring state from 'Opening serial port' to 'Error: Failed to open seria...'
    ...
    >>> CLICKED THE CONNECT BUTTON AND START CYCLE OVER
    ...

@cakeller98 cakeller98 changed the title Seems Repetier or Octoprint has a bad response to a misunderstood command Endless Resends after End of Print (perhaps related to a misunderstood command?) May 18, 2014
@cakeller98
Copy link
Author

OK Caught it before the repeating endless resend had filled the Octoprint terminal buffer:

Send: N111896 M105*17
Send: N111897 M105*16
Send: N111898 M105*31
Send: N111899 M105*30
Send: N111900 M105*31
Send: N111901 M105*30
Send: N111902 M105*29
Send: N111903 M105*28
Send: N111904 M105*27
Send: N111905 M105*26
Send: N111906 M105*25
Send: N111907 M105*24
Send: N111908 M105*23
Send: N111909 M105*22
Send: N111910 M105*30
Send: N111911 M105*31
Send: N111912 M105*28
Send: N111913 M105*29
Send: N111914 M105*26
Send: N111915 M105*27
Send: N111916 M105*24
Send: N111917 M105*25
Send: N111918 M105*22
Send: N111919 M105*23
Send: N111920 M105*29
Send: N111921 M105*28
Send: N111922 M105*31
Send: N111923 M105*30
Send: N111924 M105*25
Send: N111925 M105*24
Send: N111926 M105*27
Send: N111927 M105*26
Send: N111928 M105*21
Send: N111929 M105*20
Send: N111930 M105*28
Send: N111931 M105*29
Send: N111932 M105*30
Send: N111933 M105*31
Send: N111934 M105*24
Send: N111935 M105*25
Send: N111936 M105*26
Send: N111937 M105*27
Send: N111938 M105*20
Send: N111939 M105*21
Send: N111940 M105*27
Send: N111941 M105*26
Send: N111942 M105*25
Send: N111943 M105*24
Send: N111944 M105*31
Send: N111945 M105*30
Send: N111946 M105*29
Send: N111947 M105*28
Send: N111948 M105*19
Send: N111949 M105*18
Send: N111950 M105*26
Send: N111951 M105*27
Send: N111952 M105*24
Send: N111953 M105*25
Send: N111954 M105*30
Send: N111955 M105*31
Send: N111956 M105*28
Send: N111957 M105*29
Send: N111958 M105*18
Send: N111959 M105*19
Send: N111960 M105*25
Send: N111961 M105*24
Send: N111962 M105*27
Send: N111963 M105*26
Send: N111964 M105*29
Send: N111965 M105*28
Send: N111966 M105*31
Send: N111967 M105*30
Send: N111968 M105*17
Send: N111969 M105*16
Send: N111970 M105*24
Send: N111971 M105*25
Send: N111972 M105*26
Send: N111973 M105*27
Send: N111974 M105*28
Send: N111975 M105*29
Send: N111976 M105*30
Send: N111977 M105*31
Send: N111978 M105*16
Send: N111979 M105*17
Send: N111980 M105*23
Send: N111981 M105*22
Send: N111982 M105*21
Send: N111983 M105*20
Send: N111984 M105*19
Send: N111985 M105*18
Send: N111986 M105*17
Send: N111987 M105*16
Send: N111988 M105*31
Send: N111989 M105*30
Send: N111990 M105*22
Send: N111991 M105*23
Send: N111992 M105*20
Send: N111993 M105*21
Send: N111994 M105*18
Send: N111995 M105*19
Send: N111996 M105*16
Send: N111997 M105*17
Send: N111998 M105*30
Send: N111999 M105*31
Send: N112000 M105*21
Send: N112001 M105*20
Send: N112002 M105*23
Send: N112003 M105*22
Send: N112004 M105*17
Send: N112005 M105*16
Send: N112006 M105*19
Send: N112007 M105*18
Send: N112008 M105*29
Send: N112009 M105*28
Send: N112010 M105*20
Send: N112011 M105*21
Send: N112012 M105*22
Send: N112013 M105*23
Send: N112014 M105*16
Send: N112015 M105*17
Send: N112016 M105*18
Send: N112017 M105*19
Send: N112018 M105*28
Send: N112019 M105*29
Send: N112020 M105*23
Send: N112021 M105*22
Send: N112022 M105*21
Send: N112023 M105*20
Send: N112024 M105*19
Send: N112025 M105*18
Send: N112026 M105*17
Send: N112027 M105*16
Send: N112028 M105*31
Send: N112029 M105*30
Send: N112030 M105*22
Send: N112031 M105*23
Send: N112032 M105*20
Send: N112033 M105*21
Send: N112034 M105*18
Send: N112035 M105*19
Recv: ok 111890
Recv: Set output 5 to 0
Recv: ok 111891
Recv: T:66.34 /0 B:63.05 /0 B@:0 @:0
Recv: ok 111892
Recv: T:66.34 /0 B:63.05 /0 B@:0 @:0
Recv: ok 111893
Recv: T:66.34 /0 B:63.05 /0 B@:0 @:0
Recv: ok 111894
Recv: T:66.34 /0 B:63.05 /0 B@:0 @:0
Recv: ok 111895
Recv: T:66.34 /0 B:63.05 /0 B@:0 @:0
Recv: ok 111896
Recv: T:66.34 /0 B:63.05 /0 B@:0 @:0
Recv: T:66.34 /0 B:63.05 /0 B@:0 @:0
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Recv: 
Recv: Resend:111897
Recv: ok
Changing monitoring state from 'Operational' to 'Closed'
Send: N112036 M105*16
Unexpected error while writing serial port: SerialException: 'Attempting to use a port that is not open' @ comm.py:_doSendWithoutChecksum:1109
Connection closed, closing down monitor

@foosel
Copy link
Member

foosel commented May 19, 2014

could you provide an example gcode file? I'd like to take a look at your ending gcode. Also a full serial.log of a print's end might shed some light onto the issue, so we see what's causing the loop to start.

@cakeller98
Copy link
Author

I will do as you ask... after the next print, and I will log serial for that.

In the mean time...
octoprint.log (omitted the beginning since it's literally more of the same first three lines here... request to resend, no sufficient blah blah :D

2014-05-18 22:06:58,671 - octoprint.util.comm - WARNING - Printer requested line 111897 but no sufficient history is available, can't resend
2014-05-18 22:06:58,862 - octoprint.util.comm - WARNING - Printer requested line 111897 but no sufficient history is available, can't resend
2014-05-18 22:06:59,072 - octoprint.util.comm - WARNING - Printer requested line 111897 but no sufficient history is available, can't resend
2014-05-19 01:03:58,996 - octoprint.server - CRITICAL - Now that is embarrassing... Something really really went wrong here. Please report this including the stacktrace below in OctoPrint's bugtracker. Thanks!
2014-05-19 01:03:59,000 - octoprint.server - ERROR - Stacktrace follows:
Traceback (most recent call last):
  File "C:\dev\python27\lib\site-packages\octoprint-1.2.0_dev-py2.7.egg\octoprint\server\__init__.py", line 214, in run
    IOLoop.instance().start()
  File "C:\dev\python27\lib\site-packages\tornado-3.0.2-py2.7.egg\tornado\ioloop.py", line 627, in start
    event_pairs = self._impl.poll(poll_timeout)
  File "C:\dev\python27\lib\site-packages\tornado-3.0.2-py2.7.egg\tornado\platform\select.py", line 63, in poll
    self.read_fds, self.write_fds, self.error_fds, timeout)
error: (10038, 'An operation was attempted on something that is not a socket')

and the error from the end of serial.log (Of course, I'll upload a file and give you a link shortly too :) )

2014-05-19 00:35:53,551 - SERIAL - DEBUG - Recv: T:22.78 /0 B:24.03 /0 B@:0 @:0
2014-05-19 00:35:54,526 - SERIAL - DEBUG - Recv: wait
2014-05-19 00:35:54,559 - SERIAL - DEBUG - Send: N8137 M105*26
2014-05-19 00:35:54,591 - SERIAL - DEBUG - Recv: ok 8137
2014-05-19 00:35:54,634 - SERIAL - DEBUG - Recv: T:22.64 /0 B:23.89 /0 B@:0 @:0
2014-05-19 00:35:56,658 - SERIAL - DEBUG - Send: N8138 M105*21
2014-05-19 00:35:56,765 - SERIAL - DEBUG - Serial timeout while writing to serial port, trying again.
2014-05-19 00:35:56,799 - SERIAL - DEBUG - Unexpected error while writing serial port: SerialException: 'WriteFile failed (WindowsError(5, 'Access is denied.'))' @ comm.py:_doSendWithoutChecksum:1113
2014-05-19 01:03:58,954 - SERIAL - DEBUG - Changing monitoring state from 'Operational' to 'Closed'

and directly from the terminal window:

  File "C:\dev\python27\lib\logging\__init__.py", line 875, in emit self.flush()
  File "C:\dev\python27\lib\logging\__init__.py", line 835, in flush self.stream.flush()
IOError: [Errno 22] Invalid argument
Logged from file comm.py, line 216
Traceback (most recent call last):
  File "C:\dev\python27\lib\logging\__init__.py", line 875, in emit self.flush()
  File "C:\dev\python27\lib\logging\__init__.py", line 835, in flush self.stream.flush()

...
478 lines removed (looks just like the ones above and below here)
...

Traceback (most recent call last):
  File "C:\dev\python27\lib\logging\__init__.py", line 875, in emit self.flush()
  File "C:\dev\python27\lib\logging\__init__.py", line 835, in flush self.stream.flush()
IOError: [Errno 22] Invalid argument
Logged from file comm.py, line 216
2014-05-18 18:44:01,456 - octoprint.server.util - INFO - New connection from client: 24.205.93.241
2014-05-18 20:48:17,947 - octoprint.server.util - INFO - Client connection closed
2014-05-18 20:48:19,598 - octoprint.server.util - INFO - New connection from client: 24.205.93.241
2014-05-18 21:20:38,207 - octoprint.server.util - INFO - Client connection closed
2014-05-18 22:03:54,697 - octoprint.server.util - INFO - New connection from client: 24.205.93.241
2014-05-18 22:06:49,819 - octoprint.util.comm - WARNING - Printer requested line 111897 but no sufficient history is available, can't resend
2014-05-18 22:06:50,013 - octoprint.util.comm - WARNING - Printer requested line 111897 but no sufficient history is available, can't resend
2014-05-18 22:06:50,223 - octoprint.util.comm - WARNING - Printer requested line 111897 but no sufficient history is available, can't resend

...
dozens of repeats
...

2014-05-18 22:06:58,671 - octoprint.util.comm - WARNING - Printer requested line 111897 but no sufficient history is available, can't resend
2014-05-18 22:06:58,862 - octoprint.util.comm - WARNING - Printer requested line 111897 but no sufficient history is available, can't resend
2014-05-18 22:06:59,072 - octoprint.util.comm - WARNING - Printer requested line 111897 but no sufficient history is available, can't resend
2014-05-19 01:03:58,996 - octoprint.server - CRITICAL - Now that is embarrassing... Something really really went wrong here. Please report this including the stacktrace below in OctoPrint's bugtracker. Thanks!
2014-05-19 01:03:59,000 - octoprint.server - ERROR - Stacktrace follows:
Traceback (most recent call last):
  File "C:\dev\python27\lib\site-packages\octoprint-1.2.0_dev-py2.7.egg\octoprint\server\__init__.py", line 214, in run IOLoop.instance().start()
  File "C:\dev\python27\lib\site-packages\tornado-3.0.2-py2.7.egg\tornado\ioloop.py", line 627, in start event_pairs = self._impl.poll(poll_timeout)
  File "C:\dev\python27\lib\site-packages\tornado-3.0.2-py2.7.egg\tornado\platform\select.py", line 63, in poll self.read_fds, self.write_fds, self.error_fds, timeout)
error: (10038, 'An operation was attempted on something that is not a socket')

@cakeller98
Copy link
Author

all the log files, a redacted g-code file (I can't upload all of it, as it's a proprietary thing, but has only the meat taken out, start and end are complete) - hope this helps.

https://github.com/cakeller98/Cakeller98Postings/tree/master/OctoPrintPosts/RepeatingResend

@cakeller98
Copy link
Author

Is it possible to enable the serial logging from g-code?

serial logging makes the print stutter, but I don't ever seem to be able to log the actual triggering event. closest I got was above where I missed the actual command by like 5 seconds! (and it fell out the back end of the log)

For diagnostics this would be pretty sweet if you could enable serial logging with a message from within the gcode file! :D e.g.

;OCTO_DEBUG(Serial_Logging, "Checking for X after printing completed, during cooldown"

serial loggin begins with a note like...

Serial Logging Enabled by G-Code command: "Checking for X after printing completed, during cooldown"

just a thought - maybe it's already possible :D

@cakeller98
Copy link
Author

Since I've been paying attention to this - it happens every single print near as I can tell. After some time, eventually at some point

2014-05-22 10:13:43,957 - octoprint.server.util - INFO - New connection from client: xx.xx.xx.xx
2014-05-22 10:16:39,138 - octoprint.server.util - INFO - Client connection closed
2014-05-22 10:18:09,257 - octoprint.util.comm - WARNING - Printer requested line 84154 but no sufficient history is available, can't resend
2014-05-22 10:18:09,444 - octoprint.util.comm - WARNING - Printer requested line 84154 but no sufficient history is available, can't resend
...
then again...
...
2014-05-22 13:11:39,446 - octoprint.server.util - INFO - Client connection closed
2014-05-22 13:11:45,180 - octoprint.server.util - INFO - New connection from client: xx.xx.xx.xx
2014-05-22 14:01:36,378 - octoprint.util.comm - WARNING - Printer requested line 44103 but no sufficient history is available, can't resend
2014-05-22 14:01:36,572 - octoprint.util.comm - WARNING - Printer requested line 44103 but no sufficient history is available, can't resend
2014-05-22 14:01:36,779 - octoprint.util.comm - WARNING - Printer requested line 44103 but no sufficient history is available, can't resend

this file - I can send you...
Download the gcode:an offending gcode file
Download the logfile:the log file containing the post of the above print

there was no serial logging in this log file - but... I can't really print with serial logging for expected reasons - stuttering..

@scp38
Copy link

scp38 commented May 25, 2014

In the log with the long list of resends, there is a long sequence of M105 commands in the beginning, with NO "ok"s in between. That means that maybe one command is taking very long to be processed by the machine. That is number 111890.
After that has been successfully finished, the next commands in the buffer of the printer are processed, i.e. number 111891 thru 111896.

And number 111897 seems to have caused a buffer overrun. It is most likely incomplete and that's the reason why the firmware requests a resend.

But this one has already slipped out of Octoprint's memory. However, at least part of the fix to get RepetierFirmware out of that state is to issue SOME command with that number. RepetierFirmware will not move on unless that happens.

Pressing Disconnect/Connect should restart the RepetierFirmware and lead out of the condition, too. Well, at least it has done it reliably on my system in the past, but it has stopped resetting the printer reliably somewhen recently with latest Octoprint (devel branch) updates on Pi. The resets now only work sometimes.

@cakeller98
Copy link
Author

I have now noted that this occurs after every print.

might I suggest that the buffer ought to PAUSE in case the oldest complete
command has not yet received an OK? or, perhaps better yet, extend the
actual buffer by 10% or some safety margin such that the history isn't
allowed to ever become invalid until it's never needed again?

EDIT: Perhaps Repetier firmware (all firmware?) needs an alternate acceptable response to a request for resend. e.g. if the host responds with "GFU" it would tell the firmware it's going to have to deal with never knowing what that command was, and... let's move on from here, because arguing about a missing command over and over again is well... isn't it that - "insanity is doing the same thing over and over and expecting different results" ? so.... somehow there needs to be able to graceful exit to this insanity!

@foosel
Copy link
Member

foosel commented May 27, 2014

This is pretty much what I'm working on for the communication layer
rewrite. I'm currently somewhat completely under water at the day job,
which is why I haven't reacted here.

@cakeller98
Copy link
Author

Cool... It's an inconvenience more than anything :). And at lesst now I
get what is going on more or less. ;).

On Monday, May 26, 2014, Gina Häußge notifications@github.com wrote:

This is pretty much what I'm working on for the communication layer
rewrite. I'm currently somewhat completely under water at the day job,
which is why I haven't reacted here.


Reply to this email directly or view it on GitHubhttps://github.com//issues/470#issuecomment-44235569
.

--- Chris

@scp38
Copy link

scp38 commented May 27, 2014

@cakeller98
For the time being regarding your proposal to make Repetier Firmware more smart: This is some draft code to replace the existing GCode::requestResend() function. After having sent out 6 Resend requests for the same line, it gives up and just takes the last received line as the one to move on from.

// *****************************************
void GCode::requestResend()
{
    static uint8_t countResend = 0;
    static uint32_t resendLineNumber = 0;

    countResend++;
    if ((countResend > 6) && (resendLineNumber == lastLineNumber))
    {
        commandsReceivingWritePosition=0;
        Com::println();
        Com::printFLN(PSTR("Warning: Line numbers not in sync! Resync'd!"));
        Com::printFLN(Com::tOk);
        countResend = 0;
        waitingForResend = -1;
        lastLineNumber = actLineNumber;
        return;
    }
    if (resendLineNumber != lastLineNumber)
    {
        countResend = 1;
        resendLineNumber = lastLineNumber;
    }

    HAL::serialFlush();
    commandsReceivingWritePosition=0;
    if(sendAsBinary)
        waitingForResend = 30;
    else
        waitingForResend = 14;
    Com::println();
    Com::printFLN(Com::tResend,lastLineNumber+1);
    Com::printFLN(Com::tOk);
}
// *****************************************

Works for me (see below), but please check if it suits your issue, too.

2014-05-27 20:15:22,001 - SERIAL - DEBUG - Send: N1 M105_38
2014-05-27 20:15:22,027 - SERIAL - DEBUG - Recv: Error:expected line 49 got 1
2014-05-27 20:15:22,035 - SERIAL - DEBUG - Recv:
2014-05-27 20:15:22,046 - SERIAL - DEBUG - Recv: Resend:49
2014-05-27 20:15:22,055 - SERIAL - DEBUG - Recv: ok
2014-05-27 20:15:22,063 - SERIAL - DEBUG - Changing monitoring state from 'Connecting' to 'Operational'
2014-05-27 20:15:22,072 - SERIAL - DEBUG - Send: N2 M21_18
2014-05-27 20:15:22,081 - SERIAL - DEBUG - Send: N3 M20_18
2014-05-27 20:15:22,093 - SERIAL - DEBUG - Recv: skip 2
2014-05-27 20:15:22,103 - SERIAL - DEBUG - Recv: ok
2014-05-27 20:15:22,114 - SERIAL - DEBUG - Recv: skip 3
2014-05-27 20:15:22,123 - SERIAL - DEBUG - Recv: ok
2014-05-27 20:15:22,290 - SERIAL - DEBUG - Recv:
2014-05-27 20:15:22,302 - SERIAL - DEBUG - Recv: Resend:49
2014-05-27 20:15:22,317 - SERIAL - DEBUG - Recv: ok
2014-05-27 20:15:22,491 - SERIAL - DEBUG - Recv:
2014-05-27 20:15:22,502 - SERIAL - DEBUG - Recv: Resend:49
2014-05-27 20:15:22,518 - SERIAL - DEBUG - Recv: ok
2014-05-27 20:15:22,694 - SERIAL - DEBUG - Recv:
2014-05-27 20:15:22,708 - SERIAL - DEBUG - Recv: Resend:49
2014-05-27 20:15:22,723 - SERIAL - DEBUG - Recv: ok
2014-05-27 20:15:22,894 - SERIAL - DEBUG - Recv:
2014-05-27 20:15:22,906 - SERIAL - DEBUG - Recv: Resend:49
2014-05-27 20:15:22,921 - SERIAL - DEBUG - Recv: ok
2014-05-27 20:15:23,094 - SERIAL - DEBUG - Recv:
2014-05-27 20:15:23,106 - SERIAL - DEBUG - Recv: Resend:49
2014-05-27 20:15:23,121 - SERIAL - DEBUG - Recv: ok
2014-05-27 20:15:23,297 - SERIAL - DEBUG - Recv:
2014-05-27 20:15:23,322 - SERIAL - DEBUG - Recv: Warning: Line numbers not in sync! Resync'd!
2014-05-27 20:15:23,332 - SERIAL - DEBUG - Recv: ok
2014-05-27 20:15:24,307 - SERIAL - DEBUG - Recv: wait
2014-05-27 20:15:24,315 - SERIAL - DEBUG - Send: N4 M105_35
2014-05-27 20:15:24,329 - SERIAL - DEBUG - Recv: ok 4
2014-05-27 20:15:24,363 - SERIAL - DEBUG - Recv: T:23.47 /0 B:23.19 /0 B@:0 @:0 T0:23.47 /0 @0:0 T1:23.33 /0 @1:0
2014-05-27 20:15:25,329 - SERIAL - DEBUG - Recv: wait
2014-05-27 20:15:25,346 - SERIAL - DEBUG - Send: N5 M105*34
2014-05-27 20:15:25,359 - SERIAL - DEBUG - Recv: ok 5
2014-05-27 20:15:25,393 - SERIAL - DEBUG - Recv: T:23.47 /0 B:23.19 /0 B@:0 @:0 T0:23.47 /0 @0:0 T1:23.33 /0 @1:0
2014-05-27 20:15:26,359 - SERIAL - DEBUG - Recv: wait

@cakeller98
Copy link
Author

While what you've done will end the loop. It doesn't protect instances where we need 7 or 8 retries, and don't want it to just give up after only 6 tries.

The way I see the issue is this:

Repetier receives data that it believes to be corrupt. Maybe it's corrupt, maybe it isn't. Whatever the case, in the condition we're discussing, the host has no way to tell repetier "hey, I can't give you what you want, so please stop asking". However Repetier should not assume this. Repetier, in fact, should simply have a mechanism to simply take the next command. It's similar to what you've proposed except formatted so the host gets to decide when enough is enough instead of Repetier "guessing" why the resend is happening.

Maybe I'm missing something - so... take what I've said here with a grain of salt.

@scp38
Copy link

scp38 commented May 28, 2014

Some weeks ago I have helped fixing a weird Refresh bug between Octoprint and Repetier. The issue was that when a Resend situation occurred, the protocol was going two commands ahead and then one back (type "Error:expected line 70 got 69"), making the printer stutter. The core issue there was, too, that initially a buffer overrun in the printer had happened. What we did was that we changed the code in a way that commands that had smaller numbers were not rejected any longer, but just skipped. But you are right, the protocol is rather strict and simple: It expects that all commands are sent in a row without a gap. This is for an easy reason: A perfect 3D-print requires that all commands are performed, in the correct order.

So it's up to the sender to make sure that all the commands are sent in a way that they can be reliably processed. IMHO that means that the sender
a) must be aware of the limited receiving buffer on the printer side and avoid flooding it and
b) handle the resends reliably in case there was a transmission problem

Before a command can be discarded from command history the sender must make sure it has got the confirmation from the printer that the command was successfully processed. That way we can't get into the situation any more that the command requested for resend has already slipped out of the command history.

Regarding your question if 6 retries in my code proposal were enough: Make it a dozen or even several dozens, anyway. In my experience I have seen that normally the Refresh is resolved with the first repetition. The 6 retries was to cover the rare occasions where the communication is very bad (but then the printer wouldn't work satisfactorily anyway.)
The only point of my code proposal is to break the Resend loop of the receiver to make it listen again to commands. In that situation information already has been lost. I'm not saying it will fit for all issues. Any feedback is welcome, of course.

Gina has told us that she's about to rewrite that part of the octoprint code. Let's wait what she will come up with. I'm happy to do beta-testing.

@cakeller98
Copy link
Author

You exactly made my point actually. 6, or 12 or whatever... basically if there is an error in the sending - there must be a way for the printer firmware to allow the host to notify the printer-firmware that it will be impossible to send a command the firmware will like.

Everything else I agree with - it is just that if the firmware has no escape route, then it either ends up in an endless loop as it does, or has to stupidly guess that it's ok to continue. I suggest that the 6 retries is fine, however it should not just accept the next command, it should send a challenge request. IOW - ok I reached my 6 retry limit, I'm not going to try and process this command, but I will notify the sender what line I was trying for, and that it did not go. If the sender (Octoprint) decides to resend the oldest command it has, or whatever it has, that's fine. It's up to octoprint to figure it out.

Hope that makes sense? - Basically fine, do what you suggested, but give notification that Octoprint can do something smart with. :D

Cool - I'm totally fine waiting to see what Gina comes up with. :)

@scp38
Copy link

scp38 commented May 29, 2014

if there is an error in the sending - there must be a way for the printer firmware to allow the host to notify the printer-firmware that it will be impossible to send a command the firmware will like.

Not quite sure if I understand your point. If the Firmware has sent a "Resend" request (this can be due to a checksum error, buffer overrun, incomplete command within a second or another format error) it will by default accept ANY command from the sender, provided that new command has the same expected line number. It could be a totally different command, the firmware will not care. Maybe that's the reason why the M110 command is not widely implemented. The sender is simply required to move on with the expected (same) line number and the next command it has decided to continue with.

ok I reached my 6 retry limit, I'm not going to try and process this command, but I will notify the sender what line I was trying for, and that it did not go.

Again not sure: The command that was not understood was the number contained in the Resend request. IMHO it's clear what caused the hiccup.

Basically fine, do what you suggested, but give notification that Octoprint can do something smart with.

Octoprint should be able to handle a Resend request on the very first occasion, which is not always the case with the current implementation. The situations are (according to what I know) when the requested command is no longer in the Octoprint history or if the communcation is out of sync from the beginning (both already discussed above).
The code i proposed should even not be required if the protocol does what I have tried to outline above.

Just my two cents worth. ;-)

@cakeller98
Copy link
Author

Thanks @scp38

Thanks for clearing that up... I apologize, I didn't realize repetier would take any correctly formatted line of the same number it expected in the request. No need to do anything in the firmware then, it's already doing enough. The host has to simply be smart enough to know, if it's getting a resend request and doesn't have that line... then warn the user, and issue ANY command at the correct line number and move on.

I thought repetier was somehow looking for the same line number with the same checksum. But that makes absolutely no sense now that you've explained it. So, the only thing that repetier host can count on is that the line number incremented from the previous line. If something goes wrong on a specific line, the firmware doesn't know what went wrong - only that what it got wasn't right. So the host only needs to send an intelligible command for that same line number. Therefore, repetier firmware doesn't need anything to be fixed.

@cakeller98
Copy link
Author

i would love to see a repeated error stop repeating. iow - if you see the same error x times, shutdown or pause. is that possible to do with events?

e.g. if status is operational, and we get a resend request, increment count. when count hits 5, disconnect and reconnect to printer.

? is that something that can be done with events?

foosel added a commit that referenced this issue Feb 20, 2015
Might be a good solution for people affected by #166, #470, #490 until commRefactoring branch is ready for prime time. No solution for #553 since that needs proper queueing and blocking command detection.
@foosel
Copy link
Member

foosel commented Mar 3, 2015

Please test if this still occurs with current devel

@foosel foosel added needs information More information is needed to further process this issue or PR and removed status:accepted labels Mar 3, 2015
@scp38
Copy link

scp38 commented Mar 3, 2015

Sorry, but I was impatient and patched my Repetier firmware in May last year. I haven't seen the issue since then, so I can't say if the issue is resolved with your change mentioned above.

What should this change do? Avoid flooding the firmware buffer with M105 temperature requests while waiting for the completion of a command that takes very long?

@foosel
Copy link
Member

foosel commented Apr 24, 2015

Among other things, yes.

@foosel foosel closed this as completed Apr 24, 2015
@foosel foosel reopened this Apr 24, 2015
@foosel
Copy link
Member

foosel commented Apr 24, 2015

Wrong button.

@foosel
Copy link
Member

foosel commented Jun 3, 2015

Closing since assuming this is now fixed by the extensive changes in the comm layer (no feedback, no choice)

@foosel foosel closed this as completed Jun 3, 2015
@mrb255
Copy link

mrb255 commented Aug 23, 2015

Sadly, I'm still getting this in 1.2.4. I first noticed it on an older version (whatever one that was bundled with my octopi image), then upgraded and it still reproduces. Oddly, I've never had a problem printing gcode produced by slic3r on octoprint (dozens of prints, no communication errors), only since I switched to cura for prime towers. Possibly also of note, the printer noticeably slows down and even stutters on curves when printing via octoprint, something that was much less pronounced before I switched from slic3r. It looks like the raspberry pi's cpu is pegged (version 1, b+).

I've tried to print the attached file 3 times and it always fails in about the same spot. It prints fine from the printer's sd card (Rostock max v2).

https://dl.dropboxusercontent.com/u/80114996/20140205_Marvin_KeyChain_Dual_Color_2_1_2.gcode
https://dl.dropboxusercontent.com/u/80114996/serial.log

@foosel
Copy link
Member

foosel commented Aug 24, 2015

Your serial log is truncated, the error might have started before the first line you provided or not, it's not possible to say from the excerpt. The starting point is of utmost importance however.

From what information is available, something fishy is definitely going on that looks like some off-by-one resend handling that started somewhere way up:

Send: N103256 G1 X15.161 Y6.676 E63.57619*91        // line N
Recv: ok 103256                                     // ok N
Send: N103257 G1 X15.288 Y7.368 E63.58917*91        // line N+1
Recv: Error:expected line 103257 got 103256         // got N but want N+1
Recv:                                               // empty line?!
Recv: Resend:103257                                 // resend N+1
Recv: ok                                            // that belongs to the resend
Send: N103257 G1 X15.288 Y7.368 E63.58917*91        // line N+1 again
Recv: ok 103257                                     // ok N+1
Send: N103258 G1 X14.767 Y7.413 E63.60218*85        // line N+2
Recv: Error:expected line 103258 got 103257         // got N+1, but want N+2
Recv:                                               // empty line?!
Recv: Resend:103258                                 // resend N+2
Recv: ok                                            // that belongs to the resend
Send: N103258 G1 X14.767 Y7.413 E63.60218*85        // line N+2 again
Recv: ok 103258                                     // ok N+2

What firmware that is? The ordering of the temperature output plus the line numbers on the oks makes me think it's Repetier, which might be the clue here since I recently learnt from @repetier that Repetier will send resend requests for the same fault repeatedly to make sure they are not lost to communication errors - OctoPrint currently can't handle that (it reads a resend request as something definitive it always needs to respond to and never may ignore, since that is how it needs to be done for every other firmware but Repetier) and will need to be adjusted to do so. It might be that a superfluous resend by OctoPrint for one of these "repeats" caused everything to go out of sync here - the firmware only expected one reply to a resend it sent multiple times, but got multiple instead and then host and firmware started to continously misunderstand each other. But we can only be sure that is the reason by getting a full serial.log.

Another thing, regarding your problems with "stuttering", you didn't say if you explicitly enabled the serial logging for hunting down this issue or if you always have it enabled. If the latter, disable it (once we have tracked this issue down). Writing stuff to the file system is somewhat slow on the Pi, and doing that for every single line sent to and received from the firmware makes everything stutter, hence the big "warning" next to the serial log checkbox in the settings.

Considering that the whole communication layer has changed tremendously since that ticket here was created and hence any information in this ticket does not apply to the current code base any more, I'd also ask you to please open a new bug ticket for this. Please make sure to include all information asked of you.

@nophead
Copy link
Contributor

nophead commented Aug 24, 2015

Prints from Cura over USB stutter because it outputs very short segments if
the model contains them. Other slicers merge small segments to avoid comms
overload.

On 24 August 2015 at 08:35, Gina Häußge notifications@github.com wrote:

Your serial log is truncated, the error might have started before the
first line you provided or not, it's not possible to say from the excerpt.
The starting point is of utmost importance however.

From what information is available, something fishy is definitely going
on that looks like some off-by-one resend handling that started somewhere
way up:

Send: N103256 G1 X15.161 Y6.676 E63.57619_91 // line N
Recv: ok 103256 // ok N
Send: N103257 G1 X15.288 Y7.368 E63.58917_91 // line N+1
Recv: Error:expected line 103257 got 103256 // got N but want N+1
Recv: // empty line?!
Recv: Resend:103257 // resend N+1
Recv: ok // that belongs to the resend
Send: N103257 G1 X15.288 Y7.368 E63.58917_91 // line N+1 again
Recv: ok 103257 // ok N+1
Send: N103258 G1 X14.767 Y7.413 E63.60218_85 // line N+2
Recv: Error:expected line 103258 got 103257 // got N+1, but want N+2
Recv: // empty line?!
Recv: Resend:103258 // resend N+2
Recv: ok // that belongs to the resend
Send: N103258 G1 X14.767 Y7.413 E63.60218*85 // line N+2 again
Recv: ok 103258 // ok N+2

What firmware that is? The ordering of the temperature output plus the
line numbers on the oks makes me think it's Repetier, which might be the
clue here since I recently learnt from @repetier
https://github.com/repetier that Repetier will send resend requests for
the same fault repeatedly to make sure they are not lost to communication
errors - OctoPrint currently can't handle that (it reads a resend request
as something definitive it always needs to respond to and never may ignore,
since that is how it needs to be done for every other firmware but
Repetier) and will need to be adjusted to do so. It might be that a
superfluous resend by OctoPrint for one of these "repeats" caused
everything to go out of sync here - the firmware only expected one reply to
a resend it sent multiple times, but got multiple instead and then host and
firmware started to continously misunderstand each other. But we can only
be sure that is the rea son by g etting a full serial.log.

Another thing, regarding your problems with "stuttering", you didn't say
if you explicitly enabled the serial logging for hunting down this issue or
if you always have it enabled. If the latter, disable it (once we have
tracked this issue down). Writing stuff to the file system is somewhat slow
on the Pi, and doing that for every single line sent to and received from
the firmware makes everything stutter, hence the big "warning" next to the
serial log checkbox in the settings.

Considering that the whole communication layer has changed tremendously
since that ticket here was created and hence any information in this ticket
does not apply to the current code base any more, I'd also ask you to
please open a new bug ticket
https://github.com/foosel/OctoPrint/wiki/How-to-file-a-bug-report for
this. Please make sure to include all information asked of you.


Reply to this email directly or view it on GitHub
#470 (comment).

@foosel
Copy link
Member

foosel commented Aug 24, 2015

Not all prints sliced by Cura necessarily stutter over USB however, and worse than the small segments rendered by Cura is leaving the serial log enabled on a Pi1 while printing something semi complex. Hence the above hint ;)

@mrb255
Copy link

mrb255 commented Aug 24, 2015

Hmm, I might have to reduce the stl's poly count then. But to answer your questions foosel, yes it is repetier (seemecnc's customized version 0.91, further customized by me for cyclops support). And yes, I do get the stuttering without serial logging a bit, but it is far far far worse with logging. Without logging, it just slows down a bit and sounds different. I'll open a new ticket when I have a full serial.log to post. Thanks!

@foosel
Copy link
Member

foosel commented Aug 24, 2015

You might also want to try current devel, I added a new setting today that makes OctoPrint ignore repeated resend requests for the same line. I also could reproduce the exact behavior you got by faking repetiers resend behavior, so chances are high that problem is solved now too. I'll also add it to the upcoming 1.2.5 release.

@mrb255
Copy link

mrb255 commented Aug 26, 2015

So repetier is the ie6 of the 3d printing world? :P
I reprinted that same gcode I posted and the devel branch, and it completed successfully. I did get the format error at one point (line 7953 of the attached log), but it looks like that resolved itself. Here's the copy I made of the serial log before it reset itself. I'll open a new bug if this crops up again.

https://dl.dropboxusercontent.com/u/80114996/serial_devel.log

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 30, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
needs information More information is needed to further process this issue or PR
Projects
None yet
Development

No branches or pull requests

5 participants