Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Repetier Resend Request #1015

Closed
wants to merge 2 commits into from
Closed

Conversation

Salandora
Copy link
Contributor

I noticed that the Resend Request of Repetier weren'T handled correctly after a short look I noticed that only the new resend way was getting recognized but not parsed correctly.
I added the missing Resend parsing.

@Salandora
Copy link
Contributor Author

Alright after testing this 2 prints went through without problems.
Only thing I'm not sure about is if the Format error Repetier send will kill this fix which is why I would like to discuss if it is useful to add the Format error to an exception list any suggestions?

@foosel
Copy link
Member

foosel commented Aug 7, 2015

Can you provide an example log output of what Repetier sends in case of a Resend request? I'm a bit puzzled that it didn't get caught by the existing mechanism. The processing in place so far should match all of

rs 23
rs N23
rs: 23
rs: N23
Resend 23
rEsEnd N23
Resend: 23
...

unless I'm mistaken in reading the code, so I'm a bit surprised that apparently that wasn't enough.

Also, what Format error?

@Salandora
Copy link
Contributor Author

Repetier send "Resend:12345" without a space between : and 12345 and it also is not caught by the if "rs" in line

@Salandora
Copy link
Contributor Author

Well as I understand the Repetier code right it tries 3 times to get the command resended if this fails it says the command has a Format Error.

@foosel
Copy link
Member

foosel commented Aug 7, 2015

The code should turn the : there into a though so that thesplit works:

line.replace("N:", " ").replace("N", " ").replace(":", " ").split()[-1]

So a Resend:23 should in fact be turned into a Resend 23 and the last part of that would then be taken as the line number, which should work. The if "rs" in line part in the exception handler should only be necessary for firmwares which send things like rs <line number> <some> <more> <stuff>.

Something in that line apparently doesn't work like it should when encountering a resend request from your repetier version and hence necessitates your change. I'd prefer to first understand what is really happening here, I fear it might lead to other problems otherwise. If you can get around to reproducing a resend error somehow (I know that can be tricky, stuff never fails when you need it to ;)) and add the log here that would probably help to get to the root cause.

Let us discuss the Format Error stuff in another issue/on IRC on Monday and tackle this resend business here first, one problem at a time ;) From the sounds of it however I'd expect the firmware to reset its internal "I'm not receiving that line properly" counter once the resend succeeds, so the error should only occur when stuff really runs amiss, in which case a print abort with all consequences is valid behaviour.

@Salandora
Copy link
Contributor Author

Hmm you are right this line should capture it... could it be because of the -1 instead of 1?

@foosel
Copy link
Member

foosel commented Aug 7, 2015

-n as index in python tells it "nth element from the end of the list". So just the last element really for -1. Which from how I understand your resend line looks should actually work, unless there's some other stuff following the Resend:12345, in which case it needs some more intelligence similar to the rs <line number> <and> <some> <other> <stuff> scenario.

@Salandora
Copy link
Contributor Author

Okay I checked the source the full Resend line

"Resend:12345\r\n"

Will try that line with the parsing in the python interpreter later this day.

@foosel
Copy link
Member

foosel commented Aug 7, 2015

Already tried:

>>> line = "Resend:23\r\n"
>>> int(line.replace("N:", " ").replace("N", " ").replace(":", " ").split()[-1])
23

Something's not right here. We need a log I fear.

@Salandora
Copy link
Contributor Author

Think so too, tried too because I didn't saw your response... Will try to capture a Resend request gimme a moment

@repetier
Copy link

repetier commented Aug 7, 2015

Not sure if this is important here, but Repetier always requests resends like this:

Resend:23
ok

ok is without line number here even if line numbers are enabled for ok.

@foosel
Copy link
Member

foosel commented Aug 7, 2015

@repetier thanks, that looks exactly like what I would expect it to look like and what should cause no problems to parse. OctoPrint will react to the Resend line here, extract the requested line, resends that if possible and then skip over the next ok it sees (to not cause that to send another line).

@Salandora
Copy link
Contributor Author

... great I have forced a Resend request (Wrong checksums are great hehe), Well the Resend parsing works great so my "fix" is a piece of shit... But that doesn't explain why the Resend failed for me here...

@foosel
Copy link
Member

foosel commented Aug 7, 2015

so my fix is a piece of

Aww come on. It isn't, it just is fixing the wrong problem. It probably makes sense to log any exceptions that occur while trying to parse the resend request, at least if the rs branch isn't hit, my guess is that is what happened for you. Some hiccup on the line causing an exception to be raised here which was then silently thrown away. So that should in fact be looked into.

@Salandora
Copy link
Contributor Author

I'm already looking into because this bug crashed 3 of my prints xD It's something personal now hehehe

@Salandora
Copy link
Contributor Author

Worst case as always can reproduce the bug with serial logging enabled... I hate sporadic bugs... But I believe it could also be the firmware trying to investigate further...

@Salandora
Copy link
Contributor Author

Alright I got a truncated serial.log now, sorry for that but I had to disable kommunikation logging to force this bug: https://gist.github.com/Salandora/4b1daa80f091e2e4c826

Summary:

  1. OctoPrint sends a line
  2. Repetier answers with "Error:Format Error" and Resend Request
  3. OctoPrint closes connection because of Error:
  4. Repetier asks for the line till I reconnect to it.

For me it now looks like a Firmware bug, it might be I'm sending to fast for Repetier to handle it.
Dunno...
I'm running on a Due (Programming Port, native is not working sudden death. dunno why)

@foosel, @repetier what are your opinions on this? Could it be a Firmware bug?
Is it safe to add Format Error to the exception list so OctoPrint is not closing the connection?

@repetier
Copy link

First repetier uses error prefix to tell something is wrong, not to stop communication. A error is a message a user normally want's to see in log so he knows why things go wrong. That is also the way my server and host handle errors. If firmware needs to stop it can call kill to refuse new commands or reset causing a new "start" line to appear which also should reset ongoing prints.
The Format Error normally occurs if a command has no G, M or T in it's line.

A resend request can get lost by transmission error making it "Reend:23344" or something similar. For that reason it is repeated after a time (0.2s I think it is) so it can make sure that host always will react correctly. With bad timing it can of course get tricky if you resend a line and get a resend request in parallel.

@foosel
Copy link
Member

foosel commented Aug 21, 2015

repetier uses error prefix to tell something is wrong, not to stop communication [...] If firmware needs to stop it can call kill to refuse new commands or reset causing a new "start" line to appear which also should reset ongoing prints

That is a fairly individual interpretation of the "protocol" though. Once upon a time an Error was the way a firmware would tell the host "Oh no, something horrible happened, I'm stopping everything and shutting down now" (see also the definition of what !! is supposed to signal). For everything else there's echo: or //. Then someone started mixing unrecoverable errors and recoverable ones in firmware replies, without any clear pattern. Which leads to the somewhat problematic scenario for a host software - if it sees an Error, is it recoverable or not? OctoPrint rather errs on the safe side, interprets anything it doesn't know as unrecoverable and drops the connection and the print job. The reasoning being simply that if someone decides to change the error message reported for a runaway heater or cold extrusion, OctoPrint still won't try to continue.

So, your Format Error simply needs to be added to the whitelist of recoverable communication errors. Any other undocumented recoverable messages like that in your firmware we should know of? ;)

A resend request can get lost by transmission error making it "Reend:23344" or something similar. For that reason it is repeated after a time (0.2s I think it is) so it can make sure that host always will react correctly

So you are basically spamming the host and leave it up to it to weed through the garbage :/ How should a host distinguish between the repeated Resends caused by one misreceived line or a misreceiving of the same line multiple times (which can happen just as easily)? E.g.

>>> N23 G1 X20.3 Y49.2 E39.33*87
<<< Error:Format error
<<< Resend:23
<<< ok
>>> N23 G1 X20.3 Y49.2 E39.33*87
<<< Resend:23

My guess is that you always prefix your "new" resends with additional Error lines, but that basically spreads a resend request across three lines now (the error, the actual resend and the final ok) which makes resend handling needlessly complicated for the host, especially if said host tries to stay compatible to all the other firmwares out there that just send one single Resend/ok pair and are done with it. What the host basically needs to do is "Oh, an Error, let's see if a Resend follows, ah, it does, let's ignore the error and just resend" and "Oh, a Resend request, now I either need to ignore that since it's a resend of the resend or I need to process it throws a coin". The first part could be done transparently, the second part however means yet another configuration option the user has to set (no, M115 is a mess).

I see the point about the Resend request becoming garbled on the way being a problem, but I don't think the spamming approach is the correct solution to this problem. Checksumming the responses might. Or just having users run into problems here and finally fix there cabling to make their comms more reliable.

Finally, I understand the reasoning behind these decisions, but reinterpreting the "protocol" in such a way is a bit tricky if you don't also document it along side your firmware. If you care about interoperability that is ;) You are basically creating your custom sub-protocol here that is incompatible with the "established standard" (however idiotic that might be at places).

@Salandora
Copy link
Contributor Author

Closed because of new PR #1032

@Salandora Salandora closed this Aug 21, 2015
@Salandora Salandora deleted the fixResend branch August 21, 2015 14:09
@repetier
Copy link

Well an error is something that went wrong. It does not say you need to disconnect per definition. And regarding standards you know good enough that there is no real common standard. When I started with the firmware there was no Marlin and no echo: at all and so repetier copied that behavior. Later Marlin started to prefix it with echo: but is it a standard if Marlin changes behavior?

Disconnecting is a bit hard I find, especially since firmware continues to work. I agree that stopping a running print for most errors is normally a good thing since we can expect that the print has failed.

Regarding your whitelist I'd put these recoverable com errors into the whitelist if not already done:
Error:Wrong checksum
Error:Format Error

Regarding resend logic, I use the following for repetier firmware (I connect with M115 to detect firmware type or let user define it in case of server):
On first resend set ignoreResend to line and ignore counter to 7. On every subsequent resend with the same line I decrease counter until it is down to 0 then I accept it again. This is quite heavy, but we normally send several commands in parallel adding to communication complexity.

@foosel
Copy link
Member

foosel commented Aug 23, 2015

I was not referring to Marlin in my statement. I can't find where I read that though at the moment, but it was a quite succinct definition along the lines of "Error only if broken beyond repair". Might have been the source of Sprinter or Teacup, or something else. Considering the above stated problems though it made sense and I still think there needs to be some kind of definition with regards to what is recoverable and what isn't, and that a (user extendable) white list is the only "safe" approach until something like that gets established. I agree though that the disconnect should be reconsidered.

@Salandora pushed a commit (see above) that adds the format error to the whitelist. Anything with the word "checksum" in it was already whitelisted.

I connect with M115 to detect firmware type

For what string in the output of that one can I be absolutely certain I'll always be able to detect your firmware? I've had bad experiences with the reliability of the response of that one, which is why I'm asking. If you gave me something dependable to work here, that would help a lot.

@repetier
Copy link

You are right that M115 is not that safe. Host uses the word "repetier" in the lowercased string. But of course I already had companies renaming the M115 string into printer name and wondering why it does not get detected. So for the server I now let user enter the firmware type. This also has some advantages since I have a xml firmware definition file for each firmware. So I have less strings to compare and can adjust to different behavior and can also use command definitions in cases where it is different for different firmwares. Makes the whole thing much more portable once you have a processor that converts such a xml file into parser and sender.

I agree that a distinction would be great to have. Didn't java have log level fatal for not recoverable errors? But as always we can not fix old stuff, so we need to take care of that in newer software. Until now I was letting prints continue even with errors knowing that not all errors are fatal ones, and now you make me think.

I really would not disconnect since many errors are not that harmful like moves outside print area catched by firmware. This lets firmware continue but of course aborting a print might be a good idea if this happens. Errors about failed heaters also ruin the print and should aborted. Here if I remember right marlin ignores anything but M999 while repetier-firmware continuous without extrusion. And I also saw kill being called on Marlin and do reset as well in very severe states, so disconnect is a bit of overcompensation, but that is your software so your decision. And again you are right that blacklisting will never work. We can not predict future additions even if we would try to find all existing error messages. So I guess if I start stopping on errors I will also use a whitelist as protection.

@foosel
Copy link
Member

foosel commented Aug 23, 2015

Host uses the word "repetier" in the lowercased string

Ok, that's something then.

I already had companies renaming the M115 string into printer name and wondering why it does not get detected

I feared as much. If only use it as a guess basically, overridable by the user.

I have a xml firmware definition file for each firmware

Hah. Just the other day I was talking with @Salandora on IRC about adding something like that to OctoPrint, for redefining white lists and such stuff in a portable way. Didn't think about xml though, I don't like the overhead of that so much to be honest. I was more thinking into the direction of YAML.

I agree that a distinction would be great to have. Didn't java have log level fatal for not recoverable errors?

Yeah. I would lean more towards error codes though. Something like the http approach maybe. And going with two letter prefixes (ok, !!, rs) instead. Man... I really need to find the time again to get back on that reprapcode definition stuff. The last month's have been nuts.

So I guess if I start stopping on errors I will also use a whitelist as protection

If so, I can tell you that OctoPrint currently has these words on it: "line number", "expected line", "checksum", "format error" (those are usually followed by a resend request) and "volume.init", "openroot", "workdir", "error writing to file". That lost can definitely reduced further when only prints are cancelled as a consequence instead of performing a disconnect to include only the first (communication problem indicating) block

@repetier
Copy link

I used XML as it is a bit better readable with attributes (which YAML doe snot have) that can later be extended. Never used YAML myself so it might be more of being more used to it. Concerning the parser you are fully right. Compared with JSON which I use for communication, xml is a beast I also like to avoid and for simple xml files I have nice classes to simply read them out.

If you outsource do not forget to put some things like "does ok after resend" into it, so you can reduce config options obsolete once you know the firmware.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 29, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants