Programming sync errors #50
Comments
Thanks for the detailed report! Can you confirm that you are using v1.0.3 of esp-link? I've looked at similar traces and tweaked things back and forth. The avrdude sync code leaves a lot to be desired; for example, it doesn't deal with late sync responses coming in, as you've noted.
I don't really agree with "The first received sequence (0x30;0x20) is sent to the arduino. (It is sent very short after the reset, and can be considered lost in almost all situations because the MCU has not booted yet at this moment)" because optiboot does come up in less than a millisecond, if I remember correctly. On the time-scale of your trace that's going to look like "at the same time". There is a delay of 1ms after toggling reset (https://github.com/jeelabs/esp-link/blob/v1.0/serial/serbridge.c#L235), which I thought was enough, but I can try it again myself as well. One thing I'm curious about is the detailed timing between the reset line, the actual reset pin on the arduino (there may be A/C coupling which can delay the rising edge), and the first character arriving.
WRT your overall set-up: there is a timing constant in optiboot which determines how much time the programmer has between an optiboot response and the next programming command. By default this seems to be 500ms and it applies to every single command. If at any point the programmer doesn't send the next command within that time-frame, the programming sequence ends and the loaded sketch is run. With your long-distance set-up you may well find that it's difficult to meet this constraint reliably. I'm not saying it's impossible (I don't know what your network connectivity is, for one), just mentioning it as something to look out for. Actually, since you're already compiling your own version of optiboot you might as well increase this; however, it also increases the time optiboot waits at boot before running your sketch, so if you need that to be short then this won't work for you.
In terms of timing, right now esp-link does not interfere in any way with the data transmitted, and I'd like to keep it that way if at all possible. I could increase the initial delay after reset, but if it exceeds ~10ms I need to change the code quite a bit 'cause that starts to interfere with wifi operation.
65ms is more than a busy-wait will accommodate. This brings me to the question of why there is not a prompt reply to the second sync attempt (labeled 3 in your trace).
The version of esp-link I used is the master tree code at commit f057a65.
Optiboot remarks
Looking at the code of optiboot I notice:
So this timing issue might be optiboot specific. I'll try with the default arduino bootloader over the weekend to see if it behaves differently.
Reset timing
Currently I only have the logic analyser screenshot (see below), but if you want, I can record it in the analog domain too with my scope. But since you toggle the reset from 1 to 0 and back, the rising edge of the reset should be fine afaik. Just like the 1-0 pulse 'passes' the capacitor, the 0-1 pulse will also 'pass' the capacitor and quickly restore reset back to a logical 1. As can be seen on the screenshot above, the reset pulse is low for 0.2091 ms. The time between the end of the pulse and the beginning of the serial TX (from ESP to arduino) is 1.124 ms (the time between markers A1 and A2 is displayed on the right side of the screenshot).
Remark: The reset channel is also measured on the reset pin of the arduino (just like TX and RX) and NOT on GPIO12 of the ESP module.
My network latency
Thanks for the remark about my overall setup. To get an idea, I just measured my round trip times. I have this small serial-echo sketch running on the arduino that is connected to the ESP:
I also have a small perl program running in my remote development environment that repeatedly sends 1 byte of data to the esp-link'ed arduino and waits for the response. Based on those full-loop round trip times I calculated some statistics (based on 10000 rtt measurements):
So occasionally there are some outliers up to 650ms, but in general the rtt is 14ms with a small standard deviation. A watchdog timeout might happen occasionally, but in general I would not expect it to be a real issue with this setup.
Conclusion & questions
Maybe my sync timing issue is mostly caused by the way optiboot blinks the led. I will test the default arduino bootloader during the weekend. Some questions:
FYI, I will work on this and want to say a big thanks for the detailed description. I also use optiboot but I was reminded of the AVR (and PIC) fuses which can delay the start-up time by varying amounts...
I'm now dropping the first sync bytes sent by the programmer on the floor. Let's see whether this improves things...
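To make the idea in the previous comment concrete, here is a minimal Python sketch of a filter that discards the first sync request seen after a reset. It is only an illustration, not esp-link's actual code (which is written in C): the class name `DropFirstSync`, its methods, and the assumption that the two sync bytes arrive together in one chunk are all hypothetical.

```python
class DropFirstSync:
    """Hypothetical bridge-side filter; NOT taken from the esp-link sources."""

    SYNC = bytes([0x30, 0x20])  # STK_GET_SYNC; CRC_EOP

    def __init__(self):
        self.armed = False

    def on_reset(self):
        # Call when the bridge has just pulsed the reset line.
        self.armed = True

    def filter(self, data: bytes) -> bytes:
        """Return the bytes that should be forwarded to the serial port."""
        if self.armed:
            self.armed = False
            # Assumption: the first sync request arrives as one chunk.
            if data.startswith(self.SYNC):
                return data[len(self.SYNC):]
        return data
```

With this sketch the first `0x30;0x20` after a reset is swallowed, and every later request, including the programmer's retry, is passed through unchanged.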
Ok. I'll test it as soon as I have some free time.
I'm closing this given that the built-in programming support works pretty well at this point.
Setup
My programming environment (where I run avrdude) is a Linux VMware host in a remote datacenter, connected to my home network via a VPN over a cable connection. The ESP module is associated with an access point on my home network. So the latency/jitter between the ESP and avrdude is a bit higher than it would be for a physical host on the same network as the ESP.
STK500 protocol
For a short overview see http://www.nongnu.org/pulsefire/apidocs/src-html/org/nongnu/pulsefire/device/flash/avr/Stk500Command.html. For the full specs see http://www.atmel.com/Images/doc2525.pdf.
For this explanation, only the following commands and parameters are relevant:
STK500 synchronisation mechanism
The programmer sends STK_GET_SYNC;CRC_EOP (0x30;0x20) repeatedly until it receives STK_INSYNC;STK_OK (0x14;0x10) from the arduino. At that moment synchronisation is complete. After initial synchronisation, avrdude starts querying the device (hardware/software version,...) by sending for example STK_GET_PARAMETER;Parm_STK_HW_VER;CRC_EOP (0x41;0x80;0x20). The arduino replies with STK_INSYNC;hardware version number;STK_OK (0x14;0x3;0x10). This interrogation continues for a while, and when all information is gathered and evaluated by avrdude, the programming sequence can start.
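The sync loop described above can be sketched in a few lines of Python. This is an illustration of the mechanism only, not avrdude's implementation: `get_sync`, the `max_attempts` limit, and the `FakeBootloader` stand-in (which ignores a configurable number of requests, as a board that is still resetting might) are all hypothetical names introduced here.

```python
STK_GET_SYNC = 0x30
CRC_EOP = 0x20
STK_INSYNC = 0x14
STK_OK = 0x10


def get_sync(send, recv, max_attempts=10):
    """Return the 1-based attempt on which sync succeeded, or None."""
    for attempt in range(1, max_attempts + 1):
        send(bytes([STK_GET_SYNC, CRC_EOP]))
        reply = recv(2)  # assumed to return b"" when nothing arrived in time
        if reply == bytes([STK_INSYNC, STK_OK]):
            return attempt
    return None


class FakeBootloader:
    """Hypothetical stand-in for the arduino: ignores the first `deaf` requests."""

    def __init__(self, deaf=0):
        self.deaf = deaf
        self.pending = b""

    def send(self, data):
        if self.deaf > 0:
            self.deaf -= 1  # request lost, e.g. the MCU is still in reset
            return
        if data == bytes([STK_GET_SYNC, CRC_EOP]):
            self.pending += bytes([STK_INSYNC, STK_OK])

    def recv(self, n):
        out, self.pending = self.pending[:n], self.pending[n:]
        return out
```

For example, with `boot = FakeBootloader(deaf=2)`, calling `get_sync(boot.send, boot.recv)` succeeds on the third attempt, because the first two requests are lost.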
The following snippet illustrates initial synchronisation and beginning of the interrogation sequence. This is part of the console output if you run avrdude with multiple -v options (-v -v -v -v -v -v -v -v).
Synchronisation errors
The following snippet illustrates an out of sync sequence (as seen by avrdude) I get very regularly:
In this snippet, avrdude receives the reply (0x14;0x10) to the 0x30;0x20 synchronisation sequence twice. Since avrdude does not expect this, we get a 2 byte mismatch between issuing a command and receiving the answer to the command. This causes the protocol error messages and finally the programming is aborted. In order to get a better understanding of how this happens, we look at the serial communication between the ESP and the arduino.
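The 2 byte mismatch can be reproduced with a small Python model of the programmer's receive buffer. It assumes, purely for illustration, that two sync requests both end up being answered, so that one extra 0x14;0x10 pair is still queued when the first real query is sent; the names `rx` and `read` are made up for this sketch.

```python
from collections import deque

rx = deque()  # bytes waiting to be read by the programmer

# Assumption: two sync requests were sent and both are answered.
rx.extend([0x14, 0x10])  # reply to the first sync request
rx.extend([0x14, 0x10])  # reply to the second sync request


def read(n):
    """Pop n bytes from the receive buffer, oldest first."""
    return [rx.popleft() for _ in range(n)]


sync_reply = read(2)  # programmer now considers itself in sync

# Programmer sends STK_GET_PARAMETER; Parm_STK_HW_VER; CRC_EOP and the
# bootloader answers correctly with STK_INSYNC; 0x03; STK_OK ...
rx.extend([0x14, 0x03, 0x10])

# ... but the programmer reads the stale sync reply first.
hw_reply = read(3)
leftover = list(rx)
```

Here `hw_reply` is `[0x14, 0x10, 0x14]` instead of the expected `[0x14, 0x03, 0x10]`, and the two leftover bytes shift every later reply by the same amount.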
The next image is captured with a logic analyser placed on the serial interface between ESP and arduino and shows 3 logic signals:
Initial situation: Arduino is running and idling, not sending or receiving anything on the serial port. ESP module is booted and configured for serial bridge. Next, avrdude is started, opens a TCP connection to the ESP module and starts sending its first command.
Let's see what's happening step by step:
(I verified the exact data bytes sent/received at each step, but lost the screenshots)
A few possible solutions
My feeling is solution 1 seems to be the most protocol agnostic. If needed, the 50ms timeout can be set as an esp-link parameter.
Solution 2 feels like the most robust algorithm, but must be adapted for each programming protocol.
Remark 1: Maybe my observations are an optiboot specific timing issue. If needed, I could try with some other bootloaders.
Remark 2: Still haven't found the time to get acquainted with the esp-link code itself :-(