-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unable to download after certain previous downloads #194
Comments
I was able to replicate this issue and tracked a connection leak to the serialport library. I've submitted a PR (garrows/browser-serialport/pull/38) and I will resolve this issue once the fix is verified and merged. |
We have a customer in our educator's course today that is experiencing this; at least the pattern fits. Very frustrating. I hope this actually fixes it, but I'm skeptical. Please test using multiple programs that contain repetitive DEBUG output- that's where I see it happen most often. I don't remember having to restart Parallax IDE in order to solve this particular problem; I just have to use the BSE software to reprogram the BS2-IC with a simple PBASIC program containing a STOP command, then Parallax IDE can recognize it and reprogram it. |
This should be fixed with dependency updates. However, we were never able to duplicate this problem with the supplied source code, so it should be thoroughly tested by @PropGit. Please re-open if the problem persists. |
This problem is not entirely fixed. The original problem still exists exactly as before using the Branch code and demonstrated by the first video in the first post. The second and likely related problem noted by the second video and the Debug_Debugin code (below) still occurs, but not with as much frequency. Sometimes it fails, sometimes it does not. Perhaps the dependency updates improved things? |
Requires more research and information, as we are not able to duplicate with the given code. |
Let's add it to the next Sprint and I'll follow-up with more research at that time. Question: Do you have a logic analyzer? I can't remember. |
Yea, I have a Saleae logic analyzer, it should work fine for this case. I wasn't able to get the code above to replicate the problem, but I can start by running it again and watching the serial lines. That'll let me see if I was just lucky and was avoiding a timing issue on my end. If you can come up with any other examples that might cause the issue more reliably, that would help a ton. |
Okay, I'll try to discover more. As it might related to this (but maybe not), take a look at Issue #237. |
Please disgregard #237. My mistake, it's a legal and optional part of the protocol. But this issue here is still a problem. I'll try to give more info if I can. |
Here's some more evidence to study. There are some unexpected behaviors from Parallax IDE that show up here. To start with, for reference here's a trace of the BSE software:
Here's a trace of the Parallax IDE app trying to do the same thing as above: First: Note that there are two reset pulses in every case. There should only be one for each time it wishes to Identify a particular stamp type. Second: Parallax IDE isn't always transmitting any hello codes after the simulated break condition ends. This is something I pointed out in Issue #177 that I've just reopened. Here's a close-up of the first Identify attempt (above) I made after the successful download.
For reference, here's what the same moment looks like from the BSE's signals: Of course, it Identifies the Stamp, so it doesn't attempt any more resets, breaks, and hello code transmissions. NOTE: I'm running this on a Win 7 Pro 64-bit machine. This doesn't happen all the time. It will work fine at times and other times it behaves this way. To get these captures I simply keep downloading followed by identifying over and over again with difference cadence until it breaks. Sometimes it does this very frequently, sometimes not. It's not usual for a user to do this exactly thing, but I started this because it should never behave this way and all too often, when I was doing a simple download after a previous download, it'd get stuck in this kind of mode and I could usually never get it to snap back into proper behavior. Very frustrating. I started doing the routine above because it's the only thing I've found to cause the problem often enough that I could capture it reliably. |
Here is the Saleae Logic 8 capture files from the previous post. |
I've identified what appears to be the cause of this. I wasn't able to replicate this issue on OSX/Chromebook, but on WinXP and Win10 I was able to replicate it with the provided program and other programs with excessive I came across an existing ticket for Chrome that described the same issue, where it was tracked down to a I was able to write a simple workaround, where I attempt to recover the connection after a That Chrome ticket has a few patches that will break up I'm putting together a branch with the workaround so you can verify it from your end. |
@PropGit I think we should target the version of chrome which the linked patch lands in. It doesn't seem to be a good idea to forgo all error handling for a single case that only happens on Windows. I found a graph on new version uptake at http://clicky.com/marketshare/global/web-browsers/google-chrome/ that shows that chrome has near-perfect uptake when a new version is released. |
@2fast2fourier - Thanks. Once I can test the workaround branch, we can decide further on this. If it all works, we or I can create another issue to further address this with a later stable release of Chrome and hopefully handle each of the distinct errors appropriately (choosing to lump some of them together, perhaps). The CE_OVERRUN error is not one I would have expected at this point; that's a lower-level problem with system scheduling in the OS as far as I know. |
@phated - Are we looking at a few months out by doing that? The noted errors are all reception errors that are soft issues from our perspective; things that happen due to context changes (like opening a port that was otherwise closed but with a device on it that was sending data already, or performing a reset on the hardware that was in the middle of sending data). Unless there's other errors of a more serious nature (possible) then I don't think it's a dangerous thing. |
A small update, the patch I mentioned before is in the chrome dev channel and scheduled for v46. v45 should release in September (44 was 7/21), meaning 46 will probably release late October or early November. After digging into this I found out that I'm actually running into framing errors, not overruns. The timing on this is unusual, I'm still working out the exact series of serial events. I'll update when I have more info. |
@2fast2fourier - I sent an email... I built and ran but get a "serial.break is not a function" error. |
I can see the break condition working with the experimental-break branch's build and Chrome 46.0.2490.13 beta-m installed. The reset signal is way outside of the break condition though, which causes the BASIC Stamp to wake up, see no break condition (yet) and move on to its normal operation, ignoring any further data from the host trying to program it. The BASIC Stamp wakes up from reset in roughly 18 ms, so the break condition needs to be happening before that. In this capture, the break condition doesn't start until 32 ms after reset ends. For a target signal reference, here's the BSE's signals (zoomed-in a little more) from the same computer successfully identifying a BS2-IC. In past implementations, I found that I could get the proper response (DTR inside of Break condition) by, strangely, telling it to do a DTR toggle first, then a Break condition immediately. (I think this is what I did). There's other potential tricks as well, like extending the DTR reset to start before the break, then start a break, then end the DTR reset, then end the break condition. I know there may be different limitations with Chrome.SerialAPI, but I'm just suggesting trying different things that may not be obvious. |
That DTR is only showing the leading edge, and I was experimenting with bringing DTR high before break, and dropping it immediately after, like the trick you mention. I was under the assumption that the reset only finished on DTR falling edge, but what happens while DTR is high? There's a relatively long period between DTR on/off due to the callback events (~16ms at best), I might be able to work around it though. |
@2fast2fourier - No, I just zoomed out the image too far to see all of DTR; it's a high pulse of only 1.08 uS in my capture, rising edge and falling edge. You're right, the reset condition ends upon DTR falling edge (and about 18 mS later the BASIC Stamp is fully awake). While DTR is high, the BASIC Stamp is asleep. By DTR on/off do you mean DTR high/low (as seen by the BASIC Stamp)? The length of time of DTR high is relatively inconsequential; it just needs to be long enough for the signal to generate a reset on the BASIC Stamp. The timing of the DTR fall and the break condition is the most critical part of this sequence. |
@PropGit I've made some changes on my It seems to improve the success rate considerably, but I'm still working on the timing. I'd also like to see how the timing works out on your system, if you could send me a full scan that would be great. |
@2fast2fourier - I removed node_modules and then did a git pull origin experimental-break and it tells me my branch is already up-to-date. I'm on commit 5de0fe... from Sept 1st. |
@PropGit That's the right commit for the main repo, the changes are in |
@2fast2fourier - Okay, rebuilding now. |
@2fast2fourier - Following up on our meeting discussion, here are some details from the Windows Delphi-based serial code I wrote for BSE. The entire Serial.pas unit can be viewed, but here is a summary (with unimportant details left out for brevity): The
Then, if the handle
|
@2fast2fourier - Some other notable methods in Serial.pas are |
I've isolated this down to an issue with chrome's serial connection hanging after receiving a framing error. I've opened an bug on the chromium project, and included a demo app that will hopefully make it really easy for them to replicate. https://code.google.com/p/chromium/issues/detail?id=534517 At this point, without a fix for that specific issue, I have minimized the cases that will trigger this issue. The timing is a little variable, but as long as any program has a delay of at least 100-200ms before outputting any serial data, we shouldn't run into any issues. If for any reason a framing error occurs, that connection will be unresponsive until we close it again. However, framing errors only really happen when starting the break condition, or if a reset happens outside of our break. To explain that 100ms figure: The critical part is that we need to be able to enter the break condition before the any serial data is sent from the board. The chrome serial open call causes DTR to trigger, but we don't have any control over that and that opening DTR won't trigger a framing error. After the open callback (~10-20ms delay), we request a setBreak. The break condition seems to take an additional 10ms to establish, but at that point we don't have to worry about any data coming in from the device. Then we can complete a DTR cycle and enter programming mode. I'll be cleaning up my experimental branch and preparing it to PR, that should resolve #144 as well. When a fix lands for the chrome issue, we shouldn't need to change anything on our side. |
@2fast2fourier - Thank you for this report and for submitting the bug report on Chromium too. The 200 ms delay will be a workaround we'll have to add to example code for now.
QUESTION: Are you saying that the reflected break condition doesn't cause a framing error and a resulting lock-out of controlling the port? NOTE: Resets outside of break (via dev board reset button) are not super common, but they are far from rare as it's a test-and-diagnose step users take while developing. There's also the case you haven't see yet: BASIC Stamp code uses
Have you tried to immediately set DTR low (Clear DTR) right after opening and configuring the port, like we do in the BSE example code to see if it gets rid of the port-opening-induced DTR toggle? |
In the Chromium bug report, you say:
Is |
@PropGit For the first question, that reflected break is the primary cause of our framing errors if it occurs while the board is transmitting. Otherwise breaks seem to be pretty safe if it occurs while the device isn't sending.
The issue with opening the port is that we don't have access to the serial port until after that callback is called. That means the initial DTR flip happens well before we know the connectionId that the rest of chrome.serial uses. Honestly, this initial DTR is mostly harmless, we never see any framing errors from it since the board is reset right before the port starts listening. The main issue with it is that the board will fully wake up about 18ms after that reset and potentially start transmitting data before we can setBreak. Yes, we can't override anything in the chromium project, it's all deep in the API internals and apps aren't allowed to modify chrome itself. |
@2fast2fourier - Thank you. FYI: Ironically, this morning a customer reported having a problem that I think is exactly this issue. |
@2fast2fourier - Regarding this post above, I whole-heartedly agree, you've greatly minimized the conditions that trigger the problem. I've tested with all the code (and more) that I found the original problem with and Parallax IDE never failed to download again on Win 7! This is incredibly exciting because I didn't expect it to work so well with this problem still existing! Great Job! I've found that I can force it to fail in much the same way, but only by downloading code including a SEROUT instruction that transmits at a different baud rate than the Debug Terminal's default 9600. Idea For Possible Interim Workaround You know how simply opening the port causes the first DTR toggle event, as seen here? The BS2 resets in about 18 ms and starts transmitting data (if the previously downloaded code does so). So I see a possible way to nearly eliminate framing error problems when trying to Identify or Download: What do you think? |
Parallax IDE may not be flushing it's receive buffer after a reset condition is delivered to the BASIC Stamp, or not waiting long enough before flushing. See BASIC Stamp Programming Protocol v1.1 page 5.
After the following program is downloaded to the BS2 module, Parallax IDE is often unable to identify (or download) to the BS2 again whereas on the same computer the BSE software is able to do both with no problem. See this demo video for an example. Sometimes Parallax IDE is able to identify again, but unable to download and it displays a toast saying "Serial port not open."
UPDATE: Here's a more in-depth video using a different PBASIC program to cause the problem.
The text was updated successfully, but these errors were encountered: