-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CrossMgrImpinj connection to CrossMgr can freeze when restarting RIFD test too quicky #95
Comments
Thanks for this.
After re-reading the documentation, I came across this:
Timeouts and the connect method
The connect() operation is also subject to the timeout setting, and in
general it is recommended to call settimeout() before calling connect() or
pass a timeout parameter to create_connection(). However, the system
network stack may also return a connection timeout error of its own
regardless of any Python socket timeout setting.
Sure enough, I was setting the timeout *after* the connect call, not
before. This caused the connect call to block, which is what I believe is
causing the freeze.
The timeout is now set *before* the connect call. That way, if CrossMgr
fails to respond to the connection, CrossMgrImpinj will try again in 2 sec.
Yeesh. Socket programming is hard.
…On Fri, Sep 23, 2022 at 11:37 AM Kim Wall ***@***.***> wrote:
*Description*
Very occasionally, if you stop and re-start the RFID test too quickly,
CrossMgrImpinj gets into a wedged state where it stops communicating with
CrossMgr. (Communication with the tag reader continues normally.) The
frozen connection does not time out, even after CrossMgr is closed. If you
restart CrossMgrImpinj, everything works normally.
*Steps to reproduce the behavior*
Start CrossMgrImpinj and CrossMgr.
Perform an RFID test.
Repeatedly stop and restart the RFID test until CrossMjrImpinj's right
hand window shows a frozen handshake with CrossMgr.
Note that nothing you can do from the CrossMgr side restores communication
with CrossMgrImpinj.
*Expected behavior*
CrossMgrImpinj should be able to determine the connection is frozen, close
and re-attempt to establish the connection to CrossMgr.
Failing that, CrossMgr should not allow you to re-start the RFID test
immediately after stopping one.
*Screenshots*
See video capture here: https://youtu.be/ATKShKZLIzc I manage to get it
to freeze at 32 seconds in.
*Desktop (please complete the following information):*
Windows 11
CrossMgrImpinj 3.0.16
CrossMgr 3.1.05
*Additional context*
This occurred at last weekend's BHPC event, we suspect due to an
accidental double-click on the "Stop RFID Test" button. Due to Murphy's
Law, as everyone who understands the system was in the race, an unskilled
volunteer started the race in CrossMgr, and did not realise there was a
problem for about 20 minutes.
(We were running the previous version of CrossMgr/CrossMgrImpinj, but I've
upgraded and confirmed the bug persists in the current release.)
—
Reply to this email directly, view it on GitHub
<#95>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABGXKMDUAOXBXCFFXQ6SV3V7XFFLANCNFSM6AAAAAAQUC6YJQ>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
--
Edward Sitarski
|
I've done a quck test (without a tag reader connected) using CrossMgrImpinj_3_0_16-20220927155532.exe and it seems to think it's still connected to CrossMgr even after CrossMgr is closed, which doesn't seem right. I'll test more thoroughly when I've got time to dig the tag reader out and set it up. |
Okay, tried CrossMgrImpinj 3.0.16-20220927155532 https://www.youtube.com/watch?v=qjPKpJApzLo In the absence of this freeze condition, it correctly times out the connection if CrossMgr goes away, but I note that it only seems to notice the connection is dead when a tag read comes in: https://www.youtube.com/watch?v=W1kF2lZf7Ug (If no tag reads arrive, as when I posted the comment above, it appears that the connection never times out.) Could this be a threading issue, perhaps? |
Case 1:
I enhanced the code to handle all possible exceptions with the CrossMgr
socket, not just timeouts.
Perhaps the turn/on/off behavior caused an unhandled exception, crashing
the IO thread in CrossMgrImpinj.
I am finding this very difficult to reproduce as the CrossMgr socket has to
be shut down during the handshake (my computer is very fast?).
The update should be available in about 10 minutes.
Case 2:
Yes, CrossMgrImpinj only realizes CrossMgr is no longer alive when a tag
send fails. There isn't a heartbeat check.
I will give it some thought.
…On Thu, Sep 29, 2022 at 1:28 PM Kim Wall ***@***.***> wrote:
Okay, tried CrossMgrImpinj 3.0.16-20220927155532
with the tag reader connected. Again, repeated restarting of the RIFD test
causes CrossMgrImpinj to freeze while the connection is initialised, which
then does not time out and reconnect:
https://www.youtube.com/watch?v=qjPKpJApzLo
In the absence of this freeze condition, it correctly times out the
connection if CrossMgr goes away, but I note that it only seems to notice
the connection is dead when a tag read comes in:
https://www.youtube.com/watch?v=W1kF2lZf7Ug
Could this be a threading issue, perhaps?
—
Reply to this email directly, view it on GitHub
<#95 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABGXKM37K4UCJCHK4TLRN3WAXGTHANCNFSM6AAAAAAQUC6YJQ>
.
You are receiving this because you commented.Message ID:
***@***.***>
--
Edward Sitarski
|
This seems a lot better, when the socket closes during the handshake:
And:
Whereas a graceful shutdown gets:
rather than the 10054 error above, for reasons that probably make sense to Windows (I assume depending on whether the socket closure handshake completes properly), so it would appear that it was an unhandled exception. I've been clicking the start/stop button like a maniac for about 15 minutes now, successfully interrupting the handshake several times, and haven't had it freeze at all, so that looks like it's fixed. The lack of a heartbeat is mostly a UI issue at this point, as it will reconnect when tag reads arrive. I suppose that might be more important if you're running CrossMgrImpinj on a different machine to CrossMgr (for multiple start/finish lines or whatever). I work on the principle that you can usually trust localhost to work :) |
Clearly, Windows throws at least two exceptions (10053 and 10054) when a
connection is dropped.
I was only handling one, which caused the thread to exit when it got the
other one.
Handling all exceptions is the way to go - it's anyone's guess what other
exceptions Windows is capable of ;)
Thanks for all your help to get this resolved and making the software more
robust.
…On Fri, Sep 30, 2022 at 10:34 AM Kim Wall ***@***.***> wrote:
This seems a lot better, when the socket closes during the handshake:
CrossMgr Connection succeeded!
Sending identifier "BHPC-Timing-11584"...
CrossMgr error: [WinError 10054] An existing connection was forcibly closed by the remote host.
Trying to connect to CrossMgr at 127.0.0.1:53135 as "BHPC-Timing-11584"...
And:
CrossMgr Connection succeeded!
Sending identifier "BHPC-Timing-11584"...
Waiting for GT (get time) command from CrossMgr...
CrossMgr error: [WinError 10054] An existing connection was forcibly closed by the remote host.
Trying to connect to CrossMgr at 127.0.0.1:53135 as "BHPC-Timing-11584"...
Whereas a graceful shutdown gets:
CrossMgr error: [WinError 10053] An established connection was aborted by the software in your host machine.
rather than the 10054 error above, for reasons that probably make sense to
Windows (I assume depending on whether the socket closure handshake
completes properly), so it would appear that it was an unhandled exception.
I've been clicking the start/stop button like a maniac for about 15
minutes now, successfully interrupting the handshake several times, and
haven't had it freeze at all, so that looks like it's fixed.
The lack of a heartbeat is mostly a UI issue at this point, as it will
reconnect when tag reads arrive.
—
Reply to this email directly, view it on GitHub
<#95 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABGXKIUZOPDKKKDJIPMBSDWA327DANCNFSM6AAAAAAQUC6YJQ>
.
You are receiving this because you commented.Message ID:
***@***.***>
--
Edward Sitarski
|
Description
Very occasionally, if you stop and re-start the RFID test too quickly, CrossMgrImpinj gets into a wedged state where it stops communicating with CrossMgr. (Communication with the tag reader continues normally.) The frozen connection does not time out, even after CrossMgr is closed. If you restart CrossMgrImpinj, everything works normally.
Steps to reproduce the behavior
Start CrossMgrImpinj and CrossMgr.
Perform an RFID test.
Repeatedly stop and restart the RFID test until CrossMjrImpinj's right hand window shows a frozen handshake with CrossMgr.
Note that nothing you can do from the CrossMgr side restores communication with CrossMgrImpinj.
Expected behavior
CrossMgrImpinj should be able to determine the connection is frozen, close and re-attempt to establish the connection to CrossMgr.
Failing that, CrossMgr should not allow you to re-start the RFID test immediately after stopping one.
Screenshots
See video capture here: https://youtu.be/ATKShKZLIzc I manage to get it to freeze at 32 seconds in.
Desktop (please complete the following information):
Windows 11
CrossMgrImpinj 3.0.16
CrossMgr 3.1.05
Additional context
This occurred at last weekend's BHPC event, we suspect due to an accidental double-click on the "Stop RFID Test" button. Due to Murphy's Law, as everyone who understands the system was in the race, an unskilled volunteer started the race in CrossMgr, and did not realise there was a problem for about 20 minutes.
(We were running the previous version of CrossMgr/CrossMgrImpinj, but I've upgraded and confirmed the bug persists in the current release.)
The text was updated successfully, but these errors were encountered: