assume/free race condition #10

Closed
mbrevoort opened this Issue Jul 13, 2012 · 2 comments

@mbrevoort

Sometimes when the seaport client connection drops and the client reconnects, the assume event is emitted before the free event.

Even worse, this trips up seaport itself. After this occurs (see the log below for an example), if you query seaport for social-manhattan-subway, seaport doesn't think it exists. However, the client is still connected: I can see the connection via netstat, and the client thinks it's connected.
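To make the failure mode concrete, here is a toy model of what I suspect is happening. This is a sketch under assumptions, not seaport's actual code: `ToyRegistry`, `freeNaive`, `freeGuarded`, and the `conn-1`/`conn-2` ids are all hypothetical names. It assumes the registry keys services by role, host, and port, so a late free for the old connection deletes the entry the new connection just registered.

```javascript
// Toy model only: this is NOT seaport's code. ToyRegistry and its
// methods are hypothetical names used to illustrate the ordering problem.
class ToyRegistry {
  constructor() {
    // key "role@host:port" -> id of the connection that registered it
    this.services = new Map();
  }

  static key(rec) {
    return rec.role + '@' + rec.host + ':' + rec.port;
  }

  // Called when a client (re)registers a service over connection connId.
  assume(rec, connId) {
    this.services.set(ToyRegistry.key(rec), connId);
  }

  // Naive free: drops the entry no matter which connection owns it.
  freeNaive(rec) {
    this.services.delete(ToyRegistry.key(rec));
  }

  // Guarded free: drops the entry only if the closing connection still owns it.
  freeGuarded(rec, connId) {
    const k = ToyRegistry.key(rec);
    if (this.services.get(k) === connId) this.services.delete(k);
  }

  has(rec) {
    return this.services.has(ToyRegistry.key(rec));
  }
}

const rec = { role: 'social-manhattan-subway', host: '10.201.4.14', port: 8080 };

// Out-of-order sequence: assume from the new connection, then free for the old one.
const naive = new ToyRegistry();
naive.assume(rec, 'conn-1'); // original registration
naive.assume(rec, 'conn-2'); // reconnect arrives first
naive.freeNaive(rec);        // late free for conn-1 wipes the live entry
console.log('naive registry still has service:', naive.has(rec));

const guarded = new ToyRegistry();
guarded.assume(rec, 'conn-1');
guarded.assume(rec, 'conn-2');
guarded.freeGuarded(rec, 'conn-1'); // stale free is ignored
console.log('guarded registry still has service:', guarded.has(rec));
```

In this model the naive registry loses the service while the client is still connected, which matches the symptom above. Tagging each entry with its owning connection is one possible guard, not necessarily how the real fix works.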

This race condition is tricky to address. One solution might be to make the upnode ping function more intelligent, so that it checks whether the roles still exist and re-adds them if not. That way, if the registry gets out of sync, it would become consistent again on the next ping. However, it's probably best to address this in upnode/dnode.
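A minimal sketch of the ping idea, with the caveat that nothing here is a real seaport or upnode API: `FakeHub` stands in for the seaport server and `reconcileOnPing` is a hypothetical helper that the ping handler would call with the roles the client believes it holds.

```javascript
// Hypothetical sketch: FakeHub stands in for the seaport server and
// reconcileOnPing for a smarter ping handler. Neither is a real
// seaport/upnode API.
class FakeHub {
  constructor() {
    this.roles = new Set();
  }
  register(role) {
    this.roles.add(role);
  }
  drop(role) {
    this.roles.delete(role);
  }
  query(role) {
    return this.roles.has(role);
  }
}

// Re-register any role the client believes it holds but the hub has lost.
// Returns the list of roles that had to be re-added.
function reconcileOnPing(hub, wantedRoles) {
  const readded = [];
  for (const role of wantedRoles) {
    if (!hub.query(role)) {
      hub.register(role);
      readded.push(role);
    }
  }
  return readded;
}

const hub = new FakeHub();
const wanted = ['social-manhattan-subway'];

hub.register(wanted[0]);
hub.drop(wanted[0]); // simulate the stale free removing the live entry

const repaired = reconcileOnPing(hub, wanted); // what the next ping would do
console.log('re-added on ping:', repaired);
```

The trade-off is that the registry stays wrong until the next ping fires, so the window of inconsistency is bounded by the ping interval rather than eliminated.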

I'd like to work around this temporarily by lengthening the reconnect time, but this bug is preventing me from passing the reconnect option: substack/upnode#8

13 Jul 19:44:30 - info: assume: {"port":8080,"started":1342208652053,"harborPid":9432,"hostname":"use1c-pri-subway-0x2x2-07","host":"10.201.4.14","role":"social-manhattan-subway","version":"0.2.2"} host=use1c-pri-portauthority-01, role=spindrift-port-authority, version=0.1.4

13 Jul 19:44:30 - info: free: {"port":8080,"started":1342208652053,"harborPid":9432,"hostname":"use1c-pri-subway-0x2x2-07","host":"10.201.4.14","role":"social-manhattan-subway","version":"0.2.2"} host=use1c-pri-portauthority-01, role=spindrift-port-authority, version=0.1.4
@mbrevoort

Lengthening the reconnect time alone isn't enough to mitigate the issue. Today there was a 17s gap between the assume on reconnect and the free that fired after seaport recognized the connection as closed.

@substack
Owner

Closing this since it seems to be fixed with the referenced pull request.

@substack substack closed this Jul 19, 2012