Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Orphaned ping thread can disconnect other connections #194

Closed
nbouscal opened this issue Oct 20, 2019 · 4 comments
Closed

Orphaned ping thread can disconnect other connections #194

nbouscal opened this issue Oct 20, 2019 · 4 comments

Comments

@nbouscal
Copy link

We're running a secure websocket proxy server (using Wuss) and noticed connections dropping frequently with a few different errors:

  • Most commonly: Terminated True “received fatal error: BadRecordMac” (Error_Protocol (“remote side fatal error”,True,BadRecordMac))
  • ParseException "Unknown opcode: 7"
  • HandshakeFailed (Error_Packet_Parsing "Failed reading: invalid header type: 72\nFrom:\theader\n\n")

After a lot of time digging through lower-level library internals, assuming it was something TLS-specific, I noticed that some of the time the disconnects would happen exactly 30 seconds apart, which is the interval we were passing to forkPingThread. Rewriting it to use withAsync rather than forkIO fully solved the problem, confirming that forkPingThread was the source of the issue.

I'm still a little bit confused as to exactly how forkPingThread is causing this issue, but I suspect that it might have something to do with the fact that many of the connections are to the same server and port, and that a lot of the underlying network code uses (host, port) as an ID for the connection. So, I think what's happening is that when one connection dies and the client reconnects within the ping window, the ping thread from the previous (dead) connection is somehow interfering with the new connection when it tries to ping again. I could be completely wrong about that though, as I still don't understand the internals of those libraries very well.

Unfortunately I don't see an obvious way of replicating forkPingThread's exact API (Connection -> Int -> IO ()) in a safe way. The best I've come up with for a new API (and what we're now using) is something like this:

withPingThread :: Int -> ClientApp a -> ClientApp a
withPingThread n app conn = withAsync (pingThread conn n) (const $ app conn)

where pingThread is basically forkPingThread without the forkIO or ignore.

This would add an explicit dependency on async, but it's already a transitive dependency anyway.

If you don't want to change the API, it's probably worth at least adding a note to the docs that using forkPingThread in a wss proxy server can cause problems.

This is a little bit related to #159 and #174, but as I understand it those concern pinging a client, whereas this showed up for us when pinging a server.

@jaspervdj
Copy link
Owner

Hi @nbouscal, thanks for digging into this, this is great stuff!

I think what we should do is add withPingThread to the library but not remove forkPingThread yet. Instead we should add a deprecation warning to it, instructing users to call withPingThread instead.

Then, we can remove forkPingThread whenever we do a breaking release. Does that sound good to you?

@nbouscal
Copy link
Author

Yep, that sounds great. Thanks!

@jaspervdj
Copy link
Owner

@nbouscal I created #195 which has some changes to what you had:

  • I'm using a plain IO rather than ClientApp because this is also useful on the server side.
  • I allow passing in an additional IO operation that is executed after every ping. That will allow me to tickle a timeout in the snap backend here and git rid of the duplicated code.

Please let me know if this is in line with what you would expect.

@nbouscal
Copy link
Author

Yep, both of those changes make sense to me. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants