
not cleanly closing ssl sockets on phased_restart? #1002

Closed
mattyb opened this issue Jun 16, 2016 · 8 comments

Comments
@mattyb

mattyb commented Jun 16, 2016

We're running apache bench against a running puma server bound to ssl sockets. When we perform a puma phased restart, we get a batch of SSL errors in apache bench each time a worker goes down. With verbose ab flags, a healthy connection includes messages like

SSL/TLS State [connect] SSLv3 flush data
read from 0x7f1e770b74e0 [0x7f1e770aab33] (5 bytes => -1 (0xFFFFFFFFFFFFFFFF))
read from 0x7f1e770b74e0 [0x7f1e770aab33] (5 bytes => 5 (0x5))
0000 - 16 03 03 00 aa                                    .....

Where a failed connection looks like

SSL/TLS State [connect] SSLv3 flush data
read from 0x7f1e770b77a0 [0x7f1e770aab33] (5 bytes => -1 (0xFFFFFFFFFFFFFFFF))
read from 0x7f1e770b77a0 [0x7f1e770aab33] (5 bytes => 0 (0x0))
SSL handshake failed (5).

I believe that this is the reason that our AWS ELB returns 504 errors each time we perform a phased restart. cc @timabdulla, who reported similar symptoms in #957

@evanphx
Member

evanphx commented Jul 24, 2016

If it's the same issue as #957, you can configure persistent_timeout now. Please try that, and if it doesn't work, reopen this issue.
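For reference, persistent_timeout is set in the Puma config file. A minimal sketch (the value 20 is illustrative, not a recommendation; if you sit behind a load balancer, pick something larger than its idle timeout):

```ruby
# config/puma.rb -- sketch; 20 seconds is an illustrative value
persistent_timeout 20
```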

@evanphx evanphx closed this as completed Jul 24, 2016
@mattyb
Author

mattyb commented Jul 24, 2016

Thanks @evanphx, but adjusting timeouts doesn't affect this. This error occurs without going through the ELB.

@evanphx evanphx reopened this Jul 24, 2016
@evanphx
Member

evanphx commented Jul 24, 2016

Hm, ok. I wonder if perhaps on a worker exit persistent sockets aren't being closed. That's not an issue for normal HTTP sockets because the process exit will close the socket, but would be an issue for SSL sockets because the close actually sends data back across telling the other side that it's closed. I'll investigate.
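The distinction evanphx describes can be seen outside Puma entirely. This is an illustrative loopback demo (not Puma code) using Ruby's stdlib OpenSSL: closing a TLS socket sends a close_notify alert, so the peer reads a clean EOF instead of the abrupt `5 bytes => 0` ab logs above.

```ruby
require "openssl"
require "socket"

# Build a throwaway self-signed cert for the loopback server.
key  = OpenSSL::PKey::RSA.new(2048)
name = OpenSSL::X509::Name.parse("/CN=localhost")
cert = OpenSSL::X509::Certificate.new
cert.version    = 2
cert.serial     = 1
cert.subject    = name
cert.issuer     = name
cert.public_key = key.public_key
cert.not_before = Time.now
cert.not_after  = Time.now + 3600
cert.sign(key, OpenSSL::Digest.new("SHA256"))

server_ctx      = OpenSSL::SSL::SSLContext.new
server_ctx.cert = cert
server_ctx.key  = key

tcp_server = TCPServer.new("127.0.0.1", 0)
port       = tcp_server.addr[1]
ssl_server = OpenSSL::SSL::SSLServer.new(tcp_server, server_ctx)

server_thread = Thread.new do
  conn = ssl_server.accept
  conn.puts "hello"
  conn.close # sends the TLS close_notify alert before the TCP socket goes away
end

client_ctx             = OpenSSL::SSL::SSLContext.new
client_ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE # self-signed demo cert
ssl = OpenSSL::SSL::SSLSocket.new(TCPSocket.new("127.0.0.1", port), client_ctx)
ssl.connect

line = ssl.gets # the greeting
eof  = ssl.gets # nil: a clean EOF, because close_notify was sent
server_thread.join
ssl.close
```

If the server process simply exited without the SSL-level close, the client would instead see the connection drop mid-session, which TLS clients treat as an error.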

@evanphx
Member

evanphx commented Jul 25, 2016

Nope, close is called on all sockets. Oh, are you using actioncable or websockets or something else that perhaps uses hijack? If so, I'm betting they're not closing the sockets down on shutdown...

@mattyb
Author

mattyb commented Jul 25, 2016

I've reproduced the error with just a puma config file (no rack app) here: https://github.com/mattyb/puma-socket-test
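For readers without the linked repo handy, a config-only reproduction is roughly of this shape (paths, port, and worker count here are illustrative, not the repro's actual values):

```ruby
# config/puma.rb -- illustrative; see the linked repo for the real config
workers 2
bind "ssl://127.0.0.1:9292?key=certs/server.key&cert=certs/server.crt"
```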

@evanphx
Member

evanphx commented Jul 25, 2016

Thanks so much for the reproduction! I'm working on it now.

@evanphx
Member

evanphx commented Jul 26, 2016

@mattyb That should fix your problem. Do you have a way to try it out easily before I release the fix?

@mattyb
Author

mattyb commented Jul 26, 2016

Thanks for working on this! I no longer see the SSL errors popping up in the console and there are new messages in the verbose logs about connections being closed. Unfortunately, even with the reproduction code updated to the 46416cb commit, I still see failed requests from ab.

Concurrency Level:      25
Time taken for tests:   21.217 seconds
Complete requests:      3000
Failed requests:        21
   (Connect: 0, Receive: 0, Length: 21, Exceptions: 0)

It's possible that this is a problem with ab's implementation, though I believe I'm using the latest version. I confirmed that there are no errors if I bind tcp without ssl. I assume it's relevant that all the failures are in "Length" (ab counts a response as a Length failure when its body length differs from that of the first response it received).

I could try the new commit on our real app in our staging environment and see if the ELB handles whatever error is happening more cleanly, but it seems like something may still be awry in puma.

nateberkopec pushed a commit that referenced this issue Nov 21, 2016
The problem was a few points:

* We were not clearing the reactor on a normal stop, which is what
  is used in a phased restart.
* On close, SSL sockets were not sending the shutdown message.
* SSL sockets that were completely uninitialized ended up sitting
  in the reactor and could not actually be shut down, because they
  were never initialized.
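The first two points of the commit message can be sketched together. This is a hypothetical illustration of the fix's shape, not Puma's actual classes: on a normal stop (the path phased restart takes), drain the reactor and close every socket it still holds, so TLS peers receive close_notify rather than an abrupt reset.

```ruby
# Illustrative sketch only -- names do not match Puma's internals.
class Reactor
  def initialize
    @sockets = []
  end

  def add(sock)
    @sockets << sock
  end

  # Called from the stop path used by phased restart.
  def clear!
    @sockets.each do |sock|
      sock.close # for an SSL socket this also sends the close_notify alert
    rescue StandardError
      # a socket that never finished its handshake may fail to close; skip it
    end
    @sockets.clear
  end
end
```

The rescue mirrors the third bullet: a socket whose handshake never completed cannot perform a TLS shutdown, so the drain has to tolerate that instead of aborting.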