SSL hanging due undrained error queue? #1719

Closed
ry opened this Issue Sep 15, 2011 · 15 comments

Projects

None yet

7 participants

@ry
ry commented Sep 15, 2011

@mranney says

Hey guys, we've recently started getting some strange HTTPS issues on a few of our servers. It seems like there is some sort of TLS negotiation problem, but it only happens over slower networks. The client appears to negotiate the connection successfully, but then node never reads any HTTP data. The connection hangs until either side gives up from various timeouts. On fast networks, this never happens, and on slow networks, it happens every now and then. Have you guys ever seen anything like this?
Danny has been digging into this a bit more, hopefully he can provide some more insight. Seems like some sort of OpenSSL integration issue, perhaps.

@dannycoates says

Hey all,
So we're seeing the error surface in tls.js : CleartextStream.prototype._pusher
The error is the super informative:
"error:00000001:lib(0):func(0):reason(1)"Its intermittent but always happens before the SecurePair 'secure'event is emitted.
Googling the error led me to this thread: http://www.mail-archive.com/openssl-users@openssl.org/msg52587.html
Looking to track down the root error I patched node_crypto.cc from thev0.4.11 tag with: https://gist.github.com/1218721
Now a lot of:
------ SSL: error:00000001:lib(0):func(0):reason(1)
gets printed to stderr, and surprisingly it doesn't hang!!!
This is mysterious and disturbing.
When I get the error string correctly, this is what I get:
------ SSL: error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request

From the man page:

  ERR_get_error() returns the earliest error code from the

thread's error queue and removes the
entry. This function can be called repeatedly until there are
no more error codes to return.

is it possible that the thread's error queue is getting filled and
causing OpenSSL to stop processing when it hits an error? In several
tests they seem to loop on this function:

./test/bntest.c: while ((l=ERR_get_error()))
./test/rsa_test.c: while ((l = ERR_get_error()) != 0)

Should Node be greedily looping on this?

It seems stunnel is doing this too:

dh=PEM_read_bio_DHparams(bio, NULL, NULL, NULL);
BIO_free(bio);
if(!dh) {
    while(ERR_get_error())
        ; /* OpenSSL error queue cleanup */
    s_log(LOG_DEBUG, "Could not load DH parameters from %s", cert);
    return NULL; /* FAILED */
}
@mranney
mranney commented Sep 15, 2011

I wish there were a test for this. But it does seem like draining the openssl error queue is the wise thing.

It'd be great to have @pquerna weigh in here.

Hopefully soon he'll just tell us to use Selene.

@bnoordhuis
Member
<pquerna> fuck openssl.
<pquerna> i guess it is best to do that error loop shit
@mranney
mranney commented Sep 16, 2011

Called it.

@ry
ry commented Sep 16, 2011

This fix is in v0.4.12 and will be in v0.5.7 (to be released tomorrow). I'm going to leave this issue open. Hopefully someone can find a test.

@koichik
koichik commented Sep 16, 2011

It may be related to socketio/socket.io#522.

@dannycoates

I was able to reproduce with these scripts https://gist.github.com/1220229

@rauchg
rauchg commented Sep 26, 2011

We still see this in production running 0.4.12

@ry
ry commented Sep 26, 2011

@guille supposedly it's fixed 0.4.12 - what exactly are you seeing?

@dannycoates any luck producing a test for this?

@rauchg
rauchg commented Sep 26, 2011

------ SSL: error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request

Then requests hang

@bnoordhuis
Member

Okay, I think I can reproduce it. It's openssl complaining that it received something that (to openssl) doesn't look like a HTTP request, probably because it doesn't start with GET|POST|PUT|DELETE|CONNECT.

@bnoordhuis
Member

@guille: seems I can't reproduce it with the openssl version I think you guys are using. Can you test df0fab9?

@rauchg
rauchg commented Sep 27, 2011

That test passes on that machine, so I guess it doesn't reproduce it :/

@ghost
ghost commented Nov 10, 2011

I think it's a normal error logged when a sockets listenning for https is recieving a plain http request ... so i (guess?) it should not harm the system right?

@polotek
polotek commented Nov 15, 2011

@Lou-adrien is right. I ran into this randomly development today. I kept switching back and forth between an http server and https server in node. Sending insecure requests to the https server would produce this error. It doesn't seem to have any lasting ill effects though. But it also doesn't propagate errors up to uncaughtException. Instead the connection just dies.

@bnoordhuis
Member

Closing, I have good reason to believe that this is no longer issue. If you have reason to believe otherwise, speak up and I'll reopen it.

@bnoordhuis bnoordhuis closed this Nov 28, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment