This repository has been archived by the owner on Dec 20, 2018. It is now read-only.

bud leaks file descriptors (heavily) #78

Closed
phillipp opened this issue Nov 4, 2015 · 8 comments

Comments

@phillipp
Contributor

phillipp commented Nov 4, 2015

A few hours ago we saw the first symptoms of a new problem: clients can no longer connect to bud. The debug output from curl looks like this:

* About to connect() to system-zeus.lima-city.de port 443 (#0)
*   Trying 212.83.45.137... connected
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* Unknown SSL protocol error in connection to system-zeus.lima-city.de:443 
* Closing connection #0

Bud debug output says:

(dbg) [3436] client 0x61b9640 on backend connecting to 127.0.0.1:10010
(dbg) [3436] client 0x61b9640 on frontend close_cb
(dbg) [3436] received handle on ipc
(dbg) [3436] client 0x61b9640 on backend connecting to 127.0.0.1:10010
(dbg) [3436] client 0x61b9640 on frontend close_cb
(dbg) [3436] received handle on ipc
(dbg) [3436] client 0x61b9640 on backend connecting to 127.0.0.1:10010
(dbg) [3436] client 0x61b9640 on frontend close_cb
(dbg) [3436] received handle on ipc
(dbg) [3436] client 0x61b9640 on backend connecting to 127.0.0.1:10010
(dbg) [3436] client 0x61b9640 on frontend close_cb
(dbg) [3436] received handle on ipc

Other workers seem to work normally, but over time more and more of them start doing the same.

My first thought was that an earlier request had hit a problem and left the worker in a bad state, and indeed bud is running into its open-file limit (output from the same worker shortly before):

(dbg) [3436] received handle on ipc
(dbg) [3436] client 0x5fc5390 on backend connecting to 127.0.0.1:10010
(dbg) [3436] client 0x5fc5390 on frontend new
(dbg) [3436] client 0x5fc5390 on backend connect 0
(dbg) [3436] client 0x5fc5390 on frontend SSL_read() => -1
(dbg) [3436] client 0x5fc5390 on frontend after read_cb() => 222
(dbg) [3436] client 0x5fc5390 on backend ssl_cert_cb {0}
(ntc) [3436] client 0x5fc5390 on frontend failed to request SNI: "uv_tcp_connect(http_req) returned -24 (too many open files)"
(dbg) [3436] client 0x5fc5390 on frontend SSL_read() => -1
(ntc) [3436] client 0x5fc5390 on frontend closed because: SSL_read(client) - 1 (cert cb error)
(dbg) [3436] client 0x5fc5390 on frontend force closing (and waiting for other)
(dbg) [3436] client 0x5fc5390 on backend force closing (and waiting for other)
(ntc) [3436] client 0x5fc5390 on frontend closed because: SSL_read(client) - 1 ((null))
(dbg) [3436] client 0x5fc5390 on frontend close_cb

Since this problem only started after #76 was fixed, maybe that fix introduced a new bug where the HTTP requests are no longer closed, leaking file descriptors. Could the fix have introduced this? Right now the problem builds up very quickly: a worker hits the ulimit (1024) within a couple of minutes.
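
(For context, the -24 in the log above is libuv's UV_EMFILE. Below is a purely illustrative, standalone libuv sketch, not bud code, showing how handles that are never closed each pin a descriptor until uv_tcp_connect() starts failing in exactly this way; the address and port are placeholders taken from the logs:)

/* Hypothetical standalone sketch (not bud code): an active uv_tcp_t owns a
 * file descriptor, and that descriptor is only released once uv_close() has
 * run on the handle. Handles that are never closed therefore pin one fd per
 * connection until the process hits its ulimit and uv_tcp_connect() starts
 * returning UV_EMFILE (-24, "too many open files"), as in the log above. */
#include <stdio.h>
#include <stdlib.h>
#include <uv.h>

static void on_connect(uv_connect_t* req, int status) {
  /* Intentionally empty: neither the request nor the handle is ever
   * cleaned up, so each connection keeps its file descriptor alive. */
  (void) req;
  (void) status;
}

int main(void) {
  uv_loop_t* loop = uv_default_loop();
  struct sockaddr_in addr;
  int i;

  uv_ip4_addr("127.0.0.1", 10010, &addr);

  for (i = 0; ; i++) {
    uv_tcp_t* handle = malloc(sizeof(*handle));
    uv_connect_t* req = malloc(sizeof(*req));
    int err;

    uv_tcp_init(loop, handle);
    err = uv_tcp_connect(req, handle, (const struct sockaddr*) &addr, on_connect);
    if (err != 0) {
      /* With a 1024 fd ulimit this fires after roughly a thousand handles. */
      fprintf(stderr, "connect #%d failed: %s\n", i, uv_strerror(err));
      return 1;
    }
  }
}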

It is possible, though, that there is another problem, as I also see strange debug output from the workers:

(dbg) [20210] received handle on ipc
(dbg) [20210] client 0x4182a80 on backend connecting to 127.0.0.1:10010
(dbg) [20210] client 0x4182a80 on frontend new
(dbg) [20210] client 0x4182a80 on backend connect 0
(dbg) [20210] client 0x4182a80 on frontend SSL_read() => -1
(dbg) [20210] received handle on ipc
(dbg) [20210] client 0x41a1720 on backend connecting to 127.0.0.1:10010
(dbg) [20210] client 0x41a1720 on frontend new
(dbg) [20210] received handle on ipc
(dbg) [20210] client 0x41b2550 on backend connecting to 127.0.0.1:10010
(dbg) [20210] client 0x41b2550 on frontend new
(dbg) [20210] client 0x41a1720 on backend connect 0
(dbg) [20210] client 0x41a1720 on frontend SSL_read() => -1
(dbg) [20210] client 0x41b2550 on backend connect 0
(dbg) [20210] client 0x41b2550 on frontend SSL_read() => -1
(dbg) [20210] received handle on ipc
(dbg) [20210] client 0x41dfca0 on backend connecting to 127.0.0.1:10010
(dbg) [20210] client 0x41dfca0 on frontend new
(dbg) [20210] client 0x41dfca0 on backend connect 0
(dbg) [20210] client 0x41dfca0 on frontend SSL_read() => -1
(dbg) [20210] received handle on ipc
(dbg) [20210] client 0x41fef60 on backend connecting to 127.0.0.1:10010
(dbg) [20210] client 0x41fef60 on frontend new
(dbg) [20210] client 0x41fef60 on backend connect 0
(dbg) [20210] client 0x41fef60 on frontend SSL_read() => -1

So something is going on there that may be the cause of the leak. Any ideas?

@phillipp
Contributor Author

phillipp commented Nov 4, 2015

I added some more debug output during my earlier "investigation" and found that all the connections producing the log lines in the last example (the connections shown above that do nothing and then abort) come from the networks of a single entity. We rejected all traffic from their nets and the file descriptor count is no longer increasing.

So there seems to be an easily exploitable DoS attack vector, regardless of whether that client was aware of it and exploiting it deliberately. Any ideas what might cause this?

@indutny
Owner

indutny commented Nov 5, 2015

@phillipp have you tried lowering frontend.keepalive?
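
(For reference, keepalive sits in the frontend section of bud's JSON config. A minimal sketch of lowering it, assuming the value is in seconds and using placeholder ports taken from the logs above:)

{
  "frontend": {
    "port": 443,
    "keepalive": 60
  },
  "backend": [
    { "port": 10010 }
  ]
}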

@indutny
Owner

indutny commented Nov 5, 2015

@phillipp I think I know what's happening. Will fix it in a bit, not sure how I missed it.

@indutny
Owner

indutny commented Nov 5, 2015

@phillipp may I ask you to give this patch a try?

diff --git a/src/client.c b/src/client.c
index 9ff97e5..c097ade 100644
--- a/src/client.c
+++ b/src/client.c
@@ -968,8 +968,7 @@ void bud_client_shutdown_cb(uv_shutdown_t* req, int status) {
   /* If either closing, or shutdown both sides - kill both sockets! */
   } else if (side->close == kBudProgressRunning ||
              client->frontend.shutdown == client->backend.shutdown ||
-             (side == &client->frontend &&
-                  !client->config->frontend.allow_half_open)) {
+             !client->config->frontend.allow_half_open) {
     bud_client_close(client, bud_client_ok(side));
   }
 }

@indutny
Owner

indutny commented Nov 5, 2015

Hm... actually, this seems to be too aggressive.

@indutny
Owner

indutny commented Nov 5, 2015

@phillipp how about this?

diff --git a/src/client.c b/src/client.c
index 9ff97e5..34fe283 100644
--- a/src/client.c
+++ b/src/client.c
@@ -971,6 +971,16 @@ void bud_client_shutdown_cb(uv_shutdown_t* req, int status) {
              (side == &client->frontend &&
                   !client->config->frontend.allow_half_open)) {
     bud_client_close(client, bud_client_ok(side));
+  } else if (side == &client->backend &&
+             client->backend.reading == kBudProgressNone) {
+    bud_client_error_t cerr;
+
+    DBG_LN(&client->backend, "read_start after shutdown");
+    cerr = bud_client_read_start(client, &client->backend);
+    if (bud_is_ok(cerr.err))
+      client->backend.reading = kBudProgressRunning;
+    else
+      bud_client_close(client, cerr);
   }
 }

diff --git a/test/basic-test.js b/test/basic-test.js
index 217da81..25506a9 100644
--- a/test/basic-test.js
+++ b/test/basic-test.js
@@ -1,5 +1,6 @@
 var assert = require('assert');
 var https = require('https');
+var net = require('net');
 var fixtures = require('./fixtures');
 var request = fixtures.request;
 var caRequest = fixtures.caRequest;
@@ -125,4 +126,16 @@ describe('Bud TLS Terminator/Basic', function() {
       });
     });
   });
+
+  describe('EOF on frontend', function() {
+    var sh = fixtures.getServers();
+
+    it('should support basic termination', function(cb) {
+      var socket = net.connect(sh.frontend.port);
+      socket.on('close', function() {
+        cb();
+      });
+      socket.end();
+    });
+  });
 });
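
To illustrate the idea behind the client.c hunk (a minimal, hypothetical libuv fragment, not bud's actual code): after shutting down our write side the connection is only half-closed, and if we stop reading at that point the peer's EOF is never observed, uv_close() never runs, and the socket's descriptor lingers. Keeping the read side running until UV_EOF lets the handle be closed and the descriptor released:

/* Minimal, hypothetical libuv fragment (not bud's actual code). */
#include <stdlib.h>
#include <uv.h>

static void on_close(uv_handle_t* handle) {
  /* The handle is assumed to be heap-allocated in this fragment. */
  free(handle);
}

static void on_alloc(uv_handle_t* handle, size_t suggested_size, uv_buf_t* buf) {
  (void) handle;
  buf->base = malloc(suggested_size);
  buf->len = suggested_size;
}

static void on_read(uv_stream_t* stream, ssize_t nread, const uv_buf_t* buf) {
  free(buf->base);
  if (nread < 0) {
    /* UV_EOF or an error: the peer is done, close and release the fd. */
    uv_close((uv_handle_t*) stream, on_close);
  }
  /* nread >= 0: discard the data and keep draining until EOF. */
}

static void on_shutdown(uv_shutdown_t* req, int status) {
  uv_stream_t* stream = req->handle;

  free(req);
  if (status != 0 || uv_read_start(stream, on_alloc, on_read) != 0) {
    uv_close((uv_handle_t*) stream, on_close);
    return;
  }
  /* Our write side is shut down; keep reading so the peer's close is noticed. */
}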

@indutny indutny closed this as completed in 85d6ca4 Nov 9, 2015
@phillipp
Contributor Author

Great, we've deployed the new version and will see if the problem pops up again.

@indutny
Owner

indutny commented Nov 12, 2015

Thanks!
