
File descriptor leak caused by clients prematurely closing connections #327

Open
milesbxf opened this issue Oct 17, 2018 · 21 comments

7 participants
@milesbxf commented Oct 17, 2018

Hi! 👋 We've been using the JMX exporter to instrument Cassandra (using the javaagent on version 0.3.1).

We recently had an incident caused by Cassandra running out of file descriptors. We found these had been gradually leaking over time (metric here is node_filefd_allocated from node_exporters on those instances - the FD limit we set for Cassandra is 100k):
[screenshot: node_filefd_allocated graph, 2018-10-17 11:21]

We'd been seeing some issues with Prometheus timing out whilst scraping these nodes, and found that the majority of open FDs were orphaned TCP sockets in CLOSE_WAIT. Thread dumps showed that all 5 JMX exporter threads on these nodes seemed to be stuck writing to the socket:

"pool-1-thread-1" - Thread t@84
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
        - locked <2e084219> (a java.lang.Object)
        at sun.net.httpserver.Request$WriteStream.write(Request.java:391)
        - locked <368dd754> (a sun.net.httpserver.Request$WriteStream)
        at sun.net.httpserver.ChunkedOutputStream.writeChunk(ChunkedOutputStream.java:125)
        at sun.net.httpserver.ChunkedOutputStream.write(ChunkedOutputStream.java:87)
        at sun.net.httpserver.PlaceholderOutputStream.write(ExchangeImpl.java:444)
        at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:253)
        at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211)
        at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:145)
        at java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:167)
        - locked <fd8ef5> (a java.io.ByteArrayOutputStream)
        at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer$HTTPMetricHandler.handle(HTTPServer.java:74)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
        at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
        at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
        at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
   Locked ownable synchronizers:
        - locked <467e300a> (a java.util.concurrent.ThreadPoolExecutor$Worker)

Putting these two bits of information together gives us this theory:

  1. Prometheus scrapes the node - sends an HTTP request to the JMX exporter
  2. JMX exporter collects metrics, but takes a long time to do so (this is occasionally expected in our case, our nodes export thousands of JMX metrics)
  3. Prometheus reaches the scrape timeout, and cancels the request with a TCP FIN
  4. JMX exporter finishes collecting metrics, and attempts to write the output to the socket (https://github.com/prometheus/client_java/blob/parent-0.3.0/simpleclient_httpserver/src/main/java/io/prometheus/client/exporter/HTTPServer.java#L78). The other side of the TCP connection has been closed.
  5. This call blocks forever, and we never reach https://github.com/prometheus/client_java/blob/parent-0.3.0/simpleclient_httpserver/src/main/java/io/prometheus/client/exporter/HTTPServer.java#L80 which closes the socket

It looks like simpleclient_httpserver doesn't have good semantics around handling closed connections.

We don't have a minimal reproduction of this, but tcpdumps back this up. We're considering forking the jmx_exporter to use simpleclient_jetty instead, but we wondered if anyone else had come across this?
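The timeout half of the theory (steps 1-3) is easy to sketch in isolation. The following is a hypothetical, minimal reproduction (endpoint name, timings, and class name are invented, not from this thread): a slow handler on the built-in com.sun.net.httpserver, and a client whose read timeout fires before the response is written, so the client disconnects while the server still owes it a body.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.net.URL;

public class ScrapeTimeoutDemo {
    public static boolean clientTimesOut() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/metrics", exchange -> {
            try {
                Thread.sleep(1000); // simulate a slow JMX collection
            } catch (InterruptedException ignored) {
            }
            byte[] body = "jvm_up 1\n".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body); // in the real issue, a large write here can block
            }
        });
        server.start();
        int port = server.getAddress().getPort();
        boolean timedOut = false;
        try {
            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://localhost:" + port + "/metrics").openConnection();
            conn.setReadTimeout(200); // "scrape timeout" much shorter than the handler
            conn.getInputStream().read();
        } catch (SocketTimeoutException e) {
            timedOut = true; // client gives up; its FIN reaches the server later
        } finally {
            server.stop(0);
        }
        return timedOut;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("client timed out: " + clientTimesOut());
    }
}
```

What this sketch cannot show on its own is step 4: the server-side write only blocks once the response is larger than the socket send buffer, which is why the issue shows up with thousands of JMX metrics but not with a tiny payload.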

@brian-brazil (Member) commented Oct 17, 2018

Hmm, what happens if you use a try/finally for the close?

@milesbxf (Author) commented Oct 17, 2018

Do you mean in client_java/simpleclient_httpserver?

It could look something like:

        try {
            if (shouldUseCompression(t)) {
                t.getResponseHeaders().set("Content-Encoding", "gzip");
                t.sendResponseHeaders(HttpURLConnection.HTTP_OK, 0);
                final GZIPOutputStream os = new GZIPOutputStream(t.getResponseBody());
                response.writeTo(os);
                os.finish();
            } else {
                t.sendResponseHeaders(HttpURLConnection.HTTP_OK, response.size());
                response.writeTo(t.getResponseBody());
            }
        } finally {
            t.close();
        }

but I don't think that would fix this - no exceptions are thrown; the response.writeTo call just seems to block forever.

I'm not familiar enough with Java socket programming to suggest a better fix, but surely Jetty should be able to better handle the client closing the connection?
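Since com.sun.net.httpserver exposes no write timeout, one general mitigation pattern (a sketch of a common approach, not code from this thread or from client_java) is a watchdog: run the write as a task with a deadline, and when the deadline passes, force-close the exchange/socket so the stuck write fails with an IOException and the worker thread is freed. The blocking-write stand-in below is hypothetical; a real version would close `t` in the timeout branch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class WriteWatchdogDemo {
    // Hypothetical stand-in for response.writeTo(...): blocks until released.
    static void blockingWrite(CountDownLatch released) throws InterruptedException {
        released.await();
    }

    public static boolean watchdogFires() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch released = new CountDownLatch(1);
        Future<?> write = pool.submit(() -> {
            try { blockingWrite(released); } catch (InterruptedException ignored) {}
            return null;
        });
        boolean timedOut = false;
        try {
            write.get(200, TimeUnit.MILLISECONDS); // the write "deadline"
        } catch (TimeoutException e) {
            timedOut = true;      // here a real watchdog would call t.close() /
            released.countDown(); // socket.close() to unblock the writer
        }
        pool.shutdownNow();
        return timedOut;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("watchdog fired: " + watchdogFires());
    }
}
```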

@brian-brazil (Member) commented Oct 17, 2018

> JMX exporter finishes collecting metrics, and attempts to write the output to the now-closed socket

That's not how TCP works, only one direction of the socket is closed. What exactly is netstat showing for these sockets? It also shouldn't be possible for there to be thousands of them, as there's only 5 worker threads.
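The half-close behaviour described here is easy to demonstrate: the client's close() sends a FIN, which the server observes as end-of-stream on read, but the server's own descriptor stays open (this is exactly the CLOSE_WAIT state) until the server closes it too. A minimal sketch (class and method names invented):

```java
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class HalfCloseDemo {
    public static boolean serverSideStaysOpen() throws Exception {
        try (ServerSocket listener = new ServerSocket(0)) {
            Socket client = new Socket("localhost", listener.getLocalPort());
            Socket serverSide = listener.accept();
            client.close(); // client sends a FIN, as Prometheus does on scrape timeout
            InputStream in = serverSide.getInputStream();
            boolean eof = in.read() == -1;              // the FIN shows up as EOF...
            boolean stillOpen = !serverSide.isClosed(); // ...but our side is not closed
            serverSide.close(); // leaving CLOSE_WAIT requires this explicit close
            return eof && stillOpen;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("server side stayed open after FIN: " + serverSideStaysOpen());
    }
}
```

Between the client's close() and the server's close(), the server-side socket sits in CLOSE_WAIT and holds a file descriptor - which is the state the lsof output below shows thousands of.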

@milesbxf (Author) commented Oct 17, 2018

Yes of course - I've clarified in the description, thanks.

I don't have netstat output for the affected nodes, though I'll grab it when the issue reoccurs. We did analyse the leak with lsof, and found thousands of entries like:

java    17678 cassandra  526u     IPv4          148742999        0t0        TCP hostname-a:http-alt->hostname-b:36374 (CLOSE_WAIT)

hostname-a is the hostname of the Cassandra node we ran this on, and hostname-b is one of the Prometheus hosts. All of the connections in CLOSE_WAIT were to a Prometheus host, and we expose the JMX exporter on the http-alt port (8080), so these are definitely connections handled by the JMX exporter.

@brian-brazil (Member) commented Oct 17, 2018

I was more interested in the Send-Q column from netstat.

@milesbxf (Author) commented Oct 23, 2018

@brian-brazil just grabbed some output from a node that started being affected again. Here's an example line:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp      200      0 <cassandra IP>:8080       <prometheus IP>:58288     CLOSE_WAIT  73970/java

All of the CLOSE_WAIT conns look almost identical, all with 200 bytes of data in Recv-Q and 0 in Send-Q - so as I understand it, there are 200 bytes of data received on the socket but not yet read by the JMX exporter.

@brian-brazil (Member) commented Oct 23, 2018

Hmm, that smells like a connection that's in the backlog - that is, it's in the kernel, but the application has yet to accept it.
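The backlog behaviour is worth making concrete: the kernel completes the TCP handshake for connections in the listen backlog even though the application never calls accept(), so a scraper can "successfully connect" to a server whose worker threads are all stuck. A minimal sketch (class and method names invented):

```java
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BacklogDemo {
    public static boolean connectsWithoutAccept() throws Exception {
        try (ServerSocket listener = new ServerSocket(0, 5)) { // listen backlog of 5
            // Note: we deliberately never call listener.accept().
            try (Socket client = new Socket()) {
                client.connect(new InetSocketAddress("localhost", listener.getLocalPort()), 1000);
                return client.isConnected(); // handshake completed entirely in the kernel
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("connected without accept(): " + connectsWithoutAccept());
    }
}
```

If such a backlogged client then sends its request (here, ~200 bytes of HTTP headers) and gives up, the data sits unread in Recv-Q and the FIN leaves the kernel-side socket in CLOSE_WAIT, matching the netstat output above.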

@milesbxf (Author) commented Oct 25, 2018

Yup, it's looking like the workers get stuck writing to a socket whilst the HTTPServer background thread (?) continues to accept connections from Prometheus, but never actually reads or closes them, presumably as all worker threads are blocked. Does that sound about right?

I think we'll try Jetty (I think it should be able to handle this situation better), will post here with what happens.

@brian-brazil (Member) commented Oct 25, 2018

> Does that sound about right?

That sounds plausible. Do you know where they're blocking?

@milesbxf (Author) commented Oct 25, 2018

Yep, all worker threads show exactly the same stack trace as in the issue description above (stuck in sun.nio.ch.FileDispatcherImpl.write0, reached via HTTPMetricHandler.handle), so they're blocking on response.writeTo(t.getResponseBody()); (https://github.com/prometheus/client_java/blob/parent-0.3.0/simpleclient_httpserver/src/main/java/io/prometheus/client/exporter/HTTPServer.java#L78)

@brian-brazil (Member) commented Oct 25, 2018

That's odd, looks like it's stuck in the kernel. Can you get an strace?

Could you also try with the master version of our httpserver library? There's one fix there which has yet to make it out and has the potential to be relevant.

@kamiKAZIK commented Oct 30, 2018

We ran into a similar issue using v0.4.0 where HTTPServer accepts connections but never responds.

@xoraes commented Dec 19, 2018

Any update on this?

@brian-brazil (Member) commented Jan 14, 2019

Did you get a chance to try this with a newer version of client_java?

@jaseemabid commented Jan 14, 2019

@brian-brazil We moved the main JMX exporter out of the Cassandra process to an external HTTP server, because we couldn't afford FD leaks and restarting Cassandra once in a while. Now we run 2 copies of the JMX exporter: the in-process version scrapes only the minimal JVM metrics, and the external exporter scrapes the much more detailed Cassandra metrics. Since it's an external process that gets restarted automatically by systemd if it crashes, we haven't really looked into it since. It's not an optimal solution, but it saved us from the problematic DB restarts.

@amruta1989mohite commented Jan 21, 2019

@brian-brazil any solution for this issue, please?

@amruta1989mohite commented Jan 21, 2019

> We moved the main jmx exporter out of the Cassandra process to an external HTTP server […]

Hello @jaseemabid, I am facing the same issue right now and also want to avoid restarting my DB at a specific interval. Can you tell me in detail how you fixed it? It would be really great if I could get any help with this.

@jaseemabid commented Jan 25, 2019

@amruta1989mohite Run 2 copies of this exporter.

  1. Make the in-process exporter scrape only the core JVM metrics; on our infra it completes a scrape in a couple of seconds. This normally doesn't leak FDs, and we haven't had any issues so far.

  2. Run an out-of-process HTTP exporter to scrape everything else you need. This takes much longer depending on your cluster, but since it's a separate process, you can safely and automatically restart it once it starts leaking FDs.

@amruta1989mohite commented Feb 5, 2019

@jaseemabid, thanks for the response, and sorry for bothering you again.
I am quite new to exporters, so would it be possible to get the exporter code from you, or any document describing in detail how you did it?
It would be so helpful. Thanks in advance.

@amruta1989mohite commented Feb 15, 2019

@jaseemabid Should I wait for your help?

@tusharmakkar08 commented Feb 15, 2019

@brian-brazil I am facing a similar issue when using Python's Prometheus client.

> Could you also try with the master version of our httpserver library? There's one fix there which has yet to make it out and has the potential to be relevant.

Can you please mention the link to the library?
