Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fluency fails with SSLException: Broken pipe #152

Closed
sabarivasan opened this issue Feb 24, 2020 · 13 comments · Fixed by #155
Closed

Fluency fails with SSLException: Broken pipe #152

sabarivasan opened this issue Feb 24, 2020 · 13 comments · Fixed by #155

Comments

@sabarivasan
Copy link

sabarivasan commented Feb 24, 2020

Hi,

I am running into issues sending logs using TCP forward to fluentd to a fluentd cluster setup in AWS:

AWS Network load balancer (NLB) with TLS protocol with a self-signed certificate for *.us-east-1.lb.cvent.com
The fluentd cluster itself sits behind the NLB and runs in a Docker container in ECS. The NLB terminates SSL and is supposed to forward requests to the fluentd cluster.

The exception I am getting is below:

Here's how I have built the fluency client:

        FluencyBuilderForFluentd builder = new FluencyBuilderForFluentd();
        builder.setBufferChunkInitialSize((int) (FileUtils.ONE_MB / 2));
        builder.setBufferChunkRetentionSize((int) (1 * FileUtils.ONE_MB));
        builder.setMaxBufferSize(2 * FileUtils.ONE_MB);
        builder.setBufferChunkRetentionTimeMillis((int) TimeUnit.SECONDS.toMillis(1));
        builder.setJvmHeapBufferMode(true);
        builder.setSenderMaxRetryCount(1);
        builder.setSslEnabled(true);
//        builder.setAckResponseMode(true);
        builder.setConnectionTimeoutMilli((int) TimeUnit.MINUTES.toMillis(2));
        builder.setReadTimeoutMilli((int) TimeUnit.MINUTES.toMillis(2));

        return builder.build("fluentd-sandbox.us-east-1.lb.cvent.com", 24224);
    }
{
	"thread": "pool-4-thread-1",
	"message": "Failed to send 1071 bytes data",
	"level": "ERROR",
	"timestamp": "2020-02-24T16:48:00.654Z",
	"logger": "org.komamitsu.fluency.fluentd.ingester.sender.NetworkSender"
} {
	"exception": "java.net.SocketException: Broken pipe (Write failed)\n\tat java.base/java.net.SocketOutputStream.socketWrite0(Native Method)\n\tat java.base/java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)\n\tat java.base/java.net.SocketOutputStream.write(SocketOutputStream.java:150)\n\tat java.base/sun.security.ssl.SSLSocketOutputRecord.deliver(SSLSocketOutputRecord.java:320)\n\tat java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:983)\n\t... 16 common frames omitted\nWrapped by: javax.net.ssl.SSLProtocolException: Broken pipe (Write failed)\n\tat java.base/sun.security.ssl.Alert.createSSLException(Alert.java:126)\n\tat java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321)\n\tat java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:264)\n\tat java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:259)\n\tat java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:988)\n\tat java.base/java.io.OutputStream.write(OutputStream.java:122)\n\tat org.komamitsu.fluency.fluentd.ingester.sender.SSLSender.sendBuffers(SSLSender.java:87)\n\tat org.komamitsu.fluency.fluentd.ingester.sender.SSLSender.sendBuffers(SSLSender.java:31)\n\tat org.komamitsu.fluency.fluentd.ingester.sender.NetworkSender.sendInternal(NetworkSender.java:102)\n\tat org.komamitsu.fluency.fluentd.ingester.sender.FluentdSender.sendInternalWithRestoreBufferPositions(FluentdSender.java:74)\n\tat org.komamitsu.fluency.fluentd.ingester.sender.FluentdSender.send(FluentdSender.java:56)\n\tat org.komamitsu.fluency.fluentd.ingester.sender.RetryableSender.sendInternal(RetryableSender.java:77)\n\tat org.komamitsu.fluency.fluentd.ingester.sender.FluentdSender.sendInternalWithRestoreBufferPositions(FluentdSender.java:74)\n\tat org.komamitsu.fluency.fluentd.ingester.sender.FluentdSender.send(FluentdSender.java:56)\n\tat org.komamitsu.fluency.fluentd.ingester.FluentdIngester.ingest(FluentdIngester.java:87)\n\tat org.komamitsu.fluency.buffer.Buffer.flushInternal(Buffer.java:357)\n\tat org.komamitsu.fluency.buffer.Buffer.flush(Buffer.java:112)\n\tat org.komamitsu.fluency.flusher.Flusher.runLoop(Flusher.java:66)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\n",
	"thread": "pool-4-thread-1",
	"message": "Sender failed to send data. sender=RetryableSender{baseSender=SSLSender{config=Config{host='fluentd-sandbox.us-east-1.lb.cvent.com', port=24224, connectionTimeoutMilli=120000, readTimeoutMilli=120000, waitBeforeCloseMilli=1000} Config{senderErrorHandler=null}} NetworkSender{config=Config{host='fluentd-sandbox.us-east-1.lb.cvent.com', port=24224, connectionTimeoutMilli=120000, readTimeoutMilli=120000, waitBeforeCloseMilli=1000} Config{senderErrorHandler=null}, failureDetector=null} org.komamitsu.fluency.fluentd.ingester.sender.SSLSender@695a4ee9, retryStrategy=ExponentialBackOffRetryStrategy{config=Config{baseIntervalMillis=400, maxIntervalMillis=30000} Config{maxRetryCount=1}} RetryStrategy{config=Config{baseIntervalMillis=400, maxIntervalMillis=30000} Config{maxRetryCount=1}}, isClosed=false} org.komamitsu.fluency.fluentd.ingester.sender.RetryableSender@52e856da, retry=0",
	"level": "WARN",
	"timestamp": "2020-02-24T16:48:00.658Z",
	"logger": "org.komamitsu.fluency.fluentd.ingester.sender.RetryableSender"
}

I am happy to provide more info as necessary. I am not sure what I am doing wrong with the setup.

@komamitsu
Copy link
Owner

@sabarivasan Could you give me the following information?

  1. You can connect to the NLB using openssl s_client command?
  2. Did you create and specify a JKS file properly?
  3. Which Java version did you use?
  4. When did the exception happen? Did it always happen when connecting?

@sabarivasan
Copy link
Author

  1. Yes, I am able to connect to the NLB using the openssl s_client command
  2. Yes, we use the same JKS file at other places successfully
  3. Java 11
           java -version
          openjdk version "11.0.2" 2019-01-15
          OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
          OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
  1. The SSL handshake completes successfully, I believe. It happens when the actual log events are being sent. I turned on -Djavax.net.debug=all to produce detailed logs. I am attaching the relevant portion.

@sabarivasan
Copy link
Author

@komamitsu
Copy link
Owner

Thanks! Additional 2 questions:

  1. Does this issue happen with other Java versions like 8 as well?

  2. Does this issue always happen when it starts sending actual events?

@sabarivasan
Copy link
Author

  1. Yes, I tried Java 8 and observed the same issue.
  2. Yes, it happens when we start sending events, not when the fluency client is created.

@komamitsu
Copy link
Owner

Thanks for the quick answer. Let me ask some more...

  1. I think NLB provides access logs capturing TLS request information https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-access-logs.html. Can I ask you to get some NLB access logs that capture requests from the Fluency client?

  2. I want to know how the Fluentd instance communicated on the TCP connection from Fluency via NLB. Could you capture some TCP packets with, for instance, tcpdump -i eth0 -nX -s512 tcp port 24224?

BTW, Fluency's SSLSender that inherits NetworkSender lazily connects to server at the first send, not at instantiation phase

@sabarivasan
Copy link
Author

sabarivasan commented Mar 3, 2020

  1. Fluentd runs on a Docker container in ECS Fargate. This means I don't have an EC2 node from which to run the tcpdump command. I am trying to see if there's another way to capture TCP logs.

FYI: I am able to get fluent-bit to send logs to fluentd through this same NLB successfully.

I am also able to send logs from a AWS Lambda written in Javascript using this
library: fluent-logger-node

@komamitsu
Copy link
Owner

Thanks again. I looked at the log files, but didn't find anything weird.

I'll create an similar environment to reproduce this issue, but it would be great if there is an endpoint to where I can connect from Fluency on my laptop.

@komamitsu
Copy link
Owner

I set up a similar Fluentd environment with NLB + ACM + EC2, but I couldn't reproduced this issue (SSLException: Broken pipe.) However, I ran into an issue that Fluentd behind NLB received invalid packets when both setJvmHeapBufferMode and setSslEnabled are set to true. I suspect this is a bug of Fluency that happens with the combination of SSL and JvmHeapBufferMode.

So, can I ask you to try with setJvmHeapBufferMode(false)? Actually you don't need to call the API since false is the default.

And I noticed your configuration seems a bit too restrict from the perspective of memory resource although it depends your use case. How about trying default values like this to isolate the root cause?

        FluencyBuilderForFluentd builder = new FluencyBuilderForFluentd();
        builder.setSslEnabled(true);
        Fluency fluency = builder.build(host, 24224);

BTW, I'm interested in the throughput, the number of connection reset and so on between your Fluency and Fluentd. Could you share a screenshot of CloudWatch metrics of the NLB on AWS console?

@sabarivasan
Copy link
Author

@komamitsu , I just wanted to let you know when I choose defaults for all parameters, it worked!
i then set my original parameters and just turned off JVMHeapBufferMode and things worked.
So it all boils down to ByteBuffer.allocate() vs ByteBuffer.allocatedirect().

Thanks for your support!

I plan to do some load testing on fluency soon. I will send you metrics that you asked for.

@sabarivasan
Copy link
Author

Also, is there any documentation on how I can configure the 3 buffer-related params:

bufferChunkInitialSize, bufferChunkRetentionSize and maxBufferSize?

I see that fluency manages a pool of buffers.

If my requirement is to flush the buffer when it reaches 1MB or 30 seconds, whichever comes first, how would I configure this when I create the fluency client?

@komamitsu
Copy link
Owner

komamitsu commented Mar 8, 2020

@sabarivasan Thanks for let me know it. I just fixed the issue at #155 and released it as version 2.4.1.

As for NLB's metrics I wanted to see which side reset the connections to isolate the root cause and I don't think we don't need the metrics now.

If my requirement is to flush the buffer when it reaches 1MB or 30 seconds, whichever comes first

Like this?

        FluencyBuilderForFluentd builder = new FluencyBuilderForFluentd();
        builder.setBufferChunkRetentionSize(1 * 1024 * 1024);
        builder.setBufferChunkRetentionTimeMillis(30 * 1000);
        Fluency fluency = builder.build();

Yeah, I think I should add some more explanations about the buffer configuration. I'll take care of it later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants