td-agent v3.7.1 ssl hostname does not match the server certificate #3028

toyaser · 2020-06-05T18:53:25Z

Describe the bug
We are using td-agent v3.7.1, which uses fluentd v1.10.2 and fluent-plugin-elasticsearch v4.0.7.

We have a local 3-node Elasticsearch cluster. After td-agent starts, it ships logs normally for around 18-20 hours, after which fluentd starts failing with the following error:

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=16 next_retry_seconds=2020-05-29 17:18:06 +0000 chunk="5a6904a4700dc751015bf6f7fb2e0bc1" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.1\" does not match the server certificate (OpenSSL::SSL::SSLError)"

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=17 next_retry_seconds=2020-05-27 02:35:12 +0000 chunk="5a690483041ba29bda96202b35491072" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.2\" does not match the server certificate (OpenSSL::SSL::SSLError)"

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=18 next_retry_seconds=2020-05-27 11:17:37 +0000 chunk="5a69048c8e1d158c8826c73a15f903b0" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.3\" does not match the server certificate (OpenSSL::SSL::SSLError)"
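The three warnings above differ only in `retry_time`, `next_retry_seconds`, `chunk`, and the address that failed verification. As a rough sketch for anyone triaging similar logs, the varying fields can be pulled out with a regular expression. The names `WARNING_RE`, `summarize`, and `SAMPLE` are made up for this illustration and are not part of fluentd or the plugin; the pattern is only fitted to the log lines quoted in this issue.

```python
import re

# Hypothetical helper, fitted only to the warning format quoted in this issue.
# The quotes around the failing address are backslash-escaped in the log,
# so the pattern accepts an optional backslash before each quote.
WARNING_RE = re.compile(
    r'retry_time=(?P<retry_time>\d+)'
    r'.*?chunk="(?P<chunk>[0-9a-f]+)"'
    r'.*?hostname \\?"(?P<peer>[^"\\]+)\\?" does not match the server certificate'
)


def summarize(log_text):
    """Return one dict per 'does not match the server certificate' warning."""
    return [
        {
            "retry_time": int(m["retry_time"]),
            "chunk": m["chunk"],
            "peer": m["peer"],
        }
        for m in WARNING_RE.finditer(log_text)
    ]


# First warning from this issue, copied verbatim (raw string keeps the backslashes).
SAMPLE = r'''2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=16 next_retry_seconds=2020-05-29 17:18:06 +0000 chunk="5a6904a4700dc751015bf6f7fb2e0bc1" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.1\" does not match the server certificate (OpenSSL::SSL::SSLError)"'''

if __name__ == "__main__":
    for entry in summarize(SAMPLE):
        print(entry)
```

Note that the configured host is `elasticsearch.mydomain.io`, while the address rejected by certificate verification is a bare IP (`10.0.0.1`, `10.0.0.2`, `10.0.0.3`).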

To Reproduce
Leave td-agent running for long enough.

Expected behavior
fluentd should continue to ship logs with no interruptions.

Your Environment

Fluentd or td-agent version: td-agent 3.7.1 (fluentd 1.10.2)
Operating system: Windows Server 2016
Elasticsearch version: 6.2.0

If you hit the problem with older fluentd version, try latest version first.

Your Configuration

<!-- Write your configuration here -->

Your Error Log

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=19 next_retry_seconds=2020-05-29 17:18:06 +0000 chunk="5a6904a4700dc751015bf6f7fb2e0bc1" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.1\" does not match the server certificate (OpenSSL::SSL::SSLError)"

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=16 next_retry_seconds=2020-05-27 02:35:12 +0000 chunk="5a690483041ba29bda96202b35491072" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.2\" does not match the server certificate (OpenSSL::SSL::SSLError)"

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=17 next_retry_seconds=2020-05-27 11:17:37 +0000 chunk="5a69048c8e1d158c8826c73a15f903b0" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.3\" does not match the server certificate (OpenSSL::SSL::SSLError)"

Additional context

Interestingly, logs are shipped consistently and then shipping suddenly stops. Also note that we have 3 separate servers, each shipping logs to the same Elasticsearch cluster, and all 3 eventually fail (at around the same time) with exactly the same error.

Restarting the fluentd service clears the issue, but any logs in the buffer are lost and have to be recovered manually.

Another point: the much older td-agent v3.1.1, which uses fluentd v1.0.2 and fluent-plugin-elasticsearch v2.4.0, works with no issues.

Using the old version of td-agent, we have been running for over a week with no issues.


repeatedly · 2020-06-09T12:24:45Z

This seems to be a fluent-plugin-elasticsearch issue, so I'm closing it.
I'm not sure, but elasticsearch-ruby's reconnection mechanism may be causing the problem.
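To illustrate why a reconnect could surface as this particular error: if the client ends up connecting to a node by IP address instead of the configured host name (not confirmed in this thread), verification will fail unless the certificate also lists that IP. Below is a simplified sketch of the matching rule, assuming the cluster certificate names only `elasticsearch.mydomain.io`; the certificate contents are an assumption, since the thread does not show them. The helper names `is_ip` and `peer_matches_cert` are hypothetical, and this is not OpenSSL's actual implementation (wildcards and other details are left out).

```python
import ipaddress


def is_ip(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def peer_matches_cert(peer, dns_names, ip_addresses=()):
    """Simplified identity check (illustration only, no wildcard support).

    An IP peer is compared only against IP entries in the certificate;
    a host name is compared case-insensitively against DNS entries.
    """
    if is_ip(peer):
        target = ipaddress.ip_address(peer)
        return any(ipaddress.ip_address(ip) == target for ip in ip_addresses)
    return peer.lower() in {name.lower() for name in dns_names}


# Assumed certificate contents: a single DNS name and no IP entries.
CERT_DNS_NAMES = ["elasticsearch.mydomain.io"]

if __name__ == "__main__":
    for peer in ("elasticsearch.mydomain.io", "10.0.0.1"):
        print(peer, peer_matches_cert(peer, CERT_DNS_NAMES))
```

Under that assumption, the configured host name passes the check, while any of the node IPs from the log would be rejected, which is consistent with the error message reported above.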

toyaser · 2020-06-11T18:46:43Z

Thank you, you are most likely correct. If anyone is following this, I created a ticket in the fluent-plugin-elasticsearch repo here

urpylka · 2024-04-15T09:47:46Z

Same problem with Opensearch output plugin on fluent/fluentd-kubernetes-daemonset:v1.16.3-debian-opensearch-2.1 (fluent-plugin-opensearch version 1.1.4)

toyaser changed the title ~~td-agent v3.7.1 ssl issue~~ td-agent v3.7.1 ssl hostname does not match the server certificate Jun 7, 2020

repeatedly closed this as completed Jun 9, 2020
