td-agent v3.7.1 ssl hostname does not match the server certificate #3028

Closed
toyaser opened this issue Jun 5, 2020 · 3 comments

Comments

toyaser commented Jun 5, 2020

Describe the bug
We are using td-agent v3.7.1, which bundles fluentd 1.10.2 and fluent-plugin-elasticsearch 4.0.7.

We have a local 3-node Elasticsearch cluster. After td-agent starts, it ships logs normally for around 18-20 hours, and then fluentd starts failing with the following errors:

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=16 next_retry_seconds=2020-05-29 17:18:06 +0000 chunk="5a6904a4700dc751015bf6f7fb2e0bc1" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.1\" does not match the server certificate (OpenSSL::SSL::SSLError)"

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=17 next_retry_seconds=2020-05-27 02:35:12 +0000 chunk="5a690483041ba29bda96202b35491072" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.2\" does not match the server certificate (OpenSSL::SSL::SSLError)"

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=18 next_retry_seconds=2020-05-27 11:17:37 +0000 chunk="5a69048c8e1d158c8826c73a15f903b0" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.3\" does not match the server certificate (OpenSSL::SSL::SSLError)"
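The errors say that the name the TLS client connected to, here a bare node IP rather than the configured `elasticsearch.mydomain.io`, is not a name the server certificate is valid for. The simplified Python sketch below shows why such a check fails. It assumes (the issue does not show the certificate) that the certificate's subjectAltName lists only the DNS name `elasticsearch.mydomain.io`, and it follows the RFC 2818 rule that an IP address is matched only against IP-address entries, never against DNS names. `host_matches_cert` is a hypothetical helper, not the OpenSSL or Ruby API.

```python
from __future__ import annotations

import ipaddress


def host_matches_cert(host: str, san: list[tuple[str, str]]) -> bool:
    """Simplified check of `host` against a certificate's subjectAltName entries.

    `san` uses the shape of Python's ssl getpeercert() output, e.g.
    [("DNS", "elasticsearch.mydomain.io"), ("IP Address", "10.0.0.1")].
    """
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # not an IP literal, so treat it as a DNS name

    for kind, value in san:
        if ip is not None:
            # An IP literal must equal an "IP Address" entry; DNS entries never match it.
            if kind == "IP Address" and ipaddress.ip_address(value) == ip:
                return True
        elif kind == "DNS" and _dns_name_matches(host.lower(), value.lower()):
            return True
    return False


def _dns_name_matches(host: str, pattern: str) -> bool:
    if pattern.startswith("*."):
        # A wildcard covers exactly one left-most label:
        # "*.mydomain.io" matches "es.mydomain.io" but not "a.b.mydomain.io".
        first, _, rest = host.partition(".")
        return bool(first) and rest == pattern[2:]
    return host == pattern


# Hypothetical certificate issued only for the DNS name used in the configuration.
cert_san = [("DNS", "elasticsearch.mydomain.io")]
print(host_matches_cert("elasticsearch.mydomain.io", cert_san))  # True
print(host_matches_cert("10.0.0.1", cert_san))  # False: no "IP Address" entry
```

In this model the check passes either when the client keeps connecting by the DNS name or when the certificate also carries the node IPs as IP-address entries.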

To Reproduce
Leave td-agent running for long enough (around 18-20 hours in our setup).

Expected behavior
fluentd should continue to ship logs with no interruptions.

Your Environment

  • Fluentd or td-agent version: td-agent 3.7.1 (fluentd 1.10.2, fluent-plugin-elasticsearch 4.0.7)
  • Operating system: Windows Server 2016
  • Elasticsearch version: 6.2.0

Your Configuration

Not provided.

Your Error Log

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=19 next_retry_seconds=2020-05-29 17:18:06 +0000 chunk="5a6904a4700dc751015bf6f7fb2e0bc1" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.1\" does not match the server certificate (OpenSSL::SSL::SSLError)"

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=16 next_retry_seconds=2020-05-27 02:35:12 +0000 chunk="5a690483041ba29bda96202b35491072" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.2\" does not match the server certificate (OpenSSL::SSL::SSLError)"

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=17 next_retry_seconds=2020-05-27 11:17:37 +0000 chunk="5a69048c8e1d158c8826c73a15f903b0" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.3\" does not match the server certificate (OpenSSL::SSL::SSLError)"

Additional context

Interestingly, logs ship consistently and then suddenly stop. We also have 3 separate servers, each shipping logs to the same Elasticsearch cluster, and all 3 eventually fail at around the same time with exactly the same error.

Restarting the fluentd service clears the issue, but any logs still in the buffer are lost and have to be recovered manually.

Note also that a much older td-agent, v3.1.1 (fluentd v1.0.2, fluent-plugin-elasticsearch v2.4.0), works with no issues.

Using the old version of td-agent, we have been running for over a week with no issues.

@toyaser toyaser changed the title td-agent v3.7.1 ssl issue td-agent v3.7.1 ssl hostname does not match the server certificate Jun 7, 2020
@repeatedly
Member

This seems to be a fluent-plugin-elasticsearch issue, so I'm closing this.
I'm not sure, but elasticsearch-ruby's reconnection mechanism may be causing the problem.
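The reconnection hypothesis can be illustrated with a toy Python model that reproduces the reported pattern: requests succeed for a while, then fail with errors naming node IPs, and a restart clears the problem. It assumes, which this thread does not confirm, that the client periodically reloads its connection list by asking the cluster for its nodes and then connects to the addresses the nodes publish (IPs) instead of the configured DNS name. `ToyClient`, `reload_after`, and the node addresses are hypothetical stand-ins, not the elasticsearch-ruby or fluent-plugin-elasticsearch API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

CONFIGURED_HOST = "elasticsearch.mydomain.io"
# Hypothetical: the only name the server certificate is valid for.
CERT_NAMES = {CONFIGURED_HOST}
# Hypothetical: addresses the three nodes report when the client asks the cluster for its nodes.
PUBLISHED_NODE_ADDRESSES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


class SSLHostnameError(Exception):
    pass


@dataclass
class ToyClient:
    reload_after: int  # hypothetical: rebuild the host list every N requests
    hosts: list[str] = field(default_factory=lambda: [CONFIGURED_HOST])
    requests: int = 0

    def _reload_connections(self) -> None:
        # Replace the configured DNS name with the addresses the nodes publish.
        self.hosts = list(PUBLISHED_NODE_ADDRESSES)

    def send(self) -> str:
        self.requests += 1
        if self.requests % self.reload_after == 0:
            self._reload_connections()
        host = self.hosts[self.requests % len(self.hosts)]  # simple round-robin
        if host not in CERT_NAMES:  # stands in for TLS hostname verification
            raise SSLHostnameError(f'hostname "{host}" does not match the server certificate')
        return host


def run(client: ToyClient, n: int) -> list[str]:
    """Send n requests and record either the host used or the error text."""
    results = []
    for _ in range(n):
        try:
            results.append(client.send())
        except SSLHostnameError as exc:
            results.append(str(exc))
    return results


# Two successful requests, then the reload on request 3 switches to node IPs and every request fails.
for line in run(ToyClient(reload_after=3), 5):
    print(line)
# A "restart" is a fresh client, which starts again from the configured hostname.
print(run(ToyClient(reload_after=3), 1))
```

Because the failure in this model is triggered by a request count rather than by clock time, servers sending similar volumes would fail at roughly the same time, which is consistent with (but does not prove) the behavior described in the report.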

@toyaser
Author

toyaser commented Jun 11, 2020

Thank you, you are most likely correct. For anyone following this, I created a ticket in the fluent-plugin-elasticsearch repo here.

@urpylka

urpylka commented Apr 15, 2024

Same problem with the OpenSearch output plugin on fluent/fluentd-kubernetes-daemonset:v1.16.3-debian-opensearch-2.1 (fluent-plugin-opensearch version 1.1.4).
