RabbitMQ output plugin blocks logstash to start if server is down #37

Open
ebuildy opened this issue Oct 9, 2020 · 5 comments

ebuildy commented Oct 9, 2020

  • Version: 7.9.2
  • Operating System: Linux, Centos, official Elastic Helm

We run Logstash with the RabbitMQ output, and our RabbitMQ server is down. Logstash never starts properly (that is, the HTTP API never comes up). It looks like it gets stuck at plugin registration, so I suspect that the connection attempt inside the register method is what prevents Logstash from starting. Is that possible?

If so, what about connecting lazily to the RabbitMQ server, when the first event arrives?
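
The lazy-connect idea above can be sketched as follows. This is only an illustration in Python (the plugin itself is written in Ruby), not the plugin's actual code: `LazyOutput`, `receive`, and the injected `connect` factory are hypothetical names, and the connection object is assumed to expose a `publish` method.

```python
from typing import Callable, Optional


class LazyOutput:
    """Hypothetical output that defers its broker connection to the first event."""

    def __init__(self, connect: Callable[[], object]):
        self._connect = connect            # hypothetical factory that opens a connection
        self._connection: Optional[object] = None
        self.connect_attempts = 0

    def register(self) -> None:
        # Startup hook: deliberately does no network I/O, so it cannot block startup.
        pass

    def receive(self, event: dict) -> bool:
        """Try to publish one event; return False if the broker is unreachable."""
        if self._connection is None:
            self.connect_attempts += 1
            try:
                self._connection = self._connect()
            except ConnectionError:
                return False               # caller may buffer the event and retry later
        self._connection.publish(event)
        return True
```

With this shape, `register` returns immediately even when the broker is down, and the first call to `receive` is the earliest point at which a connection failure can surface.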

Our use case:

We use Logstash as an HTTP service to send data to RabbitMQ. The RabbitMQ server can be down for several hours. Our infrastructure is a Kubernetes cluster where Logstash is scaled via HPA, and we use Logstash's in-memory queue as a buffer in front of RabbitMQ. If the RabbitMQ server is down, new Logstash pods are created, so the queue is "load-balanced".

But since the RabbitMQ output plugin blocks Logstash from starting when the server is down, nothing happens.

(Reference > elastic/logstash#12330)

mbitzos commented Jan 6, 2021

I can confirm that this is happening with our system as well.

Our system is set up as follows:

We have N different pipelines, each with this output:

pipeline-N.conf:

output {
    pipeline {
        id => "RABBIT-FORWARDER"
        send_to => "rabbitmq"
        ensure_delivery => false
    }
}

And then we have a single RabbitMQ pipeline that is set up like this:
rabbit.conf:

input {
    pipeline {
        id => "RABBIT-PIPELINE"
        address => "rabbitmq"
    }
}
output {
    rabbitmq {
        id                     => "ALL-RABBIT"
        exchange               => "***"
        exchange_type          => "direct"
        key                    => "*****"
        host                   => [ **** ]
        durable                => true
        persistent             => false
        connect_retry_interval => 60
        user                   => "***"
        password               => "***"
    }
}

Assume each "***" is an actual value.

The error occurs in this use-case:

  1. Logstash is down
  2. RabbitMQ is not running/not reachable
  3. Start Logstash

At this point, Logstash correctly handles the packets being sent to it (sending to Elastic in our case, and not sending to RabbitMQ for obvious reasons). But the really strange thing is that the Logstash HTTP API is completely unreachable.
Trying to get the health status of the Logstash instance via the /_node/pipelines HTTP API does not work, because the API is unreachable.
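
A small probe along these lines can be used to check whether the API answers. It is a sketch, not an official tool: `api_reachable` is a hypothetical helper, and it assumes the API listens on the default port 9600 on localhost, so adjust the base URL for your deployment.

```python
import urllib.error
import urllib.request


def api_reachable(base_url: str = "http://localhost:9600", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/_node/pipelines answers with HTTP 200."""
    url = base_url.rstrip("/") + "/_node/pipelines"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Logstash API reachable:", api_reachable())
```

In the failure scenario described here, this probe would be expected to return False until RabbitMQ becomes reachable again.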

Once RabbitMQ is turned back on, the Logstash HTTP API works fine again and we can get the health status.
Also, everything works fine if we follow the steps above but RabbitMQ IS running before Logstash starts.

So it definitely seems that the failing initial RabbitMQ connection is somehow causing the Logstash HTTP server to be blocked or to crash?

My (only slightly) educated guess is that this might have something to do with March Hare and Faraday?
The only reason I mention that is that if we run Logstash with debug logging, we get A LOT of these messages printed to our console:
WARNING: Unexpected middleware set after the adapter. This won't be supported from Faraday 1.0.
Not sure if this is a by-product or a symptom of the issue.

Anyway, this is causing a lot of problems for us, and our only solution is to use an older version of the Logstash output plugin with our own fixes to catch the connection failure. But this is less than ideal, since a lot of changes have happened since then and we would rather just use the built-in plugin to avoid more unnecessary in-house fixes.
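
One general way to catch a failing connection without blocking startup is to retry it in a background thread. The sketch below illustrates that pattern in Python only; it is not the in-house fix mentioned above, nor the plugin's code. `BackgroundConnector` and the injected `connect` factory are hypothetical, and the 60-second default merely mirrors the connect_retry_interval from the config above.

```python
import threading
from typing import Callable, Optional


class BackgroundConnector:
    """Hypothetical helper: retries a connection in a daemon thread."""

    def __init__(self, connect: Callable[[], object], retry_interval: float = 60.0):
        self._connect = connect              # hypothetical factory that opens a connection
        self._retry_interval = retry_interval
        self._connection: Optional[object] = None
        self._ready = threading.Event()
        self._stop = threading.Event()

    def start(self) -> None:
        # Returns immediately; the retry loop runs off the startup path.
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                self._connection = self._connect()
                self._ready.set()
                return
            except ConnectionError:
                # Wait before retrying, but wake early if stop() is called.
                self._stop.wait(self._retry_interval)

    def connection(self, timeout: float = 0.0) -> Optional[object]:
        """Return the connection, or None if it is not established within timeout."""
        return self._connection if self._ready.wait(timeout) else None

    def stop(self) -> None:
        self._stop.set()
```

Callers would check `connection()` before publishing and decide what to do with events while it still returns None (buffer, drop, or retry).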

@massimiliano-dalcero

Today I can confirm that this is still an open problem :(

It's a really annoying problem that is causing a lot of trouble :(

massimiliano-dalcero commented Apr 23, 2022

I dug into the code and fixed the bug (YES, it is a BUG, and it's not so hard to fix; very few changes are needed). At the moment I need to do some more testing, but after that I will try to submit the code :)

ebuildy commented Apr 24, 2022 via email

Oh, really good news! Many thanks 👍

On Sat, Apr 23, 2022 at 23:40, Massimiliano Dal Cero < @.***> wrote:

I put my hands in the code and I solved (it's not so hard, very few changes are needed). At the moment I need some more test, but after this I try to submit code :)

massimiliano-dalcero commented Apr 24, 2022

It is absurd that the official developers have not fixed this :(

But mainly, how did they not notice it? Probably they never ran any tests in a real environment :(
