Support for Redis #18

Closed
rtoma opened this Issue Dec 17, 2012 · 16 comments

rtoma commented Dec 17, 2012

Hi Jordan,

Kudos on your Logstash and Lumberjack work!

Wouldn't it be cool to have lumberjack send fresh timber directly to Redis?
I am currently designing a Logstash -> Elasticsearch architecture in which Redis plays a major role.

My high-level design looks like this:

  • logging - local on ~200 servers (not on shared storage, to avoid nfs mayhem)
  • solution X

--- from here on it's centralized/scaled ---

  • redis - a buffer between input and parsing LS instances
  • logstash (parsing) - centralized, N instances with M workers
  • redis - buffer between parsing & forwarding LS instances
  • logstash (forwarding) - doing nothing more than redis.blpop and push to ES
  • elasticsearch
  • kibana

I could run another tier of LS instances as "solution X", but running ~200 LS instances seems like overkill and wasteful. I am sure you agree.

So I was thinking: wouldn't it be cool to use Lumberjack with Redis support?

What are your ideas?

Regards,
Renzo

Contributor

jordansissel commented Dec 18, 2012

There are other projects similar to lumberjack that support redis:

I don't anticipate supporting redis. The lumberjack protocol has requirements that redis doesn't support (compression, encryption, message acknowledgement). In fact, lumberjack probably wouldn't exist if the redis developers were amenable to patches that provided compression, encryption, etc, but they refuse.

Contributor

jordansissel commented Dec 18, 2012

(closing, but still willing to discuss options!)

vincentbernat commented Jan 14, 2014

Would you consider patches that add Redis support (without compression, encryption, or message acknowledgment)? Lumberjack is interesting not only because of its protocol but also because it is fast and small. Other solutions with Redis support are written in a slower language.

Contributor

jordansissel commented Jan 14, 2014

I'm still open to discussion, but I'm confident that the lumberjack protocol gives us more benefit than using redis, so I'll throw a bunch of questions at you -

What is a "slower language"? Given it's easy to write slow or buggy code in any language, why are we blaming the language and not a bug in the shipper code or design?

There are so many shipping options available out there. Beaver, logstash, rsyslog, and many many others.
What shipper(s) have you tried and what was too slow about them? Were they slow on purpose? If not, can we fix those bugs?

What benefits are added by adding redis support? What about the lumberjack protocol doesn't do what you need? Can we fix the protocol?

vincentbernat commented Jan 14, 2014

As for Redis, I see two advantages to having a broker:

  1. I can restart logstash without losing messages.
  2. I can add more logstash without reconfiguring every client.

I am unsure if logstash-forwarder could fix the first point. Is it the spool-size?

I have tried Beaver, but it is quite slow at reading very big files. That is problematic for my use case, because it reads files slower than they are being written. It is also known to suffer from a memory leak. I was told to try the Perl equivalent, which could be a solution. I didn't consider rsyslog. As far as I know, I cannot easily attach tags and things like that to preprocess the logs. But maybe I am wrong.

Contributor

jordansissel commented Jan 14, 2014

  1. the lumberjack protocol uses reliable messaging to ensure messages are not lost.
  2. you can achieve this today by using dns to list downstream logstash servers (logstash-forwarder resolves dns and picks an address from that list). Alternately, you can use haproxy.

Regarding 'fixing the first point' - the lumberjack protocol has application-level message acknowledgements to prevent loss of messages. If you see message loss, it's either a bug or something else going wrong.

Regarding beaver, have you raised a bug on the project? These certainly sound like bugs if it's slow at reading files and additionally leaks memory.

Regarding rsyslog, you would be using it in this case for the same purpose as beaver or logstash-forwarder. Simply ship logs from files to logstash just like anything else.

vincentbernat commented Jan 14, 2014

For 1, what happens if you get a "connection refused" or something like that? Is the message dropped? If it is buffered, how can I control how much buffer to use?

For 2, you are right, haproxy would work just fine.

For beaver, the memory leak is a known problem: python-beaver/python-beaver#186. For rsyslog, the inability to add types and tags is limiting compared to beaver and logstash-forwarder.

Contributor

jordansissel commented Jan 14, 2014

On connection refused, logstash-forwarder will simply retry another server (possibly the same one). No messages are lost. Messages are only forgotten once they are acknowledged by a downstream server. There's little or no buffering in a typical sense because the files you are reading with lsf are the buffer/queue.

vincentbernat commented Jan 14, 2014

OK, I thought that there was internal buffering, but instead the whole chain will just wait if we cannot send messages to logstash. This is all fine by me then.

untergeek added a commit that referenced this issue Mar 13, 2014

Merge pull request #18 from untergeek/1.4.x
Fix package building, version bump

kayrus commented Apr 11, 2014

I also prefer redis support.

kayrus commented Apr 11, 2014

When I tried to use logstash-forwarder as a simple forwarder to a logstash parser, it used just one logstash parser (even though I had defined several), so performance was very low.
Schematically it was:
logstash-forwarder -> logstash cluster with elasticsearch (actually just one logstash server from the cluster)

When I tried to forward logs to a local logstash without any parsing (it just sends the logs to redis, and the logstash parser cluster then pulls them from redis), performance increased. I don't need SSL for that, so it would be great if I could disable it.
Schematically it was:
logstash-forwarder -> logstash -> redis -> (logstash cluster with elasticsearch)

So I guess that if logstash-forwarder could send logs directly to redis, performance would be much higher, and it would no longer be necessary to keep a logstash instance just for writing logs to redis.
Schematically:
logstash-forwarder -> redis -> (logstash cluster with elasticsearch)

Contributor

jordansissel commented Apr 11, 2014

This project is intended to be lightweight, simple, secure, reliable, and fast. Adding additional message transport options (redis, rabbitmq, etc) adds weight and complexity.

Most of the requests I see to have lsf output to other transports are based on ideas that seem to mistake bugs for intentional behaviors. lsf is supposed to be fast over the lumberjack protocol. If it isn't, either the downstream (logstash) is slow, or there's a bug. We can fix bugs without adding complexity (new transports).

wjimenez5271 commented Oct 9, 2014

I apologize if this isn't the right forum for this, but this issue seemed closely related to my interest. I'm wondering whether the pure logstash-forwarder + lumberjack model addresses the buffering / event spike concern, and if not, what the best practice for doing so is. I get the sense from @jordansissel's comments that a queue is unnecessary (but I could be misinterpreting his intent). Should I assume the lumberjack input has a built-in mechanism for buffering events it can't process immediately? Thanks in advance.

Contributor

driskell commented Oct 9, 2014

Hi @wjimenez5271 - the forwarder waits for logstash to finish the current spool of events before sending the next, so it will slow down as needed. The only issue at the moment is that if logstash runs really slowly due to huge bulks of events, forwarders can time out and reconnect faster than logstash realises, causing it to crash. If you increase the network timeout enough and/or reduce the spool size on the forwarders enough, you can live completely without a redis queue. This only applies when spikes cause the logstash processing time to exceed the network timeout. For light loads everything is fine.
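
As a rough illustration of that backpressure (not the forwarder's actual code), here is a Go sketch in which a spooler batches events and does not read any more input until the current batch has been published. The `spool` function, the spool size of 2, and the `publish` callback are assumptions for the example, and the idle-timeout flush that a real spooler would need is left out.

```go
package main

import "fmt"

// spool groups events from in into batches of at most size events and hands
// each batch to publish. It reads no further events until publish returns,
// so a slow downstream automatically slows the reader down.
func spool(in <-chan string, size int, publish func([]string)) {
	if size < 1 {
		size = 1
	}
	batch := make([]string, 0, size)
	for ev := range in {
		batch = append(batch, ev)
		if len(batch) == size {
			publish(batch) // blocks until the batch is acknowledged
			batch = make([]string, 0, size)
		}
	}
	// Flush whatever is left once the input is closed.
	if len(batch) > 0 {
		publish(batch)
	}
}

func main() {
	// An unbuffered channel: the producer blocks while the spooler is busy
	// publishing, which is the backpressure described above.
	in := make(chan string)
	go func() {
		for i := 1; i <= 5; i++ {
			in <- fmt.Sprintf("event %d", i)
		}
		close(in)
	}()

	spool(in, 2, func(batch []string) {
		// Stand-in for "send to logstash and wait for the acknowledgement".
		fmt.Println("sent and acknowledged:", batch)
	})
}
```

A smaller spool size means each round trip carries fewer events, which keeps the downstream processing time per batch further below the network timeout.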

My PR #180 made it so you don't need to reduce the spool size or increase network timeouts. I have since forked my changes into my own project, log-courier, so I've stopped maintaining that PR and closed it.

nullsign commented Oct 20, 2014

@jordansissel

Redis has a very useful role in large-scale logstash environments: it can act as a buffer between the agents and the ES nodes, and the indexer can run against the data in Redis without issue.

Your LSF would be very handy if it could send unencrypted data to Redis, as I could then use it on a number of Windows servers to send their log data to Redis for the above architecture. Without that feature, I cannot use your tool in that architecture.

Please reconsider adding this feature to LSF.

Contributor

jordansissel commented Oct 20, 2014

@rcaston Logstash already supports this (redis output). Recommend you use that.

urishalit pushed a commit to Aloomaio/logstash-forwarder that referenced this issue Oct 27, 2015
