[filebeat] add grok functionality to preparse log lines #679

Closed
stuart-warren opened this Issue Jan 9, 2016 · 13 comments

stuart-warren commented Jan 9, 2016

Sounds like a pretty obvious win for efficiency, especially since those creating the logs are the most likely to know what format they are in.

There are already a few Go bindings for grok in existence:
https://github.com/jbuchbinder/gogrok
https://github.com/blakesmith/go-grok
https://github.com/scalingdata/go-grok

Happy to take a crack at it myself
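
As a rough sketch of the idea (grok patterns ultimately expand to regular expressions with named captures), here is what such preparsing could look like using nothing but Go's standard regexp package, rather than any of the bindings above:

    package main

    import (
        "fmt"
        "regexp"
    )

    // A grok expression like "%{IP:client} %{WORD:method} %{URIPATHPARAM:request}"
    // is ultimately expanded into a regexp with named capture groups, roughly:
    var line = regexp.MustCompile(
        `^(?P<client>\d+\.\d+\.\d+\.\d+) (?P<method>[A-Z]+) (?P<request>\S+)`)

    // parse extracts the named fields from one log line; nil means no match.
    func parse(s string) map[string]string {
        m := line.FindStringSubmatch(s)
        if m == nil {
            return nil
        }
        fields := make(map[string]string)
        for i, name := range line.SubexpNames() {
            if i > 0 && name != "" {
                fields[name] = m[i]
            }
        }
        return fields
    }

    func main() {
        fmt.Println(parse("10.0.0.1 GET /index.html"))
        // prints: map[client:10.0.0.1 method:GET request:/index.html]
    }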

tsg (Collaborator) commented Jan 11, 2016

Hi, we don't plan to add Grok functionality to Filebeat because we want to keep it light in terms of CPU requirements. Instead, there are plans to add Grok functionality to Elasticsearch itself.

This means that you will be able to send logs from Filebeat to Elasticsearch directly and still get parsing features. Until this is implemented in Elasticsearch, we recommend using Filebeat -> Logstash -> Elasticsearch for parsing the logs. Even once this is possible without Logstash, keeping Logstash in the pipeline will give you more flexibility (output to other systems, queuing, Ruby plugins, etc.).
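
For reference, a minimal sketch of the Logstash leg of that recommended pipeline; the port, grok pattern, and host below are placeholders, not values from this thread:

    input {
      beats {
        port => 5044
      }
    }
    filter {
      grok {
        # placeholder pattern; replace with whatever matches your logs
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
    }
    output {
      elasticsearch {
        hosts => ["localhost:9200"]
      }
    }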

Closing the ticket, but happy to discuss further.

tsg closed this Jan 11, 2016

stuart-warren commented Jan 12, 2016

Hi,

So that sounds a bit backwards to me.
Let's say I have 1000+ servers owned by various development teams, running all sorts of applications and creating logs in heaven knows what sort of format.
There's a central Logstash/Elasticsearch cluster owned by Operations and used by everyone.

Now, rather than some poor Ops guys having to understand and keep up with all the crazy input possibilities, we delegate the job to the devs to do the grokking on their own servers using Filebeat configuration.

Not only do we delegate this massive job to those with the knowledge, we are also less likely to break the log pipeline, since we avoid tweaking Logstash or these new Elasticsearch settings. Plus, we spread the load of the grokking over all the app servers rather than concentrating it on a few poor infrastructure boxes.

Ideally, Filebeat etc. should be for getting logs from their source to some central point.
Logstash should be for doing any extra processing/routing of logs.
Elasticsearch should be for storage.

Bundling log processing into Elasticsearch seems like a blatant violation of the "Do one thing and do it well" Unix philosophy.

Megabibite commented Jun 1, 2016

As one of the 'poor Ops guys having to understand and keep up with all the crazy input possibilities', I completely agree with you.

yissachar commented Jun 1, 2016

We find ourselves in the same situation described by @stuart-warren. Centralized log parsing (Logstash or Elasticsearch) is not maintainable in a microservice context, where each service has its own parsing requirements. Instead we want each service to parse its own logs before they are shipped off.

I don't understand the CPU argument. Adding grok functionality shouldn't inherently bloat Filebeat. If it's not used, Filebeat CPU usage should remain lightweight, and if it is used and increases CPU usage, that's a tradeoff that we should be allowed to make.

Megabibite commented Jun 1, 2016

Regarding CPU, I limit the log shipper's usage on the server anyway, in order to guarantee it cannot be the cause of any perf issue.
And I want to be alerted if my logs are too verbose and the log shipper cannot keep up.
I want it to drop the logs after a certain delay, alert me of this, and resume on live/current logs.

reqless commented Jun 21, 2016

Is parsing using Filebeat still not available? We don't want the added overhead of Logstash; we want to stick with an 'EK' cluster (Elasticsearch + Kibana).

ruflin (Collaborator) commented Jun 22, 2016

Parsing in Filebeat is not available and is currently not on the roadmap, as the goal of Beats is to stay as lightweight as possible. One option you have with the 5.0 releases is to send structured JSON logs directly.
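
A minimal sketch of that option in a Filebeat 5.x prospector config (the path is hypothetical):

    filebeat.prospectors:
      - input_type: log
        paths:
          - /var/log/myapp/*.json    # hypothetical path; one JSON object per line
        json.keys_under_root: true   # lift the decoded fields to the top level of the event
        json.add_error_key: true     # mark events whose lines fail to decode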

In 5.0, Elasticsearch has the ingest node feature, which allows you to do some basic parsing. Have a look here: https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html
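
For example, a sketch of an ingest pipeline using the grok processor (the pipeline name and pattern are placeholders); Filebeat 5.x can then reference it via the pipeline setting of its Elasticsearch output:

    PUT _ingest/pipeline/my-app-logs
    {
      "description": "hypothetical pipeline that parses app log lines at index time",
      "processors": [
        {
          "grok": {
            "field": "message",
            "patterns": ["%{IP:client} %{WORD:method} %{URIPATHPARAM:request}"]
          }
        }
      ]
    }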

yissachar commented Jun 24, 2016

Hi @ruflin, can you expand on the "lightweight" argument? As I mentioned in my last comment, adding this feature should not impose any extra overhead on those who don't make use of it.

ruflin (Collaborator) commented Jun 28, 2016

@yissachar There are lots of facets to "lightweight", and definitely lots of gray areas. For me, lightweight is on the one hand CPU/memory usage, but on the other hand also the features and complexity of the project itself. Even though some features, when not enabled, don't necessarily add CPU/memory overhead, they still add complexity to the project, which makes it potentially harder to maintain. In your case you are well aware of what the consequences of enabling such a feature can be, but most people expect that when they enable an existing feature, something lightweight stays lightweight.

A simple example we already have to tackle: filters are sometimes very slow and cause the beat to use 80-100% of CPU. In most cases the reason is that very inefficient regexp expressions like .*abc.* are used, so it's not directly a Beats issue, but it still makes Filebeat look not lightweight anymore.

For the grok part, I think there are two facets: one is grok patterns to simplify writing regexp expressions, for example for filters. The other is adding log line processing with grok, which is a different beast, and as far as I understand that is what you are interested in.

In summary: lightweight is not a "fixed" criterion, and the Beats project will also evolve over time. All I can say at the moment is that this is not at the top of our priority list, as we think there are products in our stack which are much better at this task.

pbkdf3 commented Jul 31, 2016

Logstash can run wherever you want; it has a file input and all the outputs you want, including Logstash itself. Just use it instead of Filebeat if you want to grok in a distributed manner.
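
A minimal sketch of such a per-node Logstash; the paths, pattern, and hosts are hypothetical:

    input {
      file {
        path => "/var/log/myapp/*.log"    # hypothetical path
      }
    }
    filter {
      grok {
        # placeholder pattern; replace with whatever matches your logs
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
    }
    output {
      elasticsearch {
        hosts => ["es-central:9200"]      # hypothetical central cluster
      }
    }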

AdamBraun commented Apr 2, 2017

I'll add to what @pbkdf3 said, as a reference for those who stumble on this issue in the future and feel that @yissachar's argument about microservices was left unanswered:
Since Logstash can run anywhere, a possible microservice architecture would be to have a centralized Logstash instance for each service's group of instances. Each service instance would run a lightweight Filebeat alongside it, reporting to the Logstash instance dedicated to that service alone. That Logstash would then send the tailor-parsed logs either to the main Logstash or to Elasticsearch itself.
This allows for both scalability and that luxurious personal-parsing feel that we all wish for 😉
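
For illustration, the Filebeat side of that layout would be nothing more than a shipper config along these lines (the service name, path, and host are hypothetical):

    # filebeat.yml on each instance of service "foo" (hypothetical name)
    filebeat.prospectors:
      - input_type: log
        paths:
          - /var/log/foo/*.log          # hypothetical path
    output.logstash:
      hosts: ["logstash-foo:5044"]      # the Logstash dedicated to this service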

yissachar commented Apr 5, 2017

We ended up just going with Fluentd in place of Filebeat. All other solutions had tradeoffs that we weren't comfortable with.

vortex314 commented Sep 10, 2018

As Filebeat is open source, it's open for customization, and that's exactly what we did: https://discuss.elastic.co/t/filebeat-with-grok-javascript-and-avro-codec/147890
