
Feature request: filter log records based on JSON fields #101

Open
JensErat opened this issue May 7, 2020 · 3 comments


JensErat commented May 7, 2020

If I get it right, grok_exporter currently uses JSON fields other than the log message (from the webhook tailer) only for assigning labels. I'd like to propose a change to also filter based on JSON fields.

Here is an example configuration, but I'm also fine with other approaches if anybody comes up with a better proposal:

metrics:
# example log record: { "metadata": {"app": "foo"}, "stage": "prod", "log": "bad error event: 42!" }
- type: counter
  name: myapp_errors_total
  help: Total number of errors for my app
  match: 'bad error event'
  filters:
  - path: '.metadata.app' # jsonpath
    value: 'foo' # maybe even regex support?
  labels:
    stage: '{{ index .extra "stage" }}'

Before evaluating the grok matcher, grok_exporter would loop over all filters and only pass those records that match all of them.
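To make the intended evaluation order concrete, here is a hypothetical Python sketch (not grok_exporter code): a plain regex stands in for the grok matcher, and a simple dotted-path lookup stands in for full jsonpath support.

```python
import re

def resolve_path(record, path):
    """Resolve a dotted path like '.metadata.app' against a parsed JSON record."""
    value = record
    for key in path.strip(".").split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

def passes_filters(record, filters):
    """A record is passed on only if every configured filter matches."""
    return all(resolve_path(record, f["path"]) == f["value"] for f in filters)

record = {"metadata": {"app": "foo"}, "stage": "prod", "log": "bad error event: 42!"}
filters = [{"path": ".metadata.app", "value": "foo"}]

# The (expensive) pattern match only runs for records that survived the filters.
if passes_filters(record, filters):
    matched = re.search(r"bad error event", record["log"]) is not None
else:
    matched = False
```

The point of the ordering is that the cheap field comparison rejects unrelated records before the regular expression is ever evaluated.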

Why do I want this?

  • make sure we only match the correct logs, not unrelated logs from another component (I might not even have control over the output)
  • performance improvements; I'd guess matching a few JSON fields is much cheaper than evaluating potentially complex grok patterns

Provided there is agreement on this design proposal and we actually go the grok_exporter route (not decided yet, and it might take some time), we would also contribute code.


fstab (Owner) commented May 9, 2020

I think it sounds like a good idea. grok_exporter was originally made for plain text log lines, but more and more people are using it for JSON, so it would be good to support JSON better.

This is partly related to #97, the conditions proposed there are similar to the filters in your proposal. We could try and work something out together.


JensErat commented May 9, 2020

I guess this is the new plain text log stream: metadata from the orchestration layer (Kubernetes, or anything else if that still exists) followed by the plain text message. It's a great way to pre-filter when grepping for logs -- it works out pretty well with log databases like Loki (filter out most of the logs using metadata, then apply a linear search for whatever you're looking for), but also with stream evaluation where complex regular expressions are applied (like grok_exporter).

Yes, the other proposal also sounds promising. I'd push for making this as performant as possible (it's a great way to pre-filter on high-load systems before applying the rather expensive grok patterns), but generic enough that it fits most use cases. Our use case involves hierarchical JSON attributes (hence jsonpath), but if somebody comes up with a more generic solution, let's go for that!

We have log messages mostly like this one:

{ "time": "iso-timestamp", "kubernetes": {"pod": "coredns-fb8b8dccf-bgrsr", "namespace": "kube-system", "container": "coredns"}, "hostname": "node-123", "log": "example message"}
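A filter on a record like this has to descend into nested objects, which is why jsonpath comes up. A minimal, hypothetical sketch of such a nested lookup (a simple dotted-path walk, not a full jsonpath implementation):

```python
import json

log_line = '{ "time": "iso-timestamp", "kubernetes": {"pod": "coredns-fb8b8dccf-bgrsr", "namespace": "kube-system", "container": "coredns"}, "hostname": "node-123", "log": "example message"}'
record = json.loads(log_line)

def lookup(record, path):
    """Walk nested dicts for a path like '.kubernetes.namespace'."""
    value = record
    for key in path.strip(".").split("."):
        value = value.get(key) if isinstance(value, dict) else None
        if value is None:
            return None
    return value

namespace = lookup(record, ".kubernetes.namespace")  # "kube-system"
```

A missing path simply resolves to nothing, so such a filter would reject records that lack the field rather than raising an error.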

I'll go over the other proposal next week and see where the two already overlap anyway.


Skeen commented May 9, 2020

Hi @JensErat,

I am the author of the other proposal. I think our proposals are somewhat equivalent, and that we should strive to find a common solution.

The filter you present:

  filters:
  - path: '.metadata.app' # jsonpath
    value: 'foo' # maybe even regex support?

Could be expressed with:

  conditions:
    - '{{ eq (index .extra "metadata" "app") "foo" }}'

in the proposal I wrote. My proposal is objectively harder to read, and likely also less performant, but it does have the advantage of greater expressiveness through go templates. I think it would be good to find a middle ground.
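Conceptually, both variants boil down to the same thing: a predicate over the parsed record. The template condition above corresponds to roughly this (hypothetical Python standing in for go-template evaluation, only to illustrate the equivalence):

```python
def condition(extra):
    # Equivalent of: {{ eq (index .extra "metadata" "app") "foo" }}
    return extra.get("metadata", {}).get("app") == "foo"

matching = condition({"metadata": {"app": "foo"}, "stage": "prod"})
non_matching = condition({"metadata": {"app": "bar"}})
```

The structured filter syntax fixes the predicate's shape (path equals value), while the template variant lets users write arbitrary predicates at the cost of readability and speed.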

I think filters is a better name than conditions, and that we should go with filters.

Additionally, I think we should add a precheck / postcheck flag to specify whether a filter is applied before or after executing the grok pattern, so that the option to reject samples early is there, but there is also the option of more detailed filtering, and of filtering when using the file tailer.
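The precheck/postcheck idea could look roughly like this (a hypothetical sketch; the flag name, the predicate representation, and the control flow are only an illustration of the proposal, not an implemented API):

```python
import re

def process(record, filters, pattern):
    # precheck filters: cheap rejection before the expensive grok/regex match
    for f in filters:
        if f.get("phase", "precheck") == "precheck" and not f["predicate"](record):
            return None
    match = re.search(pattern, record["log"])
    if match is None:
        return None
    # postcheck filters: run only for records that already matched the pattern
    for f in filters:
        if f.get("phase") == "postcheck" and not f["predicate"](record):
            return None
    return match

record = {"metadata": {"app": "foo"}, "log": "bad error event: 42!"}
filters = [
    {"phase": "precheck", "predicate": lambda r: r["metadata"]["app"] == "foo"},
    {"phase": "postcheck", "predicate": lambda r: "42" in r["log"]},
]
result = process(record, filters, r"bad error event")
```

Precheck filters keep the hot path cheap; postcheck filters trade that away for the ability to make a decision after the pattern has already matched.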
