
Feature request: filter log records based on JSON fields #101

Open
JensErat opened this issue May 7, 2020 · 3 comments


JensErat commented May 7, 2020

If I get it right, grok_exporter currently uses JSON fields other than the log message (from the webhook tailer) only for assigning labels. I'd like to propose a change to also filter based on JSON fields.

Here is an example configuration, but I'm also fine with other approaches if anybody comes up with a better proposal:

metrics:
# example log record: { "metadata": {"app": "foo"}, "stage": "prod", "log": "bad error event: 42!" }
- type: counter
  name: myapp_errors_total
  help: Total number of errors for my app
  match: 'bad error event'
  filters:
  - path: '.metadata.app' # jsonpath
    value: 'foo' # maybe even regex support?
  labels:
    stage: '{{ index .extra "stage" }}'

Before evaluating the grok matcher, grok_exporter would loop over all filters and only pass those records that match all of them.
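To make the intended evaluation order concrete, here is a hypothetical Python sketch (not grok_exporter code): a plain regex stands in for the grok matcher, and a simple dotted-path lookup stands in for full jsonpath support.

```python
import re

def resolve_path(record, path):
    """Resolve a dotted path like '.metadata.app' against a parsed JSON record."""
    value = record
    for key in path.strip(".").split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

def passes_filters(record, filters):
    """A record is passed on only if every configured filter matches."""
    return all(resolve_path(record, f["path"]) == f["value"] for f in filters)

record = {"metadata": {"app": "foo"}, "stage": "prod", "log": "bad error event: 42!"}
filters = [{"path": ".metadata.app", "value": "foo"}]

# The (expensive) pattern match only runs for records that survived the filters.
if passes_filters(record, filters):
    matched = re.search(r"bad error event", record["log"]) is not None
else:
    matched = False
```

The point of the ordering is that the cheap field comparison rejects unrelated records before the regular expression is ever evaluated.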

Why do I want this?

  • make sure we only match the correct logs, not unrelated logs from another component (I might not even have control over the output)
  • performance improvements; I'd guess matching a few JSON fields is much cheaper than evaluating potentially complex grok patterns

Provided there is agreement on this design proposal and we actually go the grok_exporter route (not decided yet, and it might take some time), we would also contribute code.


fstab (Owner) commented May 9, 2020

I think it sounds like a good idea. grok_exporter was originally made for plain text log lines, but more and more people are using it for JSON, so it would be good to support JSON better.

This is partly related to #97, the conditions proposed there are similar to the filters in your proposal. We could try and work something out together.


JensErat commented May 9, 2020

I guess this is the new plain text log stream: metadata from the orchestration layer (Kubernetes, or anything else if that still exists) followed by the plain text message. It's a great way to pre-filter when grepping for logs -- it works out pretty well with log databases like Loki (filter out most of the logs using metadata, then apply a linear search for whatever you're looking for), but also with stream evaluation where complex regular expressions are applied (like grok_exporter).

Yes, the other proposal also sounds promising. I'd push for making this as performant as possible (it's a great way to pre-filter on high-load systems before applying the rather expensive grok patterns), but generic enough that it fits most use cases. Our use case involves hierarchical JSON attributes (hence jsonpath), but if somebody comes up with a more generic solution, let's go for that!

We have log messages mostly like this one:

{ "time": "iso-timestamp", "kubernetes": {"pod": "coredns-fb8b8dccf-bgrsr", "namespace": "kube-system", "container": "coredns"}, "hostname": "node-123", "log": "example message"}
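A filter on a record like this has to descend into nested objects, which is why jsonpath comes up. A minimal, hypothetical sketch of such a nested lookup (a simple dotted-path walk, not a full jsonpath implementation):

```python
import json

log_line = '{ "time": "iso-timestamp", "kubernetes": {"pod": "coredns-fb8b8dccf-bgrsr", "namespace": "kube-system", "container": "coredns"}, "hostname": "node-123", "log": "example message"}'
record = json.loads(log_line)

def lookup(record, path):
    """Walk nested dicts for a path like '.kubernetes.namespace'."""
    value = record
    for key in path.strip(".").split("."):
        value = value.get(key) if isinstance(value, dict) else None
        if value is None:
            return None
    return value

namespace = lookup(record, ".kubernetes.namespace")  # "kube-system"
```

A missing path simply resolves to nothing, so such a filter would reject records that lack the field rather than raising an error.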

I'll go over the other proposal next week and see where the two already overlap anyway.


Skeen commented May 9, 2020

Hi @JensErat,

I am the author of the other proposal. I think our proposals are somewhat equivalent, and that we should strive to find a common solution.

The filter you present:

  filters:
  - path: '.metadata.app' # jsonpath
    value: 'foo' # maybe even regex support?

Could be expressed with:

  conditions:
    - '{{ eq (index .extra "metadata" "app") "foo" }}'

in the proposal I wrote. My proposal is objectively harder to read, and likely also less performant, but it does have the advantage of greater expressiveness through go templates. I think it would be good to find a middle ground.
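Conceptually, both variants boil down to the same thing: a predicate over the parsed record. The template condition above corresponds to roughly this (hypothetical Python standing in for go-template evaluation, only to illustrate the equivalence):

```python
def condition(extra):
    # Equivalent of: {{ eq (index .extra "metadata" "app") "foo" }}
    return extra.get("metadata", {}).get("app") == "foo"

matching = condition({"metadata": {"app": "foo"}, "stage": "prod"})
non_matching = condition({"metadata": {"app": "bar"}})
```

The structured filter syntax fixes the predicate's shape (path equals value), while the template variant lets users write arbitrary predicates at the cost of readability and speed.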

I think filters is a better name than conditions, and that we should go with filters.

Additionally, I think we should add a precheck / postcheck flag to specify whether a filter is applied before or after executing the grok pattern, so that the option to reject samples early is there, but there is also the option of more detailed filtering, and of filtering when using the file tailer.
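The precheck/postcheck idea could look roughly like this (a hypothetical sketch; the flag name, the predicate representation, and the control flow are only an illustration of the proposal, not an implemented API):

```python
import re

def process(record, filters, pattern):
    # precheck filters: cheap rejection before the expensive grok/regex match
    for f in filters:
        if f.get("phase", "precheck") == "precheck" and not f["predicate"](record):
            return None
    match = re.search(pattern, record["log"])
    if match is None:
        return None
    # postcheck filters: run only for records that already matched the pattern
    for f in filters:
        if f.get("phase") == "postcheck" and not f["predicate"](record):
            return None
    return match

record = {"metadata": {"app": "foo"}, "log": "bad error event: 42!"}
filters = [
    {"phase": "precheck", "predicate": lambda r: r["metadata"]["app"] == "foo"},
    {"phase": "postcheck", "predicate": lambda r: "42" in r["log"]},
]
result = process(record, filters, r"bad error event")
```

Precheck filters keep the hot path cheap; postcheck filters trade that away for the ability to make a decision after the pattern has already matched.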
