Best way to refer to objects in grok patterns #39

willemdh · 2018-07-08T10:16:11Z

Hello,

I was wondering why the grok debugger in Kibana does not seem to understand objects referenced with [object][subobject] notation.

Does this mean I should use 'object.subobject' in grok patterns if I want to facilitate grok debugging? http://grokconstructor.appspot.com/do/match seems to have the same behaviour.

Grtz

Willem

willemdh · 2018-07-09T16:44:53Z

It seems Logstash treats fields differently depending on whether they are defined with [][] or . notation.
For example, take this grok pattern for an optional list of request violations:

((?<event.action>Request) violations: %{GREEDYDATA:f5.dcc.violations.blocked}. )?

Because using [][] notation in the regex capture makes Logstash fail, I have to use . notation (see event.action).

But when I later create a conditional using for example:

        if [event][action] == "Request" {
            mutate {
                replace => { "[event][action]" => "Request passed obj" }
                add_tag => "grok_f5_dcc_test2"
            }
        }
        if "%{[event][action]}" == "Request" {
            mutate {
                replace => { "[event][action]" => "Request passed var obj" }
                add_tag => "grok_f5_dcc_test3"
            }
        }
        if "%{event.action}" == "Request" {
            mutate {
                replace => { "[event][action]" => "Request passed var dot" }
                add_tag => "grok_f5_dcc_test4"
            }
        }

All the above conditionals fail somehow... So how should I do a named regex capture to an object if I can't use [][]? For the record, the field event.action is indexed correctly to Elasticsearch.

Also tried with:

((?<[event][action]>Request) violations: %{GREEDYDATA:f5.dcc.violations.request}. )?
((?<\[event]\[action]>Request) violations: %{GREEDYDATA:f5.dcc.violations.request}. )?
((?<\\[event]\\[action]>Request) violations: %{GREEDYDATA:f5.dcc.violations.request}. )?

But those make my Logstash fail to execute action.. I found this issue: logstash-plugins/logstash-filter-grok#66 which actually describes my problem.. I'm not sure how I can continue migrating my F5 grok patterns to objects with this issue.. Is there anyone who knows a workaround?

praseodym · 2018-07-09T18:45:58Z

Elasticsearch and Logstash treat names with dots in them differently. For data indexed in Elasticsearch 5.0 and newer, the JSON objects {"a": {"b": "c"}} and {"a.b": "c"} are equal. In Logstash, [a][b] and a.b are two different fields, which is what you're hitting here.
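
A minimal sketch of what this means for the conditionals earlier in the thread, assuming the grok capture produced a top-level field literally named event.action (which is what the . notation does). In Logstash field reference syntax each pair of brackets is one path segment, so [event.action] should address that single dotted field, while [event][action] looks for a nested object that does not exist at that point. The tag name is hypothetical.

```
filter {
  # [event.action] is a single path segment: a top-level field whose name contains a dot.
  if [event.action] == "Request" {
    mutate {
      add_tag => [ "dotted_field_matched" ]  # hypothetical tag name, for illustration only
    }
  }
}
```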

It is preferable to index events in the form of {"a": {"b": "c"}} (so [a][b] in Logstash), mainly because it's way easier to work with in other languages. For example, in Go you can use nested structs to represent different ECS field groups.

To work around the problem with the grok filter you can simply use a.b or a_b in the regex and then use mutate to rename the field to [a][b].
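
As a rough sketch of this workaround (not a tested configuration), the dotted capture from the original pattern can be renamed right after the grok filter. The pattern is the one used elsewhere in this thread; the only addition is the mutate block, and it assumes the capture lands in a top-level field named event.action.

```
filter {
  grok {
    # Named regex capture uses dot notation, since [][] is rejected inside (?<...>).
    match => { "message" => "((?<event.action>Request) violations: %{GREEDYDATA:[f5][dcc][violations][blocked]})?" }
  }
  mutate {
    # Move the dotted top-level field into the nested [event][action] object.
    rename => { "event.action" => "[event][action]" }
  }
}
```

With the field renamed, a later conditional such as if [event][action] == "Request" refers to the nested field that now exists.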

prehor · 2018-07-09T21:42:35Z

I ran Logstash with this configuration file:

input {
  generator {
    lines => [ "Request violations: blocked" ]
    count => 1
  }
}

filter {
  grok {
    match => { "message" => "((?<event.action>Request) violations: %{GREEDYDATA:[f5][dcc][violations][blocked]})?" }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

and with --log.level=debug to see the expanded regular expression:

[2018-07-09T18:26:05,887][DEBUG][logstash.filters.grok    ] Grok compiled OK {:pattern=>"((?<event.action>Request) violations: %{GREEDYDATA:[f5][dcc][violations][blocked]})?", :expanded_pattern=>"((?<event.action>Request) violations: (?<GREEDYDATA:[f5][dcc][violations][blocked]>.*))?"}
[2018-07-09T18:26:07,171][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
{
            "host" => "ad8a32273295",
        "@version" => "1",
         "message" => "Request violations: blocked",
              "f5" => {
        "dcc" => {
            "violations" => {
                "blocked" => "blocked"
            }
        }
    },
    "event.action" => "Request",
        "sequence" => 0,
      "@timestamp" => 2018-07-09T18:26:06.128Z
}

The expanded GREEDYDATA grok pattern contains its name as the field name prefix. I modified the regex part of your pattern so that it also carries a pattern name prefix:

filter {
  grok {
    match => { "message" => "((?<REQUEST:[event][action]>Request) violations: %{GREEDYDATA:[f5][dcc][violations][blocked]})?" }
  }
}

The expanded regular expression looks good and the [event][action] field is set properly:

[2018-07-09T18:32:56,692][DEBUG][logstash.filters.grok    ] Grok compiled OK {:pattern=>"((?<REQUEST:[event][action]>Request) violations: %{GREEDYDATA:[f5][dcc][violations][blocked]})?", :expanded_pattern=>"((?<REQUEST:[event][action]>Request) violations: (?<GREEDYDATA:[f5][dcc][violations][blocked]>.*))?"}
[2018-07-09T18:32:57,903][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
{
       "message" => "Request violations: blocked",
      "@version" => "1",
      "sequence" => 0,
    "@timestamp" => 2018-07-09T18:18:56.896Z,
          "host" => "ad8a32273295",
         "event" => {
        "action" => "Request"
    },
            "f5" => {
        "dcc" => {
            "violations" => {
                "blocked" => "blocked"
            }
        }
    }
}

I hope it helps you.

jsvd · 2018-07-10T09:42:12Z

This confusion comes from differences in the platforms grok runs on: Logstash uses square brackets for field references and ingest node uses dots, so Kibana's grok debugger uses dots as well.

The solution that creates the least friction is for one of these two (or both) to support both notations.
On the Logstash side this has been proposed already and could likely be done without any breaking changes.

webmat · 2018-08-17T20:48:24Z

Does everyone here agree that this issue in ECS can be closed, in favour of the one in the Logstash repo? :-)

willemdh closed this as completed Aug 18, 2018

krizb8 mentioned this issue Apr 5, 2019

[RFC] Use dot notation for field references. elastic/logstash#8772

Open
