
Best way to refer to objects in grok patterns #39

Closed
willemdh opened this issue Jul 8, 2018 · 5 comments

Comments

@willemdh
Contributor

willemdh commented Jul 8, 2018

Hello,

I was wondering why the Grok Debugger in Kibana doesn't seem able to understand objects referenced with [object][subobject] notation.


Does this mean I should use 'object.subobject' in grok patterns if I want to facilitate grok debugging? http://grokconstructor.appspot.com/do/match seems to have the same behaviour.

Grtz

Willem

@willemdh
Contributor Author

willemdh commented Jul 9, 2018

It seems Logstash treats fields differently depending on whether they are defined with [][] or . notation.
For example, take this optional request-violations grok pattern:

((?<event.action>Request) violations: %{GREEDYDATA:f5.dcc.violations.blocked}. )?

Because using [][] notation in the regex named capture makes Logstash fail, I have to use . notation (see event.action).

But when I later write conditionals such as:

        if [event][action] == "Request" {
            mutate {
                replace => { "[event][action]" => "Request passed obj" }
                add_tag => "grok_f5_dcc_test2"
            }
        }
        if "%{[event][action]}" == "Request" {
            mutate {
                replace => { "[event][action]" => "Request passed var obj" }
                add_tag => "grok_f5_dcc_test3"
            }
        }
        if "%{event.action}" == "Request" {
            mutate {
                replace => { "[event][action]" => "Request passed var dot" }
                add_tag => "grok_f5_dcc_test4"
            }
        }

All of the above conditionals fail. So how should I do a named regex capture into an object if I can't use [][]? For the record, the event.action field is indexed correctly in Elasticsearch.

I also tried:

((?<[event][action]>Request) violations: %{GREEDYDATA:f5.dcc.violations.request}. )?
((?<\[event]\[action]>Request) violations: %{GREEDYDATA:f5.dcc.violations.request}. )?
((?<\\[event]\\[action]>Request) violations: %{GREEDYDATA:f5.dcc.violations.request}. )?

But those make Logstash fail to execute action. I found logstash-plugins/logstash-filter-grok#66, which describes exactly my problem. I'm not sure how I can continue migrating my F5 grok patterns to objects with this issue. Does anyone know a workaround?

@praseodym
Contributor

Elasticsearch and Logstash treat names with dots in them differently. For data indexed in Elasticsearch 5.0 and newer, the JSON objects {"a": {"b": "c"}} and {"a.b": "c"} are equivalent. In Logstash, however, [a][b] and a.b are two different fields, which is what you're hitting here.
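To see the Logstash side of this difference directly, here is a minimal pipeline sketch in the same style as the test config further down the thread. It is an illustration only; the field name "a.b" and the value "c" are placeholders taken from the JSON example above. It adds one field written with dot notation and one written with bracket notation.

```
input {
  generator {
    lines => [ "test" ]
    count => 1
  }
}

filter {
  mutate {
    # "a.b" (no brackets) is ONE top-level field whose name contains a dot;
    # "[a][b]" is field "b" nested inside an object field "a"
    add_field => {
      "a.b"    => "c"
      "[a][b]" => "c"
    }
  }
}

output {
  stdout { codec => rubydebug }
}
```

The rubydebug output should list a flat "a.b" => "c" entry alongside a nested "a" => { "b" => "c" } entry, so Logstash keeps them as two separate fields.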

It is preferable to index events in the form {"a": {"b": "c"}} (so [a][b] in Logstash), mainly because that is much easier to work with in other languages. For example, in Go you can use nested structs to represent the different ECS field groups.

To work around the problem with the grok filter, you can use a.b or a_b as the capture name in the regex and then use mutate to rename the field to [a][b].
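Applied to the pattern from this issue, that workaround might look like the sketch below. It has not been tested against the full F5 pattern set: the grok line reuses the simplified pattern from the generator test elsewhere in this thread, and the rename uses the standard mutate rename option.

```
filter {
  grok {
    # Dotted capture name: grok accepts it, but the result is a
    # top-level field literally named "event.action"
    match => { "message" => "((?<event.action>Request) violations: %{GREEDYDATA:[f5][dcc][violations][blocked]})?" }
  }
  mutate {
    # Move the flat field into the nested [event][action] field.
    # If the optional group did not match, "event.action" is absent
    # and the rename is expected to do nothing.
    rename => { "event.action" => "[event][action]" }
  }
}
```

Once the rename has run, later conditionals can use the bracket form, for example if [event][action] == "Request".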

@prehor

prehor commented Jul 9, 2018

I ran Logstash with this configuration file:

input {
  generator {
    lines => [ "Request violations: blocked" ]
    count => 1
  }
}

filter {
  grok {
    match => { "message" => "((?<event.action>Request) violations: %{GREEDYDATA:[f5][dcc][violations][blocked]})?" }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

with --log.level=debug set, so that the expanded regular expression is logged:

[2018-07-09T18:26:05,887][DEBUG][logstash.filters.grok    ] Grok compiled OK {:pattern=>"((?<event.action>Request) violations: %{GREEDYDATA:[f5][dcc][violations][blocked]})?", :expanded_pattern=>"((?<event.action>Request) violations: (?<GREEDYDATA:[f5][dcc][violations][blocked]>.*))?"}
[2018-07-09T18:26:07,171][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
{
            "host" => "ad8a32273295",
        "@version" => "1",
         "message" => "Request violations: blocked",
              "f5" => {
        "dcc" => {
            "violations" => {
                "blocked" => "blocked"
            }
        }
    },
    "event.action" => "Request",
        "sequence" => 0,
      "@timestamp" => 2018-07-09T18:26:06.128Z
}

The expanded GREEDYDATA grok pattern carries its pattern name as a prefix of the field name. I modified the regex part of your pattern to carry a pattern name prefix as well:

filter {
  grok {
    match => { "message" => "((?<REQUEST:[event][action]>Request) violations: %{GREEDYDATA:[f5][dcc][violations][blocked]})?" }
  }
}

The expanded regular expression looks good, and the [event][action] field is set properly:

[2018-07-09T18:32:56,692][DEBUG][logstash.filters.grok    ] Grok compiled OK {:pattern=>"((?<REQUEST:[event][action]>Request) violations: %{GREEDYDATA:[f5][dcc][violations][blocked]})?", :expanded_pattern=>"((?<REQUEST:[event][action]>Request) violations: (?<GREEDYDATA:[f5][dcc][violations][blocked]>.*))?"}
[2018-07-09T18:32:57,903][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
{
       "message" => "Request violations: blocked",
      "@version" => "1",
      "sequence" => 0,
    "@timestamp" => 2018-07-09T18:18:56.896Z,
          "host" => "ad8a32273295",
         "event" => {
        "action" => "Request"
    },
            "f5" => {
        "dcc" => {
            "violations" => {
                "blocked" => "blocked"
            }
        }
    }
}

I hope this helps.
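For completeness, here is a sketch that combines the prefixed capture above with the first conditional from the original question. It assumes only what the debug output above shows: with the prefix, the capture lands in the nested [event][action] field. In that experiment, REQUEST was accepted as a label even though it is not a defined grok pattern. The add_tag value is the one used in the question.

```
filter {
  grok {
    # "REQUEST:" is only a label prefix; the field part [event][action]
    # is then treated as a nested field reference
    match => { "message" => "((?<REQUEST:[event][action]>Request) violations: %{GREEDYDATA:[f5][dcc][violations][blocked]})?" }
  }
  # The value now lives in the nested field, so the bracket-notation
  # conditional from the question should be able to see it
  if [event][action] == "Request" {
    mutate {
      add_tag => [ "grok_f5_dcc_test2" ]
    }
  }
}
```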

@jsvd
Member

jsvd commented Jul 10, 2018

This confusion comes from differences in the platforms grok runs on: Logstash uses square brackets for field references and ingest node uses dots, so Kibana's Grok Debugger uses dots as well.

The solution that creates the least friction is for one (or both) of these platforms to support both notations.
On the Logstash side this has been proposed already and could likely be done without any breaking changes.

@webmat
Contributor

webmat commented Aug 17, 2018

Does everyone here agree that this issue in ECS can be closed, in favour of the one in the Logstash repo? :-)


5 participants