-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Grok: Maybe allow defining the type *in* the pattern definition #1859
Comments
Can it be done? Yes. If adding this functionality sacrifices a substantial amount of performance in order to obtain the desired simplicity, I am not in favor of it. We can better use that time to write thorough docs and examples and how-to guides that explain how to get the desired outcome, and why it works that way. In fact, I will take this as a blog post suggestion for the Elasticsearch site right now, while I have some time. If adding this functionality can be done without too much cost in performance, then it might be worth pursuing. The next cost/benefit thought is how much complexity this might add to testing and the code base for grok. |
+1 on this. Although we would need to measure performance when we coerce everything to a type (because user may never use it). This may actually be ok given the performance improvements we did in the grok library last month There are cases like Another common example is |
Indeed! I forget if I filed a ticket, but this (IP addresses) is related to an idea recently to allow Logstash to act as the dynamic-mapping feature for logstash indices to Elasticsearch (instead of using Elasticsearch's dynamic mapping) - or to improve dynamic mapping in ES to allow Logstash to use it most correctly for IP addresses (better template? something...) |
We could try to turn on automatic IP field detection in the default template. I don't know how much slowness that might introduce, but if minimal it would eliminate the need for IP-type coercion. |
@suyograo We set the apache response code to a |
@avleen Since the http response code is mapped key value pairs, e.g. "404" = "not found," it doesn't make sense to do mathematical operations on key names like this. Storing this as a Even so, I store this "number" as a string in my own setup. |
For Logstash 1.5.0, we've moved all plugins to individual repositories, so I have moved this issue to logstash-plugins/logstash-filter-grok#26. Let's continue the discussion there! :) |
Problem: Many users do things like
%{NUMBER:bytes}
in grok and then are confused why Elasticsearch fails to do statistics or other numeric aggregations on it. The cause is that Grok only does strings by default and Elasticsearch is sent a string and maps 'bytes' to a string - and this is confusing.I'm tired of users tripping over this problem. I would be willing to add a feature to grok that allowed you to define the 'type' of a pattern inside the pattern definition.
Background: In a grok patterns file, you can define a pattern with
NAME PATTERN
syntax (name of pattern, space, the regexp pattern).Proposal: Allow the type to accompany the NAME.
By way of example, if we were to fix this NUMBER problem permanently, we would define the new pattern like this:
The new syntax is
NAME:TYPE REGEXP
and is backwards-compatible with the old syntax (The:TYPE
is made optional and defaults to string if not provided).This would allow us to more reasonably define the patterns with their respective types such that this will be captured as a numeric type in Elasticsearch:
%{NUMBER:bytes}
It's not clear if this will solve everything, though, since in some cases like 'bytes' the value is never fractional, so users doing
%{NUMBER:bytes}
and seeing a float may be confused because they wanted to see a long type in Elasticsearch.Thoughts?
The text was updated successfully, but these errors were encountered: