Grok Processor should complain about patterns it doesn't understand #22831

Closed
bleskes opened this issue Jan 27, 2017 · 6 comments

Comments

@bleskes
Member

commented Jan 27, 2017

The following command (note the UNKNOWN pattern name):

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "myfield",
          "patterns": [
            """{%UKNOWN:field}"""
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "myfield": "50-59"
      }
    }
  ]
}

results in the following error:

"reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [50-59]",
"caused_by": {
  "type": "illegal_argument_exception",
  "reason": "java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [50-59]",
  "caused_by": {
    "type": "illegal_argument_exception",
    "reason": "Provided Grok expressions do not match field value: [50-59]"
  }

Note that the actual cause of the failure (the unknown pattern name) is nowhere to be found in the error. It also seems that processing gets all the way to evaluating the field values before anything fails. IMO it should fail early and hard.

PS. This is on 5.1

@talevy

Contributor

commented Jan 30, 2017

As of now, I believe that if the expression doesn't match the correct format, it is assumed to be just a plain old regex. What if you wanted to match {%UKNOWN:field} literally?

If it were %{UKNOWN:field}, then it would probably fail and say that UKNOWN is not found. But as written, it doesn't match the %{PATTERN_NAME:capture_field} naming convention.

@bleskes

Member Author

commented Jan 31, 2017

@talevy sorry, I'm not following what you're saying. Also, to my surprise, %{\d+:field} didn't seem to work, with no failure messages either.

@clintongormley

Member

commented Jan 31, 2017

I think @talevy is saying that the correct format for grok patterns is %{\w+:\w+}, so {%UKNOWN:field} doesn't match that format and is just treated as an ordinary regex. Similarly, %{\d+:field} doesn't match the format either.
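
To make the point above concrete, here is a minimal Python sketch of how a parser could recognize pattern references. It is not the Elasticsearch implementation: the regex is taken only from the %{\w+:\w+} shape quoted in this thread (real grok syntax allows more forms), and `GROK_REFERENCE` and `find_references` are hypothetical names used for illustration.

```python
import re

# Assumption: a pattern reference looks like %{PATTERN_NAME:capture_field},
# i.e. the %{\w+:\w+} shape quoted in this thread. Simplified on purpose.
GROK_REFERENCE = re.compile(r"%\{(\w+):(\w+)\}")


def find_references(expression):
    """Return (pattern_name, capture_field) pairs found in a grok expression."""
    return GROK_REFERENCE.findall(expression)


# The three expressions discussed in this issue.
for expression in ["{%UKNOWN:field}", r"%{\d+:field}", "%{UKNOWN:field}"]:
    print(expression, find_references(expression))
```

Under these assumptions, only the last expression yields a reference (`UKNOWN`, `field`). The first two contain no recognizable reference at all, so the whole string falls through as an ordinary regular expression, which is why no "unknown pattern" complaint can be raised for them.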

@talevy

Contributor

commented Jan 31, 2017

@clintongormley, thanks for the clarification. That is exactly what I meant.

@bleskes

Member Author

commented Jan 31, 2017

Thx for clarifying, @clintongormley. Sadly, that was a typo on my end when I wrote up the issue. %{UKNOWN:field} gives exactly the same behavior.

Re %{\d+:field}: I expected %{.*} to always be matched and its content analyzed. In that sense, I expected grok patterns to be template strings for regular expressions. Reading the Logstash docs, I see that they are not, and that people can use named matches (or custom patterns) for things not in our list. That said, I think my issue about named patterns still holds.

@talevy

Contributor

commented Jan 31, 2017

@bleskes makes sense, I'll make it do a better job during parsing

talevy added a commit to talevy/elasticsearch that referenced this issue Jun 5, 2017
@talevy talevy closed this in #25063 Jun 6, 2017
talevy added a commit that referenced this issue Jun 6, 2017
…25063)

Unknown patterns used to silently be ignored. This was a problem because users did not know they were providing an invalid pattern name, and maybe thought the rest of their regexes were invalid.

Fixes #22831.
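
The fail-early behavior requested in this issue can be sketched roughly as follows. This is an illustrative Python sketch, not the code from the referenced commit: the two-entry `PATTERN_BANK`, the `compile_grok` helper, and the wording of the error message are all assumptions made for the example.

```python
import re

# Assumption: simplified %{PATTERN_NAME:capture_field} reference syntax.
GROK_REFERENCE = re.compile(r"%\{(\w+):(\w+)\}")

# Hypothetical, tiny pattern bank; a real grok bank is much larger.
PATTERN_BANK = {"NUMBER": r"\d+", "WORD": r"\w+"}


def compile_grok(expression, bank=PATTERN_BANK):
    """Expand pattern references into named groups, rejecting unknown names."""

    def expand(match):
        name, field = match.group(1), match.group(2)
        if name not in bank:
            # Fail at parse time, before any document is evaluated.
            raise ValueError(f"unknown grok pattern name: {name}")
        return f"(?P<{field}>{bank[name]})"

    return re.compile(GROK_REFERENCE.sub(expand, expression))


# A known pattern compiles and can be matched against the sample value.
matcher = compile_grok("%{NUMBER:low}-%{NUMBER:high}")
print(matcher.match("50-59").groupdict())

# An unknown pattern name is reported immediately.
try:
    compile_grok("%{UKNOWN:field}")
except ValueError as error:
    print(error)
```

The design point is that the error is raised while the expression is being compiled, so the user sees the offending pattern name instead of a generic "do not match field value" message produced later during document evaluation.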