[BUG] Failure to process "reserved" chars in regular expressions #4510

Open
michael-markevich opened this issue May 7, 2024 · 2 comments
Labels: bug (Something isn't working)


Describe the bug
Similar to #3514, the expression parser fails on the regex example from the documentation: https://github.com/opensearch-project/data-prepper/blob/main/docs/expression_syntax.md#reference-table.

To Reproduce
Steps to reproduce the behavior:

  1. Create a pipeline with the following configuration:
log-pipeline:
  source:
    http:
      ssl: false

  processor:
    - parse_json:
        source: message
        parse_when: '/message=~"^\w*$"' # Fails
        # parse_when: '/message=~"^\w*\$"' # Also fails
        # parse_when: '/message =~ "^(\\{.*\\}|\\[.*\\])$"' # Also fails

  sink:
    - opensearch:
        hosts: [ 'https://opensearch:9200' ]
        insecure: true

  2. Send in any log message.
  3. See the error log:
2024-05-07T12:07:25,039 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@25c210a1]
org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*$""
	at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:42) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.ExpressionEvaluator.evaluateConditional(ExpressionEvaluator.java:28) ~[data-prepper-api-2.7.0.jar:?]
	at org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor.doExecute(AbstractParseProcessor.java:70) ~[parse-json-processor-2.7.0.jar:?]
	at org.opensearch.dataprepper.model.processor.AbstractProcessor.lambda$execute$0(AbstractProcessor.java:54) ~[data-prepper-api-2.7.0.jar:?]
	at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:69) [micrometer-core-1.11.5.jar:1.11.5]
	at org.opensearch.dataprepper.model.processor.AbstractProcessor.execute(AbstractProcessor.java:54) [data-prepper-api-2.7.0.jar:?]
	at org.opensearch.dataprepper.pipeline.ProcessWorker.doRun(ProcessWorker.java:135) [data-prepper-core-2.7.0.jar:?]
	at org.opensearch.dataprepper.pipeline.ProcessWorker.run(ProcessWorker.java:61) [data-prepper-core-2.7.0.jar:?]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: org.opensearch.dataprepper.expression.ParseTreeCompositeException
	at org.opensearch.dataprepper.expression.ParseTreeParser.createParseTree(ParseTreeParser.java:78) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:101) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:27) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:35) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:20) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:38) ~[data-prepper-expression-2.7.0.jar:?]
	... 12 more
Caused by: org.opensearch.dataprepper.expression.ExceptionOverview: Multiple exceptions (5)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.InputMismatchException: null
    at org.antlr.v4.runtime.DefaultErrorStrategy.sync(DefaultErrorStrategy.java:270)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
line 1:11 token recognition error at: '^'
line 1:12 token recognition error at: '\'
line 1:13 token recognition error at: 'w*'
line 1:15 token recognition error at: '$"'
line 1:10 mismatched input '"' expecting {JsonPointer, EscapedJsonPointer, String}
2024-05-07T12:07:25,042 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@2c0f1eaf]
org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*$""
  4. If you escape the dollar sign, you still get an error:
line 1:11 token recognition error at: '^'
line 1:12 token recognition error at: '\'
line 1:13 token recognition error at: 'w*'
line 1:15 token recognition error at: '\'
line 1:16 token recognition error at: '$"'
line 1:10 mismatched input '"' expecting {JsonPointer, EscapedJsonPointer, String}
2024-05-07T12:16:50,189 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@686b21ea]
org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*\$""
	at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:42) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.ExpressionEvaluator.evaluateConditional(ExpressionEvaluator.java:28) ~[data-prepper-api-2.7.0.jar:?]
	at org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor.doExecute(AbstractParseProcessor.java:70) ~[parse-json-processor-2.7.0.jar:?]
	at org.opensearch.dataprepper.model.processor.AbstractProcessor.lambda$execute$0(AbstractProcessor.java:54) ~[data-prepper-api-2.7.0.jar:?]
	at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:69) [micrometer-core-1.11.5.jar:1.11.5]
	at org.opensearch.dataprepper.model.processor.AbstractProcessor.execute(AbstractProcessor.java:54) [data-prepper-api-2.7.0.jar:?]
	at org.opensearch.dataprepper.pipeline.ProcessWorker.doRun(ProcessWorker.java:135) [data-prepper-core-2.7.0.jar:?]
	at org.opensearch.dataprepper.pipeline.ProcessWorker.run(ProcessWorker.java:61) [data-prepper-core-2.7.0.jar:?]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: org.opensearch.dataprepper.expression.ParseTreeCompositeException
	at org.opensearch.dataprepper.expression.ParseTreeParser.createParseTree(ParseTreeParser.java:78) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:101) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:27) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:35) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:20) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:38) ~[data-prepper-expression-2.7.0.jar:?]
	... 12 more
Caused by: org.opensearch.dataprepper.expression.ExceptionOverview: Multiple exceptions (6)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.InputMismatchException: null
    at org.antlr.v4.runtime.DefaultErrorStrategy.sync(DefaultErrorStrategy.java:270)
line 1:11 token recognition error at: '^'
line 1:12 token recognition error at: '\'
line 1:13 token recognition error at: 'w*'
line 1:15 token recognition error at: '\'
line 1:16 token recognition error at: '$"'
line 1:10 mismatched input '"' expecting {JsonPointer, EscapedJsonPointer, String}
2024-05-07T12:16:50,191 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@1c5deeb4]
org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*\$""
  5. Parsing also fails when checking whether the message is a JSON object or array with the following regex:

parse_when: '/message =~ "^(\{.*\}|\[.*\])$"'
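
A note on reading the lexer output above: the `line 1:N` positions appear to be zero-based character offsets into the statement, which is how ANTLR reports the character position within a line. The sketch below, in Python rather than anything from Data Prepper, checks that reading against the first error log. The `reported` list is copied by hand from the log lines, and `text_at` is a hypothetical helper.

```python
# The statement exactly as it appears in the first error message.
statement = '/message=~"^\\w*$"'

# (column, text) pairs copied from the "line 1:N ... at: '...'" lines above.
reported = [(11, "^"), (12, "\\"), (13, "w*"), (15, '$"'), (10, '"')]


def text_at(stmt: str, column: int, length: int) -> str:
    """Return `length` characters of `stmt` starting at a zero-based column."""
    return stmt[column:column + length]


for column, text in reported:
    found = text_at(statement, column, len(text))
    print(f"column {column}: reported {text!r}, statement has {found!r}")
```

Under that reading, the first rejected character is `^` at column 11, directly after the opening quote at column 10, so the dollar sign may not be the only character involved.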

Expected behavior
The regular expressions above, including the example from the documentation, should be parsed and evaluated without errors.

Environment (please complete the following information):

  • Data Prepper 2.7.0 (per the JAR versions in the stack trace).


michael-markevich added the bug and untriaged labels May 7, 2024
kkondaka (Collaborator) commented May 7, 2024

Looks like a bug in the expression grammar.

kkondaka removed the untriaged label May 7, 2024
michael-markevich (Author) commented

Additional notes on this case:

  1. A regular expression without { } $ (or certain other special characters) works perfectly fine. Our example was tested with a different parser and works there. As mentioned above, even the example from your documentation ("^\w*$") fails, apparently because of the dollar sign.
  2. This use case is quite important for us: it lets us distinguish log messages with a JSON structure from other (syslog) messages, which avoids parser errors and improves overall performance. Such behaviour is also a standard feature in Graylog.
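
The use case in note 2 can be sketched outside Data Prepper. The example below is in Python, and it assumes Python's `re` module as a stand-in for "a different parser"; it is not the engine Data Prepper uses. The sample messages are hypothetical. It only shows that the patterns themselves are well-formed and that they separate JSON-looking messages from syslog lines.

```python
import re

# Patterns from this issue, written as Python raw strings.
# Assumption: Python's `re` stands in for "a different parser"; it is not
# the engine Data Prepper uses.
WORD_ONLY = re.compile(r"^\w*$")               # example from the documentation
JSON_LIKE = re.compile(r"^(\{.*\}|\[.*\])$")   # whole message wrapped in {} or []


def looks_like_json(message: str) -> bool:
    """True if the whole message is wrapped in {...} or [...]."""
    return JSON_LIKE.match(message) is not None


# Hypothetical sample messages, for illustration only.
samples = [
    '{"level": "info", "msg": "started"}',
    "[1, 2, 3]",
    "<34>Oct 11 22:14:15 host su: auth failure",
]

for sample in samples:
    print(looks_like_json(sample), sample)
```

The check is purely structural: it does not validate the JSON, it only decides whether the message is worth handing to a JSON parser at all.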
