fluent-bit's regex references (rubular.com) differ from fluent-bit's own regex behavior #11180

@hlein

Bug Report

Describe the bug

Regexes used in fluent-bit often link to https://rubular.com/ with a set of example messages, to document their behavior and allow troubleshooting. A few use (or used) regex101.com, but most currently use rubular, and that's even what the new-issue template on GitHub recommends.

But... they do not behave the same.

This is not a bug in fluent-bit, and not really a bug in https://rubular.com/ either. But it unfortunately makes that site an inadequate reference for documenting and testing fluent-bit regexes.

To Reproduce

  • Make a regular expression that uses duplicate named groups, such as:

^foo (?:bar=(?<bar>\d+) yada|baz bar=(?<bar>[a-z]+))

  • Make test-cases:
$ cat parsers_test.duplicate_subpattern.test
foo bar=2 yada
foo baz bar=ab
  • Run that with fluent-bit:
$ cat parsers_test.yaml
parsers:
  - name: duplicate_subpattern
    format: regex
    regex: '^foo (?:bar=(?<bar>\d+) yada|baz bar=(?<bar>[a-z]+))'

$ cat parsers_test.duplicate_subpattern.test | fluent-bit -q -R parsers_test.yaml -i stdin -p parser=duplicate_subpattern -o stdout -p format=json_lines
{"date":1763527096.159587,"bar":"2"}
{"date":1763527096.159635,"bar":"ab"}
  • There, the two "different" bar groups get filled in appropriately: each record's bar holds the value from whichever alternative matched.

  • Now try that in rubular.com; both input lines will match, but the group assignments will be incorrect: https://rubular.com/r/0IE3g0BZZR18SZ

Match groups:

Match 1
--
bar  2
2.   (empty)

bar  (empty)
2.   ab

Apparently this has to do with rubular.com using Ruby's scan, which returns captures by position rather than by name, while fluent-bit uses match.
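The difference can be reproduced in plain Ruby, outside of both rubular and fluent-bit. This is only a sketch using the pattern and test lines from this report. It shows what Ruby's own `String#match` and `String#scan` return, and it is my assumption (not checked against rubular's source) that rubular labels the first `scan` slot with the group name and numbers the rest.

```ruby
# Same pattern and test lines as in the report above.
RE = /^foo (?:bar=(?<bar>\d+) yada|baz bar=(?<bar>[a-z]+))/
LINES = ["foo bar=2 yada", "foo baz bar=ab"].freeze

# match: look the capture up by name; the name resolves to whichever
# of the two "bar" groups took part in the match.
BY_MATCH = LINES.map { |line| line.match(RE)[:bar] }

# scan: one array per match, with one slot per group in positional order.
# The group that did not take part in the match is nil.
BY_SCAN = LINES.map { |line| line.scan(RE) }

p RE.names   # ["bar"]  (one name, although the pattern has two groups)
p BY_MATCH   # ["2", "ab"]
p BY_SCAN    # [[["2", nil]], [[nil, "ab"]]]
```

With two positional slots but only one name, a display that pairs names with slots in order would show `bar` empty for the second line, which is consistent with the rubular output above.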

Here is the non-reduced case where I first discovered rubular's odd behavior: https://rubular.com/r/4PPPebSZjvyimL

Options

I've seen one or two other mentions of rubular's use of scan being a problem, but no solutions other than "don't use it".

There are other regex testing websites. I haven't yet found any others that advertise the Onigmo regex engine.
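Short of a website, a local check is possible with Ruby itself, since Ruby's Regexp engine is Onigmo. The following is a minimal sketch, not a fluent-bit tool: `parse_lines` is a hypothetical helper, and whether Ruby agrees with fluent-bit's Onigmo build on every feature is an assumption that would need checking per regex.

```ruby
require "json"

# Hypothetical helper (not part of fluent-bit): apply a pattern to each line
# with Regexp#match and collect the named groups, one hash per matching line.
def parse_lines(pattern, lines)
  re = Regexp.new(pattern)
  lines.filter_map do |line|
    m = re.match(line.chomp)
    next unless m # skip lines that do not match

    # m[name] resolves a duplicated name to the group that took part.
    re.names.to_h { |name| [name, m[name]] }
  end
end

# Pattern and test lines taken from this report.
PATTERN = '^foo (?:bar=(?<bar>\d+) yada|baz bar=(?<bar>[a-z]+))'
RECORDS = parse_lines(PATTERN, ["foo bar=2 yada\n", "foo baz bar=ab\n"])

RECORDS.each { |record| puts JSON.generate(record) }
# {"bar":"2"}
# {"bar":"ab"}
```

For this test case the records match the fluent-bit output shown earlier, apart from the date field, which this sketch does not produce.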

https://regex101.com/ offers lots of flavors/libraries. None of those flavors is explicitly Onigmo or Ruby. Of the flavors it offers, .NET, Golang, and ECMAScript pass this specific test. But there might be other feature incompatibilities that come up later.

In more extensive tests (with a >100-line regex, although it doesn't do anything too exotic: https://regex101.com/r/YH1t6w/1), the .NET flavor behaved exactly the same as fluent-bit's Onigmo implementation. (And Golang and ECMAScript fail for other reasons.)

So, I would be inclined to switch from rubular to regex101.com in .NET mode as the reference test-case: maybe no need to replace existing working tests, but for new tests going forward. And maybe the GitHub issue template ought to change.
