Custom regexp engine parses isolated options differently from Oniguruma #2354

Open
Thom1729 opened this issue Jun 11, 2018 · 6 comments

@Thom1729

Thom1729 commented Jun 11, 2018

Expected behavior

When Sublime's custom regexp engine handles a regexp, it should behave identically to Oniguruma.

Actual behavior

Oniguruma has a quirk when parsing isolated options (e.g. (?i)) that Sublime does not replicate. When Oniguruma encounters isolated options, the remainder of the enclosing group (or of the expression, if there is no enclosing group) is implicitly grouped. For instance, the following expressions are equivalent:

x(?i)y|z
x(?i:y|z)

The documentation is less than clear, and this behavior is unintuitive, but it is consistent. I suppose that option groups are parsed with the same precedence as the | operator.

Sublime's custom regexp engine, however, will interpret that expression differently, so that the following are equivalent:

x(?i)y|z
(?:x(?i)y)|z

As a result, the same construct may be interpreted differently depending on whether the expression triggers the Oniguruma engine or uses the native Sublime engine. This is confusing for syntax authors, and it is an obstacle to third-party implementations and other tools.
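
A minimal sketch of the two readings, using Python's `re` module purely as a stand-in (it is neither Oniguruma nor Sublime's custom engine). Each reading is spelled out with an explicit scoped group, so the isolated `(?i)` itself never reaches Python. The names `ONIGURUMA_READING`, `CUSTOM_READING`, and `readings` are illustrative assumptions.

```python
import re

# Oniguruma's reading of x(?i)y|z, per this report: the option implicitly
# groups the rest of the expression, including the alternation.
ONIGURUMA_READING = re.compile(r"x(?i:y|z)")

# The custom engine's reading, per this report: the alternation splits first,
# so the option only covers "y" in the left branch.
CUSTOM_READING = re.compile(r"(?:x(?i:y))|z")


def readings(text):
    """Return (matches Oniguruma reading, matches custom reading) for a full-string match."""
    return (
        ONIGURUMA_READING.fullmatch(text) is not None,
        CUSTOM_READING.fullmatch(text) is not None,
    )


for text in ["xy", "xY", "xz", "xZ", "z"]:
    print(text, readings(text))
```

The two readings agree on `xy` and `xY` and disagree on `xz`, `xZ`, and a bare `z`.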

Sample syntax

%YAML 1.2
---
name: Test Option Parsing
scope: source.test-option-parsing
contexts:
  main:
    - match: a(?i)b|c
      scope: region.redish

    - match: (?:d(?i)e)|f
      scope: region.redish

    # Force Oniguruma
    - match: u(?i)v|w(?<!0)
      scope: region.bluish

    - match: x(?i:y|z)(?<!0)
      scope: region.bluish

Sample input

ab
ac
c

de
df
f

uv
uw
w

xy
xz
z
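
To predict which sample lines each rule should hit, here is a rough sketch that again uses Python's `re` as a stand-in. It assumes the groupings described above: the first two rules are rewritten the way the custom engine is reported to read them, and the last two the way Oniguruma is reported to read them. `RULES`, `SAMPLE`, and `lines_with_match` are hypothetical names, and this only checks whether a line contains a match, not how Sublime would scope it.

```python
import re

# Sample rules rewritten with explicit scoped groups (an assumption based on
# the behavior described in this issue, not output from either engine).
RULES = {
    "a(?i)b|c": r"(?:a(?i:b))|c",          # custom engine reading
    "(?:d(?i)e)|f": r"(?:d(?i:e))|f",      # already explicit
    "u(?i)v|w(?<!0)": r"u(?i:v|w(?<!0))",  # Oniguruma reading
    "x(?i:y|z)(?<!0)": r"x(?i:y|z)(?<!0)", # already explicit
}

SAMPLE = ["ab", "ac", "c", "de", "df", "f", "uv", "uw", "w", "xy", "xz", "z"]


def lines_with_match(pattern):
    """Return the sample lines in which the pattern matches anywhere."""
    return [line for line in SAMPLE if re.search(pattern, line)]


for original, rewritten in RULES.items():
    print(original, "->", lines_with_match(rewritten))
```

Under these assumptions the bare `c` and `f` lines match the first two rules, while the bare `w` and `z` lines match nothing, which is the discrepancy the sample is meant to expose.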

Notes

The core HTML syntax inadvertently relies upon this bug. I will submit a PR to correct that.

To sidestep this issue, I suggest avoiding isolated options except at the very beginning of an expression (and never in variables). Instead, use noncapturing groups with flags. For example, instead of a(?i)b, use a(?i:b).
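
A hedged sketch of how a syntax author might check patterns against that practice. The helper name `isolated_options_after_start` is hypothetical, and the check is deliberately naive: it does not understand character classes such as `[(?i)]`, and it treats any `(` directly preceded by a backslash as escaped.

```python
import re

# An isolated option group such as (?i), (?x) or (?i-m).
# Scoped groups like (?i:...) do not match because of the closing paren.
ISOLATED_OPTIONS = re.compile(r"(?<!\\)\(\?[a-zA-Z-]+\)")

# A run of isolated option groups at the very start of the pattern is allowed.
LEADING_OPTIONS = re.compile(r"(?:\(\?[a-zA-Z-]+\))*")


def isolated_options_after_start(pattern):
    """Return (offset, text) for each isolated option group past the leading run."""
    lead_end = LEADING_OPTIONS.match(pattern).end()
    return [
        (m.start(), m.group())
        for m in ISOLATED_OPTIONS.finditer(pattern)
        if m.start() >= lead_end
    ]


print(isolated_options_after_start("a(?i)b|c"))
print(isolated_options_after_start("(?i)abc"))
print(isolated_options_after_start("a(?i:b)"))
```

Only the first pattern is flagged; the other two follow the suggested practice.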

@Thom1729 changed the title from "Custom regexp engine parses options differently from Oniguruma" to "Custom regexp engine parses isolated options differently from Oniguruma" on Jun 11, 2018
@FichteFoll
Collaborator

Is it certain that Oniguruma didn't mean x(?i)y|z to become x[Yy]|[Zz]? The wording really isn't clear on that.

@Thom1729
Author

By observation, it's grouped like x(?i)(?:y|z) = x(?i:y|z). I've tested this in Sublime (using (?<!0) to force Oniguruma) and in the highlighter I'm working on.

@FichteFoll
Collaborator

FichteFoll commented Jun 11, 2018

I meant it more in the sense of whether we know it's not a bug, because it really does seem weird to parse it like that.

@Thom1729
Author

I've opened an issue to verify.

It would be better for Sublime to replicate the bug than to differ from Oniguruma. However, if it is a bug, and it is fixed in Oniguruma, then that might be a good reason for Sublime to update its Oniguruma version.

@deathaxe
Collaborator

I always understood (?i) to express a flag that applies globally to everything following it. This is also what https://stackoverflow.com/questions/15145659/what-do-i-and-i-in-regex-mean#15145701 says.

So it is not a bug of Oniguruma.

@FichteFoll
Collaborator

I just went through the referenced issue: the intended behavior for Oniguruma is to interpret x(?i)y|z as x(?i)(?:y|z).

See also this test case: kkos/oniguruma@0b7a1b9#diff-f1faa5ae6ee6c139773f8424cadf6112R398
