Incorrect matches when using Unicode exponents like ⁿ #474

Open
costika1234 opened this issue Oct 18, 2022 · 0 comments

I've just found a bug when using the following pattern on the PCRE engine with the global flag enabled:

[nⁿ][^x]

This matches either 'n' or 'ⁿ', followed by any single character other than 'x'.

The expected behaviour is what this Python snippet produces:

import re

print(re.findall('[nⁿ][^x]', 'ⁿa'))  # Displays ['ⁿa']
print(re.findall('[nⁿ][^x]', 'ⁿx'))  # Displays []

print(re.findall('[nⁿ][^x]', 'na'))  # Displays ['na']
print(re.findall('[nⁿ][^x]', 'nx'))  # Displays []

That being said, regexr.com/70d5f shows something different: it gives two matches for the string 'ⁿa' and one match for 'ⁿx', both of which are incorrect. It does, however, give the expected output for both 'na' and 'nx'.

I suspect that those Unicode characters are not handled correctly for some reason. Could someone take a more in-depth look at this issue?
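
One possible explanation, offered as a guess and not a confirmed diagnosis: the pattern and subject may be matched as raw UTF-8 bytes instead of as code points. 'ⁿ' (U+207F) is three bytes in UTF-8 (E2 81 BF), so the class would contain four separate bytes. The sketch below imitates that in Python by encoding both pattern and text before matching; `byte_level_matches` is a hypothetical helper written for this illustration, not something taken from RegExr or PCRE.

```python
import re

PATTERN = '[nⁿ][^x]'

def byte_level_matches(pattern, text):
    """Match over UTF-8 bytes instead of code points (hypothetical helper).

    This only imitates an engine running without Unicode awareness;
    it is an assumption about the cause, not the actual RegExr code path.
    """
    return re.findall(pattern.encode('utf-8'), text.encode('utf-8'))

for subject in ('ⁿa', 'ⁿx', 'na', 'nx'):
    matches = byte_level_matches(PATTERN, subject)
    print(repr(subject), len(matches), matches)

# Match counts: 'ⁿa' -> 2, 'ⁿx' -> 1, 'na' -> 1, 'nx' -> 0
```

The byte-level counts line up with what regexr.com/70d5f reports (two matches for 'ⁿa', one for 'ⁿx', correct results for 'na' and 'nx'), which would be consistent with the engine running without UTF mode, though I have not verified that this is what happens.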
