Incorrect matches when using Unicode exponents like ⁿ #474

costika1234 · 2022-10-18T13:27:01Z

I've just found a bug when using the following pattern with the PCRE engine and the global flag enabled:

[nⁿ][^x]

This matches either 'n' or 'ⁿ' followed by any character other than the letter 'x'.

So the following is the expected behavior, as this Python snippet shows:

import re

print(re.findall('[nⁿ][^x]', 'ⁿa'))  # Displays ['ⁿa']
print(re.findall('[nⁿ][^x]', 'ⁿx'))  # Displays []

print(re.findall('[nⁿ][^x]', 'na'))  # Displays ['na']
print(re.findall('[nⁿ][^x]', 'nx'))  # Displays []
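Because `[^x]` has to consume one character, the pattern never matches an 'n' or 'ⁿ' at the very end of the string. The comparison below uses a negative lookahead, `(?!x)`, which only checks the next character without consuming it. The lookahead variant is shown only to illustrate that difference; it is not part of the original report.

```python
import re

# [^x] must consume a character, so a lone trailing 'ⁿ' is not matched.
print(re.findall('[nⁿ][^x]', 'ⁿ'))    # []

# (?!x) only asserts that the next character is not 'x' and consumes nothing.
print(re.findall('[nⁿ](?!x)', 'ⁿ'))   # ['ⁿ']
print(re.findall('[nⁿ](?!x)', 'ⁿa'))  # ['ⁿ']
print(re.findall('[nⁿ](?!x)', 'ⁿx'))  # []
```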

However, regexr.com/70d5f shows something different: it gives two matches for the string 'ⁿa' and one match for 'ⁿx', both of which are incorrect. It does, however, give the expected output for both 'na' and 'nx'.

I suspect these Unicode characters are not handled correctly for some reason. Could someone take a more in-depth look at this issue?
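One possible explanation, which is an assumption and has not been checked against regexr's source, is that the PCRE engine runs without UTF mode. In that case both the pattern and the subject are treated as raw UTF-8 bytes instead of code points. 'ⁿ' (U+207F) encodes to the three bytes E2 81 BF, so the character class effectively becomes "n or any one of those three bytes". The sketch below imitates that by running Python's `re` on `bytes`. The helper name `byte_level_findall` is hypothetical.

```python
import re

SUBJECTS = ['ⁿa', 'ⁿx', 'na', 'nx']

def byte_level_findall(subject):
    """Hypothetical helper: imitate a non-UTF engine by matching on UTF-8 bytes."""
    # 'ⁿ' becomes b'\xe2\x81\xbf', so the class is [n\xe2\x81\xbf]:
    # four independent single-byte alternatives, not one character.
    pattern = b'[n' + 'ⁿ'.encode('utf-8') + b'][^x]'
    return re.findall(pattern, subject.encode('utf-8'))

for s in SUBJECTS:
    print(s, byte_level_findall(s))
```

Under this assumption 'ⁿa' produces two matches (b'\xe2\x81' and b'\xbfa') and 'ⁿx' produces one (b'\xe2\x81'). 'na' still gives b'na' and 'nx' gives nothing, which lines up with the results reported above.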
