Inline python re modifiers not working #1698

Open
SteveBarnes-BH opened this issue Jan 11, 2022 · 6 comments
Comments

@SteveBarnes-BH

Bug Description

The regex
#define\s+(?i:CONFIGXML_HEADER)
reports:

(? Incomplete group structure
) Incomplete group structure

However, it is a valid regular expression in Python 3.9 (scoped inline flags like this are supported from Python 3.6 onward). I need #define to match case-sensitively but CONFIGXML_HEADER to match case-insensitively.
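A minimal sketch of the expected behaviour, assuming Python 3.6 or later (this check is not part of the original report): the (?i:...) group applies IGNORECASE only to the text inside it.

```python
import re

# The scoped flag (?i:...) makes only CONFIGXML_HEADER case-insensitive;
# "#define" outside the group is still matched case-sensitively.
pattern = re.compile(r"#define\s+(?i:CONFIGXML_HEADER)")

print(bool(pattern.search("#define configxml_header")))  # True
print(bool(pattern.search("#DEFINE CONFIGXML_HEADER")))  # False
```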

Reproduction steps

Paste the above regex into the regex field on the site with the Python flavor selected.

Expected Outcome

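The regex is accepted as a partially case-sensitive pattern: #define matched case-sensitively, CONFIGXML_HEADER case-insensitively.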

Browser

Chrome

OS

Windows 10

@firasdib
Owner

This is a long-standing issue: the website caters for Python 2.7, which is very outdated at this point. I will have to rework it completely to support Python 3+ as soon as possible.

@SteveBarnes-BH
Author

In this context it is probably worth mentioning that all official support for Python 2.x ended on January 1, 2020.

@firasdib
Owner

@SteveBarnes-BH Is there a writeup somewhere outlining the regex differences between 2.7 and 3.x?

@SteveBarnes-BH
Author

https://docs.python.org/3/library/re.html has:

(?aiLmsux-imsx:...)
(Zero or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x', optionally followed by '-' followed by one or more letters from the 'i', 'm', 's', 'x'.) The letters set or remove the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode matching), and re.X (verbose), for the part of the expression. (The flags are described in Module Contents.)

The letters 'a', 'L' and 'u' are mutually exclusive when used as inline flags, so they can’t be combined or follow '-'. Instead, when one of them appears in an inline group, it overrides the matching mode in the enclosing group. In Unicode patterns (?a:...) switches to ASCII-only matching, and (?u:...) switches to Unicode matching (default). In bytes patterns (?L:...) switches to locale-dependent matching, and (?a:...) switches to ASCII-only matching (default). This override is only in effect for the narrow inline group, and the original matching mode is restored outside of the group.

New in version 3.6.

Changed in version 3.7: The letters 'a', 'L' and 'u' also can be used in a group.
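A short sketch of two cases from the quoted documentation, assuming Python 3.7 or later (the example strings are illustrative only): the '-' form removes a flag inside the group, and (?a:...) switches a str pattern to ASCII-only matching just for that group.

```python
import re

# IGNORECASE is on globally, but (?-i:...) switches it off for "#define" only.
mixed = re.compile(r"(?-i:#define)\s+CONFIGXML_HEADER", re.IGNORECASE)
print(bool(mixed.match("#define configxml_header")))  # True
print(bool(mixed.match("#DEFINE configxml_header")))  # False

# In a str pattern \w is Unicode-aware by default; (?a:...) makes it
# ASCII-only inside the group (allowed in a group since Python 3.7).
unicode_word = re.compile(r"\w+")
ascii_word = re.compile(r"(?a:\w+)")
print(bool(unicode_word.fullmatch("café")))  # True
print(bool(ascii_word.fullmatch("café")))    # False
```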

The Regular Expression HOWTO (https://docs.python.org/3/howto/regex.html) is a useful resource as well.

There is also the significant difference that both patterns and target strings can be either str (i.e. Unicode) or bytes, and the two don't mix: re.findall("Fred", b"Fred") raises an error (TypeError: cannot use a string pattern on a bytes-like object). I would suggest this is probably best handled as a note on your site rather than something you try to deal with.
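A minimal sketch of the str/bytes rule described above (an illustration, not code from the thread): a str pattern on a bytes target raises TypeError, while matching types work as expected.

```python
import re

# Mixing a str pattern with a bytes target raises TypeError.
try:
    re.findall("Fred", b"Fred")
    err_msg = None
except TypeError as exc:
    err_msg = str(exc)
print(err_msg)  # cannot use a string pattern on a bytes-like object

# Pattern and target of the same type work fine.
bytes_hits = re.findall(b"Fred", b"Fred")
str_hits = re.findall("Fred", "Fred")
print(bytes_hits, str_hits)  # [b'Fred'] ['Fred']
```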

@thesuperzapper

@firasdib have you managed to make any progress on this issue?

Either way, the name of the "Python" flavor should probably be "Python 2.7", to make sure users understand that Python 3 syntax is not supported.

@weallcock

weallcock commented Nov 20, 2022

I agree that you should make it very clear that this is Python 2.7.

I removed the rest of this comment because the problem was that I was not using raw strings, so \b was being interpreted as the backspace character rather than the word-boundary escape sequence.
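A small sketch of the raw-string pitfall mentioned above (illustrative strings, not from the original comment): in a normal string literal "\b" is already converted to the backspace character before re sees it, whereas r"\b" passes the two characters backslash and b through to re.

```python
import re

plain = "\bcat\b"  # each \b is the backspace character U+0008
raw = r"\bcat\b"   # each \b reaches re as the word-boundary assertion

print(len(plain), len(raw))                # 5 7
print(re.search(plain, "a cat sat"))       # None: no backspace chars in the text
print(re.search(raw, "a cat sat").group()) # cat
```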
