"\W" pattern with re.ASCII flag is not equivalent to "[^a-zA-Z0-9_]" #89621

Closed
owentrigueros mannequin opened this issue Oct 13, 2021 · 2 comments
Labels
3.7 (EOL): end of life · 3.8: only security fixes · 3.9: only security fixes · 3.10: only security fixes · topic-regex · type-bug: an unexpected behavior, bug, or error

Comments

@owentrigueros
Mannequin

owentrigueros mannequin commented Oct 13, 2021

BPO 45458
Nosy @ezio-melotti, @serhiy-storchaka, @owentrigueros

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2021-10-13.09:28:14.946>
created_at = <Date 2021-10-13.09:10:37.218>
labels = ['expert-regex', 'type-bug', '3.8', '3.9', '3.10', '3.7', 'invalid']
title = '"\\W" pattern with re.ASCII flag is not equivalent to "[^a-zA-Z0-9_]"'
updated_at = <Date 2021-10-13.09:28:14.945>
user = 'https://github.com/owentrigueros'

bugs.python.org fields:

activity = <Date 2021-10-13.09:28:14.945>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = True
closed_date = <Date 2021-10-13.09:28:14.946>
closer = 'serhiy.storchaka'
components = ['Regular Expressions']
creation = <Date 2021-10-13.09:10:37.218>
creator = 'owentrigueros'
dependencies = []
files = []
hgrepos = []
issue_num = 45458
keywords = []
message_count = 2.0
messages = ['403810', '403811']
nosy_count = 4.0
nosy_names = ['ezio.melotti', 'mrabarnett', 'serhiy.storchaka', 'owentrigueros']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue45458'
versions = ['Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10']

@owentrigueros
Mannequin Author

owentrigueros mannequin commented Oct 13, 2021

"\W" regex pattern, when used with re.ASCII, is expected to have the same behavior as "[^a-zA-Z0-9_]" (see [1]).

For example, the following sub() call

>>> re.sub('\W', '', '½ a', re.ASCII)
'½a'

should return the same as this one:

>>> re.sub('[^a-zA-Z0-9_]', '', '½ a', re.ASCII)
'a'

But it does not.

[1] https://docs.python.org/3/library/re.html#regular-expression-syntax

@owentrigueros owentrigueros mannequin added the 3.7 (EOL), 3.8, 3.9, 3.10, topic-regex, and type-bug labels on Oct 13, 2021
@serhiy-storchaka
Member

It works as expected:

>>> re.sub(r'\W', '', '½ a', 0, re.ASCII)
'a'

You just passed re.ASCII as the count argument, not as the flags argument.
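
To make the mix-up concrete, here is a minimal sketch of what the original call does. The variable names are illustrative only. `re.ASCII` is an integer flag, so when it lands in the fourth positional slot it is read as a replacement count rather than as a flag, and the pattern runs with default (Unicode) matching, where '½' counts as a word character.

```python
import re
import warnings

text = '½ a'

# re.ASCII is an integer flag; this is the number sub() receives as count.
ascii_as_int = int(re.ASCII)

with warnings.catch_warnings():
    # Newer Python versions may warn when count is passed positionally.
    warnings.simplefilter('ignore')
    # The fourth positional argument is count, so flags stays at 0.
    positional = re.sub(r'\W', '', text, re.ASCII)

# The same call with the count spelled out explicitly.
explicit_count = re.sub(r'\W', '', text, count=ascii_as_int)

# Passing the flag by keyword gives the ASCII-only behavior.
with_flag = re.sub(r'\W', '', text, flags=re.ASCII)

print(ascii_as_int, repr(positional), repr(explicit_count), repr(with_flag))
```

The first two results keep '½' and drop only the space; the keyword form also removes '½'.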

>>> help(re.sub)
Help on function sub in module re:
sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the Match object and must return
    a replacement string to be used.
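
Given that signature, a few ways to avoid putting a flag in the count slot are sketched below. This is only an illustration, not an official recommendation; the variable names are made up for the example.

```python
import re

text = '½ a'

# Option 1: pass flags by keyword so it cannot be mistaken for count.
by_keyword = re.sub(r'\W', '', text, flags=re.ASCII)

# Option 2: compile the pattern first; the second parameter of
# re.compile() is flags, so there is no count argument to collide with.
non_word_ascii = re.compile(r'\W', re.ASCII)
by_compiled = non_word_ascii.sub('', text)

# Option 3: set the ASCII flag inline at the start of the pattern.
by_inline = re.sub(r'(?a)\W', '', text)

# For comparison: the explicit character class from the report.
by_class = re.sub(r'[^a-zA-Z0-9_]', '', text)

print(by_keyword, by_compiled, by_inline, by_class)
```

All four calls strip both the space and '½' from the sample string.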
