re.sub behaves inconsistent between versions with * repetition qualifier #83868

Closed
slomo mannequin opened this issue Feb 19, 2020 · 2 comments
Labels
3.8 (only security fixes), topic-regex

Comments

@slomo
Mannequin

slomo mannequin commented Feb 19, 2020

BPO 39687
Nosy @ezio-melotti, @serhiy-storchaka, @slomo

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2020-02-19.13:24:52.061>
created_at = <Date 2020-02-19.11:40:24.374>
labels = ['expert-regex', 'invalid', '3.8']
title = 're.sub behaves inconsistent between versions with * repetition qualifier'
updated_at = <Date 2020-02-19.13:24:52.057>
user = 'https://github.com/slomo'

bugs.python.org fields:

activity = <Date 2020-02-19.13:24:52.057>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = True
closed_date = <Date 2020-02-19.13:24:52.061>
closer = 'serhiy.storchaka'
components = ['Regular Expressions']
creation = <Date 2020-02-19.11:40:24.374>
creator = 'slomo'
dependencies = []
files = []
hgrepos = []
issue_num = 39687
keywords = []
message_count = 2.0
messages = ['362264', '362271']
nosy_count = 4.0
nosy_names = ['ezio.melotti', 'mrabarnett', 'serhiy.storchaka', 'slomo']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue39687'
versions = ['Python 3.8']

@slomo
Mannequin Author

slomo mannequin commented Feb 19, 2020

The following expression produces different results on different platforms and Python versions:

python -c 'import re; print(re.compile("(.*)", 0).sub("a\\1", "bc"))'

As far as I have observed:

Linux/Python 3.6.9 => abc
MacOS/Python 3.7.1 => abca
Repl.it/Python 3.8.1 => abca
MacOS/Python 2.7.17 => abc
Linux/Python 2.7.17 => abc

According to the documentation, I would guess that "abc" is the correct return value.

The issue also occurs without compiling and without a capture group:

re.sub(".*", "a", "cb") => "a" vs. "aa"

@slomo slomo mannequin added the 3.8 (only security fixes) and topic-regex labels Feb 19, 2020
@serhiy-storchaka
Member

It is correct and documented behavior. ".*" matches two substrings: the whole string "bc" and an empty string at the end of the string.
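
The two matches can be made visible with re.finditer. The following is a minimal sketch for illustration, not part of the original comment; the variable names are arbitrary, and the re.sub result shown assumes Python 3.7 or later.

```python
import re

text = "bc"

# List every match of ".*" in the string together with its span.
# The first match consumes the whole string; the second is the
# empty match at the very end (position 2).
matches = [(m.group(0), m.span()) for m in re.finditer(".*", text)]
print(matches)  # [('bc', (0, 2)), ('', (2, 2))]

# On Python 3.7+, re.sub replaces both matches, so the replacement
# template "a\1" is emitted twice: once as "abc" and once as "a".
result = re.sub("(.*)", r"a\1", text)
print(result)  # abca
```

The list of matches is the same on older versions; what changed in 3.7 is that re.sub also replaces the empty match adjacent to the preceding non-empty one.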

See https://docs.python.org/3/library/re.html#re.sub and https://docs.python.org/3/whatsnew/3.7.html#changes-in-the-python-api.

The behavior before 3.7 was incorrect.
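
If a single replacement is what is wanted, the pattern can be written so that the trailing empty match never arises. The sketch below shows three possible ways to do this; they are suggestions for illustration rather than something proposed in this issue, and they are not equivalent to each other for every input (for example, ".+" does not match an empty string, while ".*" does).

```python
import re

text = "cb"

# Require at least one character, so the empty match at the end is impossible.
at_least_one = re.sub(".+", "a", text)

# Anchor at the start of the string; without re.MULTILINE, "^" cannot
# match again at the end of "cb".
anchored = re.sub("^.*", "a", text)

# Keep ".*" but limit the number of substitutions to one.
first_only = re.sub(".*", "a", text, count=1)

print(at_least_one, anchored, first_only)  # a a a
```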

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022