sub[n] not working as expected. #37084

Closed
nicfit mannequin opened this issue Aug 24, 2002 · 2 comments

Comments

nicfit mannequin commented Aug 24, 2002

BPO 599757

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2002-08-27.16:22:15.000>
created_at = <Date 2002-08-24.22:14:57.000>
labels = ['expert-regex', 'invalid']
title = 'sub[n] not working as expected.'
updated_at = <Date 2002-08-27.16:22:15.000>
user = 'https://bugs.python.org/nicfit'

bugs.python.org fields:

activity = <Date 2002-08-27.16:22:15.000>
actor = 'nowonder'
assignee = 'effbot'
closed = True
closed_date = None
closer = None
components = ['Regular Expressions']
creation = <Date 2002-08-24.22:14:57.000>
creator = 'nicfit'
dependencies = []
files = []
hgrepos = []
issue_num = 599757
keywords = []
message_count = 2.0
messages = ['12158', '12159']
nosy_count = 3.0
nosy_names = ['effbot', 'nowonder', 'nicfit']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue599757'
versions = ['Python 2.2']

nicfit mannequin commented Aug 24, 2002

I'm running into what looks to be a bug in the Python 2.2 re module.
These examples should demonstrate the problem.

Using Python 1.5.2:
import re
data = "\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
data1 = re.compile(r"\xFF\x00([\xE0-\xFF])").sub(r"\xFF\1", data)
print data1
'\377\340\323\323\344\225\377\000\000\021\377\365'

This output is exactly what I expect, but now see what happens in 2.2.1:
import re
data = "\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
data1 = re.compile(r"\xFF\x00([\xE0-\xFF])").sub(r"\xFF\1", data)
print data1
'\\xFF\xe0\xd3\xd3\xe4\x95\xff\x00\x00\x11\\xFF\xf5'

I like the hex output over the octal in 1.5, but the substitution is
clearly wrong. Notice each spot containing "\\" in the last result.

@nicfit nicfit mannequin closed this as completed Aug 24, 2002
@nicfit nicfit mannequin added the invalid label Aug 24, 2002
@nicfit nicfit mannequin assigned effbot Aug 24, 2002
@nicfit nicfit mannequin added the topic-regex label Aug 24, 2002
nowonder mannequin commented Aug 27, 2002

The substitution is correct. Notice that the r"..." raw
string given to sub in this example has length 6, not length
3! As the preserved upper case in the output shows, \\xFF is a
literal string of length 4 (backslash, x, F, F) and has no close
relationship to the one-character string \xff.

If you use .sub("\xFF\\1", data) instead you will achieve
the desired result.
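
The length difference and the suggested fix can be checked with a short sketch. It assumes Python 3 and uses bytes objects in place of the 8-bit strings of Python 2.2; the variable names are illustrative and not part of the original report. The raw replacement is only measured, not passed to sub, because recent Python 3 releases (3.7 and later, as far as I know) reject an unknown escape such as \x in a replacement template instead of inserting it literally.

```python
import re

# Same sample data as in the report, written as bytes for Python 3.
data = b"\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
pattern = re.compile(rb"\xFF\x00([\xE0-\xFF])")

# Raw replacement from the report: backslash, x, F, F, backslash, 1.
raw_repl = rb"\xFF\1"

# Suggested replacement: the single byte 0xFF, then a backslash and "1",
# which sub interprets as a reference to group 1.
fixed_repl = b"\xFF\\1"

print(len(raw_repl), len(fixed_repl))

# Drops the 0x00 byte between 0xFF and any byte in the range 0xE0-0xFF.
result = pattern.sub(fixed_repl, data)
print(result)
```

With the three-byte replacement, the result matches the bytes shown in the Python 1.5.2 output of the report.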

Note that the raw string passed to re.compile() also does
not contain the character \xff itself, but as described in
the documentation, re is able to parse the \xHH-style
character escapes.
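
The point about the pattern can be illustrated the same way. This is a minimal sketch assuming Python 3 text strings, with hypothetical variable names: the raw pattern is four characters long and does not contain the character \xff, yet it matches that character because re interprets the \xHH escape while parsing the pattern.

```python
import re

# The raw pattern holds four characters: backslash, x, F, F.
pattern_text = r"\xFF"
print(len(pattern_text))

# The character U+00FF itself does not occur in the raw pattern text.
contains_char = "\xff" in pattern_text

# re parses the \xHH escape in the pattern, so it matches the single character.
matched = re.compile(pattern_text).fullmatch("\xff") is not None
print(contains_char, matched)
```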

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022