sub[n] not working as expected. #37084

nicfit · 2002-08-24T22:14:57Z

BPO	599757

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2002-08-27.16:22:15.000>
created_at = <Date 2002-08-24.22:14:57.000>
labels = ['expert-regex', 'invalid']
title = 'sub[n] not working as expected.'
updated_at = <Date 2002-08-27.16:22:15.000>
user = 'https://bugs.python.org/nicfit'

bugs.python.org fields:

activity = <Date 2002-08-27.16:22:15.000>
actor = 'nowonder'
assignee = 'effbot'
closed = True
closed_date = None
closer = None
components = ['Regular Expressions']
creation = <Date 2002-08-24.22:14:57.000>
creator = 'nicfit'
dependencies = []
files = []
hgrepos = []
issue_num = 599757
keywords = []
message_count = 2.0
messages = ['12158', '12159']
nosy_count = 3.0
nosy_names = ['effbot', 'nowonder', 'nicfit']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue599757'
versions = ['Python 2.2']

nicfit · 2002-08-24T22:14:57Z

I'm running into what looks to be a bug in the Python 2.2 re module.
These examples should demonstrate the problem.

Using Python 1.5.2:
import re;
data = "\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
data1 = re.compile(r"\xFF\x00([\xE0-\xFF])").sub(r"\xFF\1", data);
print data1
'\377\340\323\323\344\225\377\000\000\021\377\365'

This output is exactly what I expect, but now see what happens in 2.2.1:
import re;
data = "\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
data1 = re.compile(r"\xFF\x00([\xE0-\xFF])").sub(r"\xFF\1", data);
print data1
'\\xFF\xe0\xd3\xd3\xe4\x95\xff\x00\x00\x11\\xFF\xf5'

I like the hex output over the octal in 1.5, but the substitution is
clearly wrong. Notice each spot containing "\\" in the last result.

nowonder · 2002-08-27T16:22:15Z

The substitution is correct. Notice that the r"..." raw
string given to sub in this example has length 6, not length
3! As you can see from the upper-case letters in the output, \\xFF is
a string of length 4 (a backslash followed by x, F, F) and has no
close relationship to the single-character string \xff.

If you use .sub("\xFF\\1", data) instead you will achieve
the desired result.
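
The length difference and the suggested fix can be checked directly. What follows is a minimal sketch, assuming Python 3 rather than the 1.5.2 and 2.2 interpreters in the report; bytes objects stand in for the 8-bit strings of the original, and the variable names are illustrative only.

```python
import re

# Same data and pattern as in the report, written as bytes for Python 3.
data = b"\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
pattern = re.compile(rb"\xFF\x00([\xE0-\xFF])")

# Raw replacement: six literal characters  \ x F F \ 1
raw_repl = rb"\xFF\1"
# Non-raw replacement: the byte 0xFF, then a backslash, then "1"
fixed_repl = b"\xFF\\1"

print(len(raw_repl), len(fixed_repl))

# The three-byte template inserts a real 0xFF followed by group 1.
fixed = pattern.sub(fixed_repl, data)
print(fixed)

# A callable replacement sidesteps template escapes entirely.
via_callable = pattern.sub(lambda m: b"\xFF" + m.group(1), data)
```

Both the three-byte template and the callable should give the byte sequence the reporter expected from 1.5.2.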

Note that the raw string passed to re.compile() also does
not contain the character \xff itself, but as described in
the documentation, re is able to parse the \xHH-style
character escapes.
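
The same distinction on the pattern side can be seen in a short sketch, again assuming Python 3 and using illustrative names: the raw pattern text is four characters long, yet the compiled pattern matches the single character \xff because re interprets the escape itself.

```python
import re

# The raw pattern text is four characters: backslash, x, F, F.
pattern_text = r"\xFF"
pattern_length = len(pattern_text)

# re parses the \xHH escape, so the pattern matches the one character "\xff".
matches_char = re.match(pattern_text, "\xff") is not None

# It does not match its own four-character source text.
matches_source = re.match(pattern_text, pattern_text) is not None

print(pattern_length, matches_char, matches_source)
```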

nicfit mannequin closed this as completed Aug 24, 2002

nicfit mannequin added the invalid label Aug 24, 2002

nicfit mannequin assigned effbot Aug 24, 2002

nicfit mannequin added the topic-regex label Aug 24, 2002

ezio-melotti transferred this issue from another repository Apr 9, 2022
