Make _str in tests correctly process unicode escapes.
opottone committed Feb 21, 2015
1 parent b1c7416 commit 174820b
Showing 3 changed files with 12 additions and 2 deletions.
7 changes: 6 additions & 1 deletion src/lxml/tests/common_imports.py
@@ -142,9 +142,14 @@ def make_doctest(filename):
                 doctests, {}, os.path.basename(filename), filename, 0))
 else:
     # Python 2
+    unichr_escape = re.compile(r'\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}')
+
     from __builtin__ import unicode
     def _str(s, encoding="UTF-8"):
-        return unicode(s, encoding=encoding)
+        s = unicode(s, encoding=encoding)
+        return unichr_escape.sub(lambda x:
+                                     x.group(0).decode('unicode-escape'),
+                                 s)
     def _bytes(s, encoding="UTF-8"):
         return s
     from io import BytesIO
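The Python 2 branch above turns `\uXXXX` and `\UXXXXXXXX` sequences that survive in byte-string literals into real characters. Below is a minimal Python 3 sketch of the same substitution, for illustration only: `expand_unichr_escapes` is a hypothetical name, and it converts each match with `chr(int(hex, 16))` instead of the `unicode-escape` codec used in the commit, because Python 3 `str` objects have no `.decode()` method. Raw strings stand in for Python 2 byte literals, whose backslash sequences reach `_str` unexpanded.

```python
import re

# Same pattern as the commit: \u + 4 hex digits, or \U + 8 hex digits.
UNICHR_ESCAPE = re.compile(r'\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}')


def expand_unichr_escapes(text):
    """Replace literal \\uXXXX / \\UXXXXXXXX sequences with the characters they name.

    Hypothetical helper: drop the 2-character prefix and parse the rest as hex.
    chr() raises ValueError for code points above U+10FFFF.
    """
    return UNICHR_ESCAPE.sub(lambda m: chr(int(m.group(0)[2:], 16)), text)


# The equivalences checked by the new test__str, written with raw strings.
print(expand_unichr_escapes(r'\u0010') == '\x10')      # True
print(expand_unichr_escapes(r'\U00000010') == '\x10')  # True
print(expand_unichr_escapes(r'\u1234') == expand_unichr_escapes(r'\U00001234'))  # True
```

Using `chr()` keeps the sketch independent of codec details; text without escapes passes through unchanged.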
1 change: 0 additions & 1 deletion src/lxml/tests/test_etree.py
@@ -3818,7 +3818,6 @@ def _writeElement(self, element, encoding='us-ascii', compression=0):
             data = zlib.decompress(data)
         return canonicalize(data)
 
-
 class _XIncludeTestCase(HelperTestCase):
     def test_xinclude_text(self):
         filename = fileInTestDir('test_broken.xml')
6 changes: 6 additions & 0 deletions src/lxml/tests/test_unicode.py
@@ -28,6 +28,12 @@
 
 
 class UnicodeTestCase(HelperTestCase):
+    def test__str(self):
+        # test the testing framework, namely _str from common_imports
+        self.assertEqual(_str('\x10'), _str('\u0010'))
+        self.assertEqual(_str('\x10'), _str('\U00000010'))
+        self.assertEqual(_str('\u1234'), _str('\U00001234'))
+

scoder Feb 21, 2015

What does _str('\\u1234') give with this change?

opottone Feb 21, 2015


Since '\u1234' == '\\u1234' for Python 2 byte-string literals, _str('\u1234') == _str('\\u1234') == u'\u1234' there.

scoder Feb 21, 2015

Right - I guess it's acceptable to allow that behaviour in the test suite (assuming it's not currently used anywhere).

opottone Feb 21, 2015


Yes, _str('\\u1234') == u'\u1234' (after the change) is a nasty pitfall, but then again, so is _str('\u1234') == u'\\u1234' (before the change), and it is simply not possible to get both cases right.
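The trade-off described in this comment can be modelled in Python 3, with one raw string standing in for the single Python 2 byte string that both spellings produce. This is an illustrative sketch, not lxml code: `str_before` and `str_after` are hypothetical stand-ins for the old and new `_str`.

```python
import re

UNICHR_ESCAPE = re.compile(r'\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}')


def str_before(text):
    # Hypothetical model of the old Python 2 _str: escape text passes through.
    return text


def str_after(text):
    # Hypothetical model of the new Python 2 _str: escape text becomes a character.
    return UNICHR_ESCAPE.sub(lambda m: chr(int(m.group(0)[2:], 16)), text)


# In a Python 2 byte-string literal, '\u1234' and '\\u1234' yield the same six
# characters, so one raw string represents either spelling.
source = r'\u1234'

print(len(source))                      # 6
print(str_before(source) == '\\u1234')  # True: a caller who meant U+1234 gets backslash-u text
print(str_after(source) == '\u1234')    # True: a caller who meant backslash-u text gets U+1234
```

Since both spellings arrive as identical input, any single function must get one of the two intents wrong.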

Grepping through the code, I did not find a single _str('\\uxxxx'), but there were a couple of lines (not many) with _str('\uxxxx').
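The audit described in this comment can be approximated with a short scan. This is a sketch, not the command the author ran: `scan` and `scan_tree` are hypothetical names, the default directory is taken from the paths in this diff, and the patterns only catch single-quoted `_str(...)` literals written on one line.

```python
import re
from pathlib import Path

# _str('...\uXXXX...'): the escape is written with a single backslash.
SINGLE = re.compile(r"_str\('[^'\n]*(?<!\\)\\[uU][0-9a-fA-F]{4}")
# _str('...\\uXXXX...'): the backslash itself is escaped.
DOUBLE = re.compile(r"_str\('[^'\n]*\\\\[uU][0-9a-fA-F]{4}")


def scan(lines):
    """Yield (line number, kind, stripped line) for _str calls with \\u escapes."""
    for number, line in enumerate(lines, 1):
        if DOUBLE.search(line):
            yield number, "double", line.strip()
        elif SINGLE.search(line):
            yield number, "single", line.strip()


def scan_tree(root="src/lxml/tests"):
    """Print every match found in the *.py files directly under root."""
    for path in sorted(Path(root).glob("*.py")):
        text = path.read_text(encoding="utf-8")
        for number, kind, line in scan(text.splitlines()):
            print(f"{path}:{number}: {kind}: {line}")
```

`DOUBLE` is checked first so that an escaped backslash is never reported as a single-backslash escape.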

scoder Feb 21, 2015

After looking through these cases, I think it doesn't hurt to make this change.

     def test_unicode_xml(self):
         tree = etree.XML('<p>%s</p>' % uni)
         self.assertEqual(uni, tree.text)