Commit

Fixed encoding for small Unicode strings
Dmitry Vasiliev committed Jan 8, 2010
1 parent f34b044 commit d068e64
Showing 4 changed files with 12 additions and 5 deletions.
2 changes: 1 addition & 1 deletion CHANGES
@@ -1,6 +1,6 @@
 Version 0.4 (YYYY-MM-DD)
 
-
+- Fixed encoding for small Unicode strings with characters in range 128-255
 
 Version 0.3 (2010-01-03)
 
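The 128-255 range in the changelog entry matters because the removed code accepted every code point up to 255 but then called `str(term)`, which in Python 2 uses the default ASCII codec and cannot encode anything above 127. Latin-1, by contrast, maps code points 0-255 one-to-one onto bytes. The following is a small Python 3 sketch of that difference, not code from the commit; the helper name `fits_codec` is hypothetical.

```python
def fits_codec(text, codec):
    """Return True if every character of `text` is encodable with `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Latin-1 maps each code point 0..255 to the byte with the same value.
latin1_is_identity = all(
    chr(i).encode("latin1") == bytes([i]) for i in range(256)
)

# Code point 255 fits Latin-1 but not ASCII; code point 256 fits neither.
ascii_ok = fits_codec("\xff", "ascii")
latin1_ok = fits_codec("\xff", "latin1")
above_latin1_ok = fits_codec("\u0100", "latin1")
```

Characters 128-255 therefore pass a "not greater than 255" check yet fail an ASCII conversion, which is the gap this commit closes.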
2 changes: 2 additions & 0 deletions TODO
@@ -1,3 +1,5 @@
+- Optimize encoding for lists of bytes (integer in the range 0-255)
+
 - Add new datatypes (dictionaries, bit integer etc.)
 
 - Add support for term compression
9 changes: 5 additions & 4 deletions src/erlport/erlterms.py
@@ -181,11 +181,12 @@ def encode_term(term):
             return "j"
         length = len(term)
         if length <= 65535:
-            for i in term:
-                if ord(i) > 255:
-                    break
+            try:
+                term = term.encode("latin1")
+            except UnicodeEncodeError:
+                pass
             else:
-                return pack(">BH", 107, length) + str(term)
+                return pack(">BH", 107, length) + term
         return encode_term([ord(i) for i in term])
     elif isinstance(term, Atom):
         return pack(">BH", 100, len(term)) + term
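For readers on Python 3, the branch changed above can be approximated as a standalone function. This is a minimal sketch under stated assumptions, not erlport's implementation: the names `encode_text` and `encode_int` are hypothetical, the leading version byte (`\x83`) is omitted, and the integer helper only covers the non-negative code points needed here. The tag values are the Erlang external term format tags seen in the diff and its doctests (106 `j` for NIL_EXT, 107 `k` for STRING_EXT, 108 `l` for LIST_EXT, 98 `b` for INTEGER_EXT), plus 97 for SMALL_INTEGER_EXT.

```python
from struct import pack


def encode_int(value):
    """Encode a non-negative code point as an Erlang integer term."""
    if value <= 255:
        return pack(">BB", 97, value)   # SMALL_INTEGER_EXT
    return pack(">Bi", 98, value)       # INTEGER_EXT (32-bit signed)


def encode_text(text):
    """Encode a str as STRING_EXT when possible, else as a list of integers."""
    if not text:
        return b"j"                     # NIL_EXT, the empty list
    length = len(text)
    if length <= 65535:
        try:
            data = text.encode("latin1")
        except UnicodeEncodeError:
            pass                        # some code point is above 255
        else:
            # One byte per character, so len(data) == length.
            return pack(">BH", 107, length) + data
    # Fallback: LIST_EXT header, one integer per code point, NIL_EXT tail.
    body = b"".join(encode_int(ord(ch)) for ch in text)
    return pack(">BI", 108, length) + body + b"j"
```

As in the commit, the `try`/`except`/`else` form lets the codec do the range check: the compact string encoding is returned only when every character fits in a single byte.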
4 changes: 4 additions & 0 deletions src/erlport/tests/erlterms.txt
@@ -43,6 +43,10 @@ Unicode strings:
 '\x83j'
 >>> encode(u"test")
 '\x83k\x00\x04test'
+>>> encode(u"\x00\xff")
+'\x83k\x00\x02\x00\xff'
+>>> encode(u"\u0100")
+'\x83l\x00\x00\x00\x01b\x00\x00\x01\x00j'
 >>> encode(unicode("тест", "utf-8"))
 '\x83l\x00\x00\x00\x04b\x00\x00\x04Bb\x00\x00\x045b\x00\x00\x04Ab\x00\x00\x04Bj'
 >>> encode(u"X" * 65536) # doctest: +ELLIPSIS
