reject unicode in zlib #49007
Comments
Python 2.x allows encoding any byte string (str) and ASCII unicode string (unicode):

$ python
Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> import zlib
>>> zlib.compress('abc')
"x\x9cKLJ\x06\x00\x02M\x01'"
>>> zlib.compress(u'abc')
"x\x9cKLJ\x06\x00\x02M\x01'"
>>> zlib.compress(u'abc\xe9')
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' ...

I'm not sure that this behaviour was really wanted, because the decompress operation is not symmetric (the result type is always byte string):

$ python
Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> import zlib
>>> zlib.decompress("x\x9cKLJ\x06\x00\x02M\x01'")
'abc'

Python 3.0 accepts any string: bytes or characters. But decompress

$ ./python
Python 3.1a0 (py3k:67926M, Dec 26 2008, 23:59:07)
>>> import zlib
>>> zlib.compress(b'abc')
b"x\x9cKLJ\x06\x00\x02M\x01'"
>>> zlib.compress('abc')
b"x\x9cKLJ\x06\x00\x02M\x01'"
>>> zlib.compress('abc\xe9')
b'x\x9cKLJ>\xbc\x12\x00\x06\xca\x02\x93'
>>> zlib.compress('abc\xe9'.encode('utf-8'))
b'x\x9cKLJ>\xbc\x12\x00\x06\xca\x02\x93'
>>> zlib.decompress(b'x\x9cKLJ>\xbc\x12\x00\x06\xca\x02\x93')
b'abc\xc3\xa9'

The strangest operation is the decompression of a unicode string:

$ ./python
>>> zlib.decompress('x\x9cKLJ>\xbc\x12\x00\x06\xca\x02\x93')
...
zlib.error: Error -3 while decompressing data: incorrect header check

I propose to change the zlib API to reject unicode strings and use explicit
Note: binascii.crc32() already rejects unicode strings. The behaviour may be kept in Python 3.0.x and only changed in Python 3.1. |
See also issue bpo-4738 (better threads support in zlib). |
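The proposal above is cut off after "use explicit"; assuming it refers to explicit encoding (as the `.encode('utf-8')` call in the Python 3 session suggests), a minimal sketch of what callers would write looks like this. The helper names `compress_text` and `decompress_text` are hypothetical, and UTF-8 is only an assumed choice of encoding; this is not the attached patch.

```python
import zlib


def compress_text(text, encoding="utf-8"):
    """Encode a str explicitly, then compress the resulting bytes.

    Hypothetical helper: zlib itself only ever sees bytes.
    """
    return zlib.compress(text.encode(encoding))


def decompress_text(data, encoding="utf-8"):
    """Decompress bytes, then decode them back to a str explicitly."""
    return zlib.decompress(data).decode(encoding)


original = "abc\xe9"
packed = compress_text(original)      # bytes
restored = decompress_text(packed)    # str again
print(type(packed).__name__, restored == original)
```

With the encoding spelled out on both sides, the round trip is symmetric: characters go in and characters come out, while zlib only handles bytes in between.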
On 2008-12-27 13:58, STINNER Victor wrote:
> Python 2.x allows encoding any byte string (str) and ASCII unicode
> string (unicode):
>
> $ python
> Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
> >>> import zlib
> >>> zlib.compress('abc')
> "x\x9cKLJ\x06\x00\x02M\x01'"
> >>> zlib.compress(u'abc')
> "x\x9cKLJ\x06\x00\x02M\x01'"
> >>> zlib.compress(u'abc\xe9')
> ...
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' ...
>
> I'm not sure that this behaviour was really wanted, because the
> decompress operation is not symmetric (the result type is always byte
> string):
>
> $ python
> Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
> >>> import zlib
> >>> zlib.decompress("x\x9cKLJ\x06\x00\x02M\x01'")
> 'abc'
>

I don't see a problem with this. The fact that Python 2.x also

zlib itself doesn't care about whether the data to be encoded

In Python 3.x, it's probably better to use bytes throughout the |
I don't think Python 2.x should be changed - but 3.0 or 3.1 should be:
|
I hate this behaviour. It doesn't help migration, it's the opposite! Sometimes
I propose to reject unicode in Python 3.x and display a warning for Python |
The current behaviour may help the majority by ignorance and cause weird

Modules that only operate on Bytes should reject Unicode-objects in

Also see bpo-4821 and bpo-4818 where unicode already got rejected by the |
On 2009-01-04 23:51, STINNER Victor wrote:
Well, that's your opinion. The feature was added to get people

At the time the Python community was a lot smaller and there wasn't

See the introduction in PEP-100 for the motivation behind the design

http://www.python.org/dev/peps/pep-0100/
Fair enough. |
The patch for Python 3.x is already attached to this issue. We might only |
Definitely, zlib.compress should raise a TypeError (like bz2 does).

>>> import bz2, zlib
>>> bz2.compress('abc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument 1 must be bytes or buffer, not str
>>> zlib.compress('abc')
b"x\x9cKLJ\x06\x00\x02M\x01'"

Can someone review the patch and merge it? |
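To compare the two modules on whatever interpreter is at hand, a small check along these lines may help. The helper `rejects_str` is a hypothetical name used only for this sketch; on an interpreter that includes the change discussed in this issue, both entries are expected to be True.

```python
import bz2
import zlib


def rejects_str(compress):
    """Return True if the given compress function raises TypeError for str input."""
    try:
        compress("abc")
    except TypeError:
        return True
    return False


# Map each module name to whether its compress() refuses a str argument.
results = {
    name: rejects_str(func)
    for name, func in (("bz2", bz2.compress), ("zlib", zlib.compress))
}
print(results)
```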
The patch lacks a test that TypeError is raised on unicode input, |
Patch from haypo updated for r76830. Additional tests for TypeError, and to check that bytearray objects are |
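For readers who want to see what such tests might look like, here is a minimal sketch. It is not the code added to test_zlib by the patch; the class and method names are hypothetical, and it assumes an interpreter where zlib already rejects str.

```python
import unittest
import zlib


class ZlibInputTypeTest(unittest.TestCase):
    """Hypothetical tests: str is rejected, bytearray is accepted."""

    def test_str_is_rejected(self):
        # Both directions should refuse character strings.
        self.assertRaises(TypeError, zlib.compress, "abc")
        self.assertRaises(TypeError, zlib.decompress, "abc")

    def test_bytearray_is_accepted(self):
        data = bytearray(b"abc" * 10)
        packed = zlib.compress(data)
        # decompress() always returns bytes, whatever buffer type went in.
        self.assertEqual(zlib.decompress(bytearray(packed)), bytes(data))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ZlibInputTypeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```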
The patch produces a number of errors in test_tarfile, test_distutils, |
Fixed. And some "bytearray" tests improved in test_zlib. |
The patch was committed to py3k and 3.1. Thank you! |
r76836 and r76838 |