
Using unicode exchanges or routing keys results in confusing error #34

Closed
russss opened this Issue Feb 22, 2011 · 10 comments

5 participants

russss commented Feb 22, 2011
File "lib/python2.6/site-packages/pika/frames.py", line 68, in _marshal
    len(payload)) + payload + chr(spec.FRAME_END)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)

The routing key was a unicode string even though it contained only ASCII characters. The first concatenation operator coerces the result of the preceding struct.pack to unicode (which succeeds because those header bytes happen to be ASCII). The second one then tries to decode chr(spec.FRAME_END), the 0xce frame-end byte, as ASCII, and that fails.
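For comparison, here is a minimal Python 3 sketch of the same frame layout (type octet, channel short, size long, payload, then the AMQP 0-9-1 frame-end octet 0xCE, which is the byte named in the traceback). `marshal_frame` is a hypothetical helper, not Pika's API. Because Python 3 keeps `bytes` and `str` apart, mixing them raises `TypeError` instead of attempting an implicit ASCII decode, so the text payload is encoded explicitly and the size field is computed from the encoded bytes rather than the character count.

```python
import struct

FRAME_END = 0xCE  # AMQP 0-9-1 frame-end octet (206)


def marshal_frame(frame_type: int, channel: int, payload) -> bytes:
    """Hypothetical helper (not Pika's API): build one AMQP frame as bytes."""
    if isinstance(payload, str):
        # Encode text explicitly so every piece concatenated below is bytes.
        payload = payload.encode("utf-8")
    # The size field must count encoded bytes, not characters.
    header = struct.pack(">BHI", frame_type, channel, len(payload))
    return header + payload + bytes([FRAME_END])


frame = marshal_frame(1, 1, "أرنب")
print(struct.unpack(">BHI", frame[:7]))  # (1, 1, 8): four characters, eight UTF-8 bytes
```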

I understand that exchange names can be UTF-8 while routing keys can't, so these values should probably be encoded or checked further up the stack.
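One way to encode and check "further up the stack" is sketched below, assuming a hypothetical `encode_shortstr` helper that encodes text once, at the field level, and enforces the AMQP 0-9-1 short-string limit (a one-octet length prefix, so at most 255 bytes) before any frame is assembled. UTF-8 as the encoding and the choice of exceptions are assumptions; whether a particular field such as a routing key may carry non-ASCII data at all is a separate question.

```python
import struct


def encode_shortstr(value) -> bytes:
    """Hypothetical helper: encode an AMQP 0-9-1 short string (length octet + data)."""
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumed encoding
    if not isinstance(value, bytes):
        raise TypeError("shortstr must be str or bytes, got %s" % type(value).__name__)
    if len(value) > 255:
        # The length prefix is a single octet, so 255 encoded bytes is the maximum.
        raise ValueError("shortstr is %d bytes once encoded; the limit is 255" % len(value))
    return struct.pack("B", len(value)) + value
```

An empty value such as `vhost=u''` simply becomes the single length byte `b"\x00"`.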

vladev commented Feb 23, 2011

The same confusing behavior is observed when vhost=u''.

Pika member
gmr commented Apr 15, 2011

In my preliminary testing, I am not finding any problems with using unicode anywhere. I'm writing functional tests for this now and will close the issue. If you see any issues with 0.9.6 please update this ticket.

@gmr gmr pushed a commit that referenced this issue Apr 15, 2011
Gavin M. Roy Fix unicode support addressing issue #34.
 - data.py: Add unicode to the data type validation for shortstr.
 - frame.py: remove the str() in _marshal and update a docblock comment
fa19080
@gmr gmr closed this Apr 15, 2011
hntrmrrs commented May 6, 2011

It appears this issue is not fixed, as demonstrated by the following gist:

https://gist.github.com/959237

I think that payload in the _marshal function should be encoded as a bytestring (possibly UTF-8?) before concatenation. You can see this demonstrated in the exception handler in my above gist.

@russss russss reopened this May 6, 2011
@gmr gmr was assigned May 6, 2011
Pika member
gmr commented May 6, 2011

I have confirmed with your gist, will see what the difference is between your gist and the unittests, integrate that difference and go from there. It is worth noting that https://github.com/pika/pika/blob/master/tests/functional/unicode_tests.py runs using entirely unicode values.

Perhaps it's something in Blocking Connection. I'll update this ticket when I find it.

Pika member
gmr commented May 6, 2011

Hmm, looking at the test code, the difference is whether the value is assigned explicitly as unicode or not:

exchange = "أرنب"
repr(exchange)
"'\xd8\xa3\xd8\xb1\xd9\x86\xd8\xa8'"

exchange = u"أرنب"
repr(exchange)
"u'\u0623\u0631\u0646\u0628'"
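In Python 3 terms, the first assignment corresponds to a `bytes` value holding the UTF-8 encoding and the second to a `str`; the two only convert through an explicit `encode`/`decode`, as this small check shows.

```python
text = "أرنب"  # Python 3 str, like the u"..." literal above
raw = text.encode("utf-8")  # bytes, like the plain "..." literal in a UTF-8 source file

print(repr(raw))  # b'\xd8\xa3\xd8\xb1\xd9\x86\xd8\xa8'
print(len(text), len(raw))  # 4 8
assert raw.decode("utf-8") == text
```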

I don't believe an explicit cast is the correct behavior in every case. From a "pythonic" standpoint, it's probably better to catch the UnicodeDecodeError and only then do the cast.
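Translated to Python 3, where mixing bytes and text raises `TypeError` rather than `UnicodeDecodeError`, the catch-then-cast idea might look like the sketch below. `join_pieces_eafp` is a hypothetical name and UTF-8 is an assumed default; this is not what Pika ended up doing.

```python
def join_pieces_eafp(pieces, encoding="utf-8") -> bytes:
    """Hypothetical helper: try the all-bytes fast path, encode only on failure.

    `pieces` must be a sequence (e.g. a list), because it may be iterated twice.
    """
    try:
        # Fast path: every piece is already bytes.
        return b"".join(pieces)
    except TypeError:
        # Slow path: at least one piece is text; encode the text pieces and retry.
        return b"".join(
            p.encode(encoding) if isinstance(p, str) else p for p in pieces
        )
```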

In Python 3 we could use the bytes type and it wouldn't matter. However, since this version is still constrained to support Python 2.4, I'll try to come up with something that works there in the most idiomatic way possible.

hntrmrrs commented May 7, 2011

Perhaps it would be useful to provide a configuration setting that lets the client specify whether (and how) unicode instances should be encoded as bytestrings. The default could be 'utf-8', and rather than handling the exception, the marshalling code could simply test whether each piece is a unicode instance and encode it accordingly.

I think it might be too "late" to catch the UnicodeDecodeError, potentially really hurting the marshalling performance. That's just a gut feeling though; I haven't tested it.
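The check-first alternative with a configurable encoding could be sketched in Python 3 as follows. `FrameEncoder` and its `encoding` parameter are hypothetical, with 'utf-8' as the suggested default. Whether this is actually faster than catching the exception depends on how often text pieces show up; no measurement was made in this thread.

```python
class FrameEncoder:
    """Hypothetical encoder with a configurable text encoding (default UTF-8)."""

    def __init__(self, encoding: str = "utf-8") -> None:
        self.encoding = encoding

    def join(self, pieces) -> bytes:
        # Test each piece's type up front instead of waiting for an exception.
        return b"".join(
            p.encode(self.encoding) if isinstance(p, str) else p for p in pieces
        )
```

For example, `FrameEncoder().join([b"\x01", "é"])` yields `b"\x01\xc3\xa9"`, while `FrameEncoder("latin-1").join(["é"])` yields `b"\xe9"`.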

Pika member
gmr commented Jun 25, 2011

The issue comes down to the use of str vs. unicode in the frame encoding, and there are not many good ways to fix this in the core 0.9 tree. I have this fixed in the "2" tree, which I'm still figuring out next steps for. I might be able to backport the changes, but it's a fundamental change in everything from the codegen to the encode/decode parsers.

@hamilyon hamilyon added a commit to hhru/pika that referenced this issue Feb 15, 2012
Gavin M. Roy Fix unicode support addressing issue #34.
 - data.py: Add unicode to the data type validation for shortstr.
 - frame.py: remove the str() in _marshal and update a docblock comment
4b49891
@hownowstephen

I'm having an issue that I think is related to this

File "/Users/stephen/devel/sweetiq/env/src/pika/pika/frame.py", line 36, in _marshal
payload = ''.join(pieces)
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 4: ordinal not in range(128)

I'm using the 0.9.5 release. Is this still an outstanding issue, and if so, what can I do to fix it?

Pika member
gmr commented Oct 2, 2012

Sorry, this is still an issue, working on an appropriate fix.

@gmr gmr pushed a commit that referenced this issue Oct 2, 2012
Gavin M. Roy Address the unicode encoding/decoding problems detailed in issue #34
- Remove the deprecated and soon to be a problem immediate flag
- Update how to get to the amqp_codegen module from rabbitmq-public-umbrella/rabbitmq-codegen
- Remove python 2.4 support
d15d22c
Pika member
gmr commented Oct 2, 2012

This is now correctly addressed in master, sorry for the delay.

@gmr gmr closed this Oct 2, 2012