force bytes #190

Closed
wbolster opened this Issue Mar 20, 2016 · 2 comments

Comments

Projects
None yet
2 participants
@wbolster
Contributor

wbolster commented Mar 20, 2016

is it possible to force an api to return bytes instead of the messy mix of str (unicode in py2) and bytes, depending on whether the value happened to be valid utf-8?

apache hbase has a bytes-only data model, and happybase, the python library for it, uses thrift to communicate with hbase. however, this results in mixed str/bytes responses:

>>> c = happybase.Connection()
>>> t = c.table('mytable')
>>> t.put(b'\x01\x02', {'cf2:foo\xee\xee\xee': '\xff\xff\xff'}
>>> t.put('foo', {'cf1:bar': u'baz'})
>>> list(t.scan())
[(u'\x01\x02', {'cf2:foo\xee\xee\xee': '\xff\xff\xff'}),
 (u'foo', {u'cf1:bar': u'baz'})]

(this example code was run with python2)

@wbolster wbolster referenced this issue in wbolster/happybase Mar 20, 2016

Merged

Port to thriftpy #114

@lxyu lxyu closed this in 00c6553 Mar 24, 2016

@lxyu

This comment has been minimized.

Show comment
Hide comment
@lxyu

lxyu Mar 24, 2016

Contributor

You can now add decode_response=False to protocol factory to force return bytes.

Contributor

lxyu commented Mar 24, 2016

You can now add decode_response=False to protocol factory to force return bytes.

@wbolster

This comment has been minimized.

Show comment
Hide comment
@wbolster

wbolster Mar 24, 2016

Contributor

awesome, thanks.

Contributor

wbolster commented Mar 24, 2016

awesome, thanks.

@wbolster wbolster referenced this issue in wbolster/happybase Apr 29, 2016

Closed

How to disable thriftpy's auto decoding? #122

dan-blanchard added a commit to dan-blanchard/thriftpy that referenced this issue May 6, 2016

jstasiak added a commit to jstasiak/thriftpy that referenced this issue May 14, 2016

Treat binary and string fields differently
I believe this improves [1] which addresses a bytes/str inconsistency issue
reported some time ago[2]. The goal of this patch is to have an option
to always deseriaize binary fields as bytes and string fields as str (on
Python 2, respectively: str and unicode). It achieves it by adding a
third option to previously purely boolean decode_response parameter (I
admit I'm not a fan of mixing types like this but I'm doing this
strictly for backwards compatibility and I'd argue the "auto" behavior
should become the default in the future and decode_response should be
removed altogether).

I'm not intimately familiar with Thrift specification so adding a
separate ttype like this may be stupid - let me know if that is so.

I've renamed and reorganized parts of the code in order for the names to
be reasonably truthful and to avoid too much code redundancy connected
to the fact that I've added more tests. Ideally some of the test code
would be shared so reduce the redundancy even further but I feel that
it's out of scope here. (Similarly the actual non-test code - I'm not
entirely comfortable with the redundancy this commit introduces but
there's quite a bit of redundancy in the code already so improving this
is likely a whole separate task).

[1] eleme@00c6553
[2] eleme#190

jstasiak added a commit to jstasiak/thriftpy that referenced this issue May 14, 2016

Treat binary and string fields differently
I believe this improves [1] which addresses a bytes/str inconsistency issue
reported some time ago[2]. The goal of this patch is to have an option
to always deseriaize binary fields as bytes and string fields as str (on
Python 2, respectively: str and unicode). It achieves it by adding a
third option to previously purely boolean decode_response parameter (I
admit I'm not a fan of mixing types like this but I'm doing this
strictly for backwards compatibility and I'd argue the "auto" behavior
should become the default in the future and decode_response should be
removed altogether).

I'm not intimately familiar with Thrift specification so adding a
separate ttype like this may be stupid - let me know if that is so.

I've renamed and reorganized parts of the code in order for the names to
be reasonably truthful and to avoid too much code redundancy connected
to the fact that I've added more tests. Ideally some of the test code
would be shared so reduce the redundancy even further but I feel that
it's out of scope here. (Similarly the actual non-test code - I'm not
entirely comfortable with the redundancy this commit introduces but
there's quite a bit of redundancy in the code already so improving this
is likely a whole separate task).

[1] eleme@00c6553
[2] eleme#190

jstasiak added a commit to jstasiak/thriftpy that referenced this issue May 14, 2016

Treat binary and string fields differently
I believe this improves [1] which addresses a bytes/str inconsistency issue
reported some time ago[2]. The goal of this patch is to have an option
to always deseriaize binary fields as bytes and string fields as str (on
Python 2, respectively: str and unicode). It achieves it by adding a
third option to previously purely boolean decode_response parameter (I
admit I'm not a fan of mixing types like this but I'm doing this
strictly for backwards compatibility and I'd argue the "auto" behavior
should become the default in the future and decode_response should be
removed altogether).

I'm not intimately familiar with Thrift specification so adding a
separate ttype like this may be stupid - let me know if that is so.

I've renamed and reorganized parts of the code in order for the names to
be reasonably truthful and to avoid too much code redundancy connected
to the fact that I've added more tests. Ideally some of the test code
would be shared so reduce the redundancy even further but I feel that
it's out of scope here. (Similarly the actual non-test code - I'm not
entirely comfortable with the redundancy this commit introduces but
there's quite a bit of redundancy in the code already so improving this
is likely a whole separate task. Also extracting the logic to a separate
function would negatively impact the performance /possibly not
important/ but it could also make the code less readable, I'm conflicted
here).

[1] eleme@00c6553
[2] eleme#190

jstasiak added a commit to jstasiak/thriftpy that referenced this issue May 14, 2016

Treat binary and string fields differently
I believe this improves [1] which addresses a bytes/str inconsistency issue
reported some time ago[2]. The goal of this patch is to have an option
to always deseriaize binary fields as bytes and string fields as str (on
Python 2, respectively: str and unicode). It achieves it by adding a
third option to previously purely boolean decode_response parameter (I
admit I'm not a fan of mixing types like this but I'm doing this
strictly for backwards compatibility and I'd argue the "auto" behavior
should become the default in the future and decode_response should be
removed altogether).

I'm not intimately familiar with Thrift specification so adding a
separate ttype like this may be stupid - let me know if that is so.

I've renamed and reorganized parts of the code in order for the names to
be reasonably truthful and to avoid too much code redundancy connected
to the fact that I've added more tests. Ideally some of the test code
would be shared so reduce the redundancy even further but I feel that
it's out of scope here. Similarly there's some almost identical code in
Python and Cython protocol implementations but I lack expertise
regarding possible ways to mitigate that.

[1] eleme@00c6553
[2] eleme#190

jstasiak added a commit to jstasiak/thriftpy that referenced this issue May 17, 2016

Treat binaries and strings differently in binary/compact protocols
I believe this improves [1] which addresses a bytes/str inconsistency issue
reported some time ago[2]. The goal of this patch is to have an option
to always deseriaize binary fields as bytes and string fields as str (on
Python 2, respectively: str and unicode). It achieves it by adding a
third option to previously purely boolean decode_response parameter (I
admit I'm not a fan of mixing types like this but I'm doing this
strictly for backwards compatibility and I'd argue the "auto" behavior
should become the default in the future and decode_response should be
removed altogether).

Internally the goal of this change is achieved by keeping the ttype of
binary fields as STRING but putting a non-None value in their
"container spec" which introduces a bit of inconsistency (string fields
are TType.STRING but binary fields are (TType.STRING, BINARY)) but
I believe we can live with it.

I've renamed and reorganized parts of the code in order for the names to
be reasonably truthful and to avoid too much code redundancy connected
to the fact that I've added more tests. Ideally some of the test code
would be shared so reduce the redundancy even further but I feel that
it's out of scope here. Similarly there's some almost identical code in
Python and Cython protocol implementations but I lack expertise
regarding possible ways to mitigate that.

[1] eleme@00c6553
[2] eleme#190

jstasiak added a commit to jstasiak/thriftpy that referenced this issue May 17, 2016

Improve binary and compact binary handling
I believe this improves [1] which addresses a bytes/str inconsistency issue
reported some time ago[2]. The goal of this patch is to have an option
to always deseriaize binary fields as bytes and string fields as str (on
Python 2, respectively: str and unicode). It achieves it by adding a
third option to previously purely boolean decode_response parameter (I
admit I'm not a fan of mixing types like this but I'm doing this
strictly for backwards compatibility and I'd argue the "auto" behavior
should become the default in the future and decode_response should be
removed altogether).

Internally the goal of this change is achieved by keeping the ttype of
binary fields as STRING but putting a non-None value in their
"container spec" which introduces a bit of inconsistency (string fields
are TType.STRING but binary fields are (TType.STRING, BINARY)) but
I believe we can live with it.

I've renamed and reorganized parts of the code in order for the names to
be reasonably truthful and to avoid too much code redundancy connected
to the fact that I've added more tests. Ideally some of the test code
would be shared so reduce the redundancy even further but I feel that
it's out of scope here. Similarly there's some almost identical code in
Python and Cython protocol implementations but I lack expertise
regarding possible ways to mitigate that.

Casting of values in the parser is fixed as a bonus, the following code
never worked because TType.STRING and TType.BINARY had the same value:

     if t == TType.STRING:
         return _cast_string
     if t == TType.BINARY:
         return _cast_binary

[1] eleme@00c6553
[2] eleme#190

@pyup-bot pyup-bot referenced this issue in scieloorg/opac_proc Aug 18, 2016

Closed

Initial Update #10

@pyup-bot pyup-bot referenced this issue in scieloorg/scielo-manager Aug 26, 2016

Closed

Pin thriftpy to latest version 0.3.9 #1314

jstasiak added a commit to jstasiak/thriftpy that referenced this issue Sep 25, 2016

Improve binary and compact binary handling
I believe this improves [1] which addresses a bytes/str inconsistency issue
reported some time ago[2]. The goal of this patch is to have an option
to always deseriaize binary fields as bytes and string fields as str (on
Python 2, respectively: str and unicode). It achieves it by adding a
third option to previously purely boolean decode_response parameter (I
admit I'm not a fan of mixing types like this but I'm doing this
strictly for backwards compatibility and I'd argue the "auto" behavior
should become the default in the future and decode_response should be
removed altogether).

Internally the goal of this change is achieved by keeping the ttype of
binary fields as STRING but putting a non-None value in their
"container spec" which introduces a bit of inconsistency (string fields
are TType.STRING but binary fields are (TType.STRING, BINARY)) but
I believe we can live with it.

I've renamed and reorganized parts of the code in order for the names to
be reasonably truthful and to avoid too much code redundancy connected
to the fact that I've added more tests. Ideally some of the test code
would be shared so reduce the redundancy even further but I feel that
it's out of scope here. Similarly there's some almost identical code in
Python and Cython protocol implementations but I lack expertise
regarding possible ways to mitigate that.

Casting of values in the parser is fixed as a bonus, the following code
never worked because TType.STRING and TType.BINARY had the same value:

     if t == TType.STRING:
         return _cast_string
     if t == TType.BINARY:
         return _cast_binary

[1] eleme@00c6553
[2] eleme#190
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment