Keys in bencode 3 are bytes, where in bencode 2 they were strings? #16

joachimmetz · 2020-06-08T04:49:48Z

Keys in bencode 3 are bytes, wheres in bencode 2 they were strings? Is this the intended behavior?

Bencode 2:

import bencode
with open('test_data/bencode_transmission', 'rb') as f: 
  x = bencode.bread(f)

print(x)
OrderedDict([('activity-date', 1383935064), ('added-date', 1383924680), ...

Bencode 3:

import bencode
with open('test_data/bencode_transmission', 'rb') as f: 
  x = bencode.bread(f)

print(x)
OrderedDict([(b'activity-date', 1383935064), (b'added-date', 1383924680), ...

The text was updated successfully, but these errors were encountered:

sximba · 2020-06-11T00:31:20Z

@joachimmetz Yes, it seems like this is the intended behaviour. According to the bencode spec this is the correct behaviour. However, like you, I have come to rely on this library doing the conversion from bytes to strings. I was hoping the API would be changed to something like:

import bencode
with open('test_data/bencode_transmission', 'rb') as f: 
  x = bencode.bread(f, try_decode=True, encoding='utf-8')

This will allow people who want the previous behaviour to opt-in to it.

joachimmetz · 2020-06-11T03:25:54Z

@sximba thx for the context, though for me file/data format specification and Python API are 2 separate things. I understand that the file/data format defines byte streams to store textual strings, and that it appears that over time UTF-8 seems to have become the standard encoding for this 1, 2.

However it is a bit surprising to me to, see the Python API change and seeing the Python 2 to 3 bytes to str changes, I'm left to wonder if this impactful API change is intentional or not. However this commit hints that this is intentional.

A request to the project maintainers to reflect this in https://github.com/fuzeman/bencode.py#usage

fuzeman · 2020-06-11T05:36:35Z

I'll add a flag to re-enable string decoding tomorrow (probably just an encoding option that defaults to None).

Should bytes be returned from decode_string if there is a decoding error (as it did in v2.1.0)? can we start throwing UnicodeDecodeError or do we need another flag to fallback to bytes if decoding fails?

Moving forward pinning to a major version would be a good idea to avoid unexpected changes to the API. No breaking changes will be released in minor or patch releases.

joachimmetz · 2020-06-11T06:39:22Z

I'll add a flag to re-enable string decoding tomorrow (probably just an encoding option that defaults to None).

thx much appreciated

Moving forward pinning to a major version would be a good idea to avoid unexpected changes to the API.

Noworries, this only affected tests for us. That's why the tests are there in the first place to catch changes. For production purposes we "pin" to specific versions.

The library now includes a backwards-compatible `bencode` package, with the updated interface now available on the `bencodepy` package. #16

fuzeman · 2020-06-12T04:44:16Z

I've released v4.0.0 which now includes a bencode package that should be backwards-compatible with the API in v2.1.0.

The "Improved" API has been moved to bencodepy, this has also been updated to include a Bencode class that can be configured to enable string decoding with encoding='utf-8'.

Noworries, this only affected tests for us. That's why the tests are there in the first place to catch changes. For production purposes we "pin" to specific versions.

Great to hear. I was a bit concerned as requirements.txt in the referenced repository didn't have it pinned.

joachimmetz · 2020-06-12T04:51:10Z

The "Improved" API has been moved to bencodepy, this has also been updated to include a Bencode class that can be configured to enable string decoding with encoding='utf-8'.

thx, much appreciated. I'll have a look shortly.

IMHO you can consider this issue closed

joachimmetz changed the title ~~Keys in bencode 3 are bytes, wheres in bencode 2 they were strings?~~ Keys in bencode 3 are bytes, where in bencode 2 they were strings? Jun 8, 2020

joachimmetz mentioned this issue Jun 8, 2020

bencode.py 3.x release breaks tests log2timeline/plaso#2985

Closed

fuzeman added a commit that referenced this issue Jun 12, 2020

Implemented backwards-compatible bencode package

205e6ac

The library now includes a backwards-compatible `bencode` package, with the updated interface now available on the `bencodepy` package. #16

fuzeman closed this as completed Jun 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keys in bencode 3 are bytes, where in bencode 2 they were strings? #16

Keys in bencode 3 are bytes, where in bencode 2 they were strings? #16

joachimmetz commented Jun 8, 2020

sximba commented Jun 11, 2020

joachimmetz commented Jun 11, 2020

fuzeman commented Jun 11, 2020

joachimmetz commented Jun 11, 2020 •

edited

Loading

fuzeman commented Jun 12, 2020

joachimmetz commented Jun 12, 2020

Keys in bencode 3 are bytes, where in bencode 2 they were strings? #16

Keys in bencode 3 are bytes, where in bencode 2 they were strings? #16

Comments

joachimmetz commented Jun 8, 2020

sximba commented Jun 11, 2020

joachimmetz commented Jun 11, 2020

fuzeman commented Jun 11, 2020

joachimmetz commented Jun 11, 2020 • edited Loading

fuzeman commented Jun 12, 2020

joachimmetz commented Jun 12, 2020

joachimmetz commented Jun 11, 2020 •

edited

Loading