Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keys in bencode 3 are bytes, where in bencode 2 they were strings? #16

Closed
joachimmetz opened this issue Jun 8, 2020 · 6 comments
Closed

Comments

@joachimmetz
Copy link

Keys in bencode 3 are bytes, wheres in bencode 2 they were strings? Is this the intended behavior?

Bencode 2:

import bencode
with open('test_data/bencode_transmission', 'rb') as f: 
  x = bencode.bread(f)

print(x)
OrderedDict([('activity-date', 1383935064), ('added-date', 1383924680), ...

Bencode 3:

import bencode
with open('test_data/bencode_transmission', 'rb') as f: 
  x = bencode.bread(f)

print(x)
OrderedDict([(b'activity-date', 1383935064), (b'added-date', 1383924680), ...
@joachimmetz joachimmetz changed the title Keys in bencode 3 are bytes, wheres in bencode 2 they were strings? Keys in bencode 3 are bytes, where in bencode 2 they were strings? Jun 8, 2020
@sximba
Copy link

sximba commented Jun 11, 2020

@joachimmetz Yes, it seems like this is the intended behaviour. According to the bencode spec this is the correct behaviour. However, like you, I have come to rely on this library doing the conversion from bytes to strings. I was hoping the API would be changed to something like:

import bencode
with open('test_data/bencode_transmission', 'rb') as f: 
  x = bencode.bread(f, try_decode=True, encoding='utf-8')

This will allow people who want the previous behaviour to opt-in to it.

@joachimmetz
Copy link
Author

@sximba thx for the context, though for me file/data format specification and Python API are 2 separate things. I understand that the file/data format defines byte streams to store textual strings, and that it appears that over time UTF-8 seems to have become the standard encoding for this 1, 2.

However it is a bit surprising to me to, see the Python API change and seeing the Python 2 to 3 bytes to str changes, I'm left to wonder if this impactful API change is intentional or not. However this commit hints that this is intentional.

A request to the project maintainers to reflect this in https://github.com/fuzeman/bencode.py#usage

@fuzeman
Copy link
Owner

fuzeman commented Jun 11, 2020

I'll add a flag to re-enable string decoding tomorrow (probably just an encoding option that defaults to None).

Should bytes be returned from decode_string if there is a decoding error (as it did in v2.1.0)? can we start throwing UnicodeDecodeError or do we need another flag to fallback to bytes if decoding fails?


Moving forward pinning to a major version would be a good idea to avoid unexpected changes to the API. No breaking changes will be released in minor or patch releases.

@joachimmetz
Copy link
Author

joachimmetz commented Jun 11, 2020

I'll add a flag to re-enable string decoding tomorrow (probably just an encoding option that defaults to None).

thx much appreciated

Moving forward pinning to a major version would be a good idea to avoid unexpected changes to the API.

Noworries, this only affected tests for us. That's why the tests are there in the first place to catch changes. For production purposes we "pin" to specific versions.

fuzeman added a commit that referenced this issue Jun 12, 2020
The library now includes a backwards-compatible `bencode` package, with the updated interface
now available on the `bencodepy` package.

#16
@fuzeman
Copy link
Owner

fuzeman commented Jun 12, 2020

I've released v4.0.0 which now includes a bencode package that should be backwards-compatible with the API in v2.1.0.

The "Improved" API has been moved to bencodepy, this has also been updated to include a Bencode class that can be configured to enable string decoding with encoding='utf-8'.


Noworries, this only affected tests for us. That's why the tests are there in the first place to catch changes. For production purposes we "pin" to specific versions.

Great to hear. I was a bit concerned as requirements.txt in the referenced repository didn't have it pinned.

@joachimmetz
Copy link
Author

The "Improved" API has been moved to bencodepy, this has also been updated to include a Bencode class that can be configured to enable string decoding with encoding='utf-8'.

thx, much appreciated. I'll have a look shortly.

IMHO you can consider this issue closed

@fuzeman fuzeman closed this as completed Jun 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants