dargueta
Contributor

@dargueta dargueta commented Mar 2, 2019

TL;DR: Allow using Base85 text encoding on Python 3.

Details

This doesn't remove base64 encoding support for Python 3; rather, it makes base85 the default but still allows overriding the default to only use base64 for backwards compatibility with Python 2. (Python 2 doesn't support base85, nor does there appear to be a backport for it.)

Why

It's a more compact text encoding than Base64, and some manual experimentation gives about a 10% space savings for randomized binary data.
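As a rough illustration of the size difference, here is a minimal sketch that uses only the standard library's `base64.b64encode` and `base64.b85encode` on random bytes. It does not go through jsonpickle, so it measures only the raw encoders, which is not necessarily the same number as the roughly 10% observed in the experiments above. The payload size of 1200 bytes is an arbitrary choice for the example.

```python
import base64
import os

# 1200 is divisible by both 3 and 4, so neither encoder needs padding.
payload = os.urandom(1200)

b64 = base64.b64encode(payload)  # 4 output characters per 3 input bytes
b85 = base64.b85encode(payload)  # 5 output characters per 4 input bytes

# Fraction of the base64 size saved by switching to base85.
savings = 1 - len(b85) / len(b64)

print(len(b64), len(b85))  # 1600 1500
print(f"raw encoder savings: {savings:.2%}")
```

Both encodings round-trip through `base64.b64decode` and `base64.b85decode`, so the choice only affects output size and speed.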

Closes #250.

@davvid
Member

davvid commented Mar 2, 2019

Sweet, this looks perfect except for one trivial tweak that I'll fix up here before pushing out the merge. I'm going to rename the variables from prefer_base85 to use_base85, for consistency with the existing use_decimal.

davvid added a commit to davvid/jsonpickle that referenced this pull request Mar 2, 2019
Related-to: jsonpickle#251
Signed-off-by: David Aguilar <davvid@gmail.com>
davvid added a commit to davvid/jsonpickle that referenced this pull request Mar 2, 2019
* base85:
  api: rename prefer_base85 to use_base85 for consistency with use_decimal
  Update changelog
  Add Base85 encoding support for Python 3

Signed-off-by: David Aguilar <davvid@gmail.com>
@davvid davvid merged commit 90e074b into jsonpickle:master Mar 2, 2019
@cyberic99

Hi,

Has anyone done any perf measurements after switching to base85?

@dargueta dargueta deleted the base85 branch March 2, 2019 11:15
@dargueta
Contributor Author

dargueta commented Mar 2, 2019

Oh, good point. All of my testing has been on relatively simple stuff.

Update

base85 appears to be slower on Python 3.6 and 3.7 by a factor of about 2.25. 😬 Maybe it should be an opt-in feature. (Base64 is implemented in C, base85 in Python.)
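For anyone who wants to reproduce this kind of comparison, a minimal sketch with `timeit` follows. It is not the benchmark used for the number above: the helper name `compare_encoders`, the 4096-byte payload, and the repeat count are all assumptions made for the example, and the ratio will vary by machine and Python version.

```python
import base64
import os
import timeit


def compare_encoders(size=4096, repeats=200):
    """Time base64 and base85 encoding of the same random payload.

    Hypothetical helper for illustration; returns total seconds for each.
    """
    payload = os.urandom(size)
    # Warm up so one-time setup inside b85encode is not counted.
    base64.b85encode(payload)
    t64 = timeit.timeit(lambda: base64.b64encode(payload), number=repeats)
    t85 = timeit.timeit(lambda: base64.b85encode(payload), number=repeats)
    return t64, t85


t64, t85 = compare_encoders()
print(f"base64: {t64:.6f}s  base85: {t85:.6f}s")
if t64 > 0:
    print(f"base85 / base64 time ratio: {t85 / t64:.1f}x")
```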

@davvid
Member

davvid commented Mar 2, 2019

We haven't tagged a release yet so changing this default is still fair game.

@dargueta
Contributor Author

dargueta commented Mar 2, 2019

I looked at the Python 3.6 source code and it appears that there's significant overhead on the first call to b85encode because it builds some tables, but subsequent calls are faster. Still, it's noticeably slower than base64.
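A quick way to see the first-call effect is to time the very first `b85encode` call in a fresh interpreter against later calls. This is only a sketch: `time_call` is a hypothetical helper, the 64-byte payload is arbitrary, and if anything in the process has already called `b85encode` the first measurement will not include the table setup.

```python
import base64
import os
import time


def time_call(func, payload):
    """Return the wall-clock seconds taken by one call to func(payload)."""
    start = time.perf_counter()
    func(payload)
    return time.perf_counter() - start


payload = os.urandom(64)

# The first call may include one-time table construction.
first = time_call(base64.b85encode, payload)

# Best of several later calls approximates the steady-state cost.
later = min(time_call(base64.b85encode, payload) for _ in range(100))

print(f"first call: {first * 1e6:.1f} us, later calls: {later * 1e6:.1f} us")
```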

I'll open a PR to switch base64 to be the default later today. Thanks @cyberic99 for pointing out something I obviously should've done first. Sorry about this!

@dargueta
Contributor Author

dargueta commented Mar 3, 2019

Done: #252

@cyberic99

Thx @dargueta !

For my use case, the throughput is more important than the file size, so I thought about the performance impact right away ;-)
