Skip to content

base64: allow skipping padding = characters #6711

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 2, 2023

Conversation

michalmuskala
Copy link
Contributor

This extends base64 encoding and decoding routines to allow skipping emitting final = characters, or to accept encoded data with final = characters skipped. This format is common for some use-cases. Today, parsing it requires appending the missing = characters, which incurs an unnecessary copy of the input data, or re-implementing the entire routine.

This amends the interface in a backwards-incompatible way, since the last change wasn't released yet, so should have no expectation of backwards-compatibility.

Additionally, tests are slightly enhanced to cover more cases directly with simple examples.

If this change is accepted, I will provide documentation.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 18, 2023

CT Test Results

       2 files       88 suites   37m 45s ⏱️
1 850 tests 1 802 ✔️ 48 💤 0
2 140 runs  2 090 ✔️ 50 💤 0

Results for commit 3608e88.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@bjorng bjorng added the team:VM Assigned to OTP team VM label Jan 19, 2023
@bjorng
Copy link
Contributor

bjorng commented Jan 25, 2023

This amends the interface in a backwards-incompatible way, since the last change wasn't released yet, so should have no expectation of backwards-compatibility.

Can you clarify this? Is it compatible with OTP 25 and previous releases or not?

@michalmuskala
Copy link
Contributor Author

michalmuskala commented Jan 25, 2023

It is fully compatible with OTP 25 - there only the /1 functions exists and their behaviour didn't change. However, it is not compatible with current master which has the new /2 functions, but this new interface was never released

@bjorng
Copy link
Contributor

bjorng commented Jan 25, 2023

Have you done any benchmarking to make sure that there is no a significant slowdown because of the new options?

@bjorng
Copy link
Contributor

bjorng commented Jan 26, 2023

Here is a benchmark I have been using to test base64 performance:

https://gist.github.com/bjorng/522470f313ac01d16156a597657f97bb

It is based on the benchmark in #5639.

@max-au
Copy link
Contributor

max-au commented Jan 27, 2023

Quick test on my machine shows no regression at all, at least for small strings. Here is before this PR:

./erlperf 'base64:decode(<<"SGVsbG8gV29ybGQ=">>).' -l 10M -s 10
Code                                       ||   Samples       Avg   StdDev    Median      P99  Iteration
base64:decode(<<"SGVsbG8gV29ybGQ=">>).      1        10   1686 ms    1.08%   1672 ms  1710 ms     168 ns

And this is after, using new option:

./erlperf 'base64:decode(<<"SGVsbG8gV29ybGQ">>, #{padding => false}).' -l 10M -s 10
Code                                                           ||   Samples       Avg   StdDev    Median      P99  Iteration
base64:decode(<<"SGVsbG8gV29ybGQ">>, #{padding => false}).      1        10   1654 ms    0.28%   1656 ms  1659 ms     165 ns

I also tried with random 1Mb input, with the same effect.

I wonder which case is expected to give the worst performance, when padding isn't needed, or when it needs 3 = characters?

@michalmuskala
Copy link
Contributor Author

michalmuskala commented Jan 27, 2023

I couldn't measure any significant peformance difference on my mac.

Before:

== Testing with 100 B ==
fun base64:encode/1: 1000000 iterations in 1285 ms: 778210 it/sec
fun base64:decode/1: 1000000 iterations in 1600 ms: 625000 it/sec

== Testing with 1 MB ==
fun base64:encode/1: 1000 iterations in 12767 ms: 78 it/sec
fun base64:decode/1: 1000 iterations in 15620 ms: 64 it/sec

After:

== Testing with 100 B ==
fun base64:encode/1: 1000000 iterations in 1280 ms: 781250 it/sec
fun base64:decode/1: 1000000 iterations in 1594 ms: 627352 it/sec

== Testing with 1 MB ==
fun base64:encode/1: 1000 iterations in 12671 ms: 78 it/sec
fun base64:decode/1: 1000 iterations in 15469 ms: 64 it/sec

@max-au the amount of padding shouldn't impact performance almost at all - it's basically linear in the length of the string. Since there's no copying or anything like that, it's just traversing or not the extra couple padding bytes (depending if they exist or not) - it shouldn't be noticeable.

@bjorng
Copy link
Contributor

bjorng commented Jan 27, 2023

Thanks for benchmarking, @max-au and @michalmuskala.

The OTP Technical Board will have to approve any API changes. To be able to approve the API, the board members will need to see a description of the API (not by reading the code). I have looked at your code and based on that look it seems to me that the API is reasonable and I find it likely that it will be approved. You can either write the documentation now or write a short API description in a comment in this PR.

@michalmuskala michalmuskala force-pushed the base64-padding branch 2 times, most recently from c0cb93d to 0ba36bb Compare January 27, 2023 10:59
@michalmuskala
Copy link
Contributor Author

I pushed a commit updating documentation

@bjorng bjorng added the testing currently being tested, tag is used by OTP internal CI label Jan 27, 2023
@bjorng
Copy link
Contributor

bjorng commented Feb 2, 2023

I have pushed two fixup commits. The first is addition of links (suggested by @RaimoNiskanen); the other is an updated description of the default value that I find less confusing. If you agree, please squash the commits. When you are done, I'll merge the PR.

@bjorng bjorng removed the testing currently being tested, tag is used by OTP internal CI label Feb 2, 2023
This extends base64 encoding and decoding routines to allow
skipping emitting final = characters, or to accept encoded
data with final = characters skipped. This format is common
for some use-cases. Today, parsing it requires appending
the missing = characters, which incurs an unnecessary copy
of the input data, or re-implementing the entire routine.

This amends the interface in a backwards-incompatible way,
since the last change wasn't released yet, so should have
no expectation of backwards-compatibility.

Additionally, tests are slightly enhanced to cover more cases
directly with simple examples.
@michalmuskala
Copy link
Contributor Author

Rebased on fresh master & squashed commits

@bjorng bjorng merged commit a6c774c into erlang:master Feb 2, 2023
@michalmuskala michalmuskala deleted the base64-padding branch February 2, 2023 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants