Add selectable alphabets (standard vs urlsafe) to base64 #6280
Having worked on improving the performance of Running base64_bench on my M1 MacBook Pro on the
The results with this pull request are:
One way to improve the performance for encoding without having completely separate code for the
where
The benchmark results are:
which is closer to the original performance and probably acceptable. It should be possible to optimize decoding in a similar way.
cb44452 to 83fafbf
@bjorng I refactored as you suggested. Running the |
53ddacf to 502ec5d
Looks good to me.
Benchmark results on my Intel Mac are almost exactly the same as for the master branch.
Regarding the specs, I think that the easiest solution is to extend base64_alphabet() to include the characters from both variants of Base64.
It seems that you are not testing the new mime_decode functions. I suggest that you add tests for them in the funs in the mime_decode/1 and mime_decode_to_string/1 test cases.
Since this is an extension of an API, the OTP Technical Board will have to approve it.
Great =)
Ok, then I'll do it that way.
Yeah, and there is also the property test suite that needs to be extended to cover the new functions. I'm currently working on that.
Sure.
Something that just occurred to me when reading this PR... the Given the encoded string
(It is similar with IMO, it is ok and expected that the same string may decode differently in The possible ways I can think of to address the problem:
Good point @Maria-12648430, you do have a knack for things like that 😬 I'll leave things as they are for now, any of your suggestions is easy to do, whatever the decision may be.
@Maria-12648430 Yes, I will bring that up with the OTB.
More like a dark spot in my soul that urges me to poke holes in shiny things 😇
That is why no company that values its pride should ever hire you 😁 You would find three different ways to capsize their Unsinkable Flagship®️ on your first day.
And enjoy it 😺 But make that 5 ways 😁
502ec5d to aa56b9e
The last push adds specs, tests for (On a side note, some of the links to RFC 4648 were broken in the docs, and I fixed them while I was at it.) The property test suite will be extended later by @Maria-12648430 (thanks 🤗), since she implemented the current test suite and has more experience with property testing than I do. Last but not least, I have been discussing the |
Hi @RaimoNiskanen 🙂
We have been discussing this also, but, well... First, it would make the code more complex. There would either have to be extra clauses and/or guards on practically all functions involved in the decoding (and quite many of them, like Second, it would still be brittle, just in a different way. It would be guesswork. I think it is better to not give rise to the illusion that the right alphabet will always be detected automatically. Third and last, I don't think it is very useful in itself, possibly dangerous even. Users should know where the data to decode originates from, and what kind of alphabet to (not) expect/accept.
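The argument for an explicit alphabet instead of automatic detection can be sketched as follows. This is an illustration in Python, not the Erlang implementation from this PR: `decode_strict` is a hypothetical helper, and the mode names `"standard"` and `"urlsafe"` are only borrowed from the atoms discussed here. The caller names the alphabet, and input containing characters from the other alphabet is rejected rather than guessed at.

```python
import base64
import re

# One pattern per alphabet (RFC 4648 Section 4 and Section 5).
# The mode names mirror the atoms in this PR; the helper is hypothetical.
_ALLOWED = {
    "standard": re.compile(rb"[A-Za-z0-9+/]*={0,2}"),
    "urlsafe": re.compile(rb"[A-Za-z0-9\-_]*={0,2}"),
}


def decode_strict(encoded: bytes, mode: str) -> bytes:
    """Decode with exactly one alphabet; never guess between the two."""
    if not _ALLOWED[mode].fullmatch(encoded):
        raise ValueError(f"input contains characters outside the {mode} alphabet")
    if mode == "urlsafe":
        # Map the Section 5 characters back to their Section 4 equivalents.
        encoded = encoded.translate(bytes.maketrans(b"-_", b"+/"))
    return base64.b64decode(encoded, validate=True)
```

With this shape, data encoded with one alphabet and decoded with the other fails loudly instead of producing a silently different result.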
The last commit adds the property tests. I didn't bother with |
The OTB meeting agreed with the conclusion reached in the previous comments on this PR that the |
427fda8 to f9dedd4
@bjorng all done, I removed |
Thanks! Added to our daily builds.
I forgot about the |
RFC 4648 defines two possible alphabets that may be used for encoding and decoding, the standard alphabet in Section 4 and an alternative URL and Filename safe alphabet in Section 5. This commit adds the ability to specify one of the alphabets for encoding and decoding.
Co-authored-by: Maria Scott <maria-12648430@hnc-agency.org>
f9dedd4 to 05e61dc
Dto. 😆
Done
Thanks!
In #5639 an addition to the base64 module to support an alternative, URL-safe encoding alphabet (base64url) was suggested. In the alternative encoding, the characters + and / (RFC 4648 Section 4), which may be problematic in URLs and file names, are replaced with - and _ (RFC 4648 Section 5), respectively.
This PR adds the new functions encode/2, encode_to_string/2, decode/2, decode_to_string/2, mime_decode/2 and mime_decode_to_string/2 as supplements of the respective existing 1-ary functions of the same names. The second parameters may be one of the atoms standard (meaning the section 4 alphabet; default for the 1-ary functions) and urlsafe (the section 5 alphabet), which denote the alphabet to be used for encoding and decoding. The decoding functions also accept undefined, via which they accept characters from both alphabets.
I'm still not clear on how to best spec this, and documentation is another matter, so this is a WIP.
As for tests, I added two new unit tests for now. If this PR is to be accepted, I'll extend the property test suite accordingly.
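To make the difference between the two alphabets concrete, here is a minimal sketch. It uses Python's standard `base64` module rather than the Erlang functions proposed in this PR, purely for illustration. The input bytes are an arbitrary example, chosen so that every output character is one of the two that differ between RFC 4648 Section 4 and Section 5.

```python
import base64

# Arbitrary example input: its four 6-bit groups are 62, 63, 62, 62,
# i.e. exactly the indices whose characters differ between the alphabets.
data = bytes([0xFB, 0xFF, 0xBE])

standard = base64.b64encode(data)         # RFC 4648 Section 4 alphabet
urlsafe = base64.urlsafe_b64encode(data)  # RFC 4648 Section 5 alphabet

print(standard)  # b'+/++'
print(urlsafe)   # b'-_--'

# The two encodings differ only by the substitution + -> - and / -> _.
translated = standard.translate(bytes.maketrans(b"+/", b"-_"))
print(translated == urlsafe)  # True
```

Both encodings carry the same bits; only the characters used for indices 62 and 63 change, which is why the alphabet can be a parameter of otherwise identical encode and decode functions.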