Add simple base64 de/encoding #372

rv32ima · 2021-11-15T20:27:08Z

This PR adds in a relatively simplistic Base64 (de|en)coder (#369)

It is incredibly simple by design, as well as being @nogc @safe pure (except for the functions that do not take in Appenders). I've only tested it with some small bits of data, but, it should easily scale up to handle an infinite amount of data.

I have not verified how this fares with systems that are big-endian -- would love more insight on that regard.

Let me know if this needs any more work. Thanks!

codecov-commenter · 2021-11-15T20:33:11Z

Codecov Report

Merging #372 (217898e) into master (7152ba9) will increase coverage by 0.09%.
The diff coverage is 98.47%.

@@            Coverage Diff             @@
##           master     #372      +/-   ##
==========================================
+ Coverage   92.89%   92.98%   +0.09%     
==========================================
  Files          64       65       +1     
  Lines       15679    15797     +118     
==========================================
+ Hits        14565    14689     +124     
+ Misses       1114     1108       -6

Impacted Files	Coverage Δ
source/mir/base64.d	`98.47% <98.47%> (ø)`
source/mir/rc/array.d	`74.62% <0.00%> (-0.38%)`	⬇️
source/mir/serde.d	`72.50% <0.00%> (ø)`
source/mir/math/stat.d	`100.00% <0.00%> (ø)`
source/mir/ndslice/topology.d	`98.62% <0.00%> (ø)`
source/mir/math/sum.d	`96.40% <0.00%> (+0.32%)`	⬆️
source/mir/series.d	`94.69% <0.00%> (+0.42%)`	⬆️
source/mir/format.d	`95.70% <0.00%> (+1.17%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7152ba9...217898e. Read the comment docs.

source/mir/base64.d

rv32ima · 2021-11-17T17:35:55Z

Alright @9il, I believe I addressed all of your points (and more).

One point of issue was how to handle a Base64 string like QU======QUJD. It seems that implementations are split on this: JavaScript's atob would throw, Python's base64.b64decode would return A, Phobos would throw (not because it found the padding was invalid, but because the out contract would fail), and other online Base64 decoders would return garbage.

So, I've taken the safest route, in my opinion (while still being compliant to RFC4648), and decided to just throw if we meet one of two situations:

we encounter the pad character in a 4-byte chunk that is not the last chunk
we encounter the pad character in the beginning two characters of a chunk

This does mean that Base64 strings like QU====== (extra padding) or ==== (only padding) are invalid, but this is still within specification. This does avoid the concerns raised within the RFC (w.r.t. introducing potential side-channels), and actually improves our Ion test coverage (incidentally).

Let me know if you have any other concerns.

9il · 2021-11-18T04:24:19Z

source/mir/base64.d

+/++
+Encode a ubyte array as Base64, placing the result onto an Appender.
+/
+void encodeBase64(char PlusChar = '+', char SlashChar = '/', Appender)(scope const(ubyte)[] input,


No reason to make PlusChar and SlashChar generic params as well as CapitalCase. Make them runtime and camelCase.

source/mir/base64.d

rv32ima force-pushed the base64-impl branch from a075e3d to e907f7a Compare November 15, 2021 22:59

Add simple base64 de/encoding support

b052bb5

rv32ima force-pushed the base64-impl branch from e907f7a to b052bb5 Compare November 15, 2021 23:09

rv32ima requested a review from 9il November 15, 2021 23:19

9il reviewed Nov 16, 2021

View reviewed changes

source/mir/base64.d Show resolved Hide resolved

9il reviewed Nov 16, 2021

View reviewed changes