New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Real iso on B64 #20

Closed
dinosaure opened this Issue Nov 29, 2018 · 4 comments

Comments

Projects
None yet
2 participants
@dinosaure
Copy link
Member

dinosaure commented Nov 29, 2018

@hannesm get this behavior:

# B64.decode "Zm9vCg==";;
- : string = "foo\n"
# B64.decode "Zm9vCg=a";;
- : string = "foo\n"

A solution could be to put a optional argument to check strictly end of base64 encoded input.

@hannesm

This comment has been minimized.

Copy link
Member

hannesm commented Nov 29, 2018

FWIW, more discussions about this:

@hannesm

This comment has been minimized.

Copy link
Member

hannesm commented Nov 29, 2018

A solution could be to put a optional argument to check strictly end of base64 encoded input.

I'm interested to hear use cases where you want base64 without proper padding (I'm sure there are some protocols which just remove the = from the string, but IIUC this won't be handled by Base64.decode properly atm anyways).

@dinosaure

This comment has been minimized.

Copy link
Member Author

dinosaure commented Jan 15, 2019

Ok, I did a benchmark between the current version of ocaml-base64 and a tuned version of base64 available in nocrypto (mostly, from it, I avoided Hashtbl and use a string instead a cstruct). And I got this result:

Estimated testing time 2.66667m (16 benchmarks x 10s). Change using -quota SECS.
┌──────────┬──────────────┬─────────┬───────────┬──────────┬────────────┐
│ Name     │     Time/Run │ mWd/Run │  mjWd/Run │ Prom/Run │ Percentage │
├──────────┼──────────────┼─────────┼───────────┼──────────┼────────────┤
│ B64:0    │     166.75ns │  49.00w │           │          │      0.04% │
│ B64:10   │     502.07ns │  55.00w │           │          │      0.11% │
│ B64:50   │   1_957.82ns │  71.01w │           │          │      0.42% │
│ B64:100  │   3_405.78ns │  92.01w │           │          │      0.73% │
│ B64:500  │  12_527.07ns │ 258.00w │           │          │      2.69% │
│ B64:1000 │  22_862.22ns │ 468.00w │     0.24w │    0.24w │      4.90% │
│ B64:2500 │  62_308.51ns │  45.01w │ 1_047.02w │          │     13.36% │
│ B64:5000 │ 137_406.04ns │  45.04w │ 2_089.03w │          │     29.47% │
│ Old:0    │     106.60ns │   9.00w │           │          │      0.02% │
│ Old:10   │   1_198.17ns │  12.00w │           │          │      0.26% │
│ Old:50   │   5_350.12ns │  23.00w │           │          │      1.15% │
│ Old:100  │  10_174.44ns │  38.00w │           │          │      2.18% │
│ Old:500  │  56_043.60ns │ 154.00w │           │          │     12.02% │
│ Old:1000 │  92_715.26ns │ 301.00w │     0.11w │    0.11w │     19.88% │
│ Old:2500 │ 228_832.80ns │   5.00w │   733.05w │          │     49.08% │
│ Old:5000 │ 466_272.34ns │   5.00w │ 1_462.10w │    0.10w │    100.00% │
└──────────┴──────────────┴─────────┴───────────┴──────────┴────────────┘

I did a benchmark when I try to encode a random input and decode it then. The number next to B64 or Old is the length of the random input.

As we can see, the nocrypto's solution is faster than the current solution (Old). I rewrote ocaml-base64 mostly because it raise Not_found exception when I parsed some e-mails. This exception told me nothing about the input (which is of course an invalid input). This exception came from String.index.

The new version is more explicit about errors and can tell you if it's a bad input or if the padding is wrong. I will make a PR soon.

@dinosaure

This comment has been minimized.

Copy link
Member Author

dinosaure commented Jan 15, 2019

Then, the new version fix the reproducible example as expected.

@dinosaure dinosaure closed this Jan 20, 2019

dinosaure added a commit to dinosaure/opam-repository that referenced this issue Jan 21, 2019

[new release] base64 (3.0.0)
CHANGES:

* Implemenation of Base64 according to RFC 2045 (available on base64.rfc2045)
* New implementation of Base64 according to RFC 4648 from nocrypto's implementation
* Fix bad access with `String.iter` on the old implementation of Base64 (@dinosaure, mirage/ocaml-base64#23)
* Check isomorphism between `encode` & `decode` function (@hannesm, @dinosaure, mirage/ocaml-base64#20)
* Add tests from RFC 3548 and from PHP impl. (@hannesm, @dinosaure, mirage/ocaml-base64#24)
* Add fuzzer on both implementations
 - check isomorphism
 - check bijection
 - check if `decode` does not raise any exception
* __break-api__, `B64` was renamed to `Base64` (@copy, @avsm, @dinosaure, mirage/ocaml-base64#17)
* __break-api__, `Base64.decode` and `Base64.encode` returns a result type instead to raise an exception (@hannesm, @dinosaure, mirage/ocaml-base64#21)
* __break-api__, Add `sub` type to avoid allocation to the end-user (@avsm, @dinosaure, mirage/ocaml-base64#24)
* __break-api__, Add `pad` argument on `decode` function to check if input is well-padded or not (@hannesm, @dinosaure, mirage/ocaml-base64#24)
* __break-api__, Add `off` and `len` optional arguments on `encode` & `decode` functions to compute a part of input (@cfcs, @dinosaure, mirage/ocaml-base64#24)
* Better performance (see mirage/ocaml-base64#24) (@dinosaure)
* Review of code by @cfcs (see mirage/ocaml-base64#24)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment