MIME Base64 base64 encodes unicode codepoints when it should encode a binary unicode/utf8 etc encoding #814
Comments
For the record, the Rosella implementation of MimeBase64 passes this test with the same behavior as Perl. We can use that as a reference to improve the Parrot version of this routine.
IMHO the Rosella implementation is still off, because its decode function has to return a string even though Base64 can also encode images and other binary data. I have created a possible reference implementation in pure Perl 6, Enc::MIME::Base64, at https://github.com/ronaldxs/perl6-Enc-MIME-Base64, with an interface that I think is better. Also note that RFC 4648 mentions http://josefsson.org/base-encoding/ as a kind of C reference implementation.
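To make the point about binary data concrete, here is a minimal sketch in Python. Python is used only for illustration; none of this is the Rosella, Parrot, or Enc::MIME::Base64 API, and the sample payload is my own choice. It shows a decoder that returns raw bytes, and why forcing that result into a text string can fail.

```python
import base64

# Four arbitrary bytes standing in for binary data
# (they happen to be the start of a PNG signature).
payload = bytes([0x89, 0x50, 0x4E, 0x47])

encoded = base64.b64encode(payload)   # bytes in, ASCII bytes out
decoded = base64.b64decode(encoded)   # decoding yields raw bytes, not text

# The round trip is lossless at the byte level.
assert decoded == payload

# Treating the decoded data as UTF-8 text fails: 0x89 is not a valid first byte.
try:
    decoded.decode("utf-8")
    is_text = True
except UnicodeDecodeError:
    is_text = False
```

An interface that hands back a byte buffer leaves the choice of text encoding, if any, to the caller.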
Fixed with 3a48e6b, but I need to rewrite the tests.
…decode_base64 Use bytebuffer representations of the encoded string, not the encoded ord value. Also fix the tests to match this conformant behaviour. The problem is now that base64 encoded files are endian dependent, and the multibyte tests need to be skipped on big-endian.
TODO: big-endian testing, binary testcase.
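As a rough illustration of the endianness concern, the Python sketch below encodes the same code point from a 16-bit buffer in both byte orders. It assumes the multibyte strings in question are stored in a fixed-width, native-byte-order form similar to UTF-16; that is my assumption for the example, not something the commit states, and Python stands in for PIR here.

```python
import base64

cent = "\u00a2"  # U+00A2 CENT SIGN

# The same code point serialised as one 16-bit unit in each byte order.
little = cent.encode("utf-16-le")  # bytes A2 00
big = cent.encode("utf-16-be")     # bytes 00 A2

b64_little = base64.b64encode(little).decode("ascii")
b64_big = base64.b64encode(big).decode("ascii")

# UTF-8 has no byte-order variants, so its Base64 form is the same everywhere.
b64_utf8 = base64.b64encode(cent.encode("utf-8")).decode("ascii")

print(b64_little, b64_big, b64_utf8)
```

Under that assumption the two 16-bit buffers give different Base64 text, which is why a test with a hard-coded expected string would only hold on one byte order.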
The following is the output of the Perl and Parrot MIME::Base64 encodings of the Unicode cent sign, codepoint U+00A2, from a UTF-8 buffer. Please note that the results are different.
05:01:15 ron-laptop:~/parrot$ perl ok_utf8_bas64.pl
wqI=
05:01:32 ron-laptop:~/parrot$ perl ok_utf8_bas64.pl | base64 -d | od -x
0000000 a2c2
0000002
05:01:46 ron-laptop:~/parrot$ parrot bad_utf8_base64.pir
og==
05:01:58 ron-laptop:~/parrot$ parrot bad_utf8_base64.pir | base64 -d | od -x
0000000 00a2
0000001
The Perl program Base64-encodes the UTF-8 bytes of the string, whereas the Parrot MIME Base64 library uses the "ord" operator to convert the string to codepoints and then Base64-encodes the codepoints. Since Parrot's MIME Base64 decoding works on the same principle, the encoding will decode correctly within Parrot. But Base64 is a standard, and an external Base64 program such as the GNU "base64" utility should decode our encoding to something that makes sense. From this point of view, I believe the short Perl program below is doing the right encoding and the short Parrot program below is not.
file: ok_utf8_bas64.pl
file: bad_utf8_base64.pir
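For readers without the two files to hand, the difference can be reproduced with a small Python sketch. This is not the content of either file; it only mimics the two behaviours described above, using Python's standard `base64` module.

```python
import base64

cent = "\u00a2"  # U+00A2 CENT SIGN

# Behaviour of the Perl run: Base64 of the UTF-8 bytes C2 A2.
from_utf8 = base64.b64encode(cent.encode("utf-8")).decode("ascii")

# Behaviour of the Parrot run: Base64 of the codepoint value itself,
# which for U+00A2 is the single byte A2.
from_ord = base64.b64encode(bytes([ord(cent)])).decode("ascii")

print(from_utf8)  # wqI=
print(from_ord)   # og==
```

The `od -x` dumps in the transcript agree with this: `a2c2` is the two bytes C2 A2 shown as one little-endian 16-bit word, and `00a2` with a length of 1 is the lone byte A2.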