MIME Base64 base64 encodes unicode codepoints when it should encode a binary unicode/utf8 etc encoding #814

Closed
ronaldxs opened this Issue Aug 17, 2012 · 5 comments

ronaldxs commented Aug 17, 2012

The following is the output of the Perl and Parrot MIME::Base64 encodings of the Unicode cent sign (code point U+00A2) from a UTF-8 buffer. Note that the results differ.

05:01:15 ron-laptop:~/parrot$ perl ok_utf8_bas64.pl
wqI=

05:01:32 ron-laptop:~/parrot$ perl ok_utf8_bas64.pl | base64 -d | od -x
0000000 a2c2
0000002
05:01:46 ron-laptop:~/parrot$ parrot bad_utf8_base64.pir
og==
05:01:58 ron-laptop:~/parrot$ parrot bad_utf8_base64.pir | base64 -d | od -x
0000000 00a2
0000001

The Perl program Base64-encodes the UTF-8 bytes of the string, whereas the Parrot MIME::Base64 library uses the "ord" operator to convert the string to code points and then Base64-encodes those code points. Because Parrot's decoder works on the same principle, the output round-trips correctly within Parrot. But Base64 is a standard, and an external decoder such as the GNU "base64" utility should be able to decode our output into something meaningful. From that point of view, I believe the short Perl program below does the right encoding and the short Parrot program below does not.

file: ok_utf8_bas64.pl

use strict;
use MIME::Base64 qw(encode_base64);
use Encode qw(encode);
my $encoded = encode_base64(encode("UTF-8", "\x{a2}"));
print $encoded, $/;

file: bad_utf8_base64.pir

.sub go :main
    load_bytecode 'MIME/Base64.pbc'

    .local pmc enc_sub
    enc_sub = get_global [ "MIME"; "Base64" ], 'encode_base64'

    .local string result_encode
    result_encode = enc_sub(utf8:"\x{a2}")

    say result_encode
.end
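
For readers without Parrot at hand, the two behaviours described above can be reproduced with a short sketch. Python is used purely for illustration; the helper names `b64_of_utf8` and `b64_of_codepoints` are hypothetical and are not part of either library. The thread does not say how the Parrot routine treated code points above 255, so the second helper simply rejects them.

```python
import base64

CENT = "\u00a2"  # U+00A2 CENT SIGN, the character used in this report


def b64_of_utf8(text: str) -> str:
    """Base64 of the UTF-8 bytes of the string (what the Perl program does)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def b64_of_codepoints(text: str) -> str:
    """Base64 of the code point values, one byte per character.

    This mimics the ord-based behaviour described in the report.
    Assumption: code points above 255 are not handled; bytes() raises
    ValueError for them.
    """
    return base64.b64encode(bytes(ord(ch) for ch in text)).decode("ascii")


print(b64_of_utf8(CENT))        # wqI=
print(b64_of_codepoints(CENT))  # og==
```

The first result decodes to the two bytes c2 a2, which `od -x` prints as the single 16-bit word `a2c2` on a little-endian machine. The second decodes to the lone byte a2, which is not valid UTF-8 on its own.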

Whiteknight commented Aug 18, 2012

For the record, the Rosella implementation of MimeBase64 passes this test and behaves the same way as Perl. We can use it as a reference to improve the Parrot version of this routine.

ronaldxs commented Sep 11, 2012

IMHO the Rosella implementation is still off, because its decode function has to return a string even though Base64 can also encode images and other binary data. I have written a pure Perl 6 candidate reference implementation, Enc::MIME::Base64, at https://github.com/ronaldxs/perl6-Enc-MIME-Base64, with an interface that I think is better. Also note that RFC 4648 mentions http://josefsson.org/base-encoding/ as a sort of C reference implementation.
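
The interface concern can be illustrated with a minimal sketch, again in Python and only as an illustration. It is not the interface of Enc::MIME::Base64 or of Rosella; the function names are hypothetical. The idea is that the basic decoder returns raw bytes, and turning those bytes into text is a separate step the caller opts into.

```python
import base64

# First eight bytes of every PNG file; a convenient sample of non-text data.
PNG_MAGIC = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def decode_base64_bytes(encoded: str) -> bytes:
    """Decode Base64 to raw bytes; safe for images and other binary data."""
    return base64.b64decode(encoded)


def decode_base64_text(encoded: str, encoding: str = "utf-8") -> str:
    """Decode Base64, then interpret the bytes as text in the given encoding."""
    return decode_base64_bytes(encoded).decode(encoding)


encoded_png = base64.b64encode(PNG_MAGIC).decode("ascii")
roundtrip = decode_base64_bytes(encoded_png)

print(roundtrip == PNG_MAGIC)      # True
print(decode_base64_text("wqI="))  # the cent sign
```

Calling `decode_base64_text(encoded_png)` raises `UnicodeDecodeError`, because 0x89 cannot start a UTF-8 sequence. A decoder that can only return a string has no good answer for such input.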

rurban pushed a commit that referenced this issue Sep 24, 2012

Reini Urban
[GH #813 + #814] Fix MIME/Base64.pir for encoded strings
Use bytebuffer representations of the encoded string, not the encoded ord value.
The implementation is now correct, but some of the encoded tests are not.

rurban pushed a commit that referenced this issue Sep 24, 2012

@ghost ghost assigned rurban Sep 24, 2012

rurban commented Sep 24, 2012

Fixed with 3a48e6b, but I need to rewrite the tests.

rurban pushed a commit that referenced this issue Oct 1, 2012

Reini Urban
[GH #813, #814] New implementation of encode_base64, add 2nd arg to decode_base64

encode_base64 now uses a sliding buffer to hold multi-byte overshoots.
decode_base64(str, ?:encoding) for easier decoding.

This does not work yet
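
One way to read "sliding buffer to hold multi-byte overshoots" is sketched below in Python. This is an assumption about the technique, not a transcription of the PIR code in the commit, and the name `encode_base64_sliding` is hypothetical. Each character may contribute one to four bytes, so bytes are collected in a pending buffer and emitted in groups of three; whatever overshoots a group stays in the buffer for the next round. The sketch omits MIME line wrapping and assumes an encoding without a byte order mark.

```python
import base64

_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_base64_sliding(text: str, encoding: str = "utf-8") -> str:
    """Base64-encode the bytes of `text`, consuming one character at a time."""
    out = []
    pending = bytearray()  # bytes waiting for a complete 3-byte group
    for ch in text:
        pending += ch.encode(encoding)  # may add more than one byte
        while len(pending) >= 3:
            b0, b1, b2 = pending[0], pending[1], pending[2]
            del pending[:3]  # slide the buffer forward
            out.append(_ALPHABET[b0 >> 2])
            out.append(_ALPHABET[((b0 & 0x03) << 4) | (b1 >> 4)])
            out.append(_ALPHABET[((b1 & 0x0F) << 2) | (b2 >> 6)])
            out.append(_ALPHABET[b2 & 0x3F])
    # Flush the one or two leftover bytes with '=' padding.
    if len(pending) == 1:
        b0 = pending[0]
        out.append(_ALPHABET[b0 >> 2])
        out.append(_ALPHABET[(b0 & 0x03) << 4])
        out.append("==")
    elif len(pending) == 2:
        b0, b1 = pending[0], pending[1]
        out.append(_ALPHABET[b0 >> 2])
        out.append(_ALPHABET[((b0 & 0x03) << 4) | (b1 >> 4)])
        out.append(_ALPHABET[(b1 & 0x0F) << 2])
        out.append("=")
    return "".join(out)


sample = "\u20ac\u00a2x"  # 3-byte, 2-byte and 1-byte characters in UTF-8
reference = base64.b64encode(sample.encode("utf-8")).decode("ascii")
print(encode_base64_sliding(sample) == reference)  # True
print(encode_base64_sliding("\u00a2"))             # wqI=
```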

rurban pushed a commit that referenced this issue Oct 2, 2012

Reini Urban
[GH #813 + #814] Fix MIME/Base64.pir for encoded strings
Use bytebuffer representations of the encoded string, not the encoded ord value.
The implementation is now correct, but some of the encoded tests are not.

rurban pushed a commit that referenced this issue Oct 2, 2012

Reini Urban
[GH #813 + #814] Use Bytebuffer for MIME::Base64, add 2nd enc arg to decode_base64

Use bytebuffer representations of the encoded string, not the encoded ord value.
Also fix the tests to match this conformant behaviour.

The problem now is that base64-encoded files are endian dependent, and the multibyte
tests need to be skipped on big-endian.
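
The commit message does not say which encodings are affected. A plausible reading, stated here as an assumption, is that strings in fixed-width multi-byte encodings such as UTF-16 are stored in the host's native byte order, so encoding their byte buffer gives different Base64 on little-endian and big-endian machines. The Python sketch below makes the effect visible by choosing the byte order explicitly.

```python
import base64


def b64(data: bytes) -> str:
    """Base64 of a byte string, returned as text."""
    return base64.b64encode(data).decode("ascii")


cent = "\u00a2"

little = b64(cent.encode("utf-16-le"))  # bytes a2 00
big = b64(cent.encode("utf-16-be"))     # bytes 00 a2
utf8 = b64(cent.encode("utf-8"))        # bytes c2 a2, no byte-order variants

print(little)  # ogA=
print(big)     # AKI=
print(utf8)    # wqI=
```

UTF-8 has a single byte order, so its result is the same on every host. Under the assumption above, only tests that go through a native-order multi-byte buffer would need endian-specific expectations.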

rurban commented Oct 2, 2012

rurban/mime-base64-utf8-gh813+gh814 fixes the multi-byte issues in MIME::Base64 and is almost ready to be merged.

TODO: big-endian testing, binary testcase.

rurban pushed a commit that referenced this issue Oct 2, 2012

rurban pushed a commit that referenced this issue Oct 2, 2012

Reini Urban
[GH #813, #814] Fix tests for big-endian
Also check for has_icu to check with composed unicode strings

rurban commented Oct 2, 2012

rurban/mime-base64-utf8-gh813+gh814 is now ready to be merged.

rurban pushed a commit that referenced this issue Oct 2, 2012

@rurban rurban closed this Dec 28, 2012
