Base64 decode of Japanese UTF-8 Encoded text causes URI malformed Error #1

arwyn · 2017-12-19T13:25:34Z

A quick review of the code shows the error occurs here:

    (outputEncoding === OUTPUT_STRING) ? decodeURIComponent(escape(arr2str(decode(base64Str)))) : decode(base64Str)

Why are you escaping, then decoding as a URI? The body is not guaranteed to be URI formatted.

The bug seems to have been introduced here:
101c606

The following JavaScript reproduces the issue:

    decodeURIComponent(escape("日本語"))
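For context, here is a minimal sketch of why that expression throws, runnable in Node or a browser console. `escape` is a legacy global, and the variable names below are illustrative only; they are not taken from the library.

```javascript
// escape() writes code units above 0xFF as %uXXXX, which is not valid
// percent-encoding, so decodeURIComponent() rejects the result.
const escapedText = escape("日本語");
console.log(escapedText); // %u65E5%u672C%u8A9E

let textError = null;
try {
  decodeURIComponent(escapedText);
} catch (e) {
  textError = e; // a URIError ("URI malformed" in V8)
}
console.log(textError instanceof URIError); // true

// The idiom only works on a "binary string" whose characters are the
// individual UTF-8 byte values, one byte per character.
const binaryString = "\xE6\x97\xA5"; // the three UTF-8 bytes of "日"
const decodedChar = decodeURIComponent(escape(binaryString));
console.log(decodedChar); // 日
```

So the round trip is not a general string conversion: it assumes its input is a complete, valid UTF-8 byte sequence stored one byte per character.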


felixhammerl · 2017-12-19T16:28:33Z

Here's an explanation of what's going on:
http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html
http://ecmanaut.blogspot.de/2006/07/encoding-decoding-utf8-in-javascript.html

Is it valid base64?
Can you add a test for the breaking base64?

arwyn · 2017-12-20T01:17:21Z

The Base64 decode itself works correctly; the decode function returns the correct data. The issue is the escape/unescape round trip, which causes a malformed URI error. If you run the code I gave above, you will get the error.

In Base64 it would be something like this: decodeURIComponent(escape(arr2str(decode("5pel5pys6KqeCg=="))))

arwyn · 2017-12-20T04:47:48Z

The following line is from an actual email I'm parsing. I added a console.log in a try/catch in the decode function to get the base64str. I would prefer if the test case is not used as-is, since it is from an actual email, even though it does not contain any private information.

expect(decode('4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSBCuacrOODoeODvOODq+OBr+OAgeODnuOCpOODiuOD')).to.deep.equal("━━━━━━━━━\n 本メールは、マイナ")
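One way to look at that sample at the byte level is sketched below. It assumes Node.js (`Buffer` and the global `TextDecoder`); the variable names are illustrative and nothing here comes from the library itself.

```javascript
// The base64 line quoted above: 76 characters, i.e. one full MIME line.
const sample =
  "4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSBCuacrOODoeODvOODq+OBr+OAgeODnuOCpOODiuOD";

const raw = Buffer.from(sample, "base64");
console.log(sample.length, raw.length); // 76 57

// The last two bytes are 0xE3 0x83: a 3-byte lead byte plus one continuation
// byte, with the third byte of the character missing from this line.
const tail = Array.from(raw.subarray(-2), (b) => b.toString(16));
console.log(tail); // [ 'e3', '83' ]

// Everything before that incomplete sequence is well-formed UTF-8.
const complete = new TextDecoder("utf-8").decode(raw.subarray(0, raw.length - 2));
console.log(complete === "━━━━━━━━━\n本メールは、マイナ"); // true
```

In other words, this particular line ends in the middle of a multi-byte character, so any decoder that insists on complete UTF-8 per line will reject it.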

felixhammerl · 2017-12-20T17:06:46Z

Yes, you're right, of course. This has no business being there. It only makes sense if the encoded data is in charset UTF-8, which it probably wasn't; otherwise it will fail. Can you tell me what the charset was?

felixhammerl · 2017-12-20T22:43:55Z

P.S. I had to use your example as test data; I couldn't come across any non-UTF-8 base64-encoded data in the meantime. If you want that changed, we can do that. I figured the fix was more important, though :)

arwyn · 2017-12-21T05:17:05Z

Thank you for your quick response.
There is no private or identifying information in that snippet, so no big issue, I guess.

The mime node has the following header block. It is part of a multipart/mixed message, which in turn is part of a message/rfc822 section of a multipart/report message sent by a remote Postfix server.

Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: base64

The text seems to decode fine in other systems, but they might be doing error correction. Not all lines cause the issue. The base64 block contains multiple lines. One thing I can think of is that, since UTF-8 is a multi-byte encoding (1-4 bytes per character), maybe part of a character ends up on the next or previous line?
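A small sketch of that hypothesis, assuming an environment with the standard `TextEncoder`/`TextDecoder` globals (modern browsers, recent Node). The chunk boundary and the `toBinaryString` helper are made up for illustration; the helper is only a stand-in for the library's `arr2str`.

```javascript
// Hypothetical chunking: the 9 UTF-8 bytes of "日本語", split after the
// 4th byte, i.e. in the middle of "本" (E6 9C AC).
const bytes = new TextEncoder().encode("日本語");
const chunk1 = bytes.slice(0, 4);
const chunk2 = bytes.slice(4);

// Stand-in for arr2str: one character per byte.
const toBinaryString = (arr) => String.fromCharCode(...arr);

// Decoding the first chunk on its own fails, because it ends with a lone
// lead byte that decodeURIComponent() cannot complete.
let perChunkError = null;
try {
  decodeURIComponent(escape(toBinaryString(chunk1)));
} catch (e) {
  perChunkError = e;
}
console.log(perChunkError instanceof URIError); // true

// A streaming decoder buffers the incomplete sequence until the next chunk.
const decoder = new TextDecoder("utf-8");
const streamed =
  decoder.decode(chunk1, { stream: true }) + decoder.decode(chunk2);
console.log(streamed); // 日本語
```

Whether the parser should concatenate all bytes before converting to a string, or decode incrementally as above, is a design choice; either approach avoids treating each base64 line as a complete UTF-8 string.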

Next week I will be running my code through a large subset of actual mails, with lots of Japanese, Chinese and Korean text and encodings. So far your parser has worked quite well, and I am very happy with it. I will raise any other issues I find.

felixhammerl · 2017-12-21T13:07:02Z

Yes, please raise a ticket if anything comes up.

Thanks for the feedback!

arwyn mentioned this issue Dec 19, 2017

Mail parsing fails when there is corrupted body content emailjs/emailjs-mime-parser#17

Closed

felixhammerl closed this as completed in 3668f86 Dec 20, 2017
