Base64 decode of Japanese UTF-8 Encoded text causes URI malformed Error #1
Comments
Here's an explanation of what's going on: Is it valid base64?
The Base64 decode itself works correctly. The decode function returns correct data. The issue is the escape -> de-escape step, which causes a malformed URI error. If you run the code I gave above you will get the error. In Base64 it would be something like this:
The following line is from an actual email I'm parsing. I added a console.log in a try/catch in the decode function to get the base64str. I would prefer if the test case is not used as-is, since it is from an actual email, even though it does not contain any private information.
Yes, you're of course right. This has no business being there. It only makes sense if the encoded data is in charset UTF-8, which it probably wasn't. Otherwise it'll fail. Can you tell me what the charset was?
P.S. I had to use your example as test data, since I couldn't come across non-UTF-8 base64-encoded data in the meantime. If you want that changed, we can do that. I figured the fix is more important though :)
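To illustrate the charset point, here is a minimal sketch of how the escape/decodeURIComponent round trip behaves on a decoded "binary string". The helper names `legacyUtf8Decode` and `tryDecode` are made up for this example and are not part of emailjs-base64; the Shift_JIS input is only an assumed example of a non-UTF-8 charset, not the charset of the mail in question.

```javascript
// Binary string -> text via the escape/decodeURIComponent trick.
// Only valid when each char code is a byte (<= 0xFF) and the bytes are UTF-8.
function legacyUtf8Decode(binaryString) {
  return decodeURIComponent(escape(binaryString));
}

// Hypothetical helper: report success or the error name instead of throwing.
function tryDecode(binaryString) {
  try {
    return { ok: true, value: legacyUtf8Decode(binaryString) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

// "日" encoded as UTF-8 is the bytes E6 97 A5.
const utf8Result = tryDecode("\xE6\x97\xA5");

// "日" encoded as Shift_JIS is the bytes 93 FA, which is not valid UTF-8.
const sjisResult = tryDecode("\x93\xFA");

console.log(utf8Result); // succeeds with the character "日"
console.log(sjisResult); // fails with a URIError
```

So the round trip is really a UTF-8 decoder in disguise, and any other charset in the body can trigger the malformed URI error.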
Thank you for your quick response. The mime node has the following header block. It is part of a
The text seems to decode fine in other systems, but they might be doing error correction. Not all lines cause the issue. The base64 block contains multiple lines. One thing I can think of is that since UTF-8 is a multi-byte encoding (1-4 bytes per character), maybe one of the bytes of a character is on the next/previous line? Next week I will be running my code through a large subset of actual mails, with lots of Japanese, Chinese and Korean text and encodings. So far your parser has worked quite well, and I'm very happy with it. I will raise any other issues I find.
Yes, please raise a ticket if anything comes up. Thanks for the feedback!
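The split-character hypothesis can be sketched like this. It is only an illustration, not a confirmed diagnosis of the mail above: it assumes Node.js (`Buffer` and the global `TextDecoder`), and the 3-byte "lines" are artificially short so the split is easy to see.

```javascript
const text = "a日本語";
const bytes = Buffer.from(text, "utf8"); // 61 | E6 97 A5 | E6 9C AC | E8 AA 9E

// Encode every 3 bytes as its own base64 "line", like a wrapped MIME body.
const lines = [];
for (let i = 0; i < bytes.length; i += 3) {
  lines.push(bytes.subarray(i, i + 3).toString("base64"));
}

// Decoding each line to text on its own breaks characters that span lines.
const perLine = lines
  .map((line) => new TextDecoder("utf-8").decode(Buffer.from(line, "base64")))
  .join("");

// A streaming decoder keeps the incomplete trailing bytes for the next chunk.
const decoder = new TextDecoder("utf-8");
let streamed = "";
for (const line of lines) {
  streamed += decoder.decode(Buffer.from(line, "base64"), { stream: true });
}
streamed += decoder.decode(); // flush any remaining state

console.log(perLine === text);  // false: contains U+FFFD replacement characters
console.log(streamed === text); // true
```

Concatenating all decoded bytes first and converting to text once at the end would avoid the problem as well.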
A quick review of the code shows the error occurs here:
emailjs-base64/src/base64-decode.js
Line 7 in 7beb63f
Why are you escaping, then decoding as a URI? The body is not guaranteed to be URI formatted.
The bug seems to have been introduced here:
101c606
The following Javascript can reproduce the issue:
decodeURIComponent(escape("日本語"))
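A slightly expanded sketch of the same reproduction, showing why it throws: `escape` turns characters above 0xFF into `%uXXXX` sequences, which `decodeURIComponent` does not accept. The `reproduce` and `decodeBytes` helpers are hypothetical names for this example, and the `TextDecoder` variant is just one possible alternative that takes the charset explicitly, not what the library actually does.

```javascript
// Hypothetical helper: return the thrown error (or null) instead of throwing.
function reproduce(str) {
  try {
    decodeURIComponent(escape(str));
    return null;
  } catch (err) {
    return err;
  }
}

const escaped = escape("日本語"); // "%u65E5%u672C%u8A9E"
const err = reproduce("日本語");  // URIError, since "%u" is not a valid escape

// Possible alternative: decode raw bytes with an explicit charset.
function decodeBytes(bytes, charset = "utf-8") {
  return new TextDecoder(charset).decode(bytes);
}

// UTF-8 bytes of "日本語"
const utf8 = new Uint8Array([0xe6, 0x97, 0xa5, 0xe6, 0x9c, 0xac, 0xe8, 0xaa, 0x9e]);
const decoded = decodeBytes(utf8);

console.log(escaped, err && err.name, decoded);
```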