decoded_body returns string with the wrong encoding #431
Comments
I found we always need to do this:
or same thing:
PS. If no charset can be determined from the email, we default to "iso-8859-1". For some reason this seems to work for us, but it probably doesn't work in all environments; you might instead need something like charlock to make a best guess at the charset used.
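The fallback described above can be sketched with core String methods alone. This is only an illustration under stated assumptions: `body_to_utf8` is a hypothetical helper (not part of the Mail gem), `raw` stands in for a transfer-decoded body, and `charset` for whatever the message declares, possibly nil.

```ruby
# Hypothetical helper, not part of the Mail gem.
# raw:     transfer-decoded body bytes
# charset: charset declared by the message, or nil if none was found
def body_to_utf8(raw, charset, fallback = "ISO-8859-1")
  encoding = begin
    Encoding.find(charset || fallback)
  rescue ArgumentError
    # Ruby does not know this charset label; use the fallback.
    Encoding.find(fallback)
  end

  # Tag a copy with the source encoding, then transcode to UTF-8.
  # Bytes without a mapping are replaced instead of raising.
  raw.dup.force_encoding(encoding)
     .encode("UTF-8", invalid: :replace, undef: :replace)
end

raw = "caf\xE9".b                        # binary string, as a stand-in
puts body_to_utf8(raw, nil)              # no charset: falls back to ISO-8859-1
puts body_to_utf8(raw, "no-such-charset")
```

ISO-8859-1 maps every byte to some code point, which may be why the default never raises, but the resulting text is only right when the sender really used that charset.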
Or what also seems to work:
+1 I also have this problem
Using ruby 1.9.3p327 and Mail 2.5.2 I see the following:
To my understanding, the charset given in the email's Content-Type header should be treated as the charset of the body, and should therefore become the encoding of the transfer-decoded string.
Same issue as #403
Rather than using [1] https://github.com/mikel/mail/blob/master/lib/mail/message.rb#L1786
Out of curiosity, can you explain why calling message.body.decoded is wrong, or why it needs to choose the 'wrong' encoding? It still seems like a bug to me.
It's an API design decision. Here, decoding/encoding applies to the transfer encoding, not the charset. I agree it'd be nice if it worked the same way, too. Perhaps in a future version!
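The two layers mentioned here can be illustrated without the Mail gem. The sketch below assumes a quoted-printable body and a declared charset of windows-1252; `decode_transfer` and `apply_charset` are hypothetical names used only for illustration.

```ruby
# Layer 1: undo the Content-Transfer-Encoding (quoted-printable here).
# The result is plain bytes, so Ruby tags it as binary (ASCII-8BIT).
def decode_transfer(wire)
  wire.unpack("M").first
end

# Layer 2: interpret those bytes with the charset from Content-Type
# and transcode them to UTF-8.
def apply_charset(bytes, charset)
  bytes.dup.force_encoding(charset).encode("UTF-8")
end

bytes = decode_transfer("Wait=85")
p bytes.encoding                        # still binary after layer 1
puts apply_charset(bytes, "Windows-1252")
```

Stopping after the first layer yields a binary string, which matches the behaviour reported in this issue.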
Starting with an email that is encoded with windows-1252 as sent by Apple Mail:
I create a mail object with it (I put that in a file and open it):
If I get the text part:
we can see that the content type is text/plain and the charset is windows-1252. If I call decode_body on it:
it returns the body with \x85 in it, which is the windows-1252 code for the ellipsis "…". If I now try to convert it to UTF-8:
it fails with an exception:
the reason being that the returned string is tagged as ASCII-8BIT (binary) instead of windows-1252, and Ruby has no conversion from the binary byte \x85 to UTF-8. We can see what's wrong this way:
and we can work around it this way:
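A minimal sketch of this kind of diagnosis and workaround, using only core String methods. The `raw` string is a stand-in for the value returned by decode_body, and `convertible_to_utf8?` is a hypothetical helper, not a Mail method.

```ruby
# Hypothetical helper: true if the string can be transcoded to UTF-8
# using the encoding it is currently tagged with.
def convertible_to_utf8?(str)
  str.encode("UTF-8")
  true
rescue EncodingError
  false
end

# Stand-in for the body: windows-1252 bytes tagged as binary.
raw = "Loading\x85".b

puts raw.encoding                  # the tag Ruby uses for conversions
puts convertible_to_utf8?(raw)     # false: no mapping for the byte 0x85

# Workaround: declare the real charset on a copy, then transcode.
retagged = raw.dup.force_encoding("Windows-1252")
puts convertible_to_utf8?(retagged)
puts retagged.encode("UTF-8")
```

`force_encoding` only changes the tag and leaves the bytes alone, while `encode` rewrites the bytes, so the order of the two calls matters.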