E-Mail charset is not set back to original during body.decode #403

Closed
januszm opened this issue May 28, 2012 · 6 comments


@januszm

januszm commented May 28, 2012

I had a problem today because ISO-8859-2 characters (diacritics) were not shown in the mail body in my application.
I found that I have to determine the original mail charset manually and pass it to force_encoding in order to save the body content in a MySQL database.

I think the mail gem should check the charset before decoding, so that after .decode we see the original diacritics.

See http://stackoverflow.com/questions/10787791/ruby-rails-email-base64-gets-split-at-diacritics-and-content-lost-in-mysql/10790062#10790062

if message.multipart?
    charset = message.text_part.content_type_parameters[:charset]
    @message_body = message.text_part.body.to_s.force_encoding(charset).encode("UTF-8")
else
    charset = message.content_type_parameters[:charset]
    @message_body = message.body.decoded.force_encoding(charset).encode("UTF-8")
end

and what I wanted to have is:

@message_body = message.multipart? ? message.text_part.body.to_s : message.body.decoded
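
The reason the force_encoding step matters can be reproduced without the mail gem. The following is a minimal sketch in plain Ruby, assuming the body arrives Base64-encoded in ISO-8859-2 as described above; the sample word and the variable names are illustrative and not taken from the original message.

```ruby
# "zolc" spelled with Polish diacritics, written as escapes so this file stays ASCII.
original = "\u017C\u00F3\u0142\u0107"

# Simulate the wire format: ISO-8859-2 bytes wrapped in Base64.
wire = [original.encode("ISO-8859-2")].pack("m")

# Base64 decoding yields raw bytes tagged as binary; the charset is not attached.
raw = wire.unpack1("m")
p raw.encoding == Encoding::BINARY

# Tag the bytes with the charset (in a real message it comes from the
# Content-Type header), then transcode to UTF-8 for storage.
charset = "ISO-8859-2"
text = raw.force_encoding(charset).encode("UTF-8")
p text == original
```

Skipping force_encoding leaves a binary string, so a later conversion to UTF-8 has no way of knowing what the high bytes mean.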
@webdevotion

This works for me on dozens of e-mails that I read from Gmail over IMAP on Ruby 1.9.3.
The code gives me a human-readable body without the technical headers.
Thanks!

@jeremy
Collaborator

jeremy commented Jan 22, 2013

Rather than using message.body.decoded or message.decode_body (which just calls body.decoded), call message.decoded. That checks whether it's a text message [1] and calls message.decode_body_as_text, which decodes the transfer-encoding and sets the Ruby string encoding according to the message's charset [2].

[1] https://github.com/mikel/mail/blob/master/lib/mail/message.rb#L1786
[2] https://github.com/mikel/mail/blob/master/lib/mail/message.rb#L2047-L2050

@jeremy jeremy closed this as completed Jan 22, 2013
@webdevotion

A simple test from a project I'm working on now:

email = Email.find_by_message_id(mail.message_id)
body = mail.decoded

results in:

Can not decode an entire message, try calling #decoded on the various fields and body or parts if it is a multipart message.

Doing it as proposed by @januszm works as expected for me:

if mail.multipart?
  charset = mail.text_part.content_type_parameters[:charset]
  body = mail.text_part.body.to_s.force_encoding(charset).encode("UTF-8")
else
  charset = mail.content_type_parameters[:charset]
  body = mail.body.decoded.force_encoding(charset).encode("UTF-8")
end
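
One caveat with the manual approach: content_type_parameters[:charset] can be missing or can name a charset Ruby does not know, and force_encoding raises in both cases. A hedged sketch of a guard follows; the to_utf8 helper and its binary fallback are assumptions for illustration, not part of the mail gem.

```ruby
# Hypothetical helper: tag raw bytes with a charset and transcode to UTF-8.
# Falls back to binary when the charset is nil or unknown to Ruby.
def to_utf8(bytes, charset, fallback: Encoding::BINARY)
  encoding = begin
    charset ? Encoding.find(charset) : fallback
  rescue ArgumentError
    fallback # Encoding.find raises ArgumentError for unknown names
  end

  # dup so the caller's string is not retagged in place; bytes that cannot
  # be converted are replaced instead of raising.
  bytes.dup.force_encoding(encoding)
       .encode("UTF-8", invalid: :replace, undef: :replace)
end
```

With this in place, the two branches above could call to_utf8(mail.text_part.body.to_s, charset) or to_utf8(mail.body.decoded, charset) instead of chaining force_encoding and encode directly.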

@jeremy
Collaborator

jeremy commented Jan 22, 2013

@webdevotion Call it on the text part: message.text_part.decoded
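
Putting the two suggestions together, a small wrapper might look like the sketch below. The readable_body name is hypothetical; it relies only on multipart?, text_part and decoded as discussed in this thread, and the final encode call is an assumption added so the result is UTF-8 whichever encoding decoded returns.

```ruby
# Hypothetical helper; `message` is expected to behave like a Mail::Message.
def readable_body(message)
  # Whole-message decoding fails for multipart mail, so decode the text part.
  part = message.multipart? ? message.text_part : message
  return nil if part.nil? # multipart message without a text/plain part

  # Transcode to UTF-8, replacing anything that cannot be converted.
  part.decoded.encode("UTF-8", invalid: :replace, undef: :replace)
end

# Usage with the mail gem (the file name is a placeholder):
#   readable_body(Mail.read("message.eml"))
```

Returning nil for a message without a text part is a design choice of this sketch; callers may prefer to fall back to the HTML part instead.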

@webdevotion

Thank you @jeremy. It's working now.
Processing loads of emails can be quite daunting at first.
Your help is much appreciated.

@jeremy
Collaborator

jeremy commented Jan 22, 2013

@webdevotion—Great! :bowtie:
