E-Mail charset is not set back to original during body.decode #403

Closed
januszm opened this Issue May 28, 2012 · 6 comments

januszm commented May 28, 2012

I had a problem today because ISO-8859-2 characters (diacritics) were not shown in the mail body in my application.
I found that I have to determine the original mail charset manually and pass it to force_encoding in order to save the body content in a MySQL database.

I think the mail gem should apply the message's charset when decoding, so that the decoded body still shows the original diacritics.

See http://stackoverflow.com/questions/10787791/ruby-rails-email-base64-gets-split-at-diacritics-and-content-lost-in-mysql/10790062#10790062

if message.multipart?
    charset = message.text_part.content_type_parameters[:charset]
    @message_body = message.text_part.body.to_s.force_encoding(charset).encode("UTF-8")
else
    charset = message.content_type_parameters[:charset]
    @message_body = message.body.decoded.force_encoding(charset).encode("UTF-8")
end

What I wanted to be able to write instead is:

@message_body = message.multipart? ? message.text_part.body.to_s : message.body.decoded

The workaround with force_encoding works for me on dozens of e-mails read from Gmail over IMAP on Ruby 1.9.3.
It makes the body human-readable, without the technical headers.
Thanks!

Collaborator
jeremy commented Jan 22, 2013

Rather than using message.body.decoded or message.decode_body (which just calls body.decoded), call message.decoded. That checks whether it's a text message [1] and calls message.decode_body_as_text, which decodes the transfer-encoding and sets the Ruby string encoding according to the message's charset [2].

[1] https://github.com/mikel/mail/blob/master/lib/mail/message.rb#L1786
[2] https://github.com/mikel/mail/blob/master/lib/mail/message.rb#L2047-L2050
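
To illustrate the two steps described above (undo the transfer encoding, then tag the bytes with the message's charset), here is a minimal sketch in plain Ruby. It does not use the mail gem: decode_text_payload is a hypothetical helper name, and the sample text and Base64 payload are made up for the example.

```ruby
# Hypothetical helper, not part of the mail gem: undo a Base64 transfer
# encoding, then label the raw bytes with the declared charset and
# transcode them to UTF-8.
def decode_text_payload(base64_payload, charset)
  raw = base64_payload.unpack("m").first        # "m" = Base64; result is binary (ASCII-8BIT)
  raw.force_encoding(charset).encode("UTF-8")   # label the bytes, then convert
end

# Made-up sample: Polish text as an ISO-8859-2, Base64-encoded body.
original = "zażółć gęślą jaźń"
payload  = [original.encode("ISO-8859-2")].pack("m0")

body = decode_text_payload(payload, "ISO-8859-2")
puts body
puts body.encoding
```

Skipping the force_encoding step leaves the string tagged as binary, which is where the diacritics get lost on the way to the database.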

@jeremy jeremy closed this Jan 22, 2013

A simple test from a project I'm working on now:

email = Email.find_by_message_id(mail.message_id)
body = mail.decoded

results in:

Can not decode an entire message, try calling #decoded on the various fields and body or parts if it is a multipart message.

Doing it as proposed by @januszm works as expected for me:

if mail.multipart?
  charset = mail.text_part.content_type_parameters[:charset]
  body = mail.text_part.body.to_s.force_encoding(charset).encode("UTF-8")
else
  charset = mail.content_type_parameters[:charset]
  body = mail.body.decoded.force_encoding(charset).encode("UTF-8")
end

Collaborator
jeremy commented Jan 22, 2013

@webdevotion Call it on the text part: message.text_part.decoded
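
Putting the advice in this thread together, a rough sketch of the selection logic could look like this. readable_body is a hypothetical helper, and FakePart and FakeMessage are stand-ins invented for the example so that it runs without the gem. With real Mail objects only the helper would be needed, on the assumption that they respond to multipart?, text_part and decoded as used in this thread.

```ruby
# Stand-in for a single text part or a non-multipart message (invented for this sketch).
FakePart = Struct.new(:decoded) do
  def multipart?
    false
  end
end

# Stand-in for a multipart message (invented for this sketch).
FakeMessage = Struct.new(:text_part) do
  def multipart?
    true
  end

  # Models the error reported earlier in this thread.
  def decoded
    raise "Can not decode an entire message"
  end
end

# Hypothetical helper: choose the object that can be decoded as text.
# Returns nil when a multipart message has no text part.
def readable_body(message)
  part = message.multipart? ? message.text_part : message
  part && part.decoded
end

single = FakePart.new("plain body")
multi  = FakeMessage.new(FakePart.new("text part body"))

puts readable_body(single)
puts readable_body(multi)
```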

Thank you @jeremy. It's working now.
Processing loads of emails can be quite daunting at first.
Your help is much appreciated.

Collaborator
jeremy commented Jan 22, 2013

@webdevotion—Great! :bowtie:
