Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trouble with wrongly encoded non-ASCII filenames #651

Open
vakuum opened this issue Jan 9, 2014 · 6 comments
Open

Trouble with wrongly encoded non-ASCII filenames #651

vakuum opened this issue Jan 9, 2014 · 6 comments

Comments

@vakuum
Copy link

vakuum commented Jan 9, 2014

If the filename in the Content-Disposition header of an attachment contains non-ASCII characters, then the following code fails with an error:

$ locale
LANG=en_US.utf8
LANGUAGE=
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=

$ irb
> require 'mail'
> Mail.new(File.new('/tmp/mail.txt').read).to_s
NoMethodError: undefined method `disposition_type' for #<Mail::UnstructuredField:0x0000000919bf00>
        from /home/clemens/.rvm/gems/ruby-2.0.0-p353@global/gems/mail-2.5.4/lib/mail/field.rb:167:in `method_missing'
        from /home/clemens/.rvm/gems/ruby-2.0.0-p353@global/gems/mail-2.5.4/lib/mail/part.rb:37:in `inline?'
        from /home/clemens/.rvm/gems/ruby-2.0.0-p353@global/gems/mail-2.5.4/lib/mail/part.rb:42:in `add_required_fields'
        from /home/clemens/.rvm/gems/ruby-2.0.0-p353@global/gems/mail-2.5.4/lib/mail/message.rb:1788:in `ready_to_send!'
        from /home/clemens/.rvm/gems/ruby-2.0.0-p353@global/gems/mail-2.5.4/lib/mail/message.rb:1786:in `block in ready_to_send!'
        from /home/clemens/.rvm/gems/ruby-2.0.0-p353@global/gems/mail-2.5.4/lib/mail/message.rb:1784:in `each'
        from /home/clemens/.rvm/gems/ruby-2.0.0-p353@global/gems/mail-2.5.4/lib/mail/message.rb:1784:in `ready_to_send!'
        from /home/clemens/.rvm/gems/ruby-2.0.0-p353@global/gems/mail-2.5.4/lib/mail/message.rb:1800:in `encoded'
        from /home/clemens/.rvm/gems/ruby-2.0.0-p353@global/gems/mail-2.5.4/lib/mail/message.rb:1869:in `to_s'
        from (irb):2
        from /home/clemens/.rvm/rubies/ruby-2.0.0-p353/bin/irb:12:in `<main>'

Here is the used mail.txt:

Message-ID: <1234567890@test.test>
Date: Thu, 09 Jan 2014 00:00:00 +0100
From: Test <test@test.test>
MIME-Version: 1.0
To: test@test.test
Subject: Test
Content-Type: multipart/mixed;
 boundary="------------090202060701010202050308"

This is a multi-part message in MIME format.
--------------090202060701010202050308
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Test

--------------090202060701010202050308
Content-Type: image/png;
 name="überraschung.png"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="überraschung.png"

iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAGElEQVQY02P8z8Dwn4EIwMRA
JBhVSB2FAD2cAhLPzm83AAAAAElFTkSuQmCC
--------------090202060701010202050308--

Normally, the filename parameter of the Content-Disposition header should be encoded as described in RFC 2183 and RFC 2184. But not all mail clients are doing this.

The following monkey patch solves the problem by replacing all non-ASCII characters in the Content-Disposition header:

module Mail
  class ContentDispositionElement
    def cleaned(string)
      string = string.encode('US-ASCII', invalid: :replace, undef: :replace, replace: '_') unless string.ascii_only?
      string =~ /(.+);\s*$/ ? $1 : string
    end
  end
end

(I can't use Mail::Encodings.param_encode in the cleaned method because string contains the whole header and not only the value of the filename parameter.)

I haven't created a pull request with tests, because I am not sure if the above solution is the correct way to handle this.

@vakuum
Copy link
Author

vakuum commented Jan 9, 2014

Someone else has the same problem: http://www.redmine.org/issues/14865

@vakuum
Copy link
Author

vakuum commented Jan 10, 2014

Maybe a more robust version of the monkey patch above:

module Mail
  class ContentDispositionElement
    alias old_cleaned cleaned

    def cleaned(string)
      string = string.encode('US-ASCII', invalid: :replace, undef: :replace, replace: '_') unless string.ascii_only?
      old_cleaned(string)
    end
  end
end

@kamilio
Copy link

kamilio commented Jan 13, 2014

👍

@DanielWillig
Copy link

We have the same problem. Solved it for now by using the monkey patch...

@caseyjmorris
Copy link

I believe the problem is specifically if it contains non-ASCII characters which are not valid in whatever encoding the e-mail itself is in (for instance, if the e-mail is encoded in UTF-8 and the filename is Shift-JIS). Just throwing out all non-ASCII characters doesn't seem like a very satisfying solution though.

@jeremy
Copy link
Collaborator

jeremy commented May 17, 2017

More liberal parsing is welcome so long as it doesn't break legit common-case parsing.

No exception on current master with the following patch:

diff --git a/lib/mail/utilities.rb b/lib/mail/utilities.rb
index e962d68..abc8658 100644
--- a/lib/mail/utilities.rb
+++ b/lib/mail/utilities.rb
@@ -39,7 +39,7 @@ module Mail
 
     # Returns true if the string supplied is free from characters not allowed as a TOKEN
     def token_safe?( str )
-      not TOKEN_UNSAFE === str
+      str.ascii_only? && !(TOKEN_UNSAFE === str)
     end
 
     # If the string supplied has TOKEN unsafe characters in it, will return the string quoted
>> puts m.encoded
Date: Thu, 09 Jan 2014 00:00:00 +0100
From: Test <test@test.test>
To: test@test.test
Message-ID: <1234567890@test.test>
Subject: Test
Mime-Version: 1.0
Content-Type: multipart/mixed;
 boundary=------------090202060701010202050308
Content-Transfer-Encoding: 7bit

This is a multi-part message in MIME format.
--------------090202060701010202050308
Content-Type: text/plain;
 charset=UTF-8;
 format=flowed
Content-Transfer-Encoding: 7bit

Test

--------------090202060701010202050308
Content-Type: image/png;
 name*=utf-8'en'%C3%BCberraschung.png
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename*=utf-8'en'%C3%BCberraschung.png

iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAGElEQVQY02P8
z8Dwn4EIwMRAJBhVSB2FAD2cAhLPzm83AAAAAElFTkSuQmCC

--------------090202060701010202050308--

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants