Multipart filename can not be decoded. #323

Open
comron opened this Issue Jan 21, 2012 · 28 comments
@comron

I'm using a simple form on a rails application to upload a file, it looks as if the browser is not encoding the filename in the multipart "Content-Disposition" header, and as a result rack cannot decode the filename if it has a % in it, as it expects this to be the start of an escape sequence.

<form accept-charset="UTF-8" action="/documents.json" enctype="multipart/form-data" method="post">
  <input name="utf8" type="hidden" value="✓">
  <input id="document_attachment" name="document[attachment]" type="file">
  <input type="submit">
</form>

Results in the following request

Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Content-Type:multipart/form-data; boundary=----WebKitFormBoundary2NHc7OhsgU68l3Al
Origin:http://localhost:3000
Referer:http://localhost:3000/
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.18 (KHTML, like Gecko) Chrome/18.0.1010.0 Safari/535.18

------WebKitFormBoundary2NHc7OhsgU68l3Al
Content-Disposition: form-data; name="utf8"

✓
------WebKitFormBoundary2NHc7OhsgU68l3Al
Content-Disposition: form-data; name="document[attachment]"; filename="100% of a photo.jpeg"
Content-Type: image/jpeg


------WebKitFormBoundary2NHc7OhsgU68l3Al--

As you can see the filename is not encoded, but the % causes rack to throw an error. Where exactly is the problem? It seems I have no control over the browser (Chrome and Firefox on OS X) encoding this name, and rack assumes it is always encoded, in multipart/parser.rb.

Thanks and sorry if this is the wrong forum for such a question.

@BMorearty

That's funny, I was just about to file this same issue.

Here is a config.ru file that can be used for testing:

require 'rubygems'
require 'rack'
require 'rack/request'

class App
  def call(env)
    request = Rack::Request.new(env)
    params = request.params  # <== THIS BLOWS UP with percents in the filename

    html = <<-HTML
      <html>
        <head>
          <style>
            html { background-color: lightblue; }
            body { margin: 3em; padding: 3em; font-family: Verdana; background-color: white; border: 1px solid gray; }
            strong { background-color: yellow; }
            .error { color: red; }
          </style>
        </head>
        <body>
          <form accept-charset="UTF-8" action="/" method="POST" enctype="multipart/form-data">
            <p>
              Upload a file <strong>whose filename has a %</strong> but does not follow the URL-encoding pattern.
            </p>
            <p>
              The call to <tt>request.params</tt> fails with "invalid %-encoding" in IE8, IE9, FF9/Mac, Chrome 16/Mac,
              and Safari 5/Mac because the browsers are not URL-encoding the filename but <tt>Rack::Request</tt>
              expects it to be encoded.
            </p>
            <p>
              In development a stack trace is shown.
              When rack is running in production mode (<tt>rackup -E production config.ru</tt>),
              Safari 5 goes into an infinite loop of requests.  The other browsers show error pages.
            </p>
            <input name="upload" type="file" />
            <input type="submit" value="Submit and Explode"/>
          </form>
          <div class='error'>
            #{ "There was no % in that filename" if params['upload'] && !(params['upload'][:filename] =~ /%/) }
          </div>
        </body>
      </html>
    HTML

    [200, {"Content-Type" => "text/html"}, [html]]
  end
end

run App.new
@raggi
Official Rack repositories member

Can you guys validate: a5f681e

@BMorearty

@raggi That fixes most cases for me. Certainly much better than before and will work most of the time. But when I tried "a%b.txt" I still got the same error because it's followed by one valid hex letter but not by two. It might be good to make it robust enough to handle that case because Safari goes into an infinite-request loop if it gets this error.

@raggi
Official Rack repositories member

Oh, i screwed up that regex. I'll make some more tests. Thanks for checking.

@raggi
Official Rack repositories member

@BMorearty it's getting pretty nasty, but, 8d20282 should pass that example too. Any others you can think of?

@raggi raggi closed this Jan 23, 2012
@BMorearty

@raggi Here's one that still fails: '100%' (filename ends with percent--has no extension).

I'm starting to question whether unescaping percents should be done at all. As we discussed before this ticket was filed, RFC 1867 says:

...if the file name of the client's
operating system is not in US-ASCII, the file name might be
approximated or encoded using the method of RFC 1522.

RFC 1552 encoding, though, does not use percents and does not look like URL-encoding (RFC 1738).

Looking at Rack's history, the original call to Utils.unescape for the filename was in a36ac97 to handle "UAs that don't correctly escape Content-Disposition filenames." Looking at the code and the tests, this was introduced because browsers apparently are inconsistent in their treatment of an uploaded filename with quotes in it. (RFC 1867 says nothing about what to do here.)

I believe a36ac97 ended up with a bad side-effect: it fixed filenames with quotes but broke filenames with percents.

Here's what I found with various browsers (Chrome 16/Mac, FF 9/Mac, Safari 5/Mac, IE8/Win)

  • The only ASCII character any browser escapes in the filename attribute of a file field is double-quote. I didn't scientifically try every single ASCII character, but I tried a bunch that you might think would be escaped: double quote ("), single quote ('), percent (%), ampersand (&), question mark (?), backslash (\).
  • The way double-quoted is escaped varies by browser:
    • Chrome and Safari: %22
    • Firefox: \"
    • IE: N/A (double-quote is not a valid filename character in NTFS, which is the filesystem used by almost all Windows installations).

Based on this research, here's my proposal for how Rack should handle filenames in a form post:

  • Change %22 into "
  • Change \" into "
  • Do not attempt any other conversions.

This satisfies the intent of the original commit (different UAs do different things with quotes) without having other side-effects.

@raggi
Official Rack repositories member

Thanks for being so thorough!

Out of interest, webkit browsers don't escape % any way do they?

@BMorearty

No problemo. :-)

No, no browser that I tried escapes % in any way.

But yeah, I know why you're asking. It's true that the server can't tell the difference between double quote and %22.

@BMorearty

Updated test case that shows a little more information:

require 'rubygems'
require 'rack'
require 'rack/request'

class App
  def call(env)
    request = Rack::Request.new(env)
    params = request.params  # <== THIS BLOWS UP with percents in the filename

    html = <<-HTML
      <html>
        <head>
          <style>
            html { background-color: lightblue; }
            body { margin: 3em; padding: 3em; font-family: Verdana; background-color: white; border: 1px solid gray; }
            strong { background-color: yellow; }
            .error { color: red; }
          </style>
        </head>
        <body>
          <form accept-charset="UTF-8" action="/" method="POST" enctype="multipart/form-data">
            <p>
              Upload a file <strong>whose filename has a %</strong> but does not follow the URL-encoding pattern.
            </p>
            <p>
              The call to <tt>request.params</tt> fails with "invalid %-encoding" in IE8, IE9, FF9/Mac, Chrome 16/Mac,
              and Safari 5/Mac because the browsers are not URL-encoding the filename but <tt>Rack::Request</tt>
              expects it to be encoded.
            </p>
            <p>
              In development a stack trace is shown.
              When rack is running in production mode (<tt>rackup -E production config.ru</tt>),
              Safari 5 goes into an infinite loop of requests.  The other browsers show error pages.
            </p>
            <input name="upload" type="file" />
            <input type="submit" value="Submit and Explode"/>
          </form>
          <div>
            #{ "The last filename was #{params['upload'][:filename]}" if params['upload'] }
          </div>
          <div class='error'>
            #{ "There was no % in that filename (#{params['upload'][:filename]})" if params['upload'] && !(params['upload'][:filename] =~ /%/) }
          </div>
          <div>
            #{request.body.read}
          </div>
        </body>
      </html>
    HTML

    [200, {"Content-Type" => "text/html"}, [html]]
  end
end

run App.new
@raggi raggi added a commit that referenced this issue Jan 23, 2012
@raggi raggi Multipart percentage fail, round 3, the final character. Fixes string…
…s terminated with %. See #323. Revisit for 1.5.
7d3c3fd
@raggi
Official Rack repositories member

This is awesome info. I'm going to reopen this and revisit for a big cleanup in a future release.

@raggi raggi reopened this Jan 23, 2012
@BMorearty

Thanks for getting on it so quick, dude.

@raggi
Official Rack repositories member

It was mostly good timing. I'll back port stuff to 1.3.x when I do that release, maybe next weekend.

@mitijain123

I made the pull request

chneukirchen#24

please check this if it helps

@comron

@raggi @BMorearty Has the 1.4.1 release of rack solved this for you?

I think its still a little overzealous in decoding - it tries to decode any %[A-Fa-f0-0]{2}, instead of sticking to %22 and \". I know the case is silly but a file name like Foo%A0bar.jpg still causes problems because the filename is changed by rack to have some funky character in it.

@raggi
Official Rack repositories member

Gosh, the way this stuff gets encoded by these browsers is just silly. Issues still open, will consider exactly when / where is the appropriate time and place to address this.

@octavpo

This issue is still not completely fixed. I found that if you have a + in a filename, it will be changed into a space. Like for instance 5x+2eq2x8 is changed into 5x 2eq2x8. However if I add a % to the filename, like 5x+2eq2x%8, then it goes through fine.

@bf4

This appears to still be broken. Test case:

# config.ru
run ->(env) {

  [ 200, { 'Content-Type' => 'text/html', }, [Rack::Request.new(env).params.inspect] ]
}

rackup

http://localhost:9292/?utf8=%E2%9C%93&search=&page=2&commit=%u00ABscript%u00BBalert(209);%u00AB/script%u00BB

ArgumentError at /
invalid %-encoding (%u00ABscript%u00BBalert(209))

Ruby    $HOME/.rvm/rubies/ruby-1.9.3-p484/lib/ruby/1.9.1/uri/common.rb: in decode_www_form_component, line 898
Web GET localhost/
@hlidotbe

This is still broken when there is a + in the filename and no percent.

a+b.txt gets unescaped to a b.txt because Enumerable#all? returns true when the set is empty (meaning no %.. present in the string).

@ab ab referenced this issue in rest-client/rest-client Nov 25, 2015
Closed

Filenames in multipart requests should be encoded. #403

@ab
ab commented Nov 25, 2015

Same comment as @hlidotbe: "+" is incorrectly decoded to " ".

Using Utils.unescape doesn't seem correct for percent decoding even if you think it's a good idea to percent decode filenames (despite no major browser percent encoding anything except ").

https://github.com/rack/rack/blob/d938cb58/lib/rack/multipart/parser.rb#L314
https://github.com/rack/rack/blob/d938cb58/lib/rack/utils.rb#L51

Utils.unescape uses URI.decode_www_form_component, which turns + into because it's expecting CGI style forms. Using URI.unescape instead would not have this problem.

@dentarg

Could this (and #755) be solved by using Utils.unescape_path as in 568cf72?

@mhuggins

Hard to believe this is still an open issue from 4 years ago. This is causing us upwards of hundreds of errors per day in production.

invalid byte sequence in UTF-8

Is there really no consensus on what the underlying issue is or what must be done to fix it? Or is there at least a workaround in the meantime?

@raggi
Official Rack repositories member

@mhuggins the problem is that all the browsers do different things. I started a test repo to gather the data, as there's no standard. It needs picking up and completing, then someone needs to align the multipart parser against it: https://github.com/rack/multifail

@raggi
Official Rack repositories member

@dentarg that would cause ArgumentError for the original given example

@raggi
Official Rack repositories member

FTR, https://tools.ietf.org/html/rfc7578 saying "MAY be percent encoded" is a fucking joke, because it means that the decoding of filenames is entirely ambiguous. That said, it is possible with the relatively clear definition in RFC7578 now to create a percent-encoding parser that will work for this task. There is nothing in MRI standard lib that conforms to this today.

@raggi
Official Rack repositories member

There's also additional complexities introduced in the RFC, see Appendix A:

 The handling of non-ASCII field names has changed -- the method
described in RFC 2047 is no longer recommended; instead, it is
suggested that senders send UTF-8 field names directly and that file
names be sent directly in the form-charset.
@mhuggins

Thanks for the extra insight @raggi.

@chneukirchen
Official Rack repositories member

FTR, on POSIX systems filenames are anything not containing slashes or nul-bytes, so they should be treated as binary data in general.

@evinism
evinism commented Jul 12, 2016 edited

@raggi I might be completely wrong about this, but could "may be percent encoded" be referring to the case when the filename is a token, as opposed to quoted-string, as specified in the grammar in 6266? In the case that it's a quoted string, the "correct" way is to escape it with \", and that any non-control octets are allowed.

Under this interpretation, tokens should be fully percent-decoded, whereas quoted-strings should only be \" decoded, with %22 probably slapped in there for compatibility's sake.

Personally it makes sense to me (and selfishly it would solve the issues I'm having right now) but I'm not sure it's a valid interpretation.

token and quote definitions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment