Add Content-Disposition parameter parser #2077

jeremyevans · 2023-04-28T20:43:05Z

The ReDoS fix in ee25ab9 breaks valid requests, because colons are valid inside parameter values. You cannot use a regexp scan and ensure correct behavior, since values inside parameters can be escaped. Issues like this are the reason for the famous "now they have two problems" quote regarding regexps.

Add a basic parser for parameters in Content-Disposition. This parser is based purely on String#{index,slice!,[],==}, usually with string arguments for #index (though one case uses a simple regexp). There are two loops (one nested in the other), but the use of slice! ensures that forward progress is always made on each loop iteration.

In addition to fixing the bug introduced by the security fix, this removes multiple separate passes over the mime head, one pass to get the parameter name for Content-Disposition, and a separate pass to get the filename. It removes the get_filename method, though some of the code is kept in a smaller normalize_filename method.

This removes 18 separate regexp contents that were previously used just for the separate parse to find the filename for the content disposition.

This is obviously a major change and definitely needs thorough review and ideally testing with production traffic to ensure it handles cases as expected and doesn't break things.

Currently, the parser isn't as strict as it could be. We could have it:

fail for empty parameter names (instead of skipping them)
fail for quoted parameter values not ending with a semicolon (currently semicolon after end of quoted parameter value is optional)
fail for duplicate parameter names (currently, last one wins)

Currently, for filename parameters, the handling of escaped characters inside quoted parameter values is to include the backslash in the value, as that is necessary to keep tests passing for old IE which submits full paths in filename. We may want to consider dropping that support, and always removing escape characters.

Certainly open to changes regarding how the parser works, but I am convinced we should move to a parsing approach and not keep using a regexp approach for this.

This seems risky to backport to 2.2 and 3.0, but as the security fix was backported, I think we'll have to unless a fix for the regexp approach can be developed.

Fixes #2076

The ReDoS fix in ee25ab9 breaks valid requests, because colons are valid inside parameter values. You cannot use a regexp scan and ensure correct behavior, since values inside parameters can be escaped. Issues like this are the reason for the famous "now they have two problems" quote regarding regexps. Add a basic parser for parameters in Content-Disposition. This parser is based purely on String#{index,slice!,[],==}, usually with string arguments for #index (though one case uses a simple regexp). There are two loops (one nested in the other), but the use of slice! ensures that forward progress is always made on each loop iteration. In addition to fixing the bug introduced by the security fix, this removes multiple separate passes over the mime head, one pass to get the parameter name for Content-Disposition, and a separate pass to get the filename. It removes the get_filename method, though some of the code is kept in a smaller normalize_filename method. This removes 18 separate regexp contents that were previously used just for the separate parse to find the filename for the content disposition. Fixes rack#2076

tenderlove · 2023-04-28T21:05:54Z

+1000 for changing this to a non regexp based parser. Maybe we could allow people to opt out of the ReDoS fix on the maintenance branches? I'm hesitant to backport this, but you're right that we need to fix it.

ioquatix · 2023-04-28T21:06:09Z

I basically agree with your position, high risk, high reward, nice to clean up.

I personally would like to see us adopt formally verified parsers e.g. using Ragel.

We'd probably have to agree to have a separate gem as ragel (and ragel parsers) are quite a chunky dependency.

tenderlove

It looks good to me. I left a comment wrt filename encoding, but this patch seems to have the same behavior as before so no need to address the comment here.

lib/rack/multipart/parser.rb

jeremyevans · 2023-04-28T21:28:25Z

@ioquatix are you OK with merging this?

In terms of backporting, like @tenderlove, I'm hesitant as this seems risky due to the scope of the change. However, I'm not sure if there is a fix for the regexp approach that avoids ReDoS while keeping backwards compatibility. An opt-out flag for the ReDoS fix on maintenance branches is definitely possible, but it sucks to have to choose between compatibility and security. One possible option is backporting this, but making it opt-in. So by default, you get the regexp approach with the ReDoS fix in 2.2/3.0, but you can switch to this approach via a flag.

ioquatix · 2023-04-28T22:05:53Z

I haven't spent a lot of time thinking about this PR so here are some quick thoughts.

I'm basically fine to merge it.
I think we should adopt a more robust set of parsers in the future using formal verification.
I think we should consider moving this kind of behaviour into a separate gem so it can be versioned independently of other functionality.

tenderlove · 2023-04-28T22:17:30Z

So by default, you get the regexp approach with the ReDoS fix in 2.2/3.0, but you can switch to this approach via a flag.

This is acceptable to me.

I think we should adopt a more robust set of parsers in the future using formal verification.

Lets discuss this some other time. I'm not sold on using any parser generators. IME hand rolled recursive descent turn out to be faster (certainly we can JIT them better) and have better error handling support.

I think we should consider moving this kind of behaviour into a separate gem so it can be versioned independently of other functionality.

This maybe makes sense (I'm on the fence), but again I think it's outside the scope of this PR.

jeremyevans requested review from tenderlove and ioquatix April 28, 2023 20:43

tenderlove approved these changes Apr 28, 2023

View reviewed changes

lib/rack/multipart/parser.rb Show resolved Hide resolved

ioquatix approved these changes Apr 28, 2023

View reviewed changes

jeremyevans merged commit 51b0c26 into rack:main Apr 28, 2023
14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Content-Disposition parameter parser #2077

Add Content-Disposition parameter parser #2077

jeremyevans commented Apr 28, 2023

tenderlove commented Apr 28, 2023

ioquatix commented Apr 28, 2023 •

edited

tenderlove left a comment

jeremyevans commented Apr 28, 2023

ioquatix commented Apr 28, 2023

tenderlove commented Apr 28, 2023

Add Content-Disposition parameter parser #2077

Add Content-Disposition parameter parser #2077

Conversation

jeremyevans commented Apr 28, 2023

tenderlove commented Apr 28, 2023

ioquatix commented Apr 28, 2023 • edited

tenderlove left a comment

Choose a reason for hiding this comment

jeremyevans commented Apr 28, 2023

ioquatix commented Apr 28, 2023

tenderlove commented Apr 28, 2023

ioquatix commented Apr 28, 2023 •

edited