Add Content-Disposition parameter parser #2077

Merged: 1 commit, Apr 28, 2023

jeremyevans
Contributor

The ReDoS fix in ee25ab9 breaks valid requests, because colons are valid inside parameter values. You cannot use a regexp scan and ensure correct behavior, since values inside parameters can be escaped. Issues like this are the reason for the famous "now they have two problems" quote regarding regexps.
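To illustrate the escaping problem, here is a minimal sketch. The two patterns below are hypothetical and are not the regexps from ee25ab9 or from Rack itself; they only show how a character-class scan can truncate a value that contains an escaped quote or a colon.

```ruby
# Hypothetical patterns for illustration only; these are NOT Rack's regexps.
NAIVE_QUOTED    = /name="([^"]*)"/   # stops at the first quote, escaped or not
COLON_EXCLUDING = /name="([^":]*)/   # stops at the first colon

# Single-quoted Ruby strings keep the backslash, so this header
# contains the escaped quote sequence \" inside the value.
escaped_header = 'form-data; name="a\"b"'
colon_header   = 'form-data; name="key:1"'

# The intended values are a"b and key:1, but both scans stop early.
naive_value = escaped_header[NAIVE_QUOTED, 1]
colon_value = colon_header[COLON_EXCLUDING, 1]
```

In both cases the captured value is cut short, which is the kind of breakage a scan-based approach cannot rule out once escaping is allowed.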

Add a basic parser for parameters in Content-Disposition. This parser is based purely on String#{index,slice!,[],==}, usually with string arguments for #index (though one case uses a simple regexp). There are two loops (one nested in the other), but the use of slice! ensures that forward progress is always made on each loop iteration.
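As a rough sketch of the approach described above (not the code in this PR), a parser built on `String#index` and `String#slice!` might look like the following. The method name `parse_disposition_params` and its exact behavior are assumptions made for illustration; in particular, this sketch always drops the backslash from escaped characters and lets the last duplicate win.

```ruby
# Hypothetical sketch; not Rack's implementation.
def parse_disposition_params(header)
  params = {}
  rest = header.dup

  # Consume the disposition type (e.g. "form-data") and its semicolon.
  semi = rest.index(';')
  return params unless semi
  rest.slice!(0, semi + 1)

  # Outer loop: one parameter per iteration. slice! always removes at
  # least one character, so the loop is guaranteed to make progress.
  until rest.empty?
    rest.lstrip!
    eq = rest.index('=')
    break unless eq
    name = rest.slice!(0, eq + 1).chop.strip.downcase

    if rest[0] == '"'
      rest.slice!(0, 1)
      value = String.new
      # Inner loop: copy up to the next quote or backslash.
      loop do
        stop = rest.index(/["\\]/) # the one simple regexp
        if stop.nil?
          value << rest.slice!(0, rest.length) # unterminated quote
          break
        end
        value << rest.slice!(0, stop)
        char = rest.slice!(0, 1)
        break if char == '"'
        value << rest.slice!(0, 1).to_s # escaped character, kept literally
      end
      # A semicolon after the closing quote is optional.
      semi = rest.index(';')
      rest.slice!(0, semi ? semi + 1 : rest.length)
    else
      semi = rest.index(';')
      value = rest.slice!(0, semi ? semi + 1 : rest.length)
      value.chomp!(';')
      value.strip!
    end

    params[name] = value unless name.empty? # empty names are skipped
  end

  params
end
```

For example, `parse_disposition_params('form-data; name="a:b"')` keeps the colon in the value, since nothing in the loop treats a colon specially.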

In addition to fixing the bug introduced by the security fix, this removes multiple separate passes over the mime head, one pass to get the parameter name for Content-Disposition, and a separate pass to get the filename. It removes the get_filename method, though some of the code is kept in a smaller normalize_filename method.

This removes 18 separate regexp constants that were previously used just for the separate parse to find the filename for the content disposition.

This is obviously a major change and definitely needs thorough review and ideally testing with production traffic to ensure it handles cases as expected and doesn't break things.

Currently, the parser isn't as strict as it could be. We could have it:

  • fail for empty parameter names (instead of skipping them)
  • fail for quoted parameter values not ending with a semicolon (currently semicolon after end of quoted parameter value is optional)
  • fail for duplicate parameter names (currently, last one wins)
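To make the difference concrete, here is a small sketch contrasting the current lenient handling with a stricter variant for the first and third cases. It works on already-split `[name, value]` pairs; the method names and the `StrictParamError` class are hypothetical and do not exist in Rack.

```ruby
# Hypothetical names throughout; this is not part of Rack.
class StrictParamError < StandardError; end

# Lenient: skip empty names, and let the last duplicate win.
def lenient_params(pairs)
  pairs.each_with_object({}) do |(name, value), params|
    next if name.empty?
    params[name] = value
  end
end

# Strict: raise on empty names and on duplicate names.
def strict_params(pairs)
  pairs.each_with_object({}) do |(name, value), params|
    raise StrictParamError, "empty parameter name" if name.empty?
    raise StrictParamError, "duplicate parameter: #{name}" if params.key?(name)
    params[name] = value
  end
end
```

The second case (requiring a semicolon after a quoted value) depends on parser position, so it is not covered by this sketch.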

Currently, for filename parameters, the handling of escaped characters inside quoted parameter values is to include the backslash in the value, as that is necessary to keep tests passing for old IE which submits full paths in filename. We may want to consider dropping that support, and always removing escape characters.
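A short sketch of why the backslash matters for those old IE uploads. Both helpers are hypothetical and only illustrate the trade-off; they are not the methods used in this PR.

```ruby
# Hypothetical helpers; names are not from Rack.

# Treat every backslash as an escape and drop it.
def strip_escapes(value)
  value.gsub(/\\(.)/m) { $1 }
end

# Keep backslashes, then take the last path segment.
def last_path_segment(value)
  value.split(/[\/\\]/).last.to_s
end

# Old IE sends the full client-side path as the filename.
ie_filename = 'C:\Users\me\file.txt'

stripped  = strip_escapes(ie_filename)      # path separators are lost
preserved = last_path_segment(ie_filename)  # separators still usable
```

If escapes were always removed, the path separators would disappear before the filename could be reduced to its final segment, so dropping old IE support and always unescaping go together.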

Certainly open to changes regarding how the parser works, but I am convinced we should move to a parsing approach and not keep using a regexp approach for this.

This seems risky to backport to 2.2 and 3.0, but as the security fix was backported, I think we'll have to unless a fix for the regexp approach can be developed.

Fixes #2076

@tenderlove
Member

+1000 for changing this to a non-regexp-based parser. Maybe we could allow people to opt out of the ReDoS fix on the maintenance branches? I'm hesitant to backport this, but you're right that we need to fix it.

@ioquatix
Member

ioquatix commented Apr 28, 2023

I basically agree with your position: high risk, high reward, and nice to clean up.

I personally would like to see us adopt formally verified parsers, e.g. using Ragel.

We'd probably have to agree to have a separate gem, as Ragel (and Ragel parsers) are quite a chunky dependency.

Member

@tenderlove tenderlove left a comment

It looks good to me. I left a comment wrt filename encoding, but this patch seems to have the same behavior as before so no need to address the comment here.

lib/rack/multipart/parser.rb (review thread resolved)
@jeremyevans
Contributor Author

@ioquatix are you OK with merging this?

In terms of backporting, like @tenderlove, I'm hesitant as this seems risky due to the scope of the change. However, I'm not sure if there is a fix for the regexp approach that avoids ReDoS while keeping backwards compatibility. An opt-out flag for the ReDoS fix on maintenance branches is definitely possible, but it sucks to have to choose between compatibility and security. One possible option is backporting this, but making it opt-in. So by default, you get the regexp approach with the ReDoS fix in 2.2/3.0, but you can switch to this approach via a flag.

@ioquatix
Member

I haven't spent a lot of time thinking about this PR so here are some quick thoughts.

  • I'm basically fine to merge it.
  • I think we should adopt a more robust set of parsers in the future using formal verification.
  • I think we should consider moving this kind of behaviour into a separate gem so it can be versioned independently of other functionality.

@jeremyevans jeremyevans merged commit 51b0c26 into rack:main Apr 28, 2023
14 checks passed
@tenderlove
Member

> So by default, you get the regexp approach with the ReDoS fix in 2.2/3.0, but you can switch to this approach via a flag.

This is acceptable to me.

> I think we should adopt a more robust set of parsers in the future using formal verification.

Let's discuss this some other time. I'm not sold on using any parser generators. IME, hand-rolled recursive descent parsers turn out to be faster (certainly we can JIT them better) and have better error handling support.

> I think we should consider moving this kind of behaviour into a separate gem so it can be versioned independently of other functionality.

This maybe makes sense (I'm on the fence), but again I think it's outside the scope of this PR.

Successfully merging this pull request may close these issues.

Parsing uploaded file parameter name from Content-Disposition header fails 2.2.6.3 update
3 participants