-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Content-Disposition parameter parser #2077
Add Content-Disposition parameter parser #2077
Conversation
The ReDoS fix in ee25ab9 breaks valid requests, because colons are valid inside parameter values. You cannot use a regexp scan and ensure correct behavior, since values inside parameters can be escaped. Issues like this are the reason for the famous "now they have two problems" quote regarding regexps. Add a basic parser for parameters in Content-Disposition. This parser is based purely on String#{index,slice!,[],==}, usually with string arguments for #index (though one case uses a simple regexp). There are two loops (one nested in the other), but the use of slice! ensures that forward progress is always made on each loop iteration. In addition to fixing the bug introduced by the security fix, this removes multiple separate passes over the mime head, one pass to get the parameter name for Content-Disposition, and a separate pass to get the filename. It removes the get_filename method, though some of the code is kept in a smaller normalize_filename method. This removes 18 separate regexp contents that were previously used just for the separate parse to find the filename for the content disposition. Fixes rack#2076
+1000 for changing this to a non regexp based parser. Maybe we could allow people to opt out of the ReDoS fix on the maintenance branches? I'm hesitant to backport this, but you're right that we need to fix it. |
I basically agree with your position, high risk, high reward, nice to clean up. I personally would like to see us adopt formally verified parsers e.g. using Ragel. We'd probably have to agree to have a separate gem as ragel (and ragel parsers) are quite a chunky dependency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me. I left a comment wrt filename encoding, but this patch seems to have the same behavior as before so no need to address the comment here.
@ioquatix are you OK with merging this? In terms of backporting, like @tenderlove, I'm hesitant as this seems risky due to the scope of the change. However, I'm not sure if there is a fix for the regexp approach that avoids ReDoS while keeping backwards compatibility. An opt-out flag for the ReDoS fix on maintenance branches is definitely possible, but it sucks to have to choose between compatibility and security. One possible option is backporting this, but making it opt-in. So by default, you get the regexp approach with the ReDoS fix in 2.2/3.0, but you can switch to this approach via a flag. |
I haven't spent a lot of time thinking about this PR so here are some quick thoughts.
|
This is acceptable to me.
Lets discuss this some other time. I'm not sold on using any parser generators. IME hand rolled recursive descent turn out to be faster (certainly we can JIT them better) and have better error handling support.
This maybe makes sense (I'm on the fence), but again I think it's outside the scope of this PR. |
The ReDoS fix in ee25ab9 breaks valid requests, because colons are valid inside parameter values. You cannot use a regexp scan and ensure correct behavior, since values inside parameters can be escaped. Issues like this are the reason for the famous "now they have two problems" quote regarding regexps.
Add a basic parser for parameters in Content-Disposition. This parser is based purely on String#{index,slice!,[],==}, usually with string arguments for #index (though one case uses a simple regexp). There are two loops (one nested in the other), but the use of slice! ensures that forward progress is always made on each loop iteration.
In addition to fixing the bug introduced by the security fix, this removes multiple separate passes over the mime head, one pass to get the parameter name for Content-Disposition, and a separate pass to get the filename. It removes the get_filename method, though some of the code is kept in a smaller normalize_filename method.
This removes 18 separate regexp contents that were previously used just for the separate parse to find the filename for the content disposition.
This is obviously a major change and definitely needs thorough review and ideally testing with production traffic to ensure it handles cases as expected and doesn't break things.
Currently, the parser isn't as strict as it could be. We could have it:
Currently, for
filename
parameters, the handling of escaped characters inside quoted parameter values is to include the backslash in the value, as that is necessary to keep tests passing for old IE which submits full paths infilename
. We may want to consider dropping that support, and always removing escape characters.Certainly open to changes regarding how the parser works, but I am convinced we should move to a parsing approach and not keep using a regexp approach for this.
This seems risky to backport to 2.2 and 3.0, but as the security fix was backported, I think we'll have to unless a fix for the regexp approach can be developed.
Fixes #2076