If the "Accept" header contains a media-type with a quoted parameter including a space, the resulting MIMEAccept instance is incorrect #1623
Comments
For reference, this is the parser I currently use in a different project (it does not yet handle escaped double-quotes though as I'd need a lookbehind):
|
If there are no objections to this code I will start a PR with this implementation. Note that I have not done any performance tests in comparison with the existing regex implementation! This is the main point where I am reluctant to open a PR prematurely. |
Thanks, but if possible I'd rather fix the regex. |
ok. I'll give it a shot |
Simply removing the white-space exclusion from one of the patterns was enough. This at least covers the existing unit-tests. I added a quoted value with whitespace and it passes. I can monkey-patch this at work and see if it makes our test-suite pass for the project I am currently working on. |
Okay, the tests don't run yet. There's also a media-type parameter which contains a comma in its value. I'll keep working on it. |
I'm having trouble getting this regex to work. My two biggest problems are that I need to know when I am "inside" double quotes while also splitting the list of media-types by comma. For now I have resorted to "manual" parsing, which is a bit different from the parser I posted above (that one was an old experiment). I'm afraid my regex-fu is not strong enough to implement this as a regex. I am using my parser as a workaround for now, which I use instead of If anyone runs into this same problem, my parser is located in a gist Note that this still has some edge cases; in particular, I did not properly test extra junk whitespace. Some operations might also be superfluous. But for now that gist works for me and covers all our internal test cases. Also the functions |
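For readers following along, the "know when I am inside double quotes while splitting by comma" problem can be sketched with a small character-by-character scanner. This is not the gist mentioned above and not werkzeug code; `split_outside_quotes` is a hypothetical name, and the backslash handling assumes the usual quoted-pair escaping inside a quoted-string.

```python
def split_outside_quotes(header, sep=","):
    """Split ``header`` on ``sep``, ignoring separators inside double quotes.

    Hypothetical helper for illustration only.
    """
    parts = []
    current = []
    in_quotes = False
    escaped = False
    for char in header:
        if escaped:
            # Character following a backslash inside quotes: take it literally.
            current.append(char)
            escaped = False
        elif in_quotes and char == "\\":
            current.append(char)
            escaped = True
        elif char == '"':
            current.append(char)
            in_quotes = not in_quotes
        elif char == sep and not in_quotes:
            # Separator outside quotes ends the current list item.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    # Drop empty items produced by stray separators or whitespace.
    return [part for part in parts if part]


items = split_outside_quotes(
    'text/html, fictitious/media-type; myparam="some, value"; q=0.5'
)
```

The comma inside the quoted parameter value stays within its item, so `items` holds two entries rather than three. An unterminated quote simply swallows the rest of the header into the last item; a real implementation would need to decide how to treat that case.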
While refactoring |
Looks like the reason for the weird parser was because previous specs allowed for things like |
According to RFC 2045 Section 5.1, mime-type parameters can be quoted. As additional reference, the quoted-string is defined in RFC 7230 Section 3.2.6 (which is the same as in RFC 2616 Section 2.2).
This means that the following media-type is a valid media-type string:
Currently, werkzeug sees this as fictitious/media-type; myparam="some, which includes the first double-quote but drops the rest. Additionally, the parsed MIMEAccept instance also contains an entry for ('value"', 1). In other words, the above media-type results in the following two entries in the MIMEAccept instance:
('fictitious/media-type; myparam="some', 1)
('value"', 1)
Appending q=0.5 results in this:
('fictitious/media-type; myparam="some', 1)
('value"', 0.5)
I would expect it to contain just one entry as
('fictitious/media-type; myparam="some value"', 1)
We currently get one such media-type in an accept header and cannot use the normal werkzeug "accept" mechanism which makes this cumbersome. I just discovered this so I still need to write a workaround. I am considering appending the solution as a PR to this ticket. I already have a compliant parser lying around, which is based on a very simple state-based parser.
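As an illustration of the expected result described in this report, here is a minimal regex-based sketch that keeps quoted parameter values intact and separates out the `q` parameter. It is not werkzeug's implementation or its eventual fix; `parse_accept`, `_ITEM` and `_PARAM` are hypothetical names, the patterns are assumptions written for this example, and no sorting by quality is done.

```python
import re

# One list item: a run of characters that are neither commas nor quotes,
# or complete quoted-strings (backslash escapes allowed inside the quotes).
_ITEM = re.compile(r'(?:[^,"]|"(?:[^"\\]|\\.)*")+')
# One parameter: ";" name "=" followed by a quoted-string or an unquoted run.
_PARAM = re.compile(r';\s*([^=;\s]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^;]*)')


def parse_accept(header):
    """Return a list of (value, quality) pairs. Hypothetical helper."""
    entries = []
    for match in _ITEM.finditer(header):
        item = match.group().strip()
        if not item:
            continue
        # The media type itself never contains ";" so the first one ends it.
        media_type = item.partition(";")[0].strip()
        quality = 1.0
        params = []
        for name, value in _PARAM.findall(item):
            value = value.strip()
            if name.lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    pass  # malformed q value: keep the default of 1.0
            else:
                params.append(f"{name}={value}")
        entries.append(("; ".join([media_type] + params), quality))
    return entries


plain = parse_accept('fictitious/media-type; myparam="some value"')
weighted = parse_accept('fictitious/media-type; myparam="some value"; q=0.5')
```

With these inputs, `plain` and `weighted` each contain a single entry whose value is the full media-type including the quoted parameter, with qualities 1.0 and 0.5 respectively. Unterminated quotes are not handled gracefully by this sketch.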