If the "Accept" header contains a media-type with a quoted parameter including a space, the resulting MIMEAccept instance is incorrect #1623

exhuma · 2019-08-06T13:40:38Z

According to RFC 2045 Section 5.1, mime-type parameters can be quoted. As additional reference, the quoted-string is defined in RFC 7230 Section 3.2.6 (which is the same as in RFC 2616 Section 2.2)

This means that the following media-type is a valid media-type string:

fictitious/media-type; myparam="some value"

Currently, werkzeug sees this as fictitious/media-type; myparam="some, which includes the first double-quote, but drops the rest. Additionally, the parsed MIMEAccept instance also contains an entry for ('value"', 1). In other words, the above media-type results in the following two entries in the MIMEAccept instance:

('fictitious/media-type; myparam="some', 1)
('value"', 1)

Appending q=0.5 results in this:

('fictitious/media-type; myparam="some', 1)
('value"', 0.5)

I would expect it to contain just one entry as ('fictitious/media-type; myparam="some value"', 1)

We currently get one such media-type in an accept header and cannot use the normal werkzeug "accept" mechanism which makes this cumbersome. I just discovered this so I still need to write a workaround. I am considering appending the solution as a PR to this ticket. I already have a compliant parser lying around, which is based on a very simple state-based parser.

The text was updated successfully, but these errors were encountered:

exhuma · 2019-08-06T13:46:33Z

For reference, this is the parser I currently use in a different project (it does not yet handle escaped double-quotes though as I'd need a lookbehind):

def argparser(data: str) -> Generator[Tuple[str, str], None, None]:
    """
    Given a string like ``foo=bar; bar=baz; bla="blubb"``, produce a stream of
    key/value pairs.

    This parser is used to parse RFC-2045 (Section 5.1) compliant mimetype
    parameters.

    Example::

        >>> args = 'foo=bar; bar=baz; bla="blubb"'
        >>> for k, v in argparser(args):
        ...     print(k, v)
        foo bar
        bar baz
        bla blubb
    """
    state = 'keyname'
    current_key_name = []
    current_value = []  # type: List[str]
    charstream = iter(data)
    for char in charstream:
        if state == 'keyname':
            if char in ' \t\r\n;':
                continue  # ignore whitespace and other junk
            elif char == '=':
                state = 'value'
                continue
            else:
                current_key_name.append(char)
        elif state == "value":
            if char == ';':
                state = 'keyname'
                key = ''.join(current_key_name)
                value = ''.join(current_value)
                del current_value[::]
                del current_key_name[::]
                yield key, value
                continue
            elif char == '"':
                state = 'quoted_value'
                continue
            current_value.append(char)
        elif state == 'quoted_value':
            if char == '"':
                state = 'keyname'
                key = ''.join(current_key_name)
                value = ''.join(current_value)
                del current_value[::]
                del current_key_name[::]
                yield key, value
                continue
            current_value.append(char)
        else:
            raise ValueError('Unexpected parser state!')

    # return the remaining key/value pair
    if current_key_name and current_value:
        key = ''.join(current_key_name)
        value = ''.join(current_value)
        del current_value[::]
        del current_key_name[::]
        yield key, value

exhuma · 2019-08-06T13:49:05Z

If there are no objections to this code I will start a PR with this implementation.

Note that I have not done any performance tests in comparison with the existing regex implementation! This is the main point where I am reluctant to open a PR prematurely.

davidism · 2019-08-06T14:07:08Z

Thanks, but if possible I'd rather fix the regex.

exhuma · 2019-08-06T18:57:09Z

ok. I'll give it a shot

References pallets#1623

exhuma · 2019-08-06T20:04:57Z

Simply removing the white-space exclusion from one of the patterns was enough. This at least covers the existing unit-tests. I added a quoted value with whitespace and it passes.

I can monkey-patch this at work and see if it makes our test-suite pass for the project I am currently working on.

exhuma · 2019-08-07T06:43:16Z

Okay, the tests don't run yet. There's also a media-type parameter which contains a comma in its value. I'll keep working on it.

exhuma · 2019-08-08T11:47:09Z

I'm having trouble of getting this regex to work. My biggest two problems are that on the one hand I need to know when I am "inside" double quotes while also splitting the list of media-types by comma.

I have for now resorted to a "manual" parsing which is a bit different to the one I wrote above (that one was an old experiment).

I'm afraid my regex-foo is not strong enough to implement this as regex. I am using my parser as workaround for now which I use in stead of request.accept_mimetypes.

If anyone runs into this same problem, my parser is located in a gist

Note that this has some edge-cases still, especially I did not properly test extra junk white-space. Also some operations might be superfluous. But for now that gist works for me and covers all our internal test-cases. Also the functions quote_arg_value and unquote_arg_value are not symmetric which is I don't particularly like. But I will leave it like this for now.

davidism · 2023-03-06T18:49:35Z

While refactoring parse_options_header in #2614, I realized that for some reason we're using a different parser for Accept, even though it should use the same one. https://httpwg.org/specs/rfc9110.html#field.accept shows that it uses value; key=value; key=value ..., value2; .... This should use the list parser, then use the options parser on each item.

davidism · 2023-03-06T19:19:36Z

Looks like the reason for the weird parser was because previous specs allowed for things like text/html;charset=utf-8;q=1;extra=some, where charset was part of the media type, but extra was not. RFC 9110 drops support for extra parameters, and assumes q is last.

exhuma added a commit to exhuma/werkzeug that referenced this issue Aug 6, 2019

Allow white-space in mime-type parameters

9d2edc1

References pallets#1623

exhuma added a commit to exhuma/werkzeug that referenced this issue Aug 6, 2019

Merge new unit-test with existing test

7ff3f4f

References pallets#1623

davidism self-assigned this Mar 6, 2023

davidism added this to the 2.3.0 milestone Mar 6, 2023

davidism mentioned this issue Mar 6, 2023

improve parse_accept_header #2615

Merged

davidism closed this as completed in #2615 Mar 6, 2023

github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

If the "Accept" header contains a media-type with a quoted parameter including a space, the resulting MIMEAccept instance is incorrect #1623

If the "Accept" header contains a media-type with a quoted parameter including a space, the resulting MIMEAccept instance is incorrect #1623

exhuma commented Aug 6, 2019

exhuma commented Aug 6, 2019

exhuma commented Aug 6, 2019

davidism commented Aug 6, 2019

exhuma commented Aug 6, 2019

exhuma commented Aug 6, 2019

exhuma commented Aug 7, 2019

exhuma commented Aug 8, 2019 •

edited

Loading

davidism commented Mar 6, 2023 •

edited

Loading

davidism commented Mar 6, 2023

If the "Accept" header contains a media-type with a quoted parameter including a space, the resulting MIMEAccept instance is incorrect #1623

If the "Accept" header contains a media-type with a quoted parameter including a space, the resulting MIMEAccept instance is incorrect #1623

Comments

exhuma commented Aug 6, 2019

exhuma commented Aug 6, 2019

exhuma commented Aug 6, 2019

davidism commented Aug 6, 2019

exhuma commented Aug 6, 2019

exhuma commented Aug 6, 2019

exhuma commented Aug 7, 2019

exhuma commented Aug 8, 2019 • edited Loading

davidism commented Mar 6, 2023 • edited Loading

davidism commented Mar 6, 2023

exhuma commented Aug 8, 2019 •

edited

Loading

davidism commented Mar 6, 2023 •

edited

Loading