Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept slashes in parameter values #5

Closed
wants to merge 1 commit into from
Closed

Accept slashes in parameter values #5

wants to merge 1 commit into from

Conversation

gwicke
Copy link

@gwicke gwicke commented Nov 11, 2015

While the RFC is excluding slashes from tokens and thus parameter values, some existing content-type values do contain them, and existing content-type parsers are generally lenient when encountering them.

While the RFC is excluding slashes from tokens and thus parameter values, some
existing content-type values do contain them, and existing content-type
parsers are generally lenient when encountering them.
@dougwilson
Copy link
Contributor

It's highly unlikely I'll accept violating the RFC, as that's the whole point of this module: to parse according to the RFC. Can you at least provide very detailed explanation why you want to violate the RFC?

If you are asking to be more lenient and implement a second parser that is, then that can probably happen. Please link to the source code of other such lenient parsers so we can implement their proven parsing algorithm.

@gwicke
Copy link
Author

gwicke commented Nov 11, 2015

We have some stored content that has a content-type header with missing quoting in one of its parameters. Without the patch, content-type simply throws an exception when encountering this header. With the patch, the header is parsed correctly, which lets us use content-type to consistently format those headers before exposing them to the outside world.

Examples for lenient parsing in the wild:

Practical example:

  • Navigate to https://en.wikipedia.org/api/rest_v1/page/html/San_Francisco/632646397 using a web browser (I used Chrome).
  • In the network console, note that the raw returned content-type is text/html;profile=mediawiki.org/specs/html/1.1.0;charset=utf-8 (missing escaping around the profile parameter value).
  • In the JS console, check the parsed content type via
> document.contentType
"text/html"

@dougwilson
Copy link
Contributor

Cool. Would you be willing to update your PR? It needs at least two changes:

  1. There are no tests, so they would need to be added.
  2. We will not change the current behavior, unless you are willing to wait an indeterminate amount of time for a major version bump. Plus, if we change this, we'll first need to implement a way in this module for Express.js and all it's related libraries to validate the Content-Type header, as it does today.

This means that either this change needs to be opt in or something else. Let me know what suggestions you have to implement this change in a backwards-compatible way, including keeping the current feature of this module which allows you to do strict validation. The whole reason this does strict validation and we use this in Express.js is to deter people from putting up web servers that serve invalid Content-Type headers in the first place, like that URL you provided is doing.

If we do want to add a non-strict parsing in here, it should at least be an exact copy of a long-standing algorithm, for example, it should perhaps parse exactly how that Chrome one does (and all an entire suite of tests to go along with this new parser, please).

@dougwilson
Copy link
Contributor

Ideally, after the changes here, I can see you being able to do the "Content-Type invalid -> valid fixing" by doing contentType.format(contentType.parse(str, { loose: true })) or something like that.

@dougwilson
Copy link
Contributor

In fact, thinking about it more, I'm open to perhaps making the non-strict parsing the default, but still, would want both strict and non-strict parsing supported with a switch (and, of course, tests).

@gwicke
Copy link
Author

gwicke commented Nov 11, 2015

@dougwilson: An option sounds good to me, and I also think that implementing the non-strict algorithm as in Chrome & FF makes sense. It will likely also be faster.

I made a note to get back to this, but might not find the time right away.

@dougwilson
Copy link
Contributor

Cool. Yea, this open PR will also put it on my list as well, so it'll likely end up being whichever one of us gets to it first, then :)

@dougwilson
Copy link
Contributor

Also, this could be a temporary work-around to achieve the desired goal, without needing to maintain a custom fork, which should help with keeping timeline pressure at bay:

var contentType = require('content-type')
var rawParameterValueRegExp = /(; *[^=]+=)([^"][^;]*)/g

module.exports = function normalizeContentType(str) {
  try {
    return contentType.format(contentType.parse(str))
  } catch (err) {
    try {
      return contentType.format(contentType.parse(str.replace(rawParameterValueRegExp, '$1"$2"')))
    } catch (e) {
      throw err // original error
    }
  }
}

@dougwilson
Copy link
Contributor

Hi @gwicke I hope it's OK if I close this PR; I know we discussed above what to do, and I feel like we did come up with a good plan: implement the same loose parsing algorithm as Chrome/Firefox/whatever and expose it through a parsing option (perhaps { lax: true }). I don't really want to keep this PR open forever, especially since it's not similar to what we think the ideal solution should be, so doesn't really have value staying open. Please feel free to resubmit a PR with the new implementation at any time :) !

@dougwilson dougwilson closed this Oct 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants