-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accept slashes in parameter values #5
Conversation
While the RFC is excluding slashes from tokens and thus parameter values, some existing content-type values do contain them, and existing content-type parsers are generally lenient when encountering them.
It's highly unlikely I'll accept violating the RFC, as that's the whole point of this module: to parse according to the RFC. Can you at least provide very detailed explanation why you want to violate the RFC? If you are asking to be more lenient and implement a second parser that is, then that can probably happen. Please link to the source code of other such lenient parsers so we can implement their proven parsing algorithm. |
We have some stored content that has a content-type header with missing quoting in one of its parameters. Without the patch, Examples for lenient parsing in the wild:
Practical example:
> document.contentType
"text/html" |
Cool. Would you be willing to update your PR? It needs at least two changes:
This means that either this change needs to be opt in or something else. Let me know what suggestions you have to implement this change in a backwards-compatible way, including keeping the current feature of this module which allows you to do strict validation. The whole reason this does strict validation and we use this in Express.js is to deter people from putting up web servers that serve invalid If we do want to add a non-strict parsing in here, it should at least be an exact copy of a long-standing algorithm, for example, it should perhaps parse exactly how that Chrome one does (and all an entire suite of tests to go along with this new parser, please). |
Ideally, after the changes here, I can see you being able to do the "Content-Type invalid -> valid fixing" by doing |
In fact, thinking about it more, I'm open to perhaps making the non-strict parsing the default, but still, would want both strict and non-strict parsing supported with a switch (and, of course, tests). |
@dougwilson: An option sounds good to me, and I also think that implementing the non-strict algorithm as in Chrome & FF makes sense. It will likely also be faster. I made a note to get back to this, but might not find the time right away. |
Cool. Yea, this open PR will also put it on my list as well, so it'll likely end up being whichever one of us gets to it first, then :) |
Also, this could be a temporary work-around to achieve the desired goal, without needing to maintain a custom fork, which should help with keeping timeline pressure at bay: var contentType = require('content-type')
var rawParameterValueRegExp = /(; *[^=]+=)([^"][^;]*)/g
module.exports = function normalizeContentType(str) {
try {
return contentType.format(contentType.parse(str))
} catch (err) {
try {
return contentType.format(contentType.parse(str.replace(rawParameterValueRegExp, '$1"$2"')))
} catch (e) {
throw err // original error
}
}
} |
Hi @gwicke I hope it's OK if I close this PR; I know we discussed above what to do, and I feel like we did come up with a good plan: implement the same loose parsing algorithm as Chrome/Firefox/whatever and expose it through a parsing option (perhaps |
While the RFC is excluding slashes from tokens and thus parameter values, some existing content-type values do contain them, and existing content-type parsers are generally lenient when encountering them.