remove "bad" whitespace ! #14
3.1.1.1. Media Type
HTTP uses Internet media types [RFC2046] in the Content-Type
(Section 3.1.1.5) and Accept (Section 5.3.2) header fields in order
to provide open and extensible data typing and type negotiation.
Media types define both a data format and various processing models:
how to process that data in accordance with each context in which it
is received.
  media-type = type "/" subtype *( OWS ";" OWS parameter )
  type       = token
  subtype    = token
The type/subtype MAY be followed by parameters in the form of
name=value pairs.
  parameter  = token "=" ( token / quoted-string )
Fielding & Reschke Standards Track [Page 8]
RFC 7231 HTTP/1.1 Semantics and Content June 2014
The type, subtype, and parameter name tokens are case-insensitive.
Parameter values might or might not be case-sensitive, depending on
the semantics of the parameter name. The presence or absence of a
parameter might be significant to the processing of a media-type,
depending on its definition within the media type registry.
A parameter value that matches the token production can be
transmitted either as a token or within a quoted-string. The quoted
and unquoted values are equivalent. For example, the following
examples are all equivalent, but the first is preferred for
consistency:
  text/html;charset=utf-8
  text/html;charset=UTF-8
  Text/HTML;Charset="utf-8"
  text/html; charset="utf-8"
Internet media types ought to be registered with IANA according to
the procedures defined in [BCP13].
Note: Unlike some similar constructs in other header fields, media
type parameters do not allow whitespace (even "bad" whitespace)
around the "=" character.
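A minimal Python sketch of a parser that follows the ABNF quoted above may help show exactly what the rules permit. The name `parse_media_type` and its `(type, subtype, params)` return shape are hypothetical, not this module's API. The sketch assumes the `token`, `quoted-string`, and `OWS` definitions from RFC 7230, and it treats only the `charset` value as case-insensitive (RFC 7231 identifies a charset by a case-insensitive token); every other parameter value is kept exactly as sent.

```python
import re
from typing import Dict, Tuple

# token = 1*tchar (assumed from RFC 7230, Section 3.2.6)
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string = DQUOTE *( qdtext / quoted-pair ) DQUOTE
QUOTED_STRING = r'"(?:[\t \x21\x23-\x5B\x5D-\x7E\x80-\xFF]|\\[\t \x21-\x7E\x80-\xFF])*"'
# OWS = *( SP / HTAB )
OWS = r"[ \t]*"

TYPE_RE = re.compile(rf"({TOKEN})/({TOKEN})")
# One repetition of: OWS ";" OWS parameter, where parameter = token "=" ( token / quoted-string ).
# There is deliberately no OWS around "=".
PARAM_RE = re.compile(rf"{OWS};{OWS}({TOKEN})=({TOKEN}|{QUOTED_STRING})")
QUOTED_PAIR_RE = re.compile(r"\\(.)")

# Assumption for this sketch: only "charset" has a case-insensitive value.
CASE_INSENSITIVE_VALUES = {"charset"}


def parse_media_type(value: str) -> Tuple[str, str, Dict[str, str]]:
    """Parse a Content-Type style value into (type, subtype, params)."""
    value = value.strip(" \t")  # drop whitespace surrounding the whole field value
    m = TYPE_RE.match(value)
    if m is None:
        raise ValueError(f"invalid type/subtype in {value!r}")
    # type, subtype and parameter names are case-insensitive: normalize to lowercase
    mtype, subtype = m.group(1).lower(), m.group(2).lower()
    params: Dict[str, str] = {}
    pos = m.end()
    while pos < len(value):
        p = PARAM_RE.match(value, pos)
        if p is None:
            raise ValueError(f"invalid parameter at offset {pos} in {value!r}")
        name, raw = p.group(1).lower(), p.group(2)
        if raw.startswith('"'):
            # quoted and unquoted forms are equivalent: strip DQUOTEs, undo quoted-pairs
            raw = QUOTED_PAIR_RE.sub(r"\1", raw[1:-1])
        if name in CASE_INSENSITIVE_VALUES:
            raw = raw.lower()
        params[name] = raw
        pos = p.end()
    return mtype, subtype, params
```

With this sketch, all four example values from the excerpt parse to `("text", "html", {"charset": "utf-8"})`, while `text/html; charset = utf-8` raises `ValueError` because the grammar allows no whitespace around `=`.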
The spec allows that whitespace character to exist; there is nothing bad about it. Your last note is specifically about whitespace around the "=" character, but your PR does not modify any whitespace next to a "=" character. Rather, you are removing the whitespace character that follows the ";" character.
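To make the distinction concrete, the media-type rule can be transcribed into a single regular expression. This is a hedged sketch: `is_valid_media_type` is a hypothetical helper, and the `token`, `quoted-string`, and `OWS` building blocks are assumed from RFC 7230. It accepts optional whitespace on either side of `;` and rejects whitespace on either side of `=`.

```python
import re

# Building blocks (assumed from RFC 7230)
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QUOTED_STRING = r'"(?:[\t \x21\x23-\x5B\x5D-\x7E\x80-\xFF]|\\[\t \x21-\x7E\x80-\xFF])*"'
OWS = r"[ \t]*"

# media-type = type "/" subtype *( OWS ";" OWS parameter )
# parameter  = token "=" ( token / quoted-string )
MEDIA_TYPE_RE = re.compile(
    rf"{TOKEN}/{TOKEN}(?:{OWS};{OWS}{TOKEN}=(?:{TOKEN}|{QUOTED_STRING}))*"
)


def is_valid_media_type(value: str) -> bool:
    """True if the whole string matches the media-type rule."""
    return MEDIA_TYPE_RE.fullmatch(value) is not None


print(is_valid_media_type("text/html; charset=utf-8"))   # True: OWS after ";" is allowed
print(is_valid_media_type("text/html; charset = utf-8"))  # False: no whitespace around "="
```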
Both are OK, with or without whitespace. The problem is that the RFC does not mandate a single choice for either the whitespace or the letter case. When parsing headers, I am sure that using one standard form removes the headache in many cases: text/html;charset=utf-8
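Both styles are valid output. As an illustration only (not this module's implementation), a hypothetical `format_media_type` helper could make the separator a parameter: `"; "` for the widely used spaced style, `";"` for the compact form RFC 7231 lists first. The sketch lowercases the case-insensitive type, subtype, and parameter names, leaves values as given, and quotes any value that is not a valid token.

```python
import re
from typing import Mapping

TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
# Characters representable inside a quoted-string: HTAB, SP, VCHAR, obs-text
ALLOWED_IN_QUOTED_RE = re.compile(r"[\t\x20-\x7E\x80-\xFF]*")


def _format_value(value: str) -> str:
    """Emit a token as-is; otherwise emit a quoted-string."""
    if TOKEN_RE.fullmatch(value):
        return value
    if not ALLOWED_IN_QUOTED_RE.fullmatch(value):
        raise ValueError(f"value cannot be represented in a header: {value!r}")
    # escape backslash and DQUOTE as quoted-pairs
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def format_media_type(type_: str, subtype: str,
                      params: Mapping[str, str], sep: str = "; ") -> str:
    """Serialize type/subtype plus parameters, using `sep` between them."""
    # sep must be ";" with optional spaces/tabs (OWS) around it
    if sep.strip(" \t") != ";":
        raise ValueError("sep must be ';' with optional surrounding OWS")
    for tok in (type_, subtype, *params):
        if not TOKEN_RE.fullmatch(tok):
            raise ValueError(f"invalid token: {tok!r}")
    out = f"{type_}/{subtype}".lower()
    for name, value in params.items():
        out += f"{sep}{name.lower()}={_format_value(value)}"
    return out


print(format_media_type("text", "html", {"charset": "utf-8"}))           # text/html; charset=utf-8
print(format_media_type("text", "html", {"charset": "utf-8"}, sep=";"))  # text/html;charset=utf-8
```

Keeping the separator configurable lets a caller choose between the compact RFC-preferred form and the spaced form most servers send, without touching the quoting logic.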
Right, and this module does follow the standard. The space you removed is specified in the ABNF you pasted above:
The
No parser that follows the standard will have any trouble parsing this, since the standard permits it. The
If you are wondering why this module emits a space that is optional, it's for improved compatibility. Basically any website you can think of that sends a parameter in its Content-Type header includes that optional single space after the ";" character. Here is, for example, the header from the site I linked to for the RFC:
I think this comes from being used to tokenizing on spaces in C with strtok; some old browsers used strtok to parse headers. This may not seem important, but when you are dealing with slow networks and low-power devices, standards must be strict; otherwise you have to handle many variations with very little CPU and memory. In that case you end up making your own rule: "data should not come in that format". Standards must be strict for modular development to be possible. Otherwise, time, power, and resources are wasted.
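The tokenizing concern can be illustrated with a small Python sketch. `strtok_like` emulates repeated C `strtok()` calls, while `naive_params` and `ows_params` are hypothetical helpers, not code from any browser or from this module; both ignore quoted-strings that contain `;`, so neither is a complete parser. Splitting on `;` without trimming leaves a leading space in the parameter name, trimming OWS costs one `strip` call per parameter, and treating the space itself as a delimiter (as `strtok` with `"; "` would) breaks quoted values that contain spaces.

```python
from typing import Dict, List


def strtok_like(value: str, delims: str) -> List[str]:
    """Emulate repeated C strtok() calls: split on any delimiter char, drop empty tokens."""
    tokens: List[str] = []
    current: List[str] = []
    for ch in value:
        if ch in delims:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def naive_params(value: str) -> Dict[str, str]:
    """Split on ';' and '=' but never trim whitespace (ignores quoted-strings)."""
    params: Dict[str, str] = {}
    for piece in value.split(";")[1:]:
        name, _, val = piece.partition("=")
        params[name] = val
    return params


def ows_params(value: str) -> Dict[str, str]:
    """Same split, but trim SP/HTAB from both ends of each piece (ignores quoted-strings)."""
    params: Dict[str, str] = {}
    for piece in value.split(";")[1:]:
        name, _, val = piece.strip(" \t").partition("=")
        params[name] = val
    return params


print(naive_params("text/html; charset=utf-8"))       # {' charset': 'utf-8'}
print(ows_params("text/html; charset=utf-8"))         # {'charset': 'utf-8'}
print(strtok_like('text/plain; title="a b"', "; "))   # ['text/plain', 'title="a', 'b"']
```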
No disagreement there at a high level, but the standards are what they are, and this module follows them as laid out. It sounds like you may need to redirect your efforts to the standards bodies themselves to change the standards. For example, even something as simple as getting stronger wording in the standard favoring a particular serialization could help in that regard. But I am not a member of any standards body, so talk about changing them is not going to go anywhere on this forum, at least. RFC 7231 (https://tools.ietf.org/html/rfc7231) is the most up-to-date standard regarding the