remove "bad" whitespace ! #14

Closed

murataka

3.1.1.1. Media Type

HTTP uses Internet media types [RFC2046] in the Content-Type
(Section 3.1.1.5) and Accept (Section 5.3.2) header fields in order
to provide open and extensible data typing and type negotiation.
Media types define both a data format and various processing models:
how to process that data in accordance with each context in which it
is received.

 media-type = type "/" subtype *( OWS ";" OWS parameter )
 type       = token
 subtype    = token

The type/subtype MAY be followed by parameters in the form of
name=value pairs.

 parameter      = token "=" ( token / quoted-string )

Fielding & Reschke Standards Track [Page 8]

RFC 7231 HTTP/1.1 Semantics and Content June 2014

The type, subtype, and parameter name tokens are case-insensitive.
Parameter values might or might not be case-sensitive, depending on
the semantics of the parameter name. The presence or absence of a
parameter might be significant to the processing of a media-type,
depending on its definition within the media type registry.

A parameter value that matches the token production can be
transmitted either as a token or within a quoted-string. The quoted
and unquoted values are equivalent. For example, the following
examples are all equivalent, but the first is preferred for
consistency:

 text/html;charset=utf-8
 text/html;charset=UTF-8
 Text/HTML;Charset="utf-8"
 text/html; charset="utf-8"

Internet media types ought to be registered with IANA according to
the procedures defined in [BCP13].

  Note: Unlike some similar constructs in other header fields, media
  type parameters do not allow whitespace (even "bad" whitespace)
  around the "=" character.

@dougwilson
Contributor

The spec allows that whitespace character to exist; there is nothing bad about it. Your last note is specifically about whitespace around the "=" character, but your PR does not modify any whitespace next to a "=" character. Rather, it removes the whitespace character that follows the ";" character, which the text you quoted also shows is valid.

@dougwilson dougwilson closed this Aug 10, 2019
@dougwilson dougwilson self-assigned this Aug 10, 2019
@murataka murataka deleted the patch-1 branch August 10, 2019 19:53
@murataka
Author

murataka commented Aug 10, 2019

Both are OK, with or without whitespace.

The problem is that the RFC does not enforce the whitespace or the case-sensitivity.

When parsing the headers, I am sure using a standard removes the headache in many cases.

text/html;charset=utf-8
text/html;charset=UTF-8
Text/HTML;Charset="utf-8"
text/html; charset="utf-8"

@dougwilson
Contributor

Right, and this module does follow the standard. The space you removed is specified in the ABNF you pasted above:

 media-type = type "/" subtype *( OWS ";" OWS parameter )

The OWS token is right after the literal ";" in the specification. The OWS token is defined as the following (https://tools.ietf.org/html/rfc7230#section-3.2.3):

OWS            = *( SP / HTAB )

No parser following the standard will have any trouble parsing this, as it is in the standard. The SP token is defined as the character 0x20, which is the standard whitespace character used in this module.
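
As a rough illustration of the grammar quoted above, here is a minimal Python sketch of a parser that accepts optional whitespace around ";" but none around "=". It is not this module's code; `parse_media_type` is a hypothetical name, and the quoted-string handling is simplified compared with the full RFC grammar.

```python
import re

# Assumed character set for "token" (tchar, RFC 7230 section 3.2.6).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# Simplified quoted-string: any characters, with backslash escapes.
QUOTED = r'"(?:[^"\\]|\\.)*"'

TYPE_RE = re.compile(r"(%s)/(%s)" % (TOKEN, TOKEN))
# OWS ";" OWS parameter, where OWS = *( SP / HTAB ).
# No whitespace is accepted on either side of "=".
PARAM_RE = re.compile(r"[ \t]*;[ \t]*(%s)=(%s|%s)" % (TOKEN, TOKEN, QUOTED))


def parse_media_type(value):
    """Hypothetical helper: return (type/subtype, params) for a header value."""
    m = TYPE_RE.match(value)
    if m is None:
        raise ValueError("invalid media type")
    # Type and subtype are case-insensitive, so normalize to lowercase.
    media_type = (m.group(1) + "/" + m.group(2)).lower()
    params = {}
    pos = m.end()
    while pos < len(value):
        p = PARAM_RE.match(value, pos)
        if p is None:
            raise ValueError("invalid parameter at offset %d" % pos)
        # Parameter names are case-insensitive; values are kept as sent.
        name, raw = p.group(1).lower(), p.group(2)
        if raw.startswith('"'):
            # Strip the quotes and undo backslash escapes.
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        params[name] = raw
        pos = p.end()
    return media_type, params
```

With this sketch, `text/html;charset=utf-8`, `Text/HTML;Charset="utf-8"` and `text/html; charset="utf-8"` all produce the same result, while `text/html; charset = utf-8` is rejected because of the whitespace around "=".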

@dougwilson
Contributor

If you are wondering why this module emits a space that is optional, it's actually for improved compatibility. Basically any website you can think of that sends a parameter in its content-type header will include that optional single space after the ";" character. Here is, for example, the header from the site I linked to for the RFC:

$ curl -sI https://tools.ietf.org | grep -i content-type:
Content-Type: text/html; charset=UTF-8
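
For illustration only, a serializer that emits this common form (a single space after each ";") could look like the Python sketch below. `format_media_type` is a hypothetical name, not this module's API, and sorting the parameter names is an assumption made here just to keep the output deterministic.

```python
import re

# Assumed character set for "token" (tchar, RFC 7230 section 3.2.6).
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def format_media_type(media_type, params):
    """Hypothetical helper: build 'type/subtype; name=value' from its parts."""
    parts = [media_type]
    for name in sorted(params):  # sorted only for deterministic output
        value = params[name]
        if TOKEN_RE.fullmatch(value) is None:
            # Not a valid token: send as a quoted-string, escaping
            # double quotes and backslashes.
            value = '"' + re.sub(r'(["\\])', r"\\\1", value) + '"'
        # No whitespace around "=", a single SP after each ";".
        parts.append(name + "=" + value)
    return "; ".join(parts)


print(format_media_type("text/html", {"charset": "UTF-8"}))
```

The space after ";" is the optional OWS from the grammar, so a parser that follows the ABNF accepts the output with or without it.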

@murataka
Author

murataka commented Aug 10, 2019

I think this is caused by being used to tokenizing on spaces with strtok in C.

In this case, some old browsers used strtok for parsing headers.

This may not seem important, but when you are dealing with slow networks and low-power devices, standards must be strict; otherwise you have to handle many cases with very little CPU and memory.

In this case, you have to make your own standards: "data should not come in that format".

The standards must be strict for modular development to be possible. Otherwise it is a loss of time, power, and resources.

@dougwilson
Contributor

No disagreement there at a high level, but the standards are what they are and this module is following them as laid out. It sounds like you may need to redirect your efforts to the actual standards bodies to alter the standards. For example, even something as simple as getting the standard to add strong wording for a particular serialization could help in that regard. But I am not a member of any standards body, so talk about changing them is not going to go anywhere on this forum, at least.

RFC 7231 (https://tools.ietf.org/html/rfc7231) is the most up-to-date standard regarding the Content-Type header. They have a forum on GitHub in fact at https://github.com/httpwg/http-core if you are more comfortable using GitHub to make contact.
