Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide non-splitting parseHeaderField in httpcore #15478

Closed
wants to merge 3 commits into from

Conversation

supakeen
Copy link
Contributor

@supakeen supakeen commented Oct 3, 2020

Several issues are related to header splitting, one of them being #11757. This PR provides a noSplit option to httpcore's parseHeader to allow for several other issues to be worked on.

Copy link
Contributor

@dom96 dom96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the way to go, we just need to follow the RFC here and parse things properly. Do other languages provide this option as well? if so you might be able to convince me, otherwise I don't think we should add this.

@supakeen
Copy link
Contributor Author

supakeen commented Oct 3, 2020

While I agree that following specification is always better the goal of parseHeader is to parse a single header line, splitting a header on , or combining multiple same-name header fields is up to a function higher in the tree that has the context of all headers. Both splitting and not splitting are used for different headers.

Currently people are running into issues with this around headers such as the User-Agent header in asynchttpserver which is being split on commas. It would be up to asynchttpserver to tell parseHeader to not split a value (or perhaps a step in between) and I'd say this is preferable to joining instead.

Since we'll be discussing spec/not-spec, I'll quote the relevant section here for ease-of-discussion:

4.2 Message Headers

HTTP header fields, which include general-header (section 4.5),
request-header (section 5.3), response-header (section 6.2), and
entity-header (section 7.1) fields, follow the same generic format as
that given in Section 3.1 of RFC 822 [9]. Each header field consists
of a name followed by a colon (":") and the field value. Field names
are case-insensitive. The field value MAY be preceded by any amount
of LWS, though a single SP is preferred. Header fields can be
extended over multiple lines by preceding each extra line with at
least one SP or HT. Applications ought to follow "common form", where
one is known or indicated, when generating HTTP constructs, since
there might exist some implementations that fail to accept anything
beyond the common forms.

   message-header = field-name ":" [ field-value ]
   field-name     = token
   field-value    = *( field-content | LWS )
   field-content  = <the OCTETs making up the field-value
                    and consisting of either *TEXT or combinations
                    of token, separators, and quoted-string>

The field-content does not include any leading or trailing LWS:
linear white space occurring before the first non-whitespace
character of the field-value or after the last non-whitespace
character of the field-value. Such leading or trailing LWS MAY be
removed without changing the semantics of the field value. Any LWS
that occurs between field-content MAY be replaced with a single SP
before interpreting the field value or forwarding the message
downstream.

The order in which header fields with differing field names are
received is not significant. However, it is "good practice" to send
general-header fields first, followed by request-header or response-
header fields, and ending with the entity-header fields.

Multiple message-header fields with the same field-name MAY be
present in a message if and only if the entire field-value for that
header field is defined as a comma-separated list [i.e., #(values)].
It MUST be possible to combine the multiple header fields into one
"field-name: field-value" pair, without changing the semantics of the
message, by appending each subsequent field-value to the first, each
separated by a comma. The order in which header fields with the same
field-name are received is therefore significant to the
interpretation of the combined field value, and thus a proxy MUST NOT
change the order of these field values when a message is forwarded.

@dom96
Copy link
Contributor

dom96 commented Oct 4, 2020

hmmm, looking at the spec it appears that this function should never split on a comma and indeed it should be up to the caller to decide what to do with the field values.

My proposal is to do the following:

  • Deprecate parseHeader
  • Implement a non-splitting variant called parseHeaderFieldValue (we can argue about the name)

The reason is that we should be discouraging the use of the splitting variant.

@supakeen
Copy link
Contributor Author

supakeen commented Oct 5, 2020

Sure we can argue about name (I'd go for parseHeaderField). Would you like me to update this PR to do the above or do you prefer to also have other changes in this PR?

@dom96
Copy link
Contributor

dom96 commented Oct 5, 2020

Would you like me to update this PR to do the above

This

@supakeen
Copy link
Contributor Author

supakeen commented Oct 5, 2020

Perfect, I'll get to it somewhere this week then 👍

@supakeen
Copy link
Contributor Author

@dom96 I've updated this to create a non-splitting version of parseHeader called parseHeaderField. How do I deprecate the old one and is this what you had in mind?

@supakeen supakeen changed the title Support noSplit for parseHeader in httpcore. Provide non-splitting parseHeaderField in httpcore Oct 17, 2020
@supakeen
Copy link
Contributor Author

supakeen commented Nov 1, 2020

I've deprecated the old parseHeader in favor of parseHeaderField. It seems the only call site in lib/ is in asynchttpserver.nim. This does change the behaviour of asynchttpserver so it still needs a test.

Shall I special case the cookie header in asynchttpserver and write a test for it @dom96 ?

Copy link
Contributor

@dom96 dom96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, more tests are always nice.

Comment on lines +243 to +256
func parseHeaderField*(line: string): tuple[key: string, value: seq[string]] =
## Parses a single raw header HTTP line into key value pairs.
result.value = @[]
var i = 0
i = line.parseUntil(result.key, ':')
inc(i) # skip :
if i < len(line):
i += line.skipWhitespace(i)
result.value.add line.substr(i)
elif result.key.len > 0:
result.value = @[""]
else:
result.value = @[]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I missing something? This won't ever return more than one value.

@stale
Copy link

stale bot commented Nov 3, 2021

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

@stale stale bot added the stale Staled PR/issues; remove the label after fixing them label Nov 3, 2021
@stale stale bot closed this Dec 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Staled PR/issues; remove the label after fixing them
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants