Multiple Link: headers in response #741

Closed
karlcow opened this Issue July 26, 2012 · 15 comments

5 participants

karl Kenneth Reitz Andrey Petrov Piotr Dobrogost Ian Cordasco
karl

Using 0.13.3 requests

So let's see, we have a URI with multiple links

curl -I http://www.la-grange.net/2012/07/26/csstests/foo
HTTP/1.1 200 OK
Date: Thu, 26 Jul 2012 22:27:21 GMT
Server: Apache
Content-Location: foo.html
Vary: negotiate,Accept-Encoding
TCN: choice
Last-Modified: Thu, 26 Jul 2012 20:55:56 GMT
Accept-Ranges: bytes
Content-Length: 675
Expires: Thu, 02 Aug 2012 22:27:21 GMT
Link: </2012/07/26/csstests/csshttplink.css>;rel=stylesheet
Link: </>;rel=next
Content-Type: text/html; charset=utf-8

Let's use requests to inspect the Link.

>>> import requests
>>> r = requests.get('http://www.la-grange.net/2012/07/26/csstests/foo')
>>> r.headers
{'content-length': '375', 'content-location': 'foo.html', 'content-encoding': 'gzip', 'accept-ranges': 'bytes', 'expires': 'Thu, 02 Aug 2012 22:19:41 GMT', 'vary': 'negotiate,Accept-Encoding', 'server': 'Apache', 'tcn': 'choice', 'last-modified': 'Thu, 26 Jul 2012 20:55:56 GMT', 'link': '</2012/07/26/csstests/csshttplink.css>;rel=stylesheet, </>;rel=next', 'date': 'Thu, 26 Jul 2012 22:19:41 GMT', 'content-type': 'text/html; charset=utf-8'}
>>> r.headers['link']
'</2012/07/26/csstests/csshttplink.css>;rel=stylesheet, </>;rel=next'

Not right :)

I haven't dug into the source code yet to find where the Link headers are parsed, but I'm putting this here before I forget.

Kenneth Reitz
Owner

This is a side effect of how httplib works, unfortunately.

karl

So from utils.py, requests uses parse_http_list(s) from urllib2.py to parse the headers, but that doesn't seem to be where it happens.
Is it in response.py?

Kenneth Reitz
Owner

It happens before any of that, at the actual httplib level, deep within urllib3.

karl

ah ok requests is a bit of syntactic sugar on the broken httplib for this part. :/ hmmm

>>> import httplib
>>> class HTTP11(httplib.HTTP):
...         _http_vsn = 11
...         _http_vsn_str = 'HTTP/1.1'
... 
>>> h = HTTP11('www.la-grange.net')
>>> h.putrequest('GET', 'http://www.la-grange.net/2012/07/26/csstests/foo')
>>> h.endheaders()
>>> status, reason, headers = h.getreply()
>>> h.close()
>>> headers.keys()
['content-length', 'content-location', 'accept-ranges', 'expires', 'vary', 'server', 'tcn', 'last-modified', 'link', 'date', 'content-type']
>>> headers['link']
'</2012/07/26/csstests/csshttplink.css>;rel=stylesheet, </>;rel=next'
>>>

Indeed.

Kenneth Reitz
Owner

Way more than syntactic sugar, but yes, that's where the problem lies.

We have plans to fix this I believe, right @shazow?

Andrey Petrov
Collaborator
shazow commented July 26, 2012

Correct:

shazow/urllib3#3

Most of the planning is done, someone just needs to implement it. :)

karl

Digging into httplib I see the comment about combining headers.

OK now onto the RFC 2616 section 4.2

Multiple message-header fields with the same field-name MAY be
present in a message if and only if the entire field-value for that
header field is defined as a comma-separated list [i.e., #(values)].
It MUST be possible to combine the multiple header fields into one
"field-name: field-value" pair, without changing the semantics of the
message, by appending each subsequent field-value to the first, each
separated by a comma. The order in which header fields with the same
field-name are received is therefore significant to the
interpretation of the combined field value, and thus a proxy MUST NOT
change the order of these field values when a message is forwarded.
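
The combining rule quoted above can be sketched in a few lines. This is only an illustration, not what httplib actually does internally: `combine_headers` and the `raw` list of (name, value) pairs are hypothetical names introduced here.

```python
from collections import OrderedDict


def combine_headers(raw_headers):
    """Fold repeated header fields into one comma-separated value each.

    `raw_headers` is a hypothetical list of (name, value) pairs in the
    order they were received; field names are compared case-insensitively.
    """
    combined = OrderedDict()
    for name, value in raw_headers:
        key = name.lower()
        if key in combined:
            # Append to the earlier value, preserving the received order.
            combined[key] = combined[key] + ", " + value.strip()
        else:
            combined[key] = value.strip()
    return combined


raw = [
    ("Link", "</2012/07/26/csstests/csshttplink.css>;rel=stylesheet"),
    ("Content-Type", "text/html; charset=utf-8"),
    ("Link", "</>;rel=next"),
]
headers = combine_headers(raw)
print(headers["link"])
```

With the two Link fields from the curl output, this yields the same single combined value that requests reported.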

Heading to the work of HTTPbis WG to see if it has been changed. Semantics draft document Section 3.2

Multiple header fields with the same field name MUST NOT be sent in a
message unless the entire field value for that header field is
defined as a comma-separated list [i.e., #(values)]. Multiple header
fields with the same field name can be combined into one "field-name:
field-value" pair, without changing the semantics of the message, by
appending each subsequent field value to the combined field value in
order, separated by a comma. The order in which header fields with
the same field name are received is therefore significant to the
interpretation of the combined field value; a proxy MUST NOT change
the order of these field values when forwarding a message.

OK there is nothing to fix then!

And my server implementation is wrong as it should send a list of csv for the header Link:.
Cool!

Now, what could be practical for requests is a way to access headers whose value is a comma-separated list as a sub-dict. In the meantime I will iterate on the list.
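
As a rough idea of what iterating on the list could look like, here is a minimal sketch that turns a combined Link value into a list of dicts. `split_link_values` and `parse_links` are hypothetical helpers written for this comment, not part of requests, and the parsing is deliberately simplified: it ignores backslash escapes and assumes no semicolon appears inside a quoted parameter value.

```python
import re


def split_link_values(value):
    """Split a combined Link value on commas outside <...> and "..."."""
    parts, current = [], []
    in_angle = in_quote = False
    for ch in value:
        if ch == '"' and not in_angle:
            in_quote = not in_quote
        elif ch == "<" and not in_quote:
            in_angle = True
        elif ch == ">" and not in_quote:
            in_angle = False
        if ch == "," and not in_angle and not in_quote:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_links(value):
    """Return one dict per link: its 'url' plus any parameters."""
    links = []
    for part in split_link_values(value):
        match = re.match(r"\s*<([^>]*)>(.*)", part)
        if not match:
            continue  # skip anything that is not a <URI> entry
        link = {"url": match.group(1)}
        # Simplification: parameters are split on every semicolon.
        for param in match.group(2).split(";"):
            if "=" in param:
                key, val = param.split("=", 1)
                link[key.strip()] = val.strip().strip('"')
        links.append(link)
    return links


value = "</2012/07/26/csstests/csshttplink.css>;rel=stylesheet, </>;rel=next"
links = parse_links(value)
by_rel = dict((link.get("rel"), link["url"]) for link in links)
print(by_rel)
```

Keying the result by `rel` gives the sub-dict style of access, although a real implementation would need to handle several links sharing the same `rel`.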

Closing the issue.

karl karlcow closed this July 26, 2012
Piotr Dobrogost

@karlcow

And my server implementation is wrong as it should send a list of csv for the header Link:.

I think you are wrong here. Firstly, there seems to be no change in rules regarding multiple header fields with the same field name between RFC 2616 and the draft of httpbis you cited. The wording has changed but the sense is the same. Secondly, RFC 5988 section 5 defines the value of Link header as a list of comma separated values (Link = "Link" ":" #link-value) which means, according to the rules you cited, that there CAN be multiple Link headers and that they CAN be merged into one, comma separated list of values.

karl

Multiple header fields with the same field name

header-field   = field-name ":" OWS field-value BWS

So let's say:

Link: </2012/07/26/csstests/csshttplink.css>;rel=stylesheet
Link: </>;rel=next

field-name is Link and Link, aka the same value.

MUST NOT be sent in a message

OK, we can't send the previous message. But there is one exception

unless the entire field value for that header field is defined as a comma-separated list [i.e., #(values)].

When the value is itself of the form a, b, c, d, we can't combine the fields into one header, because the comma separating the field values and the comma separating the items inside each value become indistinguishable, so the result can no longer be parsed back into the right components.

To be clearer, let's imagine a field named Foo whose value is defined as a list of 3 or 4 tokens, such as Foo: a, b, c, d.

Foo: a, b, c, d
Foo: e, f, g

You can NOT combine these two headers because it would not be possible to parse.

Foo: a, b, c, d, e, f, g

And the other hand, Link can be combined in one header.

Link: </2012/07/26/csstests/csshttplink.css>;rel=stylesheet, </>;rel=next
Piotr Dobrogost piotr-dobrogost referenced this issue in shazow/urllib3 January 08, 2013
Open

Use MultiDict for headers #3

karl

Set-Cookie is a specific issue. It is even mentioned

Note: The "Set-Cookie" header field as implemented in practice can occur multiple times, but does not use the list syntax, and thus cannot be combined into a single line ([RFC6265]). (See Appendix A.2.3 of [Kri2001] for details.) Also note that the Set-Cookie2 header field specified in [RFC2965] does not share this problem.
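
A small demonstration of why Set-Cookie cannot be combined: the date in an Expires attribute contains a comma of its own. The two cookie values below are made up for illustration.

```python
# Two hypothetical Set-Cookie field values; the Expires date in the
# first one contains a comma.
cookies = [
    "a=1; Expires=Thu, 02 Aug 2012 22:27:21 GMT",
    "b=2; Path=/",
]

# Combine them the way list-valued headers are combined...
combined = ", ".join(cookies)

# ...then try to recover the original values by splitting on commas.
naive = [piece.strip() for piece in combined.split(",")]
print(naive)
```

The split produces three pieces instead of the original two, because the comma inside the date is treated as a list separator.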

Piotr Dobrogost

Well, it's interesting how seemingly simple things are difficult to specify so that there's no room for different interpretations :)

(...) unless the entire field value for that header field is defined as a comma-separated list [i.e., #(values)]

I think the above statement is not entirely clear by itself. I guess the assumption is that no value (from values) includes an unquoted comma. If so, then taking your example, it would be legal to combine


Foo: a, b, c, d
Foo: e, f, g

into

Foo: a, b, c, d, e, f, g

as long as none of the letters represents a value with an unquoted comma. The reasoning is that all values can be parsed and all are treated equally as long as the order is intact.
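
Under that assumption, splitting the combined value back into its items only requires skipping commas inside quoted strings. A minimal sketch follows; `split_list_header` is a hypothetical helper, and dropping empty items is a choice made here rather than something taken from the discussion.

```python
def split_list_header(value):
    """Split a #(values) field value on commas outside quoted strings."""
    items, current = [], []
    in_quote = escaped = False
    for ch in value:
        if escaped:
            # Character following a backslash inside a quoted string.
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quote:
            current.append(ch)
            escaped = True
        elif ch == '"':
            in_quote = not in_quote
            current.append(ch)
        elif ch == "," and not in_quote:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    # Empty list elements are dropped.
    return [item for item in items if item]


print(split_list_header("a, b, c, d, e, f, g"))
print(split_list_header('a, "b, c", d'))
```

The combined Foo value splits into seven items in their original order, while a quoted comma stays inside its item.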

And the other hand, Link can be combined in one header.

Indeed, in two places RFC 5988 mandates quoting values if they contain a semicolon or comma. We would have to follow the whole grammar and make sure all possible commas are quoted, but I guess this is the case, so multiple Link header fields can be combined into one. However, I wasn't arguing with this (on the contrary, I stated the same myself) but with your statement "And my server implementation is wrong as it should send a list of csv for the header Link:.", which I think is not true.

Ian Cordasco
Collaborator

Why is this discussion taking place on a closed issue?

Kenneth Reitz
Owner

Sigh.

karl

@sigmavirus24 because it is still not clear whether it has to be reopened or not. I'm trying to understand the issue first. On the other hand, if there is a better place for it, I will be happy to discuss it there.

Piotr Dobrogost

For those interested, we asked for clarification on the httpbis working group's email list. See the thread starting with http://lists.w3.org/Archives/Public/ietf-http-wg/2013JanMar/0016.html
