Closed
Description
The function parse_header_links does not parse Link headers correctly if a link's attribute value contains commas or semicolons.
https://github.com/kennethreitz/requests/blob/master/requests/utils.py#L562
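To illustrate the failure mode, here is a minimal sketch. It assumes a parser that splits the header on every comma, which is consistent with the output shown in this report but is not copied from requests. The helper `naive_split` and the example.org URL are hypothetical.

```python
# Hypothetical single-link header with a quoted HTTP datetime.
header = '<http://example.org/a>; rel="memento"; datetime="Thu, 11 Sep 2014 06:57:56 GMT"'


def naive_split(value):
    """Split a Link header on every comma, ignoring quoted strings (illustrative only)."""
    return [part.strip() for part in value.split(",")]


parts = naive_split(header)
for part in parts:
    print(repr(part))
```

The single link comes back as two pieces, because the comma inside the quoted datetime is treated as a link separator. The second piece carries no URL at all.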
The Memento protocol (RFC 7089) allows HTTP datetimes in link attributes, and the link header parser cannot parse these reliably.
For example:
>>> r = requests.get('http://mementoweb.org/timegate/http://www.google.com', allow_redirects=False)
>>> r.headers['link']
"""<http://www.google.com>;rel="original" ,
<http://mementoweb.org/timemap/link/1/http://www.google.com>;rel="timemap"; type="application/link-format",
<http://mementoweb.org/timegate/http://www.google.com>;rel="timegate",
<http://web.archive.org/web/20140911065756/http://www.google.com/>;rel="memento"; datetime="Thu, 11 Sep 2014 06:57:56 GMT",
<http://web.archive.org/web/20131004112325/http://www.google.com/>;rel="memento first"; datetime="Sat, 04 Oct 1997 22:20:27 GMT",
<http://web.archive.org/web/20121011204401/https://www.google.com/>;rel="memento last"; datetime="Sat, 11 Oct 2014 10:05:51 GMT""""
>>> r.links
{'04 Oct 1997 22:20:27 GMT': {'url': '04 Oct 1997 22:20:27 GMT'},
'11 Oct 2014 10:05:51 GMT': {'url': '11 Oct 2014 10:05:51 GMT'},
'11 Sep 2014 06:57:56 GMT': {'url': '11 Sep 2014 06:57:56 GMT'},
'memento': {'datetime': 'Thu',
'rel': 'memento',
'url': 'http://web.archive.org/web/20140911065756/http://www.google.com/'},
'memento first': {'datetime': 'Sat',
'rel': 'memento first',
'url': 'http://web.archive.org/web/20131004112325/http://www.google.com/'},
'memento last': {'datetime': 'Sat',
'rel': 'memento last',
'url': 'http://web.archive.org/web/20121011204401/https://www.google.com/'},
'original': {'rel': 'original', 'url': 'http://www.google.com'},
'timegate': {'rel': 'timegate',
'url': 'http://mementoweb.org/timegate/http://www.google.com'},
'timemap': {'rel': 'timemap',
'type': 'application/link-format',
'url': 'http://mementoweb.org/timemap/link/1/http://www.google.com'}}
There are third-party link header parsers at the URLs below, but it would be very convenient if requests' own parse_header_links function parsed these headers correctly.
https://bitbucket.org/azaroth42/linkheaderparser/src/c2321bf3349b94a12a37ed8c41d4e4785006ada7/parse_link.py?at=default
https://gist.github.com/mnot/210535
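For discussion, here is a rough sketch of a quote-aware approach. It is not a proposed patch and is not taken from either parser linked above. The names `split_outside_quotes` and `parse_link_header` are hypothetical. It returns a list of dicts rather than the rel-keyed dict that `r.links` produces, and it does not unescape backslash sequences inside quoted values.

```python
def split_outside_quotes(text, sep):
    """Split text on sep, skipping separators inside "..." or <...>."""
    parts, current = [], []
    in_angle = in_quotes = escaped = False
    for ch in text:
        if in_angle:
            in_angle = ch != ">"          # leave the URI on the closing bracket
        elif escaped:
            escaped = False               # character after a backslash is literal
        elif in_quotes:
            if ch == "\\":
                escaped = True
            elif ch == '"':
                in_quotes = False
        elif ch == "<":
            in_angle = True
        elif ch == '"':
            in_quotes = True
        elif ch == sep:
            parts.append("".join(current))
            current = []
            continue                      # drop the separator itself
        current.append(ch)
    parts.append("".join(current))
    return parts


def parse_link_header(value):
    """Return one dict per link: 'url' plus any attributes found."""
    links = []
    for item in split_outside_quotes(value, ","):
        item = item.strip()
        if not item:
            continue
        url_part, *params = split_outside_quotes(item, ";")
        link = {"url": url_part.strip().strip("<>")}
        for param in params:
            key, _, val = param.partition("=")
            if key.strip():
                link[key.strip()] = val.strip().strip('"')
        links.append(link)
    return links


# Two of the links from the header in this report.
value = (
    '<http://www.google.com>;rel="original" , '
    '<http://web.archive.org/web/20140911065756/http://www.google.com/>'
    ';rel="memento"; datetime="Thu, 11 Sep 2014 06:57:56 GMT"'
)
links = parse_link_header(value)
print(links[1]["datetime"])
```

With this input the datetime attribute survives intact instead of being cut at the first comma.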