
Link header parsing breaks if attribute value has commas #2250

@hariharshankar

The function parse_header_links does not parse Link headers correctly when an attribute value contains commas or semicolons.
https://github.com/kennethreitz/requests/blob/master/requests/utils.py#L562

The Memento protocol (RFC 7089) allows HTTP datetimes in link attributes. These values contain commas, so the link header parser cannot parse them reliably.
For example:

>>> r = requests.get('http://mementoweb.org/timegate/http://www.google.com', allow_redirects=False)
>>> r.headers['link']
"""<http://www.google.com>;rel="original" ,
<http://mementoweb.org/timemap/link/1/http://www.google.com>;rel="timemap"; type="application/link-format",
<http://mementoweb.org/timegate/http://www.google.com>;rel="timegate",
<http://web.archive.org/web/20140911065756/http://www.google.com/>;rel="memento"; datetime="Thu, 11 Sep 2014 06:57:56 GMT",
<http://web.archive.org/web/20131004112325/http://www.google.com/>;rel="memento first"; datetime="Sat, 04 Oct 1997 22:20:27 GMT",
<http://web.archive.org/web/20121011204401/https://www.google.com/>;rel="memento last"; datetime="Sat, 11 Oct 2014 10:05:51 GMT""""
>>> r.links
{'04 Oct 1997 22:20:27 GMT': {'url': '04 Oct 1997 22:20:27 GMT'},
 '11 Oct 2014 10:05:51 GMT': {'url': '11 Oct 2014 10:05:51 GMT'},
 '11 Sep 2014 06:57:56 GMT': {'url': '11 Sep 2014 06:57:56 GMT'},
 'memento': {'datetime': 'Thu',
  'rel': 'memento',
  'url': 'http://web.archive.org/web/20140911065756/http://www.google.com/'},
 'memento first': {'datetime': 'Sat',
  'rel': 'memento first',
  'url': 'http://web.archive.org/web/20131004112325/http://www.google.com/'},
 'memento last': {'datetime': 'Sat',
  'rel': 'memento last',
  'url': 'http://web.archive.org/web/20121011204401/https://www.google.com/'},
 'original': {'rel': 'original', 'url': 'http://www.google.com'},
 'timegate': {'rel': 'timegate',
  'url': 'http://mementoweb.org/timegate/http://www.google.com'},
 'timemap': {'rel': 'timemap',
  'type': 'application/link-format',
  'url': 'http://mementoweb.org/timemap/link/1/http://www.google.com'}}
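
The result above is consistent with a parser that splits the header on every comma before considering quoting. A minimal sketch of that failure mode follows; the use of a plain `str.split(",")` is an assumption made for illustration rather than a quote of the library code, and the `example.org` URL is hypothetical:

```python
# One link-value whose quoted datetime attribute contains a comma.
header = '<http://example.org/a>;rel="memento"; datetime="Thu, 11 Sep 2014 06:57:56 GMT"'

# Assumed behaviour: split on every comma, including the one inside the quotes.
parts = header.split(",")

# The single link is cut into two fragments. The first ends at 'datetime="Thu',
# and the second has no <url> at all, only the rest of the date.
for part in parts:
    print(repr(part))
```

This would explain both symptoms in the output: `'datetime': 'Thu'` on the memento links, and the extra entries keyed by the remainder of each date.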

There are third-party link header parsers at the URLs below, but it would be very convenient if requests' own parse_header_links function parsed these headers correctly.

https://bitbucket.org/azaroth42/linkheaderparser/src/c2321bf3349b94a12a37ed8c41d4e4785006ada7/parse_link.py?at=default
https://gist.github.com/mnot/210535
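
For illustration, here is one possible quote-aware approach, sketched under stated assumptions. It is not the code from either parser linked above, and `parse_link_header` is a hypothetical name, not an existing requests function. It treats a double-quoted value as a single token, so commas and semicolons inside it are not treated as separators. It does not handle backslash escapes inside quoted strings or a `<` inside a quoted value.

```python
import re

# One link-value: <url> followed by zero or more ";key" or ";key=value" params.
# A quoted value is consumed whole, so separators inside the quotes are ignored.
_LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[^\s=;,]+\s*(?:=\s*(?:"[^"]*"|[^";,\s]*))?)*)'
)
# One parameter: group 1 is the key, group 2 a quoted value, group 3 a bare value.
_PARAM_RE = re.compile(r';\s*([^\s=;,]+)\s*(?:=\s*(?:"([^"]*)"|([^";,\s]*)))?')


def parse_link_header(value):
    """Return a list of dicts, one per link, each with 'url' plus its params."""
    links = []
    for link_match in _LINK_RE.finditer(value):
        link = {"url": link_match.group(1)}
        for param in _PARAM_RE.finditer(link_match.group(2)):
            key = param.group(1)
            quoted, bare = param.group(2), param.group(3)
            # Keep the first occurrence if a parameter is repeated.
            link.setdefault(key, quoted if quoted is not None else (bare or ""))
        links.append(link)
    return links


header = (
    '<http://www.google.com>;rel="original" , '
    '<http://web.archive.org/web/20140911065756/http://www.google.com/>'
    ';rel="memento"; datetime="Thu, 11 Sep 2014 06:57:56 GMT"'
)
for link in parse_link_header(header):
    print(link)
```

With this approach the `datetime` value for the memento link stays intact as `Thu, 11 Sep 2014 06:57:56 GMT` and no spurious entries are created from the date fragments.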
