Writing cookies: Prefix wildcard domains with "." #295

Merged
merged 2 commits into sparklemotion:master on Feb 20, 2013

3 participants

@mmorearty

There was some discussion related to this issue in issue #293. Making this change will allow for better interop with other libraries that read cookie files (see below). What follows is a whole lot of justification for a very small change :)

If a cookie matches a wildcard domain, e.g. (anything).example.com, then the line written out to a Netscape-formatted cookies.txt file must begin with a dot, e.g. .example.com not example.com. This is redundant with the second field of the line, a TRUE/FALSE value indicating whether the domain is a wildcard, but that's the way it goes.
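A minimal sketch of the rule described above, assuming the usual seven tab-separated fields of a cookies.txt line (domain, subdomain flag, path, secure flag, expiry, name, value). The helper name `netscape_cookie_line` and its keyword arguments are hypothetical and are not Mechanize's actual API:

```ruby
# Hypothetical helper for illustration only; not Mechanize's real code.
# Builds one line of a Netscape-format cookies.txt file, keeping the
# leading dot on the domain in sync with the TRUE/FALSE subdomain flag.
def netscape_cookie_line(domain:, wildcard:, name:, value:,
                         path: "/", secure: false, expires: 0)
  bare = domain.sub(/\A\./, "")               # normalize: drop any leading dot
  first_field = wildcard ? ".#{bare}" : bare  # re-add the dot only for wildcards
  [
    first_field,
    wildcard ? "TRUE" : "FALSE",  # second field: does the cookie match subdomains?
    path,
    secure ? "TRUE" : "FALSE",
    expires.to_i,                 # expiry as seconds since the epoch
    name,
    value
  ].join("\t")
end
```

With this sketch, a wildcard cookie for `example.com` is written with `.example.com` in the first field and `TRUE` in the second, while an exact-match cookie gets `example.com` and `FALSE`.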

This is a little tricky, because there is no formal spec for the Netscape cookie file format. (There is, of course, a formal spec for the format of the HTTP Set-Cookie header, but that's not the same thing as the cookie file.)

The unnecessary duplication (leading dot in the first field, plus TRUE/FALSE in the second field) seems to lead to all kinds of incompatibilities between different cookie libraries. For comparison:

  • curl/libcurl: When reading a cookie file, they allow, but ignore, the leading dot on the domain; they always respect the TRUE/FALSE flag in the second field. When writing a cookie file, if the second field is TRUE, they always put a leading "." on the domain.
  • Python: Requires that the presence or absence of the leading dot matches the TRUE/FALSE value in the second field; otherwise, rejects the whole cookie file.
  • Perl: The second field (TRUE/FALSE) is ignored. The presence or absence of a leading dot on the first field is used to determine whether the domain is a wildcard.
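The three reading behaviors can be modeled roughly as follows. This is a sketch of the behaviors as summarized in the list above, not the real code of curl, Python, or Perl; the function `parse_domain_fields` and the policy symbols are made up for illustration, and the `:perl` branch assumes the leading dot decides whether subdomains match:

```ruby
# Hypothetical model of how each reader interprets the first two fields
# of a cookies.txt line. Returns the bare domain and a wildcard flag.
def parse_domain_fields(domain_field, flag_field, policy)
  dotted  = domain_field.start_with?(".")
  flagged = flag_field == "TRUE"
  bare    = domain_field.sub(/\A\./, "")

  case policy
  when :curl
    # Leading dot is allowed but ignored; the flag decides.
    { domain: bare, wildcard: flagged }
  when :python
    # Dot and flag must agree; otherwise the file is rejected.
    raise ArgumentError, "dot/flag mismatch for #{domain_field}" unless dotted == flagged
    { domain: bare, wildcard: flagged }
  when :perl
    # Flag is ignored; the leading dot decides (assumption, see lead-in).
    { domain: bare, wildcard: dotted }
  else
    raise ArgumentError, "unknown policy: #{policy.inspect}"
  end
end
```

Feeding the mismatched pair `"example.com"` and `"TRUE"` through each policy shows the incompatibility: the `:curl` policy treats it as a wildcard, the `:perl` policy does not, and the `:python` policy raises.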

The upshot of all this is that curl will be able to correctly read cookie files that were generated by the current Mechanize, but Python and Perl scripts will not.

All of the above libraries (curl, Python, and Perl), when writing a cookie file, ensure that the presence or absence of a leading "." on the domain matches the TRUE/FALSE in the second field. Seems to me that Mechanize should do the same.

Mechanize's current behavior when reading a cookie file is fine: It allows a leading dot, but that is overridden by the TRUE/FALSE of the second field. That matches curl's behavior.

@mmorearty mmorearty Writing cookies: Prefix wildcard domains with "."
If a cookie matches a wildcard domain, e.g. "(anything).example.com",
then the line written out to a Netscape-formatted cookies.txt file must
begin with a dot, e.g. ".example.com" not "example.com".  This is
redundant with the second field of the line, a TRUE/FALSE value
indicating whether the domain is a wildcard, but that's the way it goes.
dae2733
@drbrain
Sparkle Motion member

A test should accompany this change to ensure we don't break it in the future

@mmorearty mmorearty Add test for cookies that match subdomains
A test to go with commit dae2733: Tests that when writing a cookies.txt
file:

* Cookies that only match exactly the domain specified must not have a
  leading dot, and must have FALSE as the second field.
* Cookies that match subdomains must have a leading dot, and must have
  TRUE as the second field.
85bda8f
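The two rules from the test commit above can be expressed as a single consistency check on a written line. This is only a sketch of the idea, not the test that was actually added; `consistent_cookie_line?` is a hypothetical name:

```ruby
# Hypothetical check: a cookies.txt line is consistent when the leading dot
# on the domain (field 1) agrees with the TRUE/FALSE flag (field 2).
def consistent_cookie_line?(line)
  domain, flag = line.split("\t", 3)
  return false unless %w[TRUE FALSE].include?(flag)  # malformed or missing flag
  domain.start_with?(".") == (flag == "TRUE")
end
```

A line starting with `.example.com` and `TRUE`, or with `example.com` and `FALSE`, passes; the mixed combinations fail.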
@leejarvis leejarvis merged commit 4afa07c into sparklemotion:master Feb 20, 2013

1 check passed: the Travis build passed
@leejarvis leejarvis added a commit that referenced this pull request Feb 20, 2013
@leejarvis leejarvis updated changes for #295 d509938