Add header line when writing to cookies.txt #293

Merged
merged 2 commits into from

3 participants

Mike Morearty Lee Jarvis Eric Hodel
Mike Morearty

When Python is reading a Netscape-format cookies.txt file, it seems to require an initial line that looks like this:

# Netscape HTTP Cookie File

Mechanize is not writing out a line like that, so Python scripts are unable to read cookie files that were created by Mechanize. This patch adds that line.

See this tiny project for a testable demo.

By the way: I noticed that in #257, the cookie-handling part of Mechanize is in the process of being split off into a separate gem. So I'll submit another similar pull request on the http-cookie project.

Mike Morearty mmorearty Add header line when writing to cookies.txt
When Python is reading a Netscape-format cookies.txt file, it seems to
require an initial line that looks like this:

    # Netscape HTTP Cookie File
f851b52
Mike Morearty mmorearty referenced this pull request in sparklemotion/http-cookie
Merged

Add header line when writing to cookies.txt #1

Lee Jarvis
Owner

Whilst I wouldn't usually accept a patch that appears to be covering up inefficiencies in outside libraries, I'm happy to merge this if you can remove the Netscape prefix from the line you've added; it's a little misleading.

@drbrain Whilst we're talking about this, did you want to add any extra meta information dumped along with the cookies, e.g. the Mechanize version?

Eric Hodel
Owner

Is adding such information allowed by the cookies.txt format? If so, let's do it.

Mike Morearty mmorearty Remove word "Netscape" from cookiejar header line
The header line was added for compatibility with Python, but Python
only requires "HTTP Cookie File"; the word "Netscape" is optional.
7df242a
Mike Morearty

@injekt You make a good point re: inefficiencies of other libraries; that is worth considering.

@drbrain Yes, the file format does allow this. Any line with "#" as the first character is considered a comment -- except in this quirky case where the first-line comment is sort of considered to be a file-format indicator.

I dug up the Python library source code (should have done that sooner). It does recognize the file if the header is just:

# HTTP Cookie File

So I modified my patch to remove the word "Netscape".
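To make the header check concrete, here is a minimal sketch that copies the pattern from the library's `magic_re` attribute (quoted in the source excerpt below) and applies it to a few candidate first lines. The helper name `looks_like_cookie_file` is hypothetical, and the sketch assumes the loader tests only the first line of the file; it is not the library's own loading code.

```python
import re

# Pattern copied from the `magic_re` attribute quoted in this thread.
MAGIC_RE = re.compile("#( Netscape)? HTTP Cookie File")


def looks_like_cookie_file(first_line: str) -> bool:
    """Return True if the given first line passes the header check.

    Hypothetical helper for illustration; assumes only the first line
    of the file is inspected.
    """
    return MAGIC_RE.search(first_line) is not None


if __name__ == "__main__":
    candidates = (
        "# Netscape HTTP Cookie File\n",                   # header with "Netscape"
        "# HTTP Cookie File\n",                            # header without it
        ".google.com\tTRUE\t/\tFALSE\t0\tname\tvalue\n",   # a cookie line, no header
    )
    for line in candidates:
        print(repr(line), looks_like_cookie_file(line))
```

Both header variants pass the check, while a file that starts directly with a cookie line does not, which is the situation Mechanize's output was in before this patch.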

Their code was clearly written a very long time ago. Here is the full text of their header comment, which goes into a fair amount of detail:

class MozillaCookieJar(FileCookieJar):
    """

    WARNING: you may want to backup your browser's cookies file if you use
    this class to save cookies.  I *think* it works, but there have been
    bugs in the past!

    This class differs from CookieJar only in the format it uses to save and
    load cookies to and from a file.  This class uses the Mozilla/Netscape
    `cookies.txt' format.  lynx uses this file format, too.

    Don't expect cookies saved while the browser is running to be noticed by
    the browser (in fact, Mozilla on unix will overwrite your saved cookies if
    you change them on disk while it's running; on Windows, you probably can't
    save at all while the browser is running).

    Note that the Mozilla/Netscape format will downgrade RFC2965 cookies to
    Netscape cookies on saving.

    In particular, the cookie version and port number information is lost,
    together with information about whether or not Path, Port and Discard were
    specified by the Set-Cookie2 (or Set-Cookie) header, and whether or not the
    domain as set in the HTTP header started with a dot (yes, I'm aware some
    domains in Netscape files start with a dot and some don't -- trust me, you
    really don't want to know any more about this).

    Note that though Mozilla and Netscape use the same format, they use
    slightly different headers.  The class saves cookies using the Netscape
    header by default (Mozilla can cope with that).

    """
    magic_re = re.compile("#( Netscape)? HTTP Cookie File")

Also, right after that part is the header they write out when asked to write a cookie file -- it links to a URL which no longer exists:

    header = """\
# Netscape HTTP Cookie File
# http://www.netscape.com/newsref/std/cookie_spec.html
# This is a generated file!  Do not edit.

A further data point: libcurl (and therefore curl) does write out the header line (including "Netscape"), along with a couple more lines:

# tell curl to write out cookies.txt
$ curl -c cookies.txt http://www.google.com/
...
$ cat cookies.txt
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.
....

But when reading a cookie file, curl does not require the header line.
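For contrast, a lenient reader in the spirit of what is described for curl could look roughly like the sketch below. It is not curl's code: the function name `read_cookies_txt_lenient`, the `CookieLine` type, and its field names are all made up for illustration. It treats every line starting with "#" as a comment, never demands a header, and expects seven tab-separated fields per cookie line.

```python
from typing import Iterable, List, NamedTuple


class CookieLine(NamedTuple):
    """One cookies.txt entry; field names are illustrative assumptions."""
    domain: str
    include_subdomains: bool
    path: str
    secure: bool
    expires: str
    name: str
    value: str


def read_cookies_txt_lenient(lines: Iterable[str]) -> List[CookieLine]:
    """Parse cookies.txt lines without requiring any header line."""
    cookies = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Blank lines and "#" comment lines (including a header, if one
        # is present) are skipped rather than required.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # ignore malformed lines instead of failing
        domain, flag, path, secure, expires, name, value = fields
        cookies.append(
            CookieLine(domain, flag == "TRUE", path, secure == "TRUE",
                       expires, name, value)
        )
    return cookies


if __name__ == "__main__":
    sample = [
        "# HTTP Cookie File\n",
        "\n",
        ".google.com\tTRUE\t/\tFALSE\t0\tPREF\tabc\n",
    ]
    # The same cookie is returned with or without the header line.
    print(read_cookies_txt_lenient(sample))
    print(read_cookies_txt_lenient(sample[2:]))
```

A strict reader would instead reject the second input outright, which is the behaviour reported for Python at the top of this thread.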

Mike Morearty

Okay, now I'm really confused -- it appears that there is also a problem with the cookies themselves: Python seems to require that the domain name of each cookie (the first field on each line) begin with a dot, e.g. ".google.com" instead of "google.com", which is what Mechanize writes out.

As for curl: Same pattern -- it always writes out the leading dot, but when reading, it doesn't mind if the dot is missing. Perhaps curl is following Postel's Law.

Anyway, let's not do anything hasty. I just happened across this header issue so I logged it, but I have never messed with cookiejar files before. Also, this issue about the leading dot is a separate issue from this header issue, so if anything, I'll log it separately. Just mentioning it here for completeness.

Lee Jarvis leejarvis merged commit 2c6c675 into from
Lee Jarvis
Owner

@mmorearty Thanks for researching this. I'll take some time to give it a closer look and see what I can come up with.

Lee Jarvis
Owner

I should also add to this conversation that the leading period has a meaning: it specifies the cookie's scope. .github.com would apply to foo.github.com and bar.github.com, but github.com is different and would not apply to wildcard subdomains.

If you specify a domain then the app should really always prefix the period, otherwise it should just use the host address.

from http://www.ietf.org/rfc/rfc2109.txt

Domain=domain
Optional. The Domain attribute specifies the domain for which the
cookie is valid. An explicitly specified domain must always start
with a dot.
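The scoping rule described above can be sketched as a small matching function. This is a simplified reading of the domain-match rule in RFC 2109 (either the strings are equal, or the cookie domain starts with a dot and the host name ends with it); the function name `domain_matches` is hypothetical, and real clients apply further checks that are left out here.

```python
def domain_matches(host: str, cookie_domain: str) -> bool:
    """Simplified domain-match check, for illustration only.

    Returns True when the names are identical, or when the cookie
    domain has a leading dot and the host name ends with it.
    """
    host = host.lower()
    cookie_domain = cookie_domain.lower()
    if host == cookie_domain:
        return True
    # Only a dot-prefixed domain extends to hosts beneath it.
    return cookie_domain.startswith(".") and host.endswith(cookie_domain)


if __name__ == "__main__":
    print(domain_matches("foo.github.com", ".github.com"))  # subdomain, dotted scope
    print(domain_matches("bar.github.com", ".github.com"))  # subdomain, dotted scope
    print(domain_matches("foo.github.com", "github.com"))   # subdomain, host-only scope
    print(domain_matches("github.com", "github.com"))       # exact host
```

Under this sketch, a cookie stored as github.com reaches only that exact host, while one stored as .github.com reaches the hosts beneath it, which is why dropping the dot on write changes the cookie's meaning.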

Mike Morearty mmorearty deleted the branch
Commits on Feb 18, 2013
  1. Mike Morearty

    Add header line when writing to cookies.txt

    mmorearty authored Mike Morearty committed
    When Python is reading a Netscape-format cookies.txt file, it seems to
    require an initial line that looks like this:
    
        # Netscape HTTP Cookie File
  2. Mike Morearty

    Remove word "Netscape" from cookiejar header line

    mmorearty authored Mike Morearty committed
    The header line was added for compatibility with Python, but Python
    only requires "HTTP Cookie File"; the word "Netscape" is optional.
Showing with 1 addition and 0 deletions.
  1. +1 −0  lib/mechanize/cookie_jar.rb
1  lib/mechanize/cookie_jar.rb
@@ -187,6 +187,7 @@ def load_cookiestxt(io)
   # Write cookies to Mozilla cookies.txt-style IO stream
   def dump_cookiestxt(io)
+    io.puts "# HTTP Cookie File"
     to_a.each do |cookie|
       io.puts([
         cookie.domain,