Add --cookies option to pass in a cookie file #47

Open
wants to merge 7 commits into
base: master
Conversation

brewingcode

This is a re-pullrequest of #31, but against the most recent master (since I'm a bit more up to speed with git now)

Use a browser extension such as:

https://chrome.google.com/webstore/detail/cookietxt-export/lopabhfecdfhgogdbojmaicoicjekelh

...to dump cookies into a file, and then pass in the filename with
"--cookies FILENAME". You can also copy the cookie file content, and
then do "--cookies <(pbpaste)" to skip the intermediate file.

@RSully

RSully commented Aug 13, 2013

I'd love to see this implemented without the need for temporary files.

@brewingcode
Author

RSully, I agree, but I didn't see a way for cookielib to load a set of cookies via anything except a filename.

If there is some other builtin set of libraries besides cookielib + urllib2 that can build a cookie header for me, I'd be happy to switch. I'm not very familiar with the various modules Python comes with.
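For readers following along, here is a minimal sketch of the cookie-file approach described above. It is written for Python 3, where `cookielib` and `urllib2` became `http.cookiejar` and `urllib.request`; the function names and the sample cookie are hypothetical and are not taken from this pull request.

```python
import http.cookiejar
import os
import tempfile
import urllib.request


def cookie_header_from_file(cookie_path, url):
    """Return the Cookie header value that applies to `url`, or None."""
    jar = http.cookiejar.MozillaCookieJar()
    # ignore_discard keeps session cookies, which browser exports often contain.
    jar.load(cookie_path, ignore_discard=True)
    request = urllib.request.Request(url)
    # Adds a Cookie header to the request if any stored cookie matches the URL.
    jar.add_cookie_header(request)
    return request.get_header("Cookie")


def write_sample_cookie_file():
    """Write a one-cookie Netscape file (made-up data) and return its path."""
    # Fields: domain, subdomain flag, path, secure, expiry, name, value.
    fields = ["example.com", "FALSE", "/", "FALSE", "4102444800", "session", "abc123"]
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("# Netscape HTTP Cookie File\n")
        tmp.write("\t".join(fields) + "\n")
    return tmp.name


if __name__ == "__main__":
    path = write_sample_cookie_file()
    try:
        print(cookie_header_from_file(path, "http://example.com/"))
    finally:
        os.remove(path)
```

The loader only takes a filename, which is why a named file (or a FIFO) is needed in this approach.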

@paulhammond
Owner

I don't like the cookie file as a user interface - ideally we'd accept cookie name/value pairs on the command line and use that to generate a cookie header manually (if urllib2 or cookielib can't do it then the format isn't that hard) then pass that to req.setValue_forHTTPHeaderField_().

Also, is req.setValue_forHTTPHeaderField_() enough? Will cookies sent that way also show up in the javascript document.cookie object? If not, is that a problem?

--cookie-file is the old behavior, where you pass in a filename (or
a FIFO). --cookie is used to pass in key-value pairs, either in a
single string, or with multiple --cookie options:

--cookie 'name1=value1; name2=value2; name3=value3'

...is the same as:

--cookie name1=value1 --cookie name2=value2 --cookie name3=value3

Both --cookie and --cookie-file are allowed at the same time, in
which case --cookie values are blindly appended to the end of the
cookies that are parsed out of the cookie file.
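
The option semantics described in this commit message could be sketched roughly as follows. This is an illustration only: it uses `argparse` as a stand-in for whatever option parser the script actually uses, and `parse_args` and `build_cookie_header` are hypothetical names.

```python
import argparse


def parse_args(argv):
    """Parse the two cookie options; --cookie may be given several times."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cookie-file", metavar="FILENAME")
    parser.add_argument("--cookie", action="append", default=[],
                        metavar="NAME=VALUE")
    return parser.parse_args(argv)


def build_cookie_header(file_cookies, cli_cookies):
    """Append each --cookie value after the cookies parsed from the file."""
    parts = []
    if file_cookies:
        parts.append(file_cookies)
    for value in cli_cookies:
        # Drop stray whitespace and semicolons at either end of the value.
        value = value.strip(" ;")
        if value:
            parts.append(value)
    return "; ".join(parts)


if __name__ == "__main__":
    args = parse_args(["--cookie", "name1=value1", "--cookie", "name2=value2"])
    print(build_cookie_header("", args.cookie))
```

Because the values are simply joined with "; ", a single quoted string and several separate --cookie options produce the same header.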
@brewingcode
Author

Fair enough. I've renamed the arguments to allow for both:

    --cookie-file=FILENAME
                        specify a Netscape cookie file
    --cookie=NAME=VALUE
                        specify a cookie name-value pair (multiple --cookie is
                        allowed)

I find the cookie file to be far more useful for me: I'm scraping my development Wordpress blog (requires user auth), and I do not want to mess around assembling all the cookies that are required to authenticate with Wordpress on the command line. It's far faster to simply dump the cookies out of my browser and then feed that directly into cookielib. I was following the lead of curl, which allows a Netscape cookie file with the -b option.

I'm not sure about req.setValue_forHTTPHeaderField_() setting the cookies in such a way that javascript can see them...it has not mattered for my purposes.

@brewingcode
Author

RSully, if you use --cookie, then no temporary file will be created, and none of the libraries I used to process the cookie file will even be loaded:

  • cookielib
  • urllib2
  • tempfile
  • os

# doesn't match a particular regex, so we always guarantee the magic_re
# will succeed
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write("# Netscape HTTP Cookie File\n")
Owner

I'm confused why webkit2png needs to fix the format of the file here - isn't that the responsibility of whatever tool created the file?

Author

Yep, it totally is the responsibility of the tool that created the file: but that tool was a Chrome plugin I don't have any control over. Since then, I've found another plugin that does generate the cookie dump with this required line, so I'd be fine removing this hack.

This line's non-existence was such a silly reason for cookielib to throw out an otherwise perfectly good cookie file, I was annoyed enough to code around it. And since I was already forced to write a named file, it was not a big reach to simply guarantee this line existed in the file.
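
For context, the workaround discussed here can be sketched as below. This is a variant rather than the code in the diff: it only adds the line when the first line does not already match, and the regex is my understanding of the pattern cookielib checks, so treat it as an assumption. `with_magic_header` is a hypothetical name.

```python
import re
import tempfile

# Assumed to mirror the first-line check that cookielib performs.
MAGIC_RE = re.compile(r"#( Netscape)? HTTP Cookie File")
MAGIC_LINE = "# Netscape HTTP Cookie File\n"


def with_magic_header(src_path):
    """Copy src_path to a new named file, adding the magic line if absent."""
    # Reading everything once also works when src_path is a FIFO.
    with open(src_path) as src:
        body = src.read()
    first_line = body.split("\n", 1)[0]
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        if not MAGIC_RE.search(first_line):
            tmp.write(MAGIC_LINE)
        tmp.write(body)
    # The caller is responsible for deleting the returned file.
    return tmp.name
```

A file that already starts with the magic line is copied unchanged, so the helper can be applied to any export.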

@paulhammond
Owner

@aperlscript Thanks for the updated code - having both as an option works for me. I just left one question as a code comment...

Also, before merging this I'd like to understand how webkit2png interacts with Safari's cookie jar. Right now it appears to use Safari's cookies, which feels like a bad thing to me, but I haven't had time to investigate why.

@paulhammond
Owner

(Also, don't worry about the part where this pull request will no longer automatically merge, I'll deal with that when it's ready)

@RSully

RSully commented Aug 15, 2013

I hadn't realized that this already used Safari's cookiejar - I didn't notice this during my testing. Right now to get around any auth issues I have been saving pages as webarchives and running webkit2png against that.

@brewingcode
Author

I don't think this touches Safari's cookies: if I pass in cookies for auth for webkit2png, and then go into Safari's cookies, I don't see any entries for the domain that used my auth cookies.

image

@jgallen23

any plans to pull this in?

@brewingcode
Author

Merged in the latest paulhammond/master, which necessitated a couple more minor changes.

@Saeven

Saeven commented Jan 28, 2014

Similarly, I'd love to have a means to tell it not to use Safari's cookie jar. --no-cookies

Alex added 2 commits January 28, 2014 11:29
By default, NSMutableURLRequest uses Safari's cookies. This option will
explicitly set the HTTP request header "Cookie" to empty.
@brewingcode
Author

I'm embarrassed to admit how long it took me to realize that NSURLRequest uses Safari's cookies by default, independent of the code that this pull request is adding. Consider me an idiot.

I've added an option to suppress this default behavior in the request object by simply setting Cookie to an empty string. I didn't see a way to remove the header altogether.

@raine

raine commented Jan 13, 2016

webkit2png doesn't use Safari's cookies for me. Maybe it has stopped working at some point?

raine added a commit to raine/webkit2png that referenced this pull request Jan 13, 2016
Add --cookies option to pass in a cookie file
@raine

raine commented Jan 13, 2016

I tried --cookie-file with output from the Chrome extension and it sent an empty string as the cookie string.

--cookie=FOO=BAR works though.

@RSully

RSully commented Jan 13, 2016

@raine see issue #94. Previously webkit2png used Safari's cookies. I'm not sure what changed, perhaps Safari's sandboxing.

@raine

raine commented Jan 14, 2016

For anyone interested, in my fork raine/master, I have this change and I added option --cookie-raw to allow setting the raw value of the Cookie header. I added it because it's easier to use with Chrome's "Copy as cURL" feature.
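
As a rough illustration of why a raw value pairs well with "Copy as cURL", the sketch below pulls the cookie string out of a copied curl command so it could be passed to an option like --cookie-raw. The helper name `cookie_from_curl` is hypothetical, and the assumption that the cookies arrive either as a `-H 'Cookie: ...'` header or as a `-b` argument may not hold for every browser version.

```python
import shlex


def cookie_from_curl(command):
    """Return the raw Cookie header value found in a curl command, or None."""
    tokens = shlex.split(command)
    # Walk over (flag, value) pairs of adjacent tokens.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-H", "--header"):
            name, _, rest = value.partition(":")
            if name.strip().lower() == "cookie":
                return rest.strip()
        elif flag in ("-b", "--cookie"):
            return value
    return None


if __name__ == "__main__":
    copied = "curl 'http://example.com/' -H 'Accept: */*' -H 'Cookie: a=1; b=2'"
    print(cookie_from_curl(copied))
```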

6 participants