Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cookies need to be CookieJar-backed again. #281

Closed
kennethreitz opened this issue Nov 19, 2011 · 39 comments
Closed

Cookies need to be CookieJar-backed again. #281

kennethreitz opened this issue Nov 19, 2011 · 39 comments

Comments

@kennethreitz
Copy link
Contributor

No description provided.

@n1m3
Copy link

n1m3 commented Nov 19, 2011

Why?

@kennethreitz
Copy link
Contributor Author

k/v only doesn't account for same keys with different paths.

@n1m3
Copy link

n1m3 commented Nov 19, 2011

So you're gonna use kennethreitz/oreos then?

@kennethreitz
Copy link
Contributor Author

We'll see :)

@dhagrow
Copy link

dhagrow commented Feb 3, 2012

Why was this issue closed? I'm having problems with exactly this at the moment. As it is requests can not handle cookies across multiple domains.

There isn't really a good way to subclass Request or Response, so I ended up patching requests with a modified version of CookieJar that has a partial dict interface. It's not perfect, but at least I am able to take advantage of the many specifics that cookielib handles when deciding whether or not to send a cookie. The Cookie module (which you have patched in oreos), though cleaner than cookielib, contains none of that logic.

I should note also that urllib3 appears to parse headers incorrectly. I had to dig down to the httplib and rfc822 level in order to get a properly parsed cookie header.

If you think this would be useful, I should be able to share my changes.

@kennethreitz
Copy link
Contributor Author

I would love to see the code! Issue is closed because I can't do until later :)

@kennethreitz kennethreitz reopened this Feb 3, 2012
@dhagrow
Copy link

dhagrow commented Feb 3, 2012

You can find my changes at https://github.com/dhagrow/requests

You'll notice I'm using wrappers around requests.Request/Response to work with cookielib. An alternative would be to include a cookielib that's been patched to use requests.

@slingamn
Copy link
Contributor

Is there an official mechanism for interoperating with persistent FileCookieJars?

I wrote tests for using the functions in requests.utils to do it:

https://github.com/slingamn/requests/compare/lwpcookiejar

but they fail:

[shivaram@good-fortune ~/workspace/requests/tests]$ python ./test_cookiejars.py 
FF
======================================================================
FAIL: test_cookiejar_persistence (__main__.LWPCookieJarTest)
Test that we can save cookies to a FileCookieJar.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_cookiejars.py", line 42, in test_cookiejar_persistence
    self.assertEqual(requests.utils.dict_from_cookiejar(cookiejar_2), {'key', 'value'})
AssertionError: {} != set(['key', 'value'])

======================================================================
FAIL: test_cookiejar_persistence (__main__.MozCookieJarTest)
Test that we can save cookies to a FileCookieJar.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_cookiejars.py", line 42, in test_cookiejar_persistence
    self.assertEqual(requests.utils.dict_from_cookiejar(cookiejar_2), {'key', 'value'})
AssertionError: {} != set(['key', 'value'])

----------------------------------------------------------------------
Ran 2 tests in 1.207s

FAILED (failures=2)

BTW this library is gorgeous!

@kennethreitz
Copy link
Contributor Author

Thanks!

I'll add it back in eventually :)

@slingamn
Copy link
Contributor

Are you interested in pull requests (possibly based on dhagrow's work) that implement this?

@laurentb
Copy link
Contributor

I wasn't aware of this.

requests' cookie handling has many issues (no security at all: shares cookies between domains, does not handle expiration or secure cookies), and I had to use https://github.com/sashahart/cookies in order to have decent cookie parsing. It passed almost all my tests (for some reason it stripped the '.' before domains) and its API is very clean.
I ruled out using cookielib either, as the codebase is pretty outdated, and the cost of using wrappers or altering it seemed too high.

I implemented a simple cookie jar which works with requests and Cookies:
https://github.com/laurentb/weboob/tree/browser2/weboob/tools/browser2
While it isn't used yet by the project, it already has many tests.

Currently to use it I have to erase the Session cookies on requests (because it tries to merge them with the provided cookies), and replace requests' redirect handling by my own (because hooks aren't called on each redirected request).

My implementation does a bit more than strictly required, i.e. a simpler CookiePolicy could be used.

Maybe a good step could be to use Cookies instead of "oreos" and allow cookie handling to be delegated.

@piotr-dobrogost
Copy link

It would be very useful to have good support for cookies - https://github.com/sashahart/cookies looks nice.

@slingamn
Copy link
Contributor

Just my two cents, but I would much rather have either

  1. Support for standard-library cookielib
  2. Support for a correct and robust cookie library maintained by kennethreitz himself and tied to the requests project

rather than an external cookie library that seems to have no real community and only one maintainer.

@kennethreitz
Copy link
Contributor Author

  1. That's how it used to be, but it was removed to speed up the urllib3 transition. Would love to have have it back.
  2. That's what oreos' intension is. Right now, it's just a lazy monkeypatch for cookielib though.

I'd rather do #1 if we can get away with it easily w/o urllib2.

@slingamn
Copy link
Contributor

This is a rebase of dhagrow's patches on top of develop:

https://github.com/slingamn/requests/tree/dhagrow

Does anyone know how close it is to being finished? It seems mostly right.

@slingamn
Copy link
Contributor

If the answer to that question is "it's not clear", tell me and I can puzzle over it on your behalf :-)

We could also ping dhagrow himself.

@piotr-dobrogost
Copy link

@slingamn

From https://github.com/sashahart/cookies

This doesn't compete with the cookielib (http.cookiejar) module in the Python standard library, which is specifically
for implementing cookie storage and similar behavior in an HTTP client such as a browser.

As to

rather than an external cookie library that seems to have no real community and only one maintainer.

I wouldn't judge based on community's size or number of maintainers. Wouldn't Requests community become this library's community the moment Request starts using it?

@kennethreitz
Copy link
Contributor Author

Just because we can doesn't mean we should. I don't want to maintain a cookie library :)

@slingamn
Copy link
Contributor

@piotr-dobrogost

I actually need cookie storage for my use case. Support for LWPCookieJar would be ideal. Cookies are persistent by nature; it seems arbitrary to have a cookie library that can only persist cookies across the lifetime of a single Python process.

Also, what would the relationship of cookies be to requests exactly? cookies doesn't necessarily seem stable or mature enough to warrant a formal dependency on it as an external library. But the alternative to that is forking it and maintaining it inside the requests tree, which seems drastic (even assuming license-compatibility).

@piotr-dobrogost
Copy link

I guess some of these 299 people who forked Requests would help you :)
You already created library dealing with cookies; have you created it with the plan to not maintain it? :)

@kennethreitz
Copy link
Contributor Author

The module is practically non-existant. I'd be very happy to use the well-tested standard library modules if they're appropriate.

@piotr-dobrogost
Copy link

@kennethreitz

What do you mean by The module is practically non-existant.?

@kennethreitz
Copy link
Contributor Author

Oreos isn't an effort.

@slingamn
Copy link
Contributor

OK, I'll plan to look at dhagrow's branch, hopefully relatively soon.

He's presumably still subscribed to this thread, so I guess this counts as contacting him :-)

@slingamn
Copy link
Contributor

@dhagrow poke, people are interested in your work :-)

@laurentb
Copy link
Contributor

I've tested Python's cookie parsing, oreos, and Cookies.

It's the only one that correctly parses multiple cookies in one Set-Cookie:, and recognizes and parses all attributes, including a lot of weird cases. My only issue with it is that it removes the starting '.' from the domain entry if it's there; which is easy to overcome but requires monkeypatching at least for now.

It might not be popular, but it's still the only one that does not fail, and it has a lot of tests.

A CookieJar isn't something very complicated to implement — mine is less than 100 lines of code, most of my cookiejar.py is added policies and comments. As long as you have proper cookie objects with a good API.

I quickly looked at dhagrow's branch, and it does not seem to handle cookies in redirects properly — the cookie of the original request are always sent for the next request. I'm positively surprised though, using mock objects seems to require far less code than I thought. But as I said, it's not like the CookieJar is the hard part, and cookielib focuses on RFCs no one uses (sadly, it seems to be a common trend with web browsers, like GET after 302).

@slingamn
Copy link
Contributor

@laurentb

Can you provide a test such that cookielib has the incorrect behavior and Cookies has the right one?

There's still the issue with Cookies not supporting persistence to files.

@dhagrow
Copy link

dhagrow commented Apr 22, 2012

Sorry, I've been away on business. I haven't looked too much at sashahart's cookies, but as far as I know cookielib is the only complete implementation of client-side cookie handling for Python. Cookies has a much nicer interface for accessing cookies, but has none of the logic for whether or not a client should send a particular set of cookies. As laurentb mentions, a lot of the cookielib functionality is not often used, but certain important bits are, especially the code which handles the valid domains and paths for a cookie.

cookielib is not a very pretty module, and it is written very specifically around the urllib/urllib2 modules, so ideally it should be replaced by something a little cleaner and more modular. But, practically speaking, until someone is willing to put forth the effort to write and maintain such a library, cookielib is available and it works. That's why I went with the option to write mock urllib objects to pass into cookielib. The one part that I know is missing in my patch is correct handling of cookies in redirects, as laurentb mentions. That would require some changes on the requests end of things to support adding headers to a request that should not be sent in a redirect.

As I mentioned in one of my original posts, another option is to patch cookielib to use requests objects. That would not be very difficult, and for the eventual possibility of including requests in the stdlib, might be the preferable option. Unfortunately my own work does not currently allow me the time to do much more work on this than I have already done. So unless that changes or I find some free time to do it, it will be up to someone else to look at.

@slingamn
Copy link
Contributor

@dhagrow Thanks very much! I will look into updating your branch.

@slingamn
Copy link
Contributor

Can someone clarify the expected relationship between request.cookies and request.session.cookies? I'm thinking of eliminating the first in favor of the second.

@kennethreitz
Copy link
Contributor Author

request.cookies is the cookies the server set during the request.

Session cookies are persistent, like in a browser.

Both need to exist.

@slingamn
Copy link
Contributor

Oh, that makes sense. Thanks!

@slingamn
Copy link
Contributor

So it looks like the current behavior in requests is to unconditionally strip the Cookie header. From models.py:

                # Remove the cookie headers that were sent.
                headers = self.headers
                try:
                    del headers['Cookie']
                except KeyError:
                    pass

Can anyone help me find a specification (or a trustworthy reference) that explains exactly what should be done under what circumstances? I'm having trouble Googling the issue, unfortunately.

@slingamn
Copy link
Contributor

After asking around in IRC, this behavior seems like The Right Thing: it'll cause the Cookie header to be regenerated from self.cookies (in dhagrow's branch, an intelligent CookieJar that respects domains and paths) and sent fresh with the request that follows the redirection.

I've made progress on this in my fork (green tests and such), but it needs more cleanup and more testing. I hope to check back in soon.

@slingamn
Copy link
Contributor

See the pull request at #565.

kennethreitz pushed a commit that referenced this issue May 2, 2012
Support CookieJar, references #281
@slingamn
Copy link
Contributor

slingamn commented May 3, 2012

Close this? :-)

@slingamn
Copy link
Contributor

slingamn commented May 4, 2012

I'll open another ticket for documentation.

@slingamn slingamn closed this as completed May 4, 2012
@piotr-dobrogost
Copy link

@dhagrow

I should note also that urllib3 appears to parse headers incorrectly. I had to dig down to the httplib and rfc822 level in order to get a properly parsed cookie header.

Maybe the problem you had is the same as one in urllib3/urllib3#3 (Use MultiDict for headers)?

@dhagrow
Copy link

dhagrow commented May 4, 2012

@piotr-dobrogost

That looks like the one, yes.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants