Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to correctly save and load cookies? #895

Closed
mengyyy opened this issue Apr 5, 2020 · 9 comments
Closed

How to correctly save and load cookies? #895

mengyyy opened this issue Apr 5, 2020 · 9 comments
Labels
question Further information is requested

Comments

@mengyyy
Copy link

mengyyy commented Apr 5, 2020

Question

I can only load cookies once and then it will occur CookieConflict Excpetion.
Could you please tell me how to correctly load and save cookies?

import pickle
import httpx

def save_cookies():
    cookies_dict = dict()
    for k, v in client.cookies.items():
        cookies_dict[k] = v
    with open("cookies.pk", "wb") as f:
        pickle.dump(cookies_dict, f)

def load_cookies():
    if not os.path.isfile("cookies.pk"):
        return None
    cookies_dict = None
    with open("cookies.pk", "rb") as f:
        cookies_dict = pickle.load(f)
    return cookies_dict

# this can success load cookies
client = httpx.Client(cookies=load_cookies())
# but could not save cookies again

Will cause CookieConflict error

CookieConflict                            Traceback (most recent call last)
<ipython-input-7-0a74a4172586> in <module>
      1 cookies_dict = dict()
----> 2 for k, v in client.cookies.items():
      3     cookies_dict[k] = v
      4

/usr/lib/python3.8/_collections_abc.py in __iter__(self)
    742     def __iter__(self):
    743         for key in self._mapping:
--> 744             yield (key, self._mapping[key])
    745
    746 ItemsView.register(dict_items)

/usr/local/lib/python3.8/dist-packages/httpx/_models.py in __getitem__(self, name)
   1155
   1156     def __getitem__(self, name: str) -> str:
-> 1157         value = self.get(name)
   1158         if value is None:
   1159             raise KeyError(name)

/usr/local/lib/python3.8/dist-packages/httpx/_models.py in get(self, name, default, domain, path)
   1108                         if value is not None:
   1109                             message = f"Multiple cookies exist with name={name}"
-> 1110                             raise CookieConflict(message)
   1111                         value = cookie.value
   1112

CookieConflict: Multiple cookies exist with name=ipb_coppa

And i check the cookies keys()

In [10]: list(i for i in self.client.cookies.keys())
Out[10]:
['ipb_coppa',
 'ipb_member_id',
 'ipb_pass_hash',
 'ipb_session_id',
 'ipb_member_id',
 'ipb_pass_hash',
 'ipb_session_id',
 'ipb_coppa']

@florimondmanca florimondmanca added the question Further information is requested label Apr 5, 2020
@florimondmanca
Copy link
Member

From what I can see this might be a bug in the handling of the cookies client parameter.

Can you provide a repro example that completely drops the pickling logic? That is, pass the contents of your pickled file as a plain dict literal and see if you can iterate over client.cookies.

@mengyyy
Copy link
Author

mengyyy commented Apr 5, 2020

From what I can see this might be a bug in the handling of the cookies client parameter.

Can you provide a repro example that completely drops the pickling logic? That is, pass the contents of your pickled file as a plain dict literal and see if you can iterate over client.cookies.

import httpx

# client 1
client = httpx.Client()
client.get("http://www.google.com")


cookies = dict()
for k, v in client.cookies.items():
    cookies[k] = v

#cookies  
{'1P_JAR': '2020-04-05-13',
 'NID': '201=ssssssss}


# client2
client2 = httpx.Client(cookies=cookies)
client2.get("http://www.google.com")

# cause exception
for k, v in client2.cookies.items():
    print(k, v)

In [13]: [i for i in client2.cookies.keys()]
Out[13]: ['1P_JAR', 'NID', '1P_JAR', 'NID']


In [15]: for k, v in client2.cookies.items():
    ...:     print(k, v)
    ...:
---------------------------------------------------------------------------
CookieConflict                            Traceback (most recent call last)
<ipython-input-15-7b3181fd77d4> in <module>
----> 1 for k, v in client2.cookies.items():
      2     print(k, v)
      3

/usr/lib/python3.6/_collections_abc.py in __iter__(self)
    742     def __iter__(self):
    743         for key in self._mapping:
--> 744             yield (key, self._mapping[key])
    745
    746 ItemsView.register(dict_items)

/usr/local/lib/python3.6/dist-packages/httpx/_models.py in __getitem__(self, name)
   1155
   1156     def __getitem__(self, name: str) -> str:
-> 1157         value = self.get(name)
   1158         if value is None:
   1159             raise KeyError(name)

/usr/local/lib/python3.6/dist-packages/httpx/_models.py in get(self, name, default, domain, path)
   1108                         if value is not None:
   1109                             message = f"Multiple cookies exist with name={name}"
-> 1110                             raise CookieConflict(message)
   1111                         value = cookie.value
   1112

CookieConflict: Multiple cookies exist with name=1P_JAR

@florimondmanca
Copy link
Member

Okay, thanks for the code example.

From my perspective the CookieConflict is probably legit behavior here, and depends on Google's cookies policy across "sessions" (which here translates to a Client instance).

You're trying to reuse an old set of cookies from a previous session, but Google forcibly sends you a fresh pair for this new session, so you end up with multiple versions of the 1P_JAR and NID cookies, which is a cookie conflict.

I assume most cookies are session-sensitive like this, so I'm not even sure in what context your initial use case of reusing cookies from a pickle file would be valid.

But in any case, if that's what you want to do then I'd recommend making sure you remove any session-sensitive cookies from a cookies dict before reusing it on a new client. (Which cookies are session-sensitive depends on the server you're talking to.)

Going to close this for now, as behavior seems legit on HTTPX side… Thanks!

@oczkers
Copy link
Contributor

oczkers commented Apr 8, 2020

Problem is that importing cookies from dict sets domain to empty string and website is setting cookies with domain so we end up with two cookies with exactly the same key valid for that domain. Sure it is not httpx issue but user who didn't set correct domain when importing cookies but it might generate few hard to understand problems for a lot of people.

For example if we load website that sets up cookie and then use cookies.update we end up with two cookies with the same key and httpx is going to use "old" value with correct domain.
Test case:

>>> r = httpx.Client()
>>> r.get('https://httpbin.org/cookies/set/key/value')
<Response [200 OK]>
>>> r.cookies.jar  # we've got one cookie key=value2
<CookieJar[Cookie(version=0, name='key', value='value', port=None, port_specified=False, domain='httpbin.org', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False)]>
>>> r.cookies.update({'key': 'value2'})
>>> r.cookies.jar  # two cookies with the same key
<CookieJar[Cookie(version=0, name='key', value='value2', port=None, port_specified=False, domain='', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False), Cookie(version=0, name='key', value='value', port=None, port_specified=False, domain='httpbin.org', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False)]>
>>> r.get('https://httpbin.org/cookies').text  # "old" cookie with set up domain is used
'{\n  "cookies": {\n    "key": "value"\n  }\n}\n'

Perfect solution would be more complex update/load and dump/save method that handles domain, path and other cookie params for us but we can also allow easly setting somekind of policy that doesn't allow existance of two cookies with same key that are valid for any domain.

Anyway we should at least warn user in docs about this behaviour.

@florimondmanca
Copy link
Member

Cookies are domain-sensitive, so I think our current behavior on .update(…) is legit.

Besides, the Cookies docs advocate for using httpx.Cookies and its .set() method, which does allow to specify the domain etc.

But I'm not actually sure that's what you need for your dump/load use case.

I think you can get away with pickling the CookieJar object directly, and then building the Cookies from it. This object contains actual Cookie instances, so it retains the domain information too.

Code example:

import pickle
import httpx

def save_cookies(client):
    with open("cookies.pk", "wb") as f:
        pickle.dump(client.cookies.jar, f)

def load_cookies():
    if not os.path.isfile("cookies.pk"):
        return None
    with open("cookies.pk", "rb") as f:
        return pickle.load(f)

client = httpx.Client(cookies=load_cookies())
save_cookies(client)

@oczkers
Copy link
Contributor

oczkers commented Apr 8, 2020

Cookies are domain-sensitive, so I think our current behavior on .update(…) is legit.

Sure, this is correct behaviour, not complaing at, I'm just saying that it might be surprising.

I think if user loads cookies without domain (may i say domain insensitive?), he's probably using this httpx session only on one website (at least he should be), so if this website wants to change cookie value there is no chance but creation of new one. We end up with duplicates that breaks simplicity of managing cookiejar as dict - CookieConflict starts to pop out but user probably never wanted this to happen, he wanted simple, easy and "human" behaviour without duplicates.

@mengyyy
Copy link
Author

mengyyy commented Apr 11, 2020

    def save_cookies(self):
        with open("cookies.pk", "wb") as f:
            pickle.dump(self.client.cookies.jar._cookies, f)

    def load_cookies(self):
        if not os.path.isfile("cookies.pk"):
            return None
        cookies = httpx.Cookies()
        with open("cookies.pk", "rb") as f:
            jar_cookies = pickle.load(f)
        for domain , pc in jar_cookies.items():
            for path, c in pc.items():
                for k, v in c.items():
                    cookies.set(k, v.value, domain=domain, path=path)
        return cookies

@septatrix
Copy link

The proposed solution sadly does not work because the CookieJar has a lock which is not picklable.

@dustydecapod
Copy link

I've solved this by pickling Client.cookies.jar._cookies to save the cookies, then loading them in by using Client.cookies.jar._cookies.update(loaded_cookies). I use Client.cookies.jar.clear_expired_cookies() afterward to make sure i don't have stale cookies afterward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants