Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider something more lenient for cookie parsing #898

Closed
erewok opened this issue Apr 15, 2020 · 5 comments
Closed

Consider something more lenient for cookie parsing #898

erewok opened this issue Apr 15, 2020 · 5 comments

Comments

@erewok
Copy link
Contributor

erewok commented Apr 15, 2020

Hello,

I have used Starlette in a handful of small services in the past few months and have been really happy with it. It's a great project.

Last week I ran into something that I wanted to raise as a potential issue here, because it may be worth considering.

We have an application that uses Okta's Sign-In widget for authentication (which is an OpenID Connect thing). Okta does its Oauth2 callback-redirect and they set some cookies, at least one of which is well outside the spec of HTTP cookies. Anyway, in Starlette the request.cookies attribute ends up pretty mangled as a result.

Here's an example of what I'm talking about using the same cookie-parsing technique in Starlette, which relies on Python's standard library module http.cookies:

In [1]: cookieval = """okta-oauth-nonce=validAsciiblabla; okta-oauth-state=validAsciiBlabla; okta-oauth-redirect-params={"responseType":"code","state":"somestate",
   ...: "nonce":"somenonce","scopes":["openid","profile","email","phone"],"urls":{"issuer":"https://subdomain.okta.com/oauth2/authServer","authorizeUrl":"https://s
   ...: ubdomain.okta.com/oauth2/authServer/v1/authorize","userinfoUrl":"https://subdomain.okta.com/oauth2/authServer/v1/userinfo"}}; importantCookie=importantValu
   ...: e; sessionCookie=importantSessionValue"""

In [2]: import http.cookies

In [3]: cookie = http.cookies.SimpleCookie()

In [4]: cookie.load(cookieval)

In [5]: list(cookie.items())
Out[5]:
[('okta-oauth-nonce', <Morsel: okta-oauth-nonce=validAsciiblabla>),
 ('okta-oauth-state', <Morsel: okta-oauth-state=validAsciiBlabla>)]

Notice in the example above, cookieval contains 5 important cookies (including the session cookie set by Starlette's SessionMiddleware), but only two are successfully parsed by this method.

I have seen a lot of wacky stuff thrown into cookies, so it may be a good idea to provide some method to parse them more leniently in the Starlette project.

Also, out of curiosity, I took a tour through what Werkzeug is doing and they have some pretty low-level code for parsing cookies, but it does successfully parse the above:

In [6]: from werkzeug._internal import _cookie_parse_impl

In [7]: list(_cookie_parse_impl(cookieval.encode("utf-8")))
Out[7]:
[(b'okta-oauth-nonce', b'validAsciiblabla'),
 (b'okta-oauth-state', b'validAsciiBlabla'),
 (b'okta-oauth-redirect-params',
  b'{"responseType":"code","state":"somestate","nonce":"somenonce","scopes":["openid","profile","email","phone"],"urls":{"issuer":"https://subdomain.okta.com/oauth2/authServer","authorizeUrl":"https://subdomain.okta.com/oauth2/authServer/v1/authorize","userinfoUrl":"https://subdomain.okta.com/oauth2/authServer/v1/userinfo"}}'),
 (b'importantCookie', b'importantValue'),
 (b'sessionCookie', b'importantSessionValue')]

In short, cookies seem to be one of those things where while there exists a spec, people have been ignoring it for a long time, and so it may be worth parsing them more leniently or offering an option to do so in the project.

Python's http.cookies module even has the following comment:

The module formerly strictly applied the parsing rules described in the RFC 2109 and RFC 2068 specifications. It has since been discovered that MSIE 3.0x doesn’t follow the character rules outlined in those specs and also many current day browsers and servers have relaxed parsing rules when comes to Cookie handling. As a result, the parsing rules used are a bit less strict.

Thanks for reading and for working on this project. I would be happy to help contribute back to it if I can.


For reference, here's the RFC 6265 spec, which obsoletes the older spec referenced by the Python standard library's http.cookies module (it actually references an even older spec...).

@tomchristie
Copy link
Member

Okay, so the way to move this forward would be to take a look at what Flask and Django do here, so that we’ve got some existing work to base this on.

Want to take a dig into either of those and see how it compares to what we currently have in Starlette?

@erewok
Copy link
Contributor Author

erewok commented Apr 15, 2020

Okay, so the way to move this forward would be to take a look at what Flask and Django do here, so that we’ve got some existing work to base this on.

I had the same thought, which is why I gave that example from Flask above (Werkzeug is the library that handles these things for Flask).

The Werkzeug parser is a pretty low-level parser handling bytes and using regexes. It looks like the kind of inscrutable code someone would write for performance.

Django, by contrast, has a custom cookie parser as well which utilizes the standard library's http.cookies module by calling its _unquote function (but unlike Starlette's they are not using SimpleCookie.load). Their cookie-parser does the straightforward thing:

  1. Split on ;, iterate
  2. Split on =,
  3. Strip whitespace
  4. Call unquote on the val.
  5. Make a dict

The result of using their cookie parser also reproduces all the cookies given in my first example:

In [3]: parse_cookie(cookieval)
Out[3]:
{'okta-oauth-nonce': 'validAsciiblabla',
 'okta-oauth-state': 'validAsciiBlabla',
 'okta-oauth-redirect-params': '{"responseType":"code","state":"somestate","nonce":"somenonce","scopes":["openid","profile","email","phone"],"urls":{"issuer":"https://subdomain.okta.com/oauth2/authServer","authorizeUrl":"https://subdomain.okta.com/oauth2/authServer/v1/authorize","userinfoUrl":"https://subdomain.okta.com/oauth2/authServer/v1/userinfo"}}',
 'importantCookie': 'importantValue',
 'sessionCookie': 'importantSessionValue'}

It would be straightforward to implement something like what Django is doing. I am not sure the performance implications. It would probably be a good idea to have an approach like the following:

  • Try the Django approach
  • Check performance regressions
  • Compare with the flask approach
  • Present results here for the community.

@tomchristie
Copy link
Member

Thanks, that's a good summary.

I'd opt for correctness-first... If the Django approach looks correct, and easier to implement, then that's what we should go for. 👍

There's no particular reason why there'd be any actually significant performance concerns with it.

@tomchristie
Copy link
Member

The best way to approach this would probably be to start by writing a failing test, and submit that as a pull request.

@erewok
Copy link
Contributor Author

erewok commented Apr 15, 2020

I would be happy to do that. We can use my example given above. I was hoping to find a weird-cookie dataset to try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants