-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
cookies module allows commas in keys #70490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Commas aren't legal characters in cookie keys, yet in Python 3.5, they're allowed: >>> bool(http.cookies._is_legal_key(','))
True The issue lies in the use of _LegalChars constructing a regular expression. "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." The issue arises in this line: _is_legal_key = re.compile('[%s]+' % _LegalChars).fullmatch Which was added in 88e1151e8e0242 referencing bpo-2211. The problem is that in a regular expression, and in a character class in particular, the '-' character has a special meaning if not the first character in the class, which is "span all characters between the leading and following characters". As a result, the pattern has the unintended effect of including the comma in the pattern: >>> http.cookies._is_legal_key.__self__
re.compile("[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'*+-.^_`|~:]+")
>>> pattern = _
>>> pattern.fullmatch(',')
<_sre.SRE_Match object; span=(0, 1), match=','>
>>> ord('+')
43
>>> ord('.')
46
>>> ''.join(map(chr, range(43,47)))
'+,-.' That's how the comma creeped in. This issue is the underlying cause of https://bitbucket.org/cherrypy/cherrypy/issues/1405/testcookies-fails-on-python-35 and possible other cookie-related bugs in Python. While I jest about regular expressions, I like the implementation. It just needs to account for the extraneous comma, perhaps by escaping the dash: _is_legal_key = re.compile('[%s]+' % _LegalChars.replace('-', '\\-')).fullmatch Also, regression tests for keys containing invalid characters should be added as well. |
Ah, how can I missed this catch? The simplest way is just move "-" to the start or the end of character list. The most error-proof way is to use re.escape(). |
Hi, I am a newcomer here, I am interested to work on this issue. |
We just need to use '\-' instead of '-'.
I have uploaded a patch. |
A test would be much appreciated. It's worthwhile capturing the rejection of at least one invalid character, if not several common ones. |
No, _LegalChars shouldn't include "\". |
@serhiy.storchaka OK, I have used re.escape instead of '\'. And I have added a test too. Also, may I know why '\' can not be in _LegalChars, so that I can remember this for future purpose? |
_LegalChars contained only characters which don't require quoting, as documented in the comment above. If _LegalChars was only used to create _is_legal_key, we would just wrote the regular expression. But it is used also in other places. In this particular case adding "\" to _LegalChars doesn't lead to visible bug (except inconsistency with the comment), but we can't be sure. _is_legal_key() is implementation detail. It would be better to test public API. |
Another option might be to do away with the regular expression (personally I like to avoid REs and code generation where practical): def _is_legal_key(key):
return key and set(_LegalChars).issuperset(key) |
The regular expression is used for performance. |
I ran regex and issuperset version on a random string. The regex one gives better performance. So, I have included the re.escape in the patch. >>> random_str = ''.join(random.choice(_LegalChars) for _ in range(10 ** 8))
>>> is_legal_key = re.compile('[%s]+' % re.escape(_LegalChars)).fullmatch
>>> Timer("is_legal_key(random_str)", setup="from __main__ import random_str, is_legal_key").timeit(1)
0.3168252399998437
>>> def is_legal_key(key):
... return key and set(_LegalChars).issuperset(key)
...
>>> Timer("is_legal_key(random_str)", setup="from __main__ import random_str, is_legal_key").timeit(1)
4.3335622880001665 Also, I have updated the patch. Can you please review it? :) |
The same bug is in the _CookiePattern regular expression. For illegal characters other than a comma, the CookieError does not actually seem to be raised. |
I take that back about _CookiePattern having the same bug; it uses a different input variable. But it is strange that _LegalKeyChars lists a comma, but _LegalChars omits it. |
_LegalKeyChars contains "\-" whereas _LegalChars just contains "-". On Sun, Feb 7, 2016 at 4:33 PM, Martin Panter <report@bugs.python.org>
|
The patch looks okay to me. The inconsistency between silently rejecting cookie “morsels” and raising an exception from load() also exists in 2.7, so maybe it is not a big deal. |
Is this patch ready to merge? |
Looks good to me. |
LGTM. Do you want to commit the patch Jason? |
New changeset 758cb13aaa2c by Anish Shah <anish.shah> in branch '3.5': New changeset 91eb7ae951a1 by Jason R. Coombs in branch 'default': |
Thanks Anish for the patch, now slated for Python 3.5.2 and Python 3.6. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: