-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In Python 3, dogpile's own key mangler can't mangle the output of its default key generators #159
Comments
This does enough Python 3 porting to make Tahrir run and do some basic stuff under Python 3 - I've tested creating badges and series, issuing badges, clicking around in Leaderboard and Explore, looking at RSS and JSON views of things. This does not break Python 2 compatibility - I'd rather not do that yet so we can test things easily both ways and identify any differences. We could remove Python 2 compat later. Most of the changes are based on 2to3 suggestions and are pretty self-explanatory. Some less obvious ones: * The str_to_bytes and dogpile stuff: well, see sqlalchemy/dogpile.cache#159 . The `sha1_mangle_key` mangler that we're using, which is provided by dogpile, needs input as a bytestring. This is pretty awkward. It obviously caused *some* problems even in Python 2 (as this app explicitly uses unicodes in some places), but in Python 3 it's worse; everywhere you see `str_to_bytes` being called is a place where I found a crash because we wound up sending a non-encoded `str` to `sha1_mangle_key` (or, in the case of `email_md5` and `email_sha1`, to hashlib directly). * map moved in Python 3; 2to3 suggests handling it with a six move, but I preferred just replacing all the `map` uses with comprehensions. * 2to3 recommended a change to strip_tags, but I noticed it is not actually used any more. It was used to sanitize HTML input to the admin route back when it was added, but the admin route was entirely rewritten later and the use of strip_tags was taken out. So I just removed strip_tags and its supporting players. * merge_dicts is used in places where we were merging two dicts in a single expression by converting them to lists, combining the lists, and turning the combined list back into a dict again. You can still do this in Python 3 but you have to add extra `list()` calls and it gets really ugly. Per https://stackoverflow.com/questions/38987/how-to-merge-two-dictionaries-in-a-single-expression it's also not resource-efficient, so this seems like a better approach - it's informed by the code in that SO question but I wrote the function myself rather than taking one from that page to avoid technically having a tiny bit of CC-BY-SA code in this AGPL project. Signed-off-by: Adam Williamson <awilliam@redhat.com>
This does enough Python 3 porting to make Tahrir run and do some basic stuff under Python 3 - I've tested creating badges and series, issuing badges, clicking around in Leaderboard and Explore, looking at RSS and JSON views of things. This does not break Python 2 compatibility - I'd rather not do that yet so we can test things easily both ways and identify any differences. We could remove Python 2 compat later. Most of the changes are based on 2to3 suggestions and are pretty self-explanatory. Some less obvious ones: * The str_to_bytes and dogpile stuff: well, see sqlalchemy/dogpile.cache#159 . The `sha1_mangle_key` mangler that we're using, which is provided by dogpile, needs input as a bytestring. This is pretty awkward. It obviously caused *some* problems even in Python 2 (as this app explicitly uses unicodes in some places), but in Python 3 it's worse; everywhere you see `str_to_bytes` being called is a place where I found a crash because we wound up sending a non-encoded `str` to `sha1_mangle_key` (or, in the case of `email_md5` and `email_sha1`, to hashlib directly). * map moved in Python 3; 2to3 suggests handling it with a six move, but I preferred just replacing all the `map` uses with comprehensions. * 2to3 recommended a change to strip_tags, but I noticed it is not actually used any more. It was used to sanitize HTML input to the admin route back when it was added, but the admin route was entirely rewritten later and the use of strip_tags was taken out. So I just removed strip_tags and its supporting players. * merge_dicts is used in places where we were merging two dicts in a single expression by converting them to lists, combining the lists, and turning the combined list back into a dict again. You can still do this in Python 3 but you have to add extra `list()` calls and it gets really ugly. Per https://stackoverflow.com/questions/38987/how-to-merge-two-dictionaries-in-a-single-expression it's also not resource-efficient, so this seems like a better approach - it's informed by the code in that SO question but I wrote the function myself rather than taking one from that page to avoid technically having a tiny bit of CC-BY-SA code in this AGPL project. Signed-off-by: Adam Williamson <awilliam@redhat.com>
well, yes, I'm not sure if you seem like I need to be convinced, this is a mostly forgotten function that was implemented before Python 3 was implemented for dogpile and it has no test coverage, so there's the bug. |
Mike Bayer has proposed a fix for this issue in the master branch: Encode string key for sha1 on python 3 https://gerrit.sqlalchemy.org/1402 |
ah, OK. It didn't seem like the app I was working on is doing anything particularly unusual so I was surprised it hadn't been spotted till now, I guess. On the fix - maybe it would be better to only encode it if it's actually a string type? Seems like you're not using |
I thought of this but I don't like the performance overhead of isinstance() that much. I would imagine that if you worked around this issue, you just did your own sha1 call, as it's only a one liner. cache keys weren't expected to be bytes. anyway, in this case we'd catch for unicode under python 2 also, I guess. |
of course under Pyhton 2 you can pass u'' or '' and it works equally well, that's annoying |
The project I'm working on was actually already working around this problem to some extent even before I started porting it to Python 3 - it explicitly uses Would just doing a try/except be faster than an isinstance? It's less strictly correct but the difference is pretty academic... |
This does enough Python 3 porting to make Tahrir run and do some basic stuff under Python 3 - I've tested creating badges and series, issuing badges, clicking around in Leaderboard and Explore, looking at RSS and JSON views of things. This does not break Python 2 compatibility - I'd rather not do that yet so we can test things easily both ways and identify any differences. We could remove Python 2 compat later. Most of the changes are based on 2to3 suggestions and are pretty self-explanatory. Some less obvious ones: * The str_to_bytes and dogpile stuff: well, see sqlalchemy/dogpile.cache#159 . The `sha1_mangle_key` mangler that we're using, which is provided by dogpile, needs input as a bytestring. This is pretty awkward. It obviously caused *some* problems even in Python 2 (as this app explicitly uses unicodes in some places), but in Python 3 it's worse; everywhere you see `str_to_bytes` being called is a place where I found a crash because we wound up sending a non-encoded `str` to `sha1_mangle_key` (or, in the case of `email_md5` and `email_sha1`, to hashlib directly). * map moved in Python 3; 2to3 suggests handling it with a six move, but I preferred just replacing all the `map` uses with comprehensions. * 2to3 recommended a change to strip_tags, but I noticed it is not actually used any more. It was used to sanitize HTML input to the admin route back when it was added, but the admin route was entirely rewritten later and the use of strip_tags was taken out. So I just removed strip_tags and its supporting players. * merge_dicts is used in places where we were merging two dicts in a single expression by converting them to lists, combining the lists, and turning the combined list back into a dict again. You can still do this in Python 3 but you have to add extra `list()` calls and it gets really ugly. Per https://stackoverflow.com/questions/38987/how-to-merge-two-dictionaries-in-a-single-expression it's also not resource-efficient, so this seems like a better approach - it's informed by the code in that SO question but I wrote the function myself rather than taking one from that page to avoid technically having a tiny bit of CC-BY-SA code in this AGPL project. Signed-off-by: Adam Williamson <awilliam@redhat.com>
This does enough Python 3 porting to make Tahrir run and do some basic stuff under Python 3 - I've tested creating badges and series, issuing badges, clicking around in Leaderboard and Explore, looking at RSS and JSON views of things. This does not break Python 2 compatibility - I'd rather not do that yet so we can test things easily both ways and identify any differences. We could remove Python 2 compat later. Most of the changes are based on 2to3 suggestions and are pretty self-explanatory. Some less obvious ones: * The str_to_bytes and dogpile stuff: well, see sqlalchemy/dogpile.cache#159 . The `sha1_mangle_key` mangler that we're using, which is provided by dogpile, needs input as a bytestring. This is pretty awkward. It obviously caused *some* problems even in Python 2 (as this app explicitly uses unicodes in some places), but in Python 3 it's worse; everywhere you see `str_to_bytes` being called is a place where I found a crash because we wound up sending a non-encoded `str` to `sha1_mangle_key` (or, in the case of `email_md5` and `email_sha1`, to hashlib directly). * map moved in Python 3; 2to3 suggests handling it with a six move, but I preferred just replacing all the `map` uses with comprehensions. * 2to3 recommended a change to strip_tags, but I noticed it is not actually used any more. It was used to sanitize HTML input to the admin route back when it was added, but the admin route was entirely rewritten later and the use of strip_tags was taken out. So I just removed strip_tags and its supporting players. * merge_dicts is used in places where we were merging two dicts in a single expression by converting them to lists, combining the lists, and turning the combined list back into a dict again. You can still do this in Python 3 but you have to add extra `list()` calls and it gets really ugly. Per https://stackoverflow.com/questions/38987/how-to-merge-two-dictionaries-in-a-single-expression it's also not resource-efficient, so this seems like a better approach - it's informed by the code in that SO question but I wrote the function myself rather than taking one from that page to avoid technically having a tiny bit of CC-BY-SA code in this AGPL project. Signed-off-by: Adam Williamson <awilliam@redhat.com>
So I only just ran into this and I may be getting the wrong end of a stick, but I don't think so. It seems to me that, in Python 3, dogpile's
sha1_mangle_key
cannot mangle the keys produced by dogpile'sfunction_key_generator
and other key generators - the ones that are used by default for new cache regions.The output format of
function_key_generator
andfunction_multi_key_generator
are defined by a kwarg (to_str
) whose default value isdogpile.util.compat.string_type
...which on Python 3, isstr
. Which is the unicode string type, likeunicode
on Python 2.sha1_mangle_key
basically just callshashlib.sha1()
on whatever it's fed...andhashlib.sha1()
will not accept "Unicode-objects", which in Python 3 meansstr
instances. It requires them to be encoded tobytes
:what this means is that if you have Python 3 code that uses a dogpile cache region with default key generators, and sets the key mangler as the dogpile-provided
sha1_mangle_key
, it just doesn't work right. Here's a minimal reproducer:On Python 2 this works fine. On Python 3 it blows up:
surely the stock mangler should work with the stock and default key generators?
The text was updated successfully, but these errors were encountered: