In Python 3, dogpile's own key mangler can't mangle the output of its default key generators #159

AdamWill · 2019-08-09T21:54:58Z

So I only just ran into this and I may be getting the wrong end of a stick, but I don't think so. It seems to me that, in Python 3, dogpile's sha1_mangle_key cannot mangle the keys produced by dogpile's function_key_generator and other key generators - the ones that are used by default for new cache regions.

The output format of function_key_generator and function_multi_key_generator is defined by a kwarg (to_str) whose default value is dogpile.util.compat.string_type, which on Python 3 is str: the unicode string type, like unicode on Python 2.

sha1_mangle_key basically just calls hashlib.sha1() on whatever it's fed...and hashlib.sha1() will not accept "Unicode-objects", which in Python 3 means str instances. It requires them to be encoded to bytes:

Python 3.7.4 (default, Jul 27 2019, 01:48:07) 
[GCC 9.1.1 20190605 (Red Hat 9.1.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from hashlib import sha1
>>> sha1('foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing
>>> sha1('foo'.encode('utf-8'))
<sha1 HASH object @ 0x7fd436d720c0>
>>>

What this means is that if you have Python 3 code that uses a dogpile cache region with the default key generators and sets the key mangler to the dogpile-provided sha1_mangle_key, it just doesn't work. Here's a minimal reproducer:

import dogpile.cache
import dogpile.cache.util

def make_cached_method(cache):

    @cache.cache_on_arguments()
    def cached_method(key):
        print(key)

    return cached_method

ourcache = dogpile.cache.make_region(key_mangler=dogpile.cache.util.sha1_mangle_key)
ourcache.configure(
    "dogpile.cache.dbm",
    expiration_time=300,
    arguments={
        "filename":"file.dbm"
    }
)
ourmethod = make_cached_method(ourcache)
ourmethod('foo')

On Python 2 this works fine. On Python 3 it blows up:

Traceback (most recent call last):
  File "/tmp/test.py", line 21, in <module>
    ourmethod('foo')
  File "</home/adamw/local/tahrir/tahrir-venv/lib/python3.7/site-packages/decorator.py:decorator-gen-1>", line 2, in cached_method
  File "/home/adamw/local/tahrir/tahrir-venv/lib/python3.7/site-packages/dogpile/cache/region.py", line 1272, in get_or_create_for_user_func
    should_cache_fn, (arg, kw))
  File "/home/adamw/local/tahrir/tahrir-venv/lib/python3.7/site-packages/dogpile/cache/region.py", line 823, in get_or_create
    key = self.key_mangler(key)
  File "/home/adamw/local/tahrir/tahrir-venv/lib/python3.7/site-packages/dogpile/cache/util.py", line 123, in sha1_mangle_key
    return sha1(key).hexdigest()
TypeError: Unicode-objects must be encoded before hashing

surely the stock mangler should work with the stock and default key generators?

This does enough Python 3 porting to make Tahrir run and do some basic stuff under Python 3 - I've tested creating badges and series, issuing badges, clicking around in Leaderboard and Explore, looking at RSS and JSON views of things. This does not break Python 2 compatibility - I'd rather not do that yet so we can test things easily both ways and identify any differences. We could remove Python 2 compat later.

Most of the changes are based on 2to3 suggestions and are pretty self-explanatory. Some less obvious ones:

* The str_to_bytes and dogpile stuff: well, see sqlalchemy/dogpile.cache#159 . The `sha1_mangle_key` mangler that we're using, which is provided by dogpile, needs input as a bytestring. This is pretty awkward. It obviously caused *some* problems even in Python 2 (as this app explicitly uses unicodes in some places), but in Python 3 it's worse; everywhere you see `str_to_bytes` being called is a place where I found a crash because we wound up sending a non-encoded `str` to `sha1_mangle_key` (or, in the case of `email_md5` and `email_sha1`, to hashlib directly).
* map moved in Python 3; 2to3 suggests handling it with a six move, but I preferred just replacing all the `map` uses with comprehensions.
* 2to3 recommended a change to strip_tags, but I noticed it is not actually used any more. It was used to sanitize HTML input to the admin route back when it was added, but the admin route was entirely rewritten later and the use of strip_tags was taken out. So I just removed strip_tags and its supporting players.
* merge_dicts is used in places where we were merging two dicts in a single expression by converting them to lists, combining the lists, and turning the combined list back into a dict again. You can still do this in Python 3 but you have to add extra `list()` calls and it gets really ugly. Per https://stackoverflow.com/questions/38987/how-to-merge-two-dictionaries-in-a-single-expression it's also not resource-efficient, so this seems like a better approach - it's informed by the code in that SO question but I wrote the function myself rather than taking one from that page to avoid technically having a tiny bit of CC-BY-SA code in this AGPL project.

Signed-off-by: Adam Williamson <awilliam@redhat.com>

zzzeek · 2019-08-10T00:40:25Z

> surely the stock mangler should work with the stock and default key generators?

well, yes, I'm not sure if you think I need to be convinced. This is a mostly forgotten function that was implemented before Python 3 support was implemented for dogpile, and it has no test coverage, so there's the bug.

sqla-tester · 2019-08-10T00:47:13Z

Mike Bayer has proposed a fix for this issue in the master branch:

Encode string key for sha1 on python 3 https://gerrit.sqlalchemy.org/1402

AdamWill · 2019-08-10T01:15:31Z

ah, OK. It didn't seem like the app I was working on is doing anything particularly unusual so I was surprised it hadn't been spotted till now, I guess.

On the fix - maybe it would be better to only encode it if it's actually a string type? Seems like you're not using six, but then you could just check if it's a unicode for Python 2 or a str for Python 3 and encode it if so. Otherwise just use it as-is. This would avoid it blowing up if someone has already set something up to pass it an encoded value (like I did, for our project).
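The suggestion above could look roughly like the following. This is only a sketch for Python 3, not the fix that was merged: the name `sha1_mangle_key_tolerant` is hypothetical, and UTF-8 is an assumed encoding.

```python
from hashlib import sha1


def sha1_mangle_key_tolerant(key):
    """Hypothetical mangler: encode text keys, pass bytes keys through as-is."""
    # On Python 3 the text type is `str`; on Python 2 the equivalent
    # check would be against `unicode`.
    if isinstance(key, str):
        key = key.encode("utf-8")  # assumed encoding, not taken from dogpile
    return sha1(key).hexdigest()


# Text and already-encoded keys produce the same digest.
print(sha1_mangle_key_tolerant("foo") == sha1_mangle_key_tolerant(b"foo"))
```

With this shape, callers who already encode their keys before handing them to the mangler keep working, which is the compatibility concern raised above.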

zzzeek · 2019-08-10T01:26:44Z

> ah, OK. It didn't seem like the app I was working on is doing anything particularly unusual so I was surprised it hadn't been spotted till now, I guess.
>
> On the fix - maybe it would be better to only encode it if it's actually a string type? Seems like you're not using six, but then you could just check if it's a unicode for Python 2 or a str for Python 3 and encode it if so. Otherwise just use it as-is. This would avoid it blowing up if someone has already set something up to pass it an encoded value (like I did, for our project).

I thought of this but I don't like the performance overhead of isinstance() that much. I would imagine that if you worked around this issue, you just did your own sha1 call, as it's only a one liner. cache keys weren't expected to be bytes. anyway, in this case we'd catch for unicode under python 2 also, I guess.
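For illustration, the "own sha1 call" workaround mentioned above might be as small as the sketch below. The function name `encode_then_sha1` is hypothetical and UTF-8 is an assumed encoding. It could be passed as `key_mangler=` to `make_region`, as in the reproducer.

```python
from hashlib import sha1


def encode_then_sha1(key):
    """Hypothetical user-supplied mangler: always encode, then hash."""
    return sha1(key.encode("utf-8")).hexdigest()


print(encode_then_sha1("foo"))  # works for text keys

# Unconditional encoding assumes a text key. On Python 3, bytes has no
# .encode(), so an already-encoded key fails here.
try:
    encode_then_sha1(b"foo")
except AttributeError as exc:
    print("bytes key rejected:", exc)
```

The second call shows the trade-off being discussed: skipping the type check is cheaper, but it breaks callers that pass bytes.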

zzzeek · 2019-08-10T01:30:26Z

of course under Python 2 you can pass u'' or '' and it works equally well, that's annoying

AdamWill · 2019-08-10T06:19:37Z

The project I'm working on was actually already working around this problem to some extent even before I started porting it to Python 3 - it explicitly uses u"" literals in some places so it had to care. Specifically, this bit has been there since 2014.

Would just doing a try/except be faster than an isinstance? It's less strictly correct but the difference is pretty academic...
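A sketch of the try/except idea for Python 3, with no claim about how its speed compares to isinstance(). The name `sha1_mangle_key_eafp` is hypothetical and UTF-8 is an assumed encoding.

```python
from hashlib import sha1


def sha1_mangle_key_eafp(key):
    """Hypothetical mangler: hash directly, encode only if hashlib refuses."""
    try:
        return sha1(key).hexdigest()
    except TypeError:
        # On Python 3, hashlib raises TypeError for str input. Any other
        # non-bytes key also lands here and then fails on .encode(),
        # which is the "less strictly correct" part.
        return sha1(key.encode("utf-8")).hexdigest()


print(sha1_mangle_key_eafp("foo") == sha1_mangle_key_eafp(b"foo"))
```

Note that with the default key generators on Python 3 the key is a str, so the except branch would be the common path rather than the exceptional one.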


sqlalchemy-bot closed this as completed in 4346981 Aug 10, 2019
