Decouple serialization logic from CacheBackend classes #191
Conversation
seems like it's not done yet.
let's use serializer / deserializer. the proof of concept is that pickle can be ditched entirely and json can be used, if someone wanted.
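As a rough sketch of that proof of concept under the draft design discussed here (the "serializer"/"deserializer" argument names, the JSON helpers, and passing them through the backend's arguments dict are all assumptions of this sketch, not the released API):

import json

from dogpile.cache import make_region


def json_serializer(value):
    # Store JSON bytes instead of pickle bytes.
    return json.dumps(value).encode("utf-8")


def json_deserializer(raw):
    return json.loads(raw.decode("utf-8"))


# Hypothetical: the (de)serializer rides along in the backend arguments.
region = make_region().configure(
    "dogpile.cache.redis",
    arguments={
        "host": "localhost",
        "serializer": json_serializer,
        "deserializer": json_deserializer,
    },
)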
dogpile/cache/api.py
Outdated
@@ -75,7 +79,29 @@ def __init__(self, arguments): # pragma NO COVERAGE
passed to :func:`.make_registry`.

"""
raise NotImplementedError()
# TODO: We can probably use a better name than pickler/unpickler,
serializer / deserializer
Done.
dogpile/cache/backends/file.py
Outdated
@@ -140,6 +140,13 @@ def release_write_lock(self):
"""

def __init__(self, arguments):
    super().__init__(
        {
            "pickler": pickle.dumps,
can I send my own pickler here? seems like not?
yes, the idea here is that if "pickler" is not in arguments, then we default it to pickle.dumps:
>>> arguments = {} # No arguments from user
>>> {
... "pickler": pickle.dumps,
... "unpickler": pickle.loads,
... **arguments
... }
{'pickler': <built-in function dumps>, 'unpickler': <built-in function loads>}
>>> def noop(x): return x
...
>>> arguments = {"pickler": noop, "unpickler": noop} # User overrides
>>> {
... "pickler": pickle.dumps,
... "unpickler": pickle.loads,
... **arguments
... }
{'pickler': <function noop at 0x102b7a0d0>, 'unpickler': <function noop at 0x102b7a0d0>}
dogpile/cache/backends/memcached.py
Outdated
@@ -112,6 +112,11 @@ class GenericMemcachedBackend(CacheBackend):
to the :meth:`set` method."""

def __init__(self, arguments):
    # No need to override serializer, as all the memcached libraries will do it.
    # Still, we support customizing the serializer/deserializer to use better default pickle protocol
but we want to use our own pickler here, don't we?
Well, this can potentially break backwards compatibility (I have to double check that), but probably in a way that nobody will ever notice.
It seems to me that all the memcached libraries use flags that represent the type of the value stored. For example, this is how python-memcached does it. Other memcached implementations (even the C ones) seem to do the same. If we suddenly start always passing bytes to the memcached client, then it will suddenly start using no flag, to signify that bytes are stored.
This can potentially be a problem for users that expect Client._FLAG_TEXT to be set for some values.
But given the behaviour of memcached to return None on GET if there are errors, it may not be such a big deal after all.
This is the only reason why I left the default as a noop, so as to leave the memcached library to determine what to do.
OK so then.... we don't support any means of changing the serializer right now? They can default to None for serializer/deserializer but we still need to run our "(de)serializer" inside the get/set methods in the backend right?
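As a minimal sketch of that idea (illustrative names only, with a plain dict standing in for the real client library; not the actual dogpile.cache code): the backend always runs the configured serializer/deserializer in get/set, and the default is a no-op so the underlying library keeps its own behaviour.

def _noop(value):
    return value


class SketchBackend:
    """Stand-in backend: a dict plays the role of the memcached client."""

    def __init__(self, arguments):
        # Default to a no-op so the client library keeps doing its own
        # serialization unless the user overrides it.
        self.serializer = arguments.get("serializer") or _noop
        self.deserializer = arguments.get("deserializer") or _noop
        self._store = {}

    def set(self, key, value):
        self._store[key] = self.serializer(value)

    def get(self, key):
        if key not in self._store:
            return None
        return self.deserializer(self._store[key])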
dogpile/cache/backends/file.py
Outdated
{ | ||
"pickler": pickle.dumps, | ||
"unpickler": pickle.loads, | ||
**arguments |
oh, you are using some flashy new python 3 syntax thing. that syntax is a little icky since, in my 2 minutes of experience, it is pretty hard to spot, but that's on them not you : )
We can also use ChainMap, as I think it was created exactly for this purpose.
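For reference, a small sketch of the ChainMap alternative, where the user-supplied arguments shadow the defaults without building a new dict:

import pickle
from collections import ChainMap

defaults = {"pickler": pickle.dumps, "unpickler": pickle.loads}
arguments = {"pickler": lambda value: value}  # user override

merged = ChainMap(arguments, defaults)  # the first mapping wins on lookup
assert merged["pickler"] is arguments["pickler"]
assert merged["unpickler"] is pickle.loads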
dogpile/cache/backends/redis.py
Outdated
super().__init__(
    {
        "pickler": pickle.dumps,
        "unpickler": pickle.loads,
yeah this syntax is messing with me, because it registers as exactly like the syntax that would fail:
Foo(pickler=pickle.dumps, unpickler=pickle.loads, **arguments)
again my problem not yours...
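To spell out the difference (Foo here is just a stand-in, not dogpile code): the dict-literal merge lets a user-supplied key silently override the default, while the keyword-argument call site raises for a duplicate keyword.

import pickle


def Foo(**kwargs):  # stand-in for a backend constructor
    return kwargs


arguments = {"pickler": lambda value: value}

# Dict-literal merge: the user's "pickler" overrides the default, no error.
merged = {"pickler": pickle.dumps, "unpickler": pickle.loads, **arguments}

# Call-site form: the same keyword arrives twice and Python rejects it.
try:
    Foo(pickler=pickle.dumps, unpickler=pickle.loads, **arguments)
except TypeError as err:
    print(err)  # got multiple values for keyword argument 'pickler'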
Indeed, it's just a draft.
Yes!
Also, actually use the configured serializer in the DBMBackend and MemoryBackend
A nice side effect is that we can use custom serializer/deserializer
…kend-serializer # Conflicts: # tests/cache/test_redis_sentinel_backend.py
Tests are now passing (on my machine), and all the backends delegate the serialization logic to the serializer/deserializer attributes.
OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision 9308200 of this pull request into gerrit so we can run tests and reviews and stuff
New Gerrit review created for change 9308200: https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
just FYI, if you install and run pre-commit for this repo it will run black + zimports with all our settings, and everything but E501 will be handled automatically.
mike bayer (zzzeek) wrote:
(3 comments)
I'm not getting the memcached part still and I think this has to be fully generalized to start
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
- dogpile/cache/backends/memcached.py (line 173): need deserialize here... or noop if it's None
- dogpile/cache/backends/memcached.py (line 184): need serialize here... or noop if it's None
tests/cache/test_redis_backend.py
Outdated
@@ -50,6 +50,22 @@ class RedisTest(_TestRedisConn, _GenericBackendTest):
}


# TODO: This can probably be a test that we should run for every backend
class RedisCustomSerializerTest(_TestRedisConn, _GenericBackendTest):
mike bayer (zzzeek) wrote:
yes this has to be another _GenericBackendTest, and I think having it run for json loads/dumps would be a good proof. it should have some raw cache get that ensures the value comes out as json if the deserializer isn't run. this might mean the actual values we're testing have to change a bit to support json
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
But would this be a test method implemented in the _GenericBackendTest class? Because I can imagine some backends would not want to support serializer/deserializer explicitly.
most of the backends we have do want to make use of this, so it should be a reusable test fixture. a backend that doesn't include serialization doesn't have to use it in its test suite.
oh ok, I was under the impression that backend tests from dogpile.cache are supposed to be run also by third party plugins, just like sqlalchemy dialects.
Since it's not the case, then I will make the serializer/deserializer test part of the _GenericBackendTest test suite.
let me clarify, this should be a new class called _SerializerFixture or something like that, it shouldn't be embedded in the existing _GenericBackendTest. it should include new tests that prove that the custom serializer, for which I suggested JSON, is actually being used. it should be able to be plugged in as a mixin to a _GenericBackendTest.
that is, we do want third party dialects to use these fixtures and this one would be "opt in" for a backend that does so, and we would want to confirm it in fact is able to serialize/deserialize from bytes on the backend side, since that's the assumption of the new feature we are doing here.
Could you maybe pick up the testing part? I feel like I am going to spend a long time just to figure out how the class structure should be.
my current days are 100% getting SQLAlchemy 1.4 documentation written so that the betas can come out. if you aren't blocked by this and can wait a while, I still have these in my queue so I'd have to get around to them (or maybe someone else wants to poke at it).
I'm looking forward to the first beta!
If you don't mind a more pragmatic approach, I can add test coverage by repeating the tests on each backend (maybe while I do that I'll find a way to generalize the tests); at least we won't block this, and the tests can be refactored later.
I think this is not as complicated as you are making it. the steps are:
- create a class _GenericSerializerTest
- add a few test_XYZ methods to this class which round trip values using a JSON serializer, not pickle, to the backend.
- the test_XYZ methods should pull out raw data from each backend, making sure the data is in fact bytes when we don't use the serializer, and that the data is in fact JSON serialized
- inside of test_redis, test_memcached, test_dbm, place a stub test case that extends from _GenericSerializerTest as well as _GenericBackendTest
- that way all the tests in _GenericBackendTest run against a JSON serializer as a proof of concept
- I just looked at _GenericBackendTest and all the values it tests with are strings. We should include some tests that use a json-like structure of lists and dictionaries as the values. The pickle serializers and json serializers can both handle these.
(a rough sketch of this fixture layout is shown after this list.)
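A rough sketch of that fixture layout (class names follow the suggestion above; the _region() and _raw_get() helpers are hypothetical placeholders for whatever the concrete backend test provides):

import json


class _GenericSerializerTest:
    """Mixin: re-run a backend's tests with a JSON serializer instead of pickle."""

    # Consumed by the hypothetical _region() helper when configuring the backend.
    region_arguments = {
        "serializer": lambda value: json.dumps(value).encode("utf-8"),
        "deserializer": lambda raw: json.loads(raw.decode("utf-8")),
    }

    def test_round_trip_structured_value(self):
        region = self._region()  # hypothetical: built by the concrete test class
        payload = {"numbers": [1, 2, 3], "nested": {"ok": True}}
        region.set("some_key", payload)
        assert region.get("some_key") == payload

    def test_raw_value_is_json_bytes(self):
        region = self._region()
        region.set("some_key", {"a": 1})
        raw = self._raw_get("some_key")  # hypothetical: read without deserializing
        assert isinstance(raw, bytes)
        json.loads(raw.decode("utf-8"))  # the stored form is JSON, not a pickle


# The stub in test_redis / test_memcached / test_dbm would then look roughly like:
# class RedisSerializerTest(_TestRedisConn, _GenericSerializerTest, _GenericBackendTest):
#     pass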
Right, I think I understand now. Can you have a look if that's how you meant it? I still need to add some tests for set_multi/get_multi.
OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision bf3ba6a of this pull request into gerrit so we can run tests and reviews and stuff
Patchset bf3ba6a added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
If not configured, it will just be a noop
I forgot to add the call to self.serializer/deserializer. We should add tests to cover that, but I need to know where those tests should be implemented, and whether they should run for every backend, or if every backend test suite should decide for itself.
yes. sorry, this has not been a priority for me right now as i am putting all cycles into getting SQLAlchemy 1.4 out. are you blocked by this issue right now? my understanding is you can use a custom cache backend subclass for now.
No worries, it's not a priority for now, but it is going to block the upgrade of some of my projects to Python 3.8 (maybe other users would experience the same?).
OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision 06f449c of this pull request into gerrit so we can run tests and reviews and stuff
Patchset 06f449c added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
Ok it looks like you did all that new testing stuff, so this is far along, OK.
mike bayer (zzzeek) wrote: openstack-recheck
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
Yes, this supersedes #190
mike bayer (zzzeek) wrote:
(2 comments)
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
@@ -75,7 +79,17 @@ def __init__(self, arguments): # pragma NO COVERAGE
passed to :func:`.make_registry`.

"""
raise NotImplementedError()

serializer = arguments.get("serializer")
mike bayer (zzzeek) wrote:
i'm going to move these up to CacheRegion where they will be similar to KeyMangler, taking the default from the cache backend itself.
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
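Illustrative only, a sketch of that defaulting (an explicit region-level serializer wins, otherwise fall back to whatever the backend declares, much like key_mangler); this is not the actual dogpile.cache code:

class RegionSketch:
    def __init__(self, serializer=None, deserializer=None):
        self._serializer_arg = serializer
        self._deserializer_arg = deserializer

    def configure(self, backend):
        self.backend = backend
        # Explicit region-level setting wins; otherwise use the backend's default.
        self.serializer = self._serializer_arg or getattr(backend, "serializer", None)
        self.deserializer = self._deserializer_arg or getattr(backend, "deserializer", None)
        return self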
@@ -252,6 +255,7 @@ def f():
    time.sleep(0.5)

f()
# TODO: Is this test doing anything useful, given that we return here?
mike bayer (zzzeek) wrote:
this is a mistake that got committed at some point.
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
mike bayer (zzzeek) wrote: OK so as you can see, a whole lot happened in my patch here. here is the basic idea:
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
mike bayer (zzzeek) wrote: Workflow+1 hey, just a reach out to a few peeps: we propose a dogpile 1.1 that moves serialization to be a feature of the region. backends now deal with bytes in those cases where an explicit serializer is in use. i think this is a good change but i did it pretty quickly yesterday, so any concerns would be appreciated.
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
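A hedged usage sketch of that direction (serialization configured on the region, with the backend only ever seeing bytes); the exact placement of the serializer/deserializer parameters here is an assumption of this sketch, so check the released 1.1 API:

import json

from dogpile.cache import make_region

region = make_region(
    # Assumed region-level parameters, per the proposal above.
    serializer=lambda value: json.dumps(value).encode("utf-8"),
    deserializer=lambda raw: json.loads(raw.decode("utf-8")),
).configure(
    "dogpile.cache.dbm",
    arguments={"filename": "/tmp/dogpile_cache.dbm"},
)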
mike bayer (zzzeek) wrote:
(21 comments)
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
- dogpile/cache/api.py (line 53): call this BackendValue and it is:
  BackendValue = Union[bytes, NoValue, NoneType, CachedValue]
- dogpile/cache/api.py (line 54): also we can add
  KeyType = Any
  for now, I'd like to say keys are strings in all cases but I'm not sure that's true
- dogpile/cache/api.py (line 67): serializer: Optional[Callable] = None
  deserializer: Optional[Callable] = None
- dogpile/cache/api.py (line 142): annotate
  def get(self, key: KeyType) -> BackendValue
- dogpile/cache/api.py (line 152): def get_multi(self, keys: Sequence[KeyType]) -> Sequence[BackendValue]
- dogpile/cache/api.py (line 163): def set(self, key: KeyType, value: BackendValue) -> None
- dogpile/cache/api.py (line 175): def set_multi(self, mapping: Mapping[KeyType, BackendValue]) -> None
- dogpile/cache/api.py (line 198): def delete(self, key: KeyType) -> None
- dogpile/cache/api.py (line 212): def delete_multi(self, keys: Sequence[KeyType]) -> None
- dogpile/cache/region.py (line 294): typo
- dogpile/cache/region.py (line 729): this could be a single function self._get_from_backend(key)
- dogpile/cache/region.py (line 801): this could be a single function self._get_multi_from_backend(keys)
- dogpile/cache/region.py (line 931): self._get_from_backend(key)
- dogpile/cache/region.py (line 952): self._set_to_backend(key, value)
- dogpile/cache/region.py (line 1097): self._get_multi_from_backend(mangled_keys)
- dogpile/cache/region.py (line 1127): self._set_multi_to_backend(mapping)
- dogpile/cache/region.py (line 1132): self._set_multi_to_backend({k: v for k, v in values_w_created.items() if should_cache_fn(v)})
- dogpile/cache/region.py (line 1147): probably rename this to _create_cached_value(), hard to follow these names
- dogpile/cache/region.py (line 1165): I'm sort of arbitrarily trying to save on method calls here by inlining the serialization, and I think I should start out by not doing that; the serialization should be in just one place.
- dogpile/cache/region.py (line 1184): let's just for now put the format here in a function _serialize_cached_value_parts(self, metadata, payload)
  also let's make sure we are using the terms "value" (a CachedValue) and "payload" (the user's data) consistently, or perhaps use the terms "cached_value" and "payload"
- dogpile/cache/region.py (line 1191): CachePayloadType is not really accurate here. from the backend we can get:
  - a binary string that has a metadata/payload
  - a CachedValue
  - NO_VALUE
  - None
  perhaps roll these up into BackendValue
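Pulled together, the annotations suggested in this comment might read roughly like the following sketch (a consolidation of the review notes above, not the final code):

from typing import Any, Callable, Mapping, Optional, Sequence, Union

from dogpile.cache.api import CachedValue, NoValue

NoneType = type(None)
KeyType = Any  # keys are probably always strings, but that isn't guaranteed here
BackendValue = Union[bytes, NoValue, NoneType, CachedValue]


class AnnotatedBackendSketch:
    serializer: Optional[Callable] = None
    deserializer: Optional[Callable] = None

    def get(self, key: KeyType) -> BackendValue: ...
    def get_multi(self, keys: Sequence[KeyType]) -> Sequence[BackendValue]: ...
    def set(self, key: KeyType, value: BackendValue) -> None: ...
    def set_multi(self, mapping: Mapping[KeyType, BackendValue]) -> None: ...
    def delete(self, key: KeyType) -> None: ...
    def delete_multi(self, keys: Sequence[KeyType]) -> None: ...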
Federico Caselli (CaselIT) wrote:
(1 comment)
Just a minor comment
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
- dogpile/cache/region.py (line 1197): metadata, _, payload = value.partition(b'|')
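A toy illustration of that partition call, assuming a "metadata|payload" byte layout (the actual stored format in the patch may differ):

import json

metadata = json.dumps({"ct": 1234567890.0, "v": 1}).encode("ascii")
payload = b"user payload, which may itself contain | characters"
stored = metadata + b"|" + payload

meta_out, _, payload_out = stored.partition(b"|")  # splits on the first "|" only
assert meta_out == metadata
assert payload_out == payload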
mike bayer (zzzeek) wrote:
(1 comment)
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
- dogpile/cache/region.py (line 1197): nice
Federico Caselli (CaselIT) wrote:
(1 comment)
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
- dogpile/cache/region.py (line 482): In general I'm not too convinced regarding cast, since it adds a function call at runtime.
I'm not sure if this was added because mypy is not happy, but maybe we could use "# type: ignore" if it cannot infer it?
mike bayer (zzzeek) wrote: recheck
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
mike bayer (zzzeek) wrote:
(1 comment)
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
- dogpile/cache/region.py (line 482): this is for correct use of mypy. im not too concerned about fn call overhead. overall mypy is pointing us to all the ambiguous stuff in the API like arguments that take three types of values.
mike bayer (zzzeek) wrote: Code-Review+2 Workflow+2
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283
Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/dogpile.cache/+/2283 has been merged. Congratulations! :)
After this pull request sqlalchemy/dogpile.cache#191 Pickle is called on memory cache
This PR is a continuation of #190.
The idea is to have CacheBackend subclasses decoupled from the serialization logic. The serializer/deserializer can be fully configured by the end user, but each subclass can default to something that makes sense for its backend library.
For example, the Redis client and the file backend only work with bytes objects; the Memcached clients (pylibmc, python-memcached, bmemcached) can work with all sorts of objects, but they have debatable defaults for the pickle protocol they use.
The memory backend can work with all sorts of objects too, and it doesn't need to pickle at all.
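For illustration, those per-backend defaults could be summarized as follows (the names are hypothetical; this is a sketch of the intent, not the dogpile.cache source):

import pickle


def _noop(value):
    return value


BACKEND_SERIALIZER_DEFAULTS = {
    # Redis and the file backend can only store bytes, so default to pickle.
    "redis": {"serializer": pickle.dumps, "deserializer": pickle.loads},
    "file": {"serializer": pickle.dumps, "deserializer": pickle.loads},
    # Memcached client libraries already serialize values themselves,
    # so the default stays out of their way.
    "memcached": {"serializer": _noop, "deserializer": _noop},
    # The memory backend stores Python objects directly; no pickling needed.
    "memory": {"serializer": _noop, "deserializer": _noop},
}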