RedisCache - Data Persistence by lemonsaurus · Pull Request #947 · python-discord/bot

lemonsaurus · 2020-05-17T01:57:01Z

This pull requests adds a new feature that the bot can use to add persistent storage.

This works by utilizing a Redis server as a cache, while providing a user-friendly dict-like interface.

Let's have a look at how it works and exactly what this pull request contains.

Creating a new RedisCache

The star of the show is the RedisCache. This must be created as a class attribute.

class SomeClass:
    my_cache = RedisDict()

When we create it as a class attribute, it will automatically create a new hashmap in Redis that uses the class name and the attribute name as its components - for example, SomeClass.my_cache, in this case.

Using a RedisCache

Once you have created this object, it's a lot like an async-friendly dictionary interface. Redis only supports strings and numbers, so all keys and values must be ints, floats, or strings. If you need to store something more complex inside of it, consider writing a to_json and from_json method for your complex object, so you can easily jsonify the object going in and out of the cache.

The following features are available, and mostly behaves like their dictionary counterparts:

.get(key, default=None) - Get a value. If the key doesn't exist, returns default instead.
.set(key, value) - Set a value in the cache.
.delete(key) - Delete a specified key value pair.
.clear() - Clear the entire cache.
.contains(key) - Return True if key exists in cache, otherwise False.
.pop(key, default=None) - Get a value and remove it from the cache. If it doesn't exist, return default instead.
.items() - This returns an AsyncIterator, so you need to async for to iterate it.
.update(dictionary) - Update the cache with values in the provided dict. This will overwrite the value for any key that's already in the cache.
.to_dict() - Returns a dictionary with all key value pairs in the cache.
.increment(key, amount) - increments the value at key by amount
.decrement(key, amount) - decrements the value at key by amount

Valid keys are integers and strings.
Valid values are integers, strings, and floats.

What else is in this PR?

Well, there's a full suite of tests, and we've added Redis to the docker-compose file so it will be possible to use this when running the bot locally.

We also add a few new dependencies, all Redis related.

How do I test this?

Well, you'll need to add a few new variables to your config.yml. Unless you plan to set up Redis (there's a docker-compose service if you want to), you can use this config:

bot: 
    redis:
        host:  "redis"
        port:  6379
        password: ""
        use_fakeredis: true

This will allow the bot to run with fakeredis instead of trying to connect to a Redis server, so you will not need to set up any additional dependencies.

Closes #884

This is almost hilariously easy since we can just use the official image for it.

This is the module we will be using to interface with Redis.

We're using __init_subclass__ to initialize our RedisDict with the subclass name as a namespace. This will be prefixed to all data that we store, so that there won't be collisions between different subclasses.

It was brought to my attention that we may need several caches per Cog for some of our Cogs. This means that the original approach of having this be a mixin is a little bit problematic. Instead, RedisDict will be instantiated directly inside the class you want it in. By leveraging __set_name__, we can create a namespace containing both the class name and the variable name without the user having to provide anything. For example, if you create an attribute MyClass.cache = RedisDict(), this will be using the redis namespace 'MyClass.cache.' before anything you store in it. With this approach, it is also possible to instantiate a RedisDict with a custom namespace by simply passing it into the constructor. - RedisDict("firedog") will create items with the 'firedog.your_item' prefix. - If there are multiple RedisDicts using the same namespace, an underscore will be appended to the namespace, such that the second RedisDict("firedog") will actually create items in the 'firedog_.your_item' namespace. This is also possible to use outside of classes, so long as you provide a custom namespace when you instantiate it. Custom namespaces will always take precedence over automatic 'Class.attribute_name' ones.

The rest of the features should be provided by the MutableMapping abc we're interfacing. Specifically, MutableMapping provides these: .pop, .popitem, .clear, .update, .setdefault, __contains__, .keys, .items, .values, .get, __eq__, and __ne__.

This would've been implemented by MutableMapping, but that implementation is O(n) instead of O(1) since it just iterates the entire hash and does HDEL. Feels wasteful.

This is supposed to be provided by our MutableMapping mixin, but unit tests are demonstrating that these don't really work as intended.

Turns out the MutableMapping class doesn't give us servicable implementations of these, so we need to implement them ourselves. Also, let's not have keys returned as bytestrings.

Turns out that bumping the flake8 version up to 3.8 introduces a long list of new linting errors. Since this PR is the one that bumps the version, I suppose we will also fix all the linting errors in this branch.

kwzrd

I haven't tested yet, but I wanted to leave a couple of initial comments and questions. I have no experience with Redis so this is a learning opportunity for me.

It has taken me some time to figure out how the namespace is inferred when not provided explicitly. I appreciate the effort to have it done automatically, but I found the underlying machinery a little bit confusing. I'm not entirely convinced that it is necessary, either. Couldn't each cog simply provide a namespace explicitly? Would it be too difficult to detect potential collisions?

The PR description shows 3 ways in which the RedisDict can be instantiated. Could we also have this in the class docstring? My instinct would be to require that a namespace is provided everytime we instantiate RedisDict - seems like that would make the initialisation process be a lot simpler and transparent. It would also give us the fabled "one, obvious way to do it".

@MarkKoz

Thanks @MarkKoz!

@MarkKoz

This just tests that the various RuntimeErrors are reachable - that includes the error about not having a bot instance, the one about not being a class attribute, and the one about not having instantiated the class. This test addresses a concern raised by @MarkKoz in a review. I've decided not to test that actual contents of these RuntimeErrors, because I believe that sort of testing is a bit too brittle. It shouldn't break a test just to change the content of an error string.

kwzrd · 2020-05-27T10:04:34Z

During my testing yesterday I also found out that the test cases aren't entirely isolated in the sense that values in the cache persist between them. My approach would have been to make sure the cache is cleared before each test (probably in the setUp), but I don't require that this is done. I just thought I'd mention this in case it's not intentional.

aeros · 2020-05-27T10:19:10Z

@kwzrd

My approach would have been to make sure the cache is cleared before each test (probably in the setUp), but I don't require that this is done.

I agree that it would make sense to clear the cache in between tests, otherwise it ends up being somewhat subject to side effects from previous tests and can lead to inconsistent results depending on the order the tests are executed in (which is typically something to avoid).

lemonsaurus · 2020-05-27T10:22:30Z

My approach would have been to make sure the cache is cleared before each test (probably in the setUp)

Addressed in 35a1de3

@aeros

The way we were doing the asyncio.Lock() stuff for increment was slightly problematic. @aeros has adviced us that it's better to just initialize the lock as None in __init__, and then initialize it inside the first coroutine that uses it instead. This ensures that the correct loop gets attached to the lock, so we don't end up getting errors like this one: RuntimeError: got Future <Future pending> attached to a different loop This happens because the lock and the actual calling coroutines aren't on the same loop. When creating a new test, test_increment_lock, we discovered that we needed a small refactor here and also in the test class to make this new test pass. So, now we're creating a DummyCog for every test method, and this will ensure the loop streams never cross. Cause we all know we must never cross the streams.

ks129

This looks very good for me. I see so much use cases for this.

Also added a test for this. This is the DRYest approach I could find. It's a little ugly, but I think it's probably good enough.

We're using functools.partialmethod to make the code a little cleaner and more readable here. Read more about them here: https://docs.python.org/3/library/functools.html#functools.partial https://docs.python.org/3/library/functools.html#functools.partialmethod

I should be shot.

- Refactor error messages in _to_typestring and _from_typestring to just print the prefix tuples instead of that custom error string. - Create a RedisKeyOrValue type to simplify some annotations. - Simplify partialmethod calls. - Make the signatures for _to_typestring and _from_typestring one-liners - Fix a typo in the errors.

kwzrd

Looks good. I did a function review yesterday & a smaller one today after the changes and everything seems to work fine. The real test will happen once a cog actually begins using the cache. All my comments were addressed so I'm happy to approve. Solid work @lemonsaurus!

Before merge, let's wait for @SebastiaanZ's approval.

@aeros

The bot can get into trouble in three distinct ways: - It has no Bot instance - It has no namespace - It has no parent instance. These happen only if you're using it wrong. To make the test more precise, and to add a little bit more readability (RuntimeError could be anything!), we'll introduce some custom exceptions for these three states. This addresses a review comment by @aeros.

@aeros

This addresses a review comment by @aeros.

Leon Sandøy added 13 commits May 16, 2020 12:23

Adding redis to docker-compose file.

3aeb9e6

This is almost hilariously easy since we can just use the official image for it.

Adding redis-py to the Pipfile

5382dd8

This is the module we will be using to interface with Redis.

Boilerplate for the RedisCacheMixin

7a501fd

We're using __init_subclass__ to initialize our RedisDict with the subclass name as a namespace. This will be prefixed to all data that we store, so that there won't be collisions between different subclasses.

Add basic dict methods for RedisDict.

ee8386e

The rest of the features should be provided by the MutableMapping abc we're interfacing. Specifically, MutableMapping provides these: .pop, .popitem, .clear, .update, .setdefault, __contains__, .keys, .items, .values, .get, __eq__, and __ne__.

copy should dictify the .items(), not just keys.

5859711

Implements .clear with hash deletion.

9eeee1c

This would've been implemented by MutableMapping, but that implementation is O(n) instead of O(1) since it just iterates the entire hash and does HDEL. Feels wasteful.

Implement .get, equality, and membership check

677a7f7

This is supposed to be provided by our MutableMapping mixin, but unit tests are demonstrating that these don't really work as intended.

Add fakeredis to the Pipfile

0843728

Implement .pop, .popitem and .setdefault.

7cf0e83

Turns out the MutableMapping class doesn't give us servicable implementations of these, so we need to implement them ourselves. Also, let's not have keys returned as bytestrings.

Test suite for the redis dict.

bf6c113

Make redis host and port configurable.

258ad9d

namespace "general" -> "global"

8456023

lemonsaurus requested a review from a team as a code owner May 17, 2020 01:57

lemonsaurus requested review from SebastiaanZ, eivl and scragly and removed request for a team May 17, 2020 01:57

lemonsaurus and others added 2 commits May 17, 2020 04:05

Merge branch 'master' into redis_persistence

f19c38c

Fix linting errors introduced by flake8 3.8

e993566

Turns out that bumping the flake8 version up to 3.8 introduces a long list of new linting errors. Since this PR is the one that bumps the version, I suppose we will also fix all the linting errors in this branch.

lemonsaurus mentioned this pull request May 17, 2020

Update pyflakes to version 2.2.0 #884

Closed

Merge branch 'master' into redis_persistence

79f62db

kwzrd self-requested a review May 17, 2020 14:46

kwzrd closed this May 17, 2020

kwzrd reopened this May 17, 2020

kwzrd reviewed May 17, 2020

View reviewed changes

Comment thread bot/utils/redis_dict.py Outdated

Comment thread bot/utils/redis_dict.py Outdated

Comment thread bot/utils/redis_dict.py Outdated

Comment thread bot/utils/redis_dict.py Outdated

Comment thread bot/utils/redis_dict.py Outdated

Comment thread bot/utils/redis_dict.py Outdated

kwzrd added a: backend Related to internal functionality and utilities (error_handler, logging, security, utils and core) a: dependencies Related to package dependencies and management p: 1 - high High Priority status: needs review labels May 17, 2020

Leon Sandøy and others added 4 commits May 26, 2020 19:07

Improve some docstrings for RedisCache.

46a377d

Thanks @MarkKoz!

Swap the order for the validate_cache checks.

ec8205c

Merge branch 'master' into redis_persistence

38fcc10

MarkKoz approved these changes May 26, 2020

View reviewed changes

lemonsaurus mentioned this pull request May 27, 2020

Permit moderation commands in ModMail category #961

Merged

aeros reviewed May 27, 2020

View reviewed changes

Comment thread bot/utils/redis_cache.py

aeros reviewed May 27, 2020

View reviewed changes

Comment thread tests/bot/utils/test_redis_cache.py

Clear cache in asyncSetUp instead of tests.

35a1de3

ks129 approved these changes May 27, 2020

View reviewed changes

Leon Sandøy and others added 4 commits May 27, 2020 20:15

Floats are no longer permitted as RedisCache keys.

4db313e

Also added a test for this. This is the DRYest approach I could find. It's a little ugly, but I think it's probably good enough.

Fix ATROCIOUS comment.

63b81b0

I should be shot.

Merge branch 'master' into redis_persistence

44c3091

aeros reviewed May 27, 2020

View reviewed changes

Comment thread tests/bot/utils/test_redis_cache.py

Leon Sandøy added 2 commits May 28, 2020 00:44

Make prefix consts private and more precise.

11542fb

kwzrd approved these changes May 27, 2020

View reviewed changes

aeros reviewed May 28, 2020

View reviewed changes

Comment thread bot/utils/redis_cache.py Outdated

Prevent a state where a coro could wait forever.

b7c30d4

This addresses a review comment by @aeros.

SebastiaanZ reviewed May 28, 2020

View reviewed changes

Comment thread bot/bot.py Outdated

Move the self.redis_closed into session create.

cc45960

SebastiaanZ approved these changes May 28, 2020

View reviewed changes

SebastiaanZ merged commit aa5aa24 into master May 28, 2020

SebastiaanZ deleted the redis_persistence branch May 28, 2020 10:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

RedisCache - Data Persistence #947

RedisCache - Data Persistence #947
SebastiaanZ merged 79 commits into
masterfrom
redis_persistence

lemonsaurus commented May 17, 2020 •

edited

Loading

Uh oh!

kwzrd left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kwzrd commented May 27, 2020

Uh oh!

aeros commented May 27, 2020

Uh oh!

lemonsaurus commented May 27, 2020

Uh oh!

ks129 left a comment

Uh oh!

Uh oh!

kwzrd left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Uh oh!

Conversation

lemonsaurus commented May 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Creating a new RedisCache

Using a RedisCache

What else is in this PR?

How do I test this?

Uh oh!

kwzrd left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kwzrd commented May 27, 2020

Uh oh!

aeros commented May 27, 2020

Uh oh!

lemonsaurus commented May 27, 2020

Uh oh!

ks129 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kwzrd left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

lemonsaurus commented May 17, 2020 •

edited

Loading