Skip to content

RedisCache - Data Persistence #947

Merged
SebastiaanZ merged 79 commits into
masterfrom
redis_persistence
May 28, 2020
Merged

RedisCache - Data Persistence #947
SebastiaanZ merged 79 commits into
masterfrom
redis_persistence

Conversation

@lemonsaurus
Copy link
Copy Markdown
Contributor

@lemonsaurus lemonsaurus commented May 17, 2020

This pull requests adds a new feature that the bot can use to add persistent storage.

This works by utilizing a Redis server as a cache, while providing a user-friendly dict-like interface.

Let's have a look at how it works and exactly what this pull request contains.

Creating a new RedisCache

The star of the show is the RedisCache. This must be created as a class attribute.

class SomeClass:
    my_cache = RedisDict()
  • When we create it as a class attribute, it will automatically create a new hashmap in Redis that uses the class name and the attribute name as its components - for example, SomeClass.my_cache, in this case.

Using a RedisCache

Once you have created this object, it's a lot like an async-friendly dictionary interface. Redis only supports strings and numbers, so all keys and values must be ints, floats, or strings. If you need to store something more complex inside of it, consider writing a to_json and from_json method for your complex object, so you can easily jsonify the object going in and out of the cache.

The following features are available, and mostly behaves like their dictionary counterparts:

  • .get(key, default=None) - Get a value. If the key doesn't exist, returns default instead.
  • .set(key, value) - Set a value in the cache.
  • .delete(key) - Delete a specified key value pair.
  • .clear() - Clear the entire cache.
  • .contains(key) - Return True if key exists in cache, otherwise False.
  • .pop(key, default=None) - Get a value and remove it from the cache. If it doesn't exist, return default instead.
  • .items() - This returns an AsyncIterator, so you need to async for to iterate it.
  • .update(dictionary) - Update the cache with values in the provided dict. This will overwrite the value for any key that's already in the cache.
  • .to_dict() - Returns a dictionary with all key value pairs in the cache.
  • .increment(key, amount) - increments the value at key by amount
  • .decrement(key, amount) - decrements the value at key by amount

Valid keys are integers and strings.
Valid values are integers, strings, and floats.

What else is in this PR?

Well, there's a full suite of tests, and we've added Redis to the docker-compose file so it will be possible to use this when running the bot locally.

We also add a few new dependencies, all Redis related.

How do I test this?

Well, you'll need to add a few new variables to your config.yml. Unless you plan to set up Redis (there's a docker-compose service if you want to), you can use this config:

bot: 
    redis:
        host:  "redis"
        port:  6379
        password: ""
        use_fakeredis: true

This will allow the bot to run with fakeredis instead of trying to connect to a Redis server, so you will not need to set up any additional dependencies.

Closes #884

Leon Sandøy added 13 commits May 16, 2020 12:23
This is almost hilariously easy since we can just use
the official image for it.
This is the module we will be using to interface with Redis.
We're using __init_subclass__ to initialize our RedisDict with the
subclass name as a namespace. This will be prefixed to all data that
we store, so that there won't be collisions between different
subclasses.
It was brought to my attention that we may need several caches per Cog
for some of our Cogs. This means that the original approach of having
this be a mixin is a little bit problematic.

Instead, RedisDict will be instantiated directly inside the class you
want it in. By leveraging __set_name__, we can create a namespace
containing both the class name and the variable name without the user
having to provide anything.

For example, if you create an attribute MyClass.cache = RedisDict(),
this will be using the redis namespace 'MyClass.cache.' before anything
you store in it.

With this approach, it is also possible to instantiate a RedisDict with
a custom namespace by simply passing it into the constructor.

- RedisDict("firedog") will create items with the 'firedog.your_item'
  prefix.
- If there are multiple RedisDicts using the same namespace, an
  underscore will be appended to the namespace, such that the second
  RedisDict("firedog") will actually create items in the
  'firedog_.your_item' namespace.

This is also possible to use outside of classes, so long as you provide
a custom namespace when you instantiate it.

Custom namespaces will always take precedence over automatic
'Class.attribute_name' ones.
The rest of the features should be provided by the MutableMapping abc
we're interfacing. Specifically, MutableMapping provides these:
.pop, .popitem, .clear, .update, .setdefault, __contains__, .keys,
.items, .values, .get, __eq__, and __ne__.
This would've been implemented by MutableMapping, but that
implementation is O(n) instead of O(1) since it just iterates the
entire hash and does HDEL. Feels wasteful.
This is supposed to be provided by our MutableMapping mixin, but unit
tests are demonstrating that these don't really work as intended.
Turns out the MutableMapping class doesn't give us servicable
implementations of these, so we need to implement them ourselves.

Also, let's not have keys returned as bytestrings.
@lemonsaurus lemonsaurus requested a review from a team as a code owner May 17, 2020 01:57
@lemonsaurus lemonsaurus requested review from SebastiaanZ, eivl and scragly and removed request for a team May 17, 2020 01:57
lemonsaurus and others added 2 commits May 17, 2020 04:05
Turns out that bumping the flake8 version up to 3.8 introduces a long
list of new linting errors. Since this PR is the one that bumps the
version, I suppose we will also fix all the linting errors in this
branch.
@kwzrd kwzrd self-requested a review May 17, 2020 14:46
@kwzrd kwzrd closed this May 17, 2020
@kwzrd kwzrd reopened this May 17, 2020
Copy link
Copy Markdown
Contributor

@kwzrd kwzrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested yet, but I wanted to leave a couple of initial comments and questions. I have no experience with Redis so this is a learning opportunity for me.

It has taken me some time to figure out how the namespace is inferred when not provided explicitly. I appreciate the effort to have it done automatically, but I found the underlying machinery a little bit confusing. I'm not entirely convinced that it is necessary, either. Couldn't each cog simply provide a namespace explicitly? Would it be too difficult to detect potential collisions?

The PR description shows 3 ways in which the RedisDict can be instantiated. Could we also have this in the class docstring? My instinct would be to require that a namespace is provided everytime we instantiate RedisDict - seems like that would make the initialisation process be a lot simpler and transparent. It would also give us the fabled "one, obvious way to do it".

Comment thread bot/utils/redis_dict.py Outdated
Comment thread bot/utils/redis_dict.py Outdated
Comment thread bot/utils/redis_dict.py Outdated
Comment thread bot/utils/redis_dict.py Outdated
Comment thread bot/utils/redis_dict.py Outdated
Comment thread bot/utils/redis_dict.py Outdated
@kwzrd kwzrd added a: backend Related to internal functionality and utilities (error_handler, logging, security, utils and core) a: dependencies Related to package dependencies and management p: 1 - high High Priority status: needs review labels May 17, 2020
Leon Sandøy and others added 4 commits May 26, 2020 19:07
This just tests that the various RuntimeErrors are reachable - that
includes the error about not having a bot instance, the one about not
being a class attribute, and the one about not having instantiated the
class.

This test addresses a concern raised by @MarkKoz in a review.

I've decided not to test that actual contents of these RuntimeErrors,
because I believe that sort of testing is a bit too brittle. It
shouldn't break a test just to change the content of an error string.
Comment thread bot/utils/redis_cache.py
Comment thread tests/bot/utils/test_redis_cache.py
@kwzrd
Copy link
Copy Markdown
Contributor

kwzrd commented May 27, 2020

During my testing yesterday I also found out that the test cases aren't entirely isolated in the sense that values in the cache persist between them. My approach would have been to make sure the cache is cleared before each test (probably in the setUp), but I don't require that this is done. I just thought I'd mention this in case it's not intentional.

@aeros
Copy link
Copy Markdown
Contributor

aeros commented May 27, 2020

@kwzrd

My approach would have been to make sure the cache is cleared before each test (probably in the setUp), but I don't require that this is done.

I agree that it would make sense to clear the cache in between tests, otherwise it ends up being somewhat subject to side effects from previous tests and can lead to inconsistent results depending on the order the tests are executed in (which is typically something to avoid).

@lemonsaurus
Copy link
Copy Markdown
Contributor Author

My approach would have been to make sure the cache is cleared before each test (probably in the setUp)

Addressed in 35a1de3

The way we were doing the asyncio.Lock() stuff for increment was
slightly problematic. @aeros has adviced us that it's better to just
initialize the lock as None in __init__, and then initialize it inside
the first coroutine that uses it instead. This ensures that the correct
loop gets attached to the lock, so we don't end up getting errors like
this one:

RuntimeError: got Future <Future pending> attached to a different loop

This happens because the lock and the actual calling coroutines aren't
on the same loop. When creating a new test, test_increment_lock, we
discovered that we needed a small refactor here and also in the test
class to make this new test pass.

So, now we're creating a DummyCog for every test method, and this will
ensure the loop streams never cross. Cause we all know we must never
cross the streams.
Copy link
Copy Markdown
Contributor

@ks129 ks129 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very good for me. I see so much use cases for this.

Leon Sandøy and others added 4 commits May 27, 2020 20:15
Also added a test for this.

This is the DRYest approach I could find. It's a little ugly, but I
think it's probably good enough.
We're using functools.partialmethod to make the code a little cleaner
and more readable here.

Read more about them here:
https://docs.python.org/3/library/functools.html#functools.partial
https://docs.python.org/3/library/functools.html#functools.partialmethod
I should be shot.
Comment thread tests/bot/utils/test_redis_cache.py
Leon Sandøy added 2 commits May 28, 2020 00:44
- Refactor error messages in _to_typestring and _from_typestring to just
  print the prefix tuples instead of that custom error string.
- Create a RedisKeyOrValue type to simplify some annotations.
- Simplify partialmethod calls.
- Make the signatures for _to_typestring and _from_typestring one-liners
- Fix a typo in the errors.
Copy link
Copy Markdown
Contributor

@kwzrd kwzrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I did a function review yesterday & a smaller one today after the changes and everything seems to work fine. The real test will happen once a cog actually begins using the cache. All my comments were addressed so I'm happy to approve. Solid work @lemonsaurus!

Before merge, let's wait for @SebastiaanZ's approval.

The bot can get into trouble in three distinct ways:
- It has no Bot instance
- It has no namespace
- It has no parent instance.

These happen only if you're using it wrong. To make the test more
precise, and to add a little bit more readability (RuntimeError could be
anything!), we'll introduce some custom exceptions for these three
states.

This addresses a review comment by @aeros.
Comment thread bot/utils/redis_cache.py Outdated
This addresses a review comment by @aeros.
Comment thread bot/bot.py Outdated
@SebastiaanZ SebastiaanZ merged commit aa5aa24 into master May 28, 2020
@SebastiaanZ SebastiaanZ deleted the redis_persistence branch May 28, 2020 10:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

a: backend Related to internal functionality and utilities (error_handler, logging, security, utils and core) a: dependencies Related to package dependencies and management p: 1 - high High Priority t: feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update pyflakes to version 2.2.0

7 participants