Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unified storage interfaces #1184

Merged
merged 28 commits into from
Feb 19, 2023
Merged

Unified storage interfaces #1184

merged 28 commits into from
Feb 19, 2023

Conversation

provinzkraut
Copy link
Member

@provinzkraut provinzkraut commented Feb 10, 2023

Create a new storage module providing abstraction for storing data in:

  • Redis
  • Memory
  • Files

These new backends are used by the cache as well as session backends, reducing code and test duplication.

Some more simplifications were made in the process:

  • Backends are now async only
  • Backends deal with bytes only (since this is our only use case for them)
  • Dropped locking from the Cache class and shifted into the storage backends where needed
  • Added renew-on-access functionality to storage backends, and server-side sessions

The SQLAlchemy session backend was removed since it was adding unnecessary complexity for a use case we don't want to encourage anyway.

TODO

  • Add proper namespacing for storages to allow selective pruning / clearing of different "substorages"
  • Fix session examples
  • Update documentation

PR Checklist

  • Have you followed the guidelines in CONTRIBUTING.rst?
  • Have you got 100% test coverage on new code?
  • Have you updated the prose documentation?
  • Have you updated the reference documentation?

@provinzkraut provinzkraut changed the title Unified storage interfaces RFC: Unified storage interfaces Feb 10, 2023
@provinzkraut provinzkraut added this to the 2.0 milestone Feb 10, 2023
Copy link
Contributor

@Goldziher Goldziher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it! great stuff.

starlite/middleware/session/server_side.py Outdated Show resolved Hide resolved
starlite/middleware/session/server_side.py Outdated Show resolved Hide resolved
starlite/middleware/session/server_side.py Outdated Show resolved Hide resolved
starlite/middleware/session/server_side.py Show resolved Hide resolved
starlite/storage/base.py Outdated Show resolved Hide resolved
starlite/storage/file_backend.py Outdated Show resolved Hide resolved
starlite/storage/file_backend.py Outdated Show resolved Hide resolved
starlite/storage/memcached_backend.py Outdated Show resolved Hide resolved
@provinzkraut
Copy link
Member Author

provinzkraut commented Feb 12, 2023

One thing this still needs for the memory and file backends is a way to (semi) automatically evict expired items. Especially for the in-memory storage when used as a cache this could become a problem otherwise, since items would only be evicted if you're trying to access them.

There are two general ways to go about this:

  1. Instruct the user to call delete_expired periodically from e.g. saq or a long running asyncio task (wouldn't work for trio then)
  2. Periodically evict expired items when the cache is accessed

For the second option, there are more things to consider:

When should eviction be performed?

  1. Time based
  2. Counter based
  3. Randomly

How should items be evicted?

  1. Check all items if they're expired, and if needed evict them. Since we can't perform this work in the background if this is done when interacting with the storage, it might potentially cause large spikes in response times for single responses
  2. Select a random sample, and if needed evict expired items. This is a bit more involved, and doesn't guarantee all possible space is freed, but has the benefit of potentially being more performant
  3. A hybrid solution of 1 and 2, foregoing random sampling below a certain threshold

Both solutions 1 and 2 can cause performance issues, depending on the size of the namespace. For smaller ones, checking every item would be more performant, while for larger ones, random sampling would be.


We could of course also make periodic eviction opt-in, and let the user decide how they want to handle it, based on their use case.

Another option would be to simply file this under "If you need this functionality, use a redis backend".

@yocalebo
Copy link

Using collections.deque and setting a maxlen (or have the user specify one) would be a relatively cheap way of ensuring the cache doesn't grow outside the bounds the user requested. Also, the additional functionality you describe is best served by redis imo. It's purpose built for this very task 😄

@provinzkraut
Copy link
Member Author

Using collections.deque and setting a maxlen (or have the user specify one) would be a relatively cheap way of ensuring the cache doesn't grow outside the bounds the user requested.

This isn't just a cache. These storage backends are also used for sessions, for which this would be very undesirable behavior.

Also, the additional functionality you describe is best served by redis imo

I kinda agree, that's why I mentioned it as an option. On the other hand, especially for use as a cache, in-memory storage that's not reliant on another service is very convenient, and for many use cases good enough.

@peterschutt
Copy link
Contributor

saq or a long running asyncio task (wouldn't work for trio then)

How about an after response hook?

Given there is no one-size-fits-all solution to clearing expired cache objects, and that we'd only really want to encourage using in-memory storage when the user knows what they are getting themselves into, I think we should document that the user has to take on that responsibility themselves, and perhaps offer some example solutions as you've listed above.

@provinzkraut provinzkraut force-pushed the unified-storage-interfaces branch 2 times, most recently from b924864 to 63f1559 Compare February 15, 2023 13:15
@provinzkraut provinzkraut changed the title RFC: Unified storage interfaces Unified storage interfaces Feb 15, 2023
@provinzkraut
Copy link
Member Author

provinzkraut commented Feb 15, 2023

Okay, this is mostly done now and only missing docs.

One additional thing we might want to do is to add even more unification by getting rid of the Cache abstraction. This could offer a more "integrated" DX. My proposal for that would be:

  • Remove Cache abstraction
  • Replace app.cache with app.storage (we currently "abuse" the cache for non-cache things as well, e.g. rate-limiting. Using storage, which is by definition a generic key/value store would make more sense semantically)
  • Change CacheConfig and ServerSideSessionConfig in such a way that they accepts an optional storage_backend, but if none is specified, falls back to app.storage. This could also be neatly integrated with redis virtual namespaces etc so they automatically get a STARLITE_CACHE / STARLITE_SESSIONS namespace assigned

@starlite-api/maintainers wdyt?

@Goldziher
Copy link
Contributor

Okay, this is mostly done now and only missing docs.

One additional thing we might want to do is to add even more unification by getting rid of the Cache abstraction. This could offer a more "integrated" DX. My proposal for that would be:

  • Remove Cache abstraction
  • Replace app.cache with app.storage (we currently "abuse" the cache for non-cache things as well, e.g. rate-limiting. Using storage, which is by definition a generic key/value store would make more sense semantically)
  • Change CacheConfig and ServerSideSessionConfig in such a way that they accepts an optional storage_backend, but if none is specified, falls back to app.storage. This could also be neatly integrated with redis virtual namespaces etc so they automatically get a STARLITE_CACHE / STARLITE_SESSIONS namespace assigned

@starlite-api/maintainers wdyt?

I think the cache abstraction makes it easy to work with the cache and offers a uniform interface. It's meant to be user facing and simple to operate, as a facade. Do you think your proposal will offer the same kind of DX?

As for storage instead if cache, i think semantics are arguable here and cache is better and easier.

@provinzkraut
Copy link
Member Author

I think the cache abstraction makes it easy to work with the cache and offers a uniform interface. It's meant to be user facing and simple to operate, as a facade. Do you think your proposal will offer the same kind of DX?

I think it would, since it removes one layer of abstraction that doesn't do much anyway. With this PR, Cache just proxies all its methods to the Storage 1:1, it only adds the key builder method. This logic however is specific to request caching, and could just live there. If we were to expose the storage as a general storage, it would make sense to not have request-specific things there.

I'd argue it would actually be a better DX to have things more integrated, but I guess that's a general question whether or not we want to go that direction. I'm not yet sure what the best API for this is though, maybe something like the getLogger, where you can request a specific cache / type of cache?

As for storage instead if cache, i think semantics are arguable here and cache is better and easier.

I'm impartial to naming it either way.

@Goldziher
Copy link
Contributor

I think the cache abstraction makes it easy to work with the cache and offers a uniform interface. It's meant to be user facing and simple to operate, as a facade. Do you think your proposal will offer the same kind of DX?

I think it would, since it removes one layer of abstraction that doesn't do much anyway. With this PR, Cache just proxies all its methods to the Storage 1:1, it only adds the key builder method. This logic however is specific to request caching, and could just live there. If we were to expose the storage as a general storage, it would make sense to not have request-specific things there.

I'd argue it would actually be a better DX to have things more integrated, but I guess that's a general question whether or not we want to go that direction. I'm not yet sure what the best API for this is though, maybe something like the getLogger, where you can request a specific cache / type of cache?

As for storage instead if cache, i think semantics are arguable here and cache is better and easier.

I'm impartial to naming it either way.

The important thing is that the method names and signatures are standardized by the abstraction. If this is the case in your refactor and you always have the same methods / signatures etc, you can remove the cache abstraction.

@provinzkraut
Copy link
Member Author

provinzkraut commented Feb 17, 2023

If this is the case in your refactor and you always have the same methods / signatures etc, you can remove the cache abstraction.

It is! That's one of the main reasons for this refactor.

There is one additional method on the file and memory storage, but they all implement the Storage interface, which has the core functionality for get, set, delete, etc.

@provinzkraut provinzkraut mentioned this pull request Feb 18, 2023
4 tasks
@provinzkraut provinzkraut marked this pull request as ready for review February 18, 2023 20:28
@provinzkraut provinzkraut requested a review from a team as a code owner February 18, 2023 20:28
@provinzkraut
Copy link
Member Author

Okay, I want to take the steps towards removing the Cache abstraction, and possibly integrating the storage pattern more, but I think that should be in a separate PR. There's also other things that should be taken care of first, so I'd like to get this merged for now.

@provinzkraut provinzkraut merged commit cd19b95 into main Feb 19, 2023
@provinzkraut provinzkraut deleted the unified-storage-interfaces branch February 19, 2023 14:00
@sonarcloud
Copy link

sonarcloud bot commented Feb 19, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

96.8% 96.8% Coverage
0.0% 0.0% Duplication

@provinzkraut provinzkraut mentioned this pull request Mar 10, 2023
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants