
Solid Cache should be the default caching backend for Rails 8 #50443

Open
dhh opened this issue Dec 26, 2023 · 27 comments
@dhh
Member

dhh commented Dec 26, 2023

Like with Solid Queue, Solid Cache gives us a database-agnostic backend for Rails.cache that works well as an out-of-the-box default in production – without any configuration needed or dependencies (like Redis) required.

The tables should be set up out of the box with "rails new", but you should be able to opt out using --skip-solid-cache or just --skip-solid.
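As proposed, the generator usage would look like this (flag names are from this issue and could change before release):

```shell
rails new myapp                      # Solid Cache tables set up by default
rails new myapp --skip-solid-cache   # opt out of Solid Cache only
rails new myapp --skip-solid         # opt out of all the Solid components
```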

Work outstanding:

  • Add size-based trimming to prevent SC from filling up tiny DBs.
  • Release Solid Cache 1.0

cc @djmb

@dhh dhh added the railties label Dec 26, 2023
@dhh dhh added this to the 8.0.0 milestone Dec 26, 2023
@frederikspang

Could this be a switch like --cache=memory,solid,redis, like we have for --database and --javascript, or are we going for a "Solid Cache or set it up yourself" kinda direction?

@inkyvoxel

inkyvoxel commented Dec 27, 2023

Would --skip-solid skip all the 'Solid' features, e.g. Cache and Queue? (I'm sure there's a Snake hiding somewhere here too)

@byroot
Member

byroot commented Dec 27, 2023

As mentioned in person in Amsterdam, I don't think SolidCache is a good default because:

  • It's not LRU but FIFO, so it needs a lot of space to thrive
  • It needs a specially configured backing DB to perform well.
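The FIFO-vs-LRU distinction can be illustrated with a minimal sketch (Hash-backed toy stores, not Solid Cache's implementation): FIFO evicts the oldest write even when the entry is hot, while LRU refreshes an entry's position on every read.

```ruby
# Toy stores contrasting FIFO and LRU eviction (illustrative only).
class FifoCache
  def initialize(limit)
    @limit = limit
    @h = {}  # Ruby hashes preserve insertion order
  end

  def read(key)
    @h[key]
  end

  def write(key, value)
    @h.delete(key)                   # re-writing counts as a fresh insertion
    @h[key] = value
    @h.shift while @h.size > @limit  # evict the oldest-written entry
  end
end

class LruCache < FifoCache
  def read(key)
    return nil unless @h.key?(key)
    @h[key] = @h.delete(key)  # touch: move the entry to the back on read
  end
end

fifo = FifoCache.new(2)
lru  = LruCache.new(2)

[fifo, lru].each do |cache|
  cache.write(:a, 1)
  cache.write(:b, 2)
  cache.read(:a)      # :a is hot
  cache.write(:c, 3)  # one entry must be evicted
end

puts fifo.read(:a).inspect  # => nil -- FIFO evicted the hot entry anyway
puts lru.read(:a).inspect   # => 1   -- LRU kept it and evicted :b instead
```

This is the space cost byroot refers to: FIFO never learns which entries are hot, so it needs enough headroom that hot entries get rewritten before they reach the eviction end. The upside is that reads require no bookkeeping writes, which matters for a relational backing store.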

Given that a very large part of users (especially people trying Rails for the very first time) are using a managed database with fairly low limits on the number of rows and total storage, it would be a big footgun.

Even for users going the rented server + Kamal route, hosting a memcached with default settings will likely perform better and be less work than setting up a second MySQL or Postgres server with a tweaked config.

And even for users going with SQLite, it means they are on a single server, so FileStore should be adequate.

I agree it's annoying we can't have a cache store setup by default, but I don't think SolidCache is the solution.

If anything, I'd be more in favor of enabling the FileStore as a default (even though it makes cache.delete inconsistent if you have multiple servers).
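For reference, making FileStore the default would come down to the standard one-line setting (real Rails API; the path shown is the conventional location):

```ruby
# config/environments/production.rb
# Cache on the local disk. Note the caveat discussed in this thread:
# entries accumulate until `Rails.cache.cleanup` (expired entries only)
# or `bin/rails tmp:cache:clear` is run; there is no automatic size cap.
config.cache_store = :file_store, Rails.root.join("tmp/cache")
```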

@dhh
Member Author

dhh commented Dec 27, 2023

Here are the problems I'd like to solve:

  • The cache store should auto-limit itself out of the box. A default file store cache will fill the disk until it's out of space. Solid Cache can be set with a conservative limit, like, say, a 1-week cache for starters.
  • The cache store should be multi-machine by default. Adding two dynos for the same app should share the same cache. That rules out the file store, but would allow something like Redis.
  • The cache store should not require any moving parts beyond what's included with Rails by default. That makes Redis a tough fit.
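The conservative limit described above might look like this in Solid Cache's YAML config. The option names (max_age, max_entries) are believed to match Solid Cache's README at the time, the values are purely illustrative, and a byte-size cap was still an open work item in this thread:

```yaml
# config/solid_cache.yml (illustrative values only)
default: &default
  store_options:
    max_age: <%= 1.week.to_i %>  # time-based trim: the "1 week cache"
    max_entries: 100000          # entry cap to protect tiny databases

production:
  <<: *default
```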

But let's validate whether these design goals are compatible with the performance envelopes of common, low-end VMs.

I'd be surprised if you ran into DB-related issues, given a low default Solid Cache limit, before you ran out of other resources at the low end. If anything, the reverse may well be true: more effective and longer-lived caches make your app perform better, even if they lean on a constrained DB.

But this would be good to test! So let's open this issue to people who'd like to help us discern those factors. Try to set up Solid Cache on small VMs, run a bunch of benchmarks against an app that uses caching, and let's see where things might fall over.

Appreciate the concerns you raise, @byroot! It's absolutely possible to add bad caching to the mix and make things worse. So we need to avoid that.

@byroot
Member

byroot commented Dec 27, 2023

  • A default file store cache will fill the disk until it's out of space.

That's a good point. It's something that we could try to improve, though. Capping by byte size would be complicated, but we could cap the number of entries and evict LRU. Wouldn't be exactly ideal, but doable.
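The entry-cap idea could look something like this outside of Rails, with a hypothetical trim_cache_dir helper (not part of ActiveSupport) that evicts least-recently-modified files first:

```ruby
require "tmpdir"

# Hypothetical helper sketching an entry cap for a file-based cache:
# keep at most max_entries files, deleting the oldest (by mtime) first.
def trim_cache_dir(dir, max_entries)
  files = Dir.children(dir).map { |name| File.join(dir, name) }
  excess = files.size - max_entries
  return if excess <= 0

  files.sort_by { |path| File.mtime(path) }  # oldest first
       .first(excess)
       .each { |path| File.delete(path) }
end

Dir.mktmpdir do |dir|
  5.times do |i|
    path = File.join(dir, "entry#{i}")
    File.write(path, "value #{i}")
    File.utime(Time.now, Time.at(1_000 + i), path)  # deterministic mtimes
  end

  trim_cache_dir(dir, 3)
  puts Dir.children(dir).sort.inspect  # => ["entry2", "entry3", "entry4"]
end
```

A true LRU would need access times rather than mtimes, and atime is unreliable on filesystems mounted with relatime/noatime, which is part of why this is harder than it sounds.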

The cache store should be multi-machine by default. Adding two dynos for the same app should share the same cache. That also rules out the file store

Yes and no. It only rules it out if you delete or overwrite existing keys, which IMO isn't great design since most caches are eventually consistent, but that's a much longer debate 😄.

There is also the question of whether, once you have multiple machines, you aren't already past the point where it's OK to set up a dedicated cache service. I agree that Redis, being a Swiss Army knife, requires a careful config, but Memcached is beyond trivial to set up.

So that's where I don't quite follow the direction. As I believe we both agree, defaults should try to optimize for the most common use case; people who know what they are doing shouldn't be afraid to change the defaults.

So I'm trying to put myself in the shoes of someone starting a new Rails app that's not gonna handle a huge load right away.

In my mind they are either starting with a PaaS (e.g. Heroku, Fly, Render, etc):

  • Databases there have a limited number of rows and limited disk space.
  • They can't be tuned for Solid Cache.
  • It's very expensive to move to a higher tier.
  • That's where they are likely to have multiple dynos relatively early, because of how weak each dyno is.

Or they are starting with some cheap VPS or bare metal (e.g. capistrano or Kamal), in which case:

  • They will likely stick with a single machine for a long time, making a local store acceptable.
  • Even when they move to multiple machines, Memcached will be much easier to set up, monitor, etc. than one extra MySQL or PostgreSQL database.

That's why I don't see SolidCache fitting the bill for being the default. It's a bit too situational, and can turn into a footgun if it fills the limited database.

@dhh
Member Author

dhh commented Dec 28, 2023

What are those limits? On say Heroku? If you're running a tiny app on a tiny dyno, you'll presumably also have few users, and thus not much data to cache? So I think these things go together, but let's explore.

I think the current situation is not good. There's no default, persistent cache that won't fill up your disk. That's a problem we should fix. Solid Cache fixes that, but maybe in a way that demands too much of tiny DBs?

On small systems, though, I reckon you're more likely to be constrained by memory (Redis) and CPU (no cache) before disk (DB). Will explore some testing to validate this hypothesis.

@byroot
Member

byroot commented Dec 28, 2023

What are those limits? On say Heroku?

From: https://elements.heroku.com/addons/heroku-postgresql

| Name | Row Limit | Size Limit | Price |
| --- | --- | --- | --- |
| Mini | 10k | 1GB | $5 |
| Basic | 10M | 10GB | $9 |
| Standard 0 | None | 64GB | $50 |

I think the current situation is not good. [...] That's a problem we should fix.

I totally agree with the premise, I just don't see a solution 😢.

@dhh
Member Author

dhh commented Dec 28, 2023

Okay. If you're on that tiny tier, you may well not want to use that space for caching. But you could also simply not cache: by default, Rails doesn't actually cache anything.

I don't want to design primarily for such a poor setup. 1GB/10K seems like limits that probably made sense for Heroku in 2009 and then were just never updated since.

A $7 DO Droplet has 25GB of storage, which could be used by a database.

We are never going to satisfy all the constraints. But a default setup that's multi-machine, uses disks over RAM, and supports both auto-trimming and encryption out of the box seems superior to what we have now.

Then we can document what to do if you still want a large cache but live under the severe constraints of the smallest possible cloud VMs. Either way, even at 10K/1GB, you're going to be fine for a long time.

@dhh
Member Author

dhh commented Dec 28, 2023

But would love it if anyone is ALSO interested in improving the file store with trimming controls and encryption. Would be nice to have great options for files, DB, and Redis/Memcached alike.

@byroot
Member

byroot commented Dec 28, 2023

I don't want to design primarily for such a poor setup. [...] even at 10K/1GB, you're going to be fine for a long time.

Note that my apprehension isn't so much about being suitable for such small setups, but more about the failure mode. My big worry is someone creates a new Rails app, deploys it to one of these platforms (there are tons of tutorials for that), and it works fine for a while until suddenly the DB is full and everything falls apart.

If it were only taking down the cache I wouldn't mind, but here it would also take down the app's primary features.

It's really about "unknown unknowns". When a user opts in to a feature, we can consider them responsible for making sure it will work for their setup. When it's the default, it's more our responsibility to make sure it won't bite them.

improving the filestore [...] encryption

I've had a big refactoring of Active Support Cache in the back of my mind for a couple of years now, to solve a few perf problems but also to make this sort of stuff easier. Not sure if/when I'll get around to working on it, though.

@dhh
Member Author

dhh commented Dec 28, 2023

Yeah, filling up the DB with cache entries is a no-go. Let's make this contingent on having a space-based limit in Solid Cache rather than just the current time-based limit. Then we can ensure that we ship the default setting at 100-250MB, leaving lots of room for data on a 1GB-capped DB. We will get that sorted before proceeding 👌
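The space-based limit could work roughly like this (a toy in-memory sketch, not Solid Cache's actual trimmer, which would operate on the entries table in SQL): entries carry their byte size, and the trimmer evicts oldest-first until the total fits under the cap.

```ruby
# Toy model of size-based trimming for a cache with a byte budget.
Entry = Struct.new(:key, :byte_size, :created_at)

def trim_to_size(entries, max_bytes)
  kept = entries.sort_by(&:created_at)
  total = kept.sum(&:byte_size)
  while total > max_bytes && !kept.empty?
    total -= kept.shift.byte_size  # evict the oldest entry first
  end
  kept
end

entries = [
  Entry.new("fragment/a", 60, 1),
  Entry.new("fragment/b", 50, 2),
  Entry.new("fragment/c", 40, 3),
]

kept = trim_to_size(entries, 100)
puts kept.map(&:key).inspect  # => ["fragment/b", "fragment/c"]
```

In practice the store would need to estimate the total size cheaply rather than summing every row on each trim; that estimation problem is exactly what djmb's notes later in this thread are about.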

@dhh
Member Author

dhh commented Dec 28, 2023

But I would ALSO love to see a better file store with both auto-trimming and encryption.

@igorkasyanchuk
Contributor

I wish I could have solid_cache, file_store_cache, and redis_cache in one app.

e.g.

Rails.cache_storage(:redis).cache do ... end 
Rails.cache_storage(:memcached).cache do ... end 
# or
config.controller.cache_storage = :file

@dhh
Member Author

dhh commented Dec 29, 2023

@igorkasyanchuk Can you explain more why you need/want to use multiple stores?

@skatkov

skatkov commented Dec 30, 2023

Solid Cache with SQLite seems like a better choice than Filestore (probably in most cases).

SQLite seems more efficient at storage, reads less from disk, and was more performant in my synthetic tests.

@dhh

@igorkasyanchuk Can you explain more why you need/want to use multiple stores?

composite_cache_store explains really well why you might want to use multiple cache stores.

@dhh
Member Author

dhh commented Dec 31, 2023

Not sure that's worth the effort for most, but I'm not opposed to letting people use different stores in different blocks, like we do with the multi-db setup. PDI.

@skatkov

skatkov commented Jan 1, 2024

@byroot

I have a big refactoring of Active Support Cache in the back of my mind for a couple years now, to solve a few perf problems but also make this sort of stuff easier. Not sure if / when I'll get around to work on it though.

Would be interesting to hear what you would change? Somebody else might pick up this work (maybe me, I've been dabbling with rails cache related functionality last year).

@igorkasyanchuk
Contributor

@igorkasyanchuk Can you explain more why you need/want to use multiple stores?

For example, if I have a server with not a lot of RAM, I'd still want to use Memcached for some data, but use file storage for page/action caching because I don't want that to eat a lot of RAM.

@dhh
Member Author

dhh commented Jan 2, 2024

Is this a situation you've actually been in or a theory? Again, not necessarily against exploring it, but it's gotta be an extraction, not a speculation.

@simi
Contributor

simi commented Jan 2, 2024

I would appreciate a multi-backend setup for a transition period (for migrating from Memcached to Solid Cache, for example): deploy code able to serve from the old cache but warm the new one, to prevent losing the whole cache after the cache storage switch. That would make the transition to Solid Cache much smoother for apps that rely on the cache a lot, since a cold cache could put stress on the DB (or whatever other backend does the hard work that gets cached).

On the other hand, it could be done manually by temporarily initializing a second cache store by hand (like new_cache = ActiveSupport::Cache::MyStore.new) and handling the warmup manually. Could at least be mentioned in the guides.
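The manual approach could be sketched like this, with plain hashes standing in for the two stores (in a real app they'd be something like the existing MemCacheStore instance and a new SolidCache::Store; TransitionCache is a hypothetical wrapper, not a Rails class):

```ruby
# Serve from the new store first, fall back to the old one, and write
# through to the new store so it warms up as traffic flows.
class TransitionCache
  def initialize(old_store, new_store)
    @old = old_store
    @new = new_store
  end

  def fetch(key)
    if (hit = @new[key])
      hit
    elsif (hit = @old[key])
      @new[key] = hit   # warm the new store from the old one
      hit
    else
      value = yield
      @new[key] = value # misses only populate the new store
      value
    end
  end
end

old_store = { "greeting" => "hello" }  # pre-existing warm cache
new_store = {}
cache = TransitionCache.new(old_store, new_store)

cache.fetch("greeting") { "recomputed" }  # => "hello", copied to new_store
cache.fetch("other")    { "fresh" }       # => "fresh", written to new_store
puts new_store.keys.sort.inspect          # => ["greeting", "other"]
```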

@igorkasyanchuk
Contributor

Is this a situation you've actually been in or a theory? Again, not necessarily against exploring it, but it's gotta be an extraction, not a speculation.

Yes, for me it was a real case. It was some time ago and my app was very simple, with many almost-static pages and some stats that I wanted to cache in memory (for better performance). If in the future we have such flexibility, it would be great to be able to specify the storage per cache use.

And thanks for your questions.

@dhh
Member Author

dhh commented Jan 3, 2024

Gotcha, yeah, I like the idea of a certain cache store governing a block. Please do look into that.

@djmb
Contributor

djmb commented Jan 4, 2024

@igorkasyanchuk - you can set a different cache store for fragment caching already with config.action_controller.cache_store = ..... Would this have been enough for your situation?

@simi - this is what we used in Basecamp to switch from Redis to Solid Cache.

We assigned X percent of traffic to Solid Cache or Redis by hashing the cache key, and gradually shifted more traffic over the course of a week. I don't know if it would work as a generic cache splitter - we were only using it for Rails fragment caching, so there may be some cases or cache methods it doesn't work well with.
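The key-hashing split described above comes down to mapping each key to a stable bucket (Basecamp's splitter itself isn't public; CRC32 here is an assumption for illustration):

```ruby
require "zlib"

# Route a stable percentage of cache keys to the new store: each key hashes
# to a fixed bucket in 0...100, so raising `percent` over the week only ever
# moves keys *into* the new store, never back out.
def use_new_store?(key, percent)
  Zlib.crc32(key) % 100 < percent
end

keys = (1..1_000).map { |i| "views/fragments/#{i}" }
share = keys.count { |key| use_new_store?(key, 25) }
puts "#{share / 10.0}% of keys routed to the new store"  # roughly 25%
```

The monotonic property is what makes the gradual shift safe: a key that has already warmed the new store keeps hitting it as the percentage grows.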

@djmb
Contributor

djmb commented Jan 9, 2024

I've got some notes on how I'm planning to estimate the cache size here.

I'd also like to introduce an indexed key_hash column, which would be a 64-bit integer, so the lookup index can be more compact (@byroot - this was something you suggested to me at Rails World). I think, though, that this is only worth doing if we also drop the index on key, which means losing support for delete_matched.

Would that be acceptable, or would supporting delete_matched be a requirement if Solid Cache was going to be the Rails default?

MemCacheStore doesn't support it, and RedisCacheStore does it by scanning all the keys, which is not going to be a good idea with a large cache. I suppose we could have a similarly slow delete_matched that checks batches of records, but that seems like something to be avoided.
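The key_hash column idea boils down to deriving a fixed-width integer from the key; a sketch assuming the first 8 bytes of SHA-256 (Solid Cache's actual hash function may differ):

```ruby
require "digest"

# Derive a signed 64-bit integer from a cache key so the lookup index can
# be a compact fixed-width BIGINT instead of a variable-length string index.
def key_hash(key)
  Digest::SHA256.digest(key).unpack1("q>")  # first 8 bytes, big-endian signed
end

h = key_hash("views/posts/1-20240109")
puts h
puts (-2**63...2**63).cover?(h)  # => true: fits a BIGINT column
```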

@byroot
Member

byroot commented Jan 9, 2024

this was something you suggested to me at Rails World

Yup, that should quite drastically reduce your index size.

Would that be acceptable, or it be a requirement to support it if Solid Cache was going to be the Rails default?

IMO delete_matched should be deprecated and removed from all stores. It's really a terrible pattern that most K/V stores either can't handle at all, or can only handle with absolutely terrible performance. Even the file store and in-memory store do this with O(N) performance, so yeah, it's a total anti-pattern if you ask me.

I did ask on Campfire why it was there, though, and @dhh and @jeremy suggested a few use cases, like clearing the cache in development and clearing a customer's cache. IMO the former is better handled by clearing the entire cache, and the latter by just throwing away the decryption key, given your cache entries are encrypted.

All this to say I think delete_matched should be deprecated in 7.2, and removed in 8.0.

@djmb
Contributor

djmb commented Jan 9, 2024

I did ask on campfire why it was there though, and @dhh and @jeremy suggested a few use cases like clearing cache in development, and clearing a customer cache. IMO the former is better handled by clearing the entire cache, and the later by just throwing the decryption key given your cache entries are encrypted.

I'm going to keep the key column in the solid_cache_entries table but leave it unindexed, so we can confirm there wasn't a cache collision (not impossible with 64-bit hashes and very large caches). So if someone is motivated, they can add an index on key and implement delete_matched themselves.

All this to say I think delete_matched should be deprecated in 7.2, and removed in 8.0.

Sounds good! Another point in favour of this is that delete_matched implementations are not consistent anyway: MemoryStore and FileStore use a regex, RedisCacheStore uses Redis globs, and the current SolidCacheStore implementation uses a SQL LIKE.

@byroot
Member

byroot commented Jan 9, 2024

I'm going to keep the key column in the solid_cache_entries table, but leave it unindexed, so we can confirm there wasn't a cache collision (not impossible with 64 bit hashes and very large caches).

If that is a concern, you can store 128 bit hashes in a pair of bigint (int64) columns.
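The two-column variant can be sketched with MD5, chosen here purely because it is exactly 128 bits (collision resistance isn't the goal, only accidental-collision odds):

```ruby
require "digest"

# Split a 128-bit digest into two signed 64-bit integers, one per BIGINT
# column, shrinking accidental-collision odds from ~2^-64 to ~2^-128.
def key_hash_pair(key)
  Digest::MD5.digest(key).unpack("q>2")  # 16 bytes -> two big-endian int64s
end

hi, lo = key_hash_pair("views/posts/1")
puts [hi, lo].inspect  # two integers, each fitting a BIGINT column
```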

But yes, keeping the key doesn't cost much as long as it's not indexed, so why not.
