Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability to ignore counter cache columns while they are backfilling #51453

Merged
merged 1 commit into from Apr 2, 2024

Conversation

fatkodima
Copy link
Member

See the full feature discussion at https://discuss.rubyonrails.org/t/new-feature-to-make-introducing-counter-caches-safer-and-easier/85456/2

Starting to use counter caches on existing large tables can be troublesome, because the column values must be backfilled separately of the column addition (to not lock the table for too long) and before the use of :counter_cache (otherwise methods like size/any?/etc, which use counter caches internally, can produce incorrect results). People usually use database triggers or callbacks on child associations while backfilling before introducing a counter cache
configuration to the association.

Now, to safely backfill the column, while keeping the column updated with child records added/removed, use:

class Comment < ApplicationRecord
  belongs_to :post, counter_cache: { active: false }
end

While the counter cache is not "active", the methods like size/any?/etc will not use it, but get the results directly from the database. After the counter cache column is backfilled, simply remove the { active: false } part from the counter cache definition, and it will now be used by the mentioned methods.

cc @byroot

@jmarchello
Copy link

What if instead of having to modify the model, we simply assume the cache is inactive if its value is less than 0. Then the cache can be active/inactive on a per row basis rather than for an entire model/table. Additionally this allows the cache to automatically activate itself simply by being updated, rather than requiring a code change.

@fatkodima
Copy link
Member Author

While this is simpler to implement, unfortunately, this approach has some problems. For example:

  1. we need to teach people to set some negative value large enough that counter_cache: true won't eventually increment it to be >= 0 before the value is backfilled
  2. we need to still backfill the whole table to set this negative value for each row. We can just change the column's default value (which will be returned instead of NULLs), but then at the end we still need to revert this change to 0, for example
  3. some implicit behaviour (compared to explicit)

@trevorturk
Copy link
Contributor

I think this active: false idea is nice a simple, and would work well. The downsides I see are that it would require a deploy to remove the active: false and the counter cache cannot be used until the backfill is complete. I think that's totally acceptable and the Shopify issues as described would be solved, so it's good!

I was thinking of @jmarchello's comment, though, and wondering if we could allow the counter cache column to be nullable, which (if null) would essentially make that record's has_active_cached_counter? return false? I think that may be too much of a refactor to be worth considering, but perhaps an interesting idea. You'd have to dynamically set the default to 0 instead of relying on the database, which you mentioned as well.

Another related idea might be to leverage the created_at attribute, so you could mark a time after the counter cache migration was complete to consider those new records safe to use. You could even consider extending the idea so the logic could be custom as we have for Active Record Callbacks here: https://guides.rubyonrails.org/active_record_callbacks.html#using-if-and-unless-with-a-proc

belongs_to :post, counter_cache: Proc.new { |post| post.created_at >= ENV["COMMENTS_COUNTER_CACHE_CREATED_AT"] }

In any case, I think this PR is a great idea, I just wanted to share some idle chatter since I was wondering if there could be a simpler way, but I think the active: false idea is nice and simple and does the job. Thanks!

@matthewd
Copy link
Member

matthewd commented Apr 2, 2024

wondering if we could allow the counter cache column to be nullable

I like this as a conceptual design. SQL Null propagation semantics would mean that things like relative increments would all Just Work (read: remain null), and the only thing you need to do to start using the column is set it -- even incrementally, in the case of a large backfill.

All you would need to do would be to add the column with a null default, and then change its default to 0 -- old rows get null, new rows start at 0. And then backfill as the opportunity arises (maybe even automatically, if we're writing the row for some other reason?)

(But I'm saying this without having looked at the code, so I'm not sure how straightforward it would be to implement.)

@byroot
Copy link
Member

byroot commented Apr 2, 2024

All you would need to do would be to add the column with a null default, and then change its default to 0

This may work on postgres, and perhaps might work with MySQL new online alters, but for people using LHM / gh-ost etc, this isn't nice. Because if you need a week to introduce the nullable column, you'll need another week to change it's default.

It then runs into schema cache issues, you'll at least need a deploy for the change to be noticed by Rails. And I'm not even looking at the backward compatibility issue, because right now counter caches treat NULL as 0 (with COALESCE).

In the end disabling it in the model seem simpler to me, and is akin to Base.ignored_column, and more importantly, having Active Record increment/decrement counter but not use them while the DB is being backfilled means you have no race condition issues. I'm not so certain of that with the NULL solution (depends how you do the backfilling I guess)

Copy link
Member

@byroot byroot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come minor comments.

@fatkodima
Copy link
Member Author

Updated 👍

@byroot byroot merged commit 02f6c29 into rails:main Apr 2, 2024
3 of 4 checks passed
@fatkodima fatkodima deleted the active-counter-caches branch April 2, 2024 12:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants