Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destroy Associations in Background Job #36912

Closed
wants to merge 3 commits into from

Conversation

gwincr11
Copy link
Contributor

@gwincr11 gwincr11 commented Aug 12, 2019

Motivation:

  • Sometimes cascading association deletions can cause timeouts due to
    IO issue. Perhaps a model has associations that are destroyed on
    deletion which in turn trigger other deletions and this can continue
    down a complex tree. Along this tree you may also hit other IO
    operations. Such deep deletions can lead to server timeouts while
    awaiting completion and really the user may not notice all the
    changes on their side immediately making them wait unnecessarily or
    worse causing a timeout during the operation.
  • It is possible to move the association deletion into jobs
    today, however it is pretty boilerplate and it would be nice to make
    it easier to do.
  • This pr builds off the work of @georgeclaghorn in PR Add ActiveRecord::Base.destroy_later #34901 and is
    currently a simple proposal of how this could work. I branched off his
    pr for now to quickly product this proof of concept.
  • I have since learned this is part of the GitHub code base, so this is an effort to upstream as well.

Related Issues:

Changes:

  • Added a new job to handle deleting associations.
  • Added a new dependent option on associations,
    destroy_async (Open to other names).

To Do:

  • Custom scopes test.
  • Custom primary key test.
  • Record not actually destroyed test.
  • [ ] Just load ids when loading associations? This is non-trivial and error prone.
  • :destroy_later change.
  • Don't schedule a job if there are no associations assigned.
  • Add parent ensuring methods.
  • [ ] Add child ensuring methods. After thinking about this, I think it should only be implemented if it works with destroy later and normal dependent destroy.

gwincr11 added a commit to gwincr11/rails that referenced this pull request Aug 23, 2019
Motivation:

In rails#36912 I added background destroys, this pr brings in deletes as
well. Currently in the GitHub codebase we have methods that make both of
these functions available, I just learned this thanks @seejohnrun.
However it is hidden and likely underutlized due to the fact that it is
not core Rails functionality.

It would be nice to upstream both these tools if the Rails core team is
interested.

Related Issues:

- rails#34901
- rails/actionmailbox#16
- rails#36912

Changes:

- Added a new job to handle deleting associations.
- Added a new dependent option on associations,
background_delete (Open to other names).
@georgeclaghorn
Copy link
Contributor

georgeclaghorn commented Aug 25, 2019

Hey @gwincr11! Nice work on this so far.

I don’t think we’ll want to serialize all associated records into a job. What I had imagined from our discussion was something like enqueuing a job with a class name and whatever conditions are needed to find the destroyable records. The job would then look up those records in batches to destroy them. This avoids (1) needing to load all associated records into memory at once and (2) needing to load them synchronously in order to enqueue the destroy job.

I’d also like to see us stick to the _later naming pattern we use for work that happens in a background job, so the new :dependent option would be :destroy_later rather than :destroy_async.

@gwincr11
Copy link
Contributor Author

@georgeclaghorn thanks for the feedback. I have discovered we have a very similar job at GitHub, I am going to dig into how we handle this and see if I can do more to port over our functionality. It seems we are closer to what you are suggesting and I think you are right it is better not to serialize everything.

@rails-bot rails-bot bot added the activejob label Sep 3, 2019
@gwincr11
Copy link
Contributor Author

gwincr11 commented Sep 9, 2019

One thing that we do that does not fit with this model, some times we do a soft delete where we set a certain db column to true if something has been soft deleted. We have a check to ensure that the record was not soft deleted before deleting associations.

I am wondering if I should let people pass in a custom check as a method on the owning model, or possibly pass in a different job to allow for custom escape logic? Any thoughts opinions?

cc @tenderlove or @eileencodes or @georgeclaghorn?

@gwincr11
Copy link
Contributor Author

gwincr11 commented Sep 9, 2019

@georgeclaghorn I have significantly changed how this works based on your feedback, what do you think of it now. I couldn't just use what we have as it doesn't deal with scopes and is a little hard to work with if the model was hard deleted, so I needed to come up with a new solution, I think it is headed in a good direction but would love some more feedback. Thanks!

@gwincr11
Copy link
Contributor Author

gwincr11 commented Sep 10, 2019

I just realized, I do not need to worry about soft deletes. I am checking to see if a model instance still exists, if it does I do not take a destroy action in the job. In order for a soft delete to work, the record needs to still exists, so I will already back out. I think this is a non-issue. Not sure why I didn't think of this earlier other then I was just staring at our implementation and thinking about what I had not replicated 😅

Errr, maybe not... I think I confused my self and should drink caffeine before I comment in issues.

cc @tenderlove, @eileencodes and @georgeclaghorn

@gwincr11
Copy link
Contributor Author

I could follow the ensuring pattern @georgeclaghorn laid out, maybe something like

has_many :children, dependent: :destroy_later, owner_ensuring_destroy: :method_to_call_on_owner, child_ensuring_destroy: :method_to_call_on_child

I don't have a current need to call a method on the child, but that maybe nice to add.

@gwincr11
Copy link
Contributor Author

gwincr11 commented Jan 14, 2020

This pr now contains the ability to throw an exception if ActiveRecord is not present, it has also been rebased against the current master. I feel this is the best forward, I feel this way as often times we have to hold onto records for legal reasons for x number of days. If the app is suddenly render unable to complete a job for some reason or is mis-configured deleting content could lead to legal troubles for some users. I know others feel otherwise @rafaelfranca for example, so if we desire we can go an alternate route.

Either way we go this test suite is now showing a condition I am unclear on how to remedy. Currently the test files test/cases/destroy_later_test.rb, test/cases/destroy_association_later_test.rb and test/cases/active_job_not_present_test.rb all work when run independently. There exists a race condition in running the entire suite which I am unclear on how to resolve.
cc @eileencodes

@gwincr11
Copy link
Contributor Author

Just to throw out a thought, we could use appraisal for testing different gem sets https://github.com/thoughtbot/appraisal

@simi
Copy link
Contributor

simi commented Jan 29, 2020

@gwincr11 it should be easier to just introduce activejob_present? method and stub in tests (can be used also in check_dependent_options, instead of direct check).

@gwincr11
Copy link
Contributor Author

@simi I wasn't sure how comfortable we would be with mocking in this case vs. a true integration test. I do agree wrapping the check is a nice clean up. I have been struggling with where to put the logic though.

@simi
Copy link
Contributor

simi commented Jan 29, 2020

@gwincr11 I was thinking about this before and wasn't sure if ActiveSupport or directly ActiveRecord should check this. If this is first check in codebase, it would make sense to scope it just locally to this class.

Anyway I'm not sure, we can try to scan codebase if any other functionality depends on ActiveJob optionally and unify those checks.

@gwincr11
Copy link
Contributor Author

@simi, I added ActiveJob back as a development dependency, centralized the presence check and stubbed. This seems like a sane path forward, thanks for jumping in!

@simi
Copy link
Contributor

simi commented Jan 29, 2020

@gwincr11 AFAIK development dependencies live in Gemfile. There's no need for that in gemspec.

@gwincr11 gwincr11 changed the title WIP: Destroy Associations in Background Job Destroy Associations in Background Job Mar 10, 2020
@eileencodes eileencodes added this to the 6.1.0 milestone Mar 11, 2020
@@ -171,6 +174,10 @@ def self.eager_load!
ActiveRecord::AttributeMethods.eager_load!
ActiveRecord::ConnectionAdapters.eager_load!
end

def self.active_job?
defined?(ActiveJob)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be here. We should not check Active Job, Active Job that should register itself on Active Record.

Can we please follow #34901 (comment)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelfranca thanks for the feedback.

Are you reffering to this portion of the comment?

In the Active Job railtie we use a on_load(:active_record) to set the destroy job to ActiveRecord::DestroyJob.

Are you suggesting we just set a config option such as this

config.active_job = ActiveSupport::OrderedOptions.new

Or some other approach? Would I then just switch this method to check for the config as opposed to checking for class definition?

Thanks ❤️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that part. I think the config is in the opposite direction. Active Job should change the active_record.destroy_later_job and Active Record provides the job that raises if Active Job is not loaded and the job that does the right thing.

That way we don't need to check if Active Job is loaded in Active Record, neither we need to check if Active Record is loaded in Active Job.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually raising earlier, prior to trying to enqueue the job and instead when trying to load the parent class. That way if you are in the development phase you get the notice as early as possible lowering the chance that you will miss the dependency when trying to use this option.

https://github.com/rails/rails/blob/b502fea0a81868e700fec2f2856a83b6442b4501/activerecord/lib/active_record/associations/builder/association.rb#L123-L131

I could set the destroy_later_job to false in active_record and check for it not being false or do you think I should only fail when the job is enqueued and not when the developer tries to set the dependent_destroy option?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your solution of raising early is better, so setting the config to false or nil is good enough to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. One more question, hopefully only 1. Do I need to move my tests somewhere else or do something to fire the railtie? I haven't tried yet figured I would ask before I potentially bang my head against my desk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelfranca I took a stab at this. wanna sanity check I am going in the right direction?

Copy link
Member

@rafaelfranca rafaelfranca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still coupling Active Record to Active Job. We should be inverting the dependency. Active Record should not even know Active Job exists.

@simi
Copy link
Contributor

simi commented Apr 1, 2020

Wouldn't be the way to add way to add custom destroy strategies and add this one on active job load if active record is present?

ActiveSupport.on_load(:active_record) do
active_record_options = app.config.active_record
active_record_options.destroy_association_later_job = ActiveRecord::DestroyAssociationLaterJob
active_record_options.destroy_later_job = ActiveRecord::DestroyLaterJob
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelfranca Is this the correct place and way to do this? Is there a good way to test this setting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is the correct way, but you don't need to add the config to app.config.active_record anymore. You are already inside ActiveRecord::Base here. So this code can be:

self.destroy_association_later_job = ActiveRecord::DestroyAssociationLaterJob
self.destroy_later_job = ActiveRecord::DestroyLaterJob

To test it you will need an integration test in the railties folder. Similar to those tests here. https://github.com/rails/rails/blob/master/railties/test/application/configuration_test.rb#L423-L432

Copy link
Member

@rafaelfranca rafaelfranca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is it. I just made a comment about the implementation and the tests.

ActiveSupport.on_load(:active_record) do
active_record_options = app.config.active_record
active_record_options.destroy_association_later_job = ActiveRecord::DestroyAssociationLaterJob
active_record_options.destroy_later_job = ActiveRecord::DestroyLaterJob
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is the correct way, but you don't need to add the config to app.config.active_record anymore. You are already inside ActiveRecord::Base here. So this code can be:

self.destroy_association_later_job = ActiveRecord::DestroyAssociationLaterJob
self.destroy_later_job = ActiveRecord::DestroyLaterJob

To test it you will need an integration test in the railties folder. Similar to those tests here. https://github.com/rails/rails/blob/master/railties/test/application/configuration_test.rb#L423-L432

@@ -34,6 +34,7 @@ Gem::Specification.new do |s|
# NOTE: Please read our dependency guidelines before updating versions:
# https://edgeguides.rubyonrails.org/security.html#dependency-management-and-cves

s.add_development_dependency "activejob", version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to add this here. The Gemfile is where we put development dependencies.

@gwincr11
Copy link
Contributor Author

gwincr11 commented Apr 8, 2020

@rafaelfranca I think I have the test written correctly. Thanks for the assist.

Generalize Action Mailbox's incineration feature. Permit destroying records after a specified amount of time.
end


test "enqueues the has_many through to be deleted with custom primary key" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this test trying to assert? If it works when the has_many association has a primary_key option or when the model that defines association has a custom primary key?

I'm trying to reduce the number of new tables added and I was trying to reuse some of the existing models but I'm having a hard time to know what exactly is being tested here.

Motivation:
  - Sometimes cascading association deletions can cause timeouts due to
  IO issue. Perhaps a model has associations that are destroyed on
  deletion which in turn trigger other deletions and this can continue
  down a complex tree. Along this tree you may also hit other IO
  operations. Such deep deletions can lead to server timeouts while
  awaiting completion and really the user may not notice all the
  changes on their side immediately making them wait unnecesarially or
  worse causing a timeout during the operation.

  It is possible to move the association deletion into jobs
  today, however it is pretty boilerplate and it would be nice to make
  it easier to do.

  This pr builds off the work of @georgeclaghorn in PR rails#34901 and is
  currently a simple proposal of how this could work. I branched off his
  pr for now to quickly product this proof of concept.

Related Issues:
  - rails#34901
  - rails/actionmailbox#16

Changes:
  - Added a new job to handle deleting associations.
  - Added a new `dependent` option on `has_many` associations,
  `destroy_async` (Open to other names).
  - Added a single proof of concept test.
* Fix CHANGELOG entry

* Avoid using abreviations in argument names

* Refactor the job to use `find_by`

* Improve documentation

* Check if both jobs are configured when calling destroy_later at class

* Pass keyword arguments to the job to avoid warnings on Ruby 2.7

* Remove duplicated method

* Fix typos and avoid adding new tables

* Move active record queues configuration to Active Record

This config is related to jobs but it doesn't load anything on Active
Job, so there is no need to move the config to the Active Job railtie.

* Allow to configure the destroy later jobs per model

We are still going to configure this globaly to the framework but if
some user wants to use a different job in a model the system will work.
This also make easier to isolate the Active Job tests from the Active
Record jobs.

* Change the tests description to talk about the framework features

The tests were talking about the model features, not which features in
the framework they were asserting.

* Add documentation about how to configure the destroy jobs
@rafaelfranca rafaelfranca force-pushed the cg-destroy-association-later branch from 5bc8da1 to f6eff0f Compare May 5, 2020 03:46
assert tag.reload
end

test "enqueues the has_many through to be deleted with custom primary key" do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test ensures that we can find and delete an association with a custom primary key:

 self.primary_key = "many_key"

@rafaelfranca
Copy link
Member

Merged in #40157

@gwincr11 gwincr11 mentioned this pull request Dec 27, 2020
gwincr11 added a commit to gwincr11/rails that referenced this pull request Dec 27, 2020
Motivation:
  - Now that destroy_async is an option on association, a natural next
    step seems to be adding this method to standard objects for a more
    consistent offering.

Related Issues:
  - rails#40157
  - rails#36912

Changes:
  - Add a destroy_async method to persistence.
  - Add a rail tie to setup the active job configs.
  - Add a default destroy async job.
  - Test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants