-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEV-1125 - run duplicate holdings cleanup under sidekiq #303
Conversation
However -- I see the jobs running, but I'm not convinced it's properly cleaning things up (despite the passing tests); will investigate. |
dcb5067
to
25a6e2b
Compare
I think this is an artifact of how I loaded the test data - I get e.g. date_received: {"$date"=>"2020-07-21T00:00:00.000Z"} I'll see if I can do a more proper test in dev... |
After loading the data with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests pass, code makes sense save for a headscratcher or two, no reason not to APPROVE.
lib/cleanup_duplicate_holdings.rb
Outdated
end | ||
end | ||
|
||
def dedupe_holdings(cluster) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fishy. Should dedupe_holdings
have a body?
...or, as I look closer, this might be a remnant from the previous version, and should be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, needs to be deleted. I inlined the method because I wanted to keep track in remove_duplicate_holdings
of how many things had been removed (so that we could avoid saving the cluster if nothing had changed)
804be5c
to
8c0b063
Compare
* Queues batches of clusters to clean up as jobs * Move logic to lib; add phctl command * Don't re-save the cluster's holdings if there are no changes (optimization)
* add rack-session gem (needed by rack 3) * update standalone code from https://github.com/sidekiq/sidekiq/wiki/Monitoring#standalone * fix sidekiq_web service in docker-compose.yml
8c0b063
to
3078966
Compare
Motivation
Running this as a single thread was way too slow and crashed in the middle, giving us no easy way to retry or determine what specific cluster might have failed.
This change:
Reviewing
@mwarin I think this should be fairly straightforward to re-review. Basically, this is the functionality you already looked at, but adding some functionality to queue small jobs in sidekiq with batches of (by default) 100 clusters at a time.
The sidekiq web update comes from https://github.com/sidekiq/sidekiq/wiki/Monitoring#standalone - we missed adding the
rack-session
gem before but it was required for a rack update we did earlier but hadn't tested since the tests don't load sidekiq-web.I ran this locally by adding 5000 clusters from the production database, queuing up the jobs, and watching them run.