New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add link_count column to screenshots to improve deletion speed #2294
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2294 +/- ##
=========================================
- Coverage 86.82% 86.73% -0.1%
=========================================
Files 167 167
Lines 10848 10854 +6
=========================================
- Hits 9419 9414 -5
- Misses 1429 1440 +11
Continue to review full report at Codecov.
|
The screenshot creation/deletion was only partially covered so I created an explicit test. I made the migration a Minion job but I still need to test that locally. Job deletion is blocked until the migration has run. |
Codecov Report
@@ Coverage Diff @@
## master #2294 +/- ##
==========================================
+ Coverage 86.79% 87.12% +0.32%
==========================================
Files 167 167
Lines 10848 10852 +4
==========================================
+ Hits 9416 9455 +39
+ Misses 1432 1397 -35
Continue to review full report at Codecov.
|
@@ -36,8 +36,8 @@ __PACKAGE__->add_columns( | |||
is_nullable => 0, | |||
}); | |||
|
|||
__PACKAGE__->belongs_to(job => 'OpenQA::Schema::Result::Jobs', 'job_id'); | |||
__PACKAGE__->belongs_to(screenshot => 'OpenQA::Schema::Result::Screenshots', 'screenshot_id'); | |||
__PACKAGE__->belongs_to(job => 'OpenQA::Schema::Result::Jobs', 'job_id', {on_delete => 'CASCADE'}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would believe this is too dangerous. If you happen to remove a job, you erase the screenshot links without the UPDATE in count.
Perhaps a postgresql trigger can help? If not, I'd prefer to drop the cascade
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right and this cascade delete (unlike the one in the next line) serves no purpose. So I'm dropping it. The cascade delete for screenshots is required but shouldn't be a problem because the screenshot is then gone anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One would sure hope so :)
@nicksinger @okurz When this change is deployed, deletion of jobs will no longer be possible until the deployment for adding a screenshot link count to the database has been completed. This deployment takes quite a while and is hence done as a Minion task which is enqueued automatically. I estimate that it should be done within an hour (took almost half an hour on my local machine using the OSD database). The status is visible as usual on the Minion dashboard. Be sure to check that the job is visible there and becomes active. If it fails for any reason, it can be re-triggered using |
lib/OpenQA/Schema/Result/Jobs.pm
Outdated
->search({screenshot_id => {-in => [sort keys %ids]}}, {columns => 'screenshot_id', distinct => 1}); | ||
map { delete $ids{$_->screenshot_id} } $sls->all; | ||
# delete exclusively used screenshots | ||
if ($link_count <= 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't look any more performant than what we have, sorry :(
If you want performance you need to leave the complete job to the database - and that means requiring the migration to be run. I'd rather take a one maintenance window with a long running migration than a continously slow cleanup.
E.g. please benchmark UPDATE screenshots SET link_count = link_count - 1 WHERE id in (select screenshot_id FROM screenshot_links WHERE job_id = ?
Deleting the unreferenced screenshots can happen any time, you can actually make that a complete different task running on different schedule, they are clearly marked as having linked_count=0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so you want the screenshot deletion happening in a separate (Minion) job and only decrement the link count here. That would be faster, indeed.
this doesn't look any more performant than what we have, sorry :(
We don't get rid of the file system operations but the database query is now actually quite fast. It was 11 seconds on my system before alone for finding the exclusively used screenshots. Now it is just about 0.0001 seconds per screenshot in that loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then avoid the loop as well
Don't forget about @AdamWill - but I see no reasonable way around making the migration blocking. An hour can be scheduled |
If everybody is fine with this taking an hour I make it a regular migration. I'm just asking before because I don't want to flip this back and fourth. I guess I'll keep my adjustments for triggering Gru tasks in migrations on another branch in any case so I can use it if needed in the future for something different. @coolo I tested the |
I'm afraid it does. The UPDATE of the migration locks all rows. |
... which not just blocks deletes but also inserts btw. So it's not just that we can't delete jobs during the migration runtime, but we can't create new screenshots either. Possibly not even screenshot links. As I said: I see no other way. |
Ok, so my suspicion about the locking is true and it even gets worse. I'll make it a regular migration then. Even though people don't like it I see no other way, too. |
Seeing all this, we should have used hardlinks from the start :) |
I don't like hard-links but I guess here using them would have actually made sense (as long as we stay on one file system). Nevertheless, I make it a regular migration. I can also use your |
So decrementation of link count can happen in one query and query for iterating screenshots gets simpler as well (no join required anymore).
By the way, that was the initial issue: https://progress.opensuse.org/issues/55907 |
So what's the tl;dr here, updating to this is going to require a database migration that will take some time? I can live with that. |
The status is that we're still checking how to avoid the ref counting and migration altogether. So I'm closing this. |
Still a draft because the migration should likely be a Minion task and tests are missing. Let's see whether we have coverage for the code I've touched.