Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add link_count column to screenshots to improve deletion speed #2294

Closed
wants to merge 2 commits into from

Conversation

Martchus
Copy link
Contributor

@Martchus Martchus commented Aug 29, 2019

Still a draft because the migration should likely be a Minion task and tests are missing. Let's see whether we have coverage for the code I've touched.

@codecov
Copy link

codecov bot commented Aug 29, 2019

Codecov Report

Merging #2294 into master will decrease coverage by 0.09%.
The diff coverage is 52.63%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #2294     +/-   ##
=========================================
- Coverage   86.82%   86.73%   -0.1%     
=========================================
  Files         167      167             
  Lines       10848    10854      +6     
=========================================
- Hits         9419     9414      -5     
- Misses       1429     1440     +11
Impacted Files Coverage Δ
lib/OpenQA/Schema.pm 100% <ø> (ø) ⬆️
lib/OpenQA/Schema/Result/Screenshots.pm 43.75% <ø> (ø) ⬆️
lib/OpenQA/Schema/Result/ScreenshotLinks.pm 41.86% <44.44%> (-0.25%) ⬇️
lib/OpenQA/Schema/Result/Jobs.pm 85.19% <60%> (-0.36%) ⬇️
lib/OpenQA/Worker/Settings.pm 84.9% <0%> (-13.21%) ⬇️
lib/OpenQA/Worker/Engines/isotovideo.pm 94.66% <0%> (+0.97%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1e5f655...9e985b4. Read the comment docs.

@Martchus
Copy link
Contributor Author

The screenshot creation/deletion was only partially covered so I created an explicit test.

I made the migration a Minion job but I still need to test that locally. Job deletion is blocked until the migration has run.

@codecov
Copy link

codecov bot commented Aug 30, 2019

Codecov Report

Merging #2294 into master will increase coverage by 0.32%.
The diff coverage is 93.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2294      +/-   ##
==========================================
+ Coverage   86.79%   87.12%   +0.32%     
==========================================
  Files         167      167              
  Lines       10848    10852       +4     
==========================================
+ Hits         9416     9455      +39     
+ Misses       1432     1397      -35
Impacted Files Coverage Δ
lib/OpenQA/Schema.pm 100% <ø> (ø) ⬆️
lib/OpenQA/Schema/Result/Screenshots.pm 100% <ø> (+56.25%) ⬆️
lib/OpenQA/Task/Job/Limit.pm 100% <100%> (+61.53%) ⬆️
lib/OpenQA/Schema/Result/Jobs.pm 85.98% <100%> (+0.43%) ⬆️
lib/OpenQA/Schema/Result/ScreenshotLinks.pm 62.79% <88.88%> (+20.68%) ⬆️
lib/OpenQA/Worker/Engines/isotovideo.pm 93.68% <0%> (-0.98%) ⬇️
lib/OpenQA/Worker/WebUIConnection.pm 84.42% <0%> (+2.51%) ⬆️
lib/OpenQA/Schema/Result/JobGroups.pm 94.2% <0%> (+7.24%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d21a0a...8300354. Read the comment docs.

@@ -36,8 +36,8 @@ __PACKAGE__->add_columns(
is_nullable => 0,
});

__PACKAGE__->belongs_to(job => 'OpenQA::Schema::Result::Jobs', 'job_id');
__PACKAGE__->belongs_to(screenshot => 'OpenQA::Schema::Result::Screenshots', 'screenshot_id');
__PACKAGE__->belongs_to(job => 'OpenQA::Schema::Result::Jobs', 'job_id', {on_delete => 'CASCADE'});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would believe this is too dangerous. If you happen to remove a job, you erase the screenshot links without the UPDATE in count.

Perhaps a postgresql trigger can help? If not, I'd prefer to drop the cascade

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right and this cascade delete (unlike the one in the next line) serves no purpose. So I'm dropping it. The cascade delete for screenshots is required but shouldn't be a problem because the screenshot is then gone anyways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One would sure hope so :)

@Martchus
Copy link
Contributor Author

@nicksinger @okurz When this change is deployed, deletion of jobs will no longer be possible until the deployment for adding a screenshot link count to the database has been completed. This deployment takes quite a while and is hence done as a Minion task which is enqueued automatically. I estimate that it should be done within an hour (took almost half an hour on my local machine using the OSD database). The status is visible as usual on the Minion dashboard. Be sure to check that the job is visible there and becomes active. If it fails for any reason, it can be re-triggered using sudo -u geekotest /usr/share/openqa/script/openqa minion -m production job -e populate_screenshot_link_count.

->search({screenshot_id => {-in => [sort keys %ids]}}, {columns => 'screenshot_id', distinct => 1});
map { delete $ids{$_->screenshot_id} } $sls->all;
# delete exclusively used screenshots
if ($link_count <= 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't look any more performant than what we have, sorry :(

If you want performance you need to leave the complete job to the database - and that means requiring the migration to be run. I'd rather take a one maintenance window with a long running migration than a continously slow cleanup.

E.g. please benchmark UPDATE screenshots SET link_count = link_count - 1 WHERE id in (select screenshot_id FROM screenshot_links WHERE job_id = ?

Deleting the unreferenced screenshots can happen any time, you can actually make that a complete different task running on different schedule, they are clearly marked as having linked_count=0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so you want the screenshot deletion happening in a separate (Minion) job and only decrement the link count here. That would be faster, indeed.

this doesn't look any more performant than what we have, sorry :(

We don't get rid of the file system operations but the database query is now actually quite fast. It was 11 seconds on my system before alone for finding the exclusively used screenshots. Now it is just about 0.0001 seconds per screenshot in that loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then avoid the loop as well

@coolo
Copy link
Contributor

coolo commented Aug 30, 2019

Don't forget about @AdamWill - but I see no reasonable way around making the migration blocking. An hour can be scheduled

@Martchus
Copy link
Contributor Author

If everybody is fine with this taking an hour I make it a regular migration. I'm just asking before because I don't want to flip this back and fourth. I guess I'll keep my adjustments for triggering Gru tasks in migrations on another branch in any case so I can use it if needed in the future for something different.

@coolo I tested the UPDATE query you suggested on the same sample job I've mentioned earlier and it took only 23,230 ms. In theory the query could also run before/during the migration using coalesce(link_count, 1). But maybe the locking prevents this to work in practice.

@coolo
Copy link
Contributor

coolo commented Aug 30, 2019

I'm afraid it does. The UPDATE of the migration locks all rows.

@coolo
Copy link
Contributor

coolo commented Aug 30, 2019

... which not just blocks deletes but also inserts btw. So it's not just that we can't delete jobs during the migration runtime, but we can't create new screenshots either. Possibly not even screenshot links.

As I said: I see no other way.

@Martchus
Copy link
Contributor Author

As I said: I see no other way.

Ok, so my suspicion about the locking is true and it even gets worse. I'll make it a regular migration then. Even though people don't like it I see no other way, too.

@coolo
Copy link
Contributor

coolo commented Aug 30, 2019

Seeing all this, we should have used hardlinks from the start :)

@Martchus
Copy link
Contributor Author

I don't like hard-links but I guess here using them would have actually made sense (as long as we stay on one file system). Nevertheless, I make it a regular migration.

I can also use your UPDATE query. The actual deletion on the file system will then be done is a separate loop. Note that the job deletion already happens in a Minion task. So I'd just extend this existing task to delete unused screenshots at the end.

So decrementation of link count can happen in one query and query
for iterating screenshots gets simpler as well (no join required
anymore).
@Martchus Martchus marked this pull request as ready for review August 30, 2019 13:09
@Martchus
Copy link
Contributor Author

By the way, that was the initial issue: https://progress.opensuse.org/issues/55907

@AdamWill
Copy link
Contributor

So what's the tl;dr here, updating to this is going to require a database migration that will take some time? I can live with that.

@Martchus
Copy link
Contributor Author

The status is that we're still checking how to avoid the ref counting and migration altogether. So I'm closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants