Optimize job details loading with empty disk cache #3144
Conversation
The idea makes sense. I actually had this idea myself and also measured the performance gain, but didn't implement it because it isn't that simple (see review comments). I haven't had the time to implement it "properly", but it might be worth taking that time instead of having to deal with the possible downsides of this "simple" approach.
Test result cleanup reworked as minion job. I'll add a separate commit for the step view and deduplication.
Thanks. It looks generally good, just a few points.
Codecov Report

| | master | #3144 |
| --- | --- | --- |
| Coverage | ? | 91.42% |
| Files | ? | 212 |
| Lines | ? | 12980 |
| Branches | ? | 0 |
| Hits | ? | 11867 |
| Misses | ? | 1113 |
| Partials | ? | 0 |

Continue to review the full report at Codecov.
```perl
    rename $tmpfile->to_string, $file->to_string;
};

eval { $tmpfile->remove if -e $tmpfile; };
```
Don't just pre-emptively use `eval`. If you're expecting a specific exception here then handle that case gracefully, but don't hide everything.
I think this `eval` still deserves a handler, even if it would just log the message. And I'm nevertheless wondering whether we should mark the Minion job as failed in the end when any of these error handlers was entered. Otherwise we'll probably never notice when there's a problem.
The second `eval` is just a post-fail cleanup. It does nothing if the first `eval` succeeds. Do we really need `log_error "Cleaning up after the error that was already logged above also failed"`?
Mh… ok, in this case one of the errors would likely be sufficient.
That explanation doesn't make much sense to me. What is the actual purpose of this cleanup code? Why would the temp file still exist here? Why do you expect removing it to fail? It looks to me like if the first `eval` succeeds, then this would do nothing since the tempfile is gone. Do you expect the first `eval` to fail and leave a temp file behind that needs to be cleaned up? Why is the cleanup code not in the `$@` block then? Why would the process be unable to remove it? Do you expect permissions to change? Why not do a `-w` check then?
I'm probably missing something here, but i can't imagine there is a good reason to have an `eval` without exception handling.
Point 1 seems irrelevant here, it's just a temp file. And point 3 you still left unanswered: what exception do you expect that makes the code fail but that's totally irrelevant? If the file system is that broken that you can't delete files you created, that would seem like something i'd like to see as an error in the job result (instead of just finding garbage files without any clue where they came from). That leaves only point 2, and that's just the reason for the `->remove` call, not a justification to wrap it in an `eval`.
If the code can't delete files, you WILL see it in the log. It'll be the exact same error that was already logged when `$tmpfile->spurt()` or `rename()` failed. So there's no point logging it twice. Especially when you can expect hundreds of duplicate errors because I'm deliberately keeping the minion job going until it cycles through all test modules.
The reason for wrapping `$tmpfile->remove()` in an `eval` block is that I don't want to terminate the minion job if cleanup fails. The failure could be a temporary hiccup.
`tempfile()` creates the file with permissions `0600`. I'd have to run `chmod()` in addition to `rename()`. And no, I will NOT use `move_to()` here.
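Putting the pieces of this thread together, the flow under discussion looks roughly like this (a sketch only: the file name, the JSON payload, and the `log_error` helper are hypothetical; `Mojo::File`'s `tempfile()` delegates to `File::Temp`, which is why the file starts out with mode `0600`):

```perl
use Mojo::File qw(path tempfile);

sub log_error { warn "log_error: $_[0]" }    # hypothetical logger stand-in

my $json    = '{"results":[]}';              # hypothetical module results
my $file    = path('details-module.json');   # hypothetical target file
my $tmpfile = tempfile(DIR => $file->dirname);
eval {
    $tmpfile->spurt($json);                    # write the new content
    chmod 0644, $tmpfile->to_string;           # widen tempfile()'s restrictive 0600
    rename $tmpfile->to_string, $file->to_string
        or die "rename failed: $!\n";
};
log_error($@) if $@;
# best-effort cleanup; only relevant if the eval above failed before the
# rename, and a failure here would just duplicate the error logged above
eval { $tmpfile->remove if -e $tmpfile };
```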
Then all of your problems would be solved by using a scope guard there. No need for ugly hacks.
> Then all of your problems would be solved by using a scope guard there. No need for ugly hacks.
And risk that the scope guard behavior changes in the future and it starts throwing bogus errors when it fails to remove the renamed file? No, thanks. Do you understand that the only thing your "improvements" would add to the code is technical debt? The code is working as intended. Period.
That's a silly line of argumentation, how do you know the `$file->sibling` call above you did not wrap in an `eval` won't change? I control that method upstream, i could change it right now just to mess with you. I won't of course, but all the methods we are talking about here are controlled by me upstream, and i'm just trying to tell you how to use them in the safest way possible.
Anyway, it doesn't matter because scope guards work exactly in the way you need them to.
```shell
$ perl -MMojo::Util=scope_guard -E 'eval { my $guard = scope_guard sub { warn "warning from scope guard"; die "exception from scope guard" }; die "real exception" }; warn "eval error: $@" if $@;'
warning from scope guard at -e line 1.
eval error: real exception at -e line 1.
```
Just because you don't understand a concept yet that doesn't make it technical debt. Scope guards are a standard Perl pattern, and i'm happy to explain it to you in more detail if needed.
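Applied to the cleanup in question, the scope-guard variant would look roughly like this (a sketch assuming `Mojo::Util`'s `scope_guard` and the same hypothetical file names as above):

```perl
use Mojo::Util qw(scope_guard);
use Mojo::File qw(path tempfile);

my $file    = path('details-module.json');   # hypothetical target file
my $tmpfile = tempfile(DIR => $file->dirname);
{
    # the guard runs when the block is left, normally or via an exception,
    # so the temp file cannot be leaked; after a successful rename the -e
    # check is false and the guard is a no-op
    my $guard = scope_guard sub { $tmpfile->remove if -e $tmpfile };
    $tmpfile->spurt('{"results":[]}');
    rename $tmpfile->to_string, $file->to_string;
}
```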
> That's a silly line of argumentation, how do you know the `$file->sibling` call above you did not wrap in an `eval` won't change? I control that method upstream, i could change it right now just to mess with you. I won't of course, but all the methods we are talking about here are controlled by me upstream, and i'm just trying to tell you how to use them in the safest way possible.
I would write this PR exactly the same way even if I myself were the upstream maintainer of the `Mojo` library. The code is as simple as it can be and any extra abstractions would only make it more complex.
BTW, I ran into some os-autoinst bugs yesterday while testing Unicode handling in this PR. When a command executed on SUT prints non-ASCII UTF-8 output, it'll show up correctly in …
Force-pushed from eee61a4 to 6e5e3e9
I've added error handling to the finalize task, obsolete file deletion task and moved …
```perl
}

sub _finalize_results {
    my ($app, $task, $job_id) = @_;
```
That should actually be `$job` instead of `$task` (it's a `Minion::Job` object).
Or use `$minion_job` to better distinguish from openQA jobs (which are more commonly "the job" in the context of openQA).
I'm already using `$job` for the OpenQA job. I don't see what the problem is with calling the minion job `$task`.
In Minion the term "task" has a specific meaning. It's like a class name for the job, while the jobs are instances of it. There can only be one task "finalize_job_results", but there can be many "finalize_job_results" jobs in the queue. Hope that explanation makes sense.
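The distinction can be sketched against Minion's `add_task`/`enqueue` API (the task name matches this PR; the arguments and surrounding names are illustrative):

```perl
# one task: a named handler, registered once at application startup
$app->minion->add_task(finalize_job_results => sub {
    my ($minion_job, $openqa_job_id) = @_;    # $minion_job is a Minion::Job
    ...                                       # finalize the openQA job's results
});

# many jobs: every enqueue creates a new Minion::Job instance of that task
my $id = $app->minion->enqueue(finalize_job_results => [$openqa_job->id]);
```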
So `$minion_job` would be quite a bit better.
Finalize jobs will now be marked as failed if any of the modules doesn't get properly finalized. Also added a comment explaining the …
@mdoucha told me he will spend no further time on this PR, so I'm taking over. It would be a waste to throw the code away at this point and this is actually a beneficial optimization.
Force-pushed from f4b3f3e to 084d6e8
I realized you added new functionality that is not covered by tests. While you are at it, I strongly recommend covering the complete patch.
Each snippet of console output is stored in a separate file in the job result directory. Collect these tiny files into module result JSON files at the end of each job. This can speed up server-side job details JSON generation up to 5 times when the disk cache is empty.
According to the checks, our criteria for test coverage are reached. However, I haven't tested this sufficiently manually or checked whether the unit tests are sufficient.
Force-pushed from a7566b4 to c9c4737
The existing tests covered everything except the error handling, which is now covered as well. I also tested it locally on a real job and it works.
Awesome! Let's see if mergify merges after having two approvals while checkstyle fails :)
Force-pushed from c9c4737 to f118d75
I'll have to fix the points mentioned by @mdoucha.
* Remove $collect_text flag and instead cover that case within tests * Handle errors when reading the text file * Improve coding style
* Test error cases as well * Check the minion job state
Force-pushed from f118d75 to 2271115
Force-pushed from 2271115 to 9344968
Force-pushed from 9344968 to 22dfa75
I guess it should be good enough now. The last commit is hard to test, but the code should be straightforward enough.
I think this may have broken stale job detection. Generally speaking, it seems dangerous to have DB schema code assuming whatever 'app' is using it will have a …
Do you mean that the conditional execution of …
#3389 will fix it implicitly because I amended the condition to make tests pass. However, the solution isn't ideal. Maybe it makes sense to initialize the gru plugin within the scheduler.
Since gru is part of the model, it's an easy argument to make that it should be shared between all webui micro services.
On my local OpenQA instance, this patch speeds up `details_ajax` JSON generation for ltp_syscalls jobs as follows: