Don't fear the (sub)reaper #1590
More cowbells! https://youtu.be/Gwb4o2qs7Pc?t=14
lib/OpenQA/Worker/Jobs.pm
Outdated
$worker->{child}->on(
    collected => sub {
        my $self = shift;
        STDERR->printflush("collect status: " . $self->exit_status . "\n");
Will this end up in the logs?
These changes do not stray far from #1579.
lib/OpenQA/Worker/Jobs.pm
Outdated
        my $self = shift;
        STDERR->printflush("collect status: " . $self->exit_status . "\n");
        # TODO: deal when the job was restarted. The process is always killed.
        # This is only problematic if we die, Otherwise this is just to exterminate
While you are at it, would you mind changing the wording to "process tree"?
Looking good so far
a9bca80 to 124f150
Codecov Report
@@            Coverage Diff             @@
##           master    #1590      +/-   ##
==========================================
+ Coverage   82.59%   88.62%    +6.03%
==========================================
  Files         121      121
  Lines        9037     9040        +3
==========================================
+ Hits         7464     8012      +548
+ Misses       1573     1028      -545
Continue to review full report at Codecov.
acd1aaf to 0bef6f2
lib/OpenQA/Worker/Jobs.pm
Outdated
@@ -914,8 +902,10 @@ sub check_backend {
    log_debug("checking backend state");

    return log_debug("backend is running") if $worker->{child}->is_running();
    return log_debug("backend state not known") unless defined $worker->{child}->exit_status;
I would move this to log_error
Actually, we could remove that timer now if stop_job were not so 'blocking'. Give me some time to rework this.
lib/OpenQA/Worker/Jobs.pm
Outdated
    if ($worker->{child}->is_running()) {
        log_debug("backend is not running anymore");
This one to info
With RWP 0.16 the worker always detects dead children, because collect_status finds the children. There seems to be an incompatible change in there :( I only noticed this after reinstalling the arm worker.
So yes, let me be a bit clearer: #1579 was a premature merge. collect_status was an internal event, and you were required to check manually what you were collecting for, aside from the missing bit shifting needed to be sure of the real status of the process. In >0.16 there is the 'collected' event, which is fired when we have the exit status of the process. This PR requires the latest version (0.19).
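For readers unfamiliar with the "bit shifting" mentioned here, this is a minimal sketch in plain Perl of how a raw wait status splits into an exit code and a signal number. It uses only core Perl (`fork`, `waitpid`, `$?`); the helper name `decode_wait_status` is hypothetical and is not part of openQA or RWP, and this is not a description of how RWP implements it internally.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not openQA/RWP code):
# split a raw wait status, as found in $? after waitpid(), into its parts.
sub decode_wait_status {
    my ($raw) = @_;
    return {
        exit_code => ($raw >> 8) & 0xff,    # value the child passed to exit()
        signal    => $raw & 127,            # terminating signal, 0 if none
        core_dump => ($raw & 128) ? 1 : 0,  # set if the child dumped core
    };
}

# Fork a child that exits with code 3, then reap it and decode the status.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
exit 3 if $pid == 0;

waitpid($pid, 0);
my $raw    = $?;
my $status = decode_wait_status($raw);
printf "raw=%d exit_code=%d signal=%d\n",
    $raw, $status->{exit_code}, $status->{signal};
```

Reading the raw value without the shift makes a normal non-zero exit look like a very different number, which is one way a worker could misjudge the real state of a collected process.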
Tweaked MAX_JOB_TIME (search for 14311; it shows how, without this change, we would otherwise lose one qemu process). Standard jobs:
See related Progress issue: https://progress.opensuse.org/issues/32263
tests: we have 2 timers now
d82d0f0 to c100b99
This is still a WIP, and the commit message will be replaced :) It also needs a new RWP release, as sessions were introduced recently, and there was a typo (not in the feature itself) preventing orphaned child resolution during event firing when using subreapers. The cpanfile needs to be updated accordingly once that is done.