Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't fear the (sub)reaper #1590

Merged
merged 8 commits into from Feb 27, 2018
Merged

Conversation

mudler
Copy link
Member

@mudler mudler commented Feb 21, 2018

This is still a wip, and commit message will be replaced :) - needs also new RWP release as sessions were introduced recently - and there was a typo preventing orphaned child resolution during event firing when using subreapers, not in the feature itself (and update cpanfile accordingly once its done).

@foursixnine
Copy link
Member

@mudler
Copy link
Member Author

mudler commented Feb 21, 2018

More cowbells! https://youtu.be/Gwb4o2qs7Pc?t=14

$worker->{child}->on(
collected => sub {
my $self = shift;
STDERR->printflush("collect status: " . $self->exit_status . "\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this end in the logs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this changes are not going so much distant from #1579

my $self = shift;
STDERR->printflush("collect status: " . $self->exit_status . "\n");
# TODO: deal when the job was restarted. The process is always killed.
# This is only problematic if we die, Otherwise this is just to exterminate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you are at this, mind changing the wording to process tree?

Copy link
Member

@foursixnine foursixnine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far

@codecov
Copy link

codecov bot commented Feb 22, 2018

Codecov Report

Merging #1590 into master will increase coverage by 6.03%.
The diff coverage is 84%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1590      +/-   ##
==========================================
+ Coverage   82.59%   88.62%   +6.03%     
==========================================
  Files         121      121              
  Lines        9037     9040       +3     
==========================================
+ Hits         7464     8012     +548     
+ Misses       1573     1028     -545
Impacted Files Coverage Δ
lib/OpenQA/Worker/Common.pm 80.29% <ø> (-0.73%) ⬇️
lib/OpenQA/WebAPI/Controller/API/V1/Table.pm 90.26% <ø> (ø) ⬆️
lib/OpenQA/WebAPI/Controller/API/V1/Mm.pm 56.52% <ø> (ø) ⬆️
lib/OpenQA/Scheduler/Scheduler.pm 88.41% <ø> (+0.85%) ⬆️
lib/OpenQA/Schema/Result/JobGroups.pm 100% <ø> (ø) ⬆️
lib/OpenQA/WebAPI/Controller/Admin/Workers.pm 100% <ø> (ø) ⬆️
lib/OpenQA/Worker/Jobs.pm 79.63% <25%> (+51.76%) ⬆️
lib/OpenQA/Worker/Engines/isotovideo.pm 94.55% <95.23%> (+79.63%) ⬆️
lib/OpenQA/WebAPI/Plugin/Helpers.pm 97.14% <0%> (+0.57%) ⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7472fdf...c100b99. Read the comment docs.

@mudler mudler added the acceptance-tests-needed Needed for code that is required to be tested on a production-like environment label Feb 22, 2018
@@ -914,8 +902,10 @@ sub check_backend {
log_debug("checking backend state");

return log_debug("backend is running") if $worker->{child}->is_running();
return log_debug("backend state not known") unless defined $worker->{child}->exit_status;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this to log_error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually we could remove that timer now, if stop_job would not be so 'blocking' - give me some time to try to re-elaborate this


if ($worker->{child}->is_running()) {
log_debug("backend is not running anymore");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one to info

@coolo
Copy link
Contributor

coolo commented Feb 23, 2018

with RWP 0.16 the worker is always detecting dead children - as collect_status finds the children. There seems to be an incompatible change in there :(

Had to notice after reinstalling the arm worker

@mudler
Copy link
Member Author

mudler commented Feb 23, 2018

So yes, let me be clear a bit #1579 was a premature merge - as collect_status was an internal event and you was required to manually check what you are collecting for - aside from the missing bit shifting to be sure of what is the real status of the process.
The workers as it is in the current code are collecting all statuses from all processes.

in >0.16 there is the 'collected' event that it's fired when we have the exit status of the process.

This PR requires the latest version (0.19) .

+requires 'Mojo::IOLoop::ReadWriteProcess', '0.19';

@mudler
Copy link
Member Author

mudler commented Feb 26, 2018

@mudler mudler changed the title [WIP] Don't fear the (sub)reaper Don't fear the (sub)reaper Feb 26, 2018
@foursixnine
Copy link
Member

giphy-downsized

@foursixnine foursixnine merged commit d541014 into os-autoinst:master Feb 27, 2018
coolo pushed a commit that referenced this pull request Feb 27, 2018
commit d541014
Merge: 230e5e0 c100b99
Author:     Santiago Zarate <santiago@zarate.net.ve>
AuthorDate: Tue Feb 27 12:50:02 2018 +0100
Commit:     GitHub <noreply@github.com>
CommitDate: Tue Feb 27 12:50:02 2018 +0100

    Merge pull request #1590 from mudler/dontfearthesubreaper

    Don't fear the (sub)reaper
coolo pushed a commit that referenced this pull request Feb 28, 2018
commit d541014
Merge: 230e5e0 c100b99
Author:     Santiago Zarate <santiago@zarate.net.ve>
AuthorDate: Tue Feb 27 12:50:02 2018 +0100
Commit:     GitHub <noreply@github.com>
CommitDate: Tue Feb 27 12:50:02 2018 +0100

    Merge pull request #1590 from mudler/dontfearthesubreaper

    Don't fear the (sub)reaper
coolo pushed a commit that referenced this pull request Mar 1, 2018
commit d541014
Merge: 230e5e0 c100b99
Author:     Santiago Zarate <santiago@zarate.net.ve>
AuthorDate: Tue Feb 27 12:50:02 2018 +0100
Commit:     GitHub <noreply@github.com>
CommitDate: Tue Feb 27 12:50:02 2018 +0100

    Merge pull request #1590 from mudler/dontfearthesubreaper

    Don't fear the (sub)reaper
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acceptance-tests-needed Needed for code that is required to be tested on a production-like environment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants