Make the timeout to consider workers offline configurable #3389

Merged: 4 commits, Sep 16, 2020

Conversation

Martchus
@Martchus Martchus marked this pull request as ready for review September 15, 2020 14:15
@kalikiana kalikiana self-requested a review September 15, 2020 14:22
@codecov
codecov bot commented Sep 15, 2020

Codecov Report

Merging #3389 into master will increase coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3389      +/-   ##
==========================================
+ Coverage   91.68%   91.71%   +0.02%     
==========================================
  Files         216      216              
  Lines       13253    13264      +11     
==========================================
+ Hits        12151    12165      +14     
+ Misses       1102     1099       -3     
| Impacted Files | Coverage Δ |
| --- | --- |
| lib/OpenQA/Constants.pm | 100.00% <ø> (ø) |
| lib/OpenQA/Log.pm | 100.00% <ø> (ø) |
| lib/OpenQA/Scheduler/Model/Jobs.pm | 93.52% <ø> (+0.05%) ⬆️ |
| lib/OpenQA/Schema/Result/Jobs.pm | 97.63% <ø> (+0.10%) ⬆️ |
| lib/OpenQA/Schema/Result/Workers.pm | 100.00% <ø> (ø) |
| lib/OpenQA/Schema/ResultSet/Jobs.pm | 94.14% <ø> (ø) |
| lib/OpenQA/Setup.pm | 96.74% <ø> (+0.25%) ⬆️ |
| lib/OpenQA/WebSockets/Controller/Worker.pm | 100.00% <ø> (ø) |
| lib/OpenQA/Worker/WebUIConnection.pm | 90.04% <ø> (ø) |

... and 7 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6cfb61a...585a0a3.

* This should avoid jobs being stuck in the assigned state too long when a
  worker disconnects unexpectedly.
* See https://progress.opensuse.org/issues/69784#note-11
* Avoid marking assigned jobs as incomplete and set them back to
  scheduled instead
* Consider all jobs assigned to a worker (with directly chained
  dependencies multiple jobs might be assigned to a worker)
* See https://progress.opensuse.org/issues/69784#note-11
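
The behaviour described in the commit messages above can be sketched roughly as follows. This is only an illustration in Python, not the openQA implementation (which is written in Perl): the `Worker` and `Job` classes, the function names, and the way the timeout is passed in are all assumptions made for the example. Jobs in states other than "assigned" are deliberately left untouched, because the commit messages only describe the handling of assigned jobs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class Job:
    id: int
    state: str  # e.g. "scheduled", "assigned", "running" (hypothetical model)


@dataclass
class Worker:
    id: int
    last_seen: datetime
    jobs: List[Job] = field(default_factory=list)


def is_offline(worker: Worker, now: datetime, timeout: timedelta) -> bool:
    """A worker counts as offline once it has not been seen for longer than `timeout`."""
    return now - worker.last_seen > timeout


def reschedule_stale_jobs(workers: List[Worker], now: datetime, timeout: timedelta) -> List[int]:
    """Set every assigned job of an offline worker back to "scheduled".

    All jobs of the worker are considered, not just one, since several jobs
    may be assigned to the same worker. Returns the IDs of the rescheduled jobs.
    """
    rescheduled = []
    for worker in workers:
        if not is_offline(worker, now, timeout):
            continue
        for job in worker.jobs:
            if job.state == "assigned":
                job.state = "scheduled"  # instead of marking it incomplete
                rescheduled.append(job.id)
        # Detach the rescheduled jobs so another worker can pick them up.
        worker.jobs = [job for job in worker.jobs if job.state != "scheduled"]
    return rescheduled
```

Passing the timeout as a parameter mirrors the intent of making it configurable; where the value actually comes from in openQA is not shown here.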
@okurz

okurz commented Sep 16, 2020

https://app.circleci.com/pipelines/github/os-autoinst/openQA/4205/workflows/778e2876-2965-4426-bb28-fcd850fe900d/jobs/40210 shows

timeout -s SIGINT -k 5 -v $((10 * (3 + 1) ))m tools/retry prove -l --harness TAP::Harness::JUnit --timer --merge t/33-developer_mode.t
Retry 1 of 3 …
[16:28:14] t/33-developer_mode.t .. 25/? make[2]: *** [Makefile:174: test-unit-and-integration] Terminated
make[1]: *** [Makefile:169: test-with-database] Terminated
make: *** [Makefile:162: test-developer] Terminated

Too long with no output (exceeded 10m0s): context deadline exceeded

So, similar to what @kalikiana was working on here, we should also ensure that t/33-developer_mode.t has its own time limit of less than 10m. I prepared #3393 for this.
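
To spell out the arithmetic behind this: the outer `timeout` in the command above evaluates `$((10 * (3 + 1)))m`, while the CI aborted after 10 minutes without output. The small Python sketch below reproduces that calculation. Reading 10 as a per-attempt budget in minutes and 3 as the retry count is an assumption based on the "Retry 1 of 3" line, and the helper names and the 8-minute value are hypothetical.

```python
def outer_timeout_minutes(per_attempt_minutes: int, retries: int) -> int:
    """Mirror the shell arithmetic $((10 * (3 + 1))) from the command line above."""
    return per_attempt_minutes * (retries + 1)


def fires_before_ci(limit_minutes: int, ci_no_output_minutes: int) -> bool:
    """A limit only helps if it triggers before the CI's no-output deadline."""
    return limit_minutes < ci_no_output_minutes


CI_NO_OUTPUT_MINUTES = 10  # "Too long with no output (exceeded 10m0s)"

outer = outer_timeout_minutes(10, 3)
print(outer)                                         # 40
print(fires_before_ci(outer, CI_NO_OUTPUT_MINUTES))  # False: the CI gives up first
print(fires_before_ci(8, CI_NO_OUTPUT_MINUTES))      # True: a hypothetical 8m in-test limit
```

With these numbers the outer timeout can never trigger while the test is silent, which is why a limit inside the test itself is needed.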

@kalikiana

And #3394 also addresses the potential cause for it.

@mergify mergify bot merged commit 78972ea into os-autoinst:master Sep 16, 2020
@Martchus Martchus deleted the stale-job-detection branch September 16, 2020 08:34
@okurz

okurz commented Sep 16, 2020

@kalikiana excellent :)


3 participants