Allow pinning parallel clusters on one host #5536
oh geez, I have no idea what your very elaborate algorithm is doing :D
only trivial questions or corrections. The majority of the change I did not really want to dive into … yet
After thinking about it further I must admit that the algorithm is probably also not able to sort out all sorts of inputs. I guess for that we'd really need a full SAT solver. However, I hope it is sufficient for our purposes. Actually, I suppose we don't even have cases in production as complicated as the one in the unit test. The worst case is that jobs won't be scheduled although they could be, if one mixes up worker classes within a scenario too wildly. In that case we can still improve the algorithm (with TDD, by first adding unit tests that reflect that production scenario).
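To illustrate why mixed worker classes make this hard, here is a minimal Python sketch. It is not the PR's Perl implementation; the function names, the data shapes and the worker class names are assumptions made up for this example. It shows a first-fit approach failing to place a two-job cluster on one host although a valid assignment exists, while a simple backtracking search finds it.

```python
def fits(job, slot):
    """A slot can run a job if it offers every worker class the job needs."""
    return job <= slot


def greedy_assign(jobs, slots):
    """First-fit: give each job the first free slot that fits (may miss solutions)."""
    free = list(range(len(slots)))
    result = []
    for job in jobs:
        pick = next((i for i in free if fits(job, slots[i])), None)
        if pick is None:
            return None  # no free slot left for this job
        free.remove(pick)
        result.append(pick)
    return result


def exhaustive_assign(jobs, slots, used=frozenset()):
    """Backtracking: try every fitting free slot for the first job, recurse on the rest."""
    if not jobs:
        return []
    for i, slot in enumerate(slots):
        if i in used or not fits(jobs[0], slot):
            continue
        rest = exhaustive_assign(jobs[1:], slots, used | {i})
        if rest is not None:
            return [i] + rest
    return None


# Hypothetical example: two free slots on the same host, slot 0 is the more capable one.
slots = [frozenset({"qemu", "tap"}), frozenset({"qemu"})]
# Job 0 only needs "qemu", job 1 needs both "qemu" and "tap".
jobs = [frozenset({"qemu"}), frozenset({"qemu", "tap"})]

print(greedy_assign(jobs, slots))      # None: job 0 grabbed the only slot job 1 could use
print(exhaustive_assign(jobs, slots))  # [1, 0]: job 0 -> slot 1, job 1 -> slot 0
```

Backtracking is exponential in the worst case, which is one reason a scheduler might settle for a cheaper heuristic and accept that some schedulable clusters stay unscheduled.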
e16961a to be533d6
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff            @@
##           master    #5536    +/-   ##
========================================
  Coverage   98.39%   98.39%
========================================
  Files         391      392      +1
  Lines       38044    38208    +164
========================================
+ Hits        37432    37596    +164
  Misses        612      612
40ee544 to 69a19c2
I added/extended unit tests where needed and refactored the code into smaller functions and moved it into its own file. I also gave it a real test on my local machine with two worker slots and faking the worker host (by manually editing the worker code before starting the particular worker instances). I tested with and without
It's a very complex feature, but I like how well encapsulated it is. That will help a lot with debugging in the future.
* See contained documentation change
* See https://progress.opensuse.org/issues/135035

* Move code to its own file
* Split code into smaller functions
* See https://progress.opensuse.org/issues/135035

* Ensure the `PARALLEL_ONE_HOST_ONLY`-setting is considered for all jobs in a parallel cluster when determining scheduled jobs and add according test
* Extend tests for picking parallel siblings of running jobs
* See https://progress.opensuse.org/issues/135035
113b11e to f19f0ab
After thinking about this and potential follow-up work I came up with one open point that I would like to clarify before merging: The openQA scheduler already has access to worker classes and possibly further worker properties from workers. So would it be possible to define a special worker class like "tap-parallel_one_host_only" resembling the new test setting and read that from the worker to have the same effect as the test variable which is to not schedule any across-host clusters on that host?
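As a rough sketch of that idea (in Python for brevity, not openQA's actual Perl scheduler code), a check along these lines could treat a cluster as host-pinned if either any job in it sets `PARALLEL_ONE_HOST_ONLY` or the candidate worker advertises the proposed worker class. The function name, the data shapes and the rule that an empty string or `0` counts as "off" are assumptions for illustration only.

```python
# Worker class name proposed in the comment above; hypothetical, not an existing feature.
PINNING_WORKER_CLASS = "tap-parallel_one_host_only"


def must_stay_on_one_host(cluster_job_settings, worker_classes):
    """Return True if the parallel cluster should be kept on a single host.

    cluster_job_settings: list of per-job settings dicts (assumed shape).
    worker_classes: set of worker class names advertised by the candidate worker.
    """
    # Assumption: the setting is "on" for any value other than "" or "0".
    pinned_by_job = any(
        str(settings.get("PARALLEL_ONE_HOST_ONLY", "0")) not in ("", "0")
        for settings in cluster_job_settings
    )
    pinned_by_worker = PINNING_WORKER_CLASS in worker_classes
    return pinned_by_job or pinned_by_worker


cluster = [{"WORKER_CLASS": "tap"}, {"WORKER_CLASS": "tap"}]
print(must_stay_on_one_host(cluster, {"qemu_x86_64", "tap"}))
print(must_stay_on_one_host(cluster, {"qemu_x86_64", PINNING_WORKER_CLASS}))
```

With the job setting absent, the first call reports no pinning and the second reports pinning purely because of the worker's class, which is the effect asked about here.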
We discussed more features that involve worker classes and dependencies and how they are intertwined. We need to make sure that this PR is in line with those ideas for future features:
Note that the last point (3.) is maybe the most interesting one for us because in practice we probably really want to put the setting in
Here's the relation of those feature ideas with this PR (how I see it):
I would tend to merge this PR right now as is (plus maybe a note in the documentation that this feature is still subject to change). Then we at least have something to work with for tickets like https://progress.opensuse.org/issues/157414 and can already gather experience with scheduler changes like this in general. Otherwise the PR might become even more complicated to review.
I also agree with this PR as-is, except for fixing the CI failures. Please also add one line to the documentation that you added: "This feature and the naming of the variable are subject to change."
I extended the documentation.
You forgot to add the new file it seems.
It is not a model itself, so to avoid confusion it is better to move it directly into `OpenQA::Scheduler`.
37d553c to 9a0115e
The CI failure was due to https://progress.opensuse.org/issues/157540. I added a comment there, bumped the priority and retriggered the CI job here.
A draft because I still need to extend unit tests. Possibly the big function should also be split. It got a little bit involved but I think the algorithm works without being a full-blown SAT solver. You might want to read the test code first (before the implementation) to see what the difficulty here actually is.