Add "-enable-monitoring" option to GPU plugin + testing for its options #628
I don't see how "needing large amount of test-case duplication" can be justified. Since monitor count is independent from how many devices there are, it can be tested separately:
Further, since
(or how testing 2 and 4 makes math different?) Finally, what could be added still is the check that if there are no devices, also
Current expectation is that options are independent. Combining the options is testing for that. But combined options testing works also if a dependent option like this is added later on: eero-t@3fb02c0
I can think of some (contrived) code changes after which two of the 1, 2 and 4 cases would pass, but not all three. While such changes might not pass manual review, I think it's good if unit tests catch them before one considers sending changes for review. But I'll change 2 and 4 to a single, larger prime number; that should catch even those contrived examples well enough. :-) In general, I think it's good practice to add a few guards against the "impossible" happening too, because unit tests aren't supposed to test only the current code, but also to detect when (potentially much) later changes break expectations in (supposedly) unrelated earlier functionality.
Good point, testing that is definitely needed! I'll add options to test that for one of the no-dev cases.
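A minimal sketch of the combined-option testing discussed above, not the PR's actual test code. It assumes that each GPU is advertised once per "shared device" slot, and that at most one monitoring resource exists, only when monitoring is enabled and at least one GPU is found. The names `testCase`, `expected`, `allCombinations` and the `sharedDevNum` field are hypothetical.

```go
package main

// testCase is a hypothetical combination of option values and fake device count.
type testCase struct {
	devices      int  // number of eligible GPUs in the fake sysfs tree
	sharedDevNum int  // assumed option: how many times each GPU is advertised
	monitoring   bool // whether the monitoring resource is enabled
}

// expected returns the resource counts this sketch assumes: GPU resources
// scale with devices * sharedDevNum, while the monitoring resource count is
// independent of both, except that it is zero when there are no devices.
func expected(tc testCase) (gpus, monitors int) {
	gpus = tc.devices * tc.sharedDevNum
	if tc.monitoring && tc.devices > 0 {
		monitors = 1
	}
	return gpus, monitors
}

// allCombinations builds the cross product of the given values, so every
// option value is exercised together with every other one.
func allCombinations(deviceCounts, sharedCounts []int) []testCase {
	var cases []testCase
	for _, d := range deviceCounts {
		for _, s := range sharedCounts {
			for _, m := range []bool{false, true} {
				cases = append(cases, testCase{devices: d, sharedDevNum: s, monitoring: m})
			}
		}
	}
	return cases
}
```

With device counts 0, 1 and 7 and shared counts 1 and 7, this yields twelve cases. Using a prime such as 7 instead of 2 and 4 means a product like 49 is unlikely to be produced by an accidental off-by-one or doubling bug.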
e1ca88f to d6f7614
Made the changes mentioned above and pushed a rebased version. That dang unrelated "opae-nlb-demo" was broken again. The test update commit message lists what testing it is supposed to add: eero-t@11609a3
my feedback is that we should drop
The cmd-line flag itself looks ok. I do not have an opinion about testing.
d6f7614 to 6211033
After discussing with Ukri about windmills, I replaced (CI failed as "opae-nlb-demo" breakage seems now to be permanent.) [1] it moved to "gpu-plugin-proper-option-testing" branch from where suitable bits can be resurrected after GPU plugin test code has acquired more sysfs/devfs setups it needs to test, and more options they need to be tested against.
Looks good to me. Added a couple of nits.
6211033 to b1a1949
Besides getting ClearLinux testing fixed for CI, the only thing still missing before this can be merged is testing the operator support (last, trivial WIP commit). Do CI checks include any kind of operator testing, or is that something that needs to be done completely manually? According to Ukri, testing operator support is a major undertaking / pain (especially when one does it for the first time), so I was wondering whether it could be dropped from this PR, or whether somebody more familiar with it would be willing to review / test the commit that tries to add operator support? :-)
It's OK to drop from this PR.
b1a1949 to ca423b1
Dropped the WIP/operator commit and updated the PR description to reflect the current state, so that what gets in will be properly documented.
ca423b1 to 0877f32
To reduce scan() function complexity before adding more functionality to it.
To help in:
* adding more CLI options in next and later commits, and
* replacing magic newDevicePlugin() input parameters with explicitly named one(s)
Make "i915_monitoring" resource (granting access to all GPUs) optional so that it can be enabled only when it's needed.
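A rough sketch of how an opt-in flag like this could gate the extra resource, using Go's standard `flag` package. Only the `-enable-monitoring` flag name and the "i915_monitoring" resource name come from this PR; the `cliOptions` struct, the `parseOptions` and `resourceNames` helpers, the `-shared-dev-num` flag and the "i915" resource name are assumptions made for illustration.

```go
package main

import "flag"

// cliOptions is a hypothetical struct with explicitly named options, in the
// spirit of replacing "magic" positional constructor parameters.
type cliOptions struct {
	sharedDevNum     int  // assumed option: containers sharing one GPU
	enableMonitoring bool // whether to provide the monitoring resource
}

// parseOptions parses the given arguments into cliOptions without touching
// the global flag set, which keeps it easy to call from unit tests.
func parseOptions(args []string) (cliOptions, error) {
	var opts cliOptions
	fs := flag.NewFlagSet("gpu_plugin", flag.ContinueOnError)
	fs.IntVar(&opts.sharedDevNum, "shared-dev-num", 1, "number of containers sharing the same GPU device")
	fs.BoolVar(&opts.enableMonitoring, "enable-monitoring", false, "whether to provide the 'i915_monitoring' resource")
	if err := fs.Parse(args); err != nil {
		return cliOptions{}, err
	}
	return opts, nil
}

// resourceNames lists the resources that would be advertised for the given
// options; the monitoring resource appears only when explicitly enabled.
func resourceNames(opts cliOptions) []string {
	names := []string{"i915"} // assumed name of the regular GPU resource
	if opts.enableMonitoring {
		names = append(names, "i915_monitoring")
	}
	return names
}
```

Defaulting the flag to false keeps the resource that grants access to all GPUs disabled unless it is explicitly requested.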
Tests plugin scan results in setups having none, one and multiple eligible GPU devices, with and without SRIOV enabled, with two different option values. This does not cover verifying the number of devices added under the "i915_monitoring" resource, as that would be a much larger change.
0877f32 to 57c8d76
I fixed the conflicts with main updates, but combined with the previous PR changes to the GPU plugin scan() function, the result pushed it over the CI lint complexity limit, so I added a commit moving the GPU device sysfs compatibility checks to their own function. Do you want a separate PR for that?
no need to split to another PR
lgtm
Commits in this PR make the following changes to the GPU plugin:
These changes were dropped from the latest versions of the PR:
Alternatives for the last item are: