-
Couldn't load subscription status.
- Fork 387
Refactor how we build unikernels for testing #2316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Unfortunately git didn't understand that |
7c4958f to
0f49331
Compare
|
Tests are still passing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think this is a good step forward. I'm approving these on the condition that you update the unsandbox_list with all the network tests that require additional capabs, and with an explanation of why they are not sandboxed as a comment next to each item in that list. If you don't know the answer, just describe the error message you get from nix so that others can see what to expect if they try to run the same test sandboxed.
Also, I'd like to hear @MagnusS thoughts on this. We have discussed alternative ways of running tests in the past, to not require any privileges at all, but I think that's a bigger discussion.
test.sh
Outdated
| "websocket" # Linking fails, undefined ref to http_parser_parse_url, http_parser_execute | ||
| ) | ||
|
|
||
| sandboxed=false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little confusing that you have two mechanisms for adding and removing from sandbox - setting / unsetting this and then also the unsandbox list. I would prefer a single mechanism, and that we explicitly add the tests that can't run sandboxed do the unsandbox list, with an explanation of why.
For the kernel/integration/term test I think it's because it's trying to open a telnet connection directly. For the others I thought they didn't require capabilities themselves to run, but that creating the bridge needed to happen ahead of time, and that requires additional network / admin capabs. Assuming a CI machine was set up with that bridge by default, would the networking tests then also require extra capabs? This is just a question, I don't know the answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little confusing that you have two mechanisms for adding and removing from sandbox
Agreed. Let's just explicitly list them. It's better to know why than to just glob it. I didn't list them before because there was quite many in the networking list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought they didn't require capabilities themselves to run,
vmrunner requires permission to attach to the bridge. I'm pretty sure it's vmrunner's startup that requires the capability, not the unikernel or messaging.
Assuming a CI machine was set up with that bridge by default, would the networking tests then also require extra capabs? This is just a question, I don't know the answer.
I think so. I don't know either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the kernel/integration/term test I think it's because it's trying to open a telnet connection directly.
I just checked the test. It seems to connect to LiveUpdate (I still haven't understood what that is) over telnet, as you mentioned. Maybe we should rename the test to avoid confusion?
0f49331 to
e8e7224
Compare
|
I think I've addressed the issues mentioned (tests are still passing, hopefully this time without skipping any). Did I miss something? |
All good from my side, but I requested a review from @MagnusS as well as mentioned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, I think this is an improvement over the previous solution - it's much better to run these in nix-build. Changes look good to me, but see minor comments inline. It also looks like the section about "Running tests" in README.md will need a small update. Regarding your question about --check -- as long as tests rerun when the kernel has been changed I think it's fine to not add this (we can add it later if needed)
| unikernelPath = absolutePathOf ./. unikernel; | ||
| vmrunnerPkg = | ||
| if vmrunner == "" then | ||
| includeos.vmrunner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this just be the default above or isn't ${includeos} available yet in the header? If it's not possible to set the default earlier it would be useful with a comment above the "vmrunner ? .." declaration that it will be set to default if left empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can, and personally I agree and prefer what you're suggesting, but did it this way anyway for convenience.
The downside of that is that people need to write out --arg vmrunner '(includeos.pkgs.callPackage "/path/to/vmrunner" {})' instead of just --argstr vmrunner ./path/to/vmrunner, which might be confusing for people that don't know about our overlay.
Up to you, I prefer the explicit-by-the-user approach.
|
|
||
| for subfolder in "$base_folder"/*/; do | ||
| local skip=false | ||
| local sandboxed=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a cli option for enabling (or disabling) the tests that require the sandbox to be disabled? It should be possible to run tests on systems without the bridge available (even if the net tests are skipped).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also wondering if "sandbox" is the right name here. While it is true that these tests run without a sandbox, the common feature is that they all require access to a network bridge, so being able to run a command that specifically enables them, e.g. "./test.sh --enable-network-bridge-tests", would make more sense to me. But as this has already passed one review I'm happy to leave this as it is for now - we can iterate on this later if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the naming: I was also thinking of calling it requires_shell or something of that nature, but it does require accessing non-/nix/store paths such as those in /dev/ and /etc/qemu.
I was considering taking another look at test.sh again in the future anyway, so we can definitely find a better nomenclature for it by then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding enabling/disabling the sandbox, I'm not sure I see the value. Introducing conditionals could lead to giving false positives, especially if it sets the precedent for more edge cases. Forcing them to stay enabled pushes people to actually set this up properly. See also includeos/vmrunner#45.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's useful to be able to skip the network tests that require bridge-access in environments where you'd still like to run the regular tests (like CI). Ideally we should find a way to run them without special permissions, but we're not quite there yet. If they just fail I'm ok with that for now - but we may have to make this more flexible for CI later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Regarding building specific tests, there is also the drafted #2314 which I will probably redo from scratch later. Might add a flag like --exclude= when I work on that.
| if $sandboxed; then | ||
| cmd="nix-build ./unikernel.nix $CCACHE_FLAG --argstr unikernel ${subfolder%/} --arg doCheck true" | ||
| else | ||
| cmd="nix-shell ./unikernel.nix $CCACHE_FLAG --argstr unikernel ${subfolder%/} --arg doCheck true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer --pure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might look like that, but I'm reasonably sure this actually still is pure. Let me explain.
nix-build is inherently pure, and nix-shell calls nix-build internally so the purity of the unikernel itself remains. Following, we also set packages to include python with a higher precedence than the host. Unless the test.py introduces something odd, this should be perfectly fine.
I initially tried nix-shell --pure, but this would drop nix-build from the PATH, which I didn't find an elegant solution for. I'd be happy to make it --pure if we have a solution for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, note that --arg doCheck true when added to nix-shell doesn't actually run the checkPhase, but it does expose its required *checkInputs to the shell.
e8e7224 to
07d6e42
Compare
07d6e42 to
7fde7e4
Compare
|
Tests are still working, let me know what we want to do with |
In order to fix #2310, I have migrated the building of the unikernels to nix instead of manually calling cmake in a shellHook.
This changes how we run tests. nix has a
checkPhasespecifically designed to test builds. We now use this whenever we can. Because we're now usingnix-buildfor all the building, this implicitly becomes pure.Another change with this is that we will no longer need to rebuild and recheck tests we already know are passing. We could enable
--checkto force rechecking unchanged derivations. Do we want that?Furthermore, we should now be able to pass an architecture target for testing.
Annoyingly, not all tests are able to run through the checking phase because capabilities (specifically, we need
cap_net_adminandcap_net_rawonqemu-bridge-helper) are dropped upon entering this phase. To work around this, we run the few tests that actually need to be run without a proper reproducible sandbox in a shellHook (this is partially what we were doing before). Ideally, we'd get rid of this. I have seen that security.wrappers.<name>.capabilities exists: maybe we can handle the capabilities through this instead: but I suppose that is part of includeos/vmrunner instead.I have also renamed
example.nixtounikernel.nixbecause it's not really specific toexample/at all. I considered moving it totests/default.nix(it would make./test.shslightly cleaner), instead, but wasn't sure how we feel about "hiding" these entry points. Personally I have a preference towards keeping the root directory uncluttered.shell.nixis not involved in testing at all anymore, but it does provide some useful PATHs upon entering the shell. Maybe we can mergedevelop.nixback intoshell.nixafter this...In summary:
checkPhaseovershellHookwhen possibleshell.nixis now mostly a stubexample.nixtounikernel.nixunikernel.nixthrough nix artifactscap_net_admin+ep qemu-bridge-helperwas missing!)All tests are passing.