Skip to content

Conversation

@mazunki
Copy link
Contributor

@mazunki mazunki commented Oct 17, 2025

In order to fix #2310, I have migrated the building of the unikernels to nix instead of manually calling cmake in a shellHook.

This changes how we run tests. nix has a checkPhase specifically designed to test builds. We now use this whenever we can. Because we're now using nix-build for all the building, this implicitly becomes pure.

Another change with this is that we will no longer need to rebuild and recheck tests we already know are passing. We could enable --check to force rechecking unchanged derivations. Do we want that?
Furthermore, we should now be able to pass an architecture target for testing.

Annoyingly, not all tests are able to run through the checking phase because capabilities (specifically, we need cap_net_admin and cap_net_raw on qemu-bridge-helper) are dropped upon entering this phase. To work around this, we run the few tests that actually need to be run without a proper reproducible sandbox in a shellHook (this is partially what we were doing before). Ideally, we'd get rid of this. I have seen that security.wrappers.<name>.capabilities exists: maybe we can handle the capabilities through this instead: but I suppose that is part of includeos/vmrunner instead.


I have also renamed example.nix to unikernel.nix because it's not really specific to example/ at all. I considered moving it to tests/default.nix (it would make ./test.sh slightly cleaner), instead, but wasn't sure how we feel about "hiding" these entry points. Personally I have a preference towards keeping the root directory uncluttered.

shell.nix is not involved in testing at all anymore, but it does provide some useful PATHs upon entering the shell. Maybe we can merge develop.nix back into shell.nix after this...

In summary:

  • Fixes and closes Impure path used during linking of unikernel image #2310
  • Prefer checkPhase over shellHook when possible
  • shell.nix is now mostly a stub
  • Rename example.nix to unikernel.nix
  • Handle building of test services in unikernel.nix through nix artifacts
  • Update the motd/README upon entering a shell (cap_net_admin+ep qemu-bridge-helper was missing!)

All tests are passing.

@mazunki
Copy link
Contributor Author

mazunki commented Oct 17, 2025

Unfortunately git didn't understand that example.nix => unikernel.nix was a rename, despite using git mv. Sorry about that, maybe I should have made that a stub commit.

@mazunki
Copy link
Contributor Author

mazunki commented Oct 17, 2025

s+u should be stronger than cap_net_admin+ep, but functionally equivalent. I'm not sure why the latter worked on my Gentoo machine, but setuid didn't. I've slightly modified the motd to clarify this.

Tests are still passing.

Copy link
Contributor

@alfreb alfreb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think this is a good step forward. I'm approving these on the condition that you update the unsandbox_list with all the network tests that require additional capabs, and with an explanation of why they are not sandboxed as a comment next to each item in that list. If you don't know the answer, just describe the error message you get from nix so that others can see what to expect if they try to run the same test sandboxed.

Also, I'd like to hear @MagnusS thoughts on this. We have discussed alternative ways of running tests in the past, to not require any privileges at all, but I think that's a bigger discussion.

test.sh Outdated
"websocket" # Linking fails, undefined ref to http_parser_parse_url, http_parser_execute
)

sandboxed=false
Copy link
Contributor

@alfreb alfreb Oct 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little confusing that you have two mechanisms for adding and removing from sandbox - setting / unsetting this and then also the unsandbox list. I would prefer a single mechanism, and that we explicitly add the tests that can't run sandboxed do the unsandbox list, with an explanation of why.

For the kernel/integration/term test I think it's because it's trying to open a telnet connection directly. For the others I thought they didn't require capabilities themselves to run, but that creating the bridge needed to happen ahead of time, and that requires additional network / admin capabs. Assuming a CI machine was set up with that bridge by default, would the networking tests then also require extra capabs? This is just a question, I don't know the answer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little confusing that you have two mechanisms for adding and removing from sandbox

Agreed. Let's just explicitly list them. It's better to know why than to just glob it. I didn't list them before because there was quite many in the networking list.

Copy link
Contributor Author

@mazunki mazunki Oct 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought they didn't require capabilities themselves to run,

vmrunner requires permission to attach to the bridge. I'm pretty sure it's vmrunner's startup that requires the capability, not the unikernel or messaging.

Assuming a CI machine was set up with that bridge by default, would the networking tests then also require extra capabs? This is just a question, I don't know the answer.

I think so. I don't know either.

Copy link
Contributor Author

@mazunki mazunki Oct 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the kernel/integration/term test I think it's because it's trying to open a telnet connection directly.

I just checked the test. It seems to connect to LiveUpdate (I still haven't understood what that is) over telnet, as you mentioned. Maybe we should rename the test to avoid confusion?

@mazunki
Copy link
Contributor Author

mazunki commented Oct 18, 2025

I think I've addressed the issues mentioned (tests are still passing, hopefully this time without skipping any). Did I miss something?

@alfreb
Copy link
Contributor

alfreb commented Oct 19, 2025

I think I've addressed the issues mentioned (tests are still passing, hopefully this time without skipping any). Did I miss something?

All good from my side, but I requested a review from @MagnusS as well as mentioned

Copy link
Member

@MagnusS MagnusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I think this is an improvement over the previous solution - it's much better to run these in nix-build. Changes look good to me, but see minor comments inline. It also looks like the section about "Running tests" in README.md will need a small update. Regarding your question about --check -- as long as tests rerun when the kernel has been changed I think it's fine to not add this (we can add it later if needed)

unikernelPath = absolutePathOf ./. unikernel;
vmrunnerPkg =
if vmrunner == "" then
includeos.vmrunner
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be the default above or isn't ${includeos} available yet in the header? If it's not possible to set the default earlier it would be useful with a comment above the "vmrunner ? .." declaration that it will be set to default if left empty.

Copy link
Contributor Author

@mazunki mazunki Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can, and personally I agree and prefer what you're suggesting, but did it this way anyway for convenience.

The downside of that is that people need to write out --arg vmrunner '(includeos.pkgs.callPackage "/path/to/vmrunner" {})' instead of just --argstr vmrunner ./path/to/vmrunner, which might be confusing for people that don't know about our overlay.

Up to you, I prefer the explicit-by-the-user approach.


for subfolder in "$base_folder"/*/; do
local skip=false
local sandboxed=true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a cli option for enabling (or disabling) the tests that require the sandbox to be disabled? It should be possible to run tests on systems without the bridge available (even if the net tests are skipped).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also wondering if "sandbox" is the right name here. While it is true that these tests run without a sandbox, the common feature is that they all require access to a network bridge, so being able to run a command that specifically enables them, e.g. "./test.sh --enable-network-bridge-tests", would make more sense to me. But as this has already passed one review I'm happy to leave this as it is for now - we can iterate on this later if needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the naming: I was also thinking of calling it requires_shell or something of that nature, but it does require accessing non-/nix/store paths such as those in /dev/ and /etc/qemu.

I was considering taking another look at test.sh again in the future anyway, so we can definitely find a better nomenclature for it by then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding enabling/disabling the sandbox, I'm not sure I see the value. Introducing conditionals could lead to giving false positives, especially if it sets the precedent for more edge cases. Forcing them to stay enabled pushes people to actually set this up properly. See also includeos/vmrunner#45.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's useful to be able to skip the network tests that require bridge-access in environments where you'd still like to run the regular tests (like CI). Ideally we should find a way to run them without special permissions, but we're not quite there yet. If they just fail I'm ok with that for now - but we may have to make this more flexible for CI later

Copy link
Contributor Author

@mazunki mazunki Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Regarding building specific tests, there is also the drafted #2314 which I will probably redo from scratch later. Might add a flag like --exclude= when I work on that.

if $sandboxed; then
cmd="nix-build ./unikernel.nix $CCACHE_FLAG --argstr unikernel ${subfolder%/} --arg doCheck true"
else
cmd="nix-shell ./unikernel.nix $CCACHE_FLAG --argstr unikernel ${subfolder%/} --arg doCheck true"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer --pure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might look like that, but I'm reasonably sure this actually still is pure. Let me explain.

nix-build is inherently pure, and nix-shell calls nix-build internally so the purity of the unikernel itself remains. Following, we also set packages to include python with a higher precedence than the host. Unless the test.py introduces something odd, this should be perfectly fine.

I initially tried nix-shell --pure, but this would drop nix-build from the PATH, which I didn't find an elegant solution for. I'd be happy to make it --pure if we have a solution for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, note that --arg doCheck true when added to nix-shell doesn't actually run the checkPhase, but it does expose its required *checkInputs to the shell.

@mazunki
Copy link
Contributor Author

mazunki commented Oct 27, 2025

Tests are still working, let me know what we want to do with vmrunner. I have a tested commit ready to be pushed for the change :)

@MagnusS MagnusS merged commit d361624 into includeos:main Oct 27, 2025
@mazunki mazunki deleted the ref-example-nix branch October 27, 2025 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Impure path used during linking of unikernel image

3 participants