
feat: add macOS VM support via Apple Virtualization.framework #90

Merged
rgarcia merged 24 commits into main from feat/vz-hypervisor
Feb 15, 2026

Conversation

@rgarcia rgarcia commented Feb 10, 2026

Summary

  • Add vz hypervisor implementation using Apple's Virtualization.framework via a codesigned subprocess (vz-shim)
  • vsock-based guest communication, shared directory mounts for disk access, macOS-native networking via vmnet
  • Makefile targets: build-darwin, test-darwin, dev-darwin, sign-darwin
  • CI: macOS runner for test-darwin
  • scripts/install.sh and scripts/uninstall.sh with macOS support (launchd, Homebrew PATH, codesign)

Depends on #89 (cross-platform foundation) — merge that first, then rebase this onto main.

Test plan

  • CI passes on Linux runner (no regressions)
  • CI passes on macOS runner (test-darwin)
  • make build-darwin && make sign-darwin succeeds on macOS
  • E2E install test passes on macOS (scripts/e2e-install-test.sh)

🤖 Generated with Claude Code


Note

High Risk
Introduces a new hypervisor backend and a privileged/codesigned subprocess plus platform-specific install/service management, which can impact VM lifecycle, guest connectivity, and production deployments on macOS.

Overview
Adds a new macOS-only vz hypervisor implementation that runs VMs in a separate vz-shim subprocess, exposes a Cloud-Hypervisor-like control API over a Unix socket, and proxies vsock connections via a per-VM vz.vsock socket.

Updates instance creation/boot defaults for vz (vsock socket path + console=hvc0) and routes API operations (exec, cp, stat) through InstanceManager.GetVsockDialer to support multiple hypervisors consistently. Guest init now always overwrites the guest-agent binary to avoid stale/corrupt binaries after restarts.

Adds macOS developer and deployment plumbing: new .env.darwin.example, .air.darwin.toml, Makefile targets for building/signing (build-darwin, sign-darwin, dev-darwin, sign-vz-shim), a self-hosted macOS CI job plus an E2E install test, and expanded scripts/install.sh/scripts/uninstall.sh to support macOS (launchd service, codesigning, macOS paths, CLI artifact differences).

Written by Cursor Bugbot for commit 07550ed. This will update automatically on new commits.

Base automatically changed from refactor/cross-platform-foundation to main February 11, 2026 01:09

rgarcia commented Feb 12, 2026

@cursor push 1353ca6

@sjmiller609 sjmiller609 left a comment

Just nits or possible additional color, looks good

@hiroTamada hiroTamada left a comment

Solid PR. Clean hypervisor abstraction, good use of build tags, thorough docs and E2E coverage. The vz-shim subprocess design (surviving hypeman restarts, CH-compatible control API) is well thought out. The API handler refactor to use GetVsockDialer is a nice decoupling.

A few minor nits inline, nothing blocking.

if err != nil {
	return err
}
devices = append(devices, dev)

Network disable flag ignored on macOS

Medium Severity

configureNetwork always creates a NAT NIC when networks is empty, so instances created with NetworkEnabled=false still get guest networking on vz. This makes macOS behavior diverge from the instance request and can unintentionally expose workloads that were expected to run without network access.

Split platform-specific code into _linux.go and _darwin.go files across
resources, network, devices, ingress, vmm, and vm_metrics packages.
Add hypervisor abstraction with registration pattern (RegisterSocketName,
RegisterVsockDialerFactory, RegisterClientFactory) to decouple instance
management from specific hypervisor implementations. Add "vz" to the
OpenAPI hypervisor type enum, erofs disk format support, and insecure
registry option for builds.

No behavioral changes on Linux. macOS can now compile but has no VM
functionality yet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
rgarcia and others added 19 commits February 13, 2026 22:03
Add vz hypervisor implementation that runs VMs on macOS using Apple's
Virtualization.framework via a codesigned subprocess (vz-shim). Includes
vsock-based guest communication, shared directory mounts for disk access,
and macOS-native networking via vmnet.

Key components:
- cmd/vz-shim: subprocess that creates and manages vz VMs
- lib/hypervisor/vz: starter, client, and vsock dialer for vz
- Makefile targets: build-darwin, test-darwin, dev-darwin, sign-darwin
- CI: macOS runner for test-darwin
- scripts/install.sh: macOS support (launchd, Homebrew, codesign)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Embed vz.entitlements as a Go resource and write it to a temp file at
  runtime for codesigning, replacing the broken entitlementsPath() that
  looked for the file next to the executable
- Add vz-shim copy step in .air.darwin.toml so the go:embed directive
  can find the binary during dev builds
- Add --entitlements flag to codesign in install.sh download path so
  binaries receive the virtualization entitlement
- Prepend /opt/homebrew/opt/e2fsprogs/sbin to launchd plist PATH so
  mkfs.ext4 from keg-only e2fsprogs is found at runtime

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
disk_darwin.go and disk_linux.go were unified into disk.go in PR #89
but snuck back in during the rebase as new files with no conflicts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nstall

- Read from bufio.Reader instead of raw conn in vsock proxy to prevent
  silent data loss when the buffered reader consumed beyond the newline
- Replace cmd.Process.Release() with go cmd.Wait() to properly reap
  vz-shim child processes instead of leaving zombies
- Update hypervisor README to reflect vz subprocess model (not in-process)
- Remove vz-shim from install/uninstall scripts (it's embedded in
  hypeman-api and extracted at runtime)
- Add CLI smoke tests (hypeman ps, hypeman images) to e2e install test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
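
The bufio.Reader fix in the commit above can be illustrated with a minimal sketch. It assumes a proxy that reads one newline-terminated header and then forwards the rest of the stream. The function names and the "CONNECT 1024" header are hypothetical and are not taken from the vz-shim source.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readHandshake reads one newline-terminated header from r and returns it
// together with everything that follows. (Hypothetical helper for illustration.)
func readHandshake(r io.Reader) (string, []byte, error) {
	br := bufio.NewReader(r)
	line, err := br.ReadString('\n')
	if err != nil {
		return "", nil, err
	}
	// br may already hold bytes past the newline, so the payload must be
	// read from br. Reading from r here would skip those buffered bytes.
	payload, err := io.ReadAll(br)
	if err != nil {
		return "", nil, err
	}
	return strings.TrimSuffix(line, "\n"), payload, nil
}

// demo runs readHandshake over an in-memory stream.
func demo(input string) (string, string, error) {
	header, payload, err := readHandshake(strings.NewReader(input))
	return header, string(payload), err
}

func main() {
	header, payload, err := demo("CONNECT 1024\nhello")
	fmt.Printf("header=%q payload=%q err=%v\n", header, payload, err)
}
```

The same rule applies to a real net.Conn: once a bufio.Reader wraps the connection, all later reads should go through that reader.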
- Extract JWT_SECRET/PORT with grep instead of sourcing the config file,
  which breaks on macOS where paths contain spaces
- Skip CLI smoke tests gracefully when CLI binary is not installed
  (e.g., no darwin/arm64 release available)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Builder images are now auto-built on startup, so manual push workflow
and the -registry-push flag are no longer needed. The underlying
repo_access JWT infrastructure remains for other registry auth flows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The vz-shim embed is darwin-only (build tag), so the directory isn't
needed on Linux. On macOS the Makefile creates it before compiling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests pull, run, exec, stop, and rm using the CLI against a real
alpine VM to verify the full stack works end-to-end after install.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The .gitkeep was removed so the directory no longer exists in the repo.
The Makefile needs to mkdir -p before copying the built binary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The macOS CLI install on line 653 used bare 'install' while all other
binary installs to $INSTALL_DIR used '$SUDO install'. When /usr/local/bin
isn't writable and $SUDO is set to 'sudo', this caused a permission error
that aborted the script (due to set -e) after the service was already
running, leaving a partial installation.

Applied via @cursor push command
CLI releases use goreleaser naming ("macos" not "darwin", .zip not
.tar.gz). Fix artifact lookup and extraction to handle both formats.

Make CLI presence a hard fail in e2e test — if the install script
can't install the CLI, that's a real failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The CLI doesn't have an 'images' subcommand. The VM lifecycle tests
(pull, run, exec, stop, rm) cover real functionality.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Image pulls are async — 'hypeman pull' returns immediately with
status:pending. Retry 'hypeman run' in a loop until the image
is available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
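
The retry described above lives in the shell-based e2e test. The following Go sketch only illustrates the same pattern. The names retryUntilReady and errImageNotReady are hypothetical, and the run callback stands in for invoking 'hypeman run'.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errImageNotReady is a hypothetical sentinel for "image pull still pending".
var errImageNotReady = errors.New("image not ready")

// retryUntilReady calls run up to attempts times (attempts should be > 0).
// It retries only while run reports errImageNotReady and returns any other
// error immediately.
func retryUntilReady(attempts int, delay time.Duration, run func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = run(); err == nil {
			return nil
		}
		if !errors.Is(err, errImageNotReady) {
			return err
		}
		time.Sleep(delay)
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryUntilReady(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errImageNotReady // pull still pending
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

Bounding the number of attempts keeps a test from hanging forever if the pull never completes.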
- Remove "Alternative Commands" section (make dev covers it)
- Remove known limitations that are implementation details or wrong:
  disk format is handled automatically, snapshots aren't supported,
  network ingress is internal, vz-shim is a subprocess not in-process
- Keep disk format and snapshots as brief notes
- Makefile: 'run' target comment says "for agents" not "for testing"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Requirements: remove "Production"/"Experimental" labels
- Quick Start: "Linux and macOS supported"
- CLI section: reword for local-first usage, remove "remote" framing
- Remove entire "macOS Support" section (platform details belong in
  DEVELOPMENT.md, not the user-facing README)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…egration tests

Changes based on PR review feedback:
- Reduce vz HTTP client timeout from 30s to 10s (local Unix socket)
- Add comment on 2GB memory safety default in vz-shim
- Fix graceful shutdown to only send ACPI power button without immediate
  force-kill fallback, aligning with CH/QEMU semantics
- Add macOS vz integration tests (TestVZBasicLifecycle, TestVZExecAndShutdown)

Test infrastructure improvements:
- Use short /tmp/ paths for vz test temp dirs to avoid macOS 104-byte
  Unix socket path limit (t.TempDir() paths are too long)
- Capture vz-shim stderr and log file contents in error messages for
  better diagnostics when shim fails to start

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
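
A minimal sketch of the socket path constraint mentioned above, assuming the 104-byte figure refers to the size of sun_path on macOS (the path and its NUL terminator must both fit). The helper name and both example paths are hypothetical.

```go
package main

import "fmt"

// sunPathSize is the 104-byte limit cited in the commit message above.
const sunPathSize = 104

// socketPathFits reports whether path leaves room for the NUL terminator
// inside a sunPathSize-byte buffer.
func socketPathFits(path string) bool {
	return len(path) > 0 && len(path) < sunPathSize
}

func main() {
	// Illustrative paths only: a short /tmp path versus a long per-test temp dir.
	short := "/tmp/vz-test-1/vz.sock"
	long := "/var/folders/ab/cdefghijklmnopqrstuvwxyz0123456789/T/TestVZBasicLifecycle1234567890/001/instances/test/vz.sock"
	fmt.Println(len(short), socketPathFits(short))
	fmt.Println(len(long), socketPathFits(long))
}
```

A check like this could fail fast with a clear message instead of surfacing a bind error from the shim.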
After a force-kill (vm.Stop), the overlay filesystem could have a
corrupted guest-agent binary. The lazy copy optimization skipped
re-copying the binary if it already existed, causing exec format
error on restart. Always copy from initrd to ensure correctness.

Also adds restart coverage to TestVZBasicLifecycle (stop → start →
exec → verify) with diagnostic log dumping on failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Warn on codesign failure instead of silently swallowing (install.sh)
- Fix vz control interface description: HTTP, not gRPC (README.md)
- Remove dead if/else that set same path on both branches (e2e-install-test.sh)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… networks is empty

When NetworkEnabled=false, the instance's Networks slice is intentionally
empty. The vz shim was incorrectly treating an empty networks slice as
'add default NAT NIC', which gave the guest network access even when
the caller explicitly disabled networking.

Now, when networks is empty, configureNetwork returns immediately without
attaching any NIC, matching the behavior of QEMU and Cloud Hypervisor.

Applied via @cursor push command
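
The behavior described in the commit above can be sketched as follows. NetworkConfig and NIC are hypothetical stand-ins for the shim's real types, and this configureNetwork returns a slice rather than mutating a VM configuration, so it shows the control flow only.

```go
package main

import "fmt"

// NetworkConfig and NIC are hypothetical types used only for this sketch.
type NetworkConfig struct {
	MAC string
}

type NIC struct {
	MAC  string
	Mode string
}

// configureNetwork returns one NAT NIC per requested network. An empty
// slice means networking was disabled, so no NIC is attached.
func configureNetwork(networks []NetworkConfig) []NIC {
	if len(networks) == 0 {
		return nil
	}
	nics := make([]NIC, 0, len(networks))
	for _, n := range networks {
		nics = append(nics, NIC{MAC: n.MAC, Mode: "nat"})
	}
	return nics
}

func main() {
	fmt.Println(len(configureNetwork(nil)))
	fmt.Println(len(configureNetwork([]NetworkConfig{{MAC: "02:00:00:00:00:01"}})))
}
```

Treating an empty slice as "no networking" rather than "use the default" keeps the guest isolated unless the caller asks for a NIC.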

rgarcia commented Feb 14, 2026

@cursor push eaaacb0

StartInstance now takes a StartInstanceRequest parameter (from PR #99).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issue.

if err != nil {
	// Connection reset is expected when shim exits
	return nil
}

Shutdown masks shim communication failures

Medium Severity

Client.Shutdown returns nil for any httpClient.Do error, not just expected connection-reset-on-exit cases. If the shim socket is unreachable or the request fails, callers still treat shutdown as successful, which can leave a running vz-shim process while metadata and control flow proceed as if the VM stopped.

cursor bot commented Feb 14, 2026

Bugbot Autofix prepared fixes for 1 of the 1 bugs found in the latest run.

  • ✅ Fixed: Shutdown masks shim communication failures
    • Changed Shutdown to only ignore expected connection-closed errors (EOF, ECONNRESET, EPIPE, net.ErrClosed) and propagate all other transport errors, so callers can detect when the shutdown request never reached the shim.

Push these changes by commenting:

@cursor push bd4f570bbd
Preview (bd4f570bbd)
diff --git a/lib/hypervisor/vz/client.go b/lib/hypervisor/vz/client.go
--- a/lib/hypervisor/vz/client.go
+++ b/lib/hypervisor/vz/client.go
@@ -5,10 +5,12 @@
 import (
 	"context"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"io"
 	"net"
 	"net/http"
+	"syscall"
 	"time"
 
 	"github.com/kernel/hypeman/lib/hypervisor"
@@ -116,13 +118,28 @@
 	}
 	resp, err := c.httpClient.Do(req)
 	if err != nil {
-		// Connection reset is expected when shim exits
-		return nil
+		// Connection reset / EOF is expected when the shim exits in
+		// response to the shutdown request. Any other error means the
+		// request may not have reached the shim.
+		if isExpectedShutdownError(err) {
+			return nil
+		}
+		return fmt.Errorf("shutdown shim: %w", err)
 	}
 	defer resp.Body.Close()
 	return nil
 }
 
+// isExpectedShutdownError reports whether err is a connection-closed error
+// that is expected when the vz-shim process exits after handling a shutdown
+// request.
+func isExpectedShutdownError(err error) bool {
+	return errors.Is(err, io.EOF) ||
+		errors.Is(err, syscall.ECONNRESET) ||
+		errors.Is(err, syscall.EPIPE) ||
+		errors.Is(err, net.ErrClosed)
+}
+
 func (c *Client) GetVMInfo(ctx context.Context) (*hypervisor.VMInfo, error) {
 	body, err := c.doGet(ctx, "/api/v1/vm.info")
 	if err != nil {
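
As a rough check of the helper in the preview above, the sketch below copies isExpectedShutdownError and feeds it errors wrapped the way an HTTP client typically reports transport failures (a *url.Error around a *net.OpError). The wrapping helper, the example URL, and the two sample variables are assumptions made for illustration. They are not taken from the repository.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"net/url"
	"syscall"
)

// isExpectedShutdownError is copied from the diff preview above.
func isExpectedShutdownError(err error) bool {
	return errors.Is(err, io.EOF) ||
		errors.Is(err, syscall.ECONNRESET) ||
		errors.Is(err, syscall.EPIPE) ||
		errors.Is(err, net.ErrClosed)
}

// wrapLikeHTTPClient is a hypothetical helper that nests cause inside
// *url.Error and *net.OpError, both of which support errors.Is unwrapping.
func wrapLikeHTTPClient(cause error) error {
	return &url.Error{
		Op:  "Put",
		URL: "http://unix/shutdown", // hypothetical URL
		Err: &net.OpError{Op: "read", Net: "unix", Err: cause},
	}
}

// Sample errors: a reset after the shim exits, and an unreachable socket.
var (
	resetErr   = wrapLikeHTTPClient(syscall.ECONNRESET)
	refusedErr = wrapLikeHTTPClient(syscall.ECONNREFUSED)
)

func main() {
	fmt.Println(isExpectedShutdownError(resetErr))   // treated as a clean shutdown
	fmt.Println(isExpectedShutdownError(refusedErr)) // propagated to the caller
}
```

Because errors.Is walks the whole wrap chain, the helper does not need to know how deeply the transport nested the underlying errno.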

rgarcia and others added 3 commits February 13, 2026 22:16
After PR #99, init does reboot(POWER_OFF) when the entrypoint exits.
Alpine's default entrypoint (/bin/sh) exits immediately with no stdin,
killing the VM before exec tests can run. Add Cmd: sleep infinity to
keep the VM alive, matching the pattern in volumes_test.go.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After PR #99, init does reboot(POWER_OFF) when the entrypoint exits.
Alpine's /bin/sh exits immediately with no stdin, killing the VM before
exec can run. nginx:alpine has a long-running daemon entrypoint that
keeps the VM alive, matching the pattern in exec_test.go.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 1 was calling uninstall.sh without KEEP_DATA=false, so the data
directory (including stale VMs from previous failed runs) persisted.
This caused name_conflict errors when the test tried to create
e2e-test-vm again.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rgarcia rgarcia merged commit 1c69f41 into main Feb 15, 2026
6 checks passed
@rgarcia rgarcia deleted the feat/vz-hypervisor branch February 15, 2026 13:46