feat: implement Firecracker compute backend with E2E tests (#57)
Conversation
📝 Walkthrough

Adds a Firecracker compute backend: a Linux adapter and a non-Linux noop, config fields and wiring, unit/E2E/API tests, a CI job to run Firecracker tests, a go.mod dependency on firecracker-go-sdk, and a design doc describing the implementation plan.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant API as API Server
    participant Adapter as Firecracker Adapter
    participant VM as Firecracker VM
    participant FS as File System
    Client->>API: POST /instances (name,image,type)
    API->>Adapter: LaunchInstanceWithOptions(opts)
    Adapter->>Adapter: Build VM config & update state
    alt Mock Mode
        Adapter->>Adapter: Mark RUNNING without boot
    else Real Firecracker (Linux)
        Adapter->>FS: Create sockets & resources
        Adapter->>VM: Start Firecracker process (binary)
        VM->>VM: Boot kernel & mount rootfs
    end
    Adapter-->>API: Return instance ID (202)
    Client->>API: GET /instances/{id}
    API->>Adapter: Query status
    Adapter-->>API: Return status (RUNNING)
    Client->>API: DELETE /instances/{id}
    API->>Adapter: DeleteInstance(id)
    alt Mock Mode
        Adapter->>Adapter: Remove from state
    else Real Firecracker (Linux)
        Adapter->>VM: Terminate VM
        Adapter->>FS: Cleanup sockets/files
    end
    Adapter-->>API: Success (200)
```
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 11
🤖 Fix all issues with AI agents
In @.github/workflows/ci.yml:
- Around lines 304-307: The CI step that runs TestFirecrackerBackend_E2E should set the FIRECRACKER_MOCK_MODE env var so the E2E test uses mocked artifacts instead of relying on /var/lib/thecloud paths. Add `FIRECRACKER_MOCK_MODE: "true"` to the env block of the "Run Firecracker E2E Test" step, and verify that TestFirecrackerBackend_E2E actually reads that variable. Alternatively, if the current behavior is intentional, add a clear comment on the step stating it is only a compile/startup smoke check and may skip actual launches.
- Around lines 331-332: The build job's needs list omits the test-firecracker job, so Firecracker failures do not block deploy-staging/deploy-production. Add "test-firecracker" to the build job's needs array, or, if the omission is intentional, add an inline comment next to the build job explaining why test-firecracker is not required.
In `@internal/repositories/firecracker/adapter_noop.go`:
- Around lines 96-98: The noop Ping implementation on FirecrackerAdapter returns an error, which will fail startup health checks. Change Ping(ctx context.Context) to behave like DeleteInstance/DeleteNetwork on non-Linux platforms: log a warning via the adapter's logger (if available) and return nil instead of fmt.Errorf.
- Around lines 14-20: The noop Config struct in internal/repositories/firecracker/adapter_noop.go is missing the MockMode field present in the Linux Config (adapter.go), which breaks the build where infrastructure.go sets MockMode: cfg.FirecrackerMockMode. Add a MockMode bool field to the noop Config so its shape matches the Linux Config, then rebuild and verify compilation on non-Linux targets.
In `@internal/repositories/firecracker/adapter.go`:
- Around lines 193-210: WaitTask on FirecrackerAdapter swallows the error from m.Wait and returns a fabricated exit code. When m.Wait(ctx) returns an error, propagate it to the caller instead of returning (1, nil): return 1 together with the actual err (or map to a more specific exit code if one can be extracted from m.Wait), leaving the success path as return 0, nil.
- Around lines 37-48: NewFirecrackerAdapter ignores the error from os.MkdirAll and hardcodes "/tmp/firecracker". Use a named constant (e.g., defaultSocketDir) for the default cfg.SocketDir, check the error returned by os.MkdirAll, and fail fast (return an error or nil adapter, or at least log) when directory creation fails, so subsequent operations on FirecrackerAdapter.machines don't hit confusing errors.
- Around lines 140-156: DeleteInstance accepts a user-controllable id, builds a file path from it, and swallows errors. Validate the id at the start of DeleteInstance (e.g., accept only UUIDs, or restrict to a safe charset and reject anything containing ".." or path separators) before looking up a.machines[id]; build the path safely with filepath.Join(a.cfg.SocketDir, id+".socket"); and check the errors returned by m.StopVMM() and os.Remove(...) instead of discarding them, returning a clear error when the id is invalid or when either call fails.
In `@tests/firecracker_api_test.go`:
- Around lines 51-55: The subtest currently does a bare return when resp.StatusCode != http.StatusAccepted, which lets the test silently pass. Replace the silent return with an explicit failure (e.g., t.Fatalf) that includes the actual status code in the failure message.
- Around lines 41-42: The errors from json.Marshal and http.NewRequest are ignored. Capture them and assert success with require.NoError so failures aren't silently dropped: replace `body, _ := json.Marshal(payload)` with `body, err := json.Marshal(payload); require.NoError(t, err, "json.Marshal failed")`, and likewise for http.NewRequest. Apply the same fix to the similar calls around lines 73 and 92.
In `@tests/firecracker_e2e_test.go`:
- Around lines 47-50: The subtest treats a LaunchInstanceWithOptions error by logging and returning, which makes the test incorrectly pass. Replace the bare return with t.Skipf (e.g., `t.Skipf("Launch failed, skipping test (likely missing artifacts): %v", err)`) so the test is marked skipped rather than passed when LaunchInstanceWithOptions returns an error.
- Around lines 22-26: The test constructs firecracker.Config with hardcoded BinaryPath/KernelPath/RootfsPath, ignoring CI-provided env vars. Read FIRECRACKER_BINARY, FIRECRACKER_KERNEL, and FIRECRACKER_ROOTFS (via os.Getenv or a small helper that returns the env value or falls back to the existing hardcoded string) when setting Config.BinaryPath, .KernelPath, and .RootfsPath, so the CI-installed binary and artifacts are used when present.
🧹 Nitpick comments (7)
internal/repositories/firecracker/adapter.go (1)

54-62: Magic numbers for default vCPU count and memory. Use named constants for the defaults 1 and 512, and for the byte-to-MB divisor.

♻️ Proposed refactor

```diff
+const (
+	defaultVCPUs    = int64(1)
+	defaultMemoryMB = int64(512)
+	bytesPerMB      = int64(1024 * 1024)
+)
+
 func (a *FirecrackerAdapter) LaunchInstanceWithOptions(ctx context.Context, opts ports.CreateInstanceOptions) (string, []string, error) {
 	id := uuid.New().String()
 	socketPath := fmt.Sprintf("%s/%s.socket", a.cfg.SocketDir, id)
-	vcpus := int64(1)
+	vcpus := defaultVCPUs
 	if opts.CPULimit > 0 {
 		vcpus = opts.CPULimit
 	}
-	mem := int64(512)
+	mem := defaultMemoryMB
 	if opts.MemoryLimit > 0 {
-		mem = opts.MemoryLimit / 1024 / 1024 // Convert to MB
+		mem = opts.MemoryLimit / bytesPerMB
 	}
```

As per coding guidelines, "Do not use magic numbers - use named constants instead".
internal/repositories/firecracker/adapter_test.go (1)

17-31: Consider expanding test coverage with table-driven tests. The constructor test only covers the happy path with all fields set. Consider a table-driven test covering edge cases (e.g., empty `SocketDir` to verify the default is applied, `MockMode: true` to verify the type returns `"firecracker-mock"`).

As per coding guidelines, "Use table-driven tests in test files".
internal/api/setup/infrastructure.go (1)

113-119: `InitLBProxy` silently falls through to the Docker proxy for the Firecracker backend. While the comment on lines 117-118 acknowledges this, there is no runtime log when a Firecracker-backed deployment hits this fallback. Consider adding an `Info` or `Warn` log so operators know the LB proxy behavior isn't Firecracker-native.

.github/workflows/ci.yml (2)
293-301: Fragile Firecracker version detection and brittle tarball extraction path. Line 295 chains `curl | grep | cut | tr | xargs` to extract the latest tag; this is fragile and will fail silently if the GitHub API response changes or if rate-limiting returns an error page. Line 299 assumes a specific directory name inside the tarball (`release-${FC_VERSION}-$(uname -m)/`), which could change between releases.

Consider using `jq` for reliable JSON parsing (it's available on `ubuntu-latest`) and adding a guard to abort early if the version is empty:

Proposed fix

```diff
       - name: Install Firecracker
         run: |
-          FC_VERSION=$(curl -s https://api.github.com/repos/firecracker-microvm/firecracker/releases/latest | grep "tag_name" | cut -d : -f 2,3 | tr -d \" | tr -d " ," | xargs)
+          FC_VERSION=$(curl -sf https://api.github.com/repos/firecracker-microvm/firecracker/releases/latest | jq -r '.tag_name')
+          if [ -z "$FC_VERSION" ] || [ "$FC_VERSION" = "null" ]; then
+            echo "::error::Failed to determine Firecracker version"
+            exit 1
+          fi
           echo "Installing Firecracker $FC_VERSION"
```
261-286: Service definitions are duplicated instead of reusing the YAML anchor. The Postgres service block is a near-exact copy of `&postgres-service` defined at line 40. Consider using the existing anchor for DRY-ness. Redis is new, so it can stay inline.

tests/firecracker_e2e_test.go (1)
35-41: Magic number for memory limit. Per coding guidelines, magic numbers should be avoided. `128 * 1024 * 1024` could use a named constant for clarity, though the expression is reasonably self-documenting in a test context.

As per coding guidelines, "Do not use magic numbers - use named constants instead".
tests/firecracker_api_test.go (1)
64-65: Hardcoded `time.Sleep` for the provisioning wait is fragile. A 5-second sleep may be too long (wastes CI time) or too short (flaky on slow runners). A polling loop with a timeout on the GET endpoint checking for `RUNNING` status would be more reliable and self-documenting.

Proposed approach

```diff
-	// Wait for provision worker to finish (especially in mock mode)
-	time.Sleep(5 * time.Second)
+	// Poll until instance reaches RUNNING status (or timeout)
+	deadline := time.Now().Add(15 * time.Second)
+	for time.Now().Before(deadline) {
+		getReq, err := http.NewRequest("GET", testutil.TestBaseURL+"/instances/"+instanceID, nil)
+		require.NoError(t, err)
+		getReq.Header.Set(testutil.TestHeaderAPIKey, token)
+		applyTenantHeader(t, getReq, token)
+		getResp, err := client.Do(getReq)
+		require.NoError(t, err)
+		var status struct {
+			Data domain.Instance `json:"data"`
+		}
+		json.NewDecoder(getResp.Body).Decode(&status)
+		getResp.Body.Close()
+		if status.Data.Status == domain.StatusRunning {
+			break
+		}
+		time.Sleep(500 * time.Millisecond)
+	}
```
Pull request overview
Adds a new Firecracker-based ComputeBackend implementation (with a non-Linux noop variant), wires it into API setup/config, and introduces Firecracker-focused E2E/API tests plus a dedicated CI job to exercise the new backend (primarily via mock mode).
Changes:
- Added `internal/repositories/firecracker` adapter (Linux + noop) backed by `firecracker-go-sdk`.
- Extended platform config + infrastructure wiring to select Firecracker and pass binary/kernel/rootfs/mock settings.
- Added Firecracker E2E + API lifecycle tests and a CI workflow job to run them.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| `internal/repositories/firecracker/adapter.go` | Linux Firecracker adapter implementation + mock mode. |
| `internal/repositories/firecracker/adapter_noop.go` | Non-Linux noop implementation of the adapter. |
| `internal/repositories/firecracker/adapter_test.go` | Basic Linux-only construction/interface checks. |
| `internal/repositories/firecracker/adapter_noop_test.go` | Non-Linux behavior checks for noop adapter. |
| `internal/platform/config.go` | Adds Firecracker env-backed configuration fields. |
| `internal/api/setup/infrastructure.go` | Wires Firecracker backend selection into dependency setup. |
| `tests/firecracker_e2e_test.go` | "Real" Firecracker backend E2E test scaffold. |
| `tests/firecracker_api_test.go` | API-level lifecycle test (Launch/Get/Terminate) for Firecracker backend. |
| `.github/workflows/ci.yml` | Adds a `test-firecracker` CI job to run Firecracker tests. |
| `go.mod` / `go.sum` | Adds Firecracker SDK dependency (and new transitive deps). |
| `docs/plans/firecracker-backend-implementation.md` | Implementation plan document for the backend. |
```go
return a.LaunchInstanceWithOptions(ctx, ports.CreateInstanceOptions{
	Name:      opts.Name,
	ImageName: opts.Image,
	Env:       opts.Env,
	Cmd:       opts.Command,
	CPULimit:  int64(opts.CPUs),
```
RunTaskOptions.CPUs is a float64, but here it's converted to int64, which truncates fractional CPU requests (e.g., 0.5 -> 0 -> default to 1 vCPU). This changes scheduling semantics compared to other backends and may over-provision. Consider defining an explicit mapping (e.g., ceil to at least 1, or interpret CPUs as millicores) and documenting it consistently across backends.
Suggested change:

```go
// Map fractional CPU requests to an integer CPULimit.
// Preserve existing behavior:
//   - integer values pass through unchanged,
//   - fractional values >= 1 are truncated,
//   - 0 < CPUs < 1 is treated as 1 vCPU instead of 0 (which would mean "use default").
cpuLimit := int64(opts.CPUs)
if opts.CPUs > 0 && cpuLimit == 0 {
	cpuLimit = 1
}
return a.LaunchInstanceWithOptions(ctx, ports.CreateInstanceOptions{
	Name:      opts.Name,
	ImageName: opts.Image,
	Env:       opts.Env,
	Cmd:       opts.Command,
	CPULimit:  cpuLimit,
```
```yaml
          go test -v -run TestFirecrackerAPI_E2E ./tests/...

  build:
    needs: [lint, unit-tests, race-detection, integration-tests, test-libvirt]
```
The new test-firecracker job is not included in the build.needs list. If the repository relies on the build job as the primary required check, this can allow merges even when Firecracker tests fail. Consider adding test-firecracker to build.needs (or otherwise ensuring it is a required check).
Suggested change:

```yaml
    needs: [lint, unit-tests, race-detection, integration-tests, test-libvirt, test-firecracker]
```
```go
	BinaryPath string
	KernelPath string
	RootfsPath string
	SocketDir  string
```
Config in the non-Linux (noop) implementation is missing the MockMode field that exists in the Linux build. This will break compilation on non-Linux platforms because callers (e.g., setup/infrastructure) initialize firecracker.Config{..., MockMode: ...} regardless of OS. Keep the Config shape consistent across build tags (add MockMode bool here, even if unused).
Suggested change:

```go
	SocketDir string
	MockMode  bool
```
```go
		id, _, err := adapter.LaunchInstanceWithOptions(ctx, opts)
		// We expect an error if the kernel/rootfs are missing,
		// but we want to see HOW it fails in CI.
		if err != nil {
			t.Logf("Launch failed as expected (likely missing artifacts): %v", err)
			return
		}
```
This E2E test currently treats a failed LaunchInstanceWithOptions as a successful test run (it logs and returns). That makes CI green even when Firecracker provisioning is broken or misconfigured. Consider gating the test behind an explicit env flag (or artifact existence checks) and either t.Skip when prerequisites are missing, or require.NoError when the test is enabled so regressions are caught.
```go
		if resp.StatusCode != http.StatusAccepted {
			t.Logf("Launch failed with status %d (expected 202)", resp.StatusCode)
			return
		}

		var res struct {
			Data domain.Instance `json:"data"`
		}
		require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
```
In the Launch step, if the API doesn't return 202 you Logf and return, which marks the subtest as passing and then the rest of the test is skipped. This can hide real regressions in the firecracker backend/API contract. Prefer require.Equal(http.StatusAccepted, resp.StatusCode) (or t.Skipf if the intent is to allow backend unavailability) so the outcome is explicit in test results.
Suggested change:

```go
		require.Equal(t, http.StatusAccepted, resp.StatusCode, "Launch failed with unexpected status")

		var res struct {
			Data domain.Instance `json:"data"`
		}
		require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
```
```go
	// Wait for provision worker to finish (especially in mock mode)
	time.Sleep(5 * time.Second)
```
Using a fixed time.Sleep(5 * time.Second) to wait for provisioning is likely to be flaky across CI and different host loads. There is an existing polling pattern in other E2E tests (loop + timeout) that would make this deterministic; consider polling the instance GET until it reaches RUNNING (or ERROR) with a bounded timeout.
Suggested change:

```go
	// Wait for provision worker to finish by polling instance status
	deadline := time.Now().Add(60 * time.Second)
	for {
		if time.Now().After(deadline) {
			t.Fatalf("instance %s did not reach RUNNING or ERROR status within timeout", instanceID)
		}
		statusReq, _ := http.NewRequest("GET", testutil.TestBaseURL+"/instances/"+instanceID, nil)
		statusReq.Header.Set(testutil.TestHeaderAPIKey, token)
		applyTenantHeader(t, statusReq, token)
		statusResp, err := client.Do(statusReq)
		require.NoError(t, err)
		var statusRes struct {
			Data domain.Instance `json:"data"`
		}
		func() {
			defer statusResp.Body.Close()
			require.NoError(t, json.NewDecoder(statusResp.Body).Decode(&statusRes))
		}()
		if statusRes.Data.Status == domain.StatusRunning {
			break
		}
		if statusRes.Data.Status == domain.StatusError {
			t.Fatalf("instance %s entered ERROR status during provisioning", instanceID)
		}
		time.Sleep(500 * time.Millisecond)
	}
```
```go
	_ = os.MkdirAll(cfg.SocketDir, 0755)
```
os.MkdirAll errors are ignored here. If the socket directory can't be created (permissions, read-only FS), later failures will be harder to diagnose and may leak partial state. Consider checking the error and either returning a constructor error (change signature) or at least logging and falling back to a safe directory.
Suggested change:

```go
	if err := os.MkdirAll(cfg.SocketDir, 0o755); err != nil {
		logger.Error("failed to create Firecracker socket directory", "dir", cfg.SocketDir, "err", err)
		// Attempt to fall back to a safe default directory if a custom one fails.
		const fallbackDir = "/tmp/firecracker"
		if cfg.SocketDir != fallbackDir {
			if errFallback := os.MkdirAll(fallbackDir, 0o755); errFallback != nil {
				logger.Error("failed to create fallback Firecracker socket directory", "dir", fallbackDir, "err", errFallback)
			} else {
				logger.Info("using fallback Firecracker socket directory", "dir", fallbackDir)
				cfg.SocketDir = fallbackDir
			}
		}
	}
```
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return "", nil, fmt.Errorf("failed to create machine: %w", err) | ||
| } | ||
|
|
||
| if err := m.Start(ctx); err != nil { |
Potential socket file leak: If m.Start(ctx) fails on line 111, the socket path directory and potentially the socket file itself are not cleaned up. The socket path is created by the Firecracker VMM but if start fails midway, these resources may remain. Consider adding cleanup in the error path or using defer with proper error handling.
Suggested change:

```go
	if err := m.Start(ctx); err != nil {
		// Best-effort cleanup of the Firecracker socket if start fails to avoid
		// leaking leftover socket files on the host filesystem.
		if rmErr := os.Remove(socketPath); rmErr != nil && !os.IsNotExist(rmErr) {
			a.logger.Warn("failed to remove Firecracker socket after start failure", "socket_path", socketPath, "error", rmErr)
		}
```
```go
	if a.cfg.MockMode {
		a.logger.Info("Mock mode enabled, skipping real Firecracker start", "instance_id", id)
		a.mu.Lock()
		a.machines[id] = &firecracker.Machine{} // Minimal mock
		a.mu.Unlock()
		return id, nil, nil
	}

	fcCfg := firecracker.Config{
		SocketPath:      socketPath,
		KernelImagePath: a.cfg.KernelPath,
		Drives: []models.Drive{
			{
				DriveID:      firecracker.String("1"),
				IsRootDevice: firecracker.Bool(true),
				IsReadOnly:   firecracker.Bool(false),
				PathOnHost:   firecracker.String(a.cfg.RootfsPath),
			},
		},
		MachineCfg: models.MachineConfiguration{
			VcpuCount:  firecracker.Int64(vcpus),
			MemSizeMib: firecracker.Int64(mem),
		},
	}

	cmd := firecracker.VMCommandBuilder{}.
		WithBin(a.cfg.BinaryPath).
		WithSocketPath(socketPath).
		Build(ctx)

	m, err := firecracker.NewMachine(ctx, fcCfg, firecracker.WithProcessRunner(cmd))
	if err != nil {
		return "", nil, fmt.Errorf("failed to create machine: %w", err)
	}

	if err := m.Start(ctx); err != nil {
		return "", nil, fmt.Errorf("failed to start machine: %w", err)
	}

	a.mu.Lock()
	a.machines[id] = m
	a.mu.Unlock()
```
Race condition in machine storage: The machines map is updated without holding the lock during the critical section between checking mock mode (line 76) and storing the machine (lines 78-80, 115-117). If multiple goroutines launch instances concurrently, they could potentially create entries with the same ID or experience data races. Consider holding the lock for the entire critical section or using a single lock/unlock pair.
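The single-critical-section fix suggested above can be sketched with stand-in types (the `machine` struct below stands in for `*firecracker.Machine`; `storeMachine` is a hypothetical helper, not a method in the PR):

```go
package main

import (
	"fmt"
	"sync"
)

// machine is a stand-in for *firecracker.Machine.
type machine struct{ id string }

type adapter struct {
	mu       sync.Mutex
	machines map[string]*machine
}

// storeMachine registers a machine under id inside one critical section,
// rejecting duplicate ids instead of silently overwriting an entry.
func (a *adapter) storeMachine(id string, m *machine) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if _, exists := a.machines[id]; exists {
		return fmt.Errorf("instance %s already registered", id)
	}
	a.machines[id] = m
	return nil
}

func main() {
	a := &adapter{machines: make(map[string]*machine)}
	fmt.Println(a.storeMachine("id-1", &machine{id: "id-1"}) == nil) // first store succeeds
	fmt.Println(a.storeMachine("id-1", &machine{id: "id-1"}) == nil) // duplicate is rejected
}
```

Because the existence check and the write happen under the same lock, concurrent launches with the same id cannot both succeed.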
```go
		req, err := http.NewRequest("GET", testutil.TestBaseURL+"/instances/"+instanceID, nil)
		require.NoError(t, err, "failed to create request")

		req.Header.Set(testutil.TestHeaderAPIKey, token)
		applyTenantHeader(t, req, token)

		resp, err := client.Do(req)
		require.NoError(t, err)
		defer resp.Body.Close()

		assert.Equal(t, http.StatusOK, resp.StatusCode)

		var res struct {
			Data domain.Instance `json:"data"`
		}
		require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
		// In mock mode, the status should transition to RUNNING quickly
		assert.Equal(t, domain.StatusRunning, res.Data.Status)
```
The test expects domain.StatusRunning on line 93, but there's no guarantee the status will transition to RUNNING after only a 5-second sleep (line 68). In mock mode, the status transition depends on the mock implementation. This assertion may be flaky. Consider polling with a timeout or verifying the actual behavior of mock mode status transitions.
Suggested change:

```go
		deadline := time.Now().Add(30 * time.Second)
		var lastStatus domain.Status
		for {
			req, err := http.NewRequest("GET", testutil.TestBaseURL+"/instances/"+instanceID, nil)
			require.NoError(t, err, "failed to create request")
			req.Header.Set(testutil.TestHeaderAPIKey, token)
			applyTenantHeader(t, req, token)
			resp, err := client.Do(req)
			require.NoError(t, err)
			func() {
				defer resp.Body.Close()
				assert.Equal(t, http.StatusOK, resp.StatusCode)
				var res struct {
					Data domain.Instance `json:"data"`
				}
				require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
				lastStatus = res.Data.Status
			}()
			if lastStatus == domain.StatusRunning {
				break
			}
			if time.Now().After(deadline) {
				t.Fatalf("instance did not reach RUNNING status within timeout, last status: %v", lastStatus)
			}
			time.Sleep(500 * time.Millisecond)
		}
```
```go
	if a.logger != nil {
		a.logger.Warn("Ping called on no-op firecracker adapter")
	}
	return nil
```
The Ping() method for the noop adapter returns nil on line 101, but according to the error pattern in other methods (lines 34-50), it should return an error indicating the platform is not supported. This inconsistency could cause confusion when checking backend availability.
Suggested change:

```go
	return fmt.Errorf("firecracker not supported on this platform")
```
```go
func NewFirecrackerAdapter(logger *slog.Logger, cfg Config) (*FirecrackerAdapter, error) {
	logger.Warn("Firecracker is only supported on Linux. This adapter will not function.")
	return &FirecrackerAdapter{logger: logger}, nil
}
```
The return signature for NewFirecrackerAdapter in the noop build (line 28) is (*FirecrackerAdapter, error) but only returns the adapter without an error. This is inconsistent with the Linux version (line 47 in adapter.go) where it can return an error. Consider returning just *FirecrackerAdapter for the noop version since it never errors, or maintain consistency by returning (adapter, nil).
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @.github/workflows/ci.yml:
- Around lines 310-331: The "Run Firecracker API Integration" step lacks an explicit REDIS_URL environment variable, which can cause API init failures. Set REDIS_URL (for example redis://localhost:6379) in the step's env block and/or inline before launching the API with `go run ./cmd/api`, so the process uses the declared Redis service rather than relying on a default fallback.
In `@internal/api/setup/infrastructure.go`:
- Around lines 117-119: The InitLBProxy path currently falls back to docker.NewLBProxyAdapter even when COMPUTE_BACKEND is "firecracker", which can silently misroute traffic. Update InitLBProxy to detect the Firecracker backend and, instead of returning docker.NewLBProxyAdapter(instanceRepo, vpcRepo), return either an explicit error or a dedicated noop/unsupported LB proxy adapter, so callers know Firecracker LB routing is not implemented.
🧹 Nitpick comments (4)
internal/repositories/firecracker/adapter_test.go (1)

18-33: Test leaves `/tmp/firecracker-test` directory behind after the run. `NewFirecrackerAdapter` creates `cfg.SocketDir` on disk via `os.MkdirAll`. Consider using `t.TempDir()` (auto-cleaned by the test framework) instead of a hardcoded path, which also avoids conflicts with parallel test runs.

Proposed fix

```diff
 func TestNewFirecrackerAdapter(t *testing.T) {
 	logger := slog.Default()
 	cfg := Config{
 		BinaryPath: "/usr/local/bin/firecracker",
 		KernelPath: "/var/lib/thecloud/vmlinux",
 		RootfsPath: "/var/lib/thecloud/rootfs.ext4",
-		SocketDir:  "/tmp/firecracker-test",
+		SocketDir:  t.TempDir(),
 	}
```

.github/workflows/ci.yml (2)
293-301: Unpinned Firecracker version makes CI non-reproducible. Fetching the `latest` release means a new Firecracker release with breaking changes (e.g., a different archive layout) could silently break CI. Consider pinning to a specific version, similar to how other tools in this workflow are pinned (e.g., `golangci-lint v1.64.8`, `swag v1.16.6`).

Proposed fix

```diff
       - name: Install Firecracker
         run: |
-          FC_VERSION=$(curl -s https://api.github.com/repos/firecracker-microvm/firecracker/releases/latest | grep "tag_name" | cut -d : -f 2,3 | tr -d \" | tr -d " ," | xargs)
+          FC_VERSION="v1.10.1" # Pin for reproducibility; bump explicitly
           echo "Installing Firecracker $FC_VERSION"
```
326-331: Background API server may not be ready when tests start, and is never cleaned up. The test code does call `waitForServer()`, which mitigates the race, but the CI step doesn't explicitly wait or verify server startup before invoking `go test`. Also, the background `go run` process is never explicitly killed; it relies on the job ending. Consider adding an explicit wait and cleanup:

Proposed fix

```diff
           # Start API in background with mock mode
           FIRECRACKER_MOCK_MODE=true COMPUTE_BACKEND=firecracker go run ./cmd/api &
+          API_PID=$!
+          sleep 5  # Give server time to start
 
           # Run API E2E tests for firecracker
           go test -v -run TestFirecrackerAPI_E2E ./tests/...
+
+          kill $API_PID 2>/dev/null || true
```

tests/firecracker_e2e_test.go (1)
46-65: Magic number for memory limit, and no cleanup if assertion fails.

Two minor points:

- `128 * 1024 * 1024` is a magic number. Per coding guidelines, consider a named constant (e.g., `const testMemoryLimit = 128 * 1024 * 1024`).
- If `require.NotEmpty(t, id)` on line 61 fails, `DeleteInstance` on line 63 is never reached, potentially leaking the VM in non-mock mode. Consider using `t.Cleanup` to ensure teardown.

Proposed fix

```diff
 	t.Run("Launch and Delete", func(t *testing.T) {
 		id, _, err := adapter.LaunchInstanceWithOptions(ctx, opts)
 		if err != nil {
 			t.Skipf("Launch failed, skipping test (likely missing artifacts or KVM access): %v", err)
 		}
 		require.NotEmpty(t, id)
-
-		err = adapter.DeleteInstance(ctx, id)
-		assert.NoError(t, err)
+		t.Cleanup(func() {
+			assert.NoError(t, adapter.DeleteInstance(ctx, id))
+		})
 	})
```
```yaml
      - name: Run Firecracker API Integration
        env:
          DATABASE_URL: postgres://cloud:cloud@localhost:5433/cloud?sslmode=disable
          COMPUTE_BACKEND: firecracker
          FIRECRACKER_BINARY: /usr/local/bin/firecracker
          FIRECRACKER_KERNEL: /tmp/vmlinux
          FIRECRACKER_ROOTFS: /tmp/rootfs.ext4
          FIRECRACKER_MOCK_MODE: "true"
        run: |
          # Create dummy artifacts for Go logic path testing
          touch /tmp/vmlinux /tmp/rootfs.ext4

          # Ensure clean DB
          PGPASSWORD=cloud psql -h localhost -p 5433 -U cloud -d postgres -c "DROP DATABASE IF EXISTS cloud;"
          PGPASSWORD=cloud psql -h localhost -p 5433 -U cloud -d postgres -c "CREATE DATABASE cloud;"
          go run ./cmd/api -migrate-only

          # Start API in background with mock mode
          FIRECRACKER_MOCK_MODE=true COMPUTE_BACKEND=firecracker go run ./cmd/api &

          # Run API E2E tests for firecracker
          go test -v -run TestFirecrackerAPI_E2E ./tests/...
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, find and examine the CI workflow file
find . -name "ci.yml" -o -name "ci.yaml" | head -5
```

Repository: poyrazK/thecloud
Length of output: 86
🏁 Script executed:
```shell
# Look for the workflow file in .github/workflows
cat -n .github/workflows/ci.yml | sed -n '270,340p'
```

Repository: poyrazK/thecloud
Length of output: 3465
🏁 Script executed:
```shell
# Search for REDIS_URL usage in Go code
rg -n 'REDIS_URL|RedisURL|redis_url' --type=go -C 2
```

Repository: poyrazK/thecloud
Length of output: 2784
🏁 Script executed:
```shell
# Also check for Redis initialization or config in the API
rg -n 'redis|Redis' --type=go ./cmd/api -C 2 | head -100
```

Repository: poyrazK/thecloud
Length of output: 5440
Add REDIS_URL environment variable to the API integration step.
Redis is declared as a service (port 6379) but the "Run Firecracker API Integration" step doesn't explicitly set REDIS_URL. While the API has a default fallback to localhost:6379, the step should explicitly configure Redis when a service is declared in the workflow. The API initialization will fail if Redis is unavailable.
Proposed fix
```diff
       - name: Run Firecracker API Integration
         env:
           DATABASE_URL: postgres://cloud:cloud@localhost:5433/cloud?sslmode=disable
           COMPUTE_BACKEND: firecracker
           FIRECRACKER_BINARY: /usr/local/bin/firecracker
           FIRECRACKER_KERNEL: /tmp/vmlinux
           FIRECRACKER_ROOTFS: /tmp/rootfs.ext4
           FIRECRACKER_MOCK_MODE: "true"
+          REDIS_URL: redis://localhost:6379
```

🤖 Prompt for AI Agents
In @.github/workflows/ci.yml around lines 310 - 331, The "Run Firecracker API
Integration" step lacks an explicit REDIS_URL environment variable which can
cause API init failures; update that CI step (the job named "Run Firecracker API
Integration" where you run "go run ./cmd/api" and start the API with
FIRECRACKER_MOCK_MODE) to set REDIS_URL (for example to redis://localhost:6379
or redis://127.0.0.1:6379) either in the env block and/or inline before
launching the API so the process uses the declared Redis service rather than
relying on a default fallback.
```go
	// Firecracker currently uses basic SLIRP or TAP which might not work with host-level nginx yet,
	// but we fallback to docker-style proxy logic for now or could implement a dedicated one.
	return docker.NewLBProxyAdapter(instanceRepo, vpcRepo)
```
Firecracker LB proxy fallback may silently produce wrong behavior at runtime.
The comment documents that Firecracker falls through to the Docker-based LB proxy. If a user configures COMPUTE_BACKEND=firecracker, InitLBProxy will return a Docker proxy adapter that likely cannot route to Firecracker MicroVMs. Consider returning an explicit error or a noop proxy for Firecracker until proper support is implemented, to avoid silent misrouting.
🤖 Prompt for AI Agents
In `@internal/api/setup/infrastructure.go` around lines 117 - 119, The InitLBProxy
path currently falls back to docker.NewLBProxyAdapter even when COMPUTE_BACKEND
is "firecracker", which can silently misroute traffic; update InitLBProxy to
detect the Firecracker backend (e.g., check the compute backend config or the
branch that currently returns docker.NewLBProxyAdapter) and instead return an
explicit error or a dedicated noop/unsupported LB proxy adapter for Firecracker
(do not return docker.NewLBProxyAdapter for Firecracker). Locate the return that
calls docker.NewLBProxyAdapter(instanceRepo, vpcRepo) and replace it with a
branch that either returns an error indicating unsupported LB proxy for
Firecracker or constructs/returns a NoopLBProxyAdapter/UnsupportedLBProxy so
callers know Firecracker LB routing is not implemented.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.
```go
	machines map[string]*firecracker.Machine
	mu       sync.RWMutex
}

// NewFirecrackerAdapter creates a new FirecrackerAdapter.
func NewFirecrackerAdapter(logger *slog.Logger, cfg Config) (*FirecrackerAdapter, error) {
	if cfg.SocketDir == "" {
		cfg.SocketDir = defaultSocketDir
	}
	if err := os.MkdirAll(cfg.SocketDir, 0755); err != nil {
		return nil, fmt.Errorf("failed to create socket directory %s: %w", cfg.SocketDir, err)
	}

	return &FirecrackerAdapter{
		cfg:      cfg,
		logger:   logger,
		machines: make(map[string]*firecracker.Machine),
```
The machines map could grow indefinitely if DeleteInstance is never called or if there are failures during instance launch. Consider implementing a background cleanup mechanism or a maximum map size limit to prevent memory leaks in long-running deployments.
```go
	delete(a.machines, id)
	a.mu.Unlock()

	if !a.cfg.MockMode {
		if err := m.StopVMM(); err != nil {
			a.logger.Warn("failed to stop VMM during deletion", "instance_id", id, "error", err)
		}
	}
```
The machine instance is deleted from the map before attempting to stop the VMM. If StopVMM fails and needs to be retried, the machine reference is already lost, making cleanup impossible. Consider moving the delete operation after successful VMM cleanup, or implementing a more robust cleanup pattern with retry logic.
```diff
-	delete(a.machines, id)
-	a.mu.Unlock()
-
-	if !a.cfg.MockMode {
-		if err := m.StopVMM(); err != nil {
-			a.logger.Warn("failed to stop VMM during deletion", "instance_id", id, "error", err)
-		}
-	}
+	a.mu.Unlock()
+
+	if !a.cfg.MockMode {
+		if err := m.StopVMM(); err != nil {
+			a.logger.Warn("failed to stop VMM during deletion", "instance_id", id, "error", err)
+			return fmt.Errorf("failed to stop VMM for instance %s: %w", id, err)
+		}
+	}
+
+	a.mu.Lock()
+	delete(a.machines, id)
+	a.mu.Unlock()
```
```go
	socketPath := filepath.Join(a.cfg.SocketDir, id+".socket")
	if err := os.Remove(socketPath); err != nil && !os.IsNotExist(err) {
		return fmt.Errorf("failed to remove socket file %s: %w", socketPath, err)
	}

	return nil
```
The socket file cleanup happens after the machine is removed from the map and StopVMM is called with only a warning on failure. If the socket file removal fails, the error is returned, but the machine state is already inconsistent (removed from map but VMM potentially still running). Consider handling this scenario more carefully or documenting the expected behavior when socket cleanup fails.
```go
		Drives: []models.Drive{
			{
				DriveID:      firecracker.String("1"),
				IsRootDevice: firecracker.Bool(true),
				IsReadOnly:   firecracker.Bool(false),
				PathOnHost:   firecracker.String(a.cfg.RootfsPath),
			},
```
The RootFS drive is configured with IsReadOnly: false, which allows the VM to modify the root filesystem. This could lead to state pollution across VM instances if the same RootFS image is reused. Consider making this configurable or using copy-on-write snapshots to ensure isolation between instances.
Pull request overview
Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.
```go
func (a *FirecrackerAdapter) RunTask(ctx context.Context, opts ports.RunTaskOptions) (string, []string, error) {
	return a.LaunchInstanceWithOptions(ctx, ports.CreateInstanceOptions{
		Name:        opts.Name,
		ImageName:   opts.Image,
		Env:         opts.Env,
		Cmd:         opts.Command,
		CPULimit:    int64(opts.CPUs),
		MemoryLimit: opts.MemoryMB * 1024 * 1024,
	})
```
RunTask converts opts.CPUs (float64) to int64 via truncation. This silently drops fractional CPU requests (e.g., 2.5 -> 2) and can underprovision tasks. Consider defining a clear mapping (e.g., ceil to at least 1 vCPU, or return a validation error when CPUs is non-integer) and document the behavior.
```go
	// Wait for provision worker to finish (especially in mock mode)
	time.Sleep(5 * time.Second)
})

if instanceID == "" {
	t.Skip("Skipping subsequent steps because instance launch failed or was skipped")
}

t.Run("Get Firecracker Instance", func(t *testing.T) {
	req, err := http.NewRequest("GET", testutil.TestBaseURL+"/instances/"+instanceID, nil)
	require.NoError(t, err, "failed to create request")

	req.Header.Set(testutil.TestHeaderAPIKey, token)
	applyTenantHeader(t, req, token)

	resp, err := client.Do(req)
	require.NoError(t, err)
	defer resp.Body.Close()

	assert.Equal(t, http.StatusOK, resp.StatusCode)

	var res struct {
		Data domain.Instance `json:"data"`
	}
	require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
	// In mock mode, the status should transition to RUNNING quickly
	assert.Equal(t, domain.StatusRunning, res.Data.Status)
})
```
This test uses a fixed sleep (5s) and then asserts the instance is RUNNING. This is flaky in CI because provisioning is async and timing varies; it can fail even if the system is correct. Prefer polling the GET /instances/{id} endpoint until StatusRunning (or until a timeout), similar to TestComputeE2E's WaitForRunning loop.
```go
	// We expect an error if the kernel/rootfs are missing,
	// but we want to see HOW it fails in CI.
	if err != nil {
		t.Skipf("Launch failed, skipping test (likely missing artifacts or KVM access): %v", err)
	}

	require.NotEmpty(t, id)
```
This test skips on LaunchInstanceWithOptions error unconditionally. In FIRECRACKER_MOCK_MODE=true (the CI path), a launch error is a real regression and should fail the test instead of being skipped. Consider only skipping when running in real mode and the environment is missing KVM/artifacts, and using require.NoError when MockMode is enabled.
```diff
-	// We expect an error if the kernel/rootfs are missing,
-	// but we want to see HOW it fails in CI.
-	if err != nil {
-		t.Skipf("Launch failed, skipping test (likely missing artifacts or KVM access): %v", err)
-	}
-
-	require.NotEmpty(t, id)
+	// In real mode, we may expect an error if the kernel/rootfs are missing or KVM is unavailable,
+	// in which case we skip. In mock mode (used in CI), any launch error is a real regression.
+	if err != nil {
+		if cfg.MockMode {
+			require.NoError(t, err, "expected no launch error in mock mode")
+		}
+		t.Skipf("Launch failed, skipping test (likely missing artifacts or KVM access): %v", err)
+	}
+
+	require.NotEmpty(t, id)
```
```go
t.Run("firecracker", func(t *testing.T) {
	cfg := &platform.Config{
		ComputeBackend:      "firecracker",
		FirecrackerMockMode: true,
	}
	backend, err := InitComputeBackend(cfg, logger)
	assert.NoError(t, err)
	// type depends on platform if not in mock mode, but with mock mode it should be firecracker-mock
	assert.Contains(t, backend.Type(), "firecracker")
})
```
The assertion here is too weak to validate Firecracker init behavior: assert.Contains(..., "firecracker") will also pass on non-Linux ("firecracker-noop") and doesn't verify that MockMode actually results in the mock adapter. Consider asserting firecracker-mock on Linux when FirecrackerMockMode is true, and firecracker-noop on non-Linux (e.g., via runtime.GOOS) to make the test meaningful.
```go
// Only run this if we are in an environment that supports firecracker
// or explicitly enabled for CI.
backend := os.Getenv("COMPUTE_BACKEND")
if backend != "firecracker" && backend != "firecracker-mock" {
	t.Skipf("Skipping firecracker API test: COMPUTE_BACKEND (%s) is not firecracker or firecracker-mock", backend)
```
The skip gate allows COMPUTE_BACKEND=firecracker-mock, but the runtime configuration (InitComputeBackend) only recognizes COMPUTE_BACKEND=firecracker and uses FIRECRACKER_MOCK_MODE to enable mock behavior. Allowing "firecracker-mock" here is misleading and can cause confusion when the app falls back to Docker. Prefer gating on COMPUTE_BACKEND=="firecracker" and/or FIRECRACKER_MOCK_MODE instead.
```diff
-// Only run this if we are in an environment that supports firecracker
-// or explicitly enabled for CI.
-backend := os.Getenv("COMPUTE_BACKEND")
-if backend != "firecracker" && backend != "firecracker-mock" {
-	t.Skipf("Skipping firecracker API test: COMPUTE_BACKEND (%s) is not firecracker or firecracker-mock", backend)
+// Only run this if we are in an environment that supports firecracker
+// (COMPUTE_BACKEND=firecracker) or when FIRECRACKER_MOCK_MODE is enabled.
+backend := os.Getenv("COMPUTE_BACKEND")
+mockMode := os.Getenv("FIRECRACKER_MOCK_MODE")
+if backend != "firecracker" && mockMode == "" {
+	t.Skipf("Skipping firecracker API test: COMPUTE_BACKEND (%s) is not 'firecracker' and FIRECRACKER_MOCK_MODE is not set", backend)
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@internal/core/services/instance_internal_test.go`:
- Around line 154-178: Add test housekeeping to
TestInstanceService_UpdateInstanceMetadata: call t.Parallel() at the top of the
test and call repo.AssertExpectations(t) at the end to verify the mock
expectations set on mockInstanceRepo (the GetByID and Update expectations).
Ensure these calls are placed in TestInstanceService_UpdateInstanceMetadata so
they mirror other tests in the file.
🧹 Nitpick comments (5)
internal/api/setup/infrastructure_test.go (2)
40-50: Skipping on any error masks non-dependency failures.

If libvirt is installed but `InitComputeBackend` fails for another reason (e.g., invalid URI handling, code regression), this test silently skips rather than failing. Consider checking for a more specific error or environment signal to decide when to skip.

Suggested approach

```diff
 	t.Run("libvirt", func(t *testing.T) {
+		if os.Getenv("LIBVIRT_AVAILABLE") == "" {
+			t.Skip("skipping libvirt test: LIBVIRT_AVAILABLE not set")
+		}
 		cfg := &platform.Config{
 			ComputeBackend: "libvirt",
 			LibvirtURI:     "test:///default",
 		}
 		backend, err := InitComputeBackend(cfg, logger)
-		if err != nil {
-			t.Skipf("skipping libvirt test as Init failed (likely missing libvirt-dev): %v", err)
-		}
+		assert.NoError(t, err)
 		assert.Equal(t, "libvirt", backend.Type())
 	})
```
91-106: Passing `nil` for all dependencies is fragile and risks nil-pointer panics.

Both test cases pass `nil` for `computeBackend`, `instanceRepo`, and `vpcRepo`. This works only because the current constructors happen to store these without dereferencing. If any constructor adds validation or eagerly uses a dependency, these tests will panic rather than fail gracefully.

Consider providing minimal mock implementations instead. As per coding guidelines, "Use `testify/mock` for creating mock objects in tests".

Example using simple mocks

```diff
+import (
+	"github.com/poyrazk/thecloud/internal/core/ports/mocks"
+)
 func TestInitLBProxy(t *testing.T) {
-	cfg := &platform.Config{}
 	t.Run("default", func(t *testing.T) {
+		cfg := &platform.Config{}
-		proxy, err := InitLBProxy(cfg, nil, nil, nil)
+		proxy, err := InitLBProxy(cfg, &mocks.ComputeBackend{}, &mocks.InstanceRepository{}, &mocks.VpcRepository{})
 		assert.NoError(t, err)
 		assert.NotNil(t, proxy)
 	})
 	t.Run("libvirt", func(t *testing.T) {
 		cfgLibvirt := &platform.Config{ComputeBackend: "libvirt"}
-		proxy, err := InitLBProxy(cfgLibvirt, nil, nil, nil)
+		proxy, err := InitLBProxy(cfgLibvirt, &mocks.ComputeBackend{}, &mocks.InstanceRepository{}, &mocks.VpcRepository{})
 		assert.NoError(t, err)
 		assert.NotNil(t, proxy)
 	})
 }
```

internal/core/services/instance_internal_test.go (3)
90-106: Add edge-case coverage for zero denominators in stats calculation.

This test only covers the happy path. If the `SystemCPUUsage` delta or `MemoryStats.Limit` is zero, the underlying function will likely divide by zero. Adding at least a zero-delta test case would confirm the function handles it gracefully (returns 0.0, NaN, etc.) rather than panicking.

Also, consider using a table-driven pattern here to make it easy to add these cases. As per coding guidelines, "Use table-driven tests in test files".
115-134: Consider converting to a table-driven test.

The four sub-tests follow a repetitive pattern that maps naturally to a `[]struct{ name string; ip string; want bool }` table. As per coding guidelines, "Use table-driven tests in test files".
136-152: Consider adding boundary values and converting to table-driven.

Port parsing has interesting boundaries (`"0"`, `"65535"`, `"65536"`, `"-1"`). A table-driven structure would make it trivial to add these. As per coding guidelines, "Use table-driven tests in test files".
```go
func TestInstanceService_UpdateInstanceMetadata(t *testing.T) {
	repo := new(mockInstanceRepo)
	svc := &InstanceService{repo: repo}
	ctx := context.Background()
	id := uuid.New()
	inst := &domain.Instance{
		ID:       id,
		Metadata: map[string]string{"old": "val"},
		Labels:   map[string]string{"l1": "v1"},
	}

	repo.On("GetByID", ctx, id).Return(inst, nil).Once()
	repo.On("Update", ctx, inst).Return(nil).Once()

	metadata := map[string]string{"new": "val", "old": ""} // "old" should be deleted
	labels := map[string]string{"l2": "v2"}

	err := svc.UpdateInstanceMetadata(ctx, id, metadata, labels)
	assert.NoError(t, err)
	assert.Equal(t, "val", inst.Metadata["new"])
	assert.Equal(t, "v2", inst.Labels["l2"])
	assert.Equal(t, "v1", inst.Labels["l1"])
	_, ok := inst.Metadata["old"]
	assert.False(t, ok)
}
```
Missing repo.AssertExpectations(t) and t.Parallel().
Other tests in this file (e.g., line 87) call `repo.AssertExpectations(t)` to verify all expected mock calls were made. This test sets up `GetByID` and `Update` expectations but never asserts they were fulfilled. Also, the existing tests in this file use `t.Parallel()` — this one should follow the same pattern for consistency.
Proposed fix
```diff
 func TestInstanceService_UpdateInstanceMetadata(t *testing.T) {
+	t.Parallel()
 	repo := new(mockInstanceRepo)
 	svc := &InstanceService{repo: repo}
 	ctx := context.Background()
@@ -176,4 +177,5 @@
 	_, ok := inst.Metadata["old"]
 	assert.False(t, ok)
+	repo.AssertExpectations(t)
 }
```

🤖 Prompt for AI Agents
In `@internal/core/services/instance_internal_test.go` around lines 154 - 178, Add
test housekeeping to TestInstanceService_UpdateInstanceMetadata: call
t.Parallel() at the top of the test and call repo.AssertExpectations(t) at the
end to verify the mock expectations set on mockInstanceRepo (the GetByID and
Update expectations). Ensure these calls are placed in
TestInstanceService_UpdateInstanceMetadata so they mirror other tests in the
file.
This PR implements the Firecracker MicroVM compute backend for high-density virtualization.
Changes:
Verification:
Summary by CodeRabbit
New Features
Documentation
Tests
Chores / CI