feat: implement Firecracker compute backend with E2E tests (#57)
Conversation
📝 Walkthrough

Adds a Firecracker compute backend: a Linux adapter and a non-Linux noop, config fields and wiring, unit/E2E/API tests, a CI job to run Firecracker tests, a go.mod dependency on firecracker-go-sdk, and a design doc describing the implementation plan.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant API as API Server
    participant Adapter as Firecracker Adapter
    participant VM as Firecracker VM
    participant FS as File System
    Client->>API: POST /instances (name,image,type)
    API->>Adapter: LaunchInstanceWithOptions(opts)
    Adapter->>Adapter: Build VM config & update state
    alt Mock Mode
        Adapter->>Adapter: Mark RUNNING without boot
    else Real Firecracker (Linux)
        Adapter->>FS: Create sockets & resources
        Adapter->>VM: Start Firecracker process (binary)
        VM->>VM: Boot kernel & mount rootfs
    end
    Adapter-->>API: Return instance ID (202)
    Client->>API: GET /instances/{id}
    API->>Adapter: Query status
    Adapter-->>API: Return status (RUNNING)
    Client->>API: DELETE /instances/{id}
    API->>Adapter: DeleteInstance(id)
    alt Mock Mode
        Adapter->>Adapter: Remove from state
    else Real Firecracker (Linux)
        Adapter->>VM: Terminate VM
        Adapter->>FS: Cleanup sockets/files
    end
    Adapter-->>API: Success (200)
```
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 11
🤖 Fix all issues with AI agents
In @.github/workflows/ci.yml:
- Around lines 304-307: The CI step that runs TestFirecrackerBackend_E2E should set the FIRECRACKER_MOCK_MODE env var so the E2E test uses mocked artifacts instead of relying on /var/lib/thecloud paths. Add `FIRECRACKER_MOCK_MODE: "true"` to the env block of the "Run Firecracker E2E Test" step, and verify that TestFirecrackerBackend_E2E actually reads that variable. Alternatively, if the current behavior is intentional, add a clear comment on the step stating it is only a compile/startup smoke check and may skip actual launches.
- Around lines 331-332: The build job's needs list omits the test-firecracker job, so Firecracker failures do not block deploy-staging/deploy-production. Add "test-firecracker" to the build job's needs array, or, if the omission is intentional, add an inline comment next to the build job explaining why test-firecracker is not required.
In `@internal/repositories/firecracker/adapter_noop.go`:
- Around lines 96-98: The noop Ping implementation on FirecrackerAdapter returns an error, which will fail startup health checks. Change Ping(ctx context.Context) to behave like DeleteInstance/DeleteNetwork on non-Linux platforms: log a warning via the adapter's logger (if available) and return nil instead of fmt.Errorf.
- Around lines 14-20: The noop Config struct in internal/repositories/firecracker/adapter_noop.go is missing the MockMode field present in the Linux Config (adapter.go), which breaks the build where infrastructure.go sets MockMode: cfg.FirecrackerMockMode. Add a MockMode bool field to the noop Config so its shape matches the Linux Config, then rebuild and verify compilation on non-Linux targets.
In `@internal/repositories/firecracker/adapter.go`:
- Around lines 193-210: WaitTask on FirecrackerAdapter swallows the error from m.Wait and returns a fabricated exit code. When m.Wait(ctx) returns an error, propagate it to the caller instead of returning (1, nil): return 1 together with the actual err (or map to a more specific exit code if one can be extracted from m.Wait), leaving the success path as return 0, nil.
- Around lines 37-48: NewFirecrackerAdapter ignores the error from os.MkdirAll and hardcodes "/tmp/firecracker". Use a named constant (e.g., defaultSocketDir) for the default cfg.SocketDir, check the error returned by os.MkdirAll, and fail fast (return an error or nil adapter, or at least log) when directory creation fails, so subsequent operations on FirecrackerAdapter.machines don't hit confusing errors.
- Around lines 140-156: DeleteInstance accepts a user-controllable id, builds a file path from it, and swallows errors. Validate the id at the start of DeleteInstance (e.g., accept only UUIDs, or restrict to a safe charset and reject anything containing ".." or path separators) before looking up a.machines[id]; build the path safely with filepath.Join(a.cfg.SocketDir, id+".socket"); and check the errors returned by m.StopVMM() and os.Remove(...) instead of discarding them, returning a clear error when the id is invalid or when either call fails.
In `@tests/firecracker_api_test.go`:
- Around lines 51-55: The subtest currently does a bare return when resp.StatusCode != http.StatusAccepted, which lets the test silently pass. Replace the silent return with an explicit failure (e.g., t.Fatalf) that includes the actual status code in the failure message.
- Around lines 41-42: The errors from json.Marshal and http.NewRequest are ignored. Capture them and assert success with require.NoError so failures aren't silently dropped: replace `body, _ := json.Marshal(payload)` with `body, err := json.Marshal(payload); require.NoError(t, err, "json.Marshal failed")`, and likewise for http.NewRequest. Apply the same fix to the similar calls around lines 73 and 92.
In `@tests/firecracker_e2e_test.go`:
- Around lines 47-50: The subtest treats a LaunchInstanceWithOptions error by logging and returning, which makes the test incorrectly pass. Replace the bare return with t.Skipf (e.g., `t.Skipf("Launch failed, skipping test (likely missing artifacts): %v", err)`) so the test is marked skipped rather than passed when LaunchInstanceWithOptions returns an error.
- Around lines 22-26: The test constructs firecracker.Config with hardcoded BinaryPath/KernelPath/RootfsPath, ignoring CI-provided env vars. Read FIRECRACKER_BINARY, FIRECRACKER_KERNEL, and FIRECRACKER_ROOTFS (via os.Getenv or a small helper that returns the env value or falls back to the existing hardcoded string) when setting Config.BinaryPath, .KernelPath, and .RootfsPath, so the CI-installed binary and artifacts are used when present.
🧹 Nitpick comments (7)
internal/repositories/firecracker/adapter.go (1)

54-62: Magic numbers for default vCPU count and memory. Use named constants for the defaults 1 and 512, and for the byte-to-MB divisor.

♻️ Proposed refactor

```diff
+const (
+	defaultVCPUs    = int64(1)
+	defaultMemoryMB = int64(512)
+	bytesPerMB      = int64(1024 * 1024)
+)
+
 func (a *FirecrackerAdapter) LaunchInstanceWithOptions(ctx context.Context, opts ports.CreateInstanceOptions) (string, []string, error) {
 	id := uuid.New().String()
 	socketPath := fmt.Sprintf("%s/%s.socket", a.cfg.SocketDir, id)
-	vcpus := int64(1)
+	vcpus := defaultVCPUs
 	if opts.CPULimit > 0 {
 		vcpus = opts.CPULimit
 	}
-	mem := int64(512)
+	mem := defaultMemoryMB
 	if opts.MemoryLimit > 0 {
-		mem = opts.MemoryLimit / 1024 / 1024 // Convert to MB
+		mem = opts.MemoryLimit / bytesPerMB
 	}
```

As per coding guidelines, "Do not use magic numbers - use named constants instead".
internal/repositories/firecracker/adapter_test.go (1)

17-31: Consider expanding test coverage with table-driven tests. The constructor test only covers the happy path with all fields set. Consider a table-driven test covering edge cases (e.g., empty `SocketDir` to verify the default is applied, `MockMode: true` to verify the type returns `"firecracker-mock"`).

As per coding guidelines, "Use table-driven tests in test files".
internal/api/setup/infrastructure.go (1)

113-119: `InitLBProxy` silently falls through to the Docker proxy for the Firecracker backend. While the comment on lines 117-118 acknowledges this, there is no runtime log when a Firecracker-backed deployment hits this fallback. Consider adding an `Info` or `Warn` log so operators know the LB proxy behavior isn't Firecracker-native.

.github/workflows/ci.yml (2)
293-301: Fragile Firecracker version detection and brittle tarball extraction path. Line 295 chains `curl | grep | cut | tr | xargs` to extract the latest tag; this is fragile and will fail silently if the GitHub API response changes or if rate-limiting returns an error page. Line 299 assumes a specific directory name inside the tarball (`release-${FC_VERSION}-$(uname -m)/`), which could change between releases.

Consider using `jq` for reliable JSON parsing (it's available on `ubuntu-latest`) and adding a guard to abort early if the version is empty:

Proposed fix

```diff
       - name: Install Firecracker
         run: |
-          FC_VERSION=$(curl -s https://api.github.com/repos/firecracker-microvm/firecracker/releases/latest | grep "tag_name" | cut -d : -f 2,3 | tr -d \" | tr -d " ," | xargs)
+          FC_VERSION=$(curl -sf https://api.github.com/repos/firecracker-microvm/firecracker/releases/latest | jq -r '.tag_name')
+          if [ -z "$FC_VERSION" ] || [ "$FC_VERSION" = "null" ]; then
+            echo "::error::Failed to determine Firecracker version"
+            exit 1
+          fi
           echo "Installing Firecracker $FC_VERSION"
```
261-286: Service definitions are duplicated instead of reusing the YAML anchor. The Postgres service block is a near-exact copy of `&postgres-service` defined at line 40. Consider using the existing anchor for DRY-ness. Redis is new, so it can stay inline.

tests/firecracker_e2e_test.go (1)
35-41: Magic number for memory limit. Per coding guidelines, magic numbers should be avoided. `128 * 1024 * 1024` could use a named constant for clarity, though the expression is reasonably self-documenting in a test context.

As per coding guidelines, "Do not use magic numbers - use named constants instead".
tests/firecracker_api_test.go (1)
64-65: Hardcoded `time.Sleep` for the provisioning wait is fragile. A 5-second sleep may be too long (wastes CI time) or too short (flaky on slow runners). A polling loop with a timeout on the GET endpoint checking for `RUNNING` status would be more reliable and self-documenting.

Proposed approach

```diff
-	// Wait for provision worker to finish (especially in mock mode)
-	time.Sleep(5 * time.Second)
+	// Poll until instance reaches RUNNING status (or timeout)
+	deadline := time.Now().Add(15 * time.Second)
+	for time.Now().Before(deadline) {
+		getReq, err := http.NewRequest("GET", testutil.TestBaseURL+"/instances/"+instanceID, nil)
+		require.NoError(t, err)
+		getReq.Header.Set(testutil.TestHeaderAPIKey, token)
+		applyTenantHeader(t, getReq, token)
+		getResp, err := client.Do(getReq)
+		require.NoError(t, err)
+		var status struct {
+			Data domain.Instance `json:"data"`
+		}
+		json.NewDecoder(getResp.Body).Decode(&status)
+		getResp.Body.Close()
+		if status.Data.Status == domain.StatusRunning {
+			break
+		}
+		time.Sleep(500 * time.Millisecond)
+	}
```
Pull request overview
Adds a new Firecracker-based ComputeBackend implementation (with a non-Linux noop variant), wires it into API setup/config, and introduces Firecracker-focused E2E/API tests plus a dedicated CI job to exercise the new backend (primarily via mock mode).
Changes:
- Added `internal/repositories/firecracker` adapter (Linux + noop) backed by `firecracker-go-sdk`.
- Extended platform config + infrastructure wiring to select Firecracker and pass binary/kernel/rootfs/mock settings.
- Added Firecracker E2E + API lifecycle tests and a CI workflow job to run them.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| `internal/repositories/firecracker/adapter.go` | Linux Firecracker adapter implementation + mock mode. |
| `internal/repositories/firecracker/adapter_noop.go` | Non-Linux noop implementation of the adapter. |
| `internal/repositories/firecracker/adapter_test.go` | Basic Linux-only construction/interface checks. |
| `internal/repositories/firecracker/adapter_noop_test.go` | Non-Linux behavior checks for noop adapter. |
| `internal/platform/config.go` | Adds Firecracker env-backed configuration fields. |
| `internal/api/setup/infrastructure.go` | Wires Firecracker backend selection into dependency setup. |
| `tests/firecracker_e2e_test.go` | "Real" Firecracker backend E2E test scaffold. |
| `tests/firecracker_api_test.go` | API-level lifecycle test (Launch/Get/Terminate) for Firecracker backend. |
| `.github/workflows/ci.yml` | Adds a `test-firecracker` CI job to run Firecracker tests. |
| `go.mod` / `go.sum` | Adds Firecracker SDK dependency (and new transitive deps). |
| `docs/plans/firecracker-backend-implementation.md` | Implementation plan document for the backend. |
```go
return a.LaunchInstanceWithOptions(ctx, ports.CreateInstanceOptions{
	Name:      opts.Name,
	ImageName: opts.Image,
	Env:       opts.Env,
	Cmd:       opts.Command,
	CPULimit:  int64(opts.CPUs),
```
RunTaskOptions.CPUs is a float64, but here it's converted to int64, which truncates fractional CPU requests (e.g., 0.5 -> 0 -> default to 1 vCPU). This changes scheduling semantics compared to other backends and may over-provision. Consider defining an explicit mapping (e.g., ceil to at least 1, or interpret CPUs as millicores) and documenting it consistently across backends.
Suggested change:

```go
// Map fractional CPU requests to an integer CPULimit.
// Preserve existing behavior:
//   - integer values pass through unchanged,
//   - fractional values >= 1 are truncated,
//   - 0 < CPUs < 1 is treated as 1 vCPU instead of 0 (which would mean "use default").
cpuLimit := int64(opts.CPUs)
if opts.CPUs > 0 && cpuLimit == 0 {
	cpuLimit = 1
}
return a.LaunchInstanceWithOptions(ctx, ports.CreateInstanceOptions{
	Name:      opts.Name,
	ImageName: opts.Image,
	Env:       opts.Env,
	Cmd:       opts.Command,
	CPULimit:  cpuLimit,
```
```yaml
          go test -v -run TestFirecrackerAPI_E2E ./tests/...

  build:
    needs: [lint, unit-tests, race-detection, integration-tests, test-libvirt]
```
The new test-firecracker job is not included in the build.needs list. If the repository relies on the build job as the primary required check, this can allow merges even when Firecracker tests fail. Consider adding test-firecracker to build.needs (or otherwise ensuring it is a required check).
Suggested change:

```yaml
    needs: [lint, unit-tests, race-detection, integration-tests, test-libvirt, test-firecracker]
```
```go
	BinaryPath string
	KernelPath string
	RootfsPath string
	SocketDir  string
```
Config in the non-Linux (noop) implementation is missing the MockMode field that exists in the Linux build. This will break compilation on non-Linux platforms because callers (e.g., setup/infrastructure) initialize firecracker.Config{..., MockMode: ...} regardless of OS. Keep the Config shape consistent across build tags (add MockMode bool here, even if unused).
Suggested change:

```go
	SocketDir string
	MockMode  bool
```
```go
		id, _, err := adapter.LaunchInstanceWithOptions(ctx, opts)
		// We expect an error if the kernel/rootfs are missing,
		// but we want to see HOW it fails in CI.
		if err != nil {
			t.Logf("Launch failed as expected (likely missing artifacts): %v", err)
			return
		}
```
This E2E test currently treats a failed LaunchInstanceWithOptions as a successful test run (it logs and returns). That makes CI green even when Firecracker provisioning is broken or misconfigured. Consider gating the test behind an explicit env flag (or artifact existence checks) and either t.Skip when prerequisites are missing, or require.NoError when the test is enabled so regressions are caught.
```go
		if resp.StatusCode != http.StatusAccepted {
			t.Logf("Launch failed with status %d (expected 202)", resp.StatusCode)
			return
		}

		var res struct {
			Data domain.Instance `json:"data"`
		}
		require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
```
In the Launch step, if the API doesn't return 202 you Logf and return, which marks the subtest as passing and then the rest of the test is skipped. This can hide real regressions in the firecracker backend/API contract. Prefer require.Equal(http.StatusAccepted, resp.StatusCode) (or t.Skipf if the intent is to allow backend unavailability) so the outcome is explicit in test results.
Suggested change:

```go
		require.Equal(t, http.StatusAccepted, resp.StatusCode, "Launch failed with unexpected status")

		var res struct {
			Data domain.Instance `json:"data"`
		}
		require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
```
```go
	// Wait for provision worker to finish (especially in mock mode)
	time.Sleep(5 * time.Second)
```
Using a fixed time.Sleep(5 * time.Second) to wait for provisioning is likely to be flaky across CI and different host loads. There is an existing polling pattern in other E2E tests (loop + timeout) that would make this deterministic; consider polling the instance GET until it reaches RUNNING (or ERROR) with a bounded timeout.
Suggested change:

```go
	// Wait for provision worker to finish by polling instance status
	deadline := time.Now().Add(60 * time.Second)
	for {
		if time.Now().After(deadline) {
			t.Fatalf("instance %s did not reach RUNNING or ERROR status within timeout", instanceID)
		}
		statusReq, _ := http.NewRequest("GET", testutil.TestBaseURL+"/instances/"+instanceID, nil)
		statusReq.Header.Set(testutil.TestHeaderAPIKey, token)
		applyTenantHeader(t, statusReq, token)
		statusResp, err := client.Do(statusReq)
		require.NoError(t, err)
		var statusRes struct {
			Data domain.Instance `json:"data"`
		}
		func() {
			defer statusResp.Body.Close()
			require.NoError(t, json.NewDecoder(statusResp.Body).Decode(&statusRes))
		}()
		if statusRes.Data.Status == domain.StatusRunning {
			break
		}
		if statusRes.Data.Status == domain.StatusError {
			t.Fatalf("instance %s entered ERROR status during provisioning", instanceID)
		}
		time.Sleep(500 * time.Millisecond)
	}
```
```go
	_ = os.MkdirAll(cfg.SocketDir, 0755)
```
os.MkdirAll errors are ignored here. If the socket directory can't be created (permissions, read-only FS), later failures will be harder to diagnose and may leak partial state. Consider checking the error and either returning a constructor error (change signature) or at least logging and falling back to a safe directory.
Suggested change:

```go
	if err := os.MkdirAll(cfg.SocketDir, 0o755); err != nil {
		logger.Error("failed to create Firecracker socket directory", "dir", cfg.SocketDir, "err", err)
		// Attempt to fall back to a safe default directory if a custom one fails.
		const fallbackDir = "/tmp/firecracker"
		if cfg.SocketDir != fallbackDir {
			if errFallback := os.MkdirAll(fallbackDir, 0o755); errFallback != nil {
				logger.Error("failed to create fallback Firecracker socket directory", "dir", fallbackDir, "err", errFallback)
			} else {
				logger.Info("using fallback Firecracker socket directory", "dir", fallbackDir)
				cfg.SocketDir = fallbackDir
			}
		}
	}
```
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return "", nil, fmt.Errorf("failed to create machine: %w", err) | ||
| } | ||
|
|
||
| if err := m.Start(ctx); err != nil { |
Potential socket file leak: If m.Start(ctx) fails on line 111, the socket path directory and potentially the socket file itself are not cleaned up. The socket path is created by the Firecracker VMM but if start fails midway, these resources may remain. Consider adding cleanup in the error path or using defer with proper error handling.
Suggested change:

```go
	if err := m.Start(ctx); err != nil {
		// Best-effort cleanup of the Firecracker socket if start fails to avoid
		// leaking leftover socket files on the host filesystem.
		if rmErr := os.Remove(socketPath); rmErr != nil && !os.IsNotExist(rmErr) {
			a.logger.Warn("failed to remove Firecracker socket after start failure", "socket_path", socketPath, "error", rmErr)
		}
```
```go
	if a.cfg.MockMode {
		a.logger.Info("Mock mode enabled, skipping real Firecracker start", "instance_id", id)
		a.mu.Lock()
		a.machines[id] = &firecracker.Machine{} // Minimal mock
		a.mu.Unlock()
		return id, nil, nil
	}

	fcCfg := firecracker.Config{
		SocketPath:      socketPath,
		KernelImagePath: a.cfg.KernelPath,
		Drives: []models.Drive{
			{
				DriveID:      firecracker.String("1"),
				IsRootDevice: firecracker.Bool(true),
				IsReadOnly:   firecracker.Bool(false),
				PathOnHost:   firecracker.String(a.cfg.RootfsPath),
			},
		},
		MachineCfg: models.MachineConfiguration{
			VcpuCount:  firecracker.Int64(vcpus),
			MemSizeMib: firecracker.Int64(mem),
		},
	}

	cmd := firecracker.VMCommandBuilder{}.
		WithBin(a.cfg.BinaryPath).
		WithSocketPath(socketPath).
		Build(ctx)

	m, err := firecracker.NewMachine(ctx, fcCfg, firecracker.WithProcessRunner(cmd))
	if err != nil {
		return "", nil, fmt.Errorf("failed to create machine: %w", err)
	}

	if err := m.Start(ctx); err != nil {
		return "", nil, fmt.Errorf("failed to start machine: %w", err)
	}

	a.mu.Lock()
	a.machines[id] = m
	a.mu.Unlock()
```
Race condition in machine storage: The machines map is updated without holding the lock during the critical section between checking mock mode (line 76) and storing the machine (lines 78-80, 115-117). If multiple goroutines launch instances concurrently, they could potentially create entries with the same ID or experience data races. Consider holding the lock for the entire critical section or using a single lock/unlock pair.
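The single-critical-section fix suggested above can be sketched with stand-in types (the `machine` struct below stands in for `*firecracker.Machine`; `storeMachine` is a hypothetical helper, not a method in the PR):

```go
package main

import (
	"fmt"
	"sync"
)

// machine is a stand-in for *firecracker.Machine.
type machine struct{ id string }

type adapter struct {
	mu       sync.Mutex
	machines map[string]*machine
}

// storeMachine registers a machine under id inside one critical section,
// rejecting duplicate ids instead of silently overwriting an entry.
func (a *adapter) storeMachine(id string, m *machine) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if _, exists := a.machines[id]; exists {
		return fmt.Errorf("instance %s already registered", id)
	}
	a.machines[id] = m
	return nil
}

func main() {
	a := &adapter{machines: make(map[string]*machine)}
	fmt.Println(a.storeMachine("id-1", &machine{id: "id-1"}) == nil) // first store succeeds
	fmt.Println(a.storeMachine("id-1", &machine{id: "id-1"}) == nil) // duplicate is rejected
}
```

Because the existence check and the write happen under the same lock, concurrent launches with the same id cannot both succeed.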
```go
		req, err := http.NewRequest("GET", testutil.TestBaseURL+"/instances/"+instanceID, nil)
		require.NoError(t, err, "failed to create request")

		req.Header.Set(testutil.TestHeaderAPIKey, token)
		applyTenantHeader(t, req, token)

		resp, err := client.Do(req)
		require.NoError(t, err)
		defer resp.Body.Close()

		assert.Equal(t, http.StatusOK, resp.StatusCode)

		var res struct {
			Data domain.Instance `json:"data"`
		}
		require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
		// In mock mode, the status should transition to RUNNING quickly
		assert.Equal(t, domain.StatusRunning, res.Data.Status)
```
The test expects domain.StatusRunning on line 93, but there's no guarantee the status will transition to RUNNING after only a 5-second sleep (line 68). In mock mode, the status transition depends on the mock implementation. This assertion may be flaky. Consider polling with a timeout or verifying the actual behavior of mock mode status transitions.
Suggested change:

```go
		deadline := time.Now().Add(30 * time.Second)
		var lastStatus domain.Status
		for {
			req, err := http.NewRequest("GET", testutil.TestBaseURL+"/instances/"+instanceID, nil)
			require.NoError(t, err, "failed to create request")
			req.Header.Set(testutil.TestHeaderAPIKey, token)
			applyTenantHeader(t, req, token)
			resp, err := client.Do(req)
			require.NoError(t, err)
			func() {
				defer resp.Body.Close()
				assert.Equal(t, http.StatusOK, resp.StatusCode)
				var res struct {
					Data domain.Instance `json:"data"`
				}
				require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
				lastStatus = res.Data.Status
			}()
			if lastStatus == domain.StatusRunning {
				break
			}
			if time.Now().After(deadline) {
				t.Fatalf("instance did not reach RUNNING status within timeout, last status: %v", lastStatus)
			}
			time.Sleep(500 * time.Millisecond)
		}
```
```go
	if a.logger != nil {
		a.logger.Warn("Ping called on no-op firecracker adapter")
	}
	return nil
```
The Ping() method for the noop adapter returns nil on line 101, but according to the error pattern in other methods (lines 34-50), it should return an error indicating the platform is not supported. This inconsistency could cause confusion when checking backend availability.
Suggested change:

```go
	return fmt.Errorf("firecracker not supported on this platform")
```
```go
func NewFirecrackerAdapter(logger *slog.Logger, cfg Config) (*FirecrackerAdapter, error) {
	logger.Warn("Firecracker is only supported on Linux. This adapter will not function.")
	return &FirecrackerAdapter{logger: logger}, nil
}
```
The return signature for NewFirecrackerAdapter in the noop build (line 28) is (*FirecrackerAdapter, error) but only returns the adapter without an error. This is inconsistent with the Linux version (line 47 in adapter.go) where it can return an error. Consider returning just *FirecrackerAdapter for the noop version since it never errors, or maintain consistency by returning (adapter, nil).
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @.github/workflows/ci.yml:
- Around lines 310-331: The "Run Firecracker API Integration" step lacks an explicit REDIS_URL environment variable, which can cause API init failures. Set REDIS_URL (for example redis://localhost:6379) in the step's env block and/or inline before launching the API with `go run ./cmd/api`, so the process uses the declared Redis service rather than relying on a default fallback.
In `@internal/api/setup/infrastructure.go`:
- Around lines 117-119: The InitLBProxy path currently falls back to docker.NewLBProxyAdapter even when COMPUTE_BACKEND is "firecracker", which can silently misroute traffic. Update InitLBProxy to detect the Firecracker backend and, instead of returning docker.NewLBProxyAdapter(instanceRepo, vpcRepo), return either an explicit error or a dedicated noop/unsupported LB proxy adapter, so callers know Firecracker LB routing is not implemented.
🧹 Nitpick comments (4)
internal/repositories/firecracker/adapter_test.go (1)

18-33: Test leaves `/tmp/firecracker-test` directory behind after the run. `NewFirecrackerAdapter` creates `cfg.SocketDir` on disk via `os.MkdirAll`. Consider using `t.TempDir()` (auto-cleaned by the test framework) instead of a hardcoded path, which also avoids conflicts with parallel test runs.

Proposed fix

```diff
 func TestNewFirecrackerAdapter(t *testing.T) {
 	logger := slog.Default()
 	cfg := Config{
 		BinaryPath: "/usr/local/bin/firecracker",
 		KernelPath: "/var/lib/thecloud/vmlinux",
 		RootfsPath: "/var/lib/thecloud/rootfs.ext4",
-		SocketDir:  "/tmp/firecracker-test",
+		SocketDir:  t.TempDir(),
 	}
```

.github/workflows/ci.yml (2)
293-301: Unpinned Firecracker version makes CI non-reproducible. Fetching the `latest` release means a new Firecracker release with breaking changes (e.g., a different archive layout) could silently break CI. Consider pinning to a specific version, similar to how other tools in this workflow are pinned (e.g., `golangci-lint v1.64.8`, `swag v1.16.6`).

Proposed fix

```diff
       - name: Install Firecracker
         run: |
-          FC_VERSION=$(curl -s https://api.github.com/repos/firecracker-microvm/firecracker/releases/latest | grep "tag_name" | cut -d : -f 2,3 | tr -d \" | tr -d " ," | xargs)
+          FC_VERSION="v1.10.1" # Pin for reproducibility; bump explicitly
           echo "Installing Firecracker $FC_VERSION"
```
326-331: Background API server may not be ready when tests start, and is never cleaned up. The test code does call `waitForServer()`, which mitigates the race, but the CI step doesn't explicitly wait or verify server startup before invoking `go test`. Also, the background `go run` process is never explicitly killed; it relies on the job ending. Consider adding an explicit wait and cleanup:

Proposed fix

```diff
           # Start API in background with mock mode
           FIRECRACKER_MOCK_MODE=true COMPUTE_BACKEND=firecracker go run ./cmd/api &
+          API_PID=$!
+          sleep 5  # Give server time to start
 
           # Run API E2E tests for firecracker
           go test -v -run TestFirecrackerAPI_E2E ./tests/...
+
+          kill $API_PID 2>/dev/null || true
```

tests/firecracker_e2e_test.go (1)
46-65: Magic number for memory limit, and no cleanup if assertion fails.

Two minor points:

- `128 * 1024 * 1024` is a magic number. Per coding guidelines, consider a named constant (e.g., `const testMemoryLimit = 128 * 1024 * 1024`).
- If `require.NotEmpty(t, id)` on line 61 fails, `DeleteInstance` on line 63 is never reached, potentially leaking the VM in non-mock mode. Consider using `t.Cleanup` to ensure teardown.

Proposed fix

```diff
 	t.Run("Launch and Delete", func(t *testing.T) {
 		id, _, err := adapter.LaunchInstanceWithOptions(ctx, opts)
 		if err != nil {
 			t.Skipf("Launch failed, skipping test (likely missing artifacts or KVM access): %v", err)
 		}
 		require.NotEmpty(t, id)
-
-		err = adapter.DeleteInstance(ctx, id)
-		assert.NoError(t, err)
+		t.Cleanup(func() {
+			assert.NoError(t, adapter.DeleteInstance(ctx, id))
+		})
 	})
```
```yaml
      - name: Run Firecracker API Integration
        env:
          DATABASE_URL: postgres://cloud:cloud@localhost:5433/cloud?sslmode=disable
          COMPUTE_BACKEND: firecracker
          FIRECRACKER_BINARY: /usr/local/bin/firecracker
          FIRECRACKER_KERNEL: /tmp/vmlinux
          FIRECRACKER_ROOTFS: /tmp/rootfs.ext4
          FIRECRACKER_MOCK_MODE: "true"
        run: |
          # Create dummy artifacts for Go logic path testing
          touch /tmp/vmlinux /tmp/rootfs.ext4

          # Ensure clean DB
          PGPASSWORD=cloud psql -h localhost -p 5433 -U cloud -d postgres -c "DROP DATABASE IF EXISTS cloud;"
          PGPASSWORD=cloud psql -h localhost -p 5433 -U cloud -d postgres -c "CREATE DATABASE cloud;"
          go run ./cmd/api -migrate-only

          # Start API in background with mock mode
          FIRECRACKER_MOCK_MODE=true COMPUTE_BACKEND=firecracker go run ./cmd/api &

          # Run API E2E tests for firecracker
          go test -v -run TestFirecrackerAPI_E2E ./tests/...
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, find and examine the CI workflow file
find . -name "ci.yml" -o -name "ci.yaml" | head -5
```

Repository: poyrazK/thecloud
Length of output: 86
🏁 Script executed:
```shell
# Look for the workflow file in .github/workflows
cat -n .github/workflows/ci.yml | sed -n '270,340p'
```

Repository: poyrazK/thecloud
Length of output: 3465
🏁 Script executed:
```shell
# Search for REDIS_URL usage in Go code
rg -n 'REDIS_URL|RedisURL|redis_url' --type=go -C 2
```

Repository: poyrazK/thecloud
Length of output: 2784
🏁 Script executed:
```shell
# Also check for Redis initialization or config in the API
rg -n 'redis|Redis' --type=go ./cmd/api -C 2 | head -100
```

Repository: poyrazK/thecloud
Length of output: 5440
Add REDIS_URL environment variable to the API integration step.
Redis is declared as a service (port 6379) but the "Run Firecracker API Integration" step doesn't explicitly set REDIS_URL. While the API has a default fallback to localhost:6379, the step should explicitly configure Redis when a service is declared in the workflow. The API initialization will fail if Redis is unavailable.
Proposed fix
```diff
       - name: Run Firecracker API Integration
         env:
           DATABASE_URL: postgres://cloud:cloud@localhost:5433/cloud?sslmode=disable
           COMPUTE_BACKEND: firecracker
           FIRECRACKER_BINARY: /usr/local/bin/firecracker
           FIRECRACKER_KERNEL: /tmp/vmlinux
           FIRECRACKER_ROOTFS: /tmp/rootfs.ext4
           FIRECRACKER_MOCK_MODE: "true"
+          REDIS_URL: redis://localhost:6379
```

🤖 Prompt for AI Agents
In @.github/workflows/ci.yml around lines 310 - 331, The "Run Firecracker API
Integration" step lacks an explicit REDIS_URL environment variable which can
cause API init failures; update that CI step (the job named "Run Firecracker API
Integration" where you run "go run ./cmd/api" and start the API with
FIRECRACKER_MOCK_MODE) to set REDIS_URL (for example to redis://localhost:6379
or redis://127.0.0.1:6379) either in the env block and/or inline before
launching the API so the process uses the declared Redis service rather than
relying on a default fallback.
```go
	// Firecracker currently uses basic SLIRP or TAP which might not work with host-level nginx yet,
	// but we fallback to docker-style proxy logic for now or could implement a dedicated one.
	return docker.NewLBProxyAdapter(instanceRepo, vpcRepo)
```
Firecracker LB proxy fallback may silently produce wrong behavior at runtime.
The comment documents that Firecracker falls through to the Docker-based LB proxy. If a user configures COMPUTE_BACKEND=firecracker, InitLBProxy will return a Docker proxy adapter that likely cannot route to Firecracker MicroVMs. Consider returning an explicit error or a noop proxy for Firecracker until proper support is implemented, to avoid silent misrouting.
🤖 Prompt for AI Agents
In `@internal/api/setup/infrastructure.go` around lines 117 - 119, The InitLBProxy
path currently falls back to docker.NewLBProxyAdapter even when COMPUTE_BACKEND
is "firecracker", which can silently misroute traffic; update InitLBProxy to
detect the Firecracker backend (e.g., check the compute backend config or the
branch that currently returns docker.NewLBProxyAdapter) and instead return an
explicit error or a dedicated noop/unsupported LB proxy adapter for Firecracker
(do not return docker.NewLBProxyAdapter for Firecracker). Locate the return that
calls docker.NewLBProxyAdapter(instanceRepo, vpcRepo) and replace it with a
branch that either returns an error indicating unsupported LB proxy for
Firecracker or constructs/returns a NoopLBProxyAdapter/UnsupportedLBProxy so
callers know Firecracker LB routing is not implemented.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.
```go
	machines map[string]*firecracker.Machine
	mu       sync.RWMutex
}

// NewFirecrackerAdapter creates a new FirecrackerAdapter.
func NewFirecrackerAdapter(logger *slog.Logger, cfg Config) (*FirecrackerAdapter, error) {
	if cfg.SocketDir == "" {
		cfg.SocketDir = defaultSocketDir
	}
	if err := os.MkdirAll(cfg.SocketDir, 0755); err != nil {
		return nil, fmt.Errorf("failed to create socket directory %s: %w", cfg.SocketDir, err)
	}

	return &FirecrackerAdapter{
		cfg:      cfg,
		logger:   logger,
		machines: make(map[string]*firecracker.Machine),
```
The machines map could grow indefinitely if DeleteInstance is never called or if there are failures during instance launch. Consider implementing a background cleanup mechanism or a maximum map size limit to prevent memory leaks in long-running deployments.
```go
	delete(a.machines, id)
	a.mu.Unlock()

	if !a.cfg.MockMode {
		if err := m.StopVMM(); err != nil {
			a.logger.Warn("failed to stop VMM during deletion", "instance_id", id, "error", err)
		}
	}
```
The machine instance is deleted from the map before attempting to stop the VMM. If StopVMM fails and needs to be retried, the machine reference is already lost, making cleanup impossible. Consider moving the delete operation after successful VMM cleanup, or implementing a more robust cleanup pattern with retry logic.
```diff
-	delete(a.machines, id)
-	a.mu.Unlock()
-
-	if !a.cfg.MockMode {
-		if err := m.StopVMM(); err != nil {
-			a.logger.Warn("failed to stop VMM during deletion", "instance_id", id, "error", err)
-		}
-	}
+	a.mu.Unlock()
+
+	if !a.cfg.MockMode {
+		if err := m.StopVMM(); err != nil {
+			a.logger.Warn("failed to stop VMM during deletion", "instance_id", id, "error", err)
+			return fmt.Errorf("failed to stop VMM for instance %s: %w", id, err)
+		}
+	}
+
+	a.mu.Lock()
+	delete(a.machines, id)
+	a.mu.Unlock()
```
```go
	socketPath := filepath.Join(a.cfg.SocketDir, id+".socket")
	if err := os.Remove(socketPath); err != nil && !os.IsNotExist(err) {
		return fmt.Errorf("failed to remove socket file %s: %w", socketPath, err)
	}

	return nil
```
The socket file cleanup happens after the machine is removed from the map and StopVMM is called with only a warning on failure. If the socket file removal fails, the error is returned, but the machine state is already inconsistent (removed from map but VMM potentially still running). Consider handling this scenario more carefully or documenting the expected behavior when socket cleanup fails.
```go
		Drives: []models.Drive{
			{
				DriveID:      firecracker.String("1"),
				IsRootDevice: firecracker.Bool(true),
				IsReadOnly:   firecracker.Bool(false),
				PathOnHost:   firecracker.String(a.cfg.RootfsPath),
			},
```
The RootFS drive is configured with IsReadOnly: false, which allows the VM to modify the root filesystem. This could lead to state pollution across VM instances if the same RootFS image is reused. Consider making this configurable or using copy-on-write snapshots to ensure isolation between instances.
Pull request overview
Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.
```go
func (a *FirecrackerAdapter) RunTask(ctx context.Context, opts ports.RunTaskOptions) (string, []string, error) {
	return a.LaunchInstanceWithOptions(ctx, ports.CreateInstanceOptions{
		Name:        opts.Name,
		ImageName:   opts.Image,
		Env:         opts.Env,
		Cmd:         opts.Command,
		CPULimit:    int64(opts.CPUs),
		MemoryLimit: opts.MemoryMB * 1024 * 1024,
	})
```
RunTask converts opts.CPUs (float64) to int64 via truncation. This silently drops fractional CPU requests (e.g., 2.5 -> 2) and can underprovision tasks. Consider defining a clear mapping (e.g., ceil to at least 1 vCPU, or return a validation error when CPUs is non-integer) and document the behavior.
```go
	// Wait for provision worker to finish (especially in mock mode)
	time.Sleep(5 * time.Second)
})

if instanceID == "" {
	t.Skip("Skipping subsequent steps because instance launch failed or was skipped")
}

t.Run("Get Firecracker Instance", func(t *testing.T) {
	req, err := http.NewRequest("GET", testutil.TestBaseURL+"/instances/"+instanceID, nil)
	require.NoError(t, err, "failed to create request")

	req.Header.Set(testutil.TestHeaderAPIKey, token)
	applyTenantHeader(t, req, token)

	resp, err := client.Do(req)
	require.NoError(t, err)
	defer resp.Body.Close()

	assert.Equal(t, http.StatusOK, resp.StatusCode)

	var res struct {
		Data domain.Instance `json:"data"`
	}
	require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
	// In mock mode, the status should transition to RUNNING quickly
	assert.Equal(t, domain.StatusRunning, res.Data.Status)
})
```
This test uses a fixed sleep (5s) and then asserts the instance is RUNNING. This is flaky in CI because provisioning is async and timing varies; it can fail even if the system is correct. Prefer polling the GET /instances/{id} endpoint until StatusRunning (or until a timeout), similar to TestComputeE2E's WaitForRunning loop.
```go
	// We expect an error if the kernel/rootfs are missing,
	// but we want to see HOW it fails in CI.
	if err != nil {
		t.Skipf("Launch failed, skipping test (likely missing artifacts or KVM access): %v", err)
	}

	require.NotEmpty(t, id)
```
This test skips on LaunchInstanceWithOptions error unconditionally. In FIRECRACKER_MOCK_MODE=true (the CI path), a launch error is a real regression and should fail the test instead of being skipped. Consider only skipping when running in real mode and the environment is missing KVM/artifacts, and using require.NoError when MockMode is enabled.
```diff
-	// We expect an error if the kernel/rootfs are missing,
-	// but we want to see HOW it fails in CI.
-	if err != nil {
-		t.Skipf("Launch failed, skipping test (likely missing artifacts or KVM access): %v", err)
-	}
-
-	require.NotEmpty(t, id)
+	// In real mode, we may expect an error if the kernel/rootfs are missing or KVM is unavailable,
+	// in which case we skip. In mock mode (used in CI), any launch error is a real regression.
+	if err != nil {
+		if cfg.MockMode {
+			require.NoError(t, err, "expected no launch error in mock mode")
+		}
+		t.Skipf("Launch failed, skipping test (likely missing artifacts or KVM access): %v", err)
+	}
+
+	require.NotEmpty(t, id)
```
```go
t.Run("firecracker", func(t *testing.T) {
	cfg := &platform.Config{
		ComputeBackend:      "firecracker",
		FirecrackerMockMode: true,
	}
	backend, err := InitComputeBackend(cfg, logger)
	assert.NoError(t, err)
	// type depends on platform if not in mock mode, but with mock mode it should be firecracker-mock
	assert.Contains(t, backend.Type(), "firecracker")
})
```
The assertion here is too weak to validate Firecracker init behavior: assert.Contains(..., "firecracker") will also pass on non-Linux ("firecracker-noop") and doesn't verify that MockMode actually results in the mock adapter. Consider asserting firecracker-mock on Linux when FirecrackerMockMode is true, and firecracker-noop on non-Linux (e.g., via runtime.GOOS) to make the test meaningful.
```go
// Only run this if we are in an environment that supports firecracker
// or explicitly enabled for CI.
backend := os.Getenv("COMPUTE_BACKEND")
if backend != "firecracker" && backend != "firecracker-mock" {
	t.Skipf("Skipping firecracker API test: COMPUTE_BACKEND (%s) is not firecracker or firecracker-mock", backend)
```
The skip gate allows COMPUTE_BACKEND=firecracker-mock, but the runtime configuration (InitComputeBackend) only recognizes COMPUTE_BACKEND=firecracker and uses FIRECRACKER_MOCK_MODE to enable mock behavior. Allowing "firecracker-mock" here is misleading and can cause confusion when the app falls back to Docker. Prefer gating on COMPUTE_BACKEND=="firecracker" and/or FIRECRACKER_MOCK_MODE instead.
```diff
-// Only run this if we are in an environment that supports firecracker
-// or explicitly enabled for CI.
-backend := os.Getenv("COMPUTE_BACKEND")
-if backend != "firecracker" && backend != "firecracker-mock" {
-	t.Skipf("Skipping firecracker API test: COMPUTE_BACKEND (%s) is not firecracker or firecracker-mock", backend)
+// Only run this if we are in an environment that supports firecracker
+// (COMPUTE_BACKEND=firecracker) or when FIRECRACKER_MOCK_MODE is enabled.
+backend := os.Getenv("COMPUTE_BACKEND")
+mockMode := os.Getenv("FIRECRACKER_MOCK_MODE")
+if backend != "firecracker" && mockMode == "" {
+	t.Skipf("Skipping firecracker API test: COMPUTE_BACKEND (%s) is not 'firecracker' and FIRECRACKER_MOCK_MODE is not set", backend)
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@internal/core/services/instance_internal_test.go`:
- Around line 154-178: Add test housekeeping to
TestInstanceService_UpdateInstanceMetadata: call t.Parallel() at the top of the
test and call repo.AssertExpectations(t) at the end to verify the mock
expectations set on mockInstanceRepo (the GetByID and Update expectations).
Ensure these calls are placed in TestInstanceService_UpdateInstanceMetadata so
they mirror other tests in the file.
🧹 Nitpick comments (5)
internal/api/setup/infrastructure_test.go (2)
40-50: Skipping on any error masks non-dependency failures.

If libvirt is installed but `InitComputeBackend` fails for another reason (e.g., invalid URI handling, code regression), this test silently skips rather than failing. Consider checking for a more specific error or environment signal to decide when to skip.

Suggested approach

```diff
 	t.Run("libvirt", func(t *testing.T) {
+		if os.Getenv("LIBVIRT_AVAILABLE") == "" {
+			t.Skip("skipping libvirt test: LIBVIRT_AVAILABLE not set")
+		}
 		cfg := &platform.Config{
 			ComputeBackend: "libvirt",
 			LibvirtURI:     "test:///default",
 		}
 		backend, err := InitComputeBackend(cfg, logger)
-		if err != nil {
-			t.Skipf("skipping libvirt test as Init failed (likely missing libvirt-dev): %v", err)
-		}
+		assert.NoError(t, err)
 		assert.Equal(t, "libvirt", backend.Type())
 	})
```
91-106: Passing `nil` for all dependencies is fragile and risks nil-pointer panics.

Both test cases pass `nil` for `computeBackend`, `instanceRepo`, and `vpcRepo`. This works only because the current constructors happen to store these without dereferencing. If any constructor adds validation or eagerly uses a dependency, these tests will panic rather than fail gracefully.

Consider providing minimal mock implementations instead. As per coding guidelines, "Use `testify/mock` for creating mock objects in tests".

Example using simple mocks

```diff
+import (
+	"github.com/poyrazk/thecloud/internal/core/ports/mocks"
+)
 func TestInitLBProxy(t *testing.T) {
-	cfg := &platform.Config{}
 	t.Run("default", func(t *testing.T) {
+		cfg := &platform.Config{}
-		proxy, err := InitLBProxy(cfg, nil, nil, nil)
+		proxy, err := InitLBProxy(cfg, &mocks.ComputeBackend{}, &mocks.InstanceRepository{}, &mocks.VpcRepository{})
 		assert.NoError(t, err)
 		assert.NotNil(t, proxy)
 	})
 	t.Run("libvirt", func(t *testing.T) {
 		cfgLibvirt := &platform.Config{ComputeBackend: "libvirt"}
-		proxy, err := InitLBProxy(cfgLibvirt, nil, nil, nil)
+		proxy, err := InitLBProxy(cfgLibvirt, &mocks.ComputeBackend{}, &mocks.InstanceRepository{}, &mocks.VpcRepository{})
 		assert.NoError(t, err)
 		assert.NotNil(t, proxy)
 	})
 }
```

internal/core/services/instance_internal_test.go (3)
90-106: Add edge-case coverage for zero denominators in stats calculation.

This test only covers the happy path. If the `SystemCPUUsage` delta or `MemoryStats.Limit` is zero, the underlying function will likely divide by zero. Adding at least a zero-delta test case would confirm the function handles it gracefully (returns 0.0, NaN, etc.) rather than panicking.

Also, consider using a table-driven pattern here to make it easy to add these cases. As per coding guidelines, "Use table-driven tests in test files".
115-134: Consider converting to a table-driven test.

The four sub-tests follow a repetitive pattern that maps naturally to a `[]struct{ name string; ip string; want bool }` table. As per coding guidelines, "Use table-driven tests in test files".
136-152: Consider adding boundary values and converting to table-driven.

Port parsing has interesting boundaries (`"0"`, `"65535"`, `"65536"`, `"-1"`). A table-driven structure would make it trivial to add these. As per coding guidelines, "Use table-driven tests in test files".
```go
func TestInstanceService_UpdateInstanceMetadata(t *testing.T) {
	repo := new(mockInstanceRepo)
	svc := &InstanceService{repo: repo}
	ctx := context.Background()
	id := uuid.New()
	inst := &domain.Instance{
		ID:       id,
		Metadata: map[string]string{"old": "val"},
		Labels:   map[string]string{"l1": "v1"},
	}

	repo.On("GetByID", ctx, id).Return(inst, nil).Once()
	repo.On("Update", ctx, inst).Return(nil).Once()

	metadata := map[string]string{"new": "val", "old": ""} // "old" should be deleted
	labels := map[string]string{"l2": "v2"}

	err := svc.UpdateInstanceMetadata(ctx, id, metadata, labels)
	assert.NoError(t, err)
	assert.Equal(t, "val", inst.Metadata["new"])
	assert.Equal(t, "v2", inst.Labels["l2"])
	assert.Equal(t, "v1", inst.Labels["l1"])
	_, ok := inst.Metadata["old"]
	assert.False(t, ok)
}
```
Missing repo.AssertExpectations(t) and t.Parallel().
Other tests in this file (e.g., line 87) call `repo.AssertExpectations(t)` to verify all expected mock calls were made. This test sets up `GetByID` and `Update` expectations but never asserts they were fulfilled. Also, the existing tests in this file use `t.Parallel()` — this one should follow the same pattern for consistency.
Proposed fix
```diff
 func TestInstanceService_UpdateInstanceMetadata(t *testing.T) {
+	t.Parallel()
 	repo := new(mockInstanceRepo)
 	svc := &InstanceService{repo: repo}
 	ctx := context.Background()
@@ -176,4 +177,5 @@
 	_, ok := inst.Metadata["old"]
 	assert.False(t, ok)
+	repo.AssertExpectations(t)
 }
```

🤖 Prompt for AI Agents
In `@internal/core/services/instance_internal_test.go` around lines 154 - 178, Add
test housekeeping to TestInstanceService_UpdateInstanceMetadata: call
t.Parallel() at the top of the test and call repo.AssertExpectations(t) at the
end to verify the mock expectations set on mockInstanceRepo (the GetByID and
Update expectations). Ensure these calls are placed in
TestInstanceService_UpdateInstanceMetadata so they mirror other tests in the
file.
This PR implements the Firecracker MicroVM compute backend for high-density virtualization.
Changes:
Verification:
Summary by CodeRabbit
New Features
Documentation
Tests
Chores / CI