Skip to content

fix: Resolve CI/CD failures, race conditions, and flaky tests#35

Merged
poyrazK merged 84 commits intomainfrom
feature/tenant-middleware-tests
Jan 27, 2026
Merged

fix: Resolve CI/CD failures, race conditions, and flaky tests#35
poyrazK merged 84 commits intomainfrom
feature/tenant-middleware-tests

Conversation

@poyrazK
Copy link
Copy Markdown
Owner

@poyrazK poyrazK commented Jan 27, 2026

This PR resolves several critical issues causing CI/CD failures:

Fixes

  • Race Condition: Fixed a data race in pkg/sshutil/client.go where stdout and stderr writers were sharing a buffer.
  • Test Flake: Fixed TestWaitForSSHHostWithoutPortTimeout by using a reserved non-routable IP (192.0.2.1) to guarantee timeout behavior regardless of local environment configuration.
  • Logic Bugs: Fixed potential panics in mock matchers and corrected error type assertions in repository tests.
  • Performance: Optimized WaitForReady in node_executor.go to check readiness immediately before entering the poll loop.
  • Refactoring: Moved business logic out of handlers into services where appropriate to satisfy linter constraints.

Verification

  • All tests pass with go test -race ./...

Summary by CodeRabbit

  • New Features

    • Tenant scoping added across resources (instances, VPCs, security groups); VPC-aware database and cache creation; consistent cluster defaults (version/workers).
  • Bug Fixes

    • SSH client stdout/stderr race fixed; readiness check can short-circuit when already ready; improved robustness of several handlers and CLI outputs.
  • Tests

    • Large expansion of unit and e2e coverage across services, SDKs, and provisioning flows.
  • Chores

    • New end-to-end test target with configurable timeout.

✏️ Tip: You can customize this high-level summary in your review settings.

Copilot AI review requested due to automatic review settings January 27, 2026 18:56
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 27, 2026

📝 Walkthrough

Walkthrough

Adds tenant awareness to multiple domain objects and request flows, moves cluster defaulting from handlers to the service layer, adjusts SSH exec buffering, marks a DEK parameter unused, adds a Makefile test-e2e target with timeout, and substantially expands unit, repository, handler, SDK, and end-to-end test coverage across the codebase.

Changes

Cohort / File(s) Summary
Build & CLI
Makefile, cmd/cloud/storage.go
New TEST_TIMEOUT var and test-e2e target in Makefile; added headerCreatedAt constant and refactored versioning status parsing in storage CLI.
Domain tenant wiring
internal/core/services/instance.go, internal/core/services/security_group.go, internal/core/services/vpc.go, internal/core/domain/*
Populate TenantID from context when creating Instances, SecurityGroups, and VPCs (new TenantID usage in domain objects).
Cluster defaults (handler → service)
internal/handlers/cluster_handler.go, internal/core/services/cluster.go
Removed handler-level defaults; service applies defaults (defaultClusterVersion = "v1.29.0", defaultWorkerCount = 2).
Encryption service
internal/core/services/encryption_svc.go, internal/core/services/encryption_svc_test.go
getAndDecryptDEK marks versionID as unused (_); tests refactored with newService helper and expanded rotation/error tests.
SSH / exec buffering
pkg/sshutil/client.go, pkg/sshutil/client_test.go
Replaced single combined buffer with separate stdout/stderr buffers; tests expanded with in-test SSH server and many scenarios.
Instance service internals
internal/core/services/instance_internal_test.go, internal/core/services/instance.go
Added logger field to InstanceService in tests and populate TenantID on instance creation; added internal volume-resolution tests.
Cache creation ordering & VPC support
internal/core/services/cache.go, internal/core/services/cache_test.go
Resolve VPC network ID before repo create/launch container; added VPC-aware cache tests and error-path tests.
Load-balancer worker tests & dialer
internal/core/services/lb_worker_test.go
Added mock dialer and tests for port checks, health checks, and dial timeout behavior.
Storage service tests & helpers
internal/core/services/storage_test.go
New test helper for storage service config; broad multipart, versioning, presigned URL, encryption, and bucket validation tests added.
Extensive service test expansions
internal/core/services/*.go
Large additions/expansions of unit tests across autoscaling, cluster, cluster lifecycle, database, subnet, stack, volume, security group, secret, tenant, vpc and others covering error paths, rollbacks, backoff, quotas, and VPC flows.
Handlers test suites
internal/handlers/{cluster_handler_test,lifecycle_handler_test,tenant_handler_test}.go
New comprehensive HTTP handler tests with mocks, route constants, renames, and many success/error scenarios.
Kubernetes provisioner & executor tests
internal/repositories/k8s/{export_test,ha_test,health_test,lifecycle_test,node_executor_test,provisioner_all_test}.go
New HA/health/lifecycle/node/provisioner tests and small exported test wrappers.
Postgres repository tests
internal/repositories/postgres/*_test.go
New pgxmock-based tests for cluster, encryption, lifecycle, tenant, and VPC repos (one type-assertion tweak).
Workers & lifecycle tests
internal/workers/{cluster_worker_test,lifecycle_worker_test,metrics_worker_test}.go
Added worker tests for deprovision/upgrade, lifecycle deletions, and metrics collection.
Storage node tests
internal/storage/node/{gossip_test,rpc_test,store_test}.go
Added gossip heartbeat test, RPC store/delete cases, and local-store mtime fallback & invalid-path tests.
SDK client tests
pkg/sdk/*.go
Broad error-path tests for many SDK clients (auth, autoscaling, cache, compute, database, events, functions, iac, kubernetes, lb, queue, secret, snapshot, storage, volume, vpc).
Utilities & middleware tests
pkg/crypto/*, pkg/httputil/httputil_test.go, pkg/ratelimit/limiter_unit_test.go, pkg/tracing/tracing_test.go
Added crypto error-path tests, SSH key generation test, RBAC mock/authorization tests, rate-limiter key/IP tests, and env fallback test.
End-to-end tests & test helpers
tests/*, tests/helpers_test.go
Tenant header propagation (applyTenantHeader) added widely, tenant create/switch helpers, RBAC skip logic, reduced timeouts (e.g., 2m→30s with skip), minor path/constants refactors and test renames.
Minor behavior / control flow tweaks
internal/repositories/k8s/node_executor.go, pkg/sshutil/client.go, others
Early short-circuit in WaitForReady (exec "echo ready"); improved SSH output aggregation; simple signature/parameter usage tweaks.

Sequence Diagram(s)

(omitted — changes are broad test additions and many small functional tweaks without a single new multi-component runtime flow requiring diagramming)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰
I hopped through tests both large and small,
Tenant IDs tucked in, defaults moved down the hall.
Buffers split, keys rotated, and timeouts set neat,
Hundreds of assertions — coverage tastes sweet. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 1.46% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: Resolve CI/CD failures, race conditions, and flaky tests' accurately and specifically summarizes the main objective of the PR, which is to fix multiple issues causing CI/CD failures, race conditions, and test flakiness.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR primarily improves test robustness and coverage while tightening multi-tenancy and error-handling semantics, and fixing a race in SSH command execution that was causing CI flakes. It also moves some configuration defaults and tenant-awareness from handlers into service and repository layers to better align with the domain model.

Changes:

  • Introduced tenant-aware HTTP test helpers and updated E2E tests to consistently send X-Tenant-ID, exercising multitenant authorization paths.
  • Fixed an SSH client race by separating stdout/stderr buffers, added SSH context/timeout tests, and optimized node executor readiness checks.
  • Expanded unit tests across SDK clients, repositories, workers, handlers, encryption, RBAC, and core services (VPC, subnet, volume, cache, database, cluster, autoscaling, load balancer, secrets), including many negative-path and rollback scenarios.
  • Propagated TenantID into VPC, security group, and instance creation at the service layer, with repository queries already filtering by tenant, and moved cluster defaulting logic from the handler into ClusterService.
  • Added lifecycle HTTP handler tests, metrics tests for new storage metrics, CLI table header constants, and a test-e2e Makefile target with a configurable timeout.

Reviewed changes

Copilot reviewed 85 out of 85 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/storage_e2e_test.go Ensures all storage E2E calls apply tenant header via applyTenantHeader, aligning tests with multitenant auth.
tests/security_edge_test.go Refactors instances path constant and adds tenant header when user B accesses user A’s instance to exercise tenant scoping.
tests/rbac_e2e_test.go Introduces shared skip message and skips RBAC subtests if role creation is forbidden, reducing CI flakiness.
tests/multitenancy_e2e_test.go Adds formatted instances path, applies tenant headers consistently, and makes delete helpers tenant-aware.
tests/kubernetes_e2e_test.go Applies tenant headers to kubeconfig, delete, VPC and cluster helper requests.
tests/helpers_test.go Adds multitenant-aware test helpers (createTenant, switchTenant, tenant map with RWMutex, applyTenantHeader, and requiresTenantHeader); auto-injects tenant IDs for non-auth routes.
tests/functions_e2e_test.go Ensures function creation and invocation E2E tests send X-Tenant-ID.
tests/edge_cases_test.go Replaces helper-based malformed JSON sender with explicit request that sets tenant header and asserts 400 Bad Request.
tests/data_integrity_test.go Applies tenant header to invalid content-type and VPC patch requests for data integrity scenarios.
tests/compute_e2e_test.go Shortens wait-for-running timeout and skips when backend is unavailable; relaxes expected statuses for logs and stop to tolerate backend variance.
pkg/tracing/tracing_test.go Adds tests for envOr behavior when env var is set or missing.
pkg/sshutil/client_test.go Greatly expands SSH client tests: timeouts, host-without-port, non-routable IP behavior, context cancellation, mock SSH server for successful Run/WriteFile, and SCP error handling.
pkg/sshutil/client.go Fixes race by using separate stdout/stderr buffers, combining them only when returning, and improving error output aggregation.
pkg/sdk/vpc_test.go Adds test verifying client methods return errors on non-2xx responses from VPC endpoints.
pkg/sdk/volume_test.go Adds negative-path tests for the volume client (server errors and request errors).
pkg/sdk/storage_test.go Extensively tests storage SDK: error paths, versioned download/delete, bucket CRUD, versioning, cluster status, presigned URLs, lifecycle rules, and error handling.
pkg/sdk/snapshot_test.go Adds error-path tests for snapshot client methods.
pkg/sdk/security_group_test.go Adds tests for listing security groups without VPC, error paths, removing rules, detaching security groups, and SG creation error handling.
pkg/sdk/secret_test.go Adds tests ensuring secret SDK methods error on non-2xx responses.
pkg/sdk/queue_test.go Adds queue SDK error-path coverage for all queue operations.
pkg/sdk/loadbalancer_test.go Adds LB SDK tests covering various error responses.
pkg/sdk/kubernetes_test.go New test suite for Kubernetes SDK: cluster CRUD, kubeconfig (with and without role), maintenance actions, health, and error handling.
pkg/sdk/iac_test.go Adds IaC SDK error-path coverage for stack CRUD and template validation.
pkg/sdk/function_test.go Adds function SDK error and request-error coverage for all operations.
pkg/sdk/events_test.go Adds event listing error-path test.
pkg/sdk/database_test.go Adds database SDK tests for error responses across CRUD and connection string retrieval.
pkg/sdk/compute_test.go Adds error-path coverage for GetInstanceLogs, stats, GetInstance, and LaunchInstance.
pkg/sdk/client_test.go New tests for low-level client helpers (get, post, delete, put, patch) error and status-error behavior, and header initialization.
pkg/sdk/cache_test.go Adds cache SDK tests covering error responses for all operations.
pkg/sdk/autoscaling_test.go Adds autoscaling SDK error-path coverage and request-error coverage for all methods.
pkg/sdk/auth_test.go New tests for auth key creation success, error statuses, and request errors in the SDK.
pkg/ratelimit/limiter_unit_test.go Adds tests ensuring middleware uses API key prefix for rate limiting when present and falls back to client IP otherwise.
pkg/httputil/httputil_test.go Extends tests for permission middleware, tenant context helpers, auth with invalid tenant headers or missing membership, RequireTenant, TenantMember middleware behavior, and RBAC authorization flows.
pkg/crypto/ssh_test.go Adds test for SSH key pair generation format (private key PEM and public key prefix).
pkg/crypto/crypto_test.go Adds negative tests for invalid key length in Encrypt and malformed ciphertext in Decrypt.
internal/workers/metrics_worker_test.go New tests for metrics collector worker run loop, error handling, and node status counting using a fake storage service.
internal/workers/lifecycle_worker_test.go New tests for lifecycle worker rule processing (expired deletions, list/repo/delete error paths) using fakes.
internal/workers/cluster_worker_test.go Adds deprovision, deprovision error, upgrade, and “cluster not found” job processing tests for cluster worker.
internal/storage/node/store_test.go Adds tests for LocalStore read fallback to mtime when metadata is missing and rejection of absolute paths.
internal/storage/node/rpc_test.go Extends RPC server tests with gossip member population, store error handling for invalid keys, and delete-on-missing-key behavior.
internal/storage/node/gossip_test.go Adds test ensuring gossip ignores older heartbeats and preserves newer member state.
internal/repositories/postgres/vpc_repo_unit_test.go Fixes error type assertion to value type errors.Error when checking NotFound; keeps other tests intact.
internal/repositories/postgres/tenant_repo_test.go New comprehensive tests for tenant repository CRUD, membership, user tenants, quotas, and various exec/query error paths using pgxmock.
internal/repositories/postgres/lifecycle_repo_unit_test.go Adds lifecycle repository tests (create, get, delete) honoring user context via appcontext.
internal/repositories/postgres/encryption_repo_unit_test.go New tests for encryption key save and retrieval logic.
internal/repositories/postgres/cluster_repo_unit_test.go Adds tests for cluster repository Create, GetByID, and cluster node add/list operations.
internal/repositories/k8s/provisioner_all_test.go New black-box tests around kubeadm provisioner: SG creation, node creation, health/repair, kubeconfig retrieval, and scale-down behavior, via exported helpers.
internal/repositories/k8s/node_executor_test.go Tests ServiceExecutor.Run and WaitForReady success/timeout behavior against mocked instance service.
internal/repositories/k8s/node_executor.go Optimizes ServiceExecutor.WaitForReady by adding an immediate readiness check before entering the poll loop, using a timeout-derived context.
internal/repositories/k8s/lifecycle_test.go Adds tests for kubeadm provisioner upgrade, secret rotation, backup creation, and restore flows, using a mocked storage service.
internal/repositories/k8s/health_test.go Introduces mocks and tests for kubeadm provisioner health reporting and kubeconfig retrieval, including error/APIServer-down paths.
internal/repositories/k8s/ha_test.go Tests parsing of join commands and kubeconfig storage (including encrypt/update failures) for HA clusters.
internal/repositories/k8s/export_test.go Exposes selected internal provisioner methods for testing (ExportEnsureClusterSecurityGroup, ExportCreateNode).
internal/platform/metrics_test.go Extends metrics smoke test to cover new storage-related Prometheus metrics.
internal/handlers/tenant_handler_test.go Refactors tenant handler tests to use route constants and adds more targeted test names while exercising error paths.
internal/handlers/lifecycle_handler_test.go New tests for lifecycle HTTP handler create/list/delete flows with success and error conditions.
internal/handlers/lifecycle_handler.go Lifecycle handler implementation using LifecycleService and shared HTTP utility helpers; unchanged logic now covered by tests.
internal/handlers/cluster_handler.go Moves defaulting of cluster version/worker count out of handler into service layer (comment updated accordingly).
internal/core/services/vpc_test.go Expands VPC service tests to cover DB rollback on bridge creation failures, default CIDR behavior, delete logic on various error conditions, and delete-by-name.
internal/core/services/vpc.go Adds TenantID propagation from context into created domain.VPC.
internal/core/services/volume_test.go Adds tests for volume creation storage/repo error handling, delete edge cases (storage error but continuing, repo errors, name-based deletion), and release-volumes error/partial-failure handling.
internal/core/services/volume.go (unchanged shown, but exercised) Behavior around storage/delete rollback, repo errors, and release-of-volumes now verified by tests.
internal/core/services/subnet_test.go Adds error-path tests for subnet creation (repo/vpc errors, invalid CIDRs) and delete/get-by-name behavior.
internal/core/services/stack_test.go Adds rollback tests verifying deletion of all resource types and behavior when rollback fails or listing resources errors.
internal/core/services/security_group_test.go Adds test verifying detach-from-instance removes network flow rules as well as repo bindings.
internal/core/services/security_group.go Propagates TenantID from context into newly created security groups.
internal/core/services/secret_test.go Adds encrypt/decrypt round-trip and invalid ciphertext tests, plus ctor behavior for default/short keys.
internal/core/services/rbac_cached_test.go Adds cache-miss and error-path tests for cached RBAC service HasPermission.
internal/core/services/lb_worker_test.go Adds tests for isPortOpen via a mock PortDialer, target health updates, and real dialer timeout behavior.
internal/core/services/lb_worker.go Uses injectable PortDialer (realDialer in production) to facilitate testing isPortOpen and health checks.
internal/core/services/instance_test.go Adds tests for instance provisioning network errors (status updates), metrics updates on terminate, and Exec behavior in normal and error cases.
internal/core/services/instance_internal_test.go Adds internal tests for resolving and updating volumes after instance launch, using injected logger and volume repo mock.
internal/core/services/instance.go Propagates TenantID from context into new instances at launch time.
internal/core/services/encryption_svc_test.go Refactors encryption service tests to reinitialize per subtest; adds RotateKey, repo-error, decrypt-invalid-data, SaveKey error, and invalid-constructor tests.
internal/core/services/encryption_svc.go Adjusts getAndDecryptDEK signature to ignore version ID parameter (still using latest key via repo), with tests covering error paths and malformed ciphertext.
internal/core/services/database_test.go Adds database service tests for VPC-aware creation, invalid engines, VPC resolution errors, and MySQL engine variant.
internal/core/services/cluster_test.go Adds tests for ClusterService create failure modes, listing, kubeconfig retrieval (admin/viewer), repair, scale, health, secret rotation, and restore preconditions.
internal/core/services/cluster_lifecycle_test.go Extends cluster lifecycle tests for upgrade validations, enqueue failures, backup when not running, and failure handling.
internal/core/services/cluster.go Moves defaulting of Kubernetes version and worker count into service-level CreateCluster.
internal/core/services/cache_test.go Adds cache service tests for VPC-aware creation and VPC resolution errors.
internal/core/services/lb_worker.go (already covered) Health-check logic now explicitly testable via injected dialer.
internal/core/services/autoscaling_worker_test.go Adds tests for cleanup of groups with instances, failure backoff behavior, bound adjustments for desired counts, and more nuanced Evaluate paths.
internal/core/services/autoscaling_worker.go Existing logic for failure backoff and desired-count normalization is now exercised by new tests.
internal/core/context/context_test.go Adds tests for invalid-type entries for user and tenant IDs.
internal/core/context/context.go Defines WithTenantID / TenantIDFromContext (already used in repos/services, now with extra tests).
internal/core/services/tenant_test.go Significantly expands tenant service tests across creation flows, slug clashes, membership management, default tenant behavior, invites, switching, quota checks, and error paths.
internal/core/services/stack.go (existing) Rollback handling is now tested for multi-type resource deletion and list-failure behavior.
internal/core/services/database.go (existing) VPC-aware DB creation logic is now exercised by tests.
internal/api/setup/router_test.go Refactors OVS health check tests to use a shared path constant and adds more descriptive test names.
cmd/cloud/storage.go Extracts "CREATED AT" header label into a constant and refactors storage versioning subcommand status parsing to a switch for clarity.
Makefile Adds TEST_TIMEOUT variable and test-e2e target running go test against ./tests/... with configurable timeout.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
internal/repositories/k8s/node_executor.go (1)

76-93: Use the timeout context for readiness checks to avoid blocking past deadline.

ctxTimeout is created but e.Run uses the parent ctx, so a blocking Exec can outlive the timeout and stall the loop. Pass ctxTimeout to all readiness checks so the timeout is enforced. As per coding guidelines, propagate context.Context to all blocking calls.

🛠️ Proposed fix
-	// Check immediately once
-	if _, err := e.Run(ctx, "echo ready"); err == nil {
+	// Check immediately once
+	if _, err := e.Run(ctxTimeout, "echo ready"); err == nil {
 		return nil
 	}
@@
-		case <-ticker.C:
-			_, err := e.Run(ctx, "echo ready")
+		case <-ticker.C:
+			_, err := e.Run(ctxTimeout, "echo ready")
 			if err == nil {
 				return nil
 			}
 		}
 	}
internal/core/services/instance.go (1)

92-97: Enforce tenant context before persisting instance

Line 96 accepts TenantIDFromContext(ctx), which returns uuid.Nil when missing/invalid; that can silently create unscoped instances. Consider rejecting the request when tenant id is absent.

🔧 Suggested fix
-	// 2. Create domain entity
-	inst := &domain.Instance{
-		ID:        uuid.New(),
-		UserID:    appcontext.UserIDFromContext(ctx),
-		TenantID:  appcontext.TenantIDFromContext(ctx),
+	// 2. Create domain entity
+	tenantID := appcontext.TenantIDFromContext(ctx)
+	if tenantID == uuid.Nil {
+		return nil, errors.New(errors.InvalidInput, "tenant id missing from context")
+	}
+	inst := &domain.Instance{
+		ID:        uuid.New(),
+		UserID:    appcontext.UserIDFromContext(ctx),
+		TenantID:  tenantID,
 		Name:      name,
 		Image:     image,
internal/core/services/vpc.go (1)

75-86: Guard against missing TenantID when creating VPCs.

If TenantIDFromContext(ctx) returns uuid.Nil, the VPC will be created without tenant isolation, undermining multi-tenancy guarantees. Add a guard at the service layer:

🔒 Proposed guard for missing TenantID
+	tenantID := appcontext.TenantIDFromContext(ctx)
+	if tenantID == uuid.Nil {
+		return nil, errors.New(errors.Unauthorized, "tenant id required in context")
+	}
 	vpc := &domain.VPC{
 		ID:        vpcID,
 		UserID:    userID,
-		TenantID:  appcontext.TenantIDFromContext(ctx),
+		TenantID:  tenantID,
 		Name:      name,
 		CIDRBlock: cidrBlock,
tests/security_edge_test.go (1)

61-63: Handle JSON decode error to avoid misleading test failures.

If the decode fails (e.g., server returns unexpected response), instID will be empty, and the subsequent cross-tenant test will fail with an unrelated error message.

Suggested fix
 var w Wrapper
-_ = json.NewDecoder(respA.Body).Decode(&w)
+if err := json.NewDecoder(respA.Body).Decode(&w); err != nil {
+	t.Fatalf("Failed to decode response: %v", err)
+}
 instID := w.Data.ID
+if instID == "" {
+	t.Fatal("Instance ID not returned from create")
+}
tests/compute_e2e_test.go (1)

66-85: Close response bodies before failing the decode.

If decoding fails, the body isn’t closed because require.NoError aborts immediately. Close the body before asserting to avoid leaks even on error paths.

🐛 Suggested fix
-			require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
-			resp.Body.Close()
+			err := json.NewDecoder(resp.Body).Decode(&res)
+			resp.Body.Close()
+			require.NoError(t, err)
🤖 Fix all issues with AI agents
In `@internal/core/services/cache_test.go`:
- Around line 134-144: The cache creation flow in svc.CreateCache currently
calls repo.Create before resolving the VPC (resolveNetworkID), which can leave
an orphaned Cache record if VPC lookup fails; fix by either (A) moving VPC
validation (resolveNetworkID/VPCRepo.GetByID) to occur before calling
repo.Create so no record is created unless the VPC exists, or (B) if you must
create first, add compensating cleanup: on resolveNetworkID or VPCRepo.GetByID
error call repo.Delete (or equivalent) to remove the created *domain.Cache
(mirror the existing cleanup used on container launch failure), and ensure
errors are returned after cleanup; update tests accordingly (e.g.,
TestCreateCacheVpcError) to expect no orphaned record.

In `@internal/core/services/cluster.go`:
- Around line 63-69: Replace the magic literals in cluster setup with named
constants: define constants like defaultVersion = "v1.29.0" and defaultWorkers =
2 (e.g., at top of the file or in an appropriate constants block) and use those
constants instead of the hard-coded strings/numbers when setting params.Version
and params.Workers in the cluster initialization code; update any related
comments to reference the constants and ensure imports/build compile if you
place them in a different package or file.

In `@internal/handlers/lifecycle_handler_test.go`:
- Around line 50-60: The setupLifecycleHandlerTest helper does not assert mock
expectations, allowing false positives; update setupLifecycleHandlerTest to
register a cleanup that calls svc.AssertExpectations(t) (use t.Cleanup to run
it) so mockLifecycleService expectations are verified after each test, and
refactor TestLifecycleHandlerCreateRule into a table-driven test using subtests
to consolidate cases while relying on the updated setupLifecycleHandlerTest for
automatic expectation assertion.

In `@internal/repositories/k8s/provisioner_all_test.go`:
- Around line 46-54: The test registers mocked calls on sgSvc in the
"SecurityGroupOps" subtest (prepared by prepareProvisioner) but never asserts
the mock expectations, so .Once() and other expectations are not enforced; after
invoking p.ExportEnsureClusterSecurityGroup(ctx, cluster) and asserting no
error, call sgSvc.AssertExpectations(t) (or the appropriate testify mock
assertion method) to enforce the mocked call expectations; apply the same
pattern to the other subtests that set expectations (use each mock instance's
AssertExpectations(t) at the end of its subtest).

In `@internal/storage/node/rpc_test.go`:
- Around line 85-99: The test TestRPCServerStoreError currently ignores the
error returned by NewLocalStore; update the setup to check that NewLocalStore
succeeded by capturing its error and calling require.NoError(t, err) (or
require.NoErrorf) before using the store, so replace the blank identifier with a
proper err variable and assert no error from NewLocalStore and then proceed to
call NewRPCServer(store, nil).
- Around line 101-112: The test TestRPCServerDeleteMissing is discarding the
error returned by NewLocalStore (store, _ := NewLocalStore(tmpDir)); change this
to capture and check the error (e.g., store, err := NewLocalStore(tmpDir) and
require.NoError(t, err) or t.Fatalf on err) before using store to construct
NewRPCServer so the test fails loudly on setup errors rather than silently
proceeding with a nil/invalid store.

In `@internal/workers/metrics_worker_test.go`:
- Around line 87-108: The test TestMetricsCollectorWorker_Run uses fixed magic
durations (interval 10ms and time.Sleep(35ms) ) which makes it flaky; change it
to poll with a timeout using named constants instead of a fixed sleep: define
consts like pollInterval and testTimeout, start worker with
MetricsCollectorWorker.Run, then loop/select on a time.After(testTimeout) and
time.Tick(pollInterval) (or use context.WithTimeout) checking
fakeStorageService.StatusCalls() until it becomes >0, cancel and wait; update
references to the interval literal and replace the single time.Sleep call in
TestMetricsCollectorWorker_Run and the assertion to use this deterministic
polling approach.

In `@pkg/sdk/iac_test.go`:
- Around line 141-163: TestClient_IacErrors currently repeats the same error
assertions for five API calls; convert it to a table-driven test by creating a
slice of test cases (each with a name and a function that invokes one of the
client methods and returns an error) and then iterate over them with t.Run,
calling the function and asserting error for each case. Keep the existing
httptest server and client creation (NewClient) intact, and reference the client
methods CreateStack, ListStacks, GetStack, DeleteStack, and ValidateTemplate in
the test-case wrappers so each case calls the appropriate method and returns its
error for assertion.

In `@pkg/sdk/vpc_test.go`:
- Around line 97-115: Refactor TestClient_VPCErrors into a table-driven test:
create the httptest.Server and client once (using NewClient), then define a
slice of test cases each with a name and a function/identifier for the API call
to exercise (ListVPCs, CreateVPC, GetVPC, DeleteVPC) plus any needed args (e.g.,
name/CIDR). Iterate over the cases with t.Run(name, func(t *testing.T){ ... })
and call the appropriate method inside the loop, asserting that an error is
returned for each case; this removes the duplicated server/client setup and
consolidates the four assertions into a single table-driven loop.

In `@pkg/sshutil/client_test.go`:
- Around line 77-97: The tests TestWaitForSSHTimeout,
TestWaitForSSHContextCanceled, and TestWaitForSSHHostWithoutPortTimeout
currently assert on error message substrings; instead, change them to assert
error types using errors.Is-compatible helpers: in TestWaitForSSHTimeout and
TestWaitForSSHHostWithoutPortTimeout replace the assert.Contains check with
assert.ErrorIs(t, err, context.DeadlineExceeded), and in
TestWaitForSSHContextCanceled replace the assert.Contains check with
assert.ErrorIs(t, err, context.Canceled); keep the rest of each test (client
creation, timeouts/cancel) unchanged so the wrapped ctx.Err() is verified by
type rather than string.

In `@tests/compute_e2e_test.go`:
- Line 123: The test currently allows http.StatusInternalServerError as an
acceptable response which hides server failures; update the assertion in
tests/compute_e2e_test.go that checks resp.StatusCode (the line with
assert.Contains(t, []int{...}, resp.StatusCode)) to remove
http.StatusInternalServerError so only expected successful/known-client outcomes
remain (e.g., http.StatusOK and http.StatusConflict) and re-run the test to
ensure failures surface correctly.
🟡 Minor comments (15)
internal/repositories/postgres/lifecycle_repo_unit_test.go-65-77 (1)

65-77: Handle error and verify mock expectations.

Same issues as other subtests.

Proposed fix
 	t.Run("Delete", func(t *testing.T) {
-		mock, _ := pgxmock.NewPool()
+		mock, err := pgxmock.NewPool()
+		assert.NoError(t, err)
 		defer mock.Close()
 		repo := NewLifecycleRepository(mock)
 		id := uuid.New()

 		mock.ExpectExec("DELETE FROM lifecycle_rules").
 			WithArgs(id, userID).
 			WillReturnResult(pgxmock.NewResult("DELETE", 1))

-		err := repo.Delete(ctx, id)
+		err = repo.Delete(ctx, id)
 		assert.NoError(t, err)
+		assert.NoError(t, mock.ExpectationsWereMet())
 	})

As per coding guidelines, silent failures should be avoided.

internal/repositories/postgres/lifecycle_repo_unit_test.go-26-47 (1)

26-47: Handle pgxmock.NewPool() error and verify mock expectations.

Two issues:

  1. The error from pgxmock.NewPool() is silently ignored, violating the coding guideline against blank identifier assignments.
  2. Missing mock.ExpectationsWereMet() call - without this, the test passes even if the expected SQL wasn't executed.
Proposed fix
 	t.Run("Create", func(t *testing.T) {
-		mock, _ := pgxmock.NewPool()
+		mock, err := pgxmock.NewPool()
+		assert.NoError(t, err)
 		defer mock.Close()
 		repo := NewLifecycleRepository(mock)
 		rule := &domain.LifecycleRule{
 			ID:             uuid.New(),
 			UserID:         userID,
 			BucketName:     bucketName,
 			Prefix:         testLifecyclePrefix,
 			ExpirationDays: 30,
 			Enabled:        true,
 			CreatedAt:      time.Now(),
 			UpdatedAt:      time.Now(),
 		}

 		mock.ExpectExec("INSERT INTO lifecycle_rules").
 			WithArgs(rule.ID, rule.UserID, rule.BucketName, rule.Prefix, rule.ExpirationDays, rule.Enabled, rule.CreatedAt, rule.UpdatedAt).
 			WillReturnResult(pgxmock.NewResult("INSERT", 1))

-		err := repo.Create(ctx, rule)
+		err = repo.Create(ctx, rule)
 		assert.NoError(t, err)
+		assert.NoError(t, mock.ExpectationsWereMet())
 	})

As per coding guidelines, silent failures should be avoided.

internal/repositories/postgres/lifecycle_repo_unit_test.go-49-63 (1)

49-63: Handle error, verify expectations, and assert returned field values.

Same issues as Create test, plus: the test only checks that rule is not nil but doesn't verify that the returned fields match the expected mock data.

Proposed fix
 	t.Run("Get", func(t *testing.T) {
-		mock, _ := pgxmock.NewPool()
+		mock, err := pgxmock.NewPool()
+		assert.NoError(t, err)
 		defer mock.Close()
 		repo := NewLifecycleRepository(mock)
 		id := uuid.New()

 		mock.ExpectQuery(selectLifecycleRules).
 			WithArgs(id, userID).
 			WillReturnRows(pgxmock.NewRows([]string{"id", "user_id", "bucket_name", "prefix", "expiration_days", "enabled", "created_at", "updated_at"}).
 				AddRow(id, userID, bucketName, testLifecyclePrefix, 30, true, time.Now(), time.Now()))

-		rule, err := repo.Get(ctx, id)
+		rule, err := repo.Get(ctx, id)
 		assert.NoError(t, err)
 		assert.NotNil(t, rule)
+		assert.Equal(t, id, rule.ID)
+		assert.Equal(t, userID, rule.UserID)
+		assert.Equal(t, bucketName, rule.BucketName)
+		assert.Equal(t, testLifecyclePrefix, rule.Prefix)
+		assert.Equal(t, 30, rule.ExpirationDays)
+		assert.True(t, rule.Enabled)
+		assert.NoError(t, mock.ExpectationsWereMet())
 	})
internal/core/services/lb_worker_test.go-294-294 (1)

294-294: Handle the error from conn.Close().

This violates the coding guideline: "Do not use silent failures - avoid blank identifier assignment like _ = someFunc()".

🔧 Proposed fix
 	conn, err := d.DialTimeout("tcp", "127.0.0.1:0", 10*time.Millisecond)
 	if err == nil {
-		_ = conn.Close()
+		if closeErr := conn.Close(); closeErr != nil {
+			t.Logf("failed to close connection: %v", closeErr)
+		}
 	}
 	assert.Error(t, err)

As per coding guidelines, avoid using the blank identifier for error assignment.

internal/core/services/lb_worker_test.go-26-26 (1)

26-26: Handle the error from conn2.Close().

This violates the coding guideline: "Do not use silent failures - avoid blank identifier assignment like _ = someFunc()". Even in test mocks, errors should be handled explicitly.

🔧 Proposed fix
 	conn1, conn2 := net.Pipe()
-	_ = conn2.Close()
+	if err := conn2.Close(); err != nil {
+		return nil, err
+	}
 	return conn1, nil

As per coding guidelines, avoid using the blank identifier for error assignment.

internal/repositories/postgres/tenant_repo_test.go-43-70 (1)

43-70: Move expectations verification to a deferred check at the end of the test

Calling ExpectationsWereMet() mid-test leaves the second expectation unverified. If the second GetByID() call fails to match the registered expectation, the test will pass silently. Use a single deferred check to verify all expectations.

Suggested fix
 func TestTenantRepo_GetByID(t *testing.T) {
 	mock, err := pgxmock.NewPool()
 	require.NoError(t, err)
 	defer mock.Close()
+	defer func() { assert.NoError(t, mock.ExpectationsWereMet()) }()
 
 	repo := NewTenantRepo(mock)
 	id := uuid.New()
 	now := time.Now()
 
 	mock.ExpectQuery(`SELECT id, name, slug, owner_id, plan, status, created_at, updated_at FROM tenants WHERE id = \$1`).
 		WithArgs(id).
 		WillReturnRows(pgxmock.NewRows([]string{"id", "name", "slug", "owner_id", "plan", "status", "created_at", "updated_at"}).
 			AddRow(id, "Tenant", "tenant", uuid.New(), "free", "active", now, now))
 
 	tenant, err := repo.GetByID(context.Background(), id)
 	assert.NoError(t, err)
 	assert.NotNil(t, tenant)
 	assert.Equal(t, id, tenant.ID)
-	assert.NoError(t, mock.ExpectationsWereMet())
 
 	mock.ExpectQuery(`SELECT id, name, slug, owner_id, plan, status, created_at, updated_at FROM tenants WHERE id = \$1`).
 		WithArgs(id).
 		WillReturnError(pgx.ErrNoRows)
 
 	tenant, err = repo.GetByID(context.Background(), id)
 	assert.Error(t, err)
 	assert.Nil(t, tenant)
 }
internal/repositories/postgres/cluster_repo_unit_test.go-53-55 (1)

53-55: Add mock expectation assertions to verify all expectations were met.

Each test case sets mock expectations via ExpectExec() and ExpectQuery(), but never validates them with mock.ExpectationsWereMet(). Without this assertion, tests can pass even if the repository skipped expected queries. Add the validation at the end of each test case.

Suggested fix
 		err := repo.Create(ctx, cluster)
 		assert.NoError(t, err)
+		assert.NoError(t, mock.ExpectationsWereMet())

Also applies to: 67-70, 91-93

internal/repositories/postgres/cluster_repo_unit_test.go-26-27 (1)

26-27: Capture and handle pgxmock.NewPool() errors instead of discarding them.

The error is discarded with the blank identifier, which violates the no-silent-failures guideline. If NewPool() fails, mock will be nil and cause a panic on subsequent operations.

Suggested fix
 import (
 	"context"
 	"testing"
 	"time"

 	"github.com/google/uuid"
 	"github.com/pashagolub/pgxmock/v3"
 	appcontext "github.com/poyrazk/thecloud/internal/core/context"
 	"github.com/poyrazk/thecloud/internal/core/domain"
 	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
 )
@@
-		mock, _ := pgxmock.NewPool()
+		mock, err := pgxmock.NewPool()
+		require.NoError(t, err)

Also applies to: 58-59, 73-74

internal/core/services/cluster_lifecycle_test.go-90-124 (1)

90-124: Add AssertExpectations calls and consider converting to table-driven tests

The three upgrade subtests (lines 93, 103, 118) and the backup subtest (line 176) set .Once() mock expectations but never call AssertExpectations, allowing false passes if the mocked calls don't execute. Also, these cases are repetitive and should be table-driven per the coding guidelines.

🔧 Suggested fix for missing assertions
 	t.Run("fails with invalid target version", func(t *testing.T) {
 		cluster.Status = domain.ClusterStatusRunning
 		cluster.Version = testK8sBase
 		repo.On("GetByID", mock.Anything, id).Return(cluster, nil).Once()
 
 		err := svc.UpgradeCluster(ctx, id, "not-a-version")
 		assert.Error(t, err)
 		assert.Contains(t, err.Error(), "invalid target version")
+		repo.AssertExpectations(t)
 	})
 
 	t.Run("allows upgrade with invalid current version", func(t *testing.T) {
 		cluster.Status = domain.ClusterStatusRunning
 		cluster.Version = "bad-current"
 		repo.On("GetByID", mock.Anything, id).Return(cluster, nil).Once()
 		repo.On("Update", mock.Anything, mock.MatchedBy(func(c *domain.Cluster) bool {
 			return c.Status == domain.ClusterStatusUpgrading
 		})).Return(nil).Once()
 		taskQueue.On("Enqueue", mock.Anything, "k8s_jobs", mock.MatchedBy(func(job domain.ClusterJob) bool {
 			return job.Type == domain.ClusterJobUpgrade && job.ClusterID == id && job.Version == testK8sVersion
 		})).Return(nil).Once()
 
 		err := svc.UpgradeCluster(ctx, id, testK8sVersion)
 		assert.NoError(t, err)
+		repo.AssertExpectations(t)
+		taskQueue.AssertExpectations(t)
 	})
 
 	t.Run("fails when enqueue fails", func(t *testing.T) {
 		cluster.Status = domain.ClusterStatusRunning
 		cluster.Version = testK8sBase
 		repo.On("GetByID", mock.Anything, id).Return(cluster, nil).Once()
 		repo.On("Update", mock.Anything, mock.Anything).Return(nil).Once()
 		taskQueue.On("Enqueue", mock.Anything, "k8s_jobs", mock.Anything).Return(assert.AnError).Once()
 
 		err := svc.UpgradeCluster(ctx, id, testK8sVersion)
 		assert.Error(t, err)
+		repo.AssertExpectations(t)
+		taskQueue.AssertExpectations(t)
 	})

Also applies to: line 176 (CreateBackup "fails when not running" test)

internal/core/services/rbac_cached_test.go-144-184 (1)

144-184: Add mock expectation assertions to ensure HasPermission is called.

The tests set .Once() expectations but don't assert them with mockSvc.AssertExpectations(t), so they can pass even if HasPermission isn't called. Add defer mockSvc.AssertExpectations(t) to both test functions to catch silent failures. Additionally, consider converting these two similar tests into a single table-driven test as per the coding guidelines for test files.

Suggested fix
 func TestCachedRBACService_HasPermissionCacheMiss(t *testing.T) {
 	mockSvc, cache, mr := setupCachedRBACTest(t)
 	defer mr.Close()
+	defer mockSvc.AssertExpectations(t)
 
 	logger := slog.New(slog.NewTextHandler(io.Discard, nil))
 	svc := services.NewCachedRBACService(mockSvc, cache, logger)
 
 func TestCachedRBACService_HasPermissionError(t *testing.T) {
 	mockSvc, cache, mr := setupCachedRBACTest(t)
 	defer mr.Close()
+	defer mockSvc.AssertExpectations(t)
 
 	logger := slog.New(slog.NewTextHandler(io.Discard, nil))
 	svc := services.NewCachedRBACService(mockSvc, cache, logger)
pkg/sshutil/client_test.go-276-297 (1)

276-297: Close the connection on ssh.NewServerConn failure.

When ssh.NewServerConn fails, the underlying net.Conn is not closed, leaking resources during stress test runs.

Suggested fix
 sshConn, chans, reqs, err := ssh.NewServerConn(conn, config)
 if err != nil {
+	_ = conn.Close()
 	return
 }
internal/handlers/cluster_handler_test.go-79-100 (1)

79-100: Remove no-op blank identifier assignments in mocks.

These _ = ... lines are redundant and violate the “no silent failures” guideline.

🧹 Proposed cleanup
 func (m *mockClusterService) UpgradeCluster(ctx context.Context, id uuid.UUID, version string) error {
-	_ = version
 	return m.Called(ctx, id, version).Error(0)
 }
 
 func (m *mockClusterService) RotateSecrets(ctx context.Context, id uuid.UUID) error {
-	_ = "action:rotate"
 	return m.Called(ctx, id).Error(0)
 }
 
 func (m *mockClusterService) RestoreBackup(ctx context.Context, id uuid.UUID, backupPath string) error {
-	_ = "restore-logic"
 	return m.Called(ctx, id, backupPath).Error(0)
 }

As per coding guidelines, avoid blank identifier assignments for silent failures.

internal/repositories/postgres/encryption_repo_unit_test.go-41-43 (1)

41-43: Same issue: handle error from pgxmock.NewPool().

Proposed fix
 	t.Run("GetKey", func(t *testing.T) {
-		mock, _ := pgxmock.NewPool()
+		mock, err := pgxmock.NewPool()
+		require.NoError(t, err)
 		defer mock.Close()
internal/repositories/postgres/encryption_repo_unit_test.go-22-24 (1)

22-24: Handle error from pgxmock.NewPool() instead of discarding it.

Per coding guidelines, silent failures via blank identifier assignment should be avoided. If the mock pool fails to create, the test should fail immediately rather than proceed with undefined behavior.

Proposed fix
 	t.Run("SaveKey", func(t *testing.T) {
-		mock, _ := pgxmock.NewPool()
+		mock, err := pgxmock.NewPool()
+		require.NoError(t, err)
 		defer mock.Close()

You'll also need to add require to the imports:

import (
	"github.com/stretchr/testify/require"
)
internal/repositories/k8s/provisioner_all_test.go-101-103 (1)

101-103: Guard the matcher against empty args to avoid panic.
args[len(args)-1] will panic if the slice is empty.

🐛 Safer matcher
-		instSvc.On("Exec", ctx, masterID.String(), mock.MatchedBy(func(args []string) bool {
-			return args[len(args)-1] == expectedCmd
-		})).Return("kubeconfig-content", nil).Once()
+		instSvc.On("Exec", ctx, masterID.String(), mock.MatchedBy(func(args []string) bool {
+			return len(args) > 0 && args[len(args)-1] == expectedCmd
+		})).Return("kubeconfig-content", nil).Once()

Comment thread internal/core/services/cache_test.go
Comment thread internal/core/services/cluster.go
Comment thread internal/handlers/lifecycle_handler_test.go Outdated
Comment thread internal/repositories/k8s/provisioner_all_test.go
Comment thread internal/storage/node/rpc_test.go
Comment thread internal/workers/metrics_worker_test.go
Comment thread pkg/sdk/iac_test.go
Comment thread pkg/sdk/vpc_test.go Outdated
Comment thread pkg/sshutil/client_test.go
Comment thread tests/compute_e2e_test.go Outdated
- internal/core/services/cache.go: Validate VPC before creating cache record
- internal/core/services/cluster.go: Use constants for defaults
- internal/handlers/lifecycle_handler_test.go: Refactor to table-driven and assert mocks
- internal/repositories/k8s/provisioner_all_test.go: Assert mock expectations
- internal/storage/node/rpc_test.go: Check SetUp errors
- internal/workers/metrics_worker_test.go: Use deterministic polling
- pkg/sdk/iac_test.go: Refactor to table-driven
- pkg/sdk/vpc_test.go: Refactor to table-driven
- pkg/sshutil/client_test.go: Use assert.ErrorIs
- tests/compute_e2e_test.go: Tighten status code assertion
fix: Address QA findings and improve test reliability
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@internal/core/services/cache.go`:
- Around line 73-78: The call to s.resolveNetworkID must validate that a found
VPC has a non-empty NetworkID and return a clear service-layer error if
NetworkID == "" to avoid falling back to the default network; update
resolveNetworkID to check for empty string and return a typed/internal error
(e.g., using internal/errors.New or similar) when missing. After that, ensure
repository errors returned from s.repo.Create(...) are wrapped with
internal/errors (or the project's error-wrapping helper) before returning so
callers get contextualized errors from the cache creation flow; reference
resolveNetworkID for the validation and s.repo.Create for wrapping repository
errors.

In `@internal/handlers/lifecycle_handler_test.go`:
- Line 113: The test currently ignores errors from json.Marshal(tc.body) which
can hide failures; change the line to capture the error (e.g., bodyBytes, err :=
json.Marshal(tc.body)) and assert or fail on error (use t.Fatalf or your test
helper like require.NoError(t, err)) before using bodyBytes so the test fails
loudly if marshaling fails.

In `@internal/repositories/k8s/provisioner_all_test.go`:
- Around line 101-116: In the KubeConfig test's Exec matcher (inside the t.Run
"KubeConfig"), make the matcher defensive by checking args length before
indexing: update the mock.MatchedBy func used in instSvc.On("Exec", ...) to
return len(args) > 0 && args[len(args)-1] == expectedCmd so it guards against
empty slices when calling p.GetKubeconfig; this mirrors the defensive checks
used in the other matchers in this test file.

In `@internal/workers/metrics_worker_test.go`:
- Around line 146-165: Rename the test to reflect what it actually asserts or
change the code to expose the counted value; specifically either (A) rename
TestMetricsCollectorWorker_CollectMetricsCountsUpNodes to
TestMetricsCollectorWorker_CollectMetricsCallsGetClusterStatus and update the
test name string to match, leaving the call to worker.collectMetrics and the
existing assertion on fakeStorageService.StatusCalls(), or (B) modify
MetricsCollectorWorker.collectMetrics to return (or emit via a metric/log) the
computed upCount so the test can assert the counting logic (then update the test
to call the new return value and assert upCount == expected using the
fakeStorageService cluster with statuses "up","alive","down"); reference
MetricsCollectorWorker.collectMetrics, fakeStorageService
(StatusCalls/GetClusterStatus) and the test function name when applying the
change.
🧹 Nitpick comments (7)
internal/handlers/lifecycle_handler_test.go (1)

123-173: Consider refactoring to table-driven tests for consistency.

TestLifecycleHandlerListRules and TestLifecycleHandlerDeleteRule use sequential subtests with a shared mock, while TestLifecycleHandlerCreateRule correctly uses the table-driven pattern. Refactoring these to table-driven tests would:

  • Align with coding guidelines requiring table-driven tests
  • Ensure each subtest gets its own isolated mock instance via setupLifecycleHandlerTest(t)
  • Maintain consistency across the test file
Example refactor for TestLifecycleHandlerListRules
 func TestLifecycleHandlerListRules(t *testing.T) {
-	svc, handler, r := setupLifecycleHandlerTest(t)
-	r.GET("/storage/buckets/:bucket/lifecycle", handler.ListRules)
-
-	t.Run("Success", func(t *testing.T) {
-		svc.On("ListRules", mock.Anything, testBucketName).Return([]*domain.LifecycleRule{}, nil).Once()
-
-		w := httptest.NewRecorder()
-		req := httptest.NewRequest("GET", testLifecyclePath, nil)
-		r.ServeHTTP(w, req)
-
-		assert.Equal(t, http.StatusOK, w.Code)
-	})
-
-	t.Run("Error", func(t *testing.T) {
-		svc.On("ListRules", mock.Anything, testBucketName).Return(nil, errors.New(errors.Internal, "error")).Once()
-
-		w := httptest.NewRecorder()
-		req := httptest.NewRequest("GET", testLifecyclePath, nil)
-		r.ServeHTTP(w, req)
-
-		assert.Equal(t, http.StatusInternalServerError, w.Code)
-	})
+	tests := []struct {
+		name           string
+		setupMock      func(*mockLifecycleService)
+		expectedStatus int
+	}{
+		{
+			name: "Success",
+			setupMock: func(m *mockLifecycleService) {
+				m.On("ListRules", mock.Anything, testBucketName).Return([]*domain.LifecycleRule{}, nil).Once()
+			},
+			expectedStatus: http.StatusOK,
+		},
+		{
+			name: "Error",
+			setupMock: func(m *mockLifecycleService) {
+				m.On("ListRules", mock.Anything, testBucketName).Return(nil, errors.New(errors.Internal, "error")).Once()
+			},
+			expectedStatus: http.StatusInternalServerError,
+		},
+	}
+
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			svc, handler, r := setupLifecycleHandlerTest(t)
+			r.GET("/storage/buckets/:bucket/lifecycle", handler.ListRules)
+
+			tc.setupMock(svc)
+
+			w := httptest.NewRecorder()
+			req := httptest.NewRequest("GET", testLifecyclePath, nil)
+			r.ServeHTTP(w, req)
+
+			assert.Equal(t, tc.expectedStatus, w.Code)
+		})
+	}
 }
pkg/sshutil/client_test.go (3)

40-40: Remove unused constant testErrTimedOut.

This constant is no longer used after the error assertions were updated to use assert.ErrorIs with context.DeadlineExceeded and context.Canceled.

🧹 Suggested cleanup
 const (
 	testLocalhostSSH = "localhost:22"
 	testLoopbackAddr = "127.0.0.1:0"
 	testLocalhostIP  = "127.0.0.1"
-	testErrTimedOut  = "timed out"
 	testCmdEcho      = "echo hello"
 	testTmpFile      = "/tmp/test"
 	testScpCmdPrefix = "/usr/bin/scp -t "
 )

307-309: Consider logging or failing on unmarshal error in test mock.

If ssh.Unmarshal fails, payload.Command will be empty, potentially causing confusing test failures. While silent discards are pragmatic in test infrastructure, logging the error via t.Logf or failing explicitly would improve debuggability.


327-329: Consider using strings.HasPrefix for consistency.

The manual length check and slice comparison works but is less idiomatic. Since strings is already imported and used elsewhere (line 206), using strings.HasPrefix would be more consistent.

♻️ Suggested refactor
-	if len(cmd) >= len(testScpCmdPrefix) && cmd[:len(testScpCmdPrefix)] == testScpCmdPrefix {
+	if strings.HasPrefix(cmd, testScpCmdPrefix) {
 		return handleScp(ch)
 	}
internal/storage/node/rpc_test.go (2)

85-116: Consider table-driven coverage for the new RPC error cases.

These two tests share setup and could be expressed as a single table-driven test to reduce duplication and align with test guidelines. Based on learnings, use table-driven tests in *_test.go.


87-88: Unit tests should avoid real filesystem I/O.

NewLocalStore touches disk (os.MkdirAll + file writes). If these are intended as unit tests, consider injecting a mock store (testify/mock) or a lightweight in-memory stub to keep tests hermetic. Based on learnings, avoid real external dependencies in unit tests and use testify/mock.

Also applies to: 105-106

internal/repositories/k8s/provisioner_all_test.go (1)

30-45: Consider a table‑driven structure for the subtests.
This would reduce duplication and align with the repo’s testing conventions. As per coding guidelines, use table-driven tests in *_test.go.

Comment on lines +73 to +78
networkID, err := s.resolveNetworkID(ctx, vpcID)
if err != nil {
return nil, err
}

networkID, err := s.resolveNetworkID(ctx, vpcID)
if err != nil {
if err := s.repo.Create(ctx, cache); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the cache.go file and examine its structure
find . -name "cache.go" -type f | head -20

Repository: PoyrazK/thecloud

Length of output: 196


🏁 Script executed:

# Get the size of cache.go to determine how to read it
wc -l internal/core/services/cache.go

Repository: PoyrazK/thecloud

Length of output: 95


🏁 Script executed:

# Examine the full CreateCache method and resolveNetworkID implementation
sed -n '1,150p' internal/core/services/cache.go | cat -n

Repository: PoyrazK/thecloud

Length of output: 5202


🏁 Script executed:

# Search for resolveNetworkID function
rg -A 20 "func.*resolveNetworkID" internal/core/services/cache.go

Repository: PoyrazK/thecloud

Length of output: 741


🏁 Script executed:

# Check the internal/errors package structure
find . -path "*/internal/errors*" -type f | head -20

Repository: PoyrazK/thecloud

Length of output: 120


🏁 Script executed:

# Examine the errors package
cat internal/errors/errors.go

Repository: PoyrazK/thecloud

Length of output: 2886


🏁 Script executed:

# Search for VPC definition and NetworkID field
rg -B 5 -A 10 "type VPC struct" --

Repository: PoyrazK/thecloud

Length of output: 2676


🏁 Script executed:

# Look for existing tests for CreateCache/resolveNetworkID
find . -name "*cache*test*.go" -o -name "*test*cache*.go" | head -20

Repository: PoyrazK/thecloud

Length of output: 336


🏁 Script executed:

# Check the cache_test.go for tests covering resolveNetworkID and empty NetworkID
cat -n internal/core/services/cache_test.go | head -100

Repository: PoyrazK/thecloud

Length of output: 4408


🏁 Script executed:

# Search for test cases testing empty NetworkID or VPC without NetworkID
rg -A 10 -B 5 "NetworkID.*empty\|empty.*NetworkID\|resolveNetworkID" internal/core/services/cache_test.go

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Check if InvalidArgument or InvalidInput is used elsewhere
rg "InvalidArgument\|InvalidInput" internal/

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Look at the full VPC domain struct
cat -n internal/core/domain/vpc.go

Repository: PoyrazK/thecloud

Length of output: 1024


🏁 Script executed:

# Check for any validation on NetworkID in VPC creation or update flows
rg -B 5 -A 5 "NetworkID\s*==\s*\"\"\|NetworkID.*empty" internal/

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Check launchCacheContainer and how it uses networkID with compute backend
sed -n '119,145p' internal/core/services/cache.go | cat -n

Repository: PoyrazK/thecloud

Length of output: 1078


🏁 Script executed:

# Check the ports/compute backend interface to understand CreateInstance
rg -B 5 -A 15 "type CreateInstanceOptions\|CreateInstance.*func" internal/core/ports/

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# More comprehensive search in cache_test.go to find VPC-related tests
cat internal/core/services/cache_test.go | tail -200

Repository: PoyrazK/thecloud

Length of output: 6112


🏁 Script executed:

# Check if there are any VPC creation or validation flows that might validate NetworkID
rg -B 3 -A 8 "func.*CreateVPC\|VPC.*NetworkID" internal/core/services/

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Verify all error types available in internal/errors package
grep "Type =" internal/errors/errors.go

Repository: PoyrazK/thecloud

Length of output: 968


🏁 Script executed:

# Check if InvalidArgument is used anywhere in the codebase
rg "InvalidArgument" internal/

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Look at the complete errors.go to ensure we see all error types
cat internal/errors/errors.go

Repository: PoyrazK/thecloud

Length of output: 2886


Guard against empty VPC network IDs when a VPC is specified.

If a VPC exists but has an empty NetworkID, this flow will launch the cache in the default network, which can break tenant isolation. Please ensure resolveNetworkID validates that NetworkID is non-empty and returns a service-layer error when missing. Additionally, wrap repository errors using internal/errors as per the coding guidelines.

🔧 Suggested fix (in resolveNetworkID)
 func (s *CacheService) resolveNetworkID(ctx context.Context, vpcID *uuid.UUID) (string, error) {
 	if vpcID == nil {
 		return "", nil
 	}
 	vpc, err := s.vpcRepo.GetByID(ctx, *vpcID)
 	if err != nil {
 		s.logger.Error("failed to get VPC", "vpc_id", vpcID, "error", err)
-		return "", err
+		return "", errors.Wrap(errors.Internal, "failed to resolve VPC", err)
 	}
+	if vpc.NetworkID == "" {
+		return "", errors.New(errors.InvalidInput, "vpc has no network ID")
+	}
 	return vpc.NetworkID, nil
 }
🤖 Prompt for AI Agents
In `@internal/core/services/cache.go` around lines 73 - 78, The call to
s.resolveNetworkID must validate that a found VPC has a non-empty NetworkID and
return a clear service-layer error if NetworkID == "" to avoid falling back to
the default network; update resolveNetworkID to check for empty string and
return a typed/internal error (e.g., using internal/errors.New or similar) when
missing. After that, ensure repository errors returned from s.repo.Create(...)
are wrapped with internal/errors (or the project's error-wrapping helper) before
returning so callers get contextualized errors from the cache creation flow;
reference resolveNetworkID for the validation and s.repo.Create for wrapping
repository errors.


tc.setupMock(svc)

bodyBytes, _ := json.Marshal(tc.body)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid silent error handling.

The json.Marshal error is silently ignored with a blank identifier, which violates coding guidelines. While unlikely to fail for these test inputs, explicitly handling the error improves test reliability.

Proposed fix
-			bodyBytes, _ := json.Marshal(tc.body)
+			bodyBytes, err := json.Marshal(tc.body)
+			assert.NoError(t, err)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
bodyBytes, _ := json.Marshal(tc.body)
bodyBytes, err := json.Marshal(tc.body)
assert.NoError(t, err)
🤖 Prompt for AI Agents
In `@internal/handlers/lifecycle_handler_test.go` at line 113, The test currently
ignores errors from json.Marshal(tc.body) which can hide failures; change the
line to capture the error (e.g., bodyBytes, err := json.Marshal(tc.body)) and
assert or fail on error (use t.Fatalf or your test helper like
require.NoError(t, err)) before using bodyBytes so the test fails loudly if
marshaling fails.

Comment on lines +101 to +116
t.Run("KubeConfig", func(t *testing.T) {
p, instSvc, repo, _, _, _, _ := prepareProvisioner()
repo.On("GetNodes", ctx, clusterID).Return([]*domain.ClusterNode{masterNode}, nil)
instSvc.On("GetInstance", ctx, masterID.String()).Return(&domain.Instance{ID: masterID, PrivateIP: testIPMaster}, nil)

expectedCmd := "cat /etc/kubernetes/admin.conf"
instSvc.On("Exec", ctx, masterID.String(), mock.MatchedBy(func(args []string) bool {
return args[len(args)-1] == expectedCmd
})).Return("kubeconfig-content", nil).Once()

conf, err := p.GetKubeconfig(ctx, cluster, "admin")
assert.NoError(t, err)
assert.Equal(t, "kubeconfig-content", conf)
repo.AssertExpectations(t)
instSvc.AssertExpectations(t)
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the actual test file to confirm the code exists
cd /root && find . -name "provisioner_all_test.go" -type f | head -5

Repository: PoyrazK/thecloud

Length of output: 115


🏁 Script executed:

# Search for the test file from current directory
find . -name "provisioner_all_test.go" -type f

Repository: PoyrazK/thecloud

Length of output: 111


🏁 Script executed:

# Read the test file around lines 101-116
sed -n '95,120p' ./internal/repositories/k8s/provisioner_all_test.go

Repository: PoyrazK/thecloud

Length of output: 1025


🏁 Script executed:

# Now let's find the GetKubeconfig method implementation to understand what args it passes
grep -n "GetKubeconfig" ./internal/repositories/k8s/provisioner_all_test.go

Repository: PoyrazK/thecloud

Length of output: 117


🏁 Script executed:

# Find the provisioner implementation file
find . -name "provisioner*.go" -type f | grep -v test

Repository: PoyrazK/thecloud

Length of output: 102


🏁 Script executed:

# Find the GetKubeconfig method implementation
grep -n "GetKubeconfig" ./internal/repositories/k8s/provisioner.go

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Search more broadly for GetKubeconfig
rg "GetKubeconfig" --type go

Repository: PoyrazK/thecloud

Length of output: 4647


🏁 Script executed:

# Look at the GetKubeconfig implementation
sed -n '1,50p' ./internal/repositories/k8s/health.go | head -50

Repository: PoyrazK/thecloud

Length of output: 1288


🏁 Script executed:

# Search for GetKubeconfig in health.go
grep -n "func.*GetKubeconfig" ./internal/repositories/k8s/health.go

Repository: PoyrazK/thecloud

Length of output: 183


🏁 Script executed:

# Read the GetKubeconfig function implementation
sed -n '165,210p' ./internal/repositories/k8s/health.go

Repository: PoyrazK/thecloud

Length of output: 1462


🏁 Script executed:

# Let's find the executor implementation and see what arguments it passes to Exec
rg "type.*Executor" --type go -A 5

Repository: PoyrazK/thecloud

Length of output: 1241


🏁 Script executed:

# Look at ServiceExecutor.Run implementation
rg "func.*ServiceExecutor.*Run" --type go -A 15

Repository: PoyrazK/thecloud

Length of output: 1505


🏁 Script executed:

# Let's look at the prepareProvisioner function to understand how the mock is set up
sed -n '1,100p' ./internal/repositories/k8s/provisioner_all_test.go | tail -50

Repository: PoyrazK/thecloud

Length of output: 2244


🏁 Script executed:

# Let's see the full prepareProvisioner function to understand the mock setup
grep -n "func prepareProvisioner" ./internal/repositories/k8s/provisioner_all_test.go

Repository: PoyrazK/thecloud

Length of output: 242


🏁 Script executed:

# Let's read the prepareProvisioner function
sed -n '17,35p' ./internal/repositories/k8s/provisioner_all_test.go

Repository: PoyrazK/thecloud

Length of output: 853


🏁 Script executed:

# Let's verify the actual behavior by checking what args are passed in other similar matchers
sed -n '65,85p' ./internal/repositories/k8s/provisioner_all_test.go

Repository: PoyrazK/thecloud

Length of output: 1080


Make the matcher defensive by guarding against empty args.
The matcher accesses args[len(args)-1] without checking length. While the ServiceExecutor implementation currently ensures args has 3 elements (["/bin/sh", "-c", cmd]), the matcher code should be defensive. Other similar matchers in this same test file (lines 75-76, 81-82) already include the guard; this test should follow the same pattern.

🔧 Proposed fix
 instSvc.On("Exec", ctx, masterID.String(), mock.MatchedBy(func(args []string) bool {
-	return args[len(args)-1] == expectedCmd
+	return len(args) > 0 && args[len(args)-1] == expectedCmd
 })).Return("kubeconfig-content", nil).Once()
🤖 Prompt for AI Agents
In `@internal/repositories/k8s/provisioner_all_test.go` around lines 101 - 116, In
the KubeConfig test's Exec matcher (inside the t.Run "KubeConfig"), make the
matcher defensive by checking args length before indexing: update the
mock.MatchedBy func used in instSvc.On("Exec", ...) to return len(args) > 0 &&
args[len(args)-1] == expectedCmd so it guards against empty slices when calling
p.GetKubeconfig; this mirrors the defensive checks used in the other matchers in
this test file.

Comment on lines +146 to +165
func TestMetricsCollectorWorker_CollectMetricsCountsUpNodes(t *testing.T) {
svc := &fakeStorageService{clusterStatus: &domain.StorageCluster{
Nodes: []domain.StorageNode{
{Status: "up"},
{Status: "alive"},
{Status: "down"},
},
}}
worker := &MetricsCollectorWorker{
storageRepo: nil,
storageSvc: svc,
logger: slog.New(slog.NewTextHandler(io.Discard, nil)),
}

worker.collectMetrics(context.Background())

if svc.StatusCalls() != 1 {
t.Fatalf("expected GetClusterStatus to be called once")
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cd internal/workers && ast-grep --pattern $'func ($_* *MetricsCollectorWorker) collectMetrics($_) {
  $$$
}'

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

cd internal/workers && cat -n metrics_worker_test.go | head -200

Repository: PoyrazK/thecloud

Length of output: 5869


🏁 Script executed:

cd internal/workers && wc -l metrics_worker.go && head -50 metrics_worker.go

Repository: PoyrazK/thecloud

Length of output: 1217


🏁 Script executed:

cd internal/workers && tail -30 metrics_worker.go

Repository: PoyrazK/thecloud

Length of output: 1256


🏁 Script executed:

cd internal/workers && sed -n '146,165p' metrics_worker_test.go

Repository: PoyrazK/thecloud

Length of output: 585


🏁 Script executed:

cd internal/workers && grep -n "logger\|upNodes\|slog\|Fatalf\|Errorf\|Assert" metrics_worker_test.go | tail -20

Repository: PoyrazK/thecloud

Length of output: 468


🏁 Script executed:

cd internal/workers && grep -A 5 -B 5 "CountsUpNodes" metrics_worker_test.go

Repository: PoyrazK/thecloud

Length of output: 388


Test name does not match its assertions.

The test is named CountsUpNodes but only verifies that GetClusterStatus was called once—the counting logic itself is never validated. While the implementation does count nodes with status "up" or "alive", this count is stored in a local variable and never exposed (the metric export is commented out), making it impossible for the test to assert the counting behavior as the name suggests.

Rename the test to reflect what it actually verifies (e.g., TestMetricsCollectorWorker_CollectMetricsCallsGetClusterStatus), or refactor the implementation to expose the counting result so assertions can be added (e.g., via logging, metric export, or return value).

🤖 Prompt for AI Agents
In `@internal/workers/metrics_worker_test.go` around lines 146 - 165, Rename the
test to reflect what it actually asserts or change the code to expose the
counted value; specifically either (A) rename
TestMetricsCollectorWorker_CollectMetricsCountsUpNodes to
TestMetricsCollectorWorker_CollectMetricsCallsGetClusterStatus and update the
test name string to match, leaving the call to worker.collectMetrics and the
existing assertion on fakeStorageService.StatusCalls(), or (B) modify
MetricsCollectorWorker.collectMetrics to return (or emit via a metric/log) the
computed upCount so the test can assert the counting logic (then update the test
to call the new return value and assert upCount == expected using the
fakeStorageService cluster with statuses "up","alive","down"); reference
MetricsCollectorWorker.collectMetrics, fakeStorageService
(StatusCalls/GetClusterStatus) and the test function name when applying the
change.

Copilot AI review requested due to automatic review settings January 27, 2026 19:31
@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
6 Security Hotspots
5.7% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 87 out of 87 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

internal/repositories/k8s/node_executor.go:97

  • WaitForReady creates a timeout context (ctxTimeout) but still calls e.Run with the original ctx both in the initial readiness check and inside the polling loop. This means the timeout you set here won't actually bound the underlying Exec calls, and they can block longer than timeout. To honor the timeout, use ctxTimeout for the Run calls instead of ctx in this method.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@poyrazK poyrazK merged commit 1b72c76 into main Jan 27, 2026
30 of 31 checks passed
@poyrazK poyrazK deleted the feature/tenant-middleware-tests branch January 27, 2026 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants