
feat: add DNS service with PowerDNS integration and E2E tests (#41)

Merged
poyrazK merged 45 commits into main from feature/dns-service-e2e-tests
Jan 30, 2026
Conversation

@poyrazK
Owner

@poyrazK poyrazK commented Jan 30, 2026

Overview

This PR introduces a comprehensive DNS management service to The Cloud, integrated with PowerDNS. It allows users to manage DNS zones and records (A, AAAA, CNAME, MX, TXT) with full tenant isolation and automatic instance registration.

Key Changes

  • DNS Service Implementation: Core business logic for DNS zones and records.
  • PowerDNS Adapter: Integration with PowerDNS HTTP API for backend management.
  • Postgres Repositories: Persistent storage for DNS resources with multi-tenant scoping.
  • Automatic Instance Registration: Instances launched in a VPC with an associated DNS zone automatically get an 'A' record assigned.
  • Improved Compute Provisioning: Added retry logic to Docker IP allocation and tenant context propagation to workers to resolve 'instance not found' and race condition errors.
  • E2E Infrastructure: Added PowerDNS to docker-compose.yml and comprehensive E2E tests for the DNS lifecycle.

Testing

  • Unit tests for service, repository, and handlers.
  • E2E tests covering:
    • Zone management (CRUD).
    • Record management (CRUD for multiple types).
    • Instance auto-registration and cleanup.
    • DNS isolation between VPCs.

Summary by CodeRabbit

  • New Features

    • Managed private DNS: create/list/delete zones and records, automatic instance A‑record registration, API, CLI and SDK support.
  • Configuration

    • Added PowerDNS configuration options and a local PowerDNS service for development compose.
  • Bug Fixes

    • More robust instance IP retrieval with retries/backoff; DNS registration/cleanup integrated into instance lifecycle (non‑blocking).
  • Documentation

    • CloudDNS guides and updated API/Swagger docs.
  • Tests

    • Extensive unit, integration and end‑to‑end DNS test coverage.


…ations

- Database: Added multi-engine support, VPC integration, connection strings, metrics
- Cache: Added FlushCache, GetCacheStats, connection strings, ID/name lookup
- Functions: Added all 5 runtimes (nodejs, python, go, rust, java), ZIP support
- LoadBalancer: Added VPC-awareness, target management, cross-VPC validation, idempotency
- Secrets: Added per-user key derivation, access tracking, redaction, production safety
- IAM: Added full RBAC documentation with roles, permissions, binding
- Observability: Added Prometheus, Grafana, Dashboard service, event system
Copilot AI review requested due to automatic review settings January 30, 2026 10:28
@coderabbitai

coderabbitai Bot commented Jan 30, 2026

📝 Walkthrough


Adds a multi-tenant CloudDNS feature: PowerDNS integration, DNS domain models/ports, Postgres repo + migrations, PowerDNS adapter, service layer with instance auto-register/unregister, HTTP handlers/routes, CLI/SDK, extensive unit and e2e tests, DI wiring, and config/docker entries.

Changes

  • Configuration & Runtime (.env.example, docker-compose.yml, internal/platform/config.go):
    Add PowerDNS env vars and defaults; new powerdns service, volume, healthcheck; Config fields and env loading for PowerDNS endpoint/credentials.
  • Domain & Ports (internal/core/domain/dns.go, internal/core/ports/dns.go, internal/core/domain/jobs.go):
    Introduce DNS domain types (DNSZone, DNSRecord, RecordType, zone statuses), DNS ports/interfaces, and add TenantID to ProvisionJob.
  • PowerDNS Adapter & Dependency (internal/adapters/dns/powerdns.go, internal/adapters/dns/powerdns_test.go, go.mod):
    New PowerDNSBackend using go-powerdns and REST RRset PATCHing with unit tests; add go-powerdns dependency.
  • Database, Repo & Migrations (internal/repositories/postgres/dns_repo.go, internal/repositories/postgres/dns_repo_test.go, internal/repositories/postgres/migrations/073_create_dns_zones.*, internal/repositories/postgres/migrations/074_create_dns_records.*):
    Postgres DNSRepository implementing CRUD, tenant scoping, and indexes; migrations for dns_zones and dns_records; repo tests.
  • Service Layer & Instance Integration (internal/core/services/dns.go, internal/core/services/dns_test.go, internal/core/services/instance.go):
    DNSService implementation coordinating repo/backend/audit/events; instance provisioning now registers/unregisters DNS records; tenant-aware provisioning job wiring.
  • HTTP Handlers & Routing (internal/handlers/dns_handler.go, internal/handlers/dns_handler_test.go, internal/api/setup/router.go, internal/api/setup/dependencies.go):
    Add DNSHandler with CRUD endpoints under /dns, DI wiring in router/dependencies, auth/tenant middleware enforcement, and handler tests.
  • Mocks & Test Support (internal/core/ports/mocks/dns_mocks.go, internal/core/ports/mocks/common_mocks.go, pkg/testutil/constants.go):
    Testify mocks for DNSRepository/DNSBackend/DNSService and other common mocks; new test route constants for DNS endpoints.
  • CLI, SDK & Docs (cmd/cloud/dns.go, cmd/cloud/main.go, pkg/sdk/dns.go, docs/guides/dns.md, docs/services/cloud-dns.md, docs/FEATURES.md, docs/swagger/*):
    Add dns CLI subcommands, SDK client methods, user docs/guides, and Swagger/OpenAPI additions for DNS resources and request/response shapes.
  • E2E Tests & Helpers (tests/helpers_test.go, tests/dns_e2e_test.go, tests/dns_instance_e2e_test.go, tests/dns_isolation_e2e_test.go):
    Comprehensive E2E suites for DNS flows: create zones/records, instance auto-registration, tenant isolation; minor HTTP/health test tweaks.
  • Repository & Misc Wide Changes (internal/repositories/..., many pkg/*, internal/*, tests/*):
    New repo types and SQL usage; many unit/e2e tests added; widespread small stylistic/error-handling changes (ignored return values, defer-close patterns, table return-value discards).

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Handler as DNS Handler
    participant Service as DNS Service
    participant VpcRepo as VPC Repo
    participant DNSRepo as DNS Repo
    participant DNSBackend as PowerDNS Backend
    participant DB as PostgreSQL

    Client->>Handler: POST /dns/zones (vpcID, name, desc)
    Handler->>Service: CreateZone(vpcID, name, desc)
    Service->>VpcRepo: GetByID(vpcID)
    VpcRepo-->>Service: VPC
    Service->>DNSRepo: GetZoneByVPC(vpcID)
    DNSRepo->>DB: SELECT dns_zones WHERE vpc_id
    DB-->>DNSRepo: no rows
    DNSRepo-->>Service: not found
    Service->>DNSBackend: CreateZone(name, nameservers)
    DNSBackend-->>Service: success (powerdns_id)
    Service->>DNSRepo: CreateZone(zone entity)
    DNSRepo->>DB: INSERT dns_zones
    DB-->>DNSRepo: success
    Service-->>Handler: DNSZone
    Handler-->>Client: 201 Created
sequenceDiagram
    participant InstanceSvc as Instance Service
    participant DNSSvc as DNS Service
    participant VpcRepo as VPC Repo
    participant DNSRepo as DNS Repo
    participant DNSBackend as PowerDNS Backend

    InstanceSvc->>InstanceSvc: LaunchInstance completes (instance, ip)
    InstanceSvc->>DNSSvc: RegisterInstance(instance, ip)
    DNSSvc->>VpcRepo: GetByID(instance.VpcID)
    VpcRepo-->>DNSSvc: VPC
    DNSSvc->>DNSRepo: GetZoneByVPC(vpcID)
    DNSRepo-->>DNSSvc: DNSZone
    DNSSvc->>DNSBackend: AddRecords(zoneName, [A record])
    DNSBackend-->>DNSSvc: success
    DNSSvc->>DNSRepo: CreateRecord(metadata)
    DNSRepo-->>DNSSvc: success
    DNSSvc-->>InstanceSvc: OK

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Poem

🐇 I hopped through configs, zones, and code so green,
I nudged A records where private IPs are seen,
PowerDNS hums while instances find their name,
Tenants and zones aligned — no two the same,
Thump! A rabbit cheers for DNS, what a scene 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 14.56%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title clearly summarizes the main additions, which are the DNS service implementation, PowerDNS integration, and E2E tests.


Copilot AI left a comment


Pull request overview

This PR introduces a comprehensive DNS management service to The Cloud platform, integrated with PowerDNS for backend DNS resolution. It enables users to manage private DNS zones and records (A, AAAA, CNAME, MX, TXT) with full tenant isolation, and includes automatic DNS registration for instances launched within VPCs.

Changes:

  • Added DNS service with full CRUD operations for zones and records, integrated with PowerDNS via HTTP API
  • Implemented automatic instance DNS registration/unregistration when instances are launched/terminated in VPCs with associated DNS zones
  • Enhanced compute provisioning with retry logic for Docker IP allocation and proper tenant context propagation to background workers
  • Added comprehensive E2E tests covering zone management, record operations, instance auto-registration, and multi-tenant isolation

Reviewed changes

Copilot reviewed 45 out of 46 changed files in this pull request and generated 9 comments.

Show a summary per file
  • internal/core/domain/dns.go: New DNS domain models (DNSZone, DNSRecord, RecordType)
  • internal/core/domain/jobs.go: Added TenantID field to ProvisionJob for context propagation
  • internal/core/ports/dns.go: DNS service interfaces and port definitions
  • internal/core/services/dns.go: Core DNS service business logic with PowerDNS integration
  • internal/core/services/dns_test.go: Unit tests for DNS service operations
  • internal/core/services/instance.go: Integration with DNS service for automatic instance registration
  • internal/repositories/postgres/dns_repo.go: PostgreSQL repository for DNS persistence
  • internal/repositories/postgres/dns_repo_test.go: Unit tests for DNS repository
  • internal/repositories/postgres/migrations/073_create_dns_zones.*.sql: Migration for dns_zones table with tenant scoping
  • internal/repositories/postgres/migrations/074_create_dns_records.*.sql: Migration for dns_records table with zone relationships
  • internal/adapters/dns/powerdns.go: PowerDNS HTTP API adapter implementation
  • internal/adapters/dns/powerdns_test.go: Unit tests for PowerDNS adapter
  • internal/handlers/dns_handler.go: HTTP handlers for DNS API endpoints
  • internal/handlers/dns_handler_test.go: Handler unit tests
  • internal/handlers/helper.go: Consolidated error message constants
  • internal/workers/provision_worker.go: Added tenant context injection for provisioning jobs
  • internal/repositories/docker/adapter.go: Added retry logic for IP allocation (15-second timeout with 500ms intervals)
  • internal/platform/config.go: PowerDNS configuration parameters
  • internal/api/setup/dependencies.go: DNS service initialization and wiring
  • internal/api/setup/router.go: DNS API route registration
  • docker-compose.yml: PowerDNS service container configuration
  • .env.example: PowerDNS environment variable examples
  • tests/dns_e2e_test.go: E2E tests for DNS zone and record CRUD operations
  • tests/dns_isolation_e2e_test.go: E2E tests for cross-VPC DNS isolation
  • tests/dns_instance_e2e_test.go: E2E tests for automatic instance DNS registration
  • tests/helpers_test.go: Added putRequest helper and updated health check endpoint
  • pkg/testutil/constants.go: DNS route constants for testing
  • internal/core/ports/mocks/dns_mocks.go: Mock implementations for DNS interfaces
  • internal/core/ports/mocks/common_mocks.go: Mock implementations for VPC, Audit, and Event services
  • go.mod, go.sum: Added joeig/go-powerdns/v3 dependency
  • docs/FEATURES.md: Updated documentation with RDS, Cache, and Functions improvements


Comment on lines +262 to +285
```go
func (r *DNSRepository) UpdateRecord(ctx context.Context, record *domain.DNSRecord) error {
	record.UpdatedAt = time.Now()
	query := `
		UPDATE dns_records
		SET content = $2, ttl = $3, priority = $4, disabled = $5, updated_at = $6
		WHERE id = $1
	`
	_, err := r.db.Exec(ctx, query,
		record.ID, record.Content, record.TTL, record.Priority, record.Disabled, record.UpdatedAt,
	)
	if err != nil {
		return errors.Wrap(errors.Internal, "failed to update dns record", err)
	}
	return nil
}

func (r *DNSRepository) DeleteRecord(ctx context.Context, id uuid.UUID) error {
	query := `DELETE FROM dns_records WHERE id = $1`
	_, err := r.db.Exec(ctx, query, id)
	if err != nil {
		return errors.Wrap(errors.Internal, "failed to delete dns record", err)
	}
	return nil
}
```

Copilot AI Jan 30, 2026


The UpdateRecord and DeleteRecord methods lack tenant isolation checks. Records should only be updatable/deletable by the tenant that owns the zone. Consider joining with dns_zones table to filter by tenant_id, similar to the pattern needed for GetRecordByID.

Comment on lines +433 to +434
```go
	if s.dnsSvc != nil {
		_ = s.dnsSvc.UnregisterInstance(context.Background(), inst.ID)
```

Copilot AI Jan 30, 2026


Using context.Background() for DNS unregistration loses the tenant context from the original request. This could cause issues if GetRecordsByInstance or GetZoneByID require tenant filtering (which they should for security). Consider passing the context from the termination request instead, or ensuring the instance ID is sufficient for proper scoping.

Comment on lines +26 to +28
```go
// Log raw payload
// fmt.Printf("Enqueueing to %s: %s\n", queueName, string(data))
// We don't have a logger here easily, but we can rely on return being nil.
```

Copilot AI Jan 30, 2026


These commented-out debug lines should be removed before merging to production. If logging is needed here, consider using a proper logger instance instead of fmt.Printf.

Comment on lines +142 to +156
```go
func (r *DNSRepository) UpdateZone(ctx context.Context, zone *domain.DNSZone) error {
	zone.UpdatedAt = time.Now()
	query := `
		UPDATE dns_zones
		SET description = $2, status = $3, default_ttl = $4, updated_at = $5
		WHERE id = $1
	`
	_, err := r.db.Exec(ctx, query,
		zone.ID, zone.Description, zone.Status, zone.DefaultTTL, zone.UpdatedAt,
	)
	if err != nil {
		return errors.Wrap(errors.Internal, "failed to update dns zone", err)
	}
	return nil
}
```

Copilot AI Jan 30, 2026


The UpdateZone method should filter by tenant_id to prevent unauthorized updates across tenants. Add tenant filtering similar to how Delete operations are done in vpc_repo.go:90. The query should include AND tenant_id = $N where tenant_id is extracted from context.

Comment thread docker-compose.yml
Comment on lines +98 to +99
```yaml
      - POWERDNS_API_URL=http://powerdns:8081
      - POWERDNS_API_KEY=${POWERDNS_API_KEY:-thecloud-dns-secret}
```

Copilot AI Jan 30, 2026


The PowerDNS environment variables are added but the API service doesn't wait for PowerDNS to be ready. Consider adding a healthcheck to the powerdns service and including it in the API's depends_on with a health condition. Otherwise, the API might fail to connect to PowerDNS during startup, potentially causing initialization errors for the DNS service.

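A hedged sketch of the suggested compose wiring. It assumes curl exists in the PowerDNS image (many slim images omit it, so the probe command may need adjusting), that the API key matches the container's configured key, and that the dependent service is named api in this repo's docker-compose.yml:

```yaml
services:
  powerdns:
    # ...existing image/environment configuration...
    healthcheck:
      # Probe the PowerDNS REST API from inside the container.
      test: ["CMD-SHELL", "curl -fsS -H 'X-API-Key: thecloud-dns-secret' http://127.0.0.1:8081/api/v1/servers/localhost || exit 1"]
      interval: 5s
      timeout: 3s
      retries: 10

  api:
    depends_on:
      powerdns:
        condition: service_healthy
```

With condition: service_healthy, compose delays starting the API container until the healthcheck passes, avoiding the startup race the reviewer describes.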
Comment thread internal/repositories/docker/adapter.go Outdated
```go
	return net.IPAddress, nil
	// Check status
	if json.State != nil {
		// fmt.Printf("Container status: %s\n", json.State.Status)
```

Copilot AI Jan 30, 2026


This commented-out debug line should be removed before merging to production. If logging is needed for debugging IP allocation issues, consider using the adapter's logger instance which should be available.

Comment on lines +43 to +68
```go
func (r *DNSRepository) GetZoneByID(ctx context.Context, id uuid.UUID) (*domain.DNSZone, error) {
	query := `
		SELECT id, user_id, tenant_id, vpc_id, name, description,
		       status, default_ttl, powerdns_id, created_at, updated_at
		FROM dns_zones
		WHERE id = $1
	`
	// Typically we might also filter by tenant_id for security, but the service later should handle authz check or context-based filtering?
	// Existing repos often just get by ID. However, let's verify if we need to respect tenant scoping here.
	// For GetByID, usually we fetch the raw object and service checks permission.
	// But let's check if the pattern is to include tenant check.
	// Checking the implementation plan, it just says "Standard PostgreSQL repository".

	var z domain.DNSZone
	err := r.db.QueryRow(ctx, query, id).Scan(
		&z.ID, &z.UserID, &z.TenantID, &z.VpcID, &z.Name, &z.Description,
		&z.Status, &z.DefaultTTL, &z.PowerDNSID, &z.CreatedAt, &z.UpdatedAt,
	)
	if err != nil {
		if err == pgx.ErrNoRows {
			return nil, errors.New(errors.NotFound, "dns zone not found")
		}
		return nil, errors.Wrap(errors.Internal, "failed to get dns zone", err)
	}
	return &z, nil
}
```

Copilot AI Jan 30, 2026


The GetZoneByID method should filter by tenant_id for security and consistency with the rest of the codebase. Other repositories like VpcRepository (vpc_repo.go:40) and InstanceRepository (instance_repo.go:48) include tenant filtering in their GetByID methods to prevent cross-tenant data access. Add AND tenant_id = $2 to the WHERE clause and extract tenant_id from context similar to how it's done in GetZoneByName.

Comment on lines +186 to +205
```go
func (r *DNSRepository) GetRecordByID(ctx context.Context, id uuid.UUID) (*domain.DNSRecord, error) {
	query := `
		SELECT id, zone_id, name, type, content, ttl, priority,
		       disabled, auto_managed, instance_id, created_at, updated_at
		FROM dns_records
		WHERE id = $1
	`
	var rec domain.DNSRecord
	err := r.db.QueryRow(ctx, query, id).Scan(
		&rec.ID, &rec.ZoneID, &rec.Name, &rec.Type, &rec.Content, &rec.TTL, &rec.Priority,
		&rec.Disabled, &rec.AutoManaged, &rec.InstanceID, &rec.CreatedAt, &rec.UpdatedAt,
	)
	if err != nil {
		if err == pgx.ErrNoRows {
			return nil, errors.New(errors.NotFound, "dns record not found")
		}
		return nil, errors.Wrap(errors.Internal, "failed to get dns record", err)
	}
	return &rec, nil
}
```

Copilot AI Jan 30, 2026


The GetRecordByID method should verify tenant ownership of the record. While DNS records don't directly have a tenant_id column, they belong to zones which do. Consider joining with dns_zones to filter by tenant_id, or alternatively ensure the service layer performs this check before allowing operations on records. The current implementation could allow cross-tenant access to records if an attacker knows the record UUID.

Comment on lines +158 to +165
```go
func (r *DNSRepository) DeleteZone(ctx context.Context, id uuid.UUID) error {
	query := `DELETE FROM dns_zones WHERE id = $1`
	_, err := r.db.Exec(ctx, query, id)
	if err != nil {
		return errors.Wrap(errors.Internal, "failed to delete dns zone", err)
	}
	return nil
}
```

Copilot AI Jan 30, 2026


The DeleteZone method should filter by tenant_id to prevent unauthorized deletion across tenants. Follow the pattern from vpc_repo.go:88-91 which filters by tenant_id and checks RowsAffected to return an appropriate error if the zone doesn't exist or doesn't belong to the tenant.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 12

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
internal/repositories/docker/adapter.go (1)

29-35: ⚠️ Potential issue | 🟡 Minor

Replace magic numbers in retry policy with named constants.
This violates the no-magic-numbers guideline and makes tuning harder.

♻️ Proposed fix

```diff
 const (
 	// ImagePullTimeout is the maximum time allowed for pulling a Docker image
 	ImagePullTimeout = 5 * time.Minute
 	// DefaultOperationTimeout is the default timeout for Docker operations
 	DefaultOperationTimeout = 30 * time.Second
 	// tracerName is the name of the tracer for this package
 	tracerName = "docker-adapter"
+	// Instance IP retry policy
+	instanceIPMaxRetries   = 30
+	instanceIPRetryBackoff = 500 * time.Millisecond
 )
@@
-	for i := 0; i < 30; i++ {
+	for i := 0; i < instanceIPMaxRetries; i++ {
@@
-		case <-time.After(500 * time.Millisecond):
+		case <-time.After(instanceIPRetryBackoff):
 			continue
 		}
 	}
```

As per coding guidelines, do not use magic numbers - use named constants instead.

Also applies to: 444-469

internal/workers/provision_worker.go (1)

61-78: ⚠️ Potential issue | 🟠 Major

Guard against missing tenant_id in queued jobs; tests demonstrate the gap.

The test cases in internal/workers/provision_worker_test.go (lines 64–67, 77) create ProvisionJob without setting TenantID, which defaults to uuid.Nil. While the production enqueuer in internal/core/services/instance.go correctly populates TenantID, a defensive check prevents silent failures if other code paths bypass that validation.

🛡️ Suggested guard

```diff
 import (
 	"context"
 	"encoding/json"
 	"log/slog"
 	"sync"
 	"time"

+	"github.com/google/uuid"
 	appcontext "github.com/poyrazk/thecloud/internal/core/context"
 	"github.com/poyrazk/thecloud/internal/core/domain"
 	"github.com/poyrazk/thecloud/internal/core/ports"
 	"github.com/poyrazk/thecloud/internal/core/services"
 )
@@
 	// Inject User and Tenant IDs for repository access control
 	ctx = appcontext.WithUserID(ctx, job.UserID)
+	if job.TenantID == uuid.Nil {
+		w.logger.Error("missing tenant_id in provision job", "instance_id", job.InstanceID)
+		return
+	}
 	ctx = appcontext.WithTenantID(ctx, job.TenantID)
```
Also fix test cases to populate TenantID rather than relying on implicit zero values.

docs/FEATURES.md (1)

202-221: ⚠️ Potential issue | 🟡 Minor

Fix duplicated section numbering (13 repeated).

“Observability” is now section 13, but “CLI” still uses 13. Consider renumbering CLI and Console to keep sequencing consistent.

📝 Suggested update

```diff
-### 13. CLI (Command Line Interface)
+### 14. CLI (Command Line Interface)
@@
-### 14. Console (Frontend)
+### 15. Console (Frontend)
```
internal/core/services/instance.go (1)

415-435: ⚠️ Potential issue | 🟠 Major

Propagate context and handle DNS cleanup errors.

Using context.Background() drops cancellation/trace, and ignoring errors can leave stale DNS records.

✅ Propagate ctx + log failures

```diff
-	s.updateTerminationMetrics(inst)
+	s.updateTerminationMetrics(ctx, inst)
 ...
-func (s *InstanceService) updateTerminationMetrics(inst *domain.Instance) {
+func (s *InstanceService) updateTerminationMetrics(ctx context.Context, inst *domain.Instance) {
 	switch inst.Status {
 	case domain.StatusRunning:
 		platform.InstancesTotal.WithLabelValues("running", s.compute.Type()).Dec()
 	case domain.StatusStopped:
 		platform.InstancesTotal.WithLabelValues("stopped", s.compute.Type()).Dec()
 	}
 	platform.InstanceOperationsTotal.WithLabelValues("terminate", "success").Inc()

 	if s.dnsSvc != nil {
-		_ = s.dnsSvc.UnregisterInstance(context.Background(), inst.ID)
+		if err := s.dnsSvc.UnregisterInstance(ctx, inst.ID); err != nil {
+			s.logger.Warn("failed to unregister instance DNS", "instance_id", inst.ID, "error", err)
+		}
 	}
 }
```
As per coding guidelines: Propagate context.Context to all blocking calls; Do not use silent failures - avoid blank identifier assignment like `_ = someFunc()`.
🤖 Fix all issues with AI agents
In `@docker-compose.yml`:
- Around line 50-60: The PowerDNS admin REST API is exposed to the host via the
"8081:8081" port mapping and PDNS_AUTH_WEBSERVER_ALLOW_FROM=0.0.0.0/0; change
PDNS_AUTH_WEBSERVER_ADDRESS to 127.0.0.1 (or the service's internal IP) and
restrict PDNS_AUTH_WEBSERVER_ALLOW_FROM to 127.0.0.1/32 or your Docker network
CIDR, and remove or limit the host port mapping "8081:8081" (or replace it with
a Docker-internal-only mapping) so the API is not broadly accessible from the
host network.

In `@internal/adapters/dns/powerdns.go`:
- Around line 64-66: The code constructs soaContent using nameservers[0] without
checking that nameservers is non-empty; add a guard early (before building
soaContent or soaRecord) that checks len(nameservers) > 0 and return a clear
error (including the zoneName) if it's empty so the function does not panic.
Update the code paths around the nameservers, soaContent and soaRecord creation
to rely on this validation and ensure any callers receive a descriptive error
instead of allowing an index out-of-range panic.

In `@internal/api/setup/router.go`:
- Around line 178-192: The DNS route group currently only applies
httputil.Auth(services.Identity, services.Tenant); update the DNS group to also
enforce tenant scoping by adding httputil.RequireTenant() and
httputil.TenantMember(services.Tenant) (or svcs.Tenant if that pattern is used
elsewhere) to the middleware chain for the dns router so handlers like
handlers.DNS.CreateZone, ListZones, GetZone, DeleteZone and record handlers are
protected the same way as instances, VPCs, storage and volumes.

In `@internal/core/services/dns.go`:
- Around line 75-79: The current check ignores the error from
s.repo.GetZoneByVPC, so DB failures are treated as “no zone” and allow
duplicates; change the logic to capture the returned error (e.g., err :=
s.repo.GetZoneByVPC(...)), then: if err != nil and the error is not a canonical
NotFound from the repo, return or wrap that error (do not proceed), and only
when the repo signals NotFound treat existing as nil and continue; otherwise if
existing != nil return the Conflict error as before. Refer to GetZoneByVPC, the
existing variable, and s.repo to locate where to adjust error handling.
- Around line 329-376: RegisterInstance currently treats any error from
s.repo.GetZoneByVPC as “no zone” and hides real failures and also doesn't
compensate if s.repo.CreateRecord fails after s.backend.AddRecords; change
RegisterInstance to explicitly check for a NotFound/ErrNotFound from
s.repo.GetZoneByVPC and only return nil in that case, otherwise return or log
the real error; after calling s.backend.AddRecords, if s.repo.CreateRecord
fails, attempt to roll back the DNS change by calling the backend removal
function (e.g., s.backend.RemoveRecords or DeleteRecords with the same
zone.PowerDNSID and recordSet), log both the DB error and any rollback error,
and return a wrapped error instead of silently continuing so callers know the
operation failed.
- Around line 104-114: The rollback and audit calls currently swallow errors
(using `_ =`) after s.repo.CreateZone fails and after zone creation; update the
handlers for s.backend.DeleteZone(ctx, powerdnsZone) and s.auditSvc.Log(ctx,
userID, "dns.zone.create", ...) to surface failures instead of discarding them:
capture their returned error values, log them using the service logger or wrap
them into the returned error (e.g., include rollback or audit error context when
returning from the function), and ensure you do not silently ignore failures
from DeleteZone or Log so callers/ops can observe and act on these errors.
- Around line 379-396: The UnregisterInstance function currently swallows errors
from s.repo.GetRecordsByInstance, s.repo.GetZoneByID, s.backend.DeleteRecords
and uses `_ =` for DeleteRecords; instead return or surface failures: make
DNSService.UnregisterInstance return the initial repo error if
GetRecordsByInstance fails, when iterating call s.repo.GetZoneByID and if it
errors log (or collect) the error and continue, call s.backend.DeleteRecords and
if it returns an error add it to an error slice (do not assign to `_`) and
continue, then after the loop call s.repo.DeleteRecordsByInstance and if any
prior or this final call errored return a wrapped/aggregated error; ensure you
reference and update calls to s.repo.GetRecordsByInstance, s.repo.GetZoneByID,
s.backend.DeleteRecords, and s.repo.DeleteRecordsByInstance accordingly.
- Around line 163-239: CreateRecord currently adds records to the backend via
s.backend.AddRecords and then persists to DB with s.repo.CreateRecord; if the DB
save fails the backend entry is left orphaned. Add a compensation step: after
s.backend.AddRecords succeeds but s.repo.CreateRecord returns an error, call the
inverse operation (e.g., s.backend.DeleteRecords or s.backend.RemoveRecordSet)
with the same zone.PowerDNSID and recordSet (or the created record identifier)
to rollback the backend, log both the original DB error and the compensation
result, and if the compensation fails, ensure the error returned wraps both
failures and/or enqueue a cleanup job/outbox entry so the orphan can be
reconciled asynchronously. Ensure this logic lives inside CreateRecord and
treats deletion errors non-panicking while surfacing enough context for
operators.

In `@internal/handlers/storage_handler.go`:
- Around line 34-36: The build fails because errInvalidRequestBody is referenced
in CreateBucket and GeneratePresignedURL but not defined; restore the constant
or replace its uses with the literal. Add a constant like errInvalidRequestBody
= "invalid request body" (next to errInvalidUploadID) or update the CreateBucket
and GeneratePresignedURL callers to inline "invalid request body" so the
references to errInvalidRequestBody are eliminated.

In `@internal/platform/config.go`:
- Around line 64-66: The config currently defaults PowerDNSAPIKey via getEnv to
"thecloud-dns-secret"; change the loading logic for PowerDNSAPIKey so that when
APP_ENV == "production" (read via getEnv("APP_ENV", ...)) the code does not
accept the default and instead returns an error or panics if PowerDNSAPIKey is
empty or equals the insecure default; update the PowerDNSAPIKey initialization
(the symbol PowerDNSAPIKey) to validate and fail fast in production rather than
silently using the known default.
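The fail-fast check described above is small enough to sketch directly. The `insecureDefault` constant and `validatePowerDNSKey` helper are illustrative names, not the project's actual config API:

```go
package main

import (
	"errors"
	"fmt"
)

const insecureDefault = "thecloud-dns-secret" // known dev default; must never reach production

// validatePowerDNSKey rejects an empty or default API key when running in production,
// while still allowing the convenience default in dev and test environments.
func validatePowerDNSKey(appEnv, key string) error {
	if appEnv != "production" {
		return nil
	}
	if key == "" || key == insecureDefault {
		return errors.New("POWERDNS_API_KEY must be set to a non-default value in production")
	}
	return nil
}

func main() {
	fmt.Println(validatePowerDNSKey("development", insecureDefault)) // <nil>
	fmt.Println(validatePowerDNSKey("production", insecureDefault))  // error: must be set
	fmt.Println(validatePowerDNSKey("production", "s3cret-key"))     // <nil>
}
```

Calling this from the config loader and returning the error (rather than panicking) lets `main` decide how to fail, which keeps the loader testable.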

In `@internal/repositories/postgres/dns_repo.go`:
- Around line 43-68: GetZoneByID currently queries by id only which allows
cross-tenant access; extract the tenant ID from the request context (the same
context key or helper your project uses for tenant/auth info), validate it's
present, and add tenant_id to the WHERE clause (e.g. "WHERE id = $1 AND
tenant_id = $2") and pass the tenantID as the additional query parameter; apply
the same pattern to the other affected methods (GetZoneByVPC, Update, Delete) so
all repo lookups/mutations include tenant scoping using the context-derived
tenant ID.
- Around line 186-205: GetRecordByID and other DNS record operations lack tenant
scoping; update each record lookup/mutation to join dns_zones and filter by
tenant_id to prevent cross-tenant access. For GetRecordByID (and the other
functions referenced), add a tenantID uuid.UUID parameter to the function
signature, change the SQL to JOIN dns_zones z ON z.id = dns_records.zone_id and
add AND z.tenant_id = $2 (or the appropriate placeholder) in the WHERE clause,
pass tenantID into r.db.QueryRow/Exec calls, and propagate the signature change
to all callers; apply the same join+tenant_id filter pattern to the other record
functions (list/update/delete) in this file.
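The join-plus-filter shape the prompts above describe can be sketched as plain SQL. Table and column names follow the snippets above; the SELECT lists are illustrative, and `hasTenantFilter` is only a sanity-check helper for this sketch:

```go
package main

import (
	"fmt"
	"strings"
)

// Tenant-scoped lookups: zone queries filter by tenant_id directly, and record
// queries join dns_zones so the owning zone's tenant_id can be enforced.
const (
	getZoneByIDQuery = `
		SELECT id, tenant_id, vpc_id, name
		FROM dns_zones
		WHERE id = $1 AND tenant_id = $2`

	getRecordByIDQuery = `
		SELECT r.id, r.zone_id, r.name, r.type, r.content, r.ttl
		FROM dns_records r
		JOIN dns_zones z ON z.id = r.zone_id
		WHERE r.id = $1 AND z.tenant_id = $2`
)

// hasTenantFilter is a cheap check that a query binds the tenant ID as $2.
func hasTenantFilter(query string) bool {
	return strings.Contains(query, "tenant_id = $2")
}

func main() {
	// Both queries take the resource ID as $1 and the context-derived tenant ID as $2.
	fmt.Println(hasTenantFilter(getZoneByIDQuery), hasTenantFilter(getRecordByIDQuery)) // true true
}
```

Keeping the tenant predicate in SQL (rather than filtering rows in Go) means a forgotten check fails closed: a missing or wrong tenant ID simply returns no rows.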
🟡 Minor comments (11)
internal/repositories/docker/adapter.go-473-479 (1)

473-479: ⚠️ Potential issue | 🟡 Minor

Don’t ignore json.Marshal errors in the failure path.
Blank-identifier assignment violates the “no silent failures” guideline.

🛠️ Proposed fix
-	debugBytes, _ := encodingjson.Marshal(json.NetworkSettings)
-	return "", fmt.Errorf("no IP address found for container %s (Status: %s, Networks: %s) after retries", id, status, string(debugBytes))
+	debugBytes, marshalErr := encodingjson.Marshal(json.NetworkSettings)
+	if marshalErr != nil {
+		return "", fmt.Errorf("no IP address found for container %s (Status: %s, Networks: <marshal error: %v>) after retries", id, status, marshalErr)
+	}
+	return "", fmt.Errorf("no IP address found for container %s (Status: %s, Networks: %s) after retries", id, status, string(debugBytes))

As per coding guidelines, do not use silent failures - avoid blank identifier assignment like _ = someFunc().

tests/helpers_test.go-208-226 (1)

208-226: ⚠️ Potential issue | 🟡 Minor

Don’t ignore json.Marshal errors in putRequest.

Fail fast in tests to avoid sending malformed payloads.

✅ Safer error handling
-	if payload != nil {
-		b, _ := json.Marshal(payload)
-		body = bytes.NewBuffer(b)
-	}
+	if payload != nil {
+		b, err := json.Marshal(payload)
+		require.NoError(t, err)
+		body = bytes.NewBuffer(b)
+	}
As per coding guidelines: Do not use silent failures - avoid blank identifier assignment like `_ = someFunc()`.
go.mod-89-89 (1)

89-89: ⚠️ Potential issue | 🟡 Minor

Verify that your target PowerDNS version is compatible with v3.20.0.

The library targets PowerDNS Authoritative Server HTTP API v1 and supports PowerDNS 4.8, 4.9, and 5.0. If your deployment uses an older version, verify compatibility before upgrading. No security advisories were identified for the library itself.

.env.example-35-38 (1)

35-38: ⚠️ Potential issue | 🟡 Minor

Fix dotenv key ordering to satisfy linter.

The dotenv-linter warns that POWERDNS_API_KEY should appear before POWERDNS_API_URL.

✅ Suggested reorder
-POWERDNS_API_URL=http://localhost:8081
-POWERDNS_API_KEY=thecloud-dns-secret
+POWERDNS_API_KEY=thecloud-dns-secret
+POWERDNS_API_URL=http://localhost:8081
 POWERDNS_SERVER_ID=localhost
internal/adapters/dns/powerdns_test.go-27-29 (1)

27-29: ⚠️ Potential issue | 🟡 Minor

Add error handling for JSON decoding in test handlers.

The json.NewDecoder(r.Body).Decode(&reqBody) calls silently discard errors. If the request body is malformed or empty, the test could pass incorrectly.

🔧 Suggested fix
-var reqBody map[string]interface{}
-json.NewDecoder(r.Body).Decode(&reqBody)
-assert.Equal(t, testPDNSZone, reqBody["name"])
+var reqBody map[string]interface{}
+err := json.NewDecoder(r.Body).Decode(&reqBody)
+assert.NoError(t, err)
+assert.Equal(t, testPDNSZone, reqBody["name"])

Also applies to: 50-52

tests/dns_e2e_test.go-107-110 (1)

107-110: ⚠️ Potential issue | 🟡 Minor

Don’t ignore io.ReadAll errors in failure diagnostics.

If ReadAll fails, the failure message can be misleading.

✅ Safer error handling
-					body, _ := io.ReadAll(resp.Body)
+					body, err := io.ReadAll(resp.Body)
+					require.NoError(t, err)
As per coding guidelines: Do not use silent failures - avoid blank identifier assignment like `_ = someFunc()`.
tests/dns_isolation_e2e_test.go-70-73 (1)

70-73: ⚠️ Potential issue | 🟡 Minor

Avoid modulo-based suffixes to reduce name collisions.

With parallel E2E runs, %1000 can collide and make tests flaky. Prefer full UnixNano() or a UUID.

✅ Safer suffix
-		"name":       fmt.Sprintf("%s-%d", name, time.Now().UnixNano()%1000),
+		"name":       fmt.Sprintf("%s-%d", name, time.Now().UnixNano()),
tests/dns_instance_e2e_test.go-67-68 (1)

67-68: ⚠️ Potential issue | 🟡 Minor

Avoid modulo-based suffixes for instance names.

%1000 can collide across parallel tests. Prefer full UnixNano() or UUID.

✅ Safer suffix
-	instanceName := fmt.Sprintf("web-%d", time.Now().UnixNano()%1000)
+	instanceName := fmt.Sprintf("web-%d", time.Now().UnixNano())
tests/dns_e2e_test.go-26-48 (1)

26-48: ⚠️ Potential issue | 🟡 Minor

Avoid modulo-based suffixes for VPC/zone names.

The %1000000 / %1000 suffixes can collide in parallel runs. Prefer full UnixNano() or UUIDs.

✅ Safer suffix
-	vpcName := fmt.Sprintf("dns-e2e-vpc-%d", time.Now().UnixNano()%1000000)
+	vpcName := fmt.Sprintf("dns-e2e-vpc-%d", time.Now().UnixNano())
...
-	zoneName := fmt.Sprintf("e2e-%d.internal", time.Now().UnixNano()%1000)
+	zoneName := fmt.Sprintf("e2e-%d.internal", time.Now().UnixNano())
internal/handlers/dns_handler_test.go-35-58 (1)

35-58: ⚠️ Potential issue | 🟡 Minor

Handle JSON marshal/unmarshal errors instead of ignoring them.

Currently _ ignores errors and can hide issues when payloads change.

✅ Safer error handling
-		body, _ := json.Marshal(reqBody)
+		body, err := json.Marshal(reqBody)
+		require.NoError(t, err)
...
-		dataJSON, _ := json.Marshal(resp.Data)
+		dataJSON, err := json.Marshal(resp.Data)
+		require.NoError(t, err)
-		json.Unmarshal(dataJSON, &response)
+		require.NoError(t, json.Unmarshal(dataJSON, &response))
As per coding guidelines: Do not use silent failures - avoid blank identifier assignment like `_ = someFunc()`.
internal/core/services/dns.go-249-276 (1)

249-276: ⚠️ Potential issue | 🟡 Minor

Apply the same TTL bounds on update as on create.

Line 262 only checks > 0; values above maxTTL aren’t clamped. Reuse the min/max rules to keep data consistent.

🔧 Suggested fix
 	// Update fields
 	record.Content = content
-	if ttl > 0 {
-		record.TTL = ttl
-	}
+	if ttl > 0 {
+		if ttl < minTTL {
+			ttl = minTTL
+		}
+		if ttl > maxTTL {
+			ttl = maxTTL
+		}
+		record.TTL = ttl
+	}
🧹 Nitpick comments (16)
internal/repositories/redis/task_queue.go (1)

26-28: Remove commented-out logging or wire a logger.

These commented lines are dead code; either inject a logger and log intentionally or delete them to keep the repository clean.

internal/core/domain/jobs.go (1)

10-13: Consider marking TenantID as omitempty if it can be absent.

This avoids emitting a zero UUID for older queued jobs or optional tenant context and makes intent explicit.

♻️ Suggested update
-	TenantID   uuid.UUID          `json:"tenant_id"`
+	TenantID   uuid.UUID          `json:"tenant_id,omitempty"`
As per coding guidelines: Include `json` tags on struct fields with `omitempty` for optional fields.
internal/workers/provision_worker.go (1)

70-74: Replace the hard-coded timeout with a named constant.

This makes the timeout intent clearer and avoids magic numbers.

♻️ Suggested refactor
+const provisionTimeout = 10 * time.Minute
+
 func (w *ProvisionWorker) processJob(job domain.ProvisionJob) {
 	// Root context for background task with 10-minute safety timeout
 	// We use context.Background() because the worker lifecycle context shouldn't necessarily cancel active provisioning unless the app is shutting down
 	baseCtx := context.Background()
-	ctx, cancel := context.WithTimeout(baseCtx, 10*time.Minute)
+	ctx, cancel := context.WithTimeout(baseCtx, provisionTimeout)
As per coding guidelines: Do not use magic numbers - use named constants instead.
docker-compose.yml (1)

47-47: Pin the PowerDNS image tag to a specific version.

Using :latest makes builds non-reproducible. Pin to powerdns/pdns-auth-49:4.9.12 (the current stable release) or later versions as needed to ensure consistent builds.

internal/repositories/postgres/migrations/073_create_dns_zones.up.sql (2)

15-18: Redundant index on vpc_id.

The UNIQUE(vpc_id) constraint on line 15 implicitly creates an index. The explicit CREATE INDEX idx_dns_zones_vpc ON dns_zones(vpc_id) on line 18 is redundant and adds unnecessary overhead during writes.

♻️ Suggested fix
     UNIQUE(tenant_id, name),
     UNIQUE(vpc_id)  -- One zone per VPC
 );

-CREATE INDEX idx_dns_zones_vpc ON dns_zones(vpc_id);
 CREATE INDEX idx_dns_zones_tenant ON dns_zones(tenant_id);

10-10: Consider adding a UNIQUE constraint on powerdns_id.

The powerdns_id column stores the identifier from the PowerDNS backend. Without a unique constraint, duplicate values could cause data synchronization issues between your database and PowerDNS.

🛡️ Suggested addition
     powerdns_id VARCHAR(255) NOT NULL,
+    UNIQUE(powerdns_id),
internal/repositories/postgres/dns_repo_test.go (1)

15-121: Consider using table-driven tests for better coverage.

The tests provide basic happy-path coverage. Per coding guidelines, table-driven tests are preferred. Consider consolidating these into table-driven structures to easily add error scenarios (e.g., not found, constraint violations).

♻️ Example table-driven structure for GetZoneByID
func TestDNSRepositoryGetZoneByID(t *testing.T) {
	tests := []struct {
		name      string
		setupMock func(mock pgxmock.PgxPoolIface, id uuid.UUID)
		wantErr   bool
	}{
		{
			name: "success",
			setupMock: func(mock pgxmock.PgxPoolIface, id uuid.UUID) {
				rows := pgxmock.NewRows([]string{"id", "user_id", "tenant_id", "vpc_id", "name", "description", "status", "default_ttl", "powerdns_id", "created_at", "updated_at"}).
					AddRow(id, uuid.New(), uuid.New(), uuid.New(), "example.com", "desc", "ACTIVE", 300, "pdns-1", time.Now(), time.Now())
				mock.ExpectQuery("SELECT (.+) FROM dns_zones WHERE id = \\$1").
					WithArgs(id).
					WillReturnRows(rows)
			},
			wantErr: false,
		},
		{
			name: "not found",
			setupMock: func(mock pgxmock.PgxPoolIface, id uuid.UUID) {
				mock.ExpectQuery("SELECT (.+) FROM dns_zones WHERE id = \\$1").
					WithArgs(id).
					WillReturnError(pgx.ErrNoRows)
			},
			wantErr: true,
		},
	}
	// ... run tests
}

As per coding guidelines: "Use table-driven tests in test files".

internal/adapters/dns/powerdns_test.go (1)

21-91: Consider adding error response test cases.

The tests cover successful operations but not error scenarios (e.g., 4xx/5xx responses from PowerDNS). Adding tests for API error handling would improve coverage.

tests/dns_isolation_e2e_test.go (1)

25-32: Use camelCase for local IDs.

vpc_A_ID / zone_A_ID should be camelCase (vpcAID, zoneAID) for Go style consistency.
As per coding guidelines: Use camelCase for variable names (e.g., instanceRepo, vpcID).

internal/handlers/dns_handler_test.go (1)

26-265: Consider table-driven tests to cut duplication.

Many handlers repeat the same setup/expectations; table-driven cases would be more maintainable and align with project testing rules.
As per coding guidelines: Use table-driven tests in test files.

tests/dns_instance_e2e_test.go (1)

154-168: Replace fixed sleep with polling for DNS cleanup.

A hard-coded 2s delay can cause flaky tests if cleanup is slower. Poll with a timeout instead.

🔁 Example polling pattern
-		// Wait a bit for async cleanup
-		time.Sleep(2 * time.Second)
-
-		// Verify record is gone
-		getRecordsResp := getRequest(t, client, fmt.Sprintf("%s/dns/zones/%s/records", testutil.TestBaseURL, zoneID), token)
-		defer getRecordsResp.Body.Close()
-
-		var recordsRes struct {
-			Data []domain.DNSRecord `json:"data"`
-		}
-		require.NoError(t, json.NewDecoder(getRecordsResp.Body).Decode(&recordsRes))
-
-		for _, rec := range recordsRes.Data {
-			assert.NotEqual(t, instanceName, rec.Name, "DNS record should have been cleaned up")
-		}
+		deadline := time.Now().Add(30 * time.Second)
+		for time.Now().Before(deadline) {
+			getRecordsResp := getRequest(t, client, fmt.Sprintf("%s/dns/zones/%s/records", testutil.TestBaseURL, zoneID), token)
+			var recordsRes struct {
+				Data []domain.DNSRecord `json:"data"`
+			}
+			require.NoError(t, json.NewDecoder(getRecordsResp.Body).Decode(&recordsRes))
+			getRecordsResp.Body.Close()
+
+			stillPresent := false
+			for _, rec := range recordsRes.Data {
+				if rec.Name == instanceName {
+					stillPresent = true
+					break
+				}
+			}
+			if !stillPresent {
+				break
+			}
+			time.Sleep(1 * time.Second)
+		}
internal/core/services/dns_test.go (1)

62-210: Consider consolidating cases into table-driven tests.

These tests repeat setup/expectation patterns across success and error cases; table-driven structure would reduce duplication.
As per coding guidelines: Use table-driven tests in test files.

internal/adapters/dns/powerdns.go (2)

22-38: Add explicit HTTP timeouts for PowerDNS calls.

Line 28-31 builds a Resty client without a timeout; if callers don’t set a deadline, requests can hang. Consider wiring a configurable timeout or custom http.Client in the constructor.

💡 Suggested timeout wiring
 import (
 	"context"
 	"fmt"
 	"log/slog"
 	"strings"
+	"time"
@@
 )
 
+const powerDNSHTTPTimeout = 10 * time.Second
+
@@
 	restyClient := resty.New().
 		SetBaseURL(strings.TrimSuffix(apiURL, "/")+"/api/v1").
 		SetHeader("X-API-Key", apiKey).
-		SetHeader("Content-Type", "application/json")
+		SetHeader("Content-Type", "application/json").
+		SetTimeout(powerDNSHTTPTimeout)

64-70: Extract SOA timing values into named constants.

Lines 65-70 embed DNS timing values and TTLs; naming them improves clarity and makes tuning safer.

♻️ Suggested refactor
-	soaContent := fmt.Sprintf("%s hostmaster.%s 1 10800 3600 604800 3600", nameservers[0], zoneName)
+	const (
+		soaSerial  = 1
+		soaRefresh = 10800
+		soaRetry   = 3600
+		soaExpire  = 604800
+		soaMinimum = 3600
+		soaTTL     = 3600
+	)
+	soaContent := fmt.Sprintf(
+		"%s hostmaster.%s %d %d %d %d %d",
+		nameservers[0], zoneName, soaSerial, soaRefresh, soaRetry, soaExpire, soaMinimum,
+	)
@@
-		TTL:     3600,
+		TTL:     soaTTL,
 	}

As per coding guidelines Do not use magic numbers - use named constants instead.

internal/handlers/dns_handler.go (1)

98-114: Use httputil.Success for consistent success responses.

Line 113 returns c.Status(204). If the API standard requires the shared response envelope, consider using httputil.Success (or extend it to support empty bodies) for consistency.

As per coding guidelines Use httputil.Success(c, http.StatusOK, data) for successful handler responses.

internal/core/services/dns.go (1)

25-56: Avoid package-level global state for the zone-name regex.

Line 25 declares a package-level var. To follow the no-globals rule and keep dependencies localized, consider moving the compiled regex into DNSService and initializing it in the constructor.

♻️ Suggested refactor
 type DNSService struct {
 	repo     ports.DNSRepository
 	backend  ports.DNSBackend
 	vpcRepo  ports.VpcRepository
 	auditSvc ports.AuditService
 	eventSvc ports.EventService
 	logger   *slog.Logger
+	zoneNameRegex *regexp.Regexp
 }
@@
 func NewDNSService(params DNSServiceParams) *DNSService {
 	return &DNSService{
 		repo:     params.Repo,
 		backend:  params.Backend,
 		vpcRepo:  params.VpcRepo,
 		auditSvc: params.AuditSvc,
 		eventSvc: params.EventSvc,
 		logger:   params.Logger,
+		zoneNameRegex: regexp.MustCompile(`^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$`),
 	}
 }

Then update CreateZone to use s.zoneNameRegex.MatchString(...).

As per coding guidelines Do not use global variables (e.g., var DB *sql.DB).

Comment thread docker-compose.yml
Comment on lines +50 to +60
    ports:
      - "5354:53/udp" # DNS queries (non-standard to avoid conflict)
      - "5354:53/tcp"
      - "8081:8081" # REST API
    environment:
      - PDNS_AUTH_API=yes
      - PDNS_AUTH_API_KEY=${POWERDNS_API_KEY:-thecloud-dns-secret}
      - PDNS_AUTH_WEBSERVER=yes
      - PDNS_AUTH_WEBSERVER_ADDRESS=0.0.0.0
      - PDNS_AUTH_WEBSERVER_PORT=8081
      - PDNS_AUTH_WEBSERVER_ALLOW_FROM=0.0.0.0/0

⚠️ Potential issue | 🟠 Major

Lock down the PowerDNS API exposure.

The REST API is published on the host and PDNS_AUTH_WEBSERVER_ALLOW_FROM=0.0.0.0/0, which exposes the admin API broadly. Consider binding to localhost and narrowing allow-from to the Docker network or 127.0.0.1.

🔒 Safer local-only exposure
-      - "8081:8081" # REST API
+      - "127.0.0.1:8081:8081" # REST API (local-only)
@@
-      - PDNS_AUTH_WEBSERVER_ALLOW_FROM=0.0.0.0/0
+      - PDNS_AUTH_WEBSERVER_ALLOW_FROM=127.0.0.1/32
🤖 Prompt for AI Agents
In `@docker-compose.yml` around lines 50 - 60, The PowerDNS admin REST API is
exposed to the host via the "8081:8081" port mapping and
PDNS_AUTH_WEBSERVER_ALLOW_FROM=0.0.0.0/0; change PDNS_AUTH_WEBSERVER_ADDRESS to
127.0.0.1 (or the service's internal IP) and restrict
PDNS_AUTH_WEBSERVER_ALLOW_FROM to 127.0.0.1/32 or your Docker network CIDR, and
remove or limit the host port mapping "8081:8081" (or replace it with a
Docker-internal-only mapping) so the API is not broadly accessible from the host
network.

Comment on lines +64 to +66
	// Add an initial SOA record to satisfy PowerDNS requirements for Native zones
	soaContent := fmt.Sprintf("%s hostmaster.%s 1 10800 3600 604800 3600", nameservers[0], zoneName)
	soaRecord := ports.RecordSet{

⚠️ Potential issue | 🟠 Major

Guard against empty nameserver slices.

Line 65 indexes nameservers[0] without checking length; an empty slice will panic. Validate and return a clear error.

🐛 Proposed fix
 	// Add an initial SOA record to satisfy PowerDNS requirements for Native zones
+	if len(nameservers) == 0 {
+		return fmt.Errorf("at least one nameserver is required to build SOA")
+	}
 	soaContent := fmt.Sprintf("%s hostmaster.%s 1 10800 3600 604800 3600", nameservers[0], zoneName)
🤖 Prompt for AI Agents
In `@internal/adapters/dns/powerdns.go` around lines 64 - 66, The code constructs
soaContent using nameservers[0] without checking that nameservers is non-empty;
add a guard early (before building soaContent or soaRecord) that checks
len(nameservers) > 0 and return a clear error (including the zoneName) if it's
empty so the function does not panic. Update the code paths around the
nameservers, soaContent and soaRecord creation to rely on this validation and
ensure any callers receive a descriptive error instead of allowing an index
out-of-range panic.

Comment thread internal/core/services/dns.go
Comment on lines +75 to +79
	// 3. Check if zone already exists for this VPC
	existing, _ := s.repo.GetZoneByVPC(ctx, vpcID)
	if existing != nil {
		return nil, errors.New(errors.Conflict, "zone already exists for this VPC")
	}

⚠️ Potential issue | 🟠 Major

Don’t ignore repository errors when checking for an existing zone.

Line 76 discards GetZoneByVPC errors; DB failures will be treated as “no zone,” enabling duplicate zone creation. Handle errors explicitly and only proceed on a genuine NotFound.

🤖 Prompt for AI Agents
In `@internal/core/services/dns.go` around lines 75 - 79, The current check
ignores the error from s.repo.GetZoneByVPC, so DB failures are treated as “no
zone” and allow duplicates; change the logic to capture the returned error
(e.g., err := s.repo.GetZoneByVPC(...)), then: if err != nil and the error is
not a canonical NotFound from the repo, return or wrap that error (do not
proceed), and only when the repo signals NotFound treat existing as nil and
continue; otherwise if existing != nil return the Conflict error as before.
Refer to GetZoneByVPC, the existing variable, and s.repo to locate where to
adjust error handling.

Comment on lines +104 to +114
	if err := s.repo.CreateZone(ctx, zone); err != nil {
		// Rollback PowerDNS
		_ = s.backend.DeleteZone(ctx, powerdnsZone)
		return nil, errors.Wrap(errors.Internal, "failed to save zone", err)
	}

	// 6. Audit log
	_ = s.auditSvc.Log(ctx, userID, "dns.zone.create", "dns_zone", zone.ID.String(), map[string]interface{}{
		"name":   name,
		"vpc_id": vpcID.String(),
	})

⚠️ Potential issue | 🟠 Major

Handle rollback/audit errors instead of dropping them.

Lines 105-114 use _ = to discard rollback and audit failures, making cleanup/auditing invisible. Log or propagate these errors.

🛠️ Suggested handling
 	if err := s.repo.CreateZone(ctx, zone); err != nil {
 		// Rollback PowerDNS
-		_ = s.backend.DeleteZone(ctx, powerdnsZone)
+		if rbErr := s.backend.DeleteZone(ctx, powerdnsZone); rbErr != nil {
+			s.logger.Error("failed to rollback PowerDNS zone", "zone", powerdnsZone, "error", rbErr)
+		}
 		return nil, errors.Wrap(errors.Internal, "failed to save zone", err)
 	}
@@
-	_ = s.auditSvc.Log(ctx, userID, "dns.zone.create", "dns_zone", zone.ID.String(), map[string]interface{}{
+	if err := s.auditSvc.Log(ctx, userID, "dns.zone.create", "dns_zone", zone.ID.String(), map[string]interface{}{
 		"name":   name,
 		"vpc_id": vpcID.String(),
-	})
+	}); err != nil {
+		s.logger.Warn("failed to write audit log for DNS zone create", "zone_id", zone.ID.String(), "error", err)
+	}

As per coding guidelines Do not use silent failures - avoid blank identifier assignment like _ = someFunc().

🤖 Prompt for AI Agents
In `@internal/core/services/dns.go` around lines 104 - 114, The rollback and audit
calls currently swallow errors (using `_ =`) after s.repo.CreateZone fails and
after zone creation; update the handlers for s.backend.DeleteZone(ctx,
powerdnsZone) and s.auditSvc.Log(ctx, userID, "dns.zone.create", ...) to surface
failures instead of discarding them: capture their returned error values, log
them using the service logger or wrap them into the returned error (e.g.,
include rollback or audit error context when returning from the function), and
ensure you do not silently ignore failures from DeleteZone or Log so callers/ops
can observe and act on these errors.

Comment on lines +379 to +396
func (s *DNSService) UnregisterInstance(ctx context.Context, instanceID uuid.UUID) error {
	records, err := s.repo.GetRecordsByInstance(ctx, instanceID)
	if err != nil || len(records) == 0 {
		return nil // No records to remove
	}

	for _, record := range records {
		zone, err := s.repo.GetZoneByID(ctx, record.ZoneID)
		if err != nil {
			continue
		}

		fqdn := fmt.Sprintf("%s.%s.", record.Name, zone.Name)
		_ = s.backend.DeleteRecords(ctx, zone.PowerDNSID, fqdn, string(record.Type))
	}

	return s.repo.DeleteRecordsByInstance(ctx, instanceID)
}

⚠️ Potential issue | 🟠 Major

Don’t silently drop errors when unregistering instance records.

Lines 381-392 ignore repo/backend errors (including _ =), which can leave orphaned DNS records. Propagate or at least log failures so cleanup is observable.

🛠️ Suggested handling
 	records, err := s.repo.GetRecordsByInstance(ctx, instanceID)
-	if err != nil || len(records) == 0 {
-		return nil // No records to remove
-	}
+	if err != nil {
+		return errors.Wrap(errors.Internal, "failed to load instance DNS records", err)
+	}
+	if len(records) == 0 {
+		return nil
+	}
@@
-		_ = s.backend.DeleteRecords(ctx, zone.PowerDNSID, fqdn, string(record.Type))
+		if delErr := s.backend.DeleteRecords(ctx, zone.PowerDNSID, fqdn, string(record.Type)); delErr != nil {
+			s.logger.Warn("failed to delete DNS record in backend", "record_id", record.ID, "error", delErr)
+		}
 	}

As per coding guidelines Do not use silent failures - avoid blank identifier assignment like _ = someFunc().


Comment on lines 34 to 36
 const (
-	errInvalidUploadID    = "invalid upload id"
-	errInvalidRequestBody = "invalid request body"
+	errInvalidUploadID = "invalid upload id"
 )

⚠️ Potential issue | 🔴 Critical

Build breaks: errInvalidRequestBody is referenced but undefined.

CreateBucket and GeneratePresignedURL still use it, so this won’t compile. Reintroduce the constant or inline the string.

🛠️ Minimal fix
 const (
+	errInvalidRequestBody = "invalid request body"
 	errInvalidUploadID = "invalid upload id"
 )

Comment on lines +64 to +66
PowerDNSAPIURL: getEnv("POWERDNS_API_URL", "http://localhost:8081"),
PowerDNSAPIKey: getEnv("POWERDNS_API_KEY", "thecloud-dns-secret"),
PowerDNSServerID: getEnv("POWERDNS_SERVER_ID", "localhost"),

⚠️ Potential issue | 🟠 Major

Require an explicit PowerDNS API key in production.

Defaulting to a known key (thecloud-dns-secret) risks accidental production exposure if env vars aren’t set. Consider forcing a non-empty key (and rejecting defaults) when APP_ENV=production.

🔐 Suggested adjustment
-	return &Config{
+	cfg := &Config{
 		Port:                 getEnv("PORT", "8080"),
 		DatabaseURL:          getEnv("DATABASE_URL", "postgres://cloud:cloud@localhost:5433/thecloud"),
@@
-		PowerDNSAPIKey:     getEnv("POWERDNS_API_KEY", "thecloud-dns-secret"),
+		PowerDNSAPIKey:     getEnv("POWERDNS_API_KEY", ""),
 		PowerDNSServerID:   getEnv("POWERDNS_SERVER_ID", "localhost"),
-	}, nil
+	}
+	if cfg.Environment == "production" && cfg.PowerDNSAPIKey == "" {
+		return nil, fmt.Errorf("POWERDNS_API_KEY must be set in production")
+	}
+	return cfg, nil
🤖 Prompt for AI Agents
In `@internal/platform/config.go` around lines 64 - 66, The config currently
defaults PowerDNSAPIKey via getEnv to "thecloud-dns-secret"; change the loading
logic for PowerDNSAPIKey so that when APP_ENV == "production" (read via
getEnv("APP_ENV", ...)) the code does not accept the default and instead returns
an error or panics if PowerDNSAPIKey is empty or equals the insecure default;
update the PowerDNSAPIKey initialization (the symbol PowerDNSAPIKey) to validate
and fail fast in production rather than silently using the known default.
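The fail-fast behavior described above can be sketched as follows; `resolvePowerDNSKey` and `insecureDefaultKey` are illustrative names, not the project's actual symbols:

```go
package main

import "fmt"

// insecureDefaultKey mirrors the known development default flagged in the review.
const insecureDefaultKey = "thecloud-dns-secret"

// resolvePowerDNSKey refuses both the empty value and the known development
// default when running in production, and falls back to the default otherwise.
func resolvePowerDNSKey(env, key string) (string, error) {
	if env == "production" && (key == "" || key == insecureDefaultKey) {
		return "", fmt.Errorf("POWERDNS_API_KEY must be set to a non-default value in production")
	}
	if key == "" {
		key = insecureDefaultKey // acceptable fallback for local development only
	}
	return key, nil
}

func main() {
	if _, err := resolvePowerDNSKey("production", insecureDefaultKey); err != nil {
		fmt.Println("rejected:", err)
	}
	key, err := resolvePowerDNSKey("development", "")
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println("dev key:", key)
}
```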

Comment on lines +43 to +68
func (r *DNSRepository) GetZoneByID(ctx context.Context, id uuid.UUID) (*domain.DNSZone, error) {
query := `
SELECT id, user_id, tenant_id, vpc_id, name, description,
status, default_ttl, powerdns_id, created_at, updated_at
FROM dns_zones
WHERE id = $1
`
// Typically we might also filter by tenant_id for security, but the service later should handle authz check or context-based filtering?
// Existing repos often just get by ID. However, let's verify if we need to respect tenant scoping here.
// For GetByID, usually we fetch the raw object and service checks permission.
// But let's check if the pattern is to include tenant check.
// Checking the implementation plan, it just says "Standard PostgreSQL repository".

var z domain.DNSZone
err := r.db.QueryRow(ctx, query, id).Scan(
&z.ID, &z.UserID, &z.TenantID, &z.VpcID, &z.Name, &z.Description,
&z.Status, &z.DefaultTTL, &z.PowerDNSID, &z.CreatedAt, &z.UpdatedAt,
)
if err != nil {
if err == pgx.ErrNoRows {
return nil, errors.New(errors.NotFound, "dns zone not found")
}
return nil, errors.Wrap(errors.Internal, "failed to get dns zone", err)
}
return &z, nil
}

⚠️ Potential issue | 🟠 Major

Enforce tenant scoping in zone lookups and mutations.

These queries operate by ID/VPC without tenant_id filtering. With the current service usage, this enables cross-tenant access by guessing IDs. Use tenant_id from context in WHERE for Get/Update/Delete (and GetZoneByVPC).

🛡️ Example fix for GetZoneByID (apply similar filters elsewhere)
 func (r *DNSRepository) GetZoneByID(ctx context.Context, id uuid.UUID) (*domain.DNSZone, error) {
+	tenantID := appcontext.TenantIDFromContext(ctx)
 	query := `
 		SELECT id, user_id, tenant_id, vpc_id, name, description, 
 		       status, default_ttl, powerdns_id, created_at, updated_at
 		FROM dns_zones
-		WHERE id = $1
+		WHERE id = $1 AND tenant_id = $2
 	`
@@
-	err := r.db.QueryRow(ctx, query, id).Scan(
+	err := r.db.QueryRow(ctx, query, id, tenantID).Scan(

Also applies to: 92-111, 142-156, 158-165

🤖 Prompt for AI Agents
In `@internal/repositories/postgres/dns_repo.go` around lines 43 - 68, GetZoneByID
currently queries by id only which allows cross-tenant access; extract the
tenant ID from the request context (the same context key or helper your project
uses for tenant/auth info), validate it's present, and add tenant_id to the
WHERE clause (e.g. "WHERE id = $1 AND tenant_id = $2") and pass the tenantID as
the additional query parameter; apply the same pattern to the other affected
methods (GetZoneByVPC, Update, Delete) so all repo lookups/mutations include
tenant scoping using the context-derived tenant ID.

Comment on lines +186 to +205
func (r *DNSRepository) GetRecordByID(ctx context.Context, id uuid.UUID) (*domain.DNSRecord, error) {
query := `
SELECT id, zone_id, name, type, content, ttl, priority,
disabled, auto_managed, instance_id, created_at, updated_at
FROM dns_records
WHERE id = $1
`
var rec domain.DNSRecord
err := r.db.QueryRow(ctx, query, id).Scan(
&rec.ID, &rec.ZoneID, &rec.Name, &rec.Type, &rec.Content, &rec.TTL, &rec.Priority,
&rec.Disabled, &rec.AutoManaged, &rec.InstanceID, &rec.CreatedAt, &rec.UpdatedAt,
)
if err != nil {
if err == pgx.ErrNoRows {
return nil, errors.New(errors.NotFound, "dns record not found")
}
return nil, errors.Wrap(errors.Internal, "failed to get dns record", err)
}
return &rec, nil
}

⚠️ Potential issue | 🟠 Major

Tenant scoping is missing in record lookups/mutations.

Record queries don’t filter by tenant. Join dns_zones on zone_id and apply tenant_id = $X to prevent cross-tenant reads/updates/deletes.

🛡️ Example fix for GetRecordByID (apply similarly to list/update/delete)
 func (r *DNSRepository) GetRecordByID(ctx context.Context, id uuid.UUID) (*domain.DNSRecord, error) {
+	tenantID := appcontext.TenantIDFromContext(ctx)
 	query := `
-		SELECT id, zone_id, name, type, content, ttl, priority, 
-		       disabled, auto_managed, instance_id, created_at, updated_at
-		FROM dns_records
-		WHERE id = $1
+		SELECT r.id, r.zone_id, r.name, r.type, r.content, r.ttl, r.priority,
+		       r.disabled, r.auto_managed, r.instance_id, r.created_at, r.updated_at
+		FROM dns_records r
+		JOIN dns_zones z ON z.id = r.zone_id
+		WHERE r.id = $1 AND z.tenant_id = $2
 	`
@@
-	err := r.db.QueryRow(ctx, query, id).Scan(
+	err := r.db.QueryRow(ctx, query, id, tenantID).Scan(

Also applies to: 207-233, 235-260, 262-276, 278-285, 287-294

🤖 Prompt for AI Agents
In `@internal/repositories/postgres/dns_repo.go` around lines 186 - 205,
GetRecordByID and other DNS record operations lack tenant scoping; update each
record lookup/mutation to join dns_zones and filter by tenant_id to prevent
cross-tenant access. For GetRecordByID (and the other functions referenced), add
a tenantID uuid.UUID parameter to the function signature, change the SQL to JOIN
dns_zones z ON z.id = dns_records.zone_id and add AND z.tenant_id = $2 (or the
appropriate placeholder) in the WHERE clause, pass tenantID into
r.db.QueryRow/Exec calls, and propagate the signature change to all callers;
apply the same join+tenant_id filter pattern to the other record functions
(list/update/delete) in this file.

Copilot AI review requested due to automatic review settings January 30, 2026 10:44
Copilot AI left a comment

Pull request overview

Copilot reviewed 46 out of 47 changed files in this pull request and generated 5 comments.



Comment thread docs/guides/dns.md

CloudDNS is a highly available and scalable Domain Name System (DNS) web service. It provides a reliable and cost-effective way to route users to internet applications by translating names like `www.example.com` into the numeric IP addresses like `192.0.2.1` that computers use to connect to each other.

The Cloud's DNA service is powered by **PowerDNS** and is fully integrated with our VPC and Compute services.
Copilot AI Jan 30, 2026

Typo in comment: "DNA service" should be "DNS service".

Suggested change
The Cloud's DNA service is powered by **PowerDNS** and is fully integrated with our VPC and Compute services.
The Cloud's DNS service is powered by **PowerDNS** and is fully integrated with our VPC and Compute services.

Copilot uses AI. Check for mistakes.
Comment on lines +233 to +235
// Rollback backend? Difficult with records.
s.logger.Error("failed to create record in DB", "error", err)
return nil, errors.Wrap(errors.Internal, "failed to save record", err)
Copilot AI Jan 30, 2026

When CreateRecord fails to save to the database after successfully adding to PowerDNS (line 232), there's no attempt to rollback the PowerDNS record. This leaves an orphaned record in PowerDNS that's not tracked in the database. Consider either implementing a rollback mechanism or documenting this limitation. The comment at line 233 acknowledges this issue but doesn't address it.

Suggested change
// Rollback backend? Difficult with records.
s.logger.Error("failed to create record in DB", "error", err)
return nil, errors.Wrap(errors.Internal, "failed to save record", err)
// At this point the record was already created in the DNS backend via AddRecords above.
// We do not currently have a reliable way to roll back that change here, so an orphaned
// record may remain in the DNS backend without a corresponding DB entry. Operators may
// need to remove it manually using the information logged below.
s.logger.Error("failed to create DNS record in DB; record may still exist in DNS backend",
"error", err,
"zone_id", zoneID,
"zone_powerdns_id", zone.PowerDNSID,
"fqdn", fqdn,
"record_type", recordType,
)
return nil, errors.Wrap(
errors.Internal,
"failed to save DNS record; DNS backend may still contain the record",
err,
)

Comment thread internal/repositories/docker/adapter.go Outdated
return net.IPAddress, nil
// Check status
if json.State != nil {
// fmt.Printf("Container status: %s\n", json.State.Status)
Copilot AI Jan 30, 2026

The commented-out fmt.Printf statement for container status should be removed. Debug logging should use the logger infrastructure if needed.

Comment thread docs/FEATURES.md

### 6. Managed Caches (CloudCache) 🆕
- **Multi-Engine Support**: PostgreSQL and MySQL with configurable versions.
- **Provisioning**: Spawns Docker containers using official images (`postgres:<version>-alpine`, `mysql:<version>`).
Copilot AI Jan 30, 2026

Typo in comment/reference: "mysql:debet" should likely be "mysql:debian" or another valid MySQL image tag.

Suggested change
- **Provisioning**: Spawns Docker containers using official images (`postgres:<version>-alpine`, `mysql:<version>`).
- **Provisioning**: Spawns Docker containers using official images (`postgres:<version>-alpine`, `mysql:<version>-debian`).

Comment thread tests/dns_instance_e2e_test.go Outdated
t.Run("VerifyAutoDNSCreation", func(t *testing.T) {
privateIP := waitForInstanceRunning(t, client, token, instanceID)
if privateIP == "" {
t.Skip("Instance did... skipping DNS verification")
Copilot AI Jan 30, 2026

The comment text "Instance did..." appears to be incomplete. It should likely read "Instance did not start" or similar complete message.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 13

🤖 Fix all issues with AI agents
In `@cmd/cloud/dns.go`:
- Around line 33-36: The json.MarshalIndent call for the --json branch currently
ignores errors (e.g., data, _ := json.MarshalIndent(zones, "", "  ")); change
both occurrences that call json.MarshalIndent (the one that marshals the "zones"
variable and the other instance around lines 117-120) to capture the error,
handle it (e.g., if err != nil { fmt.Fprintf(os.Stderr, "failed to marshal JSON:
%v\n", err); os.Exit(1) }), and only print the marshaled data when err == nil;
remove the use of the blank identifier and ensure you import/use os for non-zero
exit.
- Around line 64-66: The flag parsing calls currently ignore errors (desc,
vpcStr := cmd.Flags().GetString("description"/"vpc-id") with the error assigned
to _); update the create-zone command to check the returned error from
cmd.Flags().GetString for both "description" and "vpc-id" (and any other flag
gets), and handle failures instead of discarding them: if this command is
implemented as a RunE, return the error; otherwise print a clear error via the
command logger (or cmd.PrintErrf) and exit non-zero. Ensure you reference the
variables desc and vpcStr and fail-fast on flag parse errors to avoid silent
failures.
- Around line 161-166: The flag parsing calls for the create-record command
currently ignore returned errors; capture the error from each
cmd.Flags().GetString/GetInt call (for name, type, content, ttl, priority) and
handle them instead of using the blank identifier—e.g., if err != nil return
fmt.Errorf("parsing %s flag: %w", "<flag-name>", err) or print a user-facing
error and exit; update the create-record handler to propagate or report these
parsing errors so they are surfaced to the user.
- Around line 203-210: The flag setup currently ignores errors from
MarkFlagRequired and embeds magic defaults; update dnsCreateRecordCmd flag
registration to check and handle returned errors (e.g., err :=
dnsCreateRecordCmd.MarkFlagRequired("name"); if err != nil { log.Fatalf("failed
to mark flag required: %v", err) } and similarly for "content") instead of
discarding them, and move literal defaults into named constants (for example
DEFAULT_TTL, DEFAULT_TYPE, DEFAULT_PRIORITY) and use those constants in the
dnsCreateRecordCmd.Flags().String/Int calls to replace magic values; refer to
dnsCreateRecordCmd, MarkFlagRequired, and the Flags().String/Int calls to locate
and change the code.

In `@docs/guides/dns.md`:
- Around line 66-71: The docs list two CLI commands that don't exist: "cloud dns
get-zone" and "cloud dns update-record"; either implement these commands in the
CLI or remove them from the guide. To fix, open cmd/cloud/dns.go and either add
command registrations and handlers for "get-zone" (e.g., a
GetZoneCmd/GetZoneHandler that looks up a zone by ID and prints details) and
"update-record" (e.g., UpdateRecordCmd/UpdateRecordHandler that accepts
record-id and patch fields), wiring them into the root "cloud dns" command, or
remove the two lines from docs/guides/dns.md so the guide only lists commands
actually registered (match existing symbols like ListZonesCmd, CreateRecordCmd,
DeleteZoneCmd, ListRecordsCmd, etc.).
- Around line 12-17: The "Supported Record Types" section triggers markdownlint
MD007/MD060 due to nested list formatting; replace the nested indented bullets
with the repo-preferred flat list or a simple markdown table (e.g., a top-level
bullet per record type with a short description or a two-column table) and
ensure blank lines before/after the list/table to satisfy MD007; apply the same
reformat to the other occurrence referenced (the similar record-types block
around lines 62-64).
- Line 5: Replace the typo "DNA" with "DNS" in the sentence that currently reads
"The Cloud's DNA service is powered by **PowerDNS** and is fully integrated with
our VPC and Compute services." so it correctly says "The Cloud's DNS
service..."; update the text in the docs/guides/dns.md content where that exact
sentence appears (look for the phrase "The Cloud's DNA service") to fix the
user-facing service name.

In `@docs/services/cloud-dns.md`:
- Around line 32-36: The table for PowerDNS variables (POWERDNS_API_URL,
POWERDNS_API_KEY, POWERDNS_SERVER_ID) is misformatted per markdownlint MD060;
reformat the markdown table to the repo's preferred style by aligning pipes with
single leading and trailing spaces for each cell and ensuring the separator row
uses consistent dashes (e.g., | Variable | Description | Default | followed by
|---|---|---|) so the table renders and lints correctly.

In `@docs/swagger/swagger.yaml`:
- Around line 633-648: The swagger enum domain.RecordType currently lists SRV
but the service layer (dns.go handling SRV) does not support full SRV structure;
either remove SRV from domain.RecordType enum and any validation/handlers that
expect SRV, or expand the API model to properly represent SRV records (replace
the single Content string with explicit fields weight:int, port:int,
target:string and keep priority:int), then update the service layer function
that formats/consumes SRV records (the code path in
internal/core/services/dns.go that references SRV and the record serialization
logic) to map the new fields into the required SRV wire/string format and adjust
validation/creation/update handlers accordingly. Ensure domain.RecordType enum,
request/response schemas, and the SRV-handling logic in dns.go remain
consistent.
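If the API model is expanded as suggested, serializing the explicit fields back into the standard SRV presentation format (RFC 2782: priority, weight, port, target) might look like this sketch; `SRVRecord` and `ZoneFileRDATA` are illustrative names, not the project's actual symbols:

```go
package main

import "fmt"

// SRVRecord models the explicit fields the review proposes instead of a
// single free-form Content string.
type SRVRecord struct {
	Priority int
	Weight   int
	Port     int
	Target   string // a fully qualified name, e.g. "sip.example.com."
}

// ZoneFileRDATA renders the record in the standard SRV presentation format:
// "Priority Weight Port Target".
func (r SRVRecord) ZoneFileRDATA() string {
	return fmt.Sprintf("%d %d %d %s", r.Priority, r.Weight, r.Port, r.Target)
}

func main() {
	rec := SRVRecord{Priority: 10, Weight: 5, Port: 5060, Target: "sip.example.com."}
	fmt.Println(rec.ZoneFileRDATA()) // 10 5 5060 sip.example.com.
}
```

Validation on create/update then checks the integer ranges and the FQDN shape of Target instead of parsing a free-form string.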

In `@internal/handlers/dns_handler.go`:
- Around line 14-17: The constants errInvalidRecordID and errInvalidZoneID are
currently unused; either remove them or wire them into parseUUID's error
handling so they are returned when parsing fails for record or zone IDs. Update
the parseUUID function (or the call sites that validate UUIDs) to return the
appropriate constant (errInvalidRecordID for record parsing and errInvalidZoneID
for zone parsing) instead of generic errors, or delete both constants if you
prefer generic errors to keep the package clean.
- Around line 3-11: The handler currently imports core/domain types (e.g.,
RecordType and any response DTOs) which violates the rule that handlers should
depend only on Services; move the RecordType/response DTO definitions out of
core/domain into the service boundary (ports) or create handler-specific DTOs
and mapping functions, then update the handler code in dns_handler.go to import
only ports and use the ports-facing types (for example ports.DNSService method
signatures and newly moved Port DTOs) or map port DTOs to handler DTOs before
returning responses; ensure no direct references to domain.RecordType remain in
the handler.

In `@pkg/sdk/dns.go`:
- Around line 10-17: The DNS SDK methods are missing context support; update the
HTTP helper methods in Client (get, post, put, patch, delete in
pkg/sdk/client.go) to accept a context.Context as their first parameter and call
req.SetContext(ctx) on the resty request before executing it, then change all
DNS methods to accept ctx context.Context as their first parameter and pass that
ctx into the updated client methods: ListDNSZones(ctx context.Context),
CreateDNSZone(ctx context.Context,...), GetDNSZone(ctx context.Context,...),
DeleteDNSZone(ctx context.Context,...), ListDNSRecords(ctx context.Context,...),
CreateDNSRecord(ctx context.Context,...), and DeleteDNSRecord(ctx
context.Context,...). Ensure signatures and all internal calls to
c.get/c.post/etc. are updated to the new ctx-aware form.

In `@tests/dns_e2e_test.go`:
- Around line 1-24: Add an opt-in guard so TestDNSE2E (and helpers like
waitForServer and registerAndLogin) only run when explicitly requested: add a
build tag (e.g. //go:build e2e) at the top of the file to exclude it from
regular `go test ./...` and also add an environment variable guard at the start
of TestDNSE2E that calls t.Skipf when the env var (e.g. "RUN_E2E_TESTS") is not
set to "1" (or similar); this ensures the test won’t execute against real
services unless both the build tag and env flag are intentionally provided.
🧹 Nitpick comments (8)
cmd/cloud/dns.go (2)

15-20: Prefer constructor functions over global Cobra command variables.
Global command vars make dependency injection and testing harder; consider a newDNSCmd() that builds and returns the command tree.

As per coding guidelines Do not use global variables (e.g., var DB *sql.DB); Use constructor injection for dependencies instead of global initialization.


43-49: Extract the short-ID length into a named constant.
The hardcoded 8 appears multiple times; a constant improves clarity and reuse.

♻️ Suggested refactor
-const dnsErrorFormat = "Error: %v\n"
+const (
+	dnsErrorFormat = "Error: %v\n"
+	dnsIDPrefixLen = 8
+)
@@
-				vpcID = z.VpcID.String()[:8]
+				vpcID = z.VpcID.String()[:dnsIDPrefixLen]
@@
-				z.ID.String()[:8],
+				z.ID.String()[:dnsIDPrefixLen],
@@
-				r.ID.String()[:8],
+				r.ID.String()[:dnsIDPrefixLen],
As per coding guidelines Do not use magic numbers - use named constants instead.

Also applies to: 136-136

pkg/sdk/dns.go (1)

65-83: Only include priority when it’s provided.
Sending "priority": null can trip validation in some APIs; omit the field when not set.

♻️ Suggested refactor
-	payload := map[string]interface{}{
-		"name":     name,
-		"type":     recordType,
-		"content":  content,
-		"ttl":      ttl,
-		"priority": priority,
-	}
+	payload := map[string]interface{}{
+		"name":    name,
+		"type":    recordType,
+		"content": content,
+		"ttl":     ttl,
+	}
+	if priority != nil {
+		payload["priority"] = *priority
+	}
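The refactor above can be shown end to end; `buildRecordPayload` is a hypothetical helper demonstrating the omit-when-nil behavior:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildRecordPayload omits "priority" entirely when it is not set, rather
// than sending "priority": null to the API.
func buildRecordPayload(name, recordType, content string, ttl int, priority *int) map[string]interface{} {
	payload := map[string]interface{}{
		"name":    name,
		"type":    recordType,
		"content": content,
		"ttl":     ttl,
	}
	if priority != nil {
		payload["priority"] = *priority
	}
	return payload
}

func main() {
	p := buildRecordPayload("www", "A", "10.0.0.5", 300, nil)
	b, err := json.Marshal(p)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // no "priority" key

	prio := 10
	p2 := buildRecordPayload("mail", "MX", "mx1.example.com.", 300, &prio)
	b2, err := json.Marshal(p2)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b2))
}
```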
tests/dns_e2e_test.go (3)

18-18: Consider dropping t.Parallel() for E2E stability.
Parallel E2E runs can conflict on shared infra (PowerDNS/DB) and create flakiness unless isolated.


125-154: Use a table-driven structure for boundary cases.
The two boundary subtests can be consolidated into a table for consistency and easier expansion.

As per coding guidelines Use table-driven tests in test files.


23-31: Extract test constants to avoid magic numbers.
Timeouts, TTLs, and modulo values are hard-coded; naming them improves clarity.

As per coding guidelines Do not use magic numbers - use named constants instead.

Also applies to: 105-106, 132-133, 142-142

docs/swagger/swagger.json (1)

4784-4787: Consider whether powerdns_id should be exposed in the public API.

The powerdns_id field appears to be a backend implementation detail (the zone identifier in PowerDNS). Exposing internal system identifiers in the public API could:

  • Leak implementation details to consumers
  • Create coupling between API consumers and the backend system

If this field is intentionally exposed for operational/debugging purposes, this is fine. Otherwise, consider excluding it from the API response schema.

internal/handlers/dns_handler.go (1)

29-34: Add omitempty for optional request fields.
Optional DTO fields should be tagged with omitempty to align with serialization guidelines.

As per coding guidelines: "Include json tags on struct fields with omitempty for optional fields".

♻️ Suggested updates
type CreateZoneRequest struct {
  Name        string    `json:"name" binding:"required"`
- Description string    `json:"description"`
+ Description string    `json:"description,omitempty"`
  VpcID       uuid.UUID `json:"vpc_id" binding:"required"`
}

type CreateRecordRequest struct {
  Name     string            `json:"name" binding:"required"`
  Type     domain.RecordType `json:"type" binding:"required"`
  Content  string            `json:"content" binding:"required"`
- TTL      int               `json:"ttl"`
- Priority *int              `json:"priority"`
+ TTL      int               `json:"ttl,omitempty"`
+ Priority *int              `json:"priority,omitempty"`
}

type UpdateRecordRequest struct {
  Content  string `json:"content" binding:"required"`
- TTL      int    `json:"ttl"`
- Priority *int   `json:"priority"`
+ TTL      int    `json:"ttl,omitempty"`
+ Priority *int   `json:"priority,omitempty"`
}

Also applies to: 119-126, 209-213

Comment thread cmd/cloud/dns.go
Comment on lines +33 to +36
if outputJSON {
data, _ := json.MarshalIndent(zones, "", " ")
fmt.Println(string(data))
return

⚠️ Potential issue | 🟡 Minor

Don’t ignore JSON marshaling errors for --json output.
Right now failures are silently dropped; handle and surface the error (apply to both list commands).

🛠️ Suggested fix
-		if outputJSON {
-			data, _ := json.MarshalIndent(zones, "", "  ")
-			fmt.Println(string(data))
-			return
-		}
+		if outputJSON {
+			data, err := json.MarshalIndent(zones, "", "  ")
+			if err != nil {
+				fmt.Printf(dnsErrorFormat, err)
+				return
+			}
+			fmt.Println(string(data))
+			return
+		}
As per coding guidelines Do not use silent failures - avoid blank identifier assignment like `_ = someFunc()`.

Also applies to: 117-120

🤖 Prompt for AI Agents
In `@cmd/cloud/dns.go` around lines 33 - 36, The json.MarshalIndent call for the
--json branch currently ignores errors (e.g., data, _ :=
json.MarshalIndent(zones, "", "  ")); change both occurrences that call
json.MarshalIndent (the one that marshals the "zones" variable and the other
instance around lines 117-120) to capture the error, handle it (e.g., if err !=
nil { fmt.Fprintf(os.Stderr, "failed to marshal JSON: %v\n", err); os.Exit(1)
}), and only print the marshaled data when err == nil; remove the use of the
blank identifier and ensure you import/use os for non-zero exit.
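The fix can be sketched as a small helper; `printJSON` is an illustrative name, and the channel value below exercises the error path since channels are not JSON-marshalable:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// printJSON surfaces marshaling failures instead of silently printing an
// empty string, matching the review's suggested fix.
func printJSON(v interface{}) error {
	data, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return fmt.Errorf("failed to marshal JSON: %w", err)
	}
	fmt.Println(string(data))
	return nil
}

func main() {
	zones := []map[string]string{{"name": "internal.example.com", "status": "active"}}
	if err := printJSON(zones); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Channels cannot be marshaled, so this exercises the error branch.
	if err := printJSON(make(chan int)); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```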

Comment thread cmd/cloud/dns.go
Comment on lines +64 to +66
desc, _ := cmd.Flags().GetString("description")
vpcStr, _ := cmd.Flags().GetString("vpc-id")


⚠️ Potential issue | 🟡 Minor

Handle flag parsing errors in create-zone.
Ignoring flag errors can mask CLI misuse or internal flag wiring bugs.

🛠️ Suggested fix
-		desc, _ := cmd.Flags().GetString("description")
-		vpcStr, _ := cmd.Flags().GetString("vpc-id")
+		desc, err := cmd.Flags().GetString("description")
+		if err != nil {
+			fmt.Printf(dnsErrorFormat, err)
+			return
+		}
+		vpcStr, err := cmd.Flags().GetString("vpc-id")
+		if err != nil {
+			fmt.Printf(dnsErrorFormat, err)
+			return
+		}
As per coding guidelines Do not use silent failures - avoid blank identifier assignment like `_ = someFunc()`.
📝 Committable suggestion


Suggested change
desc, _ := cmd.Flags().GetString("description")
vpcStr, _ := cmd.Flags().GetString("vpc-id")
desc, err := cmd.Flags().GetString("description")
if err != nil {
fmt.Printf(dnsErrorFormat, err)
return
}
vpcStr, err := cmd.Flags().GetString("vpc-id")
if err != nil {
fmt.Printf(dnsErrorFormat, err)
return
}
🤖 Prompt for AI Agents
In `@cmd/cloud/dns.go` around lines 64 - 66, The flag parsing calls currently
ignore errors (desc, vpcStr := cmd.Flags().GetString("description"/"vpc-id")
with the error assigned to _); update the create-zone command to check the
returned error from cmd.Flags().GetString for both "description" and "vpc-id"
(and any other flag gets), and handle failures instead of discarding them: if
this command is implemented as a RunE, return the error; otherwise print a clear
error via the command logger (or cmd.PrintErrf) and exit non-zero. Ensure you
reference the variables desc and vpcStr and fail-fast on flag parse errors to
avoid silent failures.

Comment thread cmd/cloud/dns.go
Comment on lines +161 to +166
name, _ := cmd.Flags().GetString("name")
recordType, _ := cmd.Flags().GetString("type")
content, _ := cmd.Flags().GetString("content")
ttl, _ := cmd.Flags().GetInt("ttl")
priority, _ := cmd.Flags().GetInt("priority")


⚠️ Potential issue | 🟡 Minor

Handle flag parsing errors in create-record.
These errors are currently ignored; they should be surfaced to the user.

🛠️ Suggested fix
-		name, _ := cmd.Flags().GetString("name")
-		recordType, _ := cmd.Flags().GetString("type")
-		content, _ := cmd.Flags().GetString("content")
-		ttl, _ := cmd.Flags().GetInt("ttl")
-		priority, _ := cmd.Flags().GetInt("priority")
+		name, err := cmd.Flags().GetString("name")
+		if err != nil {
+			fmt.Printf(dnsErrorFormat, err)
+			return
+		}
+		recordType, err := cmd.Flags().GetString("type")
+		if err != nil {
+			fmt.Printf(dnsErrorFormat, err)
+			return
+		}
+		content, err := cmd.Flags().GetString("content")
+		if err != nil {
+			fmt.Printf(dnsErrorFormat, err)
+			return
+		}
+		ttl, err := cmd.Flags().GetInt("ttl")
+		if err != nil {
+			fmt.Printf(dnsErrorFormat, err)
+			return
+		}
+		priority, err := cmd.Flags().GetInt("priority")
+		if err != nil {
+			fmt.Printf(dnsErrorFormat, err)
+			return
+		}
As per coding guidelines Do not use silent failures - avoid blank identifier assignment like `_ = someFunc()`.
📝 Committable suggestion


Suggested change
name, _ := cmd.Flags().GetString("name")
recordType, _ := cmd.Flags().GetString("type")
content, _ := cmd.Flags().GetString("content")
ttl, _ := cmd.Flags().GetInt("ttl")
priority, _ := cmd.Flags().GetInt("priority")
name, err := cmd.Flags().GetString("name")
if err != nil {
fmt.Printf(dnsErrorFormat, err)
return
}
recordType, err := cmd.Flags().GetString("type")
if err != nil {
fmt.Printf(dnsErrorFormat, err)
return
}
content, err := cmd.Flags().GetString("content")
if err != nil {
fmt.Printf(dnsErrorFormat, err)
return
}
ttl, err := cmd.Flags().GetInt("ttl")
if err != nil {
fmt.Printf(dnsErrorFormat, err)
return
}
priority, err := cmd.Flags().GetInt("priority")
if err != nil {
fmt.Printf(dnsErrorFormat, err)
return
}
🤖 Prompt for AI Agents
In `@cmd/cloud/dns.go` around lines 161 - 166, The flag parsing calls for the
create-record command currently ignore returned errors; capture the error from
each cmd.Flags().GetString/GetInt call (for name, type, content, ttl, priority)
and handle them instead of using the blank identifier—e.g., if err != nil return
fmt.Errorf("parsing %s flag: %w", "<flag-name>", err) or print a user-facing
error and exit; update the create-record handler to propagate or report these
parsing errors so they are surfaced to the user.

Comment thread cmd/cloud/dns.go Outdated
Comment on lines +203 to +210
dnsCreateRecordCmd.Flags().String("name", "", "Record name (e.g., 'www')")
dnsCreateRecordCmd.Flags().String("type", "A", "Record type (A, AAAA, CNAME, MX, TXT)")
dnsCreateRecordCmd.Flags().String("content", "", "Record content (e.g., IP address)")
dnsCreateRecordCmd.Flags().Int("ttl", 3600, "Time To Live in seconds")
dnsCreateRecordCmd.Flags().Int("priority", 0, "Priority for MX records")

dnsCreateRecordCmd.MarkFlagRequired("name")
dnsCreateRecordCmd.MarkFlagRequired("content")

⚠️ Potential issue | 🟡 Minor

Don’t ignore MarkFlagRequired errors and avoid magic defaults.
Handle the returned errors and extract defaults into constants.

🛠️ Suggested fix
-	dnsCreateRecordCmd.Flags().Int("ttl", 3600, "Time To Live in seconds")
-	dnsCreateRecordCmd.Flags().Int("priority", 0, "Priority for MX records")
+	const (
+		dnsDefaultTTLSeconds = 3600
+		dnsDefaultPriority   = 0
+	)
+	dnsCreateRecordCmd.Flags().Int("ttl", dnsDefaultTTLSeconds, "Time To Live in seconds")
+	dnsCreateRecordCmd.Flags().Int("priority", dnsDefaultPriority, "Priority for MX records")
@@
-	dnsCreateRecordCmd.MarkFlagRequired("name")
-	dnsCreateRecordCmd.MarkFlagRequired("content")
+	if err := dnsCreateRecordCmd.MarkFlagRequired("name"); err != nil {
+		fmt.Printf(dnsErrorFormat, err)
+		os.Exit(1)
+	}
+	if err := dnsCreateRecordCmd.MarkFlagRequired("content"); err != nil {
+		fmt.Printf(dnsErrorFormat, err)
+		os.Exit(1)
+	}
As per coding guidelines: Do not use silent failures - avoid blank identifier assignment like `_ = someFunc()`; do not use magic numbers - use named constants instead.
🤖 Prompt for AI Agents
In `@cmd/cloud/dns.go` around lines 203 - 210, The flag setup currently ignores
errors from MarkFlagRequired and embeds magic defaults; update
dnsCreateRecordCmd flag registration to check and handle returned errors (e.g.,
err := dnsCreateRecordCmd.MarkFlagRequired("name"); if err != nil {
log.Fatalf("failed to mark flag required: %v", err) } and similarly for
"content") instead of discarding them, and move literal defaults into named
constants (for example DEFAULT_TTL, DEFAULT_TYPE, DEFAULT_PRIORITY) and use
those constants in the dnsCreateRecordCmd.Flags().String/Int calls to replace
magic values; refer to dnsCreateRecordCmd, MarkFlagRequired, and the
Flags().String/Int calls to locate and change the code.

Comment thread docs/guides/dns.md

CloudDNS is a highly available and scalable Domain Name System (DNS) web service. It provides a reliable and cost-effective way to route users to internet applications by translating names like `www.example.com` into the numeric IP addresses like `192.0.2.1` that computers use to connect to each other.

The Cloud's DNA service is powered by **PowerDNS** and is fully integrated with our VPC and Compute services.

⚠️ Potential issue | 🟡 Minor

Fix service name typo (“DNA” → “DNS”).
Minor but user-facing.

🛠️ Suggested fix
-The Cloud's DNA service is powered by **PowerDNS** and is fully integrated with our VPC and Compute services.
+The Cloud's DNS service is powered by **PowerDNS** and is fully integrated with our VPC and Compute services.
📝 Committable suggestion


Suggested change
The Cloud's DNA service is powered by **PowerDNS** and is fully integrated with our VPC and Compute services.
The Cloud's DNS service is powered by **PowerDNS** and is fully integrated with our VPC and Compute services.
🤖 Prompt for AI Agents
In `@docs/guides/dns.md` at line 5, Replace the typo "DNA" with "DNS" in the
sentence that currently reads "The Cloud's DNA service is powered by
**PowerDNS** and is fully integrated with our VPC and Compute services." so it
correctly says "The Cloud's DNS service..."; update the text in the
docs/guides/dns.md content where that exact sentence appears (look for the
phrase "The Cloud's DNA service") to fix the user-facing service name.

Comment thread docs/swagger/swagger.yaml
Comment on lines +633 to +648
domain.RecordType:
enum:
- A
- AAAA
- CNAME
- MX
- TXT
- SRV
type: string
x-enum-varnames:
- RecordTypeA
- RecordTypeAAAA
- RecordTypeCNAME
- RecordTypeMX
- RecordTypeTXT
- RecordTypeSRV

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for SRV handling across the backend.
rg -n --type=go -C3 '\bRecordTypeSRV\b|\bSRV\b' internal

Repository: PoyrazK/thecloud

Length of output: 3257


🏁 Script executed:

#!/bin/bash
# Check for SRV-specific validation or additional handling
rg -n --type=go 'IsValid|validation|SRV' internal/core/domain/dns.go -A5 -B2

Repository: PoyrazK/thecloud

Length of output: 1316


🏁 Script executed:

#!/bin/bash
# Search for any specific SRV record validation or rejection logic
rg -n --type=go 'RecordType.*SRV|switch.*Type|case.*SRV' internal -A3 -B1

Repository: PoyrazK/thecloud

Length of output: 2531


🏁 Script executed:

#!/bin/bash
# Check API handlers for DNS record creation/update
rg -n --type=go 'CreateRecord|UpdateRecord|POST.*record|PUT.*record' internal/adapters -A10 | head -80

Repository: PoyrazK/thecloud

Length of output: 1044


🏁 Script executed:

#!/bin/bash
# Check the PowerDNS adapter implementation for record handling
rg -n --type=go 'AddRecords|UpdateRecords' internal/adapters/dns/powerdns.go -A15 -B2

Repository: PoyrazK/thecloud

Length of output: 2148


🏁 Script executed:

#!/bin/bash
# Check for any tests related to SRV record handling
fd -type f -name "*.go" internal | xargs rg -l 'SRV|test.*record' | head -20

Repository: PoyrazK/thecloud

Length of output: 482


🏁 Script executed:

#!/bin/bash
# Check for SRV-related tests
rg -n 'SRV' internal/handlers/dns_handler_test.go internal/adapters/dns/powerdns_test.go -B3 -A3

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

#!/bin/bash
# Check the DNS handler test to see what record types are tested
cat -n internal/handlers/dns_handler_test.go | head -100

Repository: PoyrazK/thecloud

Length of output: 3323


🏁 Script executed:

#!/bin/bash
# Continue reading the DNS handler test file to look for record creation tests
cat -n internal/handlers/dns_handler_test.go | sed -n '100,250p'

Repository: PoyrazK/thecloud

Length of output: 5322


🏁 Script executed:

#!/bin/bash
# Check the complete service layer handling for SRV to understand the full context
cat -n internal/core/services/dns.go | sed -n '180,210p'

Repository: PoyrazK/thecloud

Length of output: 1155


SRV support is incomplete and should be removed or properly implemented.

SRV records are advertised in the API schema, but the backend implementation acknowledges it cannot properly support them. The service layer code (internal/core/services/dns.go, lines 199-203) explicitly states:

SRV usually takes priority weight port target. But here we only have priority and content... For now, let's assume content contains what's needed.

The current schema only supports Priority (optional int) and Content (single string), which cannot adequately represent the required SRV format (priority, weight, port, target as separate fields). Unlike MX records which receive special formatting, SRV records bypass this entirely.

Either remove SRV from the swagger enum and validation, or redesign the schema and service layer to properly support the full SRV record structure.

🤖 Prompt for AI Agents
In `@docs/swagger/swagger.yaml` around lines 633 - 648, The swagger enum
domain.RecordType currently lists SRV but the service layer (dns.go handling
SRV) does not support full SRV structure; either remove SRV from
domain.RecordType enum and any validation/handlers that expect SRV, or expand
the API model to properly represent SRV records (replace the single Content
string with explicit fields weight:int, port:int, target:string and keep
priority:int), then update the service layer function that formats/consumes SRV
records (the code path in internal/core/services/dns.go that references SRV and
the record serialization logic) to map the new fields into the required SRV
wire/string format and adjust validation/creation/update handlers accordingly.
Ensure domain.RecordType enum, request/response schemas, and the SRV-handling
logic in dns.go remain consistent.

Comment on lines +3 to +11
import (
"net/http"

"github.com/gin-gonic/gin"
"github.com/google/uuid"
"github.com/poyrazk/thecloud/internal/core/domain"
"github.com/poyrazk/thecloud/internal/core/ports"
errs "github.com/poyrazk/thecloud/internal/errors"
"github.com/poyrazk/thecloud/pkg/httputil"

🛠️ Refactor suggestion | 🟠 Major

Handler should not depend on core/domain types.
internal/handlers/* is meant to depend on Services only; consider moving RecordType/response DTOs to the service/ports layer or introducing handler DTOs and mapping in the service boundary to avoid importing internal/core/domain.

As per coding guidelines: "Handlers (internal/handlers/) must import Services only, not Repositories or other layers".

🤖 Prompt for AI Agents
In `@internal/handlers/dns_handler.go` around lines 3 - 11, The handler currently
imports core/domain types (e.g., RecordType and any response DTOs) which
violates the rule that handlers should depend only on Services; move the
RecordType/response DTO definitions out of core/domain into the service boundary
(ports) or create handler-specific DTOs and mapping functions, then update the
handler code in dns_handler.go to import only ports and use the ports-facing
types (for example ports.DNSService method signatures and newly moved Port DTOs)
or map port DTOs to handler DTOs before returning responses; ensure no direct
references to domain.RecordType remain in the handler.

Comment thread internal/handlers/dns_handler.go Outdated
Comment thread pkg/sdk/dns.go
Comment on lines +10 to +17
// ListDNSZones returns all DNS zones for the current tenant.
func (c *Client) ListDNSZones() ([]domain.DNSZone, error) {
var resp struct {
Data []domain.DNSZone `json:"data"`
}
err := c.get("/dns/zones", &resp)
return resp.Data, err
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, examine the full DNS SDK file
cat -n pkg/sdk/dns.go

Repository: PoyrazK/thecloud

Length of output: 2954


🏁 Script executed:

# Check how many lines are in dns.go to understand scope
wc -l pkg/sdk/dns.go

Repository: PoyrazK/thecloud

Length of output: 77


🏁 Script executed:

# Examine other SDK files to see context patterns
ls -la pkg/sdk/*.go

Repository: PoyrazK/thecloud

Length of output: 3697


🏁 Script executed:

# Look at the Client type and its HTTP methods
rg -n "func \(c \*Client\)" pkg/sdk -A 2 | head -50

Repository: PoyrazK/thecloud

Length of output: 2491


🏁 Script executed:

# Search for context.Context usage in SDK files
rg -n "context\.Context" pkg/sdk/ -g '*.go'

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Examine the Client type and HTTP method implementations
cat -n pkg/sdk/client.go

Repository: PoyrazK/thecloud

Length of output: 3714


🏁 Script executed:

# Check if resty library supports context - search for resty usage docs or ctx usage
rg -n "SetContext\|WithContext\|ctx\|context" pkg/sdk/client.go

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Search for any test files that might show how context should be used
cat -n pkg/sdk/client_test.go

Repository: PoyrazK/thecloud

Length of output: 4436


🏁 Script executed:

# Check resty library documentation or check if there are any context-related methods
rg -rn "SetContext\|Context\(" pkg/sdk/ --type go

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Check if there are any internal service files that might show correct context patterns
find . -name "*.go" -path "*/internal/*" -type f | head -20 | xargs grep -l "context.Context" | head -5

Repository: PoyrazK/thecloud

Length of output: 205


🏁 Script executed:

# Check how context is used in the internal service layer
head -50 ./internal/adapters/dns/powerdns.go

Repository: PoyrazK/thecloud

Length of output: 1347


🏁 Script executed:

# Check the core context file
cat -n ./internal/core/context/context.go

Repository: PoyrazK/thecloud

Length of output: 1476


🏁 Script executed:

# Look at a service interface to see the expected pattern
rg -n "type.*Service interface" internal/core/ports/ -A 5 | head -40

Repository: PoyrazK/thecloud

Length of output: 3712


Add context support to DNS SDK methods and the underlying HTTP client.

All DNS SDK methods perform blocking HTTP calls without accepting context.Context as the first parameter. This violates the coding guidelines and prevents callers from controlling request timeouts and cancellation. Update the HTTP client methods (get, post, delete, put, patch) in pkg/sdk/client.go to accept context.Context as the first parameter and use req.SetContext(ctx) before making resty requests. Then propagate context through all DNS methods: ListDNSZones(ctx context.Context), CreateDNSZone(ctx context.Context, ...), GetDNSZone(ctx context.Context, ...), DeleteDNSZone(ctx context.Context, ...), ListDNSRecords(ctx context.Context, ...), CreateDNSRecord(ctx context.Context, ...), and DeleteDNSRecord(ctx context.Context, ...).

🤖 Prompt for AI Agents
In `@pkg/sdk/dns.go` around lines 10 - 17, The DNS SDK methods are missing context
support; update the HTTP helper methods in Client (get, post, put, patch, delete
in pkg/sdk/client.go) to accept a context.Context as their first parameter and
call req.SetContext(ctx) on the resty request before executing it, then change
all DNS methods to accept ctx context.Context as their first parameter and pass
that ctx into the updated client methods: ListDNSZones(ctx context.Context),
CreateDNSZone(ctx context.Context,...), GetDNSZone(ctx context.Context,...),
DeleteDNSZone(ctx context.Context,...), ListDNSRecords(ctx context.Context,...),
CreateDNSRecord(ctx context.Context,...), and DeleteDNSRecord(ctx
context.Context,...). Ensure signatures and all internal calls to
c.get/c.post/etc. are updated to the new ctx-aware form.

Comment thread tests/dns_e2e_test.go
Comment on lines +1 to +24
package tests

import (
"encoding/json"
"fmt"
"io"
"net/http"
"testing"
"time"

"github.com/poyrazk/thecloud/internal/core/domain"
"github.com/poyrazk/thecloud/pkg/testutil"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)

func TestDNSE2E(t *testing.T) {
t.Parallel()
if err := waitForServer(); err != nil {
t.Fatalf("Failing DNS E2E test: %v", err)
}

client := &http.Client{Timeout: 15 * time.Second}
token := registerAndLogin(t, client, "dns-tester@thecloud.local", "DNS Tester")

⚠️ Potential issue | 🟠 Major

Gate E2E tests behind a build tag or env flag.
This test hits real services and currently runs with go test ./..., which violates the unit-test isolation rule. Add a build tag and/or an env guard.

🛠️ Suggested fix (build tag + env guard)
+//go:build e2e
+// +build e2e
+
 package tests

 import (
 	"encoding/json"
 	"fmt"
 	"io"
 	"net/http"
+	"os"
 	"testing"
 	"time"
@@
 func TestDNSE2E(t *testing.T) {
+	if os.Getenv("E2E") == "" {
+		t.Skip("set E2E=1 to run DNS E2E tests")
+	}
 	t.Parallel()
As per coding guidelines: Do not test with real external dependencies in unit tests - use mocks instead.
📝 Committable suggestion


Suggested change
package tests
import (
"encoding/json"
"fmt"
"io"
"net/http"
"testing"
"time"
"github.com/poyrazk/thecloud/internal/core/domain"
"github.com/poyrazk/thecloud/pkg/testutil"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestDNSE2E(t *testing.T) {
t.Parallel()
if err := waitForServer(); err != nil {
t.Fatalf("Failing DNS E2E test: %v", err)
}
client := &http.Client{Timeout: 15 * time.Second}
token := registerAndLogin(t, client, "dns-tester@thecloud.local", "DNS Tester")
//go:build e2e
// +build e2e
package tests
import (
"encoding/json"
"fmt"
"io"
"net/http"
"os"
"testing"
"time"
"github.com/poyrazk/thecloud/internal/core/domain"
"github.com/poyrazk/thecloud/pkg/testutil"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestDNSE2E(t *testing.T) {
if os.Getenv("E2E") == "" {
t.Skip("set E2E=1 to run DNS E2E tests")
}
t.Parallel()
if err := waitForServer(); err != nil {
t.Fatalf("Failing DNS E2E test: %v", err)
}
client := &http.Client{Timeout: 15 * time.Second}
token := registerAndLogin(t, client, "dns-tester@thecloud.local", "DNS Tester")
🤖 Prompt for AI Agents
In `@tests/dns_e2e_test.go` around lines 1 - 24, Add an opt-in guard so TestDNSE2E
(and helpers like waitForServer and registerAndLogin) only run when explicitly
requested: add a build tag (e.g. //go:build e2e) at the top of the file to
exclude it from regular `go test ./...` and also add an environment variable
guard at the start of TestDNSE2E that calls t.Skipf when the env var (e.g.
"RUN_E2E_TESTS") is not set to "1" (or similar); this ensures the test won’t
execute against real services unless both the build tag and env flag are
intentionally provided.

Copilot AI review requested due to automatic review settings January 30, 2026 11:48
Copilot AI left a comment

Pull request overview

Copilot reviewed 122 out of 124 changed files in this pull request and generated no new comments.



Copilot AI review requested due to automatic review settings January 30, 2026 11:59
Copilot AI left a comment

Pull request overview

Copilot reviewed 122 out of 124 changed files in this pull request and generated no new comments.



Copilot AI review requested due to automatic review settings January 30, 2026 12:11
@github-actions github-actions Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Go Benchmarks'.
The benchmark result of this commit is worse than the previous result, exceeding the alert threshold of 2.

| Benchmark suite | Current: 8be814e | Previous: d6e391b | Ratio |
| --- | --- | --- | --- |
| BenchmarkFunctionServiceInvoke | 60423 ns/op, 1656 B/op, 22 allocs/op | 29926 ns/op, 1656 B/op, 22 allocs/op | 2.02 |

This comment was automatically generated by workflow using github-action-benchmark.

Copilot AI left a comment

Pull request overview

Copilot reviewed 123 out of 125 changed files in this pull request and generated no new comments.



- fix(docker): switch default container command to /bin/sh for Alpine compatibility
- fix(instance): register DNS record before updating status to running to prevent test race conditions
- fix(e2e): normalize private IP (remove CIDR) in DNS record verification
- fix(e2e): ensure unique VPC names using UUIDs for parallel test safety
- fix(dns): improve logging for DNS registration debugging
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@internal/core/services/instance.go`:
- Around line 432-435: The call to
s.dnsSvc.UnregisterInstance(context.Background(), inst.ID) currently discards
any returned error with `_ =`; change this to capture the error and handle it
instead of silently ignoring it — e.g., err :=
s.dnsSvc.UnregisterInstance(context.Background(), inst.ID); if err != nil { /*
log via the service logger (e.g. s.logger.Errorf/Infof) or handle/return the
error */ } — reference the UnregisterInstance call on s.dnsSvc and use inst.ID
and context.Background() as shown to locate the call.

In `@tests/dns_instance_e2e_test.go`:
- Around line 163-194: In waitForInstanceRunning, the response body may leak if
json decoding fails because require.NoError aborts before resp.Body.Close(); fix
by ensuring resp.Body is closed before any assertions—e.g., immediately call
defer resp.Body.Close() after each getRequest() (or read the body into bytes and
close before calling json.Unmarshal/require.NoError). Update the body handling
in waitForInstanceRunning (the block that calls getRequest,
json.NewDecoder(...).Decode(&res), and subsequent resp.Body.Close()) so the
close happens prior to require.NoError or decoding assertions.
🧹 Nitpick comments (2)
internal/repositories/docker/adapter.go (1)

438-476: Extract magic numbers to named constants and rename shadowed variable.

Several issues in this retry implementation:

  1. Magic numbers: 30 (retry count) and 500 * time.Millisecond (backoff) should be named constants per coding guidelines.

  2. Variable naming: json shadows the standard library package name. Consider inspectResp or containerInfo.

  3. Non-idiomatic goto: The goto retry pattern can be simplified using continue directly.

♻️ Proposed refactor

Add constants at package level (near existing constants):

const (
	// IPRetryAttempts is the number of attempts to get container IP
	IPRetryAttempts = 30
	// IPRetryBackoff is the delay between IP lookup attempts
	IPRetryBackoff = 500 * time.Millisecond
)

Then refactor the function:

 func (a *DockerAdapter) GetInstanceIP(ctx context.Context, id string) (string, error) {
 	// Inspect container with retries to allow time for IP assignment
-	var json container.InspectResponse
+	var inspectResp container.InspectResponse
 	var err error
 
-	// Retry up to 30 times with 500ms backoff (15 seconds total)
-	for i := 0; i < 30; i++ {
-		json, err = a.cli.ContainerInspect(ctx, id)
+	for i := 0; i < IPRetryAttempts; i++ {
+		inspectResp, err = a.cli.ContainerInspect(ctx, id)
 		if err != nil {
 			return "", fmt.Errorf("failed to inspect container: %w", err)
 		}
 
-		// Add nil check for NetworkSettings
-		if json.NetworkSettings == nil {
-			goto retry
-		}
-
-		// Add nil check for Networks map
-		if len(json.NetworkSettings.Networks) == 0 {
-			goto retry
-		}
-
-		// Try to get IP from first network
-		for _, net := range json.NetworkSettings.Networks {
-			if net != nil && net.IPAddress != "" {
-				return net.IPAddress, nil
+		if inspectResp.NetworkSettings != nil && len(inspectResp.NetworkSettings.Networks) > 0 {
+			for _, net := range inspectResp.NetworkSettings.Networks {
+				if net != nil && net.IPAddress != "" {
+					return net.IPAddress, nil
+				}
 			}
 		}
 
-	retry:
 		// Wait and retry
 		select {
 		case <-ctx.Done():
 			return "", ctx.Err()
-		case <-time.After(500 * time.Millisecond):
+		case <-time.After(IPRetryBackoff):
 			continue
 		}
 	}

As per coding guidelines: "Do not use magic numbers - use named constants instead".

tests/dns_instance_e2e_test.go (1)

139-140: Consider polling instead of fixed sleep for cleanup verification.

The 2-second sleep for async cleanup may be flaky under load. A polling loop with timeout (similar to waitForInstanceRunning) would be more reliable.

Comment on lines +432 to +435

if s.dnsSvc != nil {
_ = s.dnsSvc.UnregisterInstance(context.Background(), inst.ID)
}

⚠️ Potential issue | 🟡 Minor

Don't silently discard DNS unregistration errors.

Line 434 uses _ = to ignore the error from UnregisterInstance, which violates the coding guideline against silent failures. Errors during cleanup should at least be logged.

🛠️ Proposed fix
 	if s.dnsSvc != nil {
-		_ = s.dnsSvc.UnregisterInstance(context.Background(), inst.ID)
+		if err := s.dnsSvc.UnregisterInstance(context.Background(), inst.ID); err != nil {
+			s.logger.Warn("failed to unregister instance DNS", "instance_id", inst.ID, "error", err)
+		}
 	}

As per coding guidelines: Do not use silent failures - avoid blank identifier assignment like _ = someFunc().

📝 Committable suggestion


Suggested change
if s.dnsSvc != nil {
_ = s.dnsSvc.UnregisterInstance(context.Background(), inst.ID)
}
if s.dnsSvc != nil {
if err := s.dnsSvc.UnregisterInstance(context.Background(), inst.ID); err != nil {
s.logger.Warn("failed to unregister instance DNS", "instance_id", inst.ID, "error", err)
}
}
🤖 Prompt for AI Agents
In `@internal/core/services/instance.go` around lines 432 - 435, The call to
s.dnsSvc.UnregisterInstance(context.Background(), inst.ID) currently discards
any returned error with `_ =`; change this to capture the error and handle it
instead of silently ignoring it — e.g., err :=
s.dnsSvc.UnregisterInstance(context.Background(), inst.ID); if err != nil { /*
log via the service logger (e.g. s.logger.Errorf/Infof) or handle/return the
error */ } — reference the UnregisterInstance call on s.dnsSvc and use inst.ID
and context.Background() as shown to locate the call.

Comment on lines +163 to +194
func waitForInstanceRunning(t *testing.T, client *http.Client, token, instanceID string) (string, error) {
timeout := 120 * time.Second
start := time.Now()
var privateIP string

for time.Since(start) < timeout {
resp := getRequest(t, client, fmt.Sprintf("%s/instances/%s", testutil.TestBaseURL, instanceID), token)
var res struct {
Data domain.Instance `json:"data"`
}
require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
_ = resp.Body.Close()

if res.Data.Status == domain.StatusRunning && res.Data.PrivateIP != "" {
privateIP = res.Data.PrivateIP
break
}
if res.Data.Status == domain.StatusError {
return "", fmt.Errorf("instance reached error state")
}
time.Sleep(2 * time.Second)
}

if privateIP == "" {
resp := getRequest(t, client, fmt.Sprintf("%s/instances/%s", testutil.TestBaseURL, instanceID), token)
body, _ := io.ReadAll(resp.Body)
_ = resp.Body.Close()
t.Logf("Final instance state: %s", string(body))
}

return privateIP, nil
}

⚠️ Potential issue | 🟡 Minor

Response body may leak on JSON decode errors.

The response body is closed after json.NewDecoder().Decode() and require.NoError(). If decoding fails, require.NoError terminates the test before resp.Body.Close() executes, potentially leaking the connection.

🛠️ Proposed fix - close body before assertions
 	for time.Since(start) < timeout {
 		resp := getRequest(t, client, fmt.Sprintf("%s/instances/%s", testutil.TestBaseURL, instanceID), token)
+		defer func() { _ = resp.Body.Close() }()
 		var res struct {
 			Data domain.Instance `json:"data"`
 		}
-		require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
-		_ = resp.Body.Close()
+		err := json.NewDecoder(resp.Body).Decode(&res)
+		require.NoError(t, err)

Note: Using defer inside a loop accumulates defers, but given the bounded timeout and short iterations, this is acceptable for test code. Alternatively, read the body into bytes first.

🤖 Prompt for AI Agents
In `@tests/dns_instance_e2e_test.go` around lines 163 - 194, In
waitForInstanceRunning, the response body may leak if json decoding fails
because require.NoError aborts before resp.Body.Close(); fix by ensuring
resp.Body is closed before any assertions—e.g., immediately call defer
resp.Body.Close() after each getRequest() (or read the body into bytes and close
before calling json.Unmarshal/require.NoError). Update the body handling in
waitForInstanceRunning (the block that calls getRequest,
json.NewDecoder(...).Decode(&res), and subsequent resp.Body.Close()) so the
close happens prior to require.NoError or decoding assertions.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@tests/compute_e2e_test.go`:
- Line 28: The instance name generation uses a magic number (%1000) in the
fmt.Sprintf call for variable instanceName; remove the modulo to rely on
time.Now().UnixNano() and uuid.New().String() or replace %1000 with a named
constant (e.g., instanceSuffixModulo) declared at the top of the test file, then
update the fmt.Sprintf invocation (which references time.Now().UnixNano() and
uuid.New().String()) to use the constant or omit the modulo entirely so no magic
number remains.
- Around line 38-39: Create a small test helper (e.g., assertClose or
requireClose with signature (t testing.TB, c io.Closer)) that calls t.Helper(),
closes the closer and fails the test (t.Fatalf or t.Errorf) if Close() returns
an error, then replace every defer _ = resp.Body.Close() (and other silent
closes in this file) with defer assertClose(t, resp.Body) so Close errors are
asserted instead of ignored; update all occurrences mentioned (the resp variable
closes at the listed locations) to use this helper.
🧹 Nitpick comments (1)
tests/compute_e2e_test.go (1)

31-147: Consider table-driving the repeated subtest steps.

A small table of step names + functions would reduce repetition and align with the test style guidance.

As per coding guidelines: Use table-driven tests in test files.

Comment thread tests/compute_e2e_test.go

var instanceID string
instanceName := fmt.Sprintf("e2e-inst-%d", time.Now().UnixNano()%1000)
instanceName := fmt.Sprintf("e2e-inst-%d-%s", time.Now().UnixNano()%1000, uuid.New().String())

⚠️ Potential issue | 🟡 Minor

Remove the magic number in the instance name suffix.

%1000 is a magic number and unnecessary now that a UUID is appended; consider dropping the modulo (or replacing it with a named constant).

Suggested change
-	instanceName := fmt.Sprintf("e2e-inst-%d-%s", time.Now().UnixNano()%1000, uuid.New().String())
+	instanceName := fmt.Sprintf("e2e-inst-%d-%s", time.Now().UnixNano(), uuid.New().String())

As per coding guidelines: Do not use magic numbers - use named constants instead.

📝 Committable suggestion


Suggested change
instanceName := fmt.Sprintf("e2e-inst-%d-%s", time.Now().UnixNano()%1000, uuid.New().String())
instanceName := fmt.Sprintf("e2e-inst-%d-%s", time.Now().UnixNano(), uuid.New().String())
🤖 Prompt for AI Agents
In `@tests/compute_e2e_test.go` at line 28, The instance name generation uses a
magic number (%1000) in the fmt.Sprintf call for variable instanceName; remove
the modulo to rely on time.Now().UnixNano() and uuid.New().String() or replace
%1000 with a named constant (e.g., instanceSuffixModulo) declared at the top of
the test file, then update the fmt.Sprintf invocation (which references
time.Now().UnixNano() and uuid.New().String()) to use the constant or omit the
modulo entirely so no magic number remains.

Comment thread tests/compute_e2e_test.go
Comment on lines +38 to 39
defer func() { _ = resp.Body.Close() }()


⚠️ Potential issue | 🟡 Minor

Don’t ignore Close() errors.

Using _ = resp.Body.Close() silently drops errors. Prefer a small helper that asserts on close errors and reuse it everywhere.

Suggested change
+func closeBody(t *testing.T, resp *http.Response) {
+	t.Helper()
+	assert.NoError(t, resp.Body.Close())
+}
+
 func TestComputeE2E(t *testing.T) {
@@
-		defer func() { _ = resp.Body.Close() }()
+		defer closeBody(t, resp)
@@
-		defer func() { _ = resp.Body.Close() }()
+		defer closeBody(t, resp)
@@
-			_ = resp.Body.Close()
+			closeBody(t, resp)
@@
-		defer func() { _ = resp.Body.Close() }()
+		defer closeBody(t, resp)
@@
-		defer func() { _ = resp.Body.Close() }()
+		defer closeBody(t, resp)
@@
-		defer func() { _ = resp.Body.Close() }()
+		defer closeBody(t, resp)
@@
-		defer func() { _ = resp.Body.Close() }()
+		defer closeBody(t, resp)

As per coding guidelines: Do not use silent failures - avoid blank identifier assignment like _ = someFunc().

Also applies to: 55-56, 79-80, 106-107, 129-130, 137-138, 145-146

🤖 Prompt for AI Agents
In `@tests/compute_e2e_test.go` around lines 38 - 39, Create a small test helper
(e.g., assertClose or requireClose with signature (t testing.TB, c io.Closer))
that calls t.Helper(), closes the closer and fails the test (t.Fatalf or
t.Errorf) if Close() returns an error, then replace every defer _ =
resp.Body.Close() (and other silent closes in this file) with defer
assertClose(t, resp.Body) so Close errors are asserted instead of ignored;
update all occurrences mentioned (the resp variable closes at the listed
locations) to use this helper.

- fix(docker): add 30s timeout for container stop operations
- fix(tests): increase HTTP client timeout to 60s for slow operations in CI
- fix(tests): use UUID for unique instance names to prevent conflicts

This resolves timeout issues when stopping containers in CI environments
where operations may take longer than in local development.
- fix(cli): use cobra.CheckErr for MarkFlagRequired in autoscaling, loadbalancer, notify, snapshot, and volume commands
- fix(cli): handle table.Append and table.Render errors in snapshot and volume commands
- fix(dns): return SOA record creation errors instead of silently logging them
- fix(websocket): log connection close, SetReadDeadline, and SetWriteDeadline errors instead of ignoring them
- fix(ssh): handle SCP header write errors by sending to error channel

These changes ensure errors are surfaced and handled properly instead of being silently ignored,
improving debuggability and preventing silent failures.
@sonarqubecloud

Quality Gate failed

Failed conditions
13 Security Hotspots

See analysis details on SonarQube Cloud


@coderabbitai (bot) left a comment


Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@cmd/cloud/autoscaling.go`:
- Around line 88-90: Replace the silent ignores on table.Append and table.Render
by checking and handling errors like in volume.go; specifically, wrap the calls
to table.Append(...) and table.Render() with cobra.CheckErr (or otherwise handle
the returned error) so failures aren’t discarded—locate the uses of table.Append
and table.Render in the autoscaling listing function and change `_ =
table.Append(...)` and `_ = table.Render()` to
`cobra.CheckErr(table.Append(...))` and `cobra.CheckErr(table.Render())` (or
equivalent error handling).

In `@internal/adapters/dns/powerdns.go`:
- Around line 62-69: The SOA creation uses magic numbers directly (the serial
and SOA timing values and TTL) in soaContent and soaRecord; define named
constants (e.g., soaSerial, soaRefresh, soaRetry, soaExpire, soaMinimum,
defaultTTL) at the package scope and replace the literal values in fmt.Sprintf
and the RecordSet TTL with those constants so the SOA fields are
self-documenting and configurable; update the fmt.Sprintf call that builds
soaContent and the ports.RecordSet{ TTL: ... } to use the new constants and keep
existing symbols soaContent, soaRecord, nameservers and zoneName unchanged.

In `@internal/handlers/ws/client.go`:
- Around line 73-76: The defer in WritePump currently discards the error from
c.conn.Close(); replace the blank assignment with proper error handling: call
c.conn.Close(), capture the returned error (err := c.conn.Close()) and, if
non-nil, log it the same way ReadPump does (use the same logger/format used in
ReadPump) so connection-close failures are not silently ignored; ensure
ticker.Stop() remains and the defer mirrors the ReadPump error-handling pattern.
- Around line 80-98: The message send path currently ignores errors from
c.conn.SetWriteDeadline, c.conn.WriteMessage, c.conn.NextWriter and w.Write
which can hide broken connections; update the send case that reads from c.send
so each call checks and handles errors like the ping branch does: capture and
handle SetWriteDeadline error, check the result of c.conn.WriteMessage (when
!ok), check err from c.conn.NextWriter (already returned but ensure any
cleanup/logging), and check the error returns from w.Write both for the initial
message and for each batched write (the <-c.send loop); on any write error
close/cleanup the connection and return or log appropriately, mirroring the
error-handling pattern used in the ping handling code.

In `@internal/repositories/docker/adapter.go`:
- Around line 197-201: Replace the hard-coded stop timeout "30" in
DockerAdapter.StopInstance with a named constant derived from
DefaultOperationTimeout (or introduce a local named constant like
defaultStopTimeout) so the value is consistent across the codebase; update the
Timeout passed into container.StopOptions to reference that constant (ensuring
the type matches the expected pointer type) instead of the magic number.
- Around line 441-479: The loop in the container IP inspect block uses magic
numbers (30 retries, 500ms delay); replace them with named constants (e.g.,
inspectRetryCount int = 30 and inspectRetryDelay = 500 * time.Millisecond) and
use those constants in the for loop and time.After call inside the retry logic
in the method that calls a.cli.ContainerInspect (the Inspect/ContainerInspect
retry block in adapter.go). Ensure the constants are typed appropriately (int
for count, time.Duration for delay), update the for loop to iterate to
inspectRetryCount and the select case to use time.After(inspectRetryDelay), and
keep the existing ctx.Done() behavior unchanged.

In `@tests/compute_e2e_test.go`:
- Line 24: Replace the magic timeout value used when constructing the HTTP
client by introducing a named constant (e.g., defaultHTTPClientTimeout) and use
it in the http.Client{Timeout: ...} expression; update the instantiation at the
site where client := &http.Client{Timeout: 60 * time.Second} to use the new
constant, and ensure the constant is defined at package scope near related test
constants so timeouts remain consistent and self-documenting across tests that
use http.Client or Timeout.
🧹 Nitpick comments (1)
cmd/cloud/snapshot.go (1)

33-33: Consider handling remaining silent failures for consistency.

While this PR correctly fixes the table and flag errors, there are other instances of _ = assignments that could be addressed for consistency:

  • Lines 33, 82, 118: json.MarshalIndent errors ignored
  • Lines 72, 108: cmd.Flags().GetString errors ignored

For GetString, the error is typically only returned if the flag doesn't exist (which would be a developer bug), so cobra.CheckErr would be appropriate. For MarshalIndent, errors are rare but could occur with types that have custom marshalers.

Comment thread cmd/cloud/autoscaling.go
Comment on lines +88 to +90
 		_ = table.Append([]string{g.ID, g.Name, instances, g.Status})
 	}
-	table.Render()
+	_ = table.Render()

⚠️ Potential issue | 🟠 Major

Handle table output errors consistently with volume.go.

These lines use _ = table.Append(...) and _ = table.Render(), which violates the coding guideline against silent failures. This is also inconsistent with cmd/cloud/volume.go where the same operations are wrapped with cobra.CheckErr.

Proposed fix for consistency
 		for _, g := range groups {
 			instances := fmt.Sprintf("%d / %d / %d / %d", g.CurrentCount, g.DesiredCount, g.MinInstances, g.MaxInstances)
-			_ = table.Append([]string{g.ID, g.Name, instances, g.Status})
+			cobra.CheckErr(table.Append([]string{g.ID, g.Name, instances, g.Status}))
 		}
-		_ = table.Render()
+		cobra.CheckErr(table.Render())

Based on learnings: "Do not use silent failures - avoid blank identifier assignment like _ = someFunc()"

🤖 Prompt for AI Agents
In `@cmd/cloud/autoscaling.go` around lines 88 - 90, Replace the silent ignores on
table.Append and table.Render by checking and handling errors like in volume.go;
specifically, wrap the calls to table.Append(...) and table.Render() with
cobra.CheckErr (or otherwise handle the returned error) so failures aren’t
discarded—locate the uses of table.Append and table.Render in the autoscaling
listing function and change `_ = table.Append(...)` and `_ = table.Render()` to
`cobra.CheckErr(table.Append(...))` and `cobra.CheckErr(table.Render())` (or
equivalent error handling).

Comment on lines +62 to +69
	// Add an initial SOA record to satisfy PowerDNS requirements for Native zones
	soaContent := fmt.Sprintf("%s hostmaster.%s 1 10800 3600 604800 3600", nameservers[0], zoneName)
	soaRecord := ports.RecordSet{
		Name:    zoneName,
		Type:    "SOA",
		Records: []string{soaContent},
		TTL:     3600,
	}

⚠️ Potential issue | 🟡 Minor

Replace SOA magic numbers with named constants.

This makes the SOA fields easier to audit and tweak safely.

♻️ Suggested refactor
+const (
+	soaSerial  = 1
+	soaRefresh = 10800
+	soaRetry   = 3600
+	soaExpire  = 604800
+	soaMinimum = 3600
+	soaTTL     = 3600
+)
+
 	// Add an initial SOA record to satisfy PowerDNS requirements for Native zones
-	soaContent := fmt.Sprintf("%s hostmaster.%s 1 10800 3600 604800 3600", nameservers[0], zoneName)
+	soaContent := fmt.Sprintf("%s hostmaster.%s %d %d %d %d %d", nameservers[0], zoneName, soaSerial, soaRefresh, soaRetry, soaExpire, soaMinimum)
 	soaRecord := ports.RecordSet{
 		Name:    zoneName,
 		Type:    "SOA",
 		Records: []string{soaContent},
-		TTL:     3600,
+		TTL:     soaTTL,
 	}
As per coding guidelines: Do not use magic numbers - use named constants instead.
🤖 Prompt for AI Agents
In `@internal/adapters/dns/powerdns.go` around lines 62 - 69, The SOA creation
uses magic numbers directly (the serial and SOA timing values and TTL) in
soaContent and soaRecord; define named constants (e.g., soaSerial, soaRefresh,
soaRetry, soaExpire, soaMinimum, defaultTTL) at the package scope and replace
the literal values in fmt.Sprintf and the RecordSet TTL with those constants so
the SOA fields are self-documenting and configurable; update the fmt.Sprintf
call that builds soaContent and the ports.RecordSet{ TTL: ... } to use the new
constants and keep existing symbols soaContent, soaRecord, nameservers and
zoneName unchanged.

Comment on lines 73 to 76
 	defer func() {
 		ticker.Stop()
-		c.conn.Close()
+		_ = c.conn.Close()
 	}()

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n internal/handlers/ws/client.go | head -100

Repository: PoyrazK/thecloud

Length of output: 3125


Handle Close error in WritePump defer block.

The connection close error is silently discarded, hiding potential teardown failures. ReadPump already follows the proper pattern (lines 40-45).

Suggested fix
 defer func() {
 	ticker.Stop()
-	_ = c.conn.Close()
+	if err := c.conn.Close(); err != nil {
+		c.logger.Error("failed to close connection", slog.String("error", err.Error()))
+	}
 }()

Per coding guidelines: "Do not use silent failures - avoid blank identifier assignment like _ = someFunc()".

🤖 Prompt for AI Agents
In `@internal/handlers/ws/client.go` around lines 73 - 76, The defer in WritePump
currently discards the error from c.conn.Close(); replace the blank assignment
with proper error handling: call c.conn.Close(), capture the returned error (err
:= c.conn.Close()) and, if non-nil, log it the same way ReadPump does (use the
same logger/format used in ReadPump) so connection-close failures are not
silently ignored; ensure ticker.Stop() remains and the defer mirrors the
ReadPump error-handling pattern.

Comment on lines 80 to 98
 		case message, ok := <-c.send:
-			c.conn.SetWriteDeadline(time.Now().Add(writeWait))
+			_ = c.conn.SetWriteDeadline(time.Now().Add(writeWait))
 			if !ok {
-				c.conn.WriteMessage(websocket.CloseMessage, []byte{})
+				_ = c.conn.WriteMessage(websocket.CloseMessage, []byte{})
 				return
 			}
 
 			w, err := c.conn.NextWriter(websocket.TextMessage)
 			if err != nil {
 				return
 			}
-			w.Write(message)
+			_, _ = w.Write(message)
 
 			// Batch queued messages
 			n := len(c.send)
 			for i := 0; i < n; i++ {
-				w.Write([]byte{'\n'})
-				w.Write(<-c.send)
+				_, _ = w.Write([]byte{'\n'})
+				_, _ = w.Write(<-c.send)
 			}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n internal/handlers/ws/client.go | head -120 | tail -50

Repository: PoyrazK/thecloud

Length of output: 1469


🏁 Script executed:

wc -l internal/handlers/ws/client.go

Repository: PoyrazK/thecloud

Length of output: 94


🏁 Script executed:

# Check if the file exists and examine the exact code
fd -t f "client.go" internal/handlers/ws/

Repository: PoyrazK/thecloud

Length of output: 90


🏁 Script executed:

rg -A 10 "type Client struct" internal/handlers/ws/client.go

Repository: PoyrazK/thecloud

Length of output: 331


Handle errors on the WebSocket message write path to prevent silent failures.

Lines 81, 83, 91, 96–97 silently discard errors from SetWriteDeadline, WriteMessage, and w.Write calls. These errors can mask broken connections and dropped messages. The codebase already uses proper error handling in the ping branch (lines 105–110); apply the same pattern to the message branch.

Suggested fix
 case message, ok := <-c.send:
-	_ = c.conn.SetWriteDeadline(time.Now().Add(writeWait))
+	if err := c.conn.SetWriteDeadline(time.Now().Add(writeWait)); err != nil {
+		c.logger.Error("failed to set write deadline", slog.String("error", err.Error()))
+		return
+	}
 	if !ok {
-		_ = c.conn.WriteMessage(websocket.CloseMessage, []byte{})
+		if err := c.conn.WriteMessage(websocket.CloseMessage, []byte{}); err != nil {
+			c.logger.Error("failed to write close message", slog.String("error", err.Error()))
+		}
 		return
 	}
 
 	w, err := c.conn.NextWriter(websocket.TextMessage)
 	if err != nil {
 		return
 	}
-	_, _ = w.Write(message)
+	if _, err := w.Write(message); err != nil {
+		c.logger.Error("failed to write message", slog.String("error", err.Error()))
+		return
+	}
 
 	// Batch queued messages
 	n := len(c.send)
 	for i := 0; i < n; i++ {
-		_, _ = w.Write([]byte{'\n'})
-		_, _ = w.Write(<-c.send)
+		if _, err := w.Write([]byte{'\n'}); err != nil {
+			c.logger.Error("failed to write newline", slog.String("error", err.Error()))
+			return
+		}
+		if _, err := w.Write(<-c.send); err != nil {
+			c.logger.Error("failed to write queued message", slog.String("error", err.Error()))
+			return
+		}
 	}
🤖 Prompt for AI Agents
In `@internal/handlers/ws/client.go` around lines 80 - 98, The message send path
currently ignores errors from c.conn.SetWriteDeadline, c.conn.WriteMessage,
c.conn.NextWriter and w.Write which can hide broken connections; update the send
case that reads from c.send so each call checks and handles errors like the ping
branch does: capture and handle SetWriteDeadline error, check the result of
c.conn.WriteMessage (when !ok), check err from c.conn.NextWriter (already
returned but ensure any cleanup/logging), and check the error returns from
w.Write both for the initial message and for each batched write (the <-c.send
loop); on any write error close/cleanup the connection and return or log
appropriately, mirroring the error-handling pattern used in the ping handling
code.

Comment on lines 197 to +201
 func (a *DockerAdapter) StopInstance(ctx context.Context, name string) error {
-	err := a.cli.ContainerStop(ctx, name, container.StopOptions{})
+	timeout := 30
+	err := a.cli.ContainerStop(ctx, name, container.StopOptions{
+		Timeout: &timeout,
+	})

⚠️ Potential issue | 🟡 Minor

Avoid the stop timeout magic number.

Derive it from DefaultOperationTimeout (or a named constant) so it stays consistent.

♻️ Suggested refactor
-	timeout := 30
+	timeoutSeconds := int(DefaultOperationTimeout.Seconds())
 	err := a.cli.ContainerStop(ctx, name, container.StopOptions{
-		Timeout: &timeout,
+		Timeout: &timeoutSeconds,
 	})
As per coding guidelines: Do not use magic numbers - use named constants instead.
🤖 Prompt for AI Agents
In `@internal/repositories/docker/adapter.go` around lines 197 - 201, Replace the
hard-coded stop timeout "30" in DockerAdapter.StopInstance with a named constant
derived from DefaultOperationTimeout (or introduce a local named constant like
defaultStopTimeout) so the value is consistent across the codebase; update the
Timeout passed into container.StopOptions to reference that constant (ensuring
the type matches the expected pointer type) instead of the magic number.

Comment on lines +441 to +479
 	// Inspect container with retries to allow time for IP assignment
 	var json container.InspectResponse
 	var err error
 
 	// Retry up to 30 times with 500ms backoff (15 seconds total)
 	for i := 0; i < 30; i++ {
 		json, err = a.cli.ContainerInspect(ctx, id)
 		if err != nil {
 			return "", fmt.Errorf("failed to inspect container: %w", err)
 		}
 
+		// Add nil check for NetworkSettings
+		if json.NetworkSettings == nil {
+			goto retry
+		}
+
+		// Add nil check for Networks map
+		if len(json.NetworkSettings.Networks) == 0 {
+			goto retry
+		}
+
 		// Try to get IP from first network
 		for _, net := range json.NetworkSettings.Networks {
-			if net.IPAddress != "" {
+			if net != nil && net.IPAddress != "" {
 				return net.IPAddress, nil
 			}
 		}
 
+	retry:
 		// Wait and retry
 		select {
 		case <-ctx.Done():
 			return "", ctx.Err()
 		case <-time.After(500 * time.Millisecond):
 			continue
 		}
 	}
-	return "", fmt.Errorf("no IP address found for container %s", id)
+
+	return "", fmt.Errorf("no IP address found for container %s after retries", id)

⚠️ Potential issue | 🟡 Minor

Replace retry/backoff magic numbers with named constants.

This makes retries easier to tune and audit.

♻️ Suggested refactor
 const (
 	// ImagePullTimeout is the maximum time allowed for pulling a Docker image
 	ImagePullTimeout = 5 * time.Minute
 	// DefaultOperationTimeout is the default timeout for Docker operations
 	DefaultOperationTimeout = 30 * time.Second
 	// tracerName is the name of the tracer for this package
 	tracerName = "docker-adapter"
+	ipAssignMaxRetries  = 30
+	ipAssignRetryBackoff = 500 * time.Millisecond
 )
-	for i := 0; i < 30; i++ {
+	for i := 0; i < ipAssignMaxRetries; i++ {
 		json, err = a.cli.ContainerInspect(ctx, id)
 		if err != nil {
 			return "", fmt.Errorf("failed to inspect container: %w", err)
 		}
@@
-		case <-time.After(500 * time.Millisecond):
+		case <-time.After(ipAssignRetryBackoff):
 			continue
 		}
 	}
As per coding guidelines: Do not use magic numbers - use named constants instead.
🤖 Prompt for AI Agents
In `@internal/repositories/docker/adapter.go` around lines 441 - 479, The loop in
the container IP inspect block uses magic numbers (30 retries, 500ms delay);
replace them with named constants (e.g., inspectRetryCount int = 30 and
inspectRetryDelay = 500 * time.Millisecond) and use those constants in the for
loop and time.After call inside the retry logic in the method that calls
a.cli.ContainerInspect (the Inspect/ContainerInspect retry block in adapter.go).
Ensure the constants are typed appropriately (int for count, time.Duration for
delay), update the for loop to iterate to inspectRetryCount and the select case
to use time.After(inspectRetryDelay), and keep the existing ctx.Done() behavior
unchanged.

Comment thread tests/compute_e2e_test.go
}

-	client := &http.Client{Timeout: 10 * time.Second}
+	client := &http.Client{Timeout: 60 * time.Second}

⚠️ Potential issue | 🟡 Minor

Replace the new timeout magic number with a named constant.

Helps keep timeouts consistent and self-documenting.

Suggested change
 func TestComputeE2E(t *testing.T) {
 	t.Parallel()
+	const httpClientTimeout = 60 * time.Second
 	if err := waitForServer(); err != nil {
 		t.Fatalf("Failing Compute E2E test: %v", err)
 	}

-	client := &http.Client{Timeout: 60 * time.Second}
+	client := &http.Client{Timeout: httpClientTimeout}
As per coding guidelines: Do not use magic numbers - use named constants instead.
🤖 Prompt for AI Agents
In `@tests/compute_e2e_test.go` at line 24, Replace the magic timeout value used
when constructing the HTTP client by introducing a named constant (e.g.,
defaultHTTPClientTimeout) and use it in the http.Client{Timeout: ...}
expression; update the instantiation at the site where client :=
&http.Client{Timeout: 60 * time.Second} to use the new constant, and ensure the
constant is defined at package scope near related test constants so timeouts
remain consistent and self-documenting across tests that use http.Client or
Timeout.

@poyrazK poyrazK merged commit 7c24a8e into main Jan 30, 2026
25 of 26 checks passed
@poyrazK poyrazK deleted the feature/dns-service-e2e-tests branch February 9, 2026 22:33