feat: OVS Networking Integration#14

Merged
poyrazK merged 30 commits into main from feature/ovs-networking on Jan 8, 2026

Conversation


poyrazK (Owner) commented Jan 8, 2026

Integrated Open vSwitch (OVS) as a networking backend for The Cloud. This PR includes:

  • OVS Adapter implementation
  • VPC Service updates to support OVS bridges
  • Cleaner migration logs (emojis removed)
  • Removal of deprecated release workflows

Summary by CodeRabbit

  • New Features

    • Open vSwitch (OVS) SDN support: subnets, VXLAN isolation, security groups, and OVS health check endpoint.
    • New CLI and SDK commands for managing VPCs, subnets, and security groups; instance launches can target subnets and receive private IPs.
  • Enhancements

    • VPC creation accepts CIDR and now surfaces CIDR, VXLAN ID, status, and ARN.
    • Improved instance/network wiring and richer API responses.
  • Documentation

    • Updated API, CLI, architecture, roadmap, and setup docs for OVS/subnets/security groups.
  • Chores

    • Local compose and setup updated to enable OVS; release workflow removed.


poyrazK added 22 commits January 8, 2026 16:27
Copilot AI review requested due to automatic review settings January 8, 2026 16:38

coderabbitai Bot commented Jan 8, 2026

📝 Walkthrough

Walkthrough

Adds Open vSwitch (OVS) networking and subnet/security‑group support across the stack: new NetworkBackend interface with OVS and noop adapters, DB schema/migrations, repos, services, handlers, CLI/SDK changes, instance veth/OVS plumbing, docker/runtime privileges, tests, and documentation updates.

Changes

Cohort / File(s) Summary
OVS Adapter & Setup
internal/repositories/ovs/adapter.go, internal/repositories/ovs/adapter_test.go, scripts/setup-ovs.sh, .agent/workflows/setup-ovs.md
New OvsAdapter implementation (bridge/VXLAN/port/flow/veth/IP ops), integration test, OVS install script and workflow doc.
Networking Ports & Interfaces
internal/core/ports/... (network.go, instance.go, vpc.go, subnet.go, security_group.go)
New NetworkBackend and FlowRule; added Subnet/SecurityGroup port interfaces; Instance/VPC ports extended (subnetID, cidrBlock); ListBySubnet added.
Domain Models & Config
internal/core/domain/*, internal/platform/config.go, .env.example
New Subnet and SecurityGroup domain types; Instance adds SubnetID/PrivateIP/OvsPort; VPC adds CIDRBlock/VXLANID/Status/ARN; RBAC adds vpc:update; config defaults (DEFAULT_VPC_CIDR, pool range).
Services & Business Logic
internal/core/services/* (vpc, subnet, security_group, instance, stack, autoscaling, tests)
VpcService now uses NetworkBackend and creates OVS bridges/VXLANs; SubnetService manages CIDR/gateway; SecurityGroupService persists rules and syncs flows to network; InstanceService allocates subnet IPs, creates veth pairs, attaches to OVS and records network fields; constructors and signatures updated.
Postgres Repos & Migrations
internal/repositories/postgres/*, internal/repositories/postgres/migrations/*
New/updated repositories for VPC, Subnet, SecurityGroup, Instance network fields; instance ListBySubnet; many migrations added (vpc OVS fields, security groups, subnets, instance network columns) and goose headers.
API Handlers & Routing
internal/handlers/*, cmd/api/main.go
New Subnet and SecurityGroup handlers/routes; instance Launch and VPC Create handlers accept subnet_id and cidr_block; OVS health endpoint added and wired.
SDK & CLI
pkg/sdk/*, cmd/cloud/*.go
SDK adds subnet/security‑group clients/models; LaunchInstance and CreateVPC signatures changed; CLI adds subnet and sg commands/flags and updates examples.
Tests & Mocks
internal/core/services/*_test.go, internal/handlers/*_test.go, internal/repositories/docker/*_test.go, internal/core/services/shared_test.go
Tests and mocks updated for new signatures and behaviors; MockSubnetRepo and MockNetworkBackend added; container tests adapted to instance‑oriented API; integration OVS test included.
Runtime / Tooling
Dockerfile, docker-compose.yml, Makefile, .gitignore, .github/workflows/release.yml
Dockerfile adds openvswitch; docker-compose grants NET_ADMIN/NET_RAW and mounts /var/run/openvswitch; Makefile adds cleanup/dev targets; .env/config defaults added; release workflow removed; .gitignore entries made root-relative.
Docs / Swagger / Roadmaps
README.md, docs/*, docs/swagger/*, .agent/workflows/setup-ovs.md
Documentation, API reference and swagger updated to include OVS/subnets/security groups and new fields; roadmaps and deployment docs adjusted; workflow for OVS setup documented.

Sequence Diagram(s)

sequenceDiagram
participant CLI as CLI / SDK
participant API as API Server
participant VpcSvc as VpcService
participant Net as NetworkBackend (OVS)
participant DB as Postgres

CLI->>API: POST /vpcs { name, cidr_block }
API->>VpcSvc: CreateVPC(ctx, name, cidr)
VpcSvc->>Net: CreateBridge(ctx, bridgeName, vxlanID)
VpcSvc->>DB: INSERT vpc (network_id=bridgeName, cidr_block, vxlan_id, status, arn)
alt DB success
VpcSvc-->>API: return VPC
else DB failure
VpcSvc->>Net: DeleteBridge(ctx, bridgeName)
VpcSvc-->>API: return error
end
sequenceDiagram
participant CLI as CLI / SDK
participant API as API Server
participant InstSvc as InstanceService
participant SubnetRepo as SubnetRepository
participant Net as NetworkBackend (OVS)
participant DB as Postgres
participant Runtime as Compute Adapter

CLI->>API: POST /instances { name, image, vpc_id, subnet_id, volumes }
API->>InstSvc: LaunchInstance(ctx, name, image, ports, vpcID, subnetID, volumes)
InstSvc->>SubnetRepo: GetByID(ctx, subnetID)
SubnetRepo->>DB: SELECT subnet row
DB-->>SubnetRepo: subnet (cidr, gateway)
InstSvc->>InstSvc: allocateIP(subnet)
InstSvc->>Runtime: CreateInstance(...)  -- launch container
Runtime-->>InstSvc: container id
InstSvc->>Net: CreateVethPair(ctx, hostEnd, containerEnd)
InstSvc->>Net: AttachVethToBridge(ctx, bridgeName, hostEnd)
InstSvc->>Net: SetVethIP(ctx, hostEnd, ip, cidr)
InstSvc->>DB: INSERT instance with subnet_id, private_ip, ovs_port
InstSvc-->>API: return Instance

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰
I nibble cables, spin a bridge of green,
OVS leaves shimmer where the packets preen,
Subnets like meadows, rules keep the way,
I hop through tunnels where the VXLANs play,
A joyful thump — our network's seen!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 14.29%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the pull request title 'feat: OVS Networking Integration' is concise, specific, and clearly reflects the primary change: the integration of Open vSwitch as a networking backend.





Copilot AI left a comment


Pull request overview

This PR integrates Open vSwitch (OVS) as the networking backend for The Cloud, replacing Docker networks with software-defined networking capabilities. The changes introduce subnet management, security groups with OpenFlow rules, and advanced IPAM functionality.

Key Changes:

  • OVS adapter implementation with bridge, VXLAN, and flow rule management
  • VPC service refactored from Docker networks to OVS bridges with CIDR and VXLAN support
  • New Subnet and Security Group services with full CRUD operations
  • Instance networking enhanced with veth pair creation and private IP allocation

Reviewed changes

Copilot reviewed 106 out of 107 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
internal/repositories/ovs/adapter.go New OVS adapter implementing NetworkBackend interface for bridge/port/flow management
internal/core/services/vpc.go Refactored to use OVS bridges instead of Docker networks
internal/core/services/subnet.go New service for subnet CRUD with CIDR validation and gateway IP calculation
internal/core/services/security_group.go New service translating security rules to OVS flow rules
internal/core/services/instance.go Enhanced with veth pair setup, IP allocation, and OVS bridge attachment
internal/repositories/postgres/migrations/029_update_vpcs_for_ovs.up.sql Migration adding CIDR, VXLAN, status, and ARN columns to VPCs
internal/repositories/postgres/migrations/030-032_*.sql New tables for security groups, subnets, and instance network fields
cmd/api/main.go Added OVS adapter initialization and new handler registrations
pkg/sdk/*.go Updated SDK with new CIDR parameter and subnet/security group operations
cmd/cloud/*.go New CLI commands for subnet and security group management
docker-compose.yml Added NET_ADMIN capability and OVS socket mount
scripts/setup-ovs.sh New setup script for OVS installation and configuration
scripts/setup_path.sh Removed emojis from output messages


)

var (
bridgeNameRegex = regexp.MustCompile(`^[a-zA-Z0-9-]+$`)

Copilot AI Jan 8, 2026


The regex only allows alphanumeric characters and hyphens, but OVS bridge names can also contain underscores. Additionally, there's no check for maximum length (OVS typically limits interface names to 15 characters). This could cause validation to reject valid bridge names or allow names that will fail when passed to OVS.

Update the regex to match OVS's actual bridge name requirements and add length validation.
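The fix this comment asks for can be sketched as below, assuming the Linux IFNAMSIZ limit of 15 visible characters applies to OVS bridge names (OVS bridges become network interfaces); the identifiers `validateBridgeName` and `maxIfaceNameLen` are illustrative, not the PR's actual names, and the exact character set should be verified against the OVS version in use:

```go
package main

import (
	"fmt"
	"regexp"
)

// Allow underscores in addition to alphanumerics and hyphens.
var ovsBridgeNameRegex = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)

// maxIfaceNameLen mirrors the Linux IFNAMSIZ limit of 15 visible
// characters (16 bytes including the trailing NUL).
const maxIfaceNameLen = 15

func validateBridgeName(name string) error {
	if name == "" || len(name) > maxIfaceNameLen {
		return fmt.Errorf("bridge name must be 1-%d characters, got %d", maxIfaceNameLen, len(name))
	}
	if !ovsBridgeNameRegex.MatchString(name) {
		return fmt.Errorf("bridge name %q contains invalid characters", name)
	}
	return nil
}

func main() {
	fmt.Println(validateBridgeName("br_vpc-1"))                 // nil: valid
	fmt.Println(validateBridgeName("br-very-long-bridge-name")) // error: exceeds 15 chars
}
```

Adding the length check up front avoids passing a name to `ovs-vsctl` that the kernel would reject later with a much less helpful error.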

Comment on lines +486 to +528
func (s *InstanceService) allocateIP(ctx context.Context, subnet *domain.Subnet) (string, error) {
_, ipNet, err := net.ParseCIDR(subnet.CIDRBlock)
if err != nil {
return "", err
}

instances, err := s.repo.ListBySubnet(ctx, subnet.ID)
if err != nil {
return "", err
}

usedIPs := make(map[string]bool)
for _, inst := range instances {
if inst.PrivateIP != "" {
usedIPs[inst.PrivateIP] = true
}
}
usedIPs[subnet.GatewayIP] = true

// Find first available IP
ip := make(net.IP, len(ipNet.IP))
copy(ip, ipNet.IP)

for {
// Increment IP
for i := len(ip) - 1; i >= 0; i-- {
ip[i]++
if ip[i] > 0 {
break
}
}

if !ipNet.Contains(ip) {
break
}

if !usedIPs[ip.String()] && s.isValidHostIP(ip, ipNet) {
return ip.String(), nil
}
}

return "", fmt.Errorf("no available IPs in subnet")
}

Copilot AI Jan 8, 2026


The IP allocation algorithm has O(n * m) complexity where n is the number of available IPs in the subnet and m is the number of existing instances. For large subnets (e.g., /16), this could iterate through 65,534 IPs checking against the used IP map each time. This will cause significant performance degradation as subnets fill up.

Consider:

  1. Using a bitmap or more efficient data structure for IP tracking
  2. Implementing a persistent IP allocation table in the database
  3. Adding early termination logic or limiting subnet sizes
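Suggestion 1 can be sketched as follows for IPv4-only subnets: index each used address by its offset from the network base so allocation is a single linear scan over a compact bitset rather than repeated map lookups per candidate IP. The function name and signature are illustrative, not the PR's actual API:

```go
package main

import (
	"fmt"
	"net"
)

// allocateIPBitmap finds the first free host address in an IPv4 CIDR.
// The network and broadcast addresses are reserved; the gateway should
// be included in the used list by the caller.
func allocateIPBitmap(cidr string, used []string) (string, error) {
	_, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	ones, bits := ipNet.Mask.Size()
	size := 1 << uint(bits-ones) // total addresses in the subnet
	bitmap := make([]bool, size)
	bitmap[0] = true      // network address
	bitmap[size-1] = true // broadcast address

	base := ipToUint32(ipNet.IP.To4())
	for _, s := range used {
		ip := net.ParseIP(s).To4()
		if ip == nil {
			continue // skip non-IPv4 entries
		}
		if off := ipToUint32(ip) - base; off < uint32(size) {
			bitmap[off] = true
		}
	}
	for off := 1; off < size-1; off++ {
		if !bitmap[off] {
			return uint32ToIP(base + uint32(off)).String(), nil
		}
	}
	return "", fmt.Errorf("no available IPs in %s", cidr)
}

func ipToUint32(ip net.IP) uint32 {
	return uint32(ip[0])<<24 | uint32(ip[1])<<16 | uint32(ip[2])<<8 | uint32(ip[3])
}

func uint32ToIP(n uint32) net.IP {
	return net.IPv4(byte(n>>24), byte(n>>16), byte(n>>8), byte(n))
}

func main() {
	ip, _ := allocateIPBitmap("10.0.1.0/24", []string{"10.0.1.1", "10.0.1.2"})
	fmt.Println(ip) // 10.0.1.3
}
```

This is still O(n) in the subnet size, but with one map-free pass; suggestion 2 (a persistent allocation table with a DB-side free-slot query) removes even that scan and also survives process restarts.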

@@ -0,0 +1,9 @@
-- +goose Up

ALTER TABLE vpcs ADD COLUMN IF NOT EXISTS cidr_block VARCHAR(18);

Copilot AI Jan 8, 2026


The cidr_block field allows VARCHAR(18) but CIDR blocks can be longer. IPv6 CIDR blocks can be up to 43 characters (e.g., "2001:0db8:85a3:0000:0000:8a2e:0370:7334/128"). Even IPv4 with longer netmasks could exceed 18 characters.

Increase the column size to at least 43 characters to support IPv6, or 20 for IPv4-only if IPv6 is not planned.

Comment on lines +8 to +9
-- Set defaults for existing rows
UPDATE vpcs SET cidr_block = '10.0.0.0/16', vxlan_id = 100, status = 'active' WHERE cidr_block IS NULL;

Copilot AI Jan 8, 2026


The migration sets default values for existing rows, but all three columns (cidr_block, vxlan_id, status) are set to the same values for all existing VPCs. This means all existing VPCs will have:

  • The same CIDR block (10.0.0.0/16) causing IP conflicts
  • The same VXLAN ID (100) breaking network isolation
  • All marked as 'active' even if they weren't

This will break existing VPCs. Consider either:

  1. Generating unique values for each existing VPC
  2. Requiring manual migration of existing VPCs
  3. Documenting that this is a breaking change requiring VPC recreation

Comment on lines +65 to +76
func (h *SubnetHandler) Get(c *gin.Context) {
idStr := c.Param("id")
// For simple Get, we use id only for now. Or we could pass vpcID if we had it in path.
// The service GetSubnet requires vpcID if searching by name, but here we use ID.
subnet, err := h.svc.GetSubnet(c.Request.Context(), idStr, uuid.Nil)
if err != nil {
httputil.Error(c, err)
return
}

httputil.Success(c, http.StatusOK, subnet)
}

Copilot AI Jan 8, 2026


The GetSubnet handler passes uuid.Nil as the vpcID when calling the service. This means subnet lookups by name will fail since the service requires a valid vpcID to search by name. The comment acknowledges this ("Or we could pass vpcID if we had it in path") but doesn't address it.

Either:

  1. Require vpcID in the path or query params for name-based lookups
  2. Modify the service to handle lookup by ID without requiring vpcID
  3. Document that this endpoint only supports ID-based lookups

Comment on lines 32 to 35
var req struct {
Name string `json:"name" binding:"required"`
Name string `json:"name" binding:"required"`
CIDRBlock string `json:"cidr_block"`
}

Copilot AI Jan 8, 2026


The API requires a valid CIDR block to create a subnet, but the VPC creation command in the CLI has a default CIDR (10.0.0.0/16). However, the CreateVPC handler makes cidr_block optional (no binding:"required" tag). This inconsistency means the API will accept VPC creation without a CIDR, which could cause issues when creating subnets or allocating IPs.

Make the cidr_block field required in the handler to match the CLI's expectation and the service's CIDR validation logic.

Comment on lines +122 to +126
func (s *SecurityGroupService) RemoveRule(ctx context.Context, ruleID uuid.UUID) error {
// 1. Get rule to know groupID
// Implementation omitted: would need a GetRuleByID in repo
return nil
}

Copilot AI Jan 8, 2026


The RemoveRule method is defined but has no implementation (returns nil immediately). The function is part of the public SecurityGroupService interface but is non-functional. If called, it will silently succeed without actually removing any rules.

Either implement the function or remove it from the interface if it's not yet ready.

Comment on lines +158 to +159
cmd := exec.CommandContext(ctx, "ip", "link", "set", vethContainer, "up")
_ = cmd.Run()

Copilot AI Jan 8, 2026


The error from running the "ip link set" command is silently ignored. If bringing up the container end of the veth pair fails, the network interface will remain down and the instance won't have network connectivity, but no error will be logged or reported.

Add proper error handling and logging for this operation.

Comment on lines +196 to +200
return ports.FlowRule{
Priority: rule.Priority,
Match: strings.Join(matchParts, ","),
Actions: "allow", // Simplified
}

Copilot AI Jan 8, 2026


The security group flow translation uses a hardcoded "allow" action which doesn't align with typical firewall behavior. Security groups typically work with default-deny policies where only explicitly allowed traffic passes. This implementation would need to be coordinated with a default drop rule on the OVS bridge, but there's no indication that such a rule is being set up.

Consider:

  1. Documenting the expected default bridge behavior
  2. Implementing proper allow/drop actions based on the rule type
  3. Setting up default bridge rules when VPCs are created
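Suggestion 3 can be sketched as a default-deny baseline installed at bridge creation, with security-group rules mapped to a real OpenFlow action (a bare "allow" is not a standard OpenFlow action string; `ovs-ofctl` expects something like `normal` or `drop`). The `FlowRule` struct mirrors the shape described in the walkthrough; the helper names are illustrative:

```go
package main

import "fmt"

// FlowRule mirrors the ports.FlowRule shape from the walkthrough.
type FlowRule struct {
	Priority int
	Match    string
	Actions  string
}

// defaultDenyFlows returns the baseline installed when a bridge is
// created: priority 0 catches anything not matched by a higher-priority
// security-group rule and drops it.
func defaultDenyFlows() []FlowRule {
	return []FlowRule{
		{Priority: 0, Match: "ip", Actions: "drop"},
	}
}

// allowRule maps a security-group allow to standard L2 forwarding
// ("normal"), punching a hole in the default-deny baseline.
func allowRule(priority int, match string) FlowRule {
	return FlowRule{Priority: priority, Match: match, Actions: "normal"}
}

func main() {
	rules := append(defaultDenyFlows(), allowRule(100, "tcp,tp_dst=80"))
	for _, r := range rules {
		fmt.Printf("priority=%d,%s actions=%s\n", r.Priority, r.Match, r.Actions)
	}
}
```

With this layering, deleting a security-group rule automatically restores deny behavior for that traffic instead of leaving the bridge open.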

Comment on lines +52 to +63
func (a *OvsAdapter) CreateBridge(ctx context.Context, name string, vxlanID int) error {
if !bridgeNameRegex.MatchString(name) {
return errors.New(errors.InvalidInput, "invalid bridge name")
}

cmd := exec.CommandContext(ctx, a.ovsPath, "add-br", name)
if err := cmd.Run(); err != nil {
return errors.Wrap(errors.Internal, "failed to add bridge", err)
}

return nil
}

Copilot AI Jan 8, 2026


The vxlanID parameter is accepted but never used in the CreateBridge function. This means VXLANs are not being configured when bridges are created, breaking the multi-tenant isolation that VXLAN IDs are meant to provide.

Either:

  1. Implement VXLAN configuration in this function using the vxlanID parameter
  2. Remove the parameter if VXLAN setup happens elsewhere
  3. Document why this parameter is unused
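Option 1 would mean following `add-br` with a VXLAN tunnel port keyed by the VPC's VNI. A sketch of the argument construction (the `remote_ip` is a placeholder for a peer hypervisor; a multi-host deployment would manage tunnel endpoints differently, and `vxlanPortArgs` is an illustrative name, not the PR's API):

```go
package main

import (
	"fmt"
	"strings"
)

// vxlanPortArgs builds ovs-vsctl arguments that add a VXLAN tunnel port
// to the bridge, using the VPC's vxlanID as the VNI (options:key).
func vxlanPortArgs(bridge string, vxlanID int, remoteIP string) []string {
	port := fmt.Sprintf("vxlan-%d", vxlanID)
	return []string{
		"add-port", bridge, port,
		"--", "set", "interface", port,
		"type=vxlan",
		fmt.Sprintf("options:key=%d", vxlanID),
		fmt.Sprintf("options:remote_ip=%s", remoteIP),
	}
}

func main() {
	// The adapter would pass these to exec.CommandContext(ctx, a.ovsPath, args...)
	// right after the successful add-br.
	fmt.Println("ovs-vsctl", strings.Join(vxlanPortArgs("br-vpc1", 104, "192.0.2.10"), " "))
}
```

If the project is single-host for now, option 3 (documenting that the VNI is reserved for future tunneling) may be the cheaper short-term fix.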


coderabbitai Bot left a comment


Actionable comments posted: 4

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
internal/repositories/postgres/migrations/008_create_volumes_table.up.sql (2)

6-6: Add CHECK constraint to ensure size_gb is positive.

The size_gb column lacks a CHECK constraint, allowing negative or zero values to be inserted, which would be invalid for volume sizes.

✅ Proposed fix to add CHECK constraint
-    size_gb INTEGER NOT NULL DEFAULT 1,
+    size_gb INTEGER NOT NULL DEFAULT 1 CHECK (size_gb > 0),

7-7: Add CHECK constraint to enforce valid status values.

The status column accepts any VARCHAR up to 50 characters. Without a CHECK constraint, invalid status values could be inserted, compromising data integrity and making status-based queries unreliable.

✅ Proposed fix to add CHECK constraint
-    status VARCHAR(50) NOT NULL DEFAULT 'AVAILABLE',
+    status VARCHAR(50) NOT NULL DEFAULT 'AVAILABLE' CHECK (status IN ('AVAILABLE', 'ATTACHED', 'CREATING', 'DELETING', 'ERROR')),

Note: Adjust the allowed status values based on your application's volume lifecycle states.

internal/repositories/postgres/migrations/016_create_secrets.up.sql (1)

9-10: Add a trigger to automatically update the updated_at timestamp.

The updated_at field will only default to NOW() on INSERT. To automatically update it on modifications, add a trigger function:

⏰ Add trigger to auto-update `updated_at`

Add this to the migration after the table creation:

-- Create trigger function for updating updated_at timestamp
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Create trigger for secrets table
CREATE TRIGGER update_secrets_updated_at
    BEFORE UPDATE ON secrets
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();
internal/repositories/postgres/migrations/024_create_audit_logs.up.sql (1)

3-13: Reconsider the foreign key constraint on user_id for audit log integrity.

The user_id column has a foreign key constraint to users(id) with NOT NULL but no ON DELETE clause specified. This creates a potential issue:

  • If a user is deleted, PostgreSQL will default to ON DELETE NO ACTION, which will block user deletion if audit logs exist for that user.
  • Audit logs should typically be preserved even after user deletion for compliance and historical tracking purposes.

Since user_id is NOT NULL, you cannot use ON DELETE SET NULL. Consider one of these approaches:

  1. Remove the NOT NULL constraint and use ON DELETE SET NULL to preserve logs while marking the user as deleted
  2. Use ON DELETE NO ACTION explicitly and handle user deletion at the application level (soft deletes)
  3. Remove the foreign key constraint entirely for audit logs to ensure they remain immutable
Proposed fix: Remove NOT NULL and add ON DELETE SET NULL
 CREATE TABLE audit_logs (
     id UUID PRIMARY KEY,
-    user_id UUID NOT NULL REFERENCES users(id),
+    user_id UUID REFERENCES users(id) ON DELETE SET NULL,
     action VARCHAR(255) NOT NULL,
     resource_type VARCHAR(255),
     resource_id VARCHAR(255),
     details JSONB,
     ip_address VARCHAR(45),
     user_agent TEXT,
     created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
 );
internal/repositories/postgres/migrations/018_create_caches.up.sql (1)

13-13: Encrypt cache credentials at rest.

The password column stores Redis instance credentials in plaintext. While the codebase has encryption utilities (AES-256-GCM in pkg/crypto), they are not applied here. Apply the same encryption pattern used for secrets (pkg/crypto.Encrypt/Decrypt) to protect these credentials at rest in the database, and decrypt them when needed for connecting to Redis or passing to the API.

internal/repositories/postgres/migrations/015_create_databases.up.sql (2)

14-14: CRITICAL: Plaintext password storage is a security vulnerability.

Storing database passwords in plaintext violates fundamental security principles and creates compliance risks. Passwords must be encrypted or, preferably, managed through a secrets management system (HashiCorp Vault, AWS Secrets Manager, etc.).

If passwords must be stored, use application-layer encryption with a dedicated key management solution.

🔒 Recommended approaches

Option 1 (Preferred): Use a secrets management system

  • Store credentials in HashiCorp Vault, AWS Secrets Manager, or similar
  • Store only a reference/ID to the secret in this table

Option 2: Application-layer encryption

  • Encrypt passwords before storing using a secure encryption library
  • Manage encryption keys separately (never in the same database)
  • Consider envelope encryption for added security

Option 3: Eliminate storage

  • Generate passwords on-demand and return only once during creation
  • Use certificate-based authentication where possible

9-9: Add validation constraints for status and port.

The status and port fields lack validation:

  • port should be constrained to valid port range (0-65535)
  • status should be validated against expected values
♻️ Proposed constraints
     status VARCHAR(50) DEFAULT 'CREATING',
+    CONSTRAINT chk_status CHECK (status IN ('CREATING', 'RUNNING', 'STOPPED', 'FAILED', 'DELETING')),
     vpc_id UUID REFERENCES vpcs(id) ON DELETE SET NULL,
     container_id VARCHAR(255),
     port INT,
+    CONSTRAINT chk_port CHECK (port >= 0 AND port <= 65535),
     username VARCHAR(255),

Adjust the status values to match your application's state machine.

Also applies to: 12-12

internal/repositories/postgres/migrations/007_create_events_table.up.sql (1)

3-10: Add index on resource_id and consider schema refinements.

The events table lacks an index on resource_id, which is critical for query performance. Events are typically filtered by resource (e.g., "fetch all events for instance X"), and without an index, these queries will scan the entire table as it grows.

Additionally:

  • VARCHAR(50) sizing: This is inconsistent with audit_logs, which uses VARCHAR(255) for resource_id. Since the domain model indicates resource IDs are UUIDs (36 characters), 50 characters leaves minimal headroom and may cause issues if IDs exceed this limit in the future.
  • Missing DEFAULT for id: Recent migrations in this codebase consistently use DEFAULT gen_random_uuid() for UUID primary keys. Consider adding this to reduce application complexity and align with the established pattern.
Suggested changes
 CREATE TABLE IF NOT EXISTS events (
-    id UUID PRIMARY KEY,
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
     action VARCHAR(50) NOT NULL,
-    resource_id VARCHAR(50) NOT NULL,
+    resource_id VARCHAR(255) NOT NULL,
     resource_type VARCHAR(50) NOT NULL,
     metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
     created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
 );
 
+CREATE INDEX idx_events_resource_id ON events(resource_id);
 CREATE INDEX idx_events_created_at ON events(created_at DESC);
🤖 Fix all issues with AI agents
In @cmd/api/main.go:
- Around line 136-139: When ovs.NewOvsAdapter returns an error you currently
leave networkBackend nil which can cause panics; update the initialization so
that on error you either (a) assign a valid fallback adapter instance (e.g.,
create or call your mock/noop adapter constructor) to networkBackend or (b) fail
fast (log error and exit) so callers never see a nil; also update the
/health/ovs handler to defensively check networkBackend for nil and return an
appropriate degraded/unavailable health response instead of dereferencing it.
Refer to networkBackend, ovs.NewOvsAdapter, and the /health/ovs health handler
when making these changes.

In @internal/repositories/postgres/migrations/003_add_container_id.up.sql:
- Around line 1-2: The .down.sql counterparts are missing the required Goose
directive and two migrations (031, 032) lack .down.sql files entirely; create a
.down.sql file for each .up.sql (including migrations 031 and 032) named to
match their .up.sql counterparts (e.g., create 003_add_container_id.down.sql)
and ensure the very first non-comment line is the directive "-- +goose Down",
then add the appropriate rollback SQL that reverses the changes made in the
corresponding .up.sql (e.g., DROP COLUMN, DROP TABLE, or REVERT INDEX changes)
so each migration has a proper Up/Down pair for Goose to execute and roll back.

In @internal/repositories/postgres/security_group_repo.go:
- Around line 124-134: AddInstanceToGroup and RemoveInstanceFromGroup perform
direct INSERT/DELETE into instance_security_groups without verifying the caller
owns the instance or group; update both methods to first validate ownership by
querying the instance and security_group records (or a join) using the provided
instanceID and groupID and the caller identity from ctx (e.g., user/tenant ID),
return a permission error if the ownership check fails, and perform the
insert/delete inside a transaction to avoid races; reference the
AddInstanceToGroup and RemoveInstanceFromGroup methods and the
instance_security_groups table when locating the code to change.
🟠 Major comments (20)
internal/repositories/postgres/migrations/019_create_queues.up.sql-1-2 (1)

1-2: Add -- +goose Down directive to the down migration file.

The up migration file is correctly formatted with the -- +goose Up directive. However, the corresponding down migration file (019_create_queues.down.sql) is missing the -- +goose Down directive at the beginning. Add it to match the Goose migration standard:

Required change for down migration
-- +goose Down

DROP TABLE IF EXISTS queue_messages;
DROP TABLE IF EXISTS queues;
internal/repositories/postgres/migrations/026_create_snapshots.up.sql-5-5 (1)

5-5: Add ON DELETE clause to user_id foreign key.

The user_id foreign key lacks an ON DELETE clause, which defaults to RESTRICT behavior. This prevents user deletion when snapshots exist. Consider adding either ON DELETE CASCADE (if snapshots should be deleted with the user) or ON DELETE SET NULL with a nullable column (if snapshots should be preserved as orphaned records).

🔧 Proposed fix with CASCADE behavior
-    user_id UUID NOT NULL REFERENCES users(id),
+    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
internal/repositories/postgres/migrations/018_create_caches.up.sql-5-5 (1)

5-5: Add ON DELETE clauses to foreign key constraints.

The foreign keys to users(id) and vpcs(id) lack ON DELETE behavior specifications. When a referenced user or VPC is deleted, this can lead to constraint violations or orphaned cache records.

Consider adding appropriate cascading behavior:

  • For user_id: Likely ON DELETE CASCADE since caches should be removed when the owning user is deleted.
  • For vpc_id: Consider ON DELETE SET NULL if caches can exist without a VPC, or ON DELETE CASCADE if VPC deletion should remove associated caches.
🔧 Proposed fix
-    user_id UUID NOT NULL REFERENCES users(id),
+    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
-    vpc_id UUID REFERENCES vpcs(id),
+    vpc_id UUID REFERENCES vpcs(id) ON DELETE SET NULL,

Also applies to: 10-10

scripts/setup-ovs.sh-16-16 (1)

16-16: Ensure setfacl package is installed and handle potential failures.

The setfacl command requires the acl package, which may not be installed by default on all systems. Additionally, ACL permissions on the socket may not persist across system reboots or OVS service restarts.

🔧 Recommended fixes

Option 1: Install acl package and add fallback

 echo "--- Configuring Permissions ---"
+sudo apt-get install -y acl
 # Allow current user to manage OVS without sudo
 USER_NAME=$(whoami)
-# Note: On some systems, adding to the 'ovs' or 'openvswitch' group works if it exists.
-# Otherwise, we use ACLs or set permissions on the socket.
-sudo setfacl -m u:$USER_NAME:rw /var/run/openvswitch/db.sock
+# Try adding to openvswitch group first, fall back to ACL
+if getent group openvswitch >/dev/null; then
+  sudo usermod -aG openvswitch $USER_NAME
+  echo "Added $USER_NAME to openvswitch group"
+else
+  sudo setfacl -m u:$USER_NAME:rw /var/run/openvswitch/db.sock
+  echo "Set ACL for $USER_NAME on OVS socket"
+fi
+echo "Note: You may need to log out and back in for group changes to take effect"

Option 2: Create a systemd drop-in for persistent socket permissions

Add after line 20:

# Create systemd override for persistent socket permissions
sudo mkdir -p /etc/systemd/system/openvswitch-switch.service.d
cat <<EOF | sudo tee /etc/systemd/system/openvswitch-switch.service.d/override.conf
[Service]
ExecStartPost=/bin/bash -c 'sleep 1 && setfacl -m u:$USER_NAME:rw /var/run/openvswitch/db.sock'
EOF
sudo systemctl daemon-reload


internal/repositories/postgres/migrations/029_update_vpcs_for_ovs.up.sql-9-9 (1)

9-9: Risk: All existing VPCs assigned the same VXLAN ID.

The migration sets vxlan_id = 100 for all existing VPCs with cidr_block IS NULL. This conflicts with how new VPCs are handled: they receive unique VXLAN IDs derived from their UUID (int(vpcID[0]) + 100). Since VXLAN IDs identify virtual networks in OVS, multiple VPCs sharing the same ID breaks isolation between them. Additionally, a new VPC could derive a VXLAN ID of 100, colliding with the existing VPCs.

Assign unique VXLAN IDs to existing VPCs using a sequence or row-based calculation to align with the pattern used for new VPCs.

Alternative: Generate unique VXLAN IDs
-- Note: window functions are not allowed directly in UPDATE ... SET,
-- so compute the row numbers in a CTE first.
WITH numbered AS (
  SELECT id, row_number() OVER (ORDER BY id) AS rn
  FROM vpcs
  WHERE cidr_block IS NULL
)
UPDATE vpcs
SET
  cidr_block = '10.0.0.0/16',
  vxlan_id = 100 + numbered.rn::INT,
  status = 'active'
FROM numbered
WHERE vpcs.id = numbered.id;
internal/repositories/postgres/migrations/029_update_vpcs_for_ovs.up.sql-3-3 (1)

3-3: Use INET type instead of VARCHAR(18) for consistency with the rest of the schema.

The codebase already uses the INET type for IP address storage elsewhere (e.g., instances.private_ip in migration 032, subnets.gateway_ip in migration 031). The cidr_block column should also use INET to maintain consistency and benefit from PostgreSQL's built-in validation of CIDR notation. The INET type also naturally supports both IPv4 and IPv6 CIDR blocks if needed in the future.

internal/repositories/postgres/migrations/032_add_instance_network_fields.up.sql-2-2 (1)

2-2: Add ON DELETE behavior to the subnet_id foreign key constraint.

The foreign key on subnet_id lacks an ON DELETE action. If a subnet is deleted, instances will retain dangling references. Add ON DELETE RESTRICT (prevent subnet deletion if instances depend on it), ON DELETE SET NULL (allow deletion and clear references), or ON DELETE CASCADE (delete instances with their subnet) based on your application's requirements.

Example fix with ON DELETE RESTRICT
-ALTER TABLE instances ADD COLUMN subnet_id UUID REFERENCES subnets(id);
+ALTER TABLE instances ADD COLUMN subnet_id UUID REFERENCES subnets(id) ON DELETE RESTRICT;
internal/handlers/vpc_handler.go-32-35 (1)

32-35: Add CIDR block format validation to VPC creation.

The CIDRBlock field is optional in the handler and the service layer lacks validation. Invalid CIDR blocks are only caught later when creating subnets (via net.ParseCIDR in the subnet service), resulting in poor user experience. Validate the CIDR block format at VPC creation time in the service layer using net.ParseCIDR before persisting the VPC.
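
A standalone sketch of that service-layer check (the helper name is illustrative, not from the codebase); besides rejecting malformed input, it also rejects values with host bits set, which net.ParseCIDR alone would accept:

```go
package main

import (
	"fmt"
	"net"
)

// validateVPCCIDR is a hypothetical service-layer helper: it rejects
// malformed CIDR blocks at VPC creation time instead of deferring the
// failure to subnet creation. It also rejects inputs like "10.0.0.5/16"
// where host bits are set, by comparing the parsed IP against the
// canonical network address.
func validateVPCCIDR(cidr string) error {
	ip, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return fmt.Errorf("invalid CIDR block %q: %w", cidr, err)
	}
	if !ip.Equal(ipNet.IP) {
		return fmt.Errorf("CIDR block %q has host bits set; did you mean %s?", cidr, ipNet.String())
	}
	return nil
}

func main() {
	for _, c := range []string{"10.0.0.0/16", "10.0.0.5/16", "not-a-cidr"} {
		fmt.Printf("%-12s -> %v\n", c, validateVPCCIDR(c))
	}
}
```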

internal/handlers/instance_handler.go-136-142 (1)

136-142: Inconsistent error handling for SubnetID vs VpcID.

Invalid subnet_id is silently ignored (line 139 checks err == nil), while invalid vpc_id returns a 400 error (lines 120-123). This inconsistency could confuse API consumers.

🐛 Suggested fix for consistent error handling
 	var subnetUUID *uuid.UUID
 	if req.SubnetID != "" {
 		id, err := uuid.Parse(req.SubnetID)
-		if err == nil {
-			subnetUUID = &id
+		if err != nil {
+			c.JSON(http.StatusBadRequest, gin.H{"error": "invalid subnet_id format"})
+			return
 		}
+		subnetUUID = &id
 	}
internal/repositories/postgres/migrations/031_create_subnets.up.sql-4-5 (1)

4-5: Add ON DELETE CASCADE to user_id foreign key for consistency.

The user_id FK is missing an ON DELETE clause while vpc_id has ON DELETE CASCADE. This creates inconsistent behavior compared to other resource tables in the codebase: security_groups (030), containers (023), gateway (022), cron (021), notify (020), queues (019), functions (017), and secrets (016) all specify ON DELETE CASCADE on their user_id FKs.

Add ON DELETE CASCADE to line 4:

user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,

This ensures subnets are deleted when their owner is deleted, consistent with the established pattern for resource ownership.

internal/core/services/subnet.go-52-54 (1)

52-54: Incomplete CIDR containment validation.

The check only verifies that the subnet's network address is within the VPC CIDR, but doesn't ensure the entire subnet range fits. For example, if the VPC is 10.0.0.0/24 and the requested subnet is 10.0.0.0/23, the subnet's network address (10.0.0.0) is within the VPC, but its range extends to 10.0.1.255, well beyond the VPC's last address of 10.0.0.255.

Additionally, there's no validation for overlapping subnets within the same VPC, which could lead to IP conflicts.

🔧 Suggested fix for complete CIDR validation
 	if !vpcNet.Contains(ip) {
 		return nil, errors.New(errors.InvalidInput, "subnet CIDR must be within VPC CIDR range")
 	}
+
+	// Verify the entire subnet range fits within VPC
+	lastIP := make(net.IP, len(ip))
+	copy(lastIP, ip)
+	for i := range subnetNet.Mask {
+		lastIP[i] |= ^subnetNet.Mask[i]
+	}
+	if !vpcNet.Contains(lastIP) {
+		return nil, errors.New(errors.InvalidInput, "subnet CIDR extends beyond VPC CIDR range")
+	}
+
+	// TODO: Check for overlapping subnets within the VPC
internal/handlers/security_group_handler.go-106-110 (1)

106-110: Avoid binding directly to domain type for security rules.

Binding directly to domain.SecurityRule allows clients to potentially set fields like ID, GroupID, and CreatedAt that should be server-generated. Use a dedicated request struct to whitelist only the expected input fields.

🔧 Suggested fix
+	var req struct {
+		Direction string `json:"direction" binding:"required"`
+		Protocol  string `json:"protocol" binding:"required"`
+		PortMin   int    `json:"port_min"`
+		PortMax   int    `json:"port_max"`
+		CIDR      string `json:"cidr" binding:"required"`
+		Priority  int    `json:"priority"`
+	}
-	var req domain.SecurityRule
 	if err := c.ShouldBindJSON(&req); err != nil {
 		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
 		return
 	}

-	rule, err := h.svc.AddRule(c.Request.Context(), groupID, req)
+	rule, err := h.svc.AddRule(c.Request.Context(), groupID, domain.SecurityRule{
+		Direction: domain.RuleDirection(req.Direction),
+		Protocol:  req.Protocol,
+		PortMin:   req.PortMin,
+		PortMax:   req.PortMax,
+		CIDR:      req.CIDR,
+		Priority:  req.Priority,
+	})

Committable suggestion skipped: line range outside the PR's diff.

internal/core/services/security_group.go-122-126 (1)

122-126: RemoveRule is unimplemented but exposed in interface.

This method returns nil without doing anything. If called via the API, rules won't be removed. Consider implementing it or returning an error.

🐛 Proposed fix: Return not-implemented error
 func (s *SecurityGroupService) RemoveRule(ctx context.Context, ruleID uuid.UUID) error {
-	// 1. Get rule to know groupID
-	// Implementation omitted: would need a GetRuleByID in repo
-	return nil
+	return errors.New(errors.Internal, "RemoveRule not implemented")
 }
cmd/api/main.go-332-342 (1)

332-342: Security group routes missing RBAC permission checks.

Unlike other protected routes (VPC, instances, volumes), security group endpoints don't use httputil.Permission() middleware. This may allow unauthorized access.

🐛 Proposed fix: Add permission middleware
 	// Security Group Routes (Protected)
 	sgGroup := r.Group("/security-groups")
 	sgGroup.Use(httputil.Auth(identitySvc))
 	{
-		sgGroup.POST("", sgHandler.Create)
-		sgGroup.GET("", sgHandler.List)
-		sgGroup.GET("/:id", sgHandler.Get)
-		sgGroup.DELETE("/:id", sgHandler.Delete)
-		sgGroup.POST("/:id/rules", sgHandler.AddRule)
-		sgGroup.POST("/attach", sgHandler.Attach)
+		sgGroup.POST("", httputil.Permission(rbacSvc, domain.PermissionVpcCreate), sgHandler.Create)
+		sgGroup.GET("", httputil.Permission(rbacSvc, domain.PermissionVpcRead), sgHandler.List)
+		sgGroup.GET("/:id", httputil.Permission(rbacSvc, domain.PermissionVpcRead), sgHandler.Get)
+		sgGroup.DELETE("/:id", httputil.Permission(rbacSvc, domain.PermissionVpcDelete), sgHandler.Delete)
+		sgGroup.POST("/:id/rules", httputil.Permission(rbacSvc, domain.PermissionVpcUpdate), sgHandler.AddRule)
+		sgGroup.POST("/attach", httputil.Permission(rbacSvc, domain.PermissionVpcUpdate), sgHandler.Attach)
 	}

Note: You may want to define dedicated PermissionSecurityGroup* constants instead of reusing VPC permissions.

Committable suggestion skipped: line range outside the PR's diff.

internal/core/services/instance.go-140-167 (1)

140-167: Silent error handling masks OVS networking failures.

Multiple errors are silently ignored with _ = .... If veth creation or bridge attachment fails, the instance launches without proper networking, but no error is returned or logged (except for the outer CreateVethPair which has an if err == nil guard).

🐛 Proposed fix: Add error handling and logging
 	// 4a. OVS Post-launch plumb
 	if inst.OvsPort != "" && s.network != nil {
 		vethContainer := fmt.Sprintf("eth0-%s", inst.ID.String()[:8])
-		if err := s.network.CreateVethPair(ctx, inst.OvsPort, vethContainer); err == nil {
-			vpc, _ := s.vpcRepo.GetByID(ctx, *inst.VpcID)
-			_ = s.network.AttachVethToBridge(ctx, vpc.NetworkID, inst.OvsPort)
+		if err := s.network.CreateVethPair(ctx, inst.OvsPort, vethContainer); err != nil {
+			s.logger.Error("failed to create veth pair", "instance_id", inst.ID, "error", err)
+			// Continue - instance is launched but networking may be degraded
+		} else {
+			vpc, err := s.vpcRepo.GetByID(ctx, *inst.VpcID)
+			if err != nil {
+				s.logger.Error("failed to get VPC for bridge attachment", "vpc_id", inst.VpcID, "error", err)
+			} else if err := s.network.AttachVethToBridge(ctx, vpc.NetworkID, inst.OvsPort); err != nil {
+				s.logger.Error("failed to attach veth to bridge", "bridge", vpc.NetworkID, "error", err)
+			}
 
 			// Set IP on the host simulation "container" end
 			if inst.SubnetID != nil {
-				subnet, _ := s.subnetRepo.GetByID(ctx, *inst.SubnetID)
+				subnet, err := s.subnetRepo.GetByID(ctx, *inst.SubnetID)
+				if err != nil {
+					s.logger.Error("failed to get subnet for IP assignment", "subnet_id", inst.SubnetID, "error", err)
+				} else if subnet != nil {
-				if subnet != nil {
 					_, ipNet, _ := net.ParseCIDR(subnet.CIDRBlock)
 					ones, _ := ipNet.Mask.Size()
-					_ = s.network.SetVethIP(ctx, vethContainer, inst.PrivateIP, fmt.Sprintf("%d", ones))
+					if err := s.network.SetVethIP(ctx, vethContainer, inst.PrivateIP, fmt.Sprintf("%d", ones)); err != nil {
+						s.logger.Error("failed to set veth IP", "veth", vethContainer, "ip", inst.PrivateIP, "error", err)
+					}
 
 					cmd := exec.CommandContext(ctx, "ip", "link", "set", vethContainer, "up")
-					_ = cmd.Run()
+					if err := cmd.Run(); err != nil {
+						s.logger.Error("failed to bring up veth interface", "veth", vethContainer, "error", err)
+					}
 				}
 			}
-		}
+		}
 	}

Committable suggestion skipped: line range outside the PR's diff.

internal/repositories/ovs/adapter.go-177-217 (1)

177-217: Validate inputs to prevent command injection.

The veth-related methods (CreateVethPair, AttachVethToBridge, DeleteVethPair, SetVethIP) use user-provided interface names, IP addresses, and CIDR values directly in command construction without validation. This poses a command injection risk if inputs contain shell metacharacters.

🔒 Proposed validation

Add validation functions and use them in each method:

import (
	"net"
	"regexp"
	"strconv"
	"strings"
)

var (
	interfaceNameRegex = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)
)

// Helper to validate interface names
func validateInterfaceName(name string) error {
	if !interfaceNameRegex.MatchString(name) || len(name) > 15 {
		return errors.New(errors.InvalidInput, "invalid interface name")
	}
	return nil
}

// Helper to validate CIDR
func validateCIDR(cidr string) error {
	bits, err := strconv.Atoi(cidr)
	if err != nil || bits < 0 || bits > 32 {
		return errors.New(errors.InvalidInput, "invalid CIDR prefix length")
	}
	return nil
}

Then update each method:

 func (a *OvsAdapter) CreateVethPair(ctx context.Context, hostEnd, containerEnd string) error {
+	if err := validateInterfaceName(hostEnd); err != nil {
+		return err
+	}
+	if err := validateInterfaceName(containerEnd); err != nil {
+		return err
+	}
 	cmd := exec.CommandContext(ctx, "ip", "link", "add", hostEnd, "type", "veth", "peer", "name", containerEnd)
 	if err := cmd.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to create veth pair", err)
 	}
 	return nil
 }

 func (a *OvsAdapter) DeleteVethPair(ctx context.Context, hostEnd string) error {
+	if err := validateInterfaceName(hostEnd); err != nil {
+		return err
+	}
 	cmd := exec.CommandContext(ctx, "ip", "link", "del", hostEnd)
 	if err := cmd.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to delete veth pair", err)
 	}
 	return nil
 }

 func (a *OvsAdapter) SetVethIP(ctx context.Context, vethEnd, ip, cidr string) error {
+	if err := validateInterfaceName(vethEnd); err != nil {
+		return err
+	}
+	if net.ParseIP(ip) == nil {
+		return errors.New(errors.InvalidInput, "invalid IP address")
+	}
+	if err := validateCIDR(cidr); err != nil {
+		return err
+	}
 	cmd := exec.CommandContext(ctx, "ip", "addr", "add", fmt.Sprintf("%s/%s", ip, cidr), "dev", vethEnd)
 	if err := cmd.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to set veth ip", err)
 	}
 
 	cmdUp := exec.CommandContext(ctx, "ip", "link", "set", vethEnd, "up")
 	if err := cmdUp.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to bring veth up", err)
 	}
 	return nil
 }

Committable suggestion skipped: line range outside the PR's diff.

internal/repositories/postgres/security_group_repo.go-136-158 (1)

136-158: Information disclosure: missing authorization and inconsistent data.

The ListInstanceGroups method has two issues:

  1. Security: It doesn't verify user ownership of the instance, allowing any user to query security groups associated with instances they don't own.
  2. Consistency: It doesn't load rules for the security groups, unlike GetByID and GetByName.
🔒 Proposed fixes
 func (r *SecurityGroupRepository) ListInstanceGroups(ctx context.Context, instanceID uuid.UUID) ([]*domain.SecurityGroup, error) {
+	userID := appcontext.UserIDFromContext(ctx)
 	query := `
 		SELECT sg.id, sg.user_id, sg.vpc_id, sg.name, sg.description, sg.arn, sg.created_at 
 		FROM security_groups sg
 		JOIN instance_security_groups isg ON sg.id = isg.group_id
-		WHERE isg.instance_id = $1
+		WHERE isg.instance_id = $1
+		AND EXISTS (SELECT 1 FROM instances WHERE id = $1 AND user_id = $2)
 	`
-	rows, err := r.db.Query(ctx, query, instanceID)
+	rows, err := r.db.Query(ctx, query, instanceID, userID)
 	if err != nil {
-		return nil, err
+		return nil, errors.Wrap(errors.Internal, "failed to list instance groups", err)
 	}
 	defer rows.Close()
 
 	var groups []*domain.SecurityGroup
 	for rows.Next() {
 		var sg domain.SecurityGroup
 		if err := rows.Scan(&sg.ID, &sg.UserID, &sg.VPCID, &sg.Name, &sg.Description, &sg.ARN, &sg.CreatedAt); err != nil {
-			return nil, err
+			return nil, errors.Wrap(errors.Internal, "failed to scan security group", err)
 		}
+		rules, err := r.getRulesForGroup(ctx, sg.ID)
+		if err != nil {
+			return nil, err
+		}
+		sg.Rules = rules
 		groups = append(groups, &sg)
 	}
 	return groups, nil
 }

Committable suggestion skipped: line range outside the PR's diff.

internal/repositories/ovs/adapter.go-52-63 (1)

52-63: Incomplete implementation: vxlanID parameter is accepted but ignored.

The CreateBridge method accepts a vxlanID parameter as part of the NetworkBackend interface contract (defined in internal/core/ports/network.go:18) and callers pass this value (e.g., in internal/core/services/vpc.go:45), but the implementation ignores it. The bridge is created without any VXLAN configuration, making the parameter useless to callers.

Either implement VXLAN configuration using the vxlanID parameter, or clarify with the team if this feature should be removed from the interface contract.
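
If the team opts to implement it, one possibility is to record the VNI on the bridge at creation time so later tunnel setup can read it back. The argument builder below is an illustrative sketch only (--may-exist and external_ids are standard ovs-vsctl features, but this is not the adapter's current code):

```go
package main

import (
	"fmt"
	"strings"
)

// bridgeCreateArgs sketches one way CreateBridge could actually use vxlanID:
// store the VNI on the bridge via external_ids so that a later
// CreateVXLANTunnel call can derive options:key from the bridge itself.
// This is an assumption for illustration, not the project's current behavior.
func bridgeCreateArgs(bridge string, vxlanID int) []string {
	return []string{
		"--may-exist", "add-br", bridge,
		"--", "set", "bridge", bridge,
		fmt.Sprintf("external_ids:vni=%d", vxlanID),
	}
}

func main() {
	args := bridgeCreateArgs("br-vpc-1234", 4242)
	// The adapter would run this as:
	//   exec.CommandContext(ctx, a.ovsPath, args...)
	fmt.Println("ovs-vsctl", strings.Join(args, " "))
}
```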

internal/repositories/ovs/adapter.go-144-170 (1)

144-170: Validate OpenFlow rule parameters.

The AddFlowRule and DeleteFlowRule methods use user-provided Match and Actions strings directly in command construction. While OpenFlow syntax is constrained, malicious input could still cause issues or unexpected behavior.

Consider adding basic validation for flow rule parameters:

♻️ Proposed validation
+var (
+	flowFieldRegex = regexp.MustCompile(`^[a-zA-Z0-9_=:,./\s-]+$`)
+)
+
 func (a *OvsAdapter) AddFlowRule(ctx context.Context, bridge string, rule ports.FlowRule) error {
 	if !bridgeNameRegex.MatchString(bridge) {
 		return errors.New(errors.InvalidInput, "invalid bridge name")
 	}
+	
+	// Basic validation of flow rule fields
+	if !flowFieldRegex.MatchString(rule.Match) {
+		return errors.New(errors.InvalidInput, "invalid flow match string")
+	}
+	if !flowFieldRegex.MatchString(rule.Actions) {
+		return errors.New(errors.InvalidInput, "invalid flow actions string")
+	}
 
 	// ovs-ofctl add-flow <bridge> priority=<p>,<match>,actions=<actions>
 	flowSpec := fmt.Sprintf("priority=%d,%s,actions=%s", rule.Priority, rule.Match, rule.Actions)
 	cmd := exec.CommandContext(ctx, a.ofctlPath, "add-flow", bridge, flowSpec)
 	if err := cmd.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to add flow rule", err)
 	}
 
 	return nil
 }
 
 func (a *OvsAdapter) DeleteFlowRule(ctx context.Context, bridge string, match string) error {
 	if !bridgeNameRegex.MatchString(bridge) {
 		return errors.New(errors.InvalidInput, "invalid bridge name")
 	}
+	
+	if !flowFieldRegex.MatchString(match) {
+		return errors.New(errors.InvalidInput, "invalid flow match string")
+	}
 
 	cmd := exec.CommandContext(ctx, a.ofctlPath, "del-flows", bridge, match)
 	if err := cmd.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to delete flow rule", err)
 	}
 
 	return nil
 }
internal/repositories/ovs/adapter.go-118-137 (1)

118-137: Validate remote IP to prevent command injection.

The CreateVXLANTunnel method uses remoteIP directly in command construction without validation. A malicious or malformed IP could potentially inject additional command arguments.

🔒 Proposed validation

Add this validation at the start of the method:

+import "net"
+
 func (a *OvsAdapter) CreateVXLANTunnel(ctx context.Context, bridge string, vni int, remoteIP string) error {
 	if !bridgeNameRegex.MatchString(bridge) {
 		return errors.New(errors.InvalidInput, "invalid bridge name")
 	}
+
+	// Validate IP address format
+	if net.ParseIP(remoteIP) == nil {
+		return errors.New(errors.InvalidInput, "invalid remote IP address")
+	}
 
 	tunnelName := fmt.Sprintf("vxlan-%s", strings.ReplaceAll(remoteIP, ".", "-"))
 	cmd := exec.CommandContext(ctx, a.ovsPath,
 		"add-port", bridge, tunnelName,
 		"--", "set", "interface", tunnelName,
 		"type=vxlan",
 		fmt.Sprintf("options:remote_ip=%s", remoteIP),
 		fmt.Sprintf("options:key=%d", vni),
 	)
 
 	if err := cmd.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to create vxlan tunnel", err)
 	}
 
 	return nil
 }

Committable suggestion skipped: line range outside the PR's diff.

🟡 Minor comments (13)
.agent/workflows/setup-ovs.md-18-26 (1)

18-26: Incomplete workflow steps.

Steps 3 and 4 contain only comments without actual commands. Consider completing these steps or removing them if they're meant to be manual.

Suggested completion
 3. Update Docker Compose for OVS support
 ```bash
-# This adds NET_ADMIN capabilities and mounts the OVS socket
+# Verify docker-compose.yml has the required OVS configuration
+grep -A5 "cap_add" docker-compose.yml
 ```

 4. Verify OVS integration via API health check
 ```bash
-# Wait for API to restart and check /health/ovs
+docker-compose restart api
+sleep 5
+curl -s http://localhost:8080/health/ovs
 ```
.agent/workflows/setup-ovs.md-10-16 (1)

10-16: Replace hardcoded absolute paths with relative paths.

The paths /home/poyraz/dev/Cloud/scripts/setup-ovs.sh are user-specific and will not work for other developers. Use relative paths from the repository root instead.

Suggested fix
 1. Make setup script executable
 ```bash
-chmod +x /home/poyraz/dev/Cloud/scripts/setup-ovs.sh
+chmod +x ./scripts/setup-ovs.sh
 ```
 2. Run the setup script
 ```bash
-/home/poyraz/dev/Cloud/scripts/setup-ovs.sh
+./scripts/setup-ovs.sh
 ```
cmd/cloud/instance.go-100-100 (1)

100-100: Add subnet/VPC relationship validation in LaunchInstance.

The service layer is missing a critical check: when both `vpcID` and `subnetID` are provided, the code should validate that the subnet actually belongs to the specified VPC. Currently, it fetches both independently without verifying the relationship (lines 80-105 in `internal/core/services/instance.go`). 

Add validation after retrieving the subnet to check `subnet.VPCID == *vpcID` and return an error if they don't match. Also verify the three scenarios:
- Subnet specified without VPC (currently allowed—clarify intent)
- VPC specified without subnet (currently allowed—clarify intent)
- Subnet belongs to the specified VPC (missing validation—implement check)
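
A minimal sketch of the missing check, using stand-in types (the struct, field, and function names are assumptions, not the project's domain types):

```go
package main

import (
	"errors"
	"fmt"
)

// Subnet is a minimal stand-in for the domain type; field names are assumed.
type Subnet struct {
	ID    string
	VPCID string
}

// checkSubnetVPC sketches the missing validation in LaunchInstance:
// when both a VPC and a subnet are specified, the subnet must belong
// to that VPC. The single-argument cases pass through unchanged, since
// the review leaves their intent to be clarified.
func checkSubnetVPC(subnet *Subnet, vpcID *string) error {
	if subnet == nil || vpcID == nil {
		return nil // only one (or neither) specified
	}
	if subnet.VPCID != *vpcID {
		return errors.New("subnet does not belong to the specified VPC")
	}
	return nil
}

func main() {
	vpc := "vpc-a"
	s := &Subnet{ID: "subnet-1", VPCID: "vpc-b"}
	fmt.Println(checkSubnetVPC(s, &vpc)) // mismatch -> error
}
```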

internal/repositories/postgres/migrations/030_create_security_groups.up.sql-21-21 (1)

21-21: CIDR column may be too short for IPv6.

`VARCHAR(18)` accommodates IPv4 CIDR (max 18 chars for `255.255.255.255/32`) but not IPv6 (up to 43 chars for `xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx/128`).

Consider using PostgreSQL's native `CIDR` type (as done in `031_create_subnets.up.sql` line 7) for consistency and built-in validation.

🐛 Suggested fix

-    cidr VARCHAR(18) NOT NULL,
+    cidr CIDR NOT NULL,
internal/handlers/instance_handler_test.go-100-100 (1)

100-100: Add test case for non-nil subnetID parameter.

The handler accepts and processes the subnetID parameter (lines 136-144), but test coverage only exercises the nil case. While line 246 tests a non-nil vpcID, there's no corresponding test that validates behavior when subnetID is provided. Add a test case that passes a non-nil subnetID to LaunchInstance to ensure the handler correctly propagates subnet information to the service layer.

pkg/sdk/compute.go-44-52 (1)

44-52: Breaking change documentation incomplete.

All SDK consumers in the codebase have been properly updated for the subnetID parameter. However, the breaking change is only documented in the API reference—no dedicated migration guide or breaking change notes exist for external SDK consumers to discover the update.

While the API reference (docs/api-reference.md) and CLI reference (docs/cli-reference.md) show the new subnet_id parameter, adding explicit breaking change notes to release notes or a migration guide would improve visibility for external SDK consumers.

cmd/cloud/subnet.go-39-49 (1)

39-49: Potential panic if subnet ID is shorter than 8 characters.

The slice s.ID[:8] will panic with an index out of range error if the ID string is shorter than 8 characters. While UUIDs are typically 36 characters, defensive handling is recommended.

Proposed fix
 		for _, s := range subnets {
+			displayID := s.ID
+			if len(displayID) > 8 {
+				displayID = displayID[:8]
+			}
 			table.Append([]string{
-				s.ID[:8],
+				displayID,
 				s.Name,
 				s.CIDRBlock,
 				s.AZ,
 				s.GatewayIP,
 				s.Status,
 				s.CreatedAt.Format("2006-01-02 15:04:05"),
 			})
 		}
internal/handlers/security_group_handler.go-68-71 (1)

68-71: Silent failure on invalid vpc_id could cause unexpected behavior.

If vpcIDStr is non-empty but not a valid UUID, the parse error is silently ignored and vpcID defaults to uuid.Nil. This could lead to confusing behavior. Consider returning an error for malformed input.

🔧 Suggested fix
 	var vpcID uuid.UUID
 	if vpcIDStr != "" {
-		vpcID, _ = uuid.Parse(vpcIDStr)
+		var err error
+		vpcID, err = uuid.Parse(vpcIDStr)
+		if err != nil {
+			c.JSON(http.StatusBadRequest, gin.H{"error": "invalid vpc_id"})
+			return
+		}
 	}
internal/handlers/subnet_handler.go-65-76 (1)

65-76: Name-based lookup cannot succeed with uuid.Nil as the VPC context.

When idStr is not a valid UUID, GetSubnet falls through to GetByName(ctx, uuid.Nil, idStr). The repository's query filters by vpc_id = $1, which means it searches for vpc_id = 00000000-0000-0000-0000-000000000000. Since no subnet will have that VPC ID, the lookup will always fail with "not found." Either restrict this endpoint to UUID-only identifiers, or refactor GetByName to support lookups without VPC context (e.g., by removing the vpc_id filter when it is nil).
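
If the UUID-only restriction is chosen, a standalone sketch of the dispatch (helper name and regex approach are illustrative; the real handler would use uuid.Parse):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// uuidRe matches the canonical 36-character UUID form (8-4-4-4-12 hex).
var uuidRe = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// lookupKind reports how an identifier should be resolved: by primary key
// when it is a canonical UUID, otherwise by name. The name path assumes the
// suggested refactor of GetByName to drop the vpc_id filter when no VPC
// context is available.
func lookupKind(idOrName string) string {
	if uuidRe.MatchString(strings.TrimSpace(idOrName)) {
		return "by-id"
	}
	return "by-name"
}

func main() {
	for _, s := range []string{"2b1f0d0e-9c1a-4e9b-8d2f-0a1b2c3d4e5f", "frontend-subnet"} {
		fmt.Printf("%s -> %s\n", s, lookupKind(s))
	}
}
```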

internal/repositories/postgres/subnet_repo.go-76-88 (1)

76-88: Missing rows.Err() check after iteration.

After iterating over rows with rows.Next(), you should check rows.Err() to catch any errors that occurred during iteration.

🐛 Proposed fix
 	for rows.Next() {
 		var s domain.Subnet
 		err := rows.Scan(
 			&s.ID, &s.UserID, &s.VPCID, &s.Name, &s.CIDRBlock, &s.AvailabilityZone, &s.GatewayIP, &s.ARN, &s.Status, &s.CreatedAt,
 		)
 		if err != nil {
 			return nil, errors.Wrap(errors.Internal, "failed to scan subnet", err)
 		}
 		subnets = append(subnets, &s)
 	}
+	if err := rows.Err(); err != nil {
+		return nil, errors.Wrap(errors.Internal, "error iterating subnets", err)
+	}
 	return subnets, nil
internal/core/services/vpc.go-101-111 (1)

101-111: DeleteVPC: Bridge deletion before DB deletion risks orphaned records.

If s.repo.Delete fails after the bridge is already deleted, the VPC record remains in the database but the bridge is gone. Consider reversing the order or implementing a transactional approach.

🔧 Suggested fix: Delete from DB first
 func (s *VpcService) DeleteVPC(ctx context.Context, idOrName string) error {
 	vpc, err := s.GetVPC(ctx, idOrName)
 	if err != nil {
 		return err
 	}

-	// 1. Remove OVS bridge
-	if err := s.network.DeleteBridge(ctx, vpc.NetworkID); err != nil {
-		s.logger.Error("failed to remove OVS bridge", "bridge", vpc.NetworkID, "error", err)
-		return errors.Wrap(errors.Internal, "failed to remove OVS bridge", err)
-	}
-	s.logger.Info("vpc bridge removed", "bridge", vpc.NetworkID)
-
-	// 2. Delete from DB
+	// 1. Delete from DB first
 	if err := s.repo.Delete(ctx, vpc.ID); err != nil {
 		return errors.Wrap(errors.Internal, "failed to delete VPC from database", err)
 	}
+
+	// 2. Remove OVS bridge (best-effort cleanup)
+	if err := s.network.DeleteBridge(ctx, vpc.NetworkID); err != nil {
+		s.logger.Error("failed to remove OVS bridge", "bridge", vpc.NetworkID, "error", err)
+		// Log but don't fail - DB record is already gone
+	}
+	s.logger.Info("vpc bridge removed", "bridge", vpc.NetworkID)
internal/core/services/security_group.go-81-93 (1)

81-93: DeleteGroup doesn't clean up OVS flows.

When deleting a security group, the associated OVS flow rules should also be removed. Currently, only the database record is deleted, leaving orphaned flows in OVS.

🐛 Proposed fix: Clean up flows before deletion
 func (s *SecurityGroupService) DeleteGroup(ctx context.Context, id uuid.UUID) error {
 	userID := appcontext.UserIDFromContext(ctx)

+	// Get the group to access VPC and rules for cleanup
+	sg, err := s.repo.GetByID(ctx, id)
+	if err != nil {
+		return err
+	}
+
+	// Clean up OVS flows
+	if vpc, err := s.vpcRepo.GetByID(ctx, sg.VPCID); err == nil {
+		for _, rule := range sg.Rules {
+			flow := s.translateToFlow(rule)
+			_ = s.network.DeleteFlowRule(ctx, vpc.NetworkID, flow.Match)
+		}
+	}
+
 	// In a real implementation, we should check if any instances are still attached
 	// For now, we just delete.
 	if err := s.repo.Delete(ctx, id); err != nil {

Committable suggestion skipped: line range outside the PR's diff.

internal/core/services/instance.go-486-528 (1)

486-528: IP allocation skips network address but not broadcast address.

The allocateIP function starts from the network address and increments, but isValidHostIP only checks Contains(). For subnets larger than /31, the broadcast address should be excluded.

🐛 Proposed fix for isValidHostIP
 func (s *InstanceService) isValidHostIP(ip net.IP, n *net.IPNet) bool {
-	// Simple check: not network address and not broadcast address (if /30 or larger)
-	// For simplicity in this demo, we just ensure it's in range and not gateway
-	return n.Contains(ip)
+	if !n.Contains(ip) {
+		return false
+	}
+	// Exclude network address
+	if ip.Equal(n.IP) {
+		return false
+	}
+	// Exclude broadcast address for IPv4
+	if len(ip) == net.IPv4len || ip.To4() != nil {
+		ones, bits := n.Mask.Size()
+		if ones < bits { // Not a /32
+			broadcast := make(net.IP, len(n.IP))
+			copy(broadcast, n.IP)
+			for i := range broadcast {
+				broadcast[i] |= ^n.Mask[i]
+			}
+			if ip.Equal(broadcast) {
+				return false
+			}
+		}
+	}
+	return true
 }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +1 to +2
-- +goose Up


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# First, check if the file exists and view its content
echo "=== Checking file: internal/repositories/postgres/migrations/003_add_container_id.up.sql ==="
if [ -f "internal/repositories/postgres/migrations/003_add_container_id.up.sql" ]; then
    echo "File found. Content (first 5 lines):"
    head -n 5 "internal/repositories/postgres/migrations/003_add_container_id.up.sql" | cat -n
else
    echo "File NOT found"
fi

echo ""
echo "=== Checking all migration files in the directory ==="
find internal/repositories/postgres/migrations -type f \( -name "*.up.sql" -o -name "*.down.sql" \) 2>/dev/null | sort

Repository: PoyrazK/thecloud

Length of output: 4792


🏁 Script executed:

#!/bin/bash
# Verification script from review comment

echo "=== Checking .up.sql files for '-- +goose Up' directive ==="
fd -e sql -p '\.up\.sql$' internal/repositories/postgres/migrations/ --exec bash -c '
  file="{}";
  if head -n 1 "$file" | grep -q "^-- +goose Up$"; then
    echo "✓ $file"
  else
    echo "✗ MISSING: $file"
  fi
'

echo ""
echo "=== Checking .down.sql files for '-- +goose Down' directive ==="
fd -e sql -p '\.down\.sql$' internal/repositories/postgres/migrations/ --exec bash -c '
  file="{}";
  if head -n 1 "$file" | grep -q "^-- +goose Down$"; then
    echo "✓ $file"
  else
    echo "✗ MISSING: $file"
  fi
'

echo ""
echo "=== Checking for .up.sql files without corresponding .down.sql files ==="
for upfile in internal/repositories/postgres/migrations/*.up.sql; do
  downfile="${upfile%.up.sql}.down.sql"
  if [ ! -f "$downfile" ]; then
    echo "⚠ Missing .down.sql: $upfile"
  fi
done

Repository: PoyrazK/thecloud

Length of output: 5304


Create .down.sql migration files with proper -- +goose Down directives.

All .up.sql files correctly include -- +goose Up, but all 30 corresponding .down.sql files are missing the required -- +goose Down directive. Additionally, migrations 031 and 032 lack .down.sql files entirely. Goose requires both directives for proper migration execution and rollback support.

🤖 Prompt for AI Agents
In @internal/repositories/postgres/migrations/003_add_container_id.up.sql around
lines 1 - 2, The .down.sql counterparts are missing the required Goose directive
and two migrations (031, 032) lack .down.sql files entirely; create a .down.sql
file for each .up.sql (including migrations 031 and 032) named to match their
.up.sql counterparts (e.g., create 003_add_container_id.down.sql) and ensure the
very first non-comment line is the directive "-- +goose Down", then add the
appropriate rollback SQL that reverses the changes made in the corresponding
.up.sql (e.g., DROP COLUMN, DROP TABLE, or REVERT INDEX changes) so each
migration has a proper Up/Down pair for Goose to execute and roll back.

Comment on lines +111 to +115
func (r *SecurityGroupRepository) DeleteRule(ctx context.Context, ruleID uuid.UUID) error {
query := `DELETE FROM security_rules WHERE id = $1`
_, err := r.db.Exec(ctx, query, ruleID)
return err
}

⚠️ Potential issue | 🔴 Critical

Authorization bypass: missing user ownership check.

The DeleteRule method does not verify that the rule belongs to a security group owned by the requesting user. This allows any authenticated user to delete security rules by ID, which is a critical security vulnerability.

🔒 Proposed fix to add authorization check
 func (r *SecurityGroupRepository) DeleteRule(ctx context.Context, ruleID uuid.UUID) error {
-	query := `DELETE FROM security_rules WHERE id = $1`
-	_, err := r.db.Exec(ctx, query, ruleID)
-	return err
+	userID := appcontext.UserIDFromContext(ctx)
+	query := `
+		DELETE FROM security_rules
+		WHERE id = $1
+		AND group_id IN (SELECT id FROM security_groups WHERE user_id = $2)
+	`
+	result, err := r.db.Exec(ctx, query, ruleID, userID)
+	if err != nil {
+		return errors.Wrap(errors.Internal, "failed to delete security rule", err)
+	}
+	if result.RowsAffected() == 0 {
+		return errors.New(errors.NotFound, "security rule not found")
+	}
+	return nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (r *SecurityGroupRepository) DeleteRule(ctx context.Context, ruleID uuid.UUID) error {
query := `DELETE FROM security_rules WHERE id = $1`
_, err := r.db.Exec(ctx, query, ruleID)
return err
}
func (r *SecurityGroupRepository) DeleteRule(ctx context.Context, ruleID uuid.UUID) error {
userID := appcontext.UserIDFromContext(ctx)
query := `
DELETE FROM security_rules
WHERE id = $1
AND group_id IN (SELECT id FROM security_groups WHERE user_id = $2)
`
result, err := r.db.Exec(ctx, query, ruleID, userID)
if err != nil {
return errors.Wrap(errors.Internal, "failed to delete security rule", err)
}
if result.RowsAffected() == 0 {
return errors.New(errors.NotFound, "security rule not found")
}
return nil
}

Comment thread internal/repositories/postgres/security_group_repo.go
@coderabbitai

coderabbitai Bot commented Jan 8, 2026

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #15

Copilot AI review requested due to automatic review settings January 8, 2026 16:55

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

🤖 Fix all issues with AI agents
In @internal/core/services/security_group.go:
- Around line 81-93: DeleteGroup currently deletes a security group without
verifying no instances are attached; update DeleteGroup to first query for
attached instances and abort deletion if any exist. Add/use a repository method
such as s.repo.CountInstancesByGroup(ctx, id) or
s.instanceRepo.ListBySecurityGroup(ctx, id) to detect attachments, return a
specific validation error (e.g., errors.Conflict or a new
errors.InvalidOperation) with a clear message when count > 0, and only call
s.repo.Delete and audit log when no attached instances are found; ensure the
check uses the same ctx and preserves existing error wrapping semantics.
- Around line 128-144: DetachFromInstance currently only calls
repo.RemoveInstanceFromGroup and does not update OVS flows; modify
DetachFromInstance to mirror AttachToInstance by first removing the instance via
RemoveInstanceFromGroup, then load the affected SecurityGroup with repo.GetByID
(using the same groupID), and finally call s.syncGroupFlows(ctx, sg). Ensure you
propagate and handle errors from RemoveInstanceFromGroup, GetByID, and
syncGroupFlows (returning them as appropriate) so stale OVS flows are removed
when an instance is detached.
- Around line 95-120: Add a rollback when OVS flow sync fails in
SecurityGroupService.AddRule: after calling s.repo.AddRule and before returning,
if s.syncGroupFlows(ctx, sg) returns an error attempt to undo the DB change
(call the repository removal method, e.g., s.repo.RemoveRule(ctx, rule.ID) or
use an explicit DB transaction around s.repo.AddRule so you can rollback) and
only log the sync error if rollback succeeds; ensure auditSvc.Log is only called
after successful sync/commit; handle and return the original sync error
(wrapped) if rollback fails or if sync ultimately fails so callers see the
consistency problem.
- Around line 41-67: In CreateGroup, validate the user ID returned by
appcontext.UserIDFromContext (ensure it's not the zero UUID) and return a proper
error if missing instead of proceeding to create the group; after calling
s.repo.Create keep the existing error wrap behavior, and when calling
s.auditSvc.Log do not discard its error—capture the returned error from
auditSvc.Log and surface it (either log via the service logger on s or return a
wrapped error) so audit failures are observable; reference the CreateGroup
method, appcontext.UserIDFromContext, and s.auditSvc.Log when making these
changes.
- Around line 146-161: syncGroupFlows currently only adds flows and never
removes stale ones, AddRule calls syncGroupFlows with a stale sg, and RemoveRule
is unimplemented; fix by (1) making AddRule fetch and persist the new rule then
re-retrieve the fresh SecurityGroup from the repo (use the same repo method used
elsewhere, e.g., GetByID) before calling syncGroupFlows, (2) update
syncGroupFlows to first enumerate and remove existing flows for that security
group via network.DeleteFlowRule (or a DeleteFlowsByGroup variant) before adding
current rules with network.AddFlowRule, (3) implement RemoveRule to delete the
rule from the DB and then call the fresh-state syncGroupFlows to remove
associated OVS flows, and (4) ensure the OVS adapter’s AddFlowRule is made
idempotent (or include a flow-version or unique flow-id per rule that you use to
delete duplicates) so repeated syncs do not create duplicate flows.

In @internal/repositories/ovs/adapter.go:
- Around line 119-138: CreateVXLANTunnel currently uses remoteIP directly which
can produce invalid/malicious tunnel names; validate remoteIP with net.ParseIP
and return an InvalidInput error if parsing fails, then derive tunnelName from
the parsed IP (use ip.String()) and sanitize by replacing '.' and ':' with '-'
before building the exec.CommandContext args; ensure the error path uses the
same validation error type and keep bridge validation via bridgeNameRegex
unchanged so DeletePort validation remains consistent.
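The validation/sanitization step the prompt asks for can be sketched as a standalone helper. This is illustrative only: the `vxlan-` prefix is an assumption, not the adapter's actual naming scheme, and the real fix would live inside `CreateVXLANTunnel`.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// tunnelNameFor validates remoteIP and derives a safe OVS port name for a
// VXLAN tunnel. Dots (IPv4) and colons (IPv6) are replaced with '-' so the
// result satisfies a ^[a-zA-Z0-9-]+$ style name constraint and cannot carry
// shell metacharacters or option-like prefixes into exec arguments.
func tunnelNameFor(remoteIP string) (string, error) {
	ip := net.ParseIP(remoteIP)
	if ip == nil {
		return "", fmt.Errorf("invalid remote IP: %q", remoteIP)
	}
	// Re-derive the string from the parsed IP so only a canonical form is used.
	safe := strings.NewReplacer(".", "-", ":", "-").Replace(ip.String())
	return "vxlan-" + safe, nil
}

func main() {
	name, err := tunnelNameFor("10.0.0.2")
	fmt.Println(name, err)
}
```

Parsing first and then rendering `ip.String()` (rather than sanitizing the raw input) guarantees the name is built only from a well-formed address.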
🧹 Nitpick comments (3)
internal/repositories/ovs/adapter.go (2)

16-18: Consider allowing underscores in bridge/port names.

OVS bridge and port names commonly include underscores (e.g., br_int, veth_host). The current regex ^[a-zA-Z0-9-]+$ excludes them.

🔧 Suggested fix
 var (
-	bridgeNameRegex = regexp.MustCompile(`^[a-zA-Z0-9-]+$`)
+	bridgeNameRegex = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)
 )
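Beyond the character class, Linux limits interface names to 15 bytes (IFNAMSIZ minus the NUL terminator), which also bounds OVS port names backed by kernel interfaces. A hedged sketch combining both constraints (the regex mirrors the suggestion above with underscores allowed; the length cap is a kernel fact, not something this adapter currently enforces):

```go
package main

import (
	"fmt"
	"regexp"
)

// ifaceNameRe allows letters, digits, '-', and '_', capped at 15 characters
// to respect the kernel's IFNAMSIZ limit for interface names.
var ifaceNameRe = regexp.MustCompile(`^[a-zA-Z0-9_-]{1,15}$`)

func validIfaceName(name string) bool {
	return ifaceNameRe.MatchString(name)
}

func main() {
	fmt.Println(validIfaceName("br_int"))
	fmt.Println(validIfaceName("this-name-is-way-too-long"))
}
```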

173-176: Stub implementation may mislead callers.

ListFlowRules returns an empty slice without indicating it's unimplemented. Callers can't distinguish "no rules exist" from "not implemented."

Consider returning a sentinel error or adding structured logging until implemented:

 func (a *OvsAdapter) ListFlowRules(_ context.Context, _ string) ([]ports.FlowRule, error) {
-	// Parsing ovs-ofctl dump-flows output is complex and will be implemented in a follow-up.
-	return []ports.FlowRule{}, nil
+	// TODO: Implement parsing of ovs-ofctl dump-flows output
+	return nil, errors.New(errors.Internal, "ListFlowRules not yet implemented")
 }

Alternatively, if returning an empty slice is acceptable for now, add a log statement for observability.

internal/core/services/security_group.go (1)

163-182: Consider protocol validation and extracting magic strings.

Two refactoring suggestions:

  1. Lines 166-173: The protocol switch statement lacks a default case. Unknown protocols are silently ignored, which could lead to unexpected behavior where rules are created but don't generate meaningful flow matches.

  2. Line 175: The magic string "0.0.0.0/0" should be extracted as a constant for maintainability.

♻️ Suggested improvements
+const (
+	anyIPv4CIDR = "0.0.0.0/0"
+)
+
 func (s *SecurityGroupService) translateToFlow(rule domain.SecurityRule) ports.FlowRule {
 	matchParts := []string{}
 
 	switch rule.Protocol {
 	case "tcp":
 		matchParts = append(matchParts, "tcp")
 	case "udp":
 		matchParts = append(matchParts, "udp")
 	case "icmp":
 		matchParts = append(matchParts, "icmp")
+	default:
+		// Log or handle unknown protocol
+		// For now, continue without protocol match (matches all protocols)
 	}
 
-	if rule.CIDR != "" && rule.CIDR != "0.0.0.0/0" {
+	if rule.CIDR != "" && rule.CIDR != anyIPv4CIDR {
 		if rule.Direction == domain.RuleIngress {
 			matchParts = append(matchParts, fmt.Sprintf("nw_src=%s", rule.CIDR))
 		} else {
 			matchParts = append(matchParts, fmt.Sprintf("nw_dst=%s", rule.CIDR))
 		}
 	}
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25569d4 and 4192df7.

📒 Files selected for processing (5)
  • docs/swagger/docs.go
  • docs/swagger/swagger.json
  • docs/swagger/swagger.yaml
  • internal/core/services/security_group.go
  • internal/repositories/ovs/adapter.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/repositories/ovs/adapter.go (2)
internal/errors/errors.go (4)
  • Type (8-8)
  • New (57-59)
  • InvalidInput (12-12)
  • Wrap (61-63)
internal/core/ports/network.go (1)
  • FlowRule (8-13)
🔇 Additional comments (12)
docs/swagger/swagger.yaml (4)

90-99: LGTM - New OVS networking fields for Instance model.

The added fields (ovs_port, private_ip, subnet_id) are correctly typed as strings and align with the OVS networking integration requirements. These are additive, backward-compatible changes to the API response schema.


225-225: LGTM - Permission enum properly updated.

The vpc:update permission and corresponding PermissionVpcUpdate varname are correctly positioned and follow the established naming convention.

Also applies to: 287-287


572-590: LGTM - VPC model extended with OVS networking fields.

The new fields (arn, cidr_block, vxlan_id) and the description for network_id ("OVS bridge name") appropriately document the OVS integration.

Note: The VPC creation endpoint (POST /vpcs) currently only accepts name in the request body. Verify whether cidr_block should be an input parameter for VPC creation, or if it's auto-assigned by the backend.


787-788: LGTM - LaunchRequest extended with subnet_id.

The subnet_id field is correctly added as an optional parameter, allowing instances to be launched with or without explicit subnet placement.

docs/swagger/docs.go (1)

1-4: Auto-generated file - changes are consistent with swagger.yaml.

This file is generated by swaggo/swag and should not be manually edited. The changes correctly mirror the swagger.yaml updates for OVS networking integration (Instance fields, VPC fields, Permission enum, LaunchRequest).

docs/swagger/swagger.json (2)

2679-2693: New networking fields align with OVS integration.

The additions to domain.Instance (ovs_port, private_ip, subnet_id), domain.VPC (arn, cidr_block, vxlan_id), and httphandlers.LaunchRequest (subnet_id) are consistent with the OVS networking integration. Fields are appropriately typed and subnet_id in LaunchRequest is correctly optional for backward compatibility.

Consider adding format validation constraints in a follow-up for cidr_block (CIDR notation) and private_ip (IPv4/IPv6 format) to improve API documentation and client-side validation.

Also applies to: 3326-3353, 3647-3649


2873-2873: Permission addition is correct.

The vpc:update permission and corresponding PermissionVpcUpdate enum variant are properly added and maintain alignment with the existing permission structure.

Also applies to: 2935-2935

internal/repositories/ovs/adapter.go (3)

20-42: Constructor implementation is sound.

Using exec.LookPath at construction time to validate binary availability is a good practice. This fails fast if OVS tools are not installed.


44-51: Core OVS operations are well-implemented.

The bridge, port, and flow rule operations follow a consistent pattern with proper:

  • Input validation using bridgeNameRegex
  • Context propagation via exec.CommandContext
  • Error wrapping with domain-specific error types
  • Edge case handling (e.g., empty bridge list in ListBridges)

Also applies to: 66-91, 93-117, 145-171, 186-197


53-64: The vxlanID parameter is correctly used as metadata and should not be removed.

The vxlanID parameter is intentionally accepted, stored in the VPC domain object, persisted to the database, and exposed through the SDK and CLI. It serves as the VXLAN Network Identifier (VNI) metadata for the VPC. VXLAN tunnel configuration is handled separately via the CreateVXLANTunnel method, which accepts a vni parameter. This is the correct architectural separation: CreateBridge establishes the bridge and metadata, while CreateVXLANTunnel configures overlay tunnels when needed.

Likely an incorrect or invalid review comment.

internal/core/services/security_group.go (2)

17-39: LGTM!

The service struct and constructor are well-designed with clean dependency injection using interface types for testability.


69-79: LGTM!

The GetGroup method appropriately attempts UUID parsing before falling back to name-based lookup, and ListGroups is a clean pass-through to the repository.

Comment on lines +41 to +67
func (s *SecurityGroupService) CreateGroup(ctx context.Context, vpcID uuid.UUID, name, description string) (*domain.SecurityGroup, error) {
userID := appcontext.UserIDFromContext(ctx)
sgID := uuid.New()

arn := fmt.Sprintf("arn:thecloud:vpc:local:%s:security-group/%s", userID.String(), sgID.String())

sg := &domain.SecurityGroup{
ID: sgID,
UserID: userID,
VPCID: vpcID,
Name: name,
Description: description,
ARN: arn,
CreatedAt: time.Now(),
}

if err := s.repo.Create(ctx, sg); err != nil {
return nil, errors.Wrap(errors.Internal, "failed to create security group", err)
}

_ = s.auditSvc.Log(ctx, userID, "security_group.create", "security_group", sgID.String(), map[string]interface{}{
"name": name,
"vpc_id": vpcID,
})

return sg, nil
}

⚠️ Potential issue | 🟡 Minor

Consider validating UserID and handling audit errors.

Two minor concerns:

  1. Line 42: UserIDFromContext is called without checking if a valid user ID exists in the context. If the context lacks a user ID, this could lead to creating security groups with zero-value UUIDs.

  2. Line 61: Audit logging errors are silently discarded. Consider logging the error for observability.

♻️ Suggested improvements
 func (s *SecurityGroupService) CreateGroup(ctx context.Context, vpcID uuid.UUID, name, description string) (*domain.SecurityGroup, error) {
 	userID := appcontext.UserIDFromContext(ctx)
+	if userID == uuid.Nil {
+		return nil, errors.New(errors.Unauthorized, "user ID not found in context")
+	}
 	sgID := uuid.New()
 
 	arn := fmt.Sprintf("arn:thecloud:vpc:local:%s:security-group/%s", userID.String(), sgID.String())
 
 	sg := &domain.SecurityGroup{
 		ID:          sgID,
 		UserID:      userID,
 		VPCID:       vpcID,
 		Name:        name,
 		Description: description,
 		ARN:         arn,
 		CreatedAt:   time.Now(),
 	}
 
 	if err := s.repo.Create(ctx, sg); err != nil {
 		return nil, errors.Wrap(errors.Internal, "failed to create security group", err)
 	}
 
-	_ = s.auditSvc.Log(ctx, userID, "security_group.create", "security_group", sgID.String(), map[string]interface{}{
+	if err := s.auditSvc.Log(ctx, userID, "security_group.create", "security_group", sgID.String(), map[string]interface{}{
 		"name":   name,
 		"vpc_id": vpcID,
-	})
+	}); err != nil {
+		s.logger.Warn("failed to log audit event", "error", err)
+	}
 
 	return sg, nil
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @internal/core/services/security_group.go around lines 41 - 67, In
CreateGroup, validate the user ID returned by appcontext.UserIDFromContext
(ensure it's not the zero UUID) and return a proper error if missing instead of
proceeding to create the group; after calling s.repo.Create keep the existing
error wrap behavior, and when calling s.auditSvc.Log do not discard its
error—capture the returned error from auditSvc.Log and surface it (either log
via the service logger on s or return a wrapped error) so audit failures are
observable; reference the CreateGroup method, appcontext.UserIDFromContext, and
s.auditSvc.Log when making these changes.

Comment on lines +81 to +93
func (s *SecurityGroupService) DeleteGroup(ctx context.Context, id uuid.UUID) error {
userID := appcontext.UserIDFromContext(ctx)

// In a real implementation, we should check if any instances are still attached
// For now, we just delete.
if err := s.repo.Delete(ctx, id); err != nil {
return errors.Wrap(errors.Internal, "failed to delete security group", err)
}

_ = s.auditSvc.Log(ctx, userID, "security_group.delete", "security_group", id.String(), nil)

return nil
}

⚠️ Potential issue | 🟠 Major

Implement instance attachment validation before deletion.

The comment on lines 84-85 acknowledges that the implementation should check if instances are attached before deleting the security group. Without this check:

  • Instance-to-group associations in the database could be orphaned
  • OVS flows for attached instances may remain active, creating stale security rules
  • The system could enter an inconsistent state where instances reference non-existent groups

This validation is essential for maintaining data integrity and correct security enforcement.

Do you want me to generate an implementation that checks for attached instances and prevents deletion if any exist?

🤖 Prompt for AI Agents
In @internal/core/services/security_group.go around lines 81 - 93, DeleteGroup
currently deletes a security group without verifying no instances are attached;
update DeleteGroup to first query for attached instances and abort deletion if
any exist. Add/use a repository method such as s.repo.CountInstancesByGroup(ctx,
id) or s.instanceRepo.ListBySecurityGroup(ctx, id) to detect attachments, return
a specific validation error (e.g., errors.Conflict or a new
errors.InvalidOperation) with a clear message when count > 0, and only call
s.repo.Delete and audit log when no attached instances are found; ensure the
check uses the same ctx and preserves existing error wrapping semantics.

Comment on lines +95 to +120
func (s *SecurityGroupService) AddRule(ctx context.Context, groupID uuid.UUID, rule domain.SecurityRule) (*domain.SecurityRule, error) {
sg, err := s.repo.GetByID(ctx, groupID)
if err != nil {
return nil, err
}

rule.ID = uuid.New()
rule.GroupID = groupID
rule.CreatedAt = time.Now()

if err := s.repo.AddRule(ctx, &rule); err != nil {
return nil, errors.Wrap(errors.Internal, "failed to add security rule", err)
}

// Update OVS flows
if err := s.syncGroupFlows(ctx, sg); err != nil {
s.logger.Error("failed to sync OVS flows", "group_id", groupID, "error", err)
// We don't rollback DB here in this simple pass, but in real life we should.
}

_ = s.auditSvc.Log(ctx, sg.UserID, "security_group.add_rule", "security_group", groupID.String(), map[string]interface{}{
"rule_id": rule.ID.String(),
})

return &rule, nil
}

⚠️ Potential issue | 🔴 Critical

Critical: Handle flow synchronization failures with rollback.

Lines 110-113 identify a critical consistency issue: if OVS flow synchronization fails after the rule is persisted to the database, the system enters an inconsistent state where:

  • The rule exists in the database
  • The rule is NOT enforced in the network layer
  • Security policies appear configured but aren't actually protecting traffic

This is a security risk because administrators may believe traffic is filtered when it isn't.

🔒 Proposed fix with transaction rollback

Consider implementing a rollback mechanism:

 func (s *SecurityGroupService) AddRule(ctx context.Context, groupID uuid.UUID, rule domain.SecurityRule) (*domain.SecurityRule, error) {
 	sg, err := s.repo.GetByID(ctx, groupID)
 	if err != nil {
 		return nil, err
 	}
 
 	rule.ID = uuid.New()
 	rule.GroupID = groupID
 	rule.CreatedAt = time.Now()
 
 	if err := s.repo.AddRule(ctx, &rule); err != nil {
 		return nil, errors.Wrap(errors.Internal, "failed to add security rule", err)
 	}
 
 	// Update OVS flows
 	if err := s.syncGroupFlows(ctx, sg); err != nil {
-		s.logger.Error("failed to sync OVS flows", "group_id", groupID, "error", err)
-		// We don't rollback DB here in this simple pass, but in real life we should.
+		s.logger.Error("failed to sync OVS flows, rolling back", "group_id", groupID, "error", err)
+		if rollbackErr := s.repo.RemoveRule(ctx, rule.ID); rollbackErr != nil {
+			s.logger.Error("failed to rollback rule after flow sync failure", "rule_id", rule.ID, "error", rollbackErr)
+		}
+		return nil, errors.Wrap(errors.Internal, "failed to sync security group flows", err)
 	}
 
-	_ = s.auditSvc.Log(ctx, sg.UserID, "security_group.add_rule", "security_group", groupID.String(), map[string]interface{}{
+	if err := s.auditSvc.Log(ctx, sg.UserID, "security_group.add_rule", "security_group", groupID.String(), map[string]interface{}{
 		"rule_id": rule.ID.String(),
-	})
+	}); err != nil {
+		s.logger.Warn("failed to log audit event", "error", err)
+	}
 
 	return &rule, nil
 }

Note: This assumes a RemoveRule method exists in the repository interface.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @internal/core/services/security_group.go around lines 95 - 120, Add a
rollback when OVS flow sync fails in SecurityGroupService.AddRule: after calling
s.repo.AddRule and before returning, if s.syncGroupFlows(ctx, sg) returns an
error attempt to undo the DB change (call the repository removal method, e.g.,
s.repo.RemoveRule(ctx, rule.ID) or use an explicit DB transaction around
s.repo.AddRule so you can rollback) and only log the sync error if rollback
succeeds; ensure auditSvc.Log is only called after successful sync/commit;
handle and return the original sync error (wrapped) if rollback fails or if sync
ultimately fails so callers see the consistency problem.

Comment on lines +122 to +126
func (s *SecurityGroupService) RemoveRule(ctx context.Context, ruleID uuid.UUID) error {
// 1. Get rule to know groupID
// Implementation omitted: would need a GetRuleByID in repo
return nil
}

⚠️ Potential issue | 🔴 Critical

Critical: RemoveRule is a non-functional stub.

The RemoveRule method returns nil without performing any operation. This means:

  • Callers receive a success response when no rule is actually removed
  • The rule remains in the database
  • OVS flows continue to enforce the rule
  • The system state becomes inconsistent with what the API reports

This is a critical incomplete implementation that will cause silent failures.

Would you like me to generate a complete implementation for RemoveRule that removes the rule from the database and syncs OVS flows?

Comment on lines +128 to +144
func (s *SecurityGroupService) AttachToInstance(ctx context.Context, instanceID, groupID uuid.UUID) error {
if err := s.repo.AddInstanceToGroup(ctx, instanceID, groupID); err != nil {
return err
}

sg, err := s.repo.GetByID(ctx, groupID)
if err != nil {
return err
}

// Update OVS flows
return s.syncGroupFlows(ctx, sg)
}

func (s *SecurityGroupService) DetachFromInstance(ctx context.Context, instanceID, groupID uuid.UUID) error {
return s.repo.RemoveInstanceFromGroup(ctx, instanceID, groupID)
}

⚠️ Potential issue | 🔴 Critical

Critical: DetachFromInstance doesn't sync OVS flows.

There's a critical asymmetry between AttachToInstance and DetachFromInstance:

  • AttachToInstance (line 139): Syncs OVS flows after attaching
  • DetachFromInstance (line 143): Does NOT sync flows after detaching

This means when an instance is detached from a security group, the OVS flows remain active, continuing to enforce rules for an instance that should no longer be subject to them. This creates stale network rules and incorrect security enforcement.

🔒 Proposed fix to sync flows on detach
 func (s *SecurityGroupService) DetachFromInstance(ctx context.Context, instanceID, groupID uuid.UUID) error {
-	return s.repo.RemoveInstanceFromGroup(ctx, instanceID, groupID)
+	if err := s.repo.RemoveInstanceFromGroup(ctx, instanceID, groupID); err != nil {
+		return err
+	}
+
+	sg, err := s.repo.GetByID(ctx, groupID)
+	if err != nil {
+		return err
+	}
+
+	// Update OVS flows to remove rules for detached instance
+	return s.syncGroupFlows(ctx, sg)
 }

Note: Depending on your OVS architecture, you may need to remove the specific flows for the detached instance rather than re-syncing all flows. Verify that syncGroupFlows actually removes flows for detached instances; if it only adds flows, a full re-sync will not clean anything up.

🤖 Prompt for AI Agents
In @internal/core/services/security_group.go around lines 128 - 144,
DetachFromInstance currently only calls repo.RemoveInstanceFromGroup and does
not update OVS flows; modify DetachFromInstance to mirror AttachToInstance by
first removing the instance via RemoveInstanceFromGroup, then load the affected
SecurityGroup with repo.GetByID (using the same groupID), and finally call
s.syncGroupFlows(ctx, sg). Ensure you propagate and handle errors from
RemoveInstanceFromGroup, GetByID, and syncGroupFlows (returning them as
appropriate) so stale OVS flows are removed when an instance is detached.

Comment on lines +146 to +161
func (s *SecurityGroupService) syncGroupFlows(ctx context.Context, sg *domain.SecurityGroup) error {
vpc, err := s.vpcRepo.GetByID(ctx, sg.VPCID)
if err != nil {
return err
}

// For each rule, generate an OVS flow
for _, rule := range sg.Rules {
flow := s.translateToFlow(rule)
if err := s.network.AddFlowRule(ctx, vpc.NetworkID, flow); err != nil {
return err
}
}

return nil
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find the security_group.go file and examine the syncGroupFlows method
git ls-files | grep -E "(security_group|ovs|network)" | head -20

Repository: PoyrazK/thecloud

Length of output: 902


🏁 Script executed:

# Read the security_group.go file to verify the implementation
cat -n internal/core/services/security_group.go | sed -n '140,170p'

Repository: PoyrazK/thecloud

Length of output: 1105


🏁 Script executed:

# Search for AddFlowRule implementation
rg "func.*AddFlowRule" --type go -A 10

Repository: PoyrazK/thecloud

Length of output: 1940


🏁 Script executed:

# Search for any flow cleanup or removal methods
rg "RemoveFlow|DeleteFlow|ClearFlow" --type go -B 2 -A 5

Repository: PoyrazK/thecloud

Length of output: 2567


🏁 Script executed:

# Look for any flow synchronization or versioning mechanisms
rg "flow.*sync|flow.*version|tag.*flow" --type go -i -B 2 -A 5

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Check if there are other methods in SecurityGroupService that handle cleanup
ast-grep --pattern $'type SecurityGroupService struct {
  $$$
}'

Repository: PoyrazK/thecloud

Length of output: 557


🏁 Script executed:

# Search for all methods in SecurityGroupService to see the full picture
rg "func \(s \*SecurityGroupService\)" --type go -A 1 | head -30

Repository: PoyrazK/thecloud

Length of output: 2457


🏁 Script executed:

# Find all calls to syncGroupFlows
rg "syncGroupFlows" --type go -B 3 -A 3

Repository: PoyrazK/thecloud

Length of output: 1579


🏁 Script executed:

# Check the AddRule and RemoveRule implementations to see if they handle flow cleanup
cat -n internal/core/services/security_group.go | sed -n '80,130p'

Repository: PoyrazK/thecloud

Length of output: 2140


🏁 Script executed:

# Search for any flow cleanup in the service
rg "DeleteFlowRule" --type go -B 5 -A 5

Repository: PoyrazK/thecloud

Length of output: 3419


🏁 Script executed:

# Look at the full AddRule method
ast-grep --pattern $'func (s *SecurityGroupService) AddRule($_) {
  $$$
}'

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Check RemoveRule implementation
cat -n internal/core/services/security_group.go | sed -n '118,140p'

Repository: PoyrazK/thecloud

Length of output: 774


Major: syncGroupFlows only adds flows without removing stale ones, and AddRule uses stale security group state.

The implementation has multiple critical gaps:

  1. No flow cleanup: syncGroupFlows (lines 146–161) only adds flows via AddFlowRule but never removes old flows when rules change or are deleted. The network backend has a DeleteFlowRule method available but it's never called, causing flow table accumulation.

  2. Stale state sync in AddRule: The method fetches the security group at line 96 (before the new rule is persisted), adds the rule to the database at line 105, then calls syncGroupFlows with the old sg object at line 110. This means the newly added rule won't be synced to OVS on the first call.

  3. RemoveRule unimplemented: Lines 122–125 show RemoveRule is a stub that returns nil, so deleted rules are never removed from OVS either.

  4. AddFlowRule is not idempotent: The OVS adapter uses ovs-ofctl add-flow directly, which creates duplicate flows if called multiple times with the same rule specification.

These issues lead to:

  • Flow table pollution and performance degradation
  • Incorrect rule enforcement (old rules persist)
  • New rules not taking effect immediately

Implement a proper flow synchronization strategy:

  1. Fetch fresh security group state after persisting changes
  2. Clear all flows associated with the security group before re-syncing
  3. Complete the RemoveRule implementation with flow cleanup
  4. Consider idempotency guarantees or flow versioning to prevent duplicates
🤖 Prompt for AI Agents
In @internal/core/services/security_group.go around lines 146 - 161,
syncGroupFlows currently only adds flows and never removes stale ones, AddRule
calls syncGroupFlows with a stale sg, and RemoveRule is unimplemented; fix by
(1) making AddRule fetch and persist the new rule then re-retrieve the fresh
SecurityGroup from the repo (use the same repo method used elsewhere, e.g.,
GetByID) before calling syncGroupFlows, (2) update syncGroupFlows to first
enumerate and remove existing flows for that security group via
network.DeleteFlowRule (or a DeleteFlowsByGroup variant) before adding current
rules with network.AddFlowRule, (3) implement RemoveRule to delete the rule from
the DB and then call the fresh-state syncGroupFlows to remove associated OVS
flows, and (4) ensure the OVS adapter’s AddFlowRule is made idempotent (or
include a flow-version or unique flow-id per rule that you use to delete
duplicates) so repeated syncs do not create duplicate flows.
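One common way to get the idempotency the prompt asks for is to tag every flow belonging to a group with an OpenFlow cookie, then delete by cookie before re-adding (`ovs-ofctl` accepts a `cookie=value/mask` match on `del-flows`). A sketch that only builds the argument lists, not the exec calls; the FNV-based cookie derivation and bridge name are illustrative assumptions, not this PR's implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// groupCookie derives a stable 64-bit OpenFlow cookie from a security group ID
// so all flows for that group can be deleted in one command.
func groupCookie(groupID string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(groupID))
	return h.Sum64()
}

// delFlowsArgs builds the ovs-ofctl invocation that removes every flow tagged
// with the group's cookie ("cookie=X/-1" matches the cookie bits exactly).
func delFlowsArgs(bridge, groupID string) []string {
	return []string{"del-flows", bridge, fmt.Sprintf("cookie=%d/-1", groupCookie(groupID))}
}

// addFlowArgs builds the ovs-ofctl invocation for one rule, embedding the
// cookie so the flow is owned by the group and removable as a set.
func addFlowArgs(bridge, groupID, match, actions string, priority int) []string {
	spec := fmt.Sprintf("cookie=%d,priority=%d,%s,actions=%s",
		groupCookie(groupID), priority, match, actions)
	return []string{"add-flow", bridge, spec}
}

func main() {
	// Sync becomes: delete everything with the group's cookie, then add the
	// current rule set. Running it twice leaves the flow table unchanged.
	fmt.Println(delFlowsArgs("br-int", "sg-1234"))
	fmt.Println(addFlowArgs("br-int", "sg-1234", "tcp,tp_dst=22", "normal", 100))
}
```

Because the delete is scoped by cookie, re-syncing one group cannot disturb flows installed for other groups on the same bridge.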

Comment on lines +183 to +201
if rule.PortMin > 0 {
if rule.PortMin == rule.PortMax {
switch rule.Protocol {
case "tcp", "udp":
matchParts = append(matchParts, fmt.Sprintf("tp_dst=%d", rule.PortMin))
}
} else if rule.PortMax > rule.PortMin {
// OVS doesn't support easy ranges, would need multiple flows or mask
// Simplified for now
matchParts = append(matchParts, fmt.Sprintf("tp_dst=%d", rule.PortMin))
}
}

return ports.FlowRule{
Priority: rule.Priority,
Match: strings.Join(matchParts, ","),
Actions: "allow", // Simplified
}
}
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's locate and examine the file
find . -name "security_group.go" -type f

Repository: PoyrazK/thecloud

Length of output: 211


🏁 Script executed:

# Then read the specific file around lines 183-201
cat -n internal/core/services/security_group.go | sed -n '170,210p'

Repository: PoyrazK/thecloud

Length of output: 1186


🏁 Script executed:

# Also check the domain and ports structures referenced
find . -name "*.go" -type f | xargs grep -l "type SecurityRule\|type FlowRule" | head -20

Repository: PoyrazK/thecloud

Length of output: 161


🏁 Script executed:

# Look for any tests related to security group rules
find . -name "*security*test*" -o -name "*test*security*" | grep -E "\.go$"

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Read the SecurityRule structure
cat -n internal/core/domain/security_group.go | grep -A 20 "type SecurityRule"

Repository: PoyrazK/thecloud

Length of output: 760


🏁 Script executed:

# Read the FlowRule structure
cat -n internal/core/ports/security_group.go | grep -A 20 "type FlowRule"

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Check the full context of the translateToFlow method
cat -n internal/core/services/security_group.go | sed -n '150,210p'

Repository: PoyrazK/thecloud

Length of output: 1756


🏁 Script executed:

# Search for usages of translateToFlow or port handling
rg "translateToFlow|PortMax|PortMin" internal/core/services/security_group.go -A 2 -B 2

Repository: PoyrazK/thecloud

Length of output: 794


🏁 Script executed:

# Look for any tests or test files related to security groups
find . -path "*/test*" -name "*security*" -type f 2>/dev/null || find . -name "*_test.go" -type f | xargs grep -l "SecurityGroup\|security_group" 2>/dev/null | head -10

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Search for FlowRule definition across the codebase
rg "type FlowRule" -A 10

Repository: PoyrazK/thecloud

Length of output: 794


🏁 Script executed:

# Check if there's a deny/block action support anywhere in the codebase
rg "Actions|action" internal/core/ports/ internal/core/services/ | grep -i "deny\|reject\|drop\|block"

Repository: PoyrazK/thecloud

Length of output: 144


🏁 Script executed:

# Check if there are any integration tests or examples
find . -type f -name "*.go" | xargs grep -l "translateToFlow\|AddFlowRule" | head -10

Repository: PoyrazK/thecloud

Length of output: 258


🏁 Script executed:

# Look for any TODO or FIXME comments related to port ranges or flow actions
rg "TODO|FIXME|XXX" internal/core/services/security_group.go -A 2 -B 2

Repository: PoyrazK/thecloud

Length of output: 42


🏁 Script executed:

# Check the test file
cat -n internal/core/services/shared_test.go | head -100

Repository: PoyrazK/thecloud

Length of output: 3675


🏁 Script executed:

# Check the OVS adapter implementation
cat -n internal/repositories/ovs/adapter.go | grep -A 20 -B 5 "FlowRule\|AddFlowRule"

Repository: PoyrazK/thecloud

Length of output: 2551


🏁 Script executed:

# Check if there are any admission/validation checks on SecurityRule
rg "PortMin|PortMax" internal/core/domain/ internal/core/services/ -B 3 -A 3

Repository: PoyrazK/thecloud

Length of output: 1843


🏁 Script executed:

# Look for any comments or documentation about the security group implementation
rg "Simplified|TODO|port range|Actions" internal/core/services/security_group.go

Repository: PoyrazK/thecloud

Length of output: 119


🏁 Script executed:

# Check if SecurityRule has any action or allow/deny field
cat -n internal/core/domain/security_group.go | grep -A 15 "type SecurityRule"

Repository: PoyrazK/thecloud

Length of output: 706


🏁 Script executed:

# Check the complete SecurityRule and SecurityGroup structures
cat -n internal/core/domain/security_group.go

Repository: PoyrazK/thecloud

Length of output: 1349


🏁 Script executed:

# Check if there are any other uses of Actions in the codebase
rg "Actions" --type go | grep -v "test\|Actions  string"

Repository: PoyrazK/thecloud

Length of output: 267


Critical: Port range handling only uses the minimum port, breaking port-based security rules.

The port range logic at lines 189-192 only uses PortMin even when a range is specified:

} else if rule.PortMax > rule.PortMin {
    // OVS doesn't support easy ranges, would need multiple flows or mask
    // Simplified for now
    matchParts = append(matchParts, fmt.Sprintf("tp_dst=%d", rule.PortMin))
}

This breaks any security rule specifying a port range. For example, a rule allowing ports 80-90 would only match port 80, leaving ports 81-90 unmatched. This is a security vulnerability because administrators believe a range is protected when only a single port actually matches.

Solution: Generate multiple flows (one per port in the range) or implement OpenFlow masking for ranges that align to powers of 2. For large ranges, consider requiring administrators to split them into smaller, more specific rules to avoid generating excessive flows.
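A minimal sketch of the first option, generating one exact match per port; `expandPortRange` is an illustrative helper, not code from this PR, and a production version would cap the range size or fall back to power-of-two `tp_dst=value/mask` masking for large ranges:

```go
package main

import "fmt"

// expandPortRange returns one exact tp_dst match per port in [min, max],
// so a rule covering 80-90 emits eleven flows instead of silently
// matching only port 80.
func expandPortRange(min, max int) []string {
	var matches []string
	for p := min; p <= max; p++ {
		matches = append(matches, fmt.Sprintf("tp_dst=%d", p))
	}
	return matches
}

func main() {
	for _, m := range expandPortRange(80, 83) {
		fmt.Println(m)
	}
}
```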

Regarding the hardcoded Actions: "allow" at line 199: This is intentional rather than a bug. The SecurityRule domain model has no field for allow/deny semantics—all rules are implicitly "allow" rules. Supporting deny/drop rules would require extending the domain model first with an Action field and updating all related repositories and service layers.

Comment on lines +119 to +138
func (a *OvsAdapter) CreateVXLANTunnel(ctx context.Context, bridge string, vni int, remoteIP string) error {
if !bridgeNameRegex.MatchString(bridge) {
return errors.New(errors.InvalidInput, "invalid bridge name")
}

tunnelName := fmt.Sprintf("vxlan-%s", strings.ReplaceAll(remoteIP, ".", "-"))
cmd := exec.CommandContext(ctx, a.ovsPath,
"add-port", bridge, tunnelName,
"--", "set", "interface", tunnelName,
"type=vxlan",
fmt.Sprintf("options:remote_ip=%s", remoteIP),
fmt.Sprintf("options:key=%d", vni),
)

if err := cmd.Run(); err != nil {
return errors.Wrap(errors.Internal, "failed to create vxlan tunnel", err)
}

return nil
}
⚠️ Potential issue | 🟠 Major

Missing validation for remoteIP parameter.

The remoteIP is used directly in command arguments without validation. While exec.CommandContext avoids shell injection, an invalid or malicious IP string could:

  1. Create improperly named tunnel interfaces
  2. Cause OVS configuration corruption
  3. Bypass the tunnel name validation that DeletePort would enforce
🔧 Add IP address validation
+import "net"
+
 func (a *OvsAdapter) CreateVXLANTunnel(ctx context.Context, bridge string, vni int, remoteIP string) error {
 	if !bridgeNameRegex.MatchString(bridge) {
 		return errors.New(errors.InvalidInput, "invalid bridge name")
 	}
+	if ip := net.ParseIP(remoteIP); ip == nil {
+		return errors.New(errors.InvalidInput, "invalid remote IP address")
+	}

 	tunnelName := fmt.Sprintf("vxlan-%s", strings.ReplaceAll(remoteIP, ".", "-"))

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @internal/repositories/ovs/adapter.go around lines 119 - 138,
CreateVXLANTunnel currently uses remoteIP directly which can produce
invalid/malicious tunnel names; validate remoteIP with net.ParseIP and return an
InvalidInput error if parsing fails, then derive tunnelName from the parsed IP
(use ip.String()) and sanitize by replacing '.' and ':' with '-' before building
the exec.CommandContext args; ensure the error path uses the same validation
error type and keep bridge validation via bridgeNameRegex unchanged so
DeletePort validation remains consistent.
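A sketch of the validation plus name derivation described above; `tunnelName` is a hypothetical helper, but it uses only the standard library's `net.ParseIP`, and replacing both `.` and `:` covers IPv6 as well as IPv4:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// tunnelName validates remoteIP and derives a safe interface name from
// the parsed (canonical) form. Note Linux caps interface names at 15
// bytes, so long IPv6-derived names would still need truncation/hashing.
func tunnelName(remoteIP string) (string, error) {
	ip := net.ParseIP(remoteIP)
	if ip == nil {
		return "", errors.New("invalid remote IP address")
	}
	safe := strings.NewReplacer(".", "-", ":", "-").Replace(ip.String())
	return fmt.Sprintf("vxlan-%s", safe), nil
}

func main() {
	n, _ := tunnelName("10.0.0.1")
	fmt.Println(n) // vxlan-10-0-0-1
}
```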

Comment on lines +178 to +218
func (a *OvsAdapter) CreateVethPair(ctx context.Context, hostEnd, containerEnd string) error {
cmd := exec.CommandContext(ctx, "ip", "link", "add", hostEnd, "type", "veth", "peer", "name", containerEnd)
if err := cmd.Run(); err != nil {
return errors.Wrap(errors.Internal, "failed to create veth pair", err)
}
return nil
}

func (a *OvsAdapter) AttachVethToBridge(ctx context.Context, bridge, vethEnd string) error {
if err := a.AddPort(ctx, bridge, vethEnd); err != nil {
return err
}

cmd := exec.CommandContext(ctx, "ip", "link", "set", vethEnd, "up")
if err := cmd.Run(); err != nil {
return errors.Wrap(errors.Internal, "failed to set veth up", err)
}

return nil
}

func (a *OvsAdapter) DeleteVethPair(ctx context.Context, hostEnd string) error {
cmd := exec.CommandContext(ctx, "ip", "link", "del", hostEnd)
if err := cmd.Run(); err != nil {
return errors.Wrap(errors.Internal, "failed to delete veth pair", err)
}
return nil
}

func (a *OvsAdapter) SetVethIP(ctx context.Context, vethEnd, ip, cidr string) error {
cmd := exec.CommandContext(ctx, "ip", "addr", "add", fmt.Sprintf("%s/%s", ip, cidr), "dev", vethEnd)
if err := cmd.Run(); err != nil {
return errors.Wrap(errors.Internal, "failed to set veth ip", err)
}

cmdUp := exec.CommandContext(ctx, "ip", "link", "set", vethEnd, "up")
if err := cmdUp.Run(); err != nil {
return errors.Wrap(errors.Internal, "failed to bring veth up", err)
}
return nil
}
⚠️ Potential issue | 🟠 Major

Missing input validation for veth and IP configuration methods.

CreateVethPair, DeleteVethPair, and SetVethIP pass user-controlled input directly to the ip command without validation. While exec.CommandContext doesn't use shell interpolation, these methods should validate:

  • hostEnd/containerEnd: interface names (similar to bridgeNameRegex)
  • ip: valid IPv4/IPv6 address
  • cidr: valid prefix length

This ensures consistency with the validation applied to bridge/port names elsewhere.

🔧 Add validation for veth operations
+import "net"
+
+var ifaceNameRegex = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)
+
 func (a *OvsAdapter) CreateVethPair(ctx context.Context, hostEnd, containerEnd string) error {
+	if !ifaceNameRegex.MatchString(hostEnd) || !ifaceNameRegex.MatchString(containerEnd) {
+		return errors.New(errors.InvalidInput, "invalid veth endpoint name")
+	}
 	cmd := exec.CommandContext(ctx, "ip", "link", "add", hostEnd, "type", "veth", "peer", "name", containerEnd)
 	if err := cmd.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to create veth pair", err)
 	}
 	return nil
 }

 func (a *OvsAdapter) DeleteVethPair(ctx context.Context, hostEnd string) error {
+	if !ifaceNameRegex.MatchString(hostEnd) {
+		return errors.New(errors.InvalidInput, "invalid veth endpoint name")
+	}
 	cmd := exec.CommandContext(ctx, "ip", "link", "del", hostEnd)

 func (a *OvsAdapter) SetVethIP(ctx context.Context, vethEnd, ip, cidr string) error {
+	if !ifaceNameRegex.MatchString(vethEnd) {
+		return errors.New(errors.InvalidInput, "invalid veth endpoint name")
+	}
+	if parsedIP := net.ParseIP(ip); parsedIP == nil {
+		return errors.New(errors.InvalidInput, "invalid IP address")
+	}
+	// Optionally validate cidr is a valid prefix length (0-32 for IPv4, 0-128 for IPv6)
 	cmd := exec.CommandContext(ctx, "ip", "addr", "add", fmt.Sprintf("%s/%s", ip, cidr), "dev", vethEnd)

Committable suggestion skipped: line range outside the PR's diff.

- Change cleanDB to log and ignore errors instead of failing
- Allows tests to run even if not all migrations have been applied
- Fixes CI integration test failures

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/repositories/libvirt/adapter.go (1)

499-516: Add bounds checking for network pool exhaustion.

getNextNetworkRange() has no error return and doesn't validate whether the allocated network exceeds poolEnd. If the pool is exhausted (after ~101 networks given the default range), subsequent allocations will silently proceed beyond the intended range, potentially creating network conflicts.

Consider modifying getNextNetworkRange() to return an error when the pool is exhausted, and handle it here:

🔧 Proposed fix
 func (a *LibvirtAdapter) CreateNetwork(ctx context.Context, name string) (string, error) {
 	// Allocate network range from pool
-	gateway, rangeStart, rangeEnd := a.getNextNetworkRange()
+	gateway, rangeStart, rangeEnd, err := a.getNextNetworkRange()
+	if err != nil {
+		return "", fmt.Errorf("failed to allocate network range: %w", err)
+	}
 
 	// Simple NAT network
 	xml := generateNetworkXML(name, "virbr-"+name, gateway, rangeStart, rangeEnd)

And update getNextNetworkRange() signature and implementation (see next comment).

🤖 Fix all issues with AI agents
In @cmd/api/main.go:
- Around line 332-342: The security-group routes (sgGroup) only apply
authentication via httputil.Auth(identitySvc) but lack RBAC checks; update the
route group or each handler to enforce the appropriate permission(s) by adding
the Permission middleware (e.g., httputil.Permission(...))—for example apply
VPC-level permissions such as domain.PermissionVpcCreate,
domain.PermissionVpcList, domain.PermissionVpcGet, domain.PermissionVpcDelete,
domain.PermissionVpcUpdate (or the SG-specific permissions if present like
domain.PermissionSgCreate / PermissionSgList / PermissionSgGet /
PermissionSgDelete / PermissionSgUpdate) either as
sgGroup.Use(httputil.Permission(...)) for group-wide rules or by adding the
middleware to individual routes (sgHandler.Create, List, Get, Delete, AddRule,
Attach) so only authorized roles can access each endpoint.
- Around line 136-139: The OVS adapter initialization can return nil and cause
panics because networkBackend is used regardless; either fail fast on
initialization error or supply a safe fallback. Update the block that calls
ovs.NewOvsAdapter so that on error you return/fatalize (e.g., log error and
exit) when OVS is required, or instantiate and pass a NoOpNetworkBackend
implementation instead of leaving networkBackend nil; implement
NoOpNetworkBackend to satisfy the NetworkBackend interface and return clear
errors for all methods, and ensure services and the health endpoint use the
non-nil networkBackend variable.

In @internal/repositories/libvirt/adapter.go:
- Around line 716-749: The getNextNetworkRange function must validate that the
computed network does not exceed the configured pool end before incrementing
networkCounter; modify getNextNetworkRange to compute the candidate baseIP from
a.poolStart + (a.networkCounter * 256), then compare it to a.poolEnd using
bytes.Compare (import "bytes") and if the candidate > a.poolEnd return an error
instead of allocating (so change the signature to return (gateway, rangeStart,
rangeEnd string, err error)), only increment a.networkCounter after the bounds
check passes, and update all callers to handle the new error return.
🧹 Nitpick comments (3)
internal/repositories/libvirt/adapter.go (1)

59-67: Consider making the network pool range configurable.

The hardcoded IP pool (192.168.100.0 to 192.168.200.255) may not suit all deployment environments. Since the PR objectives mention configurable network settings via environment variables (DEFAULT_VPC_CIDR, NETWORK_POOL_START, NETWORK_POOL_END), consider using those here as well.

♻️ Proposed refactor to use environment variables
+import (
+	"os"
+)
+
 func NewLibvirtAdapter(logger *slog.Logger, uri string) (*LibvirtAdapter, error) {
+	poolStartStr := os.Getenv("NETWORK_POOL_START")
+	if poolStartStr == "" {
+		poolStartStr = "192.168.100.0"
+	}
+	poolEndStr := os.Getenv("NETWORK_POOL_END")
+	if poolEndStr == "" {
+		poolEndStr = "192.168.200.255"
+	}
+
+	poolStart := net.ParseIP(poolStartStr)
+	poolEnd := net.ParseIP(poolEndStr)
+	if poolStart == nil || poolEnd == nil {
+		return nil, fmt.Errorf("invalid network pool configuration")
+	}
+
 	return &LibvirtAdapter{
 		conn:           l,
 		logger:         logger,
 		uri:            uri,
 		portMappings:   make(map[string]map[string]int),
 		networkCounter: 0,
-		poolStart:      net.ParseIP("192.168.100.0"),
-		poolEnd:        net.ParseIP("192.168.200.255"),
+		poolStart:      poolStart,
+		poolEnd:        poolEnd,
 	}, nil
 }
internal/platform/config.go (1)

30-32: Consider validating network configuration formats at initialization.

The configuration values are used without format validation. Invalid CIDR blocks or IP addresses will only fail later during VPC or network operations. Adding validation here provides fail-fast behavior and clearer error messages at startup.

♻️ Suggested validation approach
+import (
+	"fmt"
+	"net"
+)
+
 func NewConfig() (*Config, error) {
 	_ = godotenv.Load() // Ignore error if .env doesn't exist
 
+	defaultCIDR := getEnv("DEFAULT_VPC_CIDR", "10.0.0.0/16")
+	if _, _, err := net.ParseCIDR(defaultCIDR); err != nil {
+		return nil, fmt.Errorf("invalid DEFAULT_VPC_CIDR: %w", err)
+	}
+
+	poolStart := getEnv("NETWORK_POOL_START", "192.168.100.0")
+	if net.ParseIP(poolStart) == nil {
+		return nil, fmt.Errorf("invalid NETWORK_POOL_START: not a valid IP address")
+	}
+
+	poolEnd := getEnv("NETWORK_POOL_END", "192.168.200.255")
+	if net.ParseIP(poolEnd) == nil {
+		return nil, fmt.Errorf("invalid NETWORK_POOL_END: not a valid IP address")
+	}
+
 	return &Config{
 		Port:                 getEnv("PORT", "8080"),
 		DatabaseURL:          getEnv("DATABASE_URL", "postgres://cloud:cloud@localhost:5433/thecloud"),
 		Environment:          getEnv("APP_ENV", "development"),
 		SecretsEncryptionKey: os.Getenv("SECRETS_ENCRYPTION_KEY"),
 		ComputeBackend:       getEnv("COMPUTE_BACKEND", "docker"),
-		DefaultVPCCIDR:       getEnv("DEFAULT_VPC_CIDR", "10.0.0.0/16"),
-		NetworkPoolStart:     getEnv("NETWORK_POOL_START", "192.168.100.0"),
-		NetworkPoolEnd:       getEnv("NETWORK_POOL_END", "192.168.200.255"),
+		DefaultVPCCIDR:       defaultCIDR,
+		NetworkPoolStart:     poolStart,
+		NetworkPoolEnd:       poolEnd,
 	}, nil
 }
internal/core/services/vpc.go (1)

37-40: Validate CIDR format before VPC creation.

The CIDR block is used without format validation. Invalid CIDR notation (e.g., "not-a-cidr", "10.0.0.0/33") would only fail when the VPC is used for subnet allocation or network operations, resulting in unclear error messages.

♻️ Add CIDR validation
+import (
+	"net"
+)
+
 func (s *VpcService) CreateVPC(ctx context.Context, name, cidrBlock string) (*domain.VPC, error) {
 	if cidrBlock == "" {
 		cidrBlock = s.defaultCIDR
 	}
+
+	// Validate CIDR format
+	if _, _, err := net.ParseCIDR(cidrBlock); err != nil {
+		return nil, errors.New(errors.BadRequest, "invalid CIDR block format")
+	}
 
 	userID := appcontext.UserIDFromContext(ctx)
 	vpcID := uuid.New()
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4192df7 and ba419e5.

📒 Files selected for processing (6)
  • .env.example
  • cmd/api/main.go
  • internal/core/services/vpc.go
  • internal/core/services/vpc_test.go
  • internal/platform/config.go
  • internal/repositories/libvirt/adapter.go
🧰 Additional context used
🧬 Code graph analysis (2)
internal/core/services/vpc.go (5)
internal/core/ports/vpc.go (1)
  • VpcRepository (10-16)
internal/core/ports/network.go (1)
  • NetworkBackend (16-44)
internal/core/domain/vpc.go (1)
  • VPC (9-19)
internal/core/context/context.go (1)
  • UserIDFromContext (22-28)
internal/errors/errors.go (3)
  • Wrap (61-63)
  • Internal (13-13)
  • Error (38-43)
cmd/api/main.go (11)
internal/repositories/postgres/security_group_repo.go (1)
  • NewSecurityGroupRepository (18-20)
internal/repositories/postgres/subnet_repo.go (1)
  • NewSubnetRepository (19-21)
internal/repositories/ovs/adapter.go (1)
  • NewOvsAdapter (26-42)
internal/core/services/vpc.go (1)
  • NewVpcService (24-35)
internal/core/services/subnet.go (1)
  • NewSubnetService (24-31)
internal/core/services/instance.go (1)
  • NewInstanceService (35-47)
internal/core/services/security_group.go (1)
  • NewSecurityGroupService (25-39)
internal/handlers/security_group_handler.go (1)
  • NewSecurityGroupHandler (17-19)
internal/handlers/subnet_handler.go (1)
  • NewSubnetHandler (16-18)
internal/core/domain/rbac.go (3)
  • Permission (7-7)
  • PermissionVpcUpdate (20-20)
  • PermissionVpcRead (19-19)
pkg/httputil/auth.go (2)
  • Permission (49-66)
  • Auth (13-36)
🪛 dotenv-linter (4.0.0)
.env.example

[warning] 25-25: [UnorderedKey] The NETWORK_POOL_END key should go before the NETWORK_POOL_START key

(UnorderedKey)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🔇 Additional comments (10)
internal/repositories/libvirt/adapter.go (1)

34-38: LGTM! Clean field additions for network pool management.

The new fields appropriately support dynamic network allocation with poolStart/poolEnd defining the range and networkCounter tracking the next available /24 subnet.

.env.example (1)

19-28: Network configuration defaults look good.

The default values are appropriate for development. The static analysis warning about key ordering (NETWORK_POOL_END before NETWORK_POOL_START) is a style preference for alphabetical ordering and can be safely ignored.

internal/platform/config.go (1)

16-18: New configuration fields are well-structured.

The addition of DefaultVPCCIDR, NetworkPoolStart, and NetworkPoolEnd aligns with the OVS networking integration objectives.

internal/core/services/vpc_test.go (4)

17-45: Test coverage for VPC creation looks good.

The test properly validates the updated CreateVPC signature with CIDR parameter, NetworkBackend usage, and bridge-based identifiers.


47-66: Rollback test properly validates cleanup behavior.

The test ensures that bridge deletion occurs when database persistence fails, which is critical for avoiding resource leaks.


68-93: Delete test correctly validates bridge cleanup.


95-132: List and Get tests properly updated for new service constructor.

internal/core/services/vpc.go (2)

24-35: Constructor properly handles default CIDR fallback.

The fallback to "10.0.0.0/16" ensures the service always has a valid default when initialization provides an empty string.


100-123: DeleteVPC implementation handles cleanup correctly.

The deletion flow properly removes the OVS bridge before database deletion, with appropriate error handling and logging.

cmd/api/main.go (1)

141-147: Service initialization properly wires network backend and configuration.

The services are correctly initialized with the network backend and default CIDR from configuration, assuming the critical nil pointer issue (lines 136-139) is addressed.

Comment thread cmd/api/main.go Outdated
Comment on lines +716 to +749
// getNextNetworkRange allocates the next /24 network from the pool
func (a *LibvirtAdapter) getNextNetworkRange() (gateway, rangeStart, rangeEnd string) {
a.mu.Lock()
defer a.mu.Unlock()

// Calculate base IP: poolStart + (networkCounter * 256)
baseIP := make(net.IP, len(a.poolStart))
copy(baseIP, a.poolStart)

// Add offset for this network (each network gets a /24)
offset := a.networkCounter * 256
for i := len(baseIP) - 1; i >= 0 && offset > 0; i-- {
sum := int(baseIP[i]) + offset
baseIP[i] = byte(sum % 256)
offset = sum / 256
}

a.networkCounter++

// Gateway is .1, DHCP range is .2 to .254
gw := make(net.IP, len(baseIP))
copy(gw, baseIP)
gw[len(gw)-1] = 1

start := make(net.IP, len(baseIP))
copy(start, baseIP)
start[len(start)-1] = 2

end := make(net.IP, len(baseIP))
copy(end, baseIP)
end[len(end)-1] = 254

return gw.String(), start.String(), end.String()
}

⚠️ Potential issue | 🟠 Major

Critical: Add bounds validation to prevent allocation beyond pool range.

The function increments networkCounter unconditionally without checking if the resulting network exceeds poolEnd. After approximately 101 networks (given the default range 192.168.100.0 to 192.168.200.255), allocations will exceed the intended pool boundary, potentially causing IP conflicts with external networks.

🛡️ Proposed fix with bounds checking
-func (a *LibvirtAdapter) getNextNetworkRange() (gateway, rangeStart, rangeEnd string) {
+func (a *LibvirtAdapter) getNextNetworkRange() (gateway, rangeStart, rangeEnd string, err error) {
 	a.mu.Lock()
 	defer a.mu.Unlock()
 
 	// Calculate base IP: poolStart + (networkCounter * 256)
 	baseIP := make(net.IP, len(a.poolStart))
 	copy(baseIP, a.poolStart)
 
 	// Add offset for this network (each network gets a /24)
 	offset := a.networkCounter * 256
 	for i := len(baseIP) - 1; i >= 0 && offset > 0; i-- {
 		sum := int(baseIP[i]) + offset
 		baseIP[i] = byte(sum % 256)
 		offset = sum / 256
 	}
 
+	// Validate the allocated network is within pool bounds
+	// Check if baseIP + 255 exceeds poolEnd
+	endOfNetwork := make(net.IP, len(baseIP))
+	copy(endOfNetwork, baseIP)
+	endOfNetwork[len(endOfNetwork)-1] = 255
+	
+	if bytes.Compare(baseIP, a.poolEnd) > 0 || bytes.Compare(endOfNetwork, a.poolEnd) > 0 {
+		return "", "", "", fmt.Errorf("network pool exhausted: cannot allocate more /24 networks")
+	}
+
 	a.networkCounter++
 
 	// Gateway is .1, DHCP range is .2 to .254
 	gw := make(net.IP, len(baseIP))
 	copy(gw, baseIP)
 	gw[len(gw)-1] = 1
 
 	start := make(net.IP, len(baseIP))
 	copy(start, baseIP)
 	start[len(start)-1] = 2
 
 	end := make(net.IP, len(baseIP))
 	copy(end, baseIP)
 	end[len(end)-1] = 254
 
-	return gw.String(), start.String(), end.String()
+	return gw.String(), start.String(), end.String(), nil
 }

Note: You'll need to import "bytes" for the bytes.Compare function.

🤖 Prompt for AI Agents
In @internal/repositories/libvirt/adapter.go around lines 716 - 749, The
getNextNetworkRange function must validate that the computed network does not
exceed the configured pool end before incrementing networkCounter; modify
getNextNetworkRange to compute the candidate baseIP from a.poolStart +
(a.networkCounter * 256), then compare it to a.poolEnd using bytes.Compare
(import "bytes") and if the candidate > a.poolEnd return an error instead of
allocating (so change the signature to return (gateway, rangeStart, rangeEnd
string, err error)), only increment a.networkCounter after the bounds check
passes, and update all callers to handle the new error return.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
internal/repositories/postgres/helper_test.go (1)

83-90: Approve with suggestion: Consider checking for specific error codes.

The change successfully makes cleanup resilient to missing tables, fixing the CI integration test failures as intended. However, the implementation ignores all errors rather than specifically checking for the "undefined_table" error (42P01) mentioned in the comment.

♻️ Refactor to check specific PostgreSQL error codes

Consider checking for specific error codes to avoid hiding real errors like permission issues or connection problems:

+import (
+	"errors"
+	"github.com/jackc/pgx/v5/pgconn"
+)
+
 func cleanDB(t *testing.T, db *pgxpool.Pool) {
 	ctx := context.Background()
 	queries := []string{
 		// ... existing queries ...
 	}
 
 	for _, q := range queries {
 		_, err := db.Exec(ctx, q)
-		// Ignore errors if table doesn't exist (42P01 error code)
-		// This allows tests to run even if not all migrations have been applied
 		if err != nil {
-			t.Logf("Cleanup query failed (ignoring): %s - %v", q, err)
+			var pgErr *pgconn.PgError
+			// Only ignore "undefined_table" errors (42P01) from missing migrations
+			if errors.As(err, &pgErr) && pgErr.Code == "42P01" {
+				t.Logf("Cleanup query skipped (table doesn't exist): %s", q)
+			} else {
+				// Log but still continue for other errors to maintain resilience
+				t.Logf("Cleanup query failed (ignoring): %s - %v", q, err)
+			}
 		}
 	}
 }

This provides better visibility into which errors are expected (missing tables) versus unexpected (permissions, connections).

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba419e5 and 10992f7.

📒 Files selected for processing (1)
  • internal/repositories/postgres/helper_test.go
🔇 Additional comments (1)
internal/repositories/postgres/helper_test.go (1)

28-31: LGTM - Better diagnostics for test skips.

Including the underlying error in the skip message provides helpful diagnostic information when the database is unavailable.


Copilot AI left a comment


Pull request overview

Copilot reviewed 109 out of 110 changed files in this pull request and generated 9 comments.



Comment thread scripts/setup-ovs.sh
Comment on lines +11 to +16
echo "--- Configuring Permissions ---"
# Allow current user to manage OVS without sudo
USER_NAME=$(whoami)
# Note: On some systems, adding to the 'ovs' or 'openvswitch' group works if it exists.
# Otherwise, we use ACLs or set permissions on the socket.
sudo setfacl -m u:$USER_NAME:rw /var/run/openvswitch/db.sock

Copilot AI Jan 8, 2026


The setup-ovs.sh script uses setfacl to grant permissions on the OVS socket but doesn't check if the 'ovs' or 'openvswitch' group exists first. On some systems, adding the user to these groups is the preferred approach. Consider checking for group existence and using that method when available, or document the requirement for ACL support.

Comment on lines +48 to 52
// 2. Create OVS bridge
bridgeName := fmt.Sprintf("br-vpc-%s", vpcID.String()[:8])
if err := s.network.CreateBridge(ctx, bridgeName, vxlanID); err != nil {
return nil, errors.Wrap(errors.Internal, "failed to create OVS bridge", err)
}

Copilot AI Jan 8, 2026


The VPC service now uses a NetworkBackend interface, but the CreateBridge call doesn't check whether the bridge already exists. If this method is called again with the same bridge name (e.g., during a retry), OVS will fail. Consider checking for the bridge before attempting to create it, or handle the "already exists" error gracefully.
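One retry-safe option is to build the creation command with `ovs-vsctl`'s real `--may-exist` flag, which turns "bridge already exists" into a no-op. The helper below is a sketch, and the `other-config:vni` key is an assumption for illustration, not this repo's actual schema:

```go
package main

import (
	"fmt"
	"strings"
)

// createBridgeArgs builds an ovs-vsctl invocation that is safe to retry:
// --may-exist makes add-br succeed silently when the bridge is already
// present, so a repeated CreateBridge call cannot fail the VPC create.
func createBridgeArgs(bridge string, vxlanID int) []string {
	return []string{
		"ovs-vsctl", "--may-exist", "add-br", bridge,
		"--", "set", "bridge", bridge,
		fmt.Sprintf("other-config:vni=%d", vxlanID), // hypothetical VNI tag
	}
}

func main() {
	fmt.Println(strings.Join(createBridgeArgs("br-vpc-1a2b3c4d", 5001), " "))
}
```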

Comment on lines +530 to +534
func (s *InstanceService) isValidHostIP(ip net.IP, n *net.IPNet) bool {
// Simple check: not network address and not broadcast address (if /30 or larger)
// For simplicity in this demo, we just ensure it's in range and not gateway
return n.Contains(ip)
}
Copilot AI Jan 8, 2026

The IP allocation logic doesn't handle the broadcast address. For a /24 subnet, the last IP (e.g., 10.0.1.255) is the broadcast address and should not be allocated. The isValidHostIP function should exclude both the network address and broadcast address from allocation.

Comment on lines +146 to +161
func (s *SecurityGroupService) syncGroupFlows(ctx context.Context, sg *domain.SecurityGroup) error {
vpc, err := s.vpcRepo.GetByID(ctx, sg.VPCID)
if err != nil {
return err
}

// For each rule, generate an OVS flow
for _, rule := range sg.Rules {
flow := s.translateToFlow(rule)
if err := s.network.AddFlowRule(ctx, vpc.NetworkID, flow); err != nil {
return err
}
}

return nil
}
Copilot AI Jan 8, 2026

The syncGroupFlows function adds flow rules but never removes old ones. If a security rule is updated or removed, the old OpenFlow rules will remain active on the bridge, potentially allowing unwanted traffic. The function should delete existing flows for the group before adding new ones.

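One way to address the stale-flow concern is a delete-then-add resync, assuming each security group tags its flows with a per-group OpenFlow cookie (the function and its callers here are hypothetical, command strings only):

```go
package main

import "fmt"

// resyncCommands emits the ovs-ofctl invocations for a full flow resync:
// first delete every flow carrying the group's cookie, then re-add the
// current rule set. Deleting by cookie removes stale rules that would
// otherwise linger after a rule update or removal.
func resyncCommands(bridge string, cookie uint64, matches []string) []string {
	cmds := []string{
		// cookie/-1 matches the full 64-bit cookie value exactly.
		fmt.Sprintf("ovs-ofctl del-flows %s cookie=0x%x/-1", bridge, cookie),
	}
	for _, m := range matches {
		cmds = append(cmds, fmt.Sprintf(
			"ovs-ofctl add-flow %s cookie=0x%x,%s,actions=normal", bridge, cookie, m))
	}
	return cmds
}

func main() {
	for _, c := range resyncCommands("br-vpc-1a2b", 0x5, []string{"tcp,tp_dst=22"}) {
		fmt.Println(c)
	}
}
```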
Comment on lines +2 to +12
ALTER TABLE instances ADD COLUMN subnet_id UUID REFERENCES subnets(id);
ALTER TABLE instances ADD COLUMN private_ip INET;
ALTER TABLE instances ADD COLUMN ovs_port VARCHAR(64);

CREATE INDEX idx_instances_subnet ON instances(subnet_id);

-- +goose Down
DROP INDEX IF EXISTS idx_instances_subnet;
ALTER TABLE instances DROP COLUMN IF EXISTS subnet_id;
ALTER TABLE instances DROP COLUMN IF EXISTS private_ip;
ALTER TABLE instances DROP COLUMN IF EXISTS ovs_port;
Copilot AI Jan 8, 2026

The migration file adds new columns to the instances table but doesn't set default values for existing rows. If there are existing instances in the database, they will have NULL values for subnet_id, private_ip, and ovs_port. Consider adding default values or a data migration step to handle existing instances.

Comment thread docker-compose.yml
@@ -28,13 +28,17 @@ services:
condition: service_healthy
ports:
- "${PORT:-8080}:${PORT:-8080}"
Copilot AI Jan 8, 2026

The Docker Compose file adds NET_ADMIN and NET_RAW capabilities but doesn't document why these are needed or the security implications. These are privileged capabilities that increase the container's attack surface. Add a comment explaining that these are required for OVS networking operations.

Suggested change
- "${PORT:-8080}:${PORT:-8080}"
- "${PORT:-8080}:${PORT:-8080}"
# NOTE: NET_ADMIN and NET_RAW are privileged capabilities required for
# Open vSwitch (OVS) networking operations (see /var/run/openvswitch mount).
# These increase the container's network attack surface and should only be
# enabled when OVS-based networking is explicitly needed.

Comment on lines +45 to +46
// 1. Generate unique VNI (for demo purposes we use a hash based int)
vxlanID := int(vpcID[0]) + 100
Copilot AI Jan 8, 2026

The VXLAN ID generation using int(vpcID[0]) + 100 is simplistic and may cause collisions. For example, VPCs with IDs starting with the same byte will get the same VXLAN ID. Consider using a hash function or maintaining a counter/registry to ensure unique VXLAN IDs across VPCs.

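A sketch of the hash-based alternative: derive the VNI from the whole UUID and mask to the 24-bit VXLAN VNI space. Collisions remain possible with any hash, so pairing this with a uniqueness check (e.g., a DB constraint or registry) would still be advisable:

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// deriveVNI maps a 16-byte VPC ID into the 24-bit VXLAN VNI range
// (0..16777215) by hashing the entire ID instead of its first byte,
// so VPCs sharing a leading byte no longer collide by construction.
func deriveVNI(vpcID [16]byte) uint32 {
	return crc32.ChecksumIEEE(vpcID[:]) & 0xFFFFFF
}

func main() {
	a := [16]byte{0x0a, 0x01}
	b := [16]byte{0x0a, 0x02} // same leading byte, different ID
	fmt.Println(deriveVNI(a), deriveVNI(b))
}
```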
Comment on lines +196 to +200
return ports.FlowRule{
Priority: rule.Priority,
Match: strings.Join(matchParts, ","),
Actions: "allow", // Simplified
}
Copilot AI Jan 8, 2026

The security group flow translation uses "allow" as an action, but this is not a valid OpenFlow action. Standard OVS actions are "output", "drop", "normal", etc. The flows will not work as expected. Consider using "normal" or a specific output port, or document that this is a placeholder for future implementation.

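As a sketch of the suggested mapping, the service-level verb can be translated to a real OpenFlow action, failing closed on unknown input (the verb strings are assumptions, not the project's actual domain types):

```go
package main

import "fmt"

// translateAction maps a rule verb to a valid OpenFlow action string.
// "allow" becomes "normal", handing the packet to the bridge's regular
// L2 forwarding pipeline; anything unrecognized drops, failing closed.
func translateAction(verb string) string {
	switch verb {
	case "allow":
		return "normal"
	case "deny":
		return "drop"
	default:
		return "drop"
	}
}

func main() {
	fmt.Println(translateAction("allow")) // normal
}
```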
Comment on lines +26 to +42
func NewOvsAdapter(logger *slog.Logger) (*OvsAdapter, error) {
ovsctl, err := exec.LookPath("ovs-vsctl")
if err != nil {
return nil, fmt.Errorf("ovs-vsctl not found: %w", err)
}

ofctl, err := exec.LookPath("ovs-ofctl")
if err != nil {
return nil, fmt.Errorf("ovs-ofctl not found: %w", err)
}

return &OvsAdapter{
ovsPath: ovsctl,
ofctlPath: ofctl,
logger: logger,
}, nil
}
Copilot AI Jan 8, 2026

The OVS adapter calls exec.LookPath to find ovs-vsctl and ovs-ofctl at initialization. If OVS is installed after the application starts, or if the PATH changes, the adapter will not find the tools. Consider documenting that OVS must be installed before starting the application, or implement periodic health checks that update the paths.

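One hedge against the resolve-at-startup problem is lazy resolution with caching, so an OVS install performed after process start is picked up on the next call. A hypothetical sketch, not the adapter's actual structure:

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// lazyTool defers binary resolution until first use and caches the result.
// Failed lookups are not cached, so a later install is eventually found.
type lazyTool struct {
	name string
	mu   sync.Mutex
	path string
}

func (t *lazyTool) Path() (string, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.path != "" {
		return t.path, nil
	}
	p, err := exec.LookPath(t.name)
	if err != nil {
		return "", fmt.Errorf("%s not found (is OVS installed?): %w", t.name, err)
	}
	t.path = p
	return p, nil
}

func main() {
	t := &lazyTool{name: "ovs-vsctl"}
	if _, err := t.Path(); err != nil {
		fmt.Println("degraded:", err)
	}
}
```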
1. Network Backend Nil Safety:
   - Add NoopNetworkAdapter as fallback when OVS fails to initialize
   - Update /health/ovs endpoint with nil checks and degraded status
   - Prevents panics from nil networkBackend

2. Migration Completeness:
   - Add missing .down.sql files for migrations 031 and 032
   - Separate embedded down migration from 032 .up.sql file
   - All migrations now have proper Up/Down pairs for Goose

3. Security Group Ownership Validation:
   - Add ownership checks in AddInstanceToGroup and RemoveInstanceFromGroup
   - Verify both instance and security group belong to caller
   - Use transactions to prevent race conditions
   - Return Forbidden error for unauthorized access attempts
- Wrap deferred tx.Rollback calls in anonymous functions
- Explicitly ignore error with underscore assignment
- Fixes errcheck linter warnings
Copilot AI review requested due to automatic review settings January 8, 2026 17:15
Copilot AI left a comment

Pull request overview

Copilot reviewed 112 out of 113 changed files in this pull request and generated no new comments.


- Fix Postgres INET/CIDR type handling with NULLIF and explicit casting
- Resolve clock skew flakiness in queue repository tests by using DB server time
- Ensure isolation tests pass by allowing NULL network fields in tests
- All integration tests now passing consistently
- Added package documentation comments
- Added struct and method docstrings for VpcService, InstanceService
- Added documentation for InstanceRepository, VpcRepository, and PostgresQueueRepository
- Improved code discoverability for IDEs and godoc
Copilot AI review requested due to automatic review settings January 8, 2026 18:27
@sonarqubecloud
sonarqubecloud Bot commented Jan 8, 2026

Quality Gate failed

Failed conditions
34 Security Hotspots
4.3% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

Copilot AI left a comment

Pull request overview

Copilot reviewed 113 out of 114 changed files in this pull request and generated no new comments.


@poyrazK poyrazK merged commit 461d6a1 into main Jan 8, 2026
11 of 13 checks passed
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
internal/repositories/postgres/queue_repo.go (3)

70-78: Missing rows.Err() check after iteration.

After the for rows.Next() loop, you should check rows.Err() to catch any errors that occurred during iteration. Without this, iteration errors (e.g., network issues mid-fetch) are silently ignored.

Proposed fix
 	for rows.Next() {
 		q := &domain.Queue{}
 		if err := rows.Scan(&q.ID, &q.UserID, &q.Name, &q.ARN, &q.VisibilityTimeout, &q.RetentionDays, &q.MaxMessageSize, &q.Status, &q.CreatedAt, &q.UpdatedAt); err != nil {
 			return nil, err
 		}
 		queues = append(queues, q)
 	}
+	if err := rows.Err(); err != nil {
+		return nil, err
+	}
 	return queues, nil

89-108: Clock skew condition is always true—logic bug.

The condition on line 99 checks if m.VisibleAt is within one second of time.Now(), but m.VisibleAt is set to time.Now() just 5 lines earlier (line 94). This means the condition will always be true, making the else branch unreachable dead code.

Additionally, when using NOW(), the returned *domain.Message still contains Go's time.Now() values rather than the actual database server timestamps, which could cause inconsistency.

Suggested simplification (always use DB time)
 func (r *PostgresQueueRepository) SendMessage(ctx context.Context, queueID uuid.UUID, body string) (*domain.Message, error) {
 	m := &domain.Message{
 		ID:      uuid.New(),
 		QueueID: queueID,
 		Body:    body,
-		VisibleAt: time.Now(),
-		CreatedAt: time.Now(),
 	}
-	query := `INSERT INTO queue_messages (id, queue_id, body, visible_at, created_at) VALUES ($1, $2, $3, $4, $5)`
-	var err error
-	if time.Since(m.VisibleAt) < time.Second && time.Until(m.VisibleAt) < time.Second {
-		query = `INSERT INTO queue_messages (id, queue_id, body, visible_at, created_at) VALUES ($1, $2, $3, NOW(), NOW())`
-		_, err = r.db.Exec(ctx, query, m.ID, m.QueueID, m.Body)
-	} else {
-		_, err = r.db.Exec(ctx, query, m.ID, m.QueueID, m.Body, m.VisibleAt, m.CreatedAt)
-	}
+	query := `INSERT INTO queue_messages (id, queue_id, body, visible_at, created_at) 
+		VALUES ($1, $2, $3, NOW(), NOW()) 
+		RETURNING visible_at, created_at`
+	err := r.db.QueryRow(ctx, query, m.ID, m.QueueID, m.Body).Scan(&m.VisibleAt, &m.CreatedAt)
 	if err != nil {
 		return nil, err
 	}
 	return m, nil
 }

185-197: GetQueueStats is unused code and not part of the QueueRepository interface.

The method is defined in PostgresQueueRepository and has a corresponding mock, but is never called anywhere in the codebase and is not exposed through the QueueRepository interface (see internal/core/ports/queue.go lines 16-28).

Either remove this method if it's not needed, or if it's intended for future use, add it to the interface:

GetQueueStats(ctx context.Context, queueID uuid.UUID) (visible int, inFlight int, err error)
🤖 Fix all issues with AI agents
In @internal/core/services/instance.go:
- Around line 146-173: The post-launch OVS wiring silently swallows errors and
calls the host "ip" command directly; update the code that calls
network.CreateVethPair, network.AttachVethToBridge, and network.SetVethIP (the
block guarded by inst.OvsPort and functions referencing s.vpcRepo.GetByID and
s.subnetRepo.GetByID) to check and handle returned errors (log via process
logger and return an error to mark the instance network setup as failed) instead
of using `_ =`. Add a BringUpInterface(ctx, ifaceName) method to the
NetworkBackend interface, implement it in backends, and replace
exec.CommandContext("ip", "link", "set", ...) with network.BringUpInterface(ctx,
vethContainer); ensure all network setup failures cause the instance launch path
to surface the error so instances are not reported running without connectivity.

In @internal/core/services/vpc.go:
- Around line 52-53: The current VXLAN ID generation uses only the first byte of
vpcID (vxlanID := int(vpcID[0]) + 100) which restricts VNI space and risks
collisions; replace it with a proper 24-bit VNI derivation or allocator: derive
vxlanID by hashing more of the UUID (e.g., use the first 3 bytes or a hash like
CRC32/SHA1 and mask/mod to 24 bits) and ensure the result is within the valid
VXLAN range (0..16777215) and offset as needed, or plug in a
counter/registry-based allocator to guarantee uniqueness; update the vxlanID
variable assignment and any docs/comments accordingly.

In @internal/repositories/postgres/security_group_repo.go:
- Around line 224-246: ListInstanceGroups currently returns groups for any
instance ID without enforcing ownership; change the signature to
ListInstanceGroups(ctx context.Context, instanceID uuid.UUID, userID uuid.UUID)
and modify the SQL to enforce ownership by joining the instances table and/or
checking group owner (e.g. add "JOIN instances i ON isg.instance_id = i.id" and
"WHERE isg.instance_id = $1 AND i.user_id = $2 AND sg.user_id = $2" or at
minimum "AND sg.user_id = $2"), pass instanceID and userID into r.db.Query, and
update all callers to supply the caller's userID so only groups owned by that
user (and attached to an instance owned by that user) are returned.
🧹 Nitpick comments (3)
internal/core/services/vpc.go (1)

109-127: DeleteVPC order may leave inconsistent state on DB failure.

If DeleteBridge succeeds but repo.Delete fails, the VPC record remains in DB but the bridge is gone. Consider reversing the order: delete from DB first (or use a transaction), then delete the bridge.

♻️ Proposed fix to improve deletion ordering
 func (s *VpcService) DeleteVPC(ctx context.Context, idOrName string) error {
 	vpc, err := s.GetVPC(ctx, idOrName)
 	if err != nil {
 		return err
 	}

-	// 1. Remove OVS bridge
-	if err := s.network.DeleteBridge(ctx, vpc.NetworkID); err != nil {
-		s.logger.Error("failed to remove OVS bridge", "bridge", vpc.NetworkID, "error", err)
-		return errors.Wrap(errors.Internal, "failed to remove OVS bridge", err)
-	}
-	s.logger.Info("vpc bridge removed", "bridge", vpc.NetworkID)
-
-	// 2. Delete from DB
+	// 1. Delete from DB first (authoritative state)
 	if err := s.repo.Delete(ctx, vpc.ID); err != nil {
 		return errors.Wrap(errors.Internal, "failed to delete VPC from database", err)
 	}

+	// 2. Remove OVS bridge (best effort after DB success)
+	if err := s.network.DeleteBridge(ctx, vpc.NetworkID); err != nil {
+		s.logger.Error("failed to remove OVS bridge after DB deletion", "bridge", vpc.NetworkID, "error", err)
+		// Log but don't fail - DB is source of truth
+	}
+	s.logger.Info("vpc bridge removed", "bridge", vpc.NetworkID)
+
 	_ = s.auditSvc.Log(ctx, vpc.UserID, "vpc.delete", "vpc", vpc.ID.String(), map[string]interface{}{
 		"name": vpc.Name,
 	})

 	return nil
 }
internal/core/services/instance.go (1)

498-540: IP allocation logic has edge cases and potential infinite loop risk.

Issues:

  1. The loop increments IP bytes but the overflow handling at line 523-528 may not correctly handle all cases (e.g., 255.255.255.255 increment).
  2. isValidHostIP (lines 542-545) only checks Contains() but the comment says it should exclude network and broadcast addresses.
  3. No timeout/limit on the loop - a full subnet could cause long iteration.
♻️ Proposed fix for isValidHostIP
 func (s *InstanceService) isValidHostIP(ip net.IP, n *net.IPNet) bool {
-	// Simple check: not network address and not broadcast address (if /30 or larger)
-	// For simplicity in this demo, we just ensure it's in range and not gateway
-	return n.Contains(ip)
+	if !n.Contains(ip) {
+		return false
+	}
+	// Check for network address (all host bits zero)
+	if ip.Equal(n.IP) {
+		return false
+	}
+	// Check for broadcast address (all host bits one) - IPv4 only
+	if ip4 := ip.To4(); ip4 != nil {
+		broadcast := make(net.IP, len(n.IP))
+		for i := range broadcast {
+			broadcast[i] = n.IP[i] | ^n.Mask[i]
+		}
+		if ip.Equal(broadcast) {
+			return false
+		}
+	}
+	return true
 }
internal/repositories/postgres/queue_repo.go (1)

134-145: Consider using database time for visibility timeout calculation.

The visibility timeout is calculated using Go's time.Now() (line 135, 142), while message selection uses the database's NOW() (line 124). Clock skew between the application server and database server could cause messages to become visible earlier or later than expected.

For consistency, consider calculating visible_at in the UPDATE query:

-	updateQuery := `UPDATE queue_messages SET receipt_handle = $1, visible_at = $2, received_count = received_count + 1 WHERE id = $3`
+	updateQuery := `UPDATE queue_messages SET receipt_handle = $1, visible_at = NOW() + ($2 || ' seconds')::interval, received_count = received_count + 1 WHERE id = $3`
 	for _, m := range messages {
-		_, err := tx.Exec(ctx, updateQuery, m.ReceiptHandle, m.VisibleAt, m.ID)
+		_, err := tx.Exec(ctx, updateQuery, m.ReceiptHandle, visibilityTimeout, m.ID)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10992f7 and 4953e91.

📒 Files selected for processing (12)
  • cmd/api/main.go
  • internal/core/services/instance.go
  • internal/core/services/vpc.go
  • internal/repositories/noop/network_adapter.go
  • internal/repositories/postgres/instance_repo.go
  • internal/repositories/postgres/migrations/031_create_subnets.down.sql
  • internal/repositories/postgres/migrations/032_add_instance_network_fields.down.sql
  • internal/repositories/postgres/migrations/032_add_instance_network_fields.up.sql
  • internal/repositories/postgres/queue_repo.go
  • internal/repositories/postgres/security_group_repo.go
  • internal/repositories/postgres/subnet_repo.go
  • internal/repositories/postgres/vpc_repo.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/repositories/postgres/subnet_repo.go
🧰 Additional context used
🧬 Code graph analysis (6)
internal/core/services/instance.go (4)
internal/core/ports/instance.go (1)
  • InstanceRepository (11-19)
internal/core/ports/compute.go (1)
  • ComputeBackend (9-37)
internal/core/ports/network.go (1)
  • NetworkBackend (16-44)
internal/core/domain/instance.go (1)
  • Instance (25-40)
internal/repositories/postgres/instance_repo.go (5)
internal/core/ports/instance.go (1)
  • InstanceRepository (11-19)
pkg/sdk/compute.go (1)
  • Instance (8-21)
internal/core/domain/instance.go (1)
  • Instance (25-40)
internal/errors/errors.go (1)
  • Wrap (61-63)
internal/core/context/context.go (1)
  • UserIDFromContext (22-28)
internal/repositories/postgres/security_group_repo.go (2)
internal/errors/errors.go (5)
  • Wrap (61-63)
  • Internal (13-13)
  • New (57-59)
  • NotFound (11-11)
  • Forbidden (16-16)
internal/core/context/context.go (1)
  • UserIDFromContext (22-28)
internal/repositories/noop/network_adapter.go (1)
internal/core/ports/network.go (1)
  • FlowRule (8-13)
internal/repositories/postgres/vpc_repo.go (5)
internal/core/ports/vpc.go (1)
  • VpcRepository (10-16)
pkg/sdk/vpc.go (1)
  • VPC (8-17)
internal/core/domain/vpc.go (1)
  • VPC (9-19)
internal/errors/errors.go (2)
  • Wrap (61-63)
  • Internal (13-13)
internal/core/context/context.go (1)
  • UserIDFromContext (22-28)
internal/repositories/postgres/queue_repo.go (1)
internal/core/ports/queue.go (1)
  • QueueRepository (16-28)
🔇 Additional comments (29)
internal/repositories/postgres/migrations/031_create_subnets.down.sql (1)

1-4: LGTM! Clean down-migration following best practices.

The migration correctly reverses the corresponding up-migration by:

  • Using the proper -- +goose Down marker
  • Dropping indices before the table
  • Using IF EXISTS for idempotent execution

The absence of CASCADE on the DROP TABLE statement is appropriate as it provides a safety check—if other tables have foreign keys referencing subnets, the migration will fail rather than cascade deletions, forcing explicit dependency handling in the migration sequence.

internal/repositories/postgres/vpc_repo.go (3)

15-36: LGTM! Proper handling of nullable CIDR field.

The Create method correctly uses NULLIF($4, '')::cidr to handle empty string conversion to NULL for PostgreSQL's CIDR type. The docstrings are clear, and parameter ordering is correct.


38-70: LGTM! Consistent user scoping and NULL handling.

Both GetByID and GetByName properly:

  • Filter by authenticated user ID from context
  • Use COALESCE(cidr_block::text, '') to handle NULL CIDR values
  • Cast CIDR to text before applying COALESCE

72-108: LGTM! Complete CRUD operations with proper scoping.

The List and Delete methods maintain consistency:

  • User-scoped filtering throughout
  • Proper error handling with NotFound checks
  • Consistent use of error wrapping patterns
internal/repositories/noop/network_adapter.go (2)

1-35: LGTM! Clean fallback implementation.

The noop adapter provides a safe fallback when OVS is unavailable:

  • Clear package documentation
  • Warning logs help identify when the fallback is in use
  • Returns safe zero values (empty slice for ListBridges, nil for errors)

36-96: LGTM! Complete NetworkBackend implementation.

All interface methods are properly implemented:

  • Consistent warning logs for mutating operations
  • Safe return values for queries (empty slices)
  • Ping() returns nil (healthy) for a noop adapter
  • Type() returns "noop" for identification

This provides a complete fallback when OVS is unavailable.

internal/repositories/postgres/migrations/032_add_instance_network_fields.down.sql (1)

1-5: LGTM! Proper down migration with idempotency guards.

The down migration correctly:

  • Drops operations in reverse order (index first, then columns)
  • Uses IF EXISTS guards for idempotency
  • Matches the up migration changes
internal/repositories/postgres/instance_repo.go (5)

26-40: LGTM! Correct parameter ordering and NULL handling.

The Create method properly:

  • Uses NULLIF($10, '')::inet to convert empty strings to NULL for the INET type
  • Aligns all 14 query parameters with the Exec arguments
  • Maintains consistent error handling

42-82: LGTM! Consistent user scoping and NULL handling.

Both GetByID and GetByName:

  • Properly filter by authenticated user ID
  • Use COALESCE with appropriate type casting for nullable fields
  • Match SELECT column order with Scan parameters

84-111: LGTM! Complete list implementation with proper handling.

The List method:

  • Applies user-scoped filtering
  • Orders by creation time (newest first)
  • Handles nullable fields consistently
  • Properly iterates rows with deferred close

113-134: LGTM! Proper optimistic locking and parameter ordering.

The Update method correctly:

  • Implements optimistic locking via the version field
  • Uses NULLIF($8, '')::inet for private_ip handling
  • Aligns all 12 parameters correctly
  • Checks RowsAffected() for conflict detection
  • Updates the in-memory version and timestamp after success

136-163: LGTM! Secure subnet-scoped query.

The new ListBySubnet method properly:

  • Filters by both subnet ID and authenticated user ID for security
  • Uses consistent NULL handling with COALESCE
  • Maintains the same structure as the List method
  • Orders results by creation time
internal/repositories/postgres/migrations/032_add_instance_network_fields.up.sql (1)

1-6: No issue found. Migration 031 (031_create_subnets.up.sql) correctly creates the subnets table before migration 032 references it. The foreign key constraint on line 2 is safely dependent on a table created in the preceding migration.

cmd/api/main.go (2)

30-31: OVS adapter initialization with noop fallback looks good.

The fallback to noop adapter when OVS initialization fails ensures the API can still start in degraded mode. The warning log provides visibility into the fallback scenario.

Also applies to: 137-144


271-288: OVS health check endpoint is well-structured.

The endpoint correctly handles three states: unavailable (nil backend), degraded (noop), and healthy/unhealthy (OVS). The Type() method usage for backend detection is clean.

internal/core/services/vpc.go (2)

1-1: VpcService refactored for network backend integration.

The struct and constructor changes properly inject the network backend and default CIDR. The fallback to "10.0.0.0/16" is sensible.

Also applies to: 17-25, 27-40


77-84: Good rollback logic for bridge cleanup on DB failure.

The code correctly cleans up the OVS bridge if the database insert fails, preventing orphaned network resources.

internal/core/services/instance.go (2)

23-36: InstanceService updated with subnet and network backend dependencies.

The constructor and struct changes properly integrate the new dependencies for subnet-aware networking.

Also applies to: 38-51


95-111: Pre-launch OVS networking setup is well-structured.

The subnet validation, IP allocation, and veth port naming follow a logical flow. Good use of error wrapping with appropriate error types.

internal/repositories/postgres/security_group_repo.go (4)

14-32: Repository structure and Create method look good.

The struct initialization and Create method properly handle the security group insertion with appropriate error wrapping.


34-76: GetByID and GetByName properly enforce user ownership.

Both methods correctly filter by user_id from context and attach rules via the helper method.


124-172: AddInstanceToGroup and RemoveInstanceFromGroup have proper transaction handling and ownership checks.

Both methods correctly:

  • Use transactions for atomicity
  • Verify both instance and security group ownership
  • Return appropriate errors (Forbidden, NotFound)
  • Handle deferred rollback safely

Also applies to: 174-222


248-265: getRulesForGroup helper is correctly implemented.

The internal method properly fetches rules ordered by priority. As a private helper called after ownership is already verified, it doesn't need additional access checks.

internal/repositories/postgres/queue_repo.go (6)

15-24: LGTM!

Clean struct definition with proper interface return type from the constructor. Using pgxpool.Pool is appropriate for concurrent access.


26-35: LGTM!

Standard parameterized insert with proper field mapping.


37-59: LGTM!

Consistent pattern for retrieving queues with proper user scoping. Returning (nil, nil) for not-found is idiomatic for this codebase.


164-174: LGTM!

Proper validation of deletion success with clear error messaging for expired or invalid receipt handles.


176-183: LGTM!

Clean implementation returning the count of purged messages.


81-85: Delete method ownership scoping is intentional and correctly enforced at the service layer.

The DeleteQueue service method calls GetQueue(ctx, id) before invoking repo.Delete(). GetQueue extracts the user ID from context and calls repo.GetByID(ctx, id, userID), which includes a WHERE id = $1 AND user_id = $2 clause. If the queue doesn't belong to the user or doesn't exist, GetQueue returns an error and the delete never executes. The repository-level Delete method intentionally omits user scoping because it's a low-level internal method only called after ownership has been verified upstream. This is a valid architectural pattern where business logic validation happens at the service layer.

Comment thread cmd/api/main.go
Comment on lines +348 to 358
// Security Group Routes (Protected)
sgGroup := r.Group("/security-groups")
sgGroup.Use(httputil.Auth(identitySvc))
{
sgGroup.POST("", sgHandler.Create)
sgGroup.GET("", sgHandler.List)
sgGroup.GET("/:id", sgHandler.Get)
sgGroup.DELETE("/:id", sgHandler.Delete)
sgGroup.POST("/:id/rules", sgHandler.AddRule)
sgGroup.POST("/attach", sgHandler.Attach)
}
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for security group permissions in domain
rg -n "PermissionSg" --type=go

# Also check for pattern of other permission definitions
rg -n "Permission.*=" --type=go | head -50

Repository: PoyrazK/thecloud

Length of output: 4356


🏁 Script executed:

#!/bin/bash
# Read the complete domain/rbac.go file
cat -n internal/core/domain/rbac.go

# Check for security group routes in main.go
echo "=== Checking security group routes ==="
rg -n "security-groups|sgGroup" cmd/api/main.go -A 10 -B 2

# Check for other protected routes to verify the pattern
echo "=== Checking instance routes pattern ==="
rg -n "instances|instanceGroup" cmd/api/main.go -A 10 -B 2 | head -40

Repository: PoyrazK/thecloud

Length of output: 8585


🏁 Script executed:

#!/bin/bash
# Check storage routes more closely
rg -n "storage" cmd/api/main.go -A 10 -B 2 | head -50

# Check if there are any other route groups without Permission middleware
rg -n "\.Use\(httputil\.Auth" cmd/api/main.go | wc -l
rg -n "httputil\.Permission" cmd/api/main.go | wc -l

Repository: PoyrazK/thecloud

Length of output: 1658


Add RBAC permission checks to security group routes (and define missing permission constants first).

Security group routes lack httputil.Permission() middleware checks like other protected resources (instances, VPCs, etc.). However, the required permission constants (PermissionSgCreate, PermissionSgRead, PermissionSgDelete, PermissionSgUpdate) do not exist in internal/core/domain/rbac.go and must be defined before the middleware can be applied.

First, add the missing permission constants to internal/core/domain/rbac.go following the pattern of existing resources:

// Security Group Permissions
PermissionSgCreate Permission = "sg:create"
PermissionSgRead   Permission = "sg:read"
PermissionSgDelete Permission = "sg:delete"
PermissionSgUpdate Permission = "sg:update"

Then apply the httputil.Permission() middleware to routes at lines 352–357, similar to instance routes at lines 317–323.

Note: Storage routes (lines 364–367) and event routes (lines 374) have the same issue and should be addressed similarly.

Comment on lines +146 to +173
// 4a. OVS Post-launch plumb
if inst.OvsPort != "" && s.network != nil {
vethContainer := fmt.Sprintf("eth0-%s", inst.ID.String()[:8])
if err := s.network.CreateVethPair(ctx, inst.OvsPort, vethContainer); err == nil {
vpc, _ := s.vpcRepo.GetByID(ctx, *inst.VpcID)
_ = s.network.AttachVethToBridge(ctx, vpc.NetworkID, inst.OvsPort)

// Set IP on the host simulation "container" end
if inst.SubnetID != nil {
subnet, _ := s.subnetRepo.GetByID(ctx, *inst.SubnetID)
if subnet != nil {
_, ipNet, _ := net.ParseCIDR(subnet.CIDRBlock)
ones, _ := ipNet.Mask.Size()
// In a real cloud, this happens inside the container namespace.
// For this demo, we do it on the host for the 'peer' end.
_ = s.network.SetVethIP(ctx, vethContainer, inst.PrivateIP, fmt.Sprintf("%d", ones))

// Ensure the "container" end is also up on host if simulating
cmd := exec.CommandContext(ctx, "ip", "link", "set", vethContainer, "up")
_ = cmd.Run()
}
}

// Note: Moving veth to container namespace requires the container PID,
// which Docker shim handles in a real cloud. For this local demo,
// we just create the veth pair on host.
}
}
⚠️ Potential issue | 🟠 Major

Post-launch OVS wiring silently ignores errors and uses direct shell execution.

Several issues:

  1. Errors from CreateVethPair, AttachVethToBridge, and SetVethIP are silently ignored with _ = or nested in if-else.
  2. Line 164-165 uses exec.CommandContext directly, which bypasses the network backend abstraction and may fail silently.
  3. If veth setup fails, the instance appears running but has no network connectivity.
🔧 Proposed improvements
 	// 4a. OVS Post-launch plumb
 	if inst.OvsPort != "" && s.network != nil {
 		vethContainer := fmt.Sprintf("eth0-%s", inst.ID.String()[:8])
-		if err := s.network.CreateVethPair(ctx, inst.OvsPort, vethContainer); err == nil {
-			vpc, _ := s.vpcRepo.GetByID(ctx, *inst.VpcID)
-			_ = s.network.AttachVethToBridge(ctx, vpc.NetworkID, inst.OvsPort)
-
-			// Set IP on the host simulation "container" end
-			if inst.SubnetID != nil {
-				subnet, _ := s.subnetRepo.GetByID(ctx, *inst.SubnetID)
-				if subnet != nil {
-					_, ipNet, _ := net.ParseCIDR(subnet.CIDRBlock)
-					ones, _ := ipNet.Mask.Size()
-					// In a real cloud, this happens inside the container namespace.
-					// For this demo, we do it on the host for the 'peer' end.
-					_ = s.network.SetVethIP(ctx, vethContainer, inst.PrivateIP, fmt.Sprintf("%d", ones))
-
-					// Ensure the "container" end is also up on host if simulating
-					cmd := exec.CommandContext(ctx, "ip", "link", "set", vethContainer, "up")
-					_ = cmd.Run()
-				}
-			}
-
-			// Note: Moving veth to container namespace requires the container PID,
-			// which Docker shim handles in a real cloud. For this local demo,
-			// we just create the veth pair on host.
+		if err := s.network.CreateVethPair(ctx, inst.OvsPort, vethContainer); err != nil {
+			s.logger.Warn("failed to create veth pair", "instance_id", inst.ID, "error", err)
+		} else {
+			vpc, err := s.vpcRepo.GetByID(ctx, *inst.VpcID)
+			if err != nil {
+				s.logger.Warn("failed to get VPC for bridge attachment", "vpc_id", inst.VpcID, "error", err)
+			} else if err := s.network.AttachVethToBridge(ctx, vpc.NetworkID, inst.OvsPort); err != nil {
+				s.logger.Warn("failed to attach veth to bridge", "bridge", vpc.NetworkID, "error", err)
+			}
+
+			if inst.SubnetID != nil {
+				subnet, err := s.subnetRepo.GetByID(ctx, *inst.SubnetID)
+				if err != nil {
+					s.logger.Warn("failed to get subnet for IP config", "subnet_id", inst.SubnetID, "error", err)
+				} else if subnet != nil {
+					_, ipNet, err := net.ParseCIDR(subnet.CIDRBlock)
+					if err != nil {
+						s.logger.Warn("failed to parse subnet CIDR", "cidr", subnet.CIDRBlock, "error", err)
+					} else {
+						ones, _ := ipNet.Mask.Size()
+						if err := s.network.SetVethIP(ctx, vethContainer, inst.PrivateIP, fmt.Sprintf("%d", ones)); err != nil {
+							s.logger.Warn("failed to set veth IP", "veth", vethContainer, "ip", inst.PrivateIP, "error", err)
+						}
+					}
+				}
+			}
 		}
 	}

Consider adding a BringUpInterface(ctx, ifaceName) method to the NetworkBackend interface to avoid direct exec.Command calls.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @internal/core/services/instance.go around lines 146-173, the post-launch
OVS wiring silently swallows errors and calls the host "ip" command directly;
update the code that calls network.CreateVethPair, network.AttachVethToBridge,
and network.SetVethIP (the block guarded by inst.OvsPort and functions
referencing s.vpcRepo.GetByID and s.subnetRepo.GetByID) to check and handle
returned errors (log via process logger and return an error to mark the instance
network setup as failed) instead of using `_ =`. Add a BringUpInterface(ctx,
ifaceName) method to the NetworkBackend interface, implement it in backends, and
replace exec.CommandContext("ip", "link", "set", ...) with
network.BringUpInterface(ctx, vethContainer); ensure all network setup failures
cause the instance launch path to surface the error so instances are not
reported running without connectivity.

Comment on lines +52 to +53
// 1. Generate unique VNI (for demo purposes we use a hash based int)
vxlanID := int(vpcID[0]) + 100

⚠️ Potential issue | 🟡 Minor

VXLAN ID generation may cause collisions.

Using only the first byte of the UUID (int(vpcID[0]) + 100) limits VXLAN IDs to the range 100–355, which guarantees collisions once more than 256 VPCs exist (and makes them likely far sooner). VXLAN VNI supports 24-bit values (up to ~16 million).

Consider using more bytes or a counter-based approach for production:

-	// 1. Generate unique VNI (for demo purposes we use a hash based int)
-	vxlanID := int(vpcID[0]) + 100
+	// Generate VNI using first 3 bytes of UUID (24-bit range)
+	vxlanID := (int(vpcID[0])<<16 | int(vpcID[1])<<8 | int(vpcID[2])) & 0xFFFFFF
+	if vxlanID < 100 {
+		vxlanID += 100 // Avoid reserved range
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-	// 1. Generate unique VNI (for demo purposes we use a hash based int)
-	vxlanID := int(vpcID[0]) + 100
+	// Generate VNI using first 3 bytes of UUID (24-bit range)
+	vxlanID := (int(vpcID[0])<<16 | int(vpcID[1])<<8 | int(vpcID[2])) & 0xFFFFFF
+	if vxlanID < 100 {
+		vxlanID += 100 // Avoid reserved range
+	}
🤖 Prompt for AI Agents
In @internal/core/services/vpc.go around lines 52-53, the current VXLAN ID
generation uses only the first byte of vpcID (vxlanID := int(vpcID[0]) + 100),
which restricts VNI space and risks collisions; replace it with a proper 24-bit
VNI derivation or allocator: derive vxlanID by hashing more of the UUID (e.g.,
use the first 3 bytes or a hash like CRC32/SHA1 and mask/mod to 24 bits) and
ensure the result is within the valid VXLAN range (0..16777215) and offset as
needed, or plug in a counter/registry-based allocator to guarantee uniqueness;
update the vxlanID variable assignment and any docs/comments accordingly.
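
The hash-based variant of that prompt can be sketched as below. `vniFromID` is a hypothetical helper, not code from the PR; it uses FNV-1a over the whole ID rather than the first bytes, masked to the 24-bit VNI space:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// vniFromID derives a stable 24-bit VXLAN VNI from an opaque ID (e.g. a
// UUID's bytes). Hashing the full ID spreads VNIs across the whole
// 0..16777215 space instead of the 100-355 window the first byte allows.
// Collisions are still possible; a registry/counter allocator is the only
// way to guarantee uniqueness.
func vniFromID(id []byte) int {
	h := fnv.New32a()
	h.Write(id)
	vni := int(h.Sum32()) & 0xFFFFFF // VXLAN VNI is 24 bits
	if vni < 100 {
		vni += 100 // keep a small reserved range free, as in the review
	}
	return vni
}

func main() {
	a := vniFromID([]byte("1b4e28ba-2fa1-11d2-883f-0016d3cca427"))
	b := vniFromID([]byte("6fa459ea-ee8a-3ca4-894e-db77e160355e"))
	fmt.Println(a, b)
}
```

Because the mapping is deterministic, recreating a VPC record from its stored ID always yields the same VNI, which matters if bridges are rebuilt after a restart.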

Comment on lines +111 to +122
func (r *SecurityGroupRepository) DeleteRule(ctx context.Context, ruleID uuid.UUID) error {
	query := `DELETE FROM security_rules WHERE id = $1`
	_, err := r.db.Exec(ctx, query, ruleID)
	return err
}

func (r *SecurityGroupRepository) Delete(ctx context.Context, id uuid.UUID) error {
	userID := appcontext.UserIDFromContext(ctx)
	query := `DELETE FROM security_groups WHERE id = $1 AND user_id = $2`
	_, err := r.db.Exec(ctx, query, id, userID)
	return err
}

⚠️ Potential issue | 🟠 Major

DeleteRule and Delete lack ownership verification.

DeleteRule deletes any rule by ID without verifying the caller owns the parent security group. Delete checks user_id but doesn't verify the row was actually deleted (could silently succeed on non-existent ID).

🔒 Proposed fix for DeleteRule
 func (r *SecurityGroupRepository) DeleteRule(ctx context.Context, ruleID uuid.UUID) error {
-	query := `DELETE FROM security_rules WHERE id = $1`
-	_, err := r.db.Exec(ctx, query, ruleID)
-	return err
+	userID := appcontext.UserIDFromContext(ctx)
+	query := `
+		DELETE FROM security_rules sr
+		USING security_groups sg
+		WHERE sr.id = $1 
+		  AND sr.group_id = sg.id 
+		  AND sg.user_id = $2
+	`
+	result, err := r.db.Exec(ctx, query, ruleID, userID)
+	if err != nil {
+		return errors.Wrap(errors.Internal, "failed to delete security rule", err)
+	}
+	if result.RowsAffected() == 0 {
+		return errors.New(errors.NotFound, "security rule not found or access denied")
+	}
+	return nil
 }

 func (r *SecurityGroupRepository) Delete(ctx context.Context, id uuid.UUID) error {
 	userID := appcontext.UserIDFromContext(ctx)
 	query := `DELETE FROM security_groups WHERE id = $1 AND user_id = $2`
-	_, err := r.db.Exec(ctx, query, id, userID)
-	return err
+	result, err := r.db.Exec(ctx, query, id, userID)
+	if err != nil {
+		return errors.Wrap(errors.Internal, "failed to delete security group", err)
+	}
+	if result.RowsAffected() == 0 {
+		return errors.New(errors.NotFound, "security group not found")
+	}
+	return nil
 }

Comment on lines +224 to +246
func (r *SecurityGroupRepository) ListInstanceGroups(ctx context.Context, instanceID uuid.UUID) ([]*domain.SecurityGroup, error) {
	query := `
		SELECT sg.id, sg.user_id, sg.vpc_id, sg.name, sg.description, sg.arn, sg.created_at
		FROM security_groups sg
		JOIN instance_security_groups isg ON sg.id = isg.group_id
		WHERE isg.instance_id = $1
	`
	rows, err := r.db.Query(ctx, query, instanceID)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var groups []*domain.SecurityGroup
	for rows.Next() {
		var sg domain.SecurityGroup
		if err := rows.Scan(&sg.ID, &sg.UserID, &sg.VPCID, &sg.Name, &sg.Description, &sg.ARN, &sg.CreatedAt); err != nil {
			return nil, err
		}
		groups = append(groups, &sg)
	}
	return groups, nil
}

⚠️ Potential issue | 🟠 Major

ListInstanceGroups doesn't enforce user ownership.

This method returns all security groups attached to an instance without verifying that the caller owns either the instance or the groups. A user could enumerate another user's security groups simply by supplying that user's instance ID.

🔒 Proposed fix to add ownership check
 func (r *SecurityGroupRepository) ListInstanceGroups(ctx context.Context, instanceID uuid.UUID) ([]*domain.SecurityGroup, error) {
+	userID := appcontext.UserIDFromContext(ctx)
+
+	// First verify instance ownership
+	var instanceOwner uuid.UUID
+	ownerQuery := `SELECT user_id FROM instances WHERE id = $1`
+	err := r.db.QueryRow(ctx, ownerQuery, instanceID).Scan(&instanceOwner)
+	if err != nil {
+		if err == pgx.ErrNoRows {
+			return nil, errors.New(errors.NotFound, "instance not found")
+		}
+		return nil, errors.Wrap(errors.Internal, "failed to verify instance ownership", err)
+	}
+	if instanceOwner != userID {
+		return nil, errors.New(errors.Forbidden, "you do not own this instance")
+	}
+
 	query := `
 		SELECT sg.id, sg.user_id, sg.vpc_id, sg.name, sg.description, sg.arn, sg.created_at 
 		FROM security_groups sg
 		JOIN instance_security_groups isg ON sg.id = isg.group_id
 		WHERE isg.instance_id = $1
 	`

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @internal/repositories/postgres/security_group_repo.go around lines 224 -
246, ListInstanceGroups currently returns groups for any instance ID without
enforcing ownership; change the signature to ListInstanceGroups(ctx
context.Context, instanceID uuid.UUID, userID uuid.UUID) and modify the SQL to
enforce ownership by joining the instances table and/or checking group owner
(e.g. add "JOIN instances i ON isg.instance_id = i.id" and "WHERE
isg.instance_id = $1 AND i.user_id = $2 AND sg.user_id = $2" or at minimum "AND
sg.user_id = $2"), pass instanceID and userID into r.db.Query, and update all
callers to supply the caller's userID so only groups owned by that user (and
attached to an instance owned by that user) are returned.
