📝 Walkthrough

Adds Open vSwitch (OVS) networking and subnet/security‑group support across the stack: new NetworkBackend interface with OVS and noop adapters, DB schema/migrations, repos, services, handlers, CLI/SDK changes, instance veth/OVS plumbing, docker/runtime privileges, tests, and documentation updates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI / SDK
    participant API as API Server
    participant VpcSvc as VpcService
    participant Net as NetworkBackend (OVS)
    participant DB as Postgres
    CLI->>API: POST /vpcs { name, cidr_block }
    API->>VpcSvc: CreateVPC(ctx, name, cidr)
    VpcSvc->>Net: CreateBridge(ctx, bridgeName, vxlanID)
    VpcSvc->>DB: INSERT vpc (network_id=bridgeName, cidr_block, vxlan_id, status, arn)
    alt DB success
        VpcSvc-->>API: return VPC
    else DB failure
        VpcSvc->>Net: DeleteBridge(ctx, bridgeName)
        VpcSvc-->>API: return error
    end
```
```mermaid
sequenceDiagram
    participant CLI as CLI / SDK
    participant API as API Server
    participant InstSvc as InstanceService
    participant SubnetRepo as SubnetRepository
    participant Net as NetworkBackend (OVS)
    participant DB as Postgres
    participant Runtime as Compute Adapter
    CLI->>API: POST /instances { name, image, vpc_id, subnet_id, volumes }
    API->>InstSvc: LaunchInstance(ctx, name, image, ports, vpcID, subnetID, volumes)
    InstSvc->>SubnetRepo: GetByID(ctx, subnetID)
    SubnetRepo->>DB: SELECT subnet row
    DB-->>SubnetRepo: subnet (cidr, gateway)
    InstSvc->>InstSvc: allocateIP(subnet)
    InstSvc->>Runtime: CreateInstance(...) -- launch container
    Runtime-->>InstSvc: container id
    InstSvc->>Net: CreateVethPair(ctx, hostEnd, containerEnd)
    InstSvc->>Net: AttachVethToBridge(ctx, bridgeName, hostEnd)
    InstSvc->>Net: SetVethIP(ctx, hostEnd, ip, cidr)
    InstSvc->>DB: INSERT instance with subnet_id, private_ip, ovs_port
    InstSvc-->>API: return Instance
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Pull request overview
This PR integrates Open vSwitch (OVS) as the networking backend for The Cloud, replacing Docker networks with software-defined networking capabilities. The changes introduce subnet management, security groups with OpenFlow rules, and advanced IPAM functionality.
Key Changes:
- OVS adapter implementation with bridge, VXLAN, and flow rule management
- VPC service refactored from Docker networks to OVS bridges with CIDR and VXLAN support
- New Subnet and Security Group services with full CRUD operations
- Instance networking enhanced with veth pair creation and private IP allocation
Reviewed changes
Copilot reviewed 106 out of 107 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/repositories/ovs/adapter.go | New OVS adapter implementing NetworkBackend interface for bridge/port/flow management |
| internal/core/services/vpc.go | Refactored to use OVS bridges instead of Docker networks |
| internal/core/services/subnet.go | New service for subnet CRUD with CIDR validation and gateway IP calculation |
| internal/core/services/security_group.go | New service translating security rules to OVS flow rules |
| internal/core/services/instance.go | Enhanced with veth pair setup, IP allocation, and OVS bridge attachment |
| internal/repositories/postgres/migrations/029_update_vpcs_for_ovs.up.sql | Migration adding CIDR, VXLAN, status, and ARN columns to VPCs |
| internal/repositories/postgres/migrations/030-032_*.sql | New tables for security groups, subnets, and instance network fields |
| cmd/api/main.go | Added OVS adapter initialization and new handler registrations |
| pkg/sdk/*.go | Updated SDK with new CIDR parameter and subnet/security group operations |
| cmd/cloud/*.go | New CLI commands for subnet and security group management |
| docker-compose.yml | Added NET_ADMIN capability and OVS socket mount |
| scripts/setup-ovs.sh | New setup script for OVS installation and configuration |
| scripts/setup_path.sh | Removed emojis from output messages |
```go
var (
	bridgeNameRegex = regexp.MustCompile(`^[a-zA-Z0-9-]+$`)
)
```
The regex only allows alphanumeric characters and hyphens, but OVS bridge names can also contain underscores. Additionally, there's no check for maximum length (OVS typically limits interface names to 15 characters). This could cause validation to reject valid bridge names or allow names that will fail when passed to OVS.
Update the regex to match OVS's actual bridge name requirements and add length validation.
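A hedged sketch of what the stricter validation could look like: Linux limits network device names to 15 bytes (IFNAMSIZ - 1), and OVS bridge names become device names, so a length check belongs next to the regex. The character class here (adding underscore and dot) and the `validBridgeName` helper are assumptions for illustration, not the adapter's actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// Linux network device names are limited to IFNAMSIZ-1 = 15 bytes, and an OVS
// bridge name becomes a device name, so the same limit applies here. The
// character class (letters, digits, underscore, hyphen, dot) is a conservative
// policy choice, not the full set Linux technically permits.
var bridgeNameRegex = regexp.MustCompile(`^[a-zA-Z0-9_.-]{1,15}$`)

func validBridgeName(name string) error {
	if len(name) == 0 || len(name) > 15 {
		return fmt.Errorf("bridge name must be 1-15 characters, got %d", len(name))
	}
	if !bridgeNameRegex.MatchString(name) {
		return fmt.Errorf("bridge name %q contains disallowed characters", name)
	}
	return nil
}

func main() {
	for _, n := range []string{"br-vpc_1", "br-0123456789abcdef", "br/evil"} {
		fmt.Println(n, "->", validBridgeName(n))
	}
}
```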
```go
func (s *InstanceService) allocateIP(ctx context.Context, subnet *domain.Subnet) (string, error) {
	_, ipNet, err := net.ParseCIDR(subnet.CIDRBlock)
	if err != nil {
		return "", err
	}

	instances, err := s.repo.ListBySubnet(ctx, subnet.ID)
	if err != nil {
		return "", err
	}

	usedIPs := make(map[string]bool)
	for _, inst := range instances {
		if inst.PrivateIP != "" {
			usedIPs[inst.PrivateIP] = true
		}
	}
	usedIPs[subnet.GatewayIP] = true

	// Find first available IP
	ip := make(net.IP, len(ipNet.IP))
	copy(ip, ipNet.IP)

	for {
		// Increment IP
		for i := len(ip) - 1; i >= 0; i-- {
			ip[i]++
			if ip[i] > 0 {
				break
			}
		}

		if !ipNet.Contains(ip) {
			break
		}

		if !usedIPs[ip.String()] && s.isValidHostIP(ip, ipNet) {
			return ip.String(), nil
		}
	}

	return "", fmt.Errorf("no available IPs in subnet")
}
```
The IP allocation algorithm re-lists every instance in the subnet and then scans the address space linearly on each call. For large subnets (e.g., a /16), a single allocation can iterate through up to 65,534 candidate IPs before finding a free one, so per-allocation cost grows as the subnet fills and the work repeats on every launch. This will cause significant performance degradation as subnets fill up.
Consider:
- Using a bitmap or more efficient data structure for IP tracking
- Implementing a persistent IP allocation table in the database
- Adding early termination logic or limiting subnet sizes
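As an illustration of the bitmap approach suggested above, a minimal in-memory sketch (IPv4 only; `ipBitmap`, `newIPBitmap`, and `Allocate` are hypothetical names, and a production version would still need the DB-backed allocation table for persistence):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
	"net"
)

// ipBitmap tracks allocated host offsets within an IPv4 subnet. Each bit in
// words corresponds to one address offset from the network base.
type ipBitmap struct {
	base  uint32 // network address as a uint32
	size  uint32 // total addresses in the subnet
	words []uint64
}

func newIPBitmap(cidr string) (*ipBitmap, error) {
	_, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	ones, total := ipNet.Mask.Size()
	size := uint32(1) << uint(total-ones)
	b := &ipBitmap{
		base:  binary.BigEndian.Uint32(ipNet.IP.To4()),
		size:  size,
		words: make([]uint64, (size+63)/64),
	}
	// Reserve the network address (offset 0) and broadcast (last offset).
	b.mark(0)
	b.mark(size - 1)
	return b, nil
}

func (b *ipBitmap) mark(off uint32) { b.words[off/64] |= 1 << (off % 64) }

// Allocate finds the first free offset 64 addresses at a time, so even a full
// /16 is at most ~1024 word inspections instead of 65k map lookups per call.
func (b *ipBitmap) Allocate() (string, error) {
	for i, w := range b.words {
		if w != ^uint64(0) {
			off := uint32(i)*64 + uint32(bits.TrailingZeros64(^w))
			if off >= b.size {
				break
			}
			b.mark(off)
			ip := make(net.IP, 4)
			binary.BigEndian.PutUint32(ip, b.base+off)
			return ip.String(), nil
		}
	}
	return "", fmt.Errorf("no available IPs in subnet")
}

func main() {
	b, _ := newIPBitmap("10.0.1.0/29")
	b.mark(1) // pretend the gateway 10.0.1.1 is already taken
	ip, _ := b.Allocate()
	fmt.Println(ip)
}
```

The bitmap could be hydrated once from the instances table at startup and kept consistent via a DB allocation row per IP.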
```sql
-- +goose Up
ALTER TABLE vpcs ADD COLUMN IF NOT EXISTS cidr_block VARCHAR(18);
```
The cidr_block field is VARCHAR(18), which exactly fits the longest IPv4 CIDR ("255.255.255.255/32") but leaves no headroom and cannot hold IPv6 CIDR blocks, which can reach 43 characters (e.g., "2001:0db8:85a3:0000:0000:8a2e:0370:7334/128").
Increase the column size to at least 43 characters to support IPv6, or keep 18 only if the system is explicitly IPv4-only.
```sql
-- Set defaults for existing rows
UPDATE vpcs SET cidr_block = '10.0.0.0/16', vxlan_id = 100, status = 'active' WHERE cidr_block IS NULL;
```
The migration sets default values for existing rows, but all three columns (cidr_block, vxlan_id, status) are set to the same values for all existing VPCs. This means all existing VPCs will have:
- The same CIDR block (10.0.0.0/16) causing IP conflicts
- The same VXLAN ID (100) breaking network isolation
- All marked as 'active' even if they weren't
This will break existing VPCs. Consider either:
- Generating unique values for each existing VPC
- Requiring manual migration of existing VPCs
- Documenting that this is a breaking change requiring VPC recreation
```go
func (h *SubnetHandler) Get(c *gin.Context) {
	idStr := c.Param("id")
	// For simple Get, we use id only for now. Or we could pass vpcID if we had it in path.
	// The service GetSubnet requires vpcID if searching by name, but here we use ID.
	subnet, err := h.svc.GetSubnet(c.Request.Context(), idStr, uuid.Nil)
	if err != nil {
		httputil.Error(c, err)
		return
	}

	httputil.Success(c, http.StatusOK, subnet)
}
```
The GetSubnet handler passes uuid.Nil as the vpcID when calling the service. This means subnet lookups by name will fail since the service requires a valid vpcID to search by name. The comment acknowledges this ("Or we could pass vpcID if we had it in path") but doesn't address it.
Either:
- Require vpcID in the path or query params for name-based lookups
- Modify the service to handle lookup by ID without requiring vpcID
- Document that this endpoint only supports ID-based lookups
```go
var req struct {
	Name      string `json:"name" binding:"required"`
	CIDRBlock string `json:"cidr_block"`
}
```
The API requires a valid CIDR block to create a subnet, but the VPC creation command in the CLI has a default CIDR (10.0.0.0/16). However, the CreateVPC handler makes cidr_block optional (no binding:"required" tag). This inconsistency means the API will accept VPC creation without a CIDR, which could cause issues when creating subnets or allocating IPs.
Make the cidr_block field required in the handler to match the CLI's expectation and the service's CIDR validation logic.
```go
func (s *SecurityGroupService) RemoveRule(ctx context.Context, ruleID uuid.UUID) error {
	// 1. Get rule to know groupID
	// Implementation omitted: would need a GetRuleByID in repo
	return nil
}
```
The RemoveRule method is defined but has no implementation (returns nil immediately). The function is part of the public SecurityGroupService interface but is non-functional. If called, it will silently succeed without actually removing any rules.
Either implement the function or remove it from the interface if it's not yet ready.
```go
cmd := exec.CommandContext(ctx, "ip", "link", "set", vethContainer, "up")
_ = cmd.Run()
```
The error from running the "ip link set" command is silently ignored. If bringing up the container end of the veth pair fails, the network interface will remain down and the instance won't have network connectivity, but no error will be logged or reported.
Add proper error handling and logging for this operation.
```go
return ports.FlowRule{
	Priority: rule.Priority,
	Match:    strings.Join(matchParts, ","),
	Actions:  "allow", // Simplified
}
```
The security group flow translation uses a hardcoded "allow" action which doesn't align with typical firewall behavior. Security groups typically work with default-deny policies where only explicitly allowed traffic passes. This implementation would need to be coordinated with a default drop rule on the OVS bridge, but there's no indication that such a rule is being set up.
Consider:
- Documenting the expected default bridge behavior
- Implementing proper allow/drop actions based on the rule type
- Setting up default bridge rules when VPCs are created
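To illustrate the default-deny coordination, a sketch of a translation that emits real OpenFlow actions plus the catch-all drop flow a bridge would need. The `FlowRule` shape mirrors the PR's `ports.FlowRule`; the `ofctlFlow` renderer and the `normal`/`drop` action mapping are assumptions about how flows would be installed via `ovs-ofctl add-flow`, not the PR's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

type RuleAction string

const (
	ActionAllow RuleAction = "allow"
	ActionDeny  RuleAction = "deny"
)

type FlowRule struct {
	Priority int
	Match    string
	Actions  string
}

// toFlowRule maps a security-group action onto an OpenFlow action:
// allowed traffic goes through normal L2 processing, denied traffic drops.
func toFlowRule(priority int, matchParts []string, action RuleAction) FlowRule {
	actions := "drop"
	if action == ActionAllow {
		actions = "normal"
	}
	return FlowRule{Priority: priority, Match: strings.Join(matchParts, ","), Actions: actions}
}

// defaultDeny is the catch-all rule that makes allow rules meaningful:
// priority 0 matches anything not claimed by a higher-priority flow.
func defaultDeny() FlowRule {
	return FlowRule{Priority: 0, Match: "", Actions: "drop"}
}

// ofctlFlow renders a rule in `ovs-ofctl add-flow` syntax.
func ofctlFlow(r FlowRule) string {
	parts := []string{fmt.Sprintf("priority=%d", r.Priority)}
	if r.Match != "" {
		parts = append(parts, r.Match)
	}
	return strings.Join(parts, ",") + ",actions=" + r.Actions
}

func main() {
	allowSSH := toFlowRule(100, []string{"tcp", "tp_dst=22"}, ActionAllow)
	fmt.Println(ofctlFlow(allowSSH))
	fmt.Println(ofctlFlow(defaultDeny()))
}
```

Installing `defaultDeny` at bridge-creation time (in `CreateBridge` or just after) is what turns the per-rule "allow" flows into an actual default-deny policy.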
```go
func (a *OvsAdapter) CreateBridge(ctx context.Context, name string, vxlanID int) error {
	if !bridgeNameRegex.MatchString(name) {
		return errors.New(errors.InvalidInput, "invalid bridge name")
	}

	cmd := exec.CommandContext(ctx, a.ovsPath, "add-br", name)
	if err := cmd.Run(); err != nil {
		return errors.Wrap(errors.Internal, "failed to add bridge", err)
	}

	return nil
}
```
The vxlanID parameter is accepted but never used in the CreateBridge function. This means VXLANs are not being configured when bridges are created, breaking the multi-tenant isolation that VXLAN IDs are meant to provide.
Either:
- Implement VXLAN configuration in this function using the vxlanID parameter
- Remove the parameter if VXLAN setup happens elsewhere
- Document why this parameter is unused
Actionable comments posted: 4
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
internal/repositories/postgres/migrations/008_create_volumes_table.up.sql (2)

6-6: Add CHECK constraint to ensure size_gb is positive. The `size_gb` column lacks a CHECK constraint, allowing negative or zero values to be inserted, which would be invalid for volume sizes.

✅ Proposed fix to add CHECK constraint

```diff
-    size_gb INTEGER NOT NULL DEFAULT 1,
+    size_gb INTEGER NOT NULL DEFAULT 1 CHECK (size_gb > 0),
```
7-7: Add CHECK constraint to enforce valid status values. The `status` column accepts any VARCHAR up to 50 characters. Without a CHECK constraint, invalid status values could be inserted, compromising data integrity and making status-based queries unreliable.

✅ Proposed fix to add CHECK constraint

```diff
-    status VARCHAR(50) NOT NULL DEFAULT 'AVAILABLE',
+    status VARCHAR(50) NOT NULL DEFAULT 'AVAILABLE' CHECK (status IN ('AVAILABLE', 'ATTACHED', 'CREATING', 'DELETING', 'ERROR')),
```

Note: Adjust the allowed status values based on your application's volume lifecycle states.
internal/repositories/postgres/migrations/016_create_secrets.up.sql (1)

9-10: Add a trigger to automatically update the `updated_at` timestamp. The `updated_at` field will only default to NOW() on INSERT. To automatically update it on modifications, add a trigger function.

⏰ Add trigger to auto-update `updated_at`

Add this to the migration after the table creation:

```sql
-- Create trigger function for updating updated_at timestamp
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Create trigger for secrets table
CREATE TRIGGER update_secrets_updated_at
    BEFORE UPDATE ON secrets
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();
```

internal/repositories/postgres/migrations/024_create_audit_logs.up.sql (1)
3-13: Reconsider the foreign key constraint on `user_id` for audit log integrity. The `user_id` column has a foreign key constraint to `users(id)` with `NOT NULL` but no `ON DELETE` clause specified. This creates a potential issue:

- If a user is deleted, PostgreSQL will default to `ON DELETE NO ACTION`, which will block user deletion if audit logs exist for that user.
- Audit logs should typically be preserved even after user deletion for compliance and historical tracking purposes.

Since `user_id` is `NOT NULL`, you cannot use `ON DELETE SET NULL`. Consider one of these approaches:

- Remove the NOT NULL constraint and use `ON DELETE SET NULL` to preserve logs while marking the user as deleted
- Use `ON DELETE NO ACTION` explicitly and handle user deletion at the application level (soft deletes)
- Remove the foreign key constraint entirely for audit logs to ensure they remain immutable

Proposed fix: Remove NOT NULL and add ON DELETE SET NULL

```diff
 CREATE TABLE audit_logs (
     id UUID PRIMARY KEY,
-    user_id UUID NOT NULL REFERENCES users(id),
+    user_id UUID REFERENCES users(id) ON DELETE SET NULL,
     action VARCHAR(255) NOT NULL,
     resource_type VARCHAR(255),
     resource_id VARCHAR(255),
     details JSONB,
     ip_address VARCHAR(45),
     user_agent TEXT,
     created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
 );
```

internal/repositories/postgres/migrations/018_create_caches.up.sql (1)
13-13: Encrypt cache credentials at rest. The `password` column stores Redis instance credentials in plaintext. While the codebase has encryption utilities (AES-256-GCM in `pkg/crypto`), they are not applied here. Apply the same encryption pattern used for secrets (`pkg/crypto.Encrypt`/`Decrypt`) to protect these credentials at rest in the database, and decrypt them when needed for connecting to Redis or passing to the API.

internal/repositories/postgres/migrations/015_create_databases.up.sql (2)
14-14: CRITICAL: Plaintext password storage is a security vulnerability. Storing database passwords in plaintext violates fundamental security principles and creates compliance risks. Passwords must be encrypted or, preferably, managed through a secrets management system (HashiCorp Vault, AWS Secrets Manager, etc.). If passwords must be stored, use application-layer encryption with a dedicated key management solution.

🔒 Recommended approaches

Option 1 (Preferred): Use a secrets management system
- Store credentials in HashiCorp Vault, AWS Secrets Manager, or similar
- Store only a reference/ID to the secret in this table

Option 2: Application-layer encryption
- Encrypt passwords before storing using a secure encryption library
- Manage encryption keys separately (never in the same database)
- Consider envelope encryption for added security

Option 3: Eliminate storage
- Generate passwords on-demand and return only once during creation
- Use certificate-based authentication where possible
9-9: Add validation constraints for status and port. The `status` and `port` fields lack validation:

- `port` should be constrained to the valid port range (0-65535)
- `status` should be validated against expected values

♻️ Proposed constraints

```diff
     status VARCHAR(50) DEFAULT 'CREATING',
+    CONSTRAINT chk_status CHECK (status IN ('CREATING', 'RUNNING', 'STOPPED', 'FAILED', 'DELETING')),
     vpc_id UUID REFERENCES vpcs(id) ON DELETE SET NULL,
     container_id VARCHAR(255),
     port INT,
+    CONSTRAINT chk_port CHECK (port >= 0 AND port <= 65535),
     username VARCHAR(255),
```

Adjust the status values to match your application's state machine.

Also applies to: 12-12
internal/repositories/postgres/migrations/007_create_events_table.up.sql (1)

3-10: Add index on `resource_id` and consider schema refinements. The events table lacks an index on `resource_id`, which is critical for query performance. Events are typically filtered by resource (e.g., "fetch all events for instance X"), and without an index, these queries will scan the entire table as it grows.

Additionally:

- VARCHAR(50) sizing: This is inconsistent with `audit_logs`, which uses VARCHAR(255) for `resource_id`. Since the domain model indicates resource IDs are UUIDs (36 characters), 50 characters leaves minimal headroom and may cause issues if IDs exceed this limit in the future.
- Missing DEFAULT for `id`: Recent migrations in this codebase consistently use `DEFAULT gen_random_uuid()` for UUID primary keys. Consider adding this to reduce application complexity and align with the established pattern.

Suggested changes

```diff
 CREATE TABLE IF NOT EXISTS events (
-    id UUID PRIMARY KEY,
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
     action VARCHAR(50) NOT NULL,
-    resource_id VARCHAR(50) NOT NULL,
+    resource_id VARCHAR(255) NOT NULL,
     resource_type VARCHAR(50) NOT NULL,
     metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
     created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
 );

+CREATE INDEX idx_events_resource_id ON events(resource_id);
 CREATE INDEX idx_events_created_at ON events(created_at DESC);
```
🤖 Fix all issues with AI agents
In @cmd/api/main.go:
- Around line 136-139: When ovs.NewOvsAdapter returns an error you currently
leave networkBackend nil which can cause panics; update the initialization so
that on error you either (a) assign a valid fallback adapter instance (e.g.,
create or call your mock/noop adapter constructor) to networkBackend or (b) fail
fast (log error and exit) so callers never see a nil; also update the
/health/ovs handler to defensively check networkBackend for nil and return an
appropriate degraded/unavailable health response instead of dereferencing it.
Refer to networkBackend, ovs.NewOvsAdapter, and the /health/ovs health handler
when making these changes.
In @internal/repositories/postgres/migrations/003_add_container_id.up.sql:
- Around line 1-2: The .down.sql counterparts are missing the required Goose
directive and two migrations (031, 032) lack .down.sql files entirely; create a
.down.sql file for each .up.sql (including migrations 031 and 032) named to
match their .up.sql counterparts (e.g., create 003_add_container_id.down.sql)
and ensure the very first non-comment line is the directive "-- +goose Down",
then add the appropriate rollback SQL that reverses the changes made in the
corresponding .up.sql (e.g., DROP COLUMN, DROP TABLE, or REVERT INDEX changes)
so each migration has a proper Up/Down pair for Goose to execute and roll back.
In @internal/repositories/postgres/security_group_repo.go:
- Around line 124-134: AddInstanceToGroup and RemoveInstanceFromGroup perform
direct INSERT/DELETE into instance_security_groups without verifying the caller
owns the instance or group; update both methods to first validate ownership by
querying the instance and security_group records (or a join) using the provided
instanceID and groupID and the caller identity from ctx (e.g., user/tenant ID),
return a permission error if the ownership check fails, and perform the
insert/delete inside a transaction to avoid races; reference the
AddInstanceToGroup and RemoveInstanceFromGroup methods and the
instance_security_groups table when locating the code to change.
🟠 Major comments (20)
internal/repositories/postgres/migrations/019_create_queues.up.sql (lines 1-2)

1-2: Add `-- +goose Down` directive to the down migration file. The up migration file is correctly formatted with the `-- +goose Up` directive. However, the corresponding down migration file (019_create_queues.down.sql) is missing the `-- +goose Down` directive at the beginning. Add it to match the Goose migration standard.

Required change for down migration

```sql
-- +goose Down
DROP TABLE IF EXISTS queue_messages;
DROP TABLE IF EXISTS queues;
```

internal/repositories/postgres/migrations/026_create_snapshots.up.sql (line 5)
5-5: Add ON DELETE clause to user_id foreign key. The `user_id` foreign key lacks an ON DELETE clause, which defaults to RESTRICT behavior. This prevents user deletion when snapshots exist. Consider adding either `ON DELETE CASCADE` (if snapshots should be deleted with the user) or `ON DELETE SET NULL` with a nullable column (if snapshots should be preserved as orphaned records).

🔧 Proposed fix with CASCADE behavior

```diff
-    user_id UUID NOT NULL REFERENCES users(id),
+    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
```

internal/repositories/postgres/migrations/018_create_caches.up.sql (line 5)
5-5: Add ON DELETE clauses to foreign key constraints. The foreign keys to `users(id)` and `vpcs(id)` lack `ON DELETE` behavior specifications. When a referenced user or VPC is deleted, this can lead to constraint violations or orphaned cache records.

Consider adding appropriate cascading behavior:

- For `user_id`: Likely `ON DELETE CASCADE` since caches should be removed when the owning user is deleted.
- For `vpc_id`: Consider `ON DELETE SET NULL` if caches can exist without a VPC, or `ON DELETE CASCADE` if VPC deletion should remove associated caches.

🔧 Proposed fix

```diff
-    user_id UUID NOT NULL REFERENCES users(id),
+    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
-    vpc_id UUID REFERENCES vpcs(id),
+    vpc_id UUID REFERENCES vpcs(id) ON DELETE SET NULL,
```

Also applies to: 10-10
scripts/setup-ovs.sh (line 16)

16-16: Ensure the acl package is installed and handle potential failures. The `setfacl` command requires the `acl` package, which may not be installed by default on all systems. Additionally, ACL permissions on the socket may not persist across system reboots or OVS service restarts.

🔧 Recommended fixes

Option 1: Install the acl package and add a fallback

```diff
 echo "--- Configuring Permissions ---"
+sudo apt-get install -y acl
 # Allow current user to manage OVS without sudo
 USER_NAME=$(whoami)
-# Note: On some systems, adding to the 'ovs' or 'openvswitch' group works if it exists.
-# Otherwise, we use ACLs or set permissions on the socket.
-sudo setfacl -m u:$USER_NAME:rw /var/run/openvswitch/db.sock
+# Try adding to openvswitch group first, fall back to ACL
+if getent group openvswitch >/dev/null; then
+    sudo usermod -aG openvswitch $USER_NAME
+    echo "Added $USER_NAME to openvswitch group"
+else
+    sudo setfacl -m u:$USER_NAME:rw /var/run/openvswitch/db.sock
+    echo "Set ACL for $USER_NAME on OVS socket"
+fi
+echo "Note: You may need to log out and back in for group changes to take effect"
```

Option 2: Create a systemd drop-in for persistent socket permissions

Add after line 20:

```sh
# Create systemd override for persistent socket permissions
sudo mkdir -p /etc/systemd/system/openvswitch-switch.service.d
cat <<EOF | sudo tee /etc/systemd/system/openvswitch-switch.service.d/override.conf
[Service]
ExecStartPost=/bin/bash -c 'sleep 1 && setfacl -m u:$USER_NAME:rw /var/run/openvswitch/db.sock'
EOF
sudo systemctl daemon-reload
```

Committable suggestion skipped: line range outside the PR's diff.
internal/repositories/postgres/migrations/029_update_vpcs_for_ovs.up.sql (line 9)

9-9: Risk: All existing VPCs assigned the same VXLAN ID. The migration sets `vxlan_id = 100` for all existing VPCs with `cidr_block IS NULL`. This conflicts with how new VPCs are handled: they receive unique VXLAN IDs derived from their UUID (`int(vpcID[0]) + 100`). Since VXLAN IDs identify virtual networks in OVS, having multiple VPCs with the same ID may prevent network isolation. Additionally, new VPCs could generate a VXLAN ID of 100, causing collisions with existing VPCs.

Assign unique VXLAN IDs to existing VPCs using a sequence or row-based calculation to align with the pattern used for new VPCs.

Alternative: Generate unique VXLAN IDs (window functions cannot appear directly in an UPDATE's SET list, so the numbering goes in a CTE):

```sql
WITH numbered AS (
    SELECT id, 100 + row_number() OVER (ORDER BY id) AS new_vxlan_id
    FROM vpcs
    WHERE cidr_block IS NULL
)
UPDATE vpcs
SET cidr_block = '10.0.0.0/16',
    vxlan_id = numbered.new_vxlan_id,
    status = 'active'
FROM numbered
WHERE vpcs.id = numbered.id;
```
3-3: Use INET type instead of VARCHAR(18) for consistency with the rest of the schema.The codebase already uses the
INETtype for IP address storage elsewhere (e.g.,instances.private_ipin migration 032,subnets.gateway_ipin migration 031). Thecidr_blockcolumn should also useINETto maintain consistency and benefit from PostgreSQL's built-in validation of CIDR notation. TheINETtype also naturally supports both IPv4 and IPv6 CIDR blocks if needed in the future.internal/repositories/postgres/migrations/032_add_instance_network_fields.up.sql-2-2 (1)
2-2: Add ON DELETE behavior to the subnet_id foreign key constraint.The foreign key on
subnet_idlacks anON DELETEaction. If a subnet is deleted, instances will retain dangling references. AddON DELETE RESTRICT(prevent subnet deletion if instances depend on it),ON DELETE SET NULL(allow deletion and clear references), orON DELETE CASCADE(delete instances with their subnet) based on your application's requirements.Example fix with ON DELETE RESTRICT
-ALTER TABLE instances ADD COLUMN subnet_id UUID REFERENCES subnets(id); +ALTER TABLE instances ADD COLUMN subnet_id UUID REFERENCES subnets(id) ON DELETE RESTRICT;internal/handlers/vpc_handler.go-32-35 (1)
32-35: Add CIDR block format validation to VPC creation.The
CIDRBlockfield is optional in the handler and the service layer lacks validation. Invalid CIDR blocks are only caught later when creating subnets (vianet.ParseCIDRin the subnet service), resulting in poor user experience. Validate the CIDR block format at VPC creation time in the service layer usingnet.ParseCIDRbefore persisting the VPC.internal/handlers/instance_handler.go-136-142 (1)
136-142: Inconsistent error handling for SubnetID vs VpcID.Invalid
subnet_idis silently ignored (line 139 checkserr == nil), while invalidvpc_idreturns a 400 error (lines 120-123). This inconsistency could confuse API consumers.🐛 Suggested fix for consistent error handling
var subnetUUID *uuid.UUID if req.SubnetID != "" { id, err := uuid.Parse(req.SubnetID) - if err == nil { - subnetUUID = &id + if err != nil { + c.JSON(http.StatusBadRequest, gin.H{"error": "invalid subnet_id format"}) + return } + subnetUUID = &id }internal/repositories/postgres/migrations/031_create_subnets.up.sql-4-5 (1)
4-5: Add ON DELETE CASCADE to user_id foreign key for consistency.The
user_idFK is missing an ON DELETE clause whilevpc_idhasON DELETE CASCADE. This creates inconsistent behavior compared to other resource tables in the codebase: security_groups (030), containers (023), gateway (022), cron (021), notify (020), queues (019), functions (017), and secrets (016) all specifyON DELETE CASCADEon their user_id FKs.Add
ON DELETE CASCADEto line 4:user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,This ensures subnets are deleted when their owner is deleted, consistent with the established pattern for resource ownership.
internal/core/services/subnet.go (lines 52-54)

52-54: Incomplete CIDR containment validation. The check only verifies that the subnet's starting address is within the VPC CIDR, not that the entire subnet range fits. For example, with a VPC of `10.0.0.0/24`, a requested subnet of `10.0.0.128/23` starts inside the VPC but spans `10.0.0.0` through `10.0.1.255`, extending well beyond it.

Additionally, there's no validation for overlapping subnets within the same VPC, which could lead to IP conflicts.

🔧 Suggested fix for complete CIDR validation

```diff
 if !vpcNet.Contains(ip) {
 	return nil, errors.New(errors.InvalidInput, "subnet CIDR must be within VPC CIDR range")
 }
+
+// Verify the entire subnet range fits within the VPC by checking its last address
+lastIP := make(net.IP, len(subnetNet.IP))
+copy(lastIP, subnetNet.IP)
+for i := range subnetNet.Mask {
+	lastIP[i] |= ^subnetNet.Mask[i]
+}
+if !vpcNet.Contains(lastIP) {
+	return nil, errors.New(errors.InvalidInput, "subnet CIDR extends beyond VPC CIDR range")
+}
+
+// TODO: Check for overlapping subnets within the VPC
```
106-110: Avoid binding directly to domain type for security rules.Binding directly to
domain.SecurityRuleallows clients to potentially set fields likeID,GroupID, andCreatedAtthat should be server-generated. Use a dedicated request struct to whitelist only the expected input fields.🔧 Suggested fix
+ var req struct { + Direction string `json:"direction" binding:"required"` + Protocol string `json:"protocol" binding:"required"` + PortMin int `json:"port_min"` + PortMax int `json:"port_max"` + CIDR string `json:"cidr" binding:"required"` + Priority int `json:"priority"` + } - var req domain.SecurityRule if err := c.ShouldBindJSON(&req); err != nil { c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()}) return } - rule, err := h.svc.AddRule(c.Request.Context(), groupID, req) + rule, err := h.svc.AddRule(c.Request.Context(), groupID, domain.SecurityRule{ + Direction: domain.RuleDirection(req.Direction), + Protocol: req.Protocol, + PortMin: req.PortMin, + PortMax: req.PortMax, + CIDR: req.CIDR, + Priority: req.Priority, + })Committable suggestion skipped: line range outside the PR's diff.
internal/core/services/security_group.go (lines 122-126)

122-126: RemoveRule is unimplemented but exposed in interface. This method returns `nil` without doing anything. If called via the API, rules won't be removed. Consider implementing it or returning an error.

🐛 Proposed fix: Return not-implemented error

```diff
 func (s *SecurityGroupService) RemoveRule(ctx context.Context, ruleID uuid.UUID) error {
-	// 1. Get rule to know groupID
-	// Implementation omitted: would need a GetRuleByID in repo
-	return nil
+	return errors.New(errors.Internal, "RemoveRule not implemented")
 }
```
332-342: Security group routes missing RBAC permission checks.Unlike other protected routes (VPC, instances, volumes), security group endpoints don't use
httputil.Permission()middleware. This may allow unauthorized access.🐛 Proposed fix: Add permission middleware
// Security Group Routes (Protected) sgGroup := r.Group("/security-groups") sgGroup.Use(httputil.Auth(identitySvc)) { - sgGroup.POST("", sgHandler.Create) - sgGroup.GET("", sgHandler.List) - sgGroup.GET("/:id", sgHandler.Get) - sgGroup.DELETE("/:id", sgHandler.Delete) - sgGroup.POST("/:id/rules", sgHandler.AddRule) - sgGroup.POST("/attach", sgHandler.Attach) + sgGroup.POST("", httputil.Permission(rbacSvc, domain.PermissionVpcCreate), sgHandler.Create) + sgGroup.GET("", httputil.Permission(rbacSvc, domain.PermissionVpcRead), sgHandler.List) + sgGroup.GET("/:id", httputil.Permission(rbacSvc, domain.PermissionVpcRead), sgHandler.Get) + sgGroup.DELETE("/:id", httputil.Permission(rbacSvc, domain.PermissionVpcDelete), sgHandler.Delete) + sgGroup.POST("/:id/rules", httputil.Permission(rbacSvc, domain.PermissionVpcUpdate), sgHandler.AddRule) + sgGroup.POST("/attach", httputil.Permission(rbacSvc, domain.PermissionVpcUpdate), sgHandler.Attach) }Note: You may want to define dedicated
PermissionSecurityGroup*constants instead of reusing VPC permissions.Committable suggestion skipped: line range outside the PR's diff.
internal/core/services/instance.go-140-167 (1)
`140-167`: **Silent error handling masks OVS networking failures.** Multiple errors are silently ignored with `_ = ...`. If veth creation or bridge attachment fails, the instance launches without proper networking, but no error is returned or logged (except for the outer `CreateVethPair`, which has an `if err == nil` guard).

🐛 Proposed fix: Add error handling and logging

```diff
 	// 4a. OVS Post-launch plumb
 	if inst.OvsPort != "" && s.network != nil {
 		vethContainer := fmt.Sprintf("eth0-%s", inst.ID.String()[:8])
-		if err := s.network.CreateVethPair(ctx, inst.OvsPort, vethContainer); err == nil {
-			vpc, _ := s.vpcRepo.GetByID(ctx, *inst.VpcID)
-			_ = s.network.AttachVethToBridge(ctx, vpc.NetworkID, inst.OvsPort)
+		if err := s.network.CreateVethPair(ctx, inst.OvsPort, vethContainer); err != nil {
+			s.logger.Error("failed to create veth pair", "instance_id", inst.ID, "error", err)
+			// Continue - instance is launched but networking may be degraded
+		} else {
+			vpc, err := s.vpcRepo.GetByID(ctx, *inst.VpcID)
+			if err != nil {
+				s.logger.Error("failed to get VPC for bridge attachment", "vpc_id", inst.VpcID, "error", err)
+			} else if err := s.network.AttachVethToBridge(ctx, vpc.NetworkID, inst.OvsPort); err != nil {
+				s.logger.Error("failed to attach veth to bridge", "bridge", vpc.NetworkID, "error", err)
+			}
 			// Set IP on the host simulation "container" end
 			if inst.SubnetID != nil {
-				subnet, _ := s.subnetRepo.GetByID(ctx, *inst.SubnetID)
-				if subnet != nil {
+				subnet, err := s.subnetRepo.GetByID(ctx, *inst.SubnetID)
+				if err != nil {
+					s.logger.Error("failed to get subnet for IP assignment", "subnet_id", inst.SubnetID, "error", err)
+				} else if subnet != nil {
 					_, ipNet, _ := net.ParseCIDR(subnet.CIDRBlock)
 					ones, _ := ipNet.Mask.Size()
-					_ = s.network.SetVethIP(ctx, vethContainer, inst.PrivateIP, fmt.Sprintf("%d", ones))
+					if err := s.network.SetVethIP(ctx, vethContainer, inst.PrivateIP, fmt.Sprintf("%d", ones)); err != nil {
+						s.logger.Error("failed to set veth IP", "veth", vethContainer, "ip", inst.PrivateIP, "error", err)
+					}
 					cmd := exec.CommandContext(ctx, "ip", "link", "set", vethContainer, "up")
-					_ = cmd.Run()
+					if err := cmd.Run(); err != nil {
+						s.logger.Error("failed to bring up veth interface", "veth", vethContainer, "error", err)
+					}
 				}
 			}
 		}
 	}
```

Committable suggestion skipped: line range outside the PR's diff.
internal/repositories/ovs/adapter.go-177-217 (1)
`177-217`: **Validate inputs to prevent command injection.** The veth-related methods (`CreateVethPair`, `AttachVethToBridge`, `DeleteVethPair`, `SetVethIP`) use user-provided interface names, IP addresses, and CIDR values directly in command construction without validation. This poses a command injection risk if inputs contain shell metacharacters.

🔒 Proposed validation

Add validation functions and use them in each method:

```go
import (
	"net"
	"regexp"
	"strconv"
	"strings"
)

var (
	interfaceNameRegex = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)
)

// Helper to validate interface names
func validateInterfaceName(name string) error {
	if !interfaceNameRegex.MatchString(name) || len(name) > 15 {
		return errors.New(errors.InvalidInput, "invalid interface name")
	}
	return nil
}

// Helper to validate CIDR
func validateCIDR(cidr string) error {
	bits, err := strconv.Atoi(cidr)
	if err != nil || bits < 0 || bits > 32 {
		return errors.New(errors.InvalidInput, "invalid CIDR prefix length")
	}
	return nil
}
```

Then update each method:

```diff
 func (a *OvsAdapter) CreateVethPair(ctx context.Context, hostEnd, containerEnd string) error {
+	if err := validateInterfaceName(hostEnd); err != nil {
+		return err
+	}
+	if err := validateInterfaceName(containerEnd); err != nil {
+		return err
+	}
 	cmd := exec.CommandContext(ctx, "ip", "link", "add", hostEnd, "type", "veth", "peer", "name", containerEnd)
 	if err := cmd.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to create veth pair", err)
 	}
 	return nil
 }

 func (a *OvsAdapter) DeleteVethPair(ctx context.Context, hostEnd string) error {
+	if err := validateInterfaceName(hostEnd); err != nil {
+		return err
+	}
 	cmd := exec.CommandContext(ctx, "ip", "link", "del", hostEnd)
 	if err := cmd.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to delete veth pair", err)
 	}
 	return nil
 }

 func (a *OvsAdapter) SetVethIP(ctx context.Context, vethEnd, ip, cidr string) error {
+	if err := validateInterfaceName(vethEnd); err != nil {
+		return err
+	}
+	if net.ParseIP(ip) == nil {
+		return errors.New(errors.InvalidInput, "invalid IP address")
+	}
+	if err := validateCIDR(cidr); err != nil {
+		return err
+	}
 	cmd := exec.CommandContext(ctx, "ip", "addr", "add", fmt.Sprintf("%s/%s", ip, cidr), "dev", vethEnd)
 	if err := cmd.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to set veth ip", err)
 	}
 	cmdUp := exec.CommandContext(ctx, "ip", "link", "set", vethEnd, "up")
 	if err := cmdUp.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to bring veth up", err)
 	}
 	return nil
 }
```

Committable suggestion skipped: line range outside the PR's diff.
internal/repositories/postgres/security_group_repo.go-136-158 (1)
`136-158`: **Information disclosure: missing authorization and inconsistent data.** The `ListInstanceGroups` method has two issues:

- Security: It doesn't verify user ownership of the instance, allowing any user to query security groups associated with instances they don't own.
- Consistency: It doesn't load rules for the security groups, unlike `GetByID` and `GetByName`.

🔒 Proposed fixes

```diff
 func (r *SecurityGroupRepository) ListInstanceGroups(ctx context.Context, instanceID uuid.UUID) ([]*domain.SecurityGroup, error) {
+	userID := appcontext.UserIDFromContext(ctx)
 	query := `
 		SELECT sg.id, sg.user_id, sg.vpc_id, sg.name, sg.description, sg.arn, sg.created_at
 		FROM security_groups sg
 		JOIN instance_security_groups isg ON sg.id = isg.group_id
-		WHERE isg.instance_id = $1
+		WHERE isg.instance_id = $1
+		AND EXISTS (SELECT 1 FROM instances WHERE id = $1 AND user_id = $2)
 	`
-	rows, err := r.db.Query(ctx, query, instanceID)
+	rows, err := r.db.Query(ctx, query, instanceID, userID)
 	if err != nil {
-		return nil, err
+		return nil, errors.Wrap(errors.Internal, "failed to list instance groups", err)
 	}
 	defer rows.Close()

 	var groups []*domain.SecurityGroup
 	for rows.Next() {
 		var sg domain.SecurityGroup
 		if err := rows.Scan(&sg.ID, &sg.UserID, &sg.VPCID, &sg.Name, &sg.Description, &sg.ARN, &sg.CreatedAt); err != nil {
-			return nil, err
+			return nil, errors.Wrap(errors.Internal, "failed to scan security group", err)
 		}
+		rules, err := r.getRulesForGroup(ctx, sg.ID)
+		if err != nil {
+			return nil, err
+		}
+		sg.Rules = rules
 		groups = append(groups, &sg)
 	}
 	return groups, nil
 }
```

Committable suggestion skipped: line range outside the PR's diff.
internal/repositories/ovs/adapter.go-52-63 (1)
`52-63`: **Incomplete implementation: the `vxlanID` parameter is accepted but ignored.** The `CreateBridge` method accepts a `vxlanID` parameter as part of the `NetworkBackend` interface contract (defined in `internal/core/ports/network.go:18`) and callers pass this value (e.g., in `internal/core/services/vpc.go:45`), but the implementation ignores it. The bridge is created without any VXLAN configuration, making the parameter useless to callers.

Either implement VXLAN configuration using the `vxlanID` parameter, or clarify with the team whether this feature should be removed from the interface contract.

internal/repositories/ovs/adapter.go-144-170 (1)
`144-170`: **Validate OpenFlow rule parameters.** The `AddFlowRule` and `DeleteFlowRule` methods use user-provided `Match` and `Actions` strings directly in command construction. While OpenFlow syntax is constrained, malicious input could still cause issues or unexpected behavior.

Consider adding basic validation for flow rule parameters:

♻️ Proposed validation

```diff
+var (
+	flowFieldRegex = regexp.MustCompile(`^[a-zA-Z0-9_=:,./\s-]+$`)
+)
+
 func (a *OvsAdapter) AddFlowRule(ctx context.Context, bridge string, rule ports.FlowRule) error {
 	if !bridgeNameRegex.MatchString(bridge) {
 		return errors.New(errors.InvalidInput, "invalid bridge name")
 	}
+
+	// Basic validation of flow rule fields
+	if !flowFieldRegex.MatchString(rule.Match) {
+		return errors.New(errors.InvalidInput, "invalid flow match string")
+	}
+	if !flowFieldRegex.MatchString(rule.Actions) {
+		return errors.New(errors.InvalidInput, "invalid flow actions string")
+	}

 	// ovs-ofctl add-flow <bridge> priority=<p>,<match>,actions=<actions>
 	flowSpec := fmt.Sprintf("priority=%d,%s,actions=%s", rule.Priority, rule.Match, rule.Actions)
 	cmd := exec.CommandContext(ctx, a.ofctlPath, "add-flow", bridge, flowSpec)
 	if err := cmd.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to add flow rule", err)
 	}
 	return nil
 }

 func (a *OvsAdapter) DeleteFlowRule(ctx context.Context, bridge string, match string) error {
 	if !bridgeNameRegex.MatchString(bridge) {
 		return errors.New(errors.InvalidInput, "invalid bridge name")
 	}
+
+	if !flowFieldRegex.MatchString(match) {
+		return errors.New(errors.InvalidInput, "invalid flow match string")
+	}

 	cmd := exec.CommandContext(ctx, a.ofctlPath, "del-flows", bridge, match)
 	if err := cmd.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to delete flow rule", err)
 	}
 	return nil
 }
```

internal/repositories/ovs/adapter.go-118-137 (1)
`118-137`: **Validate remote IP to prevent command injection.** The `CreateVXLANTunnel` method uses `remoteIP` directly in command construction without validation. A malicious or malformed IP could potentially inject additional command arguments.

🔒 Proposed validation

Add this validation at the start of the method:

```diff
+import "net"
+
 func (a *OvsAdapter) CreateVXLANTunnel(ctx context.Context, bridge string, vni int, remoteIP string) error {
 	if !bridgeNameRegex.MatchString(bridge) {
 		return errors.New(errors.InvalidInput, "invalid bridge name")
 	}
+
+	// Validate IP address format
+	if net.ParseIP(remoteIP) == nil {
+		return errors.New(errors.InvalidInput, "invalid remote IP address")
+	}

 	tunnelName := fmt.Sprintf("vxlan-%s", strings.ReplaceAll(remoteIP, ".", "-"))
 	cmd := exec.CommandContext(ctx, a.ovsPath, "add-port", bridge, tunnelName,
 		"--", "set", "interface", tunnelName, "type=vxlan",
 		fmt.Sprintf("options:remote_ip=%s", remoteIP),
 		fmt.Sprintf("options:key=%d", vni),
 	)
 	if err := cmd.Run(); err != nil {
 		return errors.Wrap(errors.Internal, "failed to create vxlan tunnel", err)
 	}
 	return nil
 }
```

Committable suggestion skipped: line range outside the PR's diff.
🟡 Minor comments (13)
.agent/workflows/setup-ovs.md-18-26 (1)
`18-26`: **Incomplete workflow steps.** Steps 3 and 4 contain only comments without actual commands. Consider completing these steps or removing them if they're meant to be manual.

Suggested completion

````diff
 3. Update Docker Compose for OVS support
 ```bash
-# This adds NET_ADMIN capabilities and mounts the OVS socket
+# Verify docker-compose.yml has the required OVS configuration
+grep -A5 "cap_add" docker-compose.yml
 ```
 4. Verify OVS integration via API health check
 ```bash
-# Wait for API to restart and check /health/ovs
+docker-compose restart api
+sleep 5
+curl -s http://localhost:8080/health/ovs
 ```
````

.agent/workflows/setup-ovs.md-10-16 (1)
`10-16`: **Replace hardcoded absolute paths with relative paths.** The path `/home/poyraz/dev/Cloud/scripts/setup-ovs.sh` is user-specific and will not work for other developers. Use relative paths from the repository root instead.

Suggested fix

````diff
 1. Make setup script executable
 ```bash
-chmod +x /home/poyraz/dev/Cloud/scripts/setup-ovs.sh
+chmod +x ./scripts/setup-ovs.sh
 ```
 2. Run the setup script
 ```bash
-/home/poyraz/dev/Cloud/scripts/setup-ovs.sh
+./scripts/setup-ovs.sh
 ```
````

cmd/cloud/instance.go-100-100 (1)

`100-100`: **Add subnet/VPC relationship validation in LaunchInstance.** The service layer is missing a critical check: when both `vpcID` and `subnetID` are provided, the code should validate that the subnet actually belongs to the specified VPC. Currently, it fetches both independently without verifying the relationship (lines 80-105 in `internal/core/services/instance.go`). Add validation after retrieving the subnet to check `subnet.VPCID == *vpcID` and return an error if they don't match. Also verify the three scenarios:

- Subnet specified without VPC (currently allowed; clarify intent)
- VPC specified without subnet (currently allowed; clarify intent)
- Subnet belongs to the specified VPC (missing validation; implement check)

internal/repositories/postgres/migrations/030_create_security_groups.up.sql-21-21 (1)

`21-21`: **CIDR column may be too short for IPv6.** `VARCHAR(18)` accommodates IPv4 CIDR (max 18 chars for `255.255.255.255/32`) but not IPv6 (up to 43 chars for `xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx/128`). Consider using PostgreSQL's native `CIDR` type (as done in `031_create_subnets.up.sql` line 7) for consistency and built-in validation.

🐛 Suggested fix

```diff
-    cidr VARCHAR(18) NOT NULL,
+    cidr CIDR NOT NULL,
```
internal/handlers/instance_handler_test.go-100-100 (1)

`100-100`: **Add a test case for a non-nil `subnetID` parameter.** The handler accepts and processes the `subnetID` parameter (lines 136-144), but test coverage only exercises the nil case. While line 246 tests a non-nil `vpcID`, there's no corresponding test that validates behavior when `subnetID` is provided. Add a test case that passes a non-nil `subnetID` to `LaunchInstance` to ensure the handler correctly propagates subnet information to the service layer.

pkg/sdk/compute.go-44-52 (1)
`44-52`: **Breaking change documentation incomplete.** All SDK consumers in the codebase have been properly updated for the `subnetID` parameter. However, the breaking change is only documented in the API reference; no dedicated migration guide or breaking-change notes exist for external SDK consumers to discover the update.

While the API reference (`docs/api-reference.md`) and CLI reference (`docs/cli-reference.md`) show the new `subnet_id` parameter, adding explicit breaking-change notes to release notes or a migration guide would improve visibility for external SDK consumers.

cmd/cloud/subnet.go-39-49 (1)
`39-49`: **Potential panic if subnet ID is shorter than 8 characters.** The slice expression `s.ID[:8]` will panic with an index-out-of-range error if the ID string is shorter than 8 characters. While UUIDs are typically 36 characters, defensive handling is recommended.

Proposed fix

```diff
 	for _, s := range subnets {
+		displayID := s.ID
+		if len(displayID) > 8 {
+			displayID = displayID[:8]
+		}
 		table.Append([]string{
-			s.ID[:8],
+			displayID,
 			s.Name,
 			s.CIDRBlock,
 			s.AZ,
 			s.GatewayIP,
 			s.Status,
 			s.CreatedAt.Format("2006-01-02 15:04:05"),
 		})
 	}
```

internal/handlers/security_group_handler.go-68-71 (1)
`68-71`: **Silent failure on invalid `vpc_id` could cause unexpected behavior.** If `vpcIDStr` is non-empty but not a valid UUID, the parse error is silently ignored and `vpcID` defaults to `uuid.Nil`. This could lead to confusing behavior. Consider returning an error for malformed input.

🔧 Suggested fix

```diff
 	var vpcID uuid.UUID
 	if vpcIDStr != "" {
-		vpcID, _ = uuid.Parse(vpcIDStr)
+		var err error
+		vpcID, err = uuid.Parse(vpcIDStr)
+		if err != nil {
+			c.JSON(http.StatusBadRequest, gin.H{"error": "invalid vpc_id"})
+			return
+		}
 	}
```

internal/handlers/subnet_handler.go-65-76 (1)
`65-76`: **Name-based lookup cannot succeed with `uuid.Nil` as the VPC context.** When `idStr` is not a valid UUID, `GetSubnet` falls through to `GetByName(ctx, uuid.Nil, idStr)`. The repository's query filters by `vpc_id = $1`, which means it searches for `vpc_id = 00000000-0000-0000-0000-000000000000`. Since no subnet will have that VPC ID, the lookup will always fail with "not found." Either restrict this endpoint to UUID-only identifiers, or refactor `GetByName` to support lookups without VPC context (e.g., by removing the `vpc_id` filter when it is nil).

internal/repositories/postgres/subnet_repo.go-76-88 (1)
`76-88`: **Missing `rows.Err()` check after iteration.** After iterating over rows with `rows.Next()`, you should check `rows.Err()` to catch any errors that occurred during iteration.

🐛 Proposed fix

```diff
 	for rows.Next() {
 		var s domain.Subnet
 		err := rows.Scan(
 			&s.ID, &s.UserID, &s.VPCID, &s.Name, &s.CIDRBlock,
 			&s.AvailabilityZone, &s.GatewayIP, &s.ARN, &s.Status, &s.CreatedAt,
 		)
 		if err != nil {
 			return nil, errors.Wrap(errors.Internal, "failed to scan subnet", err)
 		}
 		subnets = append(subnets, &s)
 	}
+	if err := rows.Err(); err != nil {
+		return nil, errors.Wrap(errors.Internal, "error iterating subnets", err)
+	}
 	return subnets, nil
```

internal/core/services/vpc.go-101-111 (1)
`101-111`: **DeleteVPC: Bridge deletion before DB deletion risks orphaned records.** If `s.repo.Delete` fails after the bridge is already deleted, the VPC record remains in the database but the bridge is gone. Consider reversing the order or implementing a transactional approach.

🔧 Suggested fix: Delete from DB first

```diff
 func (s *VpcService) DeleteVPC(ctx context.Context, idOrName string) error {
 	vpc, err := s.GetVPC(ctx, idOrName)
 	if err != nil {
 		return err
 	}

-	// 1. Remove OVS bridge
-	if err := s.network.DeleteBridge(ctx, vpc.NetworkID); err != nil {
-		s.logger.Error("failed to remove OVS bridge", "bridge", vpc.NetworkID, "error", err)
-		return errors.Wrap(errors.Internal, "failed to remove OVS bridge", err)
-	}
-	s.logger.Info("vpc bridge removed", "bridge", vpc.NetworkID)
-
-	// 2. Delete from DB
+	// 1. Delete from DB first
 	if err := s.repo.Delete(ctx, vpc.ID); err != nil {
 		return errors.Wrap(errors.Internal, "failed to delete VPC from database", err)
 	}
+
+	// 2. Remove OVS bridge (best-effort cleanup)
+	if err := s.network.DeleteBridge(ctx, vpc.NetworkID); err != nil {
+		s.logger.Error("failed to remove OVS bridge", "bridge", vpc.NetworkID, "error", err)
+		// Log but don't fail - DB record is already gone
+	}
+	s.logger.Info("vpc bridge removed", "bridge", vpc.NetworkID)
```

internal/core/services/security_group.go-81-93 (1)
`81-93`: **DeleteGroup doesn't clean up OVS flows.** When deleting a security group, the associated OVS flow rules should also be removed. Currently, only the database record is deleted, leaving orphaned flows in OVS.

🐛 Proposed fix: Clean up flows before deletion

```diff
 func (s *SecurityGroupService) DeleteGroup(ctx context.Context, id uuid.UUID) error {
 	userID := appcontext.UserIDFromContext(ctx)

+	// Get the group to access VPC and rules for cleanup
+	sg, err := s.repo.GetByID(ctx, id)
+	if err != nil {
+		return err
+	}
+
+	// Clean up OVS flows
+	if vpc, err := s.vpcRepo.GetByID(ctx, sg.VPCID); err == nil {
+		for _, rule := range sg.Rules {
+			flow := s.translateToFlow(rule)
+			_ = s.network.DeleteFlowRule(ctx, vpc.NetworkID, flow.Match)
+		}
+	}
+
 	// In a real implementation, we should check if any instances are still attached
 	// For now, we just delete.
 	if err := s.repo.Delete(ctx, id); err != nil {
```

Committable suggestion skipped: line range outside the PR's diff.
internal/core/services/instance.go-486-528 (1)
`486-528`: **IP allocation skips the network address but not the broadcast address.** The `allocateIP` function starts from the network address and increments, but `isValidHostIP` only checks `Contains()`. For subnets larger than /31, the broadcast address should be excluded.

🐛 Proposed fix for isValidHostIP

```diff
 func (s *InstanceService) isValidHostIP(ip net.IP, n *net.IPNet) bool {
-	// Simple check: not network address and not broadcast address (if /30 or larger)
-	// For simplicity in this demo, we just ensure it's in range and not gateway
-	return n.Contains(ip)
+	if !n.Contains(ip) {
+		return false
+	}
+	// Exclude network address
+	if ip.Equal(n.IP) {
+		return false
+	}
+	// Exclude broadcast address for IPv4
+	if len(ip) == net.IPv4len || ip.To4() != nil {
+		ones, bits := n.Mask.Size()
+		if ones < bits { // Not a /32
+			broadcast := make(net.IP, len(n.IP))
+			copy(broadcast, n.IP)
+			for i := range broadcast {
+				broadcast[i] |= ^n.Mask[i]
+			}
+			if ip.Equal(broadcast) {
+				return false
+			}
+		}
+	}
+	return true
 }
```

Committable suggestion skipped: line range outside the PR's diff.
```sql
-- +goose Up
```
|
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# First, check if the file exists and view its content
echo "=== Checking file: internal/repositories/postgres/migrations/003_add_container_id.up.sql ==="
if [ -f "internal/repositories/postgres/migrations/003_add_container_id.up.sql" ]; then
  echo "File found. Content (first 5 lines):"
  head -n 5 "internal/repositories/postgres/migrations/003_add_container_id.up.sql" | cat -n
else
  echo "File NOT found"
fi

echo ""
echo "=== Checking all migration files in the directory ==="
find internal/repositories/postgres/migrations -type f \( -name "*.up.sql" -o -name "*.down.sql" \) 2>/dev/null | sort
```

Repository: PoyrazK/thecloud

Length of output: 4792
🏁 Script executed:

```bash
#!/bin/bash
# Verification script from review comment
echo "=== Checking .up.sql files for '-- +goose Up' directive ==="
fd -e sql -p '\.up\.sql$' internal/repositories/postgres/migrations/ --exec bash -c '
  file="{}";
  if head -n 1 "$file" | grep -q "^-- +goose Up$"; then
    echo "✓ $file"
  else
    echo "✗ MISSING: $file"
  fi
'

echo ""
echo "=== Checking .down.sql files for '-- +goose Down' directive ==="
fd -e sql -p '\.down\.sql$' internal/repositories/postgres/migrations/ --exec bash -c '
  file="{}";
  if head -n 1 "$file" | grep -q "^-- +goose Down$"; then
    echo "✓ $file"
  else
    echo "✗ MISSING: $file"
  fi
'

echo ""
echo "=== Checking for .up.sql files without corresponding .down.sql files ==="
for upfile in internal/repositories/postgres/migrations/*.up.sql; do
  downfile="${upfile%.up.sql}.down.sql"
  if [ ! -f "$downfile" ]; then
    echo "⚠ Missing .down.sql: $upfile"
  fi
done
```

Repository: PoyrazK/thecloud

Length of output: 5304
**Create `.down.sql` migration files with proper `-- +goose Down` directives.**

All `.up.sql` files correctly include `-- +goose Up`, but all 30 corresponding `.down.sql` files are missing the required `-- +goose Down` directive. Additionally, migrations 031 and 032 lack `.down.sql` files entirely. Goose requires both directives for proper migration execution and rollback support.
🤖 Prompt for AI Agents
In @internal/repositories/postgres/migrations/003_add_container_id.up.sql around
lines 1 - 2, The .down.sql counterparts are missing the required Goose directive
and two migrations (031, 032) lack .down.sql files entirely; create a .down.sql
file for each .up.sql (including migrations 031 and 032) named to match their
.up.sql counterparts (e.g., create 003_add_container_id.down.sql) and ensure the
very first non-comment line is the directive "-- +goose Down", then add the
appropriate rollback SQL that reverses the changes made in the corresponding
.up.sql (e.g., DROP COLUMN, DROP TABLE, or REVERT INDEX changes) so each
migration has a proper Up/Down pair for Goose to execute and roll back.
```go
func (r *SecurityGroupRepository) DeleteRule(ctx context.Context, ruleID uuid.UUID) error {
	query := `DELETE FROM security_rules WHERE id = $1`
	_, err := r.db.Exec(ctx, query, ruleID)
	return err
}
```
Authorization bypass: missing user ownership check.
The DeleteRule method does not verify that the rule belongs to a security group owned by the requesting user. This allows any authenticated user to delete security rules by ID, which is a critical security vulnerability.
🔒 Proposed fix to add authorization check

```diff
 func (r *SecurityGroupRepository) DeleteRule(ctx context.Context, ruleID uuid.UUID) error {
-	query := `DELETE FROM security_rules WHERE id = $1`
-	_, err := r.db.Exec(ctx, query, ruleID)
-	return err
+	userID := appcontext.UserIDFromContext(ctx)
+	query := `
+		DELETE FROM security_rules
+		WHERE id = $1
+		AND group_id IN (SELECT id FROM security_groups WHERE user_id = $2)
+	`
+	result, err := r.db.Exec(ctx, query, ruleID, userID)
+	if err != nil {
+		return errors.Wrap(errors.Internal, "failed to delete security rule", err)
+	}
+	if result.RowsAffected() == 0 {
+		return errors.New(errors.NotFound, "security rule not found")
+	}
+	return nil
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
func (r *SecurityGroupRepository) DeleteRule(ctx context.Context, ruleID uuid.UUID) error {
	userID := appcontext.UserIDFromContext(ctx)
	query := `
		DELETE FROM security_rules
		WHERE id = $1
		AND group_id IN (SELECT id FROM security_groups WHERE user_id = $2)
	`
	result, err := r.db.Exec(ctx, query, ruleID, userID)
	if err != nil {
		return errors.Wrap(errors.Internal, "failed to delete security rule", err)
	}
	if result.RowsAffected() == 0 {
		return errors.New(errors.NotFound, "security rule not found")
	}
	return nil
}
```
Note: Docstrings generation - SUCCESS
Actionable comments posted: 9
🤖 Fix all issues with AI agents
In @internal/core/services/security_group.go:
- Around line 81-93: DeleteGroup currently deletes a security group without
verifying no instances are attached; update DeleteGroup to first query for
attached instances and abort deletion if any exist. Add/use a repository method
such as s.repo.CountInstancesByGroup(ctx, id) or
s.instanceRepo.ListBySecurityGroup(ctx, id) to detect attachments, return a
specific validation error (e.g., errors.Conflict or a new
errors.InvalidOperation) with a clear message when count > 0, and only call
s.repo.Delete and audit log when no attached instances are found; ensure the
check uses the same ctx and preserves existing error wrapping semantics.
- Around line 128-144: DetachFromInstance currently only calls
repo.RemoveInstanceFromGroup and does not update OVS flows; modify
DetachFromInstance to mirror AttachToInstance by first removing the instance via
RemoveInstanceFromGroup, then load the affected SecurityGroup with repo.GetByID
(using the same groupID), and finally call s.syncGroupFlows(ctx, sg). Ensure you
propagate and handle errors from RemoveInstanceFromGroup, GetByID, and
syncGroupFlows (returning them as appropriate) so stale OVS flows are removed
when an instance is detached.
- Around line 95-120: Add a rollback when OVS flow sync fails in
SecurityGroupService.AddRule: after calling s.repo.AddRule and before returning,
if s.syncGroupFlows(ctx, sg) returns an error attempt to undo the DB change
(call the repository removal method, e.g., s.repo.RemoveRule(ctx, rule.ID) or
use an explicit DB transaction around s.repo.AddRule so you can rollback) and
only log the sync error if rollback succeeds; ensure auditSvc.Log is only called
after successful sync/commit; handle and return the original sync error
(wrapped) if rollback fails or if sync ultimately fails so callers see the
consistency problem.
- Around line 41-67: In CreateGroup, validate the user ID returned by
appcontext.UserIDFromContext (ensure it's not the zero UUID) and return a proper
error if missing instead of proceeding to create the group; after calling
s.repo.Create keep the existing error wrap behavior, and when calling
s.auditSvc.Log do not discard its error—capture the returned error from
auditSvc.Log and surface it (either log via the service logger on s or return a
wrapped error) so audit failures are observable; reference the CreateGroup
method, appcontext.UserIDFromContext, and s.auditSvc.Log when making these
changes.
- Around line 146-161: syncGroupFlows currently only adds flows and never
removes stale ones, AddRule calls syncGroupFlows with a stale sg, and RemoveRule
is unimplemented; fix by (1) making AddRule fetch and persist the new rule then
re-retrieve the fresh SecurityGroup from the repo (use the same repo method used
elsewhere, e.g., GetByID) before calling syncGroupFlows, (2) update
syncGroupFlows to first enumerate and remove existing flows for that security
group via network.DeleteFlowRule (or a DeleteFlowsByGroup variant) before adding
current rules with network.AddFlowRule, (3) implement RemoveRule to delete the
rule from the DB and then call the fresh-state syncGroupFlows to remove
associated OVS flows, and (4) ensure the OVS adapter’s AddFlowRule is made
idempotent (or include a flow-version or unique flow-id per rule that you use to
delete duplicates) so repeated syncs do not create duplicate flows.
In @internal/repositories/ovs/adapter.go:
- Around line 119-138: CreateVXLANTunnel currently uses remoteIP directly which
can produce invalid/malicious tunnel names; validate remoteIP with net.ParseIP
and return an InvalidInput error if parsing fails, then derive tunnelName from
the parsed IP (use ip.String()) and sanitize by replacing '.' and ':' with '-'
before building the exec.CommandContext args; ensure the error path uses the
same validation error type and keep bridge validation via bridgeNameRegex
unchanged so DeletePort validation remains consistent.
🧹 Nitpick comments (3)
internal/repositories/ovs/adapter.go (2)
`16-18`: **Consider allowing underscores in bridge/port names.** OVS bridge and port names commonly include underscores (e.g., `br_int`, `veth_host`). The current regex `^[a-zA-Z0-9-]+$` excludes them.

🔧 Suggested fix

```diff
 var (
-	bridgeNameRegex = regexp.MustCompile(`^[a-zA-Z0-9-]+$`)
+	bridgeNameRegex = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)
 )
```
`173-176`: **Stub implementation may mislead callers.** `ListFlowRules` returns an empty slice without indicating it's unimplemented. Callers can't distinguish "no rules exist" from "not implemented."

Consider returning a sentinel error or adding structured logging until implemented:

```diff
 func (a *OvsAdapter) ListFlowRules(_ context.Context, _ string) ([]ports.FlowRule, error) {
-	// Parsing ovs-ofctl dump-flows output is complex and will be implemented in a follow-up.
-	return []ports.FlowRule{}, nil
+	// TODO: Implement parsing of ovs-ofctl dump-flows output
+	return nil, errors.New(errors.Internal, "ListFlowRules not yet implemented")
 }
```

Alternatively, if returning an empty slice is acceptable for now, add a log statement for observability.
internal/core/services/security_group.go (1)
`163-182`: **Consider protocol validation and extracting magic strings.** Two refactoring suggestions:

1. Lines 166-173: The protocol switch statement lacks a default case. Unknown protocols are silently ignored, which could lead to unexpected behavior where rules are created but don't generate meaningful flow matches.
2. Line 175: The magic string `"0.0.0.0/0"` should be extracted as a constant for maintainability.

♻️ Suggested improvements

```diff
+const (
+	anyIPv4CIDR = "0.0.0.0/0"
+)
+
 func (s *SecurityGroupService) translateToFlow(rule domain.SecurityRule) ports.FlowRule {
 	matchParts := []string{}

 	switch rule.Protocol {
 	case "tcp":
 		matchParts = append(matchParts, "tcp")
 	case "udp":
 		matchParts = append(matchParts, "udp")
 	case "icmp":
 		matchParts = append(matchParts, "icmp")
+	default:
+		// Log or handle unknown protocol
+		// For now, continue without protocol match (matches all protocols)
 	}

-	if rule.CIDR != "" && rule.CIDR != "0.0.0.0/0" {
+	if rule.CIDR != "" && rule.CIDR != anyIPv4CIDR {
 		if rule.Direction == domain.RuleIngress {
 			matchParts = append(matchParts, fmt.Sprintf("nw_src=%s", rule.CIDR))
 		} else {
 			matchParts = append(matchParts, fmt.Sprintf("nw_dst=%s", rule.CIDR))
 		}
 	}
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- `docs/swagger/docs.go`
- `docs/swagger/swagger.json`
- `docs/swagger/swagger.yaml`
- `internal/core/services/security_group.go`
- `internal/repositories/ovs/adapter.go`
🧰 Additional context used
🧬 Code graph analysis (1)
internal/repositories/ovs/adapter.go (2)
internal/errors/errors.go (4)
- `Type` (8-8)
- `New` (57-59)
- `InvalidInput` (12-12)
- `Wrap` (61-63)

internal/core/ports/network.go (1)

- `FlowRule` (8-13)
🔇 Additional comments (12)
docs/swagger/swagger.yaml (4)
90-99: LGTM - New OVS networking fields for Instance model.

The added fields (`ovs_port`, `private_ip`, `subnet_id`) are correctly typed as strings and align with the OVS networking integration requirements. These are additive, backward-compatible changes to the API response schema.
225-225: LGTM - Permission enum properly updated.

The `vpc:update` permission and corresponding `PermissionVpcUpdate` varname are correctly positioned and follow the established naming convention.

Also applies to: 287-287
572-590: LGTM - VPC model extended with OVS networking fields.

The new fields (`arn`, `cidr_block`, `vxlan_id`) and the description for `network_id` ("OVS bridge name") appropriately document the OVS integration.

Note: The VPC creation endpoint (POST `/vpcs`) currently only accepts `name` in the request body. Verify whether `cidr_block` should be an input parameter for VPC creation, or if it's auto-assigned by the backend.
787-788: LGTM - LaunchRequest extended with subnet_id.

The `subnet_id` field is correctly added as an optional parameter, allowing instances to be launched with or without explicit subnet placement.

docs/swagger/docs.go (1)
1-4: Auto-generated file - changes are consistent with swagger.yaml.

This file is generated by swaggo/swag and should not be manually edited. The changes correctly mirror the swagger.yaml updates for OVS networking integration (Instance fields, VPC fields, Permission enum, LaunchRequest).
docs/swagger/swagger.json (2)
2679-2693: New networking fields align with OVS integration.

The additions to `domain.Instance` (`ovs_port`, `private_ip`, `subnet_id`), `domain.VPC` (`arn`, `cidr_block`, `vxlan_id`), and `httphandlers.LaunchRequest` (`subnet_id`) are consistent with the OVS networking integration. Fields are appropriately typed, and `subnet_id` in LaunchRequest is correctly optional for backward compatibility.

Consider adding format validation constraints in a follow-up for `cidr_block` (CIDR notation) and `private_ip` (IPv4/IPv6 format) to improve API documentation and client-side validation.

Also applies to: 3326-3353, 3647-3649
2873-2873: Permission addition is correct.

The `vpc:update` permission and corresponding `PermissionVpcUpdate` enum variant are properly added and maintain alignment with the existing permission structure.

Also applies to: 2935-2935
internal/repositories/ovs/adapter.go (3)
20-42: Constructor implementation is sound.

Using `exec.LookPath` at construction time to validate binary availability is a good practice. This fails fast if OVS tools are not installed.
44-51: Core OVS operations are well-implemented.

The bridge, port, and flow rule operations follow a consistent pattern with proper:

- Input validation using `bridgeNameRegex`
- Context propagation via `exec.CommandContext`
- Error wrapping with domain-specific error types
- Edge case handling (e.g., empty bridge list in `ListBridges`)

Also applies to: 66-91, 93-117, 145-171, 186-197
53-64: The `vxlanID` parameter is correctly used as metadata and should not be removed.

The `vxlanID` parameter is intentionally accepted, stored in the VPC domain object, persisted to the database, and exposed through the SDK and CLI. It serves as the VXLAN Network Identifier (VNI) metadata for the VPC. VXLAN tunnel configuration is handled separately via the `CreateVXLANTunnel` method, which accepts a `vni` parameter. This is the correct architectural separation: `CreateBridge` establishes the bridge and metadata, while `CreateVXLANTunnel` configures overlay tunnels when needed.

Likely an incorrect or invalid review comment.
internal/core/services/security_group.go (2)
17-39: LGTM!

The service struct and constructor are well-designed with clean dependency injection using interface types for testability.

69-79: LGTM!

The GetGroup method appropriately attempts UUID parsing before falling back to name-based lookup, and ListGroups is a clean pass-through to the repository.
```go
func (s *SecurityGroupService) CreateGroup(ctx context.Context, vpcID uuid.UUID, name, description string) (*domain.SecurityGroup, error) {
	userID := appcontext.UserIDFromContext(ctx)
	sgID := uuid.New()

	arn := fmt.Sprintf("arn:thecloud:vpc:local:%s:security-group/%s", userID.String(), sgID.String())

	sg := &domain.SecurityGroup{
		ID:          sgID,
		UserID:      userID,
		VPCID:       vpcID,
		Name:        name,
		Description: description,
		ARN:         arn,
		CreatedAt:   time.Now(),
	}

	if err := s.repo.Create(ctx, sg); err != nil {
		return nil, errors.Wrap(errors.Internal, "failed to create security group", err)
	}

	_ = s.auditSvc.Log(ctx, userID, "security_group.create", "security_group", sgID.String(), map[string]interface{}{
		"name":   name,
		"vpc_id": vpcID,
	})

	return sg, nil
}
```
Consider validating UserID and handling audit errors.
Two minor concerns:
- Line 42: `UserIDFromContext` is called without checking if a valid user ID exists in the context. If the context lacks a user ID, this could lead to creating security groups with zero-value UUIDs.
- Line 61: Audit logging errors are silently discarded. Consider logging the error for observability.
♻️ Suggested improvements
```diff
 func (s *SecurityGroupService) CreateGroup(ctx context.Context, vpcID uuid.UUID, name, description string) (*domain.SecurityGroup, error) {
 	userID := appcontext.UserIDFromContext(ctx)
+	if userID == uuid.Nil {
+		return nil, errors.New(errors.Unauthorized, "user ID not found in context")
+	}
 	sgID := uuid.New()

 	arn := fmt.Sprintf("arn:thecloud:vpc:local:%s:security-group/%s", userID.String(), sgID.String())

 	sg := &domain.SecurityGroup{
 		ID:          sgID,
 		UserID:      userID,
 		VPCID:       vpcID,
 		Name:        name,
 		Description: description,
 		ARN:         arn,
 		CreatedAt:   time.Now(),
 	}

 	if err := s.repo.Create(ctx, sg); err != nil {
 		return nil, errors.Wrap(errors.Internal, "failed to create security group", err)
 	}

-	_ = s.auditSvc.Log(ctx, userID, "security_group.create", "security_group", sgID.String(), map[string]interface{}{
+	if err := s.auditSvc.Log(ctx, userID, "security_group.create", "security_group", sgID.String(), map[string]interface{}{
 		"name":   name,
 		"vpc_id": vpcID,
-	})
+	}); err != nil {
+		s.logger.Warn("failed to log audit event", "error", err)
+	}

 	return sg, nil
 }
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @internal/core/services/security_group.go around lines 41 - 67, In
CreateGroup, validate the user ID returned by appcontext.UserIDFromContext
(ensure it's not the zero UUID) and return a proper error if missing instead of
proceeding to create the group; after calling s.repo.Create keep the existing
error wrap behavior, and when calling s.auditSvc.Log do not discard its
error—capture the returned error from auditSvc.Log and surface it (either log
via the service logger on s or return a wrapped error) so audit failures are
observable; reference the CreateGroup method, appcontext.UserIDFromContext, and
s.auditSvc.Log when making these changes.
```go
func (s *SecurityGroupService) DeleteGroup(ctx context.Context, id uuid.UUID) error {
	userID := appcontext.UserIDFromContext(ctx)

	// In a real implementation, we should check if any instances are still attached
	// For now, we just delete.
	if err := s.repo.Delete(ctx, id); err != nil {
		return errors.Wrap(errors.Internal, "failed to delete security group", err)
	}

	_ = s.auditSvc.Log(ctx, userID, "security_group.delete", "security_group", id.String(), nil)

	return nil
}
```
Implement instance attachment validation before deletion.
The comment on lines 84-85 acknowledges that the implementation should check if instances are attached before deleting the security group. Without this check:
- Instance-to-group associations in the database could be orphaned
- OVS flows for attached instances may remain active, creating stale security rules
- The system could enter an inconsistent state where instances reference non-existent groups
This validation is essential for maintaining data integrity and correct security enforcement.
Do you want me to generate an implementation that checks for attached instances and prevents deletion if any exist?
🤖 Prompt for AI Agents
In @internal/core/services/security_group.go around lines 81 - 93, DeleteGroup
currently deletes a security group without verifying no instances are attached;
update DeleteGroup to first query for attached instances and abort deletion if
any exist. Add/use a repository method such as s.repo.CountInstancesByGroup(ctx,
id) or s.instanceRepo.ListBySecurityGroup(ctx, id) to detect attachments, return
a specific validation error (e.g., errors.Conflict or a new
errors.InvalidOperation) with a clear message when count > 0, and only call
s.repo.Delete and audit log when no attached instances are found; ensure the
check uses the same ctx and preserves existing error wrapping semantics.
```go
func (s *SecurityGroupService) AddRule(ctx context.Context, groupID uuid.UUID, rule domain.SecurityRule) (*domain.SecurityRule, error) {
	sg, err := s.repo.GetByID(ctx, groupID)
	if err != nil {
		return nil, err
	}

	rule.ID = uuid.New()
	rule.GroupID = groupID
	rule.CreatedAt = time.Now()

	if err := s.repo.AddRule(ctx, &rule); err != nil {
		return nil, errors.Wrap(errors.Internal, "failed to add security rule", err)
	}

	// Update OVS flows
	if err := s.syncGroupFlows(ctx, sg); err != nil {
		s.logger.Error("failed to sync OVS flows", "group_id", groupID, "error", err)
		// We don't rollback DB here in this simple pass, but in real life we should.
	}

	_ = s.auditSvc.Log(ctx, sg.UserID, "security_group.add_rule", "security_group", groupID.String(), map[string]interface{}{
		"rule_id": rule.ID.String(),
	})

	return &rule, nil
}
```
Critical: Handle flow synchronization failures with rollback.
Lines 110-113 identify a critical consistency issue: if OVS flow synchronization fails after the rule is persisted to the database, the system enters an inconsistent state where:
- The rule exists in the database
- The rule is NOT enforced in the network layer
- Security policies appear configured but aren't actually protecting traffic
This is a security risk because administrators may believe traffic is filtered when it isn't.
🔒 Proposed fix with transaction rollback
Consider implementing a rollback mechanism:
```diff
 func (s *SecurityGroupService) AddRule(ctx context.Context, groupID uuid.UUID, rule domain.SecurityRule) (*domain.SecurityRule, error) {
 	sg, err := s.repo.GetByID(ctx, groupID)
 	if err != nil {
 		return nil, err
 	}

 	rule.ID = uuid.New()
 	rule.GroupID = groupID
 	rule.CreatedAt = time.Now()

 	if err := s.repo.AddRule(ctx, &rule); err != nil {
 		return nil, errors.Wrap(errors.Internal, "failed to add security rule", err)
 	}

 	// Update OVS flows
 	if err := s.syncGroupFlows(ctx, sg); err != nil {
-		s.logger.Error("failed to sync OVS flows", "group_id", groupID, "error", err)
-		// We don't rollback DB here in this simple pass, but in real life we should.
+		s.logger.Error("failed to sync OVS flows, rolling back", "group_id", groupID, "error", err)
+		if rollbackErr := s.repo.RemoveRule(ctx, rule.ID); rollbackErr != nil {
+			s.logger.Error("failed to rollback rule after flow sync failure", "rule_id", rule.ID, "error", rollbackErr)
+		}
+		return nil, errors.Wrap(errors.Internal, "failed to sync security group flows", err)
 	}

-	_ = s.auditSvc.Log(ctx, sg.UserID, "security_group.add_rule", "security_group", groupID.String(), map[string]interface{}{
+	if err := s.auditSvc.Log(ctx, sg.UserID, "security_group.add_rule", "security_group", groupID.String(), map[string]interface{}{
 		"rule_id": rule.ID.String(),
-	})
+	}); err != nil {
+		s.logger.Warn("failed to log audit event", "error", err)
+	}

	return &rule, nil
 }
```

Note: This assumes a `RemoveRule` method exists in the repository interface.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @internal/core/services/security_group.go around lines 95 - 120, Add a
rollback when OVS flow sync fails in SecurityGroupService.AddRule: after calling
s.repo.AddRule and before returning, if s.syncGroupFlows(ctx, sg) returns an
error attempt to undo the DB change (call the repository removal method, e.g.,
s.repo.RemoveRule(ctx, rule.ID) or use an explicit DB transaction around
s.repo.AddRule so you can rollback) and only log the sync error if rollback
succeeds; ensure auditSvc.Log is only called after successful sync/commit;
handle and return the original sync error (wrapped) if rollback fails or if sync
ultimately fails so callers see the consistency problem.
```go
func (s *SecurityGroupService) RemoveRule(ctx context.Context, ruleID uuid.UUID) error {
	// 1. Get rule to know groupID
	// Implementation omitted: would need a GetRuleByID in repo
	return nil
}
```
Critical: RemoveRule is a non-functional stub.
The `RemoveRule` method returns `nil` without performing any operation. This means:
- Callers receive a success response when no rule is actually removed
- The rule remains in the database
- OVS flows continue to enforce the rule
- The system state becomes inconsistent with what the API reports
This is a critical incomplete implementation that will cause silent failures.
Would you like me to generate a complete implementation for RemoveRule that removes the rule from the database and syncs OVS flows?
```go
func (s *SecurityGroupService) AttachToInstance(ctx context.Context, instanceID, groupID uuid.UUID) error {
	if err := s.repo.AddInstanceToGroup(ctx, instanceID, groupID); err != nil {
		return err
	}

	sg, err := s.repo.GetByID(ctx, groupID)
	if err != nil {
		return err
	}

	// Update OVS flows
	return s.syncGroupFlows(ctx, sg)
}

func (s *SecurityGroupService) DetachFromInstance(ctx context.Context, instanceID, groupID uuid.UUID) error {
	return s.repo.RemoveInstanceFromGroup(ctx, instanceID, groupID)
}
```
Critical: DetachFromInstance doesn't sync OVS flows.
There's a critical asymmetry between AttachToInstance and DetachFromInstance:
- AttachToInstance (line 139): Syncs OVS flows after attaching
- DetachFromInstance (line 143): Does NOT sync flows after detaching
This means when an instance is detached from a security group, the OVS flows remain active, continuing to enforce rules for an instance that should no longer be subject to them. This creates stale network rules and incorrect security enforcement.
🔒 Proposed fix to sync flows on detach
```diff
 func (s *SecurityGroupService) DetachFromInstance(ctx context.Context, instanceID, groupID uuid.UUID) error {
-	return s.repo.RemoveInstanceFromGroup(ctx, instanceID, groupID)
+	if err := s.repo.RemoveInstanceFromGroup(ctx, instanceID, groupID); err != nil {
+		return err
+	}
+
+	sg, err := s.repo.GetByID(ctx, groupID)
+	if err != nil {
+		return err
+	}
+
+	// Update OVS flows to remove rules for detached instance
+	return s.syncGroupFlows(ctx, sg)
 }
```

Note: Depending on your OVS architecture, you may need to remove specific flows for the detached instance rather than re-syncing all flows. Consider if syncGroupFlows properly handles removal of flows for detached instances.
🤖 Prompt for AI Agents
In @internal/core/services/security_group.go around lines 128 - 144,
DetachFromInstance currently only calls repo.RemoveInstanceFromGroup and does
not update OVS flows; modify DetachFromInstance to mirror AttachToInstance by
first removing the instance via RemoveInstanceFromGroup, then load the affected
SecurityGroup with repo.GetByID (using the same groupID), and finally call
s.syncGroupFlows(ctx, sg). Ensure you propagate and handle errors from
RemoveInstanceFromGroup, GetByID, and syncGroupFlows (returning them as
appropriate) so stale OVS flows are removed when an instance is detached.
```go
func (s *SecurityGroupService) syncGroupFlows(ctx context.Context, sg *domain.SecurityGroup) error {
	vpc, err := s.vpcRepo.GetByID(ctx, sg.VPCID)
	if err != nil {
		return err
	}

	// For each rule, generate an OVS flow
	for _, rule := range sg.Rules {
		flow := s.translateToFlow(rule)
		if err := s.network.AddFlowRule(ctx, vpc.NetworkID, flow); err != nil {
			return err
		}
	}

	return nil
}
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Find the security_group.go file and examine the syncGroupFlows method
git ls-files | grep -E "(security_group|ovs|network)" | head -20
```

Repository: PoyrazK/thecloud

Length of output: 902

🏁 Script executed:

```shell
# Read the security_group.go file to verify the implementation
cat -n internal/core/services/security_group.go | sed -n '140,170p'
```

Length of output: 1105

🏁 Script executed:

```shell
# Search for AddFlowRule implementation
rg "func.*AddFlowRule" --type go -A 10
```

Length of output: 1940

🏁 Script executed:

```shell
# Search for any flow cleanup or removal methods
rg "RemoveFlow|DeleteFlow|ClearFlow" --type go -B 2 -A 5
```

Length of output: 2567

🏁 Script executed:

```shell
# Look for any flow synchronization or versioning mechanisms
rg "flow.*sync|flow.*version|tag.*flow" --type go -i -B 2 -A 5
```

Length of output: 42

🏁 Script executed:

```shell
# Check if there are other methods in SecurityGroupService that handle cleanup
ast-grep --pattern $'type SecurityGroupService struct {
  $$$
}'
```

Length of output: 557

🏁 Script executed:

```shell
# Search for all methods in SecurityGroupService to see the full picture
rg "func \(s \*SecurityGroupService\)" --type go -A 1 | head -30
```

Length of output: 2457

🏁 Script executed:

```shell
# Find all calls to syncGroupFlows
rg "syncGroupFlows" --type go -B 3 -A 3
```

Length of output: 1579

🏁 Script executed:

```shell
# Check the AddRule and RemoveRule implementations to see if they handle flow cleanup
cat -n internal/core/services/security_group.go | sed -n '80,130p'
```

Length of output: 2140

🏁 Script executed:

```shell
# Search for any flow cleanup in the service
rg "DeleteFlowRule" --type go -B 5 -A 5
```

Length of output: 3419

🏁 Script executed:

```shell
# Look at the full AddRule method
ast-grep --pattern $'func (s *SecurityGroupService) AddRule($_) {
  $$$
}'
```

Length of output: 42

🏁 Script executed:

```shell
# Check RemoveRule implementation
cat -n internal/core/services/security_group.go | sed -n '118,140p'
```

Length of output: 774
Major: syncGroupFlows only adds flows without removing stale ones, and AddRule uses stale security group state.
The implementation has multiple critical gaps:
- No flow cleanup: `syncGroupFlows` (lines 146–161) only adds flows via `AddFlowRule` but never removes old flows when rules change or are deleted. The network backend has a `DeleteFlowRule` method available but it's never called, causing flow table accumulation.
- Stale state sync in AddRule: The method fetches the security group at line 96 (before the new rule is persisted), adds the rule to the database at line 105, then calls `syncGroupFlows` with the old `sg` object at line 110. This means the newly added rule won't be synced to OVS on the first call.
- RemoveRule unimplemented: Lines 122–125 show `RemoveRule` is a stub that returns nil, so deleted rules are never removed from OVS either.
- AddFlowRule is not idempotent: The OVS adapter uses `ovs-ofctl add-flow` directly, which creates duplicate flows if called multiple times with the same rule specification.
These issues lead to:
- Flow table pollution and performance degradation
- Incorrect rule enforcement (old rules persist)
- New rules not taking effect immediately
Implement a proper flow synchronization strategy:
- Fetch fresh security group state after persisting changes
- Clear all flows associated with the security group before re-syncing
- Complete the `RemoveRule` implementation with flow cleanup
- Consider idempotency guarantees or flow versioning to prevent duplicates
🤖 Prompt for AI Agents
In @internal/core/services/security_group.go around lines 146 - 161,
syncGroupFlows currently only adds flows and never removes stale ones, AddRule
calls syncGroupFlows with a stale sg, and RemoveRule is unimplemented; fix by
(1) making AddRule fetch and persist the new rule then re-retrieve the fresh
SecurityGroup from the repo (use the same repo method used elsewhere, e.g.,
GetByID) before calling syncGroupFlows, (2) update syncGroupFlows to first
enumerate and remove existing flows for that security group via
network.DeleteFlowRule (or a DeleteFlowsByGroup variant) before adding current
rules with network.AddFlowRule, (3) implement RemoveRule to delete the rule from
the DB and then call the fresh-state syncGroupFlows to remove associated OVS
flows, and (4) ensure the OVS adapter’s AddFlowRule is made idempotent (or
include a flow-version or unique flow-id per rule that you use to delete
duplicates) so repeated syncs do not create duplicate flows.
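One way to make re-sync idempotent is to tag every flow belonging to a group with an OpenFlow cookie and delete by cookie before re-adding, so repeated syncs converge instead of accumulating duplicates. A sketch of the resulting `ovs-ofctl` command sequence; deriving a stable cookie per security group is an assumption, not something the adapter does today:

```go
package main

import "fmt"

// flowSyncCommands builds the ovs-ofctl invocations for a full re-sync of one
// security group's flows: first delete every flow tagged with the group's
// cookie, then re-add the current rules carrying the same cookie.
func flowSyncCommands(bridge string, cookie uint64, flows []string) [][]string {
	cmds := [][]string{
		// In del-flows, "cookie=<value>/-1" matches exactly that cookie value.
		{"ovs-ofctl", "del-flows", bridge, fmt.Sprintf("cookie=0x%x/-1", cookie)},
	}
	for _, f := range flows {
		cmds = append(cmds, []string{
			"ovs-ofctl", "add-flow", bridge,
			fmt.Sprintf("cookie=0x%x,%s", cookie, f),
		})
	}
	return cmds
}

func main() {
	for _, c := range flowSyncCommands("br-vpc1", 0x1a2b, []string{
		"priority=100,tcp,tp_dst=22,actions=normal",
	}) {
		fmt.Println(c)
	}
}
```

The same cookie-based delete also gives `RemoveRule` a cheap cleanup path: delete by cookie, then re-add the group's remaining rules.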
```go
	if rule.PortMin > 0 {
		if rule.PortMin == rule.PortMax {
			switch rule.Protocol {
			case "tcp", "udp":
				matchParts = append(matchParts, fmt.Sprintf("tp_dst=%d", rule.PortMin))
			}
		} else if rule.PortMax > rule.PortMin {
			// OVS doesn't support easy ranges, would need multiple flows or mask
			// Simplified for now
			matchParts = append(matchParts, fmt.Sprintf("tp_dst=%d", rule.PortMin))
		}
	}

	return ports.FlowRule{
		Priority: rule.Priority,
		Match:    strings.Join(matchParts, ","),
		Actions:  "allow", // Simplified
	}
}
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, let's locate and examine the file
find . -name "security_group.go" -type f
```

Length of output: 211

🏁 Script executed:

```shell
# Then read the specific file around lines 183-201
cat -n internal/core/services/security_group.go | sed -n '170,210p'
```

Length of output: 1186

🏁 Script executed:

```shell
# Also check the domain and ports structures referenced
find . -name "*.go" -type f | xargs grep -l "type SecurityRule\|type FlowRule" | head -20
```

Length of output: 161

🏁 Script executed:

```shell
# Look for any tests related to security group rules
find . -name "*security*test*" -o -name "*test*security*" | grep -E "\.go$"
```

Length of output: 42

🏁 Script executed:

```shell
# Read the SecurityRule structure
cat -n internal/core/domain/security_group.go | grep -A 20 "type SecurityRule"
```

Length of output: 760

🏁 Script executed:

```shell
# Read the FlowRule structure
cat -n internal/core/ports/security_group.go | grep -A 20 "type FlowRule"
```

Length of output: 42

🏁 Script executed:

```shell
# Check the full context of the translateToFlow method
cat -n internal/core/services/security_group.go | sed -n '150,210p'
```

Length of output: 1756

🏁 Script executed:

```shell
# Search for usages of translateToFlow or port handling
rg "translateToFlow|PortMax|PortMin" internal/core/services/security_group.go -A 2 -B 2
```

Length of output: 794

🏁 Script executed:

```shell
# Look for any tests or test files related to security groups
find . -path "*/test*" -name "*security*" -type f 2>/dev/null || find . -name "*_test.go" -type f | xargs grep -l "SecurityGroup\|security_group" 2>/dev/null | head -10
```

Length of output: 42

🏁 Script executed:

```shell
# Search for FlowRule definition across the codebase
rg "type FlowRule" -A 10
```

Length of output: 794

🏁 Script executed:

```shell
# Check if there's a deny/block action support anywhere in the codebase
rg "Actions|action" internal/core/ports/ internal/core/services/ | grep -i "deny\|reject\|drop\|block"
```

Length of output: 144

🏁 Script executed:

```shell
# Check if there are any integration tests or examples
find . -type f -name "*.go" | xargs grep -l "translateToFlow\|AddFlowRule" | head -10
```

Length of output: 258

🏁 Script executed:

```shell
# Look for any TODO or FIXME comments related to port ranges or flow actions
rg "TODO|FIXME|XXX" internal/core/services/security_group.go -A 2 -B 2
```

Length of output: 42

🏁 Script executed:

```shell
# Check the test file
cat -n internal/core/services/shared_test.go | head -100
```

Length of output: 3675

🏁 Script executed:

```shell
# Check the OVS adapter implementation
cat -n internal/repositories/ovs/adapter.go | grep -A 20 -B 5 "FlowRule\|AddFlowRule"
```

Length of output: 2551

🏁 Script executed:

```shell
# Check if there are any admission/validation checks on SecurityRule
rg "PortMin|PortMax" internal/core/domain/ internal/core/services/ -B 3 -A 3
```

Length of output: 1843

🏁 Script executed:

```shell
# Look for any comments or documentation about the security group implementation
rg "Simplified|TODO|port range|Actions" internal/core/services/security_group.go
```

Length of output: 119

🏁 Script executed:

```shell
# Check if SecurityRule has any action or allow/deny field
cat -n internal/core/domain/security_group.go | grep -A 15 "type SecurityRule"
```

Length of output: 706

🏁 Script executed:

```shell
# Check the complete SecurityRule and SecurityGroup structures
cat -n internal/core/domain/security_group.go
```

Length of output: 1349

🏁 Script executed:

```shell
# Check if there are any other uses of Actions in the codebase
rg "Actions" --type go | grep -v "test\|Actions string"
```

Length of output: 267
Critical: Port range handling only uses the minimum port, breaking port-based security rules.
The port range logic at lines 189-192 only uses `PortMin` even when a range is specified:

```go
} else if rule.PortMax > rule.PortMin {
	// OVS doesn't support easy ranges, would need multiple flows or mask
	// Simplified for now
	matchParts = append(matchParts, fmt.Sprintf("tp_dst=%d", rule.PortMin))
}
```

This breaks any security rule specifying a port range. For example, a rule allowing ports 80-90 would only match port 80, leaving ports 81-90 unmatched. This is a security vulnerability because administrators believe a range is protected when only a single port actually matches.
Solution: Generate multiple flows (one per port in the range) or implement OpenFlow masking for ranges that align to powers of 2. For large ranges, consider requiring administrators to split them into smaller, more specific rules to avoid generating excessive flows.
Regarding the hardcoded `Actions: "allow"` at line 199: This is intentional rather than a bug. The `SecurityRule` domain model has no field for allow/deny semantics; all rules are implicitly "allow" rules. Supporting deny/drop rules would require extending the domain model first with an `Action` field and updating all related repositories and service layers.
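For ranges, OpenFlow does accept bitwise-masked port matches (`tp_dst=value/mask`), so any range can be covered by a small set of prefix blocks rather than one flow per port. A sketch of the standard decomposition, under the assumption that the backend accepts masked `tp_dst` matches:

```go
package main

import "fmt"

// portMask is one OVS match of the form tp_dst=Port/Mask.
type portMask struct {
	Port uint16
	Mask uint16
}

// expandRange decomposes an inclusive port range into aligned power-of-two
// blocks, each expressible as a single masked tp_dst match.
func expandRange(lo, hi uint16) []portMask {
	if hi < lo {
		return nil
	}
	var out []portMask
	for {
		// Widest power-of-two block that starts at lo and stays within hi.
		size := uint32(1)
		for {
			next := size << 1
			if uint32(lo)&(next-1) != 0 || uint32(lo)+next-1 > uint32(hi) {
				break
			}
			size = next
		}
		out = append(out, portMask{Port: lo, Mask: ^uint16(size - 1)})
		if uint32(lo)+size > uint32(hi) {
			return out // range fully covered (also avoids uint16 wraparound)
		}
		lo += uint16(size)
	}
}

func main() {
	for _, pm := range expandRange(80, 90) {
		fmt.Printf("tp_dst=0x%04x/0x%04x\n", pm.Port, pm.Mask)
	}
}
```

For 80-90 this yields three flows (80-87, 88-89, 90) instead of eleven; the worst case for an arbitrary 16-bit range is around 30 flows, which keeps flow-table growth bounded.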
```go
func (a *OvsAdapter) CreateVXLANTunnel(ctx context.Context, bridge string, vni int, remoteIP string) error {
	if !bridgeNameRegex.MatchString(bridge) {
		return errors.New(errors.InvalidInput, "invalid bridge name")
	}

	tunnelName := fmt.Sprintf("vxlan-%s", strings.ReplaceAll(remoteIP, ".", "-"))
	cmd := exec.CommandContext(ctx, a.ovsPath,
		"add-port", bridge, tunnelName,
		"--", "set", "interface", tunnelName,
		"type=vxlan",
		fmt.Sprintf("options:remote_ip=%s", remoteIP),
		fmt.Sprintf("options:key=%d", vni),
	)

	if err := cmd.Run(); err != nil {
		return errors.Wrap(errors.Internal, "failed to create vxlan tunnel", err)
	}

	return nil
}
```
Missing validation for remoteIP parameter.
The `remoteIP` is used directly in command arguments without validation. While `exec.CommandContext` avoids shell injection, an invalid or malicious IP string could:

- Create improperly named tunnel interfaces
- Cause OVS configuration corruption
- Bypass the tunnel name validation that `DeletePort` would enforce
🔧 Add IP address validation
```diff
+import "net"
+
 func (a *OvsAdapter) CreateVXLANTunnel(ctx context.Context, bridge string, vni int, remoteIP string) error {
 	if !bridgeNameRegex.MatchString(bridge) {
 		return errors.New(errors.InvalidInput, "invalid bridge name")
 	}
+	if ip := net.ParseIP(remoteIP); ip == nil {
+		return errors.New(errors.InvalidInput, "invalid remote IP address")
+	}

 	tunnelName := fmt.Sprintf("vxlan-%s", strings.ReplaceAll(remoteIP, ".", "-"))
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @internal/repositories/ovs/adapter.go around lines 119 - 138,
CreateVXLANTunnel currently uses remoteIP directly which can produce
invalid/malicious tunnel names; validate remoteIP with net.ParseIP and return an
InvalidInput error if parsing fails, then derive tunnelName from the parsed IP
(use ip.String()) and sanitize by replacing '.' and ':' with '-' before building
the exec.CommandContext args; ensure the error path uses the same validation
error type and keep bridge validation via bridgeNameRegex unchanged so
DeletePort validation remains consistent.
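A sketch of the combined validation and name derivation. The `tunnelNameForIP` helper is hypothetical; it parses first, then derives the name from the canonical `ip.String()` form, replacing both `.` and `:` so IPv6 peers also yield valid interface names:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// tunnelNameForIP validates remoteIP and derives a safe OVS port name from
// the canonical IP form. The "vxlan-" prefix mirrors the adapter's existing
// naming convention.
func tunnelNameForIP(remoteIP string) (string, error) {
	ip := net.ParseIP(remoteIP)
	if ip == nil {
		return "", fmt.Errorf("invalid remote IP address: %q", remoteIP)
	}
	// Replace separators that are illegal or confusing in interface names.
	name := strings.NewReplacer(".", "-", ":", "-").Replace(ip.String())
	return "vxlan-" + name, nil
}

func main() {
	fmt.Println(tunnelNameForIP("10.0.0.2"))
	fmt.Println(tunnelNameForIP("fd00::1"))
	fmt.Println(tunnelNameForIP("not-an-ip"))
}
```

One caveat: Linux interface names are capped at 15 characters (IFNAMSIZ), so long IPv6-derived names would still need truncation or hashing before being handed to `ip`/OVS.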
```go
func (a *OvsAdapter) CreateVethPair(ctx context.Context, hostEnd, containerEnd string) error {
	cmd := exec.CommandContext(ctx, "ip", "link", "add", hostEnd, "type", "veth", "peer", "name", containerEnd)
	if err := cmd.Run(); err != nil {
		return errors.Wrap(errors.Internal, "failed to create veth pair", err)
	}
	return nil
}

func (a *OvsAdapter) AttachVethToBridge(ctx context.Context, bridge, vethEnd string) error {
	if err := a.AddPort(ctx, bridge, vethEnd); err != nil {
		return err
	}

	cmd := exec.CommandContext(ctx, "ip", "link", "set", vethEnd, "up")
	if err := cmd.Run(); err != nil {
		return errors.Wrap(errors.Internal, "failed to set veth up", err)
	}

	return nil
}

func (a *OvsAdapter) DeleteVethPair(ctx context.Context, hostEnd string) error {
	cmd := exec.CommandContext(ctx, "ip", "link", "del", hostEnd)
	if err := cmd.Run(); err != nil {
		return errors.Wrap(errors.Internal, "failed to delete veth pair", err)
	}
	return nil
}

func (a *OvsAdapter) SetVethIP(ctx context.Context, vethEnd, ip, cidr string) error {
	cmd := exec.CommandContext(ctx, "ip", "addr", "add", fmt.Sprintf("%s/%s", ip, cidr), "dev", vethEnd)
	if err := cmd.Run(); err != nil {
		return errors.Wrap(errors.Internal, "failed to set veth ip", err)
	}

	cmdUp := exec.CommandContext(ctx, "ip", "link", "set", vethEnd, "up")
	if err := cmdUp.Run(); err != nil {
		return errors.Wrap(errors.Internal, "failed to bring veth up", err)
	}
	return nil
}
```
Missing input validation for veth and IP configuration methods.
`CreateVethPair`, `DeleteVethPair`, and `SetVethIP` pass user-controlled input directly to the `ip` command without validation. While `exec.CommandContext` doesn't use shell interpolation, these methods should validate:

- `hostEnd`/`containerEnd`: interface names (similar to `bridgeNameRegex`)
- `ip`: valid IPv4/IPv6 address
- `cidr`: valid prefix length
This ensures consistency with the validation applied to bridge/port names elsewhere.
🔧 Add validation for veth operations
+import "net"
+
+var ifaceNameRegex = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)
+
func (a *OvsAdapter) CreateVethPair(ctx context.Context, hostEnd, containerEnd string) error {
+ if !ifaceNameRegex.MatchString(hostEnd) || !ifaceNameRegex.MatchString(containerEnd) {
+ return errors.New(errors.InvalidInput, "invalid veth endpoint name")
+ }
cmd := exec.CommandContext(ctx, "ip", "link", "add", hostEnd, "type", "veth", "peer", "name", containerEnd)
if err := cmd.Run(); err != nil {
return errors.Wrap(errors.Internal, "failed to create veth pair", err)
}
return nil
}
func (a *OvsAdapter) DeleteVethPair(ctx context.Context, hostEnd string) error {
+ if !ifaceNameRegex.MatchString(hostEnd) {
+ return errors.New(errors.InvalidInput, "invalid veth endpoint name")
+ }
cmd := exec.CommandContext(ctx, "ip", "link", "del", hostEnd)
func (a *OvsAdapter) SetVethIP(ctx context.Context, vethEnd, ip, cidr string) error {
+ if !ifaceNameRegex.MatchString(vethEnd) {
+ return errors.New(errors.InvalidInput, "invalid veth endpoint name")
+ }
+ if parsedIP := net.ParseIP(ip); parsedIP == nil {
+ return errors.New(errors.InvalidInput, "invalid IP address")
+ }
+ // Optionally validate cidr is a valid prefix length (0-32 for IPv4, 0-128 for IPv6)
 cmd := exec.CommandContext(ctx, "ip", "addr", "add", fmt.Sprintf("%s/%s", ip, cidr), "dev", vethEnd)
Committable suggestion skipped: line range outside the PR's diff.
- Change cleanDB to log and ignore errors instead of failing
- Allows tests to run even if not all migrations have been applied
- Fixes CI integration test failures
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/repositories/libvirt/adapter.go (1)
499-516: Add bounds checking for network pool exhaustion.
getNextNetworkRange() has no error return and doesn't validate whether the allocated network exceeds poolEnd. If the pool is exhausted (after ~101 networks given the default range), subsequent allocations will silently proceed beyond the intended range, potentially creating network conflicts. Consider modifying getNextNetworkRange() to return an error when the pool is exhausted, and handle it here:
🔧 Proposed fix
 func (a *LibvirtAdapter) CreateNetwork(ctx context.Context, name string) (string, error) {
 	// Allocate network range from pool
-	gateway, rangeStart, rangeEnd := a.getNextNetworkRange()
+	gateway, rangeStart, rangeEnd, err := a.getNextNetworkRange()
+	if err != nil {
+		return "", fmt.Errorf("failed to allocate network range: %w", err)
+	}
 	// Simple NAT network
 	xml := generateNetworkXML(name, "virbr-"+name, gateway, rangeStart, rangeEnd)
And update getNextNetworkRange() signature and implementation (see next comment).
🤖 Fix all issues with AI agents
In @cmd/api/main.go:
- Around line 332-342: The security-group routes (sgGroup) only apply
authentication via httputil.Auth(identitySvc) but lack RBAC checks; update the
route group or each handler to enforce the appropriate permission(s) by adding
the Permission middleware (e.g., httputil.Permission(...))—for example apply
VPC-level permissions such as domain.PermissionVpcCreate,
domain.PermissionVpcList, domain.PermissionVpcGet, domain.PermissionVpcDelete,
domain.PermissionVpcUpdate (or the SG-specific permissions if present like
domain.PermissionSgCreate / PermissionSgList / PermissionSgGet /
PermissionSgDelete / PermissionSgUpdate) either as
sgGroup.Use(httputil.Permission(...)) for group-wide rules or by adding the
middleware to individual routes (sgHandler.Create, List, Get, Delete, AddRule,
Attach) so only authorized roles can access each endpoint.
- Around line 136-139: The OVS adapter initialization can return nil and cause
panics because networkBackend is used regardless; either fail fast on
initialization error or supply a safe fallback. Update the block that calls
ovs.NewOvsAdapter so that on error you return/fatalize (e.g., log error and
exit) when OVS is required, or instantiate and pass a NoOpNetworkBackend
implementation instead of leaving networkBackend nil; implement
NoOpNetworkBackend to satisfy the NetworkBackend interface and return clear
errors for all methods, and ensure services and the health endpoint use the
non-nil networkBackend variable.
In @internal/repositories/libvirt/adapter.go:
- Around line 716-749: The getNextNetworkRange function must validate that the
computed network does not exceed the configured pool end before incrementing
networkCounter; modify getNextNetworkRange to compute the candidate baseIP from
a.poolStart + (a.networkCounter * 256), then compare it to a.poolEnd using
bytes.Compare (import "bytes") and if the candidate > a.poolEnd return an error
instead of allocating (so change the signature to return (gateway, rangeStart,
rangeEnd string, err error)), only increment a.networkCounter after the bounds
check passes, and update all callers to handle the new error return.
🧹 Nitpick comments (3)
internal/repositories/libvirt/adapter.go (1)
59-67: Consider making the network pool range configurable.
The hardcoded IP pool (192.168.100.0 to 192.168.200.255) may not suit all deployment environments. Since the PR objectives mention configurable network settings via environment variables (DEFAULT_VPC_CIDR, NETWORK_POOL_START, NETWORK_POOL_END), consider using those here as well.
♻️ Proposed refactor to use environment variables
+import (
+	"os"
+)
+
 func NewLibvirtAdapter(logger *slog.Logger, uri string) (*LibvirtAdapter, error) {
+	poolStartStr := os.Getenv("NETWORK_POOL_START")
+	if poolStartStr == "" {
+		poolStartStr = "192.168.100.0"
+	}
+	poolEndStr := os.Getenv("NETWORK_POOL_END")
+	if poolEndStr == "" {
+		poolEndStr = "192.168.200.255"
+	}
+
+	poolStart := net.ParseIP(poolStartStr)
+	poolEnd := net.ParseIP(poolEndStr)
+	if poolStart == nil || poolEnd == nil {
+		return nil, fmt.Errorf("invalid network pool configuration")
+	}
+
 	return &LibvirtAdapter{
 		conn:           l,
 		logger:         logger,
 		uri:            uri,
 		portMappings:   make(map[string]map[string]int),
 		networkCounter: 0,
-		poolStart:      net.ParseIP("192.168.100.0"),
-		poolEnd:        net.ParseIP("192.168.200.255"),
+		poolStart:      poolStart,
+		poolEnd:        poolEnd,
 	}, nil
 }
internal/platform/config.go (1)
30-32: Consider validating network configuration formats at initialization.
The configuration values are used without format validation. Invalid CIDR blocks or IP addresses will only fail later during VPC or network operations. Adding validation here provides fail-fast behavior and clearer error messages at startup.
♻️ Suggested validation approach
+import (
+	"fmt"
+	"net"
+)
+
 func NewConfig() (*Config, error) {
 	_ = godotenv.Load() // Ignore error if .env doesn't exist

+	defaultCIDR := getEnv("DEFAULT_VPC_CIDR", "10.0.0.0/16")
+	if _, _, err := net.ParseCIDR(defaultCIDR); err != nil {
+		return nil, fmt.Errorf("invalid DEFAULT_VPC_CIDR: %w", err)
+	}
+
+	poolStart := getEnv("NETWORK_POOL_START", "192.168.100.0")
+	if net.ParseIP(poolStart) == nil {
+		return nil, fmt.Errorf("invalid NETWORK_POOL_START: not a valid IP address")
+	}
+
+	poolEnd := getEnv("NETWORK_POOL_END", "192.168.200.255")
+	if net.ParseIP(poolEnd) == nil {
+		return nil, fmt.Errorf("invalid NETWORK_POOL_END: not a valid IP address")
+	}
+
 	return &Config{
 		Port:                 getEnv("PORT", "8080"),
 		DatabaseURL:          getEnv("DATABASE_URL", "postgres://cloud:cloud@localhost:5433/thecloud"),
 		Environment:          getEnv("APP_ENV", "development"),
 		SecretsEncryptionKey: os.Getenv("SECRETS_ENCRYPTION_KEY"),
 		ComputeBackend:       getEnv("COMPUTE_BACKEND", "docker"),
-		DefaultVPCCIDR:       getEnv("DEFAULT_VPC_CIDR", "10.0.0.0/16"),
-		NetworkPoolStart:     getEnv("NETWORK_POOL_START", "192.168.100.0"),
-		NetworkPoolEnd:       getEnv("NETWORK_POOL_END", "192.168.200.255"),
+		DefaultVPCCIDR:       defaultCIDR,
+		NetworkPoolStart:     poolStart,
+		NetworkPoolEnd:       poolEnd,
 	}, nil
 }
internal/core/services/vpc.go (1)
37-40: Validate CIDR format before VPC creation.
The CIDR block is used without format validation. Invalid CIDR notation (e.g., "not-a-cidr", "10.0.0.0/33") would only fail when the VPC is used for subnet allocation or network operations, resulting in unclear error messages.
♻️ Add CIDR validation
+import (
+	"net"
+)
+
 func (s *VpcService) CreateVPC(ctx context.Context, name, cidrBlock string) (*domain.VPC, error) {
 	if cidrBlock == "" {
 		cidrBlock = s.defaultCIDR
 	}
+
+	// Validate CIDR format
+	if _, _, err := net.ParseCIDR(cidrBlock); err != nil {
+		return nil, errors.New(errors.BadRequest, "invalid CIDR block format")
+	}

 	userID := appcontext.UserIDFromContext(ctx)
 	vpcID := uuid.New()
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- .env.example
- cmd/api/main.go
- internal/core/services/vpc.go
- internal/core/services/vpc_test.go
- internal/platform/config.go
- internal/repositories/libvirt/adapter.go
🧰 Additional context used
🧬 Code graph analysis (2)
internal/core/services/vpc.go (5)
internal/core/ports/vpc.go (1)
- VpcRepository (10-16)
internal/core/ports/network.go (1)
- NetworkBackend (16-44)
internal/core/domain/vpc.go (1)
- VPC (9-19)
internal/core/context/context.go (1)
- UserIDFromContext (22-28)
internal/errors/errors.go (3)
- Wrap (61-63)
- Internal (13-13)
- Error (38-43)
cmd/api/main.go (11)
internal/repositories/postgres/security_group_repo.go (1)
- NewSecurityGroupRepository (18-20)
internal/repositories/postgres/subnet_repo.go (1)
- NewSubnetRepository (19-21)
internal/repositories/ovs/adapter.go (1)
- NewOvsAdapter (26-42)
internal/core/services/vpc.go (1)
- NewVpcService (24-35)
internal/core/services/subnet.go (1)
- NewSubnetService (24-31)
internal/core/services/instance.go (1)
- NewInstanceService (35-47)
internal/core/services/security_group.go (1)
- NewSecurityGroupService (25-39)
internal/handlers/security_group_handler.go (1)
- NewSecurityGroupHandler (17-19)
internal/handlers/subnet_handler.go (1)
- NewSubnetHandler (16-18)
internal/core/domain/rbac.go (3)
- Permission (7-7)
- PermissionVpcUpdate (20-20)
- PermissionVpcRead (19-19)
pkg/httputil/auth.go (2)
- Permission (49-66)
- Auth (13-36)
🪛 dotenv-linter (4.0.0)
.env.example
[warning] 25-25: [UnorderedKey] The NETWORK_POOL_END key should go before the NETWORK_POOL_START key
(UnorderedKey)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🔇 Additional comments (10)
internal/repositories/libvirt/adapter.go (1)
34-38: LGTM! Clean field additions for network pool management.
The new fields appropriately support dynamic network allocation with poolStart/poolEnd defining the range and networkCounter tracking the next available /24 subnet.
.env.example (1)
19-28: Network configuration defaults look good.
The default values are appropriate for development. The static analysis warning about key ordering (NETWORK_POOL_END before NETWORK_POOL_START) is a style preference for alphabetical ordering and can be safely ignored.
internal/platform/config.go (1)
16-18: New configuration fields are well-structured.
The addition of DefaultVPCCIDR, NetworkPoolStart, and NetworkPoolEnd aligns with the OVS networking integration objectives.
internal/core/services/vpc_test.go (4)
17-45: Test coverage for VPC creation looks good.
The test properly validates the updated CreateVPC signature with CIDR parameter, NetworkBackend usage, and bridge-based identifiers.
47-66: Rollback test properly validates cleanup behavior.
The test ensures that bridge deletion occurs when database persistence fails, which is critical for avoiding resource leaks.
68-93: Delete test correctly validates bridge cleanup.
95-132: List and Get tests properly updated for new service constructor.
internal/core/services/vpc.go (2)
24-35: Constructor properly handles default CIDR fallback.
The fallback to "10.0.0.0/16" ensures the service always has a valid default when initialization provides an empty string.
100-123: DeleteVPC implementation handles cleanup correctly.
The deletion flow properly removes the OVS bridge before database deletion, with appropriate error handling and logging.
cmd/api/main.go (1)
141-147: Service initialization properly wires network backend and configuration.
The services are correctly initialized with the network backend and default CIDR from configuration, assuming the critical nil pointer issue (lines 136-139) is addressed.
// getNextNetworkRange allocates the next /24 network from the pool
func (a *LibvirtAdapter) getNextNetworkRange() (gateway, rangeStart, rangeEnd string) {
	a.mu.Lock()
	defer a.mu.Unlock()

	// Calculate base IP: poolStart + (networkCounter * 256)
	baseIP := make(net.IP, len(a.poolStart))
	copy(baseIP, a.poolStart)

	// Add offset for this network (each network gets a /24)
	offset := a.networkCounter * 256
	for i := len(baseIP) - 1; i >= 0 && offset > 0; i-- {
		sum := int(baseIP[i]) + offset
		baseIP[i] = byte(sum % 256)
		offset = sum / 256
	}

	a.networkCounter++

	// Gateway is .1, DHCP range is .2 to .254
	gw := make(net.IP, len(baseIP))
	copy(gw, baseIP)
	gw[len(gw)-1] = 1

	start := make(net.IP, len(baseIP))
	copy(start, baseIP)
	start[len(start)-1] = 2

	end := make(net.IP, len(baseIP))
	copy(end, baseIP)
	end[len(end)-1] = 254

	return gw.String(), start.String(), end.String()
}
Critical: Add bounds validation to prevent allocation beyond pool range.
The function increments networkCounter unconditionally without checking if the resulting network exceeds poolEnd. After approximately 101 networks (given the default range 192.168.100.0 to 192.168.200.255), allocations will exceed the intended pool boundary, potentially causing IP conflicts with external networks.
🛡️ Proposed fix with bounds checking
-func (a *LibvirtAdapter) getNextNetworkRange() (gateway, rangeStart, rangeEnd string) {
+func (a *LibvirtAdapter) getNextNetworkRange() (gateway, rangeStart, rangeEnd string, err error) {
a.mu.Lock()
defer a.mu.Unlock()
// Calculate base IP: poolStart + (networkCounter * 256)
baseIP := make(net.IP, len(a.poolStart))
copy(baseIP, a.poolStart)
// Add offset for this network (each network gets a /24)
offset := a.networkCounter * 256
for i := len(baseIP) - 1; i >= 0 && offset > 0; i-- {
sum := int(baseIP[i]) + offset
baseIP[i] = byte(sum % 256)
offset = sum / 256
}
+ // Validate the allocated network is within pool bounds
+ // Check if baseIP + 255 exceeds poolEnd
+ endOfNetwork := make(net.IP, len(baseIP))
+ copy(endOfNetwork, baseIP)
+ endOfNetwork[len(endOfNetwork)-1] = 255
+
+ if bytes.Compare(baseIP, a.poolEnd) > 0 || bytes.Compare(endOfNetwork, a.poolEnd) > 0 {
+ return "", "", "", fmt.Errorf("network pool exhausted: cannot allocate more /24 networks")
+ }
+
a.networkCounter++
// Gateway is .1, DHCP range is .2 to .254
gw := make(net.IP, len(baseIP))
copy(gw, baseIP)
gw[len(gw)-1] = 1
start := make(net.IP, len(baseIP))
copy(start, baseIP)
start[len(start)-1] = 2
end := make(net.IP, len(baseIP))
copy(end, baseIP)
end[len(end)-1] = 254
- return gw.String(), start.String(), end.String()
+ return gw.String(), start.String(), end.String(), nil
 }
Note: You'll need to import "bytes" for the bytes.Compare function.
🤖 Prompt for AI Agents
In @internal/repositories/libvirt/adapter.go around lines 716 - 749, The
getNextNetworkRange function must validate that the computed network does not
exceed the configured pool end before incrementing networkCounter; modify
getNextNetworkRange to compute the candidate baseIP from a.poolStart +
(a.networkCounter * 256), then compare it to a.poolEnd using bytes.Compare
(import "bytes") and if the candidate > a.poolEnd return an error instead of
allocating (so change the signature to return (gateway, rangeStart, rangeEnd
string, err error)), only increment a.networkCounter after the bounds check
passes, and update all callers to handle the new error return.
Actionable comments posted: 0
🧹 Nitpick comments (1)
internal/repositories/postgres/helper_test.go (1)
83-90: Approve with suggestion: Consider checking for specific error codes.
The change successfully makes cleanup resilient to missing tables, fixing the CI integration test failures as intended. However, the implementation ignores all errors rather than specifically checking for the "undefined_table" error (42P01) mentioned in the comment.
♻️ Refactor to check specific PostgreSQL error codes
Consider checking for specific error codes to avoid hiding real errors like permission issues or connection problems:
+import (
+	"errors"
+	"github.com/jackc/pgx/v5/pgconn"
+)
+
 func cleanDB(t *testing.T, db *pgxpool.Pool) {
 	ctx := context.Background()
 	queries := []string{
 		// ... existing queries ...
 	}
 	for _, q := range queries {
 		_, err := db.Exec(ctx, q)
-		// Ignore errors if table doesn't exist (42P01 error code)
-		// This allows tests to run even if not all migrations have been applied
 		if err != nil {
-			t.Logf("Cleanup query failed (ignoring): %s - %v", q, err)
+			var pgErr *pgconn.PgError
+			// Only ignore "undefined_table" errors (42P01) from missing migrations
+			if errors.As(err, &pgErr) && pgErr.Code == "42P01" {
+				t.Logf("Cleanup query skipped (table doesn't exist): %s", q)
+			} else {
+				// Log but still continue for other errors to maintain resilience
+				t.Logf("Cleanup query failed (ignoring): %s - %v", q, err)
+			}
 		}
 	}
 }
This provides better visibility into which errors are expected (missing tables) versus unexpected (permissions, connections).
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal/repositories/postgres/helper_test.go
🔇 Additional comments (1)
internal/repositories/postgres/helper_test.go (1)
28-31: LGTM - Better diagnostics for test skips.
Including the underlying error in the skip message provides helpful diagnostic information when the database is unavailable.
Pull request overview
Copilot reviewed 109 out of 110 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
echo "--- Configuring Permissions ---"
# Allow current user to manage OVS without sudo
USER_NAME=$(whoami)
# Note: On some systems, adding to the 'ovs' or 'openvswitch' group works if it exists.
# Otherwise, we use ACLs or set permissions on the socket.
sudo setfacl -m u:$USER_NAME:rw /var/run/openvswitch/db.sock
The setup-ovs.sh script uses setfacl to grant permissions on the OVS socket but doesn't check if the 'ovs' or 'openvswitch' group exists first. On some systems, adding the user to these groups is the preferred approach. Consider checking for group existence and using that method when available, or document the requirement for ACL support.
// 2. Create OVS bridge
bridgeName := fmt.Sprintf("br-vpc-%s", vpcID.String()[:8])
if err := s.network.CreateBridge(ctx, bridgeName, vxlanID); err != nil {
	return nil, errors.Wrap(errors.Internal, "failed to create OVS bridge", err)
}
The VPC service now uses a NetworkBackend interface but the CreateBridge call doesn't validate if the bridge already exists. If this method is called multiple times with the same bridge name (e.g., during a retry), OVS will fail. Consider checking if the bridge exists before attempting to create it, or handle the error gracefully.
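ovs-vsctl already has a built-in answer for retries: the --may-exist flag makes add-br a no-op when the bridge is already present instead of an error. A minimal sketch of how the adapter could build the invocation; the helper name is ours, not the PR's:

```go
package main

import "fmt"

// createBridgeArgs builds an idempotent ovs-vsctl invocation: with
// --may-exist, add-br succeeds silently when the bridge already exists,
// which makes retried VPC creation safe.
func createBridgeArgs(bridge string) []string {
	return []string{"--may-exist", "add-br", bridge}
}

func main() {
	// The result would be passed to exec.CommandContext(ctx, "ovs-vsctl", args...).
	fmt.Println(createBridgeArgs("br-vpc-1a2b3c4d"))
}
```

The alternative (checking `ovs-vsctl br-exists` first) is racy between check and create, so the flag is the simpler fix.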
func (s *InstanceService) isValidHostIP(ip net.IP, n *net.IPNet) bool {
	// Simple check: not network address and not broadcast address (if /30 or larger)
	// For simplicity in this demo, we just ensure it's in range and not gateway
	return n.Contains(ip)
}
The IP allocation logic doesn't handle the broadcast address. For a /24 subnet, the last IP (e.g., 10.0.1.255) is the broadcast address and should not be allocated. The isValidHostIP function should exclude both the network address and broadcast address from allocation.
func (s *SecurityGroupService) syncGroupFlows(ctx context.Context, sg *domain.SecurityGroup) error {
	vpc, err := s.vpcRepo.GetByID(ctx, sg.VPCID)
	if err != nil {
		return err
	}

	// For each rule, generate an OVS flow
	for _, rule := range sg.Rules {
		flow := s.translateToFlow(rule)
		if err := s.network.AddFlowRule(ctx, vpc.NetworkID, flow); err != nil {
			return err
		}
	}

	return nil
}
The syncGroupFlows function adds flow rules but never removes old ones. If a security rule is updated or removed, the old OpenFlow rules will remain active on the bridge, potentially allowing unwanted traffic. The function should delete existing flows for the group before adding new ones.
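OVS makes this practical if each group's flows carry a distinctive cookie: ovs-ofctl can delete by cookie (`ovs-ofctl del-flows <bridge> "cookie=<value>/-1"`), so a resync becomes delete-then-add. The sketch below shows the pattern against a fake backend; DeleteFlowsByCookie is a hypothetical method, not part of the PR's NetworkBackend interface:

```go
package main

import "fmt"

// FlowRule mirrors the shape of the reviewed ports.FlowRule.
type FlowRule struct {
	Priority int
	Match    string
	Actions  string
}

type netBackend interface {
	DeleteFlowsByCookie(bridge string, cookie uint64) error
	AddFlowRule(bridge string, cookie uint64, f FlowRule) error
}

// syncGroupFlows purges the group's existing flows (scoped by cookie)
// before re-adding, so removed rules actually disappear from the bridge.
func syncGroupFlows(nb netBackend, bridge string, cookie uint64, rules []FlowRule) error {
	if err := nb.DeleteFlowsByCookie(bridge, cookie); err != nil {
		return err
	}
	for _, r := range rules {
		if err := nb.AddFlowRule(bridge, cookie, r); err != nil {
			return err
		}
	}
	return nil
}

// fakeBackend records flows per cookie so the behavior is observable.
type fakeBackend struct{ flows map[uint64][]FlowRule }

func (f *fakeBackend) DeleteFlowsByCookie(_ string, c uint64) error {
	delete(f.flows, c)
	return nil
}

func (f *fakeBackend) AddFlowRule(_ string, c uint64, r FlowRule) error {
	f.flows[c] = append(f.flows[c], r)
	return nil
}

func main() {
	fb := &fakeBackend{flows: map[uint64][]FlowRule{}}
	_ = syncGroupFlows(fb, "br-vpc-1", 0xabc, []FlowRule{
		{100, "tcp,tp_dst=22", "normal"},
		{100, "tcp,tp_dst=80", "normal"},
	})
	// The port-80 rule was removed from the group; resync must drop it.
	_ = syncGroupFlows(fb, "br-vpc-1", 0xabc, []FlowRule{{100, "tcp,tp_dst=22", "normal"}})
	fmt.Println(len(fb.flows[0xabc])) // 1
}
```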
ALTER TABLE instances ADD COLUMN subnet_id UUID REFERENCES subnets(id);
ALTER TABLE instances ADD COLUMN private_ip INET;
ALTER TABLE instances ADD COLUMN ovs_port VARCHAR(64);

CREATE INDEX idx_instances_subnet ON instances(subnet_id);

-- +goose Down
DROP INDEX IF EXISTS idx_instances_subnet;
ALTER TABLE instances DROP COLUMN IF EXISTS subnet_id;
ALTER TABLE instances DROP COLUMN IF EXISTS private_ip;
ALTER TABLE instances DROP COLUMN IF EXISTS ovs_port;
The migration file adds new columns to the instances table but doesn't set default values for existing rows. If there are existing instances in the database, they will have NULL values for subnet_id, private_ip, and ovs_port. Consider adding default values or a data migration step to handle existing instances.
@@ -28,13 +28,17 @@ services:
    condition: service_healthy
  ports:
    - "${PORT:-8080}:${PORT:-8080}"
The Docker Compose file adds NET_ADMIN and NET_RAW capabilities but doesn't document why these are needed or the security implications. These are privileged capabilities that increase the container's attack surface. Add a comment explaining that these are required for OVS networking operations.
Suggested change:
  - "${PORT:-8080}:${PORT:-8080}"
  # NOTE: NET_ADMIN and NET_RAW are privileged capabilities required for
  # Open vSwitch (OVS) networking operations (see /var/run/openvswitch mount).
  # These increase the container's network attack surface and should only be
  # enabled when OVS-based networking is explicitly needed.
// 1. Generate unique VNI (for demo purposes we use a hash based int)
vxlanID := int(vpcID[0]) + 100
The VXLAN ID generation using int(vpcID[0]) + 100 is simplistic and may cause collisions. For example, VPCs with IDs starting with the same byte will get the same VXLAN ID. Consider using a hash function or maintaining a counter/registry to ensure unique VXLAN IDs across VPCs.
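A sketch of the hash-based option: hash the full UUID and mask to the 24-bit VNI space (1..16777215) so all 16 bytes contribute, not just the first. The function name and the reserved-zero handling are illustrative assumptions; a counter/registry allocator would be the stronger guarantee, since any hash can still collide:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// vniFromID derives a VXLAN Network Identifier from a full 16-byte UUID
// by hashing it with FNV-1a and keeping the low 24 bits.
func vniFromID(id [16]byte) int {
	h := fnv.New32a()
	h.Write(id[:])
	vni := int(h.Sum32() & 0xFFFFFF) // VNI is a 24-bit field
	if vni == 0 {
		vni = 1 // VNI 0 is typically reserved
	}
	return vni
}

func main() {
	a := [16]byte{0xde, 0xad, 0xbe, 0xef}
	b := [16]byte{0xde, 0x01, 0x02, 0x03} // same first byte as a
	fmt.Println(vniFromID(a), vniFromID(b))
}
```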
return ports.FlowRule{
	Priority: rule.Priority,
	Match:    strings.Join(matchParts, ","),
	Actions:  "allow", // Simplified
}
The security group flow translation uses "allow" as an action, but this is not a valid OpenFlow action. Standard OVS actions are "output", "drop", "normal", etc. The flows will not work as expected. Consider using "normal" or a specific output port, or document that this is a placeholder for future implementation.
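A sketch of the corrected translation using "normal", which hands matched packets to the bridge's regular MAC-learning pipeline. The Rule struct here is a stand-in for the reviewed domain type, and the exact flow string format is illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a simplified stand-in for a security-group rule.
type Rule struct {
	Priority int
	Protocol string
	Port     int
}

// translateToFlow emits an ovs-ofctl style flow spec with a valid
// action ("normal") rather than the invalid "allow".
func translateToFlow(r Rule) string {
	match := []string{r.Protocol, fmt.Sprintf("tp_dst=%d", r.Port)}
	return fmt.Sprintf("priority=%d,%s,actions=normal", r.Priority, strings.Join(match, ","))
}

func main() {
	fmt.Println(translateToFlow(Rule{Priority: 100, Protocol: "tcp", Port: 22}))
	// priority=100,tcp,tp_dst=22,actions=normal
}
```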
func NewOvsAdapter(logger *slog.Logger) (*OvsAdapter, error) {
	ovsctl, err := exec.LookPath("ovs-vsctl")
	if err != nil {
		return nil, fmt.Errorf("ovs-vsctl not found: %w", err)
	}

	ofctl, err := exec.LookPath("ovs-ofctl")
	if err != nil {
		return nil, fmt.Errorf("ovs-ofctl not found: %w", err)
	}

	return &OvsAdapter{
		ovsPath:   ovsctl,
		ofctlPath: ofctl,
		logger:    logger,
	}, nil
}
The OVS adapter calls exec.LookPath to find ovs-vsctl and ovs-ofctl at initialization. If OVS is installed after the application starts, or if the PATH changes, the adapter will not find the tools. Consider documenting that OVS must be installed before starting the application, or implement periodic health checks that update the paths.
1. Network Backend Nil Safety:
   - Add NoopNetworkAdapter as fallback when OVS fails to initialize
   - Update /health/ovs endpoint with nil checks and degraded status
   - Prevents panics from nil networkBackend
2. Migration Completeness:
   - Add missing .down.sql files for migrations 031 and 032
   - Separate embedded down migration from 032 .up.sql file
   - All migrations now have proper Up/Down pairs for Goose
3. Security Group Ownership Validation:
   - Add ownership checks in AddInstanceToGroup and RemoveInstanceFromGroup
   - Verify both instance and security group belong to caller
   - Use transactions to prevent race conditions
   - Return Forbidden error for unauthorized access attempts
- Wrap deferred tx.Rollback calls in anonymous functions
- Explicitly ignore error with underscore assignment
- Fixes errcheck linter warnings
Pull request overview
Copilot reviewed 112 out of 113 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Fix Postgres INET/CIDR type handling with NULLIF and explicit casting
- Resolve clock skew flakiness in queue repository tests by using DB server time
- Ensure isolation tests pass by allowing NULL network fields in tests
- All integration tests now passing consistently
- Added package documentation comments
- Added struct and method docstrings for VpcService, InstanceService
- Added documentation for InstanceRepository, VpcRepository, and PostgresQueueRepository
- Improved code discoverability for IDEs and godoc
Pull request overview
Copilot reviewed 113 out of 114 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
internal/repositories/postgres/queue_repo.go (3)
70-78: Missing rows.Err() check after iteration.
After the for rows.Next() loop, you should check rows.Err() to catch any errors that occurred during iteration. Without this, iteration errors (e.g., network issues mid-fetch) are silently ignored.
Proposed fix
 for rows.Next() {
 	q := &domain.Queue{}
 	if err := rows.Scan(&q.ID, &q.UserID, &q.Name, &q.ARN, &q.VisibilityTimeout, &q.RetentionDays, &q.MaxMessageSize, &q.Status, &q.CreatedAt, &q.UpdatedAt); err != nil {
 		return nil, err
 	}
 	queues = append(queues, q)
 }
+if err := rows.Err(); err != nil {
+	return nil, err
+}
 return queues, nil
89-108: Clock skew condition is always true (logic bug).
The condition on line 99 checks if m.VisibleAt is within one second of time.Now(), but m.VisibleAt is set to time.Now() just 5 lines earlier (line 94). This means the condition will always be true, making the else branch unreachable dead code.
Additionally, when using NOW(), the returned *domain.Message still contains Go's time.Now() values rather than the actual database server timestamps, which could cause inconsistency.
Suggested simplification (always use DB time)
 func (r *PostgresQueueRepository) SendMessage(ctx context.Context, queueID uuid.UUID, body string) (*domain.Message, error) {
 	m := &domain.Message{
 		ID:      uuid.New(),
 		QueueID: queueID,
 		Body:    body,
-		VisibleAt: time.Now(),
-		CreatedAt: time.Now(),
 	}
-	query := `INSERT INTO queue_messages (id, queue_id, body, visible_at, created_at) VALUES ($1, $2, $3, $4, $5)`
-	var err error
-	if time.Since(m.VisibleAt) < time.Second && time.Until(m.VisibleAt) < time.Second {
-		query = `INSERT INTO queue_messages (id, queue_id, body, visible_at, created_at) VALUES ($1, $2, $3, NOW(), NOW())`
-		_, err = r.db.Exec(ctx, query, m.ID, m.QueueID, m.Body)
-	} else {
-		_, err = r.db.Exec(ctx, query, m.ID, m.QueueID, m.Body, m.VisibleAt, m.CreatedAt)
-	}
+	query := `INSERT INTO queue_messages (id, queue_id, body, visible_at, created_at)
+	          VALUES ($1, $2, $3, NOW(), NOW())
+	          RETURNING visible_at, created_at`
+	err := r.db.QueryRow(ctx, query, m.ID, m.QueueID, m.Body).Scan(&m.VisibleAt, &m.CreatedAt)
 	if err != nil {
 		return nil, err
 	}
 	return m, nil
 }
185-197: GetQueueStats is unused code and not part of the QueueRepository interface.
The method is defined in PostgresQueueRepository and has a corresponding mock, but is never called anywhere in the codebase and is not exposed through the QueueRepository interface (see internal/core/ports/queue.go, lines 16-28).
Either remove this method if it's not needed, or if it's intended for future use, add it to the interface:
GetQueueStats(ctx context.Context, queueID uuid.UUID) (visible int, inFlight int, err error)
🤖 Fix all issues with AI agents
In @internal/core/services/instance.go:
- Around line 146-173: The post-launch OVS wiring silently swallows errors and
calls the host "ip" command directly; update the code that calls
network.CreateVethPair, network.AttachVethToBridge, and network.SetVethIP (the
block guarded by inst.OvsPort and functions referencing s.vpcRepo.GetByID and
s.subnetRepo.GetByID) to check and handle returned errors (log via process
logger and return an error to mark the instance network setup as failed) instead
of using `_ =`. Add a BringUpInterface(ctx, ifaceName) method to the
NetworkBackend interface, implement it in backends, and replace
exec.CommandContext("ip", "link", "set", ...) with network.BringUpInterface(ctx,
vethContainer); ensure all network setup failures cause the instance launch path
to surface the error so instances are not reported running without connectivity.
In @internal/core/services/vpc.go:
- Around line 52-53: The current VXLAN ID generation uses only the first byte of
vpcID (vxlanID := int(vpcID[0]) + 100) which restricts VNI space and risks
collisions; replace it with a proper 24-bit VNI derivation or allocator: derive
vxlanID by hashing more of the UUID (e.g., use the first 3 bytes or a hash like
CRC32/SHA1 and mask/mod to 24 bits) and ensure the result is within the valid
VXLAN range (0..16777215) and offset as needed, or plug in a
counter/registry-based allocator to guarantee uniqueness; update the vxlanID
variable assignment and any docs/comments accordingly.
In @internal/repositories/postgres/security_group_repo.go:
- Around line 224-246: ListInstanceGroups currently returns groups for any
instance ID without enforcing ownership; change the signature to
ListInstanceGroups(ctx context.Context, instanceID uuid.UUID, userID uuid.UUID)
and modify the SQL to enforce ownership by joining the instances table and/or
checking group owner (e.g. add "JOIN instances i ON isg.instance_id = i.id" and
"WHERE isg.instance_id = $1 AND i.user_id = $2 AND sg.user_id = $2" or at
minimum "AND sg.user_id = $2"), pass instanceID and userID into r.db.Query, and
update all callers to supply the caller's userID so only groups owned by that
user (and attached to an instance owned by that user) are returned.
🧹 Nitpick comments (3)
internal/core/services/vpc.go (1)
109-127: DeleteVPC order may leave inconsistent state on DB failure.
If DeleteBridge succeeds but repo.Delete fails, the VPC record remains in DB but the bridge is gone. Consider reversing the order: delete from DB first (or use a transaction), then delete the bridge.
♻️ Proposed fix to improve deletion ordering
 func (s *VpcService) DeleteVPC(ctx context.Context, idOrName string) error {
 	vpc, err := s.GetVPC(ctx, idOrName)
 	if err != nil {
 		return err
 	}
-	// 1. Remove OVS bridge
-	if err := s.network.DeleteBridge(ctx, vpc.NetworkID); err != nil {
-		s.logger.Error("failed to remove OVS bridge", "bridge", vpc.NetworkID, "error", err)
-		return errors.Wrap(errors.Internal, "failed to remove OVS bridge", err)
-	}
-	s.logger.Info("vpc bridge removed", "bridge", vpc.NetworkID)
-
-	// 2. Delete from DB
+	// 1. Delete from DB first (authoritative state)
 	if err := s.repo.Delete(ctx, vpc.ID); err != nil {
 		return errors.Wrap(errors.Internal, "failed to delete VPC from database", err)
 	}
+	// 2. Remove OVS bridge (best effort after DB success)
+	if err := s.network.DeleteBridge(ctx, vpc.NetworkID); err != nil {
+		s.logger.Error("failed to remove OVS bridge after DB deletion", "bridge", vpc.NetworkID, "error", err)
+		// Log but don't fail - DB is source of truth
+	}
+	s.logger.Info("vpc bridge removed", "bridge", vpc.NetworkID)
+
 	_ = s.auditSvc.Log(ctx, vpc.UserID, "vpc.delete", "vpc", vpc.ID.String(), map[string]interface{}{
 		"name": vpc.Name,
 	})
 	return nil
 }
internal/core/services/instance.go (1)
498-540: IP allocation logic has edge cases and potential infinite loop risk.

Issues:

- The loop increments IP bytes, but the overflow handling at lines 523-528 may not correctly handle all cases (e.g., incrementing 255.255.255.255).
- `isValidHostIP` (lines 542-545) only checks `Contains()`, but the comment says it should exclude network and broadcast addresses.
- No timeout/limit on the loop: a full subnet could cause long iteration.
♻️ Proposed fix for isValidHostIP
```diff
 func (s *InstanceService) isValidHostIP(ip net.IP, n *net.IPNet) bool {
-	// Simple check: not network address and not broadcast address (if /30 or larger)
-	// For simplicity in this demo, we just ensure it's in range and not gateway
-	return n.Contains(ip)
+	if !n.Contains(ip) {
+		return false
+	}
+	// Check for network address (all host bits zero)
+	if ip.Equal(n.IP) {
+		return false
+	}
+	// Check for broadcast address (all host bits one) - IPv4 only
+	if ip4 := ip.To4(); ip4 != nil {
+		broadcast := make(net.IP, len(n.IP))
+		for i := range broadcast {
+			broadcast[i] = n.IP[i] | ^n.Mask[i]
+		}
+		if ip.Equal(broadcast) {
+			return false
+		}
+	}
+	return true
 }
```

internal/repositories/postgres/queue_repo.go (1)
134-145: Consider using database time for the visibility timeout calculation.

The visibility timeout is calculated using Go's `time.Now()` (lines 135, 142), while message selection uses the database's `NOW()` (line 124). Clock skew between the application server and database server could cause messages to become visible earlier or later than expected.

For consistency, consider calculating `visible_at` in the UPDATE query:

```diff
-	updateQuery := `UPDATE queue_messages SET receipt_handle = $1, visible_at = $2, received_count = received_count + 1 WHERE id = $3`
+	updateQuery := `UPDATE queue_messages SET receipt_handle = $1, visible_at = NOW() + ($2 || ' seconds')::interval, received_count = received_count + 1 WHERE id = $3`
 	for _, m := range messages {
-		_, err := tx.Exec(ctx, updateQuery, m.ReceiptHandle, m.VisibleAt, m.ID)
+		_, err := tx.Exec(ctx, updateQuery, m.ReceiptHandle, visibilityTimeout, m.ID)
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- cmd/api/main.go
- internal/core/services/instance.go
- internal/core/services/vpc.go
- internal/repositories/noop/network_adapter.go
- internal/repositories/postgres/instance_repo.go
- internal/repositories/postgres/migrations/031_create_subnets.down.sql
- internal/repositories/postgres/migrations/032_add_instance_network_fields.down.sql
- internal/repositories/postgres/migrations/032_add_instance_network_fields.up.sql
- internal/repositories/postgres/queue_repo.go
- internal/repositories/postgres/security_group_repo.go
- internal/repositories/postgres/subnet_repo.go
- internal/repositories/postgres/vpc_repo.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/repositories/postgres/subnet_repo.go
🧰 Additional context used
🧬 Code graph analysis (6)
internal/core/services/instance.go (4)
- internal/core/ports/instance.go (1): `InstanceRepository` (11-19)
- internal/core/ports/compute.go (1): `ComputeBackend` (9-37)
- internal/core/ports/network.go (1): `NetworkBackend` (16-44)
- internal/core/domain/instance.go (1): `Instance` (25-40)

internal/repositories/postgres/instance_repo.go (5)
- internal/core/ports/instance.go (1): `InstanceRepository` (11-19)
- pkg/sdk/compute.go (1): `Instance` (8-21)
- internal/core/domain/instance.go (1): `Instance` (25-40)
- internal/errors/errors.go (1): `Wrap` (61-63)
- internal/core/context/context.go (1): `UserIDFromContext` (22-28)

internal/repositories/postgres/security_group_repo.go (2)
- internal/errors/errors.go (5): `Wrap` (61-63), `Internal` (13-13), `New` (57-59), `NotFound` (11-11), `Forbidden` (16-16)
- internal/core/context/context.go (1): `UserIDFromContext` (22-28)

internal/repositories/noop/network_adapter.go (1)
- internal/core/ports/network.go (1): `FlowRule` (8-13)

internal/repositories/postgres/vpc_repo.go (5)
- internal/core/ports/vpc.go (1): `VpcRepository` (10-16)
- pkg/sdk/vpc.go (1): `VPC` (8-17)
- internal/core/domain/vpc.go (1): `VPC` (9-19)
- internal/errors/errors.go (2): `Wrap` (61-63), `Internal` (13-13)
- internal/core/context/context.go (1): `UserIDFromContext` (22-28)

internal/repositories/postgres/queue_repo.go (1)
- internal/core/ports/queue.go (1): `QueueRepository` (16-28)
🔇 Additional comments (29)
internal/repositories/postgres/migrations/031_create_subnets.down.sql (1)
1-4: LGTM! Clean down-migration following best practices.

The migration correctly reverses the corresponding up-migration by:

- Using the proper `-- +goose Down` marker
- Dropping indices before the table
- Using `IF EXISTS` for idempotent execution

The absence of `CASCADE` on the `DROP TABLE` statement is appropriate, as it provides a safety check: if other tables have foreign keys referencing `subnets`, the migration will fail rather than cascade deletions, forcing explicit dependency handling in the migration sequence.

internal/repositories/postgres/vpc_repo.go (3)
15-36: LGTM! Proper handling of nullable CIDR field.

The Create method correctly uses `NULLIF($4, '')::cidr` to convert empty strings to NULL for PostgreSQL's CIDR type. The docstrings are clear, and parameter ordering is correct.
38-70: LGTM! Consistent user scoping and NULL handling.

Both `GetByID` and `GetByName` properly:

- Filter by authenticated user ID from context
- Use `COALESCE(cidr_block::text, '')` to handle NULL CIDR values
- Cast CIDR to text before applying COALESCE
72-108: LGTM! Complete CRUD operations with proper scoping.

The `List` and `Delete` methods maintain consistency:

- User-scoped filtering throughout
- Proper error handling with NotFound checks
- Consistent use of error wrapping patterns
internal/repositories/noop/network_adapter.go (2)
1-35: LGTM! Clean fallback implementation.

The noop adapter provides a safe fallback when OVS is unavailable:

- Clear package documentation
- Warning logs help identify when the fallback is in use
- Returns safe zero values (an empty slice for `ListBridges`, nil for errors)
36-96: LGTM! Complete NetworkBackend implementation.

All interface methods are properly implemented:

- Consistent warning logs for mutating operations
- Safe return values for queries (empty slices)
- `Ping()` returns nil (healthy) for a noop adapter
- `Type()` returns "noop" for identification

This provides a complete fallback when OVS is unavailable.
internal/repositories/postgres/migrations/032_add_instance_network_fields.down.sql (1)
1-5: LGTM! Proper down migration with idempotency guards.

The down migration correctly:

- Drops operations in reverse order (index first, then columns)
- Uses `IF EXISTS` guards for idempotency
- Matches the up migration changes
internal/repositories/postgres/instance_repo.go (5)
26-40: LGTM! Correct parameter ordering and NULL handling.

The Create method properly:

- Uses `NULLIF($10, '')::inet` to convert empty strings to NULL for the INET type
- Aligns all 14 query parameters with the Exec arguments
- Maintains consistent error handling
42-82: LGTM! Consistent user scoping and NULL handling.

Both `GetByID` and `GetByName`:

- Properly filter by authenticated user ID
- Use `COALESCE` with appropriate type casting for nullable fields
- Match SELECT column order with Scan parameters
84-111: LGTM! Complete list implementation with proper handling.

The List method:

- Applies user-scoped filtering
- Orders by creation time (newest first)
- Handles nullable fields consistently
- Properly iterates rows with deferred close
113-134: LGTM! Proper optimistic locking and parameter ordering.

The Update method correctly:

- Implements optimistic locking via the version field
- Uses `NULLIF($8, '')::inet` for private_ip handling
- Aligns all 12 parameters correctly
- Checks `RowsAffected()` for conflict detection
- Updates the in-memory version and timestamp after success
136-163: LGTM! Secure subnet-scoped query.

The new `ListBySubnet` method properly:

- Filters by both subnet ID and authenticated user ID for security
- Uses consistent NULL handling with COALESCE
- Maintains the same structure as the List method
- Orders results by creation time
internal/repositories/postgres/migrations/032_add_instance_network_fields.up.sql (1)
1-6: No issue found. Migration 031 (031_create_subnets.up.sql) correctly creates the subnets table before migration 032 references it. The foreign key constraint on line 2 safely depends on a table created in the preceding migration.

cmd/api/main.go (2)
30-31: OVS adapter initialization with noop fallback looks good.

The fallback to the noop adapter when OVS initialization fails ensures the API can still start in degraded mode. The warning log provides visibility into the fallback scenario.
Also applies to: 137-144
271-288: OVS health check endpoint is well-structured.

The endpoint correctly handles three states: unavailable (nil backend), degraded (noop), and healthy/unhealthy (OVS). The `Type()` method usage for backend detection is clean.
1-1: VpcService refactored for network backend integration.

The struct and constructor changes properly inject the network backend and default CIDR. The fallback to `"10.0.0.0/16"` is sensible.

Also applies to: 17-25, 27-40
77-84: Good rollback logic for bridge cleanup on DB failure.

The code correctly cleans up the OVS bridge if the database insert fails, preventing orphaned network resources.
internal/core/services/instance.go (2)
23-36: InstanceService updated with subnet and network backend dependencies.

The constructor and struct changes properly integrate the new dependencies for subnet-aware networking.
Also applies to: 38-51
95-111: Pre-launch OVS networking setup is well-structured.

The subnet validation, IP allocation, and veth port naming follow a logical flow. Good use of error wrapping with appropriate error types.
internal/repositories/postgres/security_group_repo.go (4)
14-32: Repository structure and Create method look good.

The struct initialization and Create method properly handle the security group insertion with appropriate error wrapping.
34-76: GetByID and GetByName properly enforce user ownership.

Both methods correctly filter by `user_id` from context and attach rules via the helper method.
124-172: AddInstanceToGroup and RemoveInstanceFromGroup have proper transaction handling and ownership checks.

Both methods correctly:
- Use transactions for atomicity
- Verify both instance and security group ownership
- Return appropriate errors (Forbidden, NotFound)
- Handle deferred rollback safely
Also applies to: 174-222
248-265: getRulesForGroup helper is correctly implemented.

The internal method properly fetches rules ordered by priority. As a private helper called after ownership is already verified, it doesn't need additional access checks.
internal/repositories/postgres/queue_repo.go (6)
15-24: LGTM!

Clean struct definition with proper interface return type from the constructor. Using `pgxpool.Pool` is appropriate for concurrent access.
26-35: LGTM!

Standard parameterized insert with proper field mapping.
37-59: LGTM!

Consistent pattern for retrieving queues with proper user scoping. Returning `(nil, nil)` for not-found is idiomatic for this codebase.
164-174: LGTM!

Proper validation of deletion success with clear error messaging for expired or invalid receipt handles.
176-183: LGTM!

Clean implementation returning the count of purged messages.
81-85: Delete method ownership scoping is intentional and correctly enforced at the service layer.

The `DeleteQueue` service method calls `GetQueue(ctx, id)` before invoking `repo.Delete()`. `GetQueue` extracts the user ID from context and calls `repo.GetByID(ctx, id, userID)`, which includes a `WHERE id = $1 AND user_id = $2` clause. If the queue doesn't belong to the user or doesn't exist, `GetQueue` returns an error and the delete never executes. The repository-level `Delete` method intentionally omits user scoping because it's a low-level internal method only called after ownership has been verified upstream. This is a valid architectural pattern where business-logic validation happens at the service layer.
```go
// Security Group Routes (Protected)
sgGroup := r.Group("/security-groups")
sgGroup.Use(httputil.Auth(identitySvc))
{
	sgGroup.POST("", sgHandler.Create)
	sgGroup.GET("", sgHandler.List)
	sgGroup.GET("/:id", sgHandler.Get)
	sgGroup.DELETE("/:id", sgHandler.Delete)
	sgGroup.POST("/:id/rules", sgHandler.AddRule)
	sgGroup.POST("/attach", sgHandler.Attach)
}
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Search for security group permissions in domain
rg -n "PermissionSg" --type=go
# Also check for pattern of other permission definitions
rg -n "Permission.*=" --type=go | head -50
```

Repository: PoyrazK/thecloud
Length of output: 4356
🏁 Script executed:
```bash
#!/bin/bash
# Read the complete domain/rbac.go file
cat -n internal/core/domain/rbac.go
# Check for security group routes in main.go
echo "=== Checking security group routes ==="
rg -n "security-groups|sgGroup" cmd/api/main.go -A 10 -B 2
# Check for other protected routes to verify the pattern
echo "=== Checking instance routes pattern ==="
rg -n "instances|instanceGroup" cmd/api/main.go -A 10 -B 2 | head -40
```

Repository: PoyrazK/thecloud
Length of output: 8585
🏁 Script executed:
```bash
#!/bin/bash
# Check storage routes more closely
rg -n "storage" cmd/api/main.go -A 10 -B 2 | head -50
# Check if there are any other route groups without Permission middleware
rg -n "\.Use\(httputil\.Auth" cmd/api/main.go | wc -l
rg -n "httputil\.Permission" cmd/api/main.go | wc -l
```

Repository: PoyrazK/thecloud
Length of output: 1658
Add RBAC permission checks to security group routes (and define missing permission constants first).
Security group routes lack `httputil.Permission()` middleware checks, unlike other protected resources (instances, VPCs, etc.). However, the required permission constants (`PermissionSgCreate`, `PermissionSgRead`, `PermissionSgDelete`, `PermissionSgUpdate`) do not exist in internal/core/domain/rbac.go and must be defined before the middleware can be applied.
First, add the missing permission constants to internal/core/domain/rbac.go following the pattern of existing resources:

```go
// Security Group Permissions
PermissionSgCreate Permission = "sg:create"
PermissionSgRead   Permission = "sg:read"
PermissionSgDelete Permission = "sg:delete"
PermissionSgUpdate Permission = "sg:update"
```

Then apply the `httputil.Permission()` middleware to the routes at lines 352–357, similar to the instance routes at lines 317–323.

Note: Storage routes (lines 364–367) and event routes (line 374) have the same issue and should be addressed similarly.
```go
// 4a. OVS Post-launch plumb
if inst.OvsPort != "" && s.network != nil {
	vethContainer := fmt.Sprintf("eth0-%s", inst.ID.String()[:8])
	if err := s.network.CreateVethPair(ctx, inst.OvsPort, vethContainer); err == nil {
		vpc, _ := s.vpcRepo.GetByID(ctx, *inst.VpcID)
		_ = s.network.AttachVethToBridge(ctx, vpc.NetworkID, inst.OvsPort)

		// Set IP on the host simulation "container" end
		if inst.SubnetID != nil {
			subnet, _ := s.subnetRepo.GetByID(ctx, *inst.SubnetID)
			if subnet != nil {
				_, ipNet, _ := net.ParseCIDR(subnet.CIDRBlock)
				ones, _ := ipNet.Mask.Size()
				// In a real cloud, this happens inside the container namespace.
				// For this demo, we do it on the host for the 'peer' end.
				_ = s.network.SetVethIP(ctx, vethContainer, inst.PrivateIP, fmt.Sprintf("%d", ones))

				// Ensure the "container" end is also up on host if simulating
				cmd := exec.CommandContext(ctx, "ip", "link", "set", vethContainer, "up")
				_ = cmd.Run()
			}
		}

		// Note: Moving veth to container namespace requires the container PID,
		// which Docker shim handles in a real cloud. For this local demo,
		// we just create the veth pair on host.
	}
}
```
Post-launch OVS wiring silently ignores errors and uses direct shell execution.
Several issues:

- Errors from `CreateVethPair`, `AttachVethToBridge`, and `SetVethIP` are silently ignored with `_ =` or nested in if-else.
- Lines 164-165 use `exec.CommandContext` directly, which bypasses the network backend abstraction and may fail silently.
- If veth setup fails, the instance appears running but has no network connectivity.
🔧 Proposed improvements
```diff
 // 4a. OVS Post-launch plumb
 if inst.OvsPort != "" && s.network != nil {
 	vethContainer := fmt.Sprintf("eth0-%s", inst.ID.String()[:8])
-	if err := s.network.CreateVethPair(ctx, inst.OvsPort, vethContainer); err == nil {
-		vpc, _ := s.vpcRepo.GetByID(ctx, *inst.VpcID)
-		_ = s.network.AttachVethToBridge(ctx, vpc.NetworkID, inst.OvsPort)
-
-		// Set IP on the host simulation "container" end
-		if inst.SubnetID != nil {
-			subnet, _ := s.subnetRepo.GetByID(ctx, *inst.SubnetID)
-			if subnet != nil {
-				_, ipNet, _ := net.ParseCIDR(subnet.CIDRBlock)
-				ones, _ := ipNet.Mask.Size()
-				// In a real cloud, this happens inside the container namespace.
-				// For this demo, we do it on the host for the 'peer' end.
-				_ = s.network.SetVethIP(ctx, vethContainer, inst.PrivateIP, fmt.Sprintf("%d", ones))
-
-				// Ensure the "container" end is also up on host if simulating
-				cmd := exec.CommandContext(ctx, "ip", "link", "set", vethContainer, "up")
-				_ = cmd.Run()
-			}
-		}
-
-		// Note: Moving veth to container namespace requires the container PID,
-		// which Docker shim handles in a real cloud. For this local demo,
-		// we just create the veth pair on host.
+	if err := s.network.CreateVethPair(ctx, inst.OvsPort, vethContainer); err != nil {
+		s.logger.Warn("failed to create veth pair", "instance_id", inst.ID, "error", err)
+	} else {
+		vpc, err := s.vpcRepo.GetByID(ctx, *inst.VpcID)
+		if err != nil {
+			s.logger.Warn("failed to get VPC for bridge attachment", "vpc_id", inst.VpcID, "error", err)
+		} else if err := s.network.AttachVethToBridge(ctx, vpc.NetworkID, inst.OvsPort); err != nil {
+			s.logger.Warn("failed to attach veth to bridge", "bridge", vpc.NetworkID, "error", err)
+		}
+
+		if inst.SubnetID != nil {
+			subnet, err := s.subnetRepo.GetByID(ctx, *inst.SubnetID)
+			if err != nil {
+				s.logger.Warn("failed to get subnet for IP config", "subnet_id", inst.SubnetID, "error", err)
+			} else if subnet != nil {
+				_, ipNet, err := net.ParseCIDR(subnet.CIDRBlock)
+				if err != nil {
+					s.logger.Warn("failed to parse subnet CIDR", "cidr", subnet.CIDRBlock, "error", err)
+				} else {
+					ones, _ := ipNet.Mask.Size()
+					if err := s.network.SetVethIP(ctx, vethContainer, inst.PrivateIP, fmt.Sprintf("%d", ones)); err != nil {
+						s.logger.Warn("failed to set veth IP", "veth", vethContainer, "ip", inst.PrivateIP, "error", err)
+					}
+				}
+			}
+		}
 	}
 }
```

Consider adding a `BringUpInterface(ctx, ifaceName)` method to the `NetworkBackend` interface to avoid direct `exec.Command` calls.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @internal/core/services/instance.go around lines 146 - 173, The post-launch
OVS wiring silently swallows errors and calls the host "ip" command directly;
update the code that calls network.CreateVethPair, network.AttachVethToBridge,
and network.SetVethIP (the block guarded by inst.OvsPort and functions
referencing s.vpcRepo.GetByID and s.subnetRepo.GetByID) to check and handle
returned errors (log via process logger and return an error to mark the instance
network setup as failed) instead of using `_ =`. Add a BringUpInterface(ctx,
ifaceName) method to the NetworkBackend interface, implement it in backends, and
replace exec.CommandContext("ip", "link", "set", ...) with
network.BringUpInterface(ctx, vethContainer); ensure all network setup failures
cause the instance launch path to surface the error so instances are not
reported running without connectivity.
```go
// 1. Generate unique VNI (for demo purposes we use a hash based int)
vxlanID := int(vpcID[0]) + 100
```
VXLAN ID generation may cause collisions.
Using only the first byte of the UUID (`int(vpcID[0]) + 100`) limits VXLAN IDs to the range 100–355, which can cause collisions with more than 256 VPCs. A VXLAN VNI supports 24-bit values (up to ~16 million).
Consider using more bytes or a counter-based approach for production:
```diff
- // 1. Generate unique VNI (for demo purposes we use a hash based int)
- vxlanID := int(vpcID[0]) + 100
+ // Generate VNI using first 3 bytes of UUID (24-bit range)
+ vxlanID := (int(vpcID[0])<<16 | int(vpcID[1])<<8 | int(vpcID[2])) & 0xFFFFFF
+ if vxlanID < 100 {
+     vxlanID += 100 // Avoid reserved range
+ }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// Generate VNI using first 3 bytes of UUID (24-bit range)
vxlanID := (int(vpcID[0])<<16 | int(vpcID[1])<<8 | int(vpcID[2])) & 0xFFFFFF
if vxlanID < 100 {
	vxlanID += 100 // Avoid reserved range
}
```
🤖 Prompt for AI Agents
In @internal/core/services/vpc.go around lines 52 - 53, The current VXLAN ID
generation uses only the first byte of vpcID (vxlanID := int(vpcID[0]) + 100)
which restricts VNI space and risks collisions; replace it with a proper 24-bit
VNI derivation or allocator: derive vxlanID by hashing more of the UUID (e.g.,
use the first 3 bytes or a hash like CRC32/SHA1 and mask/mod to 24 bits) and
ensure the result is within the valid VXLAN range (0..16777215) and offset as
needed, or plug in a counter/registry-based allocator to guarantee uniqueness;
update the vxlanID variable assignment and any docs/comments accordingly.
```go
func (r *SecurityGroupRepository) DeleteRule(ctx context.Context, ruleID uuid.UUID) error {
	query := `DELETE FROM security_rules WHERE id = $1`
	_, err := r.db.Exec(ctx, query, ruleID)
	return err
}

func (r *SecurityGroupRepository) Delete(ctx context.Context, id uuid.UUID) error {
	userID := appcontext.UserIDFromContext(ctx)
	query := `DELETE FROM security_groups WHERE id = $1 AND user_id = $2`
	_, err := r.db.Exec(ctx, query, id, userID)
	return err
}
```
DeleteRule and Delete lack ownership verification.
`DeleteRule` deletes any rule by ID without verifying the caller owns the parent security group. `Delete` checks `user_id` but doesn't verify the row was actually deleted (it could silently succeed on a non-existent ID).
🔒 Proposed fix for DeleteRule
```diff
 func (r *SecurityGroupRepository) DeleteRule(ctx context.Context, ruleID uuid.UUID) error {
-	query := `DELETE FROM security_rules WHERE id = $1`
-	_, err := r.db.Exec(ctx, query, ruleID)
-	return err
+	userID := appcontext.UserIDFromContext(ctx)
+	query := `
+		DELETE FROM security_rules sr
+		USING security_groups sg
+		WHERE sr.id = $1
+		  AND sr.group_id = sg.id
+		  AND sg.user_id = $2
+	`
+	result, err := r.db.Exec(ctx, query, ruleID, userID)
+	if err != nil {
+		return errors.Wrap(errors.Internal, "failed to delete security rule", err)
+	}
+	if result.RowsAffected() == 0 {
+		return errors.New(errors.NotFound, "security rule not found or access denied")
+	}
+	return nil
 }

 func (r *SecurityGroupRepository) Delete(ctx context.Context, id uuid.UUID) error {
 	userID := appcontext.UserIDFromContext(ctx)
 	query := `DELETE FROM security_groups WHERE id = $1 AND user_id = $2`
-	_, err := r.db.Exec(ctx, query, id, userID)
-	return err
+	result, err := r.db.Exec(ctx, query, id, userID)
+	if err != nil {
+		return errors.Wrap(errors.Internal, "failed to delete security group", err)
+	}
+	if result.RowsAffected() == 0 {
+		return errors.New(errors.NotFound, "security group not found")
+	}
+	return nil
 }
```

```go
func (r *SecurityGroupRepository) ListInstanceGroups(ctx context.Context, instanceID uuid.UUID) ([]*domain.SecurityGroup, error) {
	query := `
		SELECT sg.id, sg.user_id, sg.vpc_id, sg.name, sg.description, sg.arn, sg.created_at
		FROM security_groups sg
		JOIN instance_security_groups isg ON sg.id = isg.group_id
		WHERE isg.instance_id = $1
	`
	rows, err := r.db.Query(ctx, query, instanceID)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var groups []*domain.SecurityGroup
	for rows.Next() {
		var sg domain.SecurityGroup
		if err := rows.Scan(&sg.ID, &sg.UserID, &sg.VPCID, &sg.Name, &sg.Description, &sg.ARN, &sg.CreatedAt); err != nil {
			return nil, err
		}
		groups = append(groups, &sg)
	}
	return groups, nil
}
```
ListInstanceGroups doesn't enforce user ownership.
This method returns all security groups attached to an instance without verifying the caller owns either the instance or the groups. A user could potentially list security groups from another user by providing an instance ID.
🔒 Proposed fix to add ownership check
```diff
 func (r *SecurityGroupRepository) ListInstanceGroups(ctx context.Context, instanceID uuid.UUID) ([]*domain.SecurityGroup, error) {
+	userID := appcontext.UserIDFromContext(ctx)
+
+	// First verify instance ownership
+	var instanceOwner uuid.UUID
+	ownerQuery := `SELECT user_id FROM instances WHERE id = $1`
+	err := r.db.QueryRow(ctx, ownerQuery, instanceID).Scan(&instanceOwner)
+	if err != nil {
+		if err == pgx.ErrNoRows {
+			return nil, errors.New(errors.NotFound, "instance not found")
+		}
+		return nil, errors.Wrap(errors.Internal, "failed to verify instance ownership", err)
+	}
+	if instanceOwner != userID {
+		return nil, errors.New(errors.Forbidden, "you do not own this instance")
+	}
+
 	query := `
 		SELECT sg.id, sg.user_id, sg.vpc_id, sg.name, sg.description, sg.arn, sg.created_at
 		FROM security_groups sg
 		JOIN instance_security_groups isg ON sg.id = isg.group_id
 		WHERE isg.instance_id = $1
 	`
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @internal/repositories/postgres/security_group_repo.go around lines 224 -
246, ListInstanceGroups currently returns groups for any instance ID without
enforcing ownership; change the signature to ListInstanceGroups(ctx
context.Context, instanceID uuid.UUID, userID uuid.UUID) and modify the SQL to
enforce ownership by joining the instances table and/or checking group owner
(e.g. add "JOIN instances i ON isg.instance_id = i.id" and "WHERE
isg.instance_id = $1 AND i.user_id = $2 AND sg.user_id = $2" or at minimum "AND
sg.user_id = $2"), pass instanceID and userID into r.db.Query, and update all
callers to supply the caller's userID so only groups owned by that user (and
attached to an instance owned by that user) are returned.


Integrated Open vSwitch (OVS) as a networking backend for The Cloud. This PR includes:
Summary by CodeRabbit
New Features
Enhancements
Documentation
Chores