Skip to content

feat: Enhanced Instance Provisioning and Resource Limits#51

Merged
poyrazK merged 120 commits intomainfrom
feature/global-load-balancer
Feb 9, 2026
Merged

feat: Enhanced Instance Provisioning and Resource Limits#51
poyrazK merged 120 commits intomainfrom
feature/global-load-balancer

Conversation

@poyrazK
Copy link
Copy Markdown
Owner

@poyrazK poyrazK commented Feb 9, 2026

This PR introduces several enhancements to the instance provisioning logic:

  • Domain Models: Added CPULimit, MemoryLimit, DiskLimit, Env, Cmd, and VolumeBinds to the Instance and ProvisionJob entities.
  • Service Layer: Refactored InstanceService to use a structured LaunchParams object, allowing for more flexible provisioning.
  • Repository Layer: Updated PostgreSQL repository to handle new configuration fields and added a migration.
  • K8s Provisioner: Fixed High Availability (HA) Load Balancer cleanup in the Deprovision flow and resolved mock encryption panics in tests.
  • Docker Adapter: Added shell constant and supported custom commands/env in container creation.
  • Testing: Cleaned up duplicated literals, renamed test functions to follow conventions, and added 082 migration scripts.
  • API: Updated the instances endpoint and synced Swagger documentation.

Copilot AI review requested due to automatic review settings February 9, 2026 19:58
Auth: []ssh.AuthMethod{
ssh.PublicKeys(signer),
},
HostKeyCallback: ssh.InsecureIgnoreHostKey(),

Check failure

Code scanning / CodeQL

Use of insecure HostKeyCallback implementation High

Configuring SSH ClientConfig with insecure HostKeyCallback implementation from
this source
.

Copilot Autofix

AI 3 months ago

In general, the fix is to replace ssh.InsecureIgnoreHostKey() with a HostKeyCallback that validates the server’s host key against an allow list. The most straightforward and robust approach in Go, when you know the expected host key, is to use ssh.FixedHostKey(pubKey), where pubKey is a ssh.PublicKey parsed from a known‑good public key (for example, obtained out‑of‑band and stored alongside or derived from the private key).

For this specific code, we must avoid changing the overall behavior beyond security. The SSHExecutor already receives a private key string key and parses it to a ssh.Signer used for authentication. We can reuse that signer to derive the corresponding public key used by the server, and feed that into ssh.FixedHostKey. This avoids adding configuration fields or changing the NewSSHExecutor signature. Concretely, in Run:

  • After signer, err := ssh.ParsePrivateKey([]byte(e.key)), obtain pubKey := signer.PublicKey().
  • Replace HostKeyCallback: ssh.InsecureIgnoreHostKey(), with HostKeyCallback: ssh.FixedHostKey(pubKey),.

This change remains within internal/repositories/k8s/node_executor.go, uses only already‑imported golang.org/x/crypto/ssh, and doesn’t alter how callers construct SSHExecutor or how commands are executed, other than now failing if the host key doesn’t match the key pair being used.

Suggested changeset 1
internal/repositories/k8s/node_executor.go

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/internal/repositories/k8s/node_executor.go b/internal/repositories/k8s/node_executor.go
--- a/internal/repositories/k8s/node_executor.go
+++ b/internal/repositories/k8s/node_executor.go
@@ -69,12 +69,14 @@
 		return "", fmt.Errorf("failed to parse private key: %w", err)
 	}
 
+	pubKey := signer.PublicKey()
+
 	config := &ssh.ClientConfig{
 		User: e.user,
 		Auth: []ssh.AuthMethod{
 			ssh.PublicKeys(signer),
 		},
-		HostKeyCallback: ssh.InsecureIgnoreHostKey(),
+		HostKeyCallback: ssh.FixedHostKey(pubKey),
 		Timeout:         10 * time.Second,
 	}
 
EOF
@@ -69,12 +69,14 @@
return "", fmt.Errorf("failed to parse private key: %w", err)
}

pubKey := signer.PublicKey()

config := &ssh.ClientConfig{
User: e.user,
Auth: []ssh.AuthMethod{
ssh.PublicKeys(signer),
},
HostKeyCallback: ssh.InsecureIgnoreHostKey(),
HostKeyCallback: ssh.FixedHostKey(pubKey),
Timeout: 10 * time.Second,
}

Copilot is powered by AI and may make mistakes. Always verify output.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 9, 2026

Important

Review skipped

Too many files!

This PR contains 183 files, which is 33 over the limit of 150.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/global-load-balancer

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances instance/cluster provisioning by introducing structured launch parameters, additional instance configuration fields (env/cmd/binds/resource limits/user-data), corresponding DB migrations, and updates across compute backends (Docker/Libvirt), workers, and API/Swagger; it also refactors and parallelizes many tests.

Changes:

  • Introduce ports.LaunchParams / extended CreateInstanceOptions and thread new provisioning fields through handlers, services, workers, and compute backends.
  • Add/adjust Postgres migrations for new instance/cluster/load-balancer schema fields and update Swagger specs accordingly.
  • Refactor and reorganize K8s provisioning (cloud-init templates, SG logic) and broaden test parallelization/cleanup.

Reviewed changes

Copilot reviewed 182 out of 183 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
tests/gateway_e2e_test.go Improves assertions/diagnostics for gateway E2E route creation and response decoding
scripts/test_libvirt.go Updates libvirt adapter call to LaunchInstanceWithOptions
internal/workers/provision_worker_test.go Updates failing backend mock to new compute method name
internal/workers/provision_worker.go Passes full ProvisionJob into instance provisioning
internal/repositories/postgres/vpc_repo_unit_test.go Marks tests/subtests parallel
internal/repositories/postgres/secret_repo_unit_test.go Marks tests/subtests parallel
internal/repositories/postgres/rbac_repo_unit_test.go Marks tests parallel
internal/repositories/postgres/notify_repo_unit_test.go Marks tests parallel
internal/repositories/postgres/migrator_unit_test.go Adds t.Parallel() and whitespace cleanup
internal/repositories/postgres/migrations/082_add_instance_config_fields.up.sql Adds instance config columns (binds/env/cmd/resource limits)
internal/repositories/postgres/migrations/082_add_instance_config_fields.down.sql Drops instance config columns
internal/repositories/postgres/migrations/081_fix_clusters_table.up.sql Adds/migrates missing cluster columns
internal/repositories/postgres/migrations/081_fix_clusters_table.down.sql Drops added cluster columns
internal/repositories/postgres/migrations/080_add_lb_ip.up.sql Adds LB ip column (idempotent)
internal/repositories/postgres/migrations/080_add_lb_ip.down.sql Drops LB ip column
internal/repositories/postgres/migrations/078_add_api_key_expiry.up.sql Makes expires_at column add idempotent
internal/repositories/postgres/migrations/077_add_gateway_methods.up.sql Makes gateway methods/index/constraint creation idempotent
internal/repositories/postgres/migrations/076_add_gateway_pattern_matching.up.sql Makes gateway pattern-matching column adds idempotent
internal/repositories/postgres/migrations/071_add_tenant_id_to_resources.up.sql Simplifies table-existence checks for tenant-id migration
internal/repositories/postgres/migrations/070_create_tenants.up.sql Makes default tenant-id column adds idempotent
internal/repositories/postgres/migrations/060_bucket_encryption.up.sql Makes encryption columns idempotent
internal/repositories/postgres/migrations/058_add_object_versioning.up.sql Makes object versioning columns idempotent; drops alt constraint defensively
internal/repositories/postgres/migrations/058_add_object_versioning.down.sql Makes constraint re-add defensive in down migration
internal/repositories/postgres/migrations/054_extend_clusters_v2_2.up.sql Makes cluster/LB column adds idempotent
internal/repositories/postgres/migrations/054_extend_clusters_v2_2.down.sql Makes cluster/LB column drops idempotent
internal/repositories/postgres/migrations/053_add_cluster_network_isolation.up.sql Makes network_isolation column add idempotent
internal/repositories/postgres/migrations/053_add_cluster_network_isolation.down.sql Makes network_isolation column drop idempotent
internal/repositories/postgres/migrations/036_add_volume_backend_path.up.sql Makes backend_path column add idempotent
internal/repositories/postgres/migrations/032_add_instance_network_fields.up.sql Makes instance network column adds idempotent
internal/repositories/postgres/migrations/020_create_clusters.up.sql Adds new “fresh install” clusters/cluster_nodes schema
internal/repositories/postgres/migrations/020_create_clusters.down.sql Drops clusters/cluster_nodes tables
internal/repositories/postgres/migrations/014_resource_scoping.up.sql Makes user_id column adds idempotent
internal/repositories/postgres/migrations/013_create_users_table.up.sql Makes api_keys FK creation defensive
internal/repositories/postgres/migrations/012_add_failure_tracking.up.sql Makes failure-tracking columns idempotent; cleans formatting
internal/repositories/postgres/migrations/006_create_vpcs_table.up.sql Makes instances.vpc_id add idempotent
internal/repositories/postgres/migrations/003_add_container_id.up.sql Makes container_id add idempotent
internal/repositories/postgres/migrations/003_add_container_id.down.sql Makes container_id drop idempotent
internal/repositories/postgres/lifecycle_repo_unit_test.go Marks tests/subtests parallel
internal/repositories/postgres/image_repo_unit_test.go Marks tests parallel
internal/repositories/postgres/identity_repo_unit_test.go Marks tests parallel
internal/repositories/postgres/helper_test.go Adds placeholder test file for package
internal/repositories/postgres/function_repo_unit_test.go Marks tests parallel and formats literals
internal/repositories/postgres/encryption_repo_unit_test.go Marks tests/subtests parallel
internal/repositories/postgres/database_repo_unit_test.go Marks tests parallel
internal/repositories/postgres/cron_repo_unit_test.go Marks tests parallel
internal/repositories/postgres/container_repo_unit_test.go Marks tests/subtests parallel
internal/repositories/postgres/cluster_repo_unit_test.go Updates cluster repo expectations + parallelizes subtests
internal/repositories/postgres/accounting_repo_unit_test.go Marks tests parallel
internal/repositories/noop/adapters_test.go Updates method name to LaunchInstanceWithOptions
internal/repositories/libvirt/templates_test.go Adds integration build tag + parallelizes
internal/repositories/libvirt/mock_client_test.go Adds integration build tag
internal/repositories/libvirt/adapter_volume_test.go Adds integration build tag + parallelizes
internal/repositories/libvirt/adapter_test.go Adds integration build tag + parallelizes
internal/repositories/libvirt/adapter_network_test.go Adds integration build tag + parallelizes
internal/repositories/libvirt/adapter_lifecycle_test.go Adds integration build tag + parallelizes; updates method name
internal/repositories/libvirt/adapter_context_test.go Adds integration build tag + parallelizes; updates method name
internal/repositories/libvirt/adapter.go Renames create method, adds StartInstance, extends cloud-init generation with user-data passthrough
internal/repositories/k8s/templates/worker.yaml Adds cloud-init template for worker bootstrap/join
internal/repositories/k8s/templates/control_plane.yaml Adds cloud-init template for control-plane bootstrap/init
internal/repositories/k8s/security_test.go Deleted (k8s SG/base-security tests removed)
internal/repositories/k8s/security.go Refactors SG naming/rules and creation logic
internal/repositories/k8s/provisioner_all_test.go Deleted (large K8s provisioner test removed)
internal/repositories/k8s/node_executor_test.go Normalizes shell invocation in tests
internal/repositories/k8s/mocks_test.go Adds local mocks for k8s package tests
internal/repositories/k8s/mock_provisioner.go Deleted (old mock provisioner removed)
internal/repositories/k8s/kubeadm.go Deleted (old kubeadm helpers removed)
internal/repositories/k8s/health_test.go Updates mocks/constants; switches kubeconfig retrieval to secret decryption
internal/repositories/k8s/ha_test.go Deleted (HA-related tests removed)
internal/repositories/k8s/export_test.go Deleted (export helpers removed)
internal/repositories/k8s/cni_test.go Deleted (CNI tests removed)
internal/repositories/k8s/cni.go Deleted (CNI logic removed)
internal/repositories/docker/stats_test.go Updates adapter ctor signature + method name changes
internal/repositories/docker/fakes_test.go Adds call counting + mutex for thread-safe fake assertions
internal/repositories/docker/adapter_unit_test.go Updates method names/ctor signature; adds UserData unit test
internal/repositories/docker/adapter_test.go Updates ctor signature/method name; adds integration UserData bootstrap check
internal/handlers/ws/ws_test.go Parallelizes WS tests
internal/handlers/ws/hub_test.go Parallelizes hub tests
internal/handlers/ws/client_test.go Parallelizes client tests
internal/handlers/vpc_handler_test.go Parallelizes handler tests
internal/handlers/volume_handler_test.go Parallelizes handler tests
internal/handlers/tenant_handler_test.go Parallelizes handler tests
internal/handlers/subnet_handler_test.go Parallelizes handler tests
internal/handlers/storage_handler_errors_test.go Parallelizes test + subtests
internal/handlers/stack_handler_test.go Parallelizes handler tests
internal/handlers/snapshot_handler_test.go Parallelizes handler tests
internal/handlers/security_group_handler_test.go Parallelizes handler tests
internal/handlers/secret_handler_test.go Parallelizes handler tests
internal/handlers/rbac_handler_test.go Parallelizes handler tests
internal/handlers/queue_handler_test.go Parallelizes handler tests
internal/handlers/notify_handler_test.go Re-formats constants; parallelizes tests/subtests
internal/handlers/lifecycle_handler_test.go Parallelizes handler tests
internal/handlers/lb_handler_test.go Parallelizes handler tests
internal/handlers/instance_type_handler_test.go Parallelizes handler test
internal/handlers/instance_handler.go Extends launch request fields and uses ports.LaunchParams
internal/handlers/image_handler_test.go Parallelizes handler tests
internal/handlers/identity_handler_test.go Parallelizes handler tests
internal/handlers/helper_test.go Parallelizes helper tests
internal/handlers/health_handler_test.go Parallelizes health handler tests
internal/handlers/global_lb_handler_test.go Parallelizes GLB handler tests
internal/handlers/gateway_handler_test.go Parallelizes gateway handler tests
internal/handlers/function_handler_test.go Re-formats constants; parallelizes tests/subtests
internal/handlers/event_handler_test.go Parallelizes event handler test
internal/handlers/dns_handler_test.go Parallelizes DNS handler tests
internal/handlers/database_handler_test.go Re-formats constants; parallelizes tests/subtests
internal/handlers/dashboard_handler_test.go Parallelizes dashboard handler tests
internal/handlers/cron_handler_test.go Re-formats constants; parallelizes tests/subtests
internal/handlers/container_handler_test.go Parallelizes container handler tests/subtests
internal/handlers/cluster_handler_test.go Parallelizes cluster handler tests
internal/handlers/cache_handler_test.go Parallelizes cache handler tests
internal/handlers/autoscaling_handler_test.go Parallelizes autoscaling handler tests/subtests
internal/handlers/auth_handler_test.go Parallelizes auth handler tests
internal/handlers/audit_handler_test.go Parallelizes audit handler test
internal/handlers/accounting_handler_test.go Parallelizes accounting handler tests
internal/core/services/volume_test.go Updates service calls to LaunchParams / ProvisionJob
internal/core/services/stack_test.go Parallelizes stack service tests
internal/core/services/stack.go Updates instance launch to LaunchParams
internal/core/services/shared_test.go Updates instance service/compute backend mocks to new signatures
internal/core/services/scale_test.go Updates benchmark launches to LaunchParams
internal/core/services/rbac_cached_test.go Parallelizes cached RBAC tests
internal/core/services/loadbalancer_service_test.go Parallelizes LB service tests
internal/core/services/lifecycle_svc_test.go Parallelizes lifecycle service tests
internal/core/services/lb_worker_test.go Parallelizes LB worker tests
internal/core/services/instance_ports_test.go Parallelizes instance ports tests
internal/core/services/instance_internal_test.go Parallelizes instance internal tests
internal/core/services/health_test.go Updates Docker adapter ctor signature
internal/core/services/global_lb_test.go Parallelizes GLB service tests
internal/core/services/function_test.go Updates Docker adapter ctor signature
internal/core/services/database_test.go Updates Docker adapter ctor signature
internal/core/services/database.go Updates compute call to LaunchInstanceWithOptions
internal/core/services/dashboard_test.go Parallelizes dashboard service tests
internal/core/services/cron_worker_test.go Parallelizes cron worker tests
internal/core/services/container_worker_test.go Parallelizes container worker tests
internal/core/services/container_worker.go Updates instance launch to LaunchParams
internal/core/services/container_test.go Updates Provision invocation to ProvisionJob
internal/core/services/cluster_test.go Parallelizes cluster service tests; updates kubeconfig field name
internal/core/services/cluster_lifecycle_test.go Parallelizes cluster lifecycle tests
internal/core/services/cluster.go Adds Pod/Service CIDR defaults; migrates to encrypted kubeconfig field
internal/core/services/cached_identity_test.go Parallelizes cached identity tests
internal/core/services/cache_test.go Updates Docker adapter ctor signature
internal/core/services/cache.go Updates compute call to LaunchInstanceWithOptions
internal/core/services/benchmarks_test.go Updates benchmarks to LaunchParams
internal/core/services/autoscaling_worker_test.go Parallelizes autoscaling worker tests
internal/core/services/autoscaling_worker.go Updates instance launch to LaunchParams
internal/core/services/autoscaling_test.go Updates Provision invocation to ProvisionJob
internal/core/ports/mocks/common_mocks.go Adds SecretService mock
internal/core/ports/instance.go Introduces LaunchParams; expands InstanceService interface
internal/core/ports/compute_options.go Adds UserData to instance create options
internal/core/ports/compute.go Renames compute create method; adds StartInstance
internal/core/ports/cluster.go Adds PodCIDR/ServiceCIDR to cluster create params
internal/core/domain/ws_event_test.go Parallelizes domain tests
internal/core/domain/security_group_test.go Parallelizes domain tests
internal/core/domain/jobs.go Extends ProvisionJob with provisioning config fields
internal/core/domain/instance.go Extends Instance model with binds/env/cmd/resource limits
internal/core/domain/domain_test.go Parallelizes domain constants test
internal/core/domain/dns_test.go Parallelizes DNS domain tests
internal/core/domain/cluster.go Expands cluster/node domain models (networking + encrypted secrets fields)
internal/api/setup/infrastructure.go Updates Docker backend init to accept logger
docs/swagger/swagger.yaml Syncs schemas/description for new fields
docs/swagger/swagger.json Syncs schemas/description for new fields
docs/swagger/docs.go Syncs embedded swagger doc template for new fields
cmd/cloud/queue_cli_test.go Fixes indentation/formatting in CLI tests
.github/workflows/ci.yml Deploy staging on feature branches; add branch-name image tagging
Comments suppressed due to low confidence (1)

internal/handlers/instance_handler.go:92

  • validateLaunchRequest() doesn’t validate the newly added provisioning fields (VolumeBinds/Env/Cmd and CPULimit/MemoryLimit/DiskLimit). This allows negative/invalid limits or malformed env/cmd entries to reach the service layer. Add explicit validation (e.g., non-negative limits; reasonable max lengths/counts; optional format checks for env like KEY=VALUE).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/ci.yml
Comment on lines 287 to 320
@@ -316,6 +316,7 @@ jobs:
push: true
tags: |
ghcr.io/${{ steps.repo_name.outputs.REPO_LC }}:staging
ghcr.io/${{ steps.repo_name.outputs.REPO_LC }}:${{ github.ref_name }}
ghcr.io/${{ steps.repo_name.outputs.REPO_LC }}:${{ github.sha }}
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagging images with github.ref_name will produce invalid Docker tags for branches like "feature/foo" ("/" is not allowed in tags). Sanitize the ref name (e.g., replace "/" with "-"), or use a safe identifier (sha, short sha, or a slug output) for tags.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +6
CREATE TABLE IF NOT EXISTS clusters (
id UUID PRIMARY KEY,
user_id UUID NOT NULL REFERENCES users(id),
vpc_id UUID NOT NULL REFERENCES vpcs(id),
name VARCHAR(255) NOT NULL,
version VARCHAR(50) NOT NULL DEFAULT 'v1.29.0',
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate migration prefix "020_" is already used by 020_create_notify.up.sql, so ordering depends on lexicographic filename and the history becomes ambiguous. Consider renumbering to keep migration versions unique, or consolidating/retiring the older clusters migration (037_create_clusters_table) to avoid long-term confusion.

Copilot uses AI. Check for mistakes.
Comment on lines +27 to +31
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "{{ if .HAEnabled }}{{ .LBAddress }}{{ else }}localhost{{ end }}:6443"
networking:
podSubnet: "{{ .PodCIDR }}"
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

controlPlaneEndpoint is set to localhost when HA is disabled, which will make the generated admin kubeconfig point to https://localhost:6443 and break remote access (and may affect cert SANs). Use the control-plane node address (or a provided API server address) instead of localhost.

Copilot uses AI. Check for mistakes.
Comment on lines +26 to +29
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "{{ if .HAEnabled }}{{ .LBAddress }}{{ else }}localhost{{ end }}:6443"
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kubeadm config uses kubernetesVersion: stable, but ClusterService allows selecting a version. This will ignore the requested version and may cause unexpected upgrades. Pass/render an explicit version (e.g., cluster.Version) into the template instead of "stable".

Copilot uses AI. Check for mistakes.
Comment on lines +29 to +33
# Install Containerd
- curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
- echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
- apt-get update
- apt-get install -y containerd.io
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These commands write to /etc/apt/keyrings/*.gpg but the directory may not exist on a fresh Ubuntu image. Create /etc/apt/keyrings (e.g., mkdir -p /etc/apt/keyrings) before running gpg --dearmor to avoid provisioning failures.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,3 @@
package postgres

// Dont remove this file
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling: "Dont" -> "Don't" (and consider adding a short reason why this file must exist).

Copilot uses AI. Check for mistakes.
Comment on lines +13 to +21
sg, err := p.sgSvc.GetGroup(ctx, sgName, cluster.VpcID)
if err == nil && sg != nil {
return nil
}

// Create
sg, err := p.sgSvc.CreateGroup(ctx, cluster.VpcID, sgName, "Kubernetes cluster security group")
p.logger.Info("creating cluster security group", "sg_name", sgName)
newSG, err := p.sgSvc.CreateGroup(ctx, cluster.VpcID, sgName, "Managed security group for K8s cluster "+cluster.Name)
if err != nil {
return err
return fmt.Errorf("failed to create security group: %w", err)
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensureClusterSecurityGroup() creates a new security group on any GetGroup error. This can mask real failures (e.g., DB/network) and potentially create duplicates. Consider only creating when the error is a definite "not found" case; otherwise return/wrap the GetGroup error.

Copilot uses AI. Check for mistakes.
Comment on lines +24 to 34
// Add default rules (K8s control plane ports)
rules := []domain.SecurityRule{
{Protocol: "tcp", PortMin: 6443, PortMax: 6443, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // API Server
{Protocol: "udp", PortMin: 4789, PortMax: 4789, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // VXLAN
{Protocol: "tcp", PortMin: 179, PortMax: 179, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // BGP
{Protocol: "tcp", PortMin: 10250, PortMax: 10250, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // Kubelet
{Protocol: "tcp", PortMin: 30000, PortMax: 32767, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // NodePort TCP
{Protocol: "udp", PortMin: 30000, PortMax: 32767, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // NodePort UDP
}

for _, r := range rules {
_, _ = p.sgSvc.AddRule(ctx, sg.ID, r)
{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 6443, PortMax: 6443, CIDR: "0.0.0.0/0"},
{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 22, PortMax: 22, CIDR: "0.0.0.0/0"},
{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 2379, PortMax: 2380, CIDR: "10.0.0.0/8"}, // etcd (internal)
{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 10250, PortMax: 10250, CIDR: "10.0.0.0/8"}, // kubelet
}

return nil
}

func (p *KubeadmProvisioner) applyBaseSecurity(ctx context.Context, cluster *domain.Cluster, masterIP string) error {
exec, err := p.getExecutor(ctx, cluster, masterIP)
if err != nil {
return err
}

p.logger.Info("applying base security manifests", "ip", masterIP, "isolation", cluster.NetworkIsolation)

var securityManifests string
if cluster.NetworkIsolation {
// Apply a default-deny ingress policy for the default namespace
securityManifests = `
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-ingress
namespace: default
spec:
podSelector: {}
policyTypes:
- Ingress
`
}
if securityManifests != "" {
manifestFile := "/tmp/base-security.yaml"
_, _ = exec.Run(ctx, fmt.Sprintf("cat <<EOF > %s\n%s\nEOF", manifestFile, securityManifests))
_, err = exec.Run(ctx, fmt.Sprintf("%s apply -f %s", kubectlBase, manifestFile))
return err
for _, rule := range rules {
_, _ = p.sgSvc.AddRule(ctx, newSG.ID, rule)
}
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errors from AddRule are ignored, so the cluster may proceed with an incomplete/incorrect security group configuration. Handle and return/report AddRule errors (and consider cleanup/rollback if rule creation fails).

Copilot uses AI. Check for mistakes.
Comment on lines 25 to +29
rules := []domain.SecurityRule{
{Protocol: "tcp", PortMin: 6443, PortMax: 6443, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // API Server
{Protocol: "udp", PortMin: 4789, PortMax: 4789, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // VXLAN
{Protocol: "tcp", PortMin: 179, PortMax: 179, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // BGP
{Protocol: "tcp", PortMin: 10250, PortMax: 10250, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // Kubelet
{Protocol: "tcp", PortMin: 30000, PortMax: 32767, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // NodePort TCP
{Protocol: "udp", PortMin: 30000, PortMax: 32767, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // NodePort UDP
}

for _, r := range rules {
_, _ = p.sgSvc.AddRule(ctx, sg.ID, r)
{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 6443, PortMax: 6443, CIDR: "0.0.0.0/0"},
{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 22, PortMax: 22, CIDR: "0.0.0.0/0"},
{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 2379, PortMax: 2380, CIDR: "10.0.0.0/8"}, // etcd (internal)
{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 10250, PortMax: 10250, CIDR: "10.0.0.0/8"}, // kubelet
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default rules open SSH (22) and the Kubernetes API (6443) to 0.0.0.0/0. Unless this is strictly intended, this is a high-risk default. Consider restricting to VPC CIDR / admin CIDRs and/or making these CIDRs configurable per cluster.

Copilot uses AI. Check for mistakes.
Comment on lines +41 to +46
# Install Containerd
- curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
- echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
- apt-get update
- apt-get install -y containerd.io
- mkdir -p /etc/containerd
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These commands write to /etc/apt/keyrings/*.gpg but the directory may not exist on a fresh Ubuntu image. Create /etc/apt/keyrings (e.g., mkdir -p /etc/apt/keyrings) before running gpg --dearmor to avoid provisioning failures.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings February 9, 2026 20:17
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 182 out of 183 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1 to +12
CREATE TABLE IF NOT EXISTS clusters (
id UUID PRIMARY KEY,
user_id UUID NOT NULL REFERENCES users(id),
vpc_id UUID NOT NULL REFERENCES vpcs(id),
name VARCHAR(255) NOT NULL,
version VARCHAR(50) NOT NULL DEFAULT 'v1.29.0',
status VARCHAR(50) NOT NULL DEFAULT 'pending',
worker_count INT NOT NULL DEFAULT 2,
ha_enabled BOOLEAN NOT NULL DEFAULT FALSE,
network_isolation BOOLEAN NOT NULL DEFAULT FALSE,
-- Networking
pod_cidr VARCHAR(50) NOT NULL DEFAULT '10.244.0.0/16',
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This repository already contains 020_create_notify.* and 037_create_clusters_table.*. Introducing another 020_create_clusters.* means duplicate migration numbers and multiple “create clusters table” migrations, which can cause schema divergence (fresh DB vs existing DB) given the migrator reruns all .up.sql files without tracking. Prefer a new sequential migration (e.g., 083_...) that alters the existing clusters/cluster_nodes schema in-place.

Copilot uses AI. Check for mistakes.
Comment on lines +393 to +405
// UserData execution occurs asynchronously in a background goroutine.
// A brief pause ensures that the goroutine has sufficient time to initiate its Exec calls.
time.Sleep(50 * time.Millisecond)

// Verify the two-stage bootstrap sequence:
// Stage 1: Payload delivery (writing the bootstrap script to the container filesystem).
// Stage 2: Bootstrap execution (invoking the script via a background Exec operation).
// Each 'Exec' operation sequentially triggers ContainerExecCreate, ContainerExecAttach,
// and ContainerExecInspect according to the adapter's implementation.
require.Equal(t, 2, cli.CallCount("ContainerExecCreate"), "Expected dual Stage (delivery + execution) Exec invocations")
require.Equal(t, 2, cli.CallCount("ContainerExecAttach"), "Expected associated ExecAttach calls for I/O")
require.Equal(t, 2, cli.CallCount("ContainerExecInspect"), "Expected ExecInspect calls to verify termination state")
}
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.Sleep(50ms) makes this test timing-dependent and potentially flaky on slower CI runners. Prefer synchronizing on a deterministic signal (e.g., require.Eventually until the expected call counts are observed, or expose a hook/waitgroup in the adapter for testability).

Copilot uses AI. Check for mistakes.
Comment on lines +48 to +53
VolumeBinds []string `json:"volume_binds"`
Env []string `json:"env"`
Cmd []string `json:"cmd"`
CPULimit int64 `json:"cpu_limit"`
MemoryLimit int64 `json:"memory_limit"`
DiskLimit int64 `json:"disk_limit"`
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New request fields (cpu_limit, memory_limit, disk_limit) are accepted and forwarded without validation. Negative values (or nonsensical limits) can now reach the service/provisioner layer. Add validation (e.g., must be >= 0, and optionally enforce reasonable upper bounds / units).

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,3 @@
package postgres

// Dont remove this file
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling/grammar: comment should be "Don't remove this file" (and ideally explain why it must exist, e.g., to ensure the package is included in test compilation).

Copilot uses AI. Check for mistakes.
Comment on lines +24 to 30
// Add default rules (K8s control plane ports)
rules := []domain.SecurityRule{
{Protocol: "tcp", PortMin: 6443, PortMax: 6443, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // API Server
{Protocol: "udp", PortMin: 4789, PortMax: 4789, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // VXLAN
{Protocol: "tcp", PortMin: 179, PortMax: 179, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // BGP
{Protocol: "tcp", PortMin: 10250, PortMax: 10250, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // Kubelet
{Protocol: "tcp", PortMin: 30000, PortMax: 32767, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // NodePort TCP
{Protocol: "udp", PortMin: 30000, PortMax: 32767, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // NodePort UDP
}

for _, r := range rules {
_, _ = p.sgSvc.AddRule(ctx, sg.ID, r)
{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 6443, PortMax: 6443, CIDR: "0.0.0.0/0"},
{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 22, PortMax: 22, CIDR: "0.0.0.0/0"},
{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 2379, PortMax: 2380, CIDR: "10.0.0.0/8"}, // etcd (internal)
{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 10250, PortMax: 10250, CIDR: "10.0.0.0/8"}, // kubelet
}
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security group rules here allow SSH (22) and Kubernetes API server (6443) from 0.0.0.0/0 by default. This is an insecure default and can expose clusters to the public internet. Prefer scoping these CIDRs (e.g., VPC CIDR / admin IP allowlist) and/or make public access an explicit opt-in.

Copilot uses AI. Check for mistakes.
Comment thread Makefile Outdated

swagger:
@$(HOME)/go/bin/swag init -d cmd/api,internal/handlers -g main.go -o docs/swagger --parseDependency --parseInternal
@/Users/poyrazk/go/bin/swag init -d ./ -g cmd/api/main.go -o docs/swagger --parseDependency --parseInternal
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The swagger target uses an absolute, user-specific path (/Users/poyrazk/...) which will break for other developers and in CI. Use swag from PATH (or a configurable var like SWAG ?= $(HOME)/go/bin/swag) and keep paths repo-relative.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings February 9, 2026 20:28
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 182 out of 183 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +183 to +188
workerInst, err := p.instSvc.LaunchInstanceWithOptions(ctx, ports.CreateInstanceOptions{
Name: workerName,
ImageName: "ubuntu-22.04",
NetworkID: cluster.VpcID.String(),
UserData: userData,
})
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worker nodes are launched with image name "ubuntu-22.04" while control-plane uses "ubuntu:22.04". "ubuntu-22.04" is not a standard Docker image tag and is likely to fail pulls; consider using the same canonical image reference consistently.

Copilot uses AI. Check for mistakes.
Comment on lines +393 to +396
// UserData execution occurs asynchronously in a background goroutine.
// A brief pause ensures that the goroutine has sufficient time to initiate its Exec calls.
time.Sleep(50 * time.Millisecond)

Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test comment says UserData execution is asynchronous and uses a fixed sleep, but DockerAdapter.handleUserData executes synchronously in LaunchInstanceWithOptions. The sleep is unnecessary and can make the test slower/flakier; consider removing it or switching to require.Eventually only if the implementation is actually async.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,3 @@
package postgres

// Dont remove this file
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment has a typo and is unclear. If the intent is to keep the package’s *_test build working, update the wording (e.g., “Don’t remove this file”) and briefly explain why it’s needed.

Copilot uses AI. Check for mistakes.
Comment on lines +28 to +32
kubernetesVersion: stable
controlPlaneEndpoint: "{{ if .HAEnabled }}{{ .LBAddress }}{{ else }}localhost{{ end }}:6443"
networking:
podSubnet: "{{ .PodCIDR }}"
serviceSubnet: "{{ .ServiceCIDR }}"
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When HA is disabled, setting kubeadm controlPlaneEndpoint to "localhost:6443" will produce a kubeconfig/server endpoint that’s not reachable from outside the control-plane node (and may break worker joins). Consider omitting controlPlaneEndpoint unless HA is enabled, or set it to a reachable address (e.g., the node IP) at runtime.

Copilot uses AI. Check for mistakes.
@poyrazK poyrazK merged commit 7f20697 into main Feb 9, 2026
33 of 35 checks passed
@poyrazK poyrazK deleted the feature/global-load-balancer branch April 12, 2026 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants