feat: Enhanced Instance Provisioning and Resource Limits #51
Conversation
```go
Auth: []ssh.AuthMethod{
	ssh.PublicKeys(signer),
},
HostKeyCallback: ssh.InsecureIgnoreHostKey(),
```
**Check failure**: Code scanning / CodeQL: Use of insecure HostKeyCallback implementation (High)
Copilot Autofix (AI, 3 months ago)
In general, the fix is to replace ssh.InsecureIgnoreHostKey() with a HostKeyCallback that validates the server’s host key against an allow list. The most straightforward and robust approach in Go, when you know the expected host key, is to use ssh.FixedHostKey(pubKey), where pubKey is a ssh.PublicKey parsed from a known‑good public key (for example, obtained out‑of‑band and stored alongside or derived from the private key).
For this specific code, we must avoid changing the overall behavior beyond security. The SSHExecutor already receives a private key string key and parses it to a ssh.Signer used for authentication. We can reuse that signer to derive the corresponding public key used by the server, and feed that into ssh.FixedHostKey. This avoids adding configuration fields or changing the NewSSHExecutor signature. Concretely, in Run:
- After `signer, err := ssh.ParsePrivateKey([]byte(e.key))`, obtain `pubKey := signer.PublicKey()`.
- Replace `HostKeyCallback: ssh.InsecureIgnoreHostKey(),` with `HostKeyCallback: ssh.FixedHostKey(pubKey),`.
This change remains within internal/repositories/k8s/node_executor.go, uses only already‑imported golang.org/x/crypto/ssh, and doesn’t alter how callers construct SSHExecutor or how commands are executed, other than now failing if the host key doesn’t match the key pair being used.
```diff
@@ -69,12 +69,14 @@
 		return "", fmt.Errorf("failed to parse private key: %w", err)
 	}
 
+	pubKey := signer.PublicKey()
+
 	config := &ssh.ClientConfig{
 		User: e.user,
 		Auth: []ssh.AuthMethod{
 			ssh.PublicKeys(signer),
 		},
-		HostKeyCallback: ssh.InsecureIgnoreHostKey(),
+		HostKeyCallback: ssh.FixedHostKey(pubKey),
 		Timeout:         10 * time.Second,
 	}
```
**Important**: Review skipped: too many files. This PR contains 183 files, which is 33 over the limit of 150.
Pull request overview
This PR enhances instance/cluster provisioning by introducing structured launch parameters, additional instance configuration fields (env/cmd/binds/resource limits/user-data), corresponding DB migrations, and updates across compute backends (Docker/Libvirt), workers, and API/Swagger; it also refactors and parallelizes many tests.
Changes:
- Introduce `ports.LaunchParams` / extended `CreateInstanceOptions` and thread new provisioning fields through handlers, services, workers, and compute backends.
- Add/adjust Postgres migrations for new instance/cluster/load-balancer schema fields and update Swagger specs accordingly.
- Refactor and reorganize K8s provisioning (cloud-init templates, SG logic) and broaden test parallelization/cleanup.
Reviewed changes
Copilot reviewed 182 out of 183 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| tests/gateway_e2e_test.go | Improves assertions/diagnostics for gateway E2E route creation and response decoding |
| scripts/test_libvirt.go | Updates libvirt adapter call to LaunchInstanceWithOptions |
| internal/workers/provision_worker_test.go | Updates failing backend mock to new compute method name |
| internal/workers/provision_worker.go | Passes full ProvisionJob into instance provisioning |
| internal/repositories/postgres/vpc_repo_unit_test.go | Marks tests/subtests parallel |
| internal/repositories/postgres/secret_repo_unit_test.go | Marks tests/subtests parallel |
| internal/repositories/postgres/rbac_repo_unit_test.go | Marks tests parallel |
| internal/repositories/postgres/notify_repo_unit_test.go | Marks tests parallel |
| internal/repositories/postgres/migrator_unit_test.go | Adds t.Parallel() and whitespace cleanup |
| internal/repositories/postgres/migrations/082_add_instance_config_fields.up.sql | Adds instance config columns (binds/env/cmd/resource limits) |
| internal/repositories/postgres/migrations/082_add_instance_config_fields.down.sql | Drops instance config columns |
| internal/repositories/postgres/migrations/081_fix_clusters_table.up.sql | Adds/migrates missing cluster columns |
| internal/repositories/postgres/migrations/081_fix_clusters_table.down.sql | Drops added cluster columns |
| internal/repositories/postgres/migrations/080_add_lb_ip.up.sql | Adds LB ip column (idempotent) |
| internal/repositories/postgres/migrations/080_add_lb_ip.down.sql | Drops LB ip column |
| internal/repositories/postgres/migrations/078_add_api_key_expiry.up.sql | Makes expires_at column add idempotent |
| internal/repositories/postgres/migrations/077_add_gateway_methods.up.sql | Makes gateway methods/index/constraint creation idempotent |
| internal/repositories/postgres/migrations/076_add_gateway_pattern_matching.up.sql | Makes gateway pattern-matching column adds idempotent |
| internal/repositories/postgres/migrations/071_add_tenant_id_to_resources.up.sql | Simplifies table-existence checks for tenant-id migration |
| internal/repositories/postgres/migrations/070_create_tenants.up.sql | Makes default tenant-id column adds idempotent |
| internal/repositories/postgres/migrations/060_bucket_encryption.up.sql | Makes encryption columns idempotent |
| internal/repositories/postgres/migrations/058_add_object_versioning.up.sql | Makes object versioning columns idempotent; drops alt constraint defensively |
| internal/repositories/postgres/migrations/058_add_object_versioning.down.sql | Makes constraint re-add defensive in down migration |
| internal/repositories/postgres/migrations/054_extend_clusters_v2_2.up.sql | Makes cluster/LB column adds idempotent |
| internal/repositories/postgres/migrations/054_extend_clusters_v2_2.down.sql | Makes cluster/LB column drops idempotent |
| internal/repositories/postgres/migrations/053_add_cluster_network_isolation.up.sql | Makes network_isolation column add idempotent |
| internal/repositories/postgres/migrations/053_add_cluster_network_isolation.down.sql | Makes network_isolation column drop idempotent |
| internal/repositories/postgres/migrations/036_add_volume_backend_path.up.sql | Makes backend_path column add idempotent |
| internal/repositories/postgres/migrations/032_add_instance_network_fields.up.sql | Makes instance network column adds idempotent |
| internal/repositories/postgres/migrations/020_create_clusters.up.sql | Adds new “fresh install” clusters/cluster_nodes schema |
| internal/repositories/postgres/migrations/020_create_clusters.down.sql | Drops clusters/cluster_nodes tables |
| internal/repositories/postgres/migrations/014_resource_scoping.up.sql | Makes user_id column adds idempotent |
| internal/repositories/postgres/migrations/013_create_users_table.up.sql | Makes api_keys FK creation defensive |
| internal/repositories/postgres/migrations/012_add_failure_tracking.up.sql | Makes failure-tracking columns idempotent; cleans formatting |
| internal/repositories/postgres/migrations/006_create_vpcs_table.up.sql | Makes instances.vpc_id add idempotent |
| internal/repositories/postgres/migrations/003_add_container_id.up.sql | Makes container_id add idempotent |
| internal/repositories/postgres/migrations/003_add_container_id.down.sql | Makes container_id drop idempotent |
| internal/repositories/postgres/lifecycle_repo_unit_test.go | Marks tests/subtests parallel |
| internal/repositories/postgres/image_repo_unit_test.go | Marks tests parallel |
| internal/repositories/postgres/identity_repo_unit_test.go | Marks tests parallel |
| internal/repositories/postgres/helper_test.go | Adds placeholder test file for package |
| internal/repositories/postgres/function_repo_unit_test.go | Marks tests parallel and formats literals |
| internal/repositories/postgres/encryption_repo_unit_test.go | Marks tests/subtests parallel |
| internal/repositories/postgres/database_repo_unit_test.go | Marks tests parallel |
| internal/repositories/postgres/cron_repo_unit_test.go | Marks tests parallel |
| internal/repositories/postgres/container_repo_unit_test.go | Marks tests/subtests parallel |
| internal/repositories/postgres/cluster_repo_unit_test.go | Updates cluster repo expectations + parallelizes subtests |
| internal/repositories/postgres/accounting_repo_unit_test.go | Marks tests parallel |
| internal/repositories/noop/adapters_test.go | Updates method name to LaunchInstanceWithOptions |
| internal/repositories/libvirt/templates_test.go | Adds integration build tag + parallelizes |
| internal/repositories/libvirt/mock_client_test.go | Adds integration build tag |
| internal/repositories/libvirt/adapter_volume_test.go | Adds integration build tag + parallelizes |
| internal/repositories/libvirt/adapter_test.go | Adds integration build tag + parallelizes |
| internal/repositories/libvirt/adapter_network_test.go | Adds integration build tag + parallelizes |
| internal/repositories/libvirt/adapter_lifecycle_test.go | Adds integration build tag + parallelizes; updates method name |
| internal/repositories/libvirt/adapter_context_test.go | Adds integration build tag + parallelizes; updates method name |
| internal/repositories/libvirt/adapter.go | Renames create method, adds StartInstance, extends cloud-init generation with user-data passthrough |
| internal/repositories/k8s/templates/worker.yaml | Adds cloud-init template for worker bootstrap/join |
| internal/repositories/k8s/templates/control_plane.yaml | Adds cloud-init template for control-plane bootstrap/init |
| internal/repositories/k8s/security_test.go | Deleted (k8s SG/base-security tests removed) |
| internal/repositories/k8s/security.go | Refactors SG naming/rules and creation logic |
| internal/repositories/k8s/provisioner_all_test.go | Deleted (large K8s provisioner test removed) |
| internal/repositories/k8s/node_executor_test.go | Normalizes shell invocation in tests |
| internal/repositories/k8s/mocks_test.go | Adds local mocks for k8s package tests |
| internal/repositories/k8s/mock_provisioner.go | Deleted (old mock provisioner removed) |
| internal/repositories/k8s/kubeadm.go | Deleted (old kubeadm helpers removed) |
| internal/repositories/k8s/health_test.go | Updates mocks/constants; switches kubeconfig retrieval to secret decryption |
| internal/repositories/k8s/ha_test.go | Deleted (HA-related tests removed) |
| internal/repositories/k8s/export_test.go | Deleted (export helpers removed) |
| internal/repositories/k8s/cni_test.go | Deleted (CNI tests removed) |
| internal/repositories/k8s/cni.go | Deleted (CNI logic removed) |
| internal/repositories/docker/stats_test.go | Updates adapter ctor signature + method name changes |
| internal/repositories/docker/fakes_test.go | Adds call counting + mutex for thread-safe fake assertions |
| internal/repositories/docker/adapter_unit_test.go | Updates method names/ctor signature; adds UserData unit test |
| internal/repositories/docker/adapter_test.go | Updates ctor signature/method name; adds integration UserData bootstrap check |
| internal/handlers/ws/ws_test.go | Parallelizes WS tests |
| internal/handlers/ws/hub_test.go | Parallelizes hub tests |
| internal/handlers/ws/client_test.go | Parallelizes client tests |
| internal/handlers/vpc_handler_test.go | Parallelizes handler tests |
| internal/handlers/volume_handler_test.go | Parallelizes handler tests |
| internal/handlers/tenant_handler_test.go | Parallelizes handler tests |
| internal/handlers/subnet_handler_test.go | Parallelizes handler tests |
| internal/handlers/storage_handler_errors_test.go | Parallelizes test + subtests |
| internal/handlers/stack_handler_test.go | Parallelizes handler tests |
| internal/handlers/snapshot_handler_test.go | Parallelizes handler tests |
| internal/handlers/security_group_handler_test.go | Parallelizes handler tests |
| internal/handlers/secret_handler_test.go | Parallelizes handler tests |
| internal/handlers/rbac_handler_test.go | Parallelizes handler tests |
| internal/handlers/queue_handler_test.go | Parallelizes handler tests |
| internal/handlers/notify_handler_test.go | Re-formats constants; parallelizes tests/subtests |
| internal/handlers/lifecycle_handler_test.go | Parallelizes handler tests |
| internal/handlers/lb_handler_test.go | Parallelizes handler tests |
| internal/handlers/instance_type_handler_test.go | Parallelizes handler test |
| internal/handlers/instance_handler.go | Extends launch request fields and uses ports.LaunchParams |
| internal/handlers/image_handler_test.go | Parallelizes handler tests |
| internal/handlers/identity_handler_test.go | Parallelizes handler tests |
| internal/handlers/helper_test.go | Parallelizes helper tests |
| internal/handlers/health_handler_test.go | Parallelizes health handler tests |
| internal/handlers/global_lb_handler_test.go | Parallelizes GLB handler tests |
| internal/handlers/gateway_handler_test.go | Parallelizes gateway handler tests |
| internal/handlers/function_handler_test.go | Re-formats constants; parallelizes tests/subtests |
| internal/handlers/event_handler_test.go | Parallelizes event handler test |
| internal/handlers/dns_handler_test.go | Parallelizes DNS handler tests |
| internal/handlers/database_handler_test.go | Re-formats constants; parallelizes tests/subtests |
| internal/handlers/dashboard_handler_test.go | Parallelizes dashboard handler tests |
| internal/handlers/cron_handler_test.go | Re-formats constants; parallelizes tests/subtests |
| internal/handlers/container_handler_test.go | Parallelizes container handler tests/subtests |
| internal/handlers/cluster_handler_test.go | Parallelizes cluster handler tests |
| internal/handlers/cache_handler_test.go | Parallelizes cache handler tests |
| internal/handlers/autoscaling_handler_test.go | Parallelizes autoscaling handler tests/subtests |
| internal/handlers/auth_handler_test.go | Parallelizes auth handler tests |
| internal/handlers/audit_handler_test.go | Parallelizes audit handler test |
| internal/handlers/accounting_handler_test.go | Parallelizes accounting handler tests |
| internal/core/services/volume_test.go | Updates service calls to LaunchParams / ProvisionJob |
| internal/core/services/stack_test.go | Parallelizes stack service tests |
| internal/core/services/stack.go | Updates instance launch to LaunchParams |
| internal/core/services/shared_test.go | Updates instance service/compute backend mocks to new signatures |
| internal/core/services/scale_test.go | Updates benchmark launches to LaunchParams |
| internal/core/services/rbac_cached_test.go | Parallelizes cached RBAC tests |
| internal/core/services/loadbalancer_service_test.go | Parallelizes LB service tests |
| internal/core/services/lifecycle_svc_test.go | Parallelizes lifecycle service tests |
| internal/core/services/lb_worker_test.go | Parallelizes LB worker tests |
| internal/core/services/instance_ports_test.go | Parallelizes instance ports tests |
| internal/core/services/instance_internal_test.go | Parallelizes instance internal tests |
| internal/core/services/health_test.go | Updates Docker adapter ctor signature |
| internal/core/services/global_lb_test.go | Parallelizes GLB service tests |
| internal/core/services/function_test.go | Updates Docker adapter ctor signature |
| internal/core/services/database_test.go | Updates Docker adapter ctor signature |
| internal/core/services/database.go | Updates compute call to LaunchInstanceWithOptions |
| internal/core/services/dashboard_test.go | Parallelizes dashboard service tests |
| internal/core/services/cron_worker_test.go | Parallelizes cron worker tests |
| internal/core/services/container_worker_test.go | Parallelizes container worker tests |
| internal/core/services/container_worker.go | Updates instance launch to LaunchParams |
| internal/core/services/container_test.go | Updates Provision invocation to ProvisionJob |
| internal/core/services/cluster_test.go | Parallelizes cluster service tests; updates kubeconfig field name |
| internal/core/services/cluster_lifecycle_test.go | Parallelizes cluster lifecycle tests |
| internal/core/services/cluster.go | Adds Pod/Service CIDR defaults; migrates to encrypted kubeconfig field |
| internal/core/services/cached_identity_test.go | Parallelizes cached identity tests |
| internal/core/services/cache_test.go | Updates Docker adapter ctor signature |
| internal/core/services/cache.go | Updates compute call to LaunchInstanceWithOptions |
| internal/core/services/benchmarks_test.go | Updates benchmarks to LaunchParams |
| internal/core/services/autoscaling_worker_test.go | Parallelizes autoscaling worker tests |
| internal/core/services/autoscaling_worker.go | Updates instance launch to LaunchParams |
| internal/core/services/autoscaling_test.go | Updates Provision invocation to ProvisionJob |
| internal/core/ports/mocks/common_mocks.go | Adds SecretService mock |
| internal/core/ports/instance.go | Introduces LaunchParams; expands InstanceService interface |
| internal/core/ports/compute_options.go | Adds UserData to instance create options |
| internal/core/ports/compute.go | Renames compute create method; adds StartInstance |
| internal/core/ports/cluster.go | Adds PodCIDR/ServiceCIDR to cluster create params |
| internal/core/domain/ws_event_test.go | Parallelizes domain tests |
| internal/core/domain/security_group_test.go | Parallelizes domain tests |
| internal/core/domain/jobs.go | Extends ProvisionJob with provisioning config fields |
| internal/core/domain/instance.go | Extends Instance model with binds/env/cmd/resource limits |
| internal/core/domain/domain_test.go | Parallelizes domain constants test |
| internal/core/domain/dns_test.go | Parallelizes DNS domain tests |
| internal/core/domain/cluster.go | Expands cluster/node domain models (networking + encrypted secrets fields) |
| internal/api/setup/infrastructure.go | Updates Docker backend init to accept logger |
| docs/swagger/swagger.yaml | Syncs schemas/description for new fields |
| docs/swagger/swagger.json | Syncs schemas/description for new fields |
| docs/swagger/docs.go | Syncs embedded swagger doc template for new fields |
| cmd/cloud/queue_cli_test.go | Fixes indentation/formatting in CLI tests |
| .github/workflows/ci.yml | Deploy staging on feature branches; add branch-name image tagging |
Comments suppressed due to low confidence (1)
internal/handlers/instance_handler.go:92
- validateLaunchRequest() doesn’t validate the newly added provisioning fields (VolumeBinds/Env/Cmd and CPULimit/MemoryLimit/DiskLimit). This allows negative/invalid limits or malformed env/cmd entries to reach the service layer. Add explicit validation (e.g., non-negative limits; reasonable max lengths/counts; optional format checks for env like KEY=VALUE).
```diff
@@ -316,6 +316,7 @@ jobs:
           push: true
           tags: |
             ghcr.io/${{ steps.repo_name.outputs.REPO_LC }}:staging
+            ghcr.io/${{ steps.repo_name.outputs.REPO_LC }}:${{ github.ref_name }}
             ghcr.io/${{ steps.repo_name.outputs.REPO_LC }}:${{ github.sha }}
```
Tagging images with github.ref_name will produce invalid Docker tags for branches like "feature/foo" ("/" is not allowed in tags). Sanitize the ref name (e.g., replace "/" with "-"), or use a safe identifier (sha, short sha, or a slug output) for tags.
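A minimal sketch of the sanitization idea (the variable names are illustrative; in the workflow the source value would come from `${GITHUB_REF_NAME}` and the result would typically be exported via `$GITHUB_OUTPUT` in a preceding step):

```shell
# Turn a branch ref like "feature/foo" into a valid Docker tag.
# Docker tags allow only [A-Za-z0-9_.-]; '/' in branch names is the usual offender.
ref_name="feature/foo"
safe_tag=$(printf '%s' "$ref_name" | tr '/' '-' | tr -c 'a-zA-Z0-9._-' '-')
echo "$safe_tag"   # feature-foo
```

The second `tr -c` pass is a belt-and-braces guard that replaces any other disallowed character with `-`; using the commit SHA directly avoids the problem entirely.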
```sql
CREATE TABLE IF NOT EXISTS clusters (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL REFERENCES users(id),
    vpc_id UUID NOT NULL REFERENCES vpcs(id),
    name VARCHAR(255) NOT NULL,
    version VARCHAR(50) NOT NULL DEFAULT 'v1.29.0',
```
Duplicate migration prefix "020_" is already used by 020_create_notify.up.sql, so ordering depends on lexicographic filename and the history becomes ambiguous. Consider renumbering to keep migration versions unique, or consolidating/retiring the older clusters migration (037_create_clusters_table) to avoid long-term confusion.
```yaml
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "{{ if .HAEnabled }}{{ .LBAddress }}{{ else }}localhost{{ end }}:6443"
networking:
  podSubnet: "{{ .PodCIDR }}"
```
controlPlaneEndpoint is set to localhost when HA is disabled, which will make the generated admin kubeconfig point to https://localhost:6443 and break remote access (and may affect cert SANs). Use the control-plane node address (or a provided API server address) instead of localhost.
```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "{{ if .HAEnabled }}{{ .LBAddress }}{{ else }}localhost{{ end }}:6443"
```
The kubeadm config uses kubernetesVersion: stable, but ClusterService allows selecting a version. This will ignore the requested version and may cause unexpected upgrades. Pass/render an explicit version (e.g., cluster.Version) into the template instead of "stable".
```yaml
# Install Containerd
- curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
- echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
- apt-get update
- apt-get install -y containerd.io
```
These commands write to /etc/apt/keyrings/*.gpg but the directory may not exist on a fresh Ubuntu image. Create /etc/apt/keyrings (e.g., mkdir -p /etc/apt/keyrings) before running gpg --dearmor to avoid provisioning failures.
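The suggested fix, sketched as a cloud-init fragment (paths as in the quoted template; the surrounding `runcmd` structure is assumed from context):

```yaml
runcmd:
  # Ensure the keyring directory exists before gpg --dearmor writes into it.
  - mkdir -p /etc/apt/keyrings
  - curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
```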
```diff
@@ -0,0 +1,3 @@
+package postgres
+
+// Dont remove this file
```
Spelling: "Dont" -> "Don't" (and consider adding a short reason why this file must exist).
```diff
 	sg, err := p.sgSvc.GetGroup(ctx, sgName, cluster.VpcID)
 	if err == nil && sg != nil {
 		return nil
 	}
 
 	// Create
-	sg, err := p.sgSvc.CreateGroup(ctx, cluster.VpcID, sgName, "Kubernetes cluster security group")
+	p.logger.Info("creating cluster security group", "sg_name", sgName)
+	newSG, err := p.sgSvc.CreateGroup(ctx, cluster.VpcID, sgName, "Managed security group for K8s cluster "+cluster.Name)
 	if err != nil {
-		return err
+		return fmt.Errorf("failed to create security group: %w", err)
```
ensureClusterSecurityGroup() creates a new security group on any GetGroup error. This can mask real failures (e.g., DB/network) and potentially create duplicates. Consider only creating when the error is a definite "not found" case; otherwise return/wrap the GetGroup error.
```diff
-	// Add default rules (K8s control plane ports)
 	rules := []domain.SecurityRule{
-		{Protocol: "tcp", PortMin: 6443, PortMax: 6443, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100},   // API Server
-		{Protocol: "udp", PortMin: 4789, PortMax: 4789, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100},   // VXLAN
-		{Protocol: "tcp", PortMin: 179, PortMax: 179, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100},     // BGP
-		{Protocol: "tcp", PortMin: 10250, PortMax: 10250, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // Kubelet
-		{Protocol: "tcp", PortMin: 30000, PortMax: 32767, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // NodePort TCP
-		{Protocol: "udp", PortMin: 30000, PortMax: 32767, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // NodePort UDP
-	}
-
-	for _, r := range rules {
-		_, _ = p.sgSvc.AddRule(ctx, sg.ID, r)
+		{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 6443, PortMax: 6443, CIDR: "0.0.0.0/0"},
+		{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 22, PortMax: 22, CIDR: "0.0.0.0/0"},
+		{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 2379, PortMax: 2380, CIDR: "10.0.0.0/8"},   // etcd (internal)
+		{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 10250, PortMax: 10250, CIDR: "10.0.0.0/8"}, // kubelet
 	}
 
 	return nil
 }
 
-func (p *KubeadmProvisioner) applyBaseSecurity(ctx context.Context, cluster *domain.Cluster, masterIP string) error {
-	exec, err := p.getExecutor(ctx, cluster, masterIP)
-	if err != nil {
-		return err
-	}
-
-	p.logger.Info("applying base security manifests", "ip", masterIP, "isolation", cluster.NetworkIsolation)
-
-	var securityManifests string
-	if cluster.NetworkIsolation {
-		// Apply a default-deny ingress policy for the default namespace
-		securityManifests = `
-apiVersion: networking.k8s.io/v1
-kind: NetworkPolicy
-metadata:
-  name: default-deny-ingress
-  namespace: default
-spec:
-  podSelector: {}
-  policyTypes:
-  - Ingress
-`
-	}
-	if securityManifests != "" {
-		manifestFile := "/tmp/base-security.yaml"
-		_, _ = exec.Run(ctx, fmt.Sprintf("cat <<EOF > %s\n%s\nEOF", manifestFile, securityManifests))
-		_, err = exec.Run(ctx, fmt.Sprintf("%s apply -f %s", kubectlBase, manifestFile))
-		return err
+	for _, rule := range rules {
+		_, _ = p.sgSvc.AddRule(ctx, newSG.ID, rule)
 	}
```
Errors from AddRule are ignored, so the cluster may proceed with an incomplete/incorrect security group configuration. Handle and return/report AddRule errors (and consider cleanup/rollback if rule creation fails).
```diff
 	rules := []domain.SecurityRule{
-		{Protocol: "tcp", PortMin: 6443, PortMax: 6443, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100},   // API Server
-		{Protocol: "udp", PortMin: 4789, PortMax: 4789, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100},   // VXLAN
-		{Protocol: "tcp", PortMin: 179, PortMax: 179, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100},     // BGP
-		{Protocol: "tcp", PortMin: 10250, PortMax: 10250, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // Kubelet
-		{Protocol: "tcp", PortMin: 30000, PortMax: 32767, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // NodePort TCP
-		{Protocol: "udp", PortMin: 30000, PortMax: 32767, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // NodePort UDP
-	}
-
-	for _, r := range rules {
-		_, _ = p.sgSvc.AddRule(ctx, sg.ID, r)
+		{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 6443, PortMax: 6443, CIDR: "0.0.0.0/0"},
+		{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 22, PortMax: 22, CIDR: "0.0.0.0/0"},
+		{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 2379, PortMax: 2380, CIDR: "10.0.0.0/8"},   // etcd (internal)
+		{Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 10250, PortMax: 10250, CIDR: "10.0.0.0/8"}, // kubelet
```
The default rules open SSH (22) and the Kubernetes API (6443) to 0.0.0.0/0. Unless this is strictly intended, this is a high-risk default. Consider restricting to VPC CIDR / admin CIDRs and/or making these CIDRs configurable per cluster.
```yaml
# Install Containerd
- curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
- echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
- apt-get update
- apt-get install -y containerd.io
- mkdir -p /etc/containerd
```
These commands write to /etc/apt/keyrings/*.gpg but the directory may not exist on a fresh Ubuntu image. Create /etc/apt/keyrings (e.g., mkdir -p /etc/apt/keyrings) before running gpg --dearmor to avoid provisioning failures.
Pull request overview
Copilot reviewed 182 out of 183 changed files in this pull request and generated 6 comments.
```sql
CREATE TABLE IF NOT EXISTS clusters (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL REFERENCES users(id),
    vpc_id UUID NOT NULL REFERENCES vpcs(id),
    name VARCHAR(255) NOT NULL,
    version VARCHAR(50) NOT NULL DEFAULT 'v1.29.0',
    status VARCHAR(50) NOT NULL DEFAULT 'pending',
    worker_count INT NOT NULL DEFAULT 2,
    ha_enabled BOOLEAN NOT NULL DEFAULT FALSE,
    network_isolation BOOLEAN NOT NULL DEFAULT FALSE,
    -- Networking
    pod_cidr VARCHAR(50) NOT NULL DEFAULT '10.244.0.0/16',
```
This repository already contains 020_create_notify.* and 037_create_clusters_table.*. Introducing another 020_create_clusters.* means duplicate migration numbers and multiple “create clusters table” migrations, which can cause schema divergence (fresh DB vs existing DB) given the migrator reruns all .up.sql files without tracking. Prefer a new sequential migration (e.g., 083_...) that alters the existing clusters/cluster_nodes schema in-place.
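For example, a follow-on migration along these lines (the filename and exact columns are illustrative; 083 is simply the next free number per the suggestion above) would alter the existing table idempotently instead of re-creating it:

```sql
-- Hypothetical 083_extend_clusters_networking.up.sql:
-- extend the existing clusters table in place, idempotently,
-- rather than shipping a second CREATE TABLE migration under a reused number.
ALTER TABLE clusters
    ADD COLUMN IF NOT EXISTS pod_cidr VARCHAR(50) NOT NULL DEFAULT '10.244.0.0/16';
ALTER TABLE clusters
    ADD COLUMN IF NOT EXISTS network_isolation BOOLEAN NOT NULL DEFAULT FALSE;
```

`ADD COLUMN IF NOT EXISTS` matches the idempotency style this PR applies to the other migrations, so the same file is safe on both fresh and existing databases.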
```go
// UserData execution occurs asynchronously in a background goroutine.
// A brief pause ensures that the goroutine has sufficient time to initiate its Exec calls.
time.Sleep(50 * time.Millisecond)

// Verify the two-stage bootstrap sequence:
// Stage 1: Payload delivery (writing the bootstrap script to the container filesystem).
// Stage 2: Bootstrap execution (invoking the script via a background Exec operation).
// Each 'Exec' operation sequentially triggers ContainerExecCreate, ContainerExecAttach,
// and ContainerExecInspect according to the adapter's implementation.
require.Equal(t, 2, cli.CallCount("ContainerExecCreate"), "Expected dual Stage (delivery + execution) Exec invocations")
require.Equal(t, 2, cli.CallCount("ContainerExecAttach"), "Expected associated ExecAttach calls for I/O")
require.Equal(t, 2, cli.CallCount("ContainerExecInspect"), "Expected ExecInspect calls to verify termination state")
}
```
time.Sleep(50ms) makes this test timing-dependent and potentially flaky on slower CI runners. Prefer synchronizing on a deterministic signal (e.g., require.Eventually until the expected call counts are observed, or expose a hook/waitgroup in the adapter for testability).
```go
VolumeBinds []string `json:"volume_binds"`
Env         []string `json:"env"`
Cmd         []string `json:"cmd"`
CPULimit    int64    `json:"cpu_limit"`
MemoryLimit int64    `json:"memory_limit"`
DiskLimit   int64    `json:"disk_limit"`
```
New request fields (cpu_limit, memory_limit, disk_limit) are accepted and forwarded without validation. Negative values (or nonsensical limits) can now reach the service/provisioner layer. Add validation (e.g., must be >= 0, and optionally enforce reasonable upper bounds / units).
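A hedged sketch of the suggested validation. The struct mirrors the fields shown above, but the bounds and the KEY=VALUE rule are assumptions, not existing code in the handler:

```go
package main

import (
	"fmt"
	"strings"
)

// launchLimits mirrors the new request fields relevant to validation.
type launchLimits struct {
	Env         []string
	CPULimit    int64
	MemoryLimit int64
	DiskLimit   int64
}

// validateLaunchLimits rejects negative limits and malformed env entries
// before they reach the service/provisioner layer.
func validateLaunchLimits(p launchLimits) error {
	for name, v := range map[string]int64{
		"cpu_limit": p.CPULimit, "memory_limit": p.MemoryLimit, "disk_limit": p.DiskLimit,
	} {
		if v < 0 {
			return fmt.Errorf("%s must be >= 0, got %d", name, v)
		}
	}
	for _, e := range p.Env {
		key, _, ok := strings.Cut(e, "=")
		if !ok || key == "" {
			return fmt.Errorf("env entry %q is not in KEY=VALUE form", e)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateLaunchLimits(launchLimits{CPULimit: -1}))
	fmt.Println(validateLaunchLimits(launchLimits{Env: []string{"FOO=bar"}, CPULimit: 2}))
}
```

Upper bounds and maximum slice lengths could be added in the same place once the project decides on sensible limits.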
```diff
@@ -0,0 +1,3 @@
+package postgres
+
+// Dont remove this file
```
Spelling/grammar: comment should be "Don't remove this file" (and ideally explain why it must exist, e.g., to ensure the package is included in test compilation).
| // Add default rules (K8s control plane ports) | ||
| rules := []domain.SecurityRule{ | ||
| {Protocol: "tcp", PortMin: 6443, PortMax: 6443, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // API Server | ||
| {Protocol: "udp", PortMin: 4789, PortMax: 4789, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // VXLAN | ||
| {Protocol: "tcp", PortMin: 179, PortMax: 179, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // BGP | ||
| {Protocol: "tcp", PortMin: 10250, PortMax: 10250, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // Kubelet | ||
| {Protocol: "tcp", PortMin: 30000, PortMax: 32767, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // NodePort TCP | ||
| {Protocol: "udp", PortMin: 30000, PortMax: 32767, CIDR: AnyCIDR, Direction: domain.RuleIngress, Priority: 100}, // NodePort UDP | ||
| } | ||
| for _, r := range rules { | ||
| _, _ = p.sgSvc.AddRule(ctx, sg.ID, r) | ||
| {Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 6443, PortMax: 6443, CIDR: "0.0.0.0/0"}, | ||
| {Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 22, PortMax: 22, CIDR: "0.0.0.0/0"}, | ||
| {Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 2379, PortMax: 2380, CIDR: "10.0.0.0/8"}, // etcd (internal) | ||
| {Direction: domain.RuleIngress, Protocol: "tcp", PortMin: 10250, PortMax: 10250, CIDR: "10.0.0.0/8"}, // kubelet | ||
| } |
Security group rules here allow SSH (22) and Kubernetes API server (6443) from 0.0.0.0/0 by default. This is an insecure default and can expose clusters to the public internet. Prefer scoping these CIDRs (e.g., VPC CIDR / admin IP allowlist) and/or make public access an explicit opt-in.
| swagger: | ||
| @$(HOME)/go/bin/swag init -d cmd/api,internal/handlers -g main.go -o docs/swagger --parseDependency --parseInternal | ||
| @/Users/poyrazk/go/bin/swag init -d ./ -g cmd/api/main.go -o docs/swagger --parseDependency --parseInternal |
The swagger target uses an absolute, user-specific path (/Users/poyrazk/...) which will break for other developers and in CI. Use swag from PATH (or a configurable var like SWAG ?= $(HOME)/go/bin/swag) and keep paths repo-relative.
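One way to apply that suggestion, sketched as a Makefile fragment. The `SWAG ?=` fallback path is an assumption about a typical Go toolchain layout, not something specified in the PR:

```makefile
# Resolve swag from PATH by default; allow override (e.g. `make SWAG=/opt/bin/swag swagger`).
SWAG ?= $(shell command -v swag 2>/dev/null || echo $(HOME)/go/bin/swag)

swagger:
	@$(SWAG) init -d ./ -g cmd/api/main.go -o docs/swagger --parseDependency --parseInternal
```

This keeps the target working for any developer and in CI without hard-coding a home directory.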
Pull request overview
Copilot reviewed 182 out of 183 changed files in this pull request and generated 4 comments.
| workerInst, err := p.instSvc.LaunchInstanceWithOptions(ctx, ports.CreateInstanceOptions{ | ||
| Name: workerName, | ||
| ImageName: "ubuntu-22.04", | ||
| NetworkID: cluster.VpcID.String(), | ||
| UserData: userData, | ||
| }) |
Worker nodes are launched with image name "ubuntu-22.04" while control-plane uses "ubuntu:22.04". "ubuntu-22.04" is not a standard Docker image tag and is likely to fail pulls; consider using the same canonical image reference consistently.
| // UserData execution occurs asynchronously in a background goroutine. | ||
| // A brief pause ensures that the goroutine has sufficient time to initiate its Exec calls. | ||
| time.Sleep(50 * time.Millisecond) | ||
This test comment says UserData execution is asynchronous and uses a fixed sleep, but DockerAdapter.handleUserData executes synchronously in LaunchInstanceWithOptions. The sleep is unnecessary and can make the test slower/flakier; consider removing it or switching to require.Eventually only if the implementation is actually async.
| @@ -0,0 +1,3 @@ | |||
| package postgres | |||
| // Dont remove this file | |||
The comment has a typo and is unclear. If the intent is to keep the package’s *_test build working, update the wording (e.g., “Don’t remove this file”) and briefly explain why it’s needed.
| kubernetesVersion: stable | ||
| controlPlaneEndpoint: "{{ if .HAEnabled }}{{ .LBAddress }}{{ else }}localhost{{ end }}:6443" | ||
| networking: | ||
| podSubnet: "{{ .PodCIDR }}" | ||
| serviceSubnet: "{{ .ServiceCIDR }}" |
When HA is disabled, setting kubeadm controlPlaneEndpoint to "localhost:6443" will produce a kubeconfig/server endpoint that’s not reachable from outside the control-plane node (and may break worker joins). Consider omitting controlPlaneEndpoint unless HA is enabled, or set it to a reachable address (e.g., the node IP) at runtime.
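The "omit unless HA is enabled" option could look like the following template fragment. This is a sketch built from the template fields quoted above (`.HAEnabled`, `.LBAddress`, `.PodCIDR`, `.ServiceCIDR`); the `.NodeIP` field is a hypothetical addition for the non-HA case:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: stable
{{ if .HAEnabled }}controlPlaneEndpoint: "{{ .LBAddress }}:6443"
{{ else if .NodeIP }}controlPlaneEndpoint: "{{ .NodeIP }}:6443"
{{ end }}networking:
  podSubnet: "{{ .PodCIDR }}"
  serviceSubnet: "{{ .ServiceCIDR }}"
```

Either branch yields an endpoint that workers can actually reach; omitting the field entirely lets kubeadm default to the advertise address of the control-plane node.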
This PR introduces several enhancements to the instance provisioning logic: