@h3adex h3adex commented Sep 26, 2025

Description

This PR implements the changes proposed in stackitcloud/stackit-sdk-go#3587 to give users more feedback when an SKE cluster configuration results in an error.

The update introduces warnings in the following scenarios:

  • When a cluster enters the UNHEALTHY or UNSPECIFIED state and reports structured errors.
  • When a cluster has been stuck in the CREATING or RECONCILING state for more than 15 minutes and reports structured errors.
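
The two conditions above can be sketched as a small decision function. This is only an illustration of the rule as described in this PR, not the code in the provider or the SDK: the names `ClusterState`, `ClusterError`, `shouldWarn` and `stuckThreshold` are hypothetical, and the real state and error types come from the SKE SDK.

```go
package main

import (
	"fmt"
	"time"
)

// ClusterState is a hypothetical stand-in for the cluster status reported by the API.
type ClusterState string

// Only the states named in the PR description are modeled here.
const (
	StateUnhealthy   ClusterState = "UNHEALTHY"
	StateUnspecified ClusterState = "UNSPECIFIED"
	StateCreating    ClusterState = "CREATING"
	StateReconciling ClusterState = "RECONCILING"
)

// stuckThreshold is the 15-minute limit mentioned in the description.
const stuckThreshold = 15 * time.Minute

// ClusterError is a hypothetical stand-in for one structured error entry.
type ClusterError struct {
	Code    string
	Message string
}

// shouldWarn reports whether a warning would be shown under the two rules:
//   - UNHEALTHY or UNSPECIFIED with at least one structured error
//   - CREATING or RECONCILING for more than 15 minutes with at least one structured error
func shouldWarn(state ClusterState, inState time.Duration, errs []ClusterError) bool {
	if len(errs) == 0 {
		// Both rules require structured error information.
		return false
	}
	switch state {
	case StateUnhealthy, StateUnspecified:
		return true
	case StateCreating, StateReconciling:
		return inState > stuckThreshold
	default:
		return false
	}
}

func main() {
	// Placeholder error values, not real SKE error codes.
	errs := []ClusterError{{Code: "EXAMPLE_CODE", Message: "example message"}}

	fmt.Println(shouldWarn(StateUnhealthy, 0, errs))              // true
	fmt.Println(shouldWarn(StateCreating, 5*time.Minute, errs))   // false
	fmt.Println(shouldWarn(StateCreating, 20*time.Minute, errs))  // true
	fmt.Println(shouldWarn(StateReconciling, 20*time.Minute, nil)) // false
}
```

In this sketch a cluster that has spent exactly 15 minutes in CREATING does not yet trigger a warning, following the "more than 15 minutes" wording.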

Example Terraform configuration (to test warnings):

resource "stackit_network" "example_network" {
  project_id  = var.stackit_project_id
  name        = "example-ske"
  ipv4_prefix = "10.224.65.0/24"

  # Deliberately remove DNS configuration to trigger a DNS error (only available in SNA projects)
  ipv4_nameservers = ["1.1.1.1"]
}

resource "stackit_ske_cluster" "ske_cluster_01" {
  project_id             = var.stackit_project_id
  name                   = "secret-tes1"
  kubernetes_version_min = "1.33"

  network = {
    id = stackit_network.example_network.network_id
  }

  maintenance = {
    enable_kubernetes_version_updates    = true
    enable_machine_image_version_updates = true
    start                                = "01:00:00Z"
    end                                  = "02:00:00Z"
  }

  node_pools = [
    {
      name               = "standard"
      machine_type       = "g2i.4"
      minimum            = "3"
      maximum            = "9"
      max_surge          = "3"
      availability_zones = ["eu01-1", "eu01-2", "eu01-3"]
      os_version_min     = "4230.2.1"
      os_name            = "flatcar"
      volume_size        = 32
      volume_type        = "storage_premium_perf6"
    },
    {
      name               = "gpus"
      # This flavor is not supported in standard projects and will trigger an error
      machine_type       = "n3.56d.g4"
      minimum            = "3"
      maximum            = "9"
      max_surge          = "3"
      availability_zones = ["eu01-1", "eu01-2", "eu01-3"]
      os_version_min     = "2204.20250728.0"
      os_name            = "ubuntu"
      volume_size        = 32
      volume_type        = "storage_premium_perf6"
    },
  ]
}

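This configuration creates two error scenarios: the DNS configuration of the network and the unsupported machine type of the "gpus" node pool. Once the cluster has been stuck in the CREATING or RECONCILING state for more than 15 minutes, the warnings are surfaced in the terminal.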

How it will look for the user:
[Screenshot 2025-09-26 at 12 52 53]

I've tested the go-sdk addition using a replace directive in go.mod. I pushed it to the PR for reference:

replace (
	github.com/stackitcloud/stackit-sdk-go/services/ske => /Users/uphoffm/GolandProjects/stackit-sdk-go/services/ske
)

Checklist

  • Issue was linked above
  • Code format was applied: make fmt
  • Examples were added / adjusted (see examples/ directory)
  • Docs are up-to-date: make generate-docs (will be checked by CI)
  • Unit tests got implemented or updated
  • Acceptance tests got implemented or updated (see e.g. here)
  • Unit tests are passing: make test (will be checked by CI)
  • No linter issues: make lint (will be checked by CI)

Signed-off-by: Mauritz Uphoff <mauritz.uphoff@stackit.cloud>
@h3adex force-pushed the feat/improve-ske-create-update-warnings branch from ac8eda7 to 0a7f859 on September 26, 2025 12:56

github-actions bot commented Oct 4, 2025

This PR was marked as stale after 7 days of inactivity and will be closed after another 7 days of further inactivity. If this PR should be kept open, just add a comment, remove the stale label or push new commits to it.

@github-actions bot added the Stale label (PR is marked as stale due to inactivity) on Oct 4, 2025
@marceljk removed the Stale label on Oct 6, 2025