[Bug] ROLLBACK_COMPLETE stacks pass compatibility check with misleading log

## What happened

When a `create nodegroup` run fails and leaves a CloudFormation stack in `ROLLBACK_COMPLETE` state, a subsequent `eksctl create nodegroup` with the same config silently does nothing and logs:

```
[ℹ]  checking security group configuration for all nodegroups
[ℹ]  all nodegroups have up-to-date cloudformation templates
```

The user sees a reassuring "all good" message and only discovers later that the nodegroup is still broken. The `ROLLBACK_COMPLETE` stack is unusable (CloudFormation refuses to update it), yet eksctl treats it as a healthy existing nodegroup, excludes it from the create plan, and exits successfully.

Related: #4006 (same symptom, closed by stale-bot without a fix or root-cause analysis).

## What was expected

eksctl should either warn/error that one or more nodegroup stacks are in `ROLLBACK_COMPLETE` and point to a fix, or recreate the rolled-back stack automatically. At a minimum, it must not claim "all nodegroups have up-to-date cloudformation templates" when it has not checked template freshness at all.

## Steps to reproduce

1. `eksctl create nodegroup` with a config that causes CloudFormation to fail mid-create (e.g. an invalid launch template).
2. Observe the nodegroup stack reaches `ROLLBACK_COMPLETE`.
3. Fix the config issue and re-run `eksctl create nodegroup` with the same config file.
4. eksctl silently treats the rolled-back stack as an existing nodegroup and excludes it from the create plan. The compatibility check then runs, logs `all nodegroups have up-to-date cloudformation templates`, and the command exits 0 without creating anything.

## Environment

Reproduces against current `master` at commit `b86e8bdfb`. This is not a regression from a specific recent version: the code paths involved have existed for a long time.

## Root cause

Three independent problems combine:

### 1. Misleading log line in `ValidateExistingNodeGroupsForCompatibility`

`pkg/eks/nodegroup_service.go` line 309:

```go
logger.Info("all nodegroups have up-to-date cloudformation templates")
```

The function (`pkg/eks/nodegroup_service.go` lines 282-322) does not check template freshness. It only checks whether existing nodegroup stacks expose the `NodeGroupFeatureSharedSecurityGroup` CloudFormation output via `isNodeGroupCompatible` (`pkg/eks/compatibility.go` lines 44-96). The log message dates from that narrow shared-security-group compatibility check but reads like a general "your stacks are clean and current" health assertion.

### 2. `ROLLBACK_COMPLETE` is grouped with healthy terminal states

`pkg/cfn/manager/api.go` lines 533-550:

```go
func (*StackCollection) StackStatusIsNotTransitional(s *Stack) bool {
    for _, state := range nonTransitionalReadyStackStatuses() {
        if s.StackStatus == state {
            return true
        }
    }
    return false
}

func nonTransitionalReadyStackStatuses() []types.StackStatus {
    return []types.StackStatus{
        types.StackStatusCreateComplete,
        types.StackStatusUpdateComplete,
        types.StackStatusRollbackComplete,        // <-- problem 
        types.StackStatusUpdateRollbackComplete,
    }
}
```

`ROLLBACK_COMPLETE` means the stack's initial `CREATE` failed and was rolled back — all resources are gone, only the empty stack shell remains, and CloudFormation refuses to update it. Grouping it with `CREATE_COMPLETE`/`UPDATE_COMPLETE` causes callers to treat broken stacks as healthy. Note: `UPDATE_ROLLBACK_COMPLETE` *is* genuinely healthy (a failed update rolled back to a known-good state), so only `ROLLBACK_COMPLETE` is wrong here.
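The distinction above could be made explicit with two predicates instead of one shared list. The following is only a minimal sketch: `StackStatus` is a local stand-in for the SDK's `types.StackStatus`, and `isReady` and `isFailedCreate` are hypothetical names that do not exist in eksctl.

```go
package main

import "fmt"

// StackStatus is a local stand-in for the SDK's types.StackStatus so that
// this sketch compiles on its own (an assumption, not eksctl code).
type StackStatus string

const (
	CreateComplete         StackStatus = "CREATE_COMPLETE"
	UpdateComplete         StackStatus = "UPDATE_COMPLETE"
	RollbackComplete       StackStatus = "ROLLBACK_COMPLETE"
	UpdateRollbackComplete StackStatus = "UPDATE_ROLLBACK_COMPLETE"
)

// isReady (hypothetical) reports whether a stack has settled in a state
// where its resources exist and the stack can still be updated.
func isReady(s StackStatus) bool {
	switch s {
	case CreateComplete, UpdateComplete, UpdateRollbackComplete:
		return true
	}
	return false
}

// isFailedCreate (hypothetical) reports whether the initial create failed
// and was rolled back, leaving only an empty stack shell.
func isFailedCreate(s StackStatus) bool {
	return s == RollbackComplete
}

func main() {
	all := []StackStatus{CreateComplete, UpdateComplete, RollbackComplete, UpdateRollbackComplete}
	for _, s := range all {
		fmt.Printf("%-26s ready=%t failedCreate=%t\n", s, isReady(s), isFailedCreate(s))
	}
}
```

Keeping the failed-create case in its own predicate lets callers decide whether to error out, warn, or recreate, rather than inheriting a "healthy" verdict by accident.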

### 3. The create-nodegroup filter excludes rolled-back stacks

Separately — and this is why the command exits 0 without creating anything — `NodeGroupFilter.SetOnlyLocal` (`pkg/ctl/cmdutils/filter/nodegroup_filter.go` line 80) calls `loadLocalAndRemoteNodegroups`, which treats any existing stack (including `ROLLBACK_COMPLETE`) as a "remote" nodegroup to be excluded from creation. The underlying `ListNodeGroupStacks` (`pkg/cfn/manager/nodegroup.go` line 238) only filters out `DELETE_COMPLETE`/`DELETE_FAILED`, so `ROLLBACK_COMPLETE` stacks pass through as "existing".
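To illustrate the filtering gap, here is a rough sketch of how the listed stacks could be split before any of them are treated as "existing". The `nodeGroupStack` type and `partitionStacks` function are hypothetical and only model the behavior described above; they are not the actual eksctl types or the real `ListNodeGroupStacks` return value.

```go
package main

import "fmt"

// nodeGroupStack is a hypothetical, simplified view of a listed stack.
type nodeGroupStack struct {
	NodeGroupName string
	Status        string
}

// partitionStacks (hypothetical) separates stacks whose initial create was
// rolled back from all other listed stacks. Only ROLLBACK_COMPLETE is split
// out; every other status keeps its current treatment as "existing".
func partitionStacks(stacks []nodeGroupStack) (existing, rolledBack []string) {
	for _, s := range stacks {
		if s.Status == "ROLLBACK_COMPLETE" {
			rolledBack = append(rolledBack, s.NodeGroupName)
			continue
		}
		existing = append(existing, s.NodeGroupName)
	}
	return existing, rolledBack
}

func main() {
	stacks := []nodeGroupStack{
		{NodeGroupName: "ng-1", Status: "CREATE_COMPLETE"},
		{NodeGroupName: "ng-2", Status: "ROLLBACK_COMPLETE"},
		{NodeGroupName: "ng-3", Status: "UPDATE_ROLLBACK_COMPLETE"},
	}
	existing, rolledBack := partitionStacks(stacks)
	fmt.Println("existing:", existing)
	fmt.Println("rolled back:", rolledBack)
}
```

With a split like this, the create path could exclude only the `existing` names and handle the `rolledBack` names explicitly.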

## Suggested fix direction

1. **Fail fast in `SetOnlyLocal`** when a nodegroup in the user's config has an existing stack in `ROLLBACK_COMPLETE` — surface an actionable error like `nodegroup(s) %q have a CloudFormation stack in ROLLBACK_COMPLETE state; delete the failed stack(s) first with 'eksctl delete nodegroup --region=%s --cluster=%s --name=<name>' and then retry`.
2. **Reword the log line** at `pkg/eks/nodegroup_service.go:309` to describe what is actually checked (shared-SG compatibility), not template freshness.
3. **Remove `ROLLBACK_COMPLETE` from `nonTransitionalReadyStackStatuses`**. The helper has only one caller (`StackStatusIsNotTransitional` → `ValidateExistingNodeGroupsForCompatibility`), so the blast radius is tiny and the semantic fix is desirable there too. Keep `UPDATE_ROLLBACK_COMPLETE`. Do *not* touch `allNonDeletedStackStatuses` — `delete`/`describe` paths legitimately need to see `ROLLBACK_COMPLETE` stacks.
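The fail-fast check from item 1 might look roughly like the sketch below. It is an illustration under stated assumptions: `rolledBackNames` and `checkNoRolledBackStacks` are hypothetical helpers, and the `stackStatus` map stands in for whatever the real stack listing returns inside `SetOnlyLocal`. The error text is the one proposed above.

```go
package main

import (
	"fmt"
	"strings"
)

const rollbackComplete = "ROLLBACK_COMPLETE"

// rolledBackNames (hypothetical) returns, in config order, the requested
// nodegroups whose existing stack is in ROLLBACK_COMPLETE. stackStatus maps
// nodegroup name to stack status and is an assumed input shape.
func rolledBackNames(requested []string, stackStatus map[string]string) []string {
	var names []string
	for _, name := range requested {
		if stackStatus[name] == rollbackComplete {
			names = append(names, name)
		}
	}
	return names
}

// checkNoRolledBackStacks (hypothetical) returns an actionable error when any
// requested nodegroup has a rolled-back stack, and nil otherwise.
func checkNoRolledBackStacks(requested []string, stackStatus map[string]string, region, cluster string) error {
	names := rolledBackNames(requested, stackStatus)
	if len(names) == 0 {
		return nil
	}
	return fmt.Errorf("nodegroup(s) %q have a CloudFormation stack in ROLLBACK_COMPLETE state; "+
		"delete the failed stack(s) first with 'eksctl delete nodegroup --region=%s --cluster=%s --name=<name>' and then retry",
		strings.Join(names, ", "), region, cluster)
}

func main() {
	status := map[string]string{"ng-1": "ROLLBACK_COMPLETE", "ng-2": "CREATE_COMPLETE"}
	requested := []string{"ng-1", "ng-2", "ng-3"}
	if err := checkNoRolledBackStacks(requested, status, "us-west-2", "demo"); err != nil {
		fmt.Println("Error:", err)
	}
}
```

Nodegroups with no existing stack (`ng-3` here) are not flagged, so a normal first-time create is unaffected by the check.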
