Skip to content

fix(exec): support Windows pods in mixed clusters (#687)#730

Merged
nadaverell merged 5 commits into
mainfrom
fix/windows-pod-exec-687
May 20, 2026
Merged

fix(exec): support Windows pods in mixed clusters (#687)#730
nadaverell merged 5 commits into
mainfrom
fix/windows-pod-exec-687

Conversation

@nadaverell
Copy link
Copy Markdown
Contributor

@nadaverell nadaverell commented May 19, 2026

Fixes #687.

Opening a terminal on a Windows pod fails today because Radar's exec handler always wraps the command in sh -c "<bash/ash/sh detection script>" — Windows containers have no POSIX shell, so the runtime errors out with hcs::System::CreateProcess: ... The system cannot find the file specified.

What changes

Detect Windows pods, three-tier:

  1. pod.Spec.OS.Name — GA in K8s 1.25, the field explicitly designed for this
  2. pod.Spec.NodeSelector["kubernetes.io/os"] (with beta.kubernetes.io/os fallback)
  3. The scheduled node's labels — for pods that omit the selector and rely on default node-affinity

If RBAC denies get nodes, tier 3 is skipped and we default to the Linux path (matches pre-existing behavior).

Route Windows pods through cmd.exe:

cmd.exe /c "where powershell >/dev/null 2>&1 && powershell || cmd"

PowerShell is preferred when present, with cmd.exe as fallback so Nano Server images (no PowerShell installed) still work.

Preserve operator escape hatches: ?shell= override still wins on any pod. --pod-shell-default stays POSIX-only by contract (documented on the var) — Windows pods always use the built-in Windows script. The pod fetch + OS detection only happens when neither override is set, so Linux users see zero added work.

Error classification: isShellNotFoundError now recognizes hcs::System::CreateProcess and the system cannot find the file so the frontend's "Start debug container" CTA fires if cmd.exe itself is missing or an operator-supplied ?shell= is wrong.

Comparison with competing tools

Tool Detection Shell
k9s nodeSelector → node label `cmd /c "where powershell ... && powershell
Headlamp nodeSelector only tries [powershell.exe, cmd.exe] client-side
Freelens nodeSelector only powershell hardcoded, no fallback
Radar (this PR) pod.Spec.OS.Name → nodeSelector → node label same as k9s

We're the only one that checks pod.Spec.OS.Name first — the authoritative signal. The node-label tier matters because most Windows-bound pods don't carry an explicit nodeSelector (they rely on taints + admission); Headlamp and Freelens miss this and fail open.

Testing

13 new unit test cases in internal/server/exec_test.go:

  • TestDefaultExecCommand — 4 new Windows cases (auto-detect, fallback-ignored, override-still-wins on Windows, explicit linux)
  • TestWindowsDefaultShellScript — tripwire pinning the script + behavioral asserts on the where powershell probe and || cmd fallback
  • TestDetectPodOS — 10 cases covering all three tiers, label preference, node-fetch RBAC failure, nil pod
  • TestIsShellNotFoundError — extended with the verbatim error from Cannot Access Terminal on Windows pods on mixed cluster with Linux and Windows nodes #687 + a non-English-locale variant

Verification I can't do: I don't have a Windows-node cluster. The shell argv matches what k9s has been running in production for years, but a real end-to-end test on a windowsServerContainers pod would be valuable. If a maintainer can test or the issue reporter is willing, that would close the loop.


Note

Medium Risk
Adds OS-detection and Windows-specific exec command routing, including new Kubernetes API calls to fetch pods/nodes; behavior changes could impact terminal exec in clusters with unusual RBAC or pod scheduling setups.

Overview
Exec sessions now detect the target pod’s OS (via pod.spec.os.name, nodeSelector labels, or scheduled node labels) and choose an appropriate shell command: Windows pods run cmd.exe /c with a PowerShell-or-cmd fallback, while Linux pods retain the existing sh -c behavior and ?shell= override still wins.

The exec handler now fetches the Pod (and sometimes Node labels) when no explicit ?shell= is provided, and isShellNotFoundError is extended to recognize Windows hcs::System::CreateProcess-style “executable missing” errors so the frontend can show the correct remediation CTA. Unit tests are expanded to cover the new precedence rules, OS detection tiers, and Windows error patterns.

Reviewed by Cursor Bugbot for commit 03a1eb6. Bugbot is set up for automated code reviews on this repo. Configure here.

Detect Windows pods via a 3-tier check — pod.Spec.OS.Name (GA in 1.25)
first, then pod.Spec.NodeSelector kubernetes.io/os, then the scheduled
node's labels — and route Windows pods through cmd.exe instead of the
POSIX sh -c shell-detection script.

The Windows exec script prefers PowerShell when installed (`where
powershell` probe) and falls back to cmd.exe, so Nano Server images
(no PowerShell) keep working. ?shell= override still wins; the
--pod-shell-default fallback is documented as POSIX-only by contract.

Also recognize hcs::System::CreateProcess errors as shell-not-found so
the frontend's "Start debug container" CTA fires when cmd.exe itself is
missing or an operator-supplied ?shell= is wrong.

Comparison reference: k9s, Headlamp, and Freelens all handle Windows
exec, but only k9s falls through to the node's labels when the pod's
nodeSelector is absent — a common case since Windows pods are usually
scheduled via taints/admission rather than explicit selectors. None of
the three check Spec.OS.Name; we do.
@nadaverell nadaverell requested a review from hisco as a code owner May 19, 2026 11:18
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 94e6849. Configure here.

Comment thread internal/server/exec.go
- Strip ticket refs from new comments; drop oversized docstrings that
  duplicated identifier names or speculated about future flags.
- Remove the nil-pod defensive check in detectPodOS — caller guards on
  the pod fetch, so nil would be a bug worth panicking on.
- Log node-label lookup failures in detectPodOS tier 3; previously
  swallowed silently, leaving Windows users with no breadcrumb.
- Add empty-string nodeSelector value test case to guard against a
  refactor that drops the `v != ""` check in osFromLabels.
Same class of "defensive against can't-happen" as the nil-pod check
already removed. handlePodExec guards client != nil before any of this
runs, so nodeLabelsLookupFor never sees nil, so detectPodOS never sees
nil lookupNode.
Comment thread internal/server/exec.go
if overrideShell == "" && DefaultPodShellCommand == "" {
pod, err := client.CoreV1().Pods(namespace).Get(r.Context(), podName, metav1.GetOptions{})
if err != nil {
log.Printf("[exec] OS detection skipped for %s/%s (assuming Linux): %v", namespace, podName, err)
Comment thread internal/server/exec.go
if overrideShell == "" && DefaultPodShellCommand == "" {
pod, err := client.CoreV1().Pods(namespace).Get(r.Context(), podName, metav1.GetOptions{})
if err != nil {
log.Printf("[exec] OS detection skipped for %s/%s (assuming Linux): %v", namespace, podName, err)
The short-circuit at `overrideShell == "" && DefaultPodShellCommand == ""`
meant that operators using --pod-shell-default never ran OS detection,
so Windows pods fell through to `sh -c <fallback>` and hit the same
hcs::System::CreateProcess failure #687 is supposed to fix.

defaultExecCommand's documented precedence puts Windows AHEAD of the
POSIX-only fallback for exactly this reason; the caller just wasn't
giving it the podOS signal to act on. Drop DefaultPodShellCommand from
the short-circuit so detection runs whenever ?shell= isn't explicit.
- defaultExecCommand only branches on `podOS == "windows"`; testing
  "linux" and "" both cover the same fall-through path. Drop one.
- The case-normalization test exercised strings.ToLower on a value
  the apiserver doesn't emit (spec says lowercase only).
- Handler comment was repeating DefaultPodShellCommand's doc; just
  pin the contract the bug exposed.
@nadaverell nadaverell merged commit 287b5ca into main May 20, 2026
8 checks passed
@nadaverell nadaverell deleted the fix/windows-pod-exec-687 branch May 20, 2026 07:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cannot Access Terminal on Windows pods on mixed cluster with Linux and Windows nodes

2 participants