fix(exec): support Windows pods in mixed clusters (#687) by nadaverell · Pull Request #730 · skyhook-io/radar

nadaverell · 2026-05-19T11:18:33Z

Fixes #687.

Opening a terminal on a Windows pod fails today because Radar's exec handler always wraps the command in sh -c "<bash/ash/sh detection script>" — Windows containers have no POSIX shell, so the runtime errors out with hcs::System::CreateProcess: ... The system cannot find the file specified.

What changes

Detect Windows pods, three-tier:

pod.Spec.OS.Name — GA in K8s 1.25, the field explicitly designed for this
pod.Spec.NodeSelector["kubernetes.io/os"] (with beta.kubernetes.io/os fallback)
The scheduled node's labels — for pods that omit the selector and rely on default node-affinity

If RBAC denies get nodes, tier 3 is skipped and we default to the Linux path (matches pre-existing behavior).
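The three tiers above can be sketched roughly as follows. This is an illustrative reduction, not the code from this PR: `podInfo` and `nodeLabelsLookup` are hypothetical stand-ins for the corev1.Pod fields and the node lookup, and only the function names `detectPodOS` and `osFromLabels` are taken from the commit messages.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// podInfo is a hypothetical stand-in for the corev1.Pod fields used here.
type podInfo struct {
	OSName       string            // pod.Spec.OS.Name, "" when unset
	NodeSelector map[string]string // pod.Spec.NodeSelector
	NodeName     string            // pod.Spec.NodeName, "" when unscheduled
}

// nodeLabelsLookup returns a node's labels; it can fail, for example when
// RBAC denies "get nodes".
type nodeLabelsLookup func(nodeName string) (map[string]string, error)

// errForbidden simulates an RBAC denial in the examples below.
var errForbidden = errors.New("nodes is forbidden")

// osFromLabels prefers kubernetes.io/os and falls back to the deprecated
// beta label. Empty values count as absent.
func osFromLabels(labels map[string]string) string {
	for _, key := range []string{"kubernetes.io/os", "beta.kubernetes.io/os"} {
		if v := labels[key]; v != "" {
			return v
		}
	}
	return ""
}

// detectPodOS walks the tiers in order. It returns "" when no tier gives an
// answer, which the caller is assumed to treat as Linux.
func detectPodOS(pod podInfo, lookupNode nodeLabelsLookup) string {
	if pod.OSName != "" { // tier 1
		return pod.OSName
	}
	if v := osFromLabels(pod.NodeSelector); v != "" { // tier 2
		return v
	}
	if pod.NodeName == "" {
		return ""
	}
	labels, err := lookupNode(pod.NodeName) // tier 3
	if err != nil {
		log.Printf("node label lookup failed for %s: %v", pod.NodeName, err)
		return ""
	}
	return osFromLabels(labels)
}

func main() {
	denied := func(string) (map[string]string, error) { return nil, errForbidden }
	winNode := func(string) (map[string]string, error) {
		return map[string]string{"kubernetes.io/os": "windows"}, nil
	}
	fmt.Printf("%q\n", detectPodOS(podInfo{OSName: "windows"}, denied))
	fmt.Printf("%q\n", detectPodOS(podInfo{NodeName: "win-1"}, winNode))
	fmt.Printf("%q\n", detectPodOS(podInfo{NodeName: "win-1"}, denied))
}
```

In this sketch a denied node lookup yields an empty result rather than an error, so the caller stays on the Linux path as described above.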

Route Windows pods through cmd.exe:

cmd.exe /c "where powershell >/dev/null 2>&1 && powershell || cmd"

PowerShell is preferred when present, with cmd.exe as fallback so Nano Server images (no PowerShell installed) still work.

Preserve operator escape hatches: ?shell= override still wins on any pod. --pod-shell-default stays POSIX-only by contract (documented on the var) — Windows pods always use the built-in Windows script. The pod fetch + OS detection only happens when neither override is set, so Linux users see zero added work.
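A minimal sketch of that precedence, under stated assumptions: the function name `defaultExecCommand` and the `podOS == "windows"` branch come from the commit messages, but the signature, the way a `?shell=` override is turned into argv, and the POSIX placeholder script are all hypothetical. The Windows script is copied as quoted in this description.

```go
package main

import "fmt"

// windowsDefaultShellScript is the script as quoted in the PR description.
const windowsDefaultShellScript = `where powershell >/dev/null 2>&1 && powershell || cmd`

// posixDetectScript is a placeholder for Radar's bash/ash/sh detection
// script, whose real text is not shown in this PR description.
const posixDetectScript = `command -v bash >/dev/null 2>&1 && exec bash || exec sh`

// defaultExecCommand picks the exec argv in precedence order:
// explicit override, Windows built-in, POSIX-only fallback, POSIX default.
func defaultExecCommand(overrideShell, podOS, posixFallback string) []string {
	switch {
	case overrideShell != "":
		// Assumption: the override is run directly as a single command.
		return []string{overrideShell}
	case podOS == "windows":
		return []string{"cmd.exe", "/c", windowsDefaultShellScript}
	case posixFallback != "":
		return []string{"sh", "-c", posixFallback}
	default:
		return []string{"sh", "-c", posixDetectScript}
	}
}

func main() {
	fmt.Println(defaultExecCommand("pwsh", "windows", "zsh")) // override wins
	fmt.Println(defaultExecCommand("", "windows", "zsh"))     // fallback ignored on Windows
	fmt.Println(defaultExecCommand("", "linux", "zsh"))       // POSIX-only fallback
	fmt.Println(defaultExecCommand("", "", ""))               // built-in POSIX script
}
```

Placing the Windows branch ahead of the POSIX fallback is what keeps `--pod-shell-default` from being handed to a container that has no `sh`.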

Error classification: isShellNotFoundError now recognizes hcs::System::CreateProcess and the system cannot find the file so the frontend's "Start debug container" CTA fires if cmd.exe itself is missing or an operator-supplied ?shell= is wrong.
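As a rough illustration of that classification, the check can be a substring match over known runtime messages. The two Windows patterns are the ones named above; the POSIX patterns and the case-insensitive matching are assumptions made for this sketch, not confirmed details of Radar's `isShellNotFoundError`.

```go
package main

import (
	"fmt"
	"strings"
)

// shellNotFoundPatterns are matched against the lowercased error text.
var shellNotFoundPatterns = []string{
	// Assumed POSIX-side patterns, for illustration only.
	"executable file not found",
	"no such file or directory",
	// Windows patterns named in this PR.
	"hcs::system::createprocess",
	"the system cannot find the file",
}

// isShellNotFoundError reports whether msg looks like a missing-executable
// error from the container runtime.
func isShellNotFoundError(msg string) bool {
	lower := strings.ToLower(msg)
	for _, p := range shellNotFoundPatterns {
		if strings.Contains(lower, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isShellNotFoundError("hcs::System::CreateProcess: The system cannot find the file specified."))
	fmt.Println(isShellNotFoundError("hcs::System::CreateProcess: Das System kann die angegebene Datei nicht finden."))
	fmt.Println(isShellNotFoundError("connection refused"))
}
```

Matching on the `hcs::System::CreateProcess` prefix is what lets a localized Windows message still classify correctly, since only the trailing sentence is translated.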

Comparison with competing tools

| Tool | Detection | Shell |
| --- | --- | --- |
| k9s | nodeSelector → node label | `cmd /c "where powershell ... && powershell` |
| Headlamp | nodeSelector only | tries `[powershell.exe, cmd.exe]` client-side |
| Freelens | nodeSelector only | `powershell` hardcoded, no fallback |
| Radar (this PR) | `pod.Spec.OS.Name` → nodeSelector → node label | same as k9s |

We're the only one that checks pod.Spec.OS.Name first — the authoritative signal. The node-label tier matters because most Windows-bound pods don't carry an explicit nodeSelector (they rely on taints + admission); Headlamp and Freelens miss this and fail open.

Testing

13 new unit test cases in internal/server/exec_test.go:

TestDefaultExecCommand — 4 new Windows cases (auto-detect, fallback-ignored, override-still-wins on Windows, explicit linux)
TestWindowsDefaultShellScript — tripwire pinning the script + behavioral asserts on the where powershell probe and || cmd fallback
TestDetectPodOS — 10 cases covering all three tiers, label preference, node-fetch RBAC failure, nil pod
TestIsShellNotFoundError — extended with the verbatim error from #687 ("Cannot Access Terminal on Windows pods on mixed cluster with Linux and Windows nodes") + a non-English-locale variant

Verification I can't do: I don't have a Windows-node cluster. The shell argv matches what k9s has been running in production for years, but a real end-to-end test on a windowsServerContainers pod would be valuable. If a maintainer can test or the issue reporter is willing, that would close the loop.

Note

Medium Risk
Adds OS-detection and Windows-specific exec command routing, including new Kubernetes API calls to fetch pods/nodes; behavior changes could impact terminal exec in clusters with unusual RBAC or pod scheduling setups.

Overview
Exec sessions now detect the target pod’s OS (via pod.spec.os.name, nodeSelector labels, or scheduled node labels) and choose an appropriate shell command: Windows pods run cmd.exe /c with a PowerShell-or-cmd fallback, while Linux pods retain the existing sh -c behavior and ?shell= override still wins.

The exec handler now fetches the Pod (and sometimes Node labels) when no explicit ?shell= is provided, and isShellNotFoundError is extended to recognize Windows hcs::System::CreateProcess-style “executable missing” errors so the frontend can show the correct remediation CTA. Unit tests are expanded to cover the new precedence rules, OS detection tiers, and Windows error patterns.

Reviewed by Cursor Bugbot for commit 03a1eb6. Bugbot is set up for automated code reviews on this repo.

Detect Windows pods via a 3-tier check: pod.Spec.OS.Name (GA in 1.25) first, then pod.Spec.NodeSelector kubernetes.io/os, then the scheduled node's labels. Route Windows pods through cmd.exe instead of the POSIX sh -c shell-detection script.

The Windows exec script prefers PowerShell when installed (`where powershell` probe) and falls back to cmd.exe, so Nano Server images (no PowerShell) keep working. ?shell= override still wins; the --pod-shell-default fallback is documented as POSIX-only by contract.

Also recognize hcs::System::CreateProcess errors as shell-not-found so the frontend's "Start debug container" CTA fires when cmd.exe itself is missing or an operator-supplied ?shell= is wrong.

Comparison reference: k9s, Headlamp, and Freelens all handle Windows exec, but only k9s falls through to the node's labels when the pod's nodeSelector is absent, a common case since Windows pods are usually scheduled via taints/admission rather than explicit selectors. None of the three check Spec.OS.Name; we do.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 94e6849.

- Strip ticket refs from new comments; drop oversized docstrings that duplicated identifier names or speculated about future flags.
- Remove the nil-pod defensive check in detectPodOS: the caller guards on the pod fetch, so nil would be a bug worth panicking on.
- Log node-label lookup failures in detectPodOS tier 3; previously swallowed silently, leaving Windows users with no breadcrumb.
- Add empty-string nodeSelector value test case to guard against a refactor that drops the `v != ""` check in osFromLabels.

Same class of "defensive against can't-happen" as the nil-pod check already removed. handlePodExec guards client != nil before any of this runs, so nodeLabelsLookupFor never sees nil, so detectPodOS never sees nil lookupNode.

+	if overrideShell == "" && DefaultPodShellCommand == "" {
+		pod, err := client.CoreV1().Pods(namespace).Get(r.Context(), podName, metav1.GetOptions{})
+		if err != nil {
+			log.Printf("[exec] OS detection skipped for %s/%s (assuming Linux): %v", namespace, podName, err)



The short-circuit at `overrideShell == "" && DefaultPodShellCommand == ""` meant that operators using --pod-shell-default never ran OS detection, so Windows pods fell through to `sh -c <fallback>` and hit the same hcs::System::CreateProcess failure #687 is supposed to fix. defaultExecCommand's documented precedence puts Windows AHEAD of the POSIX-only fallback for exactly this reason; the caller just wasn't giving it the podOS signal to act on. Drop DefaultPodShellCommand from the short-circuit so detection runs whenever ?shell= isn't explicit.
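The effect of dropping DefaultPodShellCommand from the short-circuit can be shown with two small predicates. This is a sketch only: `gateBefore` and `gateAfter` are hypothetical names for the condition that decides whether the handler fetches the pod and runs OS detection, and `zsh` is just an example value for `--pod-shell-default`.

```go
package main

import "fmt"

// gateBefore reproduces the original condition: detection ran only when
// neither ?shell= nor --pod-shell-default was set.
func gateBefore(overrideShell, defaultPodShell string) bool {
	return overrideShell == "" && defaultPodShell == ""
}

// gateAfter reflects the fix: detection runs whenever ?shell= is not explicit.
func gateAfter(overrideShell string) bool {
	return overrideShell == ""
}

func main() {
	// Operator started Radar with --pod-shell-default set, no ?shell= given.
	overrideShell, defaultPodShell := "", "zsh"
	fmt.Println("before:", gateBefore(overrideShell, defaultPodShell)) // detection skipped
	fmt.Println("after: ", gateAfter(overrideShell))                   // detection runs
}
```

With the old condition, a Windows pod in that configuration never received a podOS signal and fell through to `sh -c`.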

- defaultExecCommand only branches on `podOS == "windows"`; testing "linux" and "" both cover the same fall-through path. Drop one.
- The case-normalization test exercised strings.ToLower on a value the apiserver doesn't emit (spec says lowercase only).
- Handler comment was repeating DefaultPodShellCommand's doc; just pin the contract the bug exposed.

nadaverell requested a review from hisco as a code owner May 19, 2026 11:18

cursor Bot reviewed May 19, 2026


Comment thread internal/server/exec.go

nadaverell added 2 commits May 19, 2026 14:25

exec: drop defensive nil checks for client/lookupNode

52606f4


github-advanced-security AI found potential problems May 19, 2026


nadaverell added 2 commits May 19, 2026 14:57

nadaverell merged commit 287b5ca into main May 20, 2026
8 checks passed

nadaverell deleted the fix/windows-pod-exec-687 branch May 20, 2026 07:05

nadaverell merged 5 commits into main from fix/windows-pod-exec-687
