fix(exec): support Windows pods in mixed clusters (#687)#730
Merged
Conversation
Detect Windows pods via a 3-tier check — pod.Spec.OS.Name (GA in 1.25) first, then pod.Spec.NodeSelector kubernetes.io/os, then the scheduled node's labels — and route Windows pods through cmd.exe instead of the POSIX sh -c shell-detection script. The Windows exec script prefers PowerShell when installed (`where powershell` probe) and falls back to cmd.exe, so Nano Server images (no PowerShell) keep working. ?shell= override still wins; the --pod-shell-default fallback is documented as POSIX-only by contract. Also recognize hcs::System::CreateProcess errors as shell-not-found so the frontend's "Start debug container" CTA fires when cmd.exe itself is missing or an operator-supplied ?shell= is wrong. Comparison reference: k9s, Headlamp, and Freelens all handle Windows exec, but only k9s falls through to the node's labels when the pod's nodeSelector is absent — a common case since Windows pods are usually scheduled via taints/admission rather than explicit selectors. None of the three check Spec.OS.Name; we do.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 94e6849. Configure here.
- Strip ticket refs from new comments; drop oversized docstrings that duplicated identifier names or speculated about future flags. - Remove the nil-pod defensive check in detectPodOS — caller guards on the pod fetch, so nil would be a bug worth panicking on. - Log node-label lookup failures in detectPodOS tier 3; previously swallowed silently, leaving Windows users with no breadcrumb. - Add empty-string nodeSelector value test case to guard against a refactor that drops the `v != ""` check in osFromLabels.
Same class of "defensive against can't-happen" as the nil-pod check already removed. handlePodExec guards client != nil before any of this runs, so nodeLabelsLookupFor never sees nil, so detectPodOS never sees nil lookupNode.
| if overrideShell == "" && DefaultPodShellCommand == "" { | ||
| pod, err := client.CoreV1().Pods(namespace).Get(r.Context(), podName, metav1.GetOptions{}) | ||
| if err != nil { | ||
| log.Printf("[exec] OS detection skipped for %s/%s (assuming Linux): %v", namespace, podName, err) |
| if overrideShell == "" && DefaultPodShellCommand == "" { | ||
| pod, err := client.CoreV1().Pods(namespace).Get(r.Context(), podName, metav1.GetOptions{}) | ||
| if err != nil { | ||
| log.Printf("[exec] OS detection skipped for %s/%s (assuming Linux): %v", namespace, podName, err) |
The short-circuit at `overrideShell == "" && DefaultPodShellCommand == ""` meant that operators using --pod-shell-default never ran OS detection, so Windows pods fell through to `sh -c <fallback>` and hit the same hcs::System::CreateProcess failure #687 is supposed to fix. defaultExecCommand's documented precedence puts Windows AHEAD of the POSIX-only fallback for exactly this reason; the caller just wasn't giving it the podOS signal to act on. Drop DefaultPodShellCommand from the short-circuit so detection runs whenever ?shell= isn't explicit.
- defaultExecCommand only branches on `podOS == "windows"`; testing "linux" and "" both cover the same fall-through path. Drop one. - The case-normalization test exercised strings.ToLower on a value the apiserver doesn't emit (spec says lowercase only). - Handler comment was repeating DefaultPodShellCommand's doc; just pin the contract the bug exposed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Fixes #687.
Opening a terminal on a Windows pod fails today because Radar's exec handler always wraps the command in
sh -c "<bash/ash/sh detection script>"— Windows containers have no POSIX shell, so the runtime errors out withhcs::System::CreateProcess: ... The system cannot find the file specified.What changes
Detect Windows pods, three-tier:
pod.Spec.OS.Name— GA in K8s 1.25, the field explicitly designed for thispod.Spec.NodeSelector["kubernetes.io/os"](withbeta.kubernetes.io/osfallback)If RBAC denies
get nodes, tier 3 is skipped and we default to the Linux path (matches pre-existing behavior).Route Windows pods through cmd.exe:
PowerShell is preferred when present, with
cmd.exeas fallback so Nano Server images (no PowerShell installed) still work.Preserve operator escape hatches:
?shell=override still wins on any pod.--pod-shell-defaultstays POSIX-only by contract (documented on the var) — Windows pods always use the built-in Windows script. The pod fetch + OS detection only happens when neither override is set, so Linux users see zero added work.Error classification:
isShellNotFoundErrornow recognizeshcs::System::CreateProcessandthe system cannot find the fileso the frontend's "Start debug container" CTA fires ifcmd.exeitself is missing or an operator-supplied?shell=is wrong.Comparison with competing tools
[powershell.exe, cmd.exe]client-sidepowershellhardcoded, no fallbackpod.Spec.OS.Name→ nodeSelector → node labelWe're the only one that checks
pod.Spec.OS.Namefirst — the authoritative signal. The node-label tier matters because most Windows-bound pods don't carry an explicit nodeSelector (they rely on taints + admission); Headlamp and Freelens miss this and fail open.Testing
13 new unit test cases in
internal/server/exec_test.go:TestDefaultExecCommand— 4 new Windows cases (auto-detect, fallback-ignored, override-still-wins on Windows, explicit linux)TestWindowsDefaultShellScript— tripwire pinning the script + behavioral asserts on thewhere powershellprobe and|| cmdfallbackTestDetectPodOS— 10 cases covering all three tiers, label preference, node-fetch RBAC failure, nil podTestIsShellNotFoundError— extended with the verbatim error from Cannot Access Terminal on Windows pods on mixed cluster with Linux and Windows nodes #687 + a non-English-locale variantVerification I can't do: I don't have a Windows-node cluster. The shell argv matches what k9s has been running in production for years, but a real end-to-end test on a
windowsServerContainerspod would be valuable. If a maintainer can test or the issue reporter is willing, that would close the loop.Note
Medium Risk
Adds OS-detection and Windows-specific exec command routing, including new Kubernetes API calls to fetch pods/nodes; behavior changes could impact terminal exec in clusters with unusual RBAC or pod scheduling setups.
Overview
Exec sessions now detect the target pod’s OS (via
pod.spec.os.name,nodeSelectorlabels, or scheduled node labels) and choose an appropriate shell command: Windows pods runcmd.exe /cwith a PowerShell-or-cmd fallback, while Linux pods retain the existingsh -cbehavior and?shell=override still wins.The exec handler now fetches the Pod (and sometimes Node labels) when no explicit
?shell=is provided, andisShellNotFoundErroris extended to recognize Windowshcs::System::CreateProcess-style “executable missing” errors so the frontend can show the correct remediation CTA. Unit tests are expanded to cover the new precedence rules, OS detection tiers, and Windows error patterns.Reviewed by Cursor Bugbot for commit 03a1eb6. Bugbot is set up for automated code reviews on this repo. Configure here.