Windows kubelet won't start "Failed to create listener for podResources endpoint" #78628

Closed
PatrickLang opened this issue Jun 2, 2019 · 7 comments · Fixed by #78704

@PatrickLang
Contributor

commented Jun 2, 2019

What happened:

I replaced the kubelet on one of my 1.14 nodes with a build from master, and now I'm getting this error starting up.

Test runs since the 5/31/2019 commit b7fa33e have been failing on testgrid. The logs show that the Windows nodes are not coming up: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-aks-engine-azure-master-windows/1134583855047512068

F0602 02:35:46.663366   12224 server.go:179] Failed to create listener for podResources endpoint: parse npipe://%5Cvar%5Clib%5Ckubelet%5Cpod-resources//./pipe/kubelet: invalid URL escape "%5C"
goroutine 156 [running]:
k8s.io/kubernetes/vendor/k8s.io/klog.stacks(0x72b9d01, 0x3, 0xc00041c280, 0xc1)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/klog.go:900 +0xb8
k8s.io/kubernetes/vendor/k8s.io/klog.(*loggingT).output(0x72b9d80, 0xc000000003, 0xc0003b6b60, 0x71c6d2d, 0x9, 0xb3, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/klog.go:815 +0xed
k8s.io/kubernetes/vendor/k8s.io/klog.(*loggingT).printf(0x72b9d80, 0x3, 0x40f4965, 0x37, 0xc000a4bea0, 0x1, 0x1)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/klog.go:727 +0x155
k8s.io/kubernetes/vendor/k8s.io/klog.Fatalf(...)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/klog.go:1309
k8s.io/kubernetes/pkg/kubelet/server.ListenAndServePodResources(0xc0002a9740, 0x3e, 0x9a1abb0, 0xc000289300, 0x9a1abd0, 0xc00078d320)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/server/server.go:179 +0x175
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).ListenAndServePodResources(0xc0002c0900)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet.go:2216 +0x220
created by k8s.io/kubernetes/cmd/kubelet/app.startKubelet
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubelet/app/server.go:1103 +0x123

What you expected to happen:
No errors starting up kubelet

How to reproduce it (as minimally and precisely as possible):
Build master, then try to run the kubelet on Windows

Anything else we need to know?:

/sig windows

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration: Azure
  • OS (e.g: cat /etc/os-release): Windows Server 2019
  • Kernel (e.g. uname -a): 10.0.17763.529
  • Install tools: aks-engine
@PatrickLang

Contributor Author

commented Jun 2, 2019

cc @adelina-t
/priority critical-urgent

@PatrickLang

Contributor Author

commented Jun 2, 2019

/milestone v1.15

Release team: it looks like a recent PR broke Windows nodes. We will need a fix here for 1.15.

@adelina-t

Contributor

commented Jun 3, 2019

Ok, so the problem here is a little more complicated: we're seeing a bug that has been in the code for quite some time, but we never exercised that particular code path.

But first:
TL;DR: Quick fix is to start the kubelet on Windows with: --feature-gates="KubeletPodResources=false"

Now, as to why this is happening:

  • PR [1] was merged 3 days ago and graduates the KubeletPodResources feature to Beta. That means the kubelet enables it by default.

  • With that feature enabled, this branch of code is entered when starting the kubelet [2]. It calls ListenAndServePodResources, defined here [3].

  • While innocent enough, ListenAndServePodResources gets the necessary npipe endpoint by calling util.LocalEndpoint(kl.getPodResourcesDir(), podresources.Socket). LocalEndpoint is defined here [4] for Windows. The problem is that url.URL can't handle Windows-style paths [5] and replaces "\" with "%5C", as is to be expected. The path is transformed along the way from the default "/var/lib/kubelet" to "\var\lib\kubelet" by kl.getPodResourcesDir(), defined here [6], specifically by its use of filepath.Join to build the path. Its inputs are Unix-style ("/path/to/something"), but on Windows filepath.Join returns a Windows-style path ("\path\to\something\else"), as is to be expected.

  • The solution is to replace the usage of url.URL in LocalEndpoint with some alternative package, or to implement the endpoint formatting ourselves.
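
The escaping described above can be reproduced outside the kubelet. The following is only a rough sketch: buildEndpoint is a hypothetical stand-in whose shape is inferred from the endpoint string in the fatal log line, not copied from the Kubernetes source, and the Windows-style directory is hard-coded because filepath.Join only produces backslashes when built for Windows.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildEndpoint is a hypothetical stand-in for LocalEndpoint on Windows.
// Assumption: it formats the directory through url.URL and then appends
// the pipe suffix, which matches the string seen in the log.
func buildEndpoint(dir, file string) string {
	u := url.URL{Scheme: "npipe", Path: dir}
	// String() percent-encodes characters that are not allowed in a URL
	// path, so every backslash becomes %5C.
	return u.String() + "//./pipe/" + file
}

// parseError returns the url.Parse error text for endpoint,
// or "" when the endpoint parses cleanly.
func parseError(endpoint string) string {
	if _, err := url.Parse(endpoint); err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	// Hard-coded result of joining the default paths on Windows.
	dir := `\var\lib\kubelet\pod-resources`

	ep := buildEndpoint(dir, "kubelet")
	fmt.Println("endpoint:", ep)

	// The escaped directory lands in the host position of the URL,
	// where a %5C escape is rejected by the parser.
	fmt.Println("parse error:", parseError(ep))
}
```

Running this should show the same npipe://%5Cvar%5Clib... endpoint as the kubelet log, followed by a parse error mentioning the invalid "%5C" escape. The exact wording of the error differs slightly between Go versions.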

[1] #77274
[2] https://github.com/kubernetes/kubernetes/blob/master/cmd/kubelet/app/server.go#L1102-L1104
[3]

func (kl *Kubelet) ListenAndServePodResources() {
	server.ListenAndServePodResources(util.LocalEndpoint(kl.getPodResourcesDir(), podresources.Socket), kl.podManager, kl.containerManager)
}

[4] https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/util/util_windows.go#L111-L117
[5] https://play.golang.org/p/a_hH99LP6O9
[6] https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_getters.go#L158-L160

@ddebroy

Member

commented Jun 3, 2019

Thanks @adelina-t for the analysis! I was a bit concerned that this started showing up around my commit b7fa33e, but the root cause here is indeed, as you said, the graduation of the KubeletPodResources feature to Beta and the way the invalid escape sequences are introduced.

@yujuhong

Member

commented Jun 3, 2019

PatrickLang added a commit to PatrickLang/kubernetes that referenced this issue Jun 3, 2019
@PatrickLang

Contributor Author

commented Jun 3, 2019

I'm trying a quick s/\\/\// to see if anything else is broken. There are other functions in that same file doing that substitution already so it may make sense to use that as a workaround until a better solution is available.
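
For illustration, the substitution could look roughly like the sketch below. The helper names toSlashes, endpointFor and parses are hypothetical, and the endpoint layout is again an assumption inferred from the log line rather than the actual LocalEndpoint code. It only shows that the backslash replacement gets past the URL parse error. It makes no claim that the resulting pipe path is one Windows can open.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// toSlashes applies the s/\\/\// substitution: every backslash
// becomes a forward slash.
func toSlashes(p string) string {
	return strings.ReplaceAll(p, `\`, "/")
}

// endpointFor is a hypothetical endpoint builder (assumed shape) that
// normalizes the directory before handing it to url.URL.
func endpointFor(dir, file string) string {
	u := url.URL{Scheme: "npipe", Path: toSlashes(dir)}
	return u.String() + "//./pipe/" + file
}

// parses reports whether url.Parse accepts the endpoint.
func parses(endpoint string) bool {
	_, err := url.Parse(endpoint)
	return err == nil
}

func main() {
	ep := endpointFor(`\var\lib\kubelet\pod-resources`, "kubelet")
	fmt.Println("endpoint:", ep)
	fmt.Println("parses:", parses(ep))
}
```

With forward slashes nothing in the directory needs percent-encoding, so the parser no longer sees a %5C escape in the host position.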

yujuhong added a commit to yujuhong/test-infra that referenced this issue Jun 3, 2019
@PatrickLang

Contributor Author

commented Jun 3, 2019

Still working on it - looks like the path to create the directories is also busted.

With 91a29c0 I'm still getting:

F0603 19:57:15.224243 285656 server.go:181] Failed to create listener for podResources endpoint: open //./var/lib/kubelet/pod-resources//./pipe/kubelet: The system cannot find the path specified.

PatrickLang added a commit to PatrickLang/kubernetes that referenced this issue Jun 3, 2019
PatrickLang added a commit to PatrickLang/kubernetes that referenced this issue Jun 3, 2019
benmoss added a commit to benmoss/kubo-deployment that referenced this issue Jun 3, 2019
PatrickLang added a commit to PatrickLang/kubernetes that referenced this issue Jun 3, 2019
mtaufen added a commit to mtaufen/kubernetes that referenced this issue Jun 3, 2019
Disable KubeletPodResources on Windows
The feature caused tests to fail when it was enabled.

- kubernetes#78628

Work is in progress to fix the feature, but until that work is complete,
we will disable it in the GCE scripts.
chases2 added a commit to chases2/test-infra that referenced this issue Jun 4, 2019

@PatrickLang PatrickLang added this to In Progress+Review in SIG-Windows Jun 4, 2019

SIG-Windows automation moved this from In Progress+Review to Done (v1.15) Jun 4, 2019

mirandachrist added a commit to mirandachrist/test-infra that referenced this issue Jun 17, 2019