
RKE2: Still connecting to unix://C:\\csi\\csi.sock on a Hybrid cluster #2578

Closed · sonergzn opened this issue Sep 29, 2023 · 2 comments

@sonergzn

sonergzn commented Sep 29, 2023

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
I have a hybrid RKE2 cluster running on my vSphere platform: 1 Windows Server 2022 Datacenter worker node, 1 Ubuntu 22.04 master node, and 1 Ubuntu 22.04 worker node, so 3 nodes in total.

I have installed the vSphere CSI driver on the Rancher Kubernetes cluster. The vSphere CPI seems to be running fine.
The vSphere CSI driver is also running fine on the Linux node, but on my Windows node I am getting an error that I can't figure out.

The csi-proxy on the Windows node is also running fine (as a service).
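For reference, this is roughly how I verified csi-proxy (run in PowerShell on the Windows node; the service name and the named pipes depend on how csi-proxy was installed and on its version, so the filters below are just a loose match):

Get-Service | Where-Object { $_.Name -like "*csi*" }
[System.IO.Directory]::GetFiles("\\.\pipe\") | Where-Object { $_ -match "csi-proxy" }

The service shows as Running on my node.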

According to the pod logs, the error I am getting is:

[node-driver-registrar container]

"9636 connection.go:173] Still connecting to unix://C:\csi\csi.sock"
"9636 connection.go:173] Still connecting to unix://C:\csi\csi.sock"

[liveness-probe container]

2023-09-29T09:13:00.949026600+01:00 W0929 09:13:00.949026 3304 connection.go:173] Still connecting to unix:///csi/csi.sock
2023-09-29T09:13:10.948610400+01:00 W0929 09:13:10.948515 3304 connection.go:173] Still connecting to unix:///csi/csi.sock
2023-09-29T09:13:20.948573600+01:00 W0929 09:13:20.948463 3304 connection.go:173] Still connecting to unix:///csi/csi.sock
W0929 09:13:30.947937 3304 connection.go:173] Still connecting to unix:///csi/csi.sock

[vsphere-csi-node container]

{"level":"error","time":"2023-09-29T09:12:21.5234113+01:00","caller":"k8sorchestrator/k8sorchestrator.go:399","msg":"failed to fetch configmap internal-feature-states.csi.vsphere.vmware.com from namespace kube-system. Error: Get "https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/internal-feature-states.csi.vsphere.vmware.com\": dial tcp 10.43.0.1:443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.","TraceId":"72b85973-1ec2-47ef-9300-ce49b8c07cb3","TraceId":"bdd838a3-6e12-4204-b107-74abb3f446ca","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common/commonco/k8sorchestrator.initFSS\n\t/build/pkg/csi/service/common/commonco/k8sorchestrator/k8sorchestrator.go:399\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common/commonco/k8sorchestrator.Newk8sOrchestrator\n\t/build/pkg/csi/service/common/commonco/k8sorchestrator/k8sorchestrator.go:272\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common/commonco.GetContainerOrchestratorInterface\n\t/build/pkg/csi/service/common/commonco/coagnostic.go:93\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:119\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
2023-09-29T09:12:21.523411300+01:00 {"level":"error","time":"2023-09-29T09:12:21.5234113+01:00","caller":"k8sorchestrator/k8sorchestrator.go:274","msg":"Failed to initialize the orchestrator. Error: Get "https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/internal-feature-states.csi.vsphere.vmware.com\": dial tcp 10.43.0.1:443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.","TraceId":"72b85973-1ec2-47ef-9300-ce49b8c07cb3","TraceId":"bdd838a3-6e12-4204-b107-74abb3f446ca","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common/commonco/k8sorchestrator.Newk8sOrchestrator\n\t/build/pkg/csi/service/common/commonco/k8sorchestrator/k8sorchestrator.go:274\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common/commonco.GetContainerOrchestratorInterface\n\t/build/pkg/csi/service/common/commonco/coagnostic.go:93\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:119\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
2023-09-29T09:12:21.523411300+01:00 {"level":"error","time":"2023-09-29T09:12:21.5234113+01:00","caller":"commonco/coagnostic.go:95","msg":"creating k8sOrchestratorInstance failed. Err: Get "https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/internal-feature-states.csi.vsphere.vmware.com\": dial tcp 10.43.0.1:443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.","TraceId":"72b85973-1ec2-47ef-9300-ce49b8c07cb3","TraceId":"bdd838a3-6e12-4204-b107-74abb3f446ca","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common/commonco.GetContainerOrchestratorInterface\n\t/build/pkg/csi/service/common/commonco/coagnostic.go:95\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:119\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
2023-09-29T09:12:21.523411300+01:00 {"level":"error","time":"2023-09-29T09:12:21.5234113+01:00","caller":"service/driver.go:122","msg":"Failed to create CO agnostic interface. Error: Get "https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/internal-feature-states.csi.vsphere.vmware.com\": dial tcp 10.43.0.1:443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.","TraceId":"72b85973-1ec2-47ef-9300-ce49b8c07cb3","TraceId":"bdd838a3-6e12-4204-b107-74abb3f446ca","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:122\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
2023-09-29T09:12:21.523411300+01:00 {"level":"info","time":"2023-09-29T09:12:21.5234113+01:00","caller":"service/driver.go:109","msg":"Configured: "csi.vsphere.vmware.com" with clusterFlavor: "VANILLA" and mode: ""","TraceId":"72b85973-1ec2-47ef-9300-ce49b8c07cb3","TraceId":"bdd838a3-6e12-4204-b107-74abb3f446ca"}
2023-09-29T09:12:21.523928100+01:00 {"level":"error","time":"2023-09-29T09:12:21.5234113+01:00","caller":"service/driver.go:203","msg":"failed to run the driver. Err: +Get "https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/internal-feature-states.csi.vsphere.vmware.com\": dial tcp 10.43.0.1:443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.","TraceId":"72b85973-1ec2-47ef-9300-ce49b8c07cb3","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:203\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}

What you expected to happen:
I would expect it to just run fine after following the official VMware docs.

How to reproduce it (as minimally and precisely as possible):
Not sure, but get an RKE2 cluster up and running with at least 1 Windows worker node (OS: Windows Server 2022 Datacenter),
and try to install the vSphere CPI and CSI following the official docs.

Anything else we need to know?:
I have really searched for this error on the internet but couldn't find anything.
I've tried various approaches to get this working, but with no success.
As far as I know, the csi.sock should be created during deployment. It is working great on the Linux node but fails on the Windows node.
Not sure if I am missing something related to Windows-specific configuration.
The VM name and the hostname of the Windows node are the same.

It is possible that I am missing something, but I just wanted to mention the issue here in case some expert has an idea or advice.
PS: During the vSphere CSI installation, I checked the "enable windows support" flag.
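
One more observation on the "Still connecting to unix://C:\csi\csi.sock" message: as I understand it, the node-driver-registrar just waits for the socket that the vsphere-csi-node container is supposed to create, so that warning is most likely a symptom of the vsphere-csi-node container failing at startup (the API server errors above) rather than a separate problem. This is roughly how I have been inspecting the Windows pod (the kube-system namespace and the placeholder pod/node names are assumptions; adjust to wherever the vSphere CSI DaemonSet is deployed):

kubectl -n kube-system get pods -o wide --field-selector spec.nodeName=<windows-node-name>
kubectl -n kube-system describe pod <windows-csi-node-pod>
kubectl -n kube-system logs <windows-csi-node-pod> -c vsphere-csi-node --previous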

Environment:

  • RKE version: 2.7.6
  • csi-vsphere version: 3.0.1
  • vsphere-cloud-controller-manager version: 1.24.5
  • Kubernetes version: 1.25.13
  • vSphere version: 7.0.3
  • OS (e.g. from /etc/os-release): 2× Ubuntu 22.04 and 1× Windows Server 2022 Datacenter
@divyenpatel
Member

It seems the node DaemonSet pods are not able to connect to the API server to fetch the feature state config.

@sonergzn
Author

sonergzn commented Oct 3, 2023

@divyenpatel The Linux master/worker nodes and the Windows worker node are all in the same network. It is working fine on the Linux nodes, without any errors; it is only failing on the Windows worker node.

  • I have disabled the Windows firewall on the Windows node (a quick reachability check from that node is sketched below).
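
As a rough check that the cluster service VIP from the error (10.43.0.1:443) is reachable at all, in PowerShell on the Windows node (note: this VIP is programmed by kube-proxy, so depending on the CNI it may only be reachable from inside a pod's network namespace rather than from the host itself):

Test-NetConnection -ComputerName 10.43.0.1 -Port 443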

Also: CSI requires CPI to be installed, and CPI initializes all the nodes.
However, initialization of the Windows node does not look to have completed successfully. Here are the logs from the CPI pod:

E1003 10:14:20.475089 1 node_controller.go:229] error syncing : failed to get provider ID for node at cloudprovider: failed to get instance ID from cloud provider: VM GuestNicInfo is empty, requeuing

Not sure if this is the root cause.
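
The "VM GuestNicInfo is empty" part suggests that vCenter is not reporting any guest NIC information for the Windows VM, which normally comes from VMware Tools running inside the guest. A rough way to check what vCenter actually sees for that VM, assuming the govc CLI is available and configured against the vCenter (the VM name below is a placeholder):

govc vm.info -json <windows-vm-name>
govc vm.ip <windows-vm-name>

If the Guest.Net section of the JSON is empty, or vm.ip returns nothing, VMware Tools on the Windows node is probably not reporting its NICs, which would line up with the CPI error.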

@sonergzn closed this as not planned on Jan 26, 2024.