Still connecting to unix:///... #139

Closed
eBeyond opened this issue Aug 7, 2020 · 40 comments

@eBeyond

eBeyond commented Aug 7, 2020

The hcloud-csi nodes and the controller are in crash loops after installing hcloud-csi as described.

The hcloud-csi-driver reports:
level=debug ts=2020-08-07T05:42:01.694472148Z msg="getting instance id from metadata service"

All other pods of the hcloud-csi-controller are reporting:
Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
The pods of the hcloud-csi-nodes are reporting (except the hcloud-csi-driver, which reports the same message as above):
Still connecting to unix:///csi/csi.sock

It's a k8s 1.18.6 cluster running in nbg-dc3 on private networks only. None of the nodes have public network interfaces, but there is a firewall/NAT gateway with everything allowed. IPv6 is disabled on all nodes.

@eBeyond
Author

eBeyond commented Aug 7, 2020

It seems that it's not possible to get an instance ID. The driver calls http://169.254.169.254/2009-04-04/meta-data/instance-id (hardcoded in the code), but that address is not reachable from the k8s nodes within the private network; it's only reachable from the nodes which have a public interface.
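
A quick way to verify this (a sketch; the URL is the one hardcoded in the driver, as mentioned above) is to query the metadata service directly from a private-only node and from a node with a public interface:

# should print the numeric server ID of the node it is run on
curl -s http://169.254.169.254/2009-04-04/meta-data/instance-id
# a timeout means the metadata service is unreachable from this node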

@jekakm

jekakm commented Aug 7, 2020

#131
Take a look at this. If requests are NATed, the metadata server can't recognize which node is making the request and can't mount the volume to the appropriate server.
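
A way to check for this (a sketch, not an official procedure): run the same metadata query on two different nodes; if the traffic is NATed through a single gateway, both will likely return the gateway server's ID instead of their own.

# run on two different nodes and compare the output
curl -s http://169.254.169.254/2009-04-04/meta-data/instance-id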

@eBeyond
Author

eBeyond commented Aug 7, 2020

Okay... that seems to be a way, but I'm not happy with it. Opening a door always makes it accessible from both sides. But it's just a small one... :(

@LKaemmerling
Member

@eBeyond try the current master deployment :) There is a small change in there which first uses the node name for getting the data: https://github.com/hetznercloud/csi-driver/blob/master/deploy/kubernetes/hcloud-csi-master.yml

@eBeyond
Author

eBeyond commented Aug 7, 2020

I've tried the hcloud-csi-master, but it didn't succeed. Now I've tried it with the route and it works as expected. Thanks!

@eBeyond eBeyond closed this as completed Aug 7, 2020
@LKaemmerling
Member

LKaemmerling commented Aug 7, 2020

Actually it should work. Do the nodes in k8s have the same name as in the Cloud Console? Did you apply the new deployment?
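
A quick way to compare the two (a sketch; assumes the hcloud CLI is installed and configured for the project):

# node names as seen by Kubernetes
kubectl get nodes -o name
# server names as seen by the Hetzner Cloud API / Cloud Console
hcloud server list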

@JWDobken

JWDobken commented Aug 18, 2020

I have Kubernetes 1.18.0 running on CX21 instances with Ubuntu 18.04. Following the quick guide, I created the secret and deployed the latest hcloud-csi-master.yml. Deployment of the containers does not give any issue, but according to the logs, the pod hcloud-csi-node-h6cjw running on the master node keeps trying to connect to the socket:

│ csi-node-driver-registrar I0818 08:31:03.924615       1 main.go:120] Attempting to open a gRPC connection with: "/csi/csi.sock"                                                │
│ csi-node-driver-registrar I0818 08:31:03.924646       1 connection.go:151] Connecting to unix:///csi/csi.sock                                                                  │
│ csi-node-driver-registrar W0818 08:31:13.925445       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                            │
│ csi-node-driver-registrar W0818 08:31:23.925405       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                            │
│ csi-node-driver-registrar W0818 08:31:33.924873       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                            │
│ csi-node-driver-registrar W0818 08:31:43.924885       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                            │
│ csi-node-driver-registrar W0818 08:31:53.925084       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                            │
│ csi-node-driver-registrar W0818 08:32:03.924902       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                            │
│ csi-node-driver-registrar W0818 08:32:13.925007       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                            │
│ csi-node-driver-registrar W0818 08:32:23.924883       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                            │
│ csi-node-driver-registrar W0818 08:32:33.924853       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                            │
│ liveness-probe I0818 08:31:11.568778       1 connection.go:151] Connecting to unix:///csi/csi.sock                                                                             │
│ liveness-probe W0818 08:31:21.576747       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                                       │
│ liveness-probe W0818 08:31:31.569400       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                                       │
│ liveness-probe W0818 08:31:41.569045       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                                       │
│ liveness-probe W0818 08:31:51.569540       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                                       │
│ liveness-probe W0818 08:32:01.569005       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                                       │
│ liveness-probe W0818 08:32:11.569049       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                                       │
│ liveness-probe W0818 08:32:21.569052       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                                       │
│ liveness-probe W0818 08:32:31.569049       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                                       │
│ liveness-probe W0818 08:32:41.569141       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                                       │
│ csi-node-driver-registrar W0818 08:32:43.924932       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                            │
│ hcloud-csi-driver stream closed                                                                                                                                                │
│ liveness-probe W0818 08:32:51.569085       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                                       │
│ csi-node-driver-registrar W0818 08:32:53.924888       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                            │
│ liveness-probe W0818 08:33:01.569025       1 connection.go:170] Still connecting to unix:///csi/csi.sock                                                                       │
│ csi-node-driver-registrar W0818 08:33:03.924955       1 connection.go:170] Still connecting to unix:///csi/csi.sock

Same for the controller and the other nodes, just as @eBeyond explained.

@LKaemmerling
Member

@JWDobken what does the controller say?

@JWDobken

> @JWDobken what does the controller say?

│ liveness-probe W0818 08:37:51.353118       1 connection.go:170] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                           │
│ liveness-probe W0818 08:38:01.353133       1 connection.go:170] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                           │
│ csi-attacher W0818 08:38:04.420338       1 connection.go:172] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                             │
│ csi-provisioner W0818 08:38:06.416689       1 connection.go:172] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                          │
│ csi-resizer W0818 08:38:09.426785       1 connection.go:170] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock

@LKaemmerling
Member

@JWDobken the controller pod should log more than just this (likely right after it started).

@JWDobken

Full output of the controller just after launch:

 csi-attacher I0818 08:53:34.726025       1 main.go:91] Version: v2.2.0-0-g97411fa7                                                                                             │
│ csi-attacher I0818 08:53:34.728121       1 connection.go:153] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                                   │
│ csi-attacher W0818 08:53:44.728881       1 connection.go:172] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                             │
│ csi-attacher W0818 08:53:54.728511       1 connection.go:172] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                             │
│ csi-resizer I0818 08:53:39.820133       1 main.go:61] Version : v0.3.0-0-g150071d                                                                                              │
│ csi-resizer I0818 08:53:39.821187       1 connection.go:151] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                                    │
│ csi-resizer W0818 08:53:49.821836       1 connection.go:170] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                              │
│ csi-provisioner W0818 08:53:46.819849       1 deprecatedflags.go:53] Warning: option provisioner="csi.hetzner.cloud" is deprecated and has no effect                           │
│ csi-provisioner I0818 08:53:46.819919       1 feature_gate.go:243] feature gates: &{map[Topology:true]}                                                                        │
│ csi-provisioner I0818 08:53:46.819942       1 csi-provisioner.go:107] Version: v1.6.0-0-g321fa5c1c                                                                             │
│ csi-provisioner I0818 08:53:46.819957       1 csi-provisioner.go:121] Building kube configs for running in cluster...                                                          │
│ csi-provisioner I0818 08:53:46.834836       1 connection.go:153] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                                │
│ csi-provisioner W0818 08:53:56.835306       1 connection.go:172] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                          │
│ hcloud-csi-driver level=debug ts=2020-08-18T08:53:50.457833279Z msg="getting instance id from metadata service"                                                                │
│ hcloud-csi-driver level=debug ts=2020-08-18T08:53:50.459078735Z msg="fetching server"                                                                                          │
│ liveness-probe I0818 08:53:51.919082       1 connection.go:151] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                                 │
│ csi-resizer W0818 08:53:59.821414       1 connection.go:170] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                              │
│ liveness-probe W0818 08:54:01.919454       1 connection.go:170] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                           │
│ csi-attacher W0818 08:54:04.728545       1 connection.go:172] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock

@LKaemmerling
Member

It doesn't look like you are using the latest "master", as it tries to use the metadata service (which is not possible for your setup, based on your comments in #131). Please use: https://github.com/hetznercloud/csi-driver/blob/master/deploy/kubernetes/hcloud-csi-master.yml

@JWDobken

Instead of this line from the quickstart:

kubectl apply -f https://raw.githubusercontent.com/hetznercloud/csi-driver/v1.4.0/deploy/kubernetes/hcloud-csi.yml

I ran:

$ kubectl apply -f https://raw.githubusercontent.com/hetznercloud/csi-driver/master/deploy/kubernetes/hcloud-csi-master.yml
csidriver.storage.k8s.io/csi.hetzner.cloud created
storageclass.storage.k8s.io/hcloud-volumes created
serviceaccount/hcloud-csi created
clusterrole.rbac.authorization.k8s.io/hcloud-csi created
clusterrolebinding.rbac.authorization.k8s.io/hcloud-csi created
statefulset.apps/hcloud-csi-controller created
daemonset.apps/hcloud-csi-node created
service/hcloud-csi-controller-metrics created
service/hcloud-csi-node-metrics created

Logs of the controller:

 csi-attacher I0818 09:05:23.707861       1 main.go:91] Version: v2.2.0-0-g97411fa7                                                                                             │
│ csi-attacher I0818 09:05:23.711141       1 connection.go:153] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                                   │
│ csi-attacher W0818 09:05:33.711523       1 connection.go:172] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                             │
│ csi-attacher W0818 09:05:43.711432       1 connection.go:172] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                             │
│ csi-resizer I0818 09:05:29.430596       1 main.go:61] Version : v0.3.0-0-g150071d                                                                                              │
│ csi-resizer I0818 09:05:29.431849       1 connection.go:151] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                                    │
│ csi-resizer W0818 09:05:39.432659       1 connection.go:170] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                              │
│ csi-provisioner W0818 09:05:37.054418       1 deprecatedflags.go:53] Warning: option provisioner="csi.hetzner.cloud" is deprecated and has no effect                           │
│ csi-provisioner I0818 09:05:37.054490       1 feature_gate.go:243] feature gates: &{map[Topology:true]}                                                                        │
│ csi-provisioner I0818 09:05:37.054516       1 csi-provisioner.go:107] Version: v1.6.0-0-g321fa5c1c                                                                             │
│ csi-provisioner I0818 09:05:37.054531       1 csi-provisioner.go:121] Building kube configs for running in cluster...                                                          │
│ csi-provisioner I0818 09:05:37.066302       1 connection.go:153] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                                │
│ hcloud-csi-driver level=debug ts=2020-08-18T09:05:40.627217379Z msg="getting instance id from metadata service"                                                                │
│ hcloud-csi-driver level=debug ts=2020-08-18T09:05:40.629569322Z msg="fetching server"                                                                                          │
│ liveness-probe I0818 09:05:42.089503       1 connection.go:151] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                                 │
│ csi-provisioner W0818 09:05:47.066593       1 connection.go:172] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock                                          │
│ csi-resizer W0818 09:05:49.432170       1 connection.go:170] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock

@JWDobken

btw, if I change /etc/network/interfaces.d/61-my-private-network.cfg to:

auto ens10
iface ens10 inet dhcp
        post-up ip route add default via 10.0.0.1
        post-up ip route add 169.254.169.254 dev eth0

and restart the networking service:

$ systemctl status networking.service
 networking.service - Raise network interfaces
   Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2020-08-18 12:56:42 CEST; 6s ago
     Docs: man:interfaces(5)
  Process: 20424 ExecStop=/sbin/ifdown -a --read-environment --exclude=lo (code=exited, status=0/SUCCESS)
  Process: 20571 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)
  Process: 20547 ExecStartPre=/bin/sh -c [ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environment --list --exclude=lo)" ] && udevadm se
 Main PID: 20571 (code=exited, status=1/FAILURE)

Aug 18 12:56:42 node-1 dhclient[20793]: DHCPOFFER of 10.10.1.11 from 10.10.0.1
Aug 18 12:56:42 node-1 dhclient[20793]: DHCPACK of 10.10.1.11 from 10.10.0.1
Aug 18 12:56:42 node-1 ifup[20571]: DHCPACK of 10.10.1.11 from 10.10.0.1
Aug 18 12:56:42 node-1 dhclient[20793]: bound to 10.10.1.11 -- renewal in 34406 seconds.
Aug 18 12:56:42 node-1 ifup[20571]: bound to 10.10.1.11 -- renewal in 34406 seconds.
Aug 18 12:56:42 node-1 ifup[20571]: Error: Nexthop has invalid gateway.
Aug 18 12:56:42 node-1 ifup[20571]: Failed to bring up ens10.
Aug 18 12:56:42 node-1 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
Aug 18 12:56:42 node-1 systemd[1]: networking.service: Failed with result 'exit-code'.
Aug 18 12:56:42 node-1 systemd[1]: Failed to start Raise network interfaces.
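
For what it's worth, the "Nexthop has invalid gateway" error refers to the default route via 10.0.0.1, which does not match the 10.10.x.x addresses in the DHCP log above; the metadata route itself can still be tested by hand, without a full networking restart (a sketch, run as root on the node):

# add the metadata route from the config above at runtime and test it
ip route add 169.254.169.254 dev eth0
curl -s http://169.254.169.254/2009-04-04/meta-data/instance-id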

@MrSaints

MrSaints commented Aug 26, 2020

I'm seeing a similar issue, but no logs in the hcloud-csi-driver container. The node is able to get the metadata without any issues.

Managed to resolve it by checking:

  1. Is the DNS working? No
  2. Can we reach the DNS server (CoreDNS)? No
  3. Why can't we reach it?

At this point, I compared iptables rules between nodes that were working vs. ones that weren't.

I noticed in the non-working ones:

-P FORWARD DROP

So I ran iptables --policy FORWARD ACCEPT, then requests started going through to CoreDNS, and everything appears to be working now :)


Related kubernetes/kubernetes#40182
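
In case it helps others, a minimal sketch for checking and fixing the policy (the change is not persistent across reboots, and whatever set the policy to DROP, e.g. Docker or a host firewall, should still be identified):

# show the current default policy of the FORWARD chain
iptables -S FORWARD | head -n 1
# set it to ACCEPT, as described above
iptables --policy FORWARD ACCEPT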

@mehdisadeghi

mehdisadeghi commented Sep 2, 2020

I have exactly the same issue with 1.4.0 and kubelet 1.17.9 (from Rancher) as described here. Neither the suggestion above nor the yaml file from master helped. I had the exact same setup before and it worked. The only change I made on this new cluster is using Weave for networking.
Update: it didn't work even without Weave.
Update 2:
I also tried adding the iptables rules suggested here and in the original issue (from 2017!) to each of my nodes (Ubuntu 18.04 LTS), to no avail.

@tumbl3w33d

tumbl3w33d commented Sep 3, 2020

Same for me with the following setup:

  • Debian 10 VMs
  • kubespray-based k8s (2.13.3)
  • k8s 1.17.9
  • calico network (3.15.2)
  • hcloud-cloud-controller-manager active (1.7.0), routing cluster traffic via private network
  • 3 master nodes, 2 worker nodes
  • latest docker (19.03.12)

Since I can recreate my cluster from scratch (including recreating the VMs, using Ansible) within ~40 minutes, I redeployed numerous times and this behavior is reproduced every time. Remarkably, the csi-node-driver-registrar keeps saying

I0903 14:43:43.209303       6 main.go:110] Version: v1.3.0-0-g6e9fff3e
I0903 14:43:43.209394       6 main.go:120] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0903 14:43:43.209412       6 connection.go:151] Connecting to unix:///csi/csi.sock
W0903 14:43:53.209845       6 connection.go:170] Still connecting to unix:///csi/csi.sock
[repeated over and over again]

while hcloud-csi-driver is quiet after

level=debug ts=2020-09-03T14:50:58.686361553Z msg="getting instance id from metadata service"
level=debug ts=2020-09-03T14:50:58.688746261Z msg="fetching server"

Remarkably, of the 5 hcloud-csi-node pods there is always exactly one broken like that. It's always the one on my first worker (which is no different from the second, based on my simple kubespray knowledge); it seems to be the one on the same worker node which also hosts the hcloud-csi-controller, though with no recognizable correlation/pattern. The other ones happily report

I0903 14:43:43.219396       6 main.go:110] Version: v1.3.0-0-g6e9fff3e
I0903 14:43:43.219533       6 main.go:120] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0903 14:43:43.219568       6 connection.go:151] Connecting to unix:///csi/csi.sock
I0903 14:43:48.481906       6 main.go:127] Calling CSI driver to discover driver name
I0903 14:43:48.481986       6 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginInfo
I0903 14:43:48.482002       6 connection.go:181] GRPC request: {}
I0903 14:43:48.486558       6 connection.go:183] GRPC response: {"name":"csi.hetzner.cloud","vendor_version":"1.4.0"}
I0903 14:43:48.487519       6 connection.go:184] GRPC error: <nil>
I0903 14:43:48.487538       6 main.go:137] CSI driver name: "csi.hetzner.cloud"
I0903 14:43:48.487651       6 node_register.go:51] Starting Registration Server at: /registration/csi.hetzner.cloud-reg.sock
I0903 14:43:48.487832       6 node_register.go:60] Registration Server started at: /registration/csi.hetzner.cloud-reg.sock
I0903 14:43:49.046609       6 main.go:77] Received GetInfo call: &InfoRequest{}
I0903 14:43:50.046955       6 main.go:77] Received GetInfo call: &InfoRequest{}
I0903 14:43:50.091637       6 main.go:87] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

I'm clueless where else to look.

P.S.: I also tried with master but the behavior is exactly the same.

@ArtemLavrentii
Contributor

ArtemLavrentii commented Sep 3, 2020

Just had the same issue. In my case the metadata service wasn't accessible inside the pod, and I also have a NAT-like setup.

While the hcloud-csi-node pods get their instance IDs from the node name, the controller doesn't, as it doesn't have the KUBE_NODE_NAME variable. Modifying the deployment file and adding the new env variable fixed this issue for me.

PR: #144
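
For reference, a minimal sketch of that kind of change (namespace, object kind and container name are assumed to match the upstream manifest of that time; with #144 merged this should no longer be necessary):

kubectl -n kube-system patch statefulset hcloud-csi-controller --type strategic -p '
spec:
  template:
    spec:
      containers:
      - name: hcloud-csi-driver
        env:
        - name: KUBE_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
'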

@mehdisadeghi

No luck for me. Still getting the same error even with the above PR merged. Logs from 'hcloud-csi-driver':

level=debug ts=2020-09-07T14:40:34.128483056Z msg="server not found by name, fallback to metadata service" err=null
level=debug ts=2020-09-07T14:40:34.128543659Z msg="getting instance id from metadata service"
level=debug ts=2020-09-07T14:40:34.129628629Z msg="fetching server"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x90e2f2]

goroutine 1 [running]:
main.main()
	/csi/cmd/driver/main.go:112 +0x6a2

@LKaemmerling
Member

@mehdisadeghi it looks like the lookup by name was not possible (the node name in k8s is different from the name in the hcloud Cloud Console) and then the lookup via the metadata service also failed. I guess there is something wrong with your network.

@mehdisadeghi

@LKaemmerling thanks for the tip. The hostnames of both nodes (the control plane and the only worker) are exactly the same as the names in the hcloud console (as I can see in my server list and via the hostname command on the nodes).

The cluster is also a vanilla cluster created using rancher based on Ubuntu 18.04 images with Weave networking.

@LKaemmerling
Member

Could you make sure that you can access https://api.hetzner.cloud (maybe just a curl from within a pod)? It looks like the network doesn't work correctly.

@mehdisadeghi

mehdisadeghi commented Sep 8, 2020

@LKaemmerling in order to take Weave out of the equation, I just upgraded my Rancher image to the latest version (rancher/rancher:latest, released 18 hours ago) and created a new cluster with one control-plane node (etcd, control plane) and one worker node (both CPX21), with K8s v1.18.6-rancher1-2, Canal networking (Rancher's default) and Docker v19.03. Then I applied my secret and the csi-driver yaml declarations from the master branch. Still similar errors, which look like a Go runtime error:

kubectl apply -f https://raw.githubusercontent.com/hetznercloud/csi-driver/master/deploy/kubernetes/hcloud-csi-master.yml
and the rest of the logs as before:

level=debug ts=2020-09-08T15:00:38.103320541Z msg="server not found by name, fallback to metadata service" err=null
level=debug ts=2020-09-08T15:00:38.103451366Z msg="getting instance id from metadata service"
level=debug ts=2020-09-08T15:00:38.104406866Z msg="fetching server"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x90e2f2]

goroutine 1 [running]:
main.main()
	/csi/cmd/driver/main.go:112 +0x6a2

I also deployed a pod and I can access the API from within the pod (even though I get a TLS error, because the dnsutils pod does not come with the necessary certificates):

~ kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
~ kubectl exec -ti pod/dnsutils -- wget https://api.hetzner.cloud
Connecting to api.hetzner.cloud (213.239.246.1:443)
ssl_client: api.hetzner.cloud: TLS connect failed
wget: error getting response: Connection reset by peer
command terminated with exit code 1
~ kubectl exec -ti pod/dnsutils -- wget http://api.hetzner.cloud 
Connecting to api.hetzner.cloud (213.239.246.1:80)
wget: server returned error: HTTP/1.1 308 Permanent Redirect
command terminated with exit code 1

~ kubectl exec -ti pod/dnsutils -- nslookup api.hetzner.cloud  
Server:		10.43.0.10
Address:	10.43.0.10#53

Name:	api.hetzner.cloud
Address: 213.239.246.1

FWIW all the nodes are connected over a private network.
Edit: for creating Hetzner nodes from Rancher I use ui-driver-hetzner which in turn uses docker-machine-driver-hetzner.

@vlasov-y

vlasov-y commented Sep 10, 2020

I think the issue is caused by the Rancher-provided Kubernetes. I have read through hcloud-csi and found out that it mounts /var/lib/kubelet/plugins/, while k3s or Rancher k8s are installed into /var/lib/rancher and the file structure differs.

@mehdisadeghi

@vlasov-y interesting point. I checked the nodes and in fact /var/lib/kubelet/plugins/ exists on each one of them. This is the layout:

root@cp-1:~# ls /var/lib/kubelet/ /var/lib/kubelet/plugins/csi.hetzner.cloud/
/var/lib/kubelet/:
cpu_manager_state  device-plugins  pki  plugins  plugins_registry  pod-resources  pods  volumeplugins

/var/lib/kubelet/plugins/csi.hetzner.cloud/:

There is a csi.hetzner.cloud directory there, but it is empty. The rancher directory (/var/lib/rancher/rke/log/) only contains logs.

@vlasov-y

> @vlasov-y interesting point. I checked the nodes and in fact /var/lib/kubelet/plugins/ exists on each one of them. This is the layout:
>
> root@cp-1:~# ls /var/lib/kubelet/ /var/lib/kubelet/plugins/csi.hetzner.cloud/
> /var/lib/kubelet/:
> cpu_manager_state  device-plugins  pki  plugins  plugins_registry  pod-resources  pods  volumeplugins
>
> /var/lib/kubelet/plugins/csi.hetzner.cloud/:
>
> There is a csi.hetzner.cloud directory there, but it is empty. The rancher directory (/var/lib/rancher/rke/log/) only contains logs.

I am setting up a cluster with plain kubeadm and will try running hcloud-csi. Will report back with the results.

@vlasov-y

The same result... Setting up a cluster using kubeadm and v1.18.8 produces the same result.

@mehdisadeghi

@vlasov-y good to know that! In that case this ticket should be reopened.

@vlasov-y

vlasov-y commented Sep 11, 2020

Hello @mehdisadeghi. I had an error in my secret configuration. I have fixed it and now it works!

I0911 10:42:25.214230       1 main.go:71] Health check succeeded
I0911 10:33:27.491828       1 main.go:110] Version: v1.3.0-0-g6e9fff3e
I0911 10:33:27.491925       1 main.go:120] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0911 10:33:27.491950       1 connection.go:151] Connecting to unix:///csi/csi.sock
I0911 10:33:29.538873       1 main.go:127] Calling CSI driver to discover driver name
I0911 10:33:29.540027       1 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginInfo
I0911 10:33:29.540045       1 connection.go:181] GRPC request: {}
I0911 10:33:29.544250       1 connection.go:183] GRPC response: {"name":"csi.hetzner.cloud","vendor_version":"1.4.0"}
I0911 10:33:29.545565       1 connection.go:184] GRPC error: <nil>
I0911 10:33:29.545588       1 main.go:137] CSI driver name: "csi.hetzner.cloud"
I0911 10:33:29.545700       1 node_register.go:51] Starting Registration Server at: /registration/csi.hetzner.cloud-reg.sock
I0911 10:33:29.545932       1 node_register.go:60] Registration Server started at: /registration/csi.hetzner.cloud-reg.sock
I0911 10:33:30.868693       1 main.go:77] Received GetInfo call: &InfoRequest{}
I0911 10:33:30.896740       1 main.go:87] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

The solution in my case was migrating from k3s.io to standard kubeadm. The issue should be in:

            - --kubelet-registration-path=/var/lib/kubelet/plugins/csi.hetzner.cloud/csi.sock
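
A quick sanity check for non-standard setups (just a sketch): find the kubelet's root directory on the node and make sure the DaemonSet's hostPath volumes and the --kubelet-registration-path argument point below it.

# prints the kubelet's --root-dir flag if one is set; no output means the default /var/lib/kubelet is used
ps -ef | grep '[k]ubelet' | tr ' ' '\n' | grep -- '--root-dir'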

@mehdisadeghi

mehdisadeghi commented Sep 21, 2020

Thanks for the update @vlasov-y. I'll try to see whether I can fix that; however, in that case it should be noted in the README that the hetznercloud csi-driver does not work with Rancher-based clusters.

@mehdisadeghi

mehdisadeghi commented Sep 21, 2020

I hardcoded my server ID in the csi-driver source code, built and ran the csi-driver image locally, and realized the token I was using belonged to a different Hetzner project and not the one running the Rancher VMs. After using the right token, everything worked as expected.

My carelessness aside, a less cryptic error message would have gone a long way. Thanks.
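
A cheap way to catch this early (a sketch using the public Hetzner Cloud API, with your token in $HCLOUD_TOKEN): list the servers the token can actually see and check that the cluster's nodes are among them.

# should list the servers of the project the token belongs to
curl -s -H "Authorization: Bearer $HCLOUD_TOKEN" https://api.hetzner.cloud/v1/servers | grep '"name"'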

@intdel

intdel commented May 13, 2021

I have the same issue. In my case it is with the new firewall feature. If I do this:
[screenshot of Hetzner Cloud firewall rules]

It does not work and keeps showing the message: Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock

If I allow any incoming traffic it works. So what am I missing?

@thbiela

thbiela commented May 23, 2021

@intdel I also face this issue. Would be great to know what incoming ports need to be allowed for the csi-driver to work.

@thbiela

thbiela commented May 27, 2021

@intdel You have to allow incoming UDP Port 8472

@ekeih

ekeih commented May 28, 2021

> @intdel You have to allow incoming UDP Port 8472

This is a rather popular issue, so just in case other people end up here while searching for the error message: please don't allow public access to port 8472; we sorted that out in #204 (comment).

@ludgart

ludgart commented Jan 16, 2022

Hello,

I was struggling with the same problem. In my case the cause was that I had forgotten to install the following packages:

  • ca-certificates
  • curl
  • gnupg
  • lsb-release
  • ntp
  • apparmor
  • apparmor-utils

Install command:
apt-get install ca-certificates curl gnupg lsb-release ntp apparmor apparmor-utils

After that, the connection issue disappeared.

Regards
ludgart

@mehran-prs

In our case, it was because the hcloud-csi-driver Docker image in the DaemonSet was using the latest tag, so a new image version caused this error.

We just changed the image tag to a stable version and it was fixed.
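
A sketch of pinning the tag (the version below is only a placeholder; pick the release you actually want):

kubectl -n kube-system set image daemonset/hcloud-csi-node hcloud-csi-driver=hetznercloud/hcloud-csi-driver:1.6.0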

@sui77
Contributor

sui77 commented Nov 1, 2022

I had this issue with a new node from a different hetzner cloud project. Worked fine after assigning the VM to the right project.

@mehdisadeghi

Two years have gone by and I had the same issue. I came back to this thread, read the supportive community comments and finally my own comments, only to realize that, well... I had made the same mistake of using the wrong hcloud token from the wrong project. I guess I'm not getting any younger.

The moral of the story: it is good to write descriptive comments and contribute, even if it is just reporting errors.

@strowi

strowi commented Feb 6, 2024

Came here again after a while. api.hetzner.cloud was resolving to a wrong IP. The solution was adding the linked ndots setting to the csi-controller deployment.
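
For context, a minimal sketch of such a change (the ndots value of "1" is only an example, and depending on the csi-driver version the controller may be a Deployment or a StatefulSet):

kubectl -n kube-system patch deployment hcloud-csi-controller --type strategic -p '
spec:
  template:
    spec:
      dnsConfig:
        options:
        - name: ndots
          value: "1"
'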
