CSI stopped working #34
Comments
I think this has little to do with CSI -- after I restarted the Linode it has lost the traits it needs to work. For example, if I edit this Linode's node I get something like the below:

```yaml
metadata:
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
    projectcalico.org/IPv4Address: 192.168.xxx.yyy/17
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2019-08-01T15:28:39Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/hostname: node-3
  name: widearea-live-node-3
  resourceVersion: "30261967"
  selfLink: /api/v1/nodes/node-3
  uid: 097dd8b3-xxxxxx
spec:
  podCIDR: 10.244.5.0/24
  taints:
  - effect: NoSchedule
    key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
```

but for a working one I get:

```yaml
metadata:
  annotations:
    csi.volume.kubernetes.io/nodeid: '{"linodebs.csi.linode.com":"1234567"}'
    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
    node.alpha.kubernetes.io/ttl: "0"
    projectcalico.org/IPv4Address: 192.168.xxx.yyyy/17
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2019-02-24T19:26:35Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: g6-standard-4
    beta.kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/region: eu-west
    kubernetes.io/hostname: node-2
    topology.linode.com/region: eu-west
  name: node-2
  resourceVersion: "30262732"
  selfLink: /api/v1/nodes/node-2
  uid: 19841cc5-xxxxxxxx
spec:
  podCIDR: 10.244.1.0/24
  providerID: linode://1234567
status:
  addresses:
  - address: live-node-2
    type: Hostname
  - address: 213.x.y.z
    type: ExternalIP
  - address: 192.168.aaa.bbbb
    type: InternalIP
  allocatable:
    attachable-volumes-csi-linodebs.csi.linode.com: "8"
```

Our problem node has lost all its Linode traits.
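As a stopgap only (the proper fix is letting the cloud-controller-manager re-initialize the node, which is what the linked issue is about), the missing Linode fields could in principle be restored by hand. A hypothetical patch for the broken node might look like the following; the Linode ID and region values are placeholders taken from the working node above, not verified for node-3:

```
# Hypothetical manual patch restoring the CCM-managed fields the node lost.
# Values below are placeholders copied from the healthy node for illustration.
spec:
  providerID: linode://1234567
metadata:
  labels:
    failure-domain.beta.kubernetes.io/region: eu-west
    topology.linode.com/region: eu-west
```

Note that `spec.providerID` can only be set while it is empty; once written it is immutable, so this would need to be correct on the first attempt.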
Closed in favor of linode/linode-cloud-controller-manager#36
After an OOM problem on one of our Linodes, CSI has stopped working and the Linode is no longer properly part of the cluster.
The CoreOS logs for the node with the problem have a few of these:
```
systemd-networkd[604]: eth0: Could not set NDisc route or address: Connection timed out
```
then there is an OOM, followed by more messages like the above. Since then we have seen a number of problems that appear to be caused by the OOM event:

- the node did not appear in `kubectl get nodes`
- after a reboot the node is no longer properly recognised by Kubernetes, e.g.:
```
NAME     STATUS   ROLES   AGE     VERSION   INTERNAL-IP      EXTERNAL-IP       OS-IMAGE                                        KERNEL-VERSION      CONTAINER-RUNTIME
node-2   Ready            157d    v1.13.0   192.168.145.28   213.xxx.xxx.xxx   Container Linux by CoreOS 2135.5.0 (Rhyolite)   4.19.50-coreos-r1   docker://18.6.3
node-3   Ready            5h41m   v1.13.0                                      Container Linux by CoreOS 2135.5.0 (Rhyolite)   4.19.50-coreos-r1   docker://18.6.3
```
Note the kubelet on node-3 keeps restarting the CSI plugin:

```
Aug 01 09:12:03 node-3 kubelet[687]: , failed to "StartContainer" for "csi-linode-plugin" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=csi-linode-plugin pod=csi
Aug 01 09:12:03 node-3 kubelet[687]: ]
Aug 01 09:12:18 node-3 kubelet[687]: E0801 09:12:18.716293 687 pod_workers.go:190] Error syncing pod c55016b7-b439-11e9-a66e-f23c914badbb ("csi-linode-node-4rqz6_kube-system(c55>
Aug 01 09:12:18 node-3 kubelet[687]: , failed to "StartContainer" for "csi-linode-plugin" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=csi-linode-plugin pod=csi>
Aug 01 09:12:18 node-3 kubelet[687]: ]
```
BODY:

```json
{
  "errors": [
    {
      "reason": "Invalid OAuth Token"
    }
  ]
}
```