
[Packet] sig-storage-local-static-provisioner recently stopped working #160

Closed
DGollings opened this issue Feb 5, 2020 · 5 comments · Fixed by #164

@DGollings
Contributor

DGollings commented Feb 5, 2020

At some point between my last known working configuration of Lokomotive (56acc13) and current master (2512344), sig-storage-local-static-provisioner stopped working.

  • the provisioner starts correctly, finds the block devices that I created in /mnt/disks, and adds them to the cache
  • the application starts correctly and binds a persistent volume claim to a persistent volume when asked to do so
  • wait a minute or so, and it all times out with `MountVolume.NewMounter initialization failed for volume "local-pv-7f70b248" : path "/mnt/local/vol1" does not exist`

SSH into the host: the disk is there, the disk is accessible, and permissions are correct (tried chmod -R 777 just in case).

So the provisioner, the block device, disk detection, Kubernetes pod security profiles, etc. all seem to work and think they should be working, except that something (I'm not 100% sure which component: CSI? kubelet?) can't access the disk in order to actually finish the mounting 'loop'.

(Probably relevant: the cleanup script that the provisioner runs after deleting a PVC (some variation of rm -rf on the path that was mounted) does work, although I don't know whether that's executed on the host or in a container.)

Going back to 56acc13, everything works again.

The only hint I can find relates to containerized kubelets, but when I checked the kubelet systemd unit file, /mnt looked like it was accessible, although the kubelet running in Docker didn't seem to have it (rkt and Docker kubelet?).

Before I dive into this further, are you aware of any changes that could have caused this?

Edit: to reproduce, bind mount a block device into /mnt/disks, helm install the static provisioner as per the docs, and try to claim it via a deployment. Let me know if you need some example config, although it's 99% what's in the installation guide.
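
Roughly, the claim side looks like this (a minimal sketch following the installation guide; the default `local-storage` storage class name is assumed, and the claim/deployment names, image, and requested size are illustrative, not my actual config):

```yaml
# Minimal sketch only; storageClassName follows the provisioner's installation
# guide default, and all names/images/sizes here are illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-volume-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: local-volume-test
  template:
    metadata:
      labels:
        app: local-volume-test
    spec:
      containers:
        - name: test
          image: busybox
          command: ["sleep", "3600"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: local-claim
```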

@invidian
Contributor

invidian commented Feb 5, 2020

@DGollings thanks for reporting. I suspect this might be a regression from running the kubelet as a DaemonSet. Could you try removing the kubelet DaemonSet running in the kube-system namespace? If you do, the host's kubelet should take over.

@DGollings
Contributor Author

Will do so in a bit, thanks

@DGollings
Contributor Author

@invidian can confirm that deleting the kubelet DaemonSet helps

invidian added the bug label on Feb 5, 2020
@DGollings
Contributor Author

and can also confirm that editing the kubelet DaemonSet config, adding

        - mountPath: /mnt
          name: mnt

and

      - hostPath:
          path: /mnt
          type: ""
        name: mnt

also works
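
For anyone else hitting this, a sketch of where those two fragments sit in the kube-system/kubelet DaemonSet (heavily trimmed; the container name and any omitted fields are placeholders, not taken from the actual manifest):

```yaml
# Trimmed sketch of the DaemonSet after the edit; only the fields relevant to
# the /mnt mount are shown, everything else (selector, image, command, other
# mounts and volumes) is omitted.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kubelet
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: kubelet
          # ...existing image, command, and volumeMounts...
          volumeMounts:
            - mountPath: /mnt
              name: mnt
      volumes:
        # ...existing hostPath volumes...
        - hostPath:
            path: /mnt
            type: ""
          name: mnt
```

This just exposes the host's /mnt inside the kubelet container, which is consistent with deleting the DaemonSet (and falling back to the host kubelet) also fixing the mount failure.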

@rata
Contributor

rata commented Feb 5, 2020

Awesome, thanks for the report!
