
Waiting on external-attacher/io-rancher-longhorn #722

Closed
josecelano opened this issue Sep 13, 2019 · 9 comments

@josecelano

I have created this issue in the Rancher repo:

rancher/rancher#22829

but I don't really know whether it's a Rancher problem or a Longhorn issue.

@josecelano
Author

I found a solution.

I deleted the PVs directly in the Google console, but somehow they were still there in Rancher. Using the Rancher kubectl console I was able to list them. You have to launch kubectl and execute:

kubectl get pv

And I saw the same list as in Rancher UI:

> kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM                                   STORAGECLASS              REASON   AGE
pvc-19c8c35d-d143-11e9-b61a-42010a9c01a0   10Gi       RWX            Delete           Terminating   qa/qa-var-www-app-storage               longhorn                           8d
pvc-3ae6424d-d158-11e9-b61a-42010a9c01a0   10Gi       RWX            Retain           Terminating   qa/qa-var-lib-mysql                     longhorn-manual-cleanup            8d
pvc-5156c579-d143-11e9-b61a-42010a9c01a0   10Gi       RWX            Delete           Terminating   qa/qa-var-lib-mysql                     longhorn                           8d
pvc-5b5e392b-d158-11e9-b61a-42010a9c01a0   10Gi       RWX            Retain           Terminating   qa/qa-var-www-app-storage               longhorn-manual-cleanup            8d
pvc-604486dc-d40b-11e9-b61a-42010a9c01a0   10Gi       RWO            Retain           Terminating   qa/qa-var-lib-mysql                     longhorn-manual-cleanup            4d14h
pvc-8d71c020-9989-11e9-b61a-42010a9c01a0   10Gi       RWX            Delete           Terminating   staging1/staging1-var-lib-mysql         longhorn                           79d
pvc-aeaf645f-8f8f-11e9-b1a0-42010a9c008a   10Gi       RWX            Delete           Terminating   feature/feature-var-lib-mysql           longhorn                           91d
pvc-ba32baae-9989-11e9-b61a-42010a9c01a0   10Gi       RWX            Delete           Terminating   staging1/staging1-var-www-app-storage   longhorn                           79d
pvc-e884abf6-8f8f-11e9-b1a0-42010a9c008a   10Gi       RWX            Delete           Terminating   feature/feature-var-www-app-storage     longhorn                           91d
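If you only want to see the stuck volumes, filtering the same listing with a plain grep should work (just a quick sketch):

kubectl get pv | grep Terminating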

Then I had to edit the PVs with:

kubectl edit pv pvc-19c8c35d-d143-11e9-b61a-42010a9c01a0

Then I removed these lines again:

  finalizers:
  - external-attacher/io-rancher-longhorn

And that's all: the PVs disappeared from the Rancher UI.
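The same finalizer removal can presumably be done non-interactively with kubectl patch instead of kubectl edit (a sketch using the first PV name from the listing above; note that this clears all finalizers on the object, not just the Longhorn one):

kubectl patch pv pvc-19c8c35d-d143-11e9-b61a-42010a9c01a0 -p '{"metadata":{"finalizers":null}}'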

IMPORTANT: This is something I had already done directly in the GKE console, but it did not change the Rancher object; Rancher did not even remove the PVs. Maybe all the problems were caused because I did not follow the GKE installation instructions for Longhorn:

https://github.com/longhorn/longhorn/blob/master/docs/gke.md

and Rancher was not able to sync data from the cluster.

@yasker
Member

yasker commented Sep 15, 2019

@josecelano I think GKE is the problem (and has a cache). Rancher reads all the information through kubectl, which is the standard interface to the Kubernetes API; Rancher doesn't control what kubectl returns. If kubectl says the PV still exists but GKE thinks it's gone, then GKE is wrong.

@mattsdevop

Ran into the same issue on a "custom" provider (locally hosted CentOS servers). @josecelano's solution to edit the PV did work. I used the following commands:
kubectl --namespace longhorn-system get pv
kubectl --namespace longhorn-system edit pv pvc-77849875-de03-11e9-a581-005056a3ab27
then deleted the two lines mentioned, saved, and exited.
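Since PVs are cluster-scoped, the --namespace flag above is effectively ignored, so the same commands work without it. If several PVs are stuck at once, a small loop over the Terminating ones should be able to patch them in one go (a sketch only, assuming the default kubectl column layout shown earlier; it blanket-removes every finalizer, so use with care):

for pv in $(kubectl get pv --no-headers | awk '$5 == "Terminating" {print $1}'); do
  kubectl patch pv "$pv" -p '{"metadata":{"finalizers":null}}'
done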

@yasker
Member

yasker commented Oct 10, 2019

@mattsdevop how did it happen? Normally the PV should be cleaned up if Longhorn is still installed.

@mattsdevop

mattsdevop commented Oct 10, 2019

@yasker Thanks for the reply! I wish I could explain how this happened. I can say that some of the PVCs that were stuck in Terminating status had persisted through Longhorn updates (pre-0.5.0), and some were only a couple of days old (created since upgrading to Longhorn 0.6.2). The host nodes are running CentOS and have been kept relatively up to date.

May or may not be related: this particular cluster was bitten by the Kubernetes/kernel bug explained here: https://pingcap.com/blog/try-to-fix-two-linux-kernel-bugs-while-testing-tidb-operator-in-k8s/ which prompted the most recent upgrade and the investigation into the disk usage.

I will keep an eye out moving forward for any specific steps that might replicate it. If there is anything else that would help in tracking it down, please let me know and I'll be sure to watch out for it!

@yasker
Member

yasker commented Oct 10, 2019

@mattsdevop You might want to send us a support bundle at longhorn-support-bundle@rancher.com, and note which volumes were stuck before.

And if I understand correctly, when you deleted the PVC, Longhorn did clean up the volume but the PV is still stuck.
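One way to check what is still holding such a PV is to print its deletion timestamp and finalizers straight from the API server (a sketch; <pv-name> is a placeholder for the stuck volume):

kubectl get pv <pv-name> -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'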

@mattsdevop

@yasker Everything has been working as expected since I last posted. Additionally, the machines have been rebooted and packages have been updated. If this happens again, I will get a support bundle. Thank you again, and thank you for your great work and support!

@mattsdevop

@yasker you are correct. Longhorn deletes the volume and I can no longer view it in the Longhorn UI. However, the PV shows Bound and green in Rancher under Storage -> Persistent Volumes.

I attempted to manually remove it using the "View API" option in Rancher and clicked "Delete"; now it is stuck in the "removing" status.

I am emailing the support bundle to you and have reached out in Slack for any troubleshooting you want to perform.

@mattsdevop

Found this, which might be the actual issue: kubernetes/kubernetes#69697
