This repository has been archived by the owner on Oct 21, 2020. It is now read-only.

Provisioner stops processing PVCs #1019

Closed
Nilesh20 opened this issue Oct 8, 2018 · 13 comments
Labels: area/lib, lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed)

Comments

@Nilesh20

Nilesh20 commented Oct 8, 2018

I am using the external-storage library.
I am running a provisioner that processes PVCs and binds them to PVs.
After some time it stops processing PVCs and the PVCs remain in the Pending state, which makes it look like the provisioner has stopped working. Restarting the provisioner solves the issue, but I don't want to have to restart it every time.
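
A quick way to confirm the symptom is to check for claims stuck in Pending and to read the provisioner's own logs (the namespace and label below are hypothetical; adjust them to the actual deployment):

kubectl get pvc --all-namespaces
kubectl -n default logs -l app=efs-provisioner --tail=100
# the workaround described above, i.e. restarting the provisioner pod
kubectl -n default delete pod -l app=efs-provisioner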

A few log statements that look suspicious are as follows.

I1003 15:35:12.426810 1 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
I1003 15:35:12.427387 1 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
I1003 15:35:12.427632 1 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
I1003 15:35:12.427771 1 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
I1003 15:35:12.427909 1 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF

Failed to watch *v1.Node: Get /api/v1/nodes?resourceVersion=98676317&timeoutSeconds=354&watch=true: dial tcp 172.56.0.1:443: getsockopt: connection refused
Failed to watch *v1.PersistentVolume: Get /api/v1/persistentvolumes?resourceVersion=98666055&timeoutSeconds=428&watch=true: dial tcp 172.56.0.1:443: getsockopt: connection refused
I1003 15:35:12.428047 1 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
I1003 15:35:12.428057 1 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
I1003 15:35:12.428180 1 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
I1003 15:35:12.428287 1 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Failed to watch *v1.StorageClass: Get /apis/storage.k8s.io/v1/storageclasses?resourceVersion=98658274&timeoutSeconds=569&watch=true: dial tcp 172.56.0.1:443: getsockopt: connection refused

watch of *v1.PersistentVolume ended with: The resourceVersion for the provided watch is too old.
watch of *v1.StorageClass ended with: The resourceVersion for the provided watch is too old.
watch of *v1.PersistentVolumeClaim ended with: The resourceVersion for the provided watch is too old.
watch of *v1.ConfigMap ended with: The resourceVersion for the provided watch is too old.
watch of *v1.PersistentVolume ended with: The resourceVersion for the provided watch is too old.

@MartinForReal
Contributor

Could you please check your API server? It seems that the API server is not working.

@Nilesh20
Author

If I redeploy the provisioner, it starts working again. So my question is: if the API server is not working, how does redeploying the provisioner fix it?

@wongma7
Contributor

wongma7 commented Oct 16, 2018

What version of the library are you using? It uses shared informers now and does not do anything special with its watches.
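
For context, here is a minimal client-go sketch of a shared informer watching PVCs. It is illustrative only, not the library's actual wiring, and it assumes the provisioner runs in-cluster; the point is that the informer machinery re-lists and re-establishes the watch by itself when a watch connection drops.

package main

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// In-cluster credentials (an assumption for this sketch).
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// The shared informer factory handles list/watch, caching, and
	// automatic re-establishment of dropped watches.
	factory := informers.NewSharedInformerFactory(clientset, 15*time.Minute)
	pvcInformer := factory.Core().V1().PersistentVolumeClaims().Informer()

	pvcInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pvc := obj.(*v1.PersistentVolumeClaim)
			fmt.Printf("saw PVC %s/%s\n", pvc.Namespace, pvc.Name)
		},
	})

	stopCh := make(chan struct{})
	factory.Start(stopCh)
	cache.WaitForCacheSync(stopCh, pvcInformer.HasSynced)

	// Block forever; the informers keep re-watching in the background.
	select {}
}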

@Nilesh20
Author

I am using the controller structure from external-storage and the following Kubernetes libraries:

k8s.io/client-go = version v6.0.0
k8s.io/kubernetes = version v1.9.1

And I am getting the logs above.
Apart from those, there are also statements like the following.

Failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:serviceaccount:default:" cannot list persistentvolumes at the cluster scope: User "system:serviceaccount:default:" cannot list all persistentvolumes in the cluster.

So first it shows connection refused, after that it is not able to list the resources in the cluster, and at the end of the log it prints:
watch of *v1.PersistentVolume ended with: The resourceVersion for the provided watch is too old.

@MartinForReal
Contributor

Could you please make sure the service account the provisioner is using has access to list all persistentvolumes? Please refer to the docs.

@wongma7
Contributor

wongma7 commented Oct 17, 2018

Yes, it is an RBAC issue. Refer also to https://github.com/kubernetes-incubator/external-storage/blob/master/aws/efs/deploy/rbac.yaml#L1 for an example of a ClusterRole. All provisioners using the library need, at a minimum, the RBAC permissions listed there: read/write PVs, read PVCs, read StorageClasses, write events.

/area lib
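
For reference, a minimal ClusterRole sketch covering just the permissions listed above (read/write PVs, read PVCs, read StorageClasses, write events). The name is illustrative; the linked rbac.yaml is the authoritative example, and a ClusterRoleBinding is still needed to bind the role to the provisioner's service account.

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: example-provisioner-runner  # illustrative name
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]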

@Nilesh20
Author

Thanks

I checked the RBAC YAML and I did not have the following Role and RoleBinding in my rbac.yaml; I have added them now.
One question: when I started the provisioner it was able to process PVCs. I also created a lot of PVCs with wrong configuration, then created PVCs with the right configuration, and the provisioner was still able to process them.
But after increasing the load and trying lots of combinations, the provisioner suddenly stopped processing PVCs and printed the messages above in its log. If this is an RBAC or API-server issue, how was it running initially, and why does it start working again after I redeploy it?

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-efs-provisioner
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-efs-provisioner
subjects:
  - kind: ServiceAccount
    name: efs-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: Role
  name: leader-locking-efs-provisioner
  apiGroup: rbac.authorization.k8s.io
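
One way to check whether this is really an RBAC problem rather than an API-server outage is to impersonate the provisioner's service account and ask the API server directly (the account name and namespace below match the manifests above and may differ in your cluster):

kubectl auth can-i list persistentvolumes --as=system:serviceaccount:default:efs-provisioner
kubectl auth can-i create persistentvolumes --as=system:serviceaccount:default:efs-provisioner
kubectl auth can-i list storageclasses --as=system:serviceaccount:default:efs-provisioner

If these answer "no", the ClusterRole/ClusterRoleBinding is missing or misbound; the leader-locking Role above only covers endpoints and is not sufficient on its own.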

@Nilesh20
Author

May I get help on this?

@wongma7
Contributor

wongma7 commented Oct 25, 2018

try the latest library version, e.g. https://github.com/kubernetes-incubator/external-storage/releases/tag/v5.2.0

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) on Apr 26, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label on May 26, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
