Attach/detach controller does not recover from missed pod deletion #34242

Closed
jsafrane opened this issue Oct 6, 2016 · 17 comments

Assignees
Labels
area/kubelet sig/storage Categorizes an issue or PR as relevant to SIG Storage.
Milestone

Comments

@jsafrane
Member

jsafrane commented Oct 6, 2016

We run OpenShift in a master-slave setup and our master crashes once in a while (for an unrelated reason). When a new master starts, it does not detach volumes that should be detached.

Steps to reproduce on AWS with standard Kubernetes:

  1. run an AWS-aware cluster, hack/local-up-cluster.sh is fine
  2. create several pods that use claims that point to AWS PVs
  3. kill controller-manager process
  4. delete all pods
  5. start a new controller manager

Result: volumes stay attached forever (or at least for the next 30 minutes).
It should also be reproducible on GCE. Shouldn't there be a periodic sync that ensures the controller notices deleted pods? This comment looks scary: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/attachdetach/attach_detach_controller.go#L76

Affected version: kubernetes-1.3.8

@saad-ali @jingxu97 @kubernetes/sig-storage

@jsafrane added the sig/storage label Oct 6, 2016
@jingxu97
Contributor

jingxu97 commented Oct 6, 2016

Jan,

Yes, this is an issue that we haven't addressed: when the master controller restarts, how do we recover the state (which volumes are attached/detached)? If no pods are deleted during that time, the controller can recover by retrieving volume information from the pod objects in the API server, populating the desired state, and letting the reconciler recover the actual state afterwards. The challenge is that if pods are deleted from the API server during that window, the information about which volumes are still attached to the node is lost. One way to get this information is through the cloud provider, but after getting a list of attached volumes from the cloud provider, how do we decide which ones the controller should detach? It would be dangerous to detach volumes that are not supposed to be detached. The other approach is to checkpoint the state from the master controller. This needs more discussion...

Please let me know any suggestions/comments. Thank you!

Best,
Jing


@jsafrane
Member Author

jsafrane commented Oct 7, 2016

We could tag AWS EBS and Cinder volumes on attach with the name(s) of the pod(s) that use them and un-tag them on detach. I am not sure about GCE; there is PD.Description, where we put some JSON when dynamically creating the volume, and perhaps we could update that JSON on attach/detach. I don't know anything about Ceph RBD, which is getting attach/detach support soon.

In addition, on AWS we assign devices /dev/xvdb[a-z], /dev/xvdc[a-z] and so on to Kubernetes volumes, leaving /dev/xvd[a-z] and /dev/xvda[a-z] to the "system".

Still, the safest thing would be to save the attach/detach information somewhere in the API server, either as a separate object or inside Node.Spec or Node.Status.

@jsafrane
Member Author

jsafrane commented Dec 6, 2016

Returning to this bug with the newest kube-controller-manager and kubelet (almost 1.5), I noticed that if I restart the controller-manager, the node retains enough information about attached volumes:

  status:
    volumesAttached:
    - devicePath: /dev/xvdba
      name: kubernetes.io/aws-ebs/aws://us-east-1d/vol-4fc15dde

Could it be enough to detach these volumes when the controller restarts? I know there is a window where the volume is attached but the node status is not written yet; still, it would help in most cases.
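
A minimal sketch of that idea, with simplified stand-in types rather than the real Kubernetes API or attachdetach cache (AttachedVolume, Node, actualState and populateFromNodes here are illustrative only): on startup, walk the listed nodes and record everything from status.volumesAttached as attached, so the reconciler has something to compare the desired state against.

  package main

  import "fmt"

  // Simplified stand-ins for the node status fields shown above; these are
  // not the real Kubernetes API types.
  type AttachedVolume struct {
      Name       string // e.g. "kubernetes.io/aws-ebs/aws://us-east-1d/vol-4fc15dde"
      DevicePath string // e.g. "/dev/xvdba"
  }

  type Node struct {
      Name            string
      VolumesAttached []AttachedVolume // mirrors status.volumesAttached
  }

  // actualState is a stand-in for the controller's actual-state-of-world
  // cache, keyed by node name.
  type actualState map[string][]AttachedVolume

  // populateFromNodes records every volume the nodes still report as
  // attached. A later reconciler pass can then detach anything no remaining
  // pod needs, even if the pod deletion happened while the controller was down.
  func populateFromNodes(nodes []Node, asw actualState) {
      for _, node := range nodes {
          asw[node.Name] = append(asw[node.Name], node.VolumesAttached...)
      }
  }

  func main() {
      asw := actualState{}
      populateFromNodes([]Node{{
          Name: "node-1",
          VolumesAttached: []AttachedVolume{{
              Name:       "kubernetes.io/aws-ebs/aws://us-east-1d/vol-4fc15dde",
              DevicePath: "/dev/xvdba",
          }},
      }}, asw)
      fmt.Printf("%+v\n", asw)
  }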

@jingxu97
Contributor

jingxu97 commented Dec 6, 2016 via email

@rootfs
Contributor

rootfs commented Dec 6, 2016

@jsafrane @jingxu97 @rkouj
Would #37727 help? The controller should re-list the node status and repopulate the cache.

Separately, does the GKE controller upgrade see this issue?

@jingxu97
Contributor

jingxu97 commented Dec 6, 2016 via email

@rootfs
Contributor

rootfs commented Dec 6, 2016

@jingxu97 here is my thought: when the controller master restarts, if it first gets the node status (and thus the attached volumes) before syncing pods, wouldn't the attached volumes still be there by the time a pod is deleted?
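
A tiny illustration of that ordering, using stub functions instead of the real informer-backed listers (attachedVolumesFromNodeStatus and volumesNeededByPods are made up for this sketch): the actual state is recovered from node status before the pod-derived desired state is computed, so a pod deleted during the downtime still shows up as a detach candidate.

  package main

  import "fmt"

  // Stub listers standing in for the informer-backed ones in the controller.
  func attachedVolumesFromNodeStatus() map[string][]string {
      return map[string][]string{
          "node-1": {"kubernetes.io/aws-ebs/aws://us-east-1d/vol-4fc15dde"},
      }
  }

  func volumesNeededByPods() map[string][]string {
      // The pod was deleted while the controller was down, so nothing is needed.
      return map[string][]string{}
  }

  func contains(list []string, s string) bool {
      for _, x := range list {
          if x == s {
              return true
          }
      }
      return false
  }

  func main() {
      // 1. Recover what is attached from node status first.
      attached := attachedVolumesFromNodeStatus()
      // 2. Only then compute what should stay attached from the surviving pods.
      needed := volumesNeededByPods()
      // 3. Anything attached but no longer needed is a detach candidate, even
      //    though the controller never saw the pod deletion event.
      for node, vols := range attached {
          for _, v := range vols {
              if !contains(needed[node], v) {
                  fmt.Printf("would detach %s from %s\n", v, node)
              }
          }
      }
  }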

@jingxu97
Contributor

jingxu97 commented Dec 6, 2016 via email

@jingxu97 jingxu97 added this to the v1.6 milestone Dec 16, 2016
@jingxu97 jingxu97 self-assigned this Dec 16, 2016
@saad-ali
Member

But if the node restarts at the same time, then the information about the attached volumes will be gone because the whole node object is deleted (we plan to revisit this logic about deleting the node object too).

Spoke with @jingxu97 offline. I'm ok with using volumesAttached from node status to populate actual state on controller start.

tsmetana added a commit to tsmetana/kubernetes that referenced this issue Jan 9, 2017
When the attach/detach controller crashes and a pod with attached PV is deleted
afterwards the controller will never detach the pod's attached volumes. To
prevent this the controller should try to recover the state from the nodes
status and figure out which volumes to detach. This requires some changes in the
volume providers too: the only information available from the nodes is the
volume name and the device path. The controller needs to find the correct volume
plugin and reconstruct the volume spec just from the name. This required a small
change in the volume plugin interface too.
@tsmetana
Member

Hello.
I'm trying to fix the problem: you may take a look at the patch. I'm basically pre-populating the DesiredStateOfWorld with information from the pods and the ActualStateOfWorld using the volumesAttached data from the nodes. Since the only things I can get from the node are the volume's unique name and the device path, I had to add a helper interface method to the volume plugins too: I need to get a volume spec, and the existing interface requires a mount path for this (which is rather odd). I also need to find out the plugin name, so I assumed the unique name is always <plugin name>/<volume name>.

I tested this on AWS and it looks to be working as expected: the unused volume gets detached even when the pod was deleted during the controller-manager downtime.
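
As an aside, a small sketch of that naming assumption (splitUniqueName is my own illustrative helper, not the one added in the patch): in-tree plugin names such as kubernetes.io/aws-ebs themselves contain one slash, so the unique name splits after the second segment.

  package main

  import (
      "fmt"
      "strings"
  )

  // splitUniqueName splits a unique volume name of the assumed form
  // "<plugin name>/<volume name>". In-tree plugin names such as
  // "kubernetes.io/aws-ebs" contain one slash, so the plugin name is the
  // first two segments and the volume name is the rest.
  func splitUniqueName(uniqueName string) (pluginName, volumeName string, err error) {
      parts := strings.SplitN(uniqueName, "/", 3)
      if len(parts) != 3 {
          return "", "", fmt.Errorf("unexpected unique volume name %q", uniqueName)
      }
      return parts[0] + "/" + parts[1], parts[2], nil
  }

  func main() {
      plugin, volume, err := splitUniqueName("kubernetes.io/aws-ebs/aws://us-east-1d/vol-4fc15dde")
      if err != nil {
          panic(err)
      }
      fmt.Println(plugin) // kubernetes.io/aws-ebs
      fmt.Println(volume) // aws://us-east-1d/vol-4fc15dde
  }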

@jingxu97
Contributor

@tsmetana thank you for helping with this. I think when a pod is deleted from the API server, some information, such as the volume spec, is not recoverable. But it might be OK to put in some dummy information as long as the information needed for detach is correct.
You are right, the unique name is always <plugin name>/<volume name>.
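
A simplified illustration of that point (VolumeSpec and reconstructForDetach here are stand-ins, not the real volume.Spec API): only the volume identity needs to be recovered; the pod-level fields can stay empty because detach does not use them.

  package main

  import "fmt"

  // VolumeSpec is a simplified stand-in for the spec handed to a volume plugin.
  type VolumeSpec struct {
      PluginName string
      VolumeID   string // e.g. the EBS volume ID; enough for a detach call
      FSType     string // pod-level detail, lost with the deleted pod
      ReadOnly   bool   // pod-level detail, lost with the deleted pod
  }

  // reconstructForDetach builds a minimal spec from the plugin name and volume
  // name recovered from node status; pod-level fields are left at zero values.
  func reconstructForDetach(pluginName, volumeName string) VolumeSpec {
      return VolumeSpec{PluginName: pluginName, VolumeID: volumeName}
  }

  func main() {
      spec := reconstructForDetach("kubernetes.io/aws-ebs", "aws://us-east-1d/vol-4fc15dde")
      fmt.Printf("%+v\n", spec)
  }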

tsmetana added a commit to tsmetana/kubernetes that referenced this issue Jan 19, 2017
tsmetana added a commit to tsmetana/kubernetes that referenced this issue Feb 14, 2017
tsmetana added a commit to tsmetana/kubernetes that referenced this issue Mar 1, 2017
@jsravn
Contributor

jsravn commented Mar 6, 2017

Does this also affect HA controller-manager w/ leader election? Because I observe a similar issue when leaders swap (the new leader doesn't detach some volumes correctly).

@jsravn
Contributor

jsravn commented Mar 6, 2017

I don't think it is, because what I observe is that the pod remains running during the leader election swap. For some reason, when the pod is then deleted hours later, the new volume controller master doesn't detach the volume. From what I gather, the volume manager should be able to handle this case.

@childsb
Contributor

childsb commented Mar 13, 2017

There's a fix for this here:

#39732

It's a large fix, and I'd like sign-off from @saad-ali before merging it.

@saad-ali
Member

saad-ali commented Mar 13, 2017

That's way too large a change to merge to 1.6. We can consider it for post-1.6.

@ethernetdan
Contributor

Too late for v1.6, moving to the v1.7 milestone. If this is incorrect, please correct it.

/cc @kubernetes/release-team

@ethernetdan ethernetdan modified the milestones: v1.7, v1.6 Mar 14, 2017
tsmetana added a commit to tsmetana/kubernetes that referenced this issue Mar 22, 2017
tsmetana added a commit to tsmetana/kubernetes that referenced this issue Apr 13, 2017
wongma7 pushed a commit to wongma7/kubernetes that referenced this issue Apr 17, 2017
tsmetana added a commit to tsmetana/kubernetes that referenced this issue Apr 19, 2017
tsmetana added a commit to tsmetana/kubernetes that referenced this issue Apr 20, 2017
k8s-github-robot pushed a commit that referenced this issue Apr 20, 2017
Automatic merge from submit-queue (batch tested with PRs 44722, 44704, 44681, 44494, 39732)

Fix issue #34242: Attach/detach should recover from a crash

When the attach/detach controller crashes and a pod with attached PV is deleted afterwards the controller will never detach the pod's attached volumes. To prevent this the controller should try to recover the state from the nodes status and figure out which volumes to detach. This requires some changes in the volume providers too: the only information available from the nodes is the volume name and the device path. The controller needs to find the correct volume plugin and reconstruct the volume spec just from the name. This required a small change also in the volume plugin interface.

Fixes Issue #34242.
cc: @jsafrane @jingxu97
jayunit100 pushed a commit to jayunit100/kubernetes that referenced this issue Apr 25, 2017
PiotrProkop pushed a commit to intelsdi-x/kubernetes that referenced this issue May 16, 2017
deads2k pushed a commit to deads2k/kubernetes that referenced this issue May 22, 2017
smarterclayton pushed a commit to smarterclayton/kubernetes that referenced this issue May 23, 2017
deads2k pushed a commit to deads2k/kubernetes that referenced this issue Jun 7, 2017
deads2k pushed a commit to openshift/kubernetes that referenced this issue Jun 7, 2017
@saad-ali
Member

saad-ali commented Jun 7, 2017

#39732 merged for 1.7.
Closing.

@saad-ali saad-ali closed this as completed Jun 7, 2017