
kube-controller-manager crash looping with "nil pointer dereference" in volume/cinder/cinder.go #49418

Closed
saad-ali opened this issue Jul 21, 2017 · 14 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@saad-ali
Member

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
We have at least one report of the v1.7.1 kube-controller-manager binary on a master machine crash looping.

Stack trace:

E0721 18:50:07.099437       7 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:514
/usr/local/go/src/runtime/panic.go:489
/usr/local/go/src/runtime/panic.go:63
/usr/local/go/src/runtime/signal_unix.go:290
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/volume/cinder/cinder.go:97
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/volume/plugins.go:384
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/volume/util/operationexecutor/operation_generator.go:119
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/volume/util/operationexecutor/operation_executor.go:647
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/volume/util/operationexecutor/operation_executor.go:616
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/volume/attachdetach/reconciler/reconciler.go:131
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/volume/attachdetach/reconciler/reconciler.go:122
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/volume/attachdetach/reconciler/reconciler.go:115
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:97
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:98
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:52
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/volume/attachdetach/reconciler/reconciler.go:98
/usr/local/go/src/runtime/asm_amd64.s:2197
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2b0b6c5]

goroutine 1016 [running]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x126
panic(0x3091b80, 0x50f6190)
        /usr/local/go/src/runtime/panic.go:489 +0x2cf
k8s.io/kubernetes/pkg/volume/cinder.(*cinderPlugin).CanSupport(0xc42050fa60, 0x0, 0xc4207355a0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/volume/cinder/cinder.go:97 +0x5
k8s.io/kubernetes/pkg/volume.(*VolumePluginMgr).FindPluginBySpec(0xc420d7d700, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/volume/plugins.go:384 +0x153
k8s.io/kubernetes/pkg/volume/util/operationexecutor.(*operationGenerator).GenerateVolumesAreAttachedFunc(0xc420d7edb0, 0xc42211ec00, 0x8, 0x8, 0xc4216fe0f0, 0x23, 0x7f2ecfa82f80, 0xc420d7ed80, 0x4010106, 0x1, ...)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/volume/util/operationexecutor/operation_generator.go:119 +0x14c
k8s.io/kubernetes/pkg/volume/util/operationexecutor.(*operationExecutor).VerifyVolumesAreAttachedPerNode(0xc4205341e0, 0xc42211ec00, 0x8, 0x8, 0xc4216fe0f0, 0x23, 0x7f2ecfa82f80, 0xc420d7ed80, 0xc4225b8270, 0x54)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/volume/util/operationexecutor/operation_executor.go:647 +0x8d
k8s.io/kubernetes/pkg/volume/util/operationexecutor.(*operationExecutor).VerifyVolumesAreAttached(0xc4205341e0, 0xc4225b82a0, 0x7f2ecfa82f80, 0xc420d7ed80)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/volume/util/operationexecutor/operation_executor.go:616 +0xd80
k8s.io/kubernetes/pkg/controller/volume/attachdetach/reconciler.(*reconciler).syncStates(0xc420ddd7a0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/volume/attachdetach/reconciler/reconciler.go:131 +0xad
k8s.io/kubernetes/pkg/controller/volume/attachdetach/reconciler.(*reconciler).sync(0xc420ddd7a0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/volume/attachdetach/reconciler/reconciler.go:122 +0x51
k8s.io/kubernetes/pkg/controller/volume/attachdetach/reconciler.(*reconciler).reconciliationLoopFunc.func1()
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/volume/attachdetach/reconciler/reconciler.go:115 +0x1d9
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc421e44df0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:97 +0x5e
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc421e44df0, 0x5f5e100, 0x0, 0xc420ddd701, 0xc420248f00)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:98 +0xbd
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc421e44df0, 0x5f5e100, 0xc420248f00)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:52 +0x4d
k8s.io/kubernetes/pkg/controller/volume/attachdetach/reconciler.(*reconciler).Run(0xc420ddd7a0, 0xc420248f00)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/volume/attachdetach/reconciler/reconciler.go:98 +0x50
created by k8s.io/kubernetes/pkg/controller/volume/attachdetach.(*attachDetachController).Run
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/volume/attachdetach/attach_detach_controller.go:257 +0x2af

What you expected to happen:

kube-controller-manager should not crash loop.

How to reproduce it (as minimally and precisely as possible):
Uncertain

Anything else we need to know?:
No

Environment:

  • Kubernetes version (use kubectl version): master 1.7.1 node 1.5.3
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 21, 2017
@k8s-github-robot

@saad-ali
There are no sig labels on this issue. Please add a sig label by:

  1. mentioning a sig: @kubernetes/sig-<group-name>-<group-suffix>
    e.g., @kubernetes/contributor-experience-<group-suffix> to notify the contributor experience sig, OR

  2. specifying the label manually: /sig <label>
    e.g., /sig scalability to apply the sig/scalability label

Note: Method 1 will trigger an email to the group. You can find the group list here and label list here.
The <group-suffix> in the method 1 has to be replaced with one of these: bugs, feature-requests, pr-reviews, test-failures, proposals

@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 21, 2017
@saad-ali
Member Author

CC @kubernetes/sig-storage-bugs

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. kind/bug Categorizes issue or PR as related to a bug. labels Jul 21, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 21, 2017
@saad-ali
Member Author

Suspected cause: PR #39732 introduced code that populates the attach/detach controller's "actual state of the world" cache with a volume that has a nil volume spec. The code assumes that subsequent steps will populate the volume spec. In this case it appears that does not happen, and another attach/detach controller component, syncStates, which does not expect a nil volume spec, encounters one and crashes the controller. When the controller restarts, the offending code runs first, pre-populates the cache with the nil volume spec again, and the controller eventually hits the same error.

This means the error is not due to any particular volume plugin (cinder, azure, gce pd, etc.), but can potentially happen due to any of the attached volumes.

Details:

Let's follow the stack trace:

  1. https://github.com/kubernetes/kubernetes/blob/release-1.7/pkg/volume/cinder/cinder.go#L97
    • spec.Volume and spec.PersistentVolume are both nil-checked, so spec itself must have been nil (see the sketch after this list).
  2. https://github.com/kubernetes/kubernetes/blob/release-1.7/pkg/volume/plugins.go#L384
    • spec is just passed through here.
  3. https://github.com/kubernetes/kubernetes/blob/release-1.7/pkg/volume/util/operationexecutor/operation_generator.go#L119
    • spec comes from volumeAttached.VolumeSpec
  4. https://github.com/kubernetes/kubernetes/blob/release-1.7/pkg/volume/util/operationexecutor/operation_executor.go#L647
    • attachedVolumes is passed through.
  5. https://github.com/kubernetes/kubernetes/blob/release-1.7/pkg/volume/util/operationexecutor/operation_executor.go#L616
    • nodeAttachedVolumes comes from attachedVolumes map[types.NodeName][]AttachedVolume
  6. https://github.com/kubernetes/kubernetes/blob/release-1.7/pkg/controller/volume/attachdetach/reconciler/reconciler.go#L131
    • volumesPerNode is returned by rc.actualStateOfWorld.GetAttachedVolumesPerNode()
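
To make the failure mode concrete, here is a minimal, self-contained Go sketch (hypothetical stand-in types, not the actual Kubernetes source) of why a nil spec blows up in CanSupport: the fields of the spec are nil-checked, but the spec pointer itself is not.

```go
package main

import "fmt"

// Illustrative stand-ins for the real volume.Spec fields.
type volumeSource struct{}
type persistentVolume struct{}

type spec struct {
	Volume           *volumeSource
	PersistentVolume *persistentVolume
}

type cinderPlugin struct{}

// CanSupport mirrors the shape of the release-1.7 check: spec.Volume and
// spec.PersistentVolume are nil-checked, but a nil spec pointer is not, so
// accessing s.Volume dereferences nil and panics.
func (p *cinderPlugin) CanSupport(s *spec) bool {
	return s.Volume != nil || s.PersistentVolume != nil
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			// Prints: runtime error: invalid memory address or nil pointer dereference
			fmt.Println("recovered:", r)
		}
	}()
	p := &cinderPlugin{}
	p.CanSupport(nil) // the attach/detach controller effectively passed nil here
}
```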

This indicates that GetAttachedVolumesPerNode() populated a volume with a nil spec at some point.
Let's see where in the code that happens.

  1. https://github.com/kubernetes/kubernetes/blob/release-1.7/pkg/controller/volume/attachdetach/cache/actual_state_of_world.go#L592
    • This populates the volume from the volumes that exist in the actual state.

This indicates that the actual_state_of_the_world must have been populated with a volume with an empty spec.

Let's see if ActualStateOfWorld.AddVolumeNode(...) is ever called with an empty volumeSpec.

  1. https://github.com/kubernetes/kubernetes/blob/release-1.7/pkg/controller/volume/attachdetach/attach_detach_controller.go#L279
    • This code populates the cache with a nil spec:
			// The nil VolumeSpec is safe only in the case the volume is not in use by any pod.
			// In such a case it should be detached in the first reconciliation cycle and the
			// volume spec is not needed to detach a volume. If the volume is used by a pod, it
			// its spec can be: this would happen during in the populateDesiredStateOfWorld which
			// scans the pods and updates their volumes in the ActualStateOfWorld too.
			err = adc.actualStateOfWorld.MarkVolumeAsAttached(uniqueName, nil /* VolumeSpec */, nodeName, attachedVolume.DevicePath)
			if err != nil {
				glog.Errorf("Failed to mark the volume as attached: %v", err)
				continue
			}

This code was added in PR #39732, merged on April 20, 2017, and shipped as part of the v1.7.0 release.

There is a comment in that PR apparently calling out this exact issue: #39732 (comment)

So I suspect that the assumptions that code makes are not always true (TODO: figure out in which cases), resulting in the volume spec being nil and causing the controller to crash.

Suggested temporary work around:

The suspect code, which populates the ASW with a volume with a nil spec, does not have a feature gate, so it cannot be disabled.

However, the code that is crashing because of the nil spec does have a flag gate. To prevent the crash looping, you can set the --disable-attach-detach-reconcile-sync=true flag on the kube-controller-manager binary to disable the component that crashes. This component is not critical for operation; it is used as a safety net to detect volumes that the controller thinks are attached but may not be, so it can safely be disabled.

@gnufied
Member

gnufied commented Jul 21, 2017

I think what is happening is: we added this check - https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/operationexecutor/operation_executor.go#L571 - thinking it would prevent processing of volumes with a nil spec, but that check isn't enough, because if there is another volume on the node with a proper spec we break out of that loop and perform verification of the volumes one by one.

So to fix this, I am thinking we can introduce an additional check here - https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/operationexecutor/operation_generator.go#L119 - and skip the volume if it has a nil spec (a rough sketch follows).
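
A rough, self-contained sketch of that idea (hypothetical types and helper names, not the code actually merged): before looking up a plugin for an attached volume, skip any entry whose spec was never reconstructed and is still nil.

```go
package main

import "fmt"

type spec struct{ Name string }

type attachedVolume struct {
	VolumeName string
	VolumeSpec *spec
}

// findPluginBySpec stands in for volumePluginMgr.FindPluginBySpec; assume the
// real lookup must never be handed a nil spec.
func findPluginBySpec(s *spec) (string, error) {
	return "kubernetes.io/cinder", nil
}

func verifyVolumesAreAttached(volumes []attachedVolume) {
	for _, v := range volumes {
		if v.VolumeSpec == nil {
			// The volume was restored into the actual state of the world without
			// a spec (e.g. after a controller restart); log and skip it instead
			// of letting the plugin lookup dereference nil.
			fmt.Printf("skipping volume %q: nil spec\n", v.VolumeName)
			continue
		}
		plugin, err := findPluginBySpec(v.VolumeSpec)
		if err != nil {
			fmt.Printf("no plugin found for %q: %v\n", v.VolumeName, err)
			continue
		}
		fmt.Printf("verifying %q with plugin %s\n", v.VolumeName, plugin)
	}
}

func main() {
	verifyVolumesAreAttached([]attachedVolume{
		{VolumeName: "vol-with-spec", VolumeSpec: &spec{Name: "pv-1"}},
		{VolumeName: "vol-without-spec"},
	})
}
```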

Also, I think relying on the reconciler to fill in all those specs is racy: it depends on whether syncStates runs after reconcile completes. But since much of the reconciler runs in its own goroutine, we can make no such guarantee. cc @tsmetana

@saad-ali
Member Author

saad-ali commented Jul 21, 2017

@gnufied Ya, we just reached the same conclusion. That said, the extra nil check will mask the real issue, i.e. why there was a nil spec to begin with (but we should still do it).

Also, I think relying on the reconciler to fill in all those specs is racy: it depends on whether syncStates runs before reconcile completes. But since much of the reconciler runs in its own goroutine, we can make no such guarantee. cc @tsmetana

Exactly. @msau42 has a solid theory for what might be happening:

  • A pod is created with a volume that is attached, all is good.
  • kube-controller-manager goes down. In the meantime the pod is deleted.
  • kube-controller-manager starts again, and this populator adds the volume to the actual_state because it is still attached. It does not add it to the desired_state, because the original pod is deleted.
  • The populator code makes this assumption: In such a case it should be detached in the first reconciliation cycle and the volume spec is not needed to detach a volume
  • However, this assumes that the detach happens fast enough, before any other code picks up the nil volume spec.
  • The reconciler will start the detach immediately, wait 60 seconds, and then start the syncStates loop. If the detach hasn't completed by then, it will crash.

So basically @msau42's theory is that a detach triggered by the populator introduced in #39732 races with syncStates(...), which runs every 60 seconds.

@saad-ali
Member Author

To expand on the first part of my comment above, regarding PR #49420: it's fine to add this check, but I question whether the caller logic even makes sense. Why is nodeError := oe.VerifyVolumesAreAttachedPerNode... in https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/operationexecutor/operation_executor.go#L616 even inside the for loop:

		for _, volumeAttached := range nodeAttachedVolumes {
...
			// If node doesn't support Bulk volume polling it is best to poll individually
			nodeError := oe.VerifyVolumesAreAttachedPerNode(nodeAttachedVolumes, node, actualStateOfWorld)
...
		}

This means that VerifyVolumesAreAttachedPerNode(...), which processes all the items in nodeAttachedVolumes, is called len(nodeAttachedVolumes) times. That makes no sense.

@gnufied
Member

gnufied commented Jul 21, 2017

No, it doesn't get called len(nodeAttachedVolumes) times. The idea is: if the node has any volume plugin that doesn't support BulkVolumeVerification, then we break out of the loop and perform validation of that node individually - https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/operationexecutor/operation_executor.go#L620

In an ideal world people wouldn't be using more than one volume type, but even if they do - mixing volume types that do and don't support bulk volume verification - the code takes care of both (see the sketch below).
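
An illustrative sketch (hypothetical names, not the real operation executor code) of that control flow: the inner loop breaks out as soon as one plugin on the node does not support bulk verification, so the per-node check runs at most once per node rather than once per volume.

```go
package main

import "fmt"

type attachedVolume struct {
	Name              string
	SupportsBulkCheck bool
}

func verifyVolumesAreAttachedPerNode(volumes []attachedVolume, node string) {
	fmt.Printf("individually verifying %d volumes on %s\n", len(volumes), node)
}

func verifyVolumesAreAttached(attached map[string][]attachedVolume) {
	for node, nodeAttachedVolumes := range attached {
		for _, v := range nodeAttachedVolumes {
			if !v.SupportsBulkCheck {
				// One non-bulk plugin makes the whole node fall back to per-node
				// verification; break so the fallback happens exactly once.
				verifyVolumesAreAttachedPerNode(nodeAttachedVolumes, node)
				break
			}
		}
		// (The real code also batches bulk-capable volumes per plugin here.)
	}
}

func main() {
	verifyVolumesAreAttached(map[string][]attachedVolume{
		"node-1": {
			{Name: "vol-a", SupportsBulkCheck: true},
			{Name: "vol-b", SupportsBulkCheck: false},
		},
	})
}
```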

@gnufied
Member

gnufied commented Jul 21, 2017

Also, @tsmetana is working on a proper fix here - #48755. This would ensure that we get rid of splitting the volume plugin's name and can populate the volume spec properly.

@saad-ali
Member Author

@gnufied Ack, we'll revisit that code later.

More from @msau42's debugging:

  • The race above seems unlikely: 60 seconds is a very long time for a detach, and a crash loop would mean hitting the 60-second race every single time, which is improbable.
  • Maybe the pod was not deleted; instead, maybe the PVC was deleted. Consider this scenario:
    • A pod is created referencing a volume via a PVC, volume is attached, all is good.
    • The PVC is deleted but the pod continues to exist.
    • kube-controller-manager maintains information about the PVC/PV in memory, so there are no problems.
    • kube-controller-manager restarts (all in memory cache is also lost about the PVC/PV).
      • Logs show attach_detach_controller.go:334] Error creating spec for volume... error processing PVC... failed to find PVC... in PVCInformer cache
    • The populator code adds the volume to actual_state without the volume spec.
    • The populator tries to add the volume to the desired state (hoping to fix the volume spec in the process), but fails because CreateVolumeSpec(...) cannot find the PVC, so the volume spec remains nil.
    • When syncStates kicks off for the first time (60 seconds after the controller restart), it encounters the nil volume spec and panics.

So repro steps should be:

  • Have at least 2 volumes (referenced via PVC) successfully attached to a node (either through a single pod or multiple pods).
  • Delete the PVC for at least 1 of the volumes.
  • Restart the kube-controller-manager
  • Observe kube-controller-manager panic and crash after about 60 seconds.

@saad-ali
Member Author

Huge props to @msau42 for helping debug this! And thanks to @gnufied for jumping in to help patch it on a Friday evening.

We think the above is the root cause of this issue.

Work around for folks hitting this issue:

  • Recreate the deleted PVC
  • Or disable syncStates by setting the --disable-attach-detach-reconcile-sync=true flag on the kube-controller-manager binary to disable the component that crashes

Permanent fixes:

  1. PR Fix controller crash because of nil volume spec #49420 - should fix the symptoms of the immediate issue. @gnufied can you please use the above repro steps to verify the fix, and then cherry pick it to 1.7 branch.
  2. Issue should prevent the deletion of a PVC that is referenced by an active pod #45143 - Prevent the deletion of PVC before pod is deleted.
  3. For the detach race that could trigger this, we can consider Fix #44858: Store the volume attaching state in the node status #48755, but are not too concerned about it since it is exceedingly unlikely and should not result in a crash loop.
  4. We should reconsider the original design of PR Fix issue #34242: Attach/detach should recover from a crash #39732 -- is there any way to avoid having that component populate the cache with a volume that has a nil volume spec?

@gnufied
Member

gnufied commented Jul 22, 2017

@saad-ali Sure, I will test my patch above with the repro steps and update here.

k8s-github-robot pushed a commit that referenced this issue Jul 22, 2017
Automatic merge from submit-queue (batch tested with PRs 49420, 49296, 49299, 49371, 46514)

Fix controller crash because of nil volume spec

For volumes that don't support bulk volume verification, a nil
volume spec can cause a crash of the controller.

Fixes #49418

**Release note**:
```release-note
Fixes #49418 where kube-controller-manager can panic on volume.CanSupport methods and enter a crash loop.
```
@gnufied
Member

gnufied commented Jul 24, 2017

Tested my fix against both patched and unpatched kube versions and it seems to be working fine.

@tsmetana
Member

I'm afraid the fix for PR #47855 will not fix the race I introduced with the fix for PR #39732: it may remove the hack of splitting the uniqueName. I actually think the best option really is to account for the possibility that the volume spec might be nil. Once the pod is gone there is no sure way to reconstruct the spec. Detaching the volumes with no pods separately from the other volumes (outside the reconciliation loop) might lead to other kinds of problems (it would have to be another loop since detach may fail, it would have to be synced with the controller's states of the world, etc.).

@jingxu97
Contributor

Just a few things we might consider related to this issue

  1. Refactor the code related to VerifyVolumesAreAttached in operation_executor and operation_generator. The code is a little hard to read after a few rounds of changes. This function has two different ways of checking the volume attachment status: one is BulkVerifyVolumes and the other is VerifyVolumesAreAttachedPerNode. Some for loops are performed in operation_executor and some in operation_generator, and they are not consistent between the two verify functions. Opened an issue: Refactor the code related to VerifyVolumesAreAttached #49509
  2. About reconstructing VolumeSpec: in the volume management code, VolumeSpec is first needed to find the volume plugin. After the volume plugin is found, VolumeSpec is used again to attach, mount, or check whether the volume is still attached. The assumptions used here are:
  • If the pod still exists, VolumeSpec should be reconstructed when the controller is restarted. However, as discovered from this bug, that might not be true if the PVC is deleted in the meantime, because the pod only holds PVC information and the PVC is needed to look up the PV. By fixing Issue should prevent the deletion of a PVC that is referenced by an active pod #45143, we could fix this problem.

  • If the pod is already deleted, VolumeSpec is not really needed for detaching. But before detaching finishes, the syncStates loop might check this volume's attachment status and will use VolumeSpec during the check. One safeguard we can put here is to fail FindAttachablePluginBySpec if the spec is nil. This function is used in multiple places such as attach_detach_controller, desired_state_of_world, actual_state_of_world, operation_generator, etc. Instead of adding a nil check in many places, I feel the check belongs in FindAttachablePluginBySpec, or even better in FindPluginBySpec (which is called by FindAttachablePluginBySpec and other places); a rough sketch follows below. Opened an issue for this: Check VolumeSpec if it is nil during volume reconstruction for failure #49513
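
A small sketch of that safeguard (hypothetical types, not the real VolumePluginMgr API): have the central spec-to-plugin lookup reject a nil spec so every caller gets an error instead of a panic.

```go
package main

import (
	"errors"
	"fmt"
)

type spec struct{ Name string }

type volumePlugin interface {
	GetPluginName() string
	CanSupport(s *spec) bool
}

type volumePluginMgr struct {
	plugins []volumePlugin
}

// FindPluginBySpec centralizes the nil check so callers such as the
// attach/detach controller, the desired/actual state of the world, and the
// operation generator get an error instead of a nil pointer dereference.
func (pm *volumePluginMgr) FindPluginBySpec(s *spec) (volumePlugin, error) {
	if s == nil {
		return nil, errors.New("could not find plugin: volume spec is nil")
	}
	for _, p := range pm.plugins {
		if p.CanSupport(s) {
			return p, nil
		}
	}
	return nil, fmt.Errorf("no plugin matched spec %q", s.Name)
}

func main() {
	mgr := &volumePluginMgr{}
	if _, err := mgr.FindPluginBySpec(nil); err != nil {
		fmt.Println("lookup failed:", err) // clean error, not a controller crash
	}
}
```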

k8s-github-robot pushed a commit that referenced this issue Sep 23, 2017
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

E2E test to make sure controller does not crash because of nil volume spec

Fixes #49521

Tests fix of issue referenced in #49418