
[cinder-csi-plugin] VolumeAttachment has attached status true but actual state false #2137

Closed
Payback159 opened this issue Feb 28, 2023 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@Payback159

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

The csi-attacher sidecar in csi-cinder-controllerplugin logs warnings for several VolumeAttachments saying that the attached status is true but the actual state is false.

I have tried to understand which two states are being compared here, but based on my current understanding this log entry should not appear at all.

Example:

csi-cinder-controllerplugin-76c485697b-984tc csi-attacher W0228 09:29:56.252060       1 csi_handler.go:186] VA csi-18d7a5e4ba40c759559d5ea42ae029eb21cc1eac39b6323941749a500a198325 for volume d2268cd2-3b8e-43b0-ae5e-b8cde0673d34 has attached status true but actual state false. Adding back to VA queue for forced reprocessing

Can someone please explain exactly which two states are being compared here? As I understand it, the first value (the attached status) comes from kubectl get volumeattachments csi-18d7a5e4ba40c759559d5ea42ae029eb21cc1eac39b6323941749a500a198325. But where exactly does the second value, the false one, come from?
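For reference, the first value is what the VolumeAttachment object itself reports, e.g. (using the VA name from the log above):

kubectl get volumeattachment csi-18d7a5e4ba40c759559d5ea42ae029eb21cc1eac39b6323941749a500a198325 -o jsonpath='{.status.attached}'

which returns true for the affected attachments; it is the origin of the second (false) value that I am unsure about.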

What you expected to happen:

As in other clusters running the same component combination, the log entry should not show up at all.

How to reproduce it:

Unfortunately I have no idea how to reproduce it yet.

Anything else we need to know?:

Environment:

  • openstack-cloud-controller-manager (or other related binary) version: v1.18.2
  • csi-attacher:v3.3.0
  • cinder-csi-plugin:v1.22.2
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 28, 2023
@jichenjc
Contributor

jichenjc commented Mar 1, 2023

This message comes from https://github.com/kubernetes-csi/external-attacher/blob/master/pkg/controller/csi_handler.go#L189, which is not cloud-provider-openstack (CPO) CSI code. I am curious what problem you are actually seeing: only the log entry, or is some functionality not working?

@Payback159
Author

Hi @jichenjc,

thanks for your answer. Specifically, after upgrading our clusters, one of them now takes longer than two minutes to attach and mount PVs.

A typical event description in the affected cluster:

  Normal   Scheduled               2m32s  default-scheduler        Successfully assigned default/nginx-with-pvc-56c8444fb-q7476 to dev-internal-k8s-node-nf-39
  Warning  FailedAttachVolume      2m31s  attachdetach-controller  Multi-Attach error for volume "pvc-f2dc38a5-f613-41c6-993c-f09ef17353dc" Volume is already used by pod(s) nginx-with-pvc-56c8444fb-r8th6
  Normal   SuccessfulAttachVolume  44s    attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-f2dc38a5-f613-41c6-993c-f09ef17353dc"
  Warning  FailedMount             28s    kubelet                  Unable to attach or mount volumes: unmounted volumes=[my-pvc], unattached volumes=[my-pvc kube-api-access-g8kct]: timed out waiting for the condition
  Normal   Pulling                 16s    kubelet                  Pulling image "registry/nginx:latest"
  Normal   Pulled                  12s    kubelet                  Successfully pulled image "registry/nginx:latest" in 3.85668224s
  Normal   Created                 12s    kubelet                  Created container nginx
  Normal   Started                 11s    kubelet                  Started container nginx

Here is an example from another cluster of comparable size:

  Normal   Scheduled               28s   default-scheduler        Successfully assigned default/nginx-with-pvc-56c8444fb-cwbl8 to test-internal-k8s-node-nf-39
  Warning  FailedAttachVolume      29s   attachdetach-controller  Multi-Attach error for volume "pvc-997d21c6-1acb-465d-a729-f551ef311c29" Volume is already used by pod(s) nginx-with-pvc-56c8444fb-pttgf
  Normal   SuccessfulAttachVolume  21s   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-997d21c6-1acb-465d-a729-f551ef311c29"
  Normal   Pulling                 11s   kubelet                  Pulling image "registry/nginx:latest"
  Normal   Pulled                  9s    kubelet                  Successfully pulled image "registry/nginx:latest" in 1.896766857s
  Normal   Created                 9s    kubelet                  Created container nginx
  Normal   Started                 9s    kubelet                  Started container nginx

The biggest anomaly we found is that the csi-attacher in the affected cluster regularly logs multiple warnings about this state mismatch, while the other clusters do not.

As far as I can tell, the csi-attacher compares the two states here: https://github.com/kubernetes-csi/external-attacher/blob/master/pkg/controller/csi_handler.go#L180

One value comes from here:

https://github.com/kubernetes-csi/external-attacher/blob/master/pkg/controller/csi_handler.go#L134
https://github.com/kubernetes-csi/external-attacher/blob/master/pkg/controller/csi_handler.go#L157

and the other from here:

https://github.com/kubernetes-csi/external-attacher/blob/master/pkg/controller/csi_handler.go#L162

From my point of view, line 162 corresponds exactly to what I get when I run kubectl get volumeattachments and check the value of status.attached.

But I still don't quite understand which list exactly the csi-attacher receives when it calls published, err := h.CSIVolumeLister.ListVolumes(ctx) in line 134.

As far as I can tell, the CSIVolumeLister is set here: https://github.com/kubernetes-csi/external-attacher/blob/32c6854482d3d1d27ffaeeaf3a37dcc8bfb94506/cmd/csi-attacher/main.go#L214 and is therefore of type attacher.

From there I am not sure, but am I correct in assuming that the values stored in published come from the external CSI controller? I am mainly interested in where the csi-attacher gets the second value, false. As I read it, false arises when the nodeID of the VolumeAttachment does not match anything in published for the volumeHandle of the PersistentVolume (the volumeHandle comes from https://github.com/kubernetes-csi/external-attacher/blob/32c6854482d3d1d27ffaeeaf3a37dcc8bfb94506/pkg/controller/csi_handler.go#L157), and the comparison that produces the false value happens here: https://github.com/kubernetes-csi/external-attacher/blob/32c6854482d3d1d27ffaeeaf3a37dcc8bfb94506/pkg/controller/csi_handler.go#L180
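To make my reading concrete, here is a toy sketch of the comparison as I understand it (simplified stand-in types and names, not the actual external-attacher code):

package main

import "fmt"

// publishedVolumes is my stand-in for the result of CSIVolumeLister.ListVolumes:
// node ID -> set of volume handles the driver reports as attached there.
type publishedVolumes map[string]map[string]bool

// volumeAttachment is a stand-in for the relevant fields of a VolumeAttachment object.
type volumeAttachment struct {
	name         string
	nodeID       string
	volumeHandle string
	attached     bool // status.attached as shown by kubectl get volumeattachments
}

func reconcile(published publishedVolumes, vas []volumeAttachment) {
	for _, va := range vas {
		// "actual state": does the driver report this handle as attached to this node?
		actual := published[va.nodeID][va.volumeHandle]
		// compare against what the API object claims (status.attached)
		if va.attached != actual {
			fmt.Printf("VA %s for volume %s has attached status %t but actual state %t\n",
				va.name, va.volumeHandle, va.attached, actual)
		}
	}
}

func main() {
	published := publishedVolumes{"node-a": {"vol-1": true}}
	vas := []volumeAttachment{
		{name: "csi-aaa", nodeID: "node-a", volumeHandle: "vol-1", attached: true}, // consistent, no log
		{name: "csi-bbb", nodeID: "node-b", volumeHandle: "vol-2", attached: true}, // would trigger the warning
	}
	reconcile(published, vas)
}

If that reading is right, the second (false) value simply means the driver's ListVolumes response does not report the volumeHandle as published on the VA's node.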

@jichenjc
Contributor

jichenjc commented Mar 2, 2023

I am not a CSI attacher expert, but I think this might be helpful for you:

https://github.com/kubernetes-csi/external-attacher/blob/master/pkg/controller/csi_handler.go#L49
tells us that it calls the external driver, which in this case should be
https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/csi/cinder/controllerserver.go#L285

so I think something might be going wrong here:
https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/csi/cinder/controllerserver.go#L317
e.g. the reported status is incorrect?
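Roughly speaking, the part of that response the attacher cares about is the list of node IDs reported per volume. A toy illustration (simplified stand-in types, not the actual controllerserver.go code):

package main

import "fmt"

// For each Cinder volume, the published node IDs would be derived from the
// volume's attachments; that list is what the external-attacher later treats
// as the "actual state".
type attachment struct{ ServerID string }

type volume struct {
	ID          string
	Attachments []attachment
}

func publishedNodeIDs(vol volume) []string {
	ids := make([]string, 0, len(vol.Attachments))
	for _, a := range vol.Attachments {
		ids = append(ids, a.ServerID)
	}
	return ids
}

func main() {
	vol := volume{
		ID:          "d2268cd2-3b8e-43b0-ae5e-b8cde0673d34",
		Attachments: []attachment{{ServerID: "server-uuid-1"}},
	}
	// if this list is stale, wrong, or empty, the attacher sees "actual state false"
	fmt.Println(publishedNodeIDs(vol))
}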

I suggest you open an issue at https://github.com/kubernetes-csi/external-attacher and link it here; they might have some suggestions as well.

@Payback159
Author

Thank you for your support! It has become a little clearer to me now.

@Payback159
Author

Hello @jichenjc,

The log entries seem to occur when a PVC exists that was mounted at some point in the past and whose pod has since been shut down (e.g. by autoscaling). As soon as the unused PVCs are cleaned up, the log entries disappear.

So the log entry makes sense after all, because an unused PVC naturally has the status attached: false. I would still argue that this is noisy, and that an unused / unmounted PVC/PV should not show up in the logs.
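For reference, the stale attachments can be spotted with standard kubectl, e.g. something like:

kubectl get volumeattachments -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached

and then cross-checking the listed PVs against PVCs that no running pod mounts anymore.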

What is your view on this?

In any case, thank you again for your support.

@jichenjc
Contributor

an unused / unmounted PVC/PV should not show up in the logs.

Yes, that seems reasonable, but since this is csi-attacher code, would you like to raise it with those folks and see whether there are other use cases that require keeping the log?

@Payback159
Author

Sure, I just wanted to let you know. I think this issue can be closed now, since I already opened an issue in the external-attacher project during my analysis and posted the same information there.

@vhurtevent

New findings related to this issue: #2295
