[BUG] RWX volume remains attached after workload deleted if it's upgraded from v1.4.2 #6139
Comments
The detachment needs to wait for a while. cc @PhanLe1010
The StatefulSet was deleted at … Then, the …
Just for information: it still remains attached and healthy after waiting more than 3 hours.
I see. This case can happen when there is no workload pod on the same node as the share manager pod. When a user upgrades to 1.5.x, we create an upgrade attach/detach (AD) ticket for the share manager's node to keep the volume attached there. When the user scales down the workload, we don't clean up that upgrade AD ticket because it is on a different node from the workload pod's node. As a result, no one cleans up the upgrade AD ticket and the volume stays stuck there forever. Still figuring out how to fix this issue.
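To make the leak concrete, here is a minimal Python model of the behavior described above. It is a hypothetical sketch, not Longhorn code: `Ticket`, `Volume`, `upgrade_to_1_5`, `on_workload_scaled_down`, and the `"upgrader"` type string are illustrative assumptions. The cleanup only looks at tickets on the workload pod's node, so the upgrade ticket on the share manager's node is never removed.

```python
from dataclasses import dataclass, field

# Hypothetical ticket type name; not Longhorn's actual constant.
UPGRADER = "upgrader"


@dataclass(frozen=True)
class Ticket:
    id: str
    type: str
    node: str


@dataclass
class Volume:
    tickets: dict = field(default_factory=dict)  # ticket ID -> Ticket

    def is_attached(self) -> bool:
        # In this model the volume stays attached while any ticket remains.
        return bool(self.tickets)


def upgrade_to_1_5(vol: Volume, share_manager_node: str) -> None:
    # Upgrade path: create a ticket on the share manager's node
    # to keep the volume attached there.
    t = Ticket(f"{UPGRADER}-{share_manager_node}", UPGRADER, share_manager_node)
    vol.tickets[t.id] = t


def on_workload_scaled_down(vol: Volume, workload_node: str) -> None:
    # Flawed cleanup (as described): only tickets on the workload pod's node are removed.
    stale = [tid for tid, t in vol.tickets.items() if t.node == workload_node]
    for tid in stale:
        del vol.tickets[tid]


vol = Volume()
upgrade_to_1_5(vol, share_manager_node="node-1")
on_workload_scaled_down(vol, workload_node="node-2")
print(vol.is_attached())  # True: the upgrade ticket on node-1 is left behind
```

If the workload pod happened to run on `node-1`, the same cleanup would remove the ticket, which matches the observation that the problem only appears when no workload pod shares a node with the share manager pod.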
@yangchiu Do you have the support bundle after 3 hours?
It sounds like we create a ticket for the node where the share manager pod is running. Can we instead create multiple tickets, one for each node where workloads are running?
This is a potential approach. We need to be careful not to fall into race conditions, though. For example, at the moment we decide to create a ticket for a running workload, it is running, but by the time we finish creating the ticket, the workload has already stopped. Now there will be a leftover ticket that nobody will clean up.
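The race above is a classic check-then-act problem. The following hypothetical Python model (all names are illustrative assumptions, tickets are simplified to a set of node names) replays the bad interleaving deterministically: the workload is observed as running, it stops before the ticket is written, and its stop-time cleanup finds nothing to delete.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Cluster:
    running_workloads: dict = field(default_factory=dict)  # workload name -> node
    tickets: set = field(default_factory=set)  # nodes holding a ticket (simplified)


def stop_workload(cluster: Cluster, workload: str) -> None:
    # Cleanup runs once, when the workload stops.
    node = cluster.running_workloads.pop(workload, None)
    if node is not None:
        cluster.tickets.discard(node)


def create_ticket_for_workload(cluster: Cluster, workload: str,
                               between: Optional[Callable[[], None]] = None) -> None:
    node = cluster.running_workloads.get(workload)  # check: workload is running
    if node is None:
        return
    if between is not None:
        between()  # simulate the workload stopping before the ticket is written
    cluster.tickets.add(node)  # act: based on a now-stale observation


c = Cluster(running_workloads={"web-0": "node-2"})
create_ticket_for_workload(c, "web-0", between=lambda: stop_workload(c, "web-0"))
print(c.tickets)  # {'node-2'}: a leftover ticket with no running workload
```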
I am wondering whether we should use the native Kubernetes `VolumeAttachment` object to create the ticket instead. The logic is:
Hmm. But we still have a problem with both of the above approaches:
How about this design? In the upgrade path:
Limitation:
Another benefit of this design:
Btw, this issue is a side effect of this fix: https://github.com/longhorn/longhorn-manager/pull/1993/files. We fixed the detaching issue but introduced the stuck-attaching issue as a side effect 😄
The design looks good to me.
Agree with the removal.
To clarify a bit: it is not a side effect of the detaching fix. The … Hence, both the RWX volume attaching and detaching issues exist in the AD controller's design and implementation.
You are right @derekbit, it already exists. The behavior is just a little different, but it is still the same issue. Thank you for the clarification.
@PhanLe1010 can't we just delete the upgrader ticket once the share manager is no longer required, i.e., no workloads are using it (and, of course, the volume is not in maintenance mode), regardless of which node the ticket belongs to? Or is there anything I missed? Any obstacles?
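The cleanup rule suggested above can be sketched as follows. This is a hypothetical Python model, not the actual longhorn-manager change: `Volume`, `Ticket`, `cleanup_upgrader_tickets`, and the `"upgrader"` type string are illustrative assumptions. The key point is that the decision depends only on whether any workload still uses the volume (and on maintenance mode), not on which node the ticket is on.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ticket:
    id: str
    type: str  # hypothetical type name, e.g. "upgrader"
    node: str


@dataclass
class Volume:
    tickets: dict = field(default_factory=dict)       # ticket ID -> Ticket
    workload_pods: set = field(default_factory=set)   # pods currently using the RWX volume
    maintenance_mode: bool = False


def cleanup_upgrader_tickets(vol: Volume) -> list:
    """Remove upgrader tickets on any node once no workload uses the volume,
    unless the volume is in maintenance mode. Returns the removed ticket IDs."""
    if vol.workload_pods or vol.maintenance_mode:
        return []
    removed = [tid for tid, t in vol.tickets.items() if t.type == "upgrader"]
    for tid in removed:
        del vol.tickets[tid]
    return removed


t = Ticket("upgrader-node-1", "upgrader", "node-1")
vol = Volume(tickets={t.id: t})
removed = cleanup_upgrader_tickets(vol)
print(removed)  # ['upgrader-node-1']
```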
BTW, I agree with the proposal as well, especially having the correct attacher type. For static volume attachment (not triggered by a K8s volume attachment), why not just use …
Thanks @innobead and @shuo-wu for the great feedback! I agree with points 1 and 2 that @shuo-wu mentioned above (which I understand are also the points @innobead proposed; please correct me if I misunderstood, @innobead). For point 2 that @shuo-wu mentioned:
Let me evaluate further to see which option is better:
@PhanLe1010 This seems to be a blocker for 1.5.0. Let's make it the highest priority and tackle it first. Thanks.
The point 3 I mentioned was similar to what David suggested, but with extra concerns about auto-attached volumes.
I think the …
Pre Ready-For-Testing Checklist
Test Plan
Check the problem in the issue description
Test attach/detach after upgrade
Upgrade tests
Verified pass in
After upgrading from v1.4.2 to master and from v1.4.2 to v1.5.x, the test steps passed: after the workload is deleted, the RWX volume becomes detached as well.
Describe the bug (🐛 if you encounter this issue)
An RWX volume remains attached and healthy after its workload is deleted if the volume was created in v1.4.2 and then upgraded to master-head or v1.5.x-head. Directly creating and deleting a workload that uses an RWX volume in master-head or v1.5.x-head doesn't have this issue.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Log or Support bundle
supportbundle_7c27605c-22df-493e-9e2a-b9135c68e20b_2023-06-16T02-29-21Z.zip
Environment
Additional context