Long-lasting requests (attach/detach) fail because of token expiration #1394
What error/response code does the Cinder API return? If it's a 401, the OpenStack client should reauth automatically and retry. UPD:
creating a new service client won't cause a reauth; you need to create a brand-new provider client and authenticate it.
It is Nova where the attach and detach requests go. The API call returns normally, as everything is fine at that point. However, after the API call Nova internally continues handling the attach/detach using the same token the user presented. If that token expires in the meantime, the volume handling stalls and no further action is possible. Cinder CSI has no way of detecting or fixing this, as the volume is now permanently stuck in the attaching/detaching state; someone has to forcefully reset the state of the volume. OK, yes, then we would need to call CreateOpenStackProvider(), if I am not wrong. Or if we could reauthenticate at ~50% of the token lifetime, that could fix the problem. However, I do not think gophercloud implements such a feature or is willing to implement it.
I don't think that reauth in advance would help. The Nova API is asynchronous, and we probably need to introduce additional "waitfor" functions to verify the status of the volume, like it is done for LBaaS.
The problem isn't the waiting; attach and detach already implement waiting. The problem is that OpenStack Nova itself is poorly implemented here: it blindly reuses the token given by the user for its own future actions. For example, some OpenStack deployments enable an optional feature where Nova creates its own service token that it manages itself. When Nova performs these delegated actions with the user token, it authorizes using both tokens (its own and the user's), which allows accepting expired user tokens as long as Nova's own service token is still valid. So if Cinder CSI created a new token/client every time it attaches or detaches, this would be resolved, as OpenStack is then usually capable of handling the request.
Since you're talking about internal OpenStack service communication, are there related OpenStack bug reports? Are the Nova/Cinder attach/detach actions the only actions which require a token renewal? cc @Joker-official @RaphaelVogel can it be related to our cases when volume attach/detach fails?
@kayrus I assume this will be fixed at the gophercloud layer by adding a re-auth function on some time limit? Just curious, how does gophercloud handle the token-expiry issue now? E.g., we create a cloud provider client; if over time the token expires in the OCCM layer, what will happen? Thanks
Reauth is triggered by a 401 response code, so there must be an API call performed in order to identify that the client token is expired.
Thanks, this makes sense to me. So the issue is: if the token expires in the middle of a long call, gophercloud can do nothing... thus we need to introduce a timely refresh mechanism.
I would assume any action where Nova makes a call to another service can be problematic in some way. Adding/removing a port might thus be problematic too, but I would assume a retry helps there.
We have had service tokens enabled for ~2 years and it has helped a lot.
The service token was introduced in OpenStack to fix this issue. However, not all production OpenStack deployments have this feature enabled. Users usually don't notice, because the openstack CLI/Horizon create a new token for every request (sometimes even multiple tokens, as that code was never built to be efficient). But if gophercloud had a feature to renew the token after X minutes, it would solve this issue; or at least, for long-lasting requests, you would only need to tune the renew-after value in the CSI and/or increase the OpenStack token expiration time. Luckily, attach and detach are usually quite fast; even on slow systems they take just a few minutes. So being able to renew the token some time before it expires could solve the issue on those old OpenStack installations.
Each token contains an expires_at attribute:

{
  "token": {
    "audit_ids": [
      "abcdefg"
    ],
    "catalog": "***",
    "expires_at": "2021-02-03T19:50:45.000000Z",
    ...
  }
}

This attribute can be checked in advance before each API call, and if necessary trigger a reauth. @mape90 it's better to discuss this topic directly in gophercloud.
@fejta-bot: Closing this issue due to inactivity.
What happened:
Cinder internally fails to attach or detach, causing a permanent failure for the volume.
The issue is due to an OpenStack design problem: user tokens are reused internally. There is a fix on the OpenStack side:
https://docs.openstack.org/cinder/latest/configuration/block-storage/service-token.html
However, not everyone has that fix, or is willing or able to change their OpenStack installation.
What you expected to happen:
Detach and attach should work.
How to reproduce it:
Use an OpenStack deployment without service tokens configured for the Cinder service, and attach or detach volumes around the time the token is about to expire. Reproduction also requires attach/detach to be a bit slow on the OpenStack side, so that the window of failure is bigger.
Anything else we need to know?:
The fix is to create a new client every time we do an attach or detach (AttachVolume and DetachVolume in openstack_volume.go in csi/cinder/openstack):
cli, err := openstack.NewComputeV2(os.compute.ProviderClient, os.epOpts)
This is already done for some OpenStack calls, like attach with multiattach, and in volume expansion.
The downside is an extra call to the Keystone API, but this fixes many error situations on older OpenStack installations that do not have service tokens configured by default. I am not sure whether they are enabled by default even on newer OpenStack releases.
/kind bug