Persistent name and Exponential backoff #66
Conversation
@lpabon could you please give it a spin in e2e and review the code?
pkg/controller/controller.go
Outdated
rep, err := p.csiClient.CreateVolume(ctx, &req)
opts := wait.Backoff{Factor: backoffFactor, Steps: backoffSteps}
Dumb question, since I am not familiar with the provisioner code: does this path run on a separate goroutine, or does it block the provisioner from processing other PVCs?
If it blocks other PVCs, we should change the model so that we have a separate goroutine per volume (maybe using https://github.com/kubernetes/kubernetes/blob/master/pkg/util/goroutinemap/goroutinemap.go).
I checked the provisioner controller code and it seems each Provision action is run in a separate goroutine. So it should be safe to block one specific goroutine while it waits for a driver response.
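For reference, a minimal sketch of the goroutinemap-based model suggested above, where each volume gets its own keyed goroutine; the operation key and the provision stub are hypothetical and this is not code from this PR:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/kubernetes/pkg/util/goroutinemap"
)

func main() {
	// One in-flight operation per key; a second Run with the same key is
	// rejected until the first one finishes (with optional backoff on error).
	operations := goroutinemap.NewGoRoutineMap(true /* exponentialBackOffOnError */)

	// Stand-in for a single CSI CreateVolume call for one PVC.
	provision := func() error {
		time.Sleep(2 * time.Second)
		return nil
	}

	// Keying the operation by PVC means a slow driver response blocks only
	// this volume, not the processing of other PVCs.
	if err := operations.Run("provision-pvc-1234", provision); err != nil {
		fmt.Println("operation already pending or backing off:", err)
	}

	operations.Wait() // wait for outstanding operations before exiting
}
```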
pkg/controller/controller.go
Outdated
@@ -247,12 +251,17 @@ func (p *csiProvisioner) Provision(options controller.VolumeOptions) (*v1.Persis
if err != nil {
    return nil, err
}
// create random share name
share := fmt.Sprintf("%s-%s", p.volumeNamePrefix, strings.Replace(string(uuid.NewUUID()), "-", "", -1)[0:p.volumeNameUUIDLength])
// create persistent name based on a volumeNamePrefix and volumeNameUUIDLength
nit: put the volume name generation code into a new method for clarity.
Do you mean make this code into a separate func?
yes, separate function please
done
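For illustration, the extracted helper could look roughly like this; the name makeVolumeName, the guard on the UUID length, and the error texts are assumptions rather than the exact code that landed:

```go
// makeVolumeName builds a persistent volume name from the configured prefix
// and the PVC UID, trimmed to volumeNameUUIDLength characters of the UID.
// Assumes "fmt" and "strings" are imported.
func makeVolumeName(prefix, pvcUID string, volumeNameUUIDLength int) (string, error) {
	if len(prefix) == 0 {
		return "", fmt.Errorf("volume name prefix cannot be of length 0")
	}
	if len(pvcUID) == 0 {
		return "", fmt.Errorf("corrupted PVC object, it is missing UID")
	}
	// Drop the dashes from the UID so the generated name stays compact.
	uid := strings.Replace(pvcUID, "-", "", -1)
	if volumeNameUUIDLength > 0 && volumeNameUUIDLength < len(uid) {
		uid = uid[0:volumeNameUUIDLength]
	}
	return fmt.Sprintf("%s-%s", prefix, uid), nil
}
```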
pkg/controller/controller.go
Outdated
if len(pvcUID) == 0 {
    return nil, fmt.Errorf("corrupted PVC object, it is missing UID")
}
share := fmt.Sprintf("%s-%s", p.volumeNamePrefix, strings.Replace(string(pvcUID), "-", "", -1)[0:p.volumeNameUUIDLength])
Is pvcUID guaranteed to be unique across namespaces? If so, that is a pretty cool trick, and should be suggested to kubernetes/community#1790 as well
Well, it is not 100% guaranteed, since there is a very small chance of a collision, but because the chances are so small, IMHO this approach should be acceptable.
Kubernetes uses "pvc-" + string(claim.UID) as the unique PV name and nobody has complained so far (despite the ugly and long name).
We should align with Kubernetes
done
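For comparison, the in-tree convention mentioned above boils down to the following; this is illustrative only, assuming claim is a *v1.PersistentVolumeClaim from k8s.io/api/core/v1:

```go
// pvNameFromClaim mirrors the in-tree naming convention: the claim UID is
// unique cluster-wide, so the derived PV name is as well.
func pvNameFromClaim(claim *v1.PersistentVolumeClaim) string {
	return "pvc-" + string(claim.UID)
}
```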
/lgtm
err = wait.ExponentialBackoff(opts, func() (bool, error) {
    ctx, cancel := context.WithTimeout(context.Background(), p.timeout)
    defer cancel()
    rep, err = p.csiClient.CreateVolume(ctx, &req)
This is ok, but it still depends on the csiClient being set up. That is #64, where this client should always be set up and never stop retrying. It looks like this change does not handle #64. If it does not handle that issue and the client is never set up correctly, then CreateVolume will never work and will eventually time out.
It looks like Provision() and maybe other calls need to create the connection to the client every time they are called.
Good point. Up to @sbezverk whether he wants to address it in this PR or in a separate PR.
@lpabon I am sorry, but I disagree with the CSI client constantly trying to connect. Here is why: we do not carry state about the relation between the provisioner and the driver. So every time the provisioner needs to communicate, it builds a connection to the driver, checks its actual state, and runs the operation. I have done lots of negative testing with this approach and it worked really well. If you can reproduce a failure scenario which is not covered here, please let me know; I would be really interested to dissect it ;)
Ok, no problem. Let's go with this change, and we can adjust the Connect() call if we need to.
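A rough sketch of the connect-per-call model being described, under the assumption that the driver listens on a local socket; the helper name provisionOnce, the address handling, and the CSI import path are illustrative and may not match this repo's actual Connect() wiring:

```go
package provisionerutil

import (
	"context"
	"time"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
)

// provisionOnce dials the driver, issues a single CreateVolume call, and
// closes the connection again, so no long-lived connection state is kept
// between the provisioner and the driver.
func provisionOnce(address string, req *csi.CreateVolumeRequest, timeout time.Duration) (*csi.CreateVolumeResponse, error) {
	conn, err := grpc.Dial(address, grpc.WithInsecure()) // local socket, no TLS
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	client := csi.NewControllerClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return client.CreateVolume(ctx, req)
}
```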
This PR addresses two issues: persistent volume naming and exponential backoff for CreateVolume.
There are now two timers: a short per-call timeout enforced by the context, and an exponential backoff timer across retries.
This functionality requires support from the driver. It needs to detect that a CreateVolume request refers to an already initiated volume, check for completion periodically, and return the CreateVolumeResponse struct once the volume is complete.
Fixes #63
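To make the two-timer interaction concrete, here is a minimal, self-contained sketch of the pattern; the backoff values and the callCSI stand-in are illustrative, not the exact constants or RPC wiring used in the PR:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func main() {
	// Timer 1: exponential backoff across attempts.
	opts := wait.Backoff{Duration: 1 * time.Second, Factor: 1.2, Steps: 10}

	// Timer 2: a short per-attempt timeout enforced by the context.
	perCallTimeout := 10 * time.Second

	// Stand-in for the CreateVolume RPC: the driver is expected to recognize
	// repeated requests for the same volume and report when it is ready.
	attempts := 0
	callCSI := func(ctx context.Context) error {
		attempts++
		if attempts < 3 {
			return errors.New("volume not ready yet")
		}
		return nil
	}

	err := wait.ExponentialBackoff(opts, func() (bool, error) {
		ctx, cancel := context.WithTimeout(context.Background(), perCallTimeout)
		defer cancel()

		if err := callCSI(ctx); err != nil {
			// Returning (false, nil) tells wait.ExponentialBackoff to retry
			// after the next backoff interval instead of failing immediately.
			return false, nil
		}
		return true, nil
	})
	fmt.Println("provisioning result:", err)
}
```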