Use rate limited work queues and up resync from 15s to 15m #825
wongma7 commented Jun 20, 2018
- Up resync period to 15 minutes
- Ditch goroutinemap and instead use the exponential backoff that comes with rate limited work queues
- Ditch failedProvisionStats and failedDeleteStats, and instead use NumRequeues that comes with rate limited work queues
@cofyc if you would like to review. @childsb if you would like to as well, or can find somebody else willing to :) The PR looks big, but the code for the claim queue and the volume queue is pretty much identical. I considered refactoring it so there is less duplicate code, but I will let y'all decide if it is okay as-is.
lib/controller/controller.go
Outdated
@@ -473,8 +511,8 @@ func NewProvisionController(

	volumeHandler := cache.ResourceEventHandlerFuncs{
		AddFunc: nil,
Why omit AddFunc here? If omitted, volumes will not be queued until the first resync happens.
we only care about volume Updates and not Adds because we are checking if a volume has entered 'Released' state and needs to be deleted
But volumes that entered the 'Released' state while the provisioner was not running (e.g. restarting or migrating to another node) will wait up to 15 minutes (by default) before they are deleted. I'm not sure this is acceptable. The old controller implementation did not handle the Add event either, but its resync period was only 15 seconds. shouldDelete only checks the volume itself, so queuing all volumes should not affect performance much IMO.
Nice, you're correct, I've added AddFunc.
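The point of this thread can be illustrated with a small sketch. It does not use client-go; `volume`, `handlers`, and `initialList` are hypothetical stand-ins, and the sketch assumes (as client-go informers do) that objects already present at startup are delivered as Add events. With a nil AddFunc, a volume that was already 'Released' is not queued until an update or resync arrives.

```go
package main

import "fmt"

// volume is a hypothetical stand-in for a PersistentVolume.
type volume struct {
	name  string
	phase string
}

// handlers mimics the shape of cache.ResourceEventHandlerFuncs:
// a nil func means that kind of event is silently dropped.
type handlers struct {
	AddFunc    func(v volume)
	UpdateFunc func(oldV, newV volume)
}

// initialList simulates startup: every existing object is delivered
// to the handler as an Add event (an assumption of this sketch).
func initialList(h handlers, existing []volume) {
	for _, v := range existing {
		if h.AddFunc != nil {
			h.AddFunc(v)
		}
	}
}

// queuedAfterStart returns the volume names enqueued by the initial list,
// with or without an AddFunc registered.
func queuedAfterStart(withAdd bool) []string {
	var queue []string
	enqueue := func(v volume) { queue = append(queue, v.name) }
	h := handlers{UpdateFunc: func(_, newV volume) { enqueue(newV) }}
	if withAdd {
		h.AddFunc = enqueue
	}
	initialList(h, []volume{{"pv-1", "Released"}, {"pv-2", "Bound"}})
	return queue
}

func main() {
	fmt.Println("without AddFunc:", queuedAfterStart(false))
	fmt.Println("with AddFunc:   ", queuedAfterStart(true))
}
```

Without AddFunc the queue stays empty after startup, so the already-released `pv-1` waits for the next resync; with AddFunc both volumes are queued immediately and the sync handler decides which ones to delete.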
lib/controller/controller.go
Outdated
if err := ctrl.syncClaimHandler(key); err != nil {
	if ctrl.claimQueue.NumRequeues(obj) < ctrl.failedProvisionThreshold {
		glog.Errorf("retrying syncing claim '%s' because failures %v < threshold %v", key, ctrl.claimQueue.NumRequeues(obj), ctrl.failedProvisionThreshold)
How about logging at warning level here, to distinguish retrying from giving up?
glog does not give us a Warnf :(
It has Warningf, see https://godoc.org/github.com/golang/glog#Warningf
autocomplete failed me... done!
also sidenote, logging verbosity is kinda all over the place in general, I will fix it later.
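For readers following the retry logic under review, here is a rough, dependency-free sketch of the pattern: retry with exponential backoff while the requeue count is below a threshold, then give up without forgetting the count. `toyQueue`, `process`, `attempts`, and `delays` are hypothetical names for illustration. The method names mirror client-go's rate limited work queue, but this is not its implementation, and the 1s base and 4s cap are made-up values.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// toyQueue is a hypothetical stand-in that tracks only per-key failure
// counts and the backoff delay a rate limited queue would apply.
type toyQueue struct {
	failures    map[string]int
	base, limit time.Duration
}

func newToyQueue(base, limit time.Duration) *toyQueue {
	return &toyQueue{failures: map[string]int{}, base: base, limit: limit}
}

// AddRateLimited records one more failure for key and returns the delay
// before it would be retried: base doubled once per earlier failure, capped at limit.
func (q *toyQueue) AddRateLimited(key string) time.Duration {
	n := q.failures[key]
	q.failures[key] = n + 1
	d := q.base
	for i := 0; i < n && d < q.limit; i++ {
		d *= 2
	}
	if d > q.limit {
		d = q.limit
	}
	return d
}

// NumRequeues reports how many times key has been requeued after failing.
func (q *toyQueue) NumRequeues(key string) int { return q.failures[key] }

// Forget clears the failure count for key.
func (q *toyQueue) Forget(key string) { delete(q.failures, key) }

// process runs syncFn once and reports whether the key was requeued.
func process(q *toyQueue, key string, threshold int, syncFn func(string) error) (bool, error) {
	err := syncFn(key)
	if err == nil {
		q.Forget(key) // success: reset the backoff
		return false, nil
	}
	if q.NumRequeues(key) < threshold {
		q.AddRateLimited(key) // retry later; a warning-level log fits here
		return true, fmt.Errorf("will retry claim %q: %v", key, err)
	}
	// Give up: not requeued, but not forgotten either, so the count survives.
	return false, fmt.Errorf("giving up on claim %q: %v", key, err)
}

// attempts counts how often an always-failing sync runs before process gives up.
func attempts(threshold int) int {
	q := newToyQueue(time.Second, 4*time.Second)
	fail := func(string) error { return errors.New("provision failed") }
	n := 0
	for {
		n++
		if requeued, _ := process(q, "default/claim", threshold, fail); !requeued {
			return n
		}
	}
}

// delays returns the backoff delays for n consecutive failures of one key.
func delays(n int) []time.Duration {
	q := newToyQueue(time.Second, 4*time.Second)
	out := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, q.AddRateLimited("default/claim"))
	}
	return out
}

func main() {
	fmt.Println("attempts with threshold 3:", attempts(3))
	fmt.Println("backoff delays:", delays(4))
}
```

In this sketch a threshold of 3 yields four sync attempts in total: three that are requeued and a fourth that gives up. The per-key counter replaces the separate failure maps and mutexes that the PR removes.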
lib/controller/controller.go
Outdated
	// Done but do not Forget: it will not be in the queue but NumRequeues
	// will be saved until the obj is deleted from kubernetes
}
return fmt.Errorf("error syncing claim '%s': %s", key, err.Error())
nit: use %q instead of '%s'? Using %q to print a quoted string seems more idiomatic in Go.
yeah, replaced everywhere.
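A quick sketch of the difference between the two verbs, using only the standard fmt package. The helper names `withS` and `withQ` and the sample key are made up for illustration.

```go
package main

import "fmt"

// withS wraps the key in literal single quotes; the key itself is printed raw.
func withS(key string) string {
	return fmt.Sprintf("error syncing claim '%s'", key)
}

// withQ lets fmt produce a double-quoted Go string literal, escaping
// control characters and embedded double quotes.
func withQ(key string) string {
	return fmt.Sprintf("error syncing claim %q", key)
}

func main() {
	key := "default/my'claim\n" // a contrived key with a quote and a newline
	fmt.Println(withS(key))
	fmt.Println(withQ(key))
}
```

With '%s' the embedded quote and newline leak straight into the log line, while %q keeps the message on one line with the newline shown as `\n`, which makes odd or empty keys easier to spot.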
failedProvisionStats:      make(map[types.UID]int),
failedDeleteStats:         make(map[types.UID]int),
failedProvisionStatsMutex: &sync.Mutex{},
failedDeleteStatsMutex:    &sync.Mutex{},
should be removed from ProvisionController struct too
done
lib/controller/controller.go
Outdated
if key, ok = obj.(string); !ok {
	ctrl.claimQueue.Forget(obj)
	utilruntime.HandleError(fmt.Errorf("expected string in workqueue but got %#v", obj))
	return nil
We can probably just return this error here, because the calling function will call utilruntime.HandleError if err != nil.
done
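The suggestion amounts to reporting errors in one place. A minimal sketch follows, assuming a caller that reports any non-nil error exactly once. `keyFromItem` and `processItem` are hypothetical names, and the `report` callback stands in for utilruntime.HandleError.

```go
package main

import "fmt"

// keyFromItem converts a work queue item to its string key. On a type
// mismatch it returns the error instead of reporting it itself.
func keyFromItem(obj interface{}) (string, error) {
	key, ok := obj.(string)
	if !ok {
		return "", fmt.Errorf("expected string in workqueue but got %#v", obj)
	}
	return key, nil
}

// processItem plays the role of the calling function: it is the single
// place where errors are reported.
func processItem(obj interface{}, report func(error)) {
	key, err := keyFromItem(obj)
	if err != nil {
		report(err)
		return
	}
	fmt.Printf("syncing %q\n", key)
}

func main() {
	report := func(err error) { fmt.Println("error:", err) }
	processItem("default/claim", report)
	processItem(42, report)
}
```

Returning the error keeps the inner function free of logging side effects and avoids the same failure being reported twice.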
LGTM
New changes are detected. LGTM label has been removed.
Thank you very much for your review, I'll merge this shortly and continue working on controller improvements.