
Bug 2082599: add upper bound to number of retries #2970

Merged
merged 1 commit into ovn-org:master from ricky-rav:maxretry on Jun 14, 2022

Conversation


@ricky-rav ricky-rav commented May 6, 2022

The retry logic should not attempt to add or delete an object indefinitely. This adds an upper bound on the number of times we can attempt to add or delete a given object, as we already do in level-driven controllers.

fixes #2082599

Signed-off-by: Riccardo Ravaioli rravaiol@redhat.com
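
For context, the shape of the change is roughly the following (a minimal sketch, not the actual diff: the failedAttempts field matches the hunks quoted in the review below, while the constant name, its value, and the surrounding types are illustrative):

package main

import (
	"fmt"
	"time"
)

// maxFailedAttempts is an illustrative upper bound; the real value is
// discussed in the review below.
const maxFailedAttempts = 15

// retryObjEntry is a simplified stand-in for the per-object retry state.
type retryObjEntry struct {
	timeStamp      time.Time
	failedAttempts uint8
}

func main() {
	entry := &retryObjEntry{timeStamp: time.Now()}
	for entry.failedAttempts < maxFailedAttempts {
		// ... attempt the add/delete here; on failure:
		entry.failedAttempts++
	}
	fmt.Printf("dropping object from retry cache after %d failed attempts\n",
		entry.failedAttempts)
}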

@ricky-rav ricky-rav changed the title add upper bound to number of retries Bug 2082599: add upper bound to number of retries May 6, 2022

coveralls commented May 6, 2022

Coverage Status

Coverage increased (+0.09%) to 52.043% when pulling 1d5486c on ricky-rav:maxretry into fc66d17 on ovn-org:master.

@ricky-rav (Contributor, Author)

PTAL @trozet , @msherif1234 . Thanks!


@msherif1234 msherif1234 left a comment


Overall looks good. The threshold is a bit too high IMHO: the timer fires every 30 sec, so that means we keep retrying for ~7 mins?
Left a few comments.

Thanks!!
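
For concreteness, the arithmetic behind the "~7 mins" above (the 30-second timer period is from the comment; the attempt limit of 15 is an assumed value for illustration):

package main

import (
	"fmt"
	"time"
)

func main() {
	retryInterval := 30 * time.Second // retry timer period mentioned above
	maxAttempts := 15                 // assumed threshold under discussion
	fmt.Println(time.Duration(maxAttempts) * retryInterval) // 7m30s
}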

(Three review threads on go-controller/pkg/ovn/obj_retry.go: outdated, resolved.)
if !isResourceScheduled(r.oType, entry.oldObj) {
klog.V(5).Infof("Retry: %s %s not scheduled", r.oType, objKey)
entry.failedRetries++
@msherif1234 (Contributor)

Why increment here? We aren't scheduled for retry yet, right?

@ricky-rav (Contributor, Author)

Yeah, I wasn't 100% sure here. Can a pod never be scheduled? Should we keep retrying forever in that case?
I lack hands-on experience with what happens to these pods that fail to be added/deleted...

@trozet any input? :)

@trozet (Contributor)

Sure, a pod can never be scheduled, e.g. if it has a node selector that isn't applicable (either the selector doesn't apply to any nodes, or the nodes it does apply to are unready). In either case, once the pod is scheduled we would get a pod update event. I think we can either try a number of times and give up, or just ignore pods that are not scheduled and not add them to retry. Either is fine.
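
For pods, the "scheduled" check boils down to whether the scheduler has assigned a node. A minimal sketch of what an isResourceScheduled-style helper checks in the pod case (assuming scheduling is signalled via Spec.NodeName, which the scheduler sets when it binds the pod):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// isPodScheduled sketches the pod branch of an isResourceScheduled-style
// helper: Spec.NodeName stays empty until the scheduler binds the pod.
func isPodScheduled(pod *corev1.Pod) bool {
	return pod.Spec.NodeName != ""
}

func main() {
	fmt.Println(isPodScheduled(&corev1.Pod{})) // false: not yet bound
}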

if !isResourceScheduled(r.oType, entry.newObj) {
klog.V(5).Infof("Retry: %s %s not scheduled", r.oType, objKey)
entry.failedRetries++
@msherif1234 (Contributor)

same here

(Further review threads on go-controller/pkg/ovn/obj_retry.go and go-controller/pkg/ovn/pods_test.go: outdated, resolved.)
@ricky-rav (Contributor, Author)

Let's also discuss the value for the maximum number of retries, as @msherif1234 suggested. For how long do we care about an object?


@trozet trozet left a comment


overall lgtm

@ricky-rav let me know if you want to update anything else before merge

(Two review threads on go-controller/pkg/ovn/obj_retry.go, resolved; the second outdated.)
@ricky-rav (Contributor, Author)

@trozet thanks! I've updated the two comments you pointed at; I think we're good now.

@ricky-rav ricky-rav force-pushed the maxretry branch 2 times, most recently from 15f4c04 to b331efc on May 12, 2022 at 19:03
@ricky-rav (Contributor, Author)

I've just rebased. The PR should be ready to merge once CI is green.

@msherif1234 (Contributor)

/lgtm


ricky-rav commented May 23, 2022

I reworked the failed retry counter a bit, so that we now take into account all failed attempts to add/update/delete an object (not just failed retries), and we initialize the counter to 0 every time a new add/update/delete event comes in, which was missing in my initial commit.

@msherif1234 @trozet PTAL
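
In other words, the reworked behavior looks roughly like this (a sketch: the retryObjs/retryMutex names come from the snippet quoted later in this thread, while initRetryObjWithAdd and the map layout are illustrative):

package main

import (
	"sync"
	"time"
)

type retryObjEntry struct {
	newObj         interface{}
	timeStamp      time.Time
	failedAttempts uint8
}

type retryObjs struct {
	retryMutex sync.Mutex
	entries    map[string]*retryObjEntry
}

// When a fresh add/update/delete event arrives for a key, the entry is
// (re)created with a zeroed counter; every failed attempt, whether the
// initial one or a retry, then increments failedAttempts.
func (r *retryObjs) initRetryObjWithAdd(obj interface{}, key string) {
	r.retryMutex.Lock()
	defer r.retryMutex.Unlock()
	r.entries[key] = &retryObjEntry{newObj: obj, timeStamp: time.Now()}
}

func main() {
	r := &retryObjs{entries: map[string]*retryObjEntry{}}
	r.initRetryObjWithAdd(struct{}{}, "default/mypod")
}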

entry.timeStamp = time.Now()
entry.failedAttempts++
@msherif1234 (Contributor)

Here you increment the counter directly, while below you use the new method, which uses locking. Do we really need to lock in the new method? And if we do, then we'll probably also need a locked read method to use in iterate when comparing against the max limit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It happens at the very beginning of iterateRetryResources:

func (oc *Controller) iterateRetryResources(r *retryObjs, updateAll bool) {
	r.retryMutex.Lock()
	defer r.retryMutex.Unlock()
	now := time.Now()

To be honest, I don't have a very strong opinion about needing to acquire a lock when incrementing the counter...
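
For reference, the two patterns being compared look like this (a sketch: retryObjs/retryMutex come from the snippet above, while increaseFailedAttempts is an illustrative name for the locking helper):

package main

import "sync"

type retryObjEntry struct{ failedAttempts uint8 }

type retryObjs struct {
	retryMutex sync.Mutex
	entries    map[string]*retryObjEntry
}

// Pattern 1: a helper that takes the lock itself; safe to call from code
// paths that do not already hold retryMutex.
func (r *retryObjs) increaseFailedAttempts(key string) {
	r.retryMutex.Lock()
	defer r.retryMutex.Unlock()
	if entry, ok := r.entries[key]; ok {
		entry.failedAttempts++
	}
}

// Pattern 2: inside iterateRetryResources the mutex is already held for
// the whole iteration (see the snippet above), so a bare
// entry.failedAttempts++ is safe there; calling the locking helper from
// that path would self-deadlock, since sync.Mutex is not reentrant.

func main() {
	r := &retryObjs{entries: map[string]*retryObjEntry{"k": {}}}
	r.increaseFailedAttempts("k")
}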

The retry logic should not attempt to add/update/delete an object indefinitely. This adds an upper bound on the number of times we can attempt to add/update/delete a given object, as we already do in level-driven controllers.

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
@ricky-rav (Contributor, Author)

/retest-failed

@trozet trozet merged commit afe4007 into ovn-org:master Jun 14, 2022