Allow multiple routers to update route status #8309

DirectXMan12 · 2016-03-31T02:13:50Z

Previously, if multiple differently-named routers attempted to update
their respective Ingress entries in a route status, one would go
through, while the others would receive conflict errors.

The last-touched cache, designed to prevent fighting between two
same-named routers (for example, on a rolling update), would get
the route added with the zero-time, meaning that the router would
not submit further updates for that route (and consequently that
router would never set the proper status on the route).

Now, conflicts no longer insert an entry into the last-touched cache.
Instead, the code now properly updates the last-touched time, which
prevents the fighting between instances of the same router while
tolerating conflicts between instances of different routers.

Additionally, cache entries now expire after a period of time
(the "contention period"), as originally intended.

DirectXMan12 · 2016-03-31T02:14:12Z

cc @smarterclayton @knobunc @ramr

smarterclayton · 2016-03-31T02:38:37Z

pkg/router/controller/status.go

@@ -157,7 +169,7 @@ func (a *StatusAdmitter) hasIngressBeenTouched(route *routeapi.Route, lastTouch
 // recordIngressTouch tracks whether the ingress record updated succeeded and returns true if the admitter can
 // continue. Conflict errors are treated as no error, but indicate the touch was not successful and the caller
 // should retry.
-func (a *StatusAdmitter) recordIngressTouch(route *routeapi.Route, touch *unversioned.Time, err error) (bool, error) {
+func (a *StatusAdmitter) recordIngressTouch(route *routeapi.Route, touch *unversioned.Time, oldTouch *unversioned.Time, err error) (bool, error) {


This variable should probably be called ourTouch or localTime, since it belongs to us.

It's actually the previous touch time (before we updated it to be now). I'm not super-averse to changing it, but IMO it's clearer as oldTouch or prevTouch (or possibly touch could be newTouch, and oldTouch could be currentTouch)

DirectXMan12 · 2016-03-31T02:39:38Z

I've added in new unit tests, and done some testing in my local setup, but it would be nice to have another couple pairs of eyes run through the logic.

smarterclayton · 2016-03-31T02:40:14Z

pkg/router/controller/status.go

+			// still need to set the last-touched time so that others can tell we've
+			// modified this Ingress value
+			now := nowFn()
+			ingress.Conditions[i].LastTransitionTime = &now


Since this is modifying the ingress, is false really appropriate?

If so, comment.

I think it actually should stay as false -- it's only used in the rejection path, and it's actually in a path where we don't want to update if that's the only thing that would be changed, AFAICT, but I'll double check in the morning when I'm more awake ;-).

smarterclayton · 2016-03-31T02:43:38Z

pkg/router/controller/status_test.go

+
+}
+
+func TestStatusFightBetweenRouters(t *testing.T) {


I think we might want to run a longer sequence of random fighting (increment one or the other randomly and invoke handle route in a for loop), then simulate a jump forward in the clock (set now past the contention interval) and verify only one ends up winning.

Would do that as a separate test:

i.e.:

    for i ... 100
        if rand.Bool() mutateA else mutateB
        verify postcondition still holds
    advance clock > contention interval
    for 1..100

Might even want to simulate 3. It should be fast to run, but I really want to ensure we have a realistic "fight".

I'll try adding something like this in.

DirectXMan12 · 2016-03-31T20:38:59Z

Alright, I've added in a simulated protracted series of events between 2 'old' routers and two 'new' routers. It shows what happens when the new routers spin up, one hits a conflict, and the updates that result. It also shows what happens when an update to a route comes in in the middle of a rolling upgrade, and tests that cache entries expire after a while.

Additionally, I uncovered another corner case that appeared to be incorrect, corrected it, and added tests so that we don't regress in the future. Let me know if I've misdiagnosed anything. Namely: if we reject a route, but only change the hostname, leaving the message the same, we should still consider that changed, and try to update.

smarterclayton · 2016-03-31T20:40:41Z

pkg/router/controller/status_test.go

+
+	t.Logf("...which should cause 'new' router #2 and the two 'old' routers to receive an update, and ignore it")
+	now = unversioned.Time{Time: now.Add(1 * time.Second)}
+	nowFn = func() unversioned.Time { return now }


nit: I wouldn't expect you to have to redefine this since the closure should capture the variable.

smarterclayton · 2016-03-31T20:43:11Z

Changes look good to me, thanks for digging in [test]

DirectXMan12 · 2016-03-31T20:44:48Z

Whoops, don't merge quite yet, there's one tweak I need to make to the test b/c the contention period causes one of the back-and-forths to last 1 round-trip longer (just realized it as I was looking at the tests one last time ;-) )

DirectXMan12 · 2016-03-31T22:00:52Z

That should fix the "incorrect" test (nothing was broken -- it just slightly misrepresented the sequence of events due to the 'old' routers still being in their contention period).

openshift-bot · 2016-03-31T22:15:19Z

Evaluated for origin test up to e8d857a

openshift-bot · 2016-03-31T23:40:54Z

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/2646/)

smarterclayton · 2016-03-31T23:44:58Z

Lgtm [merge]

openshift-bot · 2016-03-31T23:50:16Z

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5500/) (Image: devenv-rhel7_3887)

openshift-bot · 2016-03-31T23:50:16Z

Evaluated for origin merge up to e8d857a

smarterclayton reviewed Mar 31, 2016

DirectXMan12 force-pushed the bug/route-update-conflicts branch from d5cdfdb to 3df79a4 Compare March 31, 2016 02:40

smarterclayton reviewed Mar 31, 2016

DirectXMan12 force-pushed the bug/route-update-conflicts branch 2 times, most recently from 3b8ab82 to 8fe78b8 Compare March 31, 2016 20:36

smarterclayton reviewed Mar 31, 2016

DirectXMan12 force-pushed the bug/route-update-conflicts branch from 8fe78b8 to e8d857a Compare March 31, 2016 21:54

openshift-bot merged commit 9436066 into openshift:master Apr 1, 2016

DirectXMan12 deleted the bug/route-update-conflicts branch April 1, 2016 14:02
