Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow multiple routers to update route status #8309

Merged

Conversation

DirectXMan12
Copy link
Contributor

Previously, if multiple differently-named routers attempted to update
their respective Ingress entries in a route status, one would go
through, while the others would recieve conflict errors.

The last-touched cache, designed to prevent fighting between two
same-named routers (for example, on a rolling update), would get
the route added with the zero-time, meaning that the router would
not submit further updates for that route (and consequently that
router would never set the proper status on the route).

Now, conflicts no longer insert an entry into the last-touched cache.
Instead, the code now properly updates the last-touched time, which
prevents the fighting between instances of the same router while
tolerating conflicts inbetween instances of different routers.

Additionally, cache entries now expire after a period of time
(the "contention period"), as originally intended.

@DirectXMan12
Copy link
Contributor Author

cc @smarterclayton @knobunc @ramr

@@ -157,7 +169,7 @@ func (a *StatusAdmitter) hasIngressBeenTouched(route *routeapi.Route, lastTouch
// recordIngressTouch tracks whether the ingress record updated succeeded and returns true if the admitter can
// continue. Conflict errors are treated as no error, but indicate the touch was not successful and the caller
// should retry.
func (a *StatusAdmitter) recordIngressTouch(route *routeapi.Route, touch *unversioned.Time, err error) (bool, error) {
func (a *StatusAdmitter) recordIngressTouch(route *routeapi.Route, touch *unversioned.Time, oldTouch *unversioned.Time, err error) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable should probably be called ourTouch or localTime, since it belongs to us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually the previous touch time (before we updated it to be now). I'm not super-averse to changing it, but IMO it's clearer as oldTouch or prevTouch (or possibly touch could be newTouch, and oldTouch could be currentTouch)

@DirectXMan12
Copy link
Contributor Author

I've added in new unit tests, and done a some testing in my local setup, but it would be nice to have another couple pairs of eyes run through the logic.

// still need to set the last-touched time so that others can tell we've
// modified this Ingress value
now := nowFn()
ingress.Conditions[i].LastTransitionTime = &now
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this modifying the ingress, is false really appropriate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it actually should stay as false -- it's only used in the rejection path, and it's actually in a path where we don't want to update if that's the only thing that would be changed, AFAICT, but I'll double check in the morning when I'm more awake ;-).


}

func TestStatusFightBetweenRouters(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might want to run a longer sequence of random fighting (increment one or the other randomly and invoke handle route in a for loop), then simulate a jump forward in the clock (set now past the content interval) and verify only one ends up winning.

Would do that as a separate test:

i.e.:

for i ... 100
  if rand.Bool()
    mutateA
  else
    mutateB
  verify postcondition still holds

advance clock > contention interval

for 1..100

Might even want to simulate 3. It should be fast to run, but I really want to ensure we have a realistic "fight".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try adding something like this in.

@DirectXMan12 DirectXMan12 force-pushed the bug/route-update-conflicts branch 2 times, most recently from 3b8ab82 to 8fe78b8 Compare March 31, 2016 20:36
@DirectXMan12
Copy link
Contributor Author

Alright, I've added in a simulated protracted series of events between 2 'old' routers and two 'new' routers. It shows what happens when the new routers spin up, one hits a conflict, and the updates that result. It also shows what happens when an update to a route comes in in the middle of a rolling upgrade, and tests that cache entries expire after a while.

Additionally, I uncovered a couple another corner case that appeared to be incorrect, corrected it, and added tests so that we don't regress in the future. Let me know if I've misdiagnosed anything. Namely: if we reject a route, but only change the hostname, leaving the message the same, we should still consider that changed, and try to update.


t.Logf("...which should cause 'new' router #2 and the two 'old' routers to receive an update, and ignore it")
now = unversioned.Time{Time: now.Add(1 * time.Second)}
nowFn = func() unversioned.Time { return now }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I wouldn't expect you to have to redefine this since the closure should capture the variable.

@smarterclayton
Copy link
Contributor

Changes look good to me, thanks for digging in [test]

@DirectXMan12
Copy link
Contributor Author

Whoops, don't merge quite yet, there's one tweak I need to make to the test b/c the contention period causes one of the back-and-forths to last 1 round-trip longer (just realized it as I was looking at the tests one last time ;-) )

Previously, if multiple differently-named routers attempted to update
their respective Ingress entries in a route status, one would go
through, while the others would recieve conflict errors.

The last-touched cache, designed to prevent fighting between two
same-named routers (for example, on a rolling update), would get
the route added with the zero-time, meaning that the router would
not submit further updates for that route (and consequently that
router would never set the proper status on the route).

Now, conflicts no longer insert an entry into the last-touched cache.
Instead, the code now properly updates the last-touched time, which
prevents the fighting between instances of the same router while
tolerating conflicts inbetween instances of different routers.

Additionally, cache entries now expire after a period of time
(the "contention period"), as originally intended.
@DirectXMan12
Copy link
Contributor Author

That should fix the "incorrect" test (nothing was broken -- it just slightly misrepresented the sequence of events due to the 'old' routers still being in their contention period).

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to e8d857a

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/2646/)

@smarterclayton
Copy link
Contributor

Lgtm [merge]

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5500/) (Image: devenv-rhel7_3887)

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to e8d857a

@openshift-bot openshift-bot merged commit 9436066 into openshift:master Apr 1, 2016
@DirectXMan12 DirectXMan12 deleted the bug/route-update-conflicts branch April 1, 2016 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants