Guarantee watch before action in e2e event observer helper function. #43107

Merged


ConnorDoyle
Contributor

What this PR does / why we need it:

Adds a missing synchronization barrier to an e2e event observation helper function.

  • This change should guarantee that in observeEventAfterAction,
    the action is only executed after the informer begins watching
    the event stream.

Release note:

NONE

cc @kubernetes/sig-scheduling-pr-reviews @bsalamat

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 14, 2017
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Mar 14, 2017
@ConnorDoyle ConnorDoyle force-pushed the guarantee-watch-before-action branch from cf91666 to 05696cf Compare March 14, 2017 23:02
@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 14, 2017
@bsalamat
Member

/assign

@bsalamat
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 14, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ConnorDoyle, bsalamat
We suggest the following additional approver: @smarterclayton

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@bsalamat
Member

Thanks Connor for fixing this.

@ConnorDoyle
Contributor Author

Cc @timothysc

@timothysc timothysc self-assigned this Mar 15, 2017
@timothysc timothysc self-requested a review March 15, 2017 14:39
@timothysc
Member

This change should guarantee that in observeEventAfterAction,
the action is only executed after the informer begins watching
the event stream.

@ConnorDoyle, I'm slightly confused... how could the action be executed prior to the event stream?

@aveshagarwal
Member

@ConnorDoyle I'm trying to understand whether this logic is really needed in both functions.

So you are doing this, I think, on the assumption that when controller.Run is executed, the watch might not have started before the action is executed in the immediately following step, so there is a chance of missing the event/node update? Maybe on the order of nanoseconds.

And then, further down in the code, you poll for the event/node update for up to 2 minutes, on the assumption that the event/node update might not appear for up to 2 minutes?

@ConnorDoyle
Contributor Author

@timothysc and @aveshagarwal, here's an illustration of the before/after scenarios around this PR as I understand it:

Before: no ordering guarantee between concurrent test and informer goroutines.

go test
   |
   |
 go controller.Start ---- controller.Start()
   |                         |
 action()                 f.ClientSet.Core().Events(f.Namespace.Name).Watch(options)
   |                         |
 ...                       ...

After: added synchronization to provide ordering guarantee.

go test
   |
   |
 go controller.Start ---- controller.Start()
   |                         |
   |                      f.ClientSet.Core().Events(f.Namespace.Name).Watch(options)
   |                         |
   |                      close(informerStarted)
   |
 <-informerStarted
   |
 action()
 ...
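
The "After" ordering can be sketched in isolation roughly as follows. This is a minimal illustration, not the code in this PR: `eventStream`, `Watch`, `Publish`, and `observeAfterAction` are hypothetical stand-ins, and the stream is assumed to drop events published while nobody is watching. The real helper also polls with a timeout, which this sketch omits.

```go
package main

import (
	"fmt"
	"sync"
)

// eventStream is a hypothetical stand-in for the event stream:
// events published while nobody is watching are dropped.
type eventStream struct {
	mu       sync.Mutex
	watchers []chan string
}

// Watch registers a new watcher and returns its channel.
func (s *eventStream) Watch() <-chan string {
	ch := make(chan string, 1)
	s.mu.Lock()
	s.watchers = append(s.watchers, ch)
	s.mu.Unlock()
	return ch
}

// Publish delivers an event to every currently registered watcher.
func (s *eventStream) Publish(ev string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, ch := range s.watchers {
		ch <- ev
	}
}

// observeAfterAction runs action only after the watcher goroutine has
// established its watch, then reports whether an event was observed.
func observeAfterAction(s *eventStream, action func()) bool {
	watchStarted := make(chan struct{})
	observed := make(chan bool, 1)

	go func() {
		events := s.Watch()
		close(watchStarted) // barrier: the watch now exists
		<-events
		observed <- true
	}()

	<-watchStarted // block until the watch is established
	action()
	return <-observed
}

func main() {
	s := &eventStream{}
	ok := observeAfterAction(s, func() { s.Publish("Scheduled") })
	fmt.Println("event observed:", ok) // prints "event observed: true"
}
```

Closing the channel, rather than sending on it, lets any number of waiters proceed and never blocks the watcher goroutine.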

@aveshagarwal
Member

@ConnorDoyle I think I get that. I was wondering whether it brings any real advantage in most (or all practical e2e) scenarios. It may be a corner case that is never actually experienced. That said, I don't really see any harm in doing this, and it LGTM.

@timothysc
Member

@ConnorDoyle gotcha... I have other comments about some of the logic, but we can hold off and revisit, b/c I want to see @bsalamat's flake patch get in.

@timothysc timothysc added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 15, 2017
@bsalamat
Member

@aveshagarwal as @ConnorDoyle's graph shows, there was a clear race condition between starting the watch and performing the action. Please note that the event we are watching for happens as a result of the action. Without Connor's change, the action could happen before the watch starts, and the event would therefore be missed. Connor's change improves the reliability of our event watching mechanism.

@aveshagarwal
Member

@bsalamat That's what I was trying to understand/explain in #43107 (comment): what is the probability of that happening, given that the polling code waits for up to 2m?

@ConnorDoyle
Contributor Author

@aveshagarwal the race occurs before the test goroutine begins polling. More generally, the point behind the PR that depends on this (#42928) is to rely on causality, not timing, in the scheduling e2e tests. The point is to eliminate the need for speculation about the probabilities of the various interleavings based on expected timings.
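
To make the point about polling concrete, here is a small single-goroutine sketch. It is a toy model under stated assumptions, not the e2e helper: `stream`, `Subscribe`, `Emit`, `poll`, and `run` are hypothetical names, and the stream is assumed not to replay events emitted before the subscription. Under that assumption, the length of the polling window does not matter once the event has been missed.

```go
package main

import (
	"fmt"
	"time"
)

// stream is a hypothetical event source that does not replay history:
// an event emitted before Subscribe is lost.
type stream struct {
	subscribed bool
	received   bool
}

// Subscribe marks the stream as watched.
func (s *stream) Subscribe() { s.subscribed = true }

// Emit records the event only if a subscription already exists.
func (s *stream) Emit() {
	if s.subscribed {
		s.received = true
	}
}

// poll checks cond every interval until it returns true or timeout elapses.
func poll(interval, timeout time.Duration, cond func() bool) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

// run performs the watch and the action in the given order, then polls.
func run(actionFirst bool) bool {
	s := &stream{}
	if actionFirst {
		s.Emit() // action happens before the watch exists
		s.Subscribe()
	} else {
		s.Subscribe() // watch exists before the action
		s.Emit()
	}
	return poll(time.Millisecond, 20*time.Millisecond, func() bool { return s.received })
}

func main() {
	fmt.Println("watch before action:", run(false)) // prints "watch before action: true"
	fmt.Println("action before watch:", run(true))  // prints "action before watch: false"
}
```

In this model, raising the timeout only delays the failure in the second case; enforcing the ordering removes it.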

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 355f576 into kubernetes:master Mar 15, 2017
@ConnorDoyle ConnorDoyle deleted the guarantee-watch-before-action branch March 15, 2017 18:10
@k8s-ci-robot
Contributor

@ConnorDoyle: The following test(s) failed:

Test name | Commit | Details | Rerun command
Jenkins non-CRI GCE e2e | 05696cf | link | @k8s-bot non-cri e2e test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

9 participants