New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Route creation reconciler loop. #8164
Conversation
366beaa
to
8bb2ed3
Compare
c13cee2
to
93a2e22
Compare
f9b3ba4
to
01f6aeb
Compare
}).Do() | ||
if err != nil { | ||
return err | ||
} | ||
if err := gce.waitForGlobalOp(insertOp); err != nil { | ||
if gapiErr, ok := err.(*googleapi.Error); ok && gapiErr.Code == http.StatusConflict { | ||
// TODO (cjcullen): Make this actually check the route is correct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resolution of this TODO is that the reconciler will now remove an incorrect route and create a correct one. Any transient 409s or 404s from reconciling will be logged, but will not prevent progress.
(Other than that, LGTM.) |
fdcceb5
to
d58c7aa
Compare
if err := rc.reconcile(testCase.nodes, testCase.initialRoutes); err != nil { | ||
t.Errorf("%d. Error from rc.reconcile(): %v", i, err) | ||
} | ||
time.Sleep(10 * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel gross about this. The reconciler kicks off goroutines to add/delete routes in the cloudprovider, so immediately checking the state of the fake_cloud in the main thread will show that nothing has been updated. A sleep of pretty much any length is enough to allow the background threads to update the in-memory representation, but it feels so wrong.
Any suggestions on a better way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notify via a channnel that one is complete.
ee492dc
to
75703f2
Compare
@cjcullen Have you run e2e test with your change yet? |
Yes. All green (1 time). This was already rebased on top of #6949 though :/ |
Rebased against #6949. Now running e2e's (over and over again). |
2 for 2 on e2es so far after the rebase (tearing down and rebuilding each time) |
This already got an LGTM before I rebased. Any opposition to me adding the status/LGTM tag? |
if err != nil { | ||
return fmt.Errorf("error listing routes: %v", err) | ||
} | ||
nodeList, err := rc.kubeClient.Nodes().List(labels.Everything(), fields.Everything()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: use pkg/controller/framework.NewInformer to watch this and reduce the number of lists needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
LGTM-- a few requests for comments |
Thank you very much, unfortunately I noticed one more thing. Sorry! |
tick := time.Tick(10 * time.Millisecond) | ||
poll: | ||
for { | ||
select { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lavalamp is this more like what you were thinking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you, much better!
Rename reconcilePodCIDRs to reconcileNodeCIDRs. Add comments and TODOs about using controller framework.
LGTM |
LGTM. but still @cjcullen have you run e2e with this one yet? cc/ @justinsb on aws, and @derekwaynecarr on vagrant. Just a FYI. |
e2e's were solid green 2 days ago. I'll rebase and run another round (or 3) to get a little more confidence that this will play nice with what we have at head. |
@cjcullen Thanks for your patience. |
Route creation reconciler loop.
This teases apart node registration from cloudprovider routing configuration. It also makes route configuration robust against controller crashes and transient cloudprovider failures.
RIght now, I just have this naively syncing every 10 seconds. It would be nicer to react to changes in the NodeList, and then just run a background sync every few minutes. (@lavalamp, I'm told you're the expert on that :) )
I'd like to wait for #7984 to get in and then rebase this before merging.