Cloud node controller: Only call once into cloud provider #85735

@alvaroaleman alvaroaleman commented Nov 28, 2019

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Currently, cloud provider node initialization wraps both the call into the cloud provider and the node update in a single retry.RetryOnConflict, so every conflict repeats the call into the cloud provider. That call is expensive, and a conflict is likely, especially for a new node. This PR calls the cloud provider once and re-uses the result: the changes are captured as a set of modify funcs that are applied to the node on each update attempt.

I also thought about using patch, but assumed that if it were that easy, someone would have done it long ago.
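
The approach described above can be sketched roughly as follows. This is a self-contained illustration, not the code from this PR: `Node`, `fakeAPI`, `fakeCloud`, `initializeNode`, the `instance-type` label key, and the hand-written retry loop are all hypothetical stand-ins for `v1.Node`, the API server, the cloud provider, and `retry.RetryOnConflict`.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical stand-in for v1.Node, reduced to the fields used here.
type Node struct {
	ResourceVersion int
	Labels          map[string]string
}

// nodeModifier mutates a freshly fetched node using data gathered earlier.
type nodeModifier func(*Node)

var errConflict = errors.New("conflict")

// fakeAPI is a hypothetical in-memory store with optimistic concurrency.
type fakeAPI struct {
	node             Node
	pendingConflicts int // number of updates that will lose a simulated race
}

// Get returns a copy of the stored node.
func (a *fakeAPI) Get() Node {
	n := Node{ResourceVersion: a.node.ResourceVersion, Labels: map[string]string{}}
	for k, v := range a.node.Labels {
		n.Labels[k] = v
	}
	return n
}

// Update stores the node unless its resource version is stale.
func (a *fakeAPI) Update(n Node) error {
	if a.pendingConflicts > 0 {
		// Simulate a concurrent writer changing the node first.
		a.pendingConflicts--
		a.node.ResourceVersion++
		return errConflict
	}
	if n.ResourceVersion != a.node.ResourceVersion {
		return errConflict
	}
	n.ResourceVersion++
	a.node = n
	return nil
}

// fakeCloud counts how often the expensive lookup is performed.
type fakeCloud struct{ calls int }

func (c *fakeCloud) InstanceType() string {
	c.calls++
	return "m5.large"
}

// initializeNode calls the cloud once, then retries only get/modify/update.
func initializeNode(api *fakeAPI, cloud *fakeCloud) error {
	instanceType := cloud.InstanceType() // expensive, so done outside the retry loop
	modifiers := []nodeModifier{
		func(n *Node) { n.Labels["instance-type"] = instanceType },
	}
	for attempt := 0; attempt < 5; attempt++ {
		n := api.Get()
		for _, modify := range modifiers {
			modify(&n)
		}
		err := api.Update(n)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errConflict) {
			return err
		}
	}
	return errors.New("too many conflicts")
}

func main() {
	api := &fakeAPI{node: Node{Labels: map[string]string{}}, pendingConflicts: 2}
	cloud := &fakeCloud{}
	if err := initializeNode(api, cloud); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cloud calls:", cloud.calls)
	fmt.Println("instance-type label:", api.node.Labels["instance-type"])
}
```

In this sketch the update is attempted three times because of the two simulated conflicts, while the cloud lookup runs only once.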

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

none

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

none

@k8s-ci-robot k8s-ci-robot added kind/cleanup, size/M, do-not-merge/release-note-label-needed, cncf-cla: yes, needs-sig, needs-priority, release-note-none, sig/apps, sig/cloud-provider and removed do-not-merge/release-note-label-needed, needs-sig labels Nov 28, 2019
@alvaroaleman
Member Author

/assign @andrewsykim

@alvaroaleman
Member Author

/assign @luxas @thockin

Any chance one of you can have a look, as Andrew is unavailable?


type nodeModify func(*v1.Node)
var nodeModifyers []nodeModify
Member

modifiers

Member Author

Fixed

type nodeModify func(*v1.Node)
var nodeModifyers []nodeModify

if err := func() error {
Member

This warrants AT LEAST a comment, if not a whole new function.

Member Author

Yeah, I was trying to minimize the diff for this PR. I have factored this out to keep the code readable.

if curNode.Spec.ProviderID == "" {
	providerID, err := cloudprovider.GetInstanceProviderID(ctx, cnc.cloud, types.NodeName(curNode.Name))
	if err == nil {
		curNode.Spec.ProviderID = providerID
		nodeModifyers = append(nodeModifyers, func(n *v1.Node) {
Member

Before this PR, the whole loop would be retried. That would re-evaluate the conditional above. I am not SUPER familiar with this, but it seems that it's safe NOT to re-evaluate the conditional, since (if we did lose a race and hit a conflict) this modify function will be idempotent, right?

If so, please comment. Maybe above, where you declare the vars, explain that such modify funcs MUST be safe and idempotent in the face of a conflict-and-retry loop.

Member

In other words, this is tricky enough that I am asking you to make it more obvious by comments, code structure, or both.

Member Author

I added godocs explaining why those modify funcs exist and what is expected of them. Also, this particular part did introduce a change in behavior: it allowed an existing providerID to be overwritten, which wasn't possible before. I fixed that by also doing the empty-string check in the modify func.
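
A minimal sketch of the idea discussed in this thread, assuming a trimmed-down `Node` type in place of `v1.Node`. The helper name `setProviderIDIfEmpty` and the provider ID strings are hypothetical and are not taken from the PR. The modifier repeats the empty-string check on the node it is handed, so it is safe to re-apply after a conflict and never overwrites an ID that is already set.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down stand-in for v1.Node.
type Node struct {
	ProviderID string
}

// nodeModifier is applied to a freshly fetched node on every update attempt,
// so it must be idempotent and must tolerate changes made by other writers.
type nodeModifier func(*Node)

// setProviderIDIfEmpty returns a modifier that only fills in a missing ID.
func setProviderIDIfEmpty(providerID string) nodeModifier {
	return func(n *Node) {
		// Re-check on the node passed in: it may differ from the node
		// that was inspected before the cloud provider was called.
		if n.ProviderID == "" {
			n.ProviderID = providerID
		}
	}
}

func main() {
	modify := setProviderIDIfEmpty("fake://instance-1")

	fresh := &Node{}
	modify(fresh)
	modify(fresh) // re-applied, as it would be on a retry after a conflict
	fmt.Println(fresh.ProviderID)

	existing := &Node{ProviderID: "fake://set-by-someone-else"}
	modify(existing) // leaves the existing ID untouched
	fmt.Println(existing.ProviderID)
}
```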

@alvaroaleman alvaroaleman force-pushed the make-node-initialization-more-efficient branch from d5b2845 to 7326a50 Compare December 17, 2019 15:25
@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Dec 17, 2019
@alvaroaleman alvaroaleman force-pushed the make-node-initialization-more-efficient branch 4 times, most recently from 80a124e to f5b5b77 Compare December 17, 2019 18:01
@alvaroaleman
Member Author

/retest

@alvaroaleman alvaroaleman force-pushed the make-node-initialization-more-efficient branch from f5b5b77 to 18fa7bd Compare December 17, 2019 19:03
@thockin
Member

thockin commented Dec 17, 2019

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 17, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 17, 2019
@k8s-ci-robot k8s-ci-robot merged commit e397797 into kubernetes:master Dec 18, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Dec 18, 2019