
Store the latest cloud provider node addresses #65226

Conversation


@ingvagabund ingvagabund commented Jun 19, 2018

What this PR does / why we need it:
Buffer the most recently retrieved node addresses so they can be used as soon as the next node status update runs.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #65814

Special notes for your reviewer:

Release note:

None

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Jun 19, 2018
@ingvagabund
Contributor Author

@sjenning PTAL

@k8s-ci-robot k8s-ci-robot requested a review from vishh June 19, 2018 14:51
// Last list of node addresses retrieved from the cloud provider
cloudproviderLastNodeAddresses []v1.NodeAddress
// Last error retrieved from the cloud provider
cloudproviderLastError error
Member

why is this needed here as member var and not just in the kubelet_node_status code block?

Contributor Author

Both variables need to live at the Kubelet struct level so the goroutine does not store its values into local variables that disappear once setNodeAddress returns. Plus, the cloud-related code is going away at some point and the variables are not exported, so it's fine to keep it that way. We can change and polish it later.

@@ -486,7 +486,7 @@ func (kl *Kubelet) setNodeAddress(node *v1.Node) error {
kl.cloudproviderRequestMux.Unlock()

go func() {
nodeAddresses, err = instances.NodeAddresses(context.TODO(), kl.nodeName)
kl.cloudproviderLastNodeAddresses, kl.cloudproviderLastError = instances.NodeAddresses(context.TODO(), kl.nodeName)
Member

i may be missing something obvious, but it seems like cloudproviderLastError can just move next to the declaration of var err error above on line 476

Contributor Author
@ingvagabund Jun 19, 2018

The culprit is actually the case <-kl.cloudproviderRequestSync. Each time the goroutine asking for the node addresses finishes, it sends a value to the kl.cloudproviderRequestSync channel. If this happens after kl.cloudproviderRequestTimeout has fired, kl.cloudproviderRequestSync is non-empty, so in the next node status iteration the select picks the first case <-kl.cloudproviderRequestSync instead of waiting for the goroutine to finish. Given that nodeAddresses and err are local variables and are both nil (because the goroutine responsible for setting them stored its values into the previous invocation's locals, which no longer exist in the setNodeAddress scope), nodeAddresses and err are always nil from the moment the timeout first occurs.

Thus we need to store both (the list of node addresses and the error) into variables that are not local to the setNodeAddress method.
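A stripped-down sketch of the pattern being described (simplified stand-in code, not the actual kubelet source; kubeletish and lookup are illustrative names only):

package main

import (
	"fmt"
	"time"
)

// kubeletish stands in for the Kubelet struct; only the two fields relevant
// to the select are modeled here.
type kubeletish struct {
	requestSync chan int      // signalled when the address goroutine finishes
	timeout     time.Duration // how long setNodeAddress waits for it
}

// lookup simulates a slow cloud-provider NodeAddresses call.
func lookup() (string, error) {
	time.Sleep(150 * time.Millisecond)
	return "10.0.0.1", nil
}

func (k *kubeletish) setNodeAddress() (string, error) {
	var addr string // fresh locals on every call
	var err error

	// If a goroutine from a previous call finished after that call had
	// already timed out, its completion signal is still buffered here and
	// gets consumed now, even though the goroutine wrote into the previous
	// call's locals, not ours.
	select {
	case <-k.requestSync:
		return addr, err // always "", nil: nothing ever wrote to these
	default:
	}

	go func() {
		addr, err = lookup() // writes to THIS call's locals
		k.requestSync <- 0
	}()

	select {
	case <-k.requestSync:
		return addr, err
	case <-time.After(k.timeout):
		return "", fmt.Errorf("timed out waiting for node addresses")
	}
}

func main() {
	k := &kubeletish{requestSync: make(chan int, 1), timeout: 50 * time.Millisecond}
	fmt.Println(k.setNodeAddress()) // times out: the lookup takes 150ms
	time.Sleep(200 * time.Millisecond)
	fmt.Println(k.setNodeAddress()) // stale signal: returns "" with a nil error
}

Storing the result in fields on the struct, as this PR does, means whichever call consumes the completion signal also sees the values the goroutine actually wrote.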

Contributor

there must be a simpler way to do this. we've added 6 fields to the kubelet struct to support the timeout in this single area of the code. i'm not saying i see it yet. but there must be. i'm too tired to see it atm though.

Member

this looks like it is writing to member variables asynchronously outside a lock... isn't that a crashing data race?

Contributor Author

It is unlikely to happen, but yes, it is a data race; the chance is just very low. I am addressing this in another PR.

Contributor Author

Ok, I will update this PR with the suggested refactoring

@derekwaynecarr
Member

it may help, for posterity, to describe the reason for the change here and give some clarity on the issue we are having.

@ingvagabund
Contributor Author

/retest

@sjenning
Contributor

@ingvagabund what would you think about abstracting this to a cloudRequestManager that implements

type CloudRequestManager interface {
  Start(cloud cloudprovider.Interface) error
  NodeAddresses(ctx context.Context, name types.NodeName) ([]v1.NodeAddress, error)
}

and abstract all this internal state out of the kubelet and remove the 6 new fields between #62543 and this PR?

kl.cloudRequestManager.Start() could be called where we start the other kubelet managers and could block kubelet startup on the success of the first instances.NodeAddresses() call, ensuring that the call in setNodeAddress() never fails. After that, the manager can cache the result for future calls to kl.cloudRequestManager.NodeAddresses() and can refresh from the cloud provider at whatever interval we want (or the cloud provider will allow).

I know this PR has been proven to work, so we might do this later. But I had the idea in my head and wanted to get it in writing for later.
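For reference, a hypothetical sketch of what such a cache-based manager could look like (illustrative names and simplified types, not the code that eventually landed in the kubelet): Start blocks until the first lookup succeeds, then refreshes the cache in the background, and NodeAddresses only ever reads the cache under a lock, which also sidesteps the data race discussed above.

package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// addressLister stands in for the cloud provider's NodeAddresses call.
type addressLister func(ctx context.Context) ([]string, error)

type cloudRequestManager struct {
	lookup   addressLister
	interval time.Duration

	mu    sync.RWMutex
	addrs []string
	err   error
}

// Start blocks until the first lookup succeeds, then keeps the cache fresh
// in the background so callers never wait on the cloud provider again.
func (m *cloudRequestManager) Start(ctx context.Context) error {
	for {
		addrs, err := m.lookup(ctx)
		if err == nil {
			m.store(addrs, nil)
			break
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(m.interval):
		}
	}
	go func() {
		ticker := time.NewTicker(m.interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				m.store(m.lookup(ctx))
			}
		}
	}()
	return nil
}

func (m *cloudRequestManager) store(addrs []string, err error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.addrs, m.err = addrs, err
}

// NodeAddresses returns whatever the background loop cached last.
func (m *cloudRequestManager) NodeAddresses() ([]string, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.addrs, m.err
}

func main() {
	m := &cloudRequestManager{
		lookup: func(ctx context.Context) ([]string, error) {
			return []string{"10.0.0.1"}, nil // pretend cloud-provider call
		},
		interval: time.Second,
	}
	if err := m.Start(context.Background()); err != nil {
		panic(err)
	}
	fmt.Println(m.NodeAddresses())
}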

@ingvagabund
Contributor Author

ingvagabund commented Jun 26, 2018

@sjenning what you are saying makes sense. So far I am aware of two requests that are sent:

  • request for node addresses
  • request for external ID

Do we want to have a cloud request manager for each cloud resource we ask for? Or group them based on some criteria? There are cloud resources we ask for once in a while (e.g. the node external ID) and cloud resources we ask for periodically (e.g. the node addresses). So maybe s/CloudRequestManager/CloudRequestSyncManager/? Effectively build it on top of the concept of an informer?

@sjenning
Contributor

Do we want to have a cloud request manager for each cloud resource we ask for? Or group them based on some criteria? There are cloud resources we ask for once in a while (e.g. the node external ID) and cloud resources we ask for periodically (e.g. the node addresses). So maybe s/CloudRequestManager/CloudRequestSyncManager/? Effectively build it on top of the concept of an informer?

Yes, I think we are on the same wavelength. CloudRequestSyncManager is good too. We can use it to basically buffer all cloudprovider requests the kubelet makes since, as we've found out, not all cloudprovider code is equal.

The real question is whether you want to go ahead with this PR for now and do that as a follow-on, or modify this PR. I'm on the fence about it.

@sjenning
Contributor

After some thought, I am in favor of going forward with this fix and doing the refactor in a follow-on PR.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2018
@liggitt
Member

liggitt commented Jun 26, 2018

/hold

data race question in https://github.com/kubernetes/kubernetes/pull/65226/files#r198143459 is unresolved

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2018
@ingvagabund ingvagabund changed the title Store the latest cloud provider node addresses WIP: Store the latest cloud provider node addresses Jun 26, 2018
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 26, 2018
@ingvagabund
Contributor Author

/lgtm cancel

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 26, 2018
@ingvagabund ingvagabund force-pushed the store-cloud-provider-latest-node-addresses branch 5 times, most recently from 452a7d2 to 3203fa6 Compare June 26, 2018 17:08
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 3, 2018
@ingvagabund
Contributor Author

@dims thank you very much :)

@ingvagabund
Contributor Author

@vishh PTAL

Contributor
@vishh left a comment

It looks like this PR is decoupling updating node status from fetching updated node IPs.
However, it is not clear why the kubelet needs to keep watching for node addresses continually in the first place.
Also, if there were any production issues that this PR addresses, it would help to describe them here in this PR or in a separate issue (preferred).

}

// NodeAddresses does not wait for cloud provider to return a node addresses.
// It always returns node addresses or an error.
Contributor

Is this comment true? The logic below blocks on the cloud provider API call succeeding at least once.

Contributor Author

The first call to NodeAddresses can block, but the remaining ones do not. The assumption is that a node needs to register before it starts invoking the method periodically, so it is negligible if the first call hangs for some time. I can rephrase the statement to make this clearer.
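A tiny sketch of that contract, assuming a background poller calls set after every cloud-provider request (illustrative types only, not the PR's actual manager code):

package main

import (
	"fmt"
	"sync"
	"time"
)

// addressCache illustrates the contract described above: NodeAddresses blocks
// only until the first poll result has been stored; after that every call
// returns the cached value immediately.
type addressCache struct {
	ready chan struct{} // closed once the first result arrives
	once  sync.Once
	mu    sync.RWMutex
	addrs []string
	err   error
}

func newAddressCache() *addressCache {
	return &addressCache{ready: make(chan struct{})}
}

// set would be called by the background poll loop after every cloud call.
func (c *addressCache) set(addrs []string, err error) {
	c.mu.Lock()
	c.addrs, c.err = addrs, err
	c.mu.Unlock()
	c.once.Do(func() { close(c.ready) }) // unblock first-time callers
}

// NodeAddresses blocks on the very first call only.
func (c *addressCache) NodeAddresses() ([]string, error) {
	<-c.ready
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.addrs, c.err
}

func main() {
	c := newAddressCache()
	go func() {
		time.Sleep(100 * time.Millisecond) // pretend a slow first cloud call
		c.set([]string{"10.0.0.1"}, nil)
	}()
	fmt.Println(c.NodeAddresses()) // waits ~100ms for the first result
	fmt.Println(c.NodeAddresses()) // returns immediately from the cache
}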

@ingvagabund
Contributor Author

@vishh issue opened #65814

@ingvagabund
Contributor Author

@Random-Liu @dchen1107 @yujuhong PTAL

@sjenning
Contributor

sjenning commented Jul 5, 2018

@vishh I agree that it might be unnecessary to poll the cloudprovider for our addresses if we can guarantee that the information we get on the first NodeAddresses() call does not change for the life of the kubelet process (or the instance itself).

However, we didn't want to tackle that wider-ranging issue here. That is very cloudprovider specific and would require acks from all cloud providers saying "yes, our cloudprovider will not return changing values from NodeAddresses() over the life of the kubelet".

This PR simply protects the kubelet from latency introduced by the cloudprovider on a NodeAddresses() call. In our case (Azure), this was a throttling mechanism which could cause the node status update loop to stall and the Node to go NotReady.

@ingvagabund
Contributor Author

/retest

Member
@derekwaynecarr left a comment

/approve

this is an improvement over what exists prior.

i think it's up for more debate whether the cloud provider interface should be handling this for us by taking a timeout or something on each of its calls.

case <-collected:
return nodeAddresses, err
case <-time.Tick(2 * nodeAddressesRetryPeriod):
return nil, fmt.Errorf("Timeout after %v waiting for address to appear", 2*nodeAddressesRetryPeriod)
Member

nit: s/Timeout/timeout

I see the original code had this casing issue as well; it doesn't report back directly to the end user, so not a huge deal.

Member

actually, this is a test, so even less of a deal. got confused in my review.

// Request timeout
cloudproviderRequestTimeout time.Duration
// Handles requests to cloud provider with timeout
cloudResourceSyncManager *cloudResourceSyncManager
Member

it would be good to have a broader discussion on whether we want all interaction with the cloud to go through this interface in the future, or whether we want to change the cloud provider interface to accept a context w/ timeout on operations so each caller can decide how to handle it across the code base. for now, this cleans up the existing member vars on the kubelet, so it is a nice incremental improvement.
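For comparison, a minimal sketch of the per-call alternative, assuming the provider implementation actually honors the context deadline (nodeAddressGetter and slowCloud below are simplified stand-ins for the real cloud provider types, not kubelet code):

package main

import (
	"context"
	"fmt"
	"time"
)

// nodeAddressGetter mirrors the shape of the cloud provider call: the
// method already accepts a context, so a caller could bound it.
type nodeAddressGetter interface {
	NodeAddresses(ctx context.Context, node string) ([]string, error)
}

// slowCloud pretends to be a throttled provider that takes far too long
// but does honor context cancellation.
type slowCloud struct{}

func (slowCloud) NodeAddresses(ctx context.Context, node string) ([]string, error) {
	select {
	case <-time.After(30 * time.Second):
		return []string{"10.0.0.1"}, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

func main() {
	var cloud nodeAddressGetter = slowCloud{}
	// Each caller decides how long it is willing to wait.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	addrs, err := cloud.NodeAddresses(ctx, "node-1")
	fmt.Println(addrs, err) // [] context deadline exceeded, after ~2s
}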

kubelet.cloudproviderRequestParallelism = make(chan int, 1)
kubelet.cloudproviderRequestSync = make(chan int)
kubelet.cloudproviderRequestTimeout = 10 * time.Second
kubelet.cloudResourceSyncManager = NewCloudResourceSyncManager(kubelet.cloud, kubelet.nodeName, kubelet.nodeStatusUpdateFrequency)
Member

i think we can update this in the future from 10s to 1m, 5m, 10m, etc.

an issue or // todo to track it would be good.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, dims, sjenning

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 9, 2018
@ingvagabund
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 9, 2018
@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit f704109 into kubernetes:master Jul 9, 2018
@ingvagabund ingvagabund deleted the store-cloud-provider-latest-node-addresses branch July 10, 2018 07:26
@k8s-ci-robot
Contributor

@ingvagabund: The following test failed, say /retest to rerun them all:

Test: pull-kubernetes-local-e2e-containerized | Commit: 9d9fb4d | Rerun command: /test pull-kubernetes-local-e2e-containerized

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@bashofmann

Would it make sense to backport this to 1.11 to fix #68270 ?

@alena1108
Contributor

Having correct IP addresses set on the nodes is plumbing for all subsequent operations on k8s. It would be really helpful to backport this fix to 1.11. We have a lot of users who run 1.11, are currently facing this problem, and are not planning to move to k8s 1.12 yet.

@openstacker

I'm echoing @alena1108

This is a very critical fix and we have seen it on v1.11.2. A missing internal IP means the API server can't talk to the worker node, which breaks everything. We also don't want to move to v1.12 so fast, because I'm afraid there will be new unknown issues.

So please consider backporting this to v1.11.x. Thank you.

@2rs2ts
Contributor

2rs2ts commented Nov 1, 2018

We ran into this problem too. Our solution was to just pass the --node-ip flag to kubelet.

cheftako added a commit to cheftako/kubernetes that referenced this pull request Nov 22, 2018
k8s-ci-robot added a commit that referenced this pull request Dec 3, 2018
…#65226-upstream-release-1.11

Automated cherry pick of #65226: Put all the node address cloud provider retrival complex
Labels
• approved - Indicates a PR has been approved by an approver from all required OWNERS files.
• area/cloudprovider
• area/kubelet
• cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
• lgtm - "Looks good to me", indicates that a PR is ready to be merged.
• release-note-none - Denotes a PR that doesn't merit a release note.
• size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Have requests to cloud provider non-blocking