Start synchronizing pods after network is ready. #68752

Merged
merged 1 commit into from
Sep 26, 2018

Conversation

krzysztof-jastrzebski
Contributor

What this PR does / why we need it:
Start synchronizing pods after the network is ready. If a pod is synchronized before the network is ready, the sync fails and the kubelet retries the pod after 10 seconds.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #68751

Release note:

Start synchronizing pods after network is ready. 

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 17, 2018
@krzysztof-jastrzebski
Contributor Author

/assign @yujuhong

@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 17, 2018
@krzysztof-jastrzebski
Contributor Author

/assign mwielgus

Contributor

@yujuhong yujuhong left a comment

We intentionally allow pods to start syncing because pods using host network should be able to run (and they often are required to set up the pod networks).
If you want to avoid hitting the backoff error, perhaps you'll need to add a check in the per-pod sync routine to skip the rest of the work (and not incur penalty) when the network is not ready and the pod does not use the host network.

@neolit123
Member

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Sep 17, 2018
@@ -1823,6 +1823,13 @@ func (kl *Kubelet) syncLoop(updates <-chan kubetypes.PodUpdate, handler SyncHand
duration = time.Duration(math.Min(float64(max), factor*float64(duration)))
continue
}
if ns := kl.runtimeState.networkErrors(); len(ns) != 0 {
Contributor

Why not combine this with the code above, like:
rs := kl.runtimeState.runtimeErrors()
rs = append(rs, kl.runtimeState.networkErrors()...)
if len(rs) != 0 {
......
}

Contributor Author

I've removed these lines.

@losipiuk
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 19, 2018
@@ -263,6 +267,9 @@ func (p *podWorkers) wrapUp(uid types.UID, syncErr error) {
case syncErr == nil:
// No error; requeue at the regular resync interval.
p.workQueue.Enqueue(uid, wait.Jitter(p.resyncInterval, workerResyncIntervalJitterFactor))
case strings.Contains(syncErr.Error(), "network is not ready"):
Contributor

Declare "network is not ready" as a constant and use it in both places to avoid unnecessary breakage in the future.

Contributor Author

done

workerBackOffPeriodJitterFactor = 0.5

// backoff period when network is not ready.
backOffOnNetworkNotReadyPeriod = time.Second
Contributor

We should generalize this for other non-pod-specific issues, but I think it's okay to do it later when there are more use cases.

Contributor Author

Done

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 21, 2018
@krzysztof-jastrzebski
Contributor Author

/test pull-kubernetes-e2e-kops-aws
/test pull-kubernetes-e2e-gce-device-plugin-gpu

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 21, 2018
Contributor

@yujuhong yujuhong left a comment

/lgtm

A unit test would be good, but I didn't find any existing test for this specific part of the code. Given the small scope of this PR, I think it's okay to let it in.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: krzysztof-jastrzebski, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 21, 2018
@krzysztof-jastrzebski
Contributor Author

/test pull-kubernetes-integration

@k8s-ci-robot k8s-ci-robot merged commit 4404cda into kubernetes:master Sep 26, 2018
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Pods are synchronized by kubelet before network is ready
7 participants