Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kube-proxy: treat failure to bind to a port as fatal #89350

Merged
merged 1 commit into from
Apr 7, 2020

Conversation

SataQiu
Copy link
Member

@SataQiu SataQiu commented Mar 23, 2020

What type of PR is this?
/kind feature

What this PR does / why we need it:
kube-proxy: treat failure to bind to a port as fatal

Which issue(s) this PR fixes:

Ref #89265

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

kube-proxy: add `--bind-address-hard-fail` flag to treat failure to bind to a port as fatal

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 23, 2020
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 23, 2020
@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Mar 23, 2020
@fejta-bot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@k8s-ci-robot k8s-ci-robot added area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Mar 23, 2020
@neolit123
Copy link
Member

/remove-area kubeadm

@aojea
Copy link
Member

aojea commented Mar 23, 2020

/retest
it looks good to me, but I'm not aware of the possible ramifications so I defer to the people more familiar with it.
/assign @danwinship @thockin

@@ -168,6 +168,7 @@ func (o *Options) AddFlags(fs *pflag.FlagSet) {
fs.Var(utilflag.IPVar{Val: &o.config.BindAddress}, "bind-address", "The IP address for the proxy server to serve on (set to '0.0.0.0' for all IPv4 interfaces and '::' for all IPv6 interfaces)")
fs.Var(utilflag.IPPortVar{Val: &o.config.HealthzBindAddress}, "healthz-bind-address", "The IP address with port for the health check server to serve on (set to '0.0.0.0:10256' for all IPv4 interfaces and '[::]:10256' for all IPv6 interfaces). Set empty to disable.")
fs.Var(utilflag.IPPortVar{Val: &o.config.MetricsBindAddress}, "metrics-bind-address", "The IP address with port for the metrics server to serve on (set to '0.0.0.0:10249' for all IPv4 interfaces and '[::]:10249' for all IPv6 interfaces). Set empty to disable.")
fs.BoolVar(&o.config.MetricsBindHardFail, "metrics-bind-hard-fail", o.config.MetricsBindHardFail, "If true kube-proxy will treat metrics bind failure as fatal and exit")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we cover both metrics and healthz? E.g. --bind-address-hard-fail?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @thockin ! I'll try to refactor my code.

cmd/kube-proxy/app/server.go Outdated Show resolved Hide resolved
@@ -595,6 +597,7 @@ func (s *ProxyServer) Run() error {
s.Broadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: s.EventClient.Events("")})
}

errCh := make(chan error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments? This seems out of place here, but if we handle both healthz and metrics, it makes more sense.

@SataQiu SataQiu changed the title kube-proxy: treat metrics bind failure as fatal kube-proxy: treat failure to bind to a port as fatal Mar 24, 2020
@SataQiu
Copy link
Member Author

SataQiu commented Mar 25, 2020

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 25, 2020
@SataQiu SataQiu requested a review from thockin March 26, 2020 02:33
// Start up a healthz server if requested
if s.HealthzServer != nil {
s.HealthzServer.Run()
s.HealthzServer.Run(errCh)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a "pre-factoring", what would yo think about (in a commit of its own) making Run() be synchronous (take out the go inside it) and making callers call it like:

go func() {
    err := s.HealthzServer.Run()
    if err != nil {
        errCh <- err
    }
}

Then when you add the hard fail, that just becomes a flag to Run(), instead of a constructor flag.

I can do this prefactoring if you prefer - this code is itching for some cleanups, but I wonder if I am clear in my goals?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a starting example: https://github.com/kubernetes/kubernetes/compare/master...thockin:proxy-cleanup?expand=1

I'd like to see the same for metrics (moved to a function as much as possible).

Then error handling becomes passing errCh to those (or nil for not reporting errors)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @thockin !
Please feel free to do the prefactoring.
If you don't have the time, I can also pick it up (thank you for your starting example).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@SataQiu SataQiu Mar 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased on #89654

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#89654 merged.

Copy link
Member

@andrewsykim andrewsykim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given E2Es passed (with known test failures) and we've validated some of the changes in moby/ipvs (thanks @rikatz) I think this is good to merge. Prefer merging early in the cycle to leave more time for extra validation/testing.

Sorry commented in the wrong PR (was for #89116)

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Mar 30, 2020
@SataQiu
Copy link
Member Author

SataQiu commented Apr 1, 2020

/test pull-kubernetes-node-e2e

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

// For historical reasons we do not abort on errors here. We may
// change that in the future.
klog.Errorf("healthz server failed: %v", err)
klog.Errorf("starting healthz server failed: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not an error starting, it's an error. If the HTTP server starts and later fails, this path is hit. The original error string is correct

// change that in the future.
klog.Errorf("healthz server failed: %v", err)
klog.Errorf("starting healthz server failed: %v", err)
if hardFail {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just test if errCh is nil or not - one less flag

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

// For historical reasons we do not abort on errors here. We may
// change that in the future.
utilruntime.HandleError(fmt.Errorf("starting metrics server failed: %v", err))
err = fmt.Errorf("starting metrics server failed: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments as above

…ind to a port as fatal

Signed-off-by: SataQiu <1527062125@qq.com>
@SataQiu
Copy link
Member Author

SataQiu commented Apr 2, 2020

/test pull-kubernetes-integration
/test pull-kubernetes-e2e-kind

@SataQiu
Copy link
Member Author

SataQiu commented Apr 5, 2020

kindly ping @thockin
need any other modifications?

@SataQiu SataQiu requested a review from thockin April 5, 2020 01:54
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 6, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SataQiu, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 6, 2020
@k8s-ci-robot k8s-ci-robot merged commit cabf5d1 into kubernetes:master Apr 7, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Apr 7, 2020
Bregor added a commit to evilmartians/chef-kubernetes that referenced this pull request Sep 10, 2020
…d to a port as fatal

kubernetes/kubernetes#89350

Signed-off-by: Maxim Filatov <pipopolam@gmail.com>
Bregor added a commit to evilmartians/chef-kubernetes that referenced this pull request Sep 14, 2020
…d to a port as fatal

kubernetes/kubernetes#89350

Signed-off-by: Maxim Filatov <pipopolam@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/network Categorizes an issue or PR as relevant to SIG Network. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants