Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent reset of changes to kubernetes endpoints #104252

Open
RichardoC opened this issue Aug 9, 2021 · 21 comments
Open

Inconsistent reset of changes to kubernetes endpoints #104252

RichardoC opened this issue Aug 9, 2021 · 21 comments
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@RichardoC
Copy link

What happened:

If you change some attributes of the kubernetes endpoint in the default namespace, they are not reset

Attributes that are reset (correctly)
  • IP address
Attributes that are not reset (incorrect)
  • Port
  • Port name
  • Protocol

If you change those attibutes of the kubernetes endpoint in the default namespace it is not reset to the correct value. This differs from the IP address which is reset rapidly.

What you expected to happen:

All attributes to be reset to the correct values.

Example of what I expect to happen

Here's an example of the IP being reset to 192.168.65.4 rather than being left at the wrong values (1.1.1.1)

$ kubectl patch endpoints kubernetes -n default --type='json' -p='[{"op": "replace", "path": "/subsets/0/addresses/0/ip", "value":"1.1.1.1"}]' && sleep 60 && kubectl -n default get endpoints kubernetes -ojsonpath="{.subsets[0].addresses[0].ip}{'\n'}"
endpoints/kubernetes patched
192.168.65.4

How to reproduce it (as minimally and precisely as possible):

Example to patch the port to some nonsense that won't work

kubectl patch endpoints kubernetes -n default --type='json' -p='[{"op": "replace", "path": "/subsets/0/ports/0/port", "value":443}]'

To reset to default

kubectl patch endpoints kubernetes -n default --type='json' -p='[{"op": "replace", "path": "/subsets/0/ports/0/port", "value":6443}]'

Example to patch the port name

kubectl patch endpoints kubernetes -n default --type='json' -p='[{"op": "replace", "path": "/subsets/0/ports/0/name", "value":"http"}]'

To reset to default

kubectl patch endpoints kubernetes -n default --type='json' -p='[{"op": "replace", "path": "/subsets/0/ports/0/name", "value":"https"}]'

Example to patch the protocol

kubectl patch endpoints kubernetes --type='json' -n default -p='[{"op": "replace", "path": "/subsets/0/ports/0/protocol", "value":"UDP"}]'

To reset to default

kubectl patch endpoints kubernetes --type='json' -n default -p='[{"op": "replace", "path": "/subsets/0/ports/0/protocol", "value":"TCP"}]'

Anything else we need to know?:

The resetting behaviour of the IP address prevents #97076 being used to MITM traffic to the kubernetes API, so that's good.

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:53:14Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    Kubernetes enabled on docker desktop on macos
  • OS (e.g: cat /etc/os-release):

image

  • Kernel (e.g. uname -a):
    N/A
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@RichardoC RichardoC added the kind/bug Categorizes issue or PR as related to a bug. label Aug 9, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 9, 2021
@neolit123
Copy link
Member

/sig network
/kind support
for triage

you might get more responses on:
http://git.k8s.io/kubernetes/SUPPORT.md

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. kind/support Categorizes issue or PR as a support question. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 16, 2021
@khenidak
Copy link
Contributor

/triage needs-information
/sig network
/assign @khenidak

@RichardoC was this an endpoint created automatically (by endpoint controller in response to service's selector) or a manually created endpoint?

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Aug 19, 2021
@thockin
Copy link
Member

thockin commented Aug 19, 2021

I confirm I can repro this. From pkg/controlplane/reconcilers/instancecount.go:

// Requirements:
//  * All apiservers MUST use the same ports for their {rw, ro} services.
//  * All apiservers MUST use ReconcileEndpoints and only ReconcileEndpoints to manage the
//      endpoints for their {rw, ro} services.
//  * All apiservers MUST know and agree on the number of apiservers expected
//      to be running (c.masterCount).
//  * ReconcileEndpoints is called periodically from all apiservers.

That's not super helpful. It looks like that code INTENDS to fix ports, but it's not obvious at a glance why it doesn't. Oh. That's WAY up-stack in pkg/controlplane/controller.go:

        // Service definition is not reconciled after first     
        // run, ports and type will be corrected only during
        // start.   
        if err := c.UpdateKubernetesService(false); err != nil {

I don't immediately see why that can't be true (or frankly, why it is a param at all). Some archaeology may be needed, but this doesn't seem particularly hard.

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 19, 2021
@thockin thockin added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. priority/backlog Higher priority than priority/awaiting-more-evidence. and removed kind/support Categorizes issue or PR as a support question. triage/needs-information Indicates an issue needs more information in order to work on it. labels Aug 19, 2021
@RichardoC
Copy link
Author

/triage needs-information
/sig network
/assign @khenidak

@RichardoC was this an endpoint created automatically (by endpoint controller in response to service's selector) or a manually created endpoint?

This was created automatically for a service (kubernetes service in default namespace for this example)

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 18, 2021
@RichardoC
Copy link
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 18, 2021
@haoruan
Copy link
Contributor

haoruan commented Jan 13, 2022

@khenidak just wondering if this issue is still being worked on, I'd like to help

@haoruan
Copy link
Contributor

haoruan commented Jan 26, 2022

/assign

@aojea
Copy link
Member

aojea commented Jan 26, 2022

/cc

@aojea
Copy link
Member

aojea commented Jan 28, 2022

Quoting from #107773 (comment)

This exhibits the same behavior as current endpoints generated by the endpoints controller, users with permission can modify endpoints (and they are not reconciled), if you have power permissions the system doesn't stop you to do "bad things" ( i.e. sudo rm -rf / ) smile

See related discussion here #98066 (comment)

Moreover, Jordan comment is spot on #107773 (comment) , right now there is the convention that all apiserver have the same port, and this change will make impossible to move to extra ports in the future.

In my opinion we should not modify this behavior

@RichardoC
Copy link
Author

This exhibits the same behavior as current endpoints generated by the endpoints controller, users with permission can modify endpoints (and they are not reconciled), if you have power permissions the system doesn't stop you to do "bad things" ( i.e. sudo rm -rf / ) 😄

See related discussion here #98066 (comment)

Moreover, Jordan comment is spot on #107773 (comment) , right now there is the convention that all apiserver have the same port, and this change will make impossible to move to extra ports in the future.

In my opinion we should not modify this behavior

Thanks for chiming in, I take your point about the API server port and that k8s doesn't usually stop you doing something bad but I do think this is a case where I would expect the API server to reconcile this back to the "correct" state.

I'm sure there are alternative ways of fixing this that will enable heterogeneous endpoint ports?

@aojea
Copy link
Member

aojea commented Feb 1, 2022

@RichardoC apparently apiserver multiports are going away , so may comment maybe outdated :)

#107872

@RichardoC
Copy link
Author

Thanks! Regardless of the port number conventions, k8s should be able to recover from this unfortunate scenario.

I hope this request makes sense?

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 31, 2022
@RichardoC
Copy link
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 31, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 29, 2022
@RichardoC
Copy link
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 29, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023
@RichardoC
Copy link
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023
@k8s-triage-robot
Copy link

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Feb 8, 2024
@RichardoC
Copy link
Author

/remove-lifecycle stale

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/network Categorizes an issue or PR as relevant to SIG Network.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants