Inconsistent reset of changes to kubernetes endpoints #104252

RichardoC · 2021-08-09T21:15:40Z

What happened:

If you change some attributes of the kubernetes endpoint in the default namespace, they are not reset

Attributes that are reset (correctly)

IP address

Attributes that are not reset (incorrect)

Port
Port name
Protocol

If you change those attibutes of the kubernetes endpoint in the default namespace it is not reset to the correct value. This differs from the IP address which is reset rapidly.

What you expected to happen:

All attributes to be reset to the correct values.

Example of what I expect to happen

Here's an example of the IP being reset to 192.168.65.4 rather than being left at the wrong values (1.1.1.1)

$ kubectl patch endpoints kubernetes -n default --type='json' -p='[{"op": "replace", "path": "/subsets/0/addresses/0/ip", "value":"1.1.1.1"}]' && sleep 60 && kubectl -n default get endpoints kubernetes -ojsonpath="{.subsets[0].addresses[0].ip}{'\n'}"
endpoints/kubernetes patched
192.168.65.4

How to reproduce it (as minimally and precisely as possible):

Example to patch the port to some nonsense that won't work

kubectl patch endpoints kubernetes -n default --type='json' -p='[{"op": "replace", "path": "/subsets/0/ports/0/port", "value":443}]'

To reset to default

kubectl patch endpoints kubernetes -n default --type='json' -p='[{"op": "replace", "path": "/subsets/0/ports/0/port", "value":6443}]'

Example to patch the port name

kubectl patch endpoints kubernetes -n default --type='json' -p='[{"op": "replace", "path": "/subsets/0/ports/0/name", "value":"http"}]'

To reset to default

kubectl patch endpoints kubernetes -n default --type='json' -p='[{"op": "replace", "path": "/subsets/0/ports/0/name", "value":"https"}]'

Example to patch the protocol

kubectl patch endpoints kubernetes --type='json' -n default -p='[{"op": "replace", "path": "/subsets/0/ports/0/protocol", "value":"UDP"}]'

To reset to default

kubectl patch endpoints kubernetes --type='json' -n default -p='[{"op": "replace", "path": "/subsets/0/ports/0/protocol", "value":"TCP"}]'

Anything else we need to know?:

The resetting behaviour of the IP address prevents #97076 being used to MITM traffic to the kubernetes API, so that's good.

Environment:

Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:53:14Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider or hardware configuration:
Kubernetes enabled on docker desktop on macos
OS (e.g: cat /etc/os-release):

Kernel (e.g. uname -a):
N/A
Install tools:
Network plugin and version (if this is a network-related bug):
Others:

The text was updated successfully, but these errors were encountered:

neolit123 · 2021-08-16T16:43:13Z

/sig network
/kind support
for triage

you might get more responses on:
http://git.k8s.io/kubernetes/SUPPORT.md

khenidak · 2021-08-19T19:52:05Z

/triage needs-information
/sig network
/assign @khenidak

@RichardoC was this an endpoint created automatically (by endpoint controller in response to service's selector) or a manually created endpoint?

thockin · 2021-08-19T20:32:42Z

I confirm I can repro this. From pkg/controlplane/reconcilers/instancecount.go:

// Requirements:
//  * All apiservers MUST use the same ports for their {rw, ro} services.
//  * All apiservers MUST use ReconcileEndpoints and only ReconcileEndpoints to manage the
//      endpoints for their {rw, ro} services.
//  * All apiservers MUST know and agree on the number of apiservers expected
//      to be running (c.masterCount).
//  * ReconcileEndpoints is called periodically from all apiservers.

That's not super helpful. It looks like that code INTENDS to fix ports, but it's not obvious at a glance why it doesn't. Oh. That's WAY up-stack in pkg/controlplane/controller.go:

        // Service definition is not reconciled after first     
        // run, ports and type will be corrected only during
        // start.   
        if err := c.UpdateKubernetesService(false); err != nil {

I don't immediately see why that can't be true (or frankly, why it is a param at all). Some archaeology may be needed, but this doesn't seem particularly hard.

/triage accepted

RichardoC · 2021-08-20T08:31:13Z

/triage needs-information
/sig network
/assign @khenidak

@RichardoC was this an endpoint created automatically (by endpoint controller in response to service's selector) or a manually created endpoint?

This was created automatically for a service (kubernetes service in default namespace for this example)

k8s-triage-robot · 2021-11-18T08:48:33Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

RichardoC · 2021-11-18T09:04:13Z

/remove-lifecycle stale

haoruan · 2022-01-13T03:34:12Z

@khenidak just wondering if this issue is still being worked on, I'd like to help

haoruan · 2022-01-26T02:19:59Z

/assign

aojea · 2022-01-26T09:21:29Z

/cc

aojea · 2022-01-28T08:37:04Z

Quoting from #107773 (comment)

This exhibits the same behavior as current endpoints generated by the endpoints controller, users with permission can modify endpoints (and they are not reconciled), if you have power permissions the system doesn't stop you to do "bad things" ( i.e. sudo rm -rf / ) smile

See related discussion here #98066 (comment)

Moreover, Jordan comment is spot on #107773 (comment) , right now there is the convention that all apiserver have the same port, and this change will make impossible to move to extra ports in the future.

In my opinion we should not modify this behavior

RichardoC · 2022-02-01T20:03:09Z

This exhibits the same behavior as current endpoints generated by the endpoints controller, users with permission can modify endpoints (and they are not reconciled), if you have power permissions the system doesn't stop you to do "bad things" ( i.e. sudo rm -rf / ) 😄

See related discussion here #98066 (comment)

Moreover, Jordan comment is spot on #107773 (comment) , right now there is the convention that all apiserver have the same port, and this change will make impossible to move to extra ports in the future.

In my opinion we should not modify this behavior

Thanks for chiming in, I take your point about the API server port and that k8s doesn't usually stop you doing something bad but I do think this is a case where I would expect the API server to reconcile this back to the "correct" state.

I'm sure there are alternative ways of fixing this that will enable heterogeneous endpoint ports?

aojea · 2022-02-01T22:22:35Z

@RichardoC apparently apiserver multiports are going away , so may comment maybe outdated :)

#107872

RichardoC · 2022-03-02T15:34:27Z

Thanks! Regardless of the port number conventions, k8s should be able to recover from this unfortunate scenario.

I hope this request makes sense?

k8s-triage-robot · 2022-05-31T16:27:17Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

RichardoC · 2022-05-31T18:33:18Z

/remove-lifecycle stale

k8s-triage-robot · 2022-08-29T18:35:24Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

RichardoC · 2022-08-29T21:00:51Z

/remove-lifecycle stale

k8s-triage-robot · 2023-02-08T01:20:55Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

RichardoC · 2023-02-08T09:05:11Z

/remove-lifecycle stale

k8s-triage-robot · 2024-02-08T09:12:43Z

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

RichardoC · 2024-02-12T11:15:23Z

/remove-lifecycle stale

RichardoC added the kind/bug Categorizes issue or PR as related to a bug. label Aug 9, 2021

k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 9, 2021

k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. kind/support Categorizes issue or PR as a support question. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 16, 2021

k8s-ci-robot assigned khenidak Aug 19, 2021

k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Aug 19, 2021

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 19, 2021

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 18, 2021

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 18, 2021

k8s-ci-robot assigned haoruan Jan 26, 2022

haoruan mentioned this issue Jan 26, 2022

Reset attributes of endpoints correctly #107773

Closed

aojea mentioned this issue Feb 1, 2022

Clean up some EndpointSlice-related code in apiserver endpoint reconciler #107878

Merged

rastislavs mentioned this issue Feb 23, 2022

Reconcile kubernetes service endpoints in KKP user-cluster-controller kubermatic/kubermatic#9047

Merged

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 31, 2022

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 31, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 29, 2022

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 29, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Feb 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent reset of changes to kubernetes endpoints #104252

Inconsistent reset of changes to kubernetes endpoints #104252

RichardoC commented Aug 9, 2021

neolit123 commented Aug 16, 2021

khenidak commented Aug 19, 2021

thockin commented Aug 19, 2021

RichardoC commented Aug 20, 2021

k8s-triage-robot commented Nov 18, 2021

RichardoC commented Nov 18, 2021

haoruan commented Jan 13, 2022

haoruan commented Jan 26, 2022

aojea commented Jan 26, 2022

aojea commented Jan 28, 2022

RichardoC commented Feb 1, 2022

aojea commented Feb 1, 2022

RichardoC commented Mar 2, 2022

k8s-triage-robot commented May 31, 2022

RichardoC commented May 31, 2022

k8s-triage-robot commented Aug 29, 2022

RichardoC commented Aug 29, 2022

k8s-triage-robot commented Feb 8, 2023

RichardoC commented Feb 8, 2023

k8s-triage-robot commented Feb 8, 2024

RichardoC commented Feb 12, 2024

Inconsistent reset of changes to kubernetes endpoints #104252

Inconsistent reset of changes to kubernetes endpoints #104252

Comments

RichardoC commented Aug 9, 2021

What happened:

Attributes that are reset (correctly)

Attributes that are not reset (incorrect)

What you expected to happen:

Example of what I expect to happen

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

neolit123 commented Aug 16, 2021

khenidak commented Aug 19, 2021

thockin commented Aug 19, 2021

RichardoC commented Aug 20, 2021

k8s-triage-robot commented Nov 18, 2021

RichardoC commented Nov 18, 2021

haoruan commented Jan 13, 2022

haoruan commented Jan 26, 2022

aojea commented Jan 26, 2022

aojea commented Jan 28, 2022

RichardoC commented Feb 1, 2022

aojea commented Feb 1, 2022

RichardoC commented Mar 2, 2022

k8s-triage-robot commented May 31, 2022

RichardoC commented May 31, 2022

k8s-triage-robot commented Aug 29, 2022

RichardoC commented Aug 29, 2022

k8s-triage-robot commented Feb 8, 2023

RichardoC commented Feb 8, 2023

k8s-triage-robot commented Feb 8, 2024

RichardoC commented Feb 12, 2024