Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix 104329: check for headless before trying to release the ClusterIPs #104467

Merged
merged 2 commits into from Sep 2, 2021

Conversation

khenidak
Copy link
Contributor

@khenidak khenidak commented Aug 19, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

fixes #104329 where the api-server tries to release ClusterIPs for a headless service when it gets converted to ExternalName

Which issue(s) this PR fixes:

Fixes #104329

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 19, 2021
@k8s-ci-robot
Copy link
Contributor

@khenidak: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Aug 19, 2021
@khenidak
Copy link
Contributor Author

/assign @thockin

@aojea
Copy link
Member

aojea commented Aug 20, 2021

and the test?? 😄

@khenidak
Copy link
Contributor Author

Tests don't fail without the fix. This happens only on kubectl apply. I am still looking for ways to test this.

@aojea
Copy link
Member

aojea commented Aug 20, 2021

Tests don't fail without the fix. This happens only on kubectl apply. I am still looking for ways to test this.

integration covers the whole dance, you should be able to reproduce it with an integration test c97f1b2 (it is an example, that test is just a PoC)

@khenidak
Copy link
Contributor Author

Tests don't fail without the fix. This happens only on kubectl apply. I am still looking for ways to test this.

integration covers the whole dance, you should be able to reproduce it with an integration test c97f1b2 (it is an example, that test is just a PoC)

as i was writing my earlier message i realized exactly that. Thanks for the confirmation. Will execute accordingly.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 20, 2021
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 20, 2021
@thockin
Copy link
Member

thockin commented Aug 21, 2021

I don't get why the unit test do not catch it?

Edit: Tracing thru my dev branch, we pass "None" all the way down to the allocator which calls ParseIP and gets nil, which it determines is not in the range and ... returns nil. ???

This should have failed 3 times before it got here. Not sure what the implication of returning an error here is, but it seems right.

@thockin
Copy link
Member

thockin commented Aug 21, 2021

I still don't follow where the internal error is coming from?

@thockin
Copy link
Member

thockin commented Aug 21, 2021

The fix looks OK, but I want to know why it's failing (and maybe we should barf when ParseIP() returns nil in releaseClusterIPs? That is an internal error.

@khenidak
Copy link
Contributor Author

This is the error:
For servies converting from headless to ExternalName we blindly attempt to generate a list of ClusterIPs to release. This causes an error since IPFamilies may not be set for headless. I am still looking into why this was not caught by unit testing.

E0823 16:20:06.317656       1 runtime.go:76] Observed a panic: runtime error: index out of range [1] with length 1
goroutine 17290 [running]:
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers.finishRequest.func1.1(0xc00f33d0e0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/rest.go:250 +0xdf
panic(0x492f440, 0xc00e5218d8)
        /usr/local/go/src/runtime/panic.go:965 +0x1b9
k8s.io/kubernetes/pkg/registry/core/service/storage.(*REST).handleClusterIPsForUpdatedService(0xc000719600, 0xc00ff30000, 0xc00f754a00, 0x0, 0x5511890, 0xc00f754a00, 0x5511890)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/registry/core/service/storage/rest.go:756 +0x934
k8s.io/kubernetes/pkg/registry/core/service/storage.(*REST).Update(0xc000719600, 0x556acb8, 0xc00a5db500, 0xc00dc97224, 0xa, 0x55178a8, 0xc00a5db5f0, 0xc005eaafa0, 0xc005b83a40, 0x7f061a8e2c00, ...)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/registry/core/service/storage/rest.go:469 +0xee9
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers.(*patcher).patchResource.func2(0xc00f763e60, 0xc00f763dd0, 0xc006c9d400, 0xc00f33cd20)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/patch.go:589 +0x106
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers.(*patcher).patchResource.func3(0xc008241400, 0x7f061a8e2cd0, 0xc012426a60, 0xc006c9d200)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/patch.go:594 +0x4f
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers.finishRequest.func1(0xc00f33d0e0, 0xc01158b8a0, 0xc00f33d080, 0xc00f33d020)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/rest.go:258 +0x62
created by k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers.finishRequest
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/rest.go:239 +0xd4

@thockin
Copy link
Member

thockin commented Sep 2, 2021

Did we figure out why the test doesn't catch it?

@khenidak
Copy link
Contributor Author

khenidak commented Sep 2, 2021

there was no test covering that particular update/type conversion here: https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/registry/core/service/storage/rest_test.go

With this PR there is an integration test for that. Once this: #104691 merges i will go over it again and move integration test to unit test.

@khenidak khenidak changed the title [DO NOT MERGE ] fix 104329: check for headless before trying to release the ClusterIPs fix 104329: check for headless before trying to release the ClusterIPs Sep 2, 2021
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Sep 2, 2021
@thockin
Copy link
Member

thockin commented Sep 2, 2021

OK. I do think it should be catchable as a unit-test, and if not I am really unsure why.

@thockin
Copy link
Member

thockin commented Sep 2, 2021

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 2, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khenidak, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@saschagrunert
Copy link
Member

I assume it's not required to cherry-pick into release-1.19, right?

k8s-ci-robot added a commit that referenced this pull request Oct 6, 2021
…4467-upstream-release-1.21

Automated cherry pick of #104467: fix 104329: check for headless before trying to release
@aojea
Copy link
Member

aojea commented Oct 6, 2021

I assume it's not required to cherry-pick into release-1.19, right?

IIRC ClusterIPs was introduced in 1.20, so no need

k8s-ci-robot added a commit that referenced this pull request Oct 6, 2021
…4467-upstream-release-1.20

Automated cherry pick of #104467: fix 104329: check for headless before trying to release
k8s-ci-robot added a commit that referenced this pull request Oct 6, 2021
…4467-upstream-release-1.22

Automated cherry pick of #104467: fix 104329: check for headless before trying to release
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

INTERNAL_ERROR when patching a Service
5 participants