Webhook ServiceReference is not resolved to cluster IP #72936

Closed
khogeland opened this issue Jan 15, 2019 · 5 comments
Labels
area/admission-control, kind/bug, sig/api-machinery, triage/needs-information

Comments

khogeland commented Jan 15, 2019

What happened: Creating a MutatingWebhookConfiguration with service set and caBundle unset caused the API server to send its request to the hostname myservice.default.svc. That lookup fails because the API server cannot talk to KubeDNS.

What you expected to happen: The API server should have resolved the service reference to the service's cluster IP, and used that in the HTTP request.

Workaround for anyone with the same issue: weirdly, setting caBundle causes the webhook to work. In that case it looks like the API server properly resolves the service reference to a cluster IP and does not attempt to use DNS. You can verify this by pointing a webhook config at a non-existent service and observing the "Service not found" error. Happy accident?

How to reproduce it (as minimally and precisely as possible):
Using a k8s setup where the API server can't talk to KubeDNS (e.g. running on bare metal with no special resolv.conf):

  • Create a MutatingWebhookConfiguration with the caBundle field unset/empty and the service field set
  • Attempt to create a matching resource, observe DNS failure for <something>.<namespace>.svc

Anything else we need to know?:

As stated in WebhookClientConfig, the API server is not expected to be able to resolve in-cluster service names. However, the default (and only?) ServiceResolver implementation simply returns the in-cluster service hostname. This seems like the root issue. This implementation is bogus; there should instead be an implementation that resolves the ServiceReference into a cluster IP.
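To make the difference between the two resolution strategies concrete, here is a minimal Go sketch. It is not the Kubernetes code: `serviceLister`, `staticLister`, `resolveByDNSName`, `resolveByClusterIP`, and the IP `10.0.0.12` are all hypothetical names and values made up for illustration, and port 443 is assumed. The first function returns the in-cluster hostname and so depends on DNS; the second looks the service up and returns a URL that needs no DNS.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// serviceLister is a hypothetical stand-in for looking up Service objects.
// It is NOT a real Kubernetes interface.
type serviceLister interface {
	ClusterIP(namespace, name string) (string, bool)
}

// staticLister is an in-memory serviceLister used only for this illustration.
type staticLister map[string]string

func (l staticLister) ClusterIP(namespace, name string) (string, bool) {
	ip, ok := l[namespace+"/"+name]
	return ip, ok
}

// resolveByDNSName returns the in-cluster hostname and leaves name resolution
// to whatever DNS the caller has configured (the behavior the issue describes).
func resolveByDNSName(namespace, name string) *url.URL {
	return &url.URL{Scheme: "https", Host: fmt.Sprintf("%s.%s.svc:443", name, namespace)}
}

// resolveByClusterIP looks the service up directly and returns a URL whose
// host is the cluster IP, so no DNS lookup is needed to dial it.
func resolveByClusterIP(l serviceLister, namespace, name string) (*url.URL, error) {
	ip, ok := l.ClusterIP(namespace, name)
	if !ok {
		return nil, fmt.Errorf("service %q not found", name)
	}
	return &url.URL{Scheme: "https", Host: net.JoinHostPort(ip, strconv.Itoa(443))}, nil
}

func main() {
	// Assumed data: one service with a made-up cluster IP.
	services := staticLister{"default/myservice": "10.0.0.12"}

	fmt.Println(resolveByDNSName("default", "myservice"))

	u, err := resolveByClusterIP(services, "default", "myservice")
	fmt.Println(u, err)

	_, err = resolveByClusterIP(services, "default", "missing")
	fmt.Println(err)
}
```

With the cluster-IP strategy, a reference to a non-existent service fails at lookup time with a "not found" error rather than a DNS error, which matches the check suggested in the workaround above. A real client dialing the IP would presumably still need to verify the server certificate against the service's DNS name; the sketch does not cover that.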

The fact that it works with the caBundle field looks coincidental/accidental, but is interesting. I couldn't figure out where the difference in behavior was coming from. 🙂

Environment:

  • Kubernetes version (use kubectl version): 1.9.7
khogeland added the kind/bug label Jan 15, 2019
k8s-ci-robot added the needs-sig label Jan 15, 2019
@khogeland
Author

@kubernetes/sig-api-machinery-bugs (I think)

k8s-ci-robot added the sig/api-machinery label and removed the needs-sig label Jan 15, 2019
@k8s-ci-robot
Contributor

@khogeland: Reiterating the mentions to trigger a notification:
@kubernetes/sig-api-machinery-bugs

In response to this:

@kubernetes/sig-api-machinery-bugs (I think)


Member

liggitt commented Jan 16, 2019

is this reproducible with 1.11+?

https://github.com/liggitt/kubernetes/blob/master/staging/src/k8s.io/kube-aggregator/pkg/apiserver/resolvers.go#L49-L63 is the implementation that resolves to cluster IPs

liggitt added the triage/needs-information label Jan 16, 2019
Author

khogeland commented Jan 16, 2019

That resolver is also present in 1.9, so I assume so, unless the API server is configured differently? I'll try to repro this on 1.13.

@khogeland
Author

Cool, looks like this has been fixed since 1.9! I guess the wiring did change.

1.9.7

$ kubectl create -f pod.yaml
Error from server (InternalError): error when creating "pod.yaml": Internal error occurred: failed calling admission webhook "test.sfdc.net": Post https://test.default.svc:443/: dial tcp: lookup test.default.svc on 192.168.122.1:53: no such host

1.13.2

$ kubectl create -f pod.yaml
Error from server (InternalError): error when creating "pod.yaml": Internal error occurred: failed calling webhook "test.sfdc.net": Post https://test.default.svc:443/?timeout=30s: service "test" not found
