Use full URL (with cluster domain name) in admission webhook #107446

antoineozenne · 2022-01-10T10:37:30Z

What happened?

My cluster is behind a corporate proxy. Using kube-prometheus-stack, the apiserver raises an error when calling the mutating/validating admission webhook with a POST on https://kube-prometheus-stack-operator.kube-prometheus-stack.svc:443/admission-prometheusrules/validate?timeout=10s. This is because the wildcard .svc is not in the NO_PROXY environment variable of the apiserver. Instead, all my NO_PROXY environment variables contain the full cluster domain .svc.cluster.local, because this is what is used in the large majority of cases.

What did you expect to happen?

Can the apiserver use the full cluster domain name when calling webhooks? In my case, the URL would then be: https://kube-prometheus-stack-operator.kube-prometheus-stack.svc.cluster.local:443/admission-prometheusrules/validate?timeout=10s.
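The difference between the two host names can be checked against Go's standard proxy lookup, which is what a Go HTTP client consults when it honours the proxy environment variables. This is a minimal sketch, not apiserver code: the proxy address `http://proxy.example:3128` and the helper `usesProxy` are assumptions made for illustration, and NO_PROXY is set to the single entry described in the report.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// usesProxy reports whether Go's standard proxy lookup would send a request
// for rawURL through the proxy. The proxy address and the NO_PROXY value are
// assumptions chosen to mirror the report, not values from a real cluster.
func usesProxy(rawURL string) (bool, error) {
	// net/http reads these variables only once per process, so they are set
	// to the same values on every call.
	os.Setenv("HTTPS_PROXY", "http://proxy.example:3128")
	os.Setenv("NO_PROXY", ".svc.cluster.local")

	req, err := http.NewRequest(http.MethodPost, rawURL, nil)
	if err != nil {
		return false, err
	}
	proxyURL, err := http.ProxyFromEnvironment(req)
	if err != nil {
		return false, err
	}
	return proxyURL != nil, nil
}

func main() {
	for _, u := range []string{
		"https://kube-prometheus-stack-operator.kube-prometheus-stack.svc:443/admission-prometheusrules/validate?timeout=10s",
		"https://kube-prometheus-stack-operator.kube-prometheus-stack.svc.cluster.local:443/admission-prometheusrules/validate?timeout=10s",
	} {
		proxied, err := usesProxy(u)
		fmt.Println(proxied, err)
	}
	// Prints "true <nil>" for the short .svc name and "false <nil>" for the
	// fully qualified name.
}
```

NO_PROXY entries are matched as host name suffixes, so `.svc.cluster.local` does not cover a name that ends in `.svc`.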

How can we reproduce it (as minimally and precisely as possible)?

Just use an admission webhook in a cluster behind a corporate proxy.

Anything else we need to know?

I know as a workaround, we can use url instead of service according to https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#webhookclientconfig-v1-admissionregistration-k8s-io, but it's less user-friendly and less flexible.

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.11", GitCommit:"27522a29febbcc4badac257763044d0d90c11abd", GitTreeState:"clean", BuildDate:"2021-09-15T19:21:44Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.11", GitCommit:"27522a29febbcc4badac257763044d0d90c11abd", GitTreeState:"clean", BuildDate:"2021-09-15T19:16:25Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}


antoineozenne · 2022-01-10T10:39:04Z

/sig api-machinery

aojea · 2022-01-10T12:20:23Z

hmm, I think that the problem is that the apiserver doesn't have access to the cluster.domain value, since it is configured on the kubelets only, but it will be good if somebody else can confirm this

--cluster-domain string

antoineozenne · 2022-01-10T14:42:17Z

I just saw this in https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#webhookclientconfig-v1-admissionregistration-k8s-io for url:

The host should not refer to a service running in the cluster; use the service field instead. The host might be resolved via external DNS in some apiservers (e.g., kube-apiserver cannot resolve in-cluster DNS as that would be a layering violation).

So it looks like it is not possible to use url in my case, unless maybe using an ExternalName to resolve the host, but this is ugly...

aojea · 2022-01-10T15:16:41Z

Can you check your apiserver configuration for enable-aggregator-routing?
I think in your case it should be false, because if it is true I think the webhook will try to connect directly to the IP and won't do any DNS resolution.
Can you confirm this?

antoineozenne · 2022-01-10T16:45:23Z

Yes, I confirm enable-aggregator-routing is not set, so false. Do I need to set this parameter? What are the other impacts/consequences?

aojea · 2022-01-10T16:53:19Z

@antoineozenne actually I can reproduce it.
A workaround is adding "svc" to NO_PROXY
I need to investigate more
/assign

aojea · 2022-01-10T17:58:44Z

This comment is just for reference:

apiserver initializes a resolver for use on the aggregated-apiserver and webhook

kubernetes/cmd/kube-apiserver/app/server.go

Line 455 in 8cc7d14

    
    serviceResolver = buildServiceResolver(s.EnableAggregatorRouting, genericConfig.LoopbackClientConfig.Host, versionedInformers)

the resolver, depending on the enable-aggregator-routing variable, can resolve a service to an endpoint or to the clusterIP. The kubernetes.default service is always resolved to the current host

kubernetes/cmd/kube-apiserver/app/server.go

Lines 627 to 644 in 8cc7d14

    
    func buildServiceResolver(enabledAggregatorRouting bool, hostname string, informer clientgoinformers.SharedInformerFactory) webhook.ServiceResolver {
    	var serviceResolver webhook.ServiceResolver
    	if enabledAggregatorRouting {
    		serviceResolver = aggregatorapiserver.NewEndpointServiceResolver(
    			informer.Core().V1().Services().Lister(),
    			informer.Core().V1().Endpoints().Lister(),
    		)
    	} else {
    		serviceResolver = aggregatorapiserver.NewClusterIPServiceResolver(
    			informer.Core().V1().Services().Lister(),
    		)
    	}
    	// resolve kubernetes.default.svc locally
    	if localHost, err := url.Parse(hostname); err == nil {
    		serviceResolver = aggregatorapiserver.NewLoopbackServiceResolver(serviceResolver, localHost)
    	}
    	return serviceResolver
    }

Webhooks initialize a client based on the configuration and override the dialer with the custom resolver

kubernetes/staging/src/k8s.io/apiserver/pkg/util/webhook/client.go

Lines 161 to 200 in 9c4905e

    
    if cc.Service != nil {
    	port := cc.Service.Port
    	if port == 0 {
    		// Default to port 443 if no service port is specified
    		port = 443
    	}
    	restConfig, err := cm.authInfoResolver.ClientConfigForService(cc.Service.Name, cc.Service.Namespace, int(port))
    	if err != nil {
    		return nil, err
    	}
    	cfg := rest.CopyConfig(restConfig)
    	serverName := cc.Service.Name + "." + cc.Service.Namespace + ".svc"
    	host := net.JoinHostPort(serverName, strconv.Itoa(int(port)))
    	cfg.Host = "https://" + host
    	cfg.APIPath = cc.Service.Path
    	// Set the server name if not already set
    	if len(cfg.TLSClientConfig.ServerName) == 0 {
    		cfg.TLSClientConfig.ServerName = serverName
    	}
    	delegateDialer := cfg.Dial
    	if delegateDialer == nil {
    		var d net.Dialer
    		delegateDialer = d.DialContext
    	}
    	cfg.Dial = func(ctx context.Context, network, addr string) (net.Conn, error) {
    		if addr == host {
    			u, err := cm.serviceResolver.ResolveEndpoint(cc.Service.Namespace, cc.Service.Name, port)
    			if err != nil {
    				return nil, err
    			}
    			addr = u.Host
    		}
    		return delegateDialer(ctx, network, addr)
    	}
    	return complete(cfg)
    }

It seems (I have to confirm) golang uses the proxy on Transport, so the custom resolvers are never used because the Dialer receives the proxy URL
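The suspicion above can be checked in isolation with the standard library. The sketch below is not apiserver code: `dialedAddr`, the host `webhook.ns.svc`, and the proxy `proxy.example:3128` are made-up names for illustration. It installs a dialer that only records the address it is asked to connect to, then issues one request with and one without a proxy configured on the `http.Transport`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/url"
)

// dialedAddr returns the address that http.Transport passes to its dialer
// when asked to fetch target. An empty proxy string means no proxy is set.
// The dialer never opens a connection; it records the address and fails.
func dialedAddr(target, proxy string) string {
	var seen string
	tr := &http.Transport{
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			seen = addr
			return nil, errors.New("dial stopped on purpose")
		},
	}
	if proxy != "" {
		u, err := url.Parse(proxy)
		if err != nil {
			return ""
		}
		tr.Proxy = http.ProxyURL(u)
	}
	client := &http.Client{Transport: tr}
	if resp, err := client.Get(target); err == nil {
		resp.Body.Close()
	}
	return seen
}

func main() {
	target := "https://webhook.ns.svc:443/validate"
	fmt.Println(dialedAddr(target, ""))                          // webhook.ns.svc:443
	fmt.Println(dialedAddr(target, "http://proxy.example:3128")) // proxy.example:3128
}
```

With a proxy in place the dialer is handed the proxy's address, so a check such as `addr == host` in a wrapping dialer would not match and the service resolver would be skipped.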

The aggregator, however, uses an IP on the URL host

kubernetes/staging/src/k8s.io/kube-aggregator/pkg/apiserver/handler_proxy.go

Lines 144 to 152 in ea07644

    
    rloc, err := r.serviceResolver.ResolveEndpoint(handlingInfo.serviceNamespace, handlingInfo.serviceName, handlingInfo.servicePort)
    if err != nil {
    	klog.Errorf("error resolving %s/%s: %v", handlingInfo.serviceNamespace, handlingInfo.serviceName, err)
    	proxyError(w, req, "service unavailable", http.StatusServiceUnavailable)
    	return
    }
    location.Host = rloc.Host
    location.Path = req.URL.Path
    location.RawQuery = req.URL.Query().Encode()

fedebongio · 2022-01-11T21:22:51Z

thanks @aojea for taking a look!
/triage accepted

aojea · 2022-01-12T11:48:59Z

I have to narrow down this function

kubernetes/staging/src/k8s.io/apimachinery/pkg/util/net/http.go

Lines 112 to 114 in 45f77ca

    
    // http.ProxyFromEnvironment doesn't respect CIDRs and that makes it impossible to exclude things like pod and service IPs from proxy settings
    // ProxierWithNoProxyCIDR allows CIDR rules in NO_PROXY
    t.Proxy = NewProxierWithNoProxyCIDR(http.ProxyFromEnvironment)

red herring 😓

aojea · 2022-01-12T22:24:13Z

A workaround is adding "svc" to NO_PROXY

@antoineozenne there is a chance that unconditionally bypassing the proxy for webhooks could break some clusters
#107501 (comment)

Is it not possible to add svc to the NO_PROXY variable, e.g. NO_PROXY=cluster.local,svc?

antoineozenne · 2022-01-13T09:36:57Z

@aojea thank you for your work! Yes, I have the possibility to add .svc to the NO_PROXY variable, but I would avoid that because in some environments (private providers), .svc is used in the infra...

If I understand, you proposed to bypass the proxy when using Services. It seems a bit much to me. Can we naively and simply use .svc.<cluster domain> (by providing it in the apiserver conf) to call webhooks?
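As a rough illustration of that suggestion, the host name construction quoted earlier from client.go could take an optional domain suffix. This is only a sketch: `webhookHost` and its `clusterDomain` parameter are hypothetical, and the code quoted in this thread shows no such setting on the apiserver.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// webhookHost builds the host:port for a Service-backed webhook.
// clusterDomain is a hypothetical setting; an empty value keeps the
// existing "<name>.<namespace>.svc" form.
func webhookHost(name, namespace, clusterDomain string, port int32) string {
	serverName := name + "." + namespace + ".svc"
	if clusterDomain != "" {
		serverName += "." + clusterDomain
	}
	return net.JoinHostPort(serverName, strconv.Itoa(int(port)))
}

func main() {
	fmt.Println(webhookHost("kube-prometheus-stack-operator", "kube-prometheus-stack", "", 443))
	// kube-prometheus-stack-operator.kube-prometheus-stack.svc:443
	fmt.Println(webhookHost("kube-prometheus-stack-operator", "kube-prometheus-stack", "cluster.local", 443))
	// kube-prometheus-stack-operator.kube-prometheus-stack.svc.cluster.local:443
}
```

In the quoted client code the same string also becomes the TLS server name, so a webhook's serving certificate would have to cover the longer name as well.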

k8s-triage-robot · 2022-05-24T23:37:45Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

antoineozenne · 2022-05-25T11:19:30Z

/remove-lifecycle stale

k8s-triage-robot · 2022-08-23T11:29:42Z

/lifecycle stale

antoineozenne · 2022-08-24T22:31:09Z

/remove-lifecycle stale

k8s-triage-robot · 2022-11-22T22:50:05Z

/lifecycle stale

antoineozenne · 2022-11-28T13:44:45Z

/remove-lifecycle stale

sftim · 2022-12-29T19:17:05Z

the apiserver doesn't have access to the cluster.domain value

This is what I believe too. Separately, it's a frustration because it also means that we can't provide the cluster domain name to a client via the API (because the API server doesn't have that information).

k8s-triage-robot · 2024-01-20T01:08:47Z

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

seans3 · 2024-02-09T05:36:49Z

/triage accepted

antoineozenne added the kind/bug Categorizes issue or PR as related to a bug. label Jan 10, 2022

k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 10, 2022

k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 10, 2022

k8s-ci-robot assigned aojea Jan 10, 2022

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 11, 2022

aojea mentioned this issue Jan 12, 2022

Don't proxy apiserver webhook requests when using Services #107501

Closed

liggitt added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Feb 23, 2022
