
client.New() takes more than 10 sec when far from apiserver (apiutil.NewDiscoveryRESTMapper) #537

Closed
guymguym opened this issue Jul 24, 2019 · 11 comments
Labels: lifecycle/frozen, priority/important-longterm

@guymguym

I am developing https://github.com/noobaa/noobaa-operator and running it as a CLI tool and as a remote operator for testing.

When I work with a local cluster like minikube I don't notice anything, but when I work against a cluster across the Atlantic it suddenly takes a long time just to initialize the client.

I wrote a short program that shows exactly where the problem is:

package main

import (
	"log"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/restmapper"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

func main() {
	log.Printf("Calling config.GetConfigOrDie() ...")
	config := config.GetConfigOrDie()
	log.Printf("Calling apiutil.NewDiscoveryRESTMapper() ...")
	NewDiscoveryRESTMapper(config)
	log.Printf("Done.")
}

func NewDiscoveryRESTMapper(c *rest.Config) (meta.RESTMapper, error) {
	// Get a mapper
	log.Printf("Calling discovery.NewDiscoveryClientForConfigOrDie() ...")
	dc := discovery.NewDiscoveryClientForConfigOrDie(c)
	log.Printf("Calling restmapper.GetAPIGroupResources() ...")
	gr, err := GetAPIGroupResources(dc)
	if err != nil {
		return nil, err
	}
	log.Printf("Calling restmapper.NewDiscoveryRESTMapper() ...")
	return restmapper.NewDiscoveryRESTMapper(gr), nil
}

// GetAPIGroupResources uses the provided discovery client to gather
// discovery information and populate a slice of APIGroupResources.
func GetAPIGroupResources(cl discovery.DiscoveryInterface) ([]*restmapper.APIGroupResources, error) {
	log.Printf("Calling cl.ServerGroups() ...")
	apiGroups, err := cl.ServerGroups()
	if err != nil {
		if apiGroups == nil || len(apiGroups.Groups) == 0 {
			return nil, err
		}
		// TODO track the errors and update callers to handle partial errors.
	}
	var result []*restmapper.APIGroupResources
	for _, group := range apiGroups.Groups {
		groupResources := &restmapper.APIGroupResources{
			Group:              group,
			VersionedResources: make(map[string][]metav1.APIResource),
		}
		for _, version := range group.Versions {
			log.Printf("Calling cl.ServerResourcesForGroupVersion(%s) ...", version.GroupVersion)
			resources, err := cl.ServerResourcesForGroupVersion(version.GroupVersion)
			if err != nil {
				// continue as best we can
				// TODO track the errors and update callers to handle partial errors.
				if resources == nil || len(resources.APIResources) == 0 {
					continue
				}
			}
			groupResources.VersionedResources[version.Version] = resources.APIResources
		}
		result = append(result, groupResources)
	}
	return result, nil
}

And this is the output of running it against a far-away cluster. Notice how the timestamps jump by whole seconds between log lines:

$ go run a.go
2019/07/24 23:27:21 Calling config.GetConfigOrDie() ...
2019/07/24 23:27:21 Calling apiutil.NewDiscoveryRESTMapper() ...
2019/07/24 23:27:21 Calling discovery.NewDiscoveryClientForConfigOrDie() ...
2019/07/24 23:27:21 Calling restmapper.GetAPIGroupResources() ...
2019/07/24 23:27:21 Calling cl.ServerGroups() ...
2019/07/24 23:27:22 Calling cl.ServerResourcesForGroupVersion(v1) ...
2019/07/24 23:27:22 Calling cl.ServerResourcesForGroupVersion(apiregistration.k8s.io/v1) ...
2019/07/24 23:27:22 Calling cl.ServerResourcesForGroupVersion(apiregistration.k8s.io/v1beta1) ...
2019/07/24 23:27:23 Calling cl.ServerResourcesForGroupVersion(extensions/v1beta1) ...
2019/07/24 23:27:23 Calling cl.ServerResourcesForGroupVersion(apps/v1) ...
2019/07/24 23:27:23 Calling cl.ServerResourcesForGroupVersion(apps/v1beta2) ...
2019/07/24 23:27:23 Calling cl.ServerResourcesForGroupVersion(apps/v1beta1) ...
2019/07/24 23:27:23 Calling cl.ServerResourcesForGroupVersion(events.k8s.io/v1beta1) ...
2019/07/24 23:27:23 Calling cl.ServerResourcesForGroupVersion(authentication.k8s.io/v1) ...
2019/07/24 23:27:24 Calling cl.ServerResourcesForGroupVersion(authentication.k8s.io/v1beta1) ...
2019/07/24 23:27:24 Calling cl.ServerResourcesForGroupVersion(authorization.k8s.io/v1) ...
2019/07/24 23:27:24 Calling cl.ServerResourcesForGroupVersion(authorization.k8s.io/v1beta1) ...
2019/07/24 23:27:24 Calling cl.ServerResourcesForGroupVersion(autoscaling/v1) ...
2019/07/24 23:27:24 Calling cl.ServerResourcesForGroupVersion(autoscaling/v2beta1) ...
2019/07/24 23:27:24 Calling cl.ServerResourcesForGroupVersion(autoscaling/v2beta2) ...
2019/07/24 23:27:25 Calling cl.ServerResourcesForGroupVersion(batch/v1) ...
2019/07/24 23:27:25 Calling cl.ServerResourcesForGroupVersion(batch/v1beta1) ...
2019/07/24 23:27:25 Calling cl.ServerResourcesForGroupVersion(certificates.k8s.io/v1beta1) ...
2019/07/24 23:27:25 Calling cl.ServerResourcesForGroupVersion(networking.k8s.io/v1) ...
2019/07/24 23:27:25 Calling cl.ServerResourcesForGroupVersion(policy/v1beta1) ...
2019/07/24 23:27:25 Calling cl.ServerResourcesForGroupVersion(rbac.authorization.k8s.io/v1) ...
2019/07/24 23:27:26 Calling cl.ServerResourcesForGroupVersion(rbac.authorization.k8s.io/v1beta1) ...
2019/07/24 23:27:26 Calling cl.ServerResourcesForGroupVersion(storage.k8s.io/v1) ...
2019/07/24 23:27:26 Calling cl.ServerResourcesForGroupVersion(storage.k8s.io/v1beta1) ...
2019/07/24 23:27:26 Calling cl.ServerResourcesForGroupVersion(admissionregistration.k8s.io/v1beta1) ...
2019/07/24 23:27:26 Calling cl.ServerResourcesForGroupVersion(apiextensions.k8s.io/v1beta1) ...
2019/07/24 23:27:26 Calling cl.ServerResourcesForGroupVersion(scheduling.k8s.io/v1beta1) ...
2019/07/24 23:27:27 Calling cl.ServerResourcesForGroupVersion(coordination.k8s.io/v1beta1) ...
2019/07/24 23:27:27 Calling cl.ServerResourcesForGroupVersion(apps.openshift.io/v1) ...
2019/07/24 23:27:27 Calling cl.ServerResourcesForGroupVersion(authorization.openshift.io/v1) ...
2019/07/24 23:27:27 Calling cl.ServerResourcesForGroupVersion(build.openshift.io/v1) ...
2019/07/24 23:27:27 Calling cl.ServerResourcesForGroupVersion(image.openshift.io/v1) ...
2019/07/24 23:27:27 Calling cl.ServerResourcesForGroupVersion(oauth.openshift.io/v1) ...
2019/07/24 23:27:28 Calling cl.ServerResourcesForGroupVersion(project.openshift.io/v1) ...
2019/07/24 23:27:28 Calling cl.ServerResourcesForGroupVersion(quota.openshift.io/v1) ...
2019/07/24 23:27:28 Calling cl.ServerResourcesForGroupVersion(route.openshift.io/v1) ...
2019/07/24 23:27:28 Calling cl.ServerResourcesForGroupVersion(security.openshift.io/v1) ...
2019/07/24 23:27:28 Calling cl.ServerResourcesForGroupVersion(template.openshift.io/v1) ...
2019/07/24 23:27:29 Calling cl.ServerResourcesForGroupVersion(user.openshift.io/v1) ...
2019/07/24 23:27:29 Calling cl.ServerResourcesForGroupVersion(packages.operators.coreos.com/v1) ...
2019/07/24 23:27:29 Calling cl.ServerResourcesForGroupVersion(config.openshift.io/v1) ...
2019/07/24 23:27:29 Calling cl.ServerResourcesForGroupVersion(operator.openshift.io/v1) ...
2019/07/24 23:27:29 Calling cl.ServerResourcesForGroupVersion(autoscaling.openshift.io/v1) ...
2019/07/24 23:27:29 Calling cl.ServerResourcesForGroupVersion(autoscaling.openshift.io/v1beta1) ...
2019/07/24 23:27:30 Calling cl.ServerResourcesForGroupVersion(ceph.rook.io/v1) ...
2019/07/24 23:27:30 Calling cl.ServerResourcesForGroupVersion(cloudcredential.openshift.io/v1) ...
2019/07/24 23:27:30 Calling cl.ServerResourcesForGroupVersion(imageregistry.operator.openshift.io/v1) ...
2019/07/24 23:27:30 Calling cl.ServerResourcesForGroupVersion(k8s.cni.cncf.io/v1) ...
2019/07/24 23:27:30 Calling cl.ServerResourcesForGroupVersion(machineconfiguration.openshift.io/v1) ...
2019/07/24 23:27:30 Calling cl.ServerResourcesForGroupVersion(monitoring.coreos.com/v1) ...
2019/07/24 23:27:31 Calling cl.ServerResourcesForGroupVersion(network.openshift.io/v1) ...
2019/07/24 23:27:31 Calling cl.ServerResourcesForGroupVersion(operators.coreos.com/v1) ...
2019/07/24 23:27:31 Calling cl.ServerResourcesForGroupVersion(operators.coreos.com/v1alpha2) ...
2019/07/24 23:27:31 Calling cl.ServerResourcesForGroupVersion(operators.coreos.com/v1alpha1) ...
2019/07/24 23:27:31 Calling cl.ServerResourcesForGroupVersion(samples.operator.openshift.io/v1) ...
2019/07/24 23:27:31 Calling cl.ServerResourcesForGroupVersion(tuned.openshift.io/v1) ...
2019/07/24 23:27:32 Calling cl.ServerResourcesForGroupVersion(velero.io/v1) ...
2019/07/24 23:27:32 Calling cl.ServerResourcesForGroupVersion(healthchecking.openshift.io/v1alpha1) ...
2019/07/24 23:27:32 Calling cl.ServerResourcesForGroupVersion(noobaa.io/v1alpha1) ...
2019/07/24 23:27:32 Calling cl.ServerResourcesForGroupVersion(rook.io/v1alpha2) ...
2019/07/24 23:27:32 Calling cl.ServerResourcesForGroupVersion(machine.openshift.io/v1beta1) ...
2019/07/24 23:27:33 Calling cl.ServerResourcesForGroupVersion(metrics.k8s.io/v1beta1) ...
2019/07/24 23:27:33 Calling restmapper.NewDiscoveryRESTMapper() ...
2019/07/24 23:27:33 Done.

So it's pretty clear that, with a high-latency network path to the apiserver, this serial loop over API groups and versions is really wasteful. We can do much better if we just run these requests in parallel.

I can suggest a PR that processes this list with goroutine workers, if that sounds good to you (a rough sketch follows below).

Also, perhaps a variant of apiutil.NewDiscoveryRESTMapper() could take a filter function to reduce the number of group-versions that are read from the server.

Perhaps there's a better way to avoid all this network-chatty initialization?
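
To make the goroutine-worker idea concrete, here is a rough sketch (illustrative only, the function name is not from any library) of a parallel variant of the GetAPIGroupResources function above; it only needs "sync" added to the imports and keeps the same best-effort error handling:

func ParallelAPIGroupResources(cl discovery.DiscoveryInterface) ([]*restmapper.APIGroupResources, error) {
	apiGroups, err := cl.ServerGroups()
	if err != nil {
		return nil, err
	}
	var (
		wg sync.WaitGroup
		mu sync.Mutex // guards the per-group VersionedResources maps
	)
	result := make([]*restmapper.APIGroupResources, 0, len(apiGroups.Groups))
	for _, group := range apiGroups.Groups {
		gr := &restmapper.APIGroupResources{
			Group:              group,
			VersionedResources: make(map[string][]metav1.APIResource),
		}
		result = append(result, gr)
		for _, version := range group.Versions {
			wg.Add(1)
			// Fetch each group-version concurrently instead of serially.
			go func(gr *restmapper.APIGroupResources, gv, v string) {
				defer wg.Done()
				resources, err := cl.ServerResourcesForGroupVersion(gv)
				if err != nil || resources == nil {
					return // continue as best we can, like the serial version
				}
				mu.Lock()
				gr.VersionedResources[v] = resources.APIResources
				mu.Unlock()
			}(gr, version.GroupVersion, version.Version)
		}
	}
	wg.Wait()
	return result, nil
}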

Thanks!

@guymguym
Author

Maybe a fix should also take #321 into account, since it touches the same code.
I have also hit the problem that creating a new CRD is not reflected in the RESTMapper.

@guymguym
Author

Here is a function I used that, in my case, reduces the time from over 10 seconds to about 2:

// NewFastDiscoveryRESTMapper loads the mapper data from the server with filter and concurrency
// See https://github.com/kubernetes-sigs/controller-runtime/issues/537
func NewFastDiscoveryRESTMapper(config *rest.Config, filter func(*metav1.APIGroup) bool) meta.RESTMapper {
	dc := discovery.NewDiscoveryClientForConfigOrDie(config)
	groups, err := dc.ServerGroups()
	Fatal(err) // Fatal: author's helper, assumed to log and exit on a non-nil error
	wg := wait.Group{}
	totalCount := 0
	pickedCount := 0
	var grs []*restmapper.APIGroupResources
	for _, group := range groups.Groups {
		pick := filter(&group)
		logrus.Debugf("Group: %s %v", group.Name, pick)
		totalCount++
		if !pick {
			continue
		}
		pickedCount++
		gr := &restmapper.APIGroupResources{
			Group:              group,
			VersionedResources: make(map[string][]metav1.APIResource),
		}
		grs = append(grs, gr)
		wg.Start(func() { discoverGroupResources(dc, gr) })
	}
	wg.Wait()
	logrus.Debugf("Picked %d/%d", pickedCount, totalCount)
	return restmapper.NewDiscoveryRESTMapper(grs)
}

func NewFastLazyDiscoveryRESTMapper(config *rest.Config, filter func(*metav1.APIGroup) bool) meta.RESTMapper {
	return meta.NewLazyRESTMapperLoader(func() (meta.RESTMapper, error) {
		return NewFastDiscoveryRESTMapper(config, filter), nil
	})
}

// discoverGroupResources fetches the resource list for every version of a single group.
func discoverGroupResources(dc discovery.DiscoveryInterface, gr *restmapper.APIGroupResources) {
	for _, version := range gr.Group.Versions {
		resources, err := dc.ServerResourcesForGroupVersion(version.GroupVersion)
		Fatal(err)
		gr.VersionedResources[version.Version] = resources.APIResources
	}
}
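
For reference, wiring this into a controller-runtime client could look roughly like the sketch below; the group list in the filter is just an illustration, and client.Options.Mapper is the hook controller-runtime exposes for supplying a custom RESTMapper:

	cfg := config.GetConfigOrDie()
	mapper := NewFastDiscoveryRESTMapper(cfg, func(g *metav1.APIGroup) bool {
		// Keep only the groups this operator actually needs (illustrative list;
		// "" is the core group).
		switch g.Name {
		case "", "apps", "noobaa.io":
			return true
		}
		return false
	})
	c, err := client.New(cfg, client.Options{Mapper: mapper})
	if err != nil {
		log.Fatal(err)
	}
	// use c as usual; only the filtered groups were discovered up front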

@DirectXMan12
Contributor

Yes, parallel discovery, lazy discovery, and dynamic reloading of discovery would all be useful. PRs are definitely welcome, but be prepared for some design discussion.

I'd prefer lazy discovery to explicit filters, though.

@nicolaferraro

@guymguym your approach really makes the difference. Thanks!

It would be awesome to have that in the standard controller.

guymguym added a commit to guymguym/controller-runtime that referenced this issue Sep 10, 2019
This adds concurrency and optional groups filtering
It also dynamically handles the case of new CRDs
by running re-discover on NoMatchError
@guymguym
Author

@nicolaferraro I opened a PR - let's see if it looks OK

guymguym added a commit to guymguym/controller-runtime that referenced this issue Sep 10, 2019
Adds a discovery with concurrency and optional groups filtering.
It also dynamically handles the case of new CRDs,
by running re-discover on NoMatchError.
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 9, 2019
guymguym added a commit to guymguym/controller-runtime that referenced this issue Dec 10, 2019
Adds a DynamicRESTMapperOption for discovery with groups filtering.

Perhaps we can push an optional filter to discovery.DiscoveryClient
which will simplify the solution.
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 8, 2020
@vincepri
Member

/lifecycle frozen
/assign @guymguym

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jan 22, 2020
DirectXMan12 pushed a commit that referenced this issue Jan 31, 2020
use gcr auth proxy image
@vincepri vincepri added this to the v0.5.x milestone Feb 21, 2020
@vincepri
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Feb 21, 2020
guymguym added a commit to guymguym/controller-runtime that referenced this issue Feb 28, 2020
…for NewDynamicRESTMapper()

This new option provides a custom filter used to select only the API groups the controller really needs, instead of the default discovery, which fetches all the groups on the server.

The motivation: controllers typically use a small, fixed set of API groups, while servers expose many dynamic API groups. In such cases, if the controller has non-trivial latency to the API server, or the server is loaded, full group discovery can take a long time (10+ seconds). The filter can reduce the number of groups drastically.

We might want to consider contributing the filter option to
client-go/discovery client.

Signed-off-by: Guy Margalit <guymguym@gmail.com>
@vincepri
Member

vincepri commented May 4, 2023

In 0.15 a new dynamic and lazy RESTMapper implementation has been merged and tested, so I'll go ahead and close this issue for now; if folks have more feedback, or a reproducible test case we can add to CI, that would be great for improving our mapper further.

/close
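
For readers arriving later, a minimal sketch of what that looks like in practice (assuming controller-runtime v0.15 or newer; no custom mapper wiring is needed since the default mapper is now lazy):

	cfg := config.GetConfigOrDie()
	// The default mapper is now lazy: group/version resources are discovered
	// on first use of each kind rather than all at once during client.New.
	c, err := client.New(cfg, client.Options{})
	if err != nil {
		log.Fatal(err)
	}
	_ = c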

@k8s-ci-robot
Contributor

@vincepri: Closing this issue.

In response to this:

In 0.15 a new dynamic and lazy RESTMapper implementation has been merged and tested, so I'll go ahead and close this issue for now; if folks have more feedback, or a reproducible test case we can add to CI, that would be great for improving our mapper further.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
