
client.New() takes more than 10 sec when far from apiserver (apiutil.NewDiscoveryRESTMapper) #537

Closed
guymguym opened this issue Jul 24, 2019 · 11 comments
Labels: lifecycle/frozen, priority/important-longterm

@guymguym

I am developing https://github.com/noobaa/noobaa-operator and running it as a CLI tool and as a remote operator for testing.

When I work with a local cluster like minikube I don't notice anything, but when I work against a cluster across the Atlantic it suddenly takes a long time just to initialize the client.

I wrote a short program that shows exactly where the problem is:

package main

import (
	"log"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/restmapper"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

func main() {
	log.Printf("Calling config.GetConfigOrDie() ...")
	config := config.GetConfigOrDie()
	log.Printf("Calling apiutil.NewDiscoveryRESTMapper() ...")
	NewDiscoveryRESTMapper(config)
	log.Printf("Done.")
}

func NewDiscoveryRESTMapper(c *rest.Config) (meta.RESTMapper, error) {
	// Get a mapper
	log.Printf("Calling discovery.NewDiscoveryClientForConfigOrDie() ...")
	dc := discovery.NewDiscoveryClientForConfigOrDie(c)
	log.Printf("Calling restmapper.GetAPIGroupResources() ...")
	gr, err := GetAPIGroupResources(dc)
	if err != nil {
		return nil, err
	}
	log.Printf("Calling restmapper.NewDiscoveryRESTMapper() ...")
	return restmapper.NewDiscoveryRESTMapper(gr), nil
}

// GetAPIGroupResources uses the provided discovery client to gather
// discovery information and populate a slice of APIGroupResources.
func GetAPIGroupResources(cl discovery.DiscoveryInterface) ([]*restmapper.APIGroupResources, error) {
	log.Printf("Calling cl.ServerGroups() ...")
	apiGroups, err := cl.ServerGroups()
	if err != nil {
		if apiGroups == nil || len(apiGroups.Groups) == 0 {
			return nil, err
		}
		// TODO track the errors and update callers to handle partial errors.
	}
	var result []*restmapper.APIGroupResources
	for _, group := range apiGroups.Groups {
		groupResources := &restmapper.APIGroupResources{
			Group:              group,
			VersionedResources: make(map[string][]metav1.APIResource),
		}
		for _, version := range group.Versions {
			log.Printf("Calling cl.ServerResourcesForGroupVersion(%s) ...", version.GroupVersion)
			resources, err := cl.ServerResourcesForGroupVersion(version.GroupVersion)
			if err != nil {
				// continue as best we can
				// TODO track the errors and update callers to handle partial errors.
				if resources == nil || len(resources.APIResources) == 0 {
					continue
				}
			}
			groupResources.VersionedResources[version.Version] = resources.APIResources
		}
		result = append(result, groupResources)
	}
	return result, nil
}

And this is the output of running it against a far-away cluster. Notice how the timestamps jump by whole seconds between log lines:

$ go run a.go
2019/07/24 23:27:21 Calling config.GetConfigOrDie() ...
2019/07/24 23:27:21 Calling apiutil.NewDiscoveryRESTMapper() ...
2019/07/24 23:27:21 Calling discovery.NewDiscoveryClientForConfigOrDie() ...
2019/07/24 23:27:21 Calling restmapper.GetAPIGroupResources() ...
2019/07/24 23:27:21 Calling cl.ServerGroups() ...
2019/07/24 23:27:22 Calling cl.ServerResourcesForGroupVersion(v1) ...
2019/07/24 23:27:22 Calling cl.ServerResourcesForGroupVersion(apiregistration.k8s.io/v1) ...
2019/07/24 23:27:22 Calling cl.ServerResourcesForGroupVersion(apiregistration.k8s.io/v1beta1) ...
2019/07/24 23:27:23 Calling cl.ServerResourcesForGroupVersion(extensions/v1beta1) ...
2019/07/24 23:27:23 Calling cl.ServerResourcesForGroupVersion(apps/v1) ...
2019/07/24 23:27:23 Calling cl.ServerResourcesForGroupVersion(apps/v1beta2) ...
2019/07/24 23:27:23 Calling cl.ServerResourcesForGroupVersion(apps/v1beta1) ...
2019/07/24 23:27:23 Calling cl.ServerResourcesForGroupVersion(events.k8s.io/v1beta1) ...
2019/07/24 23:27:23 Calling cl.ServerResourcesForGroupVersion(authentication.k8s.io/v1) ...
2019/07/24 23:27:24 Calling cl.ServerResourcesForGroupVersion(authentication.k8s.io/v1beta1) ...
2019/07/24 23:27:24 Calling cl.ServerResourcesForGroupVersion(authorization.k8s.io/v1) ...
2019/07/24 23:27:24 Calling cl.ServerResourcesForGroupVersion(authorization.k8s.io/v1beta1) ...
2019/07/24 23:27:24 Calling cl.ServerResourcesForGroupVersion(autoscaling/v1) ...
2019/07/24 23:27:24 Calling cl.ServerResourcesForGroupVersion(autoscaling/v2beta1) ...
2019/07/24 23:27:24 Calling cl.ServerResourcesForGroupVersion(autoscaling/v2beta2) ...
2019/07/24 23:27:25 Calling cl.ServerResourcesForGroupVersion(batch/v1) ...
2019/07/24 23:27:25 Calling cl.ServerResourcesForGroupVersion(batch/v1beta1) ...
2019/07/24 23:27:25 Calling cl.ServerResourcesForGroupVersion(certificates.k8s.io/v1beta1) ...
2019/07/24 23:27:25 Calling cl.ServerResourcesForGroupVersion(networking.k8s.io/v1) ...
2019/07/24 23:27:25 Calling cl.ServerResourcesForGroupVersion(policy/v1beta1) ...
2019/07/24 23:27:25 Calling cl.ServerResourcesForGroupVersion(rbac.authorization.k8s.io/v1) ...
2019/07/24 23:27:26 Calling cl.ServerResourcesForGroupVersion(rbac.authorization.k8s.io/v1beta1) ...
2019/07/24 23:27:26 Calling cl.ServerResourcesForGroupVersion(storage.k8s.io/v1) ...
2019/07/24 23:27:26 Calling cl.ServerResourcesForGroupVersion(storage.k8s.io/v1beta1) ...
2019/07/24 23:27:26 Calling cl.ServerResourcesForGroupVersion(admissionregistration.k8s.io/v1beta1) ...
2019/07/24 23:27:26 Calling cl.ServerResourcesForGroupVersion(apiextensions.k8s.io/v1beta1) ...
2019/07/24 23:27:26 Calling cl.ServerResourcesForGroupVersion(scheduling.k8s.io/v1beta1) ...
2019/07/24 23:27:27 Calling cl.ServerResourcesForGroupVersion(coordination.k8s.io/v1beta1) ...
2019/07/24 23:27:27 Calling cl.ServerResourcesForGroupVersion(apps.openshift.io/v1) ...
2019/07/24 23:27:27 Calling cl.ServerResourcesForGroupVersion(authorization.openshift.io/v1) ...
2019/07/24 23:27:27 Calling cl.ServerResourcesForGroupVersion(build.openshift.io/v1) ...
2019/07/24 23:27:27 Calling cl.ServerResourcesForGroupVersion(image.openshift.io/v1) ...
2019/07/24 23:27:27 Calling cl.ServerResourcesForGroupVersion(oauth.openshift.io/v1) ...
2019/07/24 23:27:28 Calling cl.ServerResourcesForGroupVersion(project.openshift.io/v1) ...
2019/07/24 23:27:28 Calling cl.ServerResourcesForGroupVersion(quota.openshift.io/v1) ...
2019/07/24 23:27:28 Calling cl.ServerResourcesForGroupVersion(route.openshift.io/v1) ...
2019/07/24 23:27:28 Calling cl.ServerResourcesForGroupVersion(security.openshift.io/v1) ...
2019/07/24 23:27:28 Calling cl.ServerResourcesForGroupVersion(template.openshift.io/v1) ...
2019/07/24 23:27:29 Calling cl.ServerResourcesForGroupVersion(user.openshift.io/v1) ...
2019/07/24 23:27:29 Calling cl.ServerResourcesForGroupVersion(packages.operators.coreos.com/v1) ...
2019/07/24 23:27:29 Calling cl.ServerResourcesForGroupVersion(config.openshift.io/v1) ...
2019/07/24 23:27:29 Calling cl.ServerResourcesForGroupVersion(operator.openshift.io/v1) ...
2019/07/24 23:27:29 Calling cl.ServerResourcesForGroupVersion(autoscaling.openshift.io/v1) ...
2019/07/24 23:27:29 Calling cl.ServerResourcesForGroupVersion(autoscaling.openshift.io/v1beta1) ...
2019/07/24 23:27:30 Calling cl.ServerResourcesForGroupVersion(ceph.rook.io/v1) ...
2019/07/24 23:27:30 Calling cl.ServerResourcesForGroupVersion(cloudcredential.openshift.io/v1) ...
2019/07/24 23:27:30 Calling cl.ServerResourcesForGroupVersion(imageregistry.operator.openshift.io/v1) ...
2019/07/24 23:27:30 Calling cl.ServerResourcesForGroupVersion(k8s.cni.cncf.io/v1) ...
2019/07/24 23:27:30 Calling cl.ServerResourcesForGroupVersion(machineconfiguration.openshift.io/v1) ...
2019/07/24 23:27:30 Calling cl.ServerResourcesForGroupVersion(monitoring.coreos.com/v1) ...
2019/07/24 23:27:31 Calling cl.ServerResourcesForGroupVersion(network.openshift.io/v1) ...
2019/07/24 23:27:31 Calling cl.ServerResourcesForGroupVersion(operators.coreos.com/v1) ...
2019/07/24 23:27:31 Calling cl.ServerResourcesForGroupVersion(operators.coreos.com/v1alpha2) ...
2019/07/24 23:27:31 Calling cl.ServerResourcesForGroupVersion(operators.coreos.com/v1alpha1) ...
2019/07/24 23:27:31 Calling cl.ServerResourcesForGroupVersion(samples.operator.openshift.io/v1) ...
2019/07/24 23:27:31 Calling cl.ServerResourcesForGroupVersion(tuned.openshift.io/v1) ...
2019/07/24 23:27:32 Calling cl.ServerResourcesForGroupVersion(velero.io/v1) ...
2019/07/24 23:27:32 Calling cl.ServerResourcesForGroupVersion(healthchecking.openshift.io/v1alpha1) ...
2019/07/24 23:27:32 Calling cl.ServerResourcesForGroupVersion(noobaa.io/v1alpha1) ...
2019/07/24 23:27:32 Calling cl.ServerResourcesForGroupVersion(rook.io/v1alpha2) ...
2019/07/24 23:27:32 Calling cl.ServerResourcesForGroupVersion(machine.openshift.io/v1beta1) ...
2019/07/24 23:27:33 Calling cl.ServerResourcesForGroupVersion(metrics.k8s.io/v1beta1) ...
2019/07/24 23:27:33 Calling restmapper.NewDiscoveryRESTMapper() ...
2019/07/24 23:27:33 Done.

So it's pretty clear that, with a high-latency network path to the apiserver, this serial loop over API groups and versions is really wasteful. We can do much better if we just run these requests in parallel.

I can suggest a PR that processes this list with goroutine workers, if that sounds good to you (a rough sketch follows below).

Also, perhaps a variant of apiutil.NewDiscoveryRESTMapper() could take a filter function to reduce the number of group-versions that are read from the server.

Perhaps there's a better way to avoid all this network-chatty initialization?
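
To make the goroutine-worker idea concrete, here is a rough sketch (illustrative only, the function name is not from any library) of a parallel variant of the GetAPIGroupResources function above; it only needs "sync" added to the imports and keeps the same best-effort error handling:

func ParallelAPIGroupResources(cl discovery.DiscoveryInterface) ([]*restmapper.APIGroupResources, error) {
	apiGroups, err := cl.ServerGroups()
	if err != nil {
		return nil, err
	}
	var (
		wg sync.WaitGroup
		mu sync.Mutex // guards the per-group VersionedResources maps
	)
	result := make([]*restmapper.APIGroupResources, 0, len(apiGroups.Groups))
	for _, group := range apiGroups.Groups {
		gr := &restmapper.APIGroupResources{
			Group:              group,
			VersionedResources: make(map[string][]metav1.APIResource),
		}
		result = append(result, gr)
		for _, version := range group.Versions {
			wg.Add(1)
			// Fetch each group-version concurrently instead of serially.
			go func(gr *restmapper.APIGroupResources, gv, v string) {
				defer wg.Done()
				resources, err := cl.ServerResourcesForGroupVersion(gv)
				if err != nil || resources == nil {
					return // continue as best we can, like the serial version
				}
				mu.Lock()
				gr.VersionedResources[v] = resources.APIResources
				mu.Unlock()
			}(gr, version.GroupVersion, version.Version)
		}
	}
	wg.Wait()
	return result, nil
}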

Thanks!

@guymguym
Author

Maybe a fix should also take #321 into account, since it touches the same code.
I have also hit the problem that creating a new CRD is not reflected in the RESTMapper.

@guymguym
Author

Here is a function I used that, in my case, reduces the time from over 10 seconds to about 2:

// NewFastDiscoveryRESTMapper loads the mapper data from the server with filter and concurrency
// See https://github.com/kubernetes-sigs/controller-runtime/issues/537
func NewFastDiscoveryRESTMapper(config *rest.Config, filter func(*metav1.APIGroup) bool) meta.RESTMapper {
	dc := discovery.NewDiscoveryClientForConfigOrDie(config)
	groups, err := dc.ServerGroups()
	Fatal(err) // Fatal: author's helper, assumed to log and exit on a non-nil error
	wg := wait.Group{}
	totalCount := 0
	pickedCount := 0
	var grs []*restmapper.APIGroupResources
	for _, group := range groups.Groups {
		pick := filter(&group)
		logrus.Debugf("Group: %s %v", group.Name, pick)
		totalCount++
		if !pick {
			continue
		}
		pickedCount++
		gr := &restmapper.APIGroupResources{
			Group:              group,
			VersionedResources: make(map[string][]metav1.APIResource),
		}
		grs = append(grs, gr)
		wg.Start(func() { discoverGroupResources(dc, gr) })
	}
	wg.Wait()
	logrus.Debugf("Picked %d/%d", pickedCount, totalCount)
	return restmapper.NewDiscoveryRESTMapper(grs)
}

func NewFastLazyDiscoveryRESTMapper(config *rest.Config, filter func(*metav1.APIGroup) bool) meta.RESTMapper {
	return meta.NewLazyRESTMapperLoader(func() (meta.RESTMapper, error) {
		return NewFastDiscoveryRESTMapper(config, filter), nil
	})
}

// discoverGroupResources fetches the resource list for every version of a single group.
func discoverGroupResources(dc discovery.DiscoveryInterface, gr *restmapper.APIGroupResources) {
	for _, version := range gr.Group.Versions {
		resources, err := dc.ServerResourcesForGroupVersion(version.GroupVersion)
		Fatal(err)
		gr.VersionedResources[version.Version] = resources.APIResources
	}
}
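
For reference, wiring this into a controller-runtime client could look roughly like the sketch below; the group list in the filter is just an illustration, and client.Options.Mapper is the hook controller-runtime exposes for supplying a custom RESTMapper:

	cfg := config.GetConfigOrDie()
	mapper := NewFastDiscoveryRESTMapper(cfg, func(g *metav1.APIGroup) bool {
		// Keep only the groups this operator actually needs (illustrative list;
		// "" is the core group).
		switch g.Name {
		case "", "apps", "noobaa.io":
			return true
		}
		return false
	})
	c, err := client.New(cfg, client.Options{Mapper: mapper})
	if err != nil {
		log.Fatal(err)
	}
	// use c as usual; only the filtered groups were discovered up front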

@DirectXMan12
Contributor

Yes, parallel discovery, lazy discovery, and dynamic reloading of discovery would all be useful. PRs are definitely welcome, but be prepared for some design discussion.

I'd prefer lazy discovery to explicit filters, though.

@nicolaferraro

@guymguym your approach really makes the difference. Thanks!

It would be awesome to have that in the standard controller.

guymguym added a commit to guymguym/controller-runtime that referenced this issue Sep 10, 2019
This adds concurrency and optional groups filtering
It also dynamically handles the case of new CRDs
by running re-discover on NoMatchError
@guymguym
Author

@nicolaferraro I opened a PR - let's see if it looks OK

guymguym added a commit to guymguym/controller-runtime that referenced this issue Sep 10, 2019
Adds a discovery with concurrency and optional groups filtering.
It also dynamically handles the case of new CRDs,
by running re-discover on NoMatchError.
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 9, 2019
guymguym added a commit to guymguym/controller-runtime that referenced this issue Dec 10, 2019
Adds a DynamicRESTMapperOption for discovery with groups filtering.

Perhaps we can push an optional filter to discovery.DiscoveryClient
which will simplify the solution.
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 8, 2020
@vincepri
Member

/lifecycle frozen
/assign @guymguym

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jan 22, 2020
DirectXMan12 pushed a commit that referenced this issue Jan 31, 2020
use gcr auth proxy image
@vincepri vincepri added this to the v0.5.x milestone Feb 21, 2020
@vincepri
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Feb 21, 2020
guymguym added a commit to guymguym/controller-runtime that referenced this issue Feb 28, 2020
…for NewDynamicRESTMapper()

This new option provides a custom filter used to select only the API groups the controller really needs, instead of the default discovery, which fetches all the groups on the server.

The motivation: controllers typically use a small, fixed set of API groups, while servers expose many dynamic API groups. In such cases, if the controller has non-trivial latency to the API server, or the server is loaded, full group discovery can take a long time (10+ seconds). The filter can reduce the number of groups drastically.

We might want to consider contributing the filter option to
client-go/discovery client.

Signed-off-by: Guy Margalit <guymguym@gmail.com>
@vincepri
Member

vincepri commented May 4, 2023

In 0.15 a new dynamic and lazy RESTMapper implementation has been merged and tested, so I'll go ahead and close this issue for now; if folks have more feedback, or a reproducible test case we can add to CI, that would be great for improving our mapper further.

/close
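
For readers arriving later, a minimal sketch of what that looks like in practice (assuming controller-runtime v0.15 or newer; no custom mapper wiring is needed since the default mapper is now lazy):

	cfg := config.GetConfigOrDie()
	// The default mapper is now lazy: group/version resources are discovered
	// on first use of each kind rather than all at once during client.New.
	c, err := client.New(cfg, client.Options{})
	if err != nil {
		log.Fatal(err)
	}
	_ = c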

@k8s-ci-robot
Contributor

@vincepri: Closing this issue.

In response to this:

In 0.15 a new dynamic and lazy RESTMapper implementation has been merged and tested, so I'll go ahead and close this issue for now; if folks have more feedback, or a reproducible test case we can add to CI, that would be great for improving our mapper further.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
