Performance degradation starting with v3.8.6 #4100

Closed
shapirus opened this issue Aug 2, 2021 · 17 comments · Fixed by #4568
Labels
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

shapirus commented Aug 2, 2021

Starting with v3.8.6, a serious performance degradation is observed even on a very basic setup. Version 3.8.5 (and older) was fine.

Demonstration:

kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- service.yaml

service.yaml

apiVersion: v1
kind: Service
metadata:
  name: "test"
spec:
  ports:
  - port: 80
    name: http
    targetPort: 80

I performed a test run with versions 3.8.5, 3.8.6, 3.8.7, 3.8.8, 4.2.0. While v3.8.5 is able to build the manifest in about 35 ms, v3.8.6 and v3.8.7 spend about 850 ms, and then starting with v3.8.8 and up to the latest v4.2.0 it takes about one second.
Output is identical in each run, so I filtered it out to make it easier to read.

$ for ver in 3.8.5 3.8.6 3.8.7 3.8.8 4.2.0;do echo;./kustomize-v$ver version;time ./kustomize-v$ver build &>/dev/null;echo;done

{Version:kustomize/v3.8.5 GitCommit:4052cd4fd8c76a17b5f64e32509f3fba9713fe75 BuildDate:2020-10-08T02:45:59Z GoOs:linux GoArch:amd64}

real    0m0.037s
user    0m0.033s
sys     0m0.013s


{Version:kustomize/v3.8.6 GitCommit:c1747439cd8bc956028ad483cdb30d9273c18b24 BuildDate:2020-10-29T23:07:50Z GoOs:linux GoArch:amd64}

real    0m0.863s
user    0m0.926s
sys     0m0.021s


{Version:kustomize/v3.8.7 GitCommit:ad092cc7a91c07fdf63a2e4b7f13fa588a39af4f BuildDate:2020-11-11T23:14:14Z GoOs:linux GoArch:amd64}

real    0m0.853s
user    0m0.906s
sys     0m0.032s


{Version:kustomize/v3.8.8 GitCommit:72262c5e7135045ed51b01e417a7e72f558a22b0 BuildDate:2020-12-10T18:05:35Z GoOs:linux GoArch:amd64}

real    0m0.931s
user    0m1.024s
sys     0m0.013s


{Version:kustomize/v4.2.0 GitCommit:d53a2ad45d04b0264bcee9e19879437d851cb778 BuildDate:2021-06-30T22:49:26Z GoOs:linux GoArch:amd64}

real    0m0.973s
user    0m1.028s
sys     0m0.017s

Note that I do not observe the degradation growing on big setups, when bases and more resources are included: kustomize apparently spends that extra second once at startup and then processes everything quickly. It is still a problem in my case, however, because I have a lot of microservices that each require a separate invocation of kustomize.

Platform
Linux/amd64.

shapirus added the kind/bug label on Aug 2, 2021
k8s-ci-robot (Contributor) commented:

@shapirus: This issue is currently awaiting triage.

SIG CLI takes a lead on issue triage for this repo, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the needs-triage label on Aug 2, 2021
shapirus (Author) commented Aug 2, 2021

Attaching a Dockerfile that reproduces the test run described above when the image is built.

Dockerfile.zip

natasha41575 added the triage/under-consideration label and removed the needs-triage label on Aug 3, 2021
natasha41575 (Contributor) commented:

I'm linking the release notes for kustomize v3.8.6 that might help us narrow down what caused such a big decline in performance:

https://github.com/kubernetes-sigs/kustomize/releases/tag/kyaml%2Fv0.9.3
https://github.com/kubernetes-sigs/kustomize/releases/tag/cmd%2Fconfig%2Fv0.8.4
https://github.com/kubernetes-sigs/kustomize/releases/tag/api%2Fv0.6.4
https://github.com/kubernetes-sigs/kustomize/releases/tag/kustomize%2Fv3.8.6

One possible culprit is 119d7ca, which adds OpenAPI parsing (known to be very slow) when checking if a resource is namespaceable.
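
To see why this would show up as a fixed per-invocation cost, here is a minimal timing sketch (not from this thread; it assumes the kyaml import paths sigs.k8s.io/kustomize/kyaml/openapi and sigs.k8s.io/kustomize/kyaml/yaml, and that IsCertainlyClusterScoped lazily parses the embedded OpenAPI data on first use, as described above):

package main

import (
        "fmt"
        "time"

        "sigs.k8s.io/kustomize/kyaml/openapi"
        "sigs.k8s.io/kustomize/kyaml/yaml"
)

func main() {
        tm := yaml.TypeMeta{APIVersion: "v1", Kind: "Service"}

        // First call: expected to trigger the lazy parse of the embedded
        // OpenAPI JSON, i.e. the suspected fixed startup cost.
        start := time.Now()
        fmt.Println(openapi.IsCertainlyClusterScoped(tm), time.Since(start))

        // Second call: the parsed schema should already be cached, so this
        // ought to return almost immediately.
        start = time.Now()
        fmt.Println(openapi.IsCertainlyClusterScoped(tm), time.Since(start))
}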

ChristianCiach commented Aug 9, 2021

There is a further performance degradation after updating from kustomize 4.1.2 to 4.2.0. My pipeline, which calls kustomize build over 20 times, now always takes ~18.5 seconds, whereas it previously took ~17.0 seconds. I can reproduce this at any time just by switching out the kustomize binary.

I exclusively use kustomize for its configMapGenerators and secretGenerators. I don't use any other features.

ChristianCiach commented Aug 9, 2021

@natasha41575

Bingo! You've found the culprit. In kyaml/openapi/openapi.go I've replaced the function IsCertainlyClusterScoped with a dummy implementation:

func IsCertainlyClusterScoped(typeMeta yaml.TypeMeta) bool {
        return true
}

With this change, the duration of my pipeline went down from 18.5 seconds to just 4.5 seconds.

Obviously, this is no real solution. Unfortunately, I am not a go developer and have almost no idea what I am doing here. But at least we've successfully identified the bottleneck.

ChristianCiach commented Aug 9, 2021

I've replaced my dummy implementation with the hardcoded list that was used before it was replaced in 119d7ca:

var notNamespaceableKinds = []string{
        "APIService",
        "CSIDriver",
        "CSINode",
        "CertificateSigningRequest",
        "Cluster",
        "ClusterRole",
        "ClusterRoleBinding",
        "ComponentStatus",
        "CustomResourceDefinition",
        "MutatingWebhookConfiguration",
        "Namespace",
        "Node",
        "PersistentVolume",
        "PodSecurityPolicy",
        "PriorityClass",
        "RuntimeClass",
        "SelfSubjectAccessReview",
        "SelfSubjectRulesReview",
        "StorageClass",
        "SubjectAccessReview",
        "TokenReview",
        "ValidatingWebhookConfiguration",
        "VolumeAttachment",
}

// IsCertainlyClusterScoped returns true for Node, Namespace, etc. and
// false for Pod, Deployment, etc. and kinds that aren't recognized in the
// openapi data. See:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces
func IsCertainlyClusterScoped(tm yaml.TypeMeta) bool {
        for _, k := range notNamespaceableKinds {
                if k == tm.Kind {
                        return true
                }
        }
        return false
}

My pipeline is still just as fast as with my previous dummy: still 4.5 seconds for just over 20 calls to kustomize build.

I guess this is good enough for my use cases, so I will use my hacked executable for the time being.

ChristianCiach commented Aug 9, 2021

I guess that parsing the huge OpenAPI schema (JSON) takes most of the time. Maybe it would help to convert the JSON to gob (using https://pkg.go.dev/encoding/gob) at build time, so that kustomize could read the pre-converted gob file at runtime?

Edit: Instead of using gob or protobuf, maybe this is a good use case for code generation (https://blog.golang.org/generate). This way we could generate a Go array of non-namespaced kinds and include the generated list in the build.

Sorry for the comment spam.
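
To make the gob idea concrete, here is a rough sketch of the build-time step (the file name clusterscoped.gob, the map layout, and the generator itself are hypothetical, not an existing kustomize tool; a real generator would derive the map from the OpenAPI JSON):

package main

import (
        "encoding/gob"
        "log"
        "os"
)

func main() {
        // Hypothetical: in a real generator this map would be built by
        // parsing the OpenAPI JSON once at build time.
        clusterScoped := map[string]bool{
                "Namespace":   true,
                "Node":        true,
                "ClusterRole": true,
        }

        f, err := os.Create("clusterscoped.gob")
        if err != nil {
                log.Fatal(err)
        }
        defer f.Close()

        // Write only the small precomputed answer kustomize actually needs,
        // so the runtime check can gob-decode this file instead of parsing
        // the full OpenAPI schema on every invocation.
        if err := gob.NewEncoder(f).Encode(clusterScoped); err != nil {
                log.Fatal(err)
        }
}

At runtime, the namespaceability check would then just gob-decode that small file (or, with the go:generate variant, consult a generated Go map) instead of unmarshalling the full schema.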

natasha41575 (Contributor) commented Aug 9, 2021

Another option is to re-add a hardcoded list of non-namespaceable kinds. We may be able to generate it automatically (so that we don't have to maintain the list by hand) with the scripts that fetch the OpenAPI data.

However, OpenAPI parsing being a bottleneck is a common theme, so using gob, protobuf, or some other encoding that is quicker to parse is probably a good idea.

I would gratefully accept any PRs that improve OpenAPI parsing time (with benchmark tests). One requirement, however, is that the public kyaml interface must stay the same, as there are library consumers outside of kustomize that depend on it.

I am seriously considering submitting a PR to revert the commit that removed the hardcoded list for the sake of performance.
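
A benchmark for that parse path could look roughly like the sketch below (hypothetical test code, not from this thread; it assumes openapi.ResetOpenAPI clears the cached schema so that each iteration pays the full parse cost):

package openapi_test

import (
        "testing"

        "sigs.k8s.io/kustomize/kyaml/openapi"
        "sigs.k8s.io/kustomize/kyaml/yaml"
)

// Benchmarks the namespaceability check including the OpenAPI parse;
// resetting the cached schema each iteration simulates the cost that a
// fresh kustomize build invocation pays once at startup.
func BenchmarkIsCertainlyClusterScoped(b *testing.B) {
        tm := yaml.TypeMeta{APIVersion: "apps/v1", Kind: "Deployment"}
        for i := 0; i < b.N; i++ {
                openapi.ResetOpenAPI() // assumed to drop the cached schema
                _ = openapi.IsCertainlyClusterScoped(tm)
        }
}

Something like go test -bench=IsCertainlyClusterScoped ./kyaml/openapi/ would then give before/after numbers for a proposed change.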

natasha41575 added the help wanted label on Aug 9, 2021
natasha41575 added the triage/accepted label and removed the triage/under-consideration label on Aug 30, 2021
natasha41575 (Contributor) commented Sep 9, 2021

#4152 will merge soon, which should improve performance as it effectively undoes 119d7ca. Please comment and reopen if you find that the issue is still unresolved.

shapirus (Author) commented Sep 9, 2021

Awesome, thank you. I will try to test it tomorrow.

shapirus (Author) commented:

I can confirm that the issue is now resolved.

With kustomize built from HEAD:

$ time ~/go/bin/kustomize build
apiVersion: v1
kind: Service
metadata:
  name: test
spec:
  ports:
  - name: http
    port: 80
    targetPort: 80

real    0m0.021s
user    0m0.021s
sys     0m0.003s

ChristianCiach commented Sep 27, 2021

There is an improvement with Kustomize 4.4.0, but not as large as I expected.

I just tested the new kustomize release 4.4.0 against 4.3.0 and also against my hack as described in #4100 (comment).

My pipeline consists of about 20 calls to kustomize build.

Result:

  • 4.3.0: ~19 seconds
  • 4.4.0: ~11 seconds
  • my hacked 4.3.0 build: ~4 seconds

This is strange, because I've reviewed #4152 and it does essentially the same thing as my hack. Maybe a new performance degradation was introduced in 4.4.0?

solsson added a commit to Yolean/ystack that referenced this issue Sep 29, 2021
kubernetes-sigs/kustomize#4100

This version mismatches that embedded with kubectl, but let's assume
that it will make it to Kubernetes 1.23
solsson commented Sep 29, 2021

My pipeline consists of about 20 calls to kustomize build.

We have a similar pipeline and I get the following results:

  • 4.1.2: ~24s
  • 4.4.0: ~4s

shapirus (Author) commented Oct 6, 2021

Unfortunately, while 4.4.0 is much faster on the simple test case I described here, it is still much slower than, for example, 3.5.4 on a more complex setup (still not too complex, just a few resources included), and is roughly comparable to 3.9.4. I still have to keep my Argo CD at kustomize v3.5.4.

I will create a test case that demonstrates the difference later and reopen this issue, or open a new one.

natasha41575 (Contributor) commented:

@shapirus I have left a comment on #2987, which is where I think we should continue the discussion. I was wondering if you had an update, and if so, could you continue this thread over there on that issue?

shapirus (Author) commented Nov 1, 2021

@shapirus I have left a comment on #2987, which is where I think we should continue the discussion. I was wondering if you had an update, and if so, could you continue this thread over there on that issue?

@natasha41575 I have commented on that issue as well. Sounds like the discussion should be moved back here, but that's up to you.

shapirus (Author) commented:

For those who come here from search results: this was resolved by #5076; see #4569 (comment).
