Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve CRD openapi serving speed #86967

Open
roycaihw opened this issue Jan 8, 2020 · 10 comments
Open

Improve CRD openapi serving speed #86967

roycaihw opened this issue Jan 8, 2020 · 10 comments
Assignees
Labels
area/custom-resources kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.

Comments

@roycaihw
Copy link
Member

roycaihw commented Jan 8, 2020

CustomResourcePublishOpenAPI e2e tests are flaky because the time for a given CRD change to get reflected in the openapi spec exceeds the 30s timeout.

Here is a analysis for the worst case-- assuming three CRD changes were made at the same time (change A&B by tests running in parallel, and change C by the current test), it may take up to three 200OK downloads before the current e2e test sees change C in the openapi spec. The 30s timeout can be reached before the test starts the third 200OK download.

Copy of Untitled drawing

Things that we can improve:

  1. openapi service update (for both kube-aggregator and CRD) is most expensive. Adding log shows both the first marshaling and ToProtoBinary can cost more that 5s for kube-aggregator. The time spent also varies a lot even though the spec size should be rather stable (due to CPU pressure?). More efficient ways to do the marshaling and generate proto binary would help.
  2. although the CRD openapi spec is much smaller compared to kube-aggregator, openapi service update still costs 0.5~1.5s to apiextensions-apiserver for each CRD change. Since openapi service update requires lock, the time queues up when there are lots of CRD changes at the same time (e.g. three CRD changes in the image above). Solving the previous point also helps here
  3. apiextensions-apiserver supports serving openapi spec in PB and gzip, but kube-aggregator downloads and deserializes the spec through JSON (which makes sense for aggregated apiservers). Potentially kube-aggregator can use PB when it's talking to apiextensions-apiserver.

cc @liggitt @sttts @wojtek-t

@roycaihw roycaihw added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 8, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 8, 2020
@roycaihw
Copy link
Member Author

roycaihw commented Jan 8, 2020

/sig api-machinery
/area custom-resources

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. area/custom-resources and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 8, 2020
@wojtek-t
Copy link
Member

wojtek-t commented Jan 8, 2020

Adding log shows both the first marshaling and ToProtoBinary can cost more that 5s for kube-aggregator.

Re ToProtoBinary - did you split that into: NewDocument + jsonToYAML and proto.Marshal()?
I guess that the victim may be the first part, not necessary proto.Marshal...

@liggitt liggitt added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. labels Jan 8, 2020
@roycaihw
Copy link
Member Author

roycaihw commented Jan 8, 2020

Re ToProtoBinary - did you split that into: NewDocument + jsonToYAML and proto.Marshal()?

Not I didn't. I will add logging to break that down

@tedyu
Copy link
Contributor

tedyu commented Jan 8, 2020

Looking in the kube-openapi repo, I only found one benchmark under pkg:

func BenchmarkMergeSpecsIgnorePathConflictsWithKubeSpec(b *testing.B) {
pkg//aggregator/aggregator_test.go

Wondering if more benchmark should be added.

@liggitt liggitt added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jan 9, 2020
@liggitt liggitt added this to the v1.18 milestone Jan 9, 2020
@fedebongio
Copy link
Contributor

/assign @roycaihw @sttts

@annajung
Copy link

Hello @roycaihw @sttts 👋 I'm one of the Bug Triage shadows for the 1.18 release. Since this issue is tagged for 1.18, I'd like to check its status. Is this issue intended for milestone 1.18?

Friendly reminder: Code freeze is scheduled for March 5th.

@annajung
Copy link

Hello @roycaihw @sttts 👋 Checking in again. Do you think this will make it in to 1.18 release?

@roycaihw
Copy link
Member Author

@annajung I don't have the cycle to work on it this release. Let's move it to milestone 1.19. Thank you

@annajung
Copy link

/milestone v1.19

@bai
Copy link

bai commented May 29, 2020

👋 Hello from Bug Triage team! Wanted to follow up on this issue with a friendly reminder that the code freeze for 1.19 is starting June 25th (about 4 weeks from now).

As this issue is tagged for 1.19, is it still planned for this release?

@liggitt liggitt removed this from the v1.19 milestone Jul 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/custom-resources kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.
Projects
None yet
Development

No branches or pull requests

9 participants