improve: generate crds in parallel and optimize code #4644
Hi @nicoloboschi, thanks for this PR! This sounds super interesting indeed!
crd-generator/api/src/main/java/io/fabric8/crd/generator/AbstractCustomResourceHandler.java
@nicoloboschi are you still actively working on this? If that's the case, would you mind converting it to a Draft?
It's ready for review.
List<CompletableFuture<Void>> futures = new ArrayList<>();
if (config.specClassName().isPresent()) {
  futures.add(CompletableFuture.runAsync(() -> {
    TypeDefBuilder builder = new TypeDefBuilder(def);
@andreaTP mentioned a change in semantics, which is true.
However, it shouldn't make a difference as in this particular case we are not interested in building something (we completely ignore that), but instead we just need to traverse the object graph.
So, in my book it's acceptable but I would add a comment so that it's clear to people less familiar with the code.
Since we are interested in optimizing things, I would also try to pass both visitors in a single accept invocation (it accepts var-args). Since that would cut the number of graph traversals in half, it might make a difference for large objects.
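A minimal sketch of the single-traversal idea, under simplifying assumptions. `Node`, `PathVisitor`, `CollectingVisitor` and this `accept` signature are hypothetical stand-ins made up for illustration, not the sundrio builder/visitor API; the point is only that one walk of the graph can feed several visitors, each keeping its own result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the sundrio API.
interface PathVisitor {
  void visit(String path);
}

class Node {
  final String name;
  final List<Node> children = new ArrayList<>();

  Node(String name) {
    this.name = name;
  }

  Node add(Node child) {
    children.add(child);
    return this;
  }

  // Walks the tree once and hands every path to each visitor.
  void accept(String prefix, PathVisitor... visitors) {
    String path = prefix.isEmpty() ? name : prefix + "." + name;
    for (PathVisitor visitor : visitors) {
      visitor.visit(path);
    }
    for (Node child : children) {
      child.accept(path, visitors);
    }
  }
}

// Keeps its output inside the visitor itself.
class CollectingVisitor implements PathVisitor {
  final String suffix;
  final List<String> matches = new ArrayList<>();

  CollectingVisitor(String suffix) {
    this.suffix = suffix;
  }

  @Override
  public void visit(String path) {
    if (path.endsWith(suffix)) {
      matches.add(path);
    }
  }
}

class SingleTraversalDemo {
  static Node sampleTree() {
    return new Node("root")
        .add(new Node("spec").add(new Node("replicas")))
        .add(new Node("status").add(new Node("ready")));
  }

  public static void main(String[] args) {
    CollectingVisitor specVisitor = new CollectingVisitor("replicas");
    CollectingVisitor statusVisitor = new CollectingVisitor("ready");
    // One traversal instead of one per visitor.
    sampleTree().accept("", specVisitor, statusVisitor);
    System.out.println(specVisitor.matches + " " + statusVisitor.matches);
  }
}
```

Whether this pays off depends on where the time actually goes: if the per-node visitor work dominates the cost of the walk itself, halving the traversals may change little.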
> However, it shouldn't make a difference as in this particular case we are not interested in building something (we completely ignore that), but instead we just need to traverse the object graph.
Yes, exactly: we just do a LOT of recursive calls, and the only output is stored in the visitor object itself (getPath()).
@iocanel I've already tried the var-args solution, but it doesn't help.
Sharing my dummy benchmark numbers:
- using quarkus-operator-sdk mojo to generate crd
- jdk 17
- Intel mac 6 core
- 8 CRD in the project with ~200 props in total, spec and status classes in each one
- 3 rounds per solution
Results:
- current version: ~40s
- concurrency of 5: ~18s
- var args: ~45s
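For readers less familiar with the pattern, the parallel variant can be sketched roughly as follows. This is a simplified illustration, not the PR's code: the fixed pool of 5 threads is an assumption about what "concurrency of 5" means, and `PathCollector` and `generateAll` are hypothetical names. Each task writes only into its own collector, mirroring the point above that the only output lives in the visitor object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a visitor that stores its own result.
class PathCollector {
  private final String root;
  private final List<String> paths = new ArrayList<>();

  PathCollector(String root) {
    this.root = root;
  }

  // Placeholder for the expensive recursive traversal.
  void traverse() {
    paths.add(root + ".spec");
    paths.add(root + ".status");
  }

  List<String> getPaths() {
    return paths;
  }
}

class ParallelTraversalDemo {
  static List<PathCollector> generateAll(List<String> crdNames, int concurrency) {
    // Assumption: "concurrency of N" is modelled as a fixed pool of N threads.
    ExecutorService pool = Executors.newFixedThreadPool(concurrency);
    try {
      List<PathCollector> collectors = new ArrayList<>();
      List<CompletableFuture<Void>> futures = new ArrayList<>();
      for (String name : crdNames) {
        PathCollector collector = new PathCollector(name);
        collectors.add(collector);
        // No state is shared between tasks: each one owns its collector.
        futures.add(CompletableFuture.runAsync(collector::traverse, pool));
      }
      // Block until every traversal has finished before reading results.
      CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
      return collectors;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    for (PathCollector collector : generateAll(List.of("a", "b", "c"), 5)) {
      System.out.println(collector.getPaths());
    }
  }
}
```

With the numbers above, going from ~40s to ~18s is a speedup of roughly 2.2x on a 6-core machine.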
Overall, looks good.
Added a comment with two suggestions, which I consider nice to have.
Kudos, SonarCloud Quality Gate passed!
@iocanel any chance we can merge this? This is my latest comment with the performance comparison.
Results:
Hi @nicoloboschi and thanks again for your effort!
Hi @nicoloboschi
@manusa I can totally add a flag. I haven't found any existing configuration at the moment. Do you have any suggestions on how to make it configurable? For example, I could add a system property, but I'm not sure how it is supposed to be set if someone is using the
Currently, the CRD generator relies purely on annotations and has no additional configuration:
@andreaTP let me know if I missed something.
I just skimmed through this on the phone, and making parallel execution optional makes a lot of sense. I might be wrong, but it looks like there are a few tiny bits that are not source compatible (which is appropriate to merge before 6.4), and @metacosm would need to have a look and follow up in the Quarkus Operator SDK.
I created the related PR for the Quarkus SDK: quarkiverse/quarkus-operator-sdk#488 (it's only a draft, since we need this to be released first).
Thanks for the follow-ups @nicoloboschi, LGTM.
LGTM but I haven't checked downstream projects yet.
crd-generator/api/src/main/java/io/fabric8/crd/generator/AbstractCustomResourceHandler.java
@gastaldi I've addressed your comments, please review it again.
LGTM. I'd appreciate it if you could squash the commits. Good job!
doc/CRD-generator.md
## Experimental

### Generate CRDs in parallel
Starting from 6.4, it's possible to speed up the CRDs generation by using parallel computation.
6.4.0 is already released. Since this change doesn't break the API, it could be introduced in a 6.4.x patch release.
@@ -68,6 +69,11 @@ public CRDGenerator withOutput(CRDOutput<? extends OutputStream> output) {
    return this;
  }

  public CRDGenerator withParallelGenerationEnabled(boolean parallel) {
Just an idea: what if the ExecutorService is passed instead? That would give better control over the execution.
In general I think this makes sense.
In this specific case, given that CRD generation is a short-running task, I don't believe it's worth managing the thread pool at a higher level.
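To make the trade-off concrete, here is a rough sketch of the two options side by side. `SketchGenerator`, `withExecutor` and the pool size of 4 are hypothetical and exist only for this illustration; only the method name `withParallelGenerationEnabled(boolean)` comes from the diff above, and its body here is an assumption, not the actual implementation.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class for illustration; not the fabric8 CRDGenerator.
class SketchGenerator {
  private boolean parallel;
  private Executor callerExecutor; // hypothetical alternative to the flag

  // Option A: a flag; the generator owns and cleans up its own pool.
  SketchGenerator withParallelGenerationEnabled(boolean parallel) {
    this.parallel = parallel;
    return this;
  }

  // Option B (hypothetical): the caller supplies and manages the executor.
  SketchGenerator withExecutor(Executor executor) {
    this.callerExecutor = executor;
    return this;
  }

  // Runs all tasks and returns how many completed.
  int generate(List<Runnable> tasks) {
    AtomicInteger done = new AtomicInteger();
    if (callerExecutor == null && !parallel) {
      for (Runnable task : tasks) {
        task.run();
        done.incrementAndGet();
      }
      return done.get();
    }
    // Assumed pool size; only created when the caller did not pass an executor.
    ExecutorService owned = callerExecutor == null ? Executors.newFixedThreadPool(4) : null;
    Executor executor = owned != null ? owned : callerExecutor;
    try {
      CompletableFuture<?>[] futures = tasks.stream()
          .map(task -> CompletableFuture.runAsync(task, executor).thenRun(done::incrementAndGet))
          .toArray(CompletableFuture[]::new);
      CompletableFuture.allOf(futures).join();
    } finally {
      if (owned != null) {
        owned.shutdown(); // never shut down a caller-provided executor
      }
    }
    return done.get();
  }
}
```

The flag keeps the pool lifecycle entirely inside the generator, which suits a short-running task; the executor variant gives callers control over thread count and reuse, at the cost of making them responsible for shutdown.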
Fixed the doc, rebased to the current master and squashed @gastaldi
Awesome work, thx!
SonarCloud Quality Gate failed.
Description
In my project I have 10 CRD classes.
I took a CPU flamegraph and found that the use of the Java Streams API adds significant overhead while visiting spec/status classes via reflection.
Also, the visitors can access the CRD in parallel, since the base typeref is stateless.
Changes included:
FYI I did a similar fix in sundrio: sundrio/sundrio#352
The net result of this patch is that generation time dropped from ~30 seconds to ~11 seconds.