
3.2.4.Final is slower compared to 2.15.1.Final #35392

Closed
Karthikrao121 opened this issue Aug 17, 2023 · 24 comments
Labels
kind/bug Something isn't working

Comments

@Karthikrao121

Description

We recently migrated to the latest version of Quarkus. After the upgrade, we are seeing slower startup times compared to 2.15.1.Final.

native (powered by Quarkus 2.15.1.Final) started in 0.308s.
native (powered by Quarkus 3.2.4.Final) started in 0.423s.

Need migration guide for performance enhancements on 3.2.4.Final.

Implementation ideas

No response

@Karthikrao121 Karthikrao121 added the kind/enhancement New feature or request label Aug 17, 2023
@gsmet
Member

gsmet commented Aug 17, 2023

Need migration guide for performance enhancements on 3.2.4.Final.

I think we would first need a lot more data :).

The list of extensions would be a good start. For instance, we know that Hibernate ORM 6 comes with more classes that need to be initialized, which causes a slowdown.

Then I would continue by checking what's happening in JVM mode; if it is significantly slower as well, you could have a look at https://github.com/quarkusio/quarkus/blob/main/TROUBLESHOOTING.md to profile it.
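The TROUBLESHOOTING.md recipe essentially amounts to attaching async-profiler as a JVM agent at startup. A minimal sketch, assuming async-profiler is installed locally (the library path and jar path are placeholders for your setup):

```shell
# Profile JVM-mode startup by loading async-profiler as an agent.
# /path/to/libasyncProfiler.so and the jar path are placeholders.
java \
  -agentpath:/path/to/libasyncProfiler.so=start,event=cpu,file=startup-profile.html \
  -jar target/quarkus-app/quarkus-run.jar
# Stop the app once startup completes; startup-profile.html contains
# the CPU flamegraph. Repeat with each Quarkus version to compare.
```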

Then we might want to actually profile the native executable but I'm not sure how to do that. @zakkak would know if it's possible.

@zakkak
Contributor

zakkak commented Aug 17, 2023

Then we might want to actually profile the native executable but I'm not sure how to do that. @zakkak would know if it's possible.

That's definitely possible, but as you noted it should be the last step :)

You can see some examples in https://quarkus.io/guides/native-reference#inspecting-and-debugging

Furthermore, @galderz and @roberttoyonaga recently wrote a couple of great guides related to profiling native executables:

  1. https://quarkus.io/blog/native-startup-rss-troubleshooting/
  2. https://quarkus.io/blog/profile-and-monitor-native-executables-with-jfr/

@Karthikrao121
Author

These are from 2.15.1.Final:
"Installed features: [amazon-dynamodb, amazon-lambda, amazon-s3, amazon-secretsmanager, amazon-ssm, cdi, logging-json, rest-client-reactive, rest-client-reactive-jackson, resteasy-reactive, resteasy-reactive-jackson, security, smallrye-context-propagation, smallrye-openapi, vertx]"
These are from 3.2.4.Final:
"Installed features: [amazon-dynamodb, amazon-lambda, amazon-s3, amazon-secretsmanager, amazon-ssm, cdi, logging-json, rest-client-reactive, rest-client-reactive-jackson, resteasy-reactive, resteasy-reactive-jackson, security, smallrye-context-propagation, vertx]"

@geoand geoand added kind/bug Something isn't working triage/needs-feedback We are waiting for feedback. and removed kind/enhancement New feature or request triage/needs-triage labels Aug 21, 2023
@maxandersen
Contributor

@Karthikrao121 have you tried some of the suggestions above to troubleshoot where the performance changes occur? Are you only seeing startup changes, or also changes in memory and runtime throughput?

@Karthikrao121
Author

Karthikrao121 commented Aug 28, 2023

@maxandersen memory consumption also increased on 3.2.4.Final.
On 2.15.1.Final: Memory Size: 128 MB, Max Memory Used: 116 MB; startup times are around 400 ms to 500 ms.
On 3.2.4.Final: Memory Size: 256 MB, Max Memory Used: 154 MB; even with the memory increase, startup times are consistently over 600 ms.

@gsmet
Member

gsmet commented Aug 28, 2023

@Karthikrao121

For the startup time, have a look at https://github.com/quarkusio/quarkus/blob/main/TROUBLESHOOTING.md , you should be able to get a CPU profile that would be a good start. Get one for both versions so that we can compare.

For the memory, it would be helpful if you could get a heap dump.

There's a good chance you won't want to make these public but if you're comfortable sharing them with me (and they don't contain sensitive information), you can send me an email with a download link at gsmet at redhat dot com.

The other option is that you have a look yourself and share your insights.

Thanks.

@Karthikrao121
Author

@gsmet please provide your insights on the shared CPU profile details.

@geoand
Contributor

geoand commented Aug 30, 2023

@franz1981 do we have any resources to help @Karthikrao121 answer the question above?
I can obviously point to the AsyncProfiler repo, but I am wondering if we have any of our own resources about using it :)

@franz1981
Contributor

franz1981 commented Aug 30, 2023

Sadly we don't have any Quarkus-specific resources for async-profiler, because it's a kind of "moving target".

Before using async-profiler we first need to understand whether they can observe the same behaviour in JVM mode and, regardless, whether they are running in a container and how many resources they dedicate to it. The latter part is very important, because increased allocation pressure can pass nearly unnoticed with enough available memory (GCs work like that). In short: if they loosen the constraints (CPU/memory), do things get better? If yes, which resource has the biggest impact? That would narrow down whether it is just a memory-related problem (as it seems).

As suggested by @gsmet, if they could collect profiling data and/or use the method from @galderz to check whether more (native) allocations are happening and where, in native mode, that would be great. If they can provide collapsed/folded profiling data for the two versions, we can easily create diff flamegraphs for comparison, which are pretty handy (see https://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html).

If they cannot troubleshoot in native, we have two other options:

  • Is the same behaviour reproducible in JVM mode (in an unconstrained environment, CPU- and memory-wise)? If yes, and it's allocation-related, they can use the startup RSS regression tooling introduced by #35246 #35406 (comment) to provide data about allocations happening at startup in the two cases. That would help to trivially find and fix this, given that the difference in memory used was noticeable.
  • Occam's razor: rule out the easier things to check between the versions which can impact startup time. Context propagation has changed a lot; do they really need it? What happens if they tune it to reduce its overhead? @mkouba told me about a property to do that (maybe @geoand remembers as well). It's just an example; I don't know if context propagation can impact startup time, but I hope it's clear what I mean.

@Karthikrao121
Author

I did provide the async-profiler CPU profiles from both versions to @gsmet. What we observed is that if we increase the memory and CPU, the startup times for 3.2.4.Final are the same as for 2.15.1.Final. But 2.15.1.Final achieves the faster startup times at 128 MB, whereas 3.2.4.Final only matches them at around 700 MB to 1024 MB.

@franz1981
Contributor

franz1981 commented Aug 30, 2023

Which seems to suggest:

  • either the live set has increased and the memory available for allocating young-generation objects is reduced; with less memory, more work is pushed into the slower major/mixed pauses (in JVM mode), and there is more moving/compacting activity from the GC's perspective (and with fewer CPU resources, code compilation kicks in more slowly and application code pauses more frequently)
  • or we allocate more short-lived/temporary objects, and having less memory causes more of them to become prematurely tenured (because they survive longer, if their life cycle is short but not short enough): the effects on CPU resources are similar to the previous point, given that the GC will behave the same

I will stop my wild guesses and wait for @gsmet's analysis.

Thanks for sharing the data with him already

@galderz
Member

galderz commented Aug 31, 2023

But 2.15.1.Final achieves the faster startup times at 128 MB, whereas 3.2.4.Final only matches them at around 700 MB to 1024 MB.

What are these memory sizes you mention? -Xmx values? Container memory size limits?

@galderz
Member

galderz commented Aug 31, 2023

Also, can you post here what you sent to @gsmet?

@Karthikrao121
Author

These are the memory allocation settings on AWS Lambda.

@gsmet
Member

gsmet commented Sep 4, 2023

@Karthikrao121 sorry, I just found your email in my spam folder after my PTO. I think what you sent is dev mode profiling, which makes it hard to pinpoint potential slowdowns in your app itself.
Also, given you found the regression with 3.2.4.Final, could you stick to that version? We have a known regression in 3.3.0, so moving on would make it harder to track what you already noticed in 3.2.4.

What we would need is this one: https://github.com/quarkusio/quarkus/blob/main/TROUBLESHOOTING.md#profiling-application-startup-with-async-profiler , profiling startup by running the application jar.

Send me a flamegraph with 2.x and the same app with 3.2.4.Final.

Thanks!

@franz1981
Contributor

franz1981 commented Sep 4, 2023

@gsmet I'm still on PTO, but to save trouble due to different architectures they can use https://github.com/jvm-profiling-tools/ap-loader, which packs the different binaries into a single artifact.
In addition, I suggest producing JFR output, which allows extracting much more info via jfr2flame conversion, e.g. --classify, --lines and others...
@gsmet we can have a quick call in two days when I am back, too.
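A sketch of that workflow, assuming a recording was already captured in JFR format; the jar name and the exact converter invocation are assumptions and may differ between ap-loader versions (check the project README):

```shell
# Convert a JFR recording into a flamegraph using the converter bundled
# in ap-loader (jar name and exact CLI are placeholders; consult the
# ap-loader README for the version you downloaded).
java -jar ap-loader-all.jar converter jfr2flame recording.jfr flame.html
# From the same recording, other views (e.g. classified or line-level
# flamegraphs via options such as --classify and --lines) can be
# extracted without asking the user to profile again.
```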

@gsmet
Member

gsmet commented Sep 4, 2023

Let's start simple first. And if we need additional instructions in TROUBLESHOOTING.md, we should work on it and make them as easy to follow as possible.

@geoand
Contributor

geoand commented Sep 4, 2023

And if we need additional instructions in TROUBLESHOOTING.md, we should work on it and make them as easy to follow as possible.

+1

@franz1981
Contributor

Yeah, my point is to save users from capturing data over and over: if they capture the data in JFR format, we can extract different types of flamegraphs ourselves, without requiring the users to collect more...

@geoand geoand removed the triage/needs-feedback We are waiting for feedback. label Sep 20, 2023
@franz1981
Contributor

franz1981 commented Sep 24, 2023

@Karthikrao121 any update?

@radcortez worked hard on improving a few aspects of config performance, which had a tremendous impact on startup time in both native and JVM mode. I'm not sure which specific versions you could try to check whether they deliver the expected level of performance (maybe Roberto or @gsmet knows), but if you give it a shot it would give us additional (independent) feedback.

Additionally, in https://quarkus.io/guides/performance-measure we explain that startup time, for us, is what makes the application available to users, which implies awaiting completion of the first request. Compared to what Quarkus prints as startup time, this includes a few more class-loading and initialization steps that can affect the final numbers: did you perform the same measurement in your startup time tests?
I am asking because this is likely the measure you are really interested in.
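A minimal sketch of measuring time-to-first-request in JVM mode, assuming the app listens on localhost:8080; the jar path and the /hello endpoint are placeholders for your application:

```shell
# Measure "time to first completed request" rather than the startup time
# printed by the framework. Jar path and /hello endpoint are placeholders.
start=$(date +%s%N)
java -jar target/quarkus-app/quarkus-run.jar &
APP_PID=$!
# Poll until the first request completes successfully.
until curl -sf http://localhost:8080/hello > /dev/null; do sleep 0.005; done
end=$(date +%s%N)
echo "time to first request: $(( (end - start) / 1000000 )) ms"
kill "$APP_PID"
```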

@radcortez
Member

@radcortez worked hard on improving a few aspects of config performance, which had a tremendous impact on startup time in both native and JVM mode. I'm not sure which specific versions you could try to check whether they deliver the expected level of performance (maybe Roberto or @gsmet knows), but if you give it a shot it would give us additional (independent) feedback.

Yes, we have made a bunch of improvements to both startup time and RSS, in both JVM and native mode. A few of these improvements are already available in 3.4.x, but the big chunk will only be available in 3.5.x.

@Karthikrao121
Author

Karthikrao121 commented Sep 29, 2023 via email

@radcortez
Member

@gsmet, the startup time decreased by around 100 ms with version 3.4.1. Sorry for the late reply. We have seen some improvements with the newer version.

Good to hear; thanks for letting us know.

Some additional improvements are coming in #35322

@radcortez
Member

I believe this can be closed due to the improvements done in:
