This repository has been archived by the owner on Feb 23, 2023. It is now read-only.

More actuator optimizations #259

Closed
sdeleuze opened this issue Aug 31, 2020 · 21 comments
Labels
type: optimization Related to optimizing image size, performance or memory consumption

@sdeleuze
Contributor

Despite our effort to modularize and optimize actuator hints in our 0.8.0 milestone, adding actuator to a Spring Boot + Spring MVC @Controller + Tomcat + Jackson application increases RSS memory by 34M (97M with, 63M without). This is a lot, especially when we consider that most users just want their default web endpoints.

I think there are multiple things to explore here in order to find the best way to optimize this:

  • @dsyer How much does using spring-init functional configuration decrease the footprint? How many hints are still required afterwards?
  • @bclozel What is the footprint gain with your functional actuator branch? How many hints remain when it is used with spring-init?
  • @snicoll @wilkinsona Could it be possible to ship less infra on the Spring Boot side by default?
@sdeleuze sdeleuze added this to the v0.9.0 milestone Aug 31, 2020
@dsyer
Contributor

dsyer commented Sep 1, 2020

The spring-init sample with functional actuators has only /health and /info (but adding more might not impact the footprint much). The footprint is currently around 40MB (cf 36MB for plain webflux). Some of the difference is probably Jackson - those would be the hints required (see reflect-config.json for details).

@sdeleuze
Contributor Author

sdeleuze commented Sep 7, 2020

See related spring-projects/spring-boot#20290 issue.

@sdeleuze
Contributor Author

sdeleuze commented Sep 8, 2020

The first step is to review this diff of used classes between a basic Spring MVC app and the same with the actuator starter.
diff.txt

There are 3 levels where we could do something:

  • Fine tune the hints declarations, especially the access attribute
  • Fine tune the hint engine with @aclement
  • Contact the Boot team to ask for specific changes if we identify points they could change to make things easier here

@sdeleuze sdeleuze changed the title from "Explore how to reduce actuators footprint" to "More actuator optimizations" Sep 25, 2020
@aclement
Contributor

A first round of optimizations has gone in, purely related to hint adjustment and smarter feature analysis, and I wanted to share the results. I have a variant of the actuator-webflux sample. I've added a single / endpoint so I can exercise the app and the actuator separately.

Initial build with the starter for actuator removed from the pom to give us the baseline. (Macbook Pro)

Exercising the `/` endpoint:                  RSS=49.5m/ImageSize=43.3m

Now with the actuator added back to the pom.xml

Exercising `/` but not the health endpoint:   RSS=71.1m/ImageSize=52.1m
Exercising `/` *and* the health endpoint:     RSS=73.3m/ImageSize=52.1m

Recompiling with evaluate-cop=true set - this evaluates conditional on property checks at image build time, throwing more away as the properties are not typically set.

Exercising `/` but not the health endpoint:   RSS=69.1m/ImageSize=51.1m
Exercising `/` *and* the health endpoint:     RSS=71.4m/ImageSize=51.1m

Very roughly speaking if the observation in the first comment is that actuator had a 34M impact on RSS, we have reduced that to ~22M with a first pass. (I recognize the actuator-webflux sample is a little different to the original app). And down to ~20M with evaluate-cop=true.

We should probably review what is in the latter image to see if anything is obvious that doesn't need to be there.

@sdeleuze
Contributor Author

sdeleuze commented Oct 27, 2020

@aclement Could we compare the additional classes loaded by the JVM with actuator + exercising the /health endpoint with the ones added to the native image? That would allow us to check with the Boot team whether optimizations are possible on the JVM, and whether we ship too much on native.

@aclement
Contributor

Now that we can see the wood for the trees after the first round of optimizations, I was able to dig a little deeper. As I commented before, I am reviewing the difference between a pure app with no actuator and the same app with actuator added, where we only want to exercise /health, using the diff tool we have on the -H:+PrintAOTCompilation output of each build. There are 1623 extra entries in the actuator app. Looking through them I could see Cloud related info, and that is because we don't eagerly evaluate the @ConditionalOnCloudPlatform condition, so the ReactiveCloudFoundryActuatorAutoConfiguration kicks in. Yes, it is additionally guarded with:

@ConditionalOnProperty(prefix = "management.cloudfoundry", name = "enabled", matchIfMissing = true)

But matchIfMissing=true means that even if you do eagerly evaluate conditional-on-property checks, the condition still matches when no property is specified, so we aren't deactivating these things.
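To make the matchIfMissing behaviour concrete, here is a minimal sketch in plain Java. It is a simplified model of a property condition, not Spring Boot's actual implementation; the class and method names (PropertyConditionSketch, matches) are hypothetical and only the property key comes from the annotation above.

```java
import java.util.Map;

// Simplified, hypothetical model of a @ConditionalOnProperty-style check.
// It is NOT Spring Boot's real condition code, just an illustration.
final class PropertyConditionSketch {

    // Returns true when the guarded configuration would be activated.
    static boolean matches(Map<String, String> properties, String key, boolean matchIfMissing) {
        String value = properties.get(key);
        if (value == null) {
            // Property absent: the outcome is decided by matchIfMissing alone.
            return matchIfMissing;
        }
        // Property present: anything other than "false" counts as a match.
        return !"false".equalsIgnoreCase(value);
    }

    public static void main(String[] args) {
        String key = "management.cloudfoundry.enabled";
        // No property set, matchIfMissing=true: still active at build time.
        System.out.println(matches(Map.of(), key, true));              // true
        // Only an explicit "false" switches the configuration off.
        System.out.println(matches(Map.of(key, "false"), key, true));  // false
        // With matchIfMissing=false an unset property would deactivate it.
        System.out.println(matches(Map.of(), key, false));             // false
    }
}
```

Under this model, evaluating the condition at image build time only removes the configuration when the property is explicitly set to false.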

So I tried a crude hack to do eager evaluation of ConditionalOnCloudPlatform. The figures change from:

Entries in the PrintAOTCompilation diff: 1623
Build memory: 6.42GB
Image build time: 114.3s
RSS memory: 71.4M
Image size: 51.2M
Startup time: 0.113 (JVM running for 0.115)

to:

Entries in the PrintAOTCompilation diff: 1215
Build memory: 6.97GB
Image build time: 102.2s
RSS memory: 70.4M
Image size: 48.6M
Startup time: 0.121 (JVM running for 0.123)

A meg saved in RSS. I also noticed a lot of micrometer stuff in there. I excluded micrometer in the pom (from the actuator dependency) to see the impact it would have:

Entries in the PrintAOTCompilation diff: 860
Build memory: 7.04GB
Image build time: 106.2s
RSS memory: 63.9M
Image size: 47.1M
Startup time: 0.09 (JVM running for 0.092)

Notice the giant decrease in RSS in this case. I suspect something similar could be achieved via not excluding micrometer as a dependency but evaluating conditions like ConditionalOnEnabledMetricsExport more eagerly.

@aclement
Contributor

Eager evaluation of ConditionalOnEnabledMetricsExport doesn't seem to be sufficient - it helps but there is still a lot of metrics stuff included - there seem to be other routes to creating meter registry related beans that are not guarded with anything beyond class path checks. For example:

@Configuration(proxyBeanMethods = false)
@ConditionalOnBean(Clock.class)
@ConditionalOnMissingBean(MeterRegistry.class)
class NoOpMeterRegistryConfiguration {

@dsyer
Contributor

dsyer commented Oct 28, 2020

Just checking: eager evaluation of @ConditionalOnCloudPlatform is a bad idea right? It's just to collect a data point?

@sdeleuze
Contributor Author

sdeleuze commented Oct 28, 2020

After more discussions with @wilkinsona and @dsyer I think the solution is the following:

  • Introduce a spring.native.remove-metrics-support flag (the Boot team is not OK with adding such a flag on their side because there is a clear classpath signal via the io.micrometer:micrometer-core dependency) that would discard metrics related beans (and not add the related reflection entries) when set to true (default would be false)
  • Introduce a build time transformation that sets spring.native.remove-metrics-support to true when spring.native.remove-jmx-support=true and (management.endpoint.metrics.enabled=false or management.endpoints.web.exposure.include does not include metrics). It would remove metrics by default since the endpoint is not exposed by default.
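The rule in the second bullet can be sketched as a small boolean function. This is only an illustration of the proposal as worded here, not code from spring-graalvm-native; the class and method names are hypothetical, and treating "*" in the exposure list as including metrics is an assumption.

```java
import java.util.Arrays;

// Hypothetical sketch of the proposed build time rule; names are illustrative only.
final class RemoveMetricsRuleSketch {

    // removeJmxSupport       -> value of spring.native.remove-jmx-support
    // metricsEndpointEnabled -> value of management.endpoint.metrics.enabled
    // webExposureInclude     -> value of management.endpoints.web.exposure.include
    static boolean shouldRemoveMetrics(boolean removeJmxSupport,
                                       boolean metricsEndpointEnabled,
                                       String webExposureInclude) {
        // Assumption: "*" exposes every endpoint, so it counts as including metrics.
        boolean metricsExposedOverWeb = Arrays.stream(webExposureInclude.split(","))
                .map(String::trim)
                .anyMatch(id -> id.equals("metrics") || id.equals("*"));
        return removeJmxSupport && (!metricsEndpointEnabled || !metricsExposedOverWeb);
    }

    public static void main(String[] args) {
        // JMX removed and metrics not exposed over the web: metrics would be dropped.
        System.out.println(shouldRemoveMetrics(true, true, "health,info"));    // true
        // Metrics explicitly exposed: metrics support is kept.
        System.out.println(shouldRemoveMetrics(true, true, "health,metrics")); // false
    }
}
```

Note that the rule only looks at the metrics endpoint; it does not consider whether any metrics exporter is configured.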

@ConditionalOnCloudPlatform seems to be mostly Cloud Foundry specific right now and I am not sure this is a pattern we would like to see more of. I am even wondering if we could think about removing it in the Boot 3 timeframe. For now I would suggest we set management.cloudfoundry.enabled=false by default in spring-graalvm-native and document it; that will allow people to enable it at build time if needed.

@snicoll
Contributor

snicoll commented Oct 28, 2020

In Spring Boot 2.4 there is management.metrics.export.defaults.enabled=false to disable all known metric exporters. That does not prevent the metrics infrastructure from being auto-configured though, as you can still opt in to a given exporter. For instance adding management.metrics.export.simple.enabled=true alongside the previous property would disable all available metrics exporters but the simple in-memory one.

That's what we do to disable metrics export in Integration tests.

@wilkinsona

Introduce a build time transformation that sets spring.native.remove-metrics-support to true when spring.native.remove-jmx-support=true and (management.endpoint.metrics.enabled=false or management.endpoints.web.exposure.include does not include metrics). It would remove metrics by default since the endpoint is not exposed by default.

I'm not sure that this makes sense. If you're exporting metrics out-of-process – to Prometheus or whatever – you will care about metrics but in all likelihood won't have the metrics endpoint enabled.

@sdeleuze
Contributor Author

Yeah, we probably need to refine the logic a little bit.

@aclement
Contributor

I've managed to get close to simulating micrometer not being around but not quite. Whilst I've managed to eliminate the Spring beans by tweaking the GraalVM feature analysis, there is a class reactor.util.Metrics that sets a flag isMicrometerAvailable and if it is set to true then it creates a few classes like reactor.netty.http.MicrometerHttpMetricsRecorder - when this happens we do start to pull in a (small) part of micrometer. About 1Meg of RSS' worth of micrometer and reactor netty micrometer infrastructure - so this is pulled in by reactor just if micrometer is on the classpath. Could feasibly put a substitution on there at some point but maybe that's too messy.
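The pattern described above, a flag computed from a classpath check that then decides whether optional infrastructure gets created, can be sketched roughly as follows. This is not the actual reactor.util.Metrics source; the class name, the field name and the probed Micrometer type are assumptions used only for illustration.

```java
// Hypothetical sketch of a "is the library on the classpath?" flag.
// It is not Reactor's code; it only illustrates the general pattern.
final class ClasspathFlagSketch {

    // Returns true if the named class can be found, without initializing it.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathFlagSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Assumed probe class; any well-known Micrometer type would serve the same purpose.
    static final boolean MICROMETER_AVAILABLE =
            isPresent("io.micrometer.core.instrument.MeterRegistry");

    public static void main(String[] args) {
        // Code guarded by such a flag is reachable whenever the dependency is present,
        // regardless of any Spring property.
        System.out.println("micrometer available: " + MICROMETER_AVAILABLE);
    }
}
```

Because the flag depends only on the classpath, property-based condition evaluation cannot switch it off, which is why a substitution is mentioned as the remaining option.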

Pushing on optimizations I've made it to:

Entries in the PrintAOTCompilation diff: 481 // was 860 after last optimizations and ~1600 originally
Build memory: 6.75GB
Image build time: 85.8s
RSS memory: 60.4M // was 63.9 after last optimizations and 71.4 originally
Image size: 45.4M // was 47.1 after last optimizations and 51.1 originally (app was 43.3 with no actuator)
Startup time: 0.083 (JVM running for 0.085)

Which would make the health endpoint basically add 11M RSS to a base application and 2Meg to the image size.

I now need to refactor my optimizations into real commits so not sure I'll quite get to that target as I undo some liberties I took, but I'll post things I've done and we can see if any small Spring refactorings might be able to help.

@sdeleuze
Contributor Author

If that's 1M of RSS and Reactive stack specific I think we can live with that.

@aclement
Contributor

aclement commented Oct 29, 2020

Hey @wilkinsona - you might be able to help me with this (I'm no expert on actuators!). As I take apart my hacks, I'm looking at MappingsEndpointAutoConfiguration. Although the first @Bean method has @ConditionalOnAvailableEndpoint, the other inner classes within it, and the MappingsEndpointAutoConfiguration itself, are not conditional on anything related to whether the endpoint is active. That seems to mean that even if I totally turn off the mappings endpoint, the servlet or reactive configurations inside this auto configuration will build beans? Whatever I do (in my web flux app), my debug report shows:

   MappingsEndpointAutoConfiguration.ReactiveWebConfiguration matched:
      - @ConditionalOnClass found required class 'org.springframework.web.reactive.DispatcherHandler' (OnClassCondition)
      - found ConfigurableReactiveWebEnvironment (OnWebApplicationCondition)
      - @ConditionalOnBean (types: org.springframework.web.reactive.DispatcherHandler; SearchStrategy: all) found bean 'webHandler' (OnBeanCondition)

(that was with - overkill I know - --management.endpoints.enabled-by-default=false --management.endpoints.jmx.exposure.exclude=* --management.endpoints.web.exposure.exclude=* --management.endpoint.mappings.enabled=false). Is it possible to make the whole configuration conditional on something? Or does it need to always create those beans for another possible use? (One quick edit - realized maybe I'm using the wrong properties if they've changed with a more recent version of Spring...)

@wilkinsona

Thanks, Andy. I think there's some scope to make Boot more efficient there, albeit with a non-zero (but small) risk of breaking someone if they're doing something a bit unusual. I've opened spring-projects/spring-boot#23977.

@aclement
Contributor

Just thinking about the comment from @snicoll above:

In Spring Boot 2.4 there is management.metrics.export.defaults.enabled=false to disable all known metric exporters.
That does not prevent the metrics infrastructure from being auto-configured though, as you can still opt in to a given exporter.
For instance adding management.metrics.export.simple.enabled=true alongside the previous property would disable
all available metrics exporters but the simple in-memory one.

That's what we do to disable metrics export in Integration tests.

In a native-image world (maybe in a JVM world...) isn't it a shame that we build all that metrics infrastructure when all you might have wanted from actuator was the health endpoint? I guess as a percentage memory cost on the JVM it may not be so large but bringing in micrometer seems to be about 10% of the memory for our native-image case here.

Some crude measurements in JVM mode, running the actuator sample and just hitting the /actuator/health endpoint: the memory is very roughly around 600meg, so the 6Meg feels more like 1% than the 10% it is on native (I know, probably a bit of an apples vs oranges comparison).

Now something super crude, checking how many classes are loaded for the app (count -verbose:class):

vanilla app without actuator (hitting /actuator/health but of course not found): ~6273 (~100meg less memory than when actuator is added)

vanilla app with actuator: ~7038

With --management.metrics.export.defaults.enabled=false: ~6980

With --management.metrics.export.defaults.enabled=false 
     --management.endpoints.enabled-by-default=false 
     --management.endpoint.health.enabled=true 
     --management.endpoints.web.exposure.include=health: ~6979

With io.micrometer:micrometer-core excluded: ~6650

From my point of view all I wanted was /health but without excluding micrometer I was loading an extra 330 classes, and I didn't seem to have any property controls over whether that happened. If there were an option to say no-metrics-please, I would have used it. (maybe pom exclusion is the way I'm supposed to do that, but tricky on native-image, I think - need to double check)
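For anyone wanting to reproduce this kind of class count, a rough sketch of counting class-load lines from captured -verbose:class output follows. The two line shapes it recognizes ("[Loaded ..." and a "[class,load]" tag) are assumptions about the log format, which varies by JDK version, and the class name is hypothetical.

```java
import java.util.List;

// Hypothetical helper: counts class-load lines in captured -verbose:class output.
final class LoadedClassCounter {

    static long countLoaded(List<String> lines) {
        return lines.stream()
                // Assumed older-style line: "[Loaded java.lang.Object from ...]"
                // Assumed unified-logging line: "[0.010s][info][class,load] java.lang.Object ..."
                .filter(line -> line.startsWith("[Loaded ") || line.contains("[class,load]"))
                .count();
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "[Loaded java.lang.Object from rt.jar]",
                "[0.010s][info][class,load] java.lang.String source: jrt:/java.base",
                "Started application");
        System.out.println(countLoaded(sample)); // 2
    }
}
```

Comparing the count for two runs (for example with and without io.micrometer:micrometer-core) gives the kind of delta quoted above.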

@aclement
Contributor

aclement commented Nov 2, 2020

Final comment on this in the 0.8.2 timeframe. I've pushed the changes I want to make and got some decent figures. What I ended up doing was building on a facility we already had to filter out configurations during feature processing. I wrote a filter for the metrics ones that bring in micrometer. If you set the right property it will filter out metrics configurations during image build. Here are the figures:

Figures for hitting '/' endpoint and the health endpoint:

0.8.1 rough figures from above.
RSS=73.3M/ImageSize=52.1M

Various tweaks to just hints and moving to boot 2.4.0-RC1.
RSS=68.8M/ImageSize=52.2M

Remove metrics related auto configuration from actuators: (-Dspring.native.factories.no-actuator-metrics=true).
RSS=62.4M/ImageSize=50.7M

Switch on build-time property checking for some conditions (basically ConditionalOnProperty and actuator related ones):
RSS=60.1M/ImageSize=49.6M

Switch on final extreme option - maybe going too far (flip treatment of matchIfMissing from true to false - effectively making a lot of things opt-in rather than opt-out):
RSS=58.3M/ImageSize=46.7M

(Remember 49.5M was the RSS without actuator, so <10M impact if taking it to the extreme with the options). None of these options are on by default.
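As a quick check of the arithmetic, the actuator overhead at each stage can be computed against the 49.5M baseline. This sketch uses only the RSS figures quoted in this thread; the class name and the short stage labels are hypothetical.

```java
// Hypothetical sketch: RSS overhead of actuator relative to the no-actuator baseline.
final class ActuatorOverheadSketch {

    static final double BASELINE_RSS_MB = 49.5; // RSS without actuator, from the thread

    // Overhead in megabytes, rounded to one decimal place.
    static double overhead(double rssMb) {
        return Math.round((rssMb - BASELINE_RSS_MB) * 10.0) / 10.0;
    }

    public static void main(String[] args) {
        String[] stages = {"0.8.1", "hint tweaks + Boot 2.4.0-RC1", "no actuator metrics",
                "build-time property checks", "matchIfMissing flipped"};
        double[] rss = {73.3, 68.8, 62.4, 60.1, 58.3};
        for (int i = 0; i < stages.length; i++) {
            // Prints overheads of 23.8, 19.3, 12.9, 10.6 and 8.8 respectively.
            System.out.println(stages[i] + ": +" + overhead(rss[i]) + "M");
        }
    }
}
```

Only the last, most aggressive option brings the overhead under 10M.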

@sdeleuze
Contributor Author

sdeleuze commented Nov 3, 2020

Thanks @aclement, let's refine the defaults and have a deeper look at this with @bclozel before 0.9.

@sdeleuze sdeleuze added the type: optimization Related to optimizing image size, performance or memory consumption label Jan 8, 2021
@sdeleuze
Contributor Author

@aclement Should we close this issue and focus on #340?

@aclement
Contributor

Yes, I don't plan to do more here at the moment.


6 participants