Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve configuration property binding performance with extremely large input files #16401

Closed
wilkinsona opened this issue Apr 1, 2019 · 9 comments
Labels
theme: performance Issues related to general performance type: enhancement A general enhancement
Milestone

Comments

@wilkinsona
Copy link
Member

Courtesy of @martinvisser, we have an example of the binder still being slow in 2.2 snapshots relative to 1.5's performance. The input YAML file is 7500 lines long so considerably bigger than the case we investigated with Initializr.

@wilkinsona wilkinsona added the type: task A general task label Apr 1, 2019
@wilkinsona wilkinsona added this to the 2.2.x milestone Apr 1, 2019
@dreis2211
Copy link
Contributor

Most of the time is spent in ConfigurationPropertyName.elementEquals still. Or ConfigurationPropertyName.isAncestorOf depending on how you see it. Here's a screenshot from an application that simply tries to bind the large file:

grafik
(This is a snapshot in between, so don't trust the overall invocations or timings. The share of CPU usage is more important here)

@andresetevejob
Copy link

I want to investigate on this issue.can i do?

@dreis2211
Copy link
Contributor

I'm currently working already on some improvements as well, @wilkinsona . See the referenced issue #16447 for one of those. I'm currently working on another one already, but I'm not confident enough in the solution yet.

@dreis2211
Copy link
Contributor

dreis2211 commented May 2, 2019

Other than the already provided PRs I'm still trying around on this and found one interesting place that you might want to look at. When simply assigning the cache key in SpringIterableConfigurationPropertySource.getCache() instead of copying it, benchmarks yield the following results:

Baseline New
13.386 8.062
12.658 8.121
12.530 8.365
12.147 8.813
12.016 8.401
12.316 9.393
11.928 8.388
11.754 8.242
11.663 8.674
11.704 8.622
Mean 12.210 8.508
Range 1.723 1.331

Unfortunately, this is meant to fix #13344 and can thus not be removed at the moment. For the given example this means that AbstractSet.equals() is called on Sets with 5000 entries basically every time someone interacts with SpringIterableConfigurationPropertySource.getCache() - which is the majority of interaction.

I'm currently not able to try out things a lot, but I wanted to let you know at least. Maybe there is a way to change the CacheKey creation. Maybe there is a way to create something like an UnmodifiableMapPropertySource that might be helpful as well. After all, #13344 is about reacting to changes on the underlying map. If we can make sure that the underlying map is not subject of change, there is a possibility for better performance.

Cheers,
Christoph

@wilkinsona
Copy link
Member Author

Ouch, that's a heavy price to pay for something that's only of benefit to a subset of users. Thanks for identifying it.

I wonder if we can use an == check against the key of any existing cache key and reuse it when it matches:

diff --git a/spring-boot-project/spring-boot/src/main/java/org/springframework/boot/context/properties/source/SpringIterableConfigurationPropertySource.java b/spring-boot-project/spring-boot/src/main/java/org/springframework/boot/context/properties/source/SpringIterableConfigurationPropertySource.java
index 241ce2180b..50d3680336 100644
--- a/spring-boot-project/spring-boot/src/main/java/org/springframework/boot/context/properties/source/SpringIterableConfigurationPropertySource.java
+++ b/spring-boot-project/spring-boot/src/main/java/org/springframework/boot/context/properties/source/SpringIterableConfigurationPropertySource.java
@@ -44,7 +44,7 @@ import org.springframework.util.ObjectUtils;
 class SpringIterableConfigurationPropertySource extends SpringConfigurationPropertySource
                implements IterableConfigurationPropertySource {
 
-       private volatile Object cacheKey;
+       private volatile CacheKey cacheKey;
 
        private volatile Cache cache;
 
@@ -131,7 +131,7 @@ class SpringIterableConfigurationPropertySource extends SpringConfigurationPrope
        }
 
        private Cache getCache() {
-               CacheKey cacheKey = CacheKey.get(getPropertySource());
+               CacheKey cacheKey = CacheKey.get(this.cacheKey, getPropertySource());
                if (ObjectUtils.nullSafeEquals(cacheKey, this.cacheKey)) {
                        return this.cache;
                }
@@ -204,9 +204,14 @@ class SpringIterableConfigurationPropertySource extends SpringConfigurationPrope
                        return this.key.hashCode();
                }
 
-               public static CacheKey get(EnumerablePropertySource<?> source) {
+               public static CacheKey get(CacheKey existingCacheKey,
+                               EnumerablePropertySource<?> source) {
                        if (source instanceof MapPropertySource) {
-                               return new CacheKey(((MapPropertySource) source).getSource().keySet());
+                               Object key = ((MapPropertySource) source).getSource().keySet();
+                               if (existingCacheKey != null && existingCacheKey.key == key) {
+                                       return existingCacheKey;
+                               }
+                               return new CacheKey(key);
                        }
                        return new CacheKey(source.getPropertyNames());
                }

@wilkinsona
Copy link
Member Author

@dreis2211 What's the Baseline in your numbers above? With 2.2.0.M2, I'm seeing times similar to those in your New column. Times with latest 2.2 snapshots plus the diff above applied are a bit quicker:

Baseline New
8.176 8.037
9.292 8.983
8.128 7.794
9.074 7.756
8.543 8.493
8.383 8.823
8.631 8.525
8.675 7.742
8.563 7.859
9.072 7.622
Mean 8.654 8.163
Range 1.164 1.361

@dreis2211
Copy link
Contributor

dreis2211 commented May 3, 2019

Interesting - I tested with quite a few baselines. Basically I switch to a new one after one of my provided PRs was merged to master. So the current baseline is a 2.2.0 snapshot, but even with M2 I get very different results to yours.

Baseline New
15.258 7.887
12.686 7.335
13.028 7.911
11.813 9.204
13.435 8.029
12.673 8.041
12.549 7.937
12.681 7.954
12.286 7.882
12.565 7.995
Mean 12.897 8.018
Range 3.445 1.869

Just for completeness - I added a System.exit() call to the example ConfigurationPropertiesApplication so I don't get any port clashes. I wonder if that makes a difference (I wouldn't expect that at least).

To your suggestion. That shouldn't make a difference as we're still comparing a LinkedKeySet vs. a HashSet where == shouldn't apply.

@dreis2211
Copy link
Contributor

dreis2211 commented May 4, 2019

@wilkinsona If I interpret your benchmark correctly, you're comparing a M2 as baseline vs. a current snapshot plus your diff, right? As noted I don't think your patch should make a difference, but maybe you're seeing the improvements of my previous PRs (although with a smaller impact than on my machine).

So maybe your machine is simply a lot faster than mine!? I'm running on a Windows machine with an Intel Core (i7) 4xCore @2.40Ghz and 8GB of RAM and Java 1.8.0_201 at the moment. Maybe you're already running on a higher JDK version (Compact Strings and String Concat optimizations come to my mind here)?

@dreis2211
Copy link
Contributor

@wilkinsona Any chance you had time to look into my questions? :)

philwebb pushed a commit that referenced this issue Jun 4, 2019
Update `SpringIterableConfigurationPropertySource` so that cache keys
do not need to be checked if property sources are immutable.

See gh-16401
philwebb added a commit that referenced this issue Jun 4, 2019
Make immutable properties more explicit and trust that they
truly won't change.

See gh-16401
philwebb added a commit that referenced this issue Jun 4, 2019
* pr/16717:
  Polish "Optimize CacheKey handling for immutable sources"
  Optimize CacheKey handling for immutable sources

Closes gh-16401
@wilkinsona wilkinsona changed the title Investigate configuration property binding performance with extremely large input files Improve configuration property binding performance with extremely large input files Jun 5, 2019
@wilkinsona wilkinsona added type: enhancement A general enhancement and removed type: task A general task labels Jun 5, 2019
@wilkinsona wilkinsona modified the milestones: 2.2.x, 2.2.0.M4 Jun 5, 2019
@wilkinsona wilkinsona added the theme: performance Issues related to general performance label Jun 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme: performance Issues related to general performance type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

3 participants