Build fast "has type" lookup into TypesInUse cache, speeding up repeated type queries#7483
Build fast "has type" lookup into TypesInUse cache, speeding up repeated type queries#7483
Conversation
LSTs already have a TypesInUse field which is a WeakReference'd cache of type information present in a given LST. But UsesType and UsesMethod (another very popular precondition / implementation detail) still iterate over every single type in use looking for matches. So in a large composite recipe: thousands of recipe instance are, via UsesType / UsesMethod, iterating over lists of hundreds or thousands of types, for hundreds or thousands of source files. It explains why AddDependency dominates the scanning phase in terms of time spent in the spring boot migration, followed closely by ChangeType. So build a lazily produced prefix tree into TypesInUse. Big savings in the asociated benchmark. In the benchmark which simulates running 100 HasType in succession this is 100x faster
| * Visibility splits each pair into explicit (reachable from types-in-use or imports — visible | ||
| * without {@code includeImplicit}) and implicit (reachable only through used-method types). | ||
| */ | ||
| @Nullable |
There was a problem hiding this comment.
There was a problem hiding this comment.
Indeed it is. Good callout that this change requires redeploy.
| @Nullable | ||
| private volatile Map<String, Boolean> usedMethodCache; | ||
|
|
||
| /** Same shape as {@link #usedMethodCache} but over {@link #declaredMethods}. */ | ||
| @Nullable | ||
| private volatile Map<String, Boolean> declaredMethodCache; |
There was a problem hiding this comment.
will we interrogate the same TypesInUse instance for the same methodPattern usages/declarations?
Not sure if we have a lot of invocations with exactly the same method pattern in our codebase
| * walks) and misses (types not used by the sample file). The proportions are roughly | ||
| * representative of recipe preconditions in a mixed migration composite — most queries miss. | ||
| */ | ||
| static final List<String> QUERIES = Arrays.asList( |
There was a problem hiding this comment.
Should we also benchmark different wildcard patterns used throughout our recipes?
*..*
some.packagename..*
some.packagename..*Identifier*
some.*.packagename.Identifier
some..packagename.Identifier
...
There was a problem hiding this comment.
Nothing runs the benchmarks on the regular, so after the initial validation they're more showing-your-work/documentation than anything. Not strictly necessary to check in.
| * without {@code includeImplicit}) and implicit (reachable only through used-method types). | ||
| */ | ||
| @Nullable | ||
| private volatile FqnTrie trie; |
There was a problem hiding this comment.
Would we need to add tests here for all the patterns?
some.packagename.Identifier
*..*
some.packagename..*
some.packagename..*Identifier*
some.*.packagename.Identifier
some..packagename.Identifier
...
| import org.openrewrite.java.tree.JavaType; | ||
| import org.openrewrite.marker.SearchResult; | ||
|
|
||
| public class DeclaresMethod<P> extends JavaIsoVisitor<P> { |
There was a problem hiding this comment.
As these are not parent loaded, I think we should wrap the codechanges in try/catch NoSuchMethod blocks keeping the old code alive also for the time being?
| boolean leafExplicit; | ||
| boolean leafImplicit; | ||
| boolean aliasExplicit; | ||
| boolean aliasImplicit; | ||
| boolean descendantExplicit; | ||
| boolean descendantImplicit; |
There was a problem hiding this comment.
We could pack these 6 flags into a single byte field. When running with Java 25 and -XX:+UseCompactObjectHeaders this would with the padding result in 24-byte objects rather than 32-byte objects.
| for (int i = 0; i <= len; i++) { | ||
| if (i == len || path.charAt(i) == '.') { | ||
| // Descendant rollup propagates only for raw-leaf insertions; alias paths do | ||
| // not contribute to "this package has any class beneath it" queries. | ||
| if (!alias) { | ||
| if (explicit) { | ||
| n.descendantExplicit = true; | ||
| } else { | ||
| n.descendantImplicit = true; | ||
| } | ||
| } | ||
| n = n.findOrAddChild(path, start, i); | ||
| start = i + 1; | ||
| } | ||
| } |
There was a problem hiding this comment.
By leveraging the JVM intrinsic and SIMD-friendly indexOf() we can gain another 15%. Here is the patch for that:
diff --git a/rewrite-java/src/main/java/org/openrewrite/java/internal/TypesInUse.java b/rewrite-java/src/main/java/org/openrewrite/java/internal/TypesInUse.java
index 723084f0a4..6b085292a7 100644
--- a/rewrite-java/src/main/java/org/openrewrite/java/internal/TypesInUse.java
+++ b/rewrite-java/src/main/java/org/openrewrite/java/internal/TypesInUse.java
@@ -296,20 +296,23 @@ private void insertPath(String path, boolean explicit, boolean alias) {
Node n = root;
int start = 0;
int len = path.length();
- for (int i = 0; i <= len; i++) {
- if (i == len || path.charAt(i) == '.') {
- // Descendant rollup propagates only for raw-leaf insertions; alias paths do
- // not contribute to "this package has any class beneath it" queries.
- if (!alias) {
- if (explicit) {
- n.descendantExplicit = true;
- } else {
- n.descendantImplicit = true;
- }
+ while (true) {
+ int dot = start < len ? path.indexOf('.', start) : -1;
+ int end = dot < 0 ? len : dot;
+ // Descendant rollup propagates only for raw-leaf insertions; alias paths do
+ // not contribute to "this package has any class beneath it" queries.
+ if (!alias) {
+ if (explicit) {
+ n.descendantExplicit = true;
+ } else {
+ n.descendantImplicit = true;
}
- n = n.findOrAddChild(path, start, i);
- start = i + 1;
}
+ n = n.findOrAddChild(path, start, end);
+ if (dot < 0) {
+ break;
+ }
+ start = end + 1;
}
if (alias) {
if (explicit) {
@@ -401,14 +404,14 @@ private static boolean anyLeafFqn(Node node, StringBuilder fqn, TypeNameMatcher
Node n = root;
int start = 0;
int len = pkg.length();
- for (int i = 0; i <= len; i++) {
- if (i == len || pkg.charAt(i) == '.') {
- n = n.findChild(pkg, start, i);
- if (n == null) {
- return null;
- }
- start = i + 1;
+ while (start < len) {
+ int dot = pkg.indexOf('.', start);
+ int end = dot < 0 ? len : dot;
+ n = n.findChild(pkg, start, end);
+ if (n == null) {
+ return null;
}
+ start = end + 1;
}
return n;
}There was a problem hiding this comment.
Excellent thank you for the tip
LSTs already have a TypesInUse field which is a WeakReference'd cache of type information present in a given LST. But UsesType and UsesMethod (another very popular precondition / implementation detail) still iterate over every single type in use looking for matches. So in a large composite recipe:
thousands of recipe instance are, via UsesType / UsesMethod, iterating over lists of hundreds or thousands of types, for hundreds or thousands of source files. It explains why AddDependency dominates the scanning phase in terms of time spent in the spring boot migration, followed closely by ChangeType.
So build a lazily produced prefix tree into TypesInUse. Big savings in the asociated benchmark. In the benchmark which simulates running 100 HasType in succession this is 100x faster since the trie only gets built once.