Optimize find ancestor #4294

Merged
merged 3 commits into from
Jan 29, 2024

magicwerk
Contributor

HasParentNode.findAncestor() uses Arrays.stream() to handle the varargs types parameter.
This may be convenient, but it is inefficient in terms of both performance and memory consumption.
The proposed new implementation uses plain iteration instead, which improves throughput by roughly a factor of 2 and lowers the allocation per operation; see the JMH benchmark below.

Eval_FindAncestor.testImproved                     thrpt    2  34631766.865           ops/s
Eval_FindAncestor.testImproved:gc.alloc.rate       thrpt    2     10830.080          MB/sec
Eval_FindAncestor.testImproved:gc.alloc.rate.norm  thrpt    2       328.000            B/op
Eval_FindAncestor.testImproved:gc.count            thrpt    2        34.000          counts
Eval_FindAncestor.testImproved:gc.time             thrpt    2        24.000              ms

Eval_FindAncestor.testCurrent                      thrpt    2  18087357.962           ops/s
Eval_FindAncestor.testCurrent:gc.alloc.rate        thrpt    2      9795.026          MB/sec
Eval_FindAncestor.testCurrent:gc.alloc.rate.norm   thrpt    2       568.000            B/op
Eval_FindAncestor.testCurrent:gc.count             thrpt    2        36.000          counts
Eval_FindAncestor.testCurrent:gc.time              thrpt    2        22.000              ms
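
To make the comparison concrete without pulling in JavaParser or JMH, here is a minimal, self-contained sketch of the two matching strategies. The `Node`, `Block` and `Call` classes and the `FindAncestorSketch` name are hypothetical stand-ins invented for illustration, not JavaParser types; only the shape of the two methods follows the description above. It checks that both variants return the same ancestor and makes no claim about timings.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.function.Predicate;

public class FindAncestorSketch {

    // Hypothetical stand-ins for AST nodes; only the parent link matters here.
    static class Node {
        final Node parent;
        Node(Node parent) { this.parent = parent; }
    }
    static class Block extends Node { Block(Node parent) { super(parent); } }
    static class Call extends Node { Call(Node parent) { super(parent); } }

    // Stream-based variant: sets up a stream pipeline and an intermediate
    // Optional<Class<N>> for every ancestor that is visited.
    @SafeVarargs
    static <N> Optional<N> findAncestorStream(Node node, Predicate<N> predicate, Class<N>... types) {
        Node parent = node.parent;
        if (parent == null) {
            return Optional.empty();
        }
        Optional<Class<N>> match = Arrays.stream(types)
                .filter(type -> type.isAssignableFrom(parent.getClass()) && predicate.test(type.cast(parent)))
                .findFirst();
        if (match.isPresent()) {
            return Optional.of(match.get().cast(parent));
        }
        return findAncestorStream(parent, predicate, types);
    }

    // Loop-based variant: same matching logic, expressed as a for-each over the array.
    @SafeVarargs
    static <N> Optional<N> findAncestorLoop(Node node, Predicate<N> predicate, Class<N>... types) {
        Node parent = node.parent;
        if (parent == null) {
            return Optional.empty();
        }
        for (Class<N> type : types) {
            if (type.isAssignableFrom(parent.getClass()) && predicate.test(type.cast(parent))) {
                return Optional.of(type.cast(parent));
            }
        }
        return findAncestorLoop(parent, predicate, types);
    }

    public static void main(String[] args) {
        // root -> outer block -> inner block -> call
        Node root = new Node(null);
        Block outer = new Block(root);
        Block inner = new Block(outer);
        Call call = new Call(inner);

        Block viaStream = findAncestorStream(call, x -> true, Block.class).orElse(null);
        Block viaLoop = findAncestorLoop(call, x -> true, Block.class).orElse(null);
        System.out.println("same ancestor found: " + (viaStream == viaLoop));
    }
}
```

Both variants walk up the parent chain in the same order, so swapping one for the other should not change which ancestor is returned; the difference under discussion is only the per-call overhead.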

@jlerbsc
Collaborator

jlerbsc commented Jan 28, 2024

Thank you for this suggestion. Before accepting, could you share your test application?

@magicwerk
Contributor Author

magicwerk commented Jan 28, 2024

sure, here we go!

Generally, streams should be used with care: they are not only slower than the traditional alternatives, but also create temporary objects that the JIT cannot yet optimize away (as of Java 21).

	public static class Eval_FindAncestor {

		@State(Scope.Benchmark)
		public static class CheckState {
			CompilationUnit cu = parse(
					"class Foo {\n" +
							"    void foo() {\n" +
							"        try {\n" +
							"        } catch (Exception e) {\n" +
							"        } finally {\n" +
							"            try {\n" +
							"            } catch (Exception e) {\n" +
							"                foo();\n" +
							"            } finally {\n" +
							"            }\n" +
							"        }\n" +
							"\n" +
							"    }\n" +
							"}\n");

			// find the method call expression foo()
			MethodCallExpr methodCallExpr = cu.findFirst(MethodCallExpr.class).orElse(null);
		}

		@Benchmark
		public Object testCurrent(CheckState state) {
			BlockStmt block = findAncestorCurrent(state.methodCallExpr, x -> true, BlockStmt.class).orElse(null);
			return block;
		}

		@Benchmark
		public Object testImproved(CheckState state) {
			BlockStmt block = findAncestorImproved(state.methodCallExpr, x -> true, BlockStmt.class).orElse(null);
			return block;
		}

		<N> Optional<N> findAncestorCurrent(Node node, Predicate<N> predicate, Class<N>... types) {
			if (!node.hasParentNode())
				return Optional.empty();
			Node parent = node.getParentNode().get();
			Optional<Class<N>> oType = Arrays.stream(types).filter(type -> type.isAssignableFrom(parent.getClass()) && predicate.test(type.cast(parent)))
					.findFirst();
			if (oType.isPresent()) {
				return Optional.of(oType.get().cast(parent));
			}
			return parent.findAncestor(predicate, types);
		}

		<N> Optional<N> findAncestorImproved(Node node, Predicate<N> predicate, Class<N>... types) {
			if (!node.hasParentNode())
				return Optional.empty();
			Node parent = node.getParentNode().get();
			for (Class<N> type : types) {
				if (type.isAssignableFrom(parent.getClass()) && predicate.test(type.cast(parent))) {
					return Optional.of(type.cast(parent));
				}
			}
			return parent.findAncestor(predicate, types);
		}
	}

@jlerbsc
Collaborator

jlerbsc commented Jan 29, 2024

You highlight the result of a micro-benchmark which shows an overhead linked to the use of streams. But I don't think that eliminating the use of streams in this case will significantly improve the way JP works.

Furthermore, although the results seem correct, your test case isn't because in the case of the "current" implementation the JP logic is tested twice, and in the case of the "improved" implementation the JP logic is systematically executed.

We would like to thank you for your investigative work, but we are not going to accept your proposal until we can demonstrate a significant improvement in JP's performance. In the case you highlight, the use of streams makes the processing a little more readable.

However, I'll leave this PR open if you'd like to add to your demonstration or correct your test case.

@magicwerk
Contributor Author

First of all, I have to say that I am somewhat embarrassed that the benchmark I sent you was not fully correct; sorry for that.
Second, it is true that eliminating the use of streams will not significantly improve the way JP works.
However, since JP is a library designed for general use, I think it should do its work in the most efficient way possible, e.g. by reducing the allocation load and the pressure on the GC as much as it can, which is what the benchmark demonstrates.
When I use JP to analyze a huge number of classes, I can see quite some pressure on the GC, and the proposed change would help alleviate this a little.

You can find the corrected benchmark appended, together with the updated performance figures. As I did not fully understand all your comments, I also added testCurrentJavaparser to show that findAncestorCurrent and the implementation actually in the library behave identically.

After the correction, the performance and allocation load are improved even more compared to the current implementation, now by more than a factor of 4. Of course these numbers depend on the code snippet used etc., but they nevertheless show that an improvement is possible by changing a single line of code.
You can find a lot of discussions about how streams should be used, but it seems to be generally accepted that they should not be used in constructs like tight loops.
I think it will be hard to come up with a demonstration which is not a micro-benchmark, so I ask you to review the corrected test case again.
As a user of JP, I would definitely prefer the implementation which excels in performance and memory consumption, even if the internals end up a little less readable.

Eval_FindAncestor.testCurrent                               thrpt    2  10152629.747           ops/s
Eval_FindAncestor.testCurrent:gc.alloc.rate                 thrpt    2      5497.955          MB/sec
Eval_FindAncestor.testCurrent:gc.alloc.rate.norm            thrpt    2       568.000            B/op
Eval_FindAncestor.testCurrent:gc.count                      thrpt    2        25.000          counts
Eval_FindAncestor.testCurrent:gc.time                       thrpt    2        12.000              ms

Eval_FindAncestor.testCurrentJavaparser                     thrpt    2  10117285.996           ops/s
Eval_FindAncestor.testCurrentJavaparser:gc.alloc.rate       thrpt    2      5478.827          MB/sec
Eval_FindAncestor.testCurrentJavaparser:gc.alloc.rate.norm  thrpt    2       568.000            B/op
Eval_FindAncestor.testCurrentJavaparser:gc.count            thrpt    2        25.000          counts
Eval_FindAncestor.testCurrentJavaparser:gc.time             thrpt    2        12.000              ms

Eval_FindAncestor.testImproved                              thrpt    2  49480564.416           ops/s
Eval_FindAncestor.testImproved:gc.alloc.rate                thrpt    2      4906.273          MB/sec
Eval_FindAncestor.testImproved:gc.alloc.rate.norm           thrpt    2       104.000            B/op
Eval_FindAncestor.testImproved:gc.count                     thrpt    2        22.000          counts
Eval_FindAncestor.testImproved:gc.time                      thrpt    2        10.000              ms

	public static class Eval_FindAncestor {

		@State(Scope.Benchmark)
		public static class CheckState {
			CompilationUnit cu = parse(
					"class Foo {\n" +
							"    void foo() {\n" +
							"        try {\n" +
							"        } catch (Exception e) {\n" +
							"        } finally {\n" +
							"            try {\n" +
							"            } catch (Exception e) {\n" +
							"                foo();\n" +
							"            } finally {\n" +
							"            }\n" +
							"        }\n" +
							"\n" +
							"    }\n" +
							"}\n");

			// find the method call expression foo()
			MethodCallExpr methodCallExpr = cu.findFirst(MethodCallExpr.class).orElse(null);
		}

		@Benchmark
		public Object testCurrentJavaparser(CheckState state) {
			BlockStmt block = state.methodCallExpr.findAncestor(x -> true, BlockStmt.class).orElse(null);
			return block;
		}

		@Benchmark
		public Object testCurrent(CheckState state) {
			BlockStmt block = findAncestorCurrent(state.methodCallExpr, x -> true, BlockStmt.class).orElse(null);
			return block;
		}

		@Benchmark
		public Object testImproved(CheckState state) {
			BlockStmt block = findAncestorImproved(state.methodCallExpr, x -> true, BlockStmt.class).orElse(null);
			return block;
		}

		<N> Optional<N> findAncestorCurrent(Node node, Predicate<N> predicate, Class<N>... types) {
			if (!node.hasParentNode())
				return Optional.empty();
			Node parent = node.getParentNode().get();
			Optional<Class<N>> oType = Arrays.stream(types).filter(type -> type.isAssignableFrom(parent.getClass()) && predicate.test(type.cast(parent)))
					.findFirst();
			if (oType.isPresent()) {
				return Optional.of(oType.get().cast(parent));
			}
			return findAncestorCurrent(parent, predicate, types);
		}

		<N> Optional<N> findAncestorImproved(Node node, Predicate<N> predicate, Class<N>... types) {
			if (!node.hasParentNode())
				return Optional.empty();
			Node parent = node.getParentNode().get();
			for (Class<N> type : types) {
				if (type.isAssignableFrom(parent.getClass()) && predicate.test(type.cast(parent))) {
					return Optional.of(type.cast(parent));
				}
			}
			return findAncestorImproved(parent, predicate, types);
		}
	}
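
As a quick sanity check on the "factor 4" figure, the ratios can be recomputed from the corrected results table above. The sketch below only does that arithmetic; the class and method names (`BenchmarkRatios`, `ratio`) are made up for illustration, and the inputs are copied from the `testCurrent` and `testImproved` rows. With these inputs the throughput ratio comes out at roughly 4.9 and the per-operation allocation ratio at roughly 5.5. Note that each row reports a count of only 2 in the table, so the figures are best read as rough indications.

```java
public class BenchmarkRatios {

    // Ratio of two measurements; a value above 1 means the first is larger.
    static double ratio(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        // Throughput in ops/s, copied from the corrected benchmark run.
        double opsCurrent = 10152629.747;
        double opsImproved = 49480564.416;

        // Normalized allocation in B/op (gc.alloc.rate.norm), same run.
        double bytesCurrent = 568.0;
        double bytesImproved = 104.0;

        System.out.printf("throughput speedup:   %.2fx%n", ratio(opsImproved, opsCurrent));
        System.out.printf("allocation reduction: %.2fx%n", ratio(bytesCurrent, bytesImproved));
    }
}
```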

@jlerbsc
Collaborator

jlerbsc commented Jan 29, 2024

Your test case is still not correct. It seems to me that this is what you want to test on the current implementation.

	<N> Optional<N> findAncestorCurrent(Node node, Predicate<N> predicate, Class<N>... types) {
		return node.findAncestor(predicate, types);
	}

We are ready to improve the operation of JP as soon as it makes sense and the improvement is clearly perceptible. In the case you present, the pressure on memory is marginal (gc.count and gc.time), and the memory allocation rate is not a determining factor in the choice of optimisation either. All that's left is the rate of operations per second, which is much higher in the version you're proposing than in the current implementation.

This is probably due to the use of streams. But once again the micro-benchmark raises a problem of consistency because this method is rarely used in loops. Furthermore, when used in a loop, the JIT compiler improves the efficiency of the method from the second iteration onwards.

My feeling is that this improvement is very marginal and will not bring any visible results to JP users. If you want to improve JP, I suggest you use a profiler on a concrete case and identify hotspots (memory, cpu, etc.). We can then see how to improve JP's behaviour in these use cases.

@magicwerk
Contributor Author

IMHO the test case is correct now: testCurrentJavaparser() tests the current implementation contained in the JP library, testCurrent() tests the same code but copied into the test code as findAncestorCurrent() so it can easily be compared with findAncestorImproved().

As said, it will be hard to come up with a more meaningful benchmark.
The JIT does an awful lot of optimizations, but even with Java 21 it cannot optimize away the overhead of streams.
The micro-benchmark shows that, with the proposed change, performance will be better and less memory will be allocated.
I agree that the improvement will be marginal and hardly noticeable except if the method is used heavily.
On the other hand, the improvement can be realized by changing 3 lines of code and has no negative effect.

Finally it's up to you to make the decision, so feel free to close the PR.

@jlerbsc
Collaborator

jlerbsc commented Jan 29, 2024

Thank you for your proposal and the time you've devoted to it, but as I've already told you, given that the improvement is marginal, I prefer to focus on readability rather than on the performance of an algorithm.

@jlerbsc jlerbsc closed this Jan 29, 2024
@jlerbsc jlerbsc reopened this Jan 29, 2024
@jlerbsc
Collaborator

jlerbsc commented Jan 29, 2024

Finally, I'm reconsidering my initial position, even if the improvement is marginal. In fact, the current code is no more readable than the one in your proposal, so we can accept this one. Thank you for your contribution.

@jlerbsc jlerbsc merged commit ae0a80e into javaparser:master Jan 29, 2024
37 of 38 checks passed
@jlerbsc jlerbsc added this to the next release milestone Jan 29, 2024
@jlerbsc jlerbsc added the PR: Changed A PR that changes implementation without changing behaviour (e.g. performance) label Jan 29, 2024
@magicwerk
Contributor Author

magicwerk commented Jan 29, 2024

that was now really an unexpected change after the discussion, but thanks for accepting.

@jlerbsc
Collaborator

jlerbsc commented Jan 29, 2024

I simply reread the current code and it didn't seem any easier to read than your proposal. I'm just waiting to be convinced by the new proposals. Yours was just on the edge. Thank you for your insistence, which has enabled me to challenge my position.

Labels
PR: Changed A PR that changes implementation without changing behaviour (e.g. performance)

2 participants