New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Unsafe and merykitty's branchless parser #194
Conversation
Cool, I think this is getting close to optimal! Now soon what remains is reading assembly and compiler graphs ;-). Did you check out if building an "-O3 -march" native image still provides only the same performance? I might check out your branch and test with different native image options tomorrow. |
TreeMap<String, Measurement> results = IntStream.range(0, chunks.length - 1) | ||
.mapToObj(chunkIndex -> process(chunks[chunkIndex], chunks[chunkIndex + 1])).parallel() | ||
.flatMap(MeasurementRepository::get) | ||
.collect(Collectors.toMap(e -> e.city, MeasurementRepository.Entry::measurement, Measurement::updateWith, TreeMap::new)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lambdas are not super good for startup, but maybe with CDS isn't like that! Anonymous classes tends to have more predictable costs (lower usually)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. And that loading is single-threaded. This is why it should be a bigger benefit in terms of percentage of execution time on a stronger machine to create a native image. Overall, the loaded and compiled code is still relatively minimal compared to what is loaded in your typical Spring / Quarkus / Micronaut application ;-).
index = (index + 1) & tableMask; | ||
} | ||
|
||
if (tableEntry != null) { | ||
return tableEntry.measurement; | ||
tableEntry.measurement.updateWith(temperature); | ||
return; | ||
} | ||
|
||
// --- This is a brand new entry, insert into the hashtable and do the extra calculations (once!) do slower calculations here. | ||
Measurement measurement = new Measurement(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move slow path code out of the hot path, to be sure the hot will be inlined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to update the measurement with the new temperature, always. I'm not sure what you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The part related the brand new entry, I didn't checked if goes beyond the inlining bytecode threshold, so If you move it in a separate method you are sure the hot path to be inlined
@thomaswue I've checked, on my machine your code is also slightly faster running in JVM mode; could be different on the target machine of course. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once comment on CDS inline. Holding back with updating the result until this has been sorted out, but this looks very good!
calculate_average_royvanrijn.sh
Outdated
@@ -19,9 +19,9 @@ source "$HOME/.sdkman/bin/sdkman-init.sh" | |||
sdk use java 21.0.1-graal 1>&2 | |||
# Added for fun, doesn't seem to be making a difference... | |||
if [ -f "target/calculate_average_royvanrijn.jsa" ]; then | |||
JAVA_OPTS="-XX:SharedArchiveFile=target/calculate_average_royvanrijn.jsa -Xshare:on" | |||
JAVA_OPTS="--enable-preview -XX:SharedArchiveFile=target/calculate_average_royvanrijn.jsa -Xshare:on" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the CDS stuff, I'm just seeing this now, and I think it should be moved to a separate script, akin to @thomaswue's script for building the native image. The reasoning being that there should be no artifacts passed between runs (where would we draw the line, is for instance recording statistics fine, just caching the entire result, etc.). I don't think it makes a real difference as only a very small amount of custom classes is added to what's in the distro's static CDS archive, but it should be separated as a matter of principle.
|
||
public String toString() { | ||
return round(min) + "/" + round((1.0 * sum) / count) + "/" + round(max); | ||
private static Unsafe initUnsafe() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you can just export jdk.internal.misc.Unsafe
to your application in the execution script and do getUnsafe()
.
Discussed with @royvanrijn, I'll remove the dynamic App CDS archive creation and evaluate this submission after that. |
@royvanrijn, you're on #1 with 00:06.159! Seeing this kind of improvement at this stage is amazing. There are some discussions still in this PR, so might worth addressing them and see what you can gain. @merykitty, you also might consider running on Graal, it kinda consistently made a nice improvement "for free" for all the top contenders. |
@royvanrijn Great job! I think looking at compiler graphs is next for you ;-). @merykitty Graal might indeed help your solution, but only if you are using the plain old unsafe instead of the Foreign Function & Memory API. This API is still in incubator phase and it seems like we are missing some intrinsification optimizations for it. We are investigating this already. |
@thomaswue Indeed, using Graal brings huge regression that I suspect might be related to the intrinsification of Vector API or FFM. @gunnarmorling Thanks a lot for your suggestions, currently Roy's and Thomas's solutions outpace mine with OpenJDK so I will try to fix that first. The glaring problem is that my hash function is really bad which is definitely worth improving. |
Check List:
./test.sh <username>
)0m1.897s
3m47.094s