
Initial version for thomaswue with Oracle GraalVM Native Image #70

Merged
15 commits merged into gunnarmorling:main, Jan 6, 2024

Conversation

@thomaswue (Contributor) commented Jan 4, 2024

The script "additional_build_step_thomaswue.sh" will generate the image as "image_calculateaverage_thomaswue". The script "calculate_average_thomaswue.sh" will either execute the native image if the file exists or otherwise run the program in JVM mode with the Graal JIT compiler.

The program finishes on my system in 1.67s (total CPU user time is 42.6s). The system uses an Intel 13th Gen Core i9-13900K processor.

Update: Thanks to tuning from @mukel using sun.misc.Unsafe to directly access the mapped memory, it is now down to 1.28s (total CPU user time 32.2s) on my machine. Also, instead of PGO, this is now just using a single native image build run with tuning flags "-O3" and "-march=native".
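For illustration, such a build boils down to a single native-image invocation with those tuning flags; this is only a sketch, and the classpath and main class are assumptions rather than the exact contents of "additional_build_step_thomaswue.sh":

# Sketch of a tuned native-image build (jar and class names are illustrative)
native-image -O3 -march=native \
    -cp target/average-1.0.0-SNAPSHOT.jar \
    dev.morling.onebrc.CalculateAverage_thomaswue \
    image_calculateaverage_thomaswue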

@lobaorn commented Jan 4, 2024

I was thinking of going in that direction in the next couple of days and using native-image, taking advantage of the fact that the fastest and slowest of the 5 runs are discarded. But my main interest was to see how everything compares with the "best" of the others. Since you have already done it, @thomaswue, I will think again about how to do it, haha. And thank you for bringing in this approach :)

Guess I will wait a little more, and if by the end of the deadline no one has tried Lilliput or Valhalla builds, I will try to mix and converge some approaches that would work best with those ;)

@thomaswue (Contributor Author)

You are certainly very welcome to copy the GraalVM native image generation scripts and see if they also help with your solution! The key is to use profile-guided optimizations and to have an already optimized solution that does not run for too long. I don't think Lilliput or Valhalla (or Loom) can be beneficial for this benchmark.
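For context, the usual Oracle GraalVM PGO flow is a two-step build; the following is only a sketch, with placeholder jar, main class, and image names:

# 1. Build an instrumented image and run it on representative input;
#    at exit it writes a default.iprof profile into the working directory.
native-image --pgo-instrument -cp target/app.jar Main instrumented_image
./instrumented_image

# 2. Rebuild, feeding the collected profile back into the optimizing compiler.
native-image --pgo=default.iprof -cp target/app.jar Main optimized_image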

@lobaorn commented Jan 4, 2024

I agree that, given the specific constraints, they would not help with reaching the top of the leaderboard. It is more about comparing performance or profiling with those builds against a "default" JDK build; it falls more on the experimentation side of things, which is why I will probably wait and see which approaches emerge. It has been lots of fun. We can just think "How about if..." and within a couple of hours there is a good chance someone will submit that idea, even if no one said or wrote anything out loud. So right now my plan is to think some more "How about if...", pick the top implementations, mix and match some JDK builds and configs, and then mix and match the implementations themselves and see what happens. But more and more I think someone will do this mix-and-match idea before I get to it. @gunnarmorling is probably the one having the most fun of all, seeing so many ideas arise.

@mariusstaicu

I was thinking of trying this too, combined with one of the top implementations.
Also, it would be a lot of fun to compare the same algorithm across different JDKs, with or without native image.

@thomaswue (Contributor Author)

Agreed. And possibly also different input sizes, to show the startup vs. long-term peak performance characteristics in a more isolated way.

if [ -f ./profile_thomaswue.iprof ]; then
    echo 'Picking up profiling information from thomaswue.iprof.'
else
    echo 'Could not find profiling information, therefore it will be now regenerated.'
@gunnarmorling (Owner)

Hey @thomaswue, thanks a lot for this, very interesting to see PGO!

In its current shape, though, I think it goes somewhat against the spirit of this challenge, specifically "Implementations must not rely on specifics of a given data set", which I think we kinda do when leveraging profile data from the first run during the following ones. Admittedly, the wording could be more precise, but the general spirit is that runs shouldn't benefit from the outcomes of previous runs (otherwise, taken to the extreme, one could calculate the results once, store them in a file, and then simply load the content of that file in the subsequent runs).

What would be acceptable to me though is creating the profile data at build time, not relying on the specific data set used during evaluation (similar to how creating a static CDS archive is in bounds). Would that be an option?

@thomaswue (Contributor Author)

Sure, absolutely. The profiling is only used to automate compiler tuning knowledge one would otherwise need to manually specify (e.g. "inline a lot in this method"). I can for example train on the existing test data files or check in a slightly larger test file with maybe 1k lines?

Another thing, maybe in the spirit of the competition to generate interesting insights, could be to run that same algorithm in 3 different ways - i.e. JIT, AOT, AOT+PGO.

@gunnarmorling (Owner)

The profiling is only used to automate compiler tuning knowledge one would otherwise need to manually specify (e.g. "inline a lot in this method")

Yepp, that's what I kinda thought.

check in a slightly larger test file with maybe 1k lines?

Yeah, if you could add a 1K file and the expected output (as obtained via calculate_average_baseline.sh) to src/test/resources that would be perfect. It would extend the tests and you can use it for "training". Can the PGO data be generated via Maven? If so, that'd be best, but it's not a strict requirement. A one-off script to run before the evaluation would be ok, too.
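For example, a one-off preparation step along these lines could produce both files (only a sketch; the exact target paths under src/test/resources are assumptions):

# Generate a small input and capture the baseline's output as the expected result
./create_measurements.sh 1000
./calculate_average_baseline.sh > measurements-1k.out
mv measurements.txt src/test/resources/samples/measurements-1k.txt
mv measurements-1k.out src/test/resources/samples/measurements-1k.out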

Another thing, maybe in the spirit of the competition to generate interesting insights, could be to run that same algorithm in 3 different ways - i.e. JIT, AOT, AOT+PGO.

Agreed. Thing is, I'm really overwhelmed by the number of submissions, and I won't have any capacity for further increasing the scope of this right now. But you're more than welcome to do this experiment and publish any interesting insights on your own end.

@thomaswue (Contributor Author)

OK, sounds good. As a quick fix, I have turned off PGO by default and instead added global tuning flags in a commit added to this PR. Maybe approaches that require running the application beforehand should be kept separate in terms of evaluation anyway.

Agreed that running with different configs probably increases the scope too much. It could be interesting to do this after 31st of Jan for a few of the submissions.

Let me know if we can assist you with some of the work. Having so much attention is obviously a downside of the project's success ;-).

@gunnarmorling (Owner)

Ok, excellent, I will evaluate this one tomorrow (I need to install the native-image tool first). In the meantime, could you make sure that running this one shows no differences from the expected output:

./test.sh thomaswue

We've added this one just earlier today so as to have at least some basic assurance of correctness for implementations. Thanks again!

@gunnarmorling (Owner)

Yeah, I can see that angle. OTOH, it's super hard, if not impossible, to objectively rate what's "most idiomatic". That's probably something better suited for a blog post or a talk, where one can explore these nuances. That said, I think there's an interesting threshold somewhere for how far comparatively "standard" Java solutions go up the leaderboard (quite far, in fact) before the more extreme solutions kick in. I.e. "idiomatic" gets you surprisingly far.

@gunnarmorling (Owner)

On first invocation, the script will generate the image with some extra output, though. The image file would have to be cleaned up when updating the code (e.g., after the evaluation). Let me know if this is appropriate or whether there should be a different way to integrate this into the build process.

@thomaswue, so IIUC, this still does the profiling with the actual dataset in the first run, right? Didn't you mean to provide a separate dataset file and use this one when creating the native image? I.e. the build (with profiling) should clearly be separated from the evaluation run. Can we extract the part which builds the native image (including PGO) into a separate script and invoke this one to build the application (instead of the usual mvn verify)?

So the flow would become this:

Build:

ln -s src/test/data/yourprofilingdataset.txt measurements.txt
./your_native_build_script.sh # builds the native binary and PGO data

Evaluation (i.e. what I am doing):

rm -f measurements.txt
ln -s measurements_1B.txt measurements.txt

for i in {1..5}
do
    ./calculate_average_thomaswue.sh # just the actual launch command, no build steps
done

@thomaswue (Contributor Author) commented Jan 6, 2024

The use of PGO is disabled (there was just an option to enable it via an environment variable). To avoid confusion and make this simpler, I deleted the code for optionally using PGO for now. There is an "additional_build_step_thomaswue.sh" script that builds the image. The "calculate_average_thomaswue.sh" script runs in JVM mode if the image file is not there and otherwise picks up the generated image. Let me know if this works better for you.
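For illustration, the fallback described above amounts to something like the following (a sketch; the JVM-mode launch command is an assumption):

# Run the native binary if it exists, otherwise fall back to JVM mode
if [ -f ./image_calculateaverage_thomaswue ]; then
    ./image_calculateaverage_thomaswue
else
    java -cp target/average-1.0.0-SNAPSHOT.jar dev.morling.onebrc.CalculateAverage_thomaswue
fi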

@thomaswue (Contributor Author)

I have also updated the PR description to clarify the new behavior and that PGO is not in use for this version.

@gunnarmorling (Owner)

Yepp, this LGTM now, thanks!

thomaswue and others added 8 commits January 4, 2024 20:28
variable to be set. Add -O3 -march=native tuning flags for better
performance.
mmap the entire file, use Unsafe directly instead of ByteBuffer, avoid byte[] copies.
These tricks give a ~30% speedup, over an already fast implementation.
Contribution by mukel to tune thomaswue submission.
@thomaswue (Contributor Author) commented Jan 5, 2024

@gunnarmorling This PR is now slightly updated with tuning from @mukel to use sun.misc.Unsafe for accessing the NIO direct byte buffer (which I believe is within the boundaries of the rules). It gains ~30% and is now down to 1.28s (32s CPU user) on my machine from 1.67s (42s CPU user) before. Still passes the test script.

@gunnarmorling (Owner) commented Jan 6, 2024

@thomaswue, very cool. Coming in at 9.625sec on the (8 core) eval machine, i.e. 2nd place! Would love to see the PGO version as a follow-up, created with that additional build script you've provided. Thanks for participating!

Btw. I was very impressed by the low variance of the results:

0:9.626
0:9.612
0:9.626
0:9.627
0:9.622

@gunnarmorling merged commit a53aa2e into gunnarmorling:main on Jan 6, 2024
@gunnarmorling (Owner)

Squashed and merged. Thx!

@gunnarmorling (Owner)

Maybe approaches that require running the application beforehand should be kept separate in terms of evaluation anyway.

I have added a "Note" column to the leaderboard, stating for this entry that this is a GraalVM native binary.

Agreed that running with different configs probably increases the scope too much. It could be interesting to do this after 31st of Jan for a few of the submissions.

Let me know if we can assist you with some of the work. Having so much attention is obviously a downside of the project's success ;-).

Thanks :) I might take you up on this at some point. I think once the dust has settled a bit, we can explore how to make use of this set-up for running all different kinds of comparisons.

@thomaswue (Contributor Author)

Cool, thank you for merging and evaluating!

Yes, making comparisons for a few solutions with different run options (and maybe also input sizes or target hardware) is certainly something I would be interested in doing.
