Reduce size of output executable #287

Open
ianopolous opened this issue Jan 20, 2018 · 47 comments

Comments

@ianopolous

Compiling hello world with Substrate VM on Ubuntu results in a 6.1 MiB executable. Is it possible to reduce this? The equivalent in Go is 1.6 MiB, or < 1 MiB without debug information.

@vjovanov vjovanov self-assigned this Jan 25, 2018
@vjovanov
Member

True, I have evaluated the image size, and even for an empty main program we get an image of ~5 MB. There are a few reasons for that:

  1. In our features we use JDK code that has a non-negligible footprint. To see everything that gets pulled in, you can add -H:+PrintUniverse to the image build.
  2. Some of our features are included in the image although they are never used in the code.
  3. The points-to analysis is imprecise and sometimes catches elements that are never used.

On the bright side, if you add a lot of your own code, the 5 MB overhead will remain the same. So this is an issue only for very small images.
This is a great issue. If you have a need for small images in your use-case, please mention it here and we will raise the priority.

@vjovanov
Member

vjovanov commented Jan 30, 2018

unused-pkgs-hw.txt
unused-classes-hw.txt
unused-methods-hw.txt

These are the packages, classes, and methods that are never invoked. They can be used as an indicator of elements that should not be in the image. Some things, like the heap package, must be included in the image, although for this particular program they are never used.

@pejovica thanks for the data.

@ianopolous
Author

The general use-case is to remove a common argument people make for using Go. One specific use-case that this would severely impact is implementing many small command-line utilities, as in Linux.

Does that 5 MiB include the GC? At least for simple things like helloworld you can prove you don't need a GC.

@vjovanov
Member

vjovanov commented Jan 30, 2018

It does, but by looking at the list of included elements, I would not say that GC is the biggest problem. I would rather invest that time to remove things that should not be there by any means. For example, org.graalvm.compiler.truffle, java.util.zip, java.util.regex, java.util.Calendar.

By removing these I am confident that we can reach the size of Go's "Hello, World!". At one point we removed all methods that were never executed and the image size was 400 KB. This is the lower bound of course, but it could be used as a guideline for what we should reach.

@ianopolous
Author

Thanks @vjovanov, that would be amazing!

@CremboC

CremboC commented Apr 30, 2018

You can also use https://upx.github.io/ as a temporary solution to make compressed binaries. Reduces the size by a lot in my experience.

@miere

miere commented Apr 11, 2019

Any thoughts on this, guys?

I'm targeting GraalVM as (probably/hopefully) the solution for the long cold starts of AWS Lambda functions written in Java. Smaller binaries would make our deployments faster. Also, AWS has some limits on deployment size, and I'm afraid that binaries would become too big if we have multiple dependencies in our project, which is usually the case when using the AWS SDK.

I think that's a game changer functionality that would make JVM more attractive to the community, especially those who have been flirting with Go and Rust as an alternative.

@SchulteMarkus

(not issue relevant) @miere, have you already discovered https://quad.team/blog/Micronaut-to-AWS-Lamda-guide ?

@cosmicdan

cosmicdan commented Nov 22, 2019

Any thoughts on this, guys?

I'm targeting GraalVM as (probably/hopefully) the solution for the long cold starts of AWS Lambda functions written in Java. Smaller binaries would make our deployments faster. Also, AWS has some limits on deployment size, and I'm afraid that binaries would become too big if we have multiple dependencies in our project, which is usually the case when using the AWS SDK.

I think that's a game changer functionality that would make JVM more attractive to the community, especially those who have been flirting with Go and Rust as an alternative.

Now that we have GraalVM building against JDK 11, it's only a matter of time until the native compiler can work with the new modularity. I doubt file sizes will ever be improved on JDK 8 though since the class library was very.... let's say "monolithic" before the Project Jigsaw refactor.

So until those native compiler improvements, I suggest updating JDK 8 projects to JDK 11 and making them modular in preparation for that :D

Also, see what CremboC said - UPX is pretty good. ~11MB exe down to ~3MB.

@nithyasharabheshwara

nithyasharabheshwara commented May 20, 2020

@thomaswue @vjovanov
I would like to share an approach I took five years ago (2015) to build custom JVMs that were extremely small (a JavaFX UI application with its runtime totalling only 5 MB after zipping).

I used the following to achieve this result

  • JavaFX native packaging tool
  • SpyFS
  • -Xbootclasspath flag

Steps

  1. I extracted all runtime/bootstrap classes/jars into a single folder. Not just rt.jar, but anything that is used. This was my custom bootstrap classes folder.
  2. I packaged my application using the JavaFX native packaging tool.
  3. I replaced some settings in the package, using the -Xbootclasspath flag, so that it picked up classes from the custom bootstrap classes folder.
  4. I made a virtual clone of this using SpyFS.
  5. I ran the application on this virtual clone.
  6. SpyFS detected which classes were actually loaded and saved this information.
  7. SpyFS copied only the classes which were actually loaded into a third folder, the application output folder.
  8. The logic I used was: Case 1: if a class file was only visited (touched) and not read, the class file is copied but its size is zero. Case 2: if a class file was read, even one byte, the entire class file is copied. Case 3: if a class file was neither touched nor read, it is not copied. Case 4: for native libraries, anything which was loaded is copied to the destination.
  9. This application folder had only the JavaFX UI app itself and only those bootstrap classes which were actually used. I tested it, and it ran successfully. I zipped it, and found the size was as small as 5 MB.
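
The copy rules in step 8 can be sketched in a few lines. This is only an illustration of the logic described in the steps, not SpyFS itself: the function name `strip_bundle` and the `bytes_read` mapping (relative path to number of bytes read, 0 for a file that was opened but not read) are hypothetical stand-ins for whatever SpyFS actually recorded.

```python
import shutil
from pathlib import Path
from typing import Mapping


def strip_bundle(src: Path, dst: Path, bytes_read: Mapping[str, int]) -> None:
    """Build a stripped copy of src in dst from a recorded access log.

    bytes_read is a hypothetical log format: it maps each file that was
    opened (path relative to src) to the number of bytes read from it.
    Files that were never opened are simply absent from the mapping.
    """
    for rel, count in bytes_read.items():
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        if count > 0:
            # Case 2 (and case 4 for native libraries): read, so copy it whole.
            shutil.copyfile(src / rel, target)
        else:
            # Case 1: opened but never read, so leave a zero-size placeholder.
            target.write_bytes(b"")
    # Case 3: files that were never opened are not in the log and are skipped.
```

The result only contains what was exercised during the recorded run, so the recording has to cover every code path the application needs.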

Back in 2015 I shared this idea with RoboVM guys. Here is the link to the discussion
https://groups.google.com/d/msg/robovm/-LEeLkGJodA/qFGwVfKQm3QJ
Niklas Therning (founder of robovm) had found this interesting and had said,

Interesting approach! :-) We're working on improving the stripping done by RoboVM to reduce file sizes. Recording which classes are actually used at runtime is something we could do easily by patching RoboVM slightly. We're currently looking into an approach which is much less aggressive, using static analyses. One nice advantage with the dynamic approach is no special handling is required for classes loaded via reflection. Maybe we could use this for generating forceLinkClasses patterns automatically for users. The drawback is of course that you have to make sure you touch all codepaths of your app when recording.

Thanks for the info and links. We'll see where we end up eventually...

However, soon after the company was sold and then came Xamarin.

Much later, GluonVM picked it up. Later still, Gluon dropped its own VM and started using GraalVM, and only very recently it has started providing tools to create GraalVM-powered binaries which even an average developer like me can use to build and run JavaFX applications on mobile (Android, iPhone) and desktop, everywhere.

So I felt it is time I could raise this matter again.
And as already pointed out, such approaches would also make GraalVM extremely competitive with Go and others.

To be honest, I don't know how much optimization has been already implemented and put in place in GraalVM. GraalVM is amazing no doubt and performance difference is clearly felt from end user experience point of view, no doubt.

I might be over expecting, but I feel, if this size issue/feature is cracked, GraalVM can replace every language/platform/runtime in the world, as the first default choice.

So to give a summary, the idea/suggestion is

  • Apart from static analysis, (optionally) recording which classes are actually used at runtime, both for the runtime (jvm) bootstrap and the application.
  • Keep only the classes which are actually used, remove classes which were never used both from bootstrap and the end application.

Please let me know your thoughts.

Thank you

BTW, I should also mention that I had packaged youtube-dl, a Python app, with a full (stripped) Python runtime environment in no more than 3 MB (after compression).

@johanvos
Contributor

johanvos commented Jun 4, 2020

That's an interesting comment. I never used SpyFS but maybe it can help here.
Does that work on class level, or method level?

Getting a JavaFX app under 5 MB sounds very challenging. Did that include the native libraries (e.g. libglass, libprism_es2 etc?).

@vjovanov
Member

vjovanov commented Jun 4, 2020

It is possible to make a 400 kB "Hello, World!" (@pejovica did this). But this code is completely unsafe and insecure and can lead to segfaults. This could be made an experimental feature, with a strong emphasis on experimental (use at your own risk). To make it a feature, we would need a very strong use-case.

@nithyasharabheshwara

That's an interesting comment. I never used SpyFS but maybe it can help here.
Does that work on class level, or method level?

Getting a JavaFX app under 5 MB sounds very challenging. Did that include the native libraries (e.g. libglass, libprism_es2 etc?).

Hey sorry, my apologies, I didn't notice your question.
So SpyFS neither works at the class level nor at the method level. It works at the filesystem level. All the runtime and bootstrap classes are extracted in a folder and this custom bootstrap class bundle is used instead of the default java runtime classes using the Xbootclasspath flag. This folder is spied by SpyFS and it knows exactly which classes were actually read (opened and >0 bytes read), which were accessed (opened but zero bytes read) and which class files were not opened at all.

Then the SpyFS data is used to make a duplicate of this custom bootstrap class bundle in another folder. All the class files which were read (> 0 bytes) are copied completely, all class files which were opened but not read (total read bytes = 0) are copied as dummy class files of zero size, and all class files which were neither read nor opened are not copied. This basically forms the stripped-down runtime bootstrap class bundle for that particular application. I tried it about 5 years ago, and haven't had the opportunity to replicate it since. I am not able to run the old 2015 example any more, so I am guessing it must have been pulling some native libraries from somewhere.

Now to answer the question regarding the native libraries (e.g. libglass, libprism_es2 etc?), yes it included all of them. During the runtime which libraries are actually loaded and used was separately analyzed and all those libraries were copied and used.

I hope I was able to explain the approach. It was a very raw method I can say. Because I had made my own kernel filesystem library (binding) in java, I was able to get this done easily.

@strogiyotec

strogiyotec commented Jun 27, 2020

Hey, same problem here. I'm working on a small CLI app; the only dependency I have is JLine3, but the final executable weighs 14 MB. How could I decrease the size? (The same app in Go takes 3 MB.) I use Java 11.

openjdk version "11.0.6" 2020-01-14
OpenJDK Runtime Environment GraalVM CE 20.0.0 (build 11.0.6+9-jvmci-20.0-b02)
OpenJDK 64-Bit Server VM GraalVM CE 20.0.0 (build 11.0.6+9-jvmci-20.0-b02, mixed mode, sharing)

@jrudolph

jrudolph commented Aug 15, 2020

I had a look into the size of the generated binary for a hello world main with objdump -x:

`objdump -x` output
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .interp       0000001c  00000000000002a8  00000000000002a8  000002a8  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.gnu.build-id 00000024  00000000000002c4  00000000000002c4  000002c4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .note.ABI-tag 00000020  00000000000002e8  00000000000002e8  000002e8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .gnu.hash     000001c0  0000000000000308  0000000000000308  00000308  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .dynsym       00000de0  00000000000004c8  00000000000004c8  000004c8  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .dynstr       00000e8a  00000000000012a8  00000000000012a8  000012a8  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .gnu.version  00000128  0000000000002132  0000000000002132  00002132  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .gnu.version_r 000000e0  0000000000002260  0000000000002260  00002260  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .rela.dyn     0001ebd0  0000000000002340  0000000000002340  00002340  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 .rela.plt     000005a0  0000000000020f10  0000000000020f10  00020f10  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 10 .init         0000001b  0000000000022000  0000000000022000  00022000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .plt          000003d0  0000000000022020  0000000000022020  00022020  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .plt.got      00000008  00000000000223f0  00000000000223f0  000223f0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .text         002c06a3  0000000000023000  0000000000023000  00023000  2**12
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 14 .fini         0000000d  00000000002e36a4  00000000002e36a4  002e36a4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 15 .rodata       000095a3  00000000002e4000  00000000002e4000  002e4000  2**12
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 16 .svm_heap     0038c9c0  00000000002ee000  00000000002ee000  002ee000  2**12
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 17 .eh_frame_hdr 0000027c  000000000067a9c0  000000000067a9c0  0067a9c0  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 18 .eh_frame     00000c50  000000000067ac40  000000000067ac40  0067ac40  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 19 .init_array   00000010  000000000067cb88  000000000067cb88  0067bb88  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 20 .fini_array   00000008  000000000067cb98  000000000067cb98  0067bb98  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 21 .dynamic      00000230  000000000067cba0  000000000067cba0  0067bba0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 22 .got          00000230  000000000067cdd0  000000000067cdd0  0067bdd0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 23 .data         000019ec  000000000067d000  000000000067d000  0067c000  2**12
                  CONTENTS, ALLOC, LOAD, DATA
 24 .bss          00000188  000000000067e9f0  000000000067e9f0  0067d9ec  2**3
                  ALLOC
 25 .comment      00000046  0000000000000000  0000000000000000  0067d9ec  2**0
                  CONTENTS, READONLY
 26 .debug_aranges 000002b0  0000000000000000  0000000000000000  0067da32  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 27 .debug_info   0000501a  0000000000000000  0000000000000000  0067dce2  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 28 .debug_abbrev 00000647  0000000000000000  0000000000000000  00682cfc  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 29 .debug_line   000008ad  0000000000000000  0000000000000000  00683343  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 30 .debug_str    00002c1d  0000000000000000  0000000000000000  00683bf0  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 31 .debug_loc    00001eb6  0000000000000000  0000000000000000  0068680d  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 32 .debug_ranges 00000350  0000000000000000  0000000000000000  006886c3  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS

The full binary has 6866528 bytes. The biggest contributors to that size are the .text section with the compiled code of 2885283 bytes (42%) and the .svm_heap section with 3721664 bytes (54%).
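
The percentages can be reproduced from the hexadecimal Size column of the listing. A small sketch, meant to be fed only the Sections table (the helper name `section_sizes` is made up for illustration; the two sample entries are copied from the output above):

```python
import re

# Matches the first line of each entry in the Sections table:
# index, section name, size in hex.
SECTION_RE = re.compile(r"^\s*\d+\s+(\S+)\s+([0-9a-fA-F]+)\s")


def section_sizes(listing: str) -> dict:
    """Return {section name: size in bytes} for every section line found."""
    sizes = {}
    for line in listing.splitlines():
        match = SECTION_RE.match(line)
        if match:
            sizes[match.group(1)] = int(match.group(2), 16)
    return sizes


SAMPLE = """\
 13 .text         002c06a3  0000000000023000  0000000000023000  00023000  2**12
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 16 .svm_heap     0038c9c0  00000000002ee000  00000000002ee000  002ee000  2**12
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
"""

FILE_SIZE = 6866528  # total size of the binary in bytes, as quoted above

for name, size in section_sizes(SAMPLE).items():
    print(f"{name}: {size} bytes ({100 * size / FILE_SIZE:.0f}%)")
```

For the two sample entries this gives 2885283 bytes (42%) for .text and 3721664 bytes (54%) for .svm_heap, matching the figures quoted above.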

@vjovanov already commented about the size of the unused code that was included. However, since the initial native heap seems to be even quite a bit bigger than that, it would be interesting to understand why that is the case and what's in there.

-H:+PrintHeapHistogram will print a histogram of the data in the heap:

abridged `-H:+PrintHeapHistogram` output
=== Summary ===
DynamicHub; 5821; 487376
ImageCodeInfo; 10; 868104
Other; 47113; 2362952
Total; 52944; 3718432

[switched sections around]

=== DynamicHub ===
   Count     Size   Size%    Cum% Class
    1455   314968  64.63%  64.63% java.lang.Class
    1456    87312  17.91%  82.54% byte[]
    1455    46560   9.55%  92.09% java.lang.String
    1455    38536   7.91% 100.00% int[]

=== ImageCodeInfo ===
   Count     Size   Size%    Cum% Class
       5   837632  96.49%  96.49% byte[]
       1    22064   2.54%  99.03% java.lang.String[]
       1     8240   0.95%  99.98% java.lang.Class[]
       1      112   0.01%  99.99% com.oracle.svm.core.code.ImageCodeInfo
       2       56   0.01% 100.00% java.lang.Object[]

=== Other ===
   Count     Size   Size%    Cum% Class
   13210   643456  27.23%  27.23% byte[]
   12855   411360  17.41%  44.64% java.lang.String
    5488   219520   9.29%  53.93% java.util.HashMap$Node
     270   148368   6.28%  60.21% char[]
     355   109744   4.64%  64.85% java.lang.String[]
      96    95376   4.04%  68.89% java.util.HashMap$Node[]
    1474    94336   3.99%  72.88% sun.util.locale.LocaleObjectCache$CacheEntry
    1516    84896   3.59%  76.47% java.util.concurrent.ConcurrentHashMap$Node
    1325    84800   3.59%  80.06% java.util.LinkedHashMap$Entry
     468    55248   2.34%  82.40% int[]
[snip]
  • Can parts of that heap data be stripped?
  • Is there a way to create a heap dump for those to analyze roots? (-H:DumpHeap seems to dump the heap of the native-image process but not the native-heap)
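
To see which group and which type dominate, the summary and the byte[] rows can be tallied directly. A sketch using only the numbers quoted in the histogram above:

```python
# (count, size in bytes) per group, from the "=== Summary ===" block.
summary = {
    "DynamicHub": (5821, 487376),
    "ImageCodeInfo": (10, 868104),
    "Other": (47113, 2362952),
}
total_count, total_size = 52944, 3718432

# The groups add up to the reported totals.
assert sum(count for count, _ in summary.values()) == total_count
assert sum(size for _, size in summary.values()) == total_size

for group, (_, size) in summary.items():
    print(f"{group}: {100 * size / total_size:.1f}% of the image heap")

# byte[] rows of the DynamicHub, ImageCodeInfo and Other tables.
byte_arrays = 87312 + 837632 + 643456
print(f"byte[]: {100 * byte_arrays / total_size:.1f}% of the image heap")
```

Summed over the three tables, byte[] alone comes to 1568400 bytes, roughly 42% of the image heap.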

@vjovanov
Member

vjovanov commented Aug 17, 2020

@jrudolph this is an interesting analysis. 3721664 bytes does indeed seem big, and we should investigate what takes that much. Looking at the output I would say:

  1. byte[] takes the most space. We should really see where this data originates and whether we can shrink it before building an image.
  2. What are the 17% of the strings in the image heap?
  3. Data structures seem to take quite some space (e.g., HashMaps). We should see whether we can minimize those data structures before building an image.
  4. DynamicHub is significant in the image heap. We could maybe use a bitset for the boolean flags there. Potentially, we could also encode the class name in a more efficient form.

-H:DumpHeap is the best I see. I think you can quickly identify what comes from the image builder. For anything better, we would have to implement our own version of hosted heap dumping that accounts only for the image heap.

@oleksandr-ilin

I refactored one of the Real World apps from Spring Boot to Quarkus/Panache.
These apps are typical microservices; in my case with a PostgreSQL DB, JWT security and a RESTful API.
You can check different Real World apps here:
https://github.com/gothinkster/realworld

My Quarkus app has a 43 MB uber-jar, and the native Linux binary is 82.5 MB!
A similar Go app is just 16 MB.

5 times smaller!

Is it because the native build does not remove all unused classes and methods, so every new jar dependency just adds its own size to the final binary? Even if that's true, I can't understand why the resulting native binary is 2 times bigger than the fat jar, which contains all classes.

Maybe that is because some testing/debug/diagnostic/non-prod option is turned on by default?

Are there any ways or plans to do some analysis and leave out the unused code or any other redundant stuff?
Thanks

@Sanne
Contributor

Sanne commented Sep 21, 2020

Is it because the native build does not remove all unused classes and methods, so every new jar dependency just adds its own size to the final binary? Even if that's true, I can't understand why the resulting native binary is 2 times bigger than the fat jar, which contains all classes.

On this point specifically: consider that the native binary is including the whole of all JDK classes and Substrate, the "JVM" runtime. The "fat jar" only includes your application code and its dependencies, so you would need to add the size of the JDK for a fair comparison.

A good way to compare is via the (full) disk size of a docker image: in the case of native-image you can wrap an empty base image, while the one with the JDK will need not only the JDK but also the shared libraries on which it depends.

That said, it's of course still interesting to try to get closer to what Go is able to do. Just bear in mind that the code is possibly different: the Java libraries are much more mature and feature-rich, so they are likely to need more code to be included.

@Sanne
Contributor

Sanne commented Sep 21, 2020

@vjovanov in Quarkus we make sure many immutable structures that frameworks need are initialized as constants during compilation, so for example many such String and HashMap instances are "ready to go" and guaranteed immutable.

I also noticed these take quite some space; I even had the impression Strings are not de-duplicated - I didn't have time to dig further into detail, but if someone wanted to pursue this I suspect there could be some quick and easy wins via:

  • de-duplicating all String constants being included in the binary
  • converting all constant (immutable) instances of HashMap and similar into a compact, read-only struct?

I would expect this could also give some good performance boosts: much of our code will read those maps extremely often.

I did obtain a minor win by de-duplicating some String instances during bootstrap of the Hibernate ORM metadata; that's why I think de-duplication isn't happening in GraalVM's constant pool - but I might be wrong.

@dougxc
Member

dougxc commented Sep 21, 2020

Just a quick note on de-duplication: one would need to be sure that objects subject to de-duplication/converting are never synchronized or have their identity used.

@Sanne
Contributor

Sanne commented Sep 21, 2020

@dougxc great point, I hadn't thought of that. Regarding - specifically - Strings, I think we can all agree that people should never do this, but I agree it could still be a thing to consider. Perhaps the safe option would be to de-duplicate the underlying byte array?
Some GC implementations do this at runtime, so one could expect to trigger the same process before "casting it all in stone" in the binary.
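
As a rough illustration of what de-duplicating the underlying byte arrays could save, the sketch below compares the payload bytes of a list of strings before and after equal payloads share one array. It is a back-of-the-envelope model, not how GraalVM lays out its image heap: the function name `dedup_savings` is hypothetical, the sample data is made up, and object headers and the String wrappers themselves are ignored. Each String would keep its own identity in this scheme, which is what sidesteps the synchronization/identity concern.

```python
def dedup_savings(strings):
    """Return (bytes_before, bytes_after) for the strings' UTF-8 payloads.

    Before: every string owns a private byte array.
    After: strings with equal content share a single byte array.
    """
    payloads = [s.encode("utf-8") for s in strings]
    before = sum(len(p) for p in payloads)
    after = sum(len(p) for p in set(payloads))
    return before, after


# Made-up sample: repeated column names, as a framework's metadata might hold.
before, after = dedup_savings(["id", "name", "id", "id", "name"])
print(before, after)  # 14 6
```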

@oilin-clgx

oilin-clgx commented Sep 21, 2020

@Sanne

consider that the native binary is including the whole of all JDK classes and Substrate, the "JVM" runtime.

That is not clear to me. I thought one of the purposes of having a new separate VM like Substrate was precisely to be able to avoid bringing ALL JDK classes and unused stuff into the native binary. So basically, with AOT we can do static analysis and remove everything unused, and that is why the native build takes so long, I thought.
Similar to how a C linker links an executable and picks up only the used functions from the libs.
Closer to the Java world is the well-known ProGuard (https://www.guardsquare.com/en/products/proguard). So I thought it was completely feasible.

BTW After all Go requires similar runtime and GC to do the job...

A good way to compare is via the (full) disk size of a docker image: in the case of native-image you can wrap an empty base image, while the one with the JDK will need not only the JDK but also the shared libraries on which it depends.

Yes, that is exactly what I did.

REPOSITORY                                    TAG                 IMAGE ID            CREATED             SIZE                SHARED SIZE         UNIQUE SIZE         CONTAINERS
quarkus/real-world-app                        latest              822d99fce996        13 hours ago        105.8MB             17.86MB             87.93MB             1
go/real-world-app                             latest              356e06f919fa        15 hours ago        21.89MB             5.575MB             16.31MB             0

As you can see, the SHARED SIZE is the base image, something like Alpine or ubi-minimal, and here we can play a little bit. The base image used for Go was smaller than the ubi-minimal used for Quarkus, but similar alternatives could be found for Quarkus.
However, the UNIQUE SIZE is exactly the binary artifact size: 16 MB for the Go build artifact and 88 MB for the Quarkus one.
And that is the major part of the full container size.
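
The relationship between the columns is simple arithmetic: the unique size is the total minus the shared base layers. A quick check with the figures from the listing (values in MB; the small difference from the printed UNIQUE SIZE column comes from rounding in the displayed values):

```python
# (total size, shared size) in MB, from the image listing above.
images = {
    "quarkus/real-world-app": (105.8, 17.86),
    "go/real-world-app": (21.89, 5.575),
}

unique = {name: total - shared for name, (total, shared) in images.items()}
ratio = unique["quarkus/real-world-app"] / unique["go/real-world-app"]

for name, size in unique.items():
    print(f"{name}: about {size:.1f} MB unique")
print(f"Quarkus/Go ratio: {ratio:.1f}x")
```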

they are likely to need more code to be included.

That's actually what scares me, and why I'm asking :-)

@adinn
Collaborator

adinn commented Sep 21, 2020

The thing that has not been mentioned yet is that much of the image size is contributed by the OpenJDK static libraries that are now linked into every native image. These cannot be pruned during Java code analysis to remove unwanted code or data because they are not Java code.

In earlier versions of GraalVM Native the behaviour provided by the OpenJDK static libs was reimplemented as pure Java code and most of it was subsequently optimized out of the generated binary, giving sizes much closer to that of equivalent Go programs. However, maintaining all that re-implemented functionality across multiple JDK versions was determined to be pointless effort for little gain so the OpenJDK libs are now used instead.

Note carefully that last qualification. The redundant code and data which are linked into these libraries will not be referenced at runtime. So, it will make very little contribution to text or data segment pages in the running image i.e. the overhead you are so concerned about is essentially going to manifest as little more than some extra storage on disk. I know that's a cost but disk is very, very cheap.

If you really care about saving some few 10s of megabytes of disk space in your deployed container, well then write your app in Go (including writing a great deal of the standard Java lib functionality you are going to need to implement and test and train your programmers to use). If not, then stop comparing disk image sizes and start measuring the resident memory costs that will actually affect your bottom line.

@oilin-clgx

@adinn If the problem is the OpenJDK static libraries that are now linked into every native image, shouldn't that be a constant amount for an app of any size?
According to the 1st post the Hello World app is ~6.1 MiB, so logically they should add no more than 6+ MB. The rest should be your own code?
How does 43 MB of jar become almost 90 MB of native executable?

Also, disk space in your deployed container is not the only issue. The price of traffic, time to download, install, startup time etc. could matter as well.
Especially for the niche where this technology is expected to be used a lot, like microservices scaled horizontally across thousands of VMs.

@olpaw
Member

olpaw commented Sep 21, 2020

The thing that has not been mentioned yet is that much of the image size is contributed by the OpenJDK static libraries that are now linked into every native image. These cannot be pruned during Java code analysis to remove unwanted code or data because they are not Java code.

@adinn for Linux we compile the static libs with -ffunction-sections -fdata-sections. If the image is built with -H:+RemoveUnusedSymbols (default on Linux) the native linker command makes use of -Wl,--gc-sections. While this is not as effective as having the code available as Java code, it can still remove bits of the static libs that are not referenced anywhere at image link-time.

@adinn
Collaborator

adinn commented Sep 21, 2020

@adinn If the problem is the OpenJDK static libraries that are now linked into every native image, shouldn't that be a constant amount for an app of any size?

It would be if all the libs were always linked in. I'm not sure if that is the case.

According to the 1st post the Hello World app is ~6.1 MiB, so logically they should add no more than 6+ MB. The rest should be your own code?

The libs provide code needed for various native methods e.g. io, maths functions etc. So, selective inclusion of libs according to which JDK classes get linked in may account for the disparity.

How does 43 MB of jar become almost 90 MB of native executable?

Jar sizes are a completely specious metric against which to compare executable size.

Firstly, the sizes are only very loosely coupled. Most of the content of classes in jar files is Symbols, Strings and numeric Constants (it's usually > 90%). Many of these are repeated across a large number of classes, so they end up occupying a much tinier amount of space when they are deduplicated to a single Symbol, String or Constant. How much deduplication arises will depend on how much replication there is. So, there is no fixed divisor to apply. So, if you are seeing 90 MB of executable then that may possibly represent a large amount of Java String data in your heap, but that would only be because many different Strings occur in that 43 MB of jar code. Other 43 MB jars might contain only a handful of unique Strings.

Secondly, most Symbols and many Strings and Constants can be omitted from the image because the analysis shows they are not needed. Symbols are rarely needed anyway, so it is mostly Strings and numeric constants that will add to image size. How much they add, after deduplication, really depends on how many of the classes in the jars are actually referenced by the app. If classes, methods or fields are not used then GraalVM does not include them in the image. Once again that depends entirely on how the code in the jar is written in the first place plus what use client code makes of those classes. A 43 MB jar might end up contributing one class and a few methods or hundreds of classes and methods. So, I am sorry but the numbers you are quoting really don't corroborate your story about GraalVM being inefficient. It's more complicated than that.

Also, disk space in your deployed container is not the only issue. The price of traffic, time to download, install, startup time etc. could matter as well.
Especially for the niche where this technology is expected to be used a lot, like microservices scaled horizontally across thousands of VMs.

Startup time is another red herring. If OpenJDK library code is not invoked then it won't slow you down having it in your disk image (you might possibly see slightly worse paging of the text section but that's going to be a micro effect).

Perhaps download time and costs are significant for you relative to development and maintenance costs. I find that unlikely but I cannot rule it out. As I said, do switch to Go if it suits your needs better. I am just pointing out that 1) this is not a one-way street but a trade-off and 2) your assumptions about where the costs and opportunities/need for improvement lie were incomplete and missing important elements.

@cyraid

cyraid commented Mar 28, 2021

Kotlin/Native produces about a 500K Hello World without debug info? How do they do it?

Edit: Upon further inspection, if you go to this reddit thread, you will see a comment from the Kotlin team saying the two are not competing with each other but serve two different types of use cases.

Perhaps you might want to go to Kotlin/Native? It can also use Java JARs, right?

@rubyFeedback

rubyFeedback commented Dec 29, 2021

upx has been mentioned in other threads as well, and I don't mind the large file size.

native-image is pretty nice, works well (at the least so far that I have used) and is
fast. Storage is almost never a bottleneck IMO on modern computer systems. Perhaps
on embedded, but ... I here have a cheap 3TB harddisc and that's already several
years old. I think storage-size wise all is fine.

Still, small is beautiful, and perhaps the GraalVM team could consider integrating
either upx, or something similar to upx, with that specific goal (reduce file size)
and perhaps make it available via some commandline variant too such as
--small or something like that. That way we could skip another extra step.
Right now I have to go to the upx homepage, download this, install it and
hope that it works. A commandline flag by default in native-image would be
more convenient though.

I'll explore upx but hopefully the GraalVM team considers this here, even if the
issue is +3 years old - I still think, even if not hugely important, small file size
CAN be useful (for instance, for downloads too, on any area of the world where
you can only download slowly, so that would be one use case; I am sure you
can think of many more use cases where that may seem useful, even if
on modern systems file size really very rarely is any bottleneck as such).

cyraid mentioned kotlin, and that's a fine comment, but I would like to add that
one big sell of GraalVM is kinda the "use any programming language". Ok ok
not every language works, I get it ;) but if you go from this point of view then
I think no individual language should necessarily be put "above" the other
languages, usage-wise. I get that kotlin is closer to java than the others, but
I have a ruby background, I am sure others have a python background, others
a javascript background etc ... - so ideally the "polyglot" focus should put these
languages on the "same" level whenever possible. I agree with him in regards
to the hello world example - as said, it's not any issue for me, but the
"helloworld" binary native-image generated here has 15MB. I'll see to chop off
stuff via upx soon, but the GraalVM team should take that into consideration
and see how much they could also omit, if that is possible too. 15MB seems
a bit much - is that all really necessary 1:1? I understand the issue is not
about the text output "hello world", but the associated tooling, but even then
it's kind of much, in my opinion. But, it's not such a big deal anyway, just
something to keep in mind for the future, IMO.

@cosmicdan

cosmicdan commented Jan 1, 2022

UPX is not a solution. Not only is it an external compressor that has nothing to do with the JVM, but executable compression always adds measurable decompression time, which means the native Java executables will take longer to start up, greatly diminishing one of the main use cases for natively compiled Java applets.

UPX is widely known; anybody who knows anything about compression will be familiar with it. It's not necessary to pollute the GraalVM build system with another dependency that users can easily find and plug in themselves. Size is important, but not at the expense of any performance. UPX is a band-aid, not a solution.

One of the main attractions for native executables is embedded systems, where space AND performance are at a premium. If you want to design a KIOSK system for example, you always needed to bundle a full JRE with it, which makes deployment more complicated and adds another layer of vulnerability. So it's important that any executable size improvements have zero cost to performance, otherwise what's the point - just use a JRE and get all that advanced JIT and GC goodness tuned up.

We all need to remember that this is a pretty crazy project - it can take practically any existing Java code since forever and remove the VM from it, making it run natively. In my opinion, it's pretty amazing that the executables are already this small!

Hopefully someone figures out something, but I honestly wouldn't be surprised if this is the best we can get without leaving Java behind. I don't mind the executable size, personally - I've worked around it by using one executable with many entry points rather than compiling many individual executables.

EDIT: If you are using Java and don't need polyglot in Graal native exe's, consider IBM's Quarkus/Mandrel for smaller exe's (it is a fork of Graal VM): https://quarkus.io/guides/building-native-image - though it is container based so yeah, not as simple.

@vjovanov vjovanov assigned christianwimmer and unassigned vjovanov Jan 27, 2022
@gocursor

gocursor commented Aug 6, 2022

Info on UPX: tried UPX on native image GraalVM Hello World app (64-bit Windows) and the UPX compressed EXE does not work (does not print Hello World).

@thomaswue
Member

@cosmicdan The Mandrel distribution of GraalVM does not produce smaller executable files compared to the GraalVM Community Edition.

We are currently investigating how we can provide a compression mechanism for executables built into the native image generator that is independent of external tools like UPX.

Primary contributors to native image sizes are certain parts of the JDK libraries, for example time zone and localization data. This needs to be taken into account when comparing with a "hello world" built with, for example, Kotlin Native, as those executables are missing those elements.

@mikehearn
Contributor

The Avian project used to make a kinda micro-JVM that could (with compression) make GUI binaries that were ~1mb in size. That JVM fully supported Java 8 and had a reasonably sophisticated GC. How:

  1. They could use the OpenJDK libs but also had a reimplementation of parts of the JDK libraries to make them more loosely coupled, less dynamic and less featured. If you don't need timezones/localization, well they just didn't offer it in those libs.
  2. Used SWT which is quite small because the core GUI toolkit is part of the OS.

The interesting part was the lite libraries. There's probably uses for something like that, for smaller apps where you don't need many of the features or can rely on thin wrappers around the OS instead of pure Java reimpls. Kotlin/Native can make smaller binaries because there's virtually no standard library.

@cyraid

cyraid commented Aug 17, 2022

@mikehearn Interesting. It would've been nice to have more awareness around that project. Seems you could also have a standalone executable which ran the micro VM and executed the main method in the same executable.

@mikehearn
Contributor

@cyraid It could indeed do that (bundle into a single EXE).

@acodervic

acodervic commented Feb 12, 2023

Is there a link? I want to know how he did it (400k hello world). Thanks!

@FireController1847

FireController1847 commented May 26, 2023

+1 on this and @acodervic's comment. Would be curious for a status update on this front.

A "Hello, world!" program taking up around ~12MB seems like a significant amount; surely there must be some optimization that is able to be done on the backend that could reduce the output file sizes for situations like that, especially considering we're compiling to native execution. Were this to be a JDK optimizer, I'd say anything else, but I find it interesting the amount of extra data being used for situations like these. Notably, as stated this only really affects smaller projects and programs.

With that said, even a program like Notepad (the modern notepad) only uses ~900-1kb as a native executable. Now that's not particularly a fair comparison, considering Notepad utilizes the .NET Framework of the Windows computer running it. Nevertheless, I would consider this to be a comparison since the field we are discussing are native binary images, which at this point a large focus on optimization would be necessary (as, if you're going through the effort to create a native image, then you must need optimization of some form).

With all of this said, I've noticed that there are significant performance increases found by compiling to a native image. So, this absolutely is more of a "nice to have" rather than a "need to have."

@cosmicdan

cosmicdan commented Jul 4, 2023

even a program like Notepad (the modern notepad) only uses ~900-1kb as a native executable

Hate to nitpick but this is false, it's not a native app - it's a UWP app. That EXE is just a stub, and the UWP app still depends on the Windows Runtime. Additionally, "classic" Notepad on Windows 10 is a ~200kb exe (plus a ~100kb resource file), and even then I believe it still depends on many other Windows DLL's to actually run (like most Windows EXE's).

So this is not a fair comparison, considering native Java executables are completely static and standalone.

It's enough of an argument to compare graal EXE's with other language "native EXE's" such as from Kotlin or other Java-AOT-compilers.

I believe the actual problem here is already mentioned; the "usage discovery" of the compiler is not aggressive enough in eliminating unused classes. Something like that. I guess it just isn't a huge priority is all.

I've noticed that there are significant performance increases found by compiling to a native image.

Only for short-lived or "one shot" type applications. Remember that compiling to native means you lose all of the benefits that a long-running VM can provide with modern JIT and GC. It could result in fewer GC pauses (stutter) though, if you've not spent the time tuning GC for your application or doing your own "GC-friendly optimization" on GC-sensitive parts of your codebase. See https://github.com/ByerN/libgdx-graalvm-example for an example with results where a Java game was converted to Native EXE.

@mikehearn
Contributor

The points-to analysis is pretty smart from what I understand. There might be some more juice there but I doubt it.

The big hammers exist but might not make sense given how native-image is used and the cheapness of bandwidth/disk space:

  • Split the outputs into shared libraries and inline less or not at all across the boundaries. Cost: runtime performance.
  • Encode very cold paths as bytecode and embed a small interpreter. I'd be willing to bet a lot that the bulk of the code size goes on code that's theoretically reachable but in practice hardly ever or never executed.
  • Split code into regions based on temperature and then actually download this code on demand. Like paging but from a remote server.

etc
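
For the second bullet (cold paths as bytecode plus a small interpreter), here is a very rough sketch of the idea. The opcodes and class name are invented for illustration and have nothing to do with JVM bytecode or with anything native-image does today; the point is only that a rarely executed computation can be stored as a few bytes of data and run by a shared interpreter loop instead of being compiled to machine code.

```java
class ColdPathInterpreterSketch {
    // Hypothetical opcodes for a toy stack machine.
    static final byte PUSH = 0; // push the next byte as an int
    static final byte ADD = 1;  // pop two values, push their sum
    static final byte MUL = 2;  // pop two values, push their product
    static final byte RET = 3;  // return the top of the stack

    // Interprets the program and returns the value on top of the stack at RET.
    static int run(byte[] code) {
        int[] stack = new int[16];
        int sp = 0;
        int pc = 0;
        while (true) {
            byte op = code[pc++];
            switch (op) {
                case PUSH:
                    stack[sp++] = code[pc++];
                    break;
                case ADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case MUL: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case RET:
                    return stack[sp - 1];
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Nine bytes encoding (2 + 3) * 4.
        byte[] coldPath = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, RET};
        System.out.println(run(coldPath));
    }
}
```

The trade-off is the one stated in the bullet: the interpreter loop is paid for once, each cold path shrinks to its encoded form, and anything that runs this way is slower than compiled code.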

@FireController1847

FireController1847 commented Jul 4, 2023

@cosmicdan

So this is not a fair comparison, considering native Java executables are completely static and standalone.

I don't wish to argue this here, since it's not really the place for it, but I do wish to point out that I explicitly stated that in my reply:

Now that's not particularly a fair comparison, considering Notepad utilizes the .NET Framework of the Windows computer running it.

And proceeded to elaborate on why I claim it's a valid comparison:

Nevertheless, I would consider this to be a comparison since the field we are discussing are native binary images, which at this point a large focus on optimization would be necessary (as, if you're going through the effort to create a native image, then you must need optimization of some form).

To elaborate further on why I would say this is a valid claim, you must first consider why people are making native images in the first place. There could be many reasons, but the two primary that I have found in my research are 1) performance and 2) size. Native images come with the benefits of not being platform-independent, meaning that native images have the benefits of utilizing the resources different platforms provide. This does include, in my opinion, things like the Win32 API and other systems' native calls.

What would be the purpose of compiling a platform-independent application using native-image? There would be virtually no benefits if the entire JVM needed to be embedded into the compiled executable, considering that very thing is an entirely different field already anyways (obviously, this is an exaggeration, but my point is clear). At that point, you might as well begin working in a language like C or C++, which should already be a consideration if you need (or want to) compile to a native-image. My reasoning behind using a Java native-compiler would be the ease-of-use the language provides to developers, as compared to C++. Nevertheless, it's a constant consideration in the back of my head; at this level of compilation, all options are always being considered.

For a project like GraalVM's native-image, I imagine one of the biggest limiting factors would be the large amount of work that is required to add support for the native APIs and reduce the overhead from other factors of Java bytecode compilation, due to the nature of Java being platform-independent. Nevertheless, it's important to consider the reasoning behind the project in the first place, and if there would be no potential gains from compiling rather than embedding, why bother?

@cosmicdan

cosmicdan commented Jul 4, 2023

The points-to analysis is pretty smart from what I understand. There might be some more juice there but I doubt it.

If indeed this is true, then what else would be the cause of the large output - the only thing I can think of is that the JDK itself is still too tightly-coupled. If so, then I suppose the only way possible to "reduce the exe size" is to go the shared library route and start building individual libraries for each Java Runtime it needs to link against. This would at least reduce the footprint when using many binaries on one system, but introduces a whole new group of problems regarding dependency management...

...I ended up making this problem redundant for my use case; embedded Linux and IoT-ish things (where I only have about 20MB to 100MB of free storage space) - rather than literally compiling a binary for every console tool I desired I just bundled them all into one and used symlinks to achieve the result of "many individual binaries".
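
A minimal sketch of that "one binary, many tools" layout, for anyone curious. The tool names and the class are hypothetical, and one thing is deliberately left out: a plain Java main does not receive the name the program was invoked under (argv[0]), so this sketch assumes the name arrives as the first argument, e.g. from a small stub launcher or wrapper script behind each symlink.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;

class MultiCallSketch {
    // Hypothetical tools; each takes its arguments and returns its output.
    static final Map<String, Function<String[], String>> TOOLS = Map.of(
        "hello", a -> "Hello, " + (a.length > 0 ? a[0] : "world"),
        "count", a -> Integer.toString(a.length));

    // Strip any directory prefix so "/usr/bin/hello" selects the "hello" tool.
    static String toolName(String invokedAs) {
        int slash = Math.max(invokedAs.lastIndexOf('/'), invokedAs.lastIndexOf('\\'));
        return invokedAs.substring(slash + 1);
    }

    // Look up the tool by the name the binary was invoked under and run it.
    static String dispatch(String invokedAs, String[] args) {
        Function<String[], String> tool = TOOLS.get(toolName(invokedAs));
        if (tool == null) {
            throw new IllegalArgumentException("unknown tool: " + invokedAs);
        }
        return tool.apply(args);
    }

    public static void main(String[] args) {
        // Assumption: args[0] carries the invoked name (see the note above).
        if (args.length == 0) {
            System.err.println("usage: <tool> [args...]");
            return;
        }
        System.out.println(dispatch(args[0], Arrays.copyOfRange(args, 1, args.length)));
    }
}
```

All tools then share one copy of the runtime and JDK code in the image, which is where the space saving comes from.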

Latest GraalVM 22.x produces a 5MB or so Hello World executable IIRC; that's not THAT bad. Maybe we're just asking too much from this old language/runtime? 😄

For a project like GraalVM's native-image, I imagine one of the biggest limiting factors would be the large amount of work that is required to add support for the native APIs and reduce the overhead from other factors of Java bytecode compilation, due to the nature of Java being platform-independent. Nevertheless, it's important to consider the reasoning behind the project in the first place, and if there would be no potential gains from compiling rather than embedding, why bother?

Can completely agree with that. Sounds like the majority of the binary is related to the plumbing behind bytecode-converted-to-native. Maybe there's just no demand from paying Oracle clients to make it any better than it is, and really that's a fair enough reason for their devs to have no time to make this perfect.

@mikehearn
Contributor

Yes if you have many small CLI tools, having them in one binary is the way to go. It's almost like having shared libs, but simpler. We've done some experimental support for this in Conveyor where it uses a small stub launcher and it works well enough.

I guess in your embedded use case, downloading code (pages) on demand isn't possible?

@mikehearn
Contributor

Oh, w.r.t. why so big. I think (not an expert) it's a combination of:

  1. Instructions for things that are needed in managed code like safepoints and associated spilling, exception handling code etc. In the interpreter it needs only a few bytecodes to express something like throw new IllegalStateException("foo " + bar); and it'll never get compiled anyway, but in native code it needs lots more bytes to express.
  2. Graal removing Java overheads using a lot of inlining. So what you may think of as a method that calls lots of other methods and thus would be small, actually becomes quite large (but fast).
  3. The JDK isn't designed for deadcode elimination and there are lots of places where it's sub-optimal e.g. the Charsets class that makes lots of codecs statically reachable even if never used. Maybe native image is smart enough to tackle that, it's just an example.
  4. JDK duplicating a lot of stuff offered by the OS like, indeed, character encoding.
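
Point 3 is easier to see with a toy example. This is not the real Charsets code, just a hypothetical registry written in the same style: the static initializer references every implementation, so anything that touches the registry makes all of them statically reachable, whereas the direct factory method only references the one class it returns. Whether a particular analysis can see through the eager version is a separate question that this sketch does not answer.

```java
import java.util.HashMap;
import java.util.Map;

class CodecReachabilitySketch {
    interface Codec {
        String name();
    }

    static final class Utf8Codec implements Codec {
        public String name() { return "UTF-8"; }
    }

    static final class Latin1Codec implements Codec {
        public String name() { return "ISO-8859-1"; }
    }

    static final class Big5Codec implements Codec {
        public String name() { return "Big5"; }
    }

    // Eager style: using this class for ANY codec runs the static
    // initializer, which references every implementation.
    static final class EagerRegistry {
        private static final Map<String, Codec> ALL = new HashMap<>();
        static {
            ALL.put("UTF-8", new Utf8Codec());
            ALL.put("ISO-8859-1", new Latin1Codec());
            ALL.put("Big5", new Big5Codec());
        }

        static Codec lookup(String name) {
            return ALL.get(name);
        }
    }

    // Direct style: a caller that only needs UTF-8 references only Utf8Codec.
    static Codec utf8() {
        return new Utf8Codec();
    }

    public static void main(String[] args) {
        System.out.println(EagerRegistry.lookup("UTF-8").name());
        System.out.println(utf8().name());
    }
}
```

Both calls in main return the same kind of codec, but the first one drags Latin1Codec and Big5Codec along with it as far as static references are concerned.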

You could chip away at this in lots of ways, but it's a losing battle to use conventional techniques IMO. My own app has >100mb of just bytecode, god knows how big a native image would be. Java is easy and productive with lots of libs. The feature count of modern apps can grow uncontrollably and that's a good thing, but it means code will always grow faster than you can cut it down. Better to investigate big hammers like partial compression, using bytecode+interp to shrink code, on-demand paging from remote servers etc.

@adinn
Collaborator

adinn commented Jul 4, 2023

@mikehearn I believe Thomas Wuerthinger identified the main problem here. There is still a lot of code and data linked into the final image because the JDK runtime requires it to be present in order to ensure that the runtime can execute it to prepare for a host of possible things the app might in principle do but in fact does not actually do.

In theory, a deeper analysis could remove a lot of this unnecessary code and data. In practice, the analysis has to complete in an acceptable time and this means that stuff that could be removed does not always get removed.

The problem is exacerbated by the fact that the JDK runtime is a bundle of many different libraries, which depend on each other in a complex, multi-linked network. These libraries are structured in a fairly coarse hierarchy, with a base set of core libraries then other libraries layered over them. This organization is visible in the module files introduced with the Java platform module system. However, the size of java.base makes it very clear that there is no finely graded inclusion model for the runtime.

That's hardly surprising given the scale and scope of the runtime and the number of developers working on it. That does not mean that there is no long term goal to minimize the runtime and reduce dependencies across modules and within them. In particular, reduction and simplification of JDK runtime non-API classes has been a goal of the OpenJDK project for a very long time. The problem has been weaning users off relying on internal implementations (by enforcing module restrictions), which has gone very slowly. This is likely to happen more and more in upcoming releases and it will very likely provide an opportunity to help with the image size problem Graal faces.

@NCLnclNCL

5 mb and 12 mb

@tnikolai2

Native image for .net 8 asp.net core web api "hello world rest app" has only 10mb and fast compilation.
Spring/quarkus/micronaut are significantly inferior in size, compilation speed, memory consumption, and even performance.
according to tiobe index c# will soon overtake java

@fniephaus
Member

Native image for .net 8 asp.net core web api "hello world rest app" has only 10mb and fast compilation.

We are working on both, smaller image sizes (try Oracle GraalVM) and faster compilation (see #7626). It seems the .NET core libraries are quite well designed for AOT use cases. We are trying to achieve the same for the JDK, but that takes time.

Spring/quarkus/micronaut are significantly inferior in size, compilation speed, memory consumption, and even performance.

Nonetheless, Spring/Quarkus/Micronaut are very popular frameworks, and they run on the OpenJDK and not on .NET. Native Image can already improve all metrics you mentioned when compared with the OpenJDK. Even compilation speed can be better, for example if you consider JIT compilation overheads across potential hundreds of deployments of the same app. Native Image compilation just needs to happen once and at build-time.

according to tiobe index c# will soon overtake java

The GraalVM project aims to improve the Java ecosystem, and I think the Java community is excited about this.
Python has been the top language in the tiobe index for years, yet there are many other languages with lots of rich and powerful ecosystems and communities.

@NCLnclNCL

Native image for .net 8 asp.net core web api "hello world rest app" has only 10mb and fast compilation.

We are working on both, smaller image sizes (try Oracle GraalVM) and faster compilation (see #7626). It seems the .NET core libraries are quite well designed for AOT use cases. We are trying to achieve the same for the JDK, but that takes time.

Spring/quarkus/micronaut are significantly inferior in size, compilation speed, memory consumption, and even performance.

Nonetheless, Spring/Quarkus/Micronaut are very popular frameworks, and they run on the OpenJDK and not on .NET. Native Image can already improve all metrics you mentioned when compared with the OpenJDK. Even compilation speed can be better, for example if you consider JIT compilation overheads across potential hundreds of deployments of the same app. Native Image compilation just needs to happen once and at build-time.

according to tiobe index c# will soon overtake java

The GraalVM project aims to improve the Java ecosystem, and I think the Java community is excited about this. Python has been the top language in the tiobe index for years, yet there are many other languages with lots of rich and powerful ecosystems and communities.

The reason dotnet compiles faster and produces lighter output is that it only compiles the code reachable from the Main function, and it uses multi-threaded compilation for AOT.
