Ok with faster, compiled, Julia code? #291

Open
PallHaraldsson opened this issue Aug 13, 2022 · 7 comments

Comments

@PallHaraldsson

PallHaraldsson commented Aug 13, 2022

Hi,

I'm willing to look into improving the timings for the Julia language, but I want to use tricks that are (unfairly, I think) disallowed at the Debian benchmark game, at least currently.

Julia is currently optimized for long-running code and has a) a high startup cost for the runtime itself, plus b) some cost for compiling the benchmarked code. That means Julia with default options can't win some benchmarks, such as "hello world", although a small, fast compiled "hello world" program has already been made with Julia.

Would compiled code be OK here, unlike at Debian? PackageCompiler.jl exists to do that, and it seems you're working in that direction; at least I saw a merged "precompile" PR here, but I'm unsure whether it's already used.

Another option is a non-default sysimage, but without the benchmark code in it. OK? That's basically the same as a non-default Julia runtime, or a fork of Julia (keeping compatibility with the same Julia code).

One more option is mimalloc or other malloc, modifying the Julia binary. Ok? #257

I see Debian used 50000000 for nbody, while you have 5000000 and 500000, so here Julia is nowhere close to the lead because of the startup overhead, unlike at Debian (where it is at 1.0x). However, Debian also has another category, where it goes down to 0.5x:

hand-written vector instructions | "unsafe"

https://programming-language-benchmarks.vercel.app/problem/nbody

@hanabi1224
Owner

hanabi1224 commented Aug 15, 2022

> That means Julia on default options can't win some benchmarks, such as "hello world", but a small/fast compiled such program has already been made with Julia.

Do you have numbers that support this point?

> at least I saw a merged "precompile" PR here, but unsure if it's already used.

'Precompile' is already in use. AOT compilation is not, because it's way too slow.

> Another option is a non-default sysimage

Hmm, only if it's officially recommended that everyone should use their own sysimage

@PallHaraldsson
Author

PallHaraldsson commented Aug 15, 2022

> Do you have numbers that support this point?

Yes, https://discourse.julialang.org/t/successful-static-compilation-of-julia-code-for-use-in-production/79318

> Most of the small examples above weigh in at about 16 kB, which is thus a lower limit to the library size. My full code ended up with a 20 kB library, which is really quite respectable and surprisingly close to the small examples.

You probably have PackageCompiler.jl in mind when you say AoT (and yes, it was slow last time I checked). From memory, StaticCompiler.jl was much faster when I made a "Hello world" binary of about that size, and it's also "AoT", so it's better to be specific about which one is meant. I'm not actually proposing the latter for now, just stating that Julia's default startup is too much overhead for some code.

From Julia's official docs (i.e. sysimages are certainly an official option):

Julia ships with a preparsed system image containing the contents of the Base module, named sys.ji. This file is also precompiled into a shared library called sys.{so,dll,dylib} on as many platforms as possible, so as to give vastly improved startup times. On systems that do not ship with a precompiled system image file, one can be generated from the source files shipped in Julia's DATAROOTDIR/julia/base folder.

This operation is useful for multiple reasons. A user may:

  • Build a precompiled shared library system image on a platform that did not ship with one, thereby improving startup times.
  • Modify Base, rebuild the system image and use the new Base next time Julia is started.
  • Include a userimg.jl file that includes packages into the system image, thereby creating a system image that has packages embedded into the startup environment.

The PackageCompiler.jl package contains convenient wrapper functions to automate this process.

Not only do Julia's official docs reference that package, but it's also made by Kristoffer, one of Julia's core developers. I do not, however, know of any specific official custom sysimage. Since Julia ships with a sysimage on "as many platforms as possible", i.e. all supported Julia platforms, building your own is not a widely known or used option, I think.

Until maybe: https://discourse.julialang.org/t/a-julia-dataanalysis-sysimage-from-packagecompiler-its-so-easy-you-should-do-it-too/68127

https://julialang.github.io/PackageCompiler.jl/dev/sysimages.html

Julia ships with a sysimage that is used by default when Julia is started. That sysimage contains the Julia compiler itself, the standard libraries, and also compiled code that has been put there to reduce the time required to do common operations, like working in the REPL.

Sometimes it is desirable to create a custom sysimage with custom precompiled code. This is the case if one has some dependencies that take a significant time to load or where the compilation time for the first call is uncomfortably long. This section of the documentation is intended to document how to use PackageCompiler to create such sysimages.
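As a rough illustration of what the quoted documentation describes, here is a minimal sketch of building a custom sysimage with PackageCompiler.jl's `create_sysimage`. The file names `bench_sysimage.so` and `warmup.jl` and the choice of `Printf` are placeholders made up for this example, not something used in this repository, and the listed packages are assumed to be dependencies of the active project.

```julia
# Sketch only: requires the PackageCompiler.jl package to be installed.
using PackageCompiler

# Hypothetical file names, chosen for illustration.
sysimage_file = "bench_sysimage.so"   # use .dll on Windows, .dylib on macOS
warmup_script = "warmup.jl"           # script that exercises the code to be precompiled

# Placeholder package list; these must be dependencies of the active project.
packages = ["Printf"]

# Build a sysimage that bakes in the listed packages plus whatever
# gets compiled while the warm-up script runs.
create_sysimage(
    packages;
    sysimage_path = sysimage_file,
    precompile_execution_file = warmup_script,
)
```

Julia can then be started with the result via `julia --sysimage bench_sysimage.so program.jl` (where `program.jl` is again a placeholder). Building the image is the slow step, but it only has to happen once per image.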

https://discourse.julialang.org/t/packagecompile-system-image-for-different-computers/47319/4

I just found a different tool:
https://github.com/terasakisatoshi/sysimage_creator

One more way to improve startup:
https://discourse.julialang.org/t/juliad-and-julias-wrappers-to-daemonmode-and-j-sysimage-for-quicker-linux-cygwin-scripts/76690/2

@hanabi1224
Owner

hanabi1224 commented Aug 15, 2022

Wow, so many tools! But which one is officially recommended to solve the exact problem here?

I have enabled AOT with PackageCompiler.jl for a few problems in this PR to demonstrate the perf implications, but I cannot do that for all problems because it's toooooooo slow.

@PallHaraldsson
Author

Thanks for adding!

I'm still a bit confused by:

https://programming-language-benchmarks.vercel.app/problem/nbody

julia | 7.jl | 565ms | 0.7ms | 176.1MB | 527ms | 113ms | julia/aot 1.7.3
julia | 7.jl | 759ms | 1.0ms | 223.5MB | 727ms | 110ms | julia 1.7.3
julia | 7.jl | 263ms | 3.6ms | 173.9MB | 230ms | 110ms | julia/aot 1.7.3
julia | 7.jl | 446ms | 1.2ms | 225.2MB | 400ms | 123ms | julia 1.7.3

Do you have outdated numbers there, and why were they slow to begin with?

I guess I'm looking at 1 - 263/446 ≈ 41% less time, or even 1 - 263/759 ≈ 65%?
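To make that arithmetic explicit, here is a small sketch in Julia using the timings from the four rows quoted above. The helper names `time_reduction` and `speedup_factor` are invented for this example. It assumes the first two rows and the last two rows belong to different input sizes, so it only compares julia/aot against plain julia within each pair.

```julia
# Timings in milliseconds, copied from the rows quoted above.
aot_a, jit_a = 565, 759   # first pair of rows (julia/aot vs. julia)
aot_b, jit_b = 263, 446   # second pair of rows (julia/aot vs. julia)

# Hypothetical helpers, not part of any package.
time_reduction(new, old) = 1 - new / old   # fraction of wall time saved
speedup_factor(new, old) = old / new       # how many times faster

for (label, aot, jit) in (("first pair", aot_a, jit_a), ("second pair", aot_b, jit_b))
    saved = round(100 * time_reduction(aot, jit); digits=1)
    factor = round(speedup_factor(aot, jit); digits=2)
    println("$label: $saved% less time, $(factor)x faster")
end
```

Within each pair this gives roughly 26% and 41% less wall time. The 65% figure comes from dividing a time from one pair by a time from the other, so under the assumption above it is not a like-for-like comparison.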

And now Julia is 3rd, ahead of Rust (previously 13th, 29% slower), on:
https://programming-language-benchmarks.vercel.app/problem/nsieve

It is only 9% slower than the best, cpp, which is explained in full by "time(sys)"; if that could be eliminated, it would be 19% faster.

I would just stick with PackageCompiler.jl, at least for now, maybe using its options to strip out stdlibs, if you didn't already.

@PallHaraldsson
Copy link
Author

PallHaraldsson commented Aug 18, 2022

> I cannot do that for all problems only because it's toooooooo slow.

I've announced the result as is, but can you enable it for all the programs? I'm confused: the compilation is run just once per program and the result stored, so isn't it a non-issue? If not, it could be...

@oscardssmith

Note that a lot of the speed issues will be significantly helped by JuliaLang/julia#46045

@PallHaraldsson
Author

One more thing: Julia 1.8.0 is out, so you may want to benchmark with that one. If it's uniformly faster than 1.7.3, then just drop 1.7.3; otherwise, could you keep both in?


3 participants