sbt 1.x Performance improvement targets #4012
Comments
Some learnings from Mill:
Mill/Ammonite use ScalaParse to split expressions in the top-level script file, but even that takes a while to initialize the first time purely due to classloading, and so it caches the output of that splitting step so later runs can skip it.
Would swapping in a fast JSON serializer be a solution here? Mill uses uPickle, which in my arbitrary benchmarks (which are more complex/branchy/intricate than typical cache JSON, which is full of long strings) does 65-70 MB/s on my MacBook by default, and 85-90 MB/s if you cache the implicitly constructed serializer objects (Mill does when hot).

If you don't want to use uPickle, Circe's performance is within a factor of 2 (~60 MB/s?), and Play-Json within a factor of 3-4 (~30 MB/s?), which may be enough to make serialization disappear off your profiles.

One thing that Mill faced, and that your profiles show SBT facing too, is classloading time. There's not much to do here except to aggressively cache things so that cached startups don't need to load as many classes.

Notably, heavy dependencies like Scalaz or Cats work against you in trying to fight classloading time: even if you don't run much code, just touching the library in a few places is enough to force the bulk of classloading to take place. That's one reason in Ammonite/Mill I have been very aggressive about culling libraries with large transitive dependency graphs in favor of 0-dependency libraries like uPickle.
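A minimal sketch of the "cache the implicitly constructed serializer objects" point, assuming uPickle is on the classpath (CacheEntry and UPickleSketch are made-up names, not anything from sbt or Mill): deriving the ReadWriter once in a companion object means the serializer is materialized a single time and reused, instead of being re-derived at every call site.

```scala
import upickle.default.{ReadWriter, macroRW, read, write}

// Made-up stand-in for a cache entry; real sbt/Mill cache payloads are much larger.
case class CacheEntry(module: String, dependencies: Seq[String], checksum: String)

object CacheEntry {
  // Derive the serializer once and reuse it, rather than letting the
  // implicit be re-materialized at every call site.
  implicit val rw: ReadWriter[CacheEntry] = macroRW
}

object UPickleSketch extends App {
  val entry = CacheEntry("core", Seq("org.scala-lang:scala-library"), "abc123")
  val json  = write(entry)           // serialize to a JSON string
  val back  = read[CacheEntry](json) // deserialize back into the case class
  assert(back == entry)
}
```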
@lihaoyi Thanks for the tips! These are all useful points to consider.
With regards to cold performance (both of SBT and Scalac), it is worth thinking about how to integrate the now-free-in-OpenJDK feature, Application Class Data Sharing, to shave a few hundred millis off startup time. I'm also keenly waiting for JWarmup to persist JIT profiles from a previous run to speed up warmup.
I had a little go with the
More experiments, little progress:
Did you guys abandon the CP sharing idea?
Would it be appropriate for sbt to use …?

Without …:

With …:

With …:

With …:
Have you checked actual compilation? Try running a loop of compilations (e.g. …).
Indeed, the compilation performance is not impressive 👎
I've been reading through advancedThresholdPolicy.hpp
During the last weeks I spent a bit of time analyzing the performance of starting an already fully cached sbt build (i.e. taking the times with

time sbt exit && time sbt exit && time sbt exit

and ignoring the first result). I probably won't have time to act on these, but I wanted to dump them here for anyone who wants to help improve the performance.

Calculating aggregations is slow. Aggregations are cached at the beginning. For each key, the settings structure is queried for "What's the value of aggregate in X?". This calculation is somewhat slow because aggregation usually falls back to the default value in global scope, and to find that value the whole delegation chain usually has to be walked (a sketch of memoizing such lookups follows below).

Validating key references is slow. Before evaluating settings, sbt validates whether the settings dependency structure is valid. This is slow because it uses slow generic abstractions like the AList and creates lots of garbage. I haven't checked, but I wonder whether it is necessary at all to check dependency chains before settings evaluation, or whether dependency problems could instead be collected while evaluating the settings tree.
Parallel settings evaluation is ineffective. Settings are evaluated concurrently using the INode setup. In my tests, I didn't observe any notable parallelism even in cases with at least two slow tasks that didn't seem to depend on each other. I haven't looked into this deeply, but I suspect a few issues with parallel settings evaluation.

I had a try at removing the parallel execution completely, which seemed to work and wasn't slower, in my limited testing at least. However, it wasn't quite correct, as some builds (cinnamon) started to fail with unresolvable setting dependencies. (I thought that after the topological sort, dependencies should always have been executed before their dependees, but that assumption might be wrong, e.g. for Bind nodes.) Parallel setting execution might still be worth it on some builds, though.
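For reference, a minimal sketch of what "remove the parallelism" amounts to, with a generic node type rather than sbt's INode: order the settings so that every dependency is evaluated before its dependees and run them sequentially; anything that can't make progress surfaces as the same kind of "unresolvable setting dependencies" failure mentioned above.

```scala
// Hypothetical setting node: a name, the names it depends on, and a function
// computing its value from the values of those dependencies.
final case class Node(name: String, deps: List[String], compute: Map[String, Any] => Any)

object SequentialEval {
  // Kahn-style evaluation: repeatedly evaluate every node whose dependencies
  // are already available, until nothing is left (or nothing can make progress).
  def evaluate(nodes: List[Node]): Map[String, Any] = {
    var remaining = nodes
    var env       = Map.empty[String, Any]
    while (remaining.nonEmpty) {
      val (ready, blocked) = remaining.partition(_.deps.forall(env.contains))
      if (ready.isEmpty)
        sys.error(s"unresolvable setting dependencies: ${blocked.map(_.name).mkString(", ")}")
      ready.foreach(n => env += n.name -> n.compute(env))
      remaining = blocked
    }
    env
  }
}
```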
splitExpressions during parsing of .sbt files is slow because a Scala compiler has to be initialized for parsing. I played around with caching the results of splitExpressions in jrudolph@b7a9f74, which seemed to work quite well (I didn't figure out how to use sbt's caching infrastructure correctly, though; a sketch of the content-keyed caching idea is at the end of this comment).

As already noted in Performance observations (sbt 1.0 startup performance) #3694, a main performance problem is the (de)serialization of the update task caches (which are accessed to build the meta-project and the project itself). There are several issues:

Tracked.lastOutput isn't well-suited for the task at hand, as the cache reading code runs markAsCached, to set a flag that the data comes from the cache, only after the data was read, and afterwards the whole data that was just deserialized from the cache is written back to the cache again (see jrudolph@3836732 for a rather manual attempt to fix that, and the sketch at the end of this comment).

… the update task when run in the project itself. (Imo, using a binary format like protobuf for the update cache makes the most sense, as that will likely be faster than any JSON representation.)

I attached a zip with the flamegraphs captured with async-profiler, one including GC and JIT compilation threads and one with only the Java threads.
flame-graphs.zip
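As an illustration of the splitExpressions caching idea above, here is a minimal sketch with hypothetical helper names (this is not the change in jrudolph@b7a9f74): key the cached result on a hash of the .sbt file contents, so the compiler-backed split only runs when the file actually changed. A real implementation would also persist the cache across sbt invocations, e.g. via sbt's caching infrastructure.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest
import scala.collection.concurrent.TrieMap

// Hypothetical in-memory cache around an expensive "split this .sbt file into
// expressions" step; the real splitExpressions lives in sbt's build-file parser.
class SplitExpressionsCache(split: String => List[String]) {
  private val cache = TrieMap.empty[String, List[String]]

  def splitFile(file: Path): List[String] = {
    val contents = new String(Files.readAllBytes(file), "UTF-8")
    // Content hash as the cache key: identical contents => reuse the old result.
    cache.getOrElseUpdate(sha1(contents), split(contents))
  }

  private def sha1(s: String): String =
    MessageDigest.getInstance("SHA-1")
      .digest(s.getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString
}
```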
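And a minimal sketch of the control flow that would avoid the double write described for Tracked.lastOutput / markAsCached (hypothetical types, not sbt's Tracked API): on a cache hit, deserialize and return; serialize and write only when the value was actually recomputed. Cache invalidation (hashing the inputs) is left out for brevity.

```scala
import java.nio.file.{Files, Path}

// Hypothetical cached-task wrapper. The point is the control flow: a cache hit
// must not trigger re-serializing the very data that was just read.
class CachedTask[A](cacheFile: Path, serialize: A => Array[Byte], deserialize: Array[Byte] => A) {
  def getOrCompute(compute: () => A): A =
    if (Files.exists(cacheFile)) {
      // Cache hit: read and return, without writing the same bytes back out.
      deserialize(Files.readAllBytes(cacheFile))
    } else {
      // Cache miss: compute once, then persist the result for the next run.
      val value = compute()
      Files.write(cacheFile, serialize(value))
      value
    }
}
```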