Corpus for async #1

Closed
wants to merge 5 commits into from

@retronym (Owner) commented Mar 13, 2020

Comparing the status quo (scala-async macro) to the forthcoming compiler-integrated, post-erasure version.

We expect the new version to be faster to compile because:

  • It runs later in the compiler chain (after erasure), so it deals with simpler trees, and the heft of the trees it creates is seen by fewer subsequent phases
  • It now generates far fewer states when awaits are inside if/else or pattern matches.
    • This can have a knock-on benefit of needing to lift fewer locals into fields
  • It uses compiler-internal APIs for transformers, which avoids some overhead of the scala.reflect.api layer
  • Much of the implementation has been reworked to reduce allocations (i.e. using ListBuffers judiciously rather than temporary lists)
  • Some inefficiencies in compiler infrastructure have been remedied (tree attachments, typing transformers)

The results look good:

Compile time: 0.365x
Allocations: 0.67x

Take these with a pinch of salt, as the benchmark corpus is designed to stress async. Real applications containing relatively fewer async blocks would see smaller improvements in compile time.

Benchmark                                                    (corpusPath)  (corpusVersion)  (extraArgs)  (resident)  (scalaVersion)      (source)    Mode  Cnt         Score          Error     Units
HotScalacBenchmark.compile                                      ../corpus           latest                    false         2.12.10  async-legacy  sample  131       468.485 ±       11.269     ms/op
HotScalacBenchmark.compile:compile·p0.00                        ../corpus           latest                    false         2.12.10  async-legacy  sample            403.702                    ms/op
HotScalacBenchmark.compile:compile·p0.50                        ../corpus           latest                    false         2.12.10  async-legacy  sample            459.276                    ms/op
HotScalacBenchmark.compile:compile·p0.90                        ../corpus           latest                    false         2.12.10  async-legacy  sample            508.979                    ms/op
HotScalacBenchmark.compile:compile·p0.95                        ../corpus           latest                    false         2.12.10  async-legacy  sample            534.564                    ms/op
HotScalacBenchmark.compile:compile·p0.99                        ../corpus           latest                    false         2.12.10  async-legacy  sample            669.956                    ms/op
HotScalacBenchmark.compile:compile·p0.999                       ../corpus           latest                    false         2.12.10  async-legacy  sample            684.720                    ms/op
HotScalacBenchmark.compile:compile·p0.9999                      ../corpus           latest                    false         2.12.10  async-legacy  sample            684.720                    ms/op
HotScalacBenchmark.compile:compile·p1.00                        ../corpus           latest                    false         2.12.10  async-legacy  sample            684.720                    ms/op
HotScalacBenchmark.compile:·compiler.nmethodCodeSize            ../corpus           latest                    false         2.12.10  async-legacy  sample    6    126011.531                       Kb
HotScalacBenchmark.compile:·compiler.nmethodSize                ../corpus           latest                    false         2.12.10  async-legacy  sample    6    251547.297                       Kb
HotScalacBenchmark.compile:·compiler.osrBytes                   ../corpus           latest                    false         2.12.10  async-legacy  sample    6        21.299                       Kb
HotScalacBenchmark.compile:·compiler.osrCompiles                ../corpus           latest                    false         2.12.10  async-legacy  sample    6        54.000                  methods
HotScalacBenchmark.compile:·compiler.osrTime                    ../corpus           latest                    false         2.12.10  async-legacy  sample    6       348.016                       ms
HotScalacBenchmark.compile:·compiler.standardBytes              ../corpus           latest                    false         2.12.10  async-legacy  sample    6     10203.511                       Kb
HotScalacBenchmark.compile:·compiler.standardCompiles           ../corpus           latest                    false         2.12.10  async-legacy  sample    6     50396.000                  methods
HotScalacBenchmark.compile:·compiler.standardTime               ../corpus           latest                    false         2.12.10  async-legacy  sample    6    122174.371                       ms
HotScalacBenchmark.compile:·compiler.totalBailouts              ../corpus           latest                    false         2.12.10  async-legacy  sample    6         2.000                  methods
HotScalacBenchmark.compile:·compiler.totalCompiles              ../corpus           latest                    false         2.12.10  async-legacy  sample    6     50450.000                  methods
HotScalacBenchmark.compile:·compiler.totalInvalidates           ../corpus           latest                    false         2.12.10  async-legacy  sample    6           ≈ 0                  methods
HotScalacBenchmark.compile:·compiler.totalTime                  ../corpus           latest                    false         2.12.10  async-legacy  sample    6    122522.387                       ms
HotScalacBenchmark.compile:·gc.alloc.rate                       ../corpus           latest                    false         2.12.10  async-legacy  sample    6       169.508 ±        7.103    MB/sec
HotScalacBenchmark.compile:·gc.alloc.rate.norm                  ../corpus           latest                    false         2.12.10  async-legacy  sample    6  87433008.026 ±   149609.993      B/op
HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space              ../corpus           latest                    false         2.12.10  async-legacy  sample    6       175.200 ±      163.365    MB/sec
HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm         ../corpus           latest                    false         2.12.10  async-legacy  sample    6  90361469.094 ± 83291753.373      B/op
HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space          ../corpus           latest                    false         2.12.10  async-legacy  sample    6         2.411 ±       10.329    MB/sec
HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm     ../corpus           latest                    false         2.12.10  async-legacy  sample    6   1258719.547 ±  5408119.661      B/op
HotScalacBenchmark.compile:·gc.count                            ../corpus           latest                    false         2.12.10  async-legacy  sample    6        14.000                   counts
HotScalacBenchmark.compile:·gc.time                             ../corpus           latest                    false         2.12.10  async-legacy  sample    6       825.000                       ms
HotScalacBenchmark.compile:·rt.safepointSyncTime                ../corpus           latest                    false         2.12.10  async-legacy  sample    6       927.182                       ms
HotScalacBenchmark.compile:·rt.safepointTime                    ../corpus           latest                    false         2.12.10  async-legacy  sample    6      4249.463                       ms
HotScalacBenchmark.compile:·rt.safepoints                       ../corpus           latest                    false         2.12.10  async-legacy  sample    6     18349.000                   counts
HotScalacBenchmark.compile:·rt.sync.contendedLockAttempts       ../corpus           latest                    false         2.12.10  async-legacy  sample    6      5446.000                    locks
HotScalacBenchmark.compile:·rt.sync.fatMonitors                 ../corpus           latest                    false         2.12.10  async-legacy  sample    6       256.000                 monitors
HotScalacBenchmark.compile:·rt.sync.futileWakeups               ../corpus           latest                    false         2.12.10  async-legacy  sample    6      3394.000                   counts
HotScalacBenchmark.compile:·rt.sync.monitorDeflations           ../corpus           latest                    false         2.12.10  async-legacy  sample    6      1218.000                 monitors
HotScalacBenchmark.compile:·rt.sync.monitorInflations           ../corpus           latest                    false         2.12.10  async-legacy  sample    6      1220.000                 monitors
HotScalacBenchmark.compile:·rt.sync.notifications               ../corpus           latest                    false         2.12.10  async-legacy  sample    6        90.000                   counts
HotScalacBenchmark.compile:·rt.sync.parks                       ../corpus           latest                    false         2.12.10  async-legacy  sample    6      3871.000                   counts
HotScalacBenchmark.compile:·stack                               ../corpus           latest                    false         2.12.10  async-legacy  sample                NaN                      ---
HotScalacBenchmark.compile:·threads.cpu.time.norm               ../corpus           latest                    false         2.12.10  async-legacy  sample            443.695                    ms/op
HotScalacBenchmark.compile:·threads.user.time.norm              ../corpus           latest                    false         2.12.10  async-legacy  sample            429.893                    ms/op
Benchmark                                                    (corpusPath)  (corpusVersion)  (extraArgs)  (resident)                (scalaVersion)  (source)    Mode  Cnt         Score          Error     Units
HotScalacBenchmark.compile                                      ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample  354       171.165 ±        3.637     ms/op
HotScalacBenchmark.compile:compile·p0.00                        ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample            144.703                    ms/op
HotScalacBenchmark.compile:compile·p0.50                        ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample            166.199                    ms/op
HotScalacBenchmark.compile:compile·p0.90                        ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample            201.458                    ms/op
HotScalacBenchmark.compile:compile·p0.95                        ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample            217.711                    ms/op
HotScalacBenchmark.compile:compile·p0.99                        ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample            235.065                    ms/op
HotScalacBenchmark.compile:compile·p0.999                       ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample            251.920                    ms/op
HotScalacBenchmark.compile:compile·p0.9999                      ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample            251.920                    ms/op
HotScalacBenchmark.compile:compile·p1.00                        ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample            251.920                    ms/op
HotScalacBenchmark.compile:·compiler.nmethodCodeSize            ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6     73071.906                       Kb
HotScalacBenchmark.compile:·compiler.nmethodSize                ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6    146320.453                       Kb
HotScalacBenchmark.compile:·compiler.osrBytes                   ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6        37.067                       Kb
HotScalacBenchmark.compile:·compiler.osrCompiles                ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6        56.000                  methods
HotScalacBenchmark.compile:·compiler.osrTime                    ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6       707.930                       ms
HotScalacBenchmark.compile:·compiler.standardBytes              ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6      7706.624                       Kb
HotScalacBenchmark.compile:·compiler.standardCompiles           ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6     28620.000                  methods
HotScalacBenchmark.compile:·compiler.standardTime               ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6    102229.056                       ms
HotScalacBenchmark.compile:·compiler.totalBailouts              ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6         3.000                  methods
HotScalacBenchmark.compile:·compiler.totalCompiles              ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6     28676.000                  methods
HotScalacBenchmark.compile:·compiler.totalInvalidates           ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6           ≈ 0                  methods
HotScalacBenchmark.compile:·compiler.totalTime                  ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6    102936.986                       ms
HotScalacBenchmark.compile:·gc.alloc.rate                       ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6       310.930 ±       14.182    MB/sec
HotScalacBenchmark.compile:·gc.alloc.rate.norm                  ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6  58648185.313 ±    38309.377      B/op
HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space              ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6       297.827 ±      137.348    MB/sec
HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm         ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6  56168723.665 ± 25638540.244      B/op
HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space          ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6         0.506 ±        1.071    MB/sec
HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm     ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6     95240.352 ±   200428.077      B/op
HotScalacBenchmark.compile:·gc.count                            ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6        18.000                   counts
HotScalacBenchmark.compile:·gc.time                             ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6       335.000                       ms
HotScalacBenchmark.compile:·rt.safepointSyncTime                ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6       758.286                       ms
HotScalacBenchmark.compile:·rt.safepointTime                    ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6      3383.799                       ms
HotScalacBenchmark.compile:·rt.safepoints                       ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6     18279.000                   counts
HotScalacBenchmark.compile:·rt.sync.contendedLockAttempts       ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6      6422.000                    locks
HotScalacBenchmark.compile:·rt.sync.fatMonitors                 ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6       128.000                 monitors
HotScalacBenchmark.compile:·rt.sync.futileWakeups               ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6      9166.000                   counts
HotScalacBenchmark.compile:·rt.sync.monitorDeflations           ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6       256.000                 monitors
HotScalacBenchmark.compile:·rt.sync.monitorInflations           ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6       258.000                 monitors
HotScalacBenchmark.compile:·rt.sync.notifications               ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6       117.000                   counts
HotScalacBenchmark.compile:·rt.sync.parks                       ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample    6      9937.000                   counts
HotScalacBenchmark.compile:·stack                               ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample                NaN                      ---
HotScalacBenchmark.compile:·threads.cpu.time.norm               ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample            166.254                    ms/op
HotScalacBenchmark.compile:·threads.user.time.norm              ../corpus           latest                    false  2.12.11-bin-d053e66-SNAPSHOT     async  sample            156.229                    ms/op

Benchmark result is saved to /Users/jz/code/compiler-benchmark/target/profile-basic/result.json

@kiranbayram

Hello there, how exactly do you calculate the suggested improvements from the given tables?

@retronym (Owner, Author)

$ sbt 'hot -psource=async -prof gc'  'hot -psource=async-legacy -prof gc'

That runs the JMH benchmark suite with the old and new implementations of async.
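
The headline figures appear to be plain ratios of the Score column between the two runs: the top-level `HotScalacBenchmark.compile` row for compile time, and the `gc.alloc.rate.norm` row for allocations. The thread does not state this explicitly, so treat the following as a minimal sketch under that assumption. The dictionary keys and the `ratio` helper are names made up for illustration; the numbers are copied from the tables above.

```python
# Scores copied from the two JMH tables in this thread.
# Key names are illustrative, not part of JMH or the benchmark project.
legacy = {
    "compile_ms_per_op": 468.485,         # HotScalacBenchmark.compile, async-legacy
    "alloc_bytes_per_op": 87433008.026,   # gc.alloc.rate.norm, async-legacy
}
new = {
    "compile_ms_per_op": 171.165,         # HotScalacBenchmark.compile, async
    "alloc_bytes_per_op": 58648185.313,   # gc.alloc.rate.norm, async
}


def ratio(metric):
    """New score divided by legacy score; below 1.0 means the new version is cheaper."""
    return new[metric] / legacy[metric]


compile_ratio = ratio("compile_ms_per_op")
alloc_ratio = ratio("alloc_bytes_per_op")

print(f"Compile time: {compile_ratio:.3f}x")
print(f"Allocations: {alloc_ratio:.2f}x")
```

Using the normalized allocation figure (bytes per operation) rather than `gc.alloc.rate` (MB/sec) matters here: the new version finishes each compile faster, so its allocation rate per second is actually higher even though it allocates less per compile.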

@retronym closed this May 28, 2021