
[GR-31342] Implemented node object inlining, lock free specialization and other features for Truffle DSL. #5566

Merged
merged 22 commits
Jan 12, 2023

Conversation

graalvmbot
Collaborator

@graalvmbot graalvmbot commented Dec 5, 2022

Changes in this PR

  • GR-31342 Implemented several new features for Truffle DSL and improved its performance:
    • Added an @GenerateInline annotation that allows Truffle nodes to be object-inlined automatically. Object-inlined Truffle nodes become singletons and therefore reduce memory footprint. Please see the tutorial for further details.
    • Added an @GenerateCached annotation that allows users to control the generation of cached nodes. Use @GenerateCached(false) to disable cached node generation when all usages of nodes are object-inlined to save code footprint.
    • Truffle DSL nodes no longer require the node lock during specialization, resulting in improved first-execution performance. CAS-style inline cache updates are now used to avoid deadlocks when calling CallTarget.call(...) in guards. Inline caches continue to guarantee no duplicate values and are not affected by race conditions. Language implementations should be aware that the reduced contention may reveal other thread-safety issues in the language.
    • Improved Truffle DSL node memory footprint by merging generated fields for state and exclude bit sets and improving specialization data class generation to consider activation probability. Specializations should be ordered by activation probability for optimal results.
    • Improved memory footprint by automatically inlining cached parameter values of enum types into the state bit set.
    • Added @Cached(neverDefault=true|false) option to indicate whether the cache initializer will ever return a null or primitive default value. Truffle DSL now emits a warning if it is beneficial to set this property. Alternatively, the new @NeverDefault annotation may be used on the bound method or variable. The generated code layout can benefit from this information and reduce memory overhead. If never default is set to true, then the DSL will now use the default value instead internally as a marker for uninitialized values.
    • @Shared cached values may now use primitive values. Also, @Shared can now be used for caches contained in specializations with multiple instances. This means that the shared cache will be used across all instances of a specialization.
    • Truffle DSL now generates many more Javadoc comments in the generated code that try to explain the decisions of the code generator.
    • Added inlined variants for all Truffle profiles in com.oracle.truffle.api.profiles. The DSL now emits recommendation warnings when inlined profiles should be used instead of the allocated ones.
    • Truffle DSL now emits many more warnings for recommendations. For example, it emits warnings for inlining opportunities, cached sharing or when a cache initializer should be designated as @NeverDefault. To ease migration work, we added several new ways to suppress the warnings temporarily for a Java package. For a list of possible warnings and further usage instructions, see the new warnings page in the docs.
    • The DSL now produces warnings for specializations with multiple instances but an unspecified limit. The new warning can be resolved by specifying the desired limit (previously, a default of "3" was assumed).
    • Added the capability to unroll specializations with multiple instances. Unrolling in combination with node object inlining may further reduce the memory footprint of a Truffle node, particularly if all cached states can be encoded into the state bit set of the generated node. See @Specialization(...unroll=2) for further details.
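The CAS-style inline cache update mentioned above can be sketched in plain Java. This is a conceptual model with hypothetical names, not the actual DSL-generated code: new entries are published with compareAndSet, and the duplicate check is repeated on every retry, so no lock is held and no value is ever cached twice even under races.

```java
import java.util.concurrent.atomic.AtomicReference;

// Conceptual sketch of a lock-free inline cache: a linked list of entries
// whose head is updated with compareAndSet instead of under a node lock.
final class InlineCache<T> {
    static final class Entry<T> {
        final T value;
        final Entry<T> next;
        Entry(T value, Entry<T> next) { this.value = value; this.next = next; }
    }

    private final AtomicReference<Entry<T>> head = new AtomicReference<>();
    private final int limit;

    InlineCache(int limit) { this.limit = limit; }

    /** Returns the cached value, inserting it if absent and under the limit. */
    T lookupOrInsert(T value) {
        while (true) {
            Entry<T> h = head.get();
            int size = 0;
            for (Entry<T> e = h; e != null; e = e.next) {
                if (e.value.equals(value)) {
                    return e.value; // duplicate check re-runs on every CAS retry
                }
                size++;
            }
            if (size >= limit) {
                return null; // cache overflow: caller falls back to a generic case
            }
            // Publish the new entry; if another thread won the race, re-check.
            if (head.compareAndSet(h, new Entry<>(value, h))) {
                return value;
            }
        }
    }

    int size() {
        int n = 0;
        for (Entry<T> e = head.get(); e != null; e = e.next) n++;
        return n;
    }
}
```

Because a losing thread loops back and re-scans the list before retrying, a value inserted concurrently by another thread is found rather than duplicated, which matches the "no duplicate values" guarantee described above.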

Interpreter Benchmarks

Interpreter-only benchmark results from the NodeInlining benchmark:

Results:

Benchmark                                        Mode  Cnt          Score          Error  Units
NodeInliningBenchmark.createCached              thrpt   10  462825978,228 ± 34317974,329  ops/s
NodeInliningBenchmark.createInlined             thrpt   10  514584698,328 ±  1633221,580  ops/s
NodeInliningBenchmark.executeFastCached         thrpt   10   98003482,532 ±  2376017,729  ops/s
NodeInliningBenchmark.executeFastInlined        thrpt   10  289358050,116 ±  5683095,481  ops/s
NodeInliningBenchmark.executeSpecializeCached   thrpt   10   22543004,143 ±   256920,230  ops/s
NodeInliningBenchmark.executeSpecializeInlined  thrpt   10  343462433,712 ±  1086340,120  ops/s

For comparison, here are the benchmarks that are runnable on master (only the cached variants exist there):

Benchmark                                       Mode  Cnt          Score         Error  Units
NodeInliningBenchmark.createCached             thrpt   10  488411151,352 ± 3021114,919  ops/s
NodeInliningBenchmark.executeFastCached        thrpt   10   81300906,544 ±  489476,555  ops/s
NodeInliningBenchmark.executeSpecializeCached  thrpt   10   10639216,826 ±  147950,425  ops/s
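As a sanity check, the speedup factors quoted in the summary can be recomputed directly from the JMH scores (ops/s) in the two tables above:

```java
// Recomputing the speedups from the benchmark scores above.
final class SpeedupCheck {
    static double ratio(double after, double before) {
        return after / before;
    }

    public static void main(String[] args) {
        // executeSpecializeCached, PR vs master: lock-free specialization
        System.out.printf("specialize cached:  %.1fx%n", ratio(22543004.143, 10639216.826));
        // executeFastInlined vs executeFastCached: fewer reads on the fast path
        System.out.printf("fast inlined:       %.1fx%n", ratio(289358050.116, 98003482.532));
        // executeSpecializeInlined vs executeSpecializeCached: no node allocation
        System.out.printf("specialize inlined: %.1fx%n", ratio(343462433.712, 22543004.143));
    }
}
```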

In summary:

  • executeFastCached got slightly faster (about 1.2x) after these changes.
  • executeSpecializeCached improved about 2x because executeAndSpecialize is now lock-free.
  • executeFastInlined improved about 3x over its cached counterpart, likely because fewer reads are necessary.
  • executeSpecializeInlined is about 15x faster than its cached counterpart because no nodes need to be allocated anymore; all we do is set the state bits. Interestingly, this is also faster than executeFastInlined: due to the changed branch frequencies, executeAndSpecialize is inlined as well, which can be beneficial for the fast path.
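The "all we do is set the state bits" fast path can be illustrated with a small plain-Java sketch (hypothetical names and bit layout, not the generated DSL code) of how a cached enum value is folded into the state bit set, so specializing becomes a single int update instead of a node allocation:

```java
// Sketch of inlining an enum cache into a state bit set: bit 0 marks the
// specialization as active, and two bits encode "uninitialized" (0) or the
// enum ordinal plus one.
final class EnumStateEncoding {
    enum Op { ADD, SUB, MUL }

    static final int OP_SHIFT = 1;                // bit 0: specialization active
    static final int OP_MASK = 0b11 << OP_SHIFT;  // two bits for the enum cache

    /** Activates the specialization and inlines the enum into the state. */
    static int specialize(int state, Op op) {
        return state | 1 | ((op.ordinal() + 1) << OP_SHIFT);
    }

    /** Decodes the cached enum, or null if still uninitialized. */
    static Op cachedOp(int state) {
        int encoded = (state & OP_MASK) >>> OP_SHIFT;
        return encoded == 0 ? null : Op.values()[encoded - 1];
    }
}
```

In the generated code this state field is what executeAndSpecialize updates, which is why the inlined specialization path avoids allocation entirely.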

Memory Footprint

The biggest benefit of node object inlining is memory footprint. To see real results, however, languages first need to adopt it. Also note that, due to the lock-free specialization changes, it is sometimes necessary to use specialization data classes where we previously did not.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Dec 5, 2022
@fniephaus fniephaus linked an issue Dec 5, 2022 that may be closed by this pull request
@graalvmbot graalvmbot force-pushed the chumer/GR-31342/dsl-object-inlining branch 2 times, most recently from c136507 to 6b63a92 Compare January 10, 2023 14:42
@graalvmbot graalvmbot force-pushed the chumer/GR-31342/dsl-object-inlining branch from f34a539 to dc0eb00 Compare January 10, 2023 22:28
@graalvmbot graalvmbot force-pushed the chumer/GR-31342/dsl-object-inlining branch from bee59ff to 85dfa56 Compare January 12, 2023 10:57
@graalvmbot graalvmbot merged commit fe73df9 into master Jan 12, 2023
@graalvmbot graalvmbot deleted the chumer/GR-31342/dsl-object-inlining branch January 12, 2023 21:08