
[GR-31342] Implemented node object inlining, lock free specialization and other features for Truffle DSL. #5566

Merged
merged 22 commits
Jan 12, 2023

Conversation

graalvmbot
Collaborator

@graalvmbot graalvmbot commented Dec 5, 2022

Changes in this PR

  • GR-31342 Implemented several new features for Truffle DSL and improved its performance:
    • Added an @GenerateInline annotation that allows Truffle nodes to be object-inlined automatically. Object-inlined Truffle nodes become singletons and therefore reduce memory footprint. Please see the tutorial for further details.
    • Added an @GenerateCached annotation that allows users to control the generation of cached nodes. Use @GenerateCached(false) to disable cached node generation when all usages of nodes are object-inlined to save code footprint.
    • Truffle DSL nodes no longer require the node lock during specialization, resulting in improved first-execution performance. CAS-style inline cache updates are now used to avoid deadlocks when calling CallTarget.call(...) in guards. Inline caches continue to guarantee no duplicate values and are not affected by race conditions. Language implementations should be aware that the reduced contention may reveal other thread-safety issues in the language.
    • Improved Truffle DSL node memory footprint by merging generated fields for state and exclude bit sets and improving specialization data class generation to consider activation probability. Specializations should be ordered by activation probability for optimal results.
    • Improved memory footprint by automatically inlining cached parameter values of enum types into the state bit set.
    • Added @Cached(neverDefault=true|false) option to indicate whether the cache initializer will ever return a null or primitive default value. Truffle DSL now emits a warning if it is beneficial to set this property. Alternatively, the new @NeverDefault annotation may be used on the bound method or variable. The generated code layout can benefit from this information and reduce memory overhead. If never default is set to true, then the DSL will now use the default value instead internally as a marker for uninitialized values.
    • @Shared cached values may now use primitive values. Also, @Shared can now be used for caches contained in specializations with multiple instances. This means that the shared cache will be used across all instances of a specialization.
    • Truffle DSL now generates many more Javadoc comments in the generated code that try to explain the decisions of the code generator.
    • Added inlined variants for all Truffle profiles in com.oracle.truffle.api.profiles. The DSL now emits recommendation warnings when inlined profiles should be used instead of the allocated ones.
    • Truffle DSL now emits many more warnings for recommendations. For example, it emits warnings for inlining opportunities, cached sharing or when a cache initializer should be designated as @NeverDefault. To ease migration work, we added several new ways to suppress the warnings temporarily for a Java package. For a list of possible warnings and further usage instructions, see the new warnings page in the docs.
    • The DSL now produces warnings for specializations with multiple instances but an unspecified limit. The new warning can be resolved by specifying the desired limit (previously, a default of "3" was assumed).
    • Added the capability to unroll specializations with multiple instances. Unrolling in combination with node object inlining may further reduce the memory footprint of a Truffle node, particularly if all cached states can be encoded into the state bit set of the generated node. See @Specialization(...unroll=2) for further details.
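The CAS-style inline cache update mentioned above can be sketched in plain Java. This is a conceptual model with hypothetical names, not the actual DSL-generated code: new entries are published with compareAndSet, and the duplicate check is repeated on every retry, so no lock is held and no value is ever cached twice even under races.

```java
import java.util.concurrent.atomic.AtomicReference;

// Conceptual sketch of a lock-free inline cache: a linked list of entries
// whose head is updated with compareAndSet instead of under a node lock.
final class InlineCache<T> {
    static final class Entry<T> {
        final T value;
        final Entry<T> next;
        Entry(T value, Entry<T> next) { this.value = value; this.next = next; }
    }

    private final AtomicReference<Entry<T>> head = new AtomicReference<>();
    private final int limit;

    InlineCache(int limit) { this.limit = limit; }

    /** Returns the cached value, inserting it if absent and under the limit. */
    T lookupOrInsert(T value) {
        while (true) {
            Entry<T> h = head.get();
            int size = 0;
            for (Entry<T> e = h; e != null; e = e.next) {
                if (e.value.equals(value)) {
                    return e.value; // duplicate check re-runs on every CAS retry
                }
                size++;
            }
            if (size >= limit) {
                return null; // cache overflow: caller falls back to a generic case
            }
            // Publish the new entry; if another thread won the race, re-check.
            if (head.compareAndSet(h, new Entry<>(value, h))) {
                return value;
            }
        }
    }

    int size() {
        int n = 0;
        for (Entry<T> e = head.get(); e != null; e = e.next) n++;
        return n;
    }
}
```

Because a losing thread loops back and re-scans the list before retrying, a value inserted concurrently by another thread is found rather than duplicated, which matches the "no duplicate values" guarantee described above.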

Interpreter Benchmarks

Interpreter-only benchmark results from the NodeInlining benchmark:

Results:

Benchmark                                        Mode  Cnt          Score          Error  Units
NodeInliningBenchmark.createCached              thrpt   10  462825978,228 ± 34317974,329  ops/s
NodeInliningBenchmark.createInlined             thrpt   10  514584698,328 ±  1633221,580  ops/s
NodeInliningBenchmark.executeFastCached         thrpt   10   98003482,532 ±  2376017,729  ops/s
NodeInliningBenchmark.executeFastInlined        thrpt   10  289358050,116 ±  5683095,481  ops/s
NodeInliningBenchmark.executeSpecializeCached   thrpt   10   22543004,143 ±   256920,230  ops/s
NodeInliningBenchmark.executeSpecializeInlined  thrpt   10  343462433,712 ±  1086340,120  ops/s

For comparison, here are the benchmarks that are runnable on master (only the cached variants exist there):

Benchmark                                       Mode  Cnt          Score         Error  Units
NodeInliningBenchmark.createCached             thrpt   10  488411151,352 ± 3021114,919  ops/s
NodeInliningBenchmark.executeFastCached        thrpt   10   81300906,544 ±  489476,555  ops/s
NodeInliningBenchmark.executeSpecializeCached  thrpt   10   10639216,826 ±  147950,425  ops/s
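As a sanity check, the speedup factors quoted in the summary can be recomputed directly from the JMH scores (ops/s) in the two tables above:

```java
// Recomputing the speedups from the benchmark scores above.
final class SpeedupCheck {
    static double ratio(double after, double before) {
        return after / before;
    }

    public static void main(String[] args) {
        // executeSpecializeCached, PR vs master: lock-free specialization
        System.out.printf("specialize cached:  %.1fx%n", ratio(22543004.143, 10639216.826));
        // executeFastInlined vs executeFastCached: fewer reads on the fast path
        System.out.printf("fast inlined:       %.1fx%n", ratio(289358050.116, 98003482.532));
        // executeSpecializeInlined vs executeSpecializeCached: no node allocation
        System.out.printf("specialize inlined: %.1fx%n", ratio(343462433.712, 22543004.143));
    }
}
```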

In summary:

  • executeFastCached got slightly faster (about 1.2x) after these changes.
  • executeSpecializeCached improved about 2x because executeAndSpecialize is now lock-free.
  • executeFastInlined improved about 3x over its cached counterpart, likely because fewer reads are necessary.
  • executeSpecializeInlined is about 15x faster than its cached counterpart because no nodes need to be allocated anymore; all we do is set the state bits. Interestingly, this is also faster than executeFastInlined: due to the changed branch frequencies, executeAndSpecialize is inlined as well, which can be beneficial for the fast path.
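The "all we do is set the state bits" fast path can be illustrated with a small plain-Java sketch (hypothetical names and bit layout, not the generated DSL code) of how a cached enum value is folded into the state bit set, so specializing becomes a single int update instead of a node allocation:

```java
// Sketch of inlining an enum cache into a state bit set: bit 0 marks the
// specialization as active, and two bits encode "uninitialized" (0) or the
// enum ordinal plus one.
final class EnumStateEncoding {
    enum Op { ADD, SUB, MUL }

    static final int OP_SHIFT = 1;                // bit 0: specialization active
    static final int OP_MASK = 0b11 << OP_SHIFT;  // two bits for the enum cache

    /** Activates the specialization and inlines the enum into the state. */
    static int specialize(int state, Op op) {
        return state | 1 | ((op.ordinal() + 1) << OP_SHIFT);
    }

    /** Decodes the cached enum, or null if still uninitialized. */
    static Op cachedOp(int state) {
        int encoded = (state & OP_MASK) >>> OP_SHIFT;
        return encoded == 0 ? null : Op.values()[encoded - 1];
    }
}
```

In the generated code this state field is what executeAndSpecialize updates, which is why the inlined specialization path avoids allocation entirely.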

Memory Footprint

The biggest benefit of node object inlining is memory footprint. To see real results, however, languages first need to adopt it. Also note that, due to the lock-free specialization changes, it is sometimes necessary to use specialization data classes where we previously did not.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Dec 5, 2022
@fniephaus fniephaus linked an issue Dec 5, 2022 that may be closed by this pull request
@graalvmbot graalvmbot force-pushed the chumer/GR-31342/dsl-object-inlining branch 2 times, most recently from c136507 to 6b63a92 Compare January 10, 2023 14:42
@graalvmbot graalvmbot force-pushed the chumer/GR-31342/dsl-object-inlining branch from f34a539 to dc0eb00 Compare January 10, 2023 22:28
@graalvmbot graalvmbot force-pushed the chumer/GR-31342/dsl-object-inlining branch from bee59ff to 85dfa56 Compare January 12, 2023 10:57
@graalvmbot graalvmbot merged commit fe73df9 into master Jan 12, 2023
@graalvmbot graalvmbot deleted the chumer/GR-31342/dsl-object-inlining branch January 12, 2023 21:08