Locally defined structs are slow in RacketCS #3535
I see two possibilities here: re-implement Racket-style lifting (currently in schemify) at the Chez Scheme layer some time after cp0, or just inline struct operations at the schemify layer. The latter looks much easier.

Racket-style lifting ensures that when a function is locally defined and only referenced as a called function (i.e., not as a first-class function), then a closure is never allocated. Instead, values that would otherwise be in the closure are converted to arguments that are passed at each call site. There are cases where creating a closure can be faster than passing extra arguments, but usually not. Experience from the first couple of years of porting Racket to Chez Scheme convinced me that this part of Racket's cost model is important to keep. There's even a macro-based implementation of lifting for the Rumble layer in "racket/src/cs/rumble/define.ss" so that the Racket cost model works there. (Otherwise, for example, the Rumble layer's [...]

This lifting interferes with inlining, and that was the original motivation for inlining support in schemify. However, schemify doesn't currently try to inline the application of struct operations like predicates and selectors. The authentic operations are more opaque to schemify than to cp0, but schemify could replace an authentic struct selector with [...]

Adding a Racket-style lifting pass to Chez Scheme would be more general, and that might reduce the need for inlining in schemify. Then again, unless support for cross-linklet optimization is also pushed down to the Chez Scheme level, inlining is still needed at schemify. Finally, lifting at schemify is probably still needed for code that has to be interpreted (when the enclosing module is too large), unless we find another solution to that problem.
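To make the lifting transformation described above concrete, here is a minimal hand-written sketch. The function names (`sum-with-offset`, `add-offset`, and the `/lifted` variants) are hypothetical and chosen only for illustration; this is not schemify's actual output, just the general shape of turning a closed-over variable into an extra argument.

```racket
#lang racket/base

;; Before lifting: `add-offset` is defined locally, closes over `offset`,
;; and is only ever called directly (never passed around as a value).
(define (sum-with-offset lst offset)
  (define (add-offset x) (+ x offset))
  (let loop ([lst lst] [acc 0])
    (if (null? lst)
        acc
        (loop (cdr lst) (+ acc (add-offset (car lst)))))))

;; After lifting (written by hand for illustration): the free variable
;; `offset` becomes an extra argument passed at the call site, so the
;; helper no longer needs a closure.
(define (add-offset/lifted x offset) (+ x offset))

(define (sum-with-offset/lifted lst offset)
  (let loop ([lst lst] [acc 0])
    (if (null? lst)
        acc
        (loop (cdr lst) (+ acc (add-offset/lifted (car lst) offset))))))
```

Both versions compute the same result; the difference is only in whether a closure would need to be allocated for the helper, and in how visible the helper's body remains to a later inliner.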
It turns out that inlining at the schemify layer is especially easy: it's just a matter of removing the constraint on the existing inlining implementation that makes it apply only to imported variables. I'm looking into this in combination with potential changes to avoid the tables of constructors, predicates, accessors, and mutators.
Just experimented on a lift pass for chez (working on
vs
GC pressure was reduced by about 18%, though the effect on running time is small.
Nice! I look forward to your results.
Avoid a global table to register structure procedures, and instead use a wrapper procedure. At the same time, adjust schemify to more aggressively inline structure operations, which can avoid a significant performance penalty for local structure types. Closes racket#3535
Avoid a global table to register structure procedures, and instead use a wrapper procedure. At the same time, adjust schemify to more aggressively inline structure operations, which can avoid a significant performance penalty for local structure types. Closes racket/racket#3535 Original commit: racket/ChezScheme@497351b
Avoid a global table to register structure procedures, and instead use a wrapper procedure. At the same time, adjust schemify to more aggressively inline structure operations, which can avoid a significant performance penalty for local structure types. Closes racket/racket#3535 Original commit: racket/ChezScheme@54b5839
What version of Racket are you using?
7.9 [cs]
What program did you run?
What should have happened?
I expected the module-level and local versions to have similar performance, but the local version is 3-4x slower.
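As a rough sketch of the kind of comparison described here (not the program from this report), the two variants might look like the following. The struct name `point`, its fields, the summing loop, and the iteration count are all assumptions made up for illustration.

```racket
#lang racket/base

;; Module-level struct: constructor and accessors are defined at the
;; top level of the module.
(struct point (x y))

(define (sum-module-level n)
  (for/fold ([acc 0]) ([i (in-range n)])
    (define p (point i 1))
    (+ acc (point-x p) (point-y p))))

;; Local struct: the same definition nested inside a function body,
;; shadowing the module-level `point` within this function.
(define (sum-local n)
  (struct point (x y))
  (for/fold ([acc 0]) ([i (in-range n)])
    (define p (point i 1))
    (+ acc (point-x p) (point-y p))))

;; Time both variants; actual numbers depend on the Racket version
;; and machine, so none are claimed here.
(time (sum-module-level 1000000))
(time (sum-local 1000000))
```

Both functions return the same value for a given `n`; only the placement of the `struct` form differs, which is what makes the pair useful for isolating the cost of local structure types.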
I suspect this behavior comes from the lift pass of schemify. The cp0 output is
module-level (seems optimal):
local (struct operations are not inlined):