Store both ends of the stack chain in continuations #12735
The wording is a bit confusing. I agree that resuming a continuation is linear in the length of the fiber chain. This PR makes it constant time as we hold the handle to the last fiber. However, |
Yes, sorry, I meant |
I am yet to review this change, but I am broadly in favour of the idea. We have discussed this optimisation, holding a pointer to both ends of the linked list of fibers in the continuation, as a means of speeding up resumption. Good to see this implemented. Many papers that propose efficient compilation of effect handlers by translating them through CPS, evidence translation, etc., compare the performance of their system against OCaml. The chosen benchmark is usually a microbenchmark with a deeply nested handler stack (which makes sense if the aim is to do effect-handler-oriented programming without relying on built-in features in OCaml such as ref cells). OCaml is usually slower than the other systems on such benchmarks. There are two parts to the slowdown observed in OCaml compared to other compilation methods:
While (1) cannot be easily avoided in the current scheme, (2) certainly can be, and this PR fixes it. This would bring OCaml's performance closer to other systems where there is a deeply nested handler stack. This change does not modify the user-facing API and technically is a very small change. I'll leave a detailed review. The PR needs changes for every architecture where OCaml 5 is supported. It would be great if @dustanddreams can help review and potentially implement the missing bits. For TSan bits that may be affected, may I ask @fabbing or @OlivierNicole to see whether the changes are sensible? |
The following diff appears to work for me.
|
@kayceesrk your comment on the performance of OCaml native handlers compared to related works in deeply-nested scenarios is interesting. It would be nice to invite someone motivated to take a benchmark from the literature, and re-run the OCaml measurements before and after the present PR. Do you have a more precise paper/benchmark in mind that we could point someone to? (I am not up to date on recent handlers-compilation work, so myself I would just look at the latest handler-optimisation work by Daan Leijen's group, but there may be better choices.) |
Daan Leijen's work on Koka and Jonathan Brachthäuser's work on Effekt are the ones that I have in mind. @dhil are you aware of papers where deeply nested handlers were evaluated? A deeply nested state handler microbenchmark should be easy to write. Let me give this a go. |
I think Handlers in Action may feature such a benchmark. There is my own UNIX operating system implementation (built from ~20 handlers), though I do not yet have an OCaml implementation of it. The C++-effects paper has something with generators, but I cannot remember offhand whether they measure with respect to the depth of the call stack or the handler stack. I have a contrived microbenchmark for this sort of thing in our WasmFX implementation. The program installs a toplevel handler which handles |
I wrote up the microbenchmark here: https://gist.github.com/kayceesrk/92e029ee70ad8cb98c5ef34e88b48959. The benchmark installs a number of nested integer state handlers ( The program takes the number of integer state handlers (
The PR makes the program run ~10% faster for a deep handler stack. I should note that this 10% is a reasonable improvement. On trunk, both the search for the right handler and the resumption are linear in the depth of the handler stack (the length of the linked list of fibers). In this PR, the former is still linear whereas the latter is constant time. However, the former has a larger constant factor, since the search for the matching handler requires executing OCaml code at every handler. The latter is just a traversal of the linked list of fibers, handwritten in assembly, which is now replaced with a constant-time operation. |
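For readers who want to reproduce the shape of such a benchmark, here is a minimal sketch. It is not the code from the gist: the names `run_state`, `nest`, `count_up` and the `Unused` effect are hypothetical, and it assumes OCaml 5 with the standard `Effect.Deep` API. One state handler sits at the outside, `depth` handlers that match nothing are nested inside it, and every `Get`/`Put` performed at the bottom has to be forwarded through all of them and then resumed back down.

```ocaml
open Effect
open Effect.Deep

type _ Effect.t += Get : int Effect.t
type _ Effect.t += Put : int -> unit Effect.t
(* Hypothetical effect handled by the filler layers; it is never performed. *)
type _ Effect.t += Unused : unit Effect.t

(* Outermost handler: interprets Get/Put in state-passing style. *)
let run_state (init : int) (body : unit -> 'a) : 'a * int =
  let comp =
    match_with body ()
      { retc = (fun x -> fun s -> (x, s));
        exnc = raise;
        effc = (fun (type b) (eff : b Effect.t) ->
          match eff with
          | Get ->
            Some (fun (k : (b, _) continuation) -> fun s -> continue k s s)
          | Put s' ->
            Some (fun (k : (b, _) continuation) -> fun _ -> continue k () s')
          | _ -> None) }
  in
  comp init

(* Install [depth] handlers that do not match Get/Put, so every perform
   in [body] is forwarded through all of them before reaching run_state. *)
let rec nest (depth : int) (body : unit -> 'a) : 'a =
  if depth = 0 then body ()
  else
    try_with (fun () -> nest (depth - 1) body) ()
      { effc = (fun (type b) (eff : b Effect.t) ->
          match eff with
          | Unused -> Some (fun (k : (b, _) continuation) -> continue k ())
          | _ -> None) }

(* Increment the state [iters] times: one Get and one Put per iteration. *)
let count_up (iters : int) : unit =
  for _i = 1 to iters do
    perform (Put (perform Get + 1))
  done

let () =
  let depth = 100 and iters = 10_000 in
  let ((), final) =
    run_state 0 (fun () -> nest depth (fun () -> count_up iters))
  in
  Printf.printf "depth=%d iters=%d final=%d\n" depth iters final
```

Timing this with a varying `depth` on trunk and on this branch should show the effect described above, though the exact numbers will depend on the machine and backend.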
If we were to give a collective name to operations (e.g. as in #12736) then we could potentially optimise the matching too by storing this name in the header of each fiber stack. Then rather than having to execute user code to find a matching handler, we simply have to compare identities. With such a scheme in place we can also eliminate (most) unnecessary context switches, as the search for a matching handler can be performed entirely on the performing fiber stack (e.g. as I do here). If a matching handler happens to be incomplete, then we can fall back to the |
This is a very good point. I wonder if @lpw25 was thinking about such an optimisation in separating effects from operations and having a given handler handle all the operations of a given effect. |
Yes, that is what I had in mind in the "Supports more efficient implementation" of that PR description. |
The code looks good to me overall. The PR improves performance of deeply nested handlers by around 10% in my experiments. I support the inclusion of this change.
Changes to be done:
- There are a few minor correctness and clarity fixes that @dustanddreams has pointed out, which need to be fixed before the PR can be merged.
- The backtrace/backtrace_effects_nested.ml test fails on flambda since the reference files have not been promoted. All other tests pass.
The PR also needs a precheck on the INRIA CI to confirm that it works on all the supported platforms. It has been a while since I triggered a precheck. The instructions here seem out of date? https://github.com/ocaml/ocaml/blob/b58aafcdd1335fd0d13ab6214a6a884dfb9b87ab/HACKING.adoc#running-inrias-ci-on-a-publicly-available-git-branch. I don't see a "Build with parameters" option anywhere.
Visiting https://ci.inria.fr/ocaml/job/precheck/ (logged in with my CI-enabled account), I see a list of icons with text on the left column of the page, one of them being "Build with Parameters". |
See run at https://ci.inria.fr/ocaml/job/precheck/914/ . There are some problems:
|
You need to test with this fix applied. |
CI tests branches, not branches + patches. The fix needs to be merged in this branch. |
I confirmed that the PR passes the test suite on the RISC-V backend. |
@lpw25 would you be able to complete the pending tasks? I'm happy to approve after that: #12735 (review). |
d0f918f to 3437f2e
Branch rebased, comments addressed and Changes entry added |
LGTM. Thanks for this work @lpw25. |
"Precheck" CI is green on all platforms. Feel free to merge. |
This PR changes the representation of continuations to include pointers to both ends of the chain of stacks that it represents. Currently, when resuming a continuation we must walk the whole chain to find the other end, making `continue` linear in the length of the chain. With this PR the walk is not needed and `continue` is constant time.
Keeping both ends of the chain in the continuation also allows for a `reperform` operation that takes a continuation as input. Such an operation could technically be implemented now, but would produce quadratic behaviour. I don't include this operation in this PR, but it is implemented in a later one.
The other end of the chain is stored tagged in the second field of the continuation. The behaviour of the first field is unchanged. The underlying perform primitive now takes 4 parameters and we project out the end of the chain and pass it in.
The bytecode interpreter and all the native code backends have been updated to remove the walk over the stack chain. I've only tested the bytecode and x86_64 implementations though. I also just rebased this over some TSAN related changes, and probably someone who understands those should check things still look right.
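To illustrate the idea behind the representation change, here is a toy model in plain OCaml. It is only a sketch and does not reflect the runtime's actual data layout: the types `fiber`, `cont_head_only` and `cont_both_ends` and all function names are hypothetical, and the tagging of the second field is not modelled. A fiber knows only its parent, and resuming means attaching the far end of the captured chain to the current fiber.

```ocaml
(* Toy model: a fiber only knows its parent (the next fiber outwards). *)
type fiber = { id : int; mutable parent : fiber option }

(* Modelled old representation: only the first fiber of the captured chain. *)
type cont_head_only = { first : fiber }

(* Modelled new representation: both ends of the captured chain. *)
type cont_both_ends = { head : fiber; last : fiber }

(* Walk to the far end of the chain, counting the links followed. *)
let rec find_last (f : fiber) (steps : int) : fiber * int =
  match f.parent with
  | None -> (f, steps)
  | Some p -> find_last p (steps + 1)

(* Resume by walking: linear in the chain length. Returns links followed. *)
let resume_walk (k : cont_head_only) (current : fiber) : int =
  let (last, steps) = find_last k.first 0 in
  last.parent <- Some current;
  steps

(* Resume using the stored end: no walk at all. *)
let resume_direct (k : cont_both_ends) (current : fiber) : int =
  k.last.parent <- Some current;
  0

(* Build a detached chain of n >= 1 fibers; returns (first, last). *)
let make_chain (n : int) : fiber * fiber =
  let last = { id = n - 1; parent = None } in
  let rec build i next =
    if i < 0 then next else build (i - 1) { id = i; parent = Some next }
  in
  (build (n - 2) last, last)

let () =
  let n = 1000 in
  let current = { id = -1; parent = None } in
  let (first, _) = make_chain n in
  let walked = resume_walk { first } current in
  let (head, last) = make_chain n in
  let direct = resume_direct { head; last } current in
  Printf.printf "chain of %d fibers (ids %d..%d): walk=%d links, direct=%d\n"
    n head.id last.id walked direct
```

In this model the walking version follows n - 1 links for a chain of n fibers, while the version that stores both ends follows none, which mirrors the linear versus constant-time claim in the description.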