Start from emit #9003
I've made comments on commit 02e5554. This unfortunately doesn't show up as a "review". In a similar way to what I wrote on #8939, someone else should review the general modus operandi of this patch, although I think it is generally fine. I think it would be sensible if someone more familiar with the driver (perhaps @diml or @trefis ) reviewed those parts of the patch in detail too. |
It's probably because the original commit referenced in the PR description changed after a rebase, sorry. I can see your comments on the commit, and I've addressed them with the latest update. |
I had a look at the changes to driver/. Overall they seem fine to me as well, but I'm not completely sure about opening and closing the file in the driver to determine its type, which in particular means that the compiler is going to open an input file twice. My guess is that this is not necessary? |
Driver changes look good to me! |
It seems a bit strange to me to I'd rather have a |
@damiendoligez yes, -start-after is better than -start-from. I updated the PR accordingly. |
Apologies for the bikeshedding, but what about "stop before" and "start from"? My washing machine (a fine analogy for a multi-pass compiler) has a "stop before spinning" button, it's not labeled "stop after last rinse" :-) |
The way I understood the comment from Damien, he was supposing that it was a change of semantics, not just of naming: for what you were previously using |
@xavierleroy it may be the case that the two should be offered, but I thought about this as well and I think that
For |
@xavierleroy Initially, I had the three options referring to the pass that actually matters: Keeping -stop-after unchanged, it seems better to have -start-after than -start-from so they talk about the same pass at least. It's still not ideal, as the user needs to know which pass comes before "emit", and if the order of passes changes, the build system rules would need to be adjusted. Having both -stop-after and -stop-before might be confusing, and another syntax for it didn't seem worth the complexity and breaking backwards compatibility. I was thinking of -stop before,pass -stop after,pass -save-ir after,pass and so on.
There is a change in optcompile.ml from |
I guess that I don't understand what the semantics of
Currently in the compiler we use a mix of contextual information and specific flags to decide where to start from: |
The implementation in the driver uses
I like this approach! I don't think we can detect, based on the IR alone, which pass to start from. In the backend, there is more than one pass operating on the same IR. For Linear IR, this is currently only "scheduling" and "emit". For Mach, there are many passes and compilation can break if IR is not processed by the correct pass, but we don't have an option to save Mach IR yet. We could record in the IR which pass produced it. For Linear IR, simply starting from the first pass that consumes it would also work. Scheduling doesn't do anything on amd64 and arm64. For other targets, scheduling is not idempotent, I think, but it will still generate correct code, just not the same as "normal" compilation from .ml. Is that what you had in mind? |
I hadn't considered the which-pass issue, but yes I would find it natural to record in the serialized output the pass that produced the IR, and then start from the next pass when receiving the dump as input.
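A minimal sketch of the idea discussed above: tag the serialized IR with the pass that produced it, then resume from the following pass. The type and function names (`dump`, `produced_by`, `next_pass`, `resume_from`) and the simplified pass list are assumptions made for illustration; they are not taken from the compiler sources.

```ocaml
(* Hypothetical sketch; none of these definitions come from the compiler. *)
type pass = Parsing | Typing | Scheduling | Emit

(* Passes in pipeline order (a simplified, assumed ordering). *)
let pipeline = [ Parsing; Typing; Scheduling; Emit ]

(* A serialized IR dump, tagged with the pass that produced it. *)
type 'ir dump = { produced_by : pass; ir : 'ir }

(* The pass that follows [p] in the pipeline, if any. *)
let next_pass p =
  let rec find = function
    | x :: (y :: _) when x = p -> Some y
    | _ :: rest -> find rest
    | [] -> None
  in
  find pipeline

(* When a dump is given as input, resume from the pass after its producer. *)
let resume_from dump = next_pass dump.produced_by
```

With this shape, a dump produced by scheduling would resume at emit without the user naming a pass on the command line.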
I may be looking at the wrong place but what I see is

    - Compile_common.implementation info ~backend
    + if Clflags.(should_start_after Compiler_pass.Scheduling) then
    +   emit info
    + else
    +   Compile_common.implementation info ~backend

This is not a very regular integration of In contrast, here is the

    let parsed = parse_impl info in
    if Clflags.(should_stop_after Compiler_pass.Parsing) then () else begin
      let typed = typecheck_impl info parsed in
      if Clflags.(should_stop_after Compiler_pass.Typing) then () else begin
        backend info typed
      end;
    end;

I wouldn't call it nice code, but at least it is clear that it is a natural extension of a pass pipeline (parsing-typing-backend) with some conditionals to stop along the way. If I wanted to support more passes for

I realize that this is not a very constructive comment because I can describe what I'm not fond of, but I don't have suggestions for a better implementation of the feature. With

If our pass structure was just a sequence of imperative functions with no explicit input-output:

    let driver () =
      parse ();
      typecheck ();
      simplif ();
      bytegen ();
      byteemit ();

then it would be very clear how to do it:

    let ifneeded pass action =
      if rank pass > !stop_after || rank pass <= !start_after then ()
      else action ()
    let driver () =
      ifneeded Parsing parse;
      ifneeded Typing typecheck;
      ifneeded Simplif simplif;
      ifneeded Bytegen bytegen;
      ifneeded Byteemit byteemit;

The idea of automatically starting from the correct pass when an IR file is given as input solves this problem with a different interface. Again in pseudocode form, this could maybe look like this:

    let rec from_source source =
      let parsetree = parse source in
      if should_stop_after Parsing then ()
      else from_parsetree parsetree
    and from_parsetree parsetree =
      let typedtree = typecheck parsetree in
      if should_stop_after Typing then ()
      else from_typedtree typedtree
    and from_typedtree typedtree = ...
    let driver () =
      let info = ... in
      match input with
      | Source source -> from_source source
      | Parsetree parsetree -> from_parsetree parsetree
      | Typedtree typedtree -> from_typedtree typedtree
      | ...

I'm not saying that the code should be like that, but at least I have the impression that we could implement it this way if we wanted something more generic, which is reassuring. |
@gasche thank you for the detailed explanation. Following your suggestion, I removed -start-from command line option. The new version uses the input filename extension to decide which pass to start from. This version simplifies code from the The implementation goes some way towards the generic interface you outlined, with minimal changes to the existing code. Specifically, the mapping from IR to compiler pass is made by |
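As a rough illustration of choosing the starting pass from the input filename extension, here is a small sketch. The function name `start_pass_of_filename` and the two-case `pass` type are hypothetical; only the `.cmir-linear` extension appears in this thread, and the actual mapping in the driver may be organized differently.

```ocaml
(* Hypothetical sketch; not the driver's actual code. *)
type pass = Parsing | Emit

(* Map an input file to the first pass that should run on it.
   ".cmir-linear" is the Linear IR extension mentioned in this thread;
   treating ".ml" as "start from parsing" is an assumption for illustration. *)
let start_pass_of_filename filename =
  if Filename.check_suffix filename ".cmir-linear" then Some Emit
  else if Filename.check_suffix filename ".ml" then Some Parsing
  else None
```

Returning an option leaves it to the caller to decide how to report an unrecognized extension.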
ocaml#8938 from upstream trunk (cherry-pick 4353c75) ocaml#9090 from upstream trunk (cherry-pick fe7c8ed) ocaml#9097 from upstream trunk (cherry-pick 6daaf62) ocaml#8939 from gretay-js/save-linear (cherry-pick 1200de9) ocaml#9003 from gretay-js/start-from-emit (cherry-pick 21d5d26)
This reverts commit 4673d4e.
I've rebased this PR, all CI checks pass, and it should be easier to review now (after the merge of #8939). |
I have reviewed the code. I believe the change is correct, and if I am not mistaken, the interface that is now implemented is consensual.
I think that all global state is properly managed at resumption but I am not 100% sure about the control flow of the driver. I will keep testing it tomorrow.
I am happy with the implementation overall, but I found one minor bug.
When passing simultaneously -save-ir-after scheduling and a .cmir-linear argument, the file is compiled properly but is then overwritten with an empty linear program. This is because write_linear is always called by Asmgen.compile_unit, even when resuming. However, when resuming, the global linear_unit_info (the one saved to disk) is left empty (instead, a local linear_unit_info is used by Asmgen.linear_gen_implementation).
I am not sure what the best fix is, but I think that when resuming from a cmir-linear input nothing should be saved (scheduling is not part of the compilation pipeline in this case).
Other minor remark: the -for-pack argument when resuming should be set to the same value as in the original call. I wonder if instead the for-pack setting could be saved in the linear format. This is strictly more robust and does not make the driver any less flexible.
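One possible shape for the guard suggested above, written as a standalone sketch: IR is saved after a pass only when that pass actually runs in the current invocation. All names here (`start`, `Resume_after`, `should_save_ir_after`) are hypothetical, and this is not necessarily the fix that was adopted in the PR.

```ocaml
(* Hypothetical model; these names are not the compiler's own. *)
type pass = Scheduling | Emit

let rank = function Scheduling -> 0 | Emit -> 1

(* Where this run of the compiler begins. *)
type start =
  | From_source            (* ordinary compilation of a source file *)
  | Resume_after of pass   (* input is IR dumped after the given pass *)

(* Save IR after [pass] only if the user asked for it and [pass]
   is actually executed in this run. *)
let should_save_ir_after ~requested ~start pass =
  List.mem pass requested
  && (match start with
      | From_source -> true
      | Resume_after p -> rank pass > rank p)
```

Under this model, resuming from a dump produced after scheduling never re-saves the scheduling output, so the input file would not be overwritten.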
Thank you very much for the detailed review!
Ah, yes, thanks for catching this problem! If this information is recorded during
Done. |
Thanks. I think it is good for merging now 👍 . |
Thank you. I've updated the Changes file. |
This reverts commit 79c2664.
Unless I am missing something, there is currently no documentation for the feature introduced here and in #8939. The github issues are not great, given that the interface changed between the initial proposal and the actual implementation. In fact I don't even know what are the command-line options to use the feature, and I don't know where to find this information anymore. Could this be fixed? (This question arose through Kakadu asking about the feature from the 4.12 changelog on Discuss.) The feature should be documented in (at least) two places:
The documentation need not be very detailed. I understand that this is a more experimental feature that is mostly there to enable external tools (you could of course link to those external tools, or decide not to do it), but I think there should be at-least-minimal documentation for all features. |
Thanks for pointing out the discussion. I'll submit a PR to document these options asap. |
See ocaml/ocaml#9003 for the introduction of the related breaking change within OCaml.
This PR adds a command line option "-start-from " to start compilation from a given pass. Currently, only "emit" pass is supported. The input files are expected to be in Linear format. This is analogous to the way ppx rewriting is implemented in the frontend.
This option, along with "-stop-after" and "-save-ir-after", provides a way to split compilation into fine-grained phases. The motivation for this is to perform code layout optimizations: an external tool can read Linear IR from a file, manipulate it via compiler-libs, save it to a file, and then emit the code as usual by invoking the compiler. We are planning to release a tool that will do this.
An alternative to "-start-from emit" option would be to call Emit directly from compiler-libs. It would require replicating intricate code from asmgen in the external tool, and command line parsing to set Clflags fields that may affect the emitter and later stages. Also, the option "-start-from emit" improves the integration with build systems that can automatically avoid redundant recompilation when only the layout changes.
This PR is on top of PR #8939. The only new commit here is 02e5554.