lefessan
Contributor

This PR adds hooks to the compiler that can be filled by custom drivers (or by a forthcoming PR adding plugins to ocamlc/ocamlopt):

  • Hooks at the parsetree level (interface and implementation)
  • Hooks at the typedtree level (interface and implementation)
  • Hooks at the lambda code level

@alainfrisch
Contributor

I think this is a good direction to allow custom drivers (or later, perhaps, plugins) to register hooks in various places of the compiler. Some comments:

  • I don't think the shared machinery should be implemented in Ast_mapper, since it is used in contexts that have nothing to do with this module. Perhaps in Misc, or in a new ad hoc module.
  • The term "hook" seems too general for the current proposal, which is really about "global rewriting passes". Many other kinds of hooks (which would not be instances of that common scheme) are possible, including more local ones such as extension point expanders or general type-checking hooks (on expressions, etc.). I'd suggest renaming Hook to Rewriter (or RewriterHook) everywhere.
  • The lambda rewriters could be called directly from Simplif. Same for Parsetree rewriters (-> Pparse) and others. This would guarantee that they are called from all drivers (including the toplevel, which is not currently covered by your PR, AFAICT).
  • Why is the "sourcefile" passed explicitly to rewriters? Other contextual information would be extracted from global references (e.g. in Clflags), and I'm not sure the sourcefile deserves specific treatment. But if you keep it, you should document the meaning of the string arguments in HookSig.
  • What about the application order? Currently it depends on the registration order, i.e. the (static or dynamic) linking order of plugins. Do you think it would be worth passing an optional "priority" argument upon registration to allow some explicit precedence (and fall back to registration order in case of equal priority)? Anyway, this could be added later if needed.
  • I believe that Parsetree mappers are fairly uncontroversial (it is the same processing model as ppxs, and could actually be used to have in-process ppx). Lambda is rather simple to work on, and I can see several interesting use cases for it (such as experimenting with optimization of local exceptions into static jumps). The use cases for Typedtree mappers are less clear to me, considering how difficult it is to rewrite this data structure without breaking its invariants. Do you have some concrete examples in mind?
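
To make the "priority" suggestion above concrete, here is a minimal sketch of a registry that orders hooks by priority and falls back to registration order on ties. All names (`registry`, `register`, `apply`, the `priority` and `seq` fields) are hypothetical and are not taken from the PR; lower priority values running first is also an arbitrary choice.

```ocaml
(* Hypothetical sketch of priority-ordered hooks; not the PR's code. *)
type 'a hook = { name : string; priority : int; seq : int; run : 'a -> 'a }

type 'a registry = { mutable hooks : 'a hook list; mutable next_seq : int }

let create () = { hooks = []; next_seq = 0 }

(* [seq] records the registration order and is used only to break ties. *)
let register ?(priority = 0) reg name run =
  reg.hooks <- { name; priority; seq = reg.next_seq; run } :: reg.hooks;
  reg.next_seq <- reg.next_seq + 1

(* Lower priority first; equal priorities keep registration order. *)
let sorted reg =
  List.sort
    (fun a b ->
       match compare a.priority b.priority with
       | 0 -> compare a.seq b.seq
       | c -> c)
    reg.hooks

let names_in_order reg = List.map (fun h -> h.name) (sorted reg)

let apply reg x = List.fold_left (fun acc h -> h.run acc) x (sorted reg)

(* Toy payload type: an int instead of an AST. *)
let reg : int registry = create ()

let () =
  register reg "double" (fun n -> n * 2);
  register reg "incr" (fun n -> n + 1);
  register ~priority:(-1) reg "negate" (fun n -> -n)
```

With this registration sequence, `negate` runs first because of its lower priority, and `double` runs before `incr` because it was registered first.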

@alainfrisch
Contributor

(In addition to the toplevel, syntactic hooks should probably be used for ocamldoc and ocamldep as well, no?)

@lefessan lefessan force-pushed the 2016-06-29-compilation-hooks branch from 4c77951 to 21ac600 Compare June 29, 2016 17:33
@lefessan
Contributor Author

Thanks @alainfrisch for these comments.

> I don't think the shared machinery should be implemented in Ast_mapper, since it is used in contexts that have nothing to do with this module. Perhaps in Misc, or in a new ad hoc module.

In the new version, I moved the functor to Misc, and the hooks to the modules where they are applied.

> The term "hook" seems too general for the current proposal, which is really about "global rewriting passes".

Actually, they were called rewriters in my former implementation. I decided to rename them because, in our current use cases, most of them are not rewriters: they check something on the AST and usually return their argument unchanged.

> The lambda rewriters could be called directly from Simplif. Same for Parsetree rewriters (-> Pparse) and others. This would guarantee that they are called from all drivers (including the toplevel, which is not currently covered by your PR, AFAICT).

Done.

> Why is the "sourcefile" passed explicitly to rewriters? Other contextual information would be extracted from global references

In our current use-cases (for example, the -make option in ocpwin), we found passing the source file more convenient than using a global reference. So, unless it is really a problem, I will keep it that way.

> What about the application order? Currently it depends on the registration order

Actually, no, they are applied in the lexicographical order of their names... We usually use names such as 78-rewrite-loops so that it is easy to order them. I am open to a better approach.
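
For readers following along, a minimal sketch of what "applied in the lexicographical order of their names" could look like. This is a simplified illustration and not the code from this PR: the names `MakeHooks` and `HookSig` are borrowed from the discussion, but the signatures, the duplicate-name check, and the toy payload type are assumptions.

```ocaml
(* Hypothetical sketch of a name-ordered hook registry; not the PR's code. *)
module type HookSig = sig
  type t
  val add_hook : string -> (string -> t -> t) -> unit
  val apply_hooks : string -> t -> t
end

module MakeHooks (M : sig type t end) : HookSig with type t = M.t = struct
  type t = M.t

  (* Registered hooks, stored in no particular order. *)
  let hooks : (string * (string -> t -> t)) list ref = ref []

  (* Assumption: registering the same name twice is treated as an error. *)
  let add_hook name f =
    if List.mem_assoc name !hooks then
      invalid_arg ("hook already registered: " ^ name);
    hooks := (name, f) :: !hooks

  (* Sorting by name makes the application order independent of the
     registration (i.e. linking) order. *)
  let apply_hooks sourcefile ast =
    let sorted =
      List.sort (fun (a, _) (b, _) -> String.compare a b) !hooks in
    List.fold_left (fun ast (_, f) -> f sourcefile ast) ast sorted
end

(* Toy "AST": a list of pass names, so that the order is observable. *)
module Trace = MakeHooks (struct type t = string list end)

let () =
  Trace.add_hook "78-rewrite-loops" (fun _src acc -> acc @ ["loops"]);
  Trace.add_hook "10-check-names" (fun _src acc -> acc @ ["names"])

let order = Trace.apply_hooks "foo.ml" []
```

Although `78-rewrite-loops` is registered first, `10-check-names` is applied first because its name sorts lower.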

> The use cases for Typedtree mapper are less clear to me, considering how difficult it is to rewrite this data structure without breaking its invariants. Do you have some concrete examples in mind?

We have some analyses that check properties of the typedtrees (in SecurOCaml, for example). As I said, they are not rewriters, but really hooks. Pierrick Couderc also has a type-checker that verifies that inferred types are correct, which could be used that way.

@lefessan lefessan force-pushed the 2016-06-29-compilation-hooks branch from 21ac600 to c43c986 Compare June 29, 2016 17:56
@alainfrisch
Contributor

  • If nobody has a concrete idea about actual Typedtree rewriters, perhaps the API for Typedtree hooks should reflect that (i.e. "typedtree -> unit", not "typedtree -> typedtree"). One can always add rewriters later if needed, but it's good if the API sends the signal that while lambda and parsetree can be easily transformed, the typedtree is more of a read-only structure.
  • About the "string" argument passed to the hooks: if we ever want to pass extra contextual arguments in the future, there will be strong pressure to go through global references to avoid breaking existing code. What about making the API more future-proof, e.g. by passing a record argument (a single field for now)?
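
Both points can be sketched together. Below, the context is passed as a one-field record so that fields can be added later without changing hook signatures, and a read-only "validator" is adapted to the rewriter shape. The names `hook_info`, `as_rewriter`, and `record_source` are hypothetical illustrations, not the API under review.

```ocaml
(* Hypothetical sketch; names are illustrative, not the PR's API. *)

(* A single field for now. Hooks that only read [info.sourcefile] keep
   compiling if more fields are added later. *)
type hook_info = { sourcefile : string }

(* Adapt a read-only validator to the uniform rewriter shape by
   returning the input unchanged. *)
let as_rewriter (validate : hook_info -> 'a -> unit) : hook_info -> 'a -> 'a =
  fun info ast -> validate info ast; ast

(* A toy validator that only records which files it has seen. *)
let seen = ref []

let record_source : hook_info -> 'a -> unit =
  fun info _ast -> seen := info.sourcefile :: !seen

(* Toy payload: an int stands in for a typedtree. *)
let hook = as_rewriter record_source
let result = hook { sourcefile = "foo.ml" } 42
```

The adapter shows that a uniform `t -> t` API can still host validators, at the cost of not expressing the read-only intent in the type.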

@alainfrisch
Contributor

Concerning Parsetree rewriters, would this be subsumed by #386?

@damiendoligez
Member

> If nobody has a concrete idea about actual Typedtree rewriters, perhaps the API for Typedtree hooks should reflect that (i.e. "typedtree -> unit", not "typedtree -> typedtree"). One can always add rewriters later if needed, but it's good if the API sends the signal that while lambda and parsetree can be easily transformed, the typedtree is more of a read-only structure.

IIUC this would require making a second MakeHooks functor and duplicating some code, just for the dubious pleasure of having a non-uniform API. I don't think it's such a great idea.

About turning the string argument into a record, I agree it would be better, if only for documenting the argument itself.

utils/misc.ml Outdated
let stop = loop 0 0 in
Bytes.sub_string dst 0 stop

exception HookExit of exn
Contributor

What is the rationale for this HookExit, and for the fact that any other exception terminates the compiler (or toplevel) immediately? I'd rather have fold_hooks wrap the exception into another constructor that records the hook name, and let that new exception propagate as usual.
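
A minimal sketch of the alternative suggested in this comment: the fold wraps any exception raised by a hook in a constructor that records the hook name, then lets it propagate. The exception name `Hook_failed` and this simplified `fold_hooks` signature are assumptions made for illustration.

```ocaml
(* Hypothetical sketch; [Hook_failed] is an invented name. *)
exception Hook_failed of { hook_name : string; error : exn }

(* Apply each (name, hook) pair in order. Any exception is wrapped with
   the name of the hook that raised it and then re-raised. *)
let fold_hooks hooks x =
  List.fold_left
    (fun acc (hook_name, f) ->
       try f acc with error -> raise (Hook_failed { hook_name; error }))
    x hooks

(* The caller (compiler driver or toplevel) decides what to do with it. *)
let report =
  match fold_hooks [ "ok", succ; "bad", (fun _ -> failwith "boom") ] 0 with
  | n -> string_of_int n
  | exception Hook_failed { hook_name; error = Failure msg } ->
    hook_name ^ ": " ^ msg
```

Because the wrapper is an ordinary exception, a batch compiler can let it terminate the process while the toplevel can catch it and continue.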

@alainfrisch
Contributor

> IIUC this would require making a second MakeHooks functor and duplicating some code, just for the dubious pleasure of having a non-uniform API. I don't think it's such a great idea.

I see it the other way around. It's a coquetry to force code sharing (about 10 lines of code) when the various uses don't match exactly. The non-uniform API would document that we don't really support Typedtree rewriters (until someone comes up with a valid use case), only Typedtree "validators". In particular, the ordering of validators doesn't really matter, contrary to rewriters.

@lefessan
Contributor Author

lefessan commented Jul 4, 2016

I don't see the point of adding extra lines to prevent a use, just because we don't have the idea yet. Often, the idea comes when the possibility is there.

I will add the record if you think it is useful to document this parameter.

@alainfrisch
Contributor

Ok, I won't fight on that.

@Drup
Contributor

Drup commented Jul 4, 2016

I agree with @lefessan, I think it should be allowed. In particular, ppx annotations make it possible to encode information in the typedtree that tools can use later on (documentation, etc.). A typedtree rewriter would allow non-regular propagation of such annotations, without compromising the correctness of the typedtree.

@lefessan lefessan force-pushed the 2016-06-29-compilation-hooks branch from c43c986 to 6ef534c Compare July 4, 2016 14:56
utils/misc.mli Outdated
lexicographical order of their names.
*)

exception HookExn of exn
Contributor Author

The idea is that a hook raising an exception is almost always an error, in which case it should be detected and caught immediately. However, in our experience, there are cases where you really want to raise an exception, that should go outside of the hook. Raising HookExn exn is thus the correct way to raise an exception outside of the hook. For the error, I could indeed create an exception for that case, but in Misc, there is currently no support for printing errors in the standard way.
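
The behaviour described here can be sketched as follows. This is one reading of the comment, not the PR's implementation: `run_hook` is a hypothetical helper, and `failwith` stands in for whatever the compiler does when it aborts on an unexpected exception.

```ocaml
(* Hypothetical sketch of the escape-hatch semantics described above. *)
exception HookExn of exn

let run_hook name f x =
  try f x with
  | HookExn e ->
    (* Deliberate: the hook asked for [e] to escape, so unwrap and
       re-raise it. *)
    raise e
  | e ->
    (* Anything else is treated as a bug in the hook. [failwith] is only
       a stand-in for aborting the compiler. *)
    failwith
      (Printf.sprintf "hook %s failed: %s" name (Printexc.to_string e))

(* A hook that wants Not_found to reach its caller. *)
let deliberate =
  try run_hook "h" (fun _ -> raise (HookExn Not_found)) 0
  with Not_found -> -1

(* A hook that raises Not_found by accident. *)
let accidental =
  try run_hook "h" (fun _ -> raise Not_found) 0
  with Failure _ -> -2
```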

Contributor

We are designing a new API for external use, so let's make it as smooth as possible. One could simply register the wrapper constructor in Misc and record the printer somewhere else (e.g. in Location). I'd also use this wrapper, which would normally terminate the compiler, but not necessarily the toplevel.

Contributor Author

Sorry, I don't understand what you mean. Could you provide a patch for that?

@lefessan lefessan force-pushed the 2016-06-29-compilation-hooks branch from 6ef534c to 4504e48 Compare July 4, 2016 15:19
@alainfrisch
Contributor

> Concerning Parsetree rewriters, would this be subsumed by #386?

Ping on that. There is a clear overlap between ppx registered in the process (originally designed for the toplevel) and Parsetree rewriters. Adding both APIs at the same time does not seem right.

By the way, one should specify how ppx (provided on the command-line) and Parsetree rewriters are combined (which are applied first).

@lefessan
Contributor Author

lefessan commented Jul 4, 2016

> Concerning Parsetree rewriters, would this be subsumed by #386?

Parsetree rewriters and ppx serve two different purposes, so there is no reason to link them. Ppx are for users, to extend the syntax of OCaml. Parsetree rewriters are for compiler devs, to easily extend the compiler (their own version of the compiler).

Parsetree rewriters are applied once ppx have been applied and invariants checked.

@alainfrisch
Contributor

> Parsetree rewriters are for compiler devs, to easily extend the compiler (their own version of the compiler).

Especially with dynamically linked plugins, rewriters will certainly be used by "external" users. One can say they become de facto "compiler devs" if they inject code in the compiler, but the line between ppx and Parsetree rewriters is really not clear to me. Typically, one might want to link a custom compiler driver with built-in ppx. Should this be done with Parsetree rewriters or in-process ppx from the other PR?

Also I don't see why Parsetree rewriters should be run after Parsetree invariants are checked. These invariants are assumptions made by the type-checker and the check could detect mistakes in the rewriter. I don't see a good reason for allowing Parsetree rewriters to break those invariants.

@lefessan
Contributor Author

lefessan commented Jul 5, 2016

The rationale for checking invariants before was that internal rewriters were supposed to work on a correct parsetree, but indeed, they could be applied before, so that they would be able to add syntax extensions as ppx do.
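
A toy sketch of the ordering being discussed, with the invariant check placed after the rewriters so that it also validates their output. The `ast` type, `check_invariants`, and `pipeline` are stand-ins invented for illustration, and running ppx before the rewriters is an assumption carried over from the earlier comment.

```ocaml
(* Toy stand-in for a parsetree: a list of binding names. *)
type ast = string list

(* Toy invariant: no empty names. *)
let check_invariants (ast : ast) =
  if List.mem "" ast then failwith "invariant broken: empty name"

let run_passes passes ast = List.fold_left (fun a pass -> pass a) ast passes

(* Assumed order: ppx, then in-process rewriters, then the check. *)
let pipeline ~ppx ~rewriters ast =
  let ast = run_passes rewriters (run_passes ppx ast) in
  check_invariants ast;
  ast

let upcase = List.map String.uppercase_ascii

(* A buggy rewriter that introduces an empty name. *)
let buggy ast = "" :: ast

let ok = pipeline ~ppx:[upcase] ~rewriters:[List.rev] ["a"; "b"]

let caught =
  try ignore (pipeline ~ppx:[] ~rewriters:[buggy] ["a"]); false
  with Failure _ -> true
```

If the check ran before the rewriters instead, the output of `buggy` would reach the type-checker unchecked.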

@lefessan
Contributor Author

lefessan commented Jul 7, 2016

If there is no more opposition, and as it was reviewed by Alain, I will merge by the end of the week.

@bluddy
Contributor

bluddy commented Jul 7, 2016

Just to clarify, this is like inserting an API at every compiler layer. Will this inhibit development of said layers? My thinking is that we cannot let considerations about plugins that chose to use unstable interfaces such as these slow down compiler development / refactoring, meaning that the policy should be that it's the plugin authors' problem, but I'd like to hear what others have to say about this point.

@lefessan
Contributor Author

lefessan commented Jul 7, 2016

@bluddy I think that there is a consensus on this: yes, using these hooks exposes your code to unstable interfaces, as does using compiler-libs in general. But the same is true of ppx rewriters, which work on the AST, and the AST is changing all the time...

@mshinwell
Contributor

It is worth noting that ppx has been a complete nightmare due to the absence of even a remotely stable interface. This has led to a proliferation of code that is costly to maintain. It hasn't directly slowed down compiler development, but it has indirectly: some people who would otherwise have been working on the compiler have been spending time fixing ppx rewriters. It would be nice not to get ourselves into that position again for something else. This is not to say that rewriters aren't a good idea, but if they become commonplace---which tends to be the case when things like this get into the wild---they might start imposing a burden.

@lefessan
Contributor Author

lefessan commented Jul 7, 2016

> some people who would otherwise have been working on the compiler have been spending time fixing ppx rewriters

My understanding is that ppx have replaced camlp4 extensions, and these developers would have spent the same time (or maybe more) fixing the camlp4 extensions that they have spent fixing the ppx... unless camlp4 has a more stable interface.

@bluddy
Contributor

bluddy commented Jul 7, 2016

@mshinwell, do you have any insight into what the main required fixes were to your (I guess Jane Street's) ppx rewriters? The shape of the AST stays mostly the same from version to version, with only the minor details changing. Would smart constructors have helped? Is it mostly a destructing (matching) problem, where pattern views might have helped?

@lefessan
Contributor Author

lefessan commented Jul 7, 2016

I think Mark was talking about Pierre Chambart, who is spending a lot of time upgrading ppx to work with trunk, to be able to benchmark his modifications to flambda with a lot of opam packages. One of the problems is probably that, if you update version 1 of my-ppx to trunk, then, you have to redo the work again when version 1.1 is released, and also when trunk is modified, since you cannot merge your modifications directly into the source tree of my-ppx, as updates for trunk would prevent it from working with 4.03.0. The only solution I see would be to use preprocessors such as cppo or ocp-pp.

@mshinwell
Contributor

I wasn't talking about Pierre actually, but yes, I think he did in fact suffer from this too. We spent a lot of people's time at Jane Street over the past few months on various ppx-related changes due to changes in the parsetree, etc. Some form of smart constructors / a generally better interface for building the parsetree might help. The problems are not limited to destruction. It is possible we may work on that at some point, but we cannot commit to it at present.
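
To illustrate the smart-constructor idea mentioned here, a toy sketch: client code builds terms only through a small `BUILDER` interface, so it compiles unchanged against two different AST representations. Everything below (`BUILDER`, `V1`, `V2`, `Client`) is hypothetical and far simpler than the real parsetree.

```ocaml
(* Hypothetical sketch of smart constructors over a toy AST. *)
module type BUILDER = sig
  type expr
  val int : int -> expr
  val add : expr -> expr -> expr
  val eval : expr -> int
end

(* "Version 1" of the AST: bare constructors. *)
module V1 : BUILDER = struct
  type expr = Int of int | Add of expr * expr
  let int n = Int n
  let add a b = Add (a, b)
  let rec eval = function Int n -> n | Add (a, b) -> eval a + eval b
end

(* "Version 2": every node now carries a location. Code that used the V1
   constructors directly would no longer compile against this shape. *)
module V2 : BUILDER = struct
  type loc = { line : int }
  type expr = { desc : desc; loc : loc }
  and desc = Int of int | Add of expr * expr
  let no_loc = { line = 0 }
  let int n = { desc = Int n; loc = no_loc }
  let add a b = { desc = Add (a, b); loc = no_loc }
  let rec eval e =
    match e.desc with Int n -> n | Add (a, b) -> eval a + eval b
end

(* Client code written once against the builder interface. *)
module Client (B : BUILDER) = struct
  let three = B.(add (int 1) (int 2))
end

module C1 = Client (V1)
module C2 = Client (V2)
```

This covers construction only; as noted above, the problems are not limited to destruction, and pattern matching on the AST would need a separate mechanism.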

@lefessan lefessan force-pushed the 2016-06-29-compilation-hooks branch from e9f7ac6 to d9f43d7 Compare July 12, 2016 16:05
@lefessan lefessan closed this Jul 12, 2016
@lefessan lefessan reopened this Jul 12, 2016
@lefessan
Contributor Author

I think I have replied to all requests for modifications, and Appveyor has its own problems, so I am merging.

@lefessan lefessan merged commit 18cd8a6 into ocaml:trunk Jul 12, 2016
@lefessan lefessan deleted the 2016-06-29-compilation-hooks branch July 12, 2016 18:14
camlspotter pushed a commit to camlspotter/ocaml that referenced this pull request Oct 17, 2017
EduardoRFS pushed a commit to esy-ocaml/ocaml that referenced this pull request Dec 17, 2021
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024