[RFC] unit headers for OCaml source files #26

Open: wants to merge 1 commit into master


gasche
Member

@gasche gasche commented Apr 17, 2021

The RFC stems from (old thinking/discussions and) the discussion of ocaml/ocaml#10319. The RFC text is reproduced below.


Unit headers for OCaml source files

Context

In OCaml, source files play a double role:

  • They are interpreted inside the language as modules, formed by a
    sequence of structure items. (Modules can be nested, but a file
    always acts as a toplevel module.)

  • They are interpreted by the compilation tools as "compilation
    units", the primary units of compilation and linking, whose
    dependencies on other compilation units are tracked and whose
    linking order determines the program semantics.

    (Technically, a compilation unit is formed by a pair of a .ml and
    a .mli file, or by only one of them when the other does not
    exist.)

Some things that OCaml programmers can express only make sense for
compilation units, not modules. Currently they can only be expressed
through compiler command-line options, typically stored in the build
system. For example:

  • Dependencies on other compilation units or (in general)
    archives/libraries/packages.
  • Global compilation options (-safe-string, -rectypes).

Sometimes it would be convenient, even important, to specify those
aspects in the source code itself, but there is no place in the syntax
to specify them: they are not valid structure items as they don't make
sense inside an arbitrary (nested) module.

One example of the problem

One example use-case is [@@@warning "-missing-mli"]: we would like to
let users explicitly disable the new missing-mli warning
(introduced by #9407 in
4.13~dev) inside a particular .ml file, indicating that it
intentionally does not have a corresponding .mli file.

This warning is implemented at the level of compilation units, not
during the checking/compilation of the module code, so the current
implementation of [@@@warning ..] does not support disabling it: it
only enables/disables warnings for the following structure items in the
current module.

A proposal exists to change the semantics of toplevel @@@warning
attributes to remain in scope for the whole checking/compilation of
the compilation unit, see
#10319. This is
a special case of one of the two options discussed in this RFC, and it
led to the present discussion.

Proposals

There are two proposals to address this issue, one "implicit" and one
"explicit".

The [@@@warning "-missing-mli"] PR implements the implicit proposal.

I prefer the explicit proposal.

Implicit proposal: handle toplevel attributes/extensions at the compilation unit level

We could consider that floating attributes and extensions that are at
the toplevel of a file are not interpreted as "normal" structure
items, at the level of the module, but instead as "unit"
attributes/extensions at the level of the unit.

let foo = ...

[@@@warning "-missing-mli"] (* warning setting for the whole compilation unit *)

module Foo = struct
  [@@@warning "..."] (* warning setting for a submodule only *)
end
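To make the implicit proposal concrete, here is a minimal sketch of the reinterpretation step over a toy AST. The `item` type and the `split_unit_attributes` function are hypothetical and much simpler than the compiler's Parsetree; the point is only that floating attributes at the toplevel of the file are pulled out as unit-level settings, while those inside submodules are left alone.

```ocaml
(* A toy structure AST for illustration only; the compiler's Parsetree is
   much richer. *)
type item =
  | Attribute of string * string   (* [@@@name "payload"] *)
  | Value of string                (* let name = ... *)
  | Module of string * item list   (* module Name = struct ... end *)

(* Implicit proposal: floating attributes at the toplevel of the file become
   unit-level settings; attributes nested inside submodules are untouched
   and keep their usual module-level scope. *)
let split_unit_attributes (file : item list) : (string * string) list * item list =
  let unit_attrs =
    List.filter_map (function Attribute (n, p) -> Some (n, p) | _ -> None) file
  in
  let module_items = List.filter (function Attribute _ -> false | _ -> true) file in
  (unit_attrs, module_items)
```

Wrapping the same items in a submodule makes `split_unit_attributes` return no unit attributes at all, which illustrates why this proposal loses the property that code can be moved into a submodule without changing its meaning.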

Pros:

  1. Reasonably easy to implement, no syntax change (we reinterpret
    syntax differently).

  2. This is consistent with the way toplevel directives #foo ;; are
    handled today: toplevel directives are only valid at the toplevel,
    but can be mixed with other structure items.

Cons:

  1. Confuses two notions: modules and compilation units.

  2. We lose the current property that any OCaml code can be moved inside
    a submodule, preserving its meaning.

  3. We cannot hope to extend this idea in the future to support global
    settings, such as -rectypes, because it would be a mess to allow
    those to change in the middle of other structure items.

Variant

One possible variant of this proposal would be to specify certain
attributes/extensions as "header attributes", that have the same
syntax as floating structure/signature-level attributes/extensions,
but can only be used at the beginning of the file (before any
non-header construct). This solves Cons.3, but aggravates Cons.1 by
creating more surprises for users (certain toplevel floating
attributes can be moved around and others cannot, etc.).

Explicit proposal: create a "header" extension for compilation-unit configuration

Instead of implicitly treating toplevel attributes/extensions as
scoping over the whole compilation-unit, we propose a builtin
ocaml.unit_header extension whose content should be understood as scoping
over the whole compilation unit, not just a module.

[%%unit_header
  [@@@warning "-missing-mli"]
  [@@@rectypes]
]

let foo = ...

"unit headers" must come before any other structure/signature items
(comments are allowed before headers). They are the only component of
the .ml syntax that cannot be moved into a submodule (doing so results
in an error).
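A compiler enforcing this placement rule could check it roughly as follows. This is a sketch on a hypothetical toy AST (`item`, `contains_header` and `check_headers` are illustrative names, not compiler APIs); comments are not represented since the lexer discards them, and header payloads are simplified to a list of strings.

```ocaml
(* Toy AST for illustration; header payloads are simplified to strings and
   comments are absent because the lexer discards them. *)
type item =
  | Unit_header of string list     (* [%%unit_header ...] *)
  | Value of string                (* let name = ... *)
  | Module of string * item list   (* module Name = struct ... end *)

(* True if a unit header occurs anywhere in [items], including submodules. *)
let rec contains_header (items : item list) : bool =
  List.exists
    (function
      | Unit_header _ -> true
      | Value _ -> false
      | Module (_, body) -> contains_header body)
    items

(* Accept unit headers only as a prefix of the file's toplevel items; any
   header after another item, or inside a submodule, is an error. *)
let check_headers (file : item list) : (string list, string) result =
  let rec take_headers acc = function
    | Unit_header settings :: rest -> take_headers (List.rev_append settings acc) rest
    | rest -> (List.rev acc, rest)
  in
  let settings, body = take_headers [] file in
  if contains_header body then
    Error "unit headers must come before any other item and cannot appear in submodules"
  else Ok settings
```

Because the rule only looks at a prefix of the toplevel items, the check is a single pass and does not need to understand the rest of the file.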

Note: this RFC does not propose a new @@@rectypes attribute to be
supported here; it is only an example of the sort of feature that could,
over time, become available in unit headers. [@@@warning "-missing-mli"] would immediately be adapted to work (only) in unit
headers, but the RFC itself proposes only the "header" notion,
not any specific item to be part of it.

Future extensions

In the future, certain toplevel directives could be allowed in the
unit header. This is not proposed here.

We could imagine certain tools querying the unit header of source
files for configuration (or querying the compiler to ask for them),
for example to support a #require ... directive integrated in the
build system. This is not proposed here, and in fact not necessarily
the best approach.
I think it probably makes more sense to reserve the header for aspects
of OCaml programs that the compiler knows about (that correspond to
command-line options), so that header interpretation is left entirely
in the compiler. If someday we have a compiler that handles
dependencies (and/or ppx resolution, etc.) by itself, then those
aspects would become naturally specifiable in unit headers.
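If header items are restricted to aspects the compiler already knows about, each one could map back to an existing command-line option, keeping header interpretation entirely in the compiler. A minimal sketch, assuming a hypothetical `setting` type: `-w`, `-rectypes` and `-principal` are existing ocamlc/ocamlopt options, but this particular correspondence is only illustrative.

```ocaml
(* Hypothetical unit-header settings; each constructor mirrors an existing
   ocamlc/ocamlopt command-line option. *)
type setting =
  | Warnings of string  (* -w <spec> *)
  | Rectypes            (* -rectypes *)
  | Principal           (* -principal *)

(* Translate one setting into the equivalent command-line arguments. *)
let to_flags (s : setting) : string list =
  match s with
  | Warnings spec -> [ "-w"; spec ]
  | Rectypes -> [ "-rectypes" ]
  | Principal -> [ "-principal" ]

(* A whole header is then equivalent to the concatenation of its flags. *)
let header_to_command_line (header : setting list) : string list =
  List.concat_map to_flags header
```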

@gasche changed the title from "New RFC: unit headers for OCaml source files" to "[RFC] unit headers for OCaml source files" (Apr 17, 2021)
@Octachron
Member

Another point that might be worth investigating is the integration with the functor unit RFC #11. It seems that functor units would benefit from the possibility of communicating information about the compilation units in the files themselves.

@Octachron
Member

Another data point: we already have header attributes for alerts (that I had forgotten). For instance, with

(* a.mli *)
[@@@alert Scylla]
module M: sig [@@@alert Charibdys] end

the Scylla alert is attached to the A module (or compilation unit?), but the Charibdys alert is not attached to M.
If we had an explicit header, we could avoid some confusion.

@gasche
Member Author

gasche commented Feb 24, 2022

I must admit that I didn't push for this RFC in any particular way. Florian, if you are willing to openly declare yourself as a supporter, maybe we should start lobbying around? Is there a concrete proposal that we both like that we could push specifically to start a conversation, for example [%%unit_header ...]?

@ivg
Member

ivg commented Feb 24, 2022

What is your stance on extending the existing syntax explicitly, by naturally adding an extra @ to indicate the extended scope? E.g., using four @ for the CU-level annotations,

[@@@@warning "-missing-mli"]
[@@@@rectypes]

I personally find this easier to remember and use rather than

[%%unit_header
  [@@@warning "-missing-mli"]
  [@@@rectypes]
]

which requires us (OCaml users) to remember an extra bit of syntax and protocol.

Are there any specific reasons why we need to pack CU annotations in blocks?

@gasche
Member Author

gasche commented Feb 24, 2022

The advantage of the %%unit_header proposal is that it requires no syntactic change. This said, I think the proposal of using a new level for unit-global annotation is also fine -- as long as, like %%unit_header, we forbid using those annotations after any other kind of structure item. And it does look a bit nicer on the eye.

@Octachron
Member

My main concern with [@@@@ ...] is that it raises the question of where we stop: will we one day need [@@@@@ ... ] or [@@@@@@ ...] attributes? Maybe pack-level annotations should be denoted with [@@@@@ ... ] and library-level ones with [@@@@@@ ... ]? (And don't forget that I cannot (or don't want to) count beyond 7 in unary without pretty pictograms like 𓆼.)

Nevertheless, I agree that by itself [@@@@ ...] is alright and probably slightly nicer than a [%%header ... ] node in isolation.

@ivg
Member

ivg commented Feb 25, 2022

Yes, the backward compatibility concern totally justifies it. For some reason, I was thinking that the parser would accept [@@@@foo] and just ignore it. I also agree with Florian that after some number of repetitions the syntax becomes ugly and unreadable.

With that said, I am still not sure that the language really needs this. Build-system matters should be handled by build systems. E.g., ocamlbuild had the nice notion of tags for that, and something similar could easily be implemented in dune. Unfortunately, the current state of compilers (not just OCaml, the whole compiler industry) still forces us to have various pragmas for optimization, inlining, and so on, but I really like that OCaml was (and largely still is) free of them, so that the source code is about the program and its logic, which keeps it maximally declarative. I am afraid that this change might open Pandora's box, after which we will start specifying dependencies and other build-system-related stuff in the source code.

@gasche
Member Author

gasche commented May 5, 2024

Apparently a similar Rust RFC has just been approved. Their main use-case seems to be single-file programs (cargo script), rather than module-specific data in a larger program.

@dbuenzli
Contributor

dbuenzli commented May 5, 2024

Note that there seems to be quite a bit of overlap with the existing # OCaml directives; I have always found it a pity that ocamlc/ocamlopt refuse to parse them (if only to ignore them).

For example, in b0caml, a native scripting system for OCaml, I tried to use (and repurpose) them in an initial preamble (the #directory primitive was the equivalent of #require, something designed at a time when I was optimistic we could get to something simple on the library-handling front). Another example is the B0.ml files of the b0 build system, which adds both # directives and a few B0-specific ones in order to describe how the build description should be compiled.

@gasche
Member Author

gasche commented Jun 5, 2024

I had a brief discussion about this with @Octachron a few days ago, and we agree that accepting a preamble of toplevel directives would be nice. This would push us to add interesting directives that may be missing today (systematically reflect relevant command-line flags as directives), which could also help other users.

It should be easy for external tools to parse the preamble, and in particular to know where it ends. I wonder if we could make an explicit ;; mandatory in that part of the source file.
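With a mandatory ;;, an external tool could find the end of a directive preamble without a full OCaml parser. A minimal sketch of such a scanner, assuming one directive per line and no comments inside the preamble (`split_preamble` is a hypothetical helper, not an existing tool):

```ocaml
(* Hypothetical preamble scanner. Assumptions: each preamble directive sits on
   its own line, starts with '#' and ends with an explicit ";;"; blank lines
   are skipped; comments are not handled. The first other line ends the
   preamble. *)
let split_preamble (src : string) : string list * string =
  let rec go acc = function
    | [] -> (List.rev acc, "")
    | line :: rest ->
        let t = String.trim line in
        let n = String.length t in
        if n = 0 then go acc rest
        else if t.[0] = '#' && n >= 3 && String.sub t (n - 2) 2 = ";;" then
          (* Keep the directive text without its terminating ";;". *)
          let d = String.trim (String.sub t 0 (n - 2)) in
          go (d :: acc) rest
        else
          (* First non-directive line: everything from here on is code. *)
          (List.rev acc, String.concat "\n" (line :: rest))
  in
  go [] (String.split_on_char '\n' src)
```

Note that a directive written without its ;; is treated as the start of the code, which is what makes the end of the preamble unambiguous under this rule.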

@gasche
Member Author

gasche commented Sep 30, 2024

Over at ocaml/ocaml#13471 (comment), @stedolan discussed the need for something in the spirit of this RFC:

There are three reasonable places to specify the keyword set / language edition in use:

  1. On the compiler command line (this PR, proposed in RFC 27)
  2. In the file, as a toplevel attribute or lexer directive (not in this PR, but also proposed in RFC 27)
  3. Globally, as parameters to configure or to the opam switch (neither in this PR nor RFC 27)

I now think it is a mistake to provide only (1). It might even be a mistake to provide (1) at all.

The problem is that there are lots of tools other than the compiler which care what a keyword is: some preprocessors, merlin, ocamlformat, ocp-indent, editor syntax highlighting, etc. In many cases, the tool is given the text of the file, and there is no obvious way to communicate other information from the build. This can work with mechanism (2) (since the tool can see the attribute or directive) and mechanism (3) (since the tool is configured at install time), but not with mechanism (1).

Providing mechanism (1) as a fallback can be useful for when you want to compile old code without modification, where you accept that tooling won't work properly on said code. But it should at least not be the only provided mechanism.

We had this experience on the Jane Street branch when we first added locals. Initially, we used mechanism (1), a command-line flag, and regretted it. In that context, there is only one build system and a limited number of editors / syntax highlighters, and it was still painful.

"Toplevel attribute or lexer directive" seems fairly close to the proposals we made above: using a dedicated extension, or using toplevel (REPL) directives. (I think that "toplevel attribute" just means "an attribute at the beginning of the file", not toplevel in the REPL sense.)

@gasche
Member Author

gasche commented Oct 28, 2024

We discussed this at the maintainer meeting today. There was some interest, and people suggested fleshing out a couple of examples under different possible syntaxes, to see which would be easiest to parse; in particular, it's nice if we don't need a full OCaml parser to figure out where these headers stop and the real code starts.

Syntaxes that were mentioned, in addition to toplevel directives, are {%header <arbitrary string payload>} extension points.
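A quoted-string payload has a similar advantage: its end can be found by searching for the closing delimiter, without parsing any OCaml. A minimal sketch, assuming the header is written with OCaml's quoted-string syntax as `{%header|...|}` or `{%header id|...|id}` at the very start of the file; `find_header` is a hypothetical helper that returns the raw payload together with the offset where the code starts.

```ocaml
let is_space c = c = ' ' || c = '\t' || c = '\n' || c = '\r'
let is_delim_char c = (c >= 'a' && c <= 'z') || c = '_'

(* Returns Some (payload, end_offset) if [src] starts, after whitespace, with
   {%header|...|} or {%header id|...|id}; None otherwise. *)
let find_header (src : string) : (string * int) option =
  let n = String.length src in
  let rec skip i = if i < n && is_space src.[i] then skip (i + 1) else i in
  let start = skip 0 in
  let opener = "{%header" in
  let lo = String.length opener in
  if start + lo > n || String.sub src start lo <> opener then None
  else begin
    (* Optional delimiter: a space followed by lowercase letters/underscores. *)
    let rec read_delim j = if j < n && is_delim_char src.[j] then read_delim (j + 1) else j in
    let after = start + lo in
    let bar, delim =
      if after < n && src.[after] = ' ' then
        let j = read_delim (after + 1) in
        (j, String.sub src (after + 1) (j - after - 1))
      else (after, "")
    in
    if bar >= n || src.[bar] <> '|' then None
    else begin
      (* The payload ends at the first occurrence of |delim}. *)
      let closer = "|" ^ delim ^ "}" in
      let cl = String.length closer in
      let body = bar + 1 in
      let rec search k =
        if k + cl > n then None
        else if String.sub src k cl = closer then
          Some (String.sub src body (k - body), k + cl)
        else search (k + 1)
      in
      search body
    end
  end
```

The scanner only delimits the header; the compiler and external tools would still need to agree on how the arbitrary string payload is interpreted.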
