Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't remove module aliases in `module type of` and `with module` #1652

Merged
merged 5 commits into from Mar 23, 2018

Conversation

Projects
None yet
9 participants
@lpw25
Copy link
Contributor

lpw25 commented Mar 8, 2018

This PR stops module type of and with module from recursively removing all module aliases from the resulting type. This behaviour was added for MPR#6307, but it is problematic:

  • It is responsible for a large portion of the type-checking time in our code base, because code like include module type of Core creates a .cmi file containing the entire definition of the Core library. We frequently get large jumps in compile-time when a developer inadvertently adds such a line. For example, we had a 15% jump in the compile time of our entire code base two weeks ago from the addition of a single such line.

  • It means there is no direct equivalent to include Foo in the signature language even include module type of struct include Foo end will not give you the same type.

This PR is not a backwards compatible change. To ease with transition I've included a [@remove_aliases] attribute that can be attached to module type of or with module to get the old behaviour. @trefis did a run over opam to see how much this change would break -- outside of Jane Street libraries there only appeared to be two cases. So perhaps having this attribute is over-kill.

I'm also working on a series of patches to change the semantics of module type of to be less problematic, which somewhat supersede this patch, and I hope to have them ready in time for 4.07. However, I decided to submit this patch anyway because it's not certain those patches will make it in time, and removing this behaviour is a high priority for us.

@garrigue

This comment has been minimized.

Copy link
Contributor

garrigue commented Mar 9, 2018

Which are the packages concerned? Does this include Frama-C?

In general, for changing the semantics one should be conservative. The attribute seems to be a minimum in case anybody depends on the old semantics, since it is not easy to simulate.

@@ -1733,19 +1751,26 @@ and normalize_signature_item env = function
(* Extract the module type of a module expression *)

let type_module_type_of env smod =
let remove_aliases = has_remove_aliases_attribute smod.pmod_attributes in
let tmty =
match smod.pmod_desc with
| Pmod_ident lid -> (* turn off strengthening in this case *)
let path, md = Typetexp.find_module env smod.pmod_loc lid.txt in

This comment has been minimized.

@garrigue

garrigue Mar 12, 2018

Contributor

Could you avoid changes that only change the layout of the code?

@garrigue

This comment has been minimized.

Copy link
Contributor

garrigue commented Mar 12, 2018

Actually, I'm a bit confused about what this PR does.
For a coherent behavior I would have expected it to return the strengthened version of the module type (including types), but apparently in only keeps module aliases.
I'm not sure what would be a good definition of this behavior.

If the only thing that bothers you is that removing aliases makes module types bigger, another solution would be to have weak aliases, which only mean that the module has the same type, but not that it is the same module.

@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Mar 12, 2018

Which are the packages concerned? Does this include Frama-C?

The two we saw were in the General and ocamlformat packages. The testing was with opam builder, which seems to fail for quite a lot of packages on the ordinary compiler, including Frama-c, so we don't have results for those. @trefis is just trying to build Frama-C with the patch now.

@jberdine

This comment has been minimized.

Copy link

jberdine commented Mar 12, 2018

@trefis

This comment has been minimized.

Copy link
Contributor

trefis commented Mar 12, 2018

Just to confirm: frama-c compiles fine with this patch.

@Drup

This comment has been minimized.

Copy link
Contributor

Drup commented Mar 12, 2018

Well, testing opam is one side of the problem. With this change, you have more aliases than you used to, which can lead to exposing more type equalities than intended. You can have code compiling that previously didn't. In particular, I would have to check specifically for tyxml/eliom, but I wouldn't be surprised if we, on purpose, don't reexport modules in a way that hides some of the type equalities.

@Drup

This comment has been minimized.

Copy link
Contributor

Drup commented Mar 12, 2018

To make this more specific: I believe this is not a problem in practice for us (iirc, I made the change to explicit signatures a while ago), but the problem remains in general, and this change might compromise the soudness of some libraries without producing any sort of warning in the code, which is a bit annoying.

All that being said, I think the idea behind this change is for the best.

@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Mar 12, 2018

For a coherent behavior I would have expected it to return the strengthened version of the module type (including types), but apparently in only keeps module aliases.

I'm not quite sure what you mean here. Do you mean "I thought this would be the patch that makes module type of always give the strengthened module type" or do you mean something more specific about this patch.

For clarity, I'll give an example of what this patch does. Given:

module Foo : sig
  type t
end

module Bar : sig
  module M = Foo
  module N : sig type t end
  type t
end

module Baz = Bar

We currently get:

sig
  module M : sig type t = Foo.t end
  module N : sig type t end
  type t
end

for module type of Bar, and:

sig
  module M : sig type t = Foo.t end
  module N : sig type t = Bar.N.t end
  type t = Bar.t
end

for module type of Baz.

With this patch, we instead get:

sig
  module M = Foo
  module N : sig type t end
  type t
end

and:

sig
  module M = Foo
  module N : sig type t = Bar.N.t end
  type t = Bar.t
end

I intend to make a different patch later, which would instead give:

sig
  module M = Foo
  module N = Bar.N
  type t = Bar.t
end

for both module type of Bar and module type of Baz. This other patch is not quite ready yet, and may not be ready in time for 4.07, so I would like to consider merging this current patch in the mean time to make sure that we fix the worst part of module type of in 4.07.

@lpw25 lpw25 changed the title Don't remove aliases in `module type of` and `with module` Don't remove module aliases in `module type of` and `with module` Mar 12, 2018

@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Mar 12, 2018

This other patch is not quite ready yet, and may not be ready in time for 4.07

Perhaps I should say more about this. The actual change to use the strengthened module type is quite simple. From my experiments with it, this change causes a fair few compilation errors. These errors can be categorised into the following three sorts:

  1. Cases where the user genuinely wants an unstrengthened type. These cases can be fixed in the short-term by just adding a [@weak] attribute to the module type of -- similar to the [@remove_aliases] attribute in this patch. In the long term the code can be rewritten to give a proper name to the module type and share it between the two locations.
  2. Cases where they want the strengthened type, but use it in a way that only happened to work due to getting a less strengthened version. In particular, code of the form:
module M : sig include module type of Foo.M ... end
include module type of Foo with module M := M

is quite common, but should really be written as:

module M : sig include module type of Foo.M ... end
include module type of Foo with module M := Foo.M
  1. Cases where they want the strengthened version, but their module doesn't quite implement that type because of places where we drop module aliases. In particular, the fact that we don't alias functor parameters and that with module M = Foo does not give an alias to Foo causes confusing errors here.

The first two sorts of error are easy to understand and fix, but errors of the third sort are quite painful. So I came to the conclusion that it would be best to simultaneously change module type of and keep aliases in more places. It is the patches to keep these additional module aliases that are not yet complete.

@garrigue

This comment has been minimized.

Copy link
Contributor

garrigue commented Mar 13, 2018

I was indeed talking about the fully strengthened type, which makes more sense in the long run, and I suppose you agree on that.
For the 3rd category of problems, we shall eventually get more precise aliases by allowing "type-only" aliases, that are not used for compilation.
From this point of view, what is the status of this PR?
An intermediate semantics that is going to be discarded anyway?
Or do you want to argue that it makes sense to keep explicit aliases (after all module type of does keep type abbreviations), independently of the strengthening question?

@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Mar 13, 2018

An intermediate semantics that is going to be discarded anyway?
Or do you want to argue that it makes sense to keep explicit aliases (after all module type of does keep type abbreviations), independently of the strengthening question?

I think that it is both an intermediate semantics that we intend to discard, and an improvement over the current semantics. As you say, module type of keeps type abbreviations, and I think keeping module aliases is what users would expect to happen. Also with the current semantics there is no equivalent to:

include Foo

in the signature language, since even:

include module type of struct include Foo end

will remove the module aliases in Foo. At the moment, we currently have to leave some files without .mli files to avoid this issue. Combined with the performance improvements, I think this justifies making this change even though we intend to change the semantics again in a later PR.

@schrodibear

This comment has been minimized.

Copy link

schrodibear commented Mar 16, 2018

Just wanted to note that the current behaviour with removing module aliases in with module feels quite unpredictable and makes using module aliases in module signatures quite hard. Several examples:

module type S = sig end
module type S2 = sig module M1 : S module M2 = M1 end
module type S' = sig module M : S2 end
module M_ok : S2 = struct module M1 = struct end module M2 = M1 end
module U : S' with module M = M_ok = struct module M = M_ok end
module OM1 : S = struct end
module M_err : S2 with module M1 = OM1 = struct module M1 = OM1 module M2 = M1 end
module U' : S' with module M = M_err = struct module M = M_err end

gives error in the last line, but e.g.

module type S = sig end
module type S2 = sig module M1 : S module M2 = G end
module type S' = sig module M : S2 end
module M_ok : S2 = struct module M1 = struct end module M2 = G end
module U : S' with module M = M_ok = struct module M = M_ok end
module OM1 : S = struct end
module M_err : S2 with module M1 = OM1 = struct module M1 = OM1 module M2 = G end
module U' : S' with module M = M_err = struct module M = M_err end

(with an unresolved module G) successfully passes with -no-alias-deps.
Also

module type S = sig end
module type S2 = sig module M1 : S module M2 = M1 end
module type S' = sig module M : S2 end
module M_ok : S2 = struct module M1 = struct end module M2 = M1 end
module U : S' with module M = M_ok = struct module M = M_ok end
module M_err = M_ok
module U' : S' with module M = M_err = struct module M = M_err end

(the only difference between successful and failed substitution is module vs. its alias) fails.
Even if the alias is preserved it can fail to match the alias required by the signature:

module type S = sig end
module type S2 = sig module M1 : S module M2 = M1 end
module type S' = sig module M : S2 end
module M_ok : S2 = struct module M1 = struct end module M2 = M1 end
module U : S' with module M = M_ok = struct module M = M_ok end
module type S'' = sig module M : S' end
module U' : S'' with module M = U = struct module M = U end
@garrigue

This comment has been minimized.

Copy link
Contributor

garrigue commented Mar 19, 2018

OK, I have to admit that the removal of module aliases was intended to keep full compatibility with a world without module aliases. Now that many people use module aliases, this changes the rules of the game.

I am fine with this PR (since it particular it keeps a way to get the old behavior punctually if needed).
I suppose you will tell the authors of the concerned packages.

Just give me a bit of time to review the changes.

@garrigue

This comment has been minimized.

Copy link
Contributor

garrigue commented Mar 19, 2018

This may be seen as purely cosmetic, but I think the code would be clearer if you would replace remove_aliases and scrape_for_type_of in Mtype by a single function:

val scrape_for_type_of: remove_aliases:bool -> Env.t -> module_type -> module_type

At least this would avoid some code duplication.

@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Mar 22, 2018

but I think the code would be clearer if you would replace remove_aliases and scrape_for_type_of in Mtype by a single function

I've made that change and added a changes entry.

@lpw25 lpw25 force-pushed the lpw25:dont-remove-aliases branch from f05c1ad to 2b412c8 Mar 22, 2018

@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Mar 22, 2018

I've rebased and will merge once the CI passes.

@lpw25 lpw25 merged commit dd34dd5 into ocaml:trunk Mar 23, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@Drup

This comment has been minimized.

Copy link
Contributor

Drup commented Apr 17, 2018

This actually breaks containers as well, which does something like this to extend the ArrayLabels module, compiled with -nolabels.

CCArrayLabels.ml:

include CCArray

CCArrayLabels.mli

include module type of struct include ArrayLabels end

(** additional functions **)

cc @c-cube

So with this change, the relation between module type of Foo and module type of struct include Foo end is even weirder and impossible to understand for mortals than before. Let's hope we will make the next part of the change quickly. :)

@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Apr 17, 2018

Out of interest, what is the error message for this case? It's worth noting that any error from this change would also be error with the more complete version, although I agree that knowing when they will occur is harder with this version than the full version.

@Drup

This comment has been minimized.

Copy link
Contributor

Drup commented Apr 17, 2018

# File "src/core/CCArrayLabels.ml", line 1:
# Error: The implementation src/core/CCArrayLabels.ml
#        does not match the interface src/core/.containers.objs/CCArrayLabels.cmi:
#        In module Floatarray:
#        Modules do not match:
#          (module Array.Floatarray)
#        is not included in
#          (module ArrayLabels.Floatarray)
@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Apr 17, 2018

Good, that is at least an error I would expect (or would have done if I'd remembered that the Floatarray module existed). It would indeed be an error in the fuller version of this proposal as well.

@Drup

This comment has been minimized.

Copy link
Contributor

Drup commented Apr 17, 2018

@lpw25 That particular problem is fixable by using module type of ArrayLabels instead, but only because Array doesn't contain any type definition that need re-exporting.

The next version of the semantics that always strengthen the types will fix that issue. But in the meantime, it's impossible to extend a module that has both modules and type declarations without using the attribute. That's a bit unfortunate.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Apr 18, 2018

If it introduces subtle incompatibilities that affect real programs (a lot of programs extend modules with submodules and type definitions), should we remove this change from 4.07?

@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Apr 18, 2018

The next version of the semantics that always strengthen the types will fix that issue.

The next version would make module type of ArrayLabels fail the same way that module type of struct include ArrayLabels end currently fails. I'm not sure whether that is what you meant.

But in the meantime, it's impossible to extend a module that has both modules and type declarations without using the attribute.

Sure it is.

The problem here is that the interface essentially says that this module includes ArrayLabels -- which means that it must contain a module called Floatarray that is equal to (i.e. an alias to) ArrayLabels.Floatarray. Now either:

  1. CCArray contains such a module but does not expose the fact that it is an alias in it's interface. In which case the simplest solution is to add this alias in the interface.
  2. CCArray contains a module that has the same interface as ArrayLabels.Floatarray but is not equal to it because it changes some implementations. In which case the interface should be written as:
include module type of struct include ArrayLabels end
  with module Floatarray := ArrayLabels.Floatarray
module Floatarray : module type of struct include ArrayLabels.Floatarray end

Note that in case 1 we get a better module type for CCArrayLabels with the new version, whilst in case 2 we get a clearer mli file that indicates that Floatarray has been modified and also mirrors the implementation code that you would need to write to produce it.

If it introduces subtle incompatibilities that affect real programs (a lot of programs extend modules with submodules and type definitions), should we remove this change from 4.07?

The only subtlety here is that module type of ArrayLabels fixes the problem here, when really it shouldn't. This change now gives the correct behaviour for module type of struct include Foo end, but we need further changes to also fix module type of Foo.

I expect this change to surprise some people, because people are used to thinking about the equalities on types but tend to ignore equalities on modules. That's why there is an attribute to get back the old behaviour -- people don't need to understand the module system to get things back to working again -- and one they have got used to thinking about module aliases then the new system is easy to think about.

@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Apr 18, 2018

I've had a look at what Containers is doing here, and this case is actually part way between 1 and 2. Morally it is an instance of 1: ArrayLabels.Floatarray is an alias to Array.Floatarray -- but only when -no-labels is turned on otherwise those modules have different types, which means you cannot expose the alias in the interface. So in practice it is an instance of 2 and should be fixed in the same way.

The problem here really lies with the -no-labels hack used by both Containers and the standard library.

@Drup

This comment has been minimized.

Copy link
Contributor

Drup commented Apr 18, 2018

@lpw25 Except that doesn't scale. If ArrayLabels gets an additional submodule, it will break again.
This is a valid (and relatively common) use case, we need a proper solution that involves neither manually restating all the submodules nor using annotations.

@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Apr 18, 2018

This is not a common use case: the problem is caused by -no-labels. If it wasn't for -no-labels then you would only be specifying submodule that had themselves been changed.

@Drup

This comment has been minimized.

Copy link
Contributor

Drup commented Apr 18, 2018

Ah, fair enough, I see what you mean. Still, it's a fairly problematic case and even if it's a bit hackish, we should find a natural way to handle it.

@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Apr 18, 2018

I think for now you'll just have to treat modules that use the -no-labels trick as modifying all the submodules. Personally, I'd like to get rid of -no-labels going forwards I suspect it is used to avoid writing about 10 files world-wide -- we can probably just right those files. I also have serious doubts about it's soundness.

@bluddy

This comment has been minimized.

Copy link

bluddy commented Apr 18, 2018

The manual should be updated at 8.10, where it has a line on expanding aliases: For the same reason, module aliases in the inferred type are expanded.

It should be fairly easy to create a simple generator for the non-labeled version files from the labeled versions. This consistency is probably the biggest advantage of using the -no-labels hack.

If I may ask, is the only difference between module type of X and module type of struct include X end that the latter version creates aliases to X's internal types/modules, or is there anything else? I couldn't find anything to explain this difference in the manual, and if there is a difference, it'd be nice to have it elucidated either in 8.10 or somewhere in chapter 2.

@garrigue

This comment has been minimized.

Copy link
Contributor

garrigue commented Apr 18, 2018

Interesting: -nolabels is clearly unsound when combined with GADTs (which were added more than 10 years later...). There is a fix, like was done for -rectypes, but I wonder if it's even worth doing it (its uses are so rare anyway, and it would have to be really intentional).

On the other hand, wouldn't just keeping the old behavior when -nolabels was given be sufficient here?

Also, is this really the only case where the problem arises?

@yallop

This comment has been minimized.

Copy link
Member

yallop commented Apr 18, 2018

Interesting: -nolabels is clearly unsound when combined with GADTs (which were added more than 10 years later...).

There was one fix for this in 6e694c6 (PR7432). Are there still some problematic interactions between -nolabels and GADTs remaining?

@garrigue

This comment has been minimized.

Copy link
Contributor

garrigue commented Apr 19, 2018

Indeed, this is already fixed. And I don't see any other possible unsoundness with -nolabels.
I.e., for "positive" information, what it does is trivial: just forget the name of the label, which has clearly no impact runtime (there is a paper with the specification on my publication page). The subtle case is for "negative" information, exploited by GADTs, but it is fixed by always assuming -nolabels when unifying GADT parameters. If there is a problem with that, it would appear even without using -nolabels...

So, back with my second proposal: what about assuming [@remove_aliases] when in -nolabels mode? Would this still break some existing code?

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Apr 27, 2018

MPR#7787 ("module type of" + recursive modules + functors segfault) seems related to this change.

trefis added a commit to trefis/ocaml that referenced this pull request Apr 27, 2018

PR#7787: fix module type of and recursive modules interaction
This completes the revert of 65b1193 which was
started in GPR ocaml#1652 .

@hhugo hhugo referenced this pull request Apr 29, 2018

Merged

Add support for bisect #169

cyberhuman added a commit to ahrefs/ocaml-sodium that referenced this pull request Mar 5, 2019

Fix tests for ocaml 4.07
Use [@remove_aliases] until 4.08 is out that will strengthen the types.
ref ocaml/ocaml#1652 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.