Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative implementation of type-checking for `open` #828

Closed
wants to merge 7 commits into from

Conversation

Projects
None yet
6 participants
@alainfrisch
Copy link
Contributor

commented Sep 27, 2016

Currently, open is type-checked by extracting the signature of the opened module, rewriting its paths to add the opened path prefix, and then inserting each element of the signature (as if type-checking that signature). With this approach, components of the opened module are recomputed on every open. For instance, labels and constructors are processed by Datarepr for each data type declaration (to produce label_descriptions/constructor_descriptions).

This PR proposes an alternative implementation technique for type-checking an open statement: extracting the components of the opened module and re-inserting them in the environment with a local name.

This guarantees for instance that if a name X refers to a component in module Foo after an open Foo, then this lookup for X returns exactly the same component as a lookup for Foo.X. This is not the case with the current implementation, which is one way to explain http://caml.inria.fr/mantis/view.php?id=7372 (fixed by this PR).

Also, this alternative technique can be much faster in some cases (see below) and it simplifies a bit some code by removing from many functions the need to support open-related warnings (shadowing and unused open).

Concerning performance, consider for instance:

module A = struct
  module X1 = struct end
  ...
  module X500 = struct end
end

open A;;
.... (* 2000 times *)
open A;;

type-checking this modules goes from 10s to 1.6s on my machine. (Similar speedup if the opened module contains instead "type t1 = A of {x:'a}, etc", and the ratio grows with larger type declarations.)

Note that with local opens (esp. with their short form), it is not uncommon for the same module (potentially large) to be opened many times in a given compilation unit.

[edit: also relevant to the discussion is http://caml.inria.fr/mantis/view.php?id=6826]

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Sep 27, 2016

(ocamldoc is failing, will investigate that...)

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2016

FYI, the only failures I see in the testsuite are:

List of failed tests:
    tests/warnings/deprecated_module_use.ml

List of unexpected errors:
    tests/parsetree
    tests/ast-invariants

List of skipped tests:
    tests/lib-bigarray-2
    tests/lib-dynlink-csharp
    tests/asmcomp/unrolling_flambda
    tests/asmcomp/unrolling_flambda2
    tests/asmcomp/is_static_flambda

So this seems to be a problem with how ocamldoc is using compilerlibs, rather than something wrong with open_signature.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Sep 28, 2016

The problem with ocamldoc: odoc_analyse calls Env.open_pers_signature "Pervasives" initial, even when handling Pervasives itself. Previously, this forced to read pervasives.cmi (with find_pers_struct) to load the signature; and now the components are resolved with the normal resolution mechanism, and this doesn't allow the .cmi corresponding to the current unit. I've fixed that by changing the logic in ocamldoc: when processing a unit named "Pervasives", do not try to open pervasives.cmi (this actually simplifies the code).

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2016

This seems like a reasonable change, but I'm quite concerned that it affects the behavior of inline records at all. It seems that inline record types are created on the fly in datarepr.ml which seems like quite a bad idea. As with other "hidden associated types", like the private abstract row for a private row type, these types should be part of the signature in which they are created, otherwise you have this situation where a nominative type doesn't have a proper name.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Sep 28, 2016

Many different techniques were tried when implementing inline records. One of them involved representing these types explicitly in signatures, but it leads to a lot of complications. In the current implementation, the inline record does have a proper name, of the form M.t.X; it's only that the representation of this type is computed on demand.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2016

In the current implementation, the inline record does have a proper name, of the form M.t.X; it's only that the representation of this type is computed on demand.

But there is no element X inside of the description of the module M. It only exists in the environment not inside the module type. Whilst having it in the module type is more work it is also much more correct, and so much less likely to have unexpected behaviour like that which you are currently trying to fix.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2016

It might also be a good idea to clean up how private row types are handled at the same time. Currently they are siblings of the main type_delcaration, but ideally they would actually be members of the type_declaration (e.g. an optional row_type field, or maybe an optional extra_types field for both rows and inline records). I suspect this would noticeably simplify some things in includemod/includecore.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Sep 28, 2016

I thought you were suggesting to use a trick similar to row_type, which is precisely what I tried and reverted in favor of the current approach, which is more robust and simpler. So what you're suggesting is to create and keep the type_declaration corresponding to the inline record as part of the constructor_declaration / extension_constructor (that would be an extra argument to Cstr_record, I guess; or perhaps a replacement for the current label_description list). This inner type_declaration would simply be copied over (instanciated) to produce the cstr_inlined field of constructor_description. Is that right? I don't see any big issue with that, but I'm not sure it would simplify/robustify a lot compared to the current solution. One would still need a way to refer to the inner type, with a path such as M.t.X

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2016

Is that right? I don't see any big issue with that, but I'm not sure it would simplify/robustify a lot compared to the current solution.

Basically, although it is a little more involved than that. These inner types should be treated essentially the same as any other type definition within the signature. For example, they should be appropriately handled by substitutions and strengthening.

Grouping them inside of the type_declaration is just to make operations like:
S with type t = foo simpler to implement correctly -- in this case it should add equalities on both the type t and any "extra" types inside of t. Similarly, for inclusion between modules -- which is currently quite awkward for row types because they are the only signature item which can be added to a module as part of ascription.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2016

and strengthening
...
in this case it should add equalities on both the type t and any "extra" types inside of t

Another alternative to these two parts would be to take the M.t.X style path seriously and allow such paths to be normalized by normalize_path so that if M.t expands to M.s then normalize_path would change M.t.X to M.s.X.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Sep 28, 2016

in this case it should add equalities on both the type t and any "extra" types inside of t

That would not really work because equalities are not mutually recursive, but they should in order to support types such as type t = A of {x : t} | B.

so that if M.t expands to M.s then normalize_path would change M.t.X to M.s.X.

I'm concerned that this would precisely create a problem with the lack of canonical ordering between free type variables for the inner inline record.

Remember that M.t.X is not exposed to the user, and is only used to allow code patterns such as A r -> r.x <- r.x + 1 (r is not allowed to escape). I'm not sure to see which benefits one would get in extending normalize_path.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2016

I'm concerned that this would precisely create a problem with the lack of canonical ordering between free type variables for the inner inline record.

Yes I was just about to post a correction on this. You really can't think of M.t.X as a real path, or at least not as a real path to the type.

I need to give this some more thought. Both the way these types are handled and the way row types are handled are pretty dubious. I'm convinced there is a cleaner way to represent what is going on, but it requires some more thought.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2016

(and whilst I'm at it the way classes and class types are made up of three/four different components with the same name is also pretty questionable -- I'd love to have a cleaner solution to that as well).

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Sep 28, 2016

You really can't think of M.t.X as a real path, or at least not as a real path to the type.

Indeed, it's an implementation trick (which leaks into error messages in reasonably user-friendly way). I believe the only real concern for making the path a first-class citizen (and allow inline records to escape) is the choice of parameters for the inline record definition.

You're more than welcome to propose an alternative implementation! It would be great if it would allow using paths to inline records and letting them escape.

@mshinwell mshinwell changed the title [WIP] Alternative implementation of type-checking for `open` Alternative implementation of type-checking for `open` Oct 3, 2016

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Oct 3, 2016

Would anyone like to review that?

@let-def

This comment has been minimized.

Copy link
Contributor

commented Oct 4, 2016

I started reviewing the patch. I am not done but it looks good to me.

I wonder if the prefixed_sg table could be removed. It was introduced to reduce the cost of open after #5877. With this implementation, it doesn't benefit open anymore.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Oct 4, 2016

Are you sure about prefixed_sg? It is used in components_of_module_maker as soon as the substitution is the identity. It's not clear to me that this becomes completely useless, even though it is much less frequently used than before. For instance, compiling hashtbl.mli has two "hits" in the cache (for "H"), caused by the "with type" constraints on functors. And the code below has one "hit" for "X":

module X : sig
  type 'a t
end

module Y : sig
  val x: 'a X.t
end
@let-def

This comment has been minimized.

Copy link
Contributor

commented Oct 4, 2016

I agree it is still used, but not by open anymore.

Since it was introduced to speed-up opens, I was just questioning whether it benefits other use cases enough to justify the increased complexity.

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Oct 4, 2016

I agree that removing some of the cruft when it is not needed anymore would be good, as Env seems to be full of caches that tried to solve some efficiency problem at some point, but may have been subsumed by something else. Of course we need sufficient testing.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Oct 5, 2016

I fully agree with removing cruft, but perhaps in a later stage (so as not to "pollute" benchmarks).

@let-def Let me know if/when you have time to finish reviewing. I'd also love to get feedback on the more ambitious #834, which builds on this one.

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Oct 18, 2016

@alainfrisch This is too big to get into 4.04.0 at this point, so you have to decide between merging #824 or leaving the bug in 4.04.0 and trying to get this one in the next release.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Oct 18, 2016

This seems too risky for a last minute change; I prefer to leave the bug in 4.04.

@let-def

This comment has been minimized.

Copy link
Contributor

commented Oct 31, 2016

I finally looked at it again, it seems fine to me.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Oct 31, 2016

@let-def Thanks! Do you think you might want to review #834 (which built on the current PR) as well?

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Mar 24, 2017

Superseded by #834.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.