Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove positions from paths #1610

Merged
merged 5 commits into from Nov 21, 2018

Conversation

@lpw25
Copy link
Contributor

commented Feb 14, 2018

Currrently, paths contain the integer position of a module field within the runtime representation of the module:

type t =
    Pident of Ident.t
  | Pdot of t * string * int
  | Papply of t * t

Since Path.ts appear within types, module types etc. they must remain valid through the various manipulations of the type system. Unfortunately, the positions do not remain valid under module substitution -- which leads to various soundness bugs:

        OCaml version 4.06.0

# module type S = sig
  module M : sig
    module A : sig end
    module B : sig end
  end
  module N = M.A
end

module Foo = struct
  module B = struct let x = 0 end
  module A = struct let x = "hello" end
end

module Bar : S with module M := Foo = struct module N = Foo.A end

let () = print_string Bar.N.x;;
                              Segmentation fault
        OCaml version 4.06.0

# module M : sig
  module N : sig
    module A : sig val x : string end
    module B : sig val x : int end
  end
  module F (X : sig module A = N.A end) : sig val s : string end
end = struct
  module N = struct
    module B = struct let x = 0 end
    module A = struct let x = "hello" end
  end
  module F (X : sig module A : sig val x : string end end) = struct
    let s = X.A.x
  end
end

module N = M.F(struct module A = M.N.A end)

let () = print_string N.s;;
                                    Segmentation fault

Rather than trying to change module substitution to take account of path indices, which appears quite difficult and will become even more difficult as the module system evolves, this PR removes the positions from paths. Instead we add an address type to the Env module:

type address =
  | Aident of Ident.t
  | Adot of address * int

along with functions for finding the "address" of a given path:

val find_value_address: Location.t -> Path.t -> t -> address
val find_module_address: Location.t -> Path.t -> t -> address
val find_class_address: Location.t -> Path.t -> t -> address
val find_constructor_address: Location.t -> Path.t -> t -> address

These functions are used in bytecomp to translate paths to lambda code. Since we previously had to call normalize_path at that point anyway, this is a pretty simple change to make.

@gasche

This comment has been minimized.

Copy link
Member

commented Feb 14, 2018

If you read PRs through email and you were, like me, confused by the absence of an int argument in the Path.t type given in Leo's message, this was a typo which is now fixed. Before the PR, Path.t is defined as

type t =
    Pident of Ident.t
  | Pdot of t * string * int
  | Papply of t * t

while the PR changes it to just (only Pdot changes, dropping the int argument):

type t =
    Pident of Ident.t
  | Pdot of t * string
  | Papply of t * t
@shindere

This comment has been minimized.

Copy link
Contributor

commented Feb 14, 2018

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Feb 14, 2018

Yeah, sorry about that. I should remember to make a comment when correcting particularly confusing typos.

@shindere

This comment has been minimized.

Copy link
Contributor

commented Feb 14, 2018

@let-def

This comment has been minimized.

Copy link
Contributor

commented Feb 14, 2018

.depend seems out of date.

@Drup

This comment has been minimized.

Copy link
Contributor

commented Feb 14, 2018

Exposing the position in the path itself always seemed particularly dubious to me, so I'm happy to see that go away.

Before I really dive into the code, can you give some indications on what this module_presence that is now sparkled everywhere is for ? Is it still only for module aliases ?

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Feb 14, 2018

To make the addresses a little easier to calculate I moved the presence information for module aliases from the module type (i.e. Mty_alias(Mta_present, path)) to the signature (i.e. (Sig_module(id, Mta_present, Mty_alias(path))). This also enabled me to fully remove absent aliases from the runtime representation -- previously they would get a slot containing () that was never used for anything.

@let-def

This comment has been minimized.

Copy link
Contributor

commented Feb 14, 2018

From a first review, to me it looks like a clear improvement (I remember doing sometime wrong things with path positions when playing with modular implicits) and the patch seems correct but I don't understand all the parts.

@gasche

This comment has been minimized.

Copy link
Member

commented Feb 14, 2018

Maybe you want to point at the parts you don't understand, and they could benefit from a comment :-)

@lpw25 lpw25 force-pushed the lpw25:no-positions-in-paths branch from 2b7194f to c377d2b Feb 14, 2018

@Drup
Copy link
Contributor

left a comment

The main commit is quite difficult to review (mostly trivial or loosely related changes spread out over 50+ files, and the occasional subtle thing). I would have preferred to have the the purely trivial parts split out from the subtle ones, and then squash before merge. That would have made it much easier to review.

But apart from that, I think this is a clear improvement. The handling of address is much more obvious than before, and this removes some weird invariants from the typedtree. The code is also a tiny bit cleaner.

I mostly looked at the typechecking-related parts, as I'm not an expert in bytecomp/, but it all looks pretty good to me.

@@ -167,7 +167,7 @@ and record_representation =
| Record_float (* All fields are floats *)
| Record_unboxed of bool (* Unboxed single-field record, inlined or not *)
| Record_inlined of int (* Inlined record *)
| Record_extension (* Inlined record under extension *)
| Record_extension of Path.t (* Inlined record under extension *)

This comment has been minimized.

Copy link
@Drup

Drup Apr 5, 2018

Contributor

I'm not very familiar with compilation of inlined record. Why is the path now needed here?

This comment has been minimized.

Copy link
@trefis

trefis Apr 27, 2018

Contributor

My guess would be: to avoid doing things like this.
That is: unless I missunderstood, it is a cleanup unrelated to the rest of this PR.

This comment has been minimized.

Copy link
@Drup

Drup Apr 27, 2018

Contributor

Ah, I missed that piece of code. That's a very valid reason, although if it is, it really should be in a different commit.

let name = Ident.name param in
let exp =
{ exp with exp_loc = loc; exp_desc =
{ exp with exp_loc = loc; exp_env = env; exp_desc =

This comment has been minimized.

Copy link
@Drup

Drup Apr 5, 2018

Contributor

On various occasion where a typechecking function synthesized a new desc, you now add it to the environment before returning it, where previously it was just returned. I suppose it is to make sure it has an address, but I'm not really sure why it is ok to add new things to the env that were not added previously.

Also, in that specific case, it's added only to the expression part, not the pattern one.

This comment has been minimized.

Copy link
@trefis

trefis Apr 27, 2018

Contributor

On various occasion where a typechecking function synthesized a new desc, you now add it to the environment before returning it, where previously it was just returned. I suppose it is to make sure it has an address, but I'm not really sure why it is ok to add new things to the env that were not added previously.

Was "typechecking" a typo here?
Also, regardless of why it is now added in the environment (though I think I agree with the reason you propose), I think it's more correct (in this particular instance, i.e. in push_defaults) to add it to the environment than not.

Also, in that specific case, it's added only to the expression part, not the pattern one.

This is similar to what happens in the typechecker, if you look at Typecore.type_cases you'll see that the pattern variables (i.e. the bindings produced by the pattern) only get added to the environment of the body, not of the pattern.

if Ident.global id && not (Ident.same id (Path.head path'))
then add_required_global id;
path'
end

This comment has been minimized.

Copy link
@Drup

Drup Apr 5, 2018

Contributor

When normalizing, you consider modules that are aliases but have an address. I failed to find where you introduce aliases that have an address different than None. Where does that happen?

This comment has been minimized.

Copy link
@lpw25

lpw25 Aug 6, 2018

Author Contributor

So if a module is present (i.e. it's presence is Mta_present in a module type) then it will be given an address (see store_module and components_of_module_maker). This can happen for aliases than are produced via strengthening (see strengthen_decl which can produce aliases, and strengthen_sig which then attaches these aliases to a presence that can be Mta_present).

match Env.find_class_address Location.none path env with
| addr -> eval_address addr
| exception Not_found ->
fatal_error ("Cannot find address for: " ^ (Path.name path))

This comment has been minimized.

Copy link
@Drup

Drup Apr 5, 2018

Contributor

Maybe factorize that code a little bit? :)

This comment has been minimized.

Copy link
@lpw25

lpw25 Aug 6, 2018

Author Contributor

Done

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Aug 6, 2018

Sorry for the slow response. I've finally found time to look at this again. I've rebased the code and addressed @Drup's comments.

@Drup

This comment has been minimized.

Copy link
Contributor

commented Aug 6, 2018

My small concerns have been addressed. I think we need a review from someone familiar with bytecomp now.

@lpw25 lpw25 force-pushed the lpw25:no-positions-in-paths branch from 7dfe6f8 to 5a74793 Aug 8, 2018

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Oct 26, 2018

Since this fixes a soundness bug, I would like it to see it move forward.
@trefis would you do a formal review?

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Oct 26, 2018

Could we hold off for a week or so? I've got an improved version implemented as part of my transparent ascription patch, and I'm planning to extract it into this PR.

@lpw25 lpw25 force-pushed the lpw25:no-positions-in-paths branch from 5a74793 to cd31809 Nov 9, 2018

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Nov 9, 2018

Ok I've pushed the improved version and rebased it up to trunk. Should now be ready for review.

@trefis

This comment has been minimized.

Copy link
Contributor

commented Nov 13, 2018

I'm having a look.

@trefis
Copy link
Contributor

left a comment

This overall looks good (and correct) modulo the few comments I left here and there.


let enter_module ~scope ?arg s mty env =
let enter_value ?check name desc env =
let id = Ident.create_local name in

This comment has been minimized.

Copy link
@trefis

trefis Nov 15, 2018

Contributor

To be clear:

scope (create_scoped ~scope:0 "foo") == 0
scope (create_local "foo")           == highest_scope (== 100_000_000)

So that's an actual change, somewhat unrelated to this PR.
But I believe it is fine: we only check the scope of type constructors, we should never check the scope of value paths.


and alias_presence =
and module_presence =
| Mta_present
| Mta_absent

This comment has been minimized.

Copy link
@trefis

trefis Nov 15, 2018

Contributor

Should we rename these to Mp_present | Mp_absent?

pc1,
(fun pc ->
match pc with
| _, (Tcoerce_primitive _ | Tcoerce_alias _) -> pc

This comment has been minimized.

Copy link
@trefis

trefis Nov 15, 2018

Contributor

I think this deserves a comment.
Apparently p1 in these cases would be -1, because it's not a runtime component (cf. includemod.ml), so we can't really go have a look in v2.

cont)

in
let aliases = List.fold_right store_alias aliases lambda_unit in

This comment has been minimized.

Copy link
@trefis

trefis Nov 15, 2018

Contributor

You could drop the cont parameter of store_alias, have it return the Lprim and simplify this last line to: make_sequence store_alias aliases.

@@ -2943,13 +2937,16 @@ let compile_matching repr handler_fun arg pat_act_list partial =


let partial_function loc () =
let slot =
transl_extension_path Location.none

This comment has been minimized.

Copy link
@trefis

trefis Nov 15, 2018

Contributor

Is there a reason to use Location.none over loc here?

@@ -270,7 +282,8 @@ module TycompTbl =
let keys2 = local_keys tbl2 [] in
List.filter
(fun id ->
is_local (find_same id tbl2) &&
let descr, _ = find_same id tbl2 in
is_local descr &&

This comment has been minimized.

Copy link
@trefis

trefis Nov 15, 2018

Contributor

That feels wrong: it "works", but it's weird.
Here is the type of the function after this change:

val diff_keys: ('a -> bool) -> 'b t -> ('a * 'c) t -> Ident.t list

Before that it was:

val diff_keys: ('a -> bool) -> 'b t -> 'a t -> Ident.t list

which seems more natural.

I think it's the caller of this function that should be changed, while this function should remain untouched. (And the function appears to be called only once, so it shouldn't be too tedious to update)

let (desc, pos) = NameMap.find name components in
(Pdot (root, name, pos), desc) :: find_all name next
let (desc, _) = NameMap.find name components in
(Pdot (root, name), desc) :: find_all name next
with Not_found ->
find_all name next

This comment has been minimized.

Copy link
@trefis

trefis Nov 15, 2018

Contributor

Like for TycompTbl.diff_keys the changes to this function are weird, and inconsistent with the other changes in this module: this function is making the assumption that the 'a in current : Ident.tbl (* in [t] *) and in components: 'a list NameMap.t (* in [opened] *) is a tuple:

val find_all: string -> ('a * 'b) t -> (Path.t * 'a) list

But there's no reason to. And, for example, I believe you wouldn't be able to call find_all on the modtypes field of the environment with this implementation.
It's the callers that should change.

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Nov 16, 2018

Ok, I've addressed the review comments and fixed the failing tests.

@trefis

This comment has been minimized.

Copy link
Contributor

commented Nov 19, 2018

I forgot to say it during my review: but you updated toploop,topdirs,etc. But not the opt versions. And you'll need to :)

bytecomp/translclass.ml Outdated Show resolved Hide resolved
bytecomp/translclass.ml Show resolved Hide resolved
@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Nov 19, 2018

I've updated ocamlnat and addressed @let-def's comments.

@lpw25 lpw25 force-pushed the lpw25:no-positions-in-paths branch from 604c4a4 to e47cb6a Nov 20, 2018

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Nov 20, 2018

Rebased. Merging once CI passes.

@lpw25 lpw25 force-pushed the lpw25:no-positions-in-paths branch from e47cb6a to 68f2b7c Nov 20, 2018

@lpw25 lpw25 merged commit 6bdce9e into ocaml:trunk Nov 21, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.