Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Env: always freshen persistent signatures before using them #2231

Merged
merged 6 commits into from Feb 11, 2019

Conversation

Projects
None yet
4 participants
@trefis
Copy link
Contributor

commented Jan 31, 2019

When accessing components of a module, the typechecker will prefix the siblings components by the name of the module.
That is, if you have:

module M : sig
  type t
  type s
  val to_s : t -> s
end

and you access M.to_s, it will have the type M.t -> M.s.
This prefixing is done lazily by components_of_module and components_of_module_maker.

When loading a .cmi, one needs to refresh all the identifiers it defines, so as to avoid any clash with identifiers already present in the environment.
For toplevel identifiers, this is done by this call to Subst.
For all the other identifiers, e.g. the ones that are defined in submodules, this freshening is done lazily by components_of_module_maker, as a side effect of the prefixing.

However, when populating the environment with the components of that signature, it sometimes needs to traverse indirections in that signature (cf. the call to scrape_alias at the beginning of components_of_module_maker).

For example:

$ cat -n foo.mli
     1	type t
     2
     3	module M : sig
     4	  type s
     5
     6	  module type S = sig
     7	    type nonrec t = t
     8	    type nonrec s = s
     9
    10	    val to_s : t -> s
    11	  end
    12	end
    13
    14	module N : M.S
    15
    16	val x : t

If we were to load Foo in another module, when trying to add the components of Foo.N to the environment, the compiler needs to resolve M.S. Which means it needs to get the components of M (that is: M, not Foo.M), and that it will run the same little loop again, and prefix all references to the sibling idents by M.
So, for instance, the updated declaration of M.S will be:

type nonrec t/x = t/y (* where /x and /y are the stamps *)
type nonrec s = M.s

val to_s : t/x -> s

because s is defined in M, but t comes from the outside.

Note that when doing that prefixing, we've assigned a new stamp to t -- the /x -- and to s (that is: we have freshened them). But the /y part is still the "old" stamp, that comes directly from the cmi.

At this point, everything could be fine: t/x can become Foo.N.t, t/y can become Foo.t, etc.
But everything could also be pretty bad: because x might happen to be equal to y¹, so t/y would also be replaced by Foo.N.t. Which is a very good way to get the typechecker stuck in an infinite loop (inside expand_abbrev) when you try to use the type.

Here is a dummy program exercising that issue:

$ cat bar.ml
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0

let _ =
  Foo.N.to_s Foo.x
$ ocamlc -c -nostdlib -nopervasives foo.mli
$ ocamlc -c -nostdlib -nopervasives bar.ml
Fatal error: exception Stack overflow

This can be reproduced with any compiler from 4.02 to trunk, and probably before that, but I haven't tried.

We ran into this problem recently at Jane Street.

A solution

For performance reasons, we can't afford to eagerly freshen the signature.

So what we must do instead, if we are to keep environments with unprefixed things in them, is to make sure that all the idents that go into these environments have been refreshed².

Which is what this PR does.


Notes

[1]: Even though this code (components_of_module_maker, prefix_idents) is exercised in other situations than the loading of cmis, the issue can only happen in that case, because in the other cases all the idents involved were generated in the current run of the compiler, so the stamps won't ever conflict.

[2]: More precisely, and this is related to [1], we only need to refresh idents coming from external units. The other ones can stay the same.


PS: This PR is based on #2229, there is no hard dependency, it was just written this way.

PPS: This PR appears to conflict with trunk, I assume it's because of the PRs of diml and gasche that were merged recently.

I think it can be looked at before I rebase though, there shouldn't be any substantial changes.

trefis added some commits Jan 28, 2019

components_of_module_maker: always freshen sigs before using them
Otherwise:

$ cat foo.mli
type t

module M : sig
  type s

  module type S = sig
    type nonrec t = t
    type nonrec s = s

    val to_s : t -> s
  end
end

module N : M.S

val x : t
$ cat bar.ml
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0

let _ =
  Foo.N.to_s Foo.x
$ ocamlc -c -nostdlib -nopervasives foo.mli
$ ocamlc -c -nostdlib -nopervasives bar.ml
Fatal error: exception Stack overflow

@trefis trefis force-pushed the trefis:freshen-before-prefix branch from ac0fa9f to 9cc8568 Feb 7, 2019

@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Feb 7, 2019

I have now rebased on trunk (which includes #2229, #2041 and #2235).

I think this should be included in 4.08, @damiendoligez, @gasche do you guys agree?

@garrigue: I've worked on this with Leo, so I think you should be the one to review this PR. Although perhaps @Drup might be able to as well?

@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Feb 7, 2019

I'd also like to point out that travis is unhappy simply because I'm missing a changelog entry. But I'm waiting for a reviewer and an approval for 4.08 inclusion before adding it.

@Drup

Drup approved these changes Feb 7, 2019

Copy link
Contributor

left a comment

For a while, I really couldn't understand why this refreshing was needed on the theoretical level, until I re-read note 1:

Even though this code is exercised in other situations than the loading of cmis, the issue can only happen in that case, because in the other cases all the idents involved were generated in the current run of the compiler, so the stamps won't ever conflict.

So, the bug is that, due to insufficient freshening in recursive cases when loading signatures, you can make the compiler observe colliding stamps between different runs of the typechecker. To solve it, you split the refreshing and the prefixing so that refreshing can ensure all the recursive knots are properly solved, and prefixing doesn't have to deal with them.

Given all that, I'm pretty happy with your solution and it's quite clear to me that the patch implements what you described. The code looks as good as env.ml can look. The issue is pretty subtle, so I think @garrigue should take a look as well.

Does it has a perf impact ? The third commit should ensure we only pay the refresh when loading a new cmi, so I expect it to be fine.

@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Feb 7, 2019

Does it has a perf impact ? The third commit should ensure we only pay the refresh when loading a new cmi, so I expect it to be fine.

It does indeed have a small perf impact.
If you allow me to reuse the "benchmark" from #2229, then we get this:

revision compilation time
initial ±16m55s
#2229 ±15m05s
#2231 ±15m45s

So: it has an impact that I can notice at janestreet, but it's smaller than the one that the cache of prefixed signatures had.

Also, as a follow up to this, I experimented with adding lazyness to the prefixing of module types. Which, at janestreet, brings the compilation time back down to ±15m05s.
I'd be surprised if that last change (which I plan to send a PR for in the next few days) had such an impact outside of janestreet however.

Thanks for the review!

@gasche

This comment has been minimized.

Copy link
Member

commented Feb 7, 2019

Re. 4.08: yes, it's a bugfix, it's fine to cherry-pick into 4.08.

I read your initial description twice and still couldn't understand exactly what the problem was, so I won't review, but given that @Drup has approved I think this is good to go (in the Changes, put it in the 4.08 section).

@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Feb 7, 2019

Thanks @gasche!

I'll wait another day before to give Jacques a chance to have a look if he wants to (unless Damien wants to do a beta with this patch today).

@gasche

This comment has been minimized.

Copy link
Member

commented Feb 7, 2019

Could you add a testsuite entry?

@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Feb 7, 2019

Could you add a testsuite entry?

I don't think so: the test that I gave in my initial message is in my opinion too brittle, despite reproducing the issue on several versions of the compiler. The issue is that the stamps have to line up for the infinite loop to appear, and any change to the number of stamps generated would make the symptom disappear while the bug would remain. That's also why it was there for several years without ever causing any issue to anyone.

It feels to me like that kind of issues (using idents / signatures in an incorrect context) is more likely to be avoided by some static checks, than by testing.

@gasche

This comment has been minimized.

Copy link
Member

commented Feb 7, 2019

Maybe there is another way to think of it, in the long term (not this PR): is there a dynamic check, that would be costly but only enabled in some special extra-safe mode, that would raise a proper error when this kind of issues happen, in a non-heisenbug way?

(For example, we could add a special mark on all stamps before serializing them into cmi files (by making them negative for example), and pervasively using a stamp-comparison function that aborts if a marked stamp is compared to a non-marked stamp.)

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Feb 11, 2019

Actually, it's what Subst.for_saving is supposed to do. Originally I introduced it to make cmis more stable: i.e. the ids would only depend on what is in the cmi, not what happened during compilation. However, For this to work, this has to be used just before saving. We would have to check that it's really what is done now.

@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Feb 11, 2019

Actually, it's what Subst.for_saving is supposed to do.

Not quite. It's changing the id of type_exprs, not the Ident.t.

Anyway, I'll merge now and cherry-pick on 4.08. Thanks everyone!

@trefis trefis merged commit 6fb33f7 into ocaml:trunk Feb 11, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@trefis trefis deleted the trefis:freshen-before-prefix branch Feb 11, 2019

trefis added a commit to trefis/ocaml that referenced this pull request Feb 11, 2019

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Feb 11, 2019

Right. But if we're only interested by the signature, doing it for all IDs in it wouldn't be much more violent.

@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Feb 11, 2019

I agree, we could easily adapt Subst to handle that case. I was just pointing out that the current code isn't actually doing it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.