Env: always freshen persistent signatures before using them #2231
When accessing components of a module, the typechecker will prefix the sibling components with the name of the module.
module M : sig type t type s val to_s : t -> s end
and you access
When loading a .cmi, one needs to refresh all the identifiers it defines, so as to avoid any clash with identifiers already present in the environment.
However, when populating the environment with the components of that signature, it sometimes needs to traverse indirections in that signature (cf. the call to
If we were to load
Note that when doing that prefixing, we've assigned a new stamp to
At this point, everything could be fine:
Here is a dummy program exercising that issue:
This can be reproduced with any compiler from 4.02 to trunk, and probably before that, but I haven't tried.
We ran into this problem recently at Jane Street.
For performance reasons, we can't afford to eagerly freshen the signature.
So what we must do instead, if we are to keep environments with unprefixed things in them, is to make sure that all the idents that go into these environments have been refreshed².
Which is what this PR does.
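As a rough illustration of what "refreshing" means here, the sketch below uses a toy model and not the compiler's actual `Ident` or `Env` code. The `ident` and `item` types, the `counter`, and `refresh_signature` are all hypothetical names introduced for this example. It assigns a new stamp to every declared identifier and rewrites internal references so they still point at the renamed declarations.

```ocaml
(* Toy model, NOT the compiler's Ident module: an identifier is a name
   plus an integer stamp drawn from a global counter. *)
type ident = { name : string; stamp : int }

let counter = ref 0

let fresh name =
  incr counter;
  { name; stamp = !counter }

(* Hypothetical signature item: a declared ident and the idents that
   its declaration mentions. *)
type item = { decl : ident; mentions : ident list }

(* Give every declared ident a new stamp, then apply the same renaming
   to the mentions so that internal references stay consistent.
   Idents that are not declared in [sg] are left untouched. *)
let refresh_signature (sg : item list) : item list =
  let renaming = List.map (fun it -> (it.decl, fresh it.decl.name)) sg in
  let rename id =
    match List.assoc_opt id renaming with
    | Some id' -> id'
    | None -> id
  in
  List.map
    (fun it ->
      { decl = rename it.decl; mentions = List.map rename it.mentions })
    sg
```

With the signature of `M` from the example above, `to_s` would mention the refreshed `t` and `s`, so nothing in the refreshed signature refers to a stamp that was stored in the .cmi.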
¹: Even though this code (
²: More precisely, and this is related to , we only need to refresh idents coming from external units. The other ones can stay the same.
PS: This PR is based on #2229, there is no hard dependency, it was just written this way.
PPS: This PR appears to conflict with trunk, I assume it's because of the PRs of diml and gasche that were merged recently.
I think it can be looked at before I rebase though, there shouldn't be any substantial changes.
Feb 4, 2019
Drup left a comment
For a while, I really couldn't understand why this refreshing was needed on the theoretical level, until I re-read note 1:
So, the bug is that, due to insufficient freshening in recursive cases when loading signatures, you can make the compiler observe colliding stamps between different runs of the typechecker. To solve it, you split the refreshing and the prefixing so that refreshing can ensure all the recursive knots are properly solved, and prefixing doesn't have to deal with them.
Given all that, I'm pretty happy with your solution and it's quite clear to me that the patch implements what you described. The code looks as good as
Does it have a perf impact? The third commit should ensure we only pay for the refresh when loading a new cmi, so I expect it to be fine.
It does indeed have a small perf impact.
So: it has an impact that I can notice at Jane Street, but it's smaller than the one that the cache of prefixed signatures had.
Also, as a follow-up to this, I experimented with adding laziness to the prefixing of module types. Which, at Jane Street, brings the compilation time back down to
Thanks for the review!
Re. 4.08: yes, it's a bugfix, it's fine to cherry-pick into 4.08.
I read your initial description twice and still couldn't understand exactly what the problem was, so I won't review, but given that @Drup has approved I think this is good to go (in the Changes, put it in the 4.08 section).
I don't think so: the test that I gave in my initial message is in my opinion too brittle, despite reproducing the issue on several versions of the compiler. The issue is that the stamps have to line up for the infinite loop to appear, and any change to the number of stamps generated would make the symptom disappear while the bug would remain. That's also why it was there for several years without ever causing any issue to anyone.
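To make the brittleness concrete, here is a small sketch in a toy model. None of these names come from the compiler; `make_gen`, `persisted`, and `nth_local` are assumptions made for the illustration. A persisted ident that was not refreshed clashes with a local one only when the current run happens to have generated exactly the right number of stamps.

```ocaml
(* Toy model, not the compiler's Ident module. *)
type ident = { name : string; stamp : int }

(* Each typechecker run is modelled as its own stamp generator. *)
let make_gen start =
  let counter = ref start in
  fun name ->
    incr counter;
    { name; stamp = !counter }

(* In this toy model, idents are compared by stamp only. *)
let same a b = a.stamp = b.stamp

(* An ident saved by an earlier run and loaded without refreshing. *)
let persisted = { name = "M.t"; stamp = 3 }

(* The [n]-th ident created by a new run (n >= 1). *)
let nth_local n =
  let gen = make_gen 0 in
  let id = ref (gen "local") in
  for _i = 2 to n do
    id := gen "local"
  done;
  !id

(* The clash shows up only when the stamps line up exactly. *)
let clashes n = same (nth_local n) persisted
```

Here `clashes 3` holds while `clashes 2` and `clashes 4` do not, so adding or removing a single ident upstream hides the symptom without fixing anything.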
It feels to me like that kind of issue (using idents / signatures in an incorrect context) is more likely to be avoided by static checks than by testing.
Maybe there is another way to think of it, in the long term (not this PR): is there a dynamic check, costly but only enabled in some special extra-safe mode, that would raise a proper error when this kind of issue happens, in a non-heisenbug way?
(For example, we could add a special mark on all stamps before serializing them into cmi files (by making them negative, for example), and pervasively use a stamp-comparison function that aborts if a marked stamp is compared to a non-marked stamp.)
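A minimal sketch of that idea, assuming stamps are strictly positive integers. The names `mark`, `is_marked`, `same_stamp`, and `Mixed_stamps` are hypothetical and do not exist in the compiler.

```ocaml
(* Toy model of the suggestion: stamps written to a .cmi are "marked" by
   negating them, and every comparison goes through [same_stamp]. *)
type stamp = int

exception Mixed_stamps of stamp * stamp

(* Mark a stamp before serialization. Assumes stamps are strictly
   positive, so that negation is an unambiguous mark. *)
let mark (s : stamp) : stamp =
  assert (s > 0);
  -s

let is_marked (s : stamp) : bool = s < 0

(* Comparing a marked stamp with an unmarked one means that a persisted
   ident escaped without being refreshed: fail loudly rather than answer. *)
let same_stamp (a : stamp) (b : stamp) : bool =
  if is_marked a <> is_marked b then raise (Mixed_stamps (a, b))
  else a = b
```

In this model the failure no longer depends on the stamps lining up: `same_stamp (mark 3) 7` raises just as `same_stamp (mark 3) 3` does.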
Actually, it's what