Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect (existential) type scope tracking #7835

vicuna opened this issue Aug 3, 2018 · 3 comments

Incorrect (existential) type scope tracking #7835

vicuna opened this issue Aug 3, 2018 · 3 comments


Copy link

vicuna commented Aug 3, 2018

Original bug ID: 7835
Reporter: @trefis
Assigned to: @trefis
Status: resolved (set by @trefis on 2018-09-21T19:16:35Z)
Resolution: fixed
Priority: normal
Severity: minor
Category: typing
Monitored by: @hhugo @Drup

Bug description

Since the introduction of GADTs, we've been reserving a slot for up to a
thousand existentials to be introduced during typing of pattern matches.

This is done by:

The reason for this is that scoping for type constructors is tracked using their
binding time.
If that line wasn't present, then if we did introduce new types, their binding
time would be above the current level. Which means that any use of such a type
would be out of the scope of the type constructor.

#1951 proposes to explicitely check that we
do not introduce more than a thousand type constructors, instead of letting them
through and then potentially reporting weird (i.e. incorrect) scope errors.

The hole in that reasonning (and in the GPR linked to above) is that type
constructors are not the only identifiers created during pattern matches:
pattern variables will also imply creating some idents, for example.
illustrates that.

But pattern variables are not the only thing implying the creation of idents
though, otherwise we could almost dismiss the problem with "meh, no reasonable
hand written code will ever contain more than X many pattern variables, so it's
alright" (which I assume was the rational behind the 1K limit on existential per

An example as trivial as
can also get us over the limit if [Sub] contains a few too many idents. Indeed
in that example, thanks to the 1st line introducing a module alias, a
substitution on the signature of [Sub] happens during the typing of the
[M.Sub.Pack] pattern, which will generate as many fresh idents as there are
idents in [Other_module.Sub].

If you clone that repository and run "make failure OCAMLC=/path/to/trunk/ocamlc"
in the wrong-escape directory, then you'll get:

File "", line 6, characters 11-12:
Error: This expression has type $Pack_'a
       but an expression was expected of type 'a
       The type constructor $Pack_'a would escape its scope

The error also happens at the tip of the 4.07 branch.

This example was encountered when trying to compile janestreet's code base with
the tip of the 4.07 branch.

The reason why it wasn't a problem on 4.07.0 is that the slot happened to be
made much bigger (as was already discussed), which makes it much less likely
to create so many idents in real code.

The reason why it wasn't a problem before 4.07 is that the 1k limit was actually
just a big lie. The scope of type constructors introduced for existential
variables wasn't tracked using the ident binding time: we were looking up the
type declaration in the environment, and checking at which level it was
That is, you could have added 10k existential, and the scoping would have been
correct for each of them. is an
example of this.

However, that piece of code disappeared with #1609

There are a few possible ways to fix (or minimize the issue):

  1. Let's remove the double meaning associed to stamps in identifiers. Introduce
    a dedicated "scope" field and rely on that.

  2. Reintroduce the piece of code which disappeared. That is basically a worse
    version of (1): it removes the double meaning associed to identifiers stamps but
    only for existentials, and the "scope" field is not a proper field but we go and
    fetch the information in the environment

  3. Leave a bigger slot when typing a match. 1k is clearly too small for real
    code, 100k causes an other kind of issues. How about 10k? I have a good feeling
    about 10k ( ).

I have started working on (1), but I am disappearing for my vacations next
Thursday, and it's unlikely that I will have a GPR opened before then. The
change is nothing too complicated but it is fairly invasive, so it takes a bit
of time.

That is: I hope (1) will be the fix chosen for 4.08, but if we want to release
4.07.1 it probably won't be ready in time, and it's also too much for a bugfix

If we want to release a 4.07.1 compiler soonish, then (2) and (3) should be

(2) is "more correct", in the sense that we're back to a world where we can have
as many existential as we want. Though it will most likely have a negative
impact on typing time (because we have to do a lookup in the environment
whenever we call [update_level]).

Personnaly I think I'd go with (3): it's the simplest change, it's unlikely to
be problematic in practice, and won't incur any performance regression compared
to 4.07.0.
That's the choice we've made internally at janestreet.

Copy link

vicuna commented Aug 4, 2018

Comment author: @hhugo

spoiler .. 10k is not enough.

Copy link

vicuna commented Aug 29, 2018

Comment author: @trefis

In #1997 , lpw25 implemented option (2) on the 4.07 branch.

#1980 proposes an implementation of (1) on top of trunk.

Copy link

vicuna commented Sep 21, 2018

Comment author: @trefis

#1980 has been merged.
This is now resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet

No branches or pull requests

2 participants