Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MPR#248: unique names for weak type variables #1225

Merged
merged 1 commit into from Jul 13, 2017

Conversation

Projects
None yet
4 participants
@Octachron
Copy link
Contributor

commented Jul 6, 2017

MPR#248:
This PR proposes to store separately the name given to weak type variable when printing them and to only forget those name once the associated weak type variable have been attributed a non weak type.
With this modification, in a given toplevel phrase or signature, each weak type variable is given an unique name. In other words, the types of

let mk _ x = x
let f = mk ()
let g = mk ();;
let h = f;;

are printed with this PR as

val f:'_a ->'_a
val g:'_b ->'_b (* and not '_a -> '_a as previously *)
val h:'_a -> '_a

and adding in the toplevel

let x = f 1;;
f;;
let i = mk ();;

yields

val x:int=1
val f: int -> int
val i: '_a -> '_a

Note that one disadvantage of remembering the names of weak type variables is that this deprives non weak type variables from using these names. Personally, I believe that adding a w prefix to the name of weak type varibles could favorably resolve this tension: the _ prefix can be hard to see and replacing it with _w by convention would mitigate this issue. Another issue is that currently, the number of weak variable names kept in memory is unbounded. I am not sure if it can matter in practice.

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 6, 2017

You say that the number of weak variables you store is unbounded, so I was worried. In fact, if I understand correctly, each printing call collects/removes the weak variables stored that have become instantiated since. So the size of your tables is bounded. at any point in time, by the number of weak variables that exist in the types being printed. (And not all weak variables that have been created since the toplevel started, as I feared.)

@Octachron

This comment has been minimized.

Copy link
Contributor Author

commented Jul 6, 2017

Instantiated weak type variables are indeed collected just before printing toplevel phrases. The number of stored weak types variables should therefore be equal to the number of weak types variables that have never been instantiated since the beginning of the toplevel session . However, those never instantiated type variables could accumulate with (very) long toplevel session, and it could make sense to drop weak types variables that have not been printed in the last 1000 toplevel phrases for instance.

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 6, 2017

Yes, but this space usage grows proportionally to the space used by the weak type variable itself, so there is no memory leak. I don't think it is worth trying to be more clever than that.

(It would have been more natural to me to have the name chosen and stored in the type variable structure itself, instead of in a table on the side, but your approach has the strong benefits that it non-invasive for the type-checking code so it's probably best.)

@Drup

This comment has been minimized.

Copy link
Contributor

commented Jul 6, 2017

If the naming "environment" grows large, so does the typing environment, and the former will always be negligible compared to the latter, which is already not so big in practice, so it really doesn't matter.

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Jul 10, 2017

If you remove weak type variables from the map, isn't there a risk of reusing the same name twice?
Even if the previous variable has been instantiated, this could be confusing.
I agree with @Drup that the space taken by these weak type variables should never matter in practice.
The idea of giving distinct names to weak type variables seems good to me too (not to pollute the name space of generalizable type variables)

@Octachron

This comment has been minimized.

Copy link
Contributor Author

commented Jul 10, 2017

If you remove weak type variables from the map, isn't there a risk of reusing the same name twice?
Even if the previous variable has been instantiated, this could be confusing.

Indeed, but I felt that this was a desirable feature if generalizable and non generalizable type variables share the same namespace and we want to keep the generalizable type variable names as readable as possible. Thinking about it, it is true that it could mislead people into thinking that something important is going on.

Both issues would be solved by moving to a distinct naming scheme for non generalizable type variables. Currently, I am leaning towards: ng1, ng2, …, ng%d (which would be printed as '_ng1 … ) with no reuse of indices.

The generated names will be longer, but there will be no intersection with the generalizable type variable scheme and probably few collisions with user-defined type variable names.

Moreover, the link with the various non generalizable error messages may be clearer to beginners while keeping name length to a reasonable level:

let x = ref []
Error: The type of this expression, '_ng1 list ref,
       contains type variables that cannot be generalized

Other options that I considered:

  • '_w1, '_w2, … : maybe a little to close of 'w?
  • '_weak1, '_weak2, …: too long? and too close from 'weak which is a sensible
    type name
  • '_nongen1, …, _nongen2: too long?
  • nogen: valid word in Danish
  • '_1, '_2, …: this feels far too much like some builtin type variables
  • …_1 using _ for separating the word part from the id part may improve readability but using _ with two different meanings in the same identifier (e.g. '_ng_1) may be confusing.
@gasche

This comment has been minimized.

Copy link
Member

commented Jul 10, 2017

Note that you could keep a scheme that would start with '_a, '_b etc. and yet never generalize the same name twice: it suffices to keep a counter and have an idea int->name mapping. Decomposing in base 26 is one choice (so '_ach), doing a modulo-26 plus numbers is another ('_c23), and your proposal is the simplest -- I'm fine with all of '_weakN '_wN and '_ngN.

@Octachron

This comment has been minimized.

Copy link
Contributor Author

commented Jul 13, 2017

In this case, let's go for the searchable root name weak since I have the impression that weak type variable are rare enough that having a 6+ character-long names should be alright.

| None -> ()
end;
name
)

This comment has been minimized.

Copy link
@gasche

gasche Jul 13, 2017

Member

I think this code would be more readable if you had two separate recursive functions, one corresponding to the weak = None (the pre-PR new_name function) and one corresponding to the weak case (new_weak_name). Then, instead of a non_gen parameter, name_of_type could take a new_var parameter that would be either new_name or new_weak_name.

(I wondered if a better name that new_name was possible that would indicate that it is "not a weak variable"; maybe new_tyvar_name, or maybe using new_ty_var / new_weak_var, but I think that new_name and new_weak_name are also ok.)

This comment has been minimized.

Copy link
@Octachron

Octachron Jul 13, 2017

Author Contributor

Splitting the functions in two seemed indeed clearer (I may have changed few names around).
Concerning new_name, new_weak_name and new_name always tend to appear in pair, so it did not seem very useful to precise that new_name is new_not_weak_name.

@gasche

gasche approved these changes Jul 13, 2017

Copy link
Member

left a comment

Could you squash and rebase to remove unnecessary intermediary states? Otherwise the code looks nice now and I think it's good to go, but I would like the CI reports before merging.

@Octachron Octachron force-pushed the Octachron:remember_weak_types branch from e6fdeaa to 563f3d7 Jul 13, 2017

@Octachron

This comment has been minimized.

Copy link
Contributor Author

commented Jul 13, 2017

Rebased and squashed.

@gasche gasche merged commit 3dd7819 into ocaml:trunk Jul 13, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.