Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

separate scope from stamp in idents #1980

Merged
merged 19 commits into from Sep 21, 2018

Conversation

Projects
None yet
5 participants
@trefis
Copy link
Contributor

commented Aug 9, 2018

As said this a work in progress, so there is no need to do a careful review, but I thought people might be interested in having a look and could give some early feedback.

This is done by adding a dedicated scope field to idents. This is done in the first commit. Although I don’t have any measurement yet, it seems reasonable to assume that this change will increase the size of cmis. The second commit (which is admittedly unrelated to the original aim of the PR) should reduce the memory usage back to what it was before. (I chatted with @Drup and the second commit, while somewhat annoying, should be compatible with the changes done in Eliom)

The main benefit of this change is the removal of the hard coded limit on the number of existential that can be introduced in match. (Cf. the related mantis issue)

Future work:

  • cleanup, address some remaining fixmes
  • test properly (the test suite currently passes, but that only gives me a weak confidence in the actual correctness of the implementation)
  • reintroduce the code at the bottom of predef.ml that was wrongly removed
  • (optional) stop using idents after typing and move to simpler “variable” type, since scopes are useless at that point. This poses some questions related to the debugger / debut events
  • (optional) store the current (and nongen, etc.) level in the environment instead of having global refs in ctype. This way the level gets “magically” updated when things enter the environment
@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Aug 9, 2018

Ping @garrigue :)

@gasche

This comment has been minimized.

Copy link
Member

commented Aug 9, 2018

You could make the diff much shorter by making scope an optional parameter (or rename create_var and create into create and create_in_scope), but I suppose that you are doing this precisely to force you to review all the callsites of Ident.create and increase robustness in the future?

The second commit looks very nice as far as I'm concerned.

@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Aug 9, 2018

Yes. I could also have create be what I called create_var and introduce a dedicated create_scoped but that seems more error prone.

Btw it occurs to me that I forgot to mention the most important part of the changes: when minting a new ident, one assigns it the current level as scope. One must then be careful to bump the level when minting a fresh type, or module. This is similar to the way one had to be careful to synchronise binding time and levels when one potentially created type / module identifiers.
In its current state the PR bumps the level more often than necessary, that is not a problem though and can always be cleaned up later. But I tried to keep the present changes minimal.

@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Aug 9, 2018

Oh I had missed your parenthesis. Well... yes. :)

@gasche
Copy link
Member

left a comment

A naive question.

Looking at the patch from a distance, it looks like all the uses of create_var are after the type-checking pass (and there it looks like you don't care about the scope), and all the uses of scopes are in typing.

If scopes only make sense during typing, do they really need to be attached to the identifiers themselves? Couldn't we store them, for each identifier, in the type environment?

File "morematch.ml", line 1050, characters 8-65:
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
(A `D|B (`B, (`A|`C)))

This comment has been minimized.

Copy link
@gasche

gasche Aug 9, 2018

Member

Could we have expect tests?

@gasche

This comment has been minimized.

Copy link
Member

commented Aug 9, 2018

(Or: around identifiers in the typedtree.)

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Aug 9, 2018

If scopes only make sense during typing, do they really need to be attached to the identifiers themselves? Couldn't we store them, for each identifier, in the type environment?

I think that really says more about Ident.t being abused as variables in the later stages. The flags also aren't used after type-checking, nor are persistent identifiers kept after type-checking. As he mentioned, @trefis has another patch that replaces all the later uses of Ident.t with a simpler variable type. Personally, I'd like to take that one step further and mint fresh variable types for each IR, but that can be done later.

@gasche

This comment has been minimized.

Copy link
Member

commented Aug 9, 2018

The flags also aren't used after type-checking, nor are persistent identifiers kept after type-checking.

Well Ident.global and Ident.is_predef_exn both seem used in the compilation pipeline (bytecode, middle_end and backend) and this is what the flags are for, if I understand correctly. (Ident.persistent is tested by asmcomp/debug/reg_availability_set.ml as well.)

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Aug 10, 2018

I was including the uses in bytecomp in my description -- they are used whilst reading the typedtree. The uses of Ident.global and Ident.persistent after that are not actually needed.

The uses of Ident.is_predef_exn are only used on the identifiers in Psetglobal and Pgetglobal, which I think should instead be using some Global.t type. These are separate from the uses of Ident to represent variables in the IRs that I was complaining about. Still I agree it means that my declaration that the flags are not needed isn't quite right -- the proposed Global.t type would still need to distinguish predefined exceptions from other global identifiers.

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Aug 13, 2018

This seems a bit heavy-handed: scope only matters for types, and 99% of identifiers are for values. This field is only going to make sense in very few cases.

So I'm not sure this is the right approach, and we should at least consider other possibilities.
For instance, adding a field to Tconstr.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Aug 13, 2018

This seems a bit heavy-handed: scope only matters for types, and 99% of identifiers are for values. This field is only going to make sense in very few cases.

Technically, types and modules, which are a lot of the identifiers that the type system actually manipulates in any interesting way, but still this is a good point -- there are going to be a lot of identifiers where this doesn't matter.

If the scope isn't in the identifier then it needs to at least be kept in the environment, and then maybe, as you suggest, be cached in places like Tconstr.

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2018

Indeed. This is what we did until 4.06 with type_newtype_level and Ctype.get_level, but this was only for locally abstract types. Generalizing it to all types (and modules as a consequence) is a good idea, as this completely avoids the synchronization machinery between identifiers and levels.

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2018

@trefis: After reading the code, I understand better what you are doing.
As always, it would have been better to mention upfront that you take this chance to remove the flags field, which means there should be no extra memory cost.

Also, having thought a bit more about what would be involved in generalizing the previous approach to all types and modules, with caching in Tconstr, I see now that the bottleneck would be Subst (this should be added to the rationale). So using a new field in Ident makes things much simpler.

Finally, I suppose that the problem of useless scopes in variables could be solved by using a different kind for scoped identifiers. I.e., have both Local and Scoped.

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Aug 16, 2018

Also, it should be possible to change idents from scoped to local before writing the cmi exploiting the for_saving flag in Subst.
Note however that in this case one would need to add a different flag with the original behavior, as it is needed in type_cases to do copy a type introducing fresh type variables (and there we still need the scopes).

@Drup

This comment has been minimized.

Copy link
Contributor

commented Aug 16, 2018

I'm cautiously in favor of this PR, as I think adding scopes in idents is much simpler than trying to track them in envs. I actually had the same problem when trying to figure out where to add locations information for Eliom.

Note however that the removals of flag is not without consequences, as it could be used to store additional information in idents. For example, in Eliom I use two bits to decide if something is client or server (see this version). Those flags are orthogonal to global/local. It's possible to adapt to this patch by adding two cases (Located_local and Located_global), but it means external tools that read cmis like ocp-index and odoc will not be able to read eliom cmis anymore. I think metaocaml used to employ a similar technique.

I don't want this to be considered as a point against this patch, as I think it's the right direction for the typechecker, but it's probably worth thinking on ways to support these use-cases cleanly.

@lpw25 lpw25 referenced this pull request Aug 20, 2018

Merged

Track newtype level again #1997

@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Aug 23, 2018

Unless I missed something, here is a summary of the discussion which happened in my absence:

  • people are happy to keep the field on idents (compared to the other option of saving it in the environment)
  • I should split Ident.Local into Ident.Local and Ident.Scoped. Turning all Scoped idents into Local ones when preparing to save to a .cmi

I will update this PR with this change soon.


As always, it would have been better to mention upfront that you take this chance to remove the flags field, which means there should be no extra memory cost.

I find that remark a bit unfair! I did mention in my first message that:

  • this is work in progress, put here for people to have an early look
  • the second commit (which does the change you mention) is here to reduce the memory usage

It's hard to be more upfront than that! :)

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2018

@trefis

I find that remark a bit unfair!

I both agree and disagree. Your explanation combined with the code is certainly sufficient. But there is still this assumption, that one is going to understand by reading the code, and in the way you expect (I usually do not read by commits). Maybe it is just that I'm unable to get used to the "GitHub workflow".

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2018

Concerning the flags field, does anybody know whether Oleg is using it for BER MetaOCaml?
Or anybody else?
The fact Eliom used it suggests that other may do so too.

@Drup

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2018

I think BER MetaOCaml used to rely on this, but as far as I can tell from the current version, it doesn't anymore. I'm not aware of any other fork that does. Maybe @yallop will know ?

@trefis trefis force-pushed the trefis:scoped-idents branch 3 times, most recently from 3f1c205 to 592211d Aug 28, 2018

@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Aug 28, 2018

I updated the PR with the discussed change (i.e. the split between Ident.Scoped and Ident.Local), but the change to Subst was actually not necessary: Subst.signature (which is called, in particular, when saving a signature to a cmi) already calls Ident.rename on the idents bound by the signature, making them all local and unscoped.
So all the idents appearing in a signature will be either global (which are never scoped) or local and unscoped.

I also took the opportunity to go further than initially planned: I removed all functions related to stamps (i.e. stamp, previously known as binding_time, and set_current_time) from the interface of Ident.
This required two small changes to Printtyp which are done in two preliminary commits ( efdc425, 0aafde6 ), as well as a simple change / cleanup to Predef in f7855cb (the commit message explains the change).

Otherwise things are in roughly the same state as before.

I believe this is now ready for a careful review.

@trefis trefis force-pushed the trefis:scoped-idents branch from 592211d to 0fa91d0 Aug 29, 2018

@trefis trefis changed the title WIP separate scope from stamp in idents separate scope from stamp in idents Aug 29, 2018

@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Sep 7, 2018

The privilege seems to usually be reserved to @damiendoligez but I took the liberty of adding a few labels.

While a fix for MPR#7835, was pushed to 4.07, the problem is still present on trunk.

@trefis trefis force-pushed the trefis:scoped-idents branch from 0fa91d0 to 47e0f7a Sep 14, 2018

@trefis trefis force-pushed the trefis:scoped-idents branch 2 times, most recently from 4a527c0 to 7c0a971 Sep 20, 2018

trefis added some commits Aug 28, 2018

ident: add an explicit scope field
- Ident.create now takes a scope as argument
- added Ident.create_var to use when the scope doesn't matter
- the current_time and the current_level are unrelated as of this
  commit. But one has to remember to bump the level when creating new
  scopes.
ident: split Local into Local and Scoped
Also rename [create] into [create_scoped] and [create_var] into
[create_local].
ident: hide stamp related functions from the interface
The only remaining user was [Predef], which was bumping the counter by
999 to allow cmis to remain compatible if new predefs were defined.
This commit removes that use by making sure that every ident defined in
predef.ml is marked as predef and use a different stamp counter as user
defined idents. That way idents in cmi always start at 1, no matter how
many predefs there are.

Stamps on predef aren't strictly necessary: predefs names are unique.
However, predef idents comparison is just faster with stamps.
properly scope idents
- when including: the elements are rescoped to the current level (as
well as being given a fresh stamp, which was already the case)
- extension constructors idents cannot be local: they can be used as
types when the constructor's argument is an inlined record. They must be
given a scope
- in check_recmod_inclusion: create a new scope at each iteration
- when checking that type declarations inside recursive modules are well
founded, we now take a generic instance of the declaration (this is
reminiscent of what is done in Ctype.moregeneral)

@trefis trefis force-pushed the trefis:scoped-idents branch from 7c0a971 to 3c41ea4 Sep 21, 2018

@lpw25

lpw25 approved these changes Sep 21, 2018

@trefis trefis merged commit 2318ff2 into ocaml:trunk Sep 21, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@trefis trefis deleted the trefis:scoped-idents branch Sep 21, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.