Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor typing/env to separate the filesystem-related logic #2228

Merged
merged 13 commits into from Feb 18, 2019

Conversation

@gasche
Copy link
Member

commented Jan 28, 2019

This PR, which builds on top of #2227, is again only a refactoring PR: it should not affect the semantics of the compiler (there are two small changes in behavior that I mention below; the rest should be strictly identical, only moving code and data around).

The PR came as I was trying to understand the code of Env that is related to compilation units. typing/env.ml has more than 2K lines, and about 300 of them are related to compilation units (reading and writing cmi files, consistency checking, various caches, etc.), but they are blended in with the rest in a way that makes it difficult (for me) to follow the logic of this part of the code.

The present PR separates this persistent-files logic to a separate module, Persistent_env, that Env then uses internally. The last commit does the actual file split, all other preceding commits are small changes that were necessary for it, either because they abstract away certain details of the Env module that are affected by the split, or because they disentangle the mess of dependencies within the module that makes the split difficult -- it took me 3 attempts to discover everything that could create a cyclic dependency.

The change might be useful to the Merlin people (cc @trefis, @let-def), as Merlin handles cmi files in a different way from the upstream type-checker.

Now on the two small behavior changes:

  1. The avoid external uses of {add_import,crc_units} commit merges together three loops that are almost identical, but have a difference in the way Env.add_import is called or not on transitive dependencies that have no CRC. This specific change should be reviewed by @lpw25. (Understanding the mess of direct/indirect and weak/strong and transparent/opaque dependencies is my next goal.) Making this merge only makes the overall code cleaner, it's easy to separate into two functions if it turns out to be incorrect.

  2. The remove the current unit name from some exceptions commit removes some information from Persistent_env exception that was constructed, at raising sites, by using internal state from Env (the name of the current compilation unit). This changes the error messages slightly (see the commit message for an example), but I believe the change is harmless. On the other hand, trying to preserve the previous behavior exactly would be doable, but make the code more complex (I could register the exception-handler in Env instead of Persistent_env, to have the information available at printing-time, but the irregular structure would be a bit vexing).

@gasche gasche force-pushed the gasche:refactor-env branch 2 times, most recently from 366d3c3 to a733665 Jan 28, 2019
@gasche

This comment has been minimized.

Copy link
Member Author

commented Jan 31, 2019

I just handled a somewhat-painful rebase on top of #2041 (now merged in trunk). Nothing much to declare (I wasn't sure whether persistent_structures_of_dir should stay in Env or move in Persistent_env, I left it in Env for now). @diml, I think you might be interested in (reviewing?) the present PR: I think that some of the code structure changes I made reasonate with some of the parts you touched; maybe the split in this PR would have made your work easier.

@gasche

This comment has been minimized.

Copy link
Member Author

commented Jan 31, 2019

P.S.: I made sure that the PR is split in several clean commits, and it should be easiest to review commit-by-commit.

@diml

This comment has been minimized.

Copy link
Member

commented Jan 31, 2019

@gasche sure, I'm having a look

Copy link
Member

left a comment

The changes look correct to me, I just left a couple of suggestions. I don't know either whether the changes related to add_import are correct or not.

typing/env.ml Outdated Show resolved Hide resolved
typing/persistent_env.ml Outdated Show resolved Hide resolved
@gasche

This comment has been minimized.

Copy link
Member Author

commented Jan 31, 2019

@diml Thanks! I'll adress your comments shortly. Do you have an opnion on the interest of the refactoring -- do you agree that it makes the final code nicer, or does it seems roughly equivalent to you?

@gasche gasche force-pushed the gasche:refactor-env branch 2 times, most recently from bd2f90e to 6fa23da Feb 1, 2019
@gasche

This comment has been minimized.

Copy link
Member Author

commented Feb 1, 2019

(I pushed two additional commits to implement the suggestions of @diml)

@gasche

This comment has been minimized.

Copy link
Member Author

commented Feb 1, 2019

I forgot to say: I grepped the code of utop to see how Env is used there (in particular I was worried about uses of add_import, etc.), and I believe that it would not require any code change with this PR. (Otherwise I would change the PR, I guess.)

@gasche gasche force-pushed the gasche:refactor-env branch from ed931a2 to ec81de9 Feb 1, 2019
@diml

This comment has been minimized.

Copy link
Member

commented Feb 4, 2019

I agree that it makes the final code look nicer. I find Env a bit hard to digest, so I welcome splitting it into smaller individual parts.

@diml
diml approved these changes Feb 4, 2019
@diml

This comment has been minimized.

Copy link
Member

commented Feb 4, 2019

BTW, in utop we use cppo to be compatible with several versions of the compiler, so we can adapt to breaking changes in the compiler API.

@gasche

This comment has been minimized.

Copy link
Member Author

commented Feb 4, 2019

@diml (and I guess @trefis?): merging this now could create rebase pains for #2235, so I propose to delay a bit here until you know what to do there.

@trefis

This comment has been minimized.

Copy link
Contributor

commented Feb 4, 2019

I haven't looked at the code in this PR in details yet but:

  • I don't think this should be too painful for #2235, but it might be for #2231 (which I think should also make it into 4.08)
  • I'm surprised by the copyright header on persistent_env.ml[i]. Shouldn't the name of the original authors of the code also appear on there? (or, simpler: shouldn't it at least include the names that were present at the top of env.ml[i]?)
@gasche

This comment has been minimized.

Copy link
Member Author

commented Feb 4, 2019

Good point on the copyright; in the past I've just put my copyright when creating new files, but they were completely new.

Re. #2231: given that the present PR is non-urgent, I guess that the most reasonable thing is for me to wait until those recent PRs are settled to merge -- and deal with the rebase pains on my end. Grmbl...

typing/env.ml Outdated Show resolved Hide resolved
| Some crc->
Consistbl.check Env.crc_units name crc filename)
cu.cu_imports
try Env.import_crcs ~source:filename cu.cu_imports
with Consistbl.Inconsistency(name, user, auth) ->
fprintf ppf "@[<hv 0>The files %s@ and %s@ \
disagree over interface %s@]@."

This comment has been minimized.

Copy link
@trefis

trefis Feb 7, 2019

Contributor

Why not put the try .. with in Env.import_crcs itself? Given that it's the one calling Consistbl.check.

Also, it looks like Toploop calls that function during initialization, without any try ... with; apparently it can't fail, but if it ever did, it would be with an uncaught exception, which doesn't seem very nice.

This comment has been minimized.

Copy link
@gasche

gasche Feb 16, 2019

Author Member

I prefer for the refactoring commit to stay as close to the existing behavior as possible, and I don't know what Toploop is doing with this function, so I did not change anything here.

type filepath = string
type modname = string
type crcs = (modname * Digest.t option) list

type alerts = string Stdlib.String.Map.t


module EnvLazy = struct

This comment has been minimized.

Copy link
@trefis

trefis Feb 7, 2019

Contributor

Do we really want to grow misc? Couldn't EnvLazy be in its own file?

This comment has been minimized.

Copy link
@gasche

gasche Feb 16, 2019

Author Member

I would rather grow misc than risking to introduce module name conflicts. We can always revisit this later.

gasche added 8 commits Jan 23, 2019
There is a small change of behavior in this patch due to a different
handling of weak dependencies (those with crco=None); in
Env.check_consistency, only non-weak dependencies would get
[Env.add_import] called, while the `toplevel/` implementations would
also call [Env.add_import] on weak dependencies. After this patch, we
systematically call [add_import] only on non-weak dependencies, even
in `toplevel/`.

([Gabriel:] As far as I can see, the use of [add_import] in the
toplevel never leads to a use of [Env.imports()] for producing
a dependency list, as the toplevel does not produce cmi/cmo files; are
they just no-ops?)
This small change of behavior simplifies the internal plumbing of env
by avoiding the need to passes the 'current_unit_name' state to
cmi-checking exceptions -- it allows to separate the cmi/crc logic to
a separate module in a future commit.

We believe that the change does not actually reduce error message
clarity, as the name of the offending unit appears in the location
filename anyway (see how these exceptions are handled by
Location.error_of_printer_file in the error printer).

Before:

    File "a.mli", line 1:
    Error: Unit A imports from B, which uses recursive types.
           The compilation flag -rectypes is required

After:

    File "a.mli", line 1:
    Error: Invalid import of B, which uses recursive types.
           The compilation flag -rectypes is required
@gasche gasche force-pushed the gasche:refactor-env branch from ec81de9 to d15cb8e Feb 16, 2019
@gasche

This comment has been minimized.

Copy link
Member Author

commented Feb 16, 2019

I just rebased on top of trunk. I'll re-read the rebased diffs, and my plan is to merge once the CI passes.

@gasche gasche force-pushed the gasche:refactor-env branch from d15cb8e to 8245ee6 Feb 17, 2019
gasche added 5 commits Jan 27, 2019
…module

Persistent_env is a new module that handles the relation between the
type-checking state and the "persistent" typing information laying in
.cmi files on the filesystem. In particular, it handles the collection
and production of CRC information for the .cmi files being read and
written to the filesystem; the using modules (in our case, only Env)
are in charge of turning the cmi files into higher-level information
(components and signatures).

Persistent_env exposes a type `'a t` of a persistent environment,
which acts as a mutable store of `'a` values. There is no global state
in the module itself: while Env (and thus the OCaml type-checker) uses
a single global persistent environment, it should be possible to
create several independent environments to represent, for example,
several independent type-checking sessions.
(suggested by Jérémie Dimino)
The hope is that the (env => persistent_env) refactoring does not
break reasonable user code; the fact that this test had to be updated
is a bad sign. On the other hand, we believe that utop is unaffected
by the change, which suggests that real-world toplevel are less likely
to be affected.
The hope is that a tailor-made algebraic datatype is more readable /
less confusing than using ('a option) directly -- one may confuse
getting None when looking in a table with the Not_found case.

(Suggested by Jérémie Dimino)
@gasche gasche force-pushed the gasche:refactor-env branch from 8245ee6 to 7c638e3 Feb 18, 2019
@gasche gasche merged commit b145a41 into ocaml:trunk Feb 18, 2019
2 checks passed
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@lpw25

This comment has been minimized.

Copy link
Contributor

commented Mar 8, 2019

Obviously very late to this, but I confirm that the CRC related change is correct.

gasche added a commit to gasche/ocaml that referenced this pull request Mar 25, 2019
The debugger reimplements its own error-reporting logic without using
the reporter-registration mechanism of the compiler, so it needs to be
adapted after the split between `Env` and `Persistent_env` in ocaml#2228.

(Interestingly, this forced me to expose the `Error of error`
exception in the Persistent_signature, which was not the case
before. It was probably a mistake to not expose an exception value
that can be raised by (correctly-written) consumers of the module.)

I noticed the issue while inspecting a testsuite failure (ocaml#8544).

Before this patch:

```
$ cat tests/tool-debugger/find-artifacts/_ocamltest/tests/tool-debugger/find-artifacts/debuggee/ocamlc.byte/debuggee.byte.output
Loading program... done.
Breakpoint: 1
10   <|b|>print x;
Uncaught exception: Persistent_env.Error(_)
```

After:
```
$ cat tests/tool-debugger/find-artifacts/_ocamltest/tests/tool-debugger/find-artifacts/debuggee/ocamlc.byte/debuggee.byte.output
Loading program... done.
Breakpoint: 1
10   <|b|>print x;
Debugger [version 4.09.0+dev0-2019-01-18] environment error:
 The files /usr/local/lib/ocaml/stdlib.cmi
 and [...]_ocamltest/tests/tool-debugger/find-artifacts/debuggee/ocamlc.byte/out/blah.cmi
 make inconsistent assumptions over interface Stdlib
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.