Skip to content

Conversation

Julow
Copy link
Collaborator

@Julow Julow commented May 6, 2021

This PR does some code improvements:

  • Simplify the loading of odoc files. There is now a single module taking care of this with a simpler API.
  • Stop looking up imports. The result was not used, the only observable side-effect was eventual warnings (which would also occur later).
  • Rename Odoc_odoc.Env to Odoc_odoc.Resolver and simplify its API (remove the builder part).
  • Load odoc files entirely. Remove the intermediate step of loading only the "root" header, which was only taken advantage of when looking up imports. (5% speed up on dune build @doc in core_kernel)

close_out oc

let units_cache = Hashtbl.create 23 (* because. *)
let save_page file page =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks wrong that Compilation_unit is saving/loading pages.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module is now saving and loading .odoc files. How should we call it ?
I used the name "compilation_unit" for Lang.Page | Lang.Compilation_unit in the Resolver module, the second case is named Module. This will need to be changed too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation_unit (and t = Page | Module) ?

Copy link
Collaborator Author

@Julow Julow May 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module will continue to load both pages and modules. This type t is like this:

type t =
  | Page_content of Lang.Page.t
  | Module_content of Lang.Compilation_unit.t

An other mismatch is that Lang.Compilation_unit.t contains either a Module or Pack.
Perhaps we can say that a "compilation unit" is either a (compiled) page, a module or a pack and merge Lang.Page.t into Lang.Compilation_unit.t ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not particularly fussy about what we end up calling it, so long as we steer clear of uses of 'compilation unit' that are different from what the manual says they are.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed to unambiguous names. This module is now Odoc_file and this record is Page_content ... | Unit_content ....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Unit" alone doesn't really mean much (except maybe the type). Either you put the whole thing ("Compilation unit") or choose another word.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We commonly use the word "unit" for "compilation unit" already. Compilation_unit_content of Lang.Compilation_unit.t sounds unnecessary to me.

(** Lookup a module. First looks into [imports_map] then searches into the paths. *)
let lookup_unit ~important_digests ~imports_map ap target_name =
match StringMap.find target_name imports_map with
| Odoc_model.Lang.Compilation_unit.Import.Unresolved (_, Some digest) ->
Copy link
Member

@jonludlam jonludlam May 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the imports ever anything other than Unresolved after this patch? If not, can we get rid of Compilation_unit.Import.t?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the loader code, imports are indeed always Unresolved. I'll simplify the type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've found that resolved imports are used by the link-deps command, it was wrong to remove that code.
I've pushed a fix and a test case for the link-deps command but performances have regressed a lot (15% slower on core_kernel).
A small part of this PR needs to be reverted to make this work as before.

The list of imports can be big and sometimes most of them are not necessary. Instead of resolving as much as possible for every units, we can mark as resolved only the ones that were used while compiling. The output of link-deps shouldn't change. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced link-deps can ever produce the correct set of dependencies. In any case, I'm also not convinced the logic you're talking about is sound. It's currently doing this:

List.iter odoctree.Odoc_model.Lang.Compilation_unit.imports
        ~f:(fun import ->
          match import with
          | Odoc_model.Lang.Compilation_unit.Import.Unresolved _ -> ()
          | Odoc_model.Lang.Compilation_unit.Import.Resolved (root, _) ->
              Hash_set.add deps root);

and iterating this function over all odoc files in a directory. The only thing this gains you over using simply unioning the output of compile-deps is that it won't mention modules that couldn't be resolved at compile time - ie, modules for which we're missing the cm{t,ti,i} file. There's probably one minor difference - if we're making a module that's entirely aliases to other modules, and nothing else within the same directory dereferences those aliases, then we might resolve that at compile time and thus miss it if we don't make a note of resolved imports. This is pretty rare though, and I only know of one instance of this (ocaml-compiler-libs).

The second point is that this entirely ignores all the references in the odoc files, so if there are any references to modules that aren't a compilation dependency then this logic will miss those dependencies. Even if we were to resolve the first part of the reference at compile time (as was suggested in a recent meeting), we'd need to look recursively at the referenced module - e.g. if we had the reference Stdlib.List.t and we resolved the Stdlib part, we'd need to have a link dependency not just on stdlib.odoc but also stdlib__list.odoc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Ideally, the link-dependencies of a unit is everything else. The smallest reasonable set would be the transitive compile-deps, which the link-deps command doesn't even compute.
Do you suggest removing this command ?

it won't mention modules that couldn't be resolved at compile time

With my proposal it will also remove modules that weren't used at compile time, which may not be what we want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's likely worse, as it's much more likely we'll hit the problem of references above. At least with the 'union of compile-deps' scheme we're not producing fewer link deps than before.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've re-implemented the efficient lookup for imports, the performance regression is gone. The link-deps command is still here and has a testcase.


exception Fetch_failed of msg

let build_resolver :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very glad this thing has gone :-)

@Julow Julow force-pushed the safer-loading branch 2 times, most recently from 20ef746 to dea0894 Compare May 12, 2021 19:05
Julow added 9 commits May 17, 2021 17:17
Do "lookup" and "resolve" in one operation to avoid opening each files
twice.
Remove unecessary caches and simplify the code a bit.
The result of these lookups is not used at all, except for the warnings
that might be printed in case of error.

Some modules have a lot of imports but need few lookups while resolving,
especially the entry point of wrapped libraries.
Remove the separate loading of the root.
Handle pages and modules from the same loading function.
Remove unecessary caching.
Rename Odoc_odoc.Env into Odoc_odoc.Resolver.
Remove the "builder" step.
This reverts commit 9549f36.

Imports need to be resolved for the link-deps command.
Now done in Compile rather than in Env. A test is added.
Imports are needed for the link-deps command.
Some modules have a lot of imports declared but only a few are needed.
Only the "Root" is needed for imports, which is written at the beginning
of the files.
Previous commits were using "module" for Lang.Compilation_unit and
"compilation unit" for .odoc files.
@jonludlam jonludlam merged commit ed60dca into ocaml:master May 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants