shapes: do not read_back entire shape to get aliases uids #13001

voodoos · 2024-03-01T11:19:03Z

Fully reducing the shapes of large modules (and all their components) is an expensive process. Merlin, when trying to jump to a module, used to do that to get the uid of the module, and thus its location. This wasteful reduction slowed the query down, so we introduced "weak_reduction", which does not reduce a module's components when all we are interested in is the module's uid itself.

In #12508, together with @Ekdohibs and @gasche, we redesigned this feature in the form of the reduce_for_uid function, which no longer leaks incorrect shapes. This function performs weak reduction and returns only the resulting uid. When looking up a module alias, it returns the list of the aliases' uids followed by the aliased module's uid:

```ocaml
module X = struct end
module A_1 = X
...
module A_N = A_N-1
```

`reduce_for_uid 'path to A_N'` returns `[uid_of_A_N; ...; uid_of_A_1; uid_of_X]`.

This information allows Merlin to traverse aliases and jump to the actual definition of X, while keeping the aliasing information, which is useful for other use cases such as occurrences.
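To make the returned list concrete, here is a small, self-contained model of the alias chain above. It is a hypothetical illustration, not the compiler's `Shape` API: `module_decl`, `env` and `alias_uids` are invented names, and uids are plain strings.

```ocaml
(* Hypothetical toy model, not the compiler's Shape.t: a module is either
   a definition with its own uid, or an alias (with its own uid) to
   another module name. *)
type uid = string

type module_decl =
  | Definition of uid
  | Alias of uid * string

(* The chain from the example: A_3 = A_2 = A_1 = X. *)
let env : (string * module_decl) list =
  [ ("X", Definition "uid_X");
    ("A_1", Alias ("uid_A_1", "X"));
    ("A_2", Alias ("uid_A_2", "A_1"));
    ("A_3", Alias ("uid_A_3", "A_2")) ]

(* Follow aliases from [name], collecting the uid of each alias and, last,
   the uid of the aliased definition. Raises [Not_found] on unknown names. *)
let rec alias_uids name =
  match List.assoc name env with
  | Definition uid -> [ uid ]
  | Alias (uid, target) -> uid :: alias_uids target

let () = print_endline (String.concat "; " (alias_uids "A_3"))
(* prints: uid_A_3; uid_A_2; uid_A_1; uid_X *)
```

Keeping every intermediate uid, rather than only the last one, is what lets a client choose between the alias itself and the definition it points to.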

To return this list, I called read_back before inspecting the shape description to get the head aliases. This was, of course, a mistake, since read_back performs a full reduction of the module's body. This led to a blowup when indexing large codebases, and to slower locate queries when jumping to an aliased module.

This PR fixes that by forcing the shape reduction step by step, only as far as needed to get the head aliases.
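The difference between a full read_back and stepped forcing can be sketched with a toy normal form whose alias link and components are suspended computations. The representation (the record `nf`, `suspend`, `read_back_all`, `aliases_uids`) is hypothetical and much simpler than the one in `typing/shape_reduce.ml`; the sketch only counts how many suspensions each strategy forces on a chain `A_2 -> A_1 -> X` where `X` has 1000 components.

```ocaml
(* Hypothetical, simplified normal form: an optional alias link and the
   module's components, all computed on demand. *)
type nf = {
  uid : string;
  alias_of : nf Lazy.t option;  (* Some target if this module is an alias *)
  components : nf Lazy.t list;  (* the module's items, not yet computed *)
}

(* Counts how many suspensions are actually evaluated. *)
let evaluated = ref 0
let suspend f = lazy (incr evaluated; f ())

let leaf uid = { uid; alias_of = None; components = [] }

(* A module with [n] suspended components. *)
let big_module uid n =
  { uid; alias_of = None;
    components =
      List.init n (fun i -> suspend (fun () -> leaf (Printf.sprintf "%s.%d" uid i))) }

let alias uid target =
  { uid; alias_of = Some (suspend (fun () -> target)); components = [] }

(* Eager strategy: force everything, like a full read_back would. *)
let rec read_back_all nf =
  Option.iter (fun t -> read_back_all (Lazy.force t)) nf.alias_of;
  List.iter (fun c -> read_back_all (Lazy.force c)) nf.components

(* Stepped strategy: only force alias links, collecting their uids. *)
let rec aliases_uids nf =
  match nf.alias_of with
  | None -> [ nf.uid ]
  | Some target -> nf.uid :: aliases_uids (Lazy.force target)

let chain () = alias "A_2" (alias "A_1" (big_module "X" 1000))

let () =
  evaluated := 0;
  let uids = aliases_uids (chain ()) in
  Printf.printf "stepped: [%s], %d thunks forced\n" (String.concat "; " uids) !evaluated;
  evaluated := 0;
  read_back_all (chain ());
  Printf.printf "full read_back: %d thunks forced\n" !evaluated
```

In this toy, the stepped walk forces 2 suspensions (one per alias link) while the full read_back forces 1002, which is the kind of gap that only forcing the head aliases avoids.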

@Octachron I think the performance issue I described should not have a large impact on the partial indexing that we are introducing in 5.2, because it only reduces shapes locally. It would take a very pathological compilation unit for the issue to manifest itself significantly. Still, it might be better to have this fix in the 5.2 branch.

gasche · 2024-03-01T13:06:01Z

typing/shape_reduce.ml

```diff
+  (* When interested only in the uids of aliased modules, we do not read_back
+    the entire shape of the module, just enough to unroll the chain of aliases.
+  *)
+  let read_back_aliases_uids env (nf : nf) =
```


My feeling is that the code is correct, the comment is somewhat redundant (I think it is clear from the code), and the naming is slightly wrong, as you are not reading back (to generate a term). I would call it reduce_aliases_for_uid, for example.

My reasoning for the name was that functions of the "reduce" family take a shape as input, while "read_back" functions take a normal form as input. I am OK with both and changed it in 4c3f43f.

typing/shape_reduce.ml

voodoos · 2024-03-06T12:47:11Z

@gasche I applied your type-change suggestion in 83108d3.

While adding new tests to show that we can return aliases to approximated modules, I noticed that we didn't mark the resulting shape as approximated for values coming from a first-class module. I did that in 19a778f. The reduction is "successful", but there is no uid.

gasche

Thanks! This looks nicer. See two questions inline.

testsuite/tests/shape-index/index_aliases.reference

typing/shape_reduce.ml

voodoos · 2024-03-07T09:45:57Z

@gasche I removed the fix for approximation. I think we do want the NLeaf cases to mark the shape as approximated in both the App and Proj cases. But first we need to improve reduce_for_uid's laziness (or have another way to check for approximation):

Right now, when looking for the uid of M when M is defined by `module M = F (X)`, reducing the shape will reduce the application. However, we don't need to perform that reduction, since the uid of M's definition is the uid of the `module M = ...` binding, independently of the right-hand side.

Reducing the right-hand side might result in an approximated shape, but this is not something we care about when reducing the shape for the uid of M.
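A minimal sketch of that observation, assuming (as a simplification) that the uid of the `module M = ...` binding is stored on the node of its shape: the uid lookup returns early and never reduces the functor application. The types, `reduce_app` (a placeholder that only counts calls) and `uid_of_binding` are hypothetical, not the compiler's functions.

```ocaml
(* Hypothetical miniature shape language; not the compiler's Shape.t. *)
type uid = string

type shape = { uid : uid option; desc : desc }
and desc =
  | Var of string
  | App of shape * shape
  | Leaf

let app_reductions = ref 0

(* Placeholder for functor-application reduction: we only count calls,
   since the point is whether it happens at all. *)
let reduce_app (_f : shape) (_arg : shape) : shape =
  incr app_reductions;
  { uid = None; desc = Leaf }

(* Reduction that always goes through applications. *)
let rec reduce s =
  match s.desc with
  | App (f, arg) -> reduce_app (reduce f) (reduce arg)
  | Var _ | Leaf -> s

(* Uid lookup: if the binding already carries a uid, return it without
   looking at (or reducing) its right-hand side. *)
let uid_of_binding s =
  match s.uid with
  | Some _ as found -> found
  | None -> (reduce s).uid

(* module M = F (X) *)
let m =
  { uid = Some "uid_M";
    desc = App ({ uid = None; desc = Var "F" }, { uid = None; desc = Var "X" }) }

let () =
  assert (uid_of_binding m = Some "uid_M");
  Printf.printf "application reductions: %d\n" !app_reductions
```

Whether reducing `F (X)` would produce an approximated shape never comes up on this path, which is the point made above.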

gasche · 2024-03-08T10:21:01Z

My current (and ever-changing) understanding is that there are two different needs:

1. For the compiler, you want a type of "partially reduced shapes" that is computed and then serialized in the cmt_ident_occurrences field of cmt files. We want to work now instead of having the tools do more work, so we want an "eager" reduction.
2. For the tools, you want a type of "head normal form" that gives the minimal amount of information you need while doing the minimal amount of work; in particular, the reduction should be "lazy".

Currently we are trying to use a single result type for both, and the PR here and the discussion suggest that this might not be the right choice. If we separated the two, we could use for (2) a version that has some laziness and is not serializable.

For (1), I wonder why we are not "just" using the Shape.t value that results from strong reduction, that is, storing the result of reduce instead of the result of reduce_for_uid. You report that earlier versions tried to store the shapes of all values and that it led to a blowup in cmt size, but were those the input shapes or the strongly-reduced shapes? Or do we know that we will only ever need the head uid of those shapes, not the whole shape, so that computing below the head normal form is a waste?

For (2), I think that we should use a representation that allows on-demand computation, for example:

```ocaml
| Resolved_alias of Uid.t * result Lazy.t
```
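A sketch of how such an on-demand result could be consumed. Here `uid` stands in for `Uid.t`, the other constructors are omitted, and `of_chain`, `head_uid` and `definition_uid` are hypothetical helpers: a client interested only in the alias (for instance occurrences) never forces the tail, while a client jumping to the definition forces it step by step.

```ocaml
(* Hypothetical lazy variant of the result type; [uid] stands in for Uid.t. *)
type uid = string

type result =
  | Resolved of uid
  | Resolved_alias of uid * result Lazy.t

(* Counts how many alias steps have actually been computed. *)
let steps = ref 0

(* Build the result for an alias chain (outermost alias first) without
   computing anything past the head. *)
let rec of_chain = function
  | [] -> invalid_arg "of_chain: empty chain"
  | [ uid ] -> Resolved uid
  | uid :: rest -> Resolved_alias (uid, lazy (incr steps; of_chain rest))

(* A client that only needs the outermost uid stops here. *)
let head_uid = function Resolved uid | Resolved_alias (uid, _) -> uid

(* A client that wants the definition forces the whole chain. *)
let rec definition_uid = function
  | Resolved uid -> uid
  | Resolved_alias (_, next) -> definition_uid (Lazy.force next)

let () =
  let r = of_chain [ "uid_A_2"; "uid_A_1"; "uid_X" ] in
  let head = head_uid r in
  Printf.printf "head: %s, steps forced: %d\n" head !steps;
  let def = definition_uid r in
  Printf.printf "definition: %s, steps forced: %d\n" def !steps
```

The trade-off raised above is visible here: a `Lazy.t` inside the result keeps the work on demand, but such a value is not something one would serialize into a cmt file.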

voodoos · 2024-04-04T13:40:56Z

> My current (and ever-changing) understanding is that there are two different needs:
>
> 1. For the compiler, you want a type of "partially reduced shapes" that is computed and then serialized in the `cmt_ident_occurrences` field of `cmt` files. We want to work now instead of having the tools do more work, so we want an "eager" reduction.
>
> 2. For the tools, you want a type of "head normal form" that gives the minimal amount of information you need while doing the minimal amount of work, in particular the reduction should be "lazy".

I don't think that this is correct. In (1), when reducing the shapes to build the cmt_ident_occurrences table, we have the same need as in (2): do the minimal amount of work to get the minimal amount of information, namely the uid of the definition. However, these reductions may remain incomplete, since in (1) we do not load other modules' cmt files, in order to respect separate compilation.

> For (1), I wonder why we are not "just" using the Shape.t value that results from strong reduction -- instead of storing the result of reduce_for_uid, the result of reduce.

Apart from the fact that, right now, tools are only interested in the resulting uid, reduce performs unconditional read_backs, which can be costly.

> You report that earlier versions tried to store the shapes of all values and it led to a blowup in cmt size, but were those the input shapes or the strongly-reduced shapes?

That was with the "original" shapes stored in the typing environment.

Octachron · 2024-04-08T14:44:43Z

typing/shape_reduce.mli

```diff
@@ -18,10 +18,10 @@
 (** The result of reducing a shape and looking for its uid *)
 type result =
  | Resolved of Shape.Uid.t (** Shape reduction succeeded and a uid was found *)
-  | Resolved_alias of Shape.Uid.t list (** Reduction led to an alias chain *)
+  | Resolved_alias of Shape.Uid.t * result (** Reduction led to an alias *)
```
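The change can be illustrated with a simplified local copy of the type. Assumptions: `uid` is a plain string standing in for `Shape.Uid.t`, and an `Approximated` constructor is added only to model the "approximated" results discussed in this thread, so this is not the actual `Shape_reduce` interface. `uids_of_result` and `final_uid` are hypothetical helpers showing that the nested constructor still carries the information of the old `uid list`.

```ocaml
(* Simplified, hypothetical mirror of the nested result type. *)
type uid = string

type result =
  | Resolved of uid
  | Resolved_alias of uid * result
  | Approximated of uid option

(* Recover the flat list carried by the previous [uid list] constructor:
   alias uids first, aliased definition last. *)
let rec uids_of_result = function
  | Resolved uid -> [ uid ]
  | Approximated None -> []
  | Approximated (Some uid) -> [ uid ]
  | Resolved_alias (uid, rest) -> uid :: uids_of_result rest

(* The uid at the end of the chain, if any (what "locate" needs). *)
let rec final_uid = function
  | Resolved uid -> Some uid
  | Approximated uid -> uid
  | Resolved_alias (_, rest) -> final_uid rest

let () =
  let r = Resolved_alias ("uid_A_2", Resolved_alias ("uid_A_1", Resolved "uid_X")) in
  print_endline (String.concat "; " (uids_of_result r))
(* prints: uid_A_2; uid_A_1; uid_X *)
```

One plausible benefit of the nested form, consistent with the new tests showing aliases to an "approximated" module: the end of the chain can itself be a non-`Resolved` result, which a flat list of uids cannot express.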


If possible I would rather avoid changing the interface for OCaml 5.2.0, but I am not sure if we are still aiming for 5.2.0 for this PR.

> If possible I would rather avoid changing the interface for OCaml 5.2.0

Is it because of the already released alpha and beta versions with bumped magic numbers? Since the result type was actually introduced in 5.2 itself, it seems better to change it right now...

I can also easily roll back to the first iteration of this PR, which fixes the performance issue without changing the type, if that is required.

> I am not sure if we are still aiming for 5.2.0 for this PR

The discussion diverged to other unclear parts of shape reduction, but I think this PR's original scope is smaller than that and mostly uncontroversial: removing extraneous work done in some cases, with an agreed-on fix (stepped read-back). We do need to rethink shape reduction and its handling of uids at some point, but that's out of scope for this PR. What do you think, @gasche?

It is more a question of API stability for shape clients. In particular, this change would require a new beta and a patch to odoc. This is still ok-ish, if the PR converges this week, but this is starting to get late in the release cycle for this kind of changes.

I understand, thanks for the details; I can roll back the API changes if it makes things simpler. Now that you mention odoc, it reminds me that the performance fix might actually be important for their usage of shapes. (We discussed it a few weeks ago with @panglesd and @Julow: they perform actions similar to ours in Merlin to identify definitions' uids.)

odoc would indeed benefit from the improvements of this PR! (The "render source code" feature of odoc is probably very inefficient by itself, but we will work on it when it is usable by drivers.)

Regarding API changes, we haven't yet released a 5.2 compatible version of odoc, so I think that for us it is still fine to include it.

Ok, let's have a beta 2 and I will amend my patch for odoc for 5.2.

Ekdohibs

The changes look good, and will avoid useless reductions indeed.
I don't have a strong opinion about the API changes; the new API seems better than the previous one, however. In any case, I'd say that at least the basic version of this PR should be merged for 5.2, in order to have the performance improvements.

Octachron

I am green-ticking on behalf of @Ekdohibs

gasche · 2024-04-11T09:14:06Z

I won't have the time to look at this again in the next 10 days, but I trust @Ekdohibs' review and would be happy to approve on her behalf. @Octachron, can you say "yes" on improving the interface?

gasche · 2024-04-11T09:15:21Z

Changes

```diff
@@ -118,6 +118,10 @@ _______________
 - #12959, #13055: Avoid an internal error on recursive module type inconsistency
  (Florian Angeletti, review by Jacques Garrigue and Gabriel Scherer)

+- #13001: do not read_back entire shapes to get aliases' uids when building the
+  usages index
+  (Ulysse Gérard, review by Gabriel Scherer)
```


Let's also add Nathanaëlle.

And let's move the entry in the 5.2 section too.

gasche

Same here.

Octachron · 2024-04-11T09:20:11Z

And for people reading the conversation in a linear way, yes I agree with the change of API.


* New result type: `Resolved_alias of Uid.t * result`
* Add tests illustrating issue with shapes that should be marked approximated but are not. The tests also illustrate that we can get aliases to an "approximated" module.
* Add changelog entry for #13001

(cherry picked from commit da240ec)

Octachron · 2024-04-15T12:43:29Z

Cherry-picked to 5.2.0 in 89301a2.

Octachron added this to the 5.2 milestone Mar 1, 2024

gasche reviewed Mar 1, 2024


gasche reviewed Mar 6, 2024


testsuite/tests/shape-index/index_aliases.reference (resolved)

typing/shape_reduce.ml (outdated, resolved)

voodoos force-pushed the shape-aliases-read-back branch from b9ba1b9 to a85ff1c, March 7, 2024 09:41

Octachron assigned gasche Mar 20, 2024

voodoos force-pushed the shape-aliases-read-back branch from a85ff1c to c2fc14a, April 4, 2024 13:25

voodoos added a commit to voodoos/ocaml that referenced this pull request Apr 4, 2024

Add changelog entry for ocaml#13001

848cae7

Octachron reviewed Apr 8, 2024


Ekdohibs approved these changes Apr 11, 2024


Octachron approved these changes Apr 11, 2024


gasche reviewed Apr 11, 2024


gasche approved these changes Apr 11, 2024


voodoos added a commit to voodoos/ocaml that referenced this pull request Apr 11, 2024

Add @Ekdohibs to ocaml#13001 changelog entry

2aea01b

voodoos added 6 commits April 11, 2024 11:20

shapes: do not read_back entire shape to get aliases uids

b3e677c

Rename function and extract the force utility.

383b185

New result type : Resolved_alias of Uid.t * result

ac8e151

Add tests illustrating issue with shapes that should be marked approximated but are not. The tests also illustrate that we can get aliases to an "approximated" module.

6d59770

Add changelog entry for ocaml#13001

d7748f9

Add @Ekdohibs to ocaml#13001 changelog entry

fe55ae0

voodoos force-pushed the shape-aliases-read-back branch from 2aea01b to fe55ae0, April 11, 2024 09:21

Move changelog entry to section 5.2.0

29a9721

Octachron added the merge-me label Apr 11, 2024

Octachron merged commit da240ec into ocaml:trunk Apr 11, 2024
17 checks passed
