shapes: do not read_back entire shape to get aliases uids #13001

Merged
Octachron merged 7 commits into ocaml:trunk on Apr 11, 2024

Conversation

voodoos
Contributor

@voodoos voodoos commented Mar 1, 2024

Fully reducing the shapes of large modules (and all their components) is an expensive process. Merlin, when trying to jump to a module, used to do that to get the uid of the module, and thus its location. This wasteful reduction slowed the query, so we introduced "weak_reduction", which does not reduce a module's components when all we are interested in is the module's uid itself.

In #12508, we redesigned this feature along with @Ekdohibs and @gasche in the form of the reduce_for_uid function, which no longer leaks incorrect shapes. This function performs weak reduction and returns only the resulting uid. When looking for a module alias, it returns the list of the aliases' uids followed by the aliased module's uid:

module X = struct end
module A_1 = X
...
module A_N = A_N-1

reduce_for_uid 'path to A_N' returns [uid_of_A_N; ...; uid_of_A_1; uid_of_X].

This information allows Merlin to traverse aliases and jump to the actual definition of X, while keeping the aliasing information, which is useful for other use cases such as occurrences.
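
As a rough illustration of the list described above, here is a toy model of an alias chain. It is not the compiler's Shape module: the `modl` type, the string uids, and the `alias_chain` function are all hypothetical names chosen for this sketch.

```ocaml
(* Toy model, not the compiler's Shape module: a module is either a
   definition or an alias to another module, and each binding has a uid. *)
type uid = string

type modl =
  | Struct of uid        (* module X = struct ... end *)
  | Alias of uid * modl  (* module A = <some other module> *)

(* Collect uids from the outermost alias down to the aliased definition. *)
let rec alias_chain = function
  | Struct uid -> [ uid ]
  | Alias (uid, target) -> uid :: alias_chain target

let x = Struct "uid_of_X"
let a_1 = Alias ("uid_of_A_1", x)
let a_2 = Alias ("uid_of_A_2", a_1)

let () =
  assert (alias_chain a_2 = [ "uid_of_A_2"; "uid_of_A_1"; "uid_of_X" ])
```

The last element is where a "jump to definition" would land, while the earlier elements keep the aliasing information.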

To return this list I used a call to read_back before looking at the shape description to get the head aliases. This was, of course, a mistake, since it performs a full reduction of the module's body. This led to a blowup when indexing large codebases and to slower locate queries when jumping to an aliased module.

This PR fixes that by forcing the shape reduction step by step, only as far as needed to get the head aliases.
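
The difference between a full read-back and a stepped one can be sketched with plain `Lazy.t` values. This is only an analogy under simplifying assumptions: the `nf` record, `aliases_uids`, `full_read_back` and the `forced_bodies` counter are hypothetical and do not mirror the actual code in typing/shape_reduce.ml.

```ocaml
type uid = int

(* Hypothetical normal form: the head is available immediately, while the
   body (the module's components) is only computed on demand. *)
type nf = { uid : uid; head : head; body : unit Lazy.t }
and head = Definition | Alias_to of nf Lazy.t

(* Counts how many module bodies have been reduced so far. *)
let forced_bodies = ref 0

let make uid head = { uid; head; body = lazy (incr forced_bodies) }

(* Stepped version: walks alias heads only and never touches [body]. *)
let rec aliases_uids nf =
  match nf.head with
  | Definition -> [ nf.uid ]
  | Alias_to target -> nf.uid :: aliases_uids (Lazy.force target)

(* Stand-in for a full read-back: forces every body along the chain. *)
let rec full_read_back nf =
  Lazy.force nf.body;
  match nf.head with
  | Definition -> [ nf.uid ]
  | Alias_to target -> nf.uid :: full_read_back (Lazy.force target)

let x = make 0 Definition
let a1 = make 1 (Alias_to (lazy x))
let a2 = make 2 (Alias_to (lazy a1))
```

Both functions return the same uids; only the amount of work differs, which is the point of the fix.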

@Octachron I think the performance issue I described should not have a large impact during the partial indexation that we are introducing in 5.2, because it only reduces shapes locally. It would take a very pathological compilation unit for the issue to manifest itself in a significant way. Still, it might be better to have this fix in the 5.2 branch.

@Octachron Octachron added this to the 5.2 milestone Mar 1, 2024
(* When interested only in the uids of aliased modules, we do not read_back
   the entire shape of the module, just enough to unroll the chain of
   aliases. *)
let read_back_aliases_uids env (nf : nf) =
Member

My feeling is that the code is correct, the comment is somewhat redundant (I think it is clear from the code), and the naming is slightly wrong, as you are not reading back (to generate a term). I would call this reduce_aliases_for_uid for example.

Contributor Author

@voodoos voodoos Mar 4, 2024

My reasoning for the name was that functions of the "reduce" family take a shape as input, while "read_back" functions take a normal form as input. I am ok with both and changed it in 4c3f43f

@voodoos
Contributor Author

voodoos commented Mar 6, 2024

@gasche I applied your type-change suggestion in 83108d3.

While adding new tests to show that we can return aliases to approximate modules I noticed that we didn't mark the resulting shape as approximated for values coming from a first-class-module. I did that in 19a778f. The reduction is "successful", but there is no uid.

Member

@gasche gasche left a comment

Thanks! This looks nicer. See two questions inline.

@voodoos
Contributor Author

voodoos commented Mar 7, 2024

@gasche I removed the fix for approximation. I think we do want to have the NLeaf cases mark the shape as approximated in both the App and Proj cases. But we first need to improve reduce_for_uid's laziness (or have another way to check for approximation):

Right now, when looking for the uid of M when M is defined by module M = F (X), reducing the shape will reduce the application. However, we don't need to perform that reduction, since the uid of M's definition is the uid of the module M = ... binding, independently of the right-hand side.

Reducing the right-hand side might result in an approximated shape, but this is not something we care about when reducing the shape for the uid of M.
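
For concreteness, the situation described above looks like this in source code. The module names follow the comment; the contents of `X` and `F` are made up for the example.

```ocaml
module X = struct
  let v = 1
end

module F (A : sig val v : int end) = struct
  let w = A.v + 1
end

(* Jumping to [M] should land on this binding. Its location does not
   depend on what the application [F (X)] reduces to. *)
module M = F (X)

let () = assert (M.w = 2)
```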

@gasche
Member

gasche commented Mar 8, 2024

My current (and ever-changing) understanding is that there are two different needs:

  1. For the compiler, you want a type of "partially reduced shapes" that is computed and then serialized in the cmt_ident_occurrences field of cmt files. We want to work now instead of having the tools do more work, so we want an "eager" reduction.
  2. For the tools, you want a type of "head normal form" that gives the minimal amount of information you need while doing the minimal amount of work; in particular, the reduction should be "lazy".

Currently we are trying to use a single result type for this, and the PR here and the discussion suggest that this might not be the right choice. If we separated the two, we could use for (2) a version that has some laziness and is not serializable.

For (1), I wonder why we are not "just" using the Shape.t value that results from strong reduction -- instead of storing the result of reduce_for_uid, the result of reduce. You report that earlier versions tried to store the shapes of all values and it led to a blowup in cmt size, but were those the input shapes or the strongly-reduced shapes? Or do we maybe know that we will only ever need the head uid of those shapes, not the whole shape, so that computing below the head normal form is a waste?

For (2), I think that we should use a representation that allows on-demand computation, for example:

| Resolved_alias of Uid.t * result Lazy.t
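
A minimal sketch of how such an on-demand representation could be consumed, assuming a simplified `result` type with string uids. The `Unresolved` case, the `head_uid` and `final_uid` helpers, and the `steps` counter are hypothetical and only serve the illustration.

```ocaml
type uid = string

(* Simplified stand-in for the suggested type: the tail of an alias chain
   is suspended and only computed when a client asks for it. *)
type result =
  | Resolved of uid
  | Resolved_alias of uid * result Lazy.t
  | Unresolved

(* Cheap: never forces the suspended tail. *)
let head_uid = function
  | Resolved uid | Resolved_alias (uid, _) -> Some uid
  | Unresolved -> None

(* Follows the chain to the aliased definition, forcing each step. *)
let rec final_uid = function
  | Resolved uid -> Some uid
  | Resolved_alias (_, rest) -> final_uid (Lazy.force rest)
  | Unresolved -> None

(* Counts how many suspended reduction steps were actually run. *)
let steps = ref 0

let chain =
  Resolved_alias
    ( "uid_of_A_2",
      lazy
        (incr steps;
         Resolved_alias
           ("uid_of_A_1", lazy (incr steps; Resolved "uid_of_X"))) )
```

A value of this type contains closures, which is why it would not be serializable, as noted above.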

voodoos added a commit to voodoos/ocaml that referenced this pull request Apr 4, 2024
@voodoos
Contributor Author

voodoos commented Apr 4, 2024

> My current (and ever-changing) understanding is that there are two different needs:
>
> 1. For the compiler, you want a type of "partially reduced shapes" that is computed and then serialized in the `cmt_ident_occurrences` field of `cmt` files. We want to work now instead of having the tools do more work, so we want an "eager" reduction.
>
> 2. For the tools, you want a type of "head normal form" that gives the minimal amount of information you need while doing the minimal amount of work, in particular the reduction should "lazy".

I don't think that this is correct. In (1), when reducing the shapes to build the cmt_ident_occurrences table, we have the same need as in (2): do the minimal amount of work to get the minimal amount of information, namely the uid of the definition. However, these reductions might stay incomplete, since in (1) we don't load other modules' cmt files, in order to respect separate compilation.

> For (1), I wonder why we are not "just" using the Shape.t value that results from strong reduction -- instead of storing the result of reduce_for_uid, the result of reduce.

Apart from the fact that, right now, tools are only interested in the result uid, reduce performs unconditional read_backs which can be costly.

> You report that earlier versions tried to store the shapes of all values and it lead to a blowup in cmt size, but were those the input shapes or the strongly-reduced shapes?

That was with the "original" shapes stored in the typing environment.

@@ -18,10 +18,10 @@
 (** The result of reducing a shape and looking for its uid *)
 type result =
   | Resolved of Shape.Uid.t (** Shape reduction succeeded and a uid was found *)
-  | Resolved_alias of Shape.Uid.t list (** Reduction led to an alias chain *)
+  | Resolved_alias of Shape.Uid.t * result (** Reduction led to an alias *)
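
To relate the two representations, a client that still wants the flat list could recover it from the nested form. The sketch below uses a simplified `result` type with string uids and only the two constructors visible in this hunk; the `uids` helper is hypothetical.

```ocaml
type uid = string

(* Simplified stand-in restricted to the two constructors shown above. *)
type result =
  | Resolved of uid
  | Resolved_alias of uid * result

(* Recover the flat list carried by the previous representation. *)
let rec uids = function
  | Resolved uid -> [ uid ]
  | Resolved_alias (uid, rest) -> uid :: uids rest

let example =
  Resolved_alias ("uid_of_A_2", Resolved_alias ("uid_of_A_1", Resolved "uid_of_X"))

let () = assert (uids example = [ "uid_of_A_2"; "uid_of_A_1"; "uid_of_X" ])
```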
Member

If possible I would rather avoid changing the interface for OCaml 5.2.0, but I am not sure if we are still aiming for 5.2.0 for this PR.

Contributor Author

> If possible I would rather avoid changing the interface for OCaml 5.2.0

Is it because of the already released alpha and beta versions with bumped magic numbers? Since the result type was actually introduced in 5.2 itself, it seems like it could be better to change it right now...

I can also easily rollback to the first iteration on this PR that fixes the performance issue without changing the type if that is required.

> I am not sure if we are still aiming for 5.2.0 for this PR

The discussion diverged to other unclear parts of the shape reduction, but I think this PR's original scope is smaller than that and mostly uncontroversial: removing extraneous work done in some cases, with an agreed-on fix (stepped read-back). We do need to rethink shape reduction and its handling of uids at some point, but that is out of scope for this PR. What do you think, @gasche?

Member

It is more a question of API stability for shape clients. In particular, this change would require a new beta and a patch to odoc. This is still ok-ish if the PR converges this week, but it is getting late in the release cycle for this kind of change.

Contributor Author

@voodoos voodoos Apr 8, 2024

I understand, thanks for the details. I can roll back the API changes if it makes things simpler. Now that you mention odoc, it reminds me that the performance fix might actually be important for their usage of the shapes. (We discussed it a few weeks ago with @panglesd and @Julow; they perform similar actions as we do in Merlin to identify definitions' uids.)

Contributor

odoc would indeed benefit from the improvements of this PR! (The "render source code" feature of odoc is probably very inefficient by itself, but we will work on it when it is usable by drivers.)

Regarding API changes, we haven't yet released a 5.2-compatible version of odoc, so I think that for us it is still fine to include them.

Member

Ok, let's have a beta 2 and I will amend my patch for odoc for 5.2.

Contributor

@Ekdohibs Ekdohibs left a comment

The changes look good, and will avoid useless reductions indeed.
I don't have a strong opinion about the API changes; the new API seems better than the previous one, however. In any case, I'd say that at least the basic version of this PR should be merged for 5.2, in order to have the performance improvements.

Member

@Octachron Octachron left a comment

I am green-ticking on behalf of @Ekdohibs

@gasche
Member

gasche commented Apr 11, 2024

I won't have the time to look at this again in the next 10 days, but I trust @Ekdohibs' review and would be happy to approve on her behalf. @Octachron, can you say "yes" on improving the interface?

Changes Outdated
@@ -118,6 +118,10 @@ _______________
- #12959, #13055: Avoid an internal error on recursive module type inconsistency
(Florian Angeletti, review by Jacques Garrigue and Gabriel Scherer)

- #13001: do not read_back entire shapes to get aliases' uids when building the
usages index
(Ulysse Gérard, review by Gabriel Scherer)
Member

Let's also add Nathanaëlle.

Member

And let's move the entry in the 5.2 section too.

Member

@gasche gasche left a comment

Same here.

voodoos added a commit to voodoos/ocaml that referenced this pull request Apr 11, 2024
@Octachron
Member

And for people reading the conversation in a linear way, yes I agree with the change of API.

@Octachron Octachron merged commit da240ec into ocaml:trunk Apr 11, 2024
17 checks passed
Octachron pushed a commit that referenced this pull request Apr 15, 2024
* New result type : `Resolved_alias of Uid.t * result`
* Add tests illustrating issue with shapes that
should be marked approximated but are not

The tests also illustrate that we can get aliases to an
"approximated" module.

* Add changelog entry for #13001

(cherry picked from commit da240ec)
@Octachron
Member

Cherry-picked to 5.2.0 in 89301a2.
