Skip to content

[tooling] Remember linked declarations#13308

Merged
Octachron merged 15 commits into
ocaml:trunkfrom
voodoos:link-declarations
Sep 19, 2024
Merged

[tooling] Remember linked declarations#13308
Octachron merged 15 commits into
ocaml:trunkfrom
voodoos:link-declarations

Conversation

@voodoos
Copy link
Copy Markdown
Contributor

@voodoos voodoos commented Jul 16, 2024

This PR generalizes a mechanism that is used to store "value dependencies" in cmt files introduced in 10abdce. It was meant to "track in external tools value declarations between implementations and interfaces".

There are two differences in these changes:

  1. We only store the paired uids and not the entire declaration. The declaration can be retrieved from the corresponding cmt_uid_to_decl table (this might require loading a .cmti file when the uid is from an interface).
  2. We not only store the paired values, but also modules, modules types, types, etc.

This information will enable external tools to navigate across the declaration graphs of any value, module, type, etc.
Additionally this information can be used to build equivalence classes of declarations. Enabling external tools to perform powerful refactoring by retrieving every usage of every definition whose uids are in the same equivalence class.

This PR compares uids to prevent noisy self-dependencies to be recorded (similarly to what record_value_dependency is doing). Since right now uids are not unique across a compilation unit this means that some links between an implementation uid and an interface uid with the same id are missing. Commit 7f5111b shows the expected test results with #13286 merged.

@alainfrisch I think you were the one to introduce value_dependencies which I remove in ec2cc6c. Do you now if that feature is actively used today ? I made a search on Sherlocode that return very few results. One of them is dead_code_analyser which is only available for OCaml < 4.05. The other looks active and compatible with 5.2: reanalyze. It might be safer to keep both for a while to give time for users to transition ?

cc @lpw25

@voodoos voodoos marked this pull request as ready for review July 16, 2024 16:06
@voodoos voodoos force-pushed the link-declarations branch 3 times, most recently from d308268 to e517502 Compare July 16, 2024 16:57
@voodoos voodoos force-pushed the link-declarations branch 3 times, most recently from f6e740f to d96a2b1 Compare August 2, 2024 09:35
Comment thread typing/includemod.ml Outdated
Comment thread typing/includemod.ml Outdated
Comment thread typing/includemod.ml Outdated
slowdown described above.
*)
type positivity = Negative | Positive | Strictly_positive
type directionality = { in_eq : bool; positivity : positivity }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gasche , would you mind having a look at this part of the PR to check if it is understandable? (since the authorship of this part of the PR has become fuzzy between me and @voodoos )

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what is the current exchange rate between Merlin PRs and Pattern-Matching Bug PRs.

I'll have a look.

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a quick look. My comments:

  • I don't know what Directed | Undirected in Cmt_format mean, some documentation comment (with an example!) would help.

  • Looking at includemod.ml and includemod.mli, the first time we encounter this idea of "positivity" is in the mark type, which is scarcely documented, so it's a bit cryptic. I learned more from the implementation comment on type positivity in the .ml file, not necessarily because it is very didactic (it focuses on explaining a subtle trick) but because it has meat and something that serves as an example. I wouldn't mind an explanation of this notion of positivity/direction higher in one of the file, with a small example.

  • I don't understand why mark and positivity are two separate mechanisms to trace directions. Maybe they evolve differently for good (unexplained) reasons? Maybe they are partly redundant with each other? Who knows.

  • Strictly_positive is not documented anywhere that I can see, but my best guess is that it is in fact a combination of Positive and a separate at_toplevel : bool field being true. I suspect that the separate-field presentation would be cleaner.

  • (see inline comment about check_modtype_equiv which I found supsicious)

Comment thread typing/includemod.ml Outdated
@alainfrisch
Copy link
Copy Markdown
Contributor

Do you now if that feature is actively used today ?

I confirm we still use it as part of our internal version of dead_code_analyzer (not synced with the public version), but don't let this block progress! I didn't follow developments around Uid, but I guess we should be able to switch to using them instead of declarations.

@Octachron
Copy link
Copy Markdown
Member

@gasche , the positivity/negativity information is the usual concept of positive/negative position for subtyping used for parameter variance. In particular, strict positivity is the same in both case sand denotes the absence of double negations.

For modules and Merlin, this information is useful because it corresponds to the only case where the subtyping relationship is matching implementation item with their inferred types to interface item (compared to the evenly-nested funtor argument cases

module F= functor(_:sig
  module G: functor(_:sig
      val non_strict_pos: int end
    end ) -> sig end
 end) -> struct  let strict_pos = 0 end

)

We have added a new commit which better separates the semantics of the marking condition and the positivity tracking.
Moreover, all information related to "directionality" of the subtyping relationship is now grouped in a single record.

With this refactoring is seemed clearer to simplify the mark argument used in various function into a boolean a remove the mark type from the interface of Includemod. A possible alternative would be to expose the directionality type and its four possible starting values.

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the new commit that moves "mark" and "positivity" to be in a common submodule, and I like the new result better. Plus you did write some extra documentation, which is nice, thanks! I think that the current state is okay, as far as understandability by others is concerned.

Comment thread typing/includemod.ml
| Mark_negative | Mark_neither -> false
The [posivity] field is used in the [cmt_declaration_dependencies] to
distinguish between directed and undirected edges, and to avoid recording
matched declarations twice.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean [pos] rather than [positivity]. It is unclear to me how the [pos] type "distinguishes between directed and undirected edges". (Or what this means in the first place. I suppose the "edges" are the dependencies, but what does it mean for a dependency to be undirected? Maybe this is better explained somewhere else, but then you could refer to that place.)

Edit: in fact maybe you don't need to document cmt_declaration_dependencies here anymore: now that the pos information is necessarily to correctly "mark" the submodules, it is used for strictly more than that.

Comment thread typing/includemod.ml
are inside an auxiliary check function.)

The [in_eq] field is [true] when we are checking both directions inside of
module types which allows optimizing module type equality checks. The module
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read this sentence as "inside (module types which allows optimizing module type equality checks)", as if certain module types allow this and others don't. I would rephrase this, for example as follows:

The [in_eq] field is [true] when we are checking one direction of module type equality, rather than subtyping.

Comment thread typing/includemod.ml Outdated
match dir.pos with
| Negative -> (Cmt_format.Undirected, elt2, elt1)
| Positive -> (Cmt_format.Undirected, elt1, elt2)
| Strictly_positive -> (Cmt_format.Directed, elt1, elt2)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does not strike me as something that clearly should be in this module. (It is not obvious why the notion of direction of module-checking would depend on the notion of dependency edges in the cmt format.) Given that it is only used once that I can see, maybe you could move the logic to the caller, which is still inside includemod.ml but not inside Directionality anymore. Later one maybe we would move the logic further away and change the dependency structures between these modules -- or maybe not.

Comment thread testsuite/tests/uid-deps/link_intf_impl.reference Outdated
Comment thread file_formats/cmt_format.mli Outdated
@Octachron Octachron added this to the 5.3 milestone Sep 13, 2024
@Octachron
Copy link
Copy Markdown
Member

With @gasche agreeing with the refactoring inside Includemod, the PR looks fine to me. I propose to move forward and merge this PR (and cherry-pick it to 5.3) once we agree on the names for the edge labels in the dependency table.

@Octachron Octachron merged commit b951207 into ocaml:trunk Sep 19, 2024
Octachron added a commit that referenced this pull request Sep 19, 2024
[tooling] Remember linked declarations

(cherry picked from commit b951207)
voodoos added a commit to voodoos/flambda-backend that referenced this pull request Apr 8, 2025
[tooling] Remember linked declarations
voodoos added a commit to voodoos/flambda-backend that referenced this pull request Apr 8, 2025
[tooling] Remember linked declarations
voodoos added a commit to voodoos/flambda-backend that referenced this pull request Apr 8, 2025
[tooling] Remember linked declarations
voodoos added a commit to voodoos/flambda-backend that referenced this pull request Apr 8, 2025
[tooling] Remember linked declarations
voodoos added a commit to voodoos/flambda-backend that referenced this pull request Apr 9, 2025
[tooling] Remember linked declarations
voodoos added a commit to voodoos/flambda-backend that referenced this pull request Apr 10, 2025
[tooling] Remember linked declarations
voodoos added a commit to voodoos/flambda-backend that referenced this pull request Apr 10, 2025
[tooling] Remember linked declarations
voodoos added a commit to voodoos/flambda-backend that referenced this pull request Apr 10, 2025
Resolve conflicts from  ocaml/ocaml#13308

[tooling] Remember linked declarations
voodoos added a commit to voodoos/flambda-backend that referenced this pull request Apr 10, 2025
Resolve conflicts from  ocaml/ocaml#13308

[tooling] Remember linked declarations
voodoos added a commit to voodoos/ocaml that referenced this pull request Apr 11, 2025
…ures.

The bug was most certainly introduced in ocaml#13308

Illustrates issue ocaml#13955
voodoos added a commit to voodoos/ocaml that referenced this pull request Apr 11, 2025
…ures.

The bug was most certainly introduced in ocaml#13308

Illustrates issue ocaml#13955
voodoos added a commit to voodoos/ocaml that referenced this pull request Jun 3, 2025
…ures.

The bug was most certainly introduced in ocaml#13308

Illustrates issue ocaml#13955
voodoos added a commit to voodoos/ocaml that referenced this pull request Jun 4, 2025
…ures.

The bug was most certainly introduced in ocaml#13308

Illustrates issue ocaml#13955

Co-authored-by: Florian Angeletti <florian.angeletti@inria.fr>
NickBarnes pushed a commit to NickBarnes/ocaml that referenced this pull request Jul 1, 2025
* Merge pull request ocaml#13286 from voodoos/distinct-uids-for-interfaces

[tooling] Distinct uids for interfaces

* Merge pull request ocaml#13308 from voodoos/link-declarations

[tooling] Remember linked declarations

* Store declaration dependencies in CMS files

* Backport directionality fix from upstream ocaml#13956

* Undo format change
fantazio added a commit to fantazio/dead_code_analyzer that referenced this pull request Dec 9, 2025
Since the change from `cmt_value_dependencies` to
`cmt_declaration_dependencies`, the field holds more than value deps
(see ocaml/ocaml#13308). In particular, it now
also holds the link between the class def in `.ml` and decl in `.mli`.
Consequently, during `prepare` in `DeadCode`, the def and decl locations
are added to the `DeadObj.equals` table. They used to be added later on,
during the `.cmt` analysis, in `DeadObj.tstr` : when `last_class` is the
def and the `short` or `path`'s loc is the decl. Adding values to the
equals table achieved through `add_equal` which, as a side effect,
updates `last_class` to the second parameter if it was the first one.
Because the equality is not added at the same time using the new
`cmt_declaration_dependencies`, the side effect on `last_class` is also
not happening at the same time as before, leading to `last_class` and
the `expr`'s `loc` to be different in `collect_references` when they
should be equal.
Using the `repr_loc` in `DeadObj.tstr` for `last_class` enforces it to
be the loc of the decl again.
fantazio added a commit to fantazio/dead_code_analyzer that referenced this pull request Dec 9, 2025
Since the change from `cmt_value_dependencies` to
`cmt_declaration_dependencies`, the field holds more than value deps
(see ocaml/ocaml#13308). In particular, it now
also holds the link between the class def in `.ml` and decl in `.mli`.
Consequently, during `prepare` in `DeadCode`, the def and decl locations
are added to the `DeadObj.equals` table. They used to be added later on,
during the `.cmt` analysis, in `DeadObj.tstr` : when `last_class` is the
def and the `short` or `path`'s loc is the decl. Adding values to the
equals table achieved through `add_equal` which, as a side effect,
updates `last_class` to the second parameter if it was the first one.
Because the equality is not added at the same time using the new
`cmt_declaration_dependencies`, the side effect on `last_class` is also
not happening at the same time as before, leading to `last_class` and
the `expr`'s `loc` to be different in `collect_references` when they
should be equal.
Using the `repr_loc` in `DeadObj.tstr` for `last_class` enforces it to
be the loc of the decl again.
fantazio added a commit to fantazio/dead_code_analyzer that referenced this pull request Dec 9, 2025
Since the change from `cmt_value_dependencies` to
`cmt_declaration_dependencies`, the field holds more than value deps
(see ocaml/ocaml#13308). In particular, it now
also holds the link between the class def in `.ml` and decl in `.mli`.
Consequently, during `prepare` in `DeadCode`, the def and decl locations
are added to the `DeadObj.equals` table. They used to be added later on,
during the `.cmt` analysis, in `DeadObj.tstr` : when `last_class` is the
def and the `short` or `path`'s loc is the decl. Adding values to the
equals table achieved through `add_equal` which, as a side effect,
updates `last_class` to the second parameter if it was the first one.
Because the equality is not added at the same time using the new
`cmt_declaration_dependencies`, the side effect on `last_class` is also
not happening at the same time as before, leading to `last_class` and
the `expr`'s `loc` to be different in `collect_references` when they
should be equal.
Using the `repr_loc` in `DeadObj.tstr` for `last_class` enforces it to
be the loc of the decl again.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants