Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve load path performance #1323

Merged
merged 7 commits into from
Dec 8, 2021
Merged

Improve load path performance #1323

merged 7 commits into from
Dec 8, 2021

Conversation

lpw25
Copy link
Contributor

@lpw25 lpw25 commented Apr 20, 2021

This PR makes 4 changes to speed up the handling of the load path:

  1. It restores the use of Directory_content_cache for the load path -- which seems to have been removed by accident in the switch to 4.12.
  2. It changes where 'Mocaml.setup_config` is called to ensure that it is only called once per command.
  3. It uses a hash table for the load path instead of a reference to a map.
  4. It reuses the previous hash table when the load path hasn't changed and none of it's directories' contents have changed.

On a randomly chosen command in a library with a large load path, each of these changes is worth around 0.1s. #1309 is also worth about 0.1s. Between them these changes take that command's runtime from 0.7s to 0.25s. (This is for the second call to the command, the initial call to the command is more like 1.9s vs 1.7s).

I also have a version of this branch for 4.11, which is the one I actually tested.

@lpw25 lpw25 changed the title Fast load path Improve load path performance Apr 20, 2021
let-def
let-def previously approved these changes Apr 21, 2021
Copy link
Contributor

@let-def let-def left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There a few things that could be further cleaned up but I am happy with the patch. Don't worry with the comments I left if you are busy, we can do this ourselves.

src/kernel/mreader.ml Show resolved Hide resolved
src/ocaml/utils/load_path.ml Outdated Show resolved Hide resolved
@lpw25
Copy link
Contributor Author

lpw25 commented Apr 27, 2021

I've updated the PR to match some adjustments that were made to the compiler PR, and I've addressed one of the comments.

@voodoos voodoos added this to Todo in Backport to 411 via automation May 5, 2021
@lpw25
Copy link
Contributor Author

lpw25 commented May 14, 2021

Rebased and CHANGES entry added

@voodoos voodoos added this to Todo in Backport to 3.4 via automation Jun 15, 2021
@let-def let-def self-requested a review June 21, 2021 00:56
@voodoos voodoos added this to the Release 4.3 / 3.6 milestone Jul 13, 2021
@voodoos voodoos removed this from the Release 4.3 / 3.6 milestone Jul 19, 2021
@let-def
Copy link
Contributor

let-def commented Jul 29, 2021

It is not correct to setup the loadpath only in the reader.

For instance, if we switch back-and-forth between two files with different build environments without changing them, the reader will be reused so Mocaml.setup_config will be invoked only the first time. The second time, the typer might run in the wrong environment.

Since it is Load_path.init that is expensive and only the typechecker needs it, I suggest to do it only when setting up the environment for the typechecker.

diff --git a/src/kernel/mocaml.ml b/src/kernel/mocaml.ml
index 77f1e2aa..6b4cd38d 100644
--- a/src/kernel/mocaml.ml
+++ b/src/kernel/mocaml.ml
@@ -25,12 +25,11 @@ let is_current_state state = match !current_state with
 
 (* Build settings *)
 
-let setup_config config = (
+let setup_reader_config config = (
   assert Local_store.(is_bound ());
   let open Mconfig in
   let open Clflags in
   let ocaml = config.ocaml in
-  Load_path.init (Mconfig.build_path config);
   Env.set_unit_name (Mconfig.unitname config);
   Location.input_name  := config.query.filename;
   fast                 := ocaml.unsafe ;
@@ -46,6 +45,11 @@ let setup_config config = (
   open_modules         := ocaml.open_modules ;
 )
 
+let setup_typer_config config = (
+  setup_reader_config config;
+  Load_path.init (Mconfig.build_path config);
+)
+
 (** Switchable implementation of Oprint *)
 
 let default_out_value          = !Oprint.out_value
diff --git a/src/kernel/mocaml.mli b/src/kernel/mocaml.mli
index ffb67072..3a8fb6d5 100644
--- a/src/kernel/mocaml.mli
+++ b/src/kernel/mocaml.mli
@@ -6,7 +6,8 @@ val with_state : typer_state -> (unit -> 'a) -> 'a
 val is_current_state : typer_state -> bool
 
 (* Build settings *)
-val setup_config : Mconfig.t -> unit
+val setup_reader_config : Mconfig.t -> unit
+val setup_typer_config : Mconfig.t -> unit
 
 (* Replace Outcome printer *)
 val default_printer :
diff --git a/src/kernel/mpipeline.ml b/src/kernel/mpipeline.ml
index f6add7e0..c9700e0d 100644
--- a/src/kernel/mpipeline.ml
+++ b/src/kernel/mpipeline.ml
@@ -156,7 +156,7 @@ let process
   let reader = timed_lazy reader_time (lazy (
       let lazy source = source in
       let config = Mconfig.normalize config in
-      Mocaml.setup_config config;
+      Mocaml.setup_reader_config config;
       let result = Mreader.parse ?for_completion config source in
       result, config
     )) in
@@ -169,6 +169,7 @@ let process
     )) in
   let typer = timed_lazy typer_time (lazy (
       let lazy { Ppx. config; parsetree; _ } = ppx in
+      Mocaml.setup_typer_config config;
       let result = Mtyper.run config parsetree in
       let errors = timed_lazy error_time (lazy (Mtyper.get_errors result)) in
       { Typer. errors; result }

Longer term, it might be better to think about a better way to do incrementality and rewrite this Load_path accordingly. But it will make maintenance more difficult, the current trade-off might not be so bad.

@lpw25
Copy link
Contributor Author

lpw25 commented Nov 9, 2021

Why is this still not merged?

@let-def
Copy link
Contributor

let-def commented Nov 22, 2021

I still had bug with it, I was able to track them down to the use of the hash table, but I am not able to explain where it is going wrong. When I stopped looking at that problem, I was thinking of rewriting parts of Load_path...
I will resume work on it on Wednesday, but the patch cannot be merged as it is.

@voodoos voodoos dismissed let-def’s stale review November 22, 2021 10:35

Not yet ready according to last comment

@let-def
Copy link
Contributor

let-def commented Nov 25, 2021

Finally, I think I nailed the issue (I will test for a few days). 🎆
I wrote a bug post-mortem in the commit message:

When an inconsistent environment is detected (a loaded module changed,
directly or because the way its path is resolved changed), the local
store is reset.

This has the unfortunate side-effect of resetting the Load_path storage.

Before the Load_path module, all caching was done by merlin based on the
value of Config.load_path, outside of the Local_store.

The code has been wrong since the introduction of Load_path: its
internal state was always reset.
However, Merlin frontend was "defensively" repopulating it, which
explained some of the slowdown fixed by PR#1323: most of the time this
had no other effect than scanning the file system twice, but if the
state had been reset since last call, the cache would be repopulated.

The bug was particularly hard to track because there is a background job
in Merlin that remove old entries from the cache: if a file hasn't been
used for 5 minutes, its deps are removed from the cache and its
environment is considered inconsistent, even if nothing has changed.
We would then have to wait more than 5 minutes to reproduce this bug.

This can be tweaked by changing Server.server_accept loop in
ocamlmerlin_server.ml:

-      Mocaml.flush_caches ~older_than:300.0 ();
+      Mocaml.flush_caches ~older_than:15.0 ();

for instance, to set a 15 seconds cache.

Fundamentally, the issue is due to Local_store scope being limited to
the typechecker state, while the load_path was considered as part of the
configuration, at least originally.
That way, resetting the Local_store would set the typer in a fresh state
ready to type with the current configuration.
The introduction of Load_path moved the state out of the global Config
and to the Local_store. Now, resetting the typer also resets the load
path. The code that used to work would now typecheck in an empty
path/environment.

@let-def
Copy link
Contributor

let-def commented Nov 25, 2021

On my fork, I rebased this branch on top of current master for testing: https://github.com/let-def/merlin/tree/fast-load-path
I will update this PR later if it works as expected.

lpw25 and others added 7 commits December 8, 2021 17:17
The typer needs to be executed with proper settings. It is too expensive
to setup load path for reader and typer, but the reader should not
depend on it (AFAICT). By setting the load path only for the typer, we
don't pay the cost twice and are executing each part of the pipeline in
the environment they expect.
When an inconsistent environment is detected (a loaded module changed,
directly or because the way its path is resolved changed), the local
store is reset.

This has the unfortunate side-effect of resetting the Load_path storage.

Before the Load_path module, all caching was done by merlin based on the
value of Config.load_path, outside of the Local_store.

The code has been wrong since the introduction of Load_path: its
internal state was always reset.
However, Merlin frontend was "defensively" repopulating it, which
explained some of the slowdown fixed by PR#1323: most of the time this
had no other effect than scanning the file system twice, but if the
state had been reset since last call, the cache would be repopulated.

The bug was particularly hard to track because there is a background job
in Merlin that remove old entries from the cache: if a file hasn't been
used for 5 minutes, its deps are removed from the cache and its
environment is considered inconsistent, even if nothing has changed.
We would then have to wait more than 5 minutes to reproduce this bug.

This can be tweaked by changing Server.server_accept loop in
ocamlmerlin_server.ml:
-      Mocaml.flush_caches ~older_than:300.0 ();
+      Mocaml.flush_caches ~older_than:15.0 ();
for instance, to set a 15 seconds cache.

Fundamentally, the issue is due to Local_store scope being limited to
the typechecker state, while the load_path was considered as part of the
configuration, at least originally.
That way, resetting the Local_store would set the typer in a fresh state
ready to type with the current configuration.
The introduction of Load_path moved the state out of the global Config
and to the Local_store. Now, resetting the typer also resets the load
path. The code that used to work would now typecheck in an empty
path/environment.
Copy link
Contributor

@let-def let-def left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the few problems I had and it has now been running smoothly for two weeks.
I will merge the PR asap.

@let-def let-def merged commit d136be6 into ocaml:master Dec 8, 2021
Backport to 411 automation moved this from Planned for backporting to Ready for backport Dec 8, 2021
@voodoos voodoos added this to Planned for backporting in Backport to 412 via automation Dec 9, 2021
@voodoos voodoos moved this from Planned for backporting to Ready for backport in Backport to 412 Dec 9, 2021
@voodoos voodoos moved this from Ready for backport to Planned for backporting in Backport to 412 Mar 29, 2022
@voodoos voodoos moved this from Ready for backport to Planned for backporting in Backport to 411 Mar 29, 2022
voodoos added a commit to voodoos/merlin that referenced this pull request Mar 30, 2022
voodoos pushed a commit to voodoos/merlin that referenced this pull request Mar 30, 2022
Restore use of Directory_content_cache in load path

Only setup config once

Reuse hash table when load path hasn't changed

Add CHANGES entry

Setup load path only for typer

The typer needs to be executed with proper settings. It is too expensive
to setup load path for reader and typer, but the reader should not
depend on it (AFAICT). By setting the load path only for the typer, we
don't pay the cost twice and are executing each part of the pipeline in
the environment they expect.

Restore the load path when resetting the local store

When an inconsistent environment is detected (a loaded module changed,
directly or because the way its path is resolved changed), the local
store is reset.

This has the unfortunate side-effect of resetting the Load_path storage.

Before the Load_path module, all caching was done by merlin based on the
value of Config.load_path, outside of the Local_store.

The code has been wrong since the introduction of Load_path: its
internal state was always reset.
However, Merlin frontend was "defensively" repopulating it, which
explained some of the slowdown fixed by PR#1323: most of the time this
had no other effect than scanning the file system twice, but if the
state had been reset since last call, the cache would be repopulated.

The bug was particularly hard to track because there is a background job
in Merlin that remove old entries from the cache: if a file hasn't been
used for 5 minutes, its deps are removed from the cache and its
environment is considered inconsistent, even if nothing has changed.
We would then have to wait more than 5 minutes to reproduce this bug.

This can be tweaked by changing Server.server_accept loop in
ocamlmerlin_server.ml:
-      Mocaml.flush_caches ~older_than:300.0 ();
+      Mocaml.flush_caches ~older_than:15.0 ();
for instance, to set a 15 seconds cache.

Fundamentally, the issue is due to Local_store scope being limited to
the typechecker state, while the load_path was considered as part of the
configuration, at least originally.
That way, resetting the Local_store would set the typer in a fresh state
ready to type with the current configuration.
The introduction of Load_path moved the state out of the global Config
and to the Local_store. Now, resetting the typer also resets the load
path. The code that used to work would now typecheck in an empty
path/environment.
@voodoos voodoos moved this from Planned for backporting to 4.5-412 in Backport to 412 Mar 30, 2022
voodoos added a commit to voodoos/opam-repository that referenced this pull request Apr 5, 2022
CHANGES:

Tue Apr  5 20:59:42 CEST 2020

  + merlin binary
    - don't reset the environment when running merlin in single mode so that the
      parent environement is forwarded the the child processes (ocaml/merlin#1425)
    - filter dups in source paths (ocaml/merlin#1218)
    - improve load path performance (ocaml/merlin#1323)
    - fix handlink of ppx's under Windows (ocaml/merlin#1413)
    - locate: look for original source files before looking for preprocessed
      files (ocaml/merlin#1219 by @ddickstein, fixes ocaml/merlin#894)
    - handle `=` syntax in compiler flags (ocaml/merlin#1409)
    - expose all destruct exceptions in the api (ocaml/merlin#1437)
    - fix superfluous break in error reporting (ocaml/merlin#1432)
    - recognise binding operators in locate and occurrences (ocaml/merlin#1398, @mattiase)
    - remove dependency on Result (ocaml/merlin#1441, @kit-ty-kate)
  + editor modes
    - fix an issue in Neovim where the current line jumps to the top of the
      window on repeated calls to `MerlinTypeOf` (ocaml/merlin#1433 by @ddickstein, fixes
      ocaml/merlin#1221)
    - add module, module type, and class imenu items for emacs (ocaml/merlin#1244, @ivg)
    - add prefix argument to force or prevent opening in a new buffer in locate
      command (ocaml/merlin#1426, @panglesd)
    - add type-on-hover functionality for vim (ocaml/merlin#1439, @nilsbecker)
    - add a dedicated buffer `*merlin-errors*` containing the last viewed error
      (ocaml/merlin#1414, @panglesd)
  + test suite
    - make `merlin-wrapper` create a default `.merlin` file  only when there is
      no `dune-project` to let tests use `dune ocaml-merlin` reader. (ocaml/merlin#1425)
    - cover locate calls on module aliases with and without dune
    - Add a test expliciting the interaction between locate and Dune's generated
      source files (ocaml/merlin#1444)
voodoos added a commit to voodoos/opam-repository that referenced this pull request Apr 5, 2022
CHANGES:

Tue Apr  5 21:12:42 CEST 2022

  + merlin binary
    - don't reset the environment when running merlin in single mode so that the
      parent environement is forwarded the the child processes (ocaml/merlin#1425)
    - locate: look for original source files before looking for preprocessed
      files (ocaml/merlin#1219 by @ddickstein, fixes ocaml/merlin#894)
    - fix handlink of ppx's under Windows (ocaml/merlin#1413)
    - handle `=` syntax in compiler flags (ocaml/merlin#1409)
    - fix superfluous break in error reporting (ocaml/merlin#1432)
    - recognise binding operators in locate and occurrences (ocaml/merlin#1398, @mattiase)
    - improve load path performance (ocaml/merlin#1323)
    - remove dependency on Result (ocaml/merlin#1441, @kit-ty-kate)
  + editor modes
    - fix an issue in Neovim where the current line jumps to the top of the
      window on repeated calls to `MerlinTypeOf` (ocaml/merlin#1433 by @ddickstein, fixes
      ocaml/merlin#1221)
    - add module, module type, and class imenu items for emacs (ocaml/merlin#1244, @ivg)
    - add prefix argument to force or prevent opening in a new buffer in locate
      command (ocaml/merlin#1426, @panglesd)
    - add type-on-hover functionality for vim (ocaml/merlin#1439, @nilsbecker)
    - add a dedicated buffer `*merlin-errors*` containing the last viewed error
      (ocaml/merlin#1414, @panglesd)
  + test suite
    - make `merlin-wrapper` create a default `.merlin` file  only when there is
      no `dune-project` to let tests use `dune ocaml-merlin` reader. (ocaml/merlin#1425)
voodoos added a commit to voodoos/opam-repository that referenced this pull request Apr 5, 2022
CHANGES for 414:

Tue Apr  5 20:51:42 CEST 2022

  + merlin binary
    - don't reset the environment when running merlin in single mode so that the
      parent environement is forwarded the the child processes (ocaml/merlin#1425)
    - filter dups in source paths (ocaml/merlin#1218)
    - improve load path performance (ocaml/merlin#1323)
    - fix handlink of ppx's under Windows (ocaml/merlin#1413)
    - locate: look for original source files before looking for preprocessed
      files (ocaml/merlin#1219 by @ddickstein, fixes ocaml/merlin#894)
    - handle `=` syntax in compiler flags (ocaml/merlin#1409)
    - expose all destruct exceptions in the api (ocaml/merlin#1437)
    - fix superfluous break in error reporting (ocaml/merlin#1432)
    - recognise binding operators in locate and occurrences (ocaml/merlin#1398, @mattiase)
    - remove dependency on Result (ocaml/merlin#1441, @kit-ty-kate)
    - use the new "shapes" generated by the compiler to perform precise
      jump-to-definition (ocaml/merlin#1431)
  + editor modes
    - fix an issue in Neovim where the current line jumps to the top of the
      window on repeated calls to `MerlinTypeOf` (ocaml/merlin#1433 by @ddickstein, fixes
      ocaml/merlin#1221)
    - add module, module type, and class imenu items for emacs (ocaml/merlin#1244, @ivg)
    - add prefix argument to force or prevent opening in a new buffer in locate
      command (ocaml/merlin#1426, @panglesd)
    - add type-on-hover functionality for vim (ocaml/merlin#1439, @nilsbecker)
    - add a dedicated buffer `*merlin-errors*` containing the last viewed error
      (ocaml/merlin#1414, @panglesd)
  + test suite
    - make `merlin-wrapper` create a default `.merlin` file  only when there is
      no `dune-project` to let tests use `dune ocaml-merlin` reader. (ocaml/merlin#1425)
    - cover locate calls on module aliases with and without dune
    - Add a test expliciting the interaction between locate and Dune's generated
      source files (ocaml/merlin#1444)

CHANGES for 413:

Tue Apr  5 20:59:42 CEST 2022

  + merlin binary
    - don't reset the environment when running merlin in single mode so that the
      parent environement is forwarded the the child processes (ocaml/merlin#1425)
    - filter dups in source paths (ocaml/merlin#1218)
    - improve load path performance (ocaml/merlin#1323)
    - fix handlink of ppx's under Windows (ocaml/merlin#1413)
    - locate: look for original source files before looking for preprocessed
      files (ocaml/merlin#1219 by @ddickstein, fixes ocaml/merlin#894)
    - handle `=` syntax in compiler flags (ocaml/merlin#1409)
    - expose all destruct exceptions in the api (ocaml/merlin#1437)
    - fix superfluous break in error reporting (ocaml/merlin#1432)
    - recognise binding operators in locate and occurrences (ocaml/merlin#1398, @mattiase)
    - remove dependency on Result (ocaml/merlin#1441, @kit-ty-kate)
  + editor modes
    - fix an issue in Neovim where the current line jumps to the top of the
      window on repeated calls to `MerlinTypeOf` (ocaml/merlin#1433 by @ddickstein, fixes
      ocaml/merlin#1221)
    - add module, module type, and class imenu items for emacs (ocaml/merlin#1244, @ivg)
    - add prefix argument to force or prevent opening in a new buffer in locate
      command (ocaml/merlin#1426, @panglesd)
    - add type-on-hover functionality for vim (ocaml/merlin#1439, @nilsbecker)
    - add a dedicated buffer `*merlin-errors*` containing the last viewed error
      (ocaml/merlin#1414, @panglesd)
  + test suite
    - make `merlin-wrapper` create a default `.merlin` file  only when there is
      no `dune-project` to let tests use `dune ocaml-merlin` reader. (ocaml/merlin#1425)
    - cover locate calls on module aliases with and without dune
    - Add a test expliciting the interaction between locate and Dune's generated
      source files (ocaml/merlin#1444)

CHANGES for 412:

Tue Apr  5 21:12:42 CEST 2022

  + merlin binary
    - don't reset the environment when running merlin in single mode so that the
      parent environement is forwarded the the child processes (ocaml/merlin#1425)
    - locate: look for original source files before looking for preprocessed
      files (ocaml/merlin#1219 by @ddickstein, fixes ocaml/merlin#894)
    - fix handlink of ppx's under Windows (ocaml/merlin#1413)
    - handle `=` syntax in compiler flags (ocaml/merlin#1409)
    - fix superfluous break in error reporting (ocaml/merlin#1432)
    - recognise binding operators in locate and occurrences (ocaml/merlin#1398, @mattiase)
    - improve load path performance (ocaml/merlin#1323)
    - remove dependency on Result (ocaml/merlin#1441, @kit-ty-kate)
  + editor modes
    - fix an issue in Neovim where the current line jumps to the top of the
      window on repeated calls to `MerlinTypeOf` (ocaml/merlin#1433 by @ddickstein, fixes
      ocaml/merlin#1221)
    - add module, module type, and class imenu items for emacs (ocaml/merlin#1244, @ivg)
    - add prefix argument to force or prevent opening in a new buffer in locate
      command (ocaml/merlin#1426, @panglesd)
    - add type-on-hover functionality for vim (ocaml/merlin#1439, @nilsbecker)
    - add a dedicated buffer `*merlin-errors*` containing the last viewed error
      (ocaml/merlin#1414, @panglesd)
  + test suite
    - make `merlin-wrapper` create a default `.merlin` file  only when there is
      no `dune-project` to let tests use `dune ocaml-merlin` reader. (ocaml/merlin#1425)

CHANGES for 411:

Tue Apr 5 21:17:21 PM CET 2022

  + merlin binary
    - don't reset the environment when running merlin in single mode so that the
      parent environement is forwarded the the child processes (ocaml/merlin#1425)
    - locate: look for original source files before looking for preprocessed
      files (ocaml/merlin#1219 by @ddickstein, fixes ocaml/merlin#894)
    - fix handling of ppx's under Windows (ocaml/merlin#1413)
    - handle `=` syntax in compiler flags (ocaml/merlin#1409)
    - fix superfluous break in error reporting (ocaml/merlin#1432)
    - recognise binding operators in locate and occurrences (ocaml/merlin#1398, @mattiase)
    - remove dependency on Result (ocaml/merlin#1441, @kit-ty-kate)
  + editor modes
    - update quick setup instructions for emacs (ocaml/merlin#1380, @ScriptDevil)
    - fix an issue in Neovim where the current line jumps to the top of the
      window on repeated calls to `MerlinTypeOf` (ocaml/merlin#1433 by @ddickstein, fixes
      ocaml/merlin#1221)
    - add module, module type, and class imenu items for emacs (ocaml/merlin#1244, @ivg)
    - add prefix argument to force or prevent opening in a new buffer in locate
      command (ocaml/merlin#1426, @panglesd)
    - add type-on-hover functionality for vim (ocaml/merlin#1439, @nilsbecker)
    - add a dedicated buffer `*merlin-errors*` containing the last viewed error
      (ocaml/merlin#1414, @panglesd)
  + test suite
    - make `merlin-wrapper` create a default `.merlin` file  only when there is
      no `dune-project` to let tests use `dune ocaml-merlin` reader. (ocaml/merlin#1425)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Backport to 411
Planned for backporting
Development

Successfully merging this pull request may close these issues.

None yet

4 participants