
dune build -w restricting job number for no reason #5549

Open
Alizter opened this issue Apr 6, 2022 · 40 comments
@Alizter
Collaborator

Alizter commented Apr 6, 2022

Problem description

There seems to be an issue with dune build -w: it can be noticeably slow because the number of jobs gets restricted. The slowdown is real, since stopping watch mode and doing a fresh dune build is much faster. The issue seems to occur during the rule-finding phase.

Here is a demonstration GIF of this behaviour. Details of what is happening are below:

[GIF: dune_build_watch_slowdown_rule_finding]

  1. dune build -w is run, and it works as expected. Most of the time is spent rule finding, but due to cache and everything already being built, it finishes quickly.
  2. Next whilst watch mode is still running, an .ml file early in the dependency tree is changed, triggering a rebuild. Immediately you can observe the number of jobs drop to a low count. What is dune doing here?
  3. To demonstrate that this is an issue with watch mode, a fresh dune build is triggered, and as expected everything builds normally and quickly.

There are some key details to this setup. This is the main Coq repo being changed, and importantly it is ML code being edited, which is later depended on by Dune's Coq stanzas. Retrying the same test, but this time with the @check target (building only the ML code), results in no observable slowdown, leading me to believe that this is an issue with the interaction between rule finding and the Coq rules.

Reproduction

  1. Clone the https://github.com/coq/coq repo.
  2. Run dune build -w. This will take a while, but once it starts building .v files (visible with --display=short or --verbose) you can do the next step.
  3. Edit an ml file, for example kernel/declareops.ml; you can even just drop let _hello = "world" inside.
  4. Observe that the rebuild will now restrict itself to a low job count.
  5. Stopping watch mode and rebuilding will speed up the build again.

Specifications

  • Version of dune (output of dune --version): 3.0.3
  • Version of ocaml (output of ocamlc --version): 4.12.1
  • Operating system (distribution and version): Ubuntu

cc @rgrinberg

@ejgallego
Collaborator

@Alizter if you do dune build -w theories/Reals/Real.vo for example, that is to say, with a concrete target, do you see the same behavior?

@Alizter
Collaborator Author

Alizter commented Apr 6, 2022

@ejgallego Yes.

@rgrinberg rgrinberg added requires-team-discussion This topic requires a team discussion and removed requires-team-discussion This topic requires a team discussion labels Apr 6, 2022
@rgrinberg
Member

From the meeting: there must be some mutable state remaining in dune that is causing this. We need to scan the code base once more for mutable state and convert it to Memo

@rgrinberg rgrinberg added the bug label May 4, 2022
@jchavarri
Collaborator

I am seeing the same behavior described in the issue for a mid-to-large Melange build. After watch mode starts a rebuild on the first change, the job count drops to 1, with very short peaks of 2-3 jobs. At Ahrefs we use CPUs with a lot of parallelism, and most of the build can be done using 255 parallel jobs, so the performance impact is severe. We would like to move away from external watch tools like watchexec, as they interfere with the recently added lock that prevents multiple concurrent build commands (#6360), but this job-restriction issue would prevent the migration to Dune's watch mode.

We need to scan the code base once more for mutable state and convert it to Memo

Are there any previous PRs that show this type of conversion and what it looks like? Also, is there some way to debug the dependency graph information? My understanding is that for a set of rules that could be executed in parallel to turn into what is basically a sequence, each rule must at some point start depending on the previous one somehow, starting from the second and subsequent builds?
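
(For illustration, a minimal sketch of the kind of ref-to-Memo conversion being asked about, mirroring the Memo.create / Memo.exec usage quoted later in this thread. The names compute_settings, equal_settings and the Unit input module are hypothetical placeholders, not Dune's actual code.)

(* Before: a mutable ref that survives across watch-mode builds, so Memo has
   no idea the cached value may have become stale. [compute_settings] is a
   placeholder for whatever work produces the value. *)
let settings = ref None

let get () =
  match !settings with
  | Some s -> s
  | None ->
    let s = compute_settings () in
    settings := Some s;
    s

(* After: the value lives in a memoized node, so invalidation between build
   runs is handled by Memo instead of by hand. *)
let settings_memo =
  Memo.create "settings" ~input:(module Unit) ~cutoff:equal_settings
    (fun () -> Memo.Build.return (compute_settings ()))

let get () = Memo.exec settings_memo ()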

@Alizter
Collaborator Author

Alizter commented Dec 3, 2022

@jchavarri Does this only happen with Melange builds? Have you observed it with OCaml? If not, then that might be an indicator of what is going wrong, since Melange and Coq do some similar things.

@jchavarri
Collaborator

jchavarri commented Dec 3, 2022

It also happens with OCaml.

Some more observations:

  • During the time when it gets stuck at 1 job, the number on the left is very low as well, and the percentage stays at 99%, e.g. Done: 99% (908/913, 5 left) (jobs: 1). In the GIF from the issue description it shows 1 left. It looks like the queue only contains the most immediate rules, and it keeps adding new ones after executing the current one?
  • I can reproduce the issue not only when updating ml files, but also other kinds, like atd files
  • However, the issue does not happen for every ml file: some ml files trigger a parallelized build, while others do not. I have not yet figured out any pattern, but the behavior is consistent: if a change to some file breaks parallelization, it will always do so.

@rgrinberg
Member

However, the issue does not happen for every ml file: some ml files trigger a parallelized build, while others do not. I have not yet figured out any pattern, but the behavior is consistent: if a change to some file breaks parallelization, it will always do so.

Thanks for the info. Could you find out the following:

  1. Do the .ml files that do not cause this problem have mlis?
  2. How many other modules depend on the problematic .ml files?

@jchavarri
Collaborator

Do the .ml files that do not cause this problem have mlis?

I did not find any relation between ml files having mli or not.

How many other modules depend on the problematic .ml files?

There are hundreds of modules that depend on the .ml files. I noticed that rather than specific ml files being problematic, it seems to be an issue across the library boundary (maybe).

I was able to find a simplified repro using synthetic code, which I put up at https://github.com/jchavarri/dune_repro_issue_5549. Instructions from the readme are copied below:

Dune repro issue 5549

  1. Install switch: make create-switch

  2. Run Dune in watch mode: dune build -w @all; notice how the build leverages all CPU cores.

  3. Modify chunk2/dir_2_1/m2_1_1_1.ml and its mli; notice how the build finishes very fast, even though a few modules depend on it.

  4. Modify chunk1/dir_1_1/m1_1_1_1.ml and its mli; notice how the build only executes 1 job at a time.

@jchavarri
Collaborator

Some added notes:

  • I can reproduce on both macOS and Linux
  • the number of jobs is mostly 1, but can go up to 2, 3 or larger numbers for short periods
  • I looked for refs throughout the codebase and there are a lot, many of them in stdune (not sure whether they could be affecting the build indirectly). One that caught my attention is this:

let pending_file_targets = ref Path.Build.Set.empty

Could it be related somehow? Is there a way to debug this to find the problem?

@jchavarri
Collaborator

Another note: the bug is reproducible on 3.0.2 but not on 2.9.3 (both installed from opam). Trying to figure out now the exact commit where it regressed.

@jchavarri
Collaborator

I can pin the regression down to this commit:

fb3c71d.

@rgrinberg
Member

rgrinberg commented Mar 5, 2023

Do you mind trying the following:

  1. Checkout the commit immediately before fb3c71d

  2. Set DUNE_WATCHING_MODE_INCREMENTAL=true

  3. Try to reproduce the bug.

If you can still reproduce it, I would suggest bisecting further with this environment variable switched on.

@jchavarri
Collaborator

I did so, and I could keep reproducing going backwards by using the env variable.

I ended up in these two commits:

The PR from the commit where it breaks is #4422.

@rgrinberg
Member

You can try the following experiments to see if it helps with the issue:

  • Change Execution_parameters.default to always return a constant
  • Change execution_parameters in dune_rules/main.ml to always return a constant
  • Try doing the above 2 at the same time

Perhaps one of these will fix the issue

@jchavarri
Collaborator

@rgrinberg I tried multiple things along the lines of what you suggested. The only thing that worked ultimately was changing the implementation of the Settings.get function to this:

let get () =
  let+ (_ : Memo.Run.t) = Memo.current_run () in
  builtin_default

let get () = Memo.Build.return builtin_default wouldn't work.

I am trying now to go forward and identify where this code is in current main to test a potential fix.

@jchavarri
Collaborator

I can't figure out how to reapply the fix on the most recent main. Or I could, but maybe there are other regressions along the way; as there are almost 2 years of changes in between, it's hard to know.

For the record, here's the small patch that fixes the problem when applied to 621e4e2; there's no need to return constants or anything:

$ git diff
diff --git a/src/dune_engine/source_tree.ml b/src/dune_engine/source_tree.ml
index 1b1bbc5f8..362ce6927 100644
--- a/src/dune_engine/source_tree.ml
+++ b/src/dune_engine/source_tree.ml
@@ -377,7 +377,9 @@ module Settings = struct

   let set x = Fdecl.set t x

-  let get () = Fdecl.get t
+  let get () =
+    let* (_ : Memo.Run.t) = Memo.current_run () in
+    Fdecl.get t
 end

 let init = Settings.set

I guess I will now try the opposite direction: move forward in time, version by version, reapplying this patch and checking whether it still fixes the problem.

@jchavarri
Collaborator

I have noticed that changing the following lines in build_system.ml, bypassing the build-file memo, fixes the problem in the repro case I shared back in December:

diff --git a/src/dune_engine/build_system.ml b/src/dune_engine/build_system.ml
index 20d64cfb6..7ab6c074a 100644
--- a/src/dune_engine/build_system.ml
+++ b/src/dune_engine/build_system.ml
@@ -1027,7 +1027,7 @@ end = struct
     let cutoff = Tuple.T2.equal Digest.equal target_kind_equal in
     Memo.create "build-file" ~input:(module Path) ~cutoff build_file_impl

-  let build_file path = Memo.exec build_file_memo path >>| fst
+  let build_file path = build_file_impl path >>| fst

   let build_dir path =
     let+ digest, kind = Memo.exec build_file_memo path in

It seems that, unlike the patch in #7224, this change doesn't completely discard every piece of memoized information, just the "build-file" entries.

However, when using the version of dune with the above patch in a real project, the issue was not fully gone. Given a "leaf" library Z and a library Y that Z depends upon:

  • parallelization would work fine if modules in Y were modified (i.e. Z modules are rebuilt by dune watch in parallel, maximizing the number of jobs), which was already an improvement
  • but I could still reproduce the issue if I modified a module in Z and then updated a module in Y.

The second case seems to be fixed by removing memoization from build-alias as well:

diff --git a/src/dune_engine/build_system.ml b/src/dune_engine/build_system.ml
index 7ab6c074a..16ffaac2c 100644
--- a/src/dune_engine/build_system.ml
+++ b/src/dune_engine/build_system.ml
@@ -282,7 +282,8 @@ and Exported : sig
   val build_file_memo : (Path.t, Digest.t * target_kind) Memo.Table.t
     [@@warning "-32"]

-  val build_alias_memo : (Alias.t, Dep.Fact.Files.t) Memo.Table.t
+  val build_alias_memo : Alias.t -> Dep.Fact.Files.t Memo.t
+
     [@@warning "-32"]

   val dep_on_alias_definition :
@@ -1037,12 +1038,9 @@ end = struct
       Code_error.raise "build_dir called on a file target"
         [ ("path", Path.to_dyn path) ]

-  let build_alias_memo =
-    Memo.create "build-alias"
-      ~input:(module Alias)
-      ~cutoff:Dep.Fact.Files.equal build_alias_impl
+  let build_alias_memo = build_alias_impl

-  let build_alias = Memo.exec build_alias_memo
+  let build_alias = build_alias_memo

   let execute_rule_memo =
     Memo.create "execute-rule"

Are these two optimizations (build-file and build-alias) critical for dune builds? From what I could gather, there was no impact on build time (just an impression, no real measurements yet).

@rgrinberg Is there anything else I can log, measure or debug considering the above information?

@rgrinberg
Member

Did you actually observe a speed-up from these patches? Yes, removing memoization will make dune run more things in parallel, but that memoization was probably preventing them from running at all in the first place. So perhaps it was not faster at all to remove it.

Are these two optimizations (build-file and build-alias) critical for dune builds? From what I could gather, there was no impact on build time (just an impression, no real measurements yet).

They're pretty critical because they make sure dune doesn't rebuild the same rules or aliases more than once within a single build.

Is there anything else I can log, measure or debug considering the above information?

I would say that it would be very valuable to get some perf measurements on what dune is doing exactly when it's building things serially.

@rgrinberg
Member

Btw, what's missing from this ticket is some analysis as to why you think dune is missing opportunities for parallelism. In particular, we still aren't sure whether this is in fact a bug in the engine or just how the rules are set up. It's not enough to have 256 cores and see 1 job running to determine that dune is not building concurrently enough.

We need to demonstrate which two rules it fails to parallelize. It's a bit of a shame we don't have any good tools in dune to do this, but I'd be more than happy to add them if it were possible. For now, I would suggest using --trace-file to look for the spot where dune fails to make the build concurrent.

@jchavarri
Collaborator

Did you actually observe a speed up from these patches?

Yes, I definitely did. Note that the issue always occurs when modifying files in a library Y that another library Z depends upon. In those cases, due to the way library dependencies are defined, all modules in Z will be rebuilt. So with the patches applied, if Z has a few hundred modules, they get built in a very short time with 255 jobs, while without the patch it can take almost a minute, as dune will build 1 module at a time. Some specific numbers are in the traces shared below.

Btw, what's missing from this ticket is some analysis as to why you think dune's missing opportunities for parallelism.

Because I can see that dune builds the modules of the library sequentially, when it is clear they can be built in parallel: they are built in parallel when running dune in regular (non-watch) mode, or with the patch.

For now, I would suggest using --trace-file to look for the spot where dune fails to make the build concurrent.

Find below a couple of traces, with and without the patch. In both cases I am running dune watch using the repro I shared above. I start dune watch with a failing build (modifying a module in the chunk1 library), then fixing that module so the build passes. I stopped the watch process as soon as the build passes.

traces.zip

@rgrinberg
Member

I think I understand the issue now. The problem stems from how we invalidate dependency nodes. We do so by scanning the nodes one by one until we find a node that is out of date or fails the cutoff predicate. This scan is done linearly because the node is invalid even if a single dependency is invalid, and we want to stop scanning and evaluating dependency nodes as soon as we find out our node is invalidated. Unfortunately, this check can end up evaluating many dependency nodes sequentially and as a result completely undo the concurrency of the underlying memoized function.

@snowleopard Have you run into this problem at all internally?
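
(A minimal model of the linear invalidation scan described above, not Dune's actual implementation: deps stands for the flattened dependency list recorded on a node, and changed for the possibly expensive per-dependency check, which in the real code is monadic but has the same sequential structure.)

(* Toy model: a node is out of date as soon as one recorded dependency has
   changed. [List.exists] stops at the first changed dependency, which avoids
   useless work, but it also forces the (potentially expensive) per-dependency
   checks to run strictly one after another - the loss of parallelism observed
   in this issue. *)
let node_is_out_of_date ~changed deps = List.exists changed deps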

@snowleopard
Collaborator

Oh, this seems like a problem indeed. I will try to reproduce the performance difference internally.

@snowleopard
Collaborator

snowleopard commented Mar 8, 2023

I couldn't yet reproduce the problem internally but I'll keep trying.

@rgrinberg My understanding is that you think the problem is related to this comment:

dune/src/memo/memo.ml

Lines 422 to 450 in a0dd515

; (* The list of dependencies [deps], as captured at [last_validated_at].
Note that the list of dependencies can change over the lifetime of
[Cached_value]: this happens if the value gets re-computed but is
declared unchanged by the cutoff check.
Note that [deps] should be listed in the order in which they were
depended on to avoid recomputations of the dependencies that are no
longer relevant (see an example below). Asynchronous functions induce
a partial (rather than a total) order on dependencies, and so [deps]
should be a linearisation of this partial order. It is also worth
noting that the problem only occurs with dynamic dependencies,
because static dependencies can never become irrelevant.
As an example, consider the function [let f x = let y = g x in h y].
The correct order of dependencies of [f 0] is [g 0] and then [h y1],
where [y1] is the result of computing [g 0] in the first build run.
Now consider the situation where (i) [h y1] is incorrectly listed
first in [deps], and (ii) both [g] and [h] have changed in the second
build run (e.g. because they read modified files). To determine that
[f] needs to be recomputed, we start by recomputing [h y1], which is
likely to be a waste because now we are really interested in [h y2],
where [y2] is the result of computing [g 0] in the second run. Had we
listed [g 0] first, we would recompute it and the work wouldn't be
wasted since [f 0] does depend on it.
Another important reason to list [deps] according to a linearisation
of the dependency order is to eliminate spurious dependency
cycles. *)
mutable deps : Deps.t

We start with a partial order (due to parallelism) but we linearise it during the incremental graph traversal. Ideally we would preserve the parallelism structure, but it gets completely erased since Memo.t is essentially just Fiber.t and the latter isn't (fully) defunctionalised. If we had a way to figure out that a certain slice of the dependencies in Deps.t comes from, say, Memo.parallel_map, then we could traverse those deps in parallel without risking recomputing things unnecessarily.
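
(Purely as an illustration of the idea above: a toy dependency representation that keeps the series-parallel structure instead of flattening it. This is not what Dune's Deps.t looks like; it only shows what extra information would need to be recorded for the incremental check to traverse parallel slices concurrently.)

(* Hypothetical shape of a dependency trace that remembers which dependencies
   were requested concurrently (e.g. via Memo.parallel_map) and which were
   requested one after another. *)
type 'dep deps =
  | Leaf of 'dep
  | Seq of 'dep deps list (* must be re-checked in order *)
  | Par of 'dep deps list (* safe to re-check concurrently *)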

@rgrinberg
Member

We start with a partial order (due to parallelism) but we linearise it during the incremental graph traversal. Ideally we would preserve the parallelism structure, but it gets completely erased since Memo.t is essentially just Fiber.t and the latter isn't (fully) defunctionalised

That's an example of when the problem is most acute, but the problem exists even without any concurrency in the memoized function.

If we had a way to figure out that a certain slice of the dependencies in Deps.t comes from, say, Memo.parallel_map, then we could traverse those deps in parallel without risking recomputing things unnecessarily.

I think there's always a risk of computing unnecessarily.

To rephrase the problem, when we check for when a node is invalid, we can optimize for two different properties:

  • Minimizing the number of computations required. This is roughly the amount of user code we have to evaluate.
  • Maximizing the number of nodes we can check concurrently.

It's of course impossible to satisfy both properties in the general case. However, our invalidation algorithm is suboptimal in both of these criteria. Our current algorithm will traverse all nodes one by one while evaluating them sequentially if necessary. So it's both sequential, and it evaluates unnecessary nodes.

I would propose the following improvement:

  1. Do a best effort traversal of the graph to see if we can find an out of date node without doing any evaluation.
  2. If the step above terminates without finding out of date nodes, and we still have unevaluated nodes to check, we now evaluate all nodes concurrently and then check if any of them are out of date.

IMO, the proposed scheme would be better than our current algorithm, although it would still cause unnecessary computation in the second step. I would suggest adding a cancellation mechanism for evaluating memoized nodes to mitigate the unnecessary computation.

@rgrinberg
Member

Actually, scratch the above. The issue is indeed only relevant to concurrent computation. What I didn't realize is that when we're doing evaluation to see if the node is out of date, all the nodes that we're going to evaluate will be needed to bring it up to date regardless. So there's really no work wasted.

Indeed the real issue is that our dependency list for nodes has been flattened so we're unable to traverse it in parallel where possible.

rgrinberg added a commit that referenced this issue Mar 8, 2023
Reproduces the loss of concurrency observed in #5549 in a unit test

Signed-off-by: Rudi Grinberg <me@rgrinberg.com>

@rgrinberg
Member

#7251 attempts to reproduce this

rgrinberg added a commit that referenced this issue Mar 9, 2023
Reproduces the loss of concurrency observed in #5549 in a unit test

Signed-off-by: Rudi Grinberg <me@rgrinberg.com>
@snowleopard
Collaborator

Indeed the real issue is that our dependency list for nodes has been flattened so we're unable to traverse it in parallel where possible.

Yep. One approach that I have seen used in practice is switching from essentially a dep array to a dep array array to represent node dependencies, where the inner arrays are traversed concurrently and the outer array is traversed sequentially. That loses some precision in some cases but can be a pretty good approximation.

Of course, we still need to get those inner batches from somewhere, and that seems to be the tricky part.
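
(A sketch of that approximation, assuming a Fiber.parallel_map-style helper is available for the per-dependency checks; dep_changed is a stand-in for whatever possibly expensive check Memo performs for a single dependency, and none of this is Dune's actual code.)

open Fiber.O

(* Outer array: batches recorded in dependency order, walked sequentially.
   Inner arrays: dependencies known to be independent, checked concurrently.
   Stops at the first batch containing a changed dependency. *)
let any_changed ~(dep_changed : 'dep -> bool Fiber.t) (deps : 'dep array array) :
    bool Fiber.t =
  let rec outer i =
    if i >= Array.length deps then Fiber.return false
    else
      let* results =
        Fiber.parallel_map (Array.to_list deps.(i)) ~f:dep_changed
      in
      if List.exists (fun changed -> changed) results then Fiber.return true
      else outer (i + 1)
  in
  outer 0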

@rgrinberg
Member

What about speculatively executing the cut-off nodes in parallel and cancelling the remaining computations once we find that one of them changed according to the cutoff? That's not always ideal, but it seems like a good trade-off if you have a lot of concurrent resources. If it's not always desirable, we could allow it specifically for functions where we know it would benefit, like build_file, with something like Speculative_cutoff of 'a -> 'a -> bool.
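
(A sketch of what such a knob might look like on Memo's side; this variant type is hypothetical and not part of Memo's API.)

(* Hypothetical per-function cutoff configuration. [Speculative_cutoff] would
   let the invalidation check evaluate cut-off dependencies concurrently and
   cancel the remaining checks as soon as one of them is found to have
   changed. *)
type 'a cutoff =
  | No_cutoff
  | Cutoff of ('a -> 'a -> bool) (* current behaviour: dependencies checked one by one *)
  | Speculative_cutoff of ('a -> 'a -> bool) (* dependencies checked concurrently, speculatively *)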

@snowleopard
Collaborator

I think we should do our best to avoid executing unnecessary computations.

Memo guarantees to do no unnecessary computations, and in my opinion it's one of its strongest features when comparing it to other incremental computation libraries.

Surely, preserving the shape of computations (concurrent vs sequential) is a better approach in the long term, since it unlocks a bunch of other improvements (particularly around cutoffs).

@snowleopard
Collaborator

snowleopard commented Mar 9, 2023

To address this specific issue, could you just try removing some of the cutoffs in Dune rules?

Some cutoffs are more useful than others, and it might just happen that it's the useless ones that are causing the problem. One reason I'm thinking this is that we don't appear to be that affected by this internally, so perhaps our cutoff structure is just better.

@jchavarri
Collaborator

I created a synthetic project + benchmark to reproduce the issue using dune watch, and integrated it with ci:

Once I had some reference measurement, I modified Memo following some sample code that @rgrinberg shared, in order to see how much parallelizing the calculations in Memo.Deps.changed_or_not would affect the results. The patch can be seen in jchavarri#10.

The results are shown in the chart below. The dune watch rebuild takes ~134.3s with the current main code and ~94.5s with the patch. This is on GitHub CI nodes that have 2 cores; one would expect the result to be more dramatic on machines with more cores. On the 256-core machine I use daily I see 10x gains.

The benchmark table pasted below can be found at https://jchavarri.github.io/dune/dev/bench/.

[benchmark chart]

I hope this helps navigating the issue, and tracking the performance impact of any solutions to it. Please let me know if anything can be adapted or extended to be more informative.

@jchavarri
Collaborator

To address this specific issue, could you just try removing some of the cutoffs in Dune rules?

@snowleopard I tried removing cutoffs in Build_system as well as in other modules like Source_tree. The only one that seemed to improve performance in the synthetic benchmark is the cutoff on the Build_system.build_file memoized function. On the machine I am testing with, running that benchmark takes ~20s using dune from main vs ~3s without this cutoff.

The cutoff function of build-file checks two things, digest and target kind:

let cutoff = Tuple.T2.equal Digest.equal target_kind_equal in

The part that makes the cutoff function return false is the digest, which I guess is not surprising 😄

Let me know if I can help testing any other things.

@snowleopard
Collaborator

Cool, that's interesting. My understanding is that the current benchmark only examines one scenario, where we change a file in a way that makes the resulting digest change. How about also adding a scenario where after a change is made, the result of action execution remains the same? This is the scenario that benefits from the existence of the cutoff. It would be interesting to see if this benefit is significant (and we'll presumably lose it when dropping the cutoff).

In light of the previous discussion about concurrency: build_file is called from build_dep and the latter from

  let build_deps deps = Dep.Map.parallel_map deps ~f:(fun dep () -> build_dep dep)

I'll look into making it possible to preserve concurrency in such cases during incremental Memo updates.

@jchavarri
Collaborator

jchavarri commented Mar 16, 2023

How about also adding a scenario where after a change is made, the result of action execution remains the same? This is the scenario that benefits from the existence of the cutoff.

Sorry, I am not sure I understand how I can model an example of the given scenario. Would you maybe have a specific example in mind that I could test?

As I was not able to find an example for the above, I took some measurements on how many times the function build_file_impl (the one covered by the "build-file" memo) is called in both cases, with and without cutoff function.

Here are the results, still for the synthetic benchmark shared in #7255. The three steps all happen during a single dune watch run and consist of: an initial successful build; modifying m_1_1_1_1.ml and its mli so they end up in an error state; then modifying them back so the build succeeds:

With cutoff

  • initial successful build state -> 8020 calls
  • error state -> 1608
  • second successful build state -> 1642

Without cutoff

  • initial successful build state -> 8020 calls
  • error state -> 1624
  • second successful build state -> 1658

So in this example, the number of calls to build_file_impl that the cutoff saves is 1658 + 1624 - (1642 + 1608) = 32. This number surprised me: since the build_file memoization operates over paths, I thought more calls could be saved between runs thanks to the cutoff.

I also measured the time taken by each call to this function: it ranges from ~0.1 to ~1.3 ms. This time increases (as expected) when parallelization is at its best, most probably because each job has to wait for I/O or other bottlenecks to proceed. I understand this time depends mainly on which actual rules the example is running (in this case, I assume calls to ocamlc and ocamlopt).

@rgrinberg
Member

Would it make sense to make the cutoff configurable? I would say that this scenario:

How about also adding a scenario where after a change is made, the result of action execution remains the same

Is not all that relevant for most projects. I can't think of many compilation commands that can be changed without the output changing in a project. Re-ordering command line flags is one thing that comes to mind. But it hardly seems an important case.

@snowleopard
Collaborator

snowleopard commented Mar 22, 2023

Sorry, I am not sure I understand how I can model an example of the given scenario. Would you maybe have a specific example in mind that I could test?

@jchavarri I'm not sure it fits well into your synthetic benchmark, but a typical example would be changing the source of a generator in a way that doesn't change the generated file (say, just adding a comment to the generator). If many rules depend on this generated file, then the cutoff on build_file prevents the corresponding OCaml code from being re-evaluated.

It's worth noting that even if we remove the cutoff from build_file, Dune's action execution engine would still skip executing actions whose inputs were rebuilt but didn't change: this property is called the "early cutoff" and it's very important in practice. The cutoff on build_file therefore has nothing to do with Dune's early cutoff property. It only impacts how much OCaml code will be re-executed, not how many external actions will be rerun.

Is not all that relevant for most projects.

@rgrinberg If you are referring to the cutoff on build_file, then I agree it's unclear how useful it is for most projects. However, if you meant the early cutoff property, then I disagree: it is very important in practice, and dropping it would be a major regression.

Would it make sense to make the cutoff configurable?

Are you talking about having an option to essentially disable all Memo cutoffs in Dune? That might be interesting. If you are talking about having an option to disable just the cutoff on build_file, that seems a little dubious to me (maybe just as a temporary kludge?). This cutoff is either useful or not: we should understand which it is and act accordingly. Even better, we should fix Memo so that cutoffs are predictably cheap, as they should be, which is evidently not currently the case, as demonstrated by this ticket. I'm currently on vacation but I'm happy to take a look at fixing this problem in April.

@rgrinberg
Member

I meant just making the cutoff on build_file configurable, with the default being off as long as we're still working on this problem. This would buy us time to address the issue without rushing, while users enjoy the benefits of watch mode at full speed, at the cost of some unneeded recompilation in a few edge cases.

@snowleopard
Collaborator

OK, I guess you are right. Now that we have support for cheap experimental configuration settings, it makes sense to use it.

I would probably give it a slightly more general name, disable-slow-cutoffs or something like that, so that we could later include other cutoffs in case we discover new pathological scenarios that aren't addressed by disabling just the cutoff on build_file.

@emillon
Collaborator

emillon commented Dec 13, 2023

Based on a discussion with @snowleopard, I think this has been fixed at JS and is going to be contributed soon.

@snowleopard
Collaborator

Yes, sorry, it's on my stack to upstream the corresponding changes. The fix isn't entirely free (we now need to record series-parallel dependency traces, which increases Memo's overheads) but the performance boost for incremental builds is worth it.
