Trace files should also contain dune internal computations #3862
Comments
We discussed this in a meeting and here's a quick recap: the slowdown you're experiencing may come from 3 different sources:
It's possible for us to add tracing for all of these operations, with caveats on the 3rd item. While dune rules are evaluated per directory, much computation is shared between the rules. In other words, while it may seem as if loading rules for a dir … Additionally, it may be easier to diagnose your problem with existing profiling tools such as memtrace. @aalekseyev will follow up on what precisely he'd like to see to diagnose this. @aalekseyev, shall we still add tracing for 1. and 2.? It seems like something that we can time reliably.
If the sum of the 3 points listed is basically how long it takes to run … Regarding 3, maybe it is possible to see the duration per dune rule rather than per directory? And maybe have a visual representation of dependencies between rules.
I don't want to rush to adding tracing that may be confusing and may not really solve the problem. @Khady, so we don't have great user-facing ways to profile dune internal computations right now. I think the best way to diagnose a Dune internal performance issue is to use general OCaml profiling tools, one of … I know we have successfully profiled dune with spacetime in the past by following this guide: https://blog.janestreet.com/a-brief-trip-through-spacetime/ (may be slightly outdated). If the problem is in (1) and (2), then I expect either of those methods will clearly point the finger at the relevant phase, but I doubt that's the case. For (3) the result may be tricky to interpret, but I don't know how to make it easier with tracing. If you end up using any of those methods and need help interpreting the findings, I'm happy to try and help.
So I tried to use perf and memtrace to get more information about this problem, without too much success so far. First, I haven't been able to compile dune with memtrace in it. I think I am missing something in the bootstrap process. Here is the diff and the compilation error I get. That's based on the 2.7.1 tag.

diff --git a/bin/dune b/bin/dune
index 0bccec73f..10a7f0ea3 100644
--- a/bin/dune
+++ b/bin/dune
@@ -1,7 +1,8 @@
(executable
(name main)
- (libraries memo dune_lang fiber stdune unix cache_daemon cache dune_rules
- dune_engine dune_util cmdliner threads.posix build_info dune_csexp)
+ (libraries memtrace memo dune_lang fiber stdune unix cache_daemon cache
+ dune_rules dune_engine dune_util cmdliner threads.posix build_info
+ dune_csexp)
(bootstrap_info bootstrap-info))
(rule
diff --git a/bin/main.ml b/bin/main.ml
index 68159d2f1..6e30720a9 100644
--- a/bin/main.ml
+++ b/bin/main.ml
@@ -257,6 +257,7 @@ let default =
] )
let () =
+ Memtrace.trace_if_requested ();
Colors.setup_err_formatter_colors ();
try
match Term.eval_choice default all ~catch:false with
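
If I understand memtrace correctly, tracing only kicks in when the MEMTRACE environment variable names an output file, so a patch like this should be a no-op for normal builds. Conceptually it works something like the sketch below (this is not memtrace's real code, and the function name is made up):

(* Conceptual sketch only, not memtrace's implementation: the patched
   binary does nothing extra unless MEMTRACE is set. *)
let trace_if_requested_sketch () =
  match Sys.getenv_opt "MEMTRACE" with
  | None -> ()  (* tracing not requested: normal run *)
  | Some file ->
    (* the real library installs allocation-sampling hooks and streams
       the samples to [file] in CTF format *)
    Printf.eprintf "memtrace: tracing allocations to %s\n%!" file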
Regarding perf, I went with this command to …
I don't think the problem is with the parsing or interpretation of the rules given that …
I don't understand the dune bootstrap process well enough to know how to fix that, but you can try bootstrapping without memtrace and then building with memtrace:
That just worked for me (at least I was able to build dune). Looking at perf results, it seems clear that … Both of these scale with the number of dependency edges between rules (the …)
@rgrinberg, I don't know if your "deforestation" efforts are related to this issue, but it seems relevant.
Perhaps there are also some really large files where calculating the digest is taking some time.
No relation. Although I suppose it could help a tiny bit.
I think in that case Digest.file would probably show up in the stack trace and not Digest.string. I expect the individual files are not a problem because they are cached, whereas the rest of …
By the way, in jenga we have an optimization where you can "group" dependencies by including a digest of a Dep.Trace.t into another Dep.Trace.t (we call the equivalent type in jenga Proxy_map, for those familiar with the code). We will probably need to have the same optimization in Dune sooner or later.
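
To make the idea concrete, here is a minimal sketch with made-up names (Dep_trace is neither jenga's nor dune's actual type): a group of dependencies is summarized by a single digest, so the parent trace stays small no matter how large the group is.

(* Sketch of "grouped" dependencies: instead of inlining every
   (dependency, digest) pair into the parent trace, embed one digest
   that summarizes the whole group. *)
module Dep_trace = struct
  type t = (string * Digest.t) list  (* (dependency name, content digest) *)

  (* Summarize a whole trace as a single digest. *)
  let digest (t : t) : Digest.t =
    t
    |> List.map (fun (name, d) -> name ^ ":" ^ Digest.to_hex d)
    |> String.concat ";"
    |> Digest.string

  (* Embed [sub] into [parent] as one entry, keeping the parent's size
     independent of how many dependencies [sub] contains. *)
  let group ~name (sub : t) (parent : t) : t =
    (name, digest sub) :: parent
end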
That's true. This is also supported by …
I'm not familiar with the jenga code base, but I'm trying to think of ways this may occur in dune. Something like …
Definitely.
I wouldn't worry about this. We've never seen this be a problem in practice. The examples you give are probably not really examples of this, though, because globs need to be tracked in full regardless. The examples go along the lines of: …
Anyway, we should probably not derail this thread too much.
The commit tries to solve a performance issue, but I don't think that it provides what was asked in this issue. And given the trouble I have to go through to guess what is taking time in our builds, I still believe that having a trace of what happens in dune would be valuable. Could this issue be re-opened?
Indeed, let's keep it open.
So I managed to use memtrace thanks to @aalekseyev's instructions. I don't know enough about dune internals to be able to understand what is going on exactly. Then I ran with perf again, but using dwarf instead of lbr.
and the full report: perf-report.txt. I could share the perf trace, but it is over 300 MB, so I created a smaller one with a frequency of 120. Findings are about the same.
Thanks for the data points @Khady. Some good news and bad news. The bad news is that I think some of the slowdown in your case is essential: you likely have a non-trivial number of modules, libraries, and packages, and a clean build has to do a minimum amount of work to process all of this. Now for some good news: there seem to be 3 pieces of low-hanging fruit that we can fix:
We could fix all of these things, but I doubt this will do more than shave off a couple more seconds. The long-term solution is just to reuse the memoized rules between the builds themselves.
I gathered some numbers to give some context. The project consists of 1200 ml(i) files (about 90k lines total), 500 atd files, and 150 custom files. Each of those 150 custom files is turned into 1 atd + 2 ml files. Those files are divided into 15 libs. In the end it represents 3600 ml(i) files to compile. The project depends (directly) on about 40 external (opam) libs. To generate …

While I understand that it is some work to check all that, I feel that there is something wrong. It shouldn't take 6s to run 10k stat calls on files. The project is not that big; it's definitely smaller than some duniverse out there. And I don't know what the code layout at janestreet is, but if they have 10 million lines of code, I wonder how dune is doing there.

I have the impression that dune is using only one thread to check the rules. That's a real limitation. I can get a server with 8/40/256 cores, but I'll still face the same issue. The computation of the rules isn't super fast, but it isn't the main blocker either.

It would be nice if dune could provide some stats about a project. For now I have to explore the file system myself and do some guessing. Ideally: the number of human-written rules in the project (as in rules/stanzas written by a human), the total number of rules (17k in this case?), the number of files in the project, the number of libs, the time taken to compile each lib; I'd even like the time taken for each human-written stanza. Maybe I should open another issue for this feature request?
This is not where the time is being spent. Before dune can check that your 17k rules are up to date, dune must first generate these rules. Generating these rules requires quite a bit of computation.
This is precisely one of the reasons why dune in its current form is not usable at janestreet. Generating the rules from scratch on every run does not scale. This is why we've added the rule memoization framework. We have yet to utilize it to save computation between runs, but it should address this issue.
That is definitely a limitation. Hopefully once multicore is available, we can experiment with making things faster.
It seems that roughly a third of the allocation samples in the memtrace come from Merlin rules, which is very impressive. I couldn't get perf to show me backtraces, though (in fact I believe the data file doesn't have symbols at all).
@Khady, you showed that the rules computation is fast, but this result can be misleading because rule computation is lazy: it will only compute the portion of the rules that's needed to answer a given query. For example, it won't compute the command lines of most commands, and it probably won't compute the merlin config files, which is what takes a large chunk of the time.
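
To illustrate with a minimal sketch (the types and functions below are hypothetical, not dune's actual Memo API): a lazy, memoized rule loader looks instant until something forces it.

(* Hypothetical sketch: per-directory rules are generated lazily and
   memoized, so "computing the rules" costs almost nothing until a
   query actually forces a directory. *)
type rule = { targets : string list; action : unit -> unit }

let rules_per_dir : (string, rule list Lazy.t) Hashtbl.t = Hashtbl.create 64

(* Registering a directory is cheap: the expensive generation is delayed. *)
let register_dir dir ~(gen : unit -> rule list) =
  Hashtbl.replace rules_per_dir dir (lazy (gen ()))

(* Only a query that needs [dir] pays for generating its rules, and only
   once; later queries reuse the memoized result. *)
let load_rules dir =
  match Hashtbl.find_opt rules_per_dir dir with
  | Some rules -> Lazy.force rules
  | None -> []

Timing only the registration step would report a fast "rules computation" even though most of the work (command lines, merlin config, ...) is still pending in the unforced thunks.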
Would a draft PR on this topic be welcome? I've done a lot of "asynchronous tracing" work recently, and have been building out some tooling for that (with @c-cube); I think it could be well-applied in Dune to help @Khady/@jchavarri narrow down some of these issues on our codebase. (=
Sure, drafts are fine. I don't know what you have in mind, but it's worth noting that we are unlikely to add new dependencies just for this.
Desired Behavior
When building a target with the --trace-file option, dune currently logs the time taken when running an external process (nproc, ocamldep, ocamlopt, ...), but it doesn't contain any detail about what's happening inside dune itself. For example, it doesn't tell us how long it takes to scan all the dependencies in a project (in a dir? unsure what the best granularity would be), or how long it needs to compute the rules. We end up with dune build ./some_target.bc --trace-file=trace.json calls doing nothing but taking close to 8 seconds, without any detail of what is causing such a delay, or with a long delay at the start and in the middle of a build without knowing what is going on. Also, it would be great to know how many cores are assigned to each task.
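
Just to illustrate what I'd hope to see: internal phases could be reported with the same Chrome trace-event ("Catapult") JSON entries that trace.json already contains for external processes. The sketch below is only an assumption of how such spans could be recorded; the span helper and the output channel are made up, not dune's API (and it needs the unix library for Unix.gettimeofday).

(* Sketch only: wrap an internal computation in a Chrome trace-event
   "complete" span ("ph": "X", timestamps and durations in microseconds)
   and append it to the channel backing --trace-file. *)
let timestamp_us () = int_of_float (Unix.gettimeofday () *. 1e6)

let span oc ~name f =
  let start = timestamp_us () in
  let result = f () in
  let dur = timestamp_us () - start in
  Printf.fprintf oc
    "{\"name\": %S, \"ph\": \"X\", \"ts\": %d, \"dur\": %d, \"pid\": 0, \"tid\": 0},\n"
    name start dur;
  result

(* Usage idea, with hypothetical names:
   let rules = span oc ~name:"load-rules:src/lib" (fun () -> load_rules dir) *)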