Ocaml fails to build when too many cores are used #10235
Comments
I'm not quickly seeing it, although the machine is still looping. Could you give some more environment details - platform, etc.? |
Also exactly which version of |
Sorry, here is some more information:
Is there anything else I can provide? |
It's interesting that the two errors seem filesystem-related. Which filesystem are you using? (I am not saying it must be a filesystem error, I can't think of a scenario where make would see files that are not visible to its own subprocesses, but I have never thought about this at all. The most likely cause for the error is incorrect dependencies in the Make build system.) |
I'm currently using https://github.com/facebookexperimental/eden |
Let me correct my last message:
I'll try to replicate in full |
(Today I learned that Facebook uses a non-decentralized fork of Mercurial internally, whose monorepo-optimized server component is called Mononoke.) |
FYI I can't repro the failure after moving the OPAMROOT to a |
With the default So the most likely culprit, I think, is the |
I'll try to ping them on this. Not sure they will care since ocaml is not "an officially supported language" at FB, but it is worth a try |
I'm no make expert; the two sort of parallel-make failures I have observed with the OCaml build are as follows:
In your example failure log, the two relevant actions seem to be the following:
# 399
./boot/ocamlrun ./boot/ocamlc [...] -c asmcomp/printlinear.mli
# 410
./boot/ocamlrun ./boot/ocamlc [...] -c asmcomp/printlinear.ml
# Error
File "[...]/.opam-switch/build/ocaml-variants.4.11.1+fp/asmcomp/printlinear.ml", line 1:
Error: Could not find the .cmi file for interface
[...]/.opam-switch/build/ocaml-variants.4.11.1+fp/asmcomp/printlinear.mli
There is a dependency from |
I asked the EdenFS developers whether that seems plausible; I will forward their answer (if possible :D) |
If this is a filesystem issue, you may be able to reproduce it without OCaml:
|
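A minimal sketch of such an OCaml-free check, written for illustration and not the script from the comment above. It assumes the suspected failure mode is "a file written by one process is not immediately visible to a freshly spawned process". The function name `check_visibility`, the file names, and the worker and round counts are all hypothetical.

```shell
#!/bin/sh
# Hypothetical stress test (an assumption, not the original reproduction).
# Each worker writes a file, then a freshly spawned process checks that the
# file exists and is non-empty. Prints the number of failed checks.
check_visibility() {
    dir=$1; workers=$2; rounds=$3
    mkdir -p "$dir" || return 2
    : > "$dir/failures"
    w=0
    while [ "$w" -lt "$workers" ]; do
        (
            r=0
            while [ "$r" -lt "$rounds" ]; do
                f="$dir/w$w-r$r.out"
                echo data > "$f"
                # A separate process must see the file right after the write.
                sh -c 'test -s "$1"' sh "$f" || echo "$f" >> "$dir/failures"
                r=$((r + 1))
            done
        ) &
        w=$((w + 1))
    done
    wait
    wc -l < "$dir/failures" | tr -d ' '
}
```

Running something like `check_visibility /some/dir/on/the/suspect/filesystem 80 200` and comparing the count with a run on a local disk could help tell a filesystem problem from a Makefile dependency problem. A count of 0 does not prove the filesystem is correct; it only means this particular access pattern did not trigger a failure.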
For what it's worth: OCaml's CI includes a parallel build test, which does |
I didn't manage to replicate the issue this way. I will try a few other things, but I think we are now fairly convinced the issue is filesystem-related. I've contacted the EdenFS devs and filed a report there. Let's close this one :) Thanks for the help in investigating the issue. |
Enable parallel building for ocaml-4.08 and above.

tested as:

    $ nix build -f. ocaml-ng.ocamlPackages_{4_{00_1,01_0,02,03,04,05,06,07,08,09,10,11,12,13},latest}.ocaml --keep-going

ocaml build system supports parallel building, but not for multiple top-level targets at the same time, as it usually spawns subprocess $(MAKE) invocations that occasionally conflict with one another. To work around it we use a tiny Makefile with a single rule that calls the top-level targets sequentially as makefile calls:

    nixpkgs_world_bootstrap_world_opt:
        $(MAKE) world
        $(MAKE) bootstrap
        $(MAKE) world.opt

On a 16-core machine ocaml-4.12 build speeds up from 6m55s to 1m35s.

Releases 4_00_1, 4_01_0, 4_04 and 4_05 still have some race in them. Thus this change enables parallel builds only for ocaml-4.06 and above.

Adapted from NixOS#142723

Upstream's CI tests the parallel makefile: ocaml/ocaml#10235 (comment)

The limit was chosen to be 4.08 because it was released in 2019, not too long before the above link.
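The wrapper idea from this commit message can be tried in isolation. The sketch below is an illustration under stated assumptions, not the nixpkgs implementation: dummy targets that only log their names stand in for the real OCaml build, and the names `demo_sequential_targets`, `wrapper.mk`, `seq_all`, and `build.log` are hypothetical. It assumes a `make` binary is available.

```shell
#!/bin/sh
# Hypothetical demo: dummy targets replace the real OCaml top-level targets.
# Prints the order in which the targets ran.
demo_sequential_targets() {
    dir=$1
    mkdir -p "$dir" || return 2
    rm -f "$dir/build.log"
    # Stand-in for the project's Makefile: each target appends its name to a log.
    printf 'world bootstrap world.opt:\n\t@echo $@ >> build.log\n' > "$dir/Makefile"
    # Wrapper with a single rule: recipe lines run one after another, so the
    # three sub-makes never overlap, while each sub-make may still use -j.
    printf 'seq_all:\n\t$(MAKE) world\n\t$(MAKE) bootstrap\n\t$(MAKE) world.opt\n' > "$dir/wrapper.mk"
    make -C "$dir" -f wrapper.mk -j4 seq_all > /dev/null || return 1
    cat "$dir/build.log"
}
```

The design relies on two properties of make: the lines of one recipe run in order, and the `-f wrapper.mk` option is not passed down to sub-makes, so each `$(MAKE)` call reads the directory's regular `Makefile`. By contrast, `make -j4 world bootstrap world.opt` would let the three targets start concurrently.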
Hi!
I'm seeing a lot of OCaml build failures on machines with many cores (24 and 80 respectively). Here is a repro that fails most of the time, but not always, and not always at the same step.
The two main errors I'm witnessing are:
Error: Could not find the .cmi file for interface
but the .cmi file in question is there, and
Error: Unbound module
but the module is there.
The errors are really random.
You can find more context in the original report (I first thought it was opam related) at ocaml/opam#4552
From my last failure, you can find an example .env file at https://pastebin.com/KtKmfpSP and the .out file at https://pastebin.com/5YHy5Rdp