Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Holmake: Exception: Subscript when building HOL with -j12 or greater (but not -j11 or lower) #1158

Closed
someplaceguy opened this issue Oct 25, 2023 · 5 comments

Comments

@someplaceguy
Copy link
Contributor

someplaceguy commented Oct 25, 2023

When building HOL from the master branch (commit 4b009e0), it's failing for me with this error:

hol4> Building directory src/AI/proof_search [25 Oct, 00:24:10]
hol4> Uploading files to /nix/store/40l0ipqfspmyl9340cvys7r1yv5ajcg3-hol4-20231025/src/sigobj
hol4> Building directory src/AI/machine_learning [25 Oct, 00:24:10]
hol4> Uploading files to /nix/store/40l0ipqfspmyl9340cvys7r1yv5ajcg3-hol4-20231025/src/sigobj
hol4> Building directory src/tactictoe/src [25 Oct, 00:24:10]
hol4> Uploading files to /nix/store/40l0ipqfspmyl9340cvys7r1yv5ajcg3-hol4-20231025/src/sigobj
hol4> Building directory src/tactictoe/selftest [25 Oct, 00:24:10]
hol4> Uploading files to /nix/store/40l0ipqfspmyl9340cvys7r1yv5ajcg3-hol4-20231025/src/sigobj
hol4> Building directory src/holyhammer [25 Oct, 00:24:10]
hol4> Uploading files to /nix/store/40l0ipqfspmyl9340cvys7r1yv5ajcg3-hol4-20231025/src/sigobj
hol4> Building directory src/parallel_builds/core [25 Oct, 00:24:10]
hol4> Scanned 31 directories/update0mKte/automation
hol4> Finished $(HOLDIR)/src/HolQbf                                         (0.000s)
hol4> Holmake: Exception: Subscript.   | fcp  | ..   | ..   | ..   | ..   | ..   | [0K
hol4> *** FATAL: Build failed in directory /nix/store/40l0ipqfspmyl9340cvys7r1yv5ajcg3-hol4-20231025/src/src/parallel_builds/core (exited with code 1)
error: build of '/nix/store/hm99v7kijccs09l14kafz266y5amhaj8-hol4-20231025.drv' on 'ssh-ng://nixdistributed@homeserv' failed: builder for '/nix/store/hm99v7kijccs09l14kafz266y5amhaj8-hol4-20231025.drv' failed with exit code 1;

I noticed that this always happened when I built HOL with the command bin/build --expk -j32, but if I built HOL with bin/build --expk, then it succeeded. I then also noticed that it succeeded with -j4.

So after a few tries, I determined that it always seems to fail with that error when using -j12 or greater, and it always seems to succeed with -j11 or lower.

This appears to be 100% reproducible so far.

@mn200
Copy link
Member

mn200 commented Oct 25, 2023

Fascinating. Does it reproduce with --stdknl instead of --expk? (I'll be super-surprised if it doesn't.)

I guess there's an error still lurking in the way Holmake prints its little display of running processes to the screen (see an earlier one at b51b4f1). If you run with a smaller -j number but a narrower terminal screen can you also reproduce?

@mn200
Copy link
Member

mn200 commented Oct 25, 2023

I can confirm that it reproduces on --stdknl. It will also fail faster if you build with -t because the degree of parallelisation possible in tools/Holmake/tests/parallel_tests is enough to prompt the problem.

@mn200
Copy link
Member

mn200 commented Oct 25, 2023

The problem is definitely with the display: if you run

build -t --stdknl -j12 | tee logit

the pretty displaying doesn't happen because it doesn't think it's writing to a terminal and the build goes fine.

@someplaceguy
Copy link
Contributor Author

someplaceguy commented Oct 25, 2023

I tried to build HOL with a terminal 1/3rd as wide (with -j12) and it also failed, but I'm not sure if that's actually doing anything because there are many layers of abstraction in-between the build and the actual terminal.

To be more precise, I'm running the build inside of a local tmux session, and furthermore the build is going through a local daemon (nix-daemon), which connects over ssh to another nix-daemon on a remote machine, which actually does the build inside a sandbox and captures the output log of the build (and in this specific case, it also sends it in real-time to my terminal).

So I heavily suspect that my actual terminal size has no effect on the build behavior and I would be slightly disappointed if it did (since it would interfere with all the efforts that NixOS has made to make builds as reproducible as possible, even though it still has countless limitations).

EDIT: sorry, I misread your question. I will try reproducing with a narrower terminal and with -j11.

EDIT 2: I can now confirm that in my setup, it succeeds with a terminal 1/3rd as wide and with -j11. In other words, the terminal width doesn't seem to have a difference in my environment.

But as I mentioned before, that's probably to be expected given my setup. So the bug might still be related to the terminal width, it's just that it seems that I have no easy way of changing it due to how the build is being done.

@mn200
Copy link
Member

mn200 commented Oct 25, 2023

Holmake explicitly makes shell calls to stty size and tput cols to try to figure out its "terminal" width (after first determining if the output stream is a tty at all). [I appreciate that this is gross.]

If you can resize your terminal so as to affect the numbers it gets out of those calls (it tries stty first and then falls back to tput), you might reproduce with a smaller -j. Of course, the error may well be quite dependent on things like width % j too, so simply making it narrower may mask the bug we're seeing at 80 columns.

It's still clearly a bug, and I think/hope it should be easy to fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants