Cannot build it with lots of cores. #438
BTW, I did spend hours reading through the other issues and pull requests to see if there were any clues. binfmt_misc was mentioned often, which might have been an issue since OpenSim is written in C#, which means I have Mono installed, which adds cli in binfmt_misc. This was always mentioned -
My system has the binfmt_misc kernel module running, but there's no /proc/sys. My kernel is 5.16.0-0.bpo.4-amd64 #1 SMP PREEMPT Debian 5.16.12-1~bpo11+1. I used "update-binfmts --disable" to disable all the binfmt entries, no difference. |
System Five Hundred is an abomination and therefore not a requirement. The issue you're likely encountering is the build system imposes quotas in order to prevent something like a fork() bomb in the Python unit tests from nuking your system.
cosmopolitan/tool/build/compile.c Lines 745 to 749 in 206f073
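For readers unfamiliar with this kind of quota, here is a minimal sketch of how a build tool could cap process creation on Linux. It assumes the quota is enforced through `setrlimit(RLIMIT_NPROC)`; the helper name `SetProcessQuota` is hypothetical and this is not the code from `compile.c`, which is not reproduced in this thread.

```c
#include <sys/resource.h>

// Hypothetical helper (not taken from compile.c): lowers the soft
// RLIMIT_NPROC of the calling process, which its children inherit.
// Returns 0 on success and -1 on failure, like setrlimit().
int SetProcessQuota(rlim_t max_procs) {
  struct rlimit rl;
  if (getrlimit(RLIMIT_NPROC, &rl) == -1) return -1;
  // An unprivileged process cannot raise its soft limit above the hard limit.
  if (max_procs > rl.rlim_max) max_procs = rl.rlim_max;
  rl.rlim_cur = max_procs;
  return setrlimit(RLIMIT_NPROC, &rl);
}
```

On Linux this limit is checked against every process and thread owned by the same real user ID, not only the ones the build started. A call such as `SetProcessQuota(1024)` can therefore make `fork()` or `vfork()` fail with `EAGAIN` ("Resource temporarily unavailable") on a desktop session that is already busy.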
The unfortunate thing about Could you please try something for me?
diff --git a/build/definitions.mk b/build/definitions.mk
index 6274df0b3..5965d88aa 100644
--- a/build/definitions.mk
+++ b/build/definitions.mk
@@ -63,7 +63,7 @@ PKG = build/bootstrap/package.com
MKDEPS = build/bootstrap/mkdeps.com
ZIPOBJ = build/bootstrap/zipobj.com
MKDIR = build/bootstrap/mkdir.com -p
-COMPILE = build/bootstrap/compile.com -V9 $(QUOTA)
+COMPILE = build/bootstrap/compile.com -V9 -P2048 $(QUOTA)
COMMA := ,
PWD := $(shell build/bootstrap/pwd.com)
If that solves the issue for you, then I'll commit it. |
I'm not choosing -j128, you can see in my original message that I'm doing -j1, it's your build system that's saying "make MODE= -j128 o//depend". I had tried other -j numbers, including the -j8 your instructions say to use, same problem. From what I saw in the source code, it's counting my CPUs to get that 128. I'm just about to start cooking dinner, I'll try that patch after I've eaten. |
It's possible this is related to the |
It might also help if you post your |
With your patch on the release tarball version, didn't make any difference.
The number of times I see the ".... cc1': vfork: Resource temporarily unavailable" message is always the number I put after -j. So -j1 shows the error once, -j2 shows it twice. I'm guessing it's using up all cores THEN trying to use as many as specified in the -j. With no -jX I see the error once. Makes me wish I could do -j0, but "make: the '-j' option requires a positive integer argument". DOH! If I could compile it from source without using your pre-built tools, I could change the threads = GetCpuCount(); line in mkdeps.c to halve the result. That would be a quick test of your new theory. |
Thought I could be clever with that first error in the output -
Nope, still tries to do -j128. |
Soooo, maybe change mkdep to "the user said -j8 for a reason, let's not ignore that, and just do what they said". |
I'm reminded of the MONO build tool. You can tell it to use more cores, but it takes longer coz it pauses a long time while it does .. something ... with 128 cores. I'm guessing one core at a time, since the more cores you tell it to use, the longer that initial pause is. lol |
Quick update. This issue may be blocked on #430 over the next few days, due to a lack of configurability in the
Here's how we can test the theory. I've made the following change locally:
diff --git a/tool/build/mkdeps.c b/tool/build/mkdeps.c
index eefb4dfa3..df798b04a 100644
--- a/tool/build/mkdeps.c
+++ b/tool/build/mkdeps.c
@@ -442,7 +442,7 @@ int main(int argc, char *argv[]) {
char path[PATH_MAX];
if (argc == 2 && !strcmp(argv[1], "-n")) exit(0);
GetOpts(argc, argv);
- threads = GetCpuCount();
+ threads = 1;
tls = calloc(threads, sizeof(*tls));
stack = calloc(threads, sizeof(*stack));
bouts = calloc(threads, sizeof(*bouts));
There's a prebuilt binary available with the above change at https://justine.lol/cosmopolitan/mkdeps-zero-threads.com which you can try. This will only spawn a single thread. Let me know if that fixes things for you. |
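The test build above pins the worker count to one. A possible middle ground, sketched here purely as an illustration, would be to keep counting CPUs but honor an optional user-supplied cap. The names `ChooseThreadCount` and `DefaultThreadCount` are hypothetical and not part of `mkdeps.c`, and `sysconf(_SC_NPROCESSORS_ONLN)` is used only as a stand-in for `GetCpuCount()`.

```c
#include <unistd.h>

// Hypothetical helper, not part of mkdeps.c: picks a worker count that
// never exceeds an optional cap. A cap of 0 or less means "no cap given".
long ChooseThreadCount(long cpus, long cap) {
  if (cpus < 1) cpus = 1;  // sysconf() can return -1 on failure
  if (cap > 0 && cap < cpus) return cap;
  return cpus;
}

// Stand-in for GetCpuCount(): asks the OS how many CPUs are online.
long DefaultThreadCount(long cap) {
  return ChooseThreadCount(sysconf(_SC_NPROCESSORS_ONLN), cap);
}
```

With this shape, `ChooseThreadCount(128, 8)` yields 8 while `ChooseThreadCount(128, 0)` keeps all 128, so a 64-core machine with SMT would only fan out fully when no cap is requested.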
For the record the edit: I also always see |
@onefang you should be able to build corrected binaries if you do it as another user on the system that doesn't have more than 1024 processes running (given the 128 you would probably want it to be below 800 just in case). Alternately, if you can get the number of processes running as your user well under 1024 (e.g. by closing firefox as I did) then you can build binaries with the fix for the quota from #430. On a related note: it would be nice if there was an alternate way to bootstrap the |
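To make the failure mode described above easier to spot, a tool could attach a hint to the error when spawning fails. The following is only a sketch under the assumption that the `vfork` failure comes from the per-user process limit; `DescribeSpawnFailure` is a hypothetical name and does not exist in the cosmopolitan sources.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

// Hypothetical diagnostic: formats a hint for a failed fork()/vfork().
// `err` is the errno value from the failed call. Returns what snprintf()
// returns, i.e. the length of the formatted message.
int DescribeSpawnFailure(int err, char *buf, size_t size) {
  struct rlimit rl;
  if (err == EAGAIN && getrlimit(RLIMIT_NPROC, &rl) == 0 &&
      rl.rlim_cur != RLIM_INFINITY) {
    // A finite per-user process limit is one common cause of EAGAIN here.
    return snprintf(buf, size,
                    "process limit hit: at most %llu processes for this user",
                    (unsigned long long)rl.rlim_cur);
  }
  return snprintf(buf, size, "spawn failed: %s", strerror(err));
}
```

`EAGAIN` from `fork()` can have other causes as well, such as system-wide thread or PID limits, so the message is a hint rather than a diagnosis.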
The recommended approach would be to spin up a VM and build them there, or as a different user as you suggested. The build bootstrap folder is intended only for commands that have a "chicken and egg problem" because the tools are required to compile the tools themselves. The only tool where that isn't the case is |
I've just doubled the default process quota to 2048 and rebuilt compile.com. I'm hoping that should be sufficient for your threadrippers. I'd bump it up to 4096 except people build this repo on low power laptops too, so I'm trying to walk a fine line here. The good news, is that thanks to the bug fix in #430 your build process quota is now configurable using the |
I suspect that there is the remaining issue of QUOTA not inheriting via The affected files are:
Did you try locally editing |
Yes, putting |
I did a git pull. Then make -j8 -O which took 1 minute 1 second. Then make -j128 -O, 26 seconds. Then make -O, which 6 minutes 43 seconds later spit out the "`make MODE= -j128 o//third_party/python/Lib/test/test_os.py.runs` exited with 1" message after three errors from 'File "/zip/.python/test/test_os.py", line 1140, in setUp' and one from 'File "/zip/.python/test/test_os.py", line 2870, in test_path_t_converter'. I started from fresh copies of the git repo each time. Looks like it worked. Thousands of lines of output, but no red error messages. I'll read through the test output to double check none failed, and report back later. I was very amused by -
Damn, I only have 256.0G of memory. What did you say about low power laptops? 256.0G is all this motherboard will hold, and I'm on a pension, couldn't afford that much RAM even if I could find a box to put it all in. lol Thanks for helping out. Now I get to play with it, see if I can bend it to my will. Muahahaha! BTW, I'm very interested in getting LuaJIT working with this, so I'm keeping an eye on the effort to do that. |
@ahgamut #272 was already open in a tab and read before I opened this ticket, that's what I meant by "keeping an eye on the effort". So now that I can actually compile cosmo and redbean myself, I can try the things mentioned in #272. B-) I already build LuaJIT into an existing project of mine that I'm looking at using redbean for. Currently it uses FCGI to work with a variety of web servers. Thanks for your efforts to get that working. |
So, reviewing the output of make -j128 -O. I hope I provided enough context. TL,DR: no errors, but lots of skips that are questionable, including lots of "not enough memory", some expecting me to have billions of GB of RAM, some thinking 256 GB isn't enough for a test that needs 32GB or less. Some saying I don't have POSIX or Linux. This is a Linux desktop, it should be POSIX compliant. Near the end of the build section, it's only a warning, so probably not important.
The bit that amused me at the very end of the build section -
A couple of "... skipped 'TODO: find out why this fails'" which I'll ignore. "this test sucks" lol
These surprised me, a bunch of POSIX things skipped? I didn't dig deeper, but some of those look like things an ordinary Debian Linux should have, just off the top of my head. The skipped coz not 32 bit things are no surprise, this is a 64 bit Ryzen Threadripper.
Some more surprises. Skipped coz my Linux system isn't Linux?
A bunch of "skipped 'requires APE debug build'" which is fine, this isn't a debug build. Happy to do that if you want. Are these expected?
For these I'm not familiar with the things it says it needs but don't have.
This one has a few skipped coz not Windoze, so fair enough, but the rest I should have with ext4 file system.
OK, this memory test is even more confused than the others. lol
Includes "only Windos" and "only Mac OS X" and "only Android". Wait, this runs on Android? Thought you didn't have ARM support? I'd love to run this on Android.
This worried me, I'm in Australia/Brisbane. My system includes Australia/Lord_Howe, but from memory you have your own timezone stuff included in Cosmo's zip file.
More CPU and memory confusion.
One of those looks like a Windoze only test, and I do indeed lack IPv6 on my desktop.
A bunch of 32 bit build skips, and yet more memory skips that shouldn't.
Some requires build with -DEXTRA_FUNCTIONALITY which I guess the default build doesn't have.
The real test - o/tool/net/redbean-demo.com works and all the pages are OK. B-) |
There were several Python unit tests we disabled which required 4GB of RAM which seemed largely intended to test the transition away from the 32-bit era. Once I confirmed they work, I disabled them, because doing things like creating a bunch of 4gb files with gzip and bz2 every time I edit a header in
I love POSIX. The support of Austin Group in helping us change the shell requirements w.r.t. binary is one of the things that made this project possible. We're working on getting there when it comes to functions like
I assume you're running
No IPv6 yet. Even though IPv6 is something I personally disagree with (note: I am actually a Hurricane Electric certified IPv6 "Sage" funnily enough) I believe IPv6 is something we should support eventually. One of the soft blockers is that it gets hairy on Windows, where, for example, poll() cannot be used in a way that mixes sockets from different address families.
That looks wrong. But could be a red herring, since as I mentioned, we don't want a bunch of tests with four gigabytes of disk i/o churn running every time we run
We've got someone looking into that on the Discord channel. https://discord.gg/mvkhxRaW
That probably needs to be fixed. Command line should have POSIX semantics. But it depends on what they mean by that. For example, we currently have
The timezone database is 3.3mb in size. We only embed a subset of it into each binary that calls
Yeah Please keep in mind that while this whole repository builds/tests on your machine in 26 seconds, if you were to run the official Python unit test suite, it would likely take 15 minutes on your machine. The Python devs haven't figured out how to run their own builds/tests in parallel yet. All we had to do to turn 15 minutes into 26 seconds was disable a small subset of sloppy tests that a huge project like Python accumulated from third party contributors over the years.
Woot. |
Messages in syslog during the build / test is worrying though, or are they to be expected?
I've been a professional computer programmer and sysadmin since the late '70s, I have used 100 programming languages in my career (used to call myself Digital Polyglot), so I think I have earned the right to be a language bigot. I hate any programming language whose name begins with the letter P. Python is one of those, so I don't care about Python support, so won't be running any 15 minute Python test suite. |
In order to test that our security features work, we need to violate them. That's why you're seeing things like seccomp warnings in your log. We're simply trying to make sure our sandboxing will work for you as advertised. The same applies to things like trapping division errors. |
How do you feel about X3J11? One of the things on my bucket list is to convert //third_party/chibicc to an ILP64 data model and adding K&R C support. I'm reasonably certain that there's no reason we shouldn't use classic C syntax (which felt much more like a scripting language without the formalities that exist today) when pointers, ints, and longs all have the same width. Modern CPUs make 64-bit types fast. Depending on the quality of the LP64 code that's written, perhaps even faster. If you share my passion on this topic, then it might be something worth working on. |
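For anyone following along who has not met these abbreviations: a data model names which C types are 64 bits wide. The sketch below classifies a model from the sizes of `int`, `long`, and pointers. The function names are hypothetical and only meant to illustrate the terminology.

```c
#include <stddef.h>

// Hypothetical helper: names the C data model from type sizes in bytes.
const char *DataModelName(size_t int_size, size_t long_size, size_t ptr_size) {
  if (int_size == 8 && long_size == 8 && ptr_size == 8) return "ILP64";
  if (int_size == 4 && long_size == 8 && ptr_size == 8) return "LP64";
  if (int_size == 4 && long_size == 4 && ptr_size == 8) return "LLP64";
  if (int_size == 4 && long_size == 4 && ptr_size == 4) return "ILP32";
  return "other";
}

// Reports the model of whatever compiler and target build this file.
const char *ThisDataModel(void) {
  return DataModelName(sizeof(int), sizeof(long), sizeof(void *));
}
```

Under ILP64 all three types share one width, which is the property the K&R argument above relies on: implicit `int` declarations can hold a pointer or a `long` without truncation.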
Also I forgot to mention, the most compelling use case for K&R C is it polyglots with JavaScript https://justine.lol/sectorlisp2/#evaluation |
X3J11? Standards are good, makes things like cross platform support easy. I have no particular passions in that area. I do have passions in reducing bloat and increasing speed. Which is why my favourite languages are C and assembler, and why I prefer LuaJIT over plain Lua. LuaJIT tends to beat all other scripting languages in benchmarks. For web stuff I'm more a "use the old standards, with minimal JavaScript only when needed" kinda guy. But wasm has my interest, I've not actually played with it yet though. |
I agree. Standards are nice. It's also nice to restore the original visions of clarity that necessitated standardization, now that we no longer need to attract the consensus of companies like UNIVAC. We recently ported luajit thanks to @ahgamut and we'll likely consider merging it in here so it can be used by redbean. LuaJIT is something that's frequently requested. |
Cool, definitely want to see LuaJIT officially in redbean. I'll help test that. I'm in weekend mode now though, I should get some non coding stuff done. lol |
I've bumped up the default limit to 4096 processes. Enjoy! |
Cool, definitely want to see LuaJIT officially in redbean. I'll help test that. I'm in weekend mode now though, I should get some non coding stuff done. lol Edit: Oops, forgot to actually post this. lol |
Sooo, just git pulled and 'make -j128 -O' got errors -
error:tool/build/lib/elfwriter.c:168: check failed: -1 != -1 errno= 2
make: *** [build/rules.mk:77: o//third_party/python/Lib/venv/scripts/nt/Activate.ps1.zip.o] Error 68
make: *** [build/rules.mk:77: o//third_party/python/Lib/venv/scripts/nt/activate.bat.zip.o] Error 68
make: *** [build/rules.mk:77: o//third_party/python/Lib/venv/scripts/nt/deactivate.bat.zip.o] Error 68
make: *** [build/rules.mk:77: o//third_party/python/Lib/venv/scripts/posix/activate.csh.zip.o] Error 68
make: *** [build/rules.mk:77: o//third_party/python/Lib/venv/scripts/posix/activate.fish.zip.o] Error 68 |
'make -j128 -O' has no errors on today's git pull. B-) Since this is officially closed, and it's working for me, I'll stop bothering this issue. |
I've been trying and failing to build cosmopolitan under Devuan Chimaera, which is a Debian based Linux distribution without systemd. I do notice that you keep mentioning GNU/systemd instead of Linux, I was thinking that was some sort of joke, but maybe my problem is the lack of systemd?
I have tried git head, and the source tarball, same result.
The number of "vfork: Resource temporarily unavailable" sections I get depends on how many -jx I use. Despite giving specific -jx, there's always the "make MODE= -j128". Yes I have a Ryzen Threadripper with 64 cores, so CPU counting ends up at 128. Do I have too many cores, and something is tripping up on that? It seems to be running out of cores. lol
I notice there's no way of building completely from source, otherwise I would hack away at the source code until it worked, but the build system tries to use its own precompiled build tools that come with the source code, and it's this build system binary blob that is failing. So I can't fix this myself without lots of work essentially rebuilding things from the ground up one step at a time.
Redbean worked fine, which is the thing that attracted me to cosmo in the first place. I want to try redbean out as the basis for my OpenSim rewrite (3D virtual world metaverse that has been around for over a decade).