
Setting minimum number of CPU cores for a build #3141

Closed
LecrisUT opened this issue Feb 13, 2024 · 14 comments

@LecrisUT

Is there any way to configure a Copr project to only run on builders with at least X CPU cores? I've got a package where the difference is 1h on 4 cores and 3h+ on 2 cores due to oversubscription. This minimum requirement could be set either at the project level or deduced from the .spec file.

@praiskup
Member

Are you asking for powerful builders, or are you looking for a build-time macro to limit the number of processes? (Builds should not overcommit; this looks like a packaging issue.)

@LecrisUT
Author

These are issues at %check time, where MPI + OpenMP is tested; it needs at least 4 cores in order not to oversubscribe.

@xsuchy
Member

xsuchy commented Feb 14, 2024

I recall that several years ago I opened a discussion about enhancing SPEC files to include minimal hardware requirements: minimal memory, minimal number of CPUs. Exactly what you are asking for. But it was not welcomed and it was never implemented.
Shame on me, as I cannot find where the discussion happened. :(

I would be very skeptical about implementing something that works only in Copr. Please open an issue in RPM, https://github.com/rpm-software-management/rpm/issues, and if it becomes part of the SPEC format, we can read and interpret it in Copr.

@LecrisUT
Author

LecrisUT commented Feb 14, 2024

Long term, I agree it should be in the spec format, but in the meantime it could also be something defined manually for a whole Copr project. Wdyt?

Discussion on upstream opened: rpm-software-management/rpm#2909

@praiskup
Member

These are issues at %check time, where mpi + omp is tested and it should have at least 4 cores in order to not oversubscribe

Why does the build "oversubscribe"? Can't the %check section be fixed to respect rpm --eval %_smp_build_ncpus?
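
A minimal sketch of what such a %check fix could look like, assuming the test suite honours OMP_NUM_THREADS and a plain mpiexec launcher (run_tests is a hypothetical test driver, not anything from the actual package):

    %check
    # sketch only: clamp the MPI * OpenMP footprint to the CPUs the builder provides
    ncpus=%{_smp_build_ncpus}
    if [ "$ncpus" -ge 4 ]; then
        ranks=2; threads=2      # the intended 2 MPI ranks * 2 OpenMP threads
    else
        ranks=2; threads=1      # degrade on 2-CPU builders, if the tests tolerate it
    fi
    OMP_NUM_THREADS=$threads mpiexec -n $ranks ./run_tests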

@LecrisUT
Author

For example, %ctest will only control the ctest-level parallelization. The internal parallelization still survives, which for MPI + OpenMP is a minimum of 2*2.

@praiskup
Member

But what if we give it 2x more CPUs? With 4 CPUs it gets a 4*2 overcommit, right? Why isn't this "just" 2x faster?

I mean, this seems wrong. It seems that the package wants to use something like %global _smp_mflags -j$(( RPM_BUILD_NCPUS / 2 )), wdyt?
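
Spelled out, that suggestion would amount to one line near the top of the spec (a sketch of the idea only; RPM_BUILD_NCPUS is exported by rpmbuild, and the arithmetic is evaluated by the shell at build time):

    # halve the build/test parallelism so the internal per-process parallelism
    # has headroom; on a 2-CPU builder this becomes -j1
    %global _smp_mflags -j$(( RPM_BUILD_NCPUS / 2 ))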

@praiskup
Member

Don't get me wrong; we can offload your builds onto the powerful builders if you think that is the correct thing to do!

It just feels like every package should be reasonably "buildable" on every machine, even if it is just a 2-core system.

@LecrisUT
Author

LecrisUT commented Feb 16, 2024

Ctest parallelization is orthogonal to the base parallelization. That is to say, if the base parallelization is 4, then passing anything from -j 1 to -j 7 will only spawn one ctest worker; only at -j 8 will it spawn an additional one. But that means it will happily oversubscribe at -j < 4 (spawning one ctest worker that still uses 4x parallelization internally).
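
As a rough illustration of that scheduling math (hypothetical numbers, assuming each test declares a ctest PROCESSORS cost of 4 for its 2 MPI ranks * 2 OpenMP threads):

    # with a per-test cost of 4 processors, ctest schedules floor(j / 4) tests
    # concurrently, but never fewer than one:
    #   ctest -j 1 .. -j 7   -> 1 worker (still using 4 cores internally)
    #   ctest -j 8           -> 2 workers
    # so on a 2-CPU builder even -j 2 quietly oversubscribes by 2x
    ctest -j 2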

Why it isn't just 2x faster: because of the oversubscription, the OS has to take over scheduling on each core, and the overhead of stopping and restarting tasks adds up, especially with MPI, where the memory is not shared.

Buildability-wise, yes, it is buildable on any machine; it's just not testable. To be fair, testing-farm can pick up the slack for that (once I figure out how to configure it), but certain tests are not coverable that way, e.g. unit tests with hidden headers.

@praiskup
Member

This seems like it, yes: if the package build oversubscribes a machine that is known to have "just N CPUs", then it looks like a package bug to me. Or perhaps oversubscribing the machine is the point of the test? Dunno.

Either way, I assume it is a hard-to-fix issue in the package. We should go with this. What do you think?

@LecrisUT
Author

It's a limitation of the MPI+OpenMP infrastructure and the requirements for testing.

MPI needs to spawn at least 2 workers, i.e. mpiexec forks 2 tasks of the same executable (/usr/bin/my_bin). Then, inside each task, when it reaches an OpenMP routine it forks again into at least 2 threads (probably this happens earlier) with shared memory. Thus it needs 2*2 cores to avoid oversubscription.
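
In command-line terms, the minimal footprint described above is something like this (using the /usr/bin/my_bin placeholder from the comment):

    # 2 MPI ranks, each spawning 2 OpenMP threads in shared memory:
    # 2 * 2 = 4 cores are needed to run without oversubscription
    OMP_NUM_THREADS=2 mpiexec -n 2 /usr/bin/my_bin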

One issue is that MPI+OpenMP is standard in scientific code and most of HPC, so this is bound to happen (or is already happening) to various other packages as well.

But since these issues are only %check-related, there are two options:

  • Migrate the tests to testing-farm
  • Use powerful builders

For the package I am working on, I can do the former since it only provides regression tests. But having easier access to the latter will come in handy as others encounter this issue. Having the specifications of the normal and powerful builders will also be helpful for the packagers to decide which one to request.

@praiskup
Member

But having easier access to the latter will come in handy as others encounter this issue

It's just about telling us where you want to build the packages: the Copr namespace (owner, project, chroots) and the package name. Copr administrators can then finish the configuration.

@praiskup praiskup self-assigned this Apr 8, 2024
@praiskup
Member

Apart from the powerful builders, there are these additional options:

  • the AWS workers now have 4 CPUs, while the VMs on our in-lab hypervisors have only 2 ... so for aarch64/x86_64 we can ensure that your builds are processed on AWS; let us know
  • we can have worker labels like 4CPU and 2CPU and assign the appropriate labels to your jobs (note there is currently no logic for ">=", just a labeling system, so an RFE against the Resalloc project would be needed for that)
  • we can bump the workers on the hypervisors to 4 CPUs ... my feeling is that we can overcommit a bit more (giving 2x more CPUs to each worker and decreasing their number by ~1/3, from 20 to say 14)
  • we can have different types of workers on the hypervisors, too (a bit more configuration work)

I'm closing this for inactivity, but feel free to continue with the request (or discuss).

@LecrisUT
Author

Any of those would be helpful. For now, I have decoupled the build and test stages, which gives me more control on testing-farm. It still takes ages to both build and test (I'm quite surprised about the former). Right now the scaling is not quite linear (43 min on 4 cores vs 67 min on 2 cores), so I'll let you decide whether it makes sense to bump it up or not. Here is the packit configuration that builds it.
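
For reference, the decoupling described here can be sketched in a .packit.yaml roughly as follows (this is only a sketch, not the actual configuration linked above; the package name and targets are placeholders):

    # build in Copr, run the heavy MPI+OpenMP tests in Testing Farm instead of %check
    specfile_path: my-package.spec
    jobs:
      - job: copr_build
        trigger: pull_request
        targets:
          - fedora-rawhide-x86_64
      - job: tests
        trigger: pull_request
        targets:
          - fedora-rawhide-x86_64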
