
Setting minimum number of CPU cores for a build #3141

Closed
LecrisUT opened this issue Feb 13, 2024 · 14 comments

@LecrisUT

Is there any way to configure a Copr project to only run on builders with at least X CPU cores? I've got a package where the difference is 1h on 4 cores and 3h+ on 2 cores due to oversubscription. This minimum requirement could be set either at the project level or deduced from the .spec file.

@praiskup
Member

Are you asking for powerful builders, or are you looking for a build-time macro to limit the number of processes? (Builds should not overcommit; this looks like a packaging issue.)

@LecrisUT
Author

These are issues at %check time, where MPI + OpenMP is tested; it needs at least 4 cores in order not to oversubscribe.

@xsuchy
Member

xsuchy commented Feb 14, 2024

I recall that several years ago I opened a discussion about enhancing SPEC files to include minimal hardware requirements: minimal memory, minimal number of CPUs. Exactly what you are asking for. But it was not welcomed and it was never implemented.
Shame on me, as I cannot find where the discussion happened. :(

I would be very skeptical about implementing something that works only in Copr. Please open an issue in RPM, https://github.com/rpm-software-management/rpm/issues, and if it becomes part of the SPEC format, we can read and interpret it in Copr.

@LecrisUT
Author

LecrisUT commented Feb 14, 2024

Long term, I agree it should be in the spec format, but in the meantime it could also be something defined manually for a whole Copr project. Wdyt?

Discussion on upstream opened: rpm-software-management/rpm#2909

@praiskup
Member

These are issues at %check time, where mpi + omp is tested and it should have at least 4 cores in order to not oversubscribe

Why does the build "oversubscribe"? Can't the %check section be fixed to respect rpm --eval %_smp_build_ncpus?
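
A minimal sketch of what such a %check fix could look like, assuming the test suite honours OMP_NUM_THREADS and a plain mpiexec launcher (run_tests is a hypothetical test driver, not anything from the actual package):

    %check
    # sketch only: clamp the MPI * OpenMP footprint to the CPUs the builder provides
    ncpus=%{_smp_build_ncpus}
    if [ "$ncpus" -ge 4 ]; then
        ranks=2; threads=2      # the intended 2 MPI ranks * 2 OpenMP threads
    else
        ranks=2; threads=1      # degrade on 2-CPU builders, if the tests tolerate it
    fi
    OMP_NUM_THREADS=$threads mpiexec -n $ranks ./run_tests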

@LecrisUT
Author

For example, %ctest will only control the ctest-level parallelization. The internal parallelization still survives, which for MPI + OpenMP is a minimum of 2*2.

@praiskup
Member

But what if we give it 2x more CPUs? With 4 CPUs it gets a 4*2 overcommit, right? Why isn't this "just" 2x faster?

I mean, this seems wrong. It seems that the package wants to use something like %global _smp_mflags -j$(( RPM_BUILD_NCPUS / 2 )), wdyt?
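
Spelled out, that suggestion would amount to one line near the top of the spec (a sketch of the idea only; RPM_BUILD_NCPUS is exported by rpmbuild, and the arithmetic is evaluated by the shell at build time):

    # halve the build/test parallelism so the internal per-process parallelism
    # has headroom; on a 2-CPU builder this becomes -j1
    %global _smp_mflags -j$(( RPM_BUILD_NCPUS / 2 ))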

@praiskup
Member

Don't get me wrong; we can offload your builds onto the powerful builders if you think that is the correct thing to do!

It just feels like every package should be reasonably "buildable" on every machine, even if it is just a 2-core system.

@LecrisUT
Author

LecrisUT commented Feb 16, 2024

Ctest parallelization is orthogonal to the base parallelization. That is to say, if the base parallelization is 4, then passing anything from -j 1 to -j 7 will only spawn one ctest worker; only at -j 8 will it spawn an additional one. But that means it will happily oversubscribe at -j < 4 (spawning one ctest worker that still uses 4x parallelization internally).
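
As a rough illustration of that scheduling math (hypothetical numbers, assuming each test declares a ctest PROCESSORS cost of 4 for its 2 MPI ranks * 2 OpenMP threads):

    # with a per-test cost of 4 processors, ctest schedules floor(j / 4) tests
    # concurrently, but never fewer than one:
    #   ctest -j 1 .. -j 7   -> 1 worker (still using 4 cores internally)
    #   ctest -j 8           -> 2 workers
    # so on a 2-CPU builder even -j 2 quietly oversubscribes by 2x
    ctest -j 2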

Why it isn't just 2x faster: because of the oversubscription, the OS has to take over scheduling on each core, and the overhead of stopping and restarting tasks adds up, especially with MPI, where the memory is not shared.

Buildability-wise, yes, it is buildable on any machine; it's just not testable. To be fair, testing-farm can pick up the slack for that (once I figure out how to configure it), but certain tests are not coverable that way, e.g. unit tests with hidden headers.

@praiskup
Member

This seems like it, yes: if the package build oversubscribes a machine that is known to have "just N CPUs", then it looks like a package bug to me. Or perhaps oversubscribing the machine is the point of the test? Dunno.

Either way, I assume it is a hard-to-fix issue in the package. We should go with this. What do you think?

@LecrisUT
Author

It's a limitation of the MPI+OpenMP infrastructure and the requirements for testing.

MPI needs to spawn at least 2 workers, i.e. mpiexec forks 2 tasks of the same executable (/usr/bin/my_bin). Then, inside each task, when it reaches an OpenMP routine it forks again into at least 2 threads (probably this happens earlier) with shared memory. Thus it needs 2*2 cores to avoid oversubscription.
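
In command-line terms, the minimal footprint described above is something like this (using the /usr/bin/my_bin placeholder from the comment):

    # 2 MPI ranks, each spawning 2 OpenMP threads in shared memory:
    # 2 * 2 = 4 cores are needed to run without oversubscription
    OMP_NUM_THREADS=2 mpiexec -n 2 /usr/bin/my_bin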

One issue is that MPI+OpenMP is standard in scientific code and most of HPC, so this is bound to happen (or is already happening) to various other packages as well.

But since these issues are only %check-related, there are two options:

  • Migrate the tests to testing-farm
  • Use powerful builders

For the package I am working on, I can do the former since it only provides regression tests. But having easier access to the latter will come in handy as others encounter this issue. Having the specifications of the normal and powerful builders will also be helpful for the packagers to decide which one to request.

@praiskup
Member

But having easier access to the latter will come in handy as others encounter this issue

It's just about telling us where you want to build the packages: the Copr namespace (owner, project, chroots) and the package name. Copr administrators can then finish the configuration.

@praiskup praiskup self-assigned this Apr 8, 2024
@praiskup
Member

Apart from the powerful builders, there are these additional options:

  • the AWS workers now have 4 CPUs, while the VMs on our in-lab hypervisors have only 2 ... so for aarch64/x86_64 we can ensure that your builds are processed on AWS; let us know
  • we can have worker labels like 4CPU and 2CPU and assign the appropriate labels to your jobs (note there is currently no logic for ">=", just a labeling system, so an RFE against the Resalloc project would be needed for that)
  • we can bump the workers on the hypervisors to 4 CPUs ... my feeling is that we can overcommit a bit more (giving 2x more CPUs to each worker and decreasing their number by ~1/3, from 20 to say 14)
  • we can have different types of workers on the hypervisors, too (a bit more configuration work)

I'm closing this for inactivity, but feel free to continue with the request (or discuss).

@LecrisUT
Author

Any of those would be helpful. For now, I have decoupled the build and test stages, which gives me more control on testing-farm. It still takes ages to both build and test (I'm quite surprised about the former). Right now the scaling is not quite linear (43 min on 4 cores vs 67 min on 2 cores), so I'll let you decide whether it makes sense to bump it up or not. Here is the packit configuration that builds it.
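
For reference, the decoupling described here can be sketched in a .packit.yaml roughly as follows (this is only a sketch, not the actual configuration linked above; the package name and targets are placeholders):

    # build in Copr, run the heavy MPI+OpenMP tests in Testing Farm instead of %check
    specfile_path: my-package.spec
    jobs:
      - job: copr_build
        trigger: pull_request
        targets:
          - fedora-rawhide-x86_64
      - job: tests
        trigger: pull_request
        targets:
          - fedora-rawhide-x86_64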
