Fedora Asahi kernel builds are way too slow (~6 hours) #2925

Closed
Conan-Kudo opened this issue Sep 21, 2023 · 12 comments

@Conan-Kudo
Contributor

Conan-Kudo commented Sep 21, 2023

Something is wrong with the Fedora COPR builders, because they went from being able to build kernels in roughly a couple of hours to taking 6 hours to build.

For comparison of some older builds:

To current builds:

The amount of time these builds take is unreasonably long and it makes it very difficult for me to ship things in a timely fashion.

Even once these bottlenecks are figured out, it'd be great to upgrade the selected instances so that builds finish more quickly.

Based on what I see for the COPR instance provisioning, it looks like we're using i4i.large for x86_64 and c7g.xlarge for aarch64.

Could we please look into bumping up to a larger instance? And maybe get instances with dedicated NVMe to avoid I/O bottlenecks? Something like c7g.12xlarge or c7gd.4xlarge would be tremendously helpful for AArch64.

x86_64 has similar problems and could benefit from an upgrade to c7i.2xlarge.

If experiments are needed to find a happier medium, we're happy to help.

cc: @marcan @davdunc @davide125

@praiskup
Member

Thank you for the report. Can we do this in #2241?

@Conan-Kudo
Contributor Author

Sure. This is technically two issues anyway:

@Conan-Kudo
Contributor Author

Since there's been no progress on #2241, could the instance types be upgraded? It would generally get everything to move much faster if we have upgraded instances, and we skip bootstrap these days with mock 5+...

@praiskup
Member

praiskup commented Sep 22, 2023

> Since there's been no progress on #2241, could the instance types be upgraded?

#2241 is ready, we need to deploy it (define a new "on demand" pool of workers).

> could the instance types be upgraded?

We don't want to make Copr overly demanding (99% of builds would be doable on slower machines, so we could eventually move to less powerful machines and instead have more builders to parallelize better).

> and we skip bootstrap these days with mock 5+...

Can be done on a per-chroot/per-copr basis, sure. This will give you minimal speedup, though.

@praiskup
Member

> Something is wrong with the Fedora COPR builders, because they went from being able to build kernels in roughly a couple of hours to taking 6 hours to build.

Reading again ^^^, are you sure something changed in Copr/AWS? Isn't this a package building problem? To the best of my knowledge, we haven't changed the instance type since your last request (a1.xlarge => c7g.xlarge).

@Conan-Kudo
Contributor Author

Do we run the builds in tmpfs or on disk?

@praiskup
Member

Tmpfs (may overflow to swap and consume a lot of I/O)
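
Whether a given build actually spills from tmpfs into swap can be checked on the builder itself. A minimal sketch, assuming a Linux builder where `/proc/meminfo` is readable; the helper names `parse_meminfo`, `swap_used_kib`, and `read_meminfo` are made up for illustration and are not part of Copr or mock:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: value} dict.

    Values are the first integer on each line (KiB for most fields).
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_kib(meminfo):
    """Return swap currently in use, in KiB (0 if the fields are absent)."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file on a Linux host."""
    with open(path) as fh:
        return parse_meminfo(fh.read())
```

Sampling `swap_used_kib(read_meminfo())` periodically during a kernel build and watching it grow would suggest that the build tree no longer fits in RAM.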

@Conan-Kudo
Contributor Author

Oh, in that case we want memory-optimized instances. r7g.16xlarge (AArch64) and r7a.16xlarge (x86_64) would be better choices.

@praiskup
Member

This still doesn't answer why the builds take 3x more now. Is this worth reporting against EC2?

@Conan-Kudo
Contributor Author

> This still doesn't answer why the builds take 3x more now. Is this worth reporting against EC2?

It takes at least double because now there are double the flavors being built. What happened is it became 2.5 hours with Rust, then I doubled the flavors because now there's 4K and 16K, and then more code got turned on, and here we are at 6~7 hours.
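
As a rough sketch of that arithmetic, assuming build time scales linearly with the number of flavors (a simplification; the 2.5-hour baseline and the 6 to 7 hour range are the figures quoted above, and the function name is hypothetical):

```python
def scaled_build_hours(baseline_hours, flavor_factor):
    """Estimate build time if it grows linearly with the flavor count."""
    return baseline_hours * flavor_factor


baseline_hours = 2.5          # build time with Rust, before doubling flavors
flavor_factor = 2             # 4K and 16K page-size flavors
observed_hours = (6.0, 7.0)   # current build times reported in this thread

after_doubling = scaled_build_hours(baseline_hours, flavor_factor)
# Whatever the doubling does not explain is attributed to newly enabled code.
unexplained = tuple(h - after_doubling for h in observed_hours)

print(after_doubling)  # 5.0
print(unexplained)     # (1.0, 2.0)
```

Under that assumption, doubling the flavors accounts for about 5 hours, leaving 1 to 2 hours for the additional code that got turned on.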

@praiskup praiskup self-assigned this Sep 27, 2023
@marcan

marcan commented Oct 8, 2023

Just to be clear: enabling Rust in kernel builds should have a negligible impact on build times. It is an insignificant amount of code compared to the rest of the kernel, and all kernel Rust code takes less than one minute to build with -j1.

If flipping Rust on caused our COPR kernel builds to be measurably slower, my understanding is that it must be because the builders are so ridiculously undersized right now that they are already running into thrashing issues, and rustc's moderately higher peak memory usage vs. gcc is causing an even more pathological situation there.
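
The thrashing argument can be made concrete with a back-of-the-envelope model: the number of compile jobs that fit without swapping is bounded by memory as well as by cores. This is only a sketch; the function name and every number in the example are illustrative assumptions, not measurements from the COPR builders:

```python
def memory_bound_jobs(vcpus, mem_gib, tmpfs_gib, per_job_gib):
    """Return how many parallel jobs fit in RAM without swapping.

    vcpus       -- CPU count of the builder
    mem_gib     -- total RAM in GiB
    tmpfs_gib   -- RAM already consumed by the tmpfs build tree (assumed)
    per_job_gib -- peak memory of one compiler process (assumed)
    """
    if per_job_gib <= 0:
        raise ValueError("per_job_gib must be positive")
    available = max(mem_gib - tmpfs_gib, 0.0)
    by_memory = int(available // per_job_gib)
    # At least one job always runs, even if it has to swap.
    return max(1, min(vcpus, by_memory))


# Hypothetical small builder: 4 vCPUs, 8 GiB RAM, 4 GiB held by tmpfs,
# and 2 GiB peak per compiler process.
print(memory_bound_jobs(4, 8, 4, 2))  # 2
```

In this toy model, a modest increase in per-job peak memory halves the usable parallelism on a small machine, while a large-memory instance stays CPU-bound.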

@praiskup
Member

I think we can finally close this request as redundant.

We moved the default x86 machines from i4i.large to c7i.xlarge which performs the x86_64 builds roughly twice as fast. But we do not use those normal EC2 builders as long as we can handle the throughput with our 4 x86 hypervisors (cloud cost saving, so "normally" nothing changes here). See the discussion in #2241 which is the duplicate of this one anyway.

We enabled the powerful builders for Asahi project(s) in #2966, which should handle the build in less than 40 minutes. The overall build takes longer than that because of the keygen slowdown (#2757), but that is a separate issue.

Please reopen or at least feel free to comment. Happy building!
