16k kernel page builds to support Apple Silicon (ARM64) #7335
Comments
We are not planning on producing binaries for systems with nonstandard page sizes. Most distros that previously used 64k (or other multiple-of-16k) pages have now standardized on 4k. Ref: #6708 (comment) |
Right - but for Apple Silicon (Asahi Linux), they can't standardise on 4K pages, due to the design of the Apple ARM chips. Is there any chance of creating a build for Apple Silicon (M1, M2 etc.) so that people are able to standardise on Apple machines? Or even of just providing steps, so somebody could help get k3s running on Apple machines? |
As per the page you linked, and others such as https://asahilinux.org/2021/10/progress-report-september-2021/, 4k pages are possible.
https://asahilinux.org/2022/03/asahi-linux-alpha-release/
These are all pretty old links, have you checked to see if there is a 4k page kernel available yet? Note that even if we did make a special build of k3s available, you'd also need all your aarch64 container images to support the odd page size as well. |
You seem to be thoroughly confused. All ARM64 binaries for Linux are built by default with support for all standard page sizes (4K, 16K, 64K). This is the case for every major distribution. Not supporting all three (which are all standard and part of the architecture) is a bug. People are already running typical ARM64 containers on Asahi just fine. If you are deliberately overriding the toolchain section alignment to 4K, you need to stop doing that and switch to 64K (which is the default everywhere nowadays). Doing so will allow your binaries to run on any page size, as you can always load them with a smaller page size. This claim that doing so would break the Raspberry Pi and other 4K platforms is completely wrong. We are literally running the same packages on Asahi Linux as you would on a Raspberry Pi. The page size does not matter as long as the binaries are built properly. We didn't have to do anything special, and every major distro's userland runs on our 16K kernels just fine without any rebuilding. If you are making bad runtime assumptions about the page size, you need to fix that. We've already gotten several projects to fix bugs like these, but they only tend to happen in stuff like allocators and apps doing silly stuff with mmap (e.g. emacs). This is precisely the reason why we are not yet offering 4K kernels, because their availability would give people an excuse not to fix these bugs in ecosystem packages. 4K kernels have significant downsides, including a notable performance penalty (up to 20%) due to quadrupling the TLB pressure and increasing memory management overhead, depending on the specific workload. They will never be ideal nor the default on distro builds intended specifically for Apple Silicon platforms. The only legitimate reason to use a 4K kernel on Apple Silicon is to run x86 software in emulation (and Android software, since that ecosystem made the mistake of standardizing on 4K section alignment and now they're stuck with it). 
Doing so for anything else is just working around buggy software. 4K pages made sense back when the Intel 386 came out. They are thoroughly obsolete, and the only reason they are the default on typical ARM64 distros is because 4K is the lowest common denominator supported everywhere and the Linux kernel's poor design does not allow deciding the page size at boot time. 16K is unarguably beneficial for all but the smallest embedded systems, and 64K is the logical choice for large servers. 4K pages increase overhead and do not provide measurable memory savings. There's a reason Apple went with 16K for their entire 64-bit ARM ecosystem (because it's better, and because they control it all so they can make that decision). |
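Editor's note: the "bad runtime assumptions" mentioned above usually amount to a hardcoded 4096. A minimal sketch of the alternative, asking the running kernel for its page size and rounding to it, might look like the following. The helper names `runtime_page_size` and `round_up_to_page` are hypothetical and not taken from k3s, buildroot or uClibc; `os.sysconf("SC_PAGE_SIZE")` is the standard-library call on Unix-like systems.

```python
import os


def runtime_page_size() -> int:
    """Return the page size of the running kernel in bytes (Unix-like systems only)."""
    return os.sysconf("SC_PAGE_SIZE")


def round_up_to_page(nbytes: int, page_size: int) -> int:
    """Round nbytes up to the next multiple of page_size (assumed to be a power of two)."""
    return (nbytes + page_size - 1) & ~(page_size - 1)


if __name__ == "__main__":
    page = runtime_page_size()
    # Typically 4096 on a Raspberry Pi distro kernel and 16384 on Asahi Linux,
    # but the point is that the value is queried rather than assumed.
    print(f"kernel page size: {page} bytes")
    print(f"5000 bytes rounds up to {round_up_to_page(5000, page)} bytes")
```

Code written this way behaves the same on 4K, 16K and 64K kernels, because the only page size it ever uses is the one reported at runtime.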
From what I can tell, that From the k3s side, I would suggest switching to |
As you noted, the issue is 100% with the buildroot binaries. "We" the K3s project don't make any assumptions anywhere in the code that we maintain.
Have you reported this to buildroot or uclibc? We've not changed anything on our side, including the buildroot page size selection. All we did on our side was update the buildroot release, which introduced what is clearly a dependency on a specific kernel page size.
Except for buildroot uclibc static binaries apparently? |
This seems like something we could easily check. I do have access to both a Mac Mini M1, and also a Raspberry Pi 4 Model B - I am happy to run some tests here on the hardware I do have, if that would help? Does anybody happen to have a recent build of k3s with 4K/16K/64K support, that I could download and try please? I think the change to add Prior to this, it seems none of the page size options were defined (BR2_ARM64_PAGE_SIZE_4K, BR2_ARM64_PAGE_SIZE_16K and BR2_ARM64_PAGE_SIZE_64K were simply unset) - so I assume it would have been the default. Also, sorry if this is a basic question, I'm not clear on the relationship between k3s-root and k3s. How does the k3s-root get built, when you build k3s? Anyway, if anybody is able to give me e.g. k3s 1.26 with the 64K alignment (or instructions on how I can do this - my default machine is macOS, but I can get access to a Linux box if I need to, to build this), let me know. |
K3s bundles statically linked uclibc user space tools from k3s-root, which is basically just a set of preconfigured buildroot configs for our supported architectures. We don't build it as part of k3s, we just download the tarball release artifacts from that repo. K3s itself works fine regardless of page size. It's the buildroot user space binaries that crash if the kernel page size doesn't match what buildroot was configured with. You'd have to build a reconfigured k3s-root tarball using the build scripts in that repo, and then build k3s with that. Or just experiment with the binaries directly, without involving k3s itself. It looks like you're already poking around in that repo, it should be pretty easy for you to figure out how to build a modified tarball and compare that to our current configuration. |
If all it takes is building for 64k page size, and it doesn't regress on any other aarch64 platforms or distros with smaller page sizes, that'd be great. I haven't personally tried it, as our initial investigation suggested that these alternative page sizes were fairly niche, and had fallen out of favor with most distros. |
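Editor's note: one rough way to see why a 64K build should not regress on 4K or 16K hosts is to look at the `p_align` field of a binary's `PT_LOAD` program headers. A segment laid out on 64K boundaries is also on 4K and 16K boundaries, because 64K is a multiple of both. The sketch below parses a little-endian ELF64 image using only the standard library; `load_segments` and `mappable_with` are hypothetical helper names, and the check is a necessary-condition heuristic, not a guarantee that the program makes no other page size assumptions at runtime.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def load_segments(elf: bytes):
    """Yield (p_offset, p_vaddr, p_align) for each PT_LOAD segment of a little-endian ELF64 image."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    # In the ELF64 header, e_phoff is at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", elf, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 54)
    for i in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr,
         _paddr, _filesz, _memsz, p_align) = struct.unpack_from(
            "<IIQQQQQQ", elf, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            yield p_offset, p_vaddr, p_align


def mappable_with(elf: bytes, page_size: int) -> bool:
    """Heuristic: every loadable segment is aligned to a multiple of page_size,
    and its file offset and virtual address agree modulo page_size."""
    return all(
        p_align >= page_size
        and p_align % page_size == 0
        and (p_vaddr - p_offset) % page_size == 0
        for p_offset, p_vaddr, p_align in load_segments(elf)
    )
```

A possible use is `mappable_with(open(path, "rb").read(), 16384)` against one of the bundled binaries, where `path` is whatever file you want to inspect. A binary linked with 4K alignment would fail this check for 16K and 64K, while one linked with 64K alignment would pass for all three sizes.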
Right, so the thing is The only question is the uClibc stuff, and it's unclear whether it matters at all (they should get rid of the page size selection, but that doesn't mean it won't actually work built for 64K). So the thing to do is just build with In general most hardcoded page size assumptions aren't inherently a problem as long as they are defined as "min possible" or "max possible" correctly. Allocators built for hardcoded 64K pages will work on smaller page sizes, as will code that needs page size-sized buffers. But code making alignment checks needs to check against a hardcoded 4K minimum (of course it's ideal to use the real page size in all cases anyway, but hardcoding min/max here works if done right). From a quick look at uClibc I only saw one instance of the latter and it was using a #define that was seemingly not set for ARM64 and defaulting to 4096, so that one wouldn't be a problem, but I didn't check the rest of the hits exhaustively. It should be easier to just try it and see if it breaks first, then worry about it if it does. TL;DR this is mostly just an issue of buildroot being really confusing with their page size selection, probably because the major use case for buildroot is building for specific embedded systems (where you get to pick and you know the page size), and nobody documented that the correct default is "64K" if you want binaries that will run on any system (even though 4K is the most popular page size). |
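Editor's note: the "min possible" / "max possible" rule described above can be sketched as follows. The constants and function names are hypothetical illustrations, not uClibc or buildroot code. Sizes are rounded up to the largest ARM64 page size (64K), while alignment checks use the smallest (4K).

```python
MIN_PAGE_SIZE = 4 * 1024    # smallest ARM64 page size: safe lower bound for alignment checks
MAX_PAGE_SIZE = 64 * 1024   # largest ARM64 page size: safe upper bound for sizing and padding


def padded_size(nbytes: int) -> int:
    """Round a buffer size up to a multiple of MAX_PAGE_SIZE, so that it covers
    whole pages whether the kernel uses 4K, 16K or 64K pages."""
    return -(-nbytes // MAX_PAGE_SIZE) * MAX_PAGE_SIZE


def plausibly_page_aligned(addr: int) -> bool:
    """Check an address against the minimum page size. Every supported page size
    is a multiple of 4K, so a page-aligned address always passes."""
    return addr % MIN_PAGE_SIZE == 0
```

Swapping the two constants is what breaks things. An address such as `0x4000` is page-aligned on a 16K kernel but would be rejected by a check against 64K, and a buffer padded only to 4K would not cover a whole page on a 16K or 64K kernel.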
AIUI the buildroot stuff is only used for bundled binaries, right? That is, uClibc isn't being offered as a build environment for any third-party code? I'd be more worried about uClibc defining |
I've cloned k3s-root, and changed the
I then build it for ARM64 with During building, I did see this reference to
but not sure if it's actually an issue or not. Now under
@brandond Do you know which of the two above tarballs I should be using? Secondly, how exactly do I use this when building k3s? I saw that |
@victorhooi I just released a rebuild on my fork https://github.com/e3b0c442/k3s-root/releases/tag/v0.13.0-e3b0c442.6. No shasums as I did a quick port of the CI over to GHA so that I didn't have to stand up my own Drone box, but I'm also in the process of trying to test this change as I am also using RHEL8 on ARM64 and looking to expand my k3s footprint in that direction. |
Reopening to track potentially moving k3s-root over to 64k page size if it is confirmed not to regress hosts with 4k pages. |
This should work on kernels with 64k pages, while also working on nodes with 4k or 16k page size. Ref: k3s-io/k3s#7335 Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
I was able to rebuild k3s against my compiled k3s-root tonight and successfully tested the failed case from #6708 on AWS a1 instances, one running AlmaLinux 8 and one running AlmaLinux 9. Core binaries do not exhibit segfaults on either. I should have time tomorrow to get Alma 8 spun up on a spare RPi to test there as well. //edit: and now I see I'm behind the curve :) |
Can one of y'all on a weird page size test K3s with the new root: |
I have validated this on AlmaLinux 8 (64K page size) and AlmaLinux 9 (4K page size) on a1 AWS instances. @victorhooi would you be able to test on Asahi? I don't have any M1 hardware I can currently spare for Linux. |
k3s starts here on my m1 mini (Running Fedora Asahi Remix) with the provided commit.
|
Awesome! Gentle reminder to remember to update the docs (which still claim 4K is required) :) |
Any chance we can get this in the next 1.26 release as well? Many thanks! |
Yes, everything gets backported to active branches. |
Thank you @e3b0c442 and @drlamb for validating this fix! I don't see this causing any regressions, but if it does it will be caught in general issue and patch validation testing, so I am going to close this out as "validated by community" ❤️ It will be available for general use in the May patch releases |
Is your feature request related to a problem? Please describe.
AIUI, Asahi Linux requires 16K kernel pages (due to the architecture of the Apple Silicon machines - https://lwn.net/Articles/872053/ has some background).
Hence, k3s will not work on Apple Silicon Macs, as k3s only supports 4K kernel pages.
Describe the solution you'd like
It would be amazing if k3s could also provide builds with 16K pages for ARM64 - the Apple Mac Mini M1 is a great, accessible, low-cost option for homelabs and a great entry into ARM64 - so a pretty good match for one of k3s's main use cases.
And of course, there are other higher-end Apple Silicon options (M2, Mac Studio) etc that provide great performance-per-watt as well, for larger deploys etc.
Describe alternatives you've considered
Additional context