16k kernel page builds to support Apple Silicon (ARM64) #7335

Closed
victorhooi opened this issue Apr 21, 2023 · 22 comments

@victorhooi

Is your feature request related to a problem? Please describe.

AIUI, Asahi Linux requires 16K kernel pages (due to the architecture of the Apple Silicon machines - https://lwn.net/Articles/872053/ has some background).

Hence, k3s will not work on Apple Silicon Macs, as k3s only supports 4K kernel pages.

Describe the solution you'd like

It would be amazing if k3s could also provide builds with 16K pages for ARM64 - the Apple Mac Mini M1 is a great, accessible, low-cost option for homelabs and a great entry into ARM64 - so a pretty good match for one of k3s's main use cases.

And of course, there are other higher-end Apple Silicon options (M2, Mac Studio, etc.) that provide great performance-per-watt as well, for larger deployments.

Describe alternatives you've considered

Additional context

@brandond
Member

brandond commented Apr 21, 2023

We are not planning on producing binaries for systems with nonstandard page sizes. Most distros that previously used 64k (or other multiple-of-16k) pages have now standardized on 4k. Ref: #6708 (comment)

@github-project-automation github-project-automation bot moved this from New to Done Issue in K3s Development Apr 21, 2023
@victorhooi
Author

Right - but for Apple Silicon (Asahi Linux), they can't standardise on 4K pages, due to the design of the Apple ARM chips.

Is there any chance of creating a build for Apple Silicon (M1, M2 etc.) so that people are able to standardise on Apple machines?

Or even of just providing steps, so somebody could help get k3s running on Apple machines?

@brandond
Member

brandond commented Apr 22, 2023

As per the page you linked, and others such as https://asahilinux.org/2021/10/progress-report-september-2021/, 4k pages are possible.

> Sven took on the challenge and now has a patch series that makes Linux’s IOMMU support layer play nicely with hardware that has an IOMMU page size larger than the kernel page size! It’s not perfect, as it can’t support a select few corner case drivers (that do things that are fundamentally impossible to support in this situation), but it works well and will support everything we need to make 4K kernels viable.

https://asahilinux.org/2022/03/asahi-linux-alpha-release/

> There is a category of software that will likely never support 16K page sizes: certain emulators and compatibility layers, including FEX. Android is also affected, in case someone wants to try running it natively some day. For users of these tools, we will provide 4K page size kernels in the future, once the kernel changes that make this possible are ready for upstreaming.

These are all pretty old links, have you checked to see if there is a 4k page kernel available yet?

Note that even if we did make a special build of k3s available, you'd also need all your aarch64 container images to support the odd page size as well.

@marcan

marcan commented Apr 22, 2023

You seem to be thoroughly confused. All ARM64 binaries for Linux are built by default with support for all standard page sizes (4K, 16K, 64K). This is the case for every major distribution. Not supporting all three (which are all standard and part of the architecture) is a bug. People are already running typical ARM64 containers on Asahi just fine.

If you are deliberately overriding the toolchain section alignment to 4K, you need to stop doing that and switch to 64K (which is the default everywhere nowadays). Doing so will allow your binaries to run on any page size, as you can always load them with a smaller page size. This claim that doing so would break the Raspberry Pi and other 4K platforms is completely wrong. We are literally running the same packages on Asahi Linux as you would on a Raspberry Pi. The page size does not matter as long as the binaries are built properly. We didn't have to do anything special, and every major distro's userland runs on our 16K kernels just fine without any rebuilding.
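The section alignment described above is recorded in the binary itself, so it can be inspected directly. The following is a minimal sketch, assuming a little-endian ELF64 file (which is what aarch64 Linux binaries are): it reads the `p_align` field of each `PT_LOAD` program header. The helper names `max_load_alignment` and `loads_on` are made up for illustration, and the `alignment >= page_size` rule is a heuristic for the linker's max-page-size, not a restatement of what the kernel loader actually checks.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def max_load_alignment(elf: bytes) -> int:
    """Return the largest p_align among PT_LOAD segments of a little-endian ELF64 image."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if elf[4] != 2 or elf[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    # ELF64 header: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    largest = 0
    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        # ELF64 program header: p_type at 0x00, p_align at 0x30.
        (p_type,) = struct.unpack_from("<I", elf, base)
        (p_align,) = struct.unpack_from("<Q", elf, base + 0x30)
        if p_type == PT_LOAD:
            largest = max(largest, p_align)
    return largest


def loads_on(page_size: int, alignment: int) -> bool:
    """Heuristic: segments aligned to `alignment` suit any page size that divides it."""
    return alignment >= page_size and alignment % page_size == 0
```

For a real binary, something like `max_load_alignment(open(path, "rb").read())` (with `path` pointing at the file in question) would be expected to report 0x10000 for a 64K-aligned build and 0x1000 for a 4K-aligned one; only the former passes `loads_on(16384, ...)`.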

If you are making bad runtime assumptions about the page size, you need to fix that. We've already gotten several projects to fix bugs like these, but they only tend to happen in stuff like allocators and apps doing silly stuff with mmap (e.g. emacs).

This is precisely the reason why we are not yet offering 4K kernels, because their availability would give people an excuse not to fix these bugs in ecosystem packages. 4K kernels have significant downsides, including a notable performance penalty (up to 20%) due to quadrupling the TLB pressure and increasing memory management overhead, depending on the specific workload. They will never be ideal nor the default on distro builds intended specifically for Apple Silicon platforms.
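The "quadrupling" figure follows from simple arithmetic: mapping the same memory with 4K pages takes four times as many pages, and therefore four times as many TLB entries, as with 16K pages. A small sketch of that count, using a hypothetical 64 MiB working set (it does not model TLB capacity, huge pages, or the quoted 20% figure):

```python
def pages_needed(working_set_bytes: int, page_size: int) -> int:
    """Number of pages (and so TLB entries) needed to map a working set."""
    return -(-working_set_bytes // page_size)  # ceiling division


working_set = 64 * 1024 * 1024  # hypothetical 64 MiB working set
for size in (4096, 16384, 65536):
    print(f"{size // 1024}K pages: {pages_needed(working_set, size)} entries")
```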

The only legitimate reason to use a 4K kernel on Apple Silicon is to run x86 software in emulation (and Android software, since that ecosystem made the mistake of standardizing on 4K section alignment and now they're stuck with it). Doing so for anything else is just working around buggy software.

4K pages made sense back when the Intel 386 came out. They are thoroughly obsolete, and the only reason they are the default on typical ARM64 distros is because 4K is the lowest common denominator supported everywhere and the Linux kernel's poor design does not allow deciding the page size at boot time. 16K is unarguably beneficial for all but the smallest embedded systems, and 64K is the logical choice for large servers. 4K pages increase overhead and do not provide a measurable memory savings. There's a reason Apple went with 16K for their entire 64-bit ARM ecosystem (because it's better, and because they control it all so they can make that decision).

@marcan

marcan commented Apr 22, 2023

From what I can tell, that BR2_ARM64_PAGE_SIZE_64K buildroot config that was previously mentioned only does three things: changes the kernel page size (which you don't care about), changes the linker max-page-size to 64K (which is what you want), and changes the uClibc page size to 64K. The latter doesn't seem to actually do much, and they should probably get rid of that config option entirely and replace the few references to PAGE_SIZE internally with sysconf(_SC_PAGESIZE). Someone should probably file a uClibc bug for that, since compile-time hardcoded page size is not a thing on ARM64 (you will not find PAGE_SIZE defined on glibc arm64 systems).
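To illustrate the difference between a compile-time `PAGE_SIZE` and a runtime query, here is a minimal sketch in Python, whose `os.sysconf("SC_PAGE_SIZE")` wraps the same `sysconf` call mentioned above. The helper names are made up for illustration; this is not code from uClibc or k3s.

```python
import os


def runtime_page_size() -> int:
    """Ask the running kernel for its page size instead of assuming 4096."""
    return os.sysconf("SC_PAGE_SIZE")


def round_up_to_page(length: int, page_size: int) -> int:
    """Round a byte count up to whole pages (page_size must be a power of two)."""
    return (length + page_size - 1) & ~(page_size - 1)


page = runtime_page_size()
print(page, round_up_to_page(5000, page))
```

On a 4K kernel the second number printed is 8192; on a 16K kernel such as Asahi's it is 16384. Code that baked in 4096 would get the second case wrong.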

From the k3s side, I would suggest switching to BR2_ARM64_PAGE_SIZE_64K and seeing if anything breaks on 4K systems. If anything does, it should be clearly identifiable as a uClibc bug (and if you don't use uClibc I don't expect anything to break).

@brandond
Member

brandond commented Apr 23, 2023

> If you are making bad runtime assumptions about the page size, you need to fix that. We've already gotten several projects to fix bugs like these, but they only tend to happen in stuff like allocators and apps doing silly stuff with mmap (e.g. emacs).

As you noted, the issue is 100% with the buildroot binaries. "We" the K3s project don't make any assumptions anywhere in the code that we maintain.

> they should probably get rid of that config option entirely and replace the few references to PAGE_SIZE internally with sysconf(_SC_PAGESIZE). Someone should probably file a uClibc bug for that, since compile-time hardcoded page size is not a thing on ARM64 (you will not find PAGE_SIZE defined on glibc arm64 systems).

Have you reported this to buildroot or uclibc? We've not changed anything on our side, including the buildroot page size selection. All we did on our side was update the buildroot release, which introduced what is clearly a dependency on a specific kernel page size.

> All ARM64 binaries for Linux are built by default with support for all standard page sizes (4K, 16K, 64K). This is the case for every major distribution.

Except for buildroot uclibc static binaries apparently?

@victorhooi
Author

victorhooi commented Apr 23, 2023

This seems like something we could easily check.

I do have access to both a Mac Mini M1, and also a Raspberry Pi 4 Model B - I am happy to run some tests here on the hardware I do have, if that would help?

Does anybody happen to have a recent build of k3s with 4K/16K/64K support, that I could download and try please?

I think the change to add BR2_ARM64_PAGE_SIZE_4K=y is actually a fairly recent change (November 2022) - the configuration change was actually bundled in together with a coreutils update:

k3s-io/k3s-root@48b49ad

Prior to this, it seems none of the page size options were defined (BR2_ARM64_PAGE_SIZE_4K, BR2_ARM64_PAGE_SIZE_16K and BR2_ARM64_PAGE_SIZE_64K were simply unset) - so I assume it would have been the default.

Also, sorry if this is a basic question, but I'm not clear on the relationship between k3s-root and k3s. How does k3s-root get built when you build k3s?

Anyway, if anybody is able to give me e.g. k3s 1.26 with the 64K alignment (or instructions on how I can do this - my default machine is macOS, but I can get access to a Linux box if I need to build this myself), let me know.

@brandond
Member

brandond commented Apr 23, 2023

K3s bundles statically linked uclibc user space tools from k3s-root, which is basically just a set of preconfigured buildroot configs for our supported architectures. We don't build it as part of k3s, we just download the tarball release artifacts from that repo.

K3s itself works fine regardless of page size. It's the buildroot user space binaries that crash if the kernel page size doesn't match what buildroot was configured with. You'd have to build a reconfigured k3s-root tarball using the build scripts in that repo, and then build k3s with that. Or just experiment with the binaries directly, without involving k3s itself.

It looks like you're already poking around in that repo, it should be pretty easy for you to figure out how to build a modified tarball and compare that to our current configuration.

@brandond
Member

If all it takes is building for 64k page size, and it doesn't regress on any other aarch64 platforms or distros with smaller page sizes, that'd be great. I haven't personally tried it, as our initial investigation suggested that these alternative page sizes were fairly niche, and had fallen out of favor with most distros.

@marcan

marcan commented Apr 23, 2023

Right, so the thing is BR2_ARM64_PAGE_SIZE_4K sets the linker maximum page size to 4K, while BR2_ARM64_PAGE_SIZE_64K sets the maximum page size to 64K. That means (as far as the linker is concerned) binaries built "for 64K" will run on any page size. This is the default config on every major distro (64K section alignment), which is why their binaries work everywhere.

The only question is the uClibc stuff, and it's unclear whether it matters at all (they should get rid of the page size selection, but that doesn't mean it won't actually work built for 64K). So the thing to do is just build with BR2_ARM64_PAGE_SIZE_64K. If the tools work as expected on a 4K system (it would be nice to at least do a cursory test of the basic stuff) then you're done. If you run into any problems, then they should point us directly at hardcoded page size assumptions in uClibc and we can fix those.

In general most hardcoded page size assumptions aren't inherently a problem as long as they are defined as "min possible" or "max possible" correctly. Allocators built for hardcoded 64K pages will work on smaller page sizes, as will code that needs page size-sized buffers. But code making alignment checks needs to check against a hardcoded 4K minimum (of course it's ideal to use the real page size in all cases anyway, but hardcoding min/max here works if done right). From a quick look at uClibc I only saw one instance of the latter and it was using a #define that was seemingly not set for ARM64 and defaulting to 4096, so that one wouldn't be a problem, but I didn't check the rest of the hits exhaustively. It should be easier to just try it and see if it breaks first, then worry about it if it does.
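The min/max rule can be sketched as follows. The constants and function names are hypothetical and only illustrate the idea: size and pad against the largest possible page, but check alignment against the smallest.

```python
MIN_PAGE_SIZE = 4 * 1024   # smallest ARM64 page size: use for alignment checks
MAX_PAGE_SIZE = 64 * 1024  # largest ARM64 page size: use for sizing and padding


def could_be_page_aligned(address: int) -> bool:
    """Alignment check that holds on every kernel: test against the minimum page size."""
    return address % MIN_PAGE_SIZE == 0


def padded_length(length: int) -> int:
    """Round up to MAX_PAGE_SIZE, which is a whole number of pages at 4K, 16K and 64K."""
    return -(-length // MAX_PAGE_SIZE) * MAX_PAGE_SIZE
```

Swapping the two constants is what breaks things: an alignment check against 64K rejects perfectly valid mappings on a 4K kernel, while a buffer padded only to 4K is not page-sized on a 16K or 64K kernel.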

TL;DR this is mostly just an issue of buildroot being really confusing with their page size selection, probably because the major use case for buildroot is building for specific embedded systems (where you get to pick and you know the page size), and nobody documented that the correct default is "64K" if you want binaries that will run on any system (even though 4K is the most popular page size).

@marcan

marcan commented Apr 23, 2023

AIUI the buildroot stuff is only used for bundled binaries, right? That is, uClibc isn't being offered as a build environment for any third-party code? I'd be more worried about uClibc defining PAGE_SIZE on arm64 (which is wrong since it tells consumers that the page size is fixed) if there could be arbitrary code using it, but if all you're doing is compiling a limited set of tools against it, then if it all works anyway there isn't much left to worry about.

@victorhooi
Author

I've cloned k3s-root, and changed the arm64config file as below:

$ git diff
diff --git a/buildroot/arm64config b/buildroot/arm64config
index 51c57cf..e578d0c 100644
--- a/buildroot/arm64config
+++ b/buildroot/arm64config
@@ -97,6 +97,7 @@ BR2_cortex_a53=y
 # BR2_ARM_FPU_VFPV4 is not set
 # BR2_ARM_FPU_VFPV4D16 is not set
 BR2_ARM_FPU_FP_ARMV8=y
-BR2_ARM64_PAGE_SIZE_4K=y
+#BR2_ARM64_PAGE_SIZE_4K=y
 # BR2_ARM64_PAGE_SIZE_64K is not set
-BR2_ARM64_PAGE_SIZE="4K"
+BR2_ARM64_PAGE_SIZE_64K=y
+BR2_ARM64_PAGE_SIZE="64K"

I then built it for ARM64 with make ARCH=arm64.

During building, I did see this reference to BR2_ARM64_PAGE_SIZE_64K:

+ for PATCH in /source/patches/*.patch
+ patch -t -p1 -i /source/patches/0100-package-nfs-utils-fixup.patch
patching file package/nfs-utils/nfs-utils.mk
+ for PATCH in /source/patches/*.patch
+ patch -t -p1 -i /source/patches/0100-package-slirp4netns.patch
patching file package/Config.in
Hunk #1 succeeded at 1936 with fuzz 2 (offset 109 lines).
+ popd
~/scripts
+ cat /source/buildroot/arm64config /source/buildroot/config
+ ./build
+ pushd /usr/src/buildroot
/usr/src/buildroot ~/scripts
+ V=1
+ unset VERBOSE
+ make -s V=1 olddefconfig
/usr/src/buildroot/.config:102:warning: override: reassigning to symbol BR2_ARM64_PAGE_SIZE_64K
#
# configuration written to /usr/src/buildroot/.config
#
+ make -s V=1
>>> host-skeleton  Extracting
>>> host-skeleton  Patching
>>> host-skeleton  Configuring
>>> host-skeleton  Building
>>> host-skeleton  Installing to host directory
>>> host-ccache 3.7.12 Downloading

but not sure if it's actually an issue or not.
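A plausible reading of that warning, assuming buildroot's Kconfig treats a `# SYMBOL is not set` line as an assignment in the way Linux's Kconfig does: the edited arm64config still contains `# BR2_ARM64_PAGE_SIZE_64K is not set` two lines before `BR2_ARM64_PAGE_SIZE_64K=y`, so the symbol is assigned twice and the later value is the one that takes effect. The sketch below (the function name is made up) lists symbols that are assigned more than once in a config fragment; the fragment is copied from the diff above.

```python
import re

ASSIGN = re.compile(r"^(BR2_\w+)=")
NOT_SET = re.compile(r"^# (BR2_\w+) is not set$")


def reassigned_symbols(config_text: str) -> dict[str, list[int]]:
    """Map each symbol that is set more than once to the 1-based lines that set it."""
    seen: dict[str, list[int]] = {}
    for number, line in enumerate(config_text.splitlines(), start=1):
        match = ASSIGN.match(line) or NOT_SET.match(line)
        if match:
            seen.setdefault(match.group(1), []).append(number)
    return {name: lines for name, lines in seen.items() if len(lines) > 1}


fragment = """\
BR2_ARM_FPU_FP_ARMV8=y
#BR2_ARM64_PAGE_SIZE_4K=y
# BR2_ARM64_PAGE_SIZE_64K is not set
BR2_ARM64_PAGE_SIZE_64K=y
BR2_ARM64_PAGE_SIZE="64K"
"""
print(reassigned_symbols(fragment))
```

Here `BR2_ARM64_PAGE_SIZE_64K` is reported for lines 3 and 4 of the fragment. If this reading is right, deleting the `is not set` line should silence the warning without changing the resulting configuration.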

Now under k3s-root/dist, I have:

  • k3s-root-arm64.tar
  • k3s-root-xtables-arm64.tar

@brandond Do you know which of the two above tarballs I should be using? Secondly, how exactly do I use this when building k3s?

I saw that k3s/scripts/download uses curl to pull it from GitHub - however, is there an easy way to get it to use a local file?

@e3b0c442

@victorhooi I just released a rebuild on my fork https://github.com/e3b0c442/k3s-root/releases/tag/v0.13.0-e3b0c442.6. No shasums as I did a quick port of the CI over to GHA so that I didn't have to stand up my own Drone box, but I'm also in the process of trying to test this change as I am also using RHEL8 on ARM64 and looking to expand my k3s footprint in that direction.

@brandond
Member

brandond commented Apr 24, 2023

Reopening to track potentially moving k3s-root over to 64k page size if it is confirmed not to regress hosts with 4k pages.

@brandond brandond reopened this Apr 24, 2023
brandond added a commit to brandond/k3s-root that referenced this issue Apr 24, 2023
This should work on kernels with 64k pages, while also working on nodes with 4k or 16k page size.

Ref: k3s-io/k3s#7335

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
brandond added a commit to k3s-io/k3s-root that referenced this issue Apr 25, 2023
This should work on kernels with 64k pages, while also working on nodes with 4k or 16k page size.

Ref: k3s-io/k3s#7335

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
@e3b0c442

e3b0c442 commented Apr 26, 2023

I was able to rebuild k3s against my compiled k3s-root tonight and successfully tested the failed case from #6708 on AWS a1 instances, one running AlmaLinux 8 and one running AlmaLinux 9. Core binaries do not exhibit segfaults on either.

I should have time tomorrow to get Alma 8 spun up on a spare RPi to test there as well.

//edit: and now I see I'm behind the curve :)

@brandond
Member

brandond commented Apr 27, 2023

Can one of y'all on a weird page size test K3s with the new root:
curl -ksL get.k3s.io | INSTALL_K3S_COMMIT=087bddb571826fed4093faf0cb11e8d5e081e39f sh -s -

@e3b0c442

e3b0c442 commented Apr 28, 2023

I have validated this on AlmaLinux 8 (64K page size) and AlmaLinux 9 (4K page size) on a1 AWS instances. @victorhooi would you be able to test on Asahi? I don't have any M1 hardware I can currently spare for Linux.

@drlamb

drlamb commented Apr 28, 2023

k3s starts here on my m1 mini (Running Fedora Asahi Remix) with the provided commit.

[drlamb@trashcan ~]$ getconf PAGESIZE
16384
[drlamb@trashcan ~]$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                     READY   STATUS      RESTARTS   AGE
kube-system   coredns-77ccd57875-2rhbv                 1/1     Running     0          79s
kube-system   local-path-provisioner-957fdf8bc-46l67   1/1     Running     0          79s
kube-system   helm-install-traefik-crd-qkp87           0/1     Completed   0          80s
kube-system   svclb-traefik-2a58c0cf-s4chr             2/2     Running     0          73s
kube-system   helm-install-traefik-vt4pb               0/1     Completed   1          80s
kube-system   traefik-84745cf649-k9656                 1/1     Running     0          73s
kube-system   metrics-server-54dc485875-vmkqj          1/1     Running     0          79s

@marcan

marcan commented Apr 28, 2023

Awesome!

Gentle reminder to remember to update the docs (which still claim 4K is required) :)

@brandond brandond moved this from Done Issue to To Test in K3s Development Apr 28, 2023
@brandond brandond added this to the v1.27.2+k3s1 milestone Apr 28, 2023
@e3b0c442

Any chance we can get this in the next 1.26 release as well? Many thanks!

@brandond
Member

Yes, everything gets backported to active branches.

@rancher-max
Contributor

Thank you @e3b0c442 and @drlamb for validating this fix! I don't see this causing any regressions, but if it does, it will be caught in general issue and patch validation testing, so I am going to close this out as "validated by community" ❤️ It will be available for general use in the May patch releases.
