JobSet release for ARM? #237
I'm testing JobSet with an operator, and I saw this error in my operator:
Doh! I forgot that JobSet is probably built for x86!
Would it be possible to have an ARM build / set of manifests?

Comments
heyo! I was wondering if this is something we could look into?
I think it is doable. I believe that Kueue supports both. Looks like it's some work in the Makefile (https://github.com/kubernetes-sigs/kueue/blob/main/Makefile) to tell it which platform to build.
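Roughly, the pattern boils down to parameterizing the platform list that gets passed to buildx. A sketch (image name, builder name, and defaults here are illustrative, not copied from Kueue's actual Makefile):

```sh
# Platform list as an overridable variable, like a Makefile PLATFORMS ?= default.
PLATFORMS="${PLATFORMS:-linux/amd64,linux/arm64}"

docker buildx create --name image-builder --use   # one-time builder setup
docker buildx build \
  --platform "$PLATFORMS" \
  --tag registry.example.com/jobset:dev \
  --push .   # multi-platform builds must be pushed; they can't be loaded into the local daemon
```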
Yes! I've done this for two operators now, if you want another pattern to look at:
Arm is much slower to build, so I only do it on merges. https://github.com/flux-framework/flux-operator/blob/76c4754f90043dfc7afecc16ade053549c89ba18/.github/workflows/build-deploy.yaml#L12-L37
yeah, we can support that, do you want to send a PR that follows the Kueue pattern?
The lab asked me to stop contributing until we have a corporate CLA (I was using an individual one before), so I definitely could, but with some delay for that to finalize (we are pushing fairly hard for it to be signed)! If there is someone else that wants to do it, go ahead; if not, I can once we have that.
/help
@kannon92: Guidelines: Please ensure that the issue body includes answers to the following questions:
For more details on the requirements of such an issue, please see here and ensure that they are met. If this request no longer meets these requirements, the label can be removed. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@vsoch Thanks for the links, very insightful. QQ: Do you also use/publish manifests to group multiple image tags?
I don't publish manifests; I have a similar workflow that just builds an explicit arm tag (for a push to main, or a release with the version) and then generates the equivalent CRD yaml file targeting arm. I didn't want to tangle the builds together or assume they would stay the same (aside from the architecture).
Thanks for the clarification. I'm bringing this up because I'm currently reading about container registry manifests, and I think it would be useful (in the context of this project) to push image tags for each of the supported platforms, plus a manifest mapping the images to their respective platforms. That way users only have to specify the same tag, regardless of platform, and it's up to the container runtime to pull the right image. I'd like to hear what folks think about this solution/approach. Also happy to contribute and help support this. A rough sketch of the idea is below.
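Roughly, what I have in mind is something like this (tag names and the registry are purely illustrative, not an existing JobSet naming scheme):

```sh
# Group the per-platform tags under one manifest list.
docker manifest create registry.example.com/jobset:v0.3.0 \
  registry.example.com/jobset:v0.3.0-amd64 \
  registry.example.com/jobset:v0.3.0-arm64

# Annotate an entry if its platform metadata isn't already set correctly.
docker manifest annotate --os linux --arch arm64 \
  registry.example.com/jobset:v0.3.0 registry.example.com/jobset:v0.3.0-arm64

# Push the manifest list; clients pulling :v0.3.0 get the image for their platform.
docker manifest push registry.example.com/jobset:v0.3.0
```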
I've developed container registries and contributed to OCI, so I'm familiar. My reasons are primarily the drastic differences in build times. The typical approach is to use this action to build and push both at once (same manifest), and that's not reasonable to do within the time of a PR: https://drpdishant.medium.com/multi-arch-images-with-docker-buildx-and-qemu-141e0b6161e7. That said, it would work fine to do both like that on merge to main and on release, and just one for a PR. It's up to you how you want to maintain your images and CRDs; I want to keep them separate. I don't think there is a right choice, just different preferences.
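The approach in that article boils down to registering QEMU emulators and letting buildx produce both images under one tag. A rough sketch of the "both on merge, one on PR" split (the binfmt installer image is a common community choice, and the EVENT variable is just an illustration of what CI would set):

```sh
# Register QEMU handlers so the amd64 host can run the arm64 build stage.
docker run --privileged --rm tonistiigi/binfmt --install arm64

# On a PR, build only the native platform; on merge/release, build both.
if [ "$EVENT" = "pull_request" ]; then
  PLATFORMS=linux/amd64
else
  PLATFORMS=linux/amd64,linux/arm64
fi

docker buildx build --platform "$PLATFORMS" \
  --tag registry.example.com/jobset:main --push .
```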
Thanks for clarifying again! I have a few questions to complete the missing pieces in my mind:
Maybe I introduced some confusion with my initial message, but the manifest I'm talking about is the "container registry manifest" (https://docs.docker.com/engine/reference/commandline/manifest/), not the released CRD yaml manifests.
A container image for a controller is directly tied to a CRD. If the metadata is off or wrong there, it won't install properly. E.g., here is the container reference for my CRD that the user applies (that single file) to do the entire deployment. https://github.com/flux-framework/flux-operator/blob/76c4754f90043dfc7afecc16ade053549c89ba18/examples/dist/flux-operator.yaml#L1398. The chart is just a more separated variant of that, where the version (or tag) is filled with a variable, e.g., here is the same. https://github.com/flux-framework/flux-operator/blob/76c4754f90043dfc7afecc16ade053549c89ba18/chart/templates/deployment.yaml#L58
If you use the buildx action and give it two architectures, it's going to take (I think) the sum of both, or at best, the time of the slower one (if it can somehow build in parallel without losing efficiency). In my operator's case, the difference in times is 6 minutes for x86 and 33 for arm. And (in the scope of things I've built with QEMU and arm) that is really speedy; if you have to do more compilation it can take hours (or go outside of action limits). Here are the two paired, which happens on merge to main / releases. https://github.com/flux-framework/flux-operator/actions/runs/6137092811
No, I understand. You want to pull one digest/tag and have the registry choose the platform. That happens by packaging them together into the same OCI image manifest. That is separate from the CRD yaml manifests (which I tried to distinguish by name). My other point is that the second set of manifests (for the operator) can have subtle differences that would make sharing one OCI image manifest across platforms potentially problematic. I don't have a good example yet, but a simple one might be that some of the clouds expect a node selector for arm, which we'd need in one CRD manifest (for arm) but not the other.
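For illustration, inspecting a multi-arch tag (the name here is hypothetical) shows one entry per platform, and the runtime resolves the matching digest automatically at pull time:

```sh
docker manifest inspect registry.example.com/jobset:v0.3.0
# The output lists "manifests" entries, each with a digest and a
# "platform" object like {"architecture": "amd64", "os": "linux"}
# or {"architecture": "arm64", "os": "linux"}.
```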
To be explicitly clear, you can find a good strategy to put both platforms in the same manifest and allow the OCI registry to choose. The complexity, I think, is ensuring that the builds happen in a reasonable time in testing, and if you can do that, ensuring that (forever) there are no differences in the operator manifests that would make having just one the wrong approach. As I already said, I don't think there is a right answer; it's a choice depending on the needs / use cases for your operator and your choices for CI.
I understand that CRDs can be tightly coupled with the controller version (e.g. when it comes to controller reconciliation logic). Is it accurate to say that the main issue revolves around ensuring compatibility between CRDs (in terms of versioning) and container images, preventing them from diverging? Additionally, in the case of this project, could there be scenarios where different platforms require distinct "CRD yaml definitions" to accommodate platform-specific requirements?
Yes, that's why I have two :) And as I mentioned:
Reviving this issue since I think it would be useful to do this, if anyone wants to give it a shot :)