
Provide linux/arm64 architecture support with an open-source chain of images (eg. forem/ruby) #19626

Open
klardotsh opened this issue Jun 22, 2023 Discussed in #16765 · 7 comments · May be fixed by #20848
Assignees
Labels
internal team only: internal tasks only for Forem team members

Comments

@klardotsh
Contributor

Discussed in #16765

Originally posted by RedstoneWizard08 March 2, 2022

Description

Add support for running the Forem Docker image on arm64 CPUs, including Apple M1 Macs, general aarch64 machines (linux/arm64), and Raspberry Pis, while continuing to support amd64 as well.

Potential Steps

  • Update the Forem ruby image for arm64 and amd64 (and please make it open-source); Dockerfile is below.
  • Possibly a Forem AIO image (though the regular one is preferable; just add arm64 support).
  • Update the Forem docker image to support both arm64 and amd64 architectures.

Notes

  • To determine the architecture, use something like this in a Dockerfile (note that a variable set via `RUN export` persists only for that single `RUN` instruction):
RUN export ARCH=$(uname -m)
  • Stuff is still WIP.
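Mapping `uname -m` machine names onto Docker's architecture names can be sketched in plain shell like this (the `arch_for` helper and the extra `armv7l` case are illustrative, not taken from the linked repo):

```sh
#!/bin/sh
# Map `uname -m` machine names onto Docker platform architecture names.
# `arch_for` is a hypothetical helper for illustration only.
arch_for() {
  case "$1" in
    x86_64)          echo "amd64" ;;
    aarch64 | arm64) echo "arm64" ;;
    armv7l)          echo "arm/v7" ;;
    *) echo "unsupported machine: $1" >&2; return 1 ;;
  esac
}

arch_for "$(uname -m)"
```

BuildKit-based builds can avoid this kind of detection entirely via the automatic `TARGETARCH` build argument.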

Files

See here: https://github.com/RedstoneWizard08/Forem-Multiarch-Docker

klardotsh self-assigned this Jun 22, 2023
@github-actions
Contributor

Thanks for the issue; we will take it into consideration! Our team of engineers is busy working on many types of features, so please give us time to get back to you.

To our amazing contributors: issues labeled bug are always up for grabs, but for feature requests, please wait until we add a ready for dev label before starting to work on it.

If this is a feature request from an external contributor (not core team at Forem), please close the issue and re-post via GitHub Discussions.

To claim an issue to work on, please leave a comment. If you've claimed the issue and need help, please ping @forem-team. The OSS Community Manager or the engineers on OSS rotation will follow up.

For full info on how to contribute, please check out our contributors guide.

@RedstoneWizard08

It has been way too long since I've worked on this lol. Didn't expect this to become a real thing xD.

@RedstoneWizard08

Ok I'm working on this again because I want this too and I want to help :)

mirie added the internal team only label Jun 23, 2023
@klardotsh
Contributor Author

Hey @RedstoneWizard08, thanks for the draft pull request and for reviving the branch - we really appreciate the enthusiasm! That said, we've already started this work internally and are doing final passes of testing on a fully revised image that (1) doesn't use Fedora as a base at all (making this draft hard to reconcile with the ongoing work) and (2) comes with ARM64 support out of the box. In general, we attach a Ready For Dev label to things that are ready for external contributors; in this case, we've attached Forem Team, indicating we're tackling this one in-house. I converted the discussion to a ticket mostly for my own in-house tracking purposes, and this refs #19603, where you'll find further context about what brought this work onto my radar.

Again, thanks so much for opening a draft PR with the branch holding the work you'd already done, but no need to stress yourself (or burn your nights and weekends, or whatever) on this any further - I'm tackling this one as part of my day job here at Forem :)

For future such issues, look for that Ready For Dev label: those are open season and not things we're actively tackling (at that time) in-house, and I'm sure at some point, some platform and infrastructure related items might show up in that feed!

@RedstoneWizard08

RedstoneWizard08 commented Jun 23, 2023

Okay, I didn't realize that's what that meant.

klardotsh added a commit that referenced this issue Jun 23, 2023
This builds a Debian-derived base image that targets Ruby 3.0.2 (our
current version lock) to replace `quay.io/forem/ruby`, which is based on
Fedora 35 (and as such, lacks ARM64 support).

The building of this image is not yet automated; I intend to add that as
a follow-up some time vaguely soon-ish. This has been built and pushed
to a temporary tag,
[ghcr.io/forem/ruby:klardotsh-test](https://github.com/orgs/forem/packages/container/ruby/103882793?tag=klardotsh-test).
Once this merges, I'll build and push the result in main (pending any
revisions that come up in PR review) with the following incantation,
which will generate a multiarch manifest:

```sh
docker buildx build --platform linux/amd64,linux/arm64 -f Containerfile.base . -t ghcr.io/forem/ruby:latest -t ghcr.io/forem/ruby:$(git rev-parse --short HEAD) --push
```
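For that incantation to work on a single workstation, QEMU emulation and a container-backed buildx builder generally have to be set up first. A rough sketch, assuming a running Docker daemon (the builder name `multiarch` is arbitrary):

```sh
# Register QEMU binfmt handlers so the host can execute foreign-arch
# binaries during the build.
docker run --privileged --rm tonistiigi/binfmt --install all

# Create and select a docker-container builder, which supports
# multi-platform builds and pushing manifest lists.
docker buildx create --name multiarch --driver docker-container --use
docker buildx inspect --bootstrap
```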

This refs, but does not complete, #19626, and is one of several blockers
on the path to getting #19603 merged.
klardotsh added a commit that referenced this issue Jun 24, 2023
This "rebases" our application images off of the new Ruby base image
added in #19632, and fixes numerous problems and quirks with how the
images were built along the way. Notably:

- Issues where layers attempted to delete files in prior layers have
  been resolved (this caused build failures on some Docker filesystem
  drivers, notably overlay2).

- Bundler is no longer allowed to deviate from or modify the lockfile
  (`BUNDLE_FROZEN` is now `true`).

- `git(1)` is no longer required to live inside the container and `.git`
  is no longer required to be copied into the Docker build context, as
  these were only used to calculate `FOREM_BUILD_SHA`, which is now
  passed in as a Build Argument to the container build context.

- The entire source tree is no longer `chmod`-ed in one giant swing, which
  ran so long on my system (as just one example) that I gave up after
  15-20 minutes and issued it a `SIGTERM`. Instead, `COPY --chown` is
  used more heavily and ensures the `APP_USER` will have access to the
  requisite files.

This new container image appears to build successfully for
`linux/arm64`, which refs (but does not complete) #19626. Currently,
such builds aren't automated and must be run on a developer
workstation. For example:

```sh
docker buildx build --platform linux/amd64,linux/arm64 -f Containerfile . -t ghcr.io/forem/forem:klardotsh-test --push --build-arg VCS_REF=$(git rev-parse --short HEAD)
```

In the meantime, the existing `linux/amd64`-only BuildKite scripts have
been updated to allow this PR to merge as a separate unit, and CI
refactors to enable the multiarch builds of `linux/arm64,linux/amd64`
can come later when more time is available.

This is one of several blockers on the path to getting #19603 merged.
The next step in that chronology will be rebasing that work on top of
this work, which *should* be, on the containerization side, as
straightforward as bumping `Containerfile.base` to reference the new
upstream image, rebuilding the base container, and then bumping the
reference in `Containerfile`.
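The `VCS_REF` build-argument and `COPY --chown` changes described above could look roughly like the following Containerfile fragment (the `app:app` user and `/opt/app` path are placeholders, not Forem's actual layout):

```dockerfile
# Hypothetical fragment for illustration, not the actual Containerfile.
ARG VCS_REF
ENV FOREM_BUILD_SHA=$VCS_REF

# Ownership is set at copy time, avoiding a separate whole-tree
# chmod/chown layer afterwards.
COPY --chown=app:app . /opt/app
```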
klardotsh added a commit that referenced this issue Jun 28, 2023
klardotsh added a commit that referenced this issue Jun 28, 2023
klardotsh added a commit that referenced this issue Jun 28, 2023
…e. (#19632)

klardotsh added a commit that referenced this issue Jun 28, 2023
dukegreene pushed a commit that referenced this issue Jun 28, 2023
…e. (#19632)

klardotsh added a commit that referenced this issue Jun 29, 2023
…#19633)

rt4914 pushed a commit that referenced this issue Jun 29, 2023
…#19633)

@klardotsh
Contributor Author

I think this is going to be tricky to deal with: on an AMD 6900HX with 32GB of RAM, the AMD64 images build in ~10 minutes or so (maybe 15?), but I just clocked about 2 hours for the total multiarch manifest, counting the QEMU-based ARM64 cross-compile.

I think actually shipping ARM64 images as part of our CI pipeline is going to require setting up GitHub Actions or BuildKite agents on actual ARM64 instances (perhaps AWS Graviton2), building the image natively on each arch, and then stitching the two images together with a custom manifest in a third CI step. This is a noteworthy bit of infra tinkering to take on, and so I don't think this issue is likely to close fully within the next month: I'll work with @mirie to slot this build infra work in as our bandwidth allows.
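The build-natively-then-stitch flow described here could look roughly like the following (tags are hypothetical; `docker buildx imagetools create` assembles a manifest list from already-pushed per-arch images):

```sh
# Step 1, on an amd64 agent:
docker buildx build --platform linux/amd64 -f Containerfile . \
  -t ghcr.io/forem/forem:sha-amd64 --push

# Step 2, on an arm64 agent (e.g. Graviton2):
docker buildx build --platform linux/arm64 -f Containerfile . \
  -t ghcr.io/forem/forem:sha-arm64 --push

# Step 3, anywhere: stitch the two pushed images into one
# multiarch manifest list.
docker buildx imagetools create -t ghcr.io/forem/forem:sha \
  ghcr.io/forem/forem:sha-amd64 ghcr.io/forem/forem:sha-arm64
```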

In the meantime, I feel reasonably confident in someone's ability to docker build (or podman build, or whatever) our app images on ARM64 boxes themselves now that 1a92880 has landed in main (with the merge of #19633). In other words: ARM64 support is "basically there, I think?" though not fully tested, and no automated builds exist yet to publish pre-built images to the registry. Baby steps!

@RedstoneWizard08

This looks amazing so far! If I may suggest (I don't know if it has been tried here yet): I was able to mitigate the speed problem to some degree using GitHub Actions caching with docker buildx.
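For reference, a buildx invocation using the GitHub Actions cache backend looks roughly like this when run inside an Actions job (the tag is hypothetical; the `gha` cache backend reads the runner's cache-service credentials from the environment):

```sh
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --cache-from type=gha \
  --cache-to type=gha,mode=max \
  -f Containerfile . -t ghcr.io/forem/forem:test --push
```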

@dgadelha dgadelha linked a pull request Apr 8, 2024 that will close this issue