
proposal for build multi arch images #159

Closed
wants to merge 6 commits into from

Conversation

yuzp1996
Contributor

No description provided.

change the number info

Signed-off-by: zpyu <zpyu@alauda.io>

3. When the COPY instruction is executed in the Dockerfile, the binary path can be derived from the ARCH variable, e.g. /make/photon/core/binary/amd64/core

4. Use docker buildx instead of docker build to build all the images, determining the target architecture from the specified ARCH parameter
Contributor Author

Maybe we don't need to use docker buildx upstream.
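Step 3 above can be sketched as follows; the directory layout is taken from the proposal, the default value is illustrative, and the commented buildx call shows the step-4 variant that, as noted, upstream might not need:

```shell
# Derive the per-architecture binary path handed to the Dockerfile COPY step
# from the ARCH variable (layout per the proposal, amd64 as the default).
ARCH="${ARCH:-amd64}"
BINARY="/make/photon/core/binary/${ARCH}/core"
echo "$BINARY"   # prints /make/photon/core/binary/amd64/core when ARCH is unset
# Step 4 would then pass the same variable to the image build, e.g.:
#   docker buildx build --platform "linux/${ARCH}" --build-arg ARCH="${ARCH}" .
```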


2. Add a file named adapter.sh, responsible for making changes in areas that upstream changes cannot cover

3. Add a Makefile (which can be called from GitHub Actions) to perform tasks such as overwriting adapter.sh
Contributor Author

Maybe there will be many Dockerfiles.


4. Use docker buildx instead of docker build to build all the images, determining the target architecture from the specified ARCH parameter

5. Determine the generated version name from the environment variable ARCH: if it is not amd64, append the architecture name to the version number, for example v2.0.1-arm64
Contributor Author

We don't need to do this.

Maintain the build and test of images for a single architecture

#### Change


Contributor Author

The upstream README can link to there.


However, there are currently the following issues that need to be resolved:

1. During testing, docker-compose will be used to deploy Harbor and run tests, but docker-compose seems to [only provide x86 binaries](https://github.com/docker/compose/releases), so we would need to compile other versions ourselves.
Contributor Author

try pip install docker-compose


1. During testing, docker-compose will be used to deploy Harbor and run tests, but docker-compose seems to [only provide x86 binaries](https://github.com/docker/compose/releases), so we would need to compile other versions ourselves.

2. Images of different architectures may not be able to run on the runners provided by GitHub Actions. [Self-hosted runners](https://docs.github.com/en/actions/hosting-your-own-runners) should be used to ensure that images of different architectures can run tests.
Contributor Author

This is a blocking issue.

Member

FWIW, Travis has support for a few alternative architectures, but the product quality/reliability has seen a steady decline since their acquisition.

We might be able to use the binfmt-based user-mode QEMU emulation support for testing, but it has traditionally struggled with supporting Go binaries, so might not be sufficient for real testing.

Alternatively, now that Harbor is part of the CNCF sandbox, we might be able to convince them to sponsor something like an Amazon A1 instance for a self-hosted arm64 runner.

Contributor Author

Thanks for your advice.

We may solve this problem through the third option you mentioned.


We can provide support for different architectures in separate Harbor sub-projects.

For example, in the harbor-arm repo we can maintain the build logic and other related information for arm64 images
Member

Why would it need to be in separate repositories?

(I've built and run Harbor on arm64 successfully, and it didn't require any code changes -- just a base image that was supported, as seen in goharbor/harbor#13788)

I think the PostgreSQL major version upgrade problem we discussed over there in goharbor/harbor#13788 (comment) is very relevant to this work -- without some way to upgrade between major versions of PostgreSQL, this is going to get very complicated very quickly (and is the reason that PR got more complicated and added a separate Dockerfile for arm64 - it has to upgrade to Photon 3.0 to get arm64 support but that also comes with newer PostgreSQL which will break existing installs). See docker-library/postgres#37 for a really long discussion around this particular problem and why it's so complicated to solve nicely.

Contributor Author

We discuss major version upgrades of PostgreSQL at a meeting every Wednesday at 5 pm. You are welcome to join; the meeting link is https://VMware.zoom.com.cn/j/94528344426?pwd=aVpmMWtweHNXenRBOFhQNjgvQlZ0UT09

Member

Oh, nice! Which timezone is that in?
(it looks like maybe today's already happened? I'm in Las Vegas, so even listening in on today's community call at 6am was a bit rough for me 😬)

Contributor Author

@yuzp1996 yuzp1996 Apr 21, 2021

Sorry, I didn't realize this. 😂 We are in Beijing. We discussed your mention of PostgreSQL in the comments. As far as I know, the Harbor team will provide a solution to this problem within two weeks.

But we are not sure whether we will encounter the same problem when building the arm64 image based on Photon 4.0; we need to test this.

If the Harbor team can provide a solution within two weeks, this may no longer be a problem.

Member

Yeah, fair enough -- if upstream stays on photon 2.0 and downstream is on photon 4.0, neither will have an issue today, but it's one more delta between upstream and downstream, and it will be a serious issue for both eventually (photon 2.0 won't be maintained forever, and even it is, the PostgreSQL version in it won't be 😅).

Contributor Author

Sorry! I am not sure I understand the delta between upstream and downstream that you mentioned. 😭

The base images upstream have been upgraded to Photon 4.0 in this PR, which has been merged to master:

https://github.com/goharbor/harbor/pull/14551/files

I think we will develop based on the upstream master branch. The arm images will then use the latest PostgreSQL version in Photon 4.0, so maybe we don't need to worry about the major-version problem for now?

If I have misunderstood your question, please feel free to correct me.

Thank you very much! ☺️☺️☺️

Member

Ohhhh, I hadn't seen goharbor/harbor#14551 yet 😬🙈

That's definitely going to be a problem for upgrades unless the PostgreSQL upgrade problem is solved. 😬😬😬


Upstream: the GitHub repository goharbor/harbor

Downstream: GitHub repositories like harbor-arm, harbor-loongson, and so on
Member

Again, not understanding the need for these to be separate? If they're treated as downstreams, they'll always be second-class citizens. If they require code changes, the ideal would be to get those upstream, and have those tested upstream too.

Contributor Author

@yuzp1996 yuzp1996 Apr 21, 2021

The solution you mentioned is the ideal one. However, the Harbor team has no plans to add multi-architecture support upstream this year, because the workload would be huge and could easily delay releases.
The current solution is a relatively fast approach; we've weighed the options and decided to go this way.

Member

Maybe a reasonable middle-ground would be adding a GitHub Action upstream that just verifies that the code still compiles successfully for any architectures that get built/tested downstream? I think Go's cross-compilation support is probably enough to accomplish that without QEMU or any major slowdown, and it would help make sure that the downstream project is more likely to have success.
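The compile-only check suggested above could iterate the downstream architectures using Go's native cross-compilation, with no QEMU involved. A minimal sketch, shown as a dry run that only prints the commands a CI matrix would execute (the package path and architecture list are illustrative):

```shell
# For each architecture built downstream, verify upstream code still
# cross-compiles; GOOS/GOARCH are all Go needs for this.
for arch in arm64 arm ppc64le; do
  echo "GOOS=linux GOARCH=${arch} go build ./src/core"
done
```

In a real GitHub Action the echo would be dropped and a failing `go build` would fail the check.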


1. Add an environment variable ARCH, defaulting to amd64, which can be overridden at invocation time

2. When executing make compile to generate binaries, it can produce binaries for different architectures according to the environment variable ARCH
Member

Given that this is mostly just Go code, shouldn't this just use the standard GOARCH variable instead of having a translation have to happen between ARCH and GOARCH?

If we want to support arm32, we'll also need to think about how we handle setting GOARM appropriately (to 6 for RPi0/1, and to 7 for almost everything else that's meaningful to deploy something like Harbor to).

Contributor Author

good advice
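Using Go's own variables as suggested, the translation step this would avoid looks roughly like the following sketch; the arm32 spellings of ARCH are assumptions, and the GOARM values are the ones named in the comment above:

```shell
# Map a build-level ARCH variable onto Go's GOARCH/GOARM settings.
ARCH="${ARCH:-amd64}"
case "$ARCH" in
  arm32v6) GOARCH=arm; GOARM=6 ;;   # RPi 0/1
  arm32v7) GOARCH=arm; GOARM=7 ;;   # most other deployable 32-bit arm hardware
  *)       GOARCH="$ARCH"; GOARM="" ;;
esac
echo "GOARCH=${GOARCH} GOARM=${GOARM}"
```

Standardizing on GOARCH directly would make the first two cases the only ones needing any special handling.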


2. When executing make compile to generate binaries, it can produce binaries for different architectures according to the environment variable ARCH

3. Add a file named adapter.sh whose content is empty. Before executing the compile and build commands, call the adapter.sh script. This shell script can be overwritten by downstream users, who can put their own modifications in it, such as overriding the addresses of third-party dependencies or replacing a Dockerfile.
Member

This is really strange IMO -- Go has top-notch, first-class support for multiple architectures and cross-compiling, so it really feels like a backwards approach to treat them as secondary and even downstream.

Contributor Author

Upstream will be responsible for compiling binaries for the different architectures; downstream will not.

Downstream will use the mechanism upstream provides to get the binaries for each architecture.


#### Change

1. Introduce the harbor repository into the downstream repository through git submodules
Member

It might be personal, but I've never had anything but a bad time using Git Submodules, especially at larger scales (which I think is another strong argument against having other architectures support be totally separated from the core product code).

One concrete issue is that they're tied to a specific commit without any real history or notation of intent, so it becomes difficult to determine which branch that commit comes from and why that branch was chosen, for example.


Maybe we can suggest a better way to achieve the same goals?

I believe the requirement here is to build downstream repos when upstream has the following changes:

  • every master commit
  • every release-* commit/tag

Using GitHub Actions, we want to be able to compile the core commit and use the downstream project's scripts (Makefile, Dockerfile, etc.)

@tianon any suggestion on how we can achieve that using GitHub Actions would be of great help

Member

From what I can find, the only way to get the downstreams to build automatically whenever there's a commit on the upstream is to add something in the upstream's GitHub actions with an access token for the downstream and have it use that token to explicitly trigger an event like workflow_dispatch or repository_dispatch on the downstream repository directly. The other option would be something like a scheduled action watching for changes, or maintaining some kind of external infrastructure that listens to webhooks and does the triggering (but in that case it seems much more reliable to just use GitHub Actions on the main repository for that).

No matter how this downstream repository is consuming the upstream code, it's going to be complicated to coordinate -- I imagine each release of Harbor is going to require slightly different downstream code (like the addition of the new metrics service in 2.2, for example), so I'm not sure what the best solution for mapping those and keeping them up to date is going to be. 😬
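The token-based trigger described above would amount to an upstream workflow step along these lines, shown as a dry run that prints the API call rather than performing it; the downstream repo name, event type, and secret name are assumptions:

```shell
# Fire a repository_dispatch event at the downstream repo using a token
# (stored as an upstream secret) that has access to the downstream.
DOWNSTREAM="goharbor/harbor-arm"
EVENT='{"event_type":"upstream-master-push"}'
echo curl -X POST \
  -H "Authorization: token \$DOWNSTREAM_TOKEN" \
  "https://api.github.com/repos/${DOWNSTREAM}/dispatches" \
  -d "$EVENT"
```

The downstream workflow would then declare `on: repository_dispatch` to react to the event.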




2. When executing make compile to generate binaries, it can produce binaries for different architectures according to the environment variable ARCH

3. Add a file named adapter.sh whose content is empty. Before executing the compile and build commands, call the adapter.sh script. This shell script can be overwritten by downstream users, who can put their own modifications in it, such as overriding the addresses of third-party dependencies or replacing a Dockerfile.
Contributor

I don't quite understand why we need to put this upstream. Could you give a concrete example as to what this script may look like and what it does?

Contributor Author

The file will not perform any operations upstream. It is just an empty file, a placeholder that downstream repositories overwrite.

@danielfbm danielfbm Apr 21, 2021

I believe what @tianon and @reasonerjt mean is that this script should not be called in the upstream at all. Since the downstream repository is responsible for executing the whole build process, it is enough for the Makefile to have different instructions for compiling Go binaries and building Docker images. Downstream projects can then control the flow and decide whether or not to call a script.

Contributor Author

No problem, I agree with this.

2. When executing make compile to generate binaries, it can produce binaries for different architectures according to the environment variable ARCH

3. Add a file named adapter.sh whose content is empty. Before executing the compile and build commands, call the adapter.sh script. This shell script can be overwritten by downstream users, who can put their own modifications in it, such as overriding the addresses of third-party dependencies or replacing a Dockerfile.

Contributor

Are there any considerations as to what upstream should be careful with, in order not to break the downstream workflows?

Contributor Author

There is nothing upstream needs to care about, because it is an empty file that the downstream repository overwrites; downstream will care about its content.

The upstream repository just needs to call this shell file before executing the compilation and build tasks in the Makefile, to ensure that the script is executed.
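A minimal demonstration of the hook as described: upstream ships adapter.sh empty and sources it before compiling, so a downstream overwrite takes effect with no other upstream change. The override contents here are purely illustrative:

```shell
# Simulate a downstream repo overwriting the placeholder adapter.sh.
cat > adapter.sh <<'EOF'
# downstream override, e.g. point the build at a dependency mirror
DEP_MIRROR=https://mirror.example.internal
EOF

# Upstream's build sources the hook before its compile/build steps.
. ./adapter.sh
echo "DEP_MIRROR=${DEP_MIRROR}"
```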


I believe that the focus of the question here is not the script, but any general changes in the core project, for example:

  • Adding or removing components
  • Changing base images
  • Changing Dockerfiles in general
  • Other environmental dependency changes


2. Upstream changed the build dependency package.

3. There are other situations...


A few other things to consider:

  • Changes in architecture: removing or adding new components?
  • Any compatibility-breaking changes

In order to provide a **short-term** and **long-term** process to introduce and iterate a new architecture into the Harbor ecosystem, a maturity model is proposed below

### Overview


Maybe add a table describing an overview of the maturity model? For example:

| Level | Description | Goals | Challenges |
| --- | --- | --- | --- |
| Dev | Introduce a new architecture into the ecosystem as an independent repo and images | Build the upstream's master branch code; introduce development practices | Build platform and code; upstream changes trigger downstream build |
| Test | Uses Harbor's standard testing procedures to test the new architecture's version | Validate the new architecture's artifacts, making sure they conform to Harbor's standard test suite; ensure upstream master code has at least a daily build and test | Infrastructure to deploy and run tests |
| Alpha Release | Onboard the architecture's first release together with Harbor's release | Make sure release candidates and other release builds on upstream can execute successfully on downstream; run release tests in the new architecture | Sync with Harbor release timing |
| Integration | Integrate the new architecture within upstream code (only available for mainstream architectures) | Enable upstream to integrate build methods; enable master build and PR build checks | Add build process on upstream without affecting contributors' and maintainers' experience; active maintenance on upstream |
| Beta Release | First architecture release after integration | Enable official upstream release with the new architecture; offer an official multi-architecture tag for users | Make sure release time and processes will not be affected by the new architecture |
| GA Release | After the first beta release, any other releases will be considered GA for the new architecture | Offline installer integration | Offline code changes; offline installer build process and architecture |


Long-term: harbor-arm keeps synchronized with goharbor/harbor. When goharbor/harbor releases a new version, harbor-arm follows the update and pushes the images to the image registry, providing users with multi-architecture support.

#### Jobs For Upstream


Upstream changes?


yes


3. Add a Makefile (which can be called from GitHub Actions) to perform tasks such as overwriting adapter.sh.

4. Add a Dockerfile if necessary


Jobs for downstream

Could this whole section (besides Risk) be moved and merged into the Dev section?

@Jeremy-boo Jeremy-boo Apr 28, 2021

Sure, I will modify it


- 2. Can Harbor officially provide test solutions and support for the different architecture modes?

- 3. The arm test machine problem.


Test possible blocking items:

    1. Harbor standard conformance testing in the new architecture
    2. Infrastructure for the new architecture (add CNCF code repo link)
    3. Executing deployment and testing automatically, either daily or on each upstream master commit


- 4. Can the PR that adds the GOARCH parameter to goharbor/harbor for multi-architecture image builds be merged into the master branch in version 2.3?

- 5. The Harbor PostgreSQL upgrade incompatibility problem.


Dev possible blocking items:

    1. Confirm the Docker image repository address and credentials inside goharbor's Docker Hub org
    2. Harbor component compatibility issues: PostgreSQL compatibility when changing versions can make an upgrade rollout impossible
    3. Possible necessary upstream Makefile changes
    4. Build process for the new architecture

@CooperLi

Any update on this proposal?
@yuzp1996 @tianon @reasonerjt @danielfbm @Jeremy-boo

Contributor

@wy65701436 wy65701436 left a comment

lgtm

This pull request was closed.