ppc64le helm image build #1533

Merged 1 commit on Sep 19, 2019

clnperez
Contributor

@clnperez clnperez commented Jun 7, 2019

build for ppc64le:

  • the sdk cli
  • the helm operator image

and create a "multi-arch" image for helm

Signed-off-by: Christy Norman <christy@linux.vnet.ibm.com>

Description of the change:

Motivation for the change:

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 7, 2019
@openshift-ci-robot

Hi @clnperez. Thanks for your PR.

I'm waiting for an operator-framework or openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 7, 2019
@clnperez
Contributor Author

clnperez commented Jun 7, 2019

I tested this as much as possible using our internal Travis, but we'll see what happens when it runs here.
🤞

@clnperez
Contributor Author

clnperez commented Jun 7, 2019

Oh, I've also tested out using the cli and the helm image to create and deploy an operator on Power (ppc64le) using https://github.com/IBM/charts/tree/master/stable/ibm-mqadvanced-server-dev.

@joelanford
Member

Cool!

A couple of quick comments.

  1. Unless I'm forgetting something, I don't think we actually have to build ppc64le images on ppc64le machines, since go can cross-compile ppc64le from linux
  2. It doesn't look like there are any changes for the base image in the Dockerfile. Do we have to do something special to pick up the ppc64le version of the registry.access.redhat.com/ubi7/ubi-minimal:latest base image? Maybe that something special is building on ppc64le? 🙂
  3. If we're going to do this for helm, I think we should also do it for the scorecard and ansible base images as well.

@AlexNPavel Any consequences this would have on the transition to prow?

@joelanford joelanford self-assigned this Jun 7, 2019
@AlexNPavel
Contributor

@joelanford If we use go cross-compilation, we should still be fine migrating to prow since we're not testing the ppc64 images, only building them.
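
For illustration, the cross-compilation route discussed here could look roughly like the sketch below. It covers only the CLI binary, not any image build steps. The `bin_arch` and `run` helpers, the `DRY_RUN` switch, and the `./cmd/operator-sdk` package path are assumptions made for this example; the output name follows the `build/operator-sdk-dev-${BIN_ARCH}-linux-gnu` pattern used in this PR's Makefile.

```shell
#!/usr/bin/env bash
# Sketch: cross-compile the CLI for another architecture from a Linux host.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Map a Go architecture name to the suffix used in binary names.
# Assumption: amd64 binaries are named x86_64; other names pass through.
bin_arch() {
  case "$1" in
    amd64) echo "x86_64" ;;
    *)     echo "$1" ;;
  esac
}

# cross_build GOARCH: build build/operator-sdk-dev-<arch>-linux-gnu.
cross_build() {
  local goarch=$1
  local out
  out="build/operator-sdk-dev-$(bin_arch "$goarch")-linux-gnu"
  # The package path ./cmd/operator-sdk is an assumption about the repo layout.
  run env GOOS=linux GOARCH="$goarch" go build -o "$out" ./cmd/operator-sdk
}

cross_build ppc64le
```

Setting `DRY_RUN=0` would run the real `go build`. Only `GOOS` and `GOARCH` are needed for the Go toolchain to target ppc64le.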

@AlexNPavel
Contributor

/ok-to-test

@openshift-ci-robot openshift-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 7, 2019
@joelanford
Member

@AlexNPavel What about the need for ppc64le base images? I see that those exist for ubi7:ubi-minimal, but not sure where/how to inject that. And we may need ppc64le build VMs, if it turns out we can't build the ppc64le images on linux VMs. Do you know if that possibility exists in prow?

@clnperez
Contributor Author

clnperez commented Jun 7, 2019

The UBI images are multi-arch (actually manifest lists) already.

We could cross-compile instead of using the power nodes built into Travis, but then we'd need to know the hash of the ppc64le UBI image (unless they're pushed with an arch-name...which would be nice for this case, and I'm not sure whether they are or who to ask).

@clnperez
Contributor Author

clnperez commented Jun 7, 2019

To answer @joelanford 's other questions:

> Unless I'm forgetting something, I don't think we actually have to build ppc64le images on ppc64le machines, since go can cross-compile ppc64le from linux

You can cross compile, but some of this, IIRC, does some things in docker (like setting permission bits of the cli binary, etc.) that will blow up if you try to pull down a power image and then do anything other than just COPY into it.

> It doesn't look like there are any changes for the base image in the Dockerfile. Do we have to do something special to pick up the ppc64le version of the registry.access.redhat.com/ubi7/ubi-minimal:latest base image? Maybe that something special is building on ppc64le? 🙂

That's the manifest list part that thankfully the redhat registry supports and the creators of that image went ahead and configured.

> If we're going to do this for helm, I think we should also do it for the scorecard and ansible base images as well.

I had that thought as well, but, I was thinking one at a time and also to see how this one goes. :)

And if you want a ppc64le build node for prow, we can get you a free one hosted at OSU since this is an open-source project.

@AlexNPavel
Contributor

@joelanford That's a good point. We might need to stick to travis for the ppc64le image builds. That shouldn't be too much of a problem though. The main thing we will be using prow for is testing, and while we will be doing image builds there, I don't think there's a problem doing ppc64le builds on travis.

@clnperez
Contributor Author

Has any more thought been put into whether merging this as-is is okay?

Member

@joelanford joelanford left a comment

What's the typical build/push workflow for multi-arch images? Is it to build and push all of the images then do a single manifest push? Is it possible to incrementally build the manifest as the individual images are being pushed?

Makefile Outdated
./hack/image/build-helm-image.sh $(HELM_BASE_IMAGE):dev
image/build/helm: build/operator-sdk-dev-${BIN_ARCH}-linux-gnu
./hack/image/build-helm-image.sh ${GOARCH} ${BIN_ARCH} $(HELM_BASE_IMAGE):dev
./hack/image/build-helm-image.sh ${GOARCH} ${BIN_ARCH} $(HELM_ARCH_IMAGE):dev

Member

See my comment in build-helm-image.sh. I could be forgetting something, but I don't think we need to build the image twice. Can we leave these lines unchanged and rely on the fact that our build host handles the architecture specific things?

Contributor Author

That was me being lazy and not wanting to redo the script. The image doesn't build twice because of Docker's caching; it just gets you both names. The arch image is needed for pushing, but the base one is needed for testing locally. I can find a non-lazy solution to that if you'd prefer.

Contributor Author

@clnperez clnperez Jun 12, 2019

Actually I could just put in a docker tag there.

Member

We already do docker tag in the push script to change the image tag from :dev to whatever tags we want to push. I'm wondering if we can update the push script to also include the architecture?

Contributor Author

i think i tackled that. will see what happens in travis land

Contributor Author

i made a dryrun test function and this still needs more work so hold off on reviewing for the time being
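
A minimal sketch of folding the architecture into the push script's existing `docker tag` step, along the lines suggested in this thread. The `-<arch>` name suffix, the example image names, and the `tag_and_push` and `run` helpers are hypothetical; the real push script may be organized differently.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default here) prints the docker commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# tag_and_push SOURCE_IMAGE PUSH_IMAGE ARCH TAG...
# Retags the locally built image once per requested tag, adding the
# architecture to the image name, then pushes each result.
tag_and_push() {
  local source_image=$1 push_image=$2 arch=$3 tag target
  shift 3
  for tag in "$@"; do
    target="$push_image-$arch:$tag"
    run docker tag "$source_image" "$target"
    run docker push "$target"
  done
}

tag_and_push example.com/demo/helm-operator:dev example.com/demo/helm-operator ppc64le master
```

With this shape the build step keeps producing a single `:dev` image for local testing, and only the push step needs to know about architectures.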

@clnperez
Contributor Author

clnperez commented Jun 12, 2019

> Is it to build and push all of the images then do a single manifest push?

That's how I set this up.

> Is it possible to incrementally build the manifest as the individual images are being pushed?

Nope. The manifest list references image layers in a registry so you have to do the build and push of all the images, and then do the manifest create (to get those layers) and push (to put the list on the registry too).
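
The ordering described above can be sketched as follows. This assumes the per-architecture images have already been pushed by earlier jobs; the `-<arch>` naming scheme, the example image name, and the `publish_manifest_list` and `run` helpers are hypothetical. `docker manifest create` and `docker manifest push` are real Docker CLI subcommands, which were experimental in the Docker versions mentioned in this thread.

```shell
#!/usr/bin/env bash
# The manifest subcommands need experimental CLI features on Docker 18.06/19.03.
export DOCKER_CLI_EXPERIMENTAL=enabled

# DRY_RUN=1 (the default here) prints the docker commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# publish_manifest_list IMAGE TAG ARCH...
# Precondition: IMAGE-<arch>:TAG is already in the registry for every ARCH,
# because the list is assembled from manifests that live in the registry.
publish_manifest_list() {
  local image=$1 tag=$2 arch
  local refs=()
  shift 2
  for arch in "$@"; do
    refs+=("$image-$arch:$tag")
  done
  # Create the list locally from the pushed images, then push the list itself.
  run docker manifest create "$image:$tag" "${refs[@]}"
  run docker manifest push "$image:$tag"
}

publish_manifest_list example.com/demo/helm-operator v0.0.1 amd64 ppc64le
```

Because the create step reads from the registry, this has to run as a final stage after all per-architecture pushes have finished.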

@clnperez clnperez force-pushed the ppc64le branch 3 times, most recently from f598f32 to 27fe8fb Compare June 17, 2019 21:18
@joelanford
Member

@clnperez Sorry, we've been pretty busy over the past week. Last time I tried, I was still having issues with the quay manifest list support, and I didn't have time to dig into it further. I should have more time this week to pick this back up.

@joelanford
Member

@clnperez I was able to manually run the make targets locally and get things pushed to quay and then confirm that the manifests and manifest lists made it there correctly with curl. So I think the quay side of the equation is fixed.

However, when trying to push the manifest list from travis, I'm still getting a "No such manifest" error. I tried adding a docker pull command to fetch the manifests locally first, but that doesn't seem to help.

Have you had success with travis pushing the manifest list to quay? One difference I see between travis and my local environment is that I'm on docker 19.03.1-ce and travis is on 18.06.0-ce.

@clnperez
Contributor Author

clnperez commented Sep 9, 2019

@joelanford it looks like your Travis local docker doesn't have the quay certs:

https://travis-ci.org/joelanford/operator-sdk/jobs/581841450#L365 --
open /etc/docker/certs.d/quay.io: permission denied

Here's where I fixed that issue (only seen, IIRC, with rhel/centos distros):
docker/cli#1378

You can try adding the --insecure flag, or you can import the certs for the registry, or you can chmod the dir (I think. It's been a minute... :D )

BTW, you don't need to pull the images down. The manifest info is always pulled from the registry.

Member

@joelanford joelanford left a comment

Thanks for the tip about that permissions error! I had been ignoring it since all of the other push/pull commands seemed to be working. Looks like we need to chmod /etc/docker.
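
One way to confirm this kind of failure before changing permissions is to walk the certs path and report the first component the current user cannot enter. This is a rough sketch: `path_status` is a hypothetical helper, it expects an absolute path, and it only checks the permission bits the shell can test.

```shell
#!/usr/bin/env bash
# path_status ABS_PATH: report the first component of an absolute path that is
# missing or that the current user cannot traverse or read.
path_status() {
  local path=$1 cur="" part parts
  IFS=/ read -r -a parts <<<"$path"
  for part in "${parts[@]}"; do
    [ -n "$part" ] || continue
    cur="$cur/$part"
    if [ ! -e "$cur" ]; then
      echo "missing: $cur"
      return 0
    fi
    # A directory needs the execute (search) bit to be traversed.
    if [ -d "$cur" ] && [ ! -x "$cur" ]; then
      echo "denied: $cur"
      return 0
    fi
  done
  # The final directory must also be readable to list the certs inside it.
  if [ -d "$cur" ] && [ ! -r "$cur" ]; then
    echo "denied: $cur"
    return 0
  fi
  echo "ok"
}

path_status /etc/docker/certs.d/quay.io
```

A `denied: /etc/docker` result would match the error in the Travis log. The thread does not state which mode to set, so any specific `chmod` invocation would be an assumption; the `--insecure` flag mentioned earlier is the other option.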

@clnperez
Contributor Author

@joelanford it looks like the failing travis job was cancelled but i can't see why or by whom. any ideas? can we kick that off again?

@joelanford
Member

joelanford commented Sep 12, 2019

Not sure why it was cancelled. Just restarted it. I've also got a separate build on my fork that enables the deploy tasks, so I'm babysitting that one too :)

https://travis-ci.org/joelanford/operator-sdk/builds/583868166

@clnperez
Contributor Author

thanks @joelanford. looks like all the tests here passed 🎉

Member

@joelanford joelanford left a comment

Sorry I haven't done this review sooner. I was consumed with getting the travis manifest list push working.

I think after this round we'll be very close, if not ready for merge.

/cc @estroz @hasbro17 @jmrodri
PTAL when you get a chance.

push_image=$1; shift || fatal "${FUNCNAME} usage error"
arches=$@

image_name=$push_image # @TODO bug workaround

Member

I think we can remove this line that sets image_name if we add the line in docker_login that I mentioned in this comment

Contributor Author

ah, yah that's better. i misread your comment then. thanks. fixed.

docker_login
check_can_push || return 0

image_name=$push_image

Member

Ditto with this line.

image_name=$push_image
docker_login $image_name

check_can_push

Member

Is there a reason we no longer return 0 if check_can_push fails?

Contributor Author

no. fixed.


check_can_push

docker tag $source_image $push_image

Member

Is this docker tag necessary? I think we're doing this in line 31, which also includes the correct image tags.

Contributor Author

I played around with it, trying to remember how it had gotten there. Pretty sure it's not, so I'm taking it out.

image_name=$push_image # @TODO bug workaround
docker_login $push_image

check_can_push || exit 0

Member

Should this return like we do in other places, or do we need to exit here instead?

Suggested change
- check_can_push || exit 0
+ check_can_push || return 0

Contributor Author

yah, changed that to return
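
The difference between the two forms matters when the push function is called from a larger script: `return 0` leaves only the function, while `exit 0` ends the whole shell and silently skips everything after it. A small demonstration, using a stub in place of the real `check_can_push` (the stub and the `CAN_PUSH` variable are made up for this example):

```shell
#!/usr/bin/env bash
# Stub standing in for the real check; it fails unless CAN_PUSH=1.
check_can_push() {
  [ "${CAN_PUSH:-0}" = 1 ]
}

push_with_return() {
  check_can_push || return 0   # leaves only this function
  echo "pushing"
}

push_with_exit() {
  check_can_push || exit 0     # terminates the whole shell
  echo "pushing"
}

push_with_return
echo "after return: still running"

# Run the exit variant in a subshell so this demo itself keeps going.
( push_with_exit; echo "after exit: never printed" )
echo "subshell exited with status $?"
```

Run as-is, this prints `after return: still running` followed by `subshell exited with status 0`.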

.travis.yml Outdated
@@ -68,6 +69,24 @@ x_base_steps:
services:
- docker

# Manifest list deploy job
- &manifest-deploy
stage: deploy-manifest

Member

Nit (and I'm not positive this works): Can this be the following:

Suggested change
- stage: deploy-manifest
+ stage: "Deploy multi-arch manifest lists"

Contributor Author

yes, that's fine. b/c you can create a manifest list with x86_64 windows and x86_64 linux images :)

Contributor Author

oh, did you mean you're not sure if it works functionally b/c there are spaces @joelanford? that i don't know!

Contributor Author

that does indeed work just fine

Member

👍 Great! Let's make this change then.

@jmrodri
Member

jmrodri commented Sep 13, 2019

Verified it created a ppc binary

$ file operator-sdk-v0.10.0-42-g03b2b4b1-ppc64le-linux-gnu 
operator-sdk-v0.10.0-42-g03b2b4b1-ppc64le-linux-gnu: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, Go BuildID=01P6JY3bk0SyQ1MRiRFE/q93uPCNdWuq-wKeS2DJW/_Qq-PbxaUGmR5cY25unK/IYO0QrWb3GLWi58OAaDH, not stripped

@jmrodri
Member

jmrodri commented Sep 13, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 13, 2019
Member

@jmrodri jmrodri left a comment

ack

@jmrodri
Member

jmrodri commented Sep 15, 2019

localhost:~$ arch
ppc64le
localhost:~$ uname -a
Linux localhost 4.19.67-0-vanilla #1-Alpine SMP Mon Aug 19 09:42:34 UTC 2019 ppc64le Linux
localhost:~$ file /tmp/operator-sdk-v0.10.0-17-ge3c3974a-ppc64le-linux-gnu 
-ash: file: not found
localhost:~$ /tmp/operator-sdk-v0.10.0-17-ge3c3974a-ppc64le-linux-gnu --help
An SDK for building operators with ease

Usage:
  operator-sdk [command]

Available Commands:
  add         Adds a controller or resource to the project
  alpha       Run an alpha subcommand
  build       Compiles code and builds artifacts
  completion  Generators for shell completions
  generate    Invokes specific generator
  help        Help about any command
  migrate     Adds source code to an operator
  new         Creates a new operator application
  olm-catalog Invokes a olm-catalog command
  print-deps  Print Golang packages and versions required to run the operator
  run         Runs a generic operator
  scorecard   Run scorecard tests
  test        Tests the operator
  up          Launches the operator
  version     Prints the version of operator-sdk

Flags:
  -h, --help      help for operator-sdk
      --verbose   Enable verbose logging

Use "operator-sdk [command] --help" for more information about a command.
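
Where `file` is not installed, as on the Alpine host above, the architecture can still be read from the binary's ELF header. The following is a rough sketch that assumes an `od` supporting `-j` and `-N` (not verified on BusyBox). `elf_machine` is a hypothetical helper; the machine codes (21 for 64-bit PowerPC, 62 for x86-64, 183 for AArch64) come from the ELF specification.

```shell
#!/usr/bin/env bash
# elf_machine FILE: print the architecture recorded in FILE's ELF header.
elf_machine() {
  local f=$1 magic data b0 b1 machine
  # Bytes 1-3 of an ELF file spell "ELF" (byte 0 is 0x7f).
  magic=$(od -An -c -j1 -N3 "$f" | tr -d ' \n')
  if [ "$magic" != "ELF" ]; then
    echo "not-elf"
    return 1
  fi
  # Byte 5 (EI_DATA): 1 = little-endian, 2 = big-endian.
  data=$(od -An -tu1 -j5 -N1 "$f" | tr -d ' \n')
  # Bytes 18-19 hold e_machine, stored in the file's own byte order.
  read -r b0 b1 <<<"$(od -An -tu1 -j18 -N2 "$f")"
  b0=${b0:-0}
  b1=${b1:-0}
  if [ "$data" = 1 ]; then
    machine=$((b0 + 256 * b1))
  else
    machine=$((b1 + 256 * b0))
  fi
  case "$machine:$data" in
    21:1)  echo "ppc64le" ;;
    21:2)  echo "ppc64" ;;
    62:*)  echo "x86_64" ;;
    183:*) echo "aarch64" ;;
    *)     echo "unknown($machine)" ;;
  esac
}

# When given a path argument, report that file's architecture.
if [ $# -gt 0 ]; then
  elf_machine "$1"
fi
```

Passing the path of the binary tested above as the first argument would exercise the same check that `file` performed in the earlier comment.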

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Sep 16, 2019
@clnperez
Contributor Author

@joelanford all those changes made and pushed after a little more verification this morning

Member

@joelanford joelanford left a comment

/lgtm

Thanks SO MUCH for the patience @clnperez

/cc @estroz or @hasbro17 can I get one more review of this one before we merge?

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 17, 2019
@clnperez
Contributor Author

thank you for all your reviews and attention to detail @joelanford!


- release: clean $(release_x86_64) $(release_x86_64:=.asc)
+ release: clean $(release_builds) $(release_builds:=.asc)

Contributor

If we're supporting ppc64le release binaries now then we should update the release doc to mention that.
https://github.com/operator-framework/operator-sdk/blob/master/doc/dev/release.md#operating-systems-and-architectures

Contributor Author

OK for a follow-on PR so it doesn't invalidate this lgtm again?

Member

@joelanford joelanford Sep 18, 2019

Let's go ahead and fix it here. I can babysit the builds if necessary :)

EDIT: on second thought, wordsmithing that change might involve some back and forth, so maybe it is best left for a follow-up.

@hasbro17 Are you okay with that?

Contributor

Yep we can do a follow up. Just a nit.

Contributor

@hasbro17 hasbro17 left a comment

1 nit but LGTM overall.

Labels
lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
8 participants