Enhanced Dockerfile for cross compilation (test Jenkinsfile changes) #43613

Closed
wants to merge 26 commits into from


thaJeztah
Member

Just a rebase of #43529 (Jenkinsfile changes are not taken into account for pull requests if the contributor doesn't have write access), so opening this to put everything to the test 😅

@crazy-max
Member

[2022-05-18T13:57:36.674Z] === Failed
[2022-05-18T13:57:36.674Z] === FAIL: amd64.integration-cli TestDockerDaemonSuite/TestHTTPSInfoRogueServerCert (0.55s)
[2022-05-18T13:57:36.674Z]     docker_cli_daemon_test.go:1407: Expected err: x509: certificate signed by unknown authority, got instead: exit status 1 and output: error during connect: Get "https://localhost:4272/v1.30/info": x509: certificate relies on legacy Common Name field, use SANs instead
[2022-05-18T13:57:36.674Z]     --- FAIL: TestDockerDaemonSuite/TestHTTPSInfoRogueServerCert (0.55s)
[2022-05-18T13:57:36.674Z] 
[2022-05-18T13:57:36.674Z] === FAIL: amd64.integration-cli TestDockerDaemonSuite (335.42s)

Looks like I need to fix the rogue certs for Jenkins like I've done on this branch for GHA: crazy-max@11a16d5

@crazy-max
Member

Ok, found the root cause of this issue. It is linked to the new Dockerfile, which now builds the docker cli for the integration tests:

moby/Dockerfile

Lines 453 to 460 in 1bf5dbb

FROM dockercli-base AS dockercli
RUN --mount=from=dockercli-src,src=/usr/src/dockercli/components/cli,rw \
    --mount=type=cache,target=/root/.cache \
    --mount=type=cache,target=/go/pkg/mod <<EOT
  set -e
  go build -o /out/docker -v ./cmd/docker
  xx-verify /out/docker
EOT

Previously the Dockerfile just downloaded the docker cli from download.docker.com, and that binary was built with an old Go version that didn't yet require a subject alternative name (SAN). I think it's fair to update the rogue certs anyway, as we are going to hit the exact same issue in the future.
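
For context on the error message in the log: since Go 1.15, `crypto/x509` no longer treats the Common Name as a host name when a certificate carries no SANs. The following is a minimal sketch of that behaviour, assuming throwaway self-signed certificates; the `selfSigned` helper is hypothetical and is not how the rogue certs in the test suite are generated.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSigned is a hypothetical helper that returns a throwaway self-signed
// certificate. commonName fills the legacy Common Name field; dnsNames
// populates the Subject Alternative Name (SAN) extension.
func selfSigned(commonName string, dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: commonName},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
	}
	// The template doubles as the parent, which makes the cert self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cnOnly, err := selfSigned("localhost", nil)
	if err != nil {
		panic(err)
	}
	withSAN, err := selfSigned("localhost", []string{"localhost"})
	if err != nil {
		panic(err)
	}
	// On Go 1.15+ the first check is expected to fail and the second to pass.
	fmt.Println("CN only: ", cnOnly.VerifyHostname("localhost"))
	fmt.Println("with SAN:", withSAN.VerifyHostname("localhost"))
}
```

A client built with an older Go toolchain would have accepted the host name from the Common Name, which may explain why the test only started failing once the cli was built from source.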

@crazy-max
Member

crazy-max commented May 19, 2022

@thaJeztah Made some changes in #43529, so the docker cli is now built against the right Go version; you can rebase to check that the tests pass now. Let me know if you still want a way in the Dockerfile to choose between building and downloading.

@crazy-max
Member

------
[2022-05-19T09:52:04.586Z]  > [internal] load metadata for docker.io/library/golang:1.8.3:
[2022-05-19T09:52:04.586Z] ------
[2022-05-19T09:52:04.586Z] failed to solve with frontend dockerfile.v0: failed to solve with frontend gateway.v0: rpc error: code = Unknown desc = no match for platform in manifest sha256:32c769bf92205580d6579d5b93c3c705f787f6c648105f00bb88a35024c7f8e4: not found

Argh, no arm64 arch for this golang image. I guess we need to go back to the previous logic of downloading from download.docker.com.
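
The "no match for platform in manifest" error means the image's manifest list has no entry for the requested platform. As a rough sketch of that lookup (not BuildKit's actual resolver), the snippet below scans an index for an `os`/`architecture` pair. The `hasPlatform` helper and the `sampleIndex` data are made up for illustration and do not reproduce the real golang:1.8.3 index.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// imageIndex mirrors the few fields of an OCI image index (or Docker
// manifest list) that matter when selecting a platform.
type imageIndex struct {
	Manifests []struct {
		Digest   string `json:"digest"`
		Platform struct {
			OS           string `json:"os"`
			Architecture string `json:"architecture"`
		} `json:"platform"`
	} `json:"manifests"`
}

// hasPlatform reports whether the index lists an image for goos/goarch.
func hasPlatform(indexJSON []byte, goos, goarch string) (bool, error) {
	var idx imageIndex
	if err := json.Unmarshal(indexJSON, &idx); err != nil {
		return false, err
	}
	for _, m := range idx.Manifests {
		if m.Platform.OS == goos && m.Platform.Architecture == goarch {
			return true, nil
		}
	}
	return false, nil
}

// sampleIndex is hypothetical data: an index that lacks a linux/arm64 entry.
const sampleIndex = `{"manifests":[
  {"digest":"sha256:aaaa","platform":{"os":"linux","architecture":"amd64"}},
  {"digest":"sha256:bbbb","platform":{"os":"linux","architecture":"arm"}}
]}`

func main() {
	for _, arch := range []string{"amd64", "arm64"} {
		ok, err := hasPlatform([]byte(sampleIndex), "linux", arch)
		if err != nil {
			panic(err)
		}
		fmt.Printf("linux/%s available: %v\n", arch, ok)
	}
}
```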

@thaJeztah
Member Author

Some failures - not sure if all related, but this one I hadn't seen before (or at least, not that I recall 🤔) - it's networking, so could still be just flaky 😂

=== RUN   TestQueryEndpointInfo
time="2022-05-19T18:09:59Z" level=warning msg="bridge store not initialized. kv object docker/network/v1.0/bridge/net1/ is not added to the store"
time="2022-05-19T18:09:59Z" level=warning msg="bridge store not initialized. kv object docker/network/v1.0/bridge-endpoint/ep1/ is not added to the store"
time="2022-05-19T18:09:59Z" level=warning msg="Failed to allocate and map port 23000-23000: Error starting userland proxy: listen tcp [::]:23000: bind: address already in use"
    bridge_test.go:665: Failed to program external connectivity: Error starting userland proxy: listen tcp [::]:23000: bind: address already in use
--- FAIL: TestQueryEndpointInfo (0.07s)
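
The "address already in use" failure above is what a second bind on an occupied port produces. A small sketch of that condition follows, assuming a loopback TCP listener; `bindTwice` is a hypothetical helper, and this is not the fix that was applied to the test.

```go
package main

import (
	"fmt"
	"net"
)

// bindTwice opens a listener on an ephemeral loopback port, then tries to
// bind the same address again while the first listener is still open.
// secondErr carries the result of the second attempt; err reports a failure
// to set up the first listener.
func bindTwice() (addr string, secondErr error, err error) {
	// Port 0 asks the kernel for any free port, so this first bind does
	// not depend on a hardcoded port such as 23000 being available.
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	defer first.Close()

	addr = first.Addr().String()
	second, secondErr := net.Listen("tcp", addr)
	if secondErr == nil {
		second.Close()
	}
	return addr, secondErr, nil
}

func main() {
	addr, secondErr, err := bindTwice()
	if err != nil {
		panic(err)
	}
	fmt.Printf("second bind on %s: %v\n", addr, secondErr)
}
```

Tests that hardcode a port can collide with any other process or test holding it, which fits the suspicion that this failure is flaky rather than a regression.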

@crazy-max
Member

Should be fixed now if you can rebase, thx!

@crazy-max
Member

crazy-max commented May 24, 2022

@thaJeztah Looks good on Jenkins, apart from an unrelated error while cleaning the workspace on Windows nodes:

image

And a known flaky test on the GHA workflow for Windows (#38521):

=== Failed
=== FAIL: github.com/docker/docker/integration-cli TestDockerSuite/TestStartReturnCorrectExitCode (5.92s)
    docker_cli_start_test.go:191: assertion failed: 
        Command:  D:\a\moby\moby\out\docker.exe start -a withRestart
        ExitCode: 0
        Stdout:   
        Stderr:   
        
        Failures:
        ExitCode was 0 expected 11
    --- FAIL: TestDockerSuite/TestStartReturnCorrectExitCode (5.92s)

@crazy-max
Member

Argh another racy one linked to #11966: https://github.com/moby/moby/runs/6591181007?check_suite_focus=true#step:15:4943

=== Failed
=== FAIL: github.com/docker/docker/integration-cli TestDockerSuite/TestRunContainerWithRmFlagCannotStartContainer (2.18s)
    docker_cli_run_test.go:2770: Expected not to have containers d0c6fef36fc5

And it looks like the Windows node on Jenkins has timed out: https://ci-next.docker.com/public/blue/organizations/jenkins/moby/detail/PR-43613/4/pipeline#step-284-log-1841

@crazy-max
Member

@thaJeztah TestNetworkDBIslands, a known flaky test (#42459), failed: https://ci-next.docker.com/public/blue/organizations/jenkins/moby/detail/PR-43613/7/pipeline#step-327-log-110

=== FAIL: libnetwork/networkdb TestNetworkDBIslands (123.95s)

Can you restart the Jenkins pipeline?

@thaJeztah
Member Author

Restarted!

@crazy-max
Member

crazy-max commented Jun 1, 2022

Looks like this one is still flaky #11966:

=== Failed
=== FAIL: github.com/docker/docker/integration-cli TestDockerSuite/TestRunContainerWithRmFlagCannotStartContainer (1.79s)
    docker_cli_run_test.go:2770: Expected not to have containers 0f26800e99b0
        
    --- FAIL: TestDockerSuite/TestRunContainerWithRmFlagCannotStartContainer (1.79s)

https://github.com/moby/moby/runs/6669613995?check_suite_focus=true#step:15:4943

Will wait for #43672 to be merged and will keep you posted when #43529 is updated.

@crazy-max
Member

@thaJeztah Ok #43529 rebased with latest changes from master.

@thaJeztah
Member Author

done 👍

@crazy-max
Member

Same one bites the dust: https://github.com/moby/moby/runs/6689351547?check_suite_focus=true#step:15:4948

=== FAIL: github.com/docker/docker/integration-cli TestDockerSuite/TestRunContainerWithRmFlagCannotStartContainer (1.66s)
    docker_cli_run_test.go:2770: Expected not to have containers 6a5c139fc3dc
        
    --- FAIL: TestDockerSuite/TestRunContainerWithRmFlagCannotStartContainer (1.66s)

@AkihiroSuda
Member

Needs rebase

@crazy-max
Member

crazy-max commented Aug 5, 2022

This one is interesting: https://ci-next.docker.com/public/blue/organizations/jenkins/moby/detail/PR-43613/11/pipeline/252#step-253-log-76

[2022-08-04T23:02:23.570Z] docker/errors.py:31: in create_api_error_from_http_exception
[2022-08-04T23:02:23.570Z]     raise cls(e, response=response, explanation=explanation)
[2022-08-04T23:02:23.570Z] E   docker.errors.APIError: 500 Server Error for http+docker://localhost/v1.43/images/create?tag=latest&fromImage=hello-world: Internal Server Error ("Head "https://registry-1.docker.io/v2/library/hello-world/manifests/latest": Get "https://auth.docker.io/token?scope=repository%3Alibrary%2Fhello-world%3Apull&service=registry.docker.io": EOF")

@thaJeztah Maybe linked to what you said at maintainer meeting about the arm32v7/hello-world image.

I think I will bring some bits of crazy-max@7751ec5 in the PR just to check if this is a regression.

@thaJeztah
Member Author

I kicked Jenkins again

@crazy-max
Member

image :)

frozen-images stage doesn't use the download-frozen-image-v2.sh
anymore so we can effectively use TARGETPLATFORM from global scope.
The test util has been updated accordingly.

In a follow-up we can remove download-frozen-image-v2.sh script but
needs to look first at Dockerfile.e2e which seems not used anymore
in our pipeline.

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
To add support for riscv64 builds we need crossbuild packages for
riscv64 but current golang image with debian bullseye does not
support it. Ubuntu 22.04 supports riscv64 but unfortunately drops
support for armel arch. Therefore we need a multi base image that
will be picked up based on the target platform we want to build.

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
vpnkit stage only supports linux/amd64 and linux/arm64
platforms when building dev image and will crash if we
try building against another platform.

with this change we can still build the dev image
against any platform using dummy scratch base.

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
pin criu for better reproducibility and build from
source so we can use it across any platform.

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
dummy stage allows to bypass build for deps that
don't support some platforms

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
containerd build in Dockerfile is limited to
host platform and could not be cross-built
for other platforms.

this change allows to build against any platforms
if we want to smoke test in a follow-up but also
enhance e2e tests for linux and windows in our
pipeline.

also introduced DOCKER_LINKMODE to be able to
build dynamic or static binaries.

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
runc build in Dockerfile is limited to host
platform and could not be cross-built for
other platforms.

this change allows to build against any platforms
if we want to smoke test in a follow-up but also
enhance e2e tests for linux and windows in our
pipeline.

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
tini build in Dockerfile is limited to host
platform and could not be cross-built for
other platforms.

this change allows to build against any platforms
if we want to smoke test in a follow-up but also
enhance e2e tests for linux and windows in our
pipeline.

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
rootlesskit build in Dockerfile is limited to host
platform and could not be cross-built for
other platforms.

this change allows to build against any platforms
if we want to smoke test in a follow-up but also
enhance e2e tests for linux and windows in our
pipeline.

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
containerutility build in Dockerfile is limited to
windows platform atm but enabling cross build for
it enhances and reduces footprint in our pipeline
for linux and windows e2e tests.

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
Better support for cross compilation so we can rely
on --platform flag of buildx for a seamless integration.

This removes not necessary extra cross logic in the Dockerfile
as well as hack scripts.

Tried my best to reduce the footprint of changes but modifying
one bit in the Dockerfile involves other changes in ./hack
scripts. Non-sandboxed build invocation is still supported.

It also handles cross compilation for external tools dynamically
based on platform arg available in global scope (containerd,
runc, tini, ...).

Dev stages have been updated accordingly to changes for cross
comp as well as linked tools (swagger, tomll, gotestsum, ...)

The current bake definition has been updated to take the changes
into account as well as the ci gha workflow.

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
needs to update supported platforms for pie buildmode
and adds smoke test

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>

3 participants