[Bug]: Windows CI builds failing to find docker (Update to Run Windows inside docker containers) #281

Closed
mch2 opened this issue May 9, 2023 · 39 comments

Comments

@mch2
Member

mch2 commented May 9, 2023

Describe the bug

Windows CI builds are failing, example: https://build.ci.opensearch.org/job/gradle-check/14914/console

+ docker logout
C:/Users/Administrator/jenkins/workspace/gradle-check@tmp/durable-392d4e2e/script.sh: line 1: docker: command not found
[Pipeline] }
[Pipeline] // script
Error when executing always post condition:
hudson.AbortException: script returned exit code 127
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.handleExit(DurableTaskStep.java:664)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:610)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:554)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
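
Exit code 127 in the log above is the shell's "command not found" status, so the post step fails only because the docker CLI is not on PATH on the Windows agent. A minimal sketch of a guard for a cleanup step like this, written in Python purely as an illustration (the function names are hypothetical, and the real pipeline calls `docker logout` from a Jenkins sh step):

```python
import shutil
import subprocess


def docker_available(executable: str = "docker") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


def docker_logout(executable: str = "docker") -> int:
    """Run `<executable> logout` only when the CLI exists.

    Returns 0 and skips the call when the CLI is missing, so that a
    cleanup step does not abort with "command not found" (exit code 127).
    Otherwise returns the exit code of the logout command.
    """
    if not docker_available(executable):
        print(f"{executable} not found on PATH; skipping logout")
        return 0
    return subprocess.run([executable, "logout"], check=False).returncode
```

This only hides the symptom in the cleanup step; the docker tests themselves still need docker installed on the agent.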

To reproduce

N/A

Expected behavior

Builds should pass and docker tests should run.

Screenshots

If applicable, add screenshots to help explain your problem.

Host / Environment

No response

Additional context

No response

Relevant log output

No response

@mch2 mch2 added bug Something isn't working untriaged Issues that have not yet been triaged labels May 9, 2023
@jordarlu jordarlu removed the untriaged Issues that have not yet been triaged label May 12, 2023
@jordarlu
Contributor

Hi @mch2, could you let me know how you triggered the gradle_check in this case? If a PR triggered it, can you send me the PR link?
Thanks,

CC @peterzhuamazon

@peterzhuamazon
Member

I will take care of this as I have talked to @mch2 offline.
Thanks.

@peterzhuamazon peterzhuamazon self-assigned this May 12, 2023
@peterzhuamazon peterzhuamazon added enhancement New feature or request windows packer and removed bug Something isn't working labels May 12, 2023
@peterzhuamazon
Member

peterzhuamazon commented May 12, 2023

Able to get Docker running on Windows with Hyper-V.

Administrator@<> MINGW64 ~
$ docker version
Client:
 Version:           23.0.6
 API version:       1.42
 Go version:        go1.19.9
 Git commit:        ef23cbc
 Built:             Fri May  5 21:18:35 2023
 OS/Arch:           windows/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.6
  API version:      1.42 (minimum version 1.24)
  Go version:       go1.19.9
  Git commit:       9dbdbd4
  Built:            Fri May  5 21:17:32 2023
  OS/Arch:          windows/amd64
  Experimental:     false



Administrator@<> MINGW64 ~
$  docker pull mcr.microsoft.com/windows/nanoserver:ltsc2019
ltsc2019: Pulling from windows/nanoserver
aaaa081173ae: Pulling fs layer
aaaa081173ae: Verifying Checksum
aaaa081173ae: Download complete
aaaa081173ae: Pull complete
Digest: sha256:fb78bd84ac937f6b1453e19015ccce41636bbeca5fe5bc6dc5c7d55adb4a2bc5
Status: Downloaded newer image for mcr.microsoft.com/windows/nanoserver:ltsc2019
mcr.microsoft.com/windows/nanoserver:ltsc2019

Need @mch2 to confirm exactly which images the Windows Docker tests run with.

On Windows, if you use Hyper-V, the Windows host can only run Windows containers.
If we need the Windows host to run Linux containers, we will have to enable WSL2 later on, and that might cause issues.

Please let me know about this.
Thanks.
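
To tell which kind of engine a host is running, the `OS/Arch` line under `Server:` in the `docker version` output above is the relevant one. A rough sketch in Python that reads it from the plain-text output (the function names are hypothetical, and the parsing assumes the text layout shown in the transcript above):

```python
from typing import Optional


def server_os_arch(docker_version_output: str) -> Optional[str]:
    """Return the OS/Arch value from the Server section, or None if absent."""
    in_server = False
    for line in docker_version_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Server:"):
            # Everything after this header describes the engine, not the client.
            in_server = True
        elif in_server and stripped.startswith("OS/Arch:"):
            return stripped.split(":", 1)[1].strip()
    return None


def is_windows_engine(docker_version_output: str) -> bool:
    """Return True when the engine reports a windows/* platform."""
    os_arch = server_os_arch(docker_version_output)
    return os_arch is not None and os_arch.startswith("windows/")
```

For the output above, `server_os_arch` would return `windows/amd64`, which matches the statement that this host runs Windows containers.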

@peterzhuamazon
Member

peterzhuamazon commented May 12, 2023

Also, this can be a good start on these two issues, to bring Windows integTest to a Docker host with containers, and even to build the artifacts in Windows Docker containers.

Here is a chart comparing some of the key differences between the two container base options on Windows, Windows Server with the Server Core installation and Windows Nano Server:

| Feature | Windows Server with Server Core | Windows Nano Server |
|---|---|---|
| Installation size | Larger (several GBs) | Smaller (a few hundred MBs) |
| Attack surface | Larger | Smaller |
| Support for GUI | Yes (minimal) | No |
| Support for 32-bit applications | Yes | No |
| Support for Windows Services | Yes | Limited |
| Support for .NET Framework | Yes | Limited |
| Support for Containers | Yes | Yes |
| Licensing | Standard, Datacenter | Standard, Datacenter |
| Available editions | All Windows Server editions | Standard and Datacenter only |

Will try to see if we can bring nanoserver in to make Windows lightweight in build, test, and check.

Thanks.

@peterzhuamazon
Member

peterzhuamazon commented Jul 26, 2023

I eventually got a Docker container running nanoserver on Windows:


PS C:\Users\Administrator> docker images
REPOSITORY                             TAG        IMAGE ID       CREATED       SIZE
mcr.microsoft.com/windows/nanoserver   ltsc2019   82ef3885248c   2 weeks ago   252MB

PS C:\Users\Administrator> docker run 82ef3885248c
Microsoft Windows [Version 10.0.17763.4645]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\>

PS C:\Users\Administrator> docker ps -a
CONTAINER ID   IMAGE          COMMAND                    CREATED              STATUS                          PORTS     NAMES
4aced3bb72dd   82ef3885248c   "c:\\windows\\system32…"   About a minute ago   Exited (0) About a minute ago             blissful_liskov

PS C:\Users\Administrator> docker rm 4aced3bb72dd
4aced3bb72dd

PRs:

  • Updating

@peterzhuamazon
Member

We will be better off with the servercore option than with nanoserver, as the latter lacks several core components, while servercore is just a headless version of the normal Windows Server base.

https://techcommunity.microsoft.com/t5/containers/nano-server-x-server-core-x-server-which-base-image-is-the-right/ba-p/2835785

@peterzhuamazon
Member

peterzhuamazon commented Jul 26, 2023

An issue in the Windows Docker container that I am currently not able to solve, and that keeps it from behaving the same as the AMI:
Move-Item : Access to the path is denied.

moby/moby#38256
microsoft/Windows-Containers#147

@peterzhuamazon
Member

Just confirmed that I am using --isolation=process, not --isolation=hyperv.

@peterzhuamazon
Member

Able to resolve the move issue by using mingw and forcing the mv to happen through bash.exe.

bash.exe -c "mv -v 'C:\\Windows\\System32\\find.exe' 'C:\\Windows\\System32\\find_windows.exe'"

renamed 'C:\Windows\System32\find.exe' -> 'C:\Windows\System32\find_windows.exe'
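
If the same rename has to be scripted, the bash.exe call above can be assembled as an argument list so that no outer shell re-interprets the backslashes. A small sketch in Python (the helper names are hypothetical; it assumes bash.exe from mingw is on PATH inside the container, as in the command above):

```python
import shlex
import subprocess
from typing import List


def build_bash_mv(src: str, dst: str, bash: str = "bash.exe") -> List[str]:
    """Build the argv for `bash.exe -c "mv -v '<src>' '<dst>'"`.

    shlex.quote wraps each path in single quotes, so bash keeps the
    Windows backslashes literally instead of treating them as escapes.
    """
    script = f"mv -v {shlex.quote(src)} {shlex.quote(dst)}"
    return [bash, "-c", script]


def bash_mv(src: str, dst: str) -> int:
    """Run the move through bash.exe and return its exit code."""
    return subprocess.run(build_bash_mv(src, dst), check=False).returncode
```

Because the arguments are passed as a list, the paths need single backslashes here, unlike the doubled ones in the command line above.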

@peterzhuamazon
Member

Seems like an issue with volta 1.1.1: volta-cli/volta#1435

Will revert to either the older 1.0.8 or 1.1.0 for now.

Thanks.

@peterzhuamazon
Member

Able to invoke bash.exe directly in the Windows container and run the test workflow:

ContainerAdministrator@44082dfc4844 MINGW64 /c
$ whoami
ContainerAdministrator


@peterzhuamazon peterzhuamazon changed the title [Bug]: Windows CI builds failing to find docker [Bug]: Windows CI builds failing to find docker (Update to Run Windows inside docker containers) Jul 28, 2023
@peterzhuamazon
Member

New issues:

windows [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate

@peterzhuamazon
Member

peterzhuamazon commented Aug 2, 2023

Tried many methods, including installing pip-system-certs, scoop install cacerts, installing certifi, manually pushing the Mozilla CA certs into the certifi certs, exporting REQUESTS_CA_BUNDLE, etc.

Right now the only method that seems to work is using curl to pull the zip once, such as curl https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.9.0/8184/windows/x64/zip/dist/opensearch/opensearch-2.9.0-windows-x64.zip -o test.sh. It looks like this adds the CloudFront public cert to the certifi certs or the system CA cert bundle once, and after that the python requests package within the Windows Docker container is able to do SSL verification correctly.

Very weird, and I probably missed something here. Thanks.
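
For debugging this kind of failure, it can help to see which CA locations Python itself looks at, and to try verification with an explicit extra CA file. A minimal sketch using only the standard library `ssl` module (the function names are hypothetical). Note this is not what `requests` does internally: `requests` verifies against the certifi bundle by default unless REQUESTS_CA_BUNDLE or `verify=` points elsewhere, so this only helps narrow down where the missing issuer is.

```python
import ssl
from typing import Optional


def describe_default_trust() -> dict:
    """Return the CA file and directory locations OpenSSL uses by default."""
    return ssl.get_default_verify_paths()._asdict()


def build_context(extra_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a verifying TLS context, optionally trusting an extra CA file.

    create_default_context() loads the system trust store and enables
    certificate and hostname verification; the extra PEM file, if given,
    is added on top of that.
    """
    context = ssl.create_default_context()
    if extra_ca_file is not None:
        context.load_verify_locations(cafile=extra_ca_file)
    return context
```

A context built this way could then be passed to `urllib.request.urlopen(url, context=...)` to check whether verification succeeds with and without the extra file.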

@peterzhuamazon
Member

A new way that is supposed to run correctly, but it still does not; just curl ci.opensearch.org for now, as that is stable:



ContainerAdministrator@0062c7841faa MINGW64 ~/opensearch-build-peterzhuamazon (windows-docker-setups-2)
$ openssl s_client -connect ci.opensearch.org:443 </dev/null | openssl x509 -outform PEM > certificate2.crt
depth=2 C = US, O = Amazon, CN = Amazon Root CA 1
verify return:1
depth=1 C = US, O = Amazon, CN = Amazon RSA 2048 M01
verify return:1
depth=0 CN = ci.opensearch.org
verify return:1
DONE

ContainerAdministrator@0062c7841faa MINGW64 ~/opensearch-build-peterzhuamazon (windows-docker-setups-2)
$ vi certificate2.crt

ContainerAdministrator@0062c7841faa MINGW64 ~/opensearch-build-peterzhuamazon (windows-docker-setups-2)
$ certutil -addstore CA certificate2.crt
CA "Intermediate Certification Authorities"
Certificate "ci.opensearch.org" added to store.
CertUtil: -addstore command completed successfully.
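
Note that the certutil output above shows the leaf certificate for ci.opensearch.org being added, not a root or intermediate CA. To check which certificate actually ended up in a PEM file, a fingerprint comparison is one option. A rough sketch in Python with the standard library only (the function names are hypothetical, and fetching requires network access from inside the container):

```python
import hashlib
import ssl


def fetch_leaf_pem(host: str, port: int = 443) -> str:
    """Fetch the server's leaf certificate as PEM text (needs network)."""
    return ssl.get_server_certificate((host, port))


def sha256_fingerprint(pem: str) -> str:
    """Return the SHA-256 fingerprint (hex) of a single PEM certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem.strip())
    return hashlib.sha256(der).hexdigest()
```

Comparing `sha256_fingerprint(fetch_leaf_pem("ci.opensearch.org"))` with the fingerprint of the contents of certificate2.crt would show whether the file holds the same leaf certificate the server presents.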

@peterzhuamazon
Member

peterzhuamazon commented Aug 23, 2023

So pigz seems to run only if you put it in the root of C:, in a directory like C:\pigz, and add it to the machine env vars.
It seems that pigz only saves time while the extraction is happening, but most of the time is spent after the extraction.
It does not seem that pigz will improve the time that much:

Without pigz:

$ time docker pull opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
ci-runner-windows2019-servercore-opensearch-build-v1: Pulling from opensearchstaging/ci-runner
c9226d61d3bd: Already exists
b95f433aa7d9: Pull complete
00e36bb1af6a: Pull complete
96b3ca42606a: Pull complete
eba42434ce94: Pull complete
69c589335db3: Pull complete
0ec633f2f60c: Pull complete
21200ab93e1b: Pull complete
bc161862b081: Pull complete
c65a5ac1ea31: Pull complete
Digest: sha256:b6ba005996340062f68137fe7cf3e17cd3d61bdb9a5df944f276905df795dd0e
Status: Downloaded newer image for opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
docker.io/opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1

real    12m39.993s
user    0m0.000s
sys     0m0.015s

With pigz:


$ time docker pull opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
ci-runner-windows2019-servercore-opensearch-build-v1: Pulling from opensearchstaging/ci-runner
c9226d61d3bd: Already exists
b95f433aa7d9: Pull complete
00e36bb1af6a: Pull complete
96b3ca42606a: Pull complete
eba42434ce94: Pull complete
69c589335db3: Pull complete
0ec633f2f60c: Pull complete
21200ab93e1b: Pull complete
bc161862b081: Pull complete
c65a5ac1ea31: Pull complete
Digest: sha256:b6ba005996340062f68137fe7cf3e17cd3d61bdb9a5df944f276905df795dd0e
Status: Downloaded newer image for opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
docker.io/opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1

real    12m29.576s
user    0m0.015s
sys     0m0.000s

Saved 10 seconds.

@peterzhuamazon
Member

Some more tests:

Without pigz

$ time docker pull opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
ci-runner-windows2019-servercore-opensearch-build-v1: Pulling from opensearchstaging/ci-runner
c9226d61d3bd: Already exists
b95f433aa7d9: Pull complete
00e36bb1af6a: Pull complete
96b3ca42606a: Pull complete
eba42434ce94: Pull complete
69c589335db3: Pull complete
0ec633f2f60c: Pull complete
21200ab93e1b: Pull complete
bc161862b081: Pull complete
c65a5ac1ea31: Pull complete
Digest: sha256:b6ba005996340062f68137fe7cf3e17cd3d61bdb9a5df944f276905df795dd0e
Status: Downloaded newer image for opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
docker.io/opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1

real    5m34.866s
user    0m0.000s
sys     0m0.015s

With pigz:


$ time  docker pull opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
ci-runner-windows2019-servercore-opensearch-build-v1: Pulling from opensearchstaging/ci-runner
c9226d61d3bd: Already exists
b95f433aa7d9: Pull complete
00e36bb1af6a: Pull complete
96b3ca42606a: Pull complete
eba42434ce94: Pull complete
69c589335db3: Pull complete
0ec633f2f60c: Pull complete
21200ab93e1b: Pull complete
bc161862b081: Pull complete
c65a5ac1ea31: Pull complete
Digest: sha256:b6ba005996340062f68137fe7cf3e17cd3d61bdb9a5df944f276905df795dd0e
Status: Downloaded newer image for opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
docker.io/opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1

real    4m45.069s
user    0m0.000s
sys     0m0.016s
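
For reference, the savings in the two comparisons can be computed directly from the `real` lines. A small sketch in Python (the helper names are hypothetical; it assumes the bash `time` format shown above, `real <minutes>m<seconds>s`):

```python
import re

# Matches lines such as "real    12m39.993s" from the bash `time` builtin.
_REAL_LINE = re.compile(r"real\s+(\d+)m([\d.]+)s")


def real_seconds(time_output: str) -> float:
    """Convert the 'real' line of `time` output into seconds."""
    match = _REAL_LINE.search(time_output)
    if match is None:
        raise ValueError("no 'real' line found in time output")
    return int(match.group(1)) * 60 + float(match.group(2))


def saved_seconds(without_pigz: str, with_pigz: str) -> float:
    """Return how many seconds the second run saved, rounded to ms."""
    return round(real_seconds(without_pigz) - real_seconds(with_pigz), 3)
```

With the numbers above, the first pair differs by about 10.4 seconds (12m39.993s vs 12m29.576s) and the second pair by about 49.8 seconds (5m34.866s vs 4m45.069s). The two runs without pigz already differ by several minutes from each other, so the comparison is noisy.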

@peterzhuamazon
Member

peterzhuamazon commented Aug 29, 2023

git clone of the build repo is now instant on the Windows host.

@peterzhuamazon
Member

There is a bug right now: every time we pull the image fresh, the run always fails once at the sh stage.
I suspect we need to pre-load the image on the runner beforehand.
It succeeds soon after, on the second rerun:

ERROR: script returned exit code 127

@peterzhuamazon
Member

Added a docker image initialization step on the Windows Docker host to resolve the above issue.
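
The general idea of such a pre-load step can be sketched as follows. This is only an illustration in Python, not the actual step used on the host: the function names are hypothetical, and it assumes the docker CLI is on PATH. It checks for the image with `docker image inspect` and pulls it only when it is missing.

```python
import subprocess
from typing import Callable, List


def run_quiet(argv: List[str]) -> int:
    """Run a command, discard its output, and return the exit code."""
    return subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    ).returncode


def ensure_image(image: str, run: Callable[[List[str]], int] = run_quiet) -> bool:
    """Make sure the image is present locally before the job needs it.

    Returns True when a pull was performed and False when the image was
    already there. Raises RuntimeError if the pull itself fails.
    """
    if run(["docker", "image", "inspect", image]) == 0:
        return False
    if run(["docker", "pull", image]) != 0:
        raise RuntimeError(f"docker pull failed for {image}")
    return True
```

Running something like `ensure_image("opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1")` when the host starts would move the long first pull out of the sh stage.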

@peterzhuamazon
Member

Added new integTest support with Windows containers now.

@peterzhuamazon
Member

Per opensearch-project/opensearch-build#3816, we have fixed the docker command issues on Windows, but the setup only supports Hyper-V running Windows containers on a Windows host through Docker.

Per discussion with @mch2, the core team needs to disable the Linux-container-related tests on Windows.

Thanks.


4 participants