Sporadic timeouts in oci_container.py #1692

Open
tvalentyn opened this issue Dec 7, 2023 · 2 comments

@tvalentyn

Description

We observe in apache/beam#28703 that cibuildwheel sometimes fails with a timeout in our GitHub Actions runs, after the wheel has already been built. The 30-second timeout appears to be hardcoded in oci_container.py:

self.process.wait(timeout=30)

We plan to work around this by adding retries on our end, but we wonder whether cibuildwheel should increase the timeout or add retry logic that would benefit all users.
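A minimal sketch of the "add retry logic" option, assuming the shutdown wait is simply given progressively longer timeouts before giving up. The helper name `wait_with_retries` and the timeout schedule are hypothetical and not part of cibuildwheel; a short-lived Python child process stands in for the `docker start --attach --interactive` process from the traceback. A real fix would also need to make sure the container itself is cleaned up, which this sketch does not attempt.

```python
import subprocess
import sys


def wait_with_retries(process: subprocess.Popen, timeouts=(30, 60, 120)) -> int:
    """Wait for `process` to exit, retrying with progressively longer timeouts.

    Hypothetical helper (not cibuildwheel's API). If every wait times out,
    the process is killed and reaped, and TimeoutExpired is raised.
    """
    for timeout in timeouts:
        try:
            return process.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # Still running: try again with the next, longer timeout.
            continue
    process.kill()
    process.wait()  # reap the killed process so it does not linger
    raise subprocess.TimeoutExpired(process.args, sum(timeouts))


if __name__ == "__main__":
    # Stand-in for the container shell: a child that takes about half a second.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.5)"])
    print("exit status:", wait_with_retries(child, timeouts=(0.1, 0.2, 5)))
```

Retrying the wait only helps if the shell eventually exits on its own; if it is truly hung, the longer schedule just delays the same failure.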

Error:

Building cp311-manylinux_x86_64 wheel
CPython 3.11 manylinux x86_64

Setting up build environment...
  
      + /opt/python/cp38-cp38/bin/python -c 'import sys, json, os; json.dump(os.environ.copy(), sys.stdout)'
      + which python
      + which pip
                                                              ✓ 0.16s
Building wheel...
  
      + rm -rf /tmp/cibuildwheel/built_wheel
      + mkdir -p /tmp/cibuildwheel/built_wheel
      + python -m pip wheel /project --wheel-dir=/tmp/cibuildwheel/built_wheel --no-deps
  
  Notice:  A new release of pip available: 22.2.2 -> 23.3.1
  Notice:  To update, run: pip install --upgrade pip
  Processing /project
    Installing build dependencies: started
    Installing build dependencies: finished with status 'done'
    Getting requirements to build wheel: started
    Getting requirements to build wheel: finished with status 'done'
    Preparing metadata (pyproject.toml): started
    Preparing metadata (pyproject.toml): finished with status 'done'
  Building wheels for collected packages: apache-beam
    Building wheel for apache-beam (pyproject.toml): started
    Building wheel for apache-beam (pyproject.toml): still running...
    Building wheel for apache-beam (pyproject.toml): still running...
    Building wheel for apache-beam (pyproject.toml): still running...
    Building wheel for apache-beam (pyproject.toml): finished with status 'done'
    Created wheel for apache-beam: filename=apache_beam-2.53.0.dev0-cp311-cp311-linux_x86_64.whl size=15801207 sha256=89a7600e4a95baded35614736e3992129cf9bb1c25827c97de58716abfce268b
    Stored in directory: /tmp/pip-ephem-wheel-cache-mr_x0801/wheels/aa/0c/49/7c4d39428e78274ff9f1db5b9c539aa0d6bd6becda70207fa7
  Successfully built apache-beam
      + /opt/python/cp38-cp38/bin/python -c 'import sys, json, glob; json.dump(glob.glob('"'"'/tmp/cibuildwheel/built_wheel/*.whl'"'"'), sys.stdout)'
      + rm -rf /tmp/cibuildwheel/repaired_wheel
      + mkdir -p /tmp/cibuildwheel/repaired_wheel
                                                            ✓ 219.52s
Repairing wheel...
                                                              ✓ 8.68s

✓ cp311-manylinux_x86_64 finished in 228.46s
Copying wheels back to host...
                                                              ✓ 0.49s
Traceback (most recent call last):
  File "/runner/_work/beam/beam/build/gradleenv/1922375555/bin/cibuildwheel", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/runner/_work/beam/beam/build/gradleenv/1922375555/lib/python3.11/site-packages/cibuildwheel/__main__.py", line 129, in main
    build_in_directory(args)
  File "/runner/_work/beam/beam/build/gradleenv/1922375555/lib/python3.11/site-packages/cibuildwheel/__main__.py", line 248, in build_in_directory
    cibuildwheel.linux.build(options, tmp_path)
  File "/runner/_work/beam/beam/build/gradleenv/1922375555/lib/python3.11/site-packages/cibuildwheel/linux.py", line 377, in build
    with OCIContainer(
  File "/runner/_work/beam/beam/build/gradleenv/1922375555/lib/python3.11/site-packages/cibuildwheel/oci_container.py", line 134, in __exit__
    self.process.wait(timeout=30)
  File "/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/subprocess.py", line 1264, in wait
    return self._wait(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/subprocess.py", line 2038, in _wait
    raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['docker', 'start', '--attach', '--interactive', 'cibuildwheel-1e8a86de-1756-420b-838a-67027220d8ba']' timed out after 30 seconds

Build log

https://github.com/apache/beam/actions/runs/7134462214/job/19429335128?pr=29672

CI config

https://github.com/apache/beam/blob/318227345264e70ed2241538b8ab42a64af5a282/sdks/python/build.gradle#L106

tvalentyn changed the title "Sporadic timeouts" to "Sporadic timeouts in oci_container.py" on Dec 7, 2023
@joerick
Contributor

joerick commented Jan 26, 2024

Any idea what the container is doing that's making it take so long to close? The self.process.wait(timeout=30) call simply waits for a bash shell in the container to exit after it has received an exit command, so it seems likely to me that something has hung somewhere; normally this takes a fraction of a second.
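The exit handshake described above can be mimicked locally, with a plain shell process in place of the container, to see how long a normal shutdown takes. This is a sketch, not cibuildwheel's code: `close_shell` is a hypothetical name, and a local bash (or sh, if bash is missing) process stands in for `docker start --attach --interactive <container>`.

```python
import shutil
import subprocess
import time


def close_shell(timeout: float = 30) -> float:
    """Send `exit` to a shell child process and return how long it took to exit.

    Local stand-in for the container shell: a plain bash (or sh) process
    replaces `docker start --attach --interactive <container>`.
    """
    shell = shutil.which("bash") or "sh"
    process = subprocess.Popen([shell], stdin=subprocess.PIPE, text=True)
    start = time.monotonic()
    # Send the exit command, then close stdin so the shell also sees EOF.
    process.stdin.write("exit 0\n")
    process.stdin.close()
    try:
        process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The shell did not exit in time: kill it so nothing is left behind,
        # then re-raise the same exception type seen in the traceback.
        process.kill()
        process.wait()
        raise
    return time.monotonic() - start


if __name__ == "__main__":
    print(f"shell exited after {close_shell():.3f}s")
```

On a healthy machine the reported time should be far below the 30-second limit, which is why a timeout here points at something hanging inside the container rather than at a limit that is simply too tight.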

@lmaddox

lmaddox commented Aug 19, 2024

Is there a way to add a retry as a workaround, but only for the part that's failing? In this run:

| ✓ cp313-manylinux_i686 finished in 25.37s
[Build and Publish/build]   ❓  ::group::Copying wheels back to host...
| 
[Build and Publish/build]   ❓  ::endgroup::Copying wheels back to host...
|                                                               ✓ 0.07s

The long build has finished (above).

Some other step is failing:

| Traceback (most recent call last):
|   File "<frozen runpy>", line 198, in _run_module_as_main
|   File "<frozen runpy>", line 88, in _run_code
|   File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/cibuildwheel/__main__.py", line 422, in <module>
|     main()
|   File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/cibuildwheel/__main__.py", line 49, in main
|     main_inner(global_options)
|   File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/cibuildwheel/__main__.py", line 184, in main_inner
|     build_in_directory(args)
|   File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/cibuildwheel/__main__.py", line 352, in build_in_directory
|     platform_module.build(options, tmp_path)
|   File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/cibuildwheel/linux.py", line 450, in build
|     with OCIContainer(
|   File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/cibuildwheel/oci_container.py", line 201, in __exit__
|     self.process.wait(timeout=30)
|   File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/subprocess.py", line 1264, in wait
|     return self._wait(timeout=timeout)
|            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
|   File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/subprocess.py", line 2045, in _wait
|     raise TimeoutExpired(self.args, timeout)
| subprocess.TimeoutExpired: Command '['docker', 'start', '--attach', '--interactive', 'cibuildwheel-adcfb72d-fb9e-42e4-ab9b-887e6b095ff9']' timed out after 30 seconds
[Build and Publish/build]   ❌  Failure - Main Install dependencies
[Build and Publish/build] exitcode '1': failure
[Build and Publish/build] 🏁  Job failed
Error: Job 'build' failed

Tech stack: act + cibuildwheel.
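As far as the tracebacks show, the failing step runs during cibuildwheel's container cleanup, so a retry added from outside cibuildwheel has to re-run the whole command. A minimal sketch of that workaround (the kind of retry planned on the Beam side in the original report), assuming a non-zero exit status is the retry signal; `run_with_retries`, the attempt count and the delay are hypothetical.

```python
import subprocess
import sys
import time


def run_with_retries(cmd: list[str], attempts: int = 3, delay: float = 10.0) -> None:
    """Run `cmd`, re-running it while it exits with a non-zero status.

    Hypothetical helper. For cibuildwheel this repeats the whole run,
    including the (possibly long) build, not just the failing shutdown step.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return
        print(f"attempt {attempt}/{attempts} failed with exit code {result.returncode}",
              file=sys.stderr)
        if attempt < attempts:
            time.sleep(delay)
    raise subprocess.CalledProcessError(result.returncode, cmd)


# Example (not executed here); reuse whatever arguments your CI already passes:
# run_with_retries([sys.executable, "-m", "cibuildwheel", "--output-dir", "wheelhouse"])
```

Retrying on any non-zero exit also retries genuine build failures, so this trades CI time for resilience against the sporadic shutdown timeout.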
