aarch64 wheel not working; cmake binary segfault #115

Closed
AWSjswinney opened this issue Sep 28, 2020 · 5 comments · Fixed by #116

Comments

@AWSjswinney
Contributor

AWSjswinney commented Sep 28, 2020

I'm working on a contribution to fix this problem, so I want to engage the community to see which solutions are acceptable.

After installing cmake from pip, the cmake binary does not work, failing with a segmentation fault. Below is a test on Ubuntu 20.04, arm64:

docker run --rm -it ubuntu:focal bash
root@dd6e77ac5530:/# apt update && apt install python3-pip
...
root@dd6e77ac5530:/# pip3 install cmake
Collecting cmake
  Downloading cmake-3.18.2.post1-py3-none-manylinux2014_aarch64.whl (15.2 MB)
     |████████████████████████████████| 15.2 MB 27.6 MB/s
Installing collected packages: cmake
Successfully installed cmake-3.18.2.post1
root@dd6e77ac5530:/# cmake --version
root@dd6e77ac5530:/# echo $?
245
root@dd6e77ac5530:/# /usr/local/lib/python3.8/dist-packages/cmake/data/bin/cmake
Segmentation fault (core dumped)
root@dd6e77ac5530:/#
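
One possible reading of the exit status 245 above, sketched in Python: if the pip-installed `cmake` launcher is a Python script that passes the child's return code straight to `sys.exit()` (an assumption, not something established in this report), a child killed by signal N yields -N, which the shell sees as 256 - N. The helper names below are hypothetical.

```python
import signal


def signal_name(number):
    """Return the signal's symbolic name, or None if the number is not a known signal."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return None


def explain_exit_status(status):
    """List plausible signal-based readings of an exit status in the range 0..255."""
    readings = []
    # Shell convention: a child killed by signal N is reported as 128 + N.
    name = signal_name(status - 128)
    if name:
        readings.append(f"128+N convention: {name}")
    # Python convention: subprocess reports -N for a child killed by signal N,
    # which wraps to 256 - N if a launcher hands it to sys.exit() unchanged.
    name = signal_name(256 - status)
    if name:
        readings.append(f"negative return code wrapped modulo 256: {name}")
    return readings


print(explain_exit_status(245))  # status seen from the `cmake` launcher
print(explain_exit_status(139))  # status a shell typically reports for a direct segfault
```

Under that assumption, 245 and the "Segmentation fault" from running the binary directly describe the same crash.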

After some debugging I discovered this problem could be fixed by dynamically linking against libstdc++ and libgcc instead of statically linking. I changed these lines:

file(WRITE "${CMAKE_BINARY_DIR}/initial-cache.txt"
"set(CMAKE_C_FLAGS \"-D_POSIX_C_SOURCE=199506L -D_POSIX_SOURCE=1 -D_SVID_SOURCE=1 -D_BSD_SOURCE=1\" CACHE STRING \"Initial cache\" FORCE)
set(CMAKE_EXE_LINKER_FLAGS \"-static-libstdc++ -static-libgcc -lrt\" CACHE STRING \"Initial cache\" FORCE)

to

file(WRITE "${CMAKE_BINARY_DIR}/initial-cache.txt"
"set(CMAKE_C_FLAGS \"-g3 -D_POSIX_C_SOURCE=199506L -D_POSIX_SOURCE=1 -D_SVID_SOURCE=1 -D_BSD_SOURCE=1\" CACHE STRING \"Initial cache\" FORCE)
set(CMAKE_EXE_LINKER_FLAGS \"-lstdc++ -lgcc -lrt\" CACHE STRING \"Initial cache\" FORCE)
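
To confirm which libraries a rebuilt binary links dynamically, one option is to inspect the `NEEDED` entries printed by `readelf -d`. The following is a minimal Python sketch that parses such output; the `SAMPLE` text is illustrative and was not captured from the actual build, and `needed_libraries` is a hypothetical helper.

```python
import re

# Matches dynamic-section lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def needed_libraries(readelf_output):
    """Return the shared-library names listed as NEEDED in `readelf -d` output."""
    return NEEDED_RE.findall(readelf_output)


# Illustrative input only; real output comes from `readelf -d <path to cmake>`.
SAMPLE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
"""

libs = needed_libraries(SAMPLE)
links_libstdcxx = any(lib.startswith("libstdc++") for lib in libs)
print(libs, links_libstdcxx)
```

A statically linked libstdc++ would not appear in this list at all.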

This fixed the problem on Ubuntu 20.04: the wheel works correctly after this change. However, when testing in the manylinux2014-aarch64 container, I discovered that the cmake binary was linked against newer versions of libcrypto and libssl than were available there. The auditwheel tool could fix this, but using it introduced two new problems:

  1. The auditwheel tool can't be run in the cross-compiler container, so it has to be run in a separate emulated container or natively.
  2. After I switched to running natively on an Arm system, auditwheel refused to repair the wheel because of linkage with glibc 2.25.

[root@74a50ab14cd5 io]# /opt/python/cp38-cp38/bin/auditwheel repair dist/cmake-0.post308+gef0722c-py3-none-manylinux2014_aarch64.whl
INFO:auditwheel.main_repair:Repairing cmake-0.post308+gef0722c-py3-none-manylinux2014_aarch64.whl
usage: auditwheel [-h] [-V] [-v] command ...
auditwheel: error: cannot repair "dist/cmake-0.post308+gef0722c-py3-none-manylinux2014_aarch64.whl" to "manylinux2014_aarch64" ABI because of the presence of too-recent versioned symbols. You'll need to compile the wheel on an older toolchain.
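
The "too-recent versioned symbols" check can be illustrated with a small Python sketch. It compares version tags against the maximum symbol versions that PEP 599 permits for manylinux2014. This is not how auditwheel is implemented; `too_recent` is a hypothetical helper, and the input list is made up apart from the glibc 2.25 linkage mentioned above.

```python
import re

# Maximum symbol versions permitted by PEP 599 (manylinux2014).
MANYLINUX2014_LIMITS = {
    "GLIBC": (2, 17),
    "GLIBCXX": (3, 4, 19),
    "CXXABI": (1, 3, 7),
    "GCC": (4, 8, 0),
}

VERSION_RE = re.compile(r"^([A-Z]+)_(\d+(?:\.\d+)*)$")


def too_recent(symbol_versions, limits=MANYLINUX2014_LIMITS):
    """Return the version tags that exceed the policy limit for their family."""
    offenders = []
    for tag in symbol_versions:
        match = VERSION_RE.match(tag)
        if not match:
            continue  # ignore tags that are not FAMILY_x.y[.z]
        family = match.group(1)
        version = tuple(int(part) for part in match.group(2).split("."))
        limit = limits.get(family)
        if limit is not None and version > limit:
            offenders.append(tag)
    return offenders


print(too_recent(["GLIBC_2.17", "GLIBC_2.25", "GLIBCXX_3.4.19"]))
```

Any tag returned here means the binary needs a newer system library than the policy guarantees, which is why the wheel has to be built on an older toolchain.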

From there I discovered that the dockcross container used to cross-compile the arm64 wheel uses libraries that do not comply with PEP 599, so I reported this bug to dockcross.

Finally, I am working on a change to use Travis-CI to do a native build for Linux on Arm with the manylinux2014-aarch64 container. Since the build uses scikit-build which itself depends on cmake, it must first download, build, and install cmake before it can be built again for the wheel.

I have the following questions for the community:

  1. Would you support migrating away from the dockcross container to prefer native builds on Arm?
  2. Using auditwheel bundles the dynamic libraries with the wheel. Does this present any licensing problems?
  3. Would you be open to migrating from Travis-CI.org to Travis-CI.com? The dot-com version supports Arm64 builds on AWS Graviton 2 instances, which are much faster than the previous generation available on Travis. See: https://blog.travis-ci.com/2020-09-11-arm-on-aws
@thewtex
Member

thewtex commented Sep 28, 2020

> I'm working on a contribution to fix this problem, so I want to engage the community to see which solutions are acceptable.

Thanks for contributing 🙏

> After some debugging I discovered this problem could be fixed by dynamically linking against libstdc++ and libgcc instead of statically linking.

Great, this is preferred.

> 1. the auditwheel tool couldn't be run in the cross compiler container so it has to be run in a separate emulated container or natively.

Perhaps qemu can help here?

> From there I discovered that the dockcross container used to cross compile the arm64 wheel is using libraries that aren't compliant with the PEP599 spec, so I reported this bug to dockcross.

Thanks.

> Would you support migrating away from the dockcross container to prefer native builds on Arm?

We ended up having to use a native build instead of a dockcross build, at least for now, as discussed in #96. It would be nice if we could cross-compile the wheel, but a CI/CD setup to build, test, and deploy a native wheel would be a step up 🚀

> Using auditwheel bundles the dynamic libraries with the wheel. Does this present any licensing problems?

There should not be any licensing issues with bundling the libstdc++, openssl, or cmake dynamic libraries in the wheel.

> Would you be open to migrating from Travis-CI.org to Travis-CI.com? The dot-com version supports Arm64 build on AWS Graviton 2 instances, which are much faster than the previous generation available on Travis. See: https://blog.travis-ci.com/2020-09-11-arm-on-aws

Sounds great, and migration of open source projects to travis-ci.com seems to be encouraged now. @jcfr what do you think?

@AWSjswinney
Contributor Author

> > 1. the auditwheel tool couldn't be run in the cross compiler container so it has to be run in a separate emulated container or natively.
>
> Perhaps qemu can help here?

Yes, it definitely could, and I was successful in my attempts to use it. However, because of the bug with dockcross, we can't produce a compliant wheel, so building the whole project natively seemed like a better option to me.

> > Would you be open to migrating from Travis-CI.org to Travis-CI.com? The dot-com version supports Arm64 build on AWS Graviton 2 instances, which are much faster than the previous generation available on Travis. See: https://blog.travis-ci.com/2020-09-11-arm-on-aws
>
> Sounds great, and migration of open source projects to travis-ci.com seems to be encouraged now. @jcfr what do you think?

Excellent. I'm working on getting the necessary legal approvals from AWS to make my contribution. Expect it soon.

@AWSjswinney
Contributor Author

@thewtex Would you take a quick look at the Windows build failures on AppVeyor for #116? I don't understand how the changes that I made would have affected the Windows build in the way that it's failing. I'm wondering if it's some change in an upstream dependency. Is it possible to retry the build on the current head of master to make sure the builds pass there?

Additionally, the arm64 build sometimes times out on Travis-CI. This problem would be fixed by switching to Travis-CI.com and using arch: arm64-graviton2. The Graviton 2 builders are much faster; in my tests, builds completed in about 20 minutes.

@thewtex
Member

thewtex commented Oct 7, 2020

> @thewtex Would you take a quick look at the Windows build failures on Appveyor for #116 ? I don't understand how the changes that I made would have affected the windows build in the way that it's failing. I'm wondering if it's some change in an upstream dependency. Is it possible to retry the build on the current head of master to make sure they are passing there?

I am not sure why it is failing now. The failure appears to come from the build picking up Python 2 and trying to compile an extension. It is supposed to be a Python 3 build (Environment: APPVEYOR_BUILD_WORKER_IMAGE=Visual Studio 2017, PYTHON_DIR=C:\Python37, PYTHON_VERSION=3.7.x, PYTHON_ARCH=64, BLOCK=0), but the scikit-ci installation is using Python 2 and fails while compiling the ruamel.ordereddict dependency.

Build started
python -m pip install -U scikit-ci scikit-ci-addons
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Collecting scikit-ci
  Downloading https://files.pythonhosted.org/packages/39/fd/649b63a3c9e0e0545adaa7fa20b4036297969f9eca15eabe81555bc81583/scikit_ci-0.21.0-py2.py3-none-any.whl
Collecting scikit-ci-addons
  Downloading https://files.pythonhosted.org/packages/cb/84/1fb0241578111d9797c8fc88bb46848ed85be82818904e4f04dd52c31b42/scikit_ci_addons-0.25.0-py2.py3-none-any.whl (45kB)
Collecting ruamel.yaml>=0.15; python_version == "2.7"
  Downloading https://files.pythonhosted.org/packages/7e/39/186f14f3836ac5d2a6a042c8de69988770e8b9abb537610edc429e4914aa/ruamel.yaml-0.16.12-py2.py3-none-any.whl (111kB)
Collecting pyfiglet
  Downloading https://files.pythonhosted.org/packages/33/07/fcfdd7a2872f5b348953de35acce1544dab0c1e8368dca54279b1cde5c15/pyfiglet-0.8.post1-py2.py3-none-any.whl (865kB)
Collecting lxml; python_version < "3.8"
  Downloading https://files.pythonhosted.org/packages/e4/af/987265368135ef520adb617da79c8cbb40915706c8fd0d5ec7745c2e2b84/lxml-4.5.2-cp27-cp27m-win32.whl (3.2MB)
Collecting githubrelease>=1.5.7
  Downloading https://files.pythonhosted.org/packages/a3/4c/d6b3594ad70128be6167a1c9e726b1d9aeb07fdc744acb4e7055f8ec2e83/githubrelease-1.5.8-py2.py3-none-any.whl
Collecting ruamel.ordereddict; platform_python_implementation == "CPython" and python_version <= "2.7"
  Downloading https://files.pythonhosted.org/packages/bf/c0/6facfb1aa7ab8ee7f12883f8a77ac2331789b411a920da6c1f559c1af98d/ruamel.ordereddict-0.4.15.tar.gz (61kB)
Collecting ruamel.yaml.clib>=0.1.2; platform_python_implementation == "CPython" and python_version < "3.9"
  Downloading https://files.pythonhosted.org/packages/5e/78/dd0cb2fd894f77969a2a2f50e0475ba56070ada384a947784b129e58230a/ruamel.yaml.clib-0.2.2-cp27-cp27m-win32.whl (98kB)
Collecting requests
  Downloading https://files.pythonhosted.org/packages/45/1e/0c169c6a5381e241ba7404532c16a21d86ab872c9bed8bdcd4c423954103/requests-2.24.0-py2.py3-none-any.whl (61kB)
Collecting linkheader
  Downloading https://files.pythonhosted.org/packages/27/d4/eb1da743b2dc825e936ef1d9e04356b5701e3a9ea022c7aaffdf4f6b0594/LinkHeader-0.4.3.tar.gz
Collecting click
  Downloading https://files.pythonhosted.org/packages/d2/3d/fa76db83bf75c4f8d338c2fd15c8d33fdd7ad23a9b5e57eb6c5de26b430e/click-7.1.2-py2.py3-none-any.whl (82kB)
Collecting certifi>=2017.4.17
  Downloading https://files.pythonhosted.org/packages/5e/c4/6c4fe722df5343c33226f0b4e0bb042e4dc13483228b4718baf286f86d87/certifi-2020.6.20-py2.py3-none-any.whl (156kB)
Collecting chardet<4,>=3.0.2
  Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
Collecting idna<3,>=2.5
  Downloading https://files.pythonhosted.org/packages/a2/38/928ddce2273eaa564f6f50de919327bf3a00f091b5baba8dfa9460f3a8a8/idna-2.10-py2.py3-none-any.whl (58kB)
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
  Downloading https://files.pythonhosted.org/packages/9f/f0/a391d1463ebb1b233795cabfc0ef38d3db4442339de68f847026199e69d7/urllib3-1.25.10-py2.py3-none-any.whl (127kB)
Installing collected packages: ruamel.ordereddict, ruamel.yaml.clib, ruamel.yaml, pyfiglet, scikit-ci, lxml, certifi, chardet, idna, urllib3, requests, linkheader, click, githubrelease, scikit-ci-addons
    Running setup.py install for ruamel.ordereddict: started
    Running setup.py install for ruamel.ordereddict: finished with status 'error'
    ERROR: Command errored out with exit status 1:
     command: 'C:\Python27\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'c:\\users\\appveyor\\appdata\\local\\temp\\1\\pip-install-is6nkk\\ruamel.ordereddict\\setup.py'"'"'; __file__='"'"'c:\\users\\appveyor\\appdata\\local\\temp\\1\\pip-install-is6nkk\\ruamel.ordereddict\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'c:\users\appveyor\appdata\local\temp\1\pip-record-7lcgnc\install-record.txt' --single-version-externally-managed --compile
         cwd: c:\users\appveyor\appdata\local\temp\1\pip-install-is6nkk\ruamel.ordereddict\
    Complete output (11 lines):
    running install
    running build
    running build_py
    creating build
    creating build\lib.win32-2.7
    creating build\lib.win32-2.7\ruamel
    creating build\lib.win32-2.7\ruamel\ordereddict
    copying .\__init__.py -> build\lib.win32-2.7\ruamel\ordereddict
    running build_ext
    building '_ordereddict' extension
    error: Microsoft Visual C++ 9.0 is required. Get it from http://aka.ms/vcpython27
    ----------------------------------------

A few ideas on how to move forward:

  • Find a way to use Python 3 instead. Sadly, CPython from python.org does not come with a python3.exe executable with Python 3.7 :-(
  • Move from AppVeyor to GitHub Actions.
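
For the first idea, one possible approach is to invoke the interpreter from PYTHON_DIR explicitly instead of relying on whichever `python` is first on PATH. The Python sketch below only builds the command line; `pip_install_command` is a hypothetical helper, and this is not necessarily how the build was eventually fixed.

```python
import ntpath


def pip_install_command(python_dir, packages):
    """Build a pip command that runs under the interpreter found in python_dir."""
    # ntpath joins Windows-style paths the same way on any host OS.
    interpreter = ntpath.join(python_dir, "python.exe")
    return [interpreter, "-m", "pip", "install", "-U", *packages]


# On the CI worker, python_dir would come from the PYTHON_DIR environment variable.
command = pip_install_command(r"C:\Python37", ["scikit-ci", "scikit-ci-addons"])
print(command)
```

Running pip through an explicit interpreter path avoids resolving `python` to C:\Python27, which is what the log above shows happening.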

I will defer to @jcfr on how he would like to proceed.

> Additionally, the arm64 build sometimes times out on Travis-CI. This problem would be fixed by switching to Travis-CI.com and using arch:arm64-graviton2. The Graviton 2 builders are much faster and in my tests, have completed in about 20 minutes.

Sounds good to me. I sent a request to install the Travis-CI.com GitHub App on this repository, but @jcfr has to approve it before it can be enabled. ✔️

@AWSjswinney
Contributor Author

Thanks for the suggestions. I was able to figure out how to fix the AppVeyor build. I'm not sure why it spontaneously broke, but it's passing now.
