[Bug] Tracking flaky macOS CI: dlopen … No such file during import libtiledbsoma #2237

Closed
ryan-williams opened this issue Mar 7, 2024 · 7 comments · Fixed by #2435

Comments

@ryan-williams
Member

ryan-williams commented Mar 7, 2024

Describe the bug

The "macos TILEDB_EXISTS: yes TILEDBSOMA_EXISTS: no" job sometimes fails, in the "Confirm linking to installed shared objects" step (python-so-copying.yml#L232), with error:

OSError: dlopen(/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/libtiledb.dylib, 0x000A): tried: '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/libtiledb.dylib' (no such file)
Example full stack trace
Traceback (most recent call last):
  File "/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/__init__.py", line 104, in <module>
    from . import pytiledbsoma as clib
ImportError: dlopen(/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/pytiledbsoma.cpython-311-darwin.so, 0x0002): Library not loaded: '@rpath/libtiledbsoma.dylib'
  Referenced from: '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/pytiledbsoma.cpython-311-darwin.so'
  Reason: tried: '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), '/usr/lib/libtiledbsoma.dylib' (no such file)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/__init__.py", line 118, in <module>
    ctypes.CDLL(os.path.join(lib_dir, libtiledb_name), mode=ctypes.RTLD_GLOBAL)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/libtiledb.dylib, 0x000A): tried: '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/libtiledb.dylib' (no such file)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/__init__.py", line 121, in <module>
    ctypes.CDLL(libtiledb_name, mode=ctypes.RTLD_GLOBAL)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(libtiledb.dylib, 0x000A): tried: 'libtiledb.dylib' (no such file), '/usr/lib/libtiledb.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/libtiledb.dylib' (no such file)
Error: Process completed with exit code 1.

Running list of instances of this failure that I've noticed:

In all cases, a single re-run was enough to get a successful run.

Aside: mitigation + possible separate segfault issue

#2229 should make it less common (by running the job in question less frequently/superfluously).

Note that a similar flaky CI job was observed there, by @jdblischak (GHA link), in job "macos TILEDB_EXISTS: yes TILEDBSOMA_EXISTS: yes", step Build and install libtiledbsoma:

clang: error: unable to execute command: Segmentation fault: 11
Full stack trace
-- Install prefix is /Users/runner/work/TileDB-SOMA/TileDB-SOMA/external.
-- The C compiler identification is AppleClang 14.0.0.14000029
-- The CXX compiler identification is AppleClang 14.0.0.14000029
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode_14.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode_14.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Starting TileDB-SOMA superbuild.
-- Found TileDB: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledb.dylib
-- Could NOT find spdlog (missing: spdlog_DIR)
-- Adding spdlog as an external project
-- Not found clang-tidy
-- Not found clang-format
-- Configuring done (4.4s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/build-libtiledbsoma
/Users/runner/work/_temp/f7f53cdf-a50d-4785-9ee7-cb72edffd8d5.sh: line 8: nproc: command not found
[  6%] Creating directories for 'ep_spdlog'
[ 12%] Performing download step (download, verify and extract) for 'ep_spdlog'
-- ep_spdlog download command succeeded.  See also /Users/runner/work/TileDB-SOMA/TileDB-SOMA/build-libtiledbsoma/externals/src/ep_spdlog-stamp/ep_spdlog-download-*.log
[ 18%] No update step for 'ep_spdlog'
[ 25%] Performing patch step for 'ep_spdlog'
patching file include/spdlog/logger-inl.h
[ 31%] Performing configure step for 'ep_spdlog'
-- ep_spdlog configure command succeeded.  See also /Users/runner/work/TileDB-SOMA/TileDB-SOMA/build-libtiledbsoma/externals/src/ep_spdlog-stamp/ep_spdlog-configure-*.log
[ 37%] Performing build step for 'ep_spdlog'
CMake Error at /Users/runner/work/TileDB-SOMA/TileDB-SOMA/build-libtiledbsoma/externals/src/ep_spdlog-stamp/ep_spdlog-build-Release.cmake:37 (message):
-- stdout output is:
  Command failed: 2
[ 10%] Building CXX object CMakeFiles/spdlog.dir/src/spdlog.cpp.o

[ 20%] Building CXX object CMakeFiles/spdlog.dir/src/stdout_sinks.cpp.o
   '/Applications/Xcode_14.2.app/Contents/Developer/usr/bin/make'
[ 30%] Building CXX object CMakeFiles/spdlog.dir/src/color_sinks.cpp.o

  See also

    /Users/runner/work/TileDB-SOMA/TileDB-SOMA/build-libtiledbsoma/externals/src/ep_spdlog-stamp/ep_spdlog-build-*.log


CMake Error at /Users/runner/work/TileDB-SOMA/TileDB-SOMA/build-libtiledbsoma/externals/src/ep_spdlog-stamp/ep_spdlog-build-Release.cmake:47 (message):
  Stopping after outputting logs.


make[2]: *** [externals/src/ep_spdlog-stamp/ep_spdlog-build] Error 1
make[1]: *** [CMakeFiles/ep_spdlog.dir/all] Error 2
make: *** [all] Error 2
[ 40%] Building CXX object CMakeFiles/spdlog.dir/src/file_sinks.cpp.o
[ 50%] Building CXX object CMakeFiles/spdlog.dir/src/async.cpp.o
[ 60%] Building CXX object CMakeFiles/spdlog.dir/src/cfg.cpp.o
[ 70%] Building CXX object CMakeFiles/spdlog.dir/src/fmt.cpp.o

-- stderr output is:
clang: error: unable to execute command: Segmentation fault: 11
clang: error: clang frontend command failed due to signal (use -v to see invocation)
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: x86_64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Applications/Xcode_14.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
clang: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /var/folders/h1/8hndypj13nsbj5pn4xsnv1tm0000gn/T/spdlog-510f3d.cpp
clang: note: diagnostic msg: /var/folders/h1/8hndypj13nsbj5pn4xsnv1tm0000gn/T/spdlog-510f3d.sh
clang: note: diagnostic msg: Crash backtrace is located in
clang: note: diagnostic msg: /Users/runner/Library/Logs/DiagnosticReports/clang_<YYYY-MM-DD-HHMMSS>_<hostname>.crash
clang: note: diagnostic msg: (choose the .crash file that corresponds to your crash)
clang: note: diagnostic msg: 

********************
make[5]: *** [CMakeFiles/spdlog.dir/src/spdlog.cpp.o] Error 254
make[5]: *** Waiting for unfinished jobs....
make[4]: *** [CMakeFiles/spdlog.dir/all] Error 2
make[3]: *** [all] Error 2

Error: Process completed with exit code 2.

Can file a separate issue for that, if we keep seeing it.

@johnkerl johnkerl changed the title [Bug] Flaky macOS CI: dlopen … No such file during import libtiledbsoma [Bug] Tracking flaky macOS CI: dlopen … No such file during import libtiledbsoma Mar 7, 2024
@ryan-williams
Member Author

Added one to the list above:

@ryan-williams
Member Author

And another:

@jdblischak
Collaborator

This flaky build error is so frustrating. I've been meaning to investigate further, and it just happened to me recently, so I took some time to compare a failed build with its passing build after a simple manual restart. I copy-pasted the raw logs, removed the timestamps, and compared.

To begin, in both the failing and passing builds, the required libtiledbsoma.dylib is in the same exact directory as pytiledbsoma.cpython-311-darwin.so, so it should be easy to find.

./venv-soma/lib/python3.11/site-packages/tiledbsoma/pytiledbsoma.cpython-311-darwin.so
./venv-soma/lib/python3.11/site-packages/tiledbsoma/libtiledbsoma.dylib

And the RPATH is identical in both the failing and passing builds: @rpath/libtiledbsoma.dylib

Next I grep'd for libtiledbsoma.dylib and diff'd the results.

Results of diffing the passing and failing logs for all mentions of `libtiledbsoma.dylib`
$ grep libtiledbsoma.dylib soma-failed.txt
[ 74%] Linking CXX shared library libtiledbsoma.dylib
-- Installing: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib
Checking: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib exists: False
dlopen(libtiledbsoma.dylib, 0x0006): tried: 'libtiledbsoma.dylib' (no such file), '/usr/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/apis/python/libtiledbsoma.dylib' (no such file)
Checking: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib exists: True
  copying file /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib to /Users/runner/work/TileDB-SOMA/TileDB-SOMA/apis/python/src/tiledbsoma
  adding to package_data: ['libtiledbsoma.dylib']
copying src/tiledbsoma/libtiledbsoma.dylib -> build/lib.macosx-10.9-universal2-cpython-311/tiledbsoma
Checking: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib exists: True
ld: warning: ignoring file /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib, building for macOS-arm64 but attempting to link with file built for macOS-x86_64
ld: warning: dylib (/Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib) was built for newer macOS version (12.7) than being linked (11.0)
copying build/lib.macosx-10.9-universal2-cpython-311/tiledbsoma/libtiledbsoma.dylib -> build/bdist.macosx-10.9-universal2/wheel/tiledbsoma
build/bdist.macosx-10.9-universal2/wheel/tiledbsoma/libtiledbsoma.dylib
build/bdist.macosx-10.9-universal2/wheel/tiledbsoma/libtiledbsoma.dylib
adding 'tiledbsoma/libtiledbsoma.dylib'
   989448  04-10-2024 15:52   tiledbsoma/libtiledbsoma.dylib
##[group]Run unzip -l apis/python/dist/tiledbsoma-*.whl | grep -q libtiledbsoma.dylib
unzip -l apis/python/dist/tiledbsoma-*.whl | grep -q libtiledbsoma.dylib
        @rpath/libtiledbsoma.dylib (compatibility version 0.0.0, current version 0.0.0)
./venv-soma/lib/python3.11/site-packages/tiledbsoma/libtiledbsoma.dylib
ImportError: dlopen(/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/pytiledbsoma.cpython-311-darwin.so, 0x0002): Library not loaded: '@rpath/libtiledbsoma.dylib'
  Reason: tried: '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), '/usr/lib/libtiledbsoma.dylib' (no such file)

$ grep libtiledbsoma.dylib soma-passed.txt
[ 71%] Linking CXX shared library libtiledbsoma.dylib
-- Installing: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib
Checking: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib exists: False
dlopen(libtiledbsoma.dylib, 0x0006): tried: '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), 'libtiledbsoma.dylib' (no such file), '/usr/local/lib/libtiledbsoma.dylib' (no such file), '/usr/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/apis/python/libtiledbsoma.dylib' (no such file)
Checking: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib exists: True
  copying file /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib to /Users/runner/work/TileDB-SOMA/TileDB-SOMA/apis/python/src/tiledbsoma
  adding to package_data: ['libtiledbsoma.dylib']
copying src/tiledbsoma/libtiledbsoma.dylib -> build/lib.macosx-10.9-universal2-cpython-311/tiledbsoma
Checking: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib exists: True
ld: warning: ignoring file /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib, building for macOS-arm64 but attempting to link with file built for macOS-x86_64
ld: warning: dylib (/Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib) was built for newer macOS version (12.7) than being linked (11.0)
build/bdist.macosx-10.9-universal2/wheel/tiledbsoma/libtiledbsoma.dylib
copying build/lib.macosx-10.9-universal2-cpython-311/tiledbsoma/libtiledbsoma.dylib -> build/bdist.macosx-10.9-universal2/wheel/tiledbsoma
build/bdist.macosx-10.9-universal2/wheel/tiledbsoma/libtiledbsoma.dylib
adding 'tiledbsoma/libtiledbsoma.dylib'
  1005848  04-11-2024 15:52   tiledbsoma/libtiledbsoma.dylib
##[group]Run unzip -l apis/python/dist/tiledbsoma-*.whl | grep -q libtiledbsoma.dylib
unzip -l apis/python/dist/tiledbsoma-*.whl | grep -q libtiledbsoma.dylib
        @rpath/libtiledbsoma.dylib (compatibility version 0.0.0, current version 0.0.0)
./venv-soma/lib/python3.11/site-packages/tiledbsoma/libtiledbsoma.dylib

$ diff <(grep libtiledbsoma.dylib soma-failed.txt) <(grep libtiledbsoma.dylib soma-passed.txt)
1c1
< [ 74%] Linking CXX shared library libtiledbsoma.dylib
---
> [ 71%] Linking CXX shared library libtiledbsoma.dylib
4c4
< dlopen(libtiledbsoma.dylib, 0x0006): tried: 'libtiledbsoma.dylib' (no such file), '/usr/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/apis/python/libtiledbsoma.dylib' (no such file)
---
> dlopen(libtiledbsoma.dylib, 0x0006): tried: '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), 'libtiledbsoma.dylib' (no such file), '/usr/local/lib/libtiledbsoma.dylib' (no such file), '/usr/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/apis/python/libtiledbsoma.dylib' (no such file)
12d11
< copying build/lib.macosx-10.9-universal2-cpython-311/tiledbsoma/libtiledbsoma.dylib -> build/bdist.macosx-10.9-universal2/wheel/tiledbsoma
13a13
> copying build/lib.macosx-10.9-universal2-cpython-311/tiledbsoma/libtiledbsoma.dylib -> build/bdist.macosx-10.9-universal2/wheel/tiledbsoma
16c16
<    989448  04-10-2024 15:52   tiledbsoma/libtiledbsoma.dylib
---
>   1005848  04-11-2024 15:52   tiledbsoma/libtiledbsoma.dylib
21,22d20
< ImportError: dlopen(/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/pytiledbsoma.cpython-311-darwin.so, 0x0002): Library not loaded: '@rpath/libtiledbsoma.dylib'
<   Reason: tried: '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), '/usr/lib/libtiledbsoma.dylib' (no such file)
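
The grep-and-diff comparison above can also be scripted, which helps when comparing several failed/passed pairs. The following is a minimal sketch, not the procedure actually used in this thread. It assumes that raw GitHub Actions log lines begin with an ISO-8601 timestamp ending in `Z` followed by a space (an assumption about the log format), and `filtered` and `diff_logs` are hypothetical helper names.

```python
import difflib
import re

# Assumed timestamp prefix of raw GitHub Actions log lines,
# e.g. "2024-04-10T15:52:00.1234567Z " (assumption, not shown in the thread).
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z ")


def filtered(lines, needle):
    """Keep only lines mentioning `needle`, with the timestamp prefix removed."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines if needle in line]


def diff_logs(failed_lines, passed_lines, needle):
    """Return a unified diff of the `needle` lines in the two logs."""
    return list(
        difflib.unified_diff(
            filtered(failed_lines, needle),
            filtered(passed_lines, needle),
            fromfile="soma-failed.txt",
            tofile="soma-passed.txt",
            lineterm="",
        )
    )
```

Calling `diff_logs(open("soma-failed.txt"), open("soma-passed.txt"), "libtiledbsoma.dylib")` would produce roughly the same information as the shell `diff` above, in unified format.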

The main difference I observed was that when setup.py initially searches for an existing libtiledbsoma.dylib (and in this instance we know it doesn't exist and has to be built from source), the passing job searches in more locations, including external/lib (where libtiledb.dylib is installed).

for lib_dir in dist_dirs:
    full_lib_path = lib_dir / get_libtiledbsoma_library_name()
    print(f"Checking: {full_lib_path} exists: {full_lib_path.exists()}")
    if full_lib_path.exists():
        return lib_dir

# failed
Checking: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib exists: False
dlopen(libtiledbsoma.dylib, 0x0006): tried: 'libtiledbsoma.dylib' (no such file), '/usr/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/apis/python/libtiledbsoma.dylib' (no such file)
Checking: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib exists: True
  copying file /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib to /Users/runner/work/TileDB-SOMA/TileDB-SOMA/apis/python/src/tiledbsoma
  adding to package_data: ['libtiledbsoma.dylib']

# passed
Checking: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib exists: False
dlopen(libtiledbsoma.dylib, 0x0006): tried: '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), 'libtiledbsoma.dylib' (no such file), '/usr/local/lib/libtiledbsoma.dylib' (no such file), '/usr/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/apis/python/libtiledbsoma.dylib' (no such file)
Checking: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib exists: True
  copying file /Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib to /Users/runner/work/TileDB-SOMA/TileDB-SOMA/apis/python/src/tiledbsoma
  adding to package_data: ['libtiledbsoma.dylib']
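
For experimenting with the search loop from setup.py outside the build, it can be wrapped in a standalone function. This is only a sketch: `find_lib_dir` is a hypothetical name, the library name is passed in rather than obtained from `get_libtiledbsoma_library_name()`, and the `None` return for "not found" is an assumption about what the surrounding code does.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_lib_dir(dist_dirs: Iterable[Path], lib_name: str) -> Optional[Path]:
    """Return the first directory in `dist_dirs` that contains `lib_name`."""
    for lib_dir in dist_dirs:
        full_lib_path = Path(lib_dir) / lib_name
        # Same diagnostic line that shows up in the CI logs above.
        print(f"Checking: {full_lib_path} exists: {full_lib_path.exists()}")
        if full_lib_path.exists():
            return Path(lib_dir)
    # Nothing found: the caller would have to build the library from source.
    return None
```

This matches the pattern in both logs: the first check of `dist/lib` prints `exists: False`, and the same path prints `exists: True` once the library has been built and installed.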

So given that, maybe the primary error isn't finding libtiledbsoma.dylib installed in the Python package, but instead finding the external libtiledb.dylib installed in external/lib. As we can see from the error from ctypes.CDLL(), it is not even looking in external/lib for libtiledb.dylib.

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/__init__.py", line 118, in <module>
    ctypes.CDLL(os.path.join(lib_dir, libtiledb_name), mode=ctypes.RTLD_GLOBAL)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/libtiledb.dylib, 0x000A): tried: '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/libtiledb.dylib' (no such file)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/runner/work/TileDB-SOMA/TileDB-SOMA/venv-soma/lib/python3.11/site-packages/tiledbsoma/__init__.py", line 121, in <module>
    ctypes.CDLL(libtiledb_name, mode=ctypes.RTLD_GLOBAL)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(libtiledb.dylib, 0x000A): tried: 'libtiledb.dylib' (no such file), '/usr/lib/libtiledb.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/libtiledb.dylib' (no such file)
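
The two `ctypes.CDLL` calls in the traceback (first by full path inside the package directory, then by bare name) can be reproduced in isolation to see exactly which candidates fail and why. The sketch below is an illustration of that fallback pattern, not the actual code in `tiledbsoma/__init__.py`; `try_load` is a hypothetical helper.

```python
import ctypes
import os


def try_load(lib_dir, lib_name):
    """Try a full-path load, then a bare-name load.

    Returns (handle, errors), where errors lists (candidate, message)
    for every attempt that raised OSError.
    """
    errors = []
    for candidate in (os.path.join(lib_dir, lib_name), lib_name):
        try:
            return ctypes.CDLL(candidate, mode=ctypes.RTLD_GLOBAL), errors
        except OSError as exc:
            # Record the dynamic loader's message, which on macOS includes
            # the list of paths it tried.
            errors.append((candidate, str(exc)))
    return None, errors
```

Only the bare-name attempt consults the dynamic loader's search path, so that is the attempt whose outcome depends on variables such as DYLD_LIBRARY_PATH.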

But why is the search path different between the initial failing run and its passing restarted run?

@jdblischak
Collaborator

Ok, I confirmed that the env vars affecting the library search path are the problem.

echo env vars: $DYLD_LIBRARY_PATH $PKG_CONFIG_PATH $TILEDB_PATH $TILEDBSOMA_PATH

$ diff <(grep 'external/lib' soma-failed.txt) <(grep 'external/lib' soma-passed.txt)
7a8
> env vars: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib /Users/runner/hostedtoolcache/Python/3.11.9/x64/lib/pkgconfig /Users/runner/work/TileDB-SOMA/TileDB-SOMA/external
10a12
> dlopen(libtiledbsoma.dylib, 0x0006): tried: '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), 'libtiledbsoma.dylib' (no such file), '/usr/local/lib/libtiledbsoma.dylib' (no such file), '/usr/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/apis/python/libtiledbsoma.dylib' (no such file)
44d45
<   Reason: tried: '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/dist/lib/libtiledbsoma.dylib' (no such file), '/Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib/libtiledbsoma.dylib' (no such file), '/usr/lib/libtiledbsoma.dylib' (no such file)

In the failing build, DYLD_LIBRARY_PATH is empty. But in the passing build, it is set to /Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib

# failed
env vars: /Users/runner/hostedtoolcache/Python/3.11.9/x64/lib/pkgconfig /Users/runner/work/TileDB-SOMA/TileDB-SOMA/external

# passed
env vars: /Users/runner/work/TileDB-SOMA/TileDB-SOMA/external/lib /Users/runner/hostedtoolcache/Python/3.11.9/x64/lib/pkgconfig /Users/runner/work/TileDB-SOMA/TileDB-SOMA/external
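
One caveat with the `echo` diagnostic is that an empty variable simply vanishes from the output, so the failed and passed lines have to be compared by position. A small sketch of a less ambiguous diagnostic, printing each name with its value (`describe_env` is a hypothetical helper; the variable names are the ones echoed above):

```python
import os


def describe_env(names, environ=None):
    """Format NAME=value pairs, showing unset variables explicitly as None."""
    environ = os.environ if environ is None else environ
    return [f"{name}={environ.get(name)!r}" for name in names]


if __name__ == "__main__":
    for line in describe_env(
        ["DYLD_LIBRARY_PATH", "PKG_CONFIG_PATH", "TILEDB_PATH", "TILEDBSOMA_PATH"]
    ):
        print(line)
```

With this form, a stripped variable shows up as `DYLD_LIBRARY_PATH=None` instead of silently shifting the remaining values to the left.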

So some flaky process is occasionally unsetting DYLD_LIBRARY_PATH.

@eddelbuettel
Contributor

Nice find.

For the record it also bit me maybe four or five times last week when working on nanoarrow (and triggering Python CI).

@jdblischak
Collaborator

I think the problem is macOS's System Integrity Protection. From pypa/cibuildwheel#816 (comment):

macOS strips that env var out after each subprocess launch

Looking again at my failing build, I can see that DYLD_LIBRARY_PATH is set both during the wheel build and later during the failed runtime step. So I guess sometimes SIP is applied and sometimes not.

I'm at a bit of a loss on how to proceed here. I can literally see that I have properly set DYLD_LIBRARY_PATH in the workflow, but SIP randomly unsets it only some of the time. How can I write a robust workflow to run on macOS when it is actively making it difficult to find shared libraries?

image
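
One way to test the SIP hypothesis directly is to check whether a variable set in the parent actually reaches a child process, both when the child is launched directly and when it is launched through a system binary such as `/bin/sh`. The sketch below is a diagnostic under assumptions: `child_sees` is a hypothetical helper, and the expectation that SIP purges `DYLD_*` variables when a protected binary is launched is based on Apple's description of SIP, not on anything measured in this thread.

```python
import os
import subprocess
import sys


def child_sees(name, value, via_shell=False):
    """Set `name` for a child Python process and return the value it observes."""
    env = dict(os.environ, **{name: value})
    probe = [sys.executable, "-c", f"import os; print(os.environ.get({name!r}, ''))"]
    if via_shell:
        # Route the launch through /bin/sh, which then execs the probe.
        # On macOS with SIP enabled, this hop is where DYLD_* variables
        # would be expected to disappear (assumption).
        probe = ["/bin/sh", "-c", 'exec "$0" "$@"', *probe]
    result = subprocess.run(probe, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Comparing `child_sees("DYLD_LIBRARY_PATH", "/some/dir")` with `child_sees("DYLD_LIBRARY_PATH", "/some/dir", via_shell=True)` on the runner would show whether the variable is lost at the shell hop. It would not by itself explain why the behavior differs between runs.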

@jdblischak
Collaborator

Unfortunately SIP can't be toggled on and off in a GitHub Actions runner. However, it appears that SIP is disabled for macos-13, so we could try updating to that image.

actions/runner-images#650 (comment)
actions/runner-images#8162
apache/datafusion-comet#112

Another idea would be to try DYLD_FALLBACK_LIBRARY_PATH (actions/runner-images#650 (comment)), though if SIP blocks DYLD_LIBRARY_PATH, I assume it is smart enough to block all search-path related env vars.
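
A further option, sketched here only as an illustration, is to avoid relying on the loader's search path and resolve the library to an absolute path before loading it. The thread does not say what the eventual fix in #2435 did, so this is not a description of it. The sketch relies on the observation above that TILEDB_PATH was still set in the failing build; `candidate_lib_dirs` and `resolve_library` are hypothetical helpers, and the `lib` subdirectory mirrors the `external/lib` layout seen in the logs.

```python
import os
from pathlib import Path


def candidate_lib_dirs(package_dir, environ=None):
    """List directories to search, in priority order."""
    environ = os.environ if environ is None else environ
    dirs = [Path(package_dir)]
    # TILEDB_PATH is not a DYLD_* variable, and it was still set in the
    # failing build's echo output.
    tiledb_path = environ.get("TILEDB_PATH")
    if tiledb_path:
        dirs.append(Path(tiledb_path) / "lib")
    # Still honor the loader variables when they are present.
    for var in ("DYLD_LIBRARY_PATH", "DYLD_FALLBACK_LIBRARY_PATH"):
        dirs.extend(Path(p) for p in environ.get(var, "").split(os.pathsep) if p)
    return dirs


def resolve_library(lib_name, package_dir, environ=None):
    """Return the absolute path of the first existing candidate, or None."""
    for lib_dir in candidate_lib_dirs(package_dir, environ):
        path = lib_dir / lib_name
        if path.exists():
            return path
    return None
```

The resolved path could then be passed to `ctypes.CDLL`, since loading by absolute path does not depend on DYLD_LIBRARY_PATH.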
