BUG: Missing symbol when used with reference LAPACK: cblas_cdotc_sub #18371

Closed
h-vetinari opened this issue Apr 26, 2023 · 20 comments · Fixed by #18426
Labels: defect (A clear bug or issue that prevents SciPy from being installed or used as expected), scipy.sparse.linalg

Comments

@h-vetinari
Member

h-vetinari commented Apr 26, 2023

I've been testing scipy against all the BLAS/LAPACK variants available in conda-forge for a while, and though there are small issues here and there, generally things work everywhere.

However, since scipy 1.10.1 (or some other change in the last 3 months), there's a kind of error I haven't seen before: a missing symbol (`cblas_cdotc_sub`), but only for one BLAS/LAPACK flavour, namely the "netlib" one from https://github.com/Reference-LAPACK/lapack/.

This happens on Linux for all arches (x64, aarch64, ppc64le), but not on macOS/Windows.

Since this causes an error directly upon import (rather than a small handful of failed tests), it has a much bigger blast radius.

import: 'scipy.cluster'
Traceback (most recent call last):
  [...]
    from . import _iterative
ImportError: $PREFIX/lib/python3.10/site-packages/scipy/sparse/linalg/_isolve/_iterative.cpython-310-x86_64-linux-gnu.so: undefined symbol: cblas_cdotc_sub
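When triaging a report like this across many BLAS flavours, it can help to pull the missing symbol out of the `ImportError` programmatically instead of eyeballing tracebacks. A minimal sketch, assuming the glibc loader wording `undefined symbol: <name>` shown above; `missing_symbol` and `try_import` are hypothetical helpers, not part of SciPy.

```python
import importlib
import re
from typing import Optional

# Assumes the glibc loader wording seen above: "...: undefined symbol: <name>"
_UNDEFINED_SYMBOL = re.compile(r"undefined symbol: (\S+)")


def missing_symbol(message: str) -> Optional[str]:
    """Return the undefined symbol named in a loader error message, or None."""
    match = _UNDEFINED_SYMBOL.search(message)
    return match.group(1) if match else None


def try_import(module_name: str) -> Optional[str]:
    """Import `module_name`; return the missing symbol if that is why it fails.

    Returns None when the import succeeds. Any other ImportError (e.g. the
    package not being installed at all) is re-raised unchanged.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        symbol = missing_symbol(str(exc))
        if symbol is None:
            raise
        return symbol
    return None
```

On an affected Linux + Netlib install, `try_import("scipy.sparse.linalg")` would be expected to return `"cblas_cdotc_sub"`; on a working install it returns None.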

Apparently this is already being encountered in the wild: python-control/Slycot#194

h-vetinari added the defect and scipy.sparse.linalg labels on Apr 26, 2023
@ilayn
Member

ilayn commented Apr 26, 2023

That might be due to a change in OpenBLAS builds on the MacPython side (or MKL for conda?). Not sure why we are using CBLAS variants in Fortran.

@h-vetinari
Member Author

h-vetinari commented Apr 26, 2023

> That might be due to a change in OpenBLAS builds on the MacPython side (or MKL for conda?)

Nothing to do with OpenBLAS / MKL, which pass (aside from a small handful of test failures).

It's also not on macOS; it's only on Linux.

@gyutaepark

I have the same issue after updating SciPy with conda.

Python 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:39:03)
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy
>>> scipy.optimize
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/code/conda/lib/python3.9/site-packages/scipy/__init__.py", line 200, in __getattr__
    return _importlib.import_module(f'scipy.{name}')
  File "/home/user/code/conda/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/user/code/conda/lib/python3.9/site-packages/scipy/optimize/__init__.py", line 404, in <module>
    from ._optimize import *
  File "/home/user/code/conda/lib/python3.9/site-packages/scipy/optimize/_optimize.py", line 33, in <module>
    from scipy.sparse.linalg import LinearOperator
  File "/home/user/code/conda/lib/python3.9/site-packages/scipy/sparse/__init__.py", line 283, in <module>
    from . import csgraph
  File "/home/user/code/conda/lib/python3.9/site-packages/scipy/sparse/csgraph/__init__.py", line 185, in <module>
    from ._laplacian import laplacian
  File "/home/user/code/conda/lib/python3.9/site-packages/scipy/sparse/csgraph/_laplacian.py", line 7, in <module>
    from scipy.sparse.linalg import LinearOperator
  File "/home/user/code/conda/lib/python3.9/site-packages/scipy/sparse/linalg/__init__.py", line 120, in <module>
    from ._isolve import *
  File "/home/user/code/conda/lib/python3.9/site-packages/scipy/sparse/linalg/_isolve/__init__.py", line 4, in <module>
    from .iterative import *
  File "/home/user/code/conda/lib/python3.9/site-packages/scipy/sparse/linalg/_isolve/iterative.py", line 9, in <module>
    from . import _iterative
ImportError: /home/user/code/conda/lib/python3.9/site-packages/scipy/sparse/linalg/_isolve/_iterative.cpython-39-x86_64-linux-gnu.so: undefined symbol: cblas_cdotc_sub

@gyutaepark

Downgrading to SciPy 1.10.0 from conda-forge removes this issue.

Problematic version, build: 1.10.1, py39he83f1e1_0
OK version, build: 1.10.0, py39h7360e5f_2

@moorepants

moorepants commented Apr 28, 2023

We are seeing this in cyipopt here: mechmotum/cyipopt#182 (comment)

@h-vetinari
Member Author

@rgommers @eli-schwartz
Could it be that this has something to do with switching the build to Meson?

@h-vetinari
Member Author

> Problematic Version, Build: 1.10.1, py39he83f1e1_0
> Ok Version: 1.10.0 py39h7360e5f_2

That problematic version is from conda-forge/scipy-feedstock#231.

Otherwise, the only change in the folder where the BLAS wrapping happens was 9b7d313.

@ilayn
Member

ilayn commented Apr 28, 2023

> Nothing to do with OpenBLAS / MKL, which passes (aside from a small handful of test failures).

That symbol is provided by MKL/OpenBLAS for dot products. If it is found on other platforms but missing on one, I'd be suspicious of the CBLAS library. You can build BLAS/LAPACK without CBLAS, etc., but then we should have been seeing more missing symbols. If it were on the SciPy side, it should fail everywhere, but here it is curiously missing just one symbol on one platform.

Weird.

PS: Strangely enough, I'm converting this Fortran code to Cython, so I guess the planets are aligning again.

@rgommers
Member

This is a regression, and seems to be entirely my fault: gh-18264. Let me try to open a PR to fix it. I'm about to be offline for a few days, so I'll leave it in draft and hope someone else can then confirm that the problem is gone.

@rgommers
Member

That said, that PR is from 3 weeks ago and is not in 1.10.1, so there may be more to this. We unfortunately have no CI for Netlib BLAS (nor for MKL, nor using -Duse-g77-abi), which is why this wasn't caught in time. That's a huge gap, which we need to fix at some point.

> Since scipy 1.10.1 (or some other change in the last 3 months),

So to be sure: you started seeing this when switching from distutils to Meson, correct? The cblas_cdotc_sub symbol is only used in PROPACK, and the PROPACK wrapping is known to be badly broken, so this is not surprising. The question is whether you still see it with PROPACK disabled. And do the PROPACK-related build fixes in gh-18263 (which are in main but not in 1.10.1) fix it?

@ilayn
Member

ilayn commented Apr 28, 2023

It's also in sparse.linalg.iterative

@moorepants

conda-forge recommends building against Netlib BLAS/LAPACK so that OpenBLAS, MKL, etc. can be hot-swapped when the packages are installed. In the cyipopt repo, we run our test suite against binaries built against Netlib BLAS/LAPACK, and then later install the optional SciPy dependency, which likely swaps the Netlib versions for OpenBLAS (all from conda-forge), to run the tests for that optional dependency. This has worked in the past with the SciPy conda-forge binaries, so if no new SciPy releases have come out, maybe changes to the SciPy feedstock caused this to appear. I think it only showed up in our CI within the last month.

@rgommers
Member

> This has worked in the past with the scipy conda-forge binaries, so if no new scipy releases have come out, maybe changes to the scipy feedstock caused this to appear.

Yes, for Linux and macOS, the conda-forge 1.10.0 binaries were built with setup.py, the latest 1.10.1 ones with Meson.

> It's also in sparse.linalg.iterative

I'm not seeing that. That correctly uses wcdotc, not cblas_cdotc explicitly.

@ilayn
Member

ilayn commented Apr 28, 2023

I haven't dug deep enough, but the error message says:

ImportError: /home/user/code/conda/lib/python3.9/site-packages/scipy/sparse/linalg/_isolve/_iterative.cpython-39-x86_64-linux-gnu.so: undefined symbol: cblas_cdotc_sub

@moorepants

moorepants commented Apr 28, 2023

> Yes, for Linux and macOS, the conda-forge 1.10.0 binaries were built with setup.py, the latest 1.10.1 ones with Meson.

It looks like 3 weeks ago our CI was running with 1.10.1 binaries from conda-forge. So maybe this change from 2 weeks ago is connected: conda-forge/scipy-feedstock#231

@rgommers
Member

> I haven't gone deep enough but the error message says

Ah okay, that's a use of cdotc, done correctly via the wcdotc wrapper, which happens in several sparse submodules. A symbol goes missing for some reason. I don't think there's anything wrong with sparse.linalg._isolve.iterative as such; it does nothing special. But something is going wrong in the use of g77_abi_wrappers.
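For context on what the missing routine computes: `cdotc` is the complex dot product with the first vector conjugated, and the CBLAS `cblas_cdotc_sub` variant writes its result through an output pointer instead of returning a complex value, which is presumably why the g77-ABI wrappers call it rather than relying on the Fortran complex-return convention. A pure-Python reference of the arithmetic only, assuming positive strides; `cdotc_reference` is a hypothetical name for illustration, not SciPy's `wcdotc` wrapper, and it uses double precision where BLAS `cdotc` is single precision.

```python
from typing import Sequence


def cdotc_reference(n: int, x: Sequence[complex], incx: int,
                    y: Sequence[complex], incy: int) -> complex:
    """Conjugated dot product: sum(conj(x[i*incx]) * y[i*incy]) for i < n.

    Sketch of the arithmetic behind cdotc / cblas_cdotc_sub. Only positive
    strides are handled here; BLAS also defines negative strides.
    """
    if incx <= 0 or incy <= 0:
        raise ValueError("this sketch only supports positive strides")
    total = 0j
    for i in range(n):
        # Conjugate the element of the first vector, as the "c" in cdotc means.
        total += x[i * incx].conjugate() * y[i * incy]
    return total
```

For example, `cdotc_reference(2, [1+2j, 3-1j], 1, [2+0j, 1+1j], 1)` evaluates to `(4+0j)`.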

@h-vetinari
Member Author

It's late for me, so I won't be able to report the result right away, but I've restarted a test against all BLAS variants in conda-forge/scipy-feedstock#224, built with distutils rather than Meson (if nothing else, to rule out that the build system has something to do with it).

If the linux + netlib jobs pass the import tests and run the test suite, then the feedstock changes in conda-forge/scipy-feedstock#231 are at fault (somehow).

@h-vetinari
Member Author

> If the linux + netlib jobs pass the import tests and run the test suite, then the feedstock changes in conda-forge/scipy-feedstock#231 are at fault (somehow).

Reporting back: the distutils-based builds against Netlib on aarch64/ppc64le were successful in conda-forge/scipy-feedstock#224, so this appears to be a consequence of switching to Meson.

@ilayn
Member

ilayn commented May 2, 2023

This cdotc usage will be gone by the end of this week with #18391. It doesn't solve the issue, but it will eliminate its impact radius.

@rgommers
Member

rgommers commented May 5, 2023

I can reproduce this in an environment with the following changes:

diff --git a/environment.yml b/environment.yml
index 6eb85bbcc..00b0141c7 100644
--- a/environment.yml
+++ b/environment.yml
@@ -3,7 +3,7 @@
 #   $ conda activate scipy-dev
 #
 # Also used to build the `scipy-dev` Docker image via GitHub Actions
-name: scipy-dev
+name: scipy-dev-netlib
 channels:
   - conda-forge
 dependencies:
@@ -15,9 +15,11 @@ dependencies:
   - meson-python
   - ninja
   - numpy
-  - openblas
+  - libblas
+  - libcblas
+  - liblapack
+  - blas-devel
   - pkg-config  # note: not available on Windows
-  - libblas=*=*openblas  # helps avoid pulling in MKL
+  - libblas=*=*netlib  # helps avoid pulling in MKL
   - pybind11
   # scipy.datasets dependency
   - pooch

And then running python dev.py build -C-Dblas=blas -C-Dlapack=lapack -C-Duse-g77-abi=true (after fixing another regression).

The issue is that Netlib BLAS has separate libblas and libcblas libraries, and we're only linking against libblas. We are using CBLAS explicitly, but only when building with -Duse-g77-abi=true; that's why this doesn't show up in CI. So the combination of building against Netlib BLAS and using those g77 ABI wrappers is where things go wrong:

$ objdump -T build/scipy/sparse/linalg/_isolve/_iterative.cpython-310-x86_64-linux-gnu.so | rg cblas_cdot
0000000000000000      D  *UND*  0000000000000000  Base        cblas_cdotc_sub
$ ldd build/scipy/sparse/linalg/_isolve/_iterative.cpython-310-x86_64-linux-gnu.so | rg blas
        libblas.so.3 => /home/rgommers/mambaforge/envs/scipy-dev-netlib/lib/libblas.so.3 (0x00007fca399a3000)
$ objdump -T ~/mambaforge/envs/scipy-dev-netlib/lib/libblas.so | rg cblas_cdot
$ objdump -T ~/mambaforge/envs/scipy-dev-netlib/lib/libcblas.so | rg cblas_cdot
000000000000c590 g    DF .text  000000000000002b  Base        cblas_cdotu_sub
000000000000c5c0 g    DF .text  000000000000002b  Base        cblas_cdotc_sub
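The `objdump -T` checks above can also be scripted from Python, which may be handy in a test job that runs against several BLAS flavours. A minimal sketch using the standard-library `ctypes` module, assuming a Linux-style shared library that `dlopen` can load; `exports_symbol` is a hypothetical helper. Note that `dlsym` also searches the libraries a loaded object depends on, so this is a looser check than reading one file's dynamic symbol table.

```python
import ctypes
from typing import Optional


def exports_symbol(library: Optional[str], symbol: str) -> bool:
    """Return True if `symbol` resolves via dlopen/dlsym on `library`.

    `library` may be a path, a soname such as "libcblas.so.3", or None
    (the running program and its global scope, on POSIX systems).
    Returns False if the library itself cannot be loaded.
    """
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return False
    # CDLL.__getattr__ looks the name up with dlsym and raises
    # AttributeError when it is missing, which hasattr turns into False.
    return hasattr(lib, symbol)
```

Pointed at the `libcblas.so` and `libblas.so` files of a Netlib environment like the one above (paths are environment-specific), this would be expected to report True and False respectively for `cblas_cdotc_sub`, mirroring the `objdump` output.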

A fix is on the way. Our CI situation for BLAS/LAPACK libraries is very bad; that's why this snuck in. I'll see if I can address that as well.

rgommers added a commit to rgommers/scipy that referenced this issue May 5, 2023
The first was caused by scipygh-18264 and caused a build issue,
the second was a missing CBLAS issue when building against Netlib BLAS
with this combination of flags:

  python -m build -nwx -C-Dblas=blas -C-Dlapack=lapack -C-Duse-g77-abi=true
  python dev.py build -C-Dblas=blas -C-Dlapack=lapack -C-Duse-g77-abi=true

Closes scipygh-18371