Linking problem with atlas on OS X #1247

Open
amueller opened this Issue Oct 18, 2012 · 24 comments

6 participants

@amueller
scikit-learn member
@ogrisel
scikit-learn member

To me there is no issue to fix in scikit-learn: you need to build sklearn against the same blas lib as numpy and scipy.

If you build numpy / scipy / scikit-learn with the default build environment of OSX (python / clang / accelerate framework) [1] then everything works fine and all tests pass on OSX 10.8.

[1] this is what happens when you do python setup.py install on the 3 projects.

@amueller
scikit-learn member

I was not sure whether all three projects need to be built against the same blas. But I guess it makes sense.

@amueller amueller closed this Oct 18, 2012
@cdeil

@amueller @ogrisel I am having a similar problem with missing ATLAS symbols in sklearn, although I think in my case numpy, scipy and sklearn were linked against Accelerate:
http://trac.macports.org/ticket/36696

I didn't have this problem a few weeks ago, my guess would be that it was introduced by the recent update to scipy 0.11 in Macports?

If you have Macports, could you please check if you can reproduce the issue?
(I hope the problem is my setup and that sklearn is not broken for all Macports users at the moment.)

@amueller
scikit-learn member

Can you find out what k_means.so was linked against?

@cdeil

@amueller You mean _k_means.so?

$ otool -L /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/cluster/_k_means.so
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/cluster/_k_means.so:
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)
    /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)

I gave some more info on what numpy / scipy / sklearn is linked against in the Macports ticket.
Let me know what else is needed to identify the problem.
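
For comparing several extension modules at once, the `otool -L` listings can be parsed with a few lines of Python. This is only a sketch: the function names and the keyword list are made up for illustration and are not part of any of the projects. Note also that `otool -L` reports dynamic dependencies only, so code copied in from a static archive (a `.a` file) will not show up here.

```python
def linked_libraries(otool_output):
    """Return the library paths listed in the text printed by `otool -L`."""
    libs = []
    # The first line names the inspected file itself, so skip it.
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        # Entries look like "<path> (compatibility version ..., current version ...)".
        libs.append(line.split(" (compatibility", 1)[0])
    return libs


def blas_providers(libs):
    """Pick out paths that look BLAS-related (hypothetical keyword list)."""
    keywords = ("accelerate", "veclib", "libblas", "atlas", "cblas")
    return [p for p in libs if any(k in p.lower() for k in keywords)]


sample = """\
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/cluster/_k_means.so:
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)
    /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
"""

print(blas_providers(linked_libraries(sample)))
```

In practice the text would come from running `otool -L` on each `.so` file, for example through `subprocess`.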

@ogrisel
scikit-learn member

I have built numpy / scipy / scikit-learn from sources (using the setup.py files) against Accelerate (on OSX 10.8) without any issue myself:

$ otool -L coding/scikit-learn/sklearn/linear_model/cd_fast.so
coding/scikit-learn/sklearn/linear_model/cd_fast.so:
    /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)
    /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)

$ otool -L coding/scikit-learn/sklearn/cluster/_k_means.so 
coding/scikit-learn/sklearn/cluster/_k_means.so:
    /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)
    /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)

The python interpreter itself has been installed using homebrew:

$ which python
/usr/local/bin/python
$ ls -l /usr/local/bin/python
lrwxr-xr-x  1 ogrisel  admin  33 17 oct 14:01 /usr/local/bin/python -> ../Cellar/python/2.7.3/bin/python

I assume that it would also work with the default python from the system but I prefer not to install custom python packages on it.

I have not tried macports because I am quite happy with homebrew already.

@cdeil

@ogrisel According to the build log (https://gist.github.com/3938458), my sklearn was built against the Accelerate BLAS:

blas_opt_info:
  FOUND:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3)]
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']

Why does my _k_means.so contain a reference to an ATLAS symbol then?

$ nm /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/cluster/_k_means.so | grep ddot
                 U _ATL_ddot
000000000000ad00 T _cblas_ddot
$ nm /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib | grep ddot
000000000006a062 T _cblas_ddot
00000000000164d6 T _ddot
00000000000164d6 T _ddot_

Note that I do have the atlas @3.10.0_1+gcc45 port installed, maybe this is incorrectly used in the build for some reason?
Is the -DNO_ATLAS_INFO=3 option correct in my case?
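
The `nm` comparison above can be automated. The sketch below (helper names are hypothetical) splits `nm` output into defined and undefined symbols and lists the undefined ones carrying the ATLAS prefix. On OS X the C symbols have a leading underscore (`_ATL_`); on Linux the prefix would be `ATL_`.

```python
def parse_nm(nm_output):
    """Split `nm` output into (defined, undefined) sets of symbol names."""
    defined, undefined = set(), set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "U":
            # Undefined symbols have no address column: "U <name>".
            undefined.add(parts[1])
        elif len(parts) == 3:
            # Defined symbols: "<address> <type> <name>".
            defined.add(parts[2])
    return defined, undefined


def unresolved_atlas_symbols(nm_output, prefix="_ATL_"):
    """Return the undefined symbols whose name starts with the ATLAS prefix."""
    _, undefined = parse_nm(nm_output)
    return sorted(s for s in undefined if s.startswith(prefix))


so_symbols = """\
                 U _ATL_ddot
000000000000ad00 T _cblas_ddot
"""
accelerate_symbols = """\
000000000006a062 T _cblas_ddot
00000000000164d6 T _ddot
00000000000164d6 T _ddot_
"""

print(unresolved_atlas_symbols(so_symbols))
defined_in_accelerate, _ = parse_nm(accelerate_symbols)
print("_ATL_ddot" in defined_in_accelerate)
```

With the two listings above, the extension needs `_ATL_ddot` and the Accelerate BLAS does not export it, which matches the import failure.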

@amueller amueller reopened this Oct 23, 2012
@amueller
scikit-learn member

@cdeil I'll have a look later if @ogrisel didn't figure it out by then ;)
for the future: please open a new issue, that might make it easier to keep track.

@ogrisel
scikit-learn member

I don't have time to dig deeper now but indeed there is probably a bug in one (or all) of our setup.py.

@amueller
scikit-learn member

I refactored that so that now the bug is in only one function ;)

@amueller
scikit-learn member

According to line 1347 in the gist, the linker flag is just -lcblas and -L/opt/local/lib.
How does the linker disambiguate which library to link against for -lcblas?

@amueller
scikit-learn member

I am a bit confused about why you have NO_ATLAS_INFO=3.
This is the code that sets the value:

        if sys.platform=='darwin' and not os.environ.get('ATLAS',None):
            args = []
            link_args = []
            if get_platform()[-4:] == 'i386':
                intel = 1
            else:
                intel = 0
            if os.path.exists('/System/Library/Frameworks/Accelerate.framework/'):
                if intel:
                    args.extend(['-msse3'])
                else:
                    args.extend(['-faltivec'])
                args.extend([
                    '-I/System/Library/Frameworks/vecLib.framework/Headers'])
                link_args.extend(['-Wl,-framework','-Wl,Accelerate'])
            elif os.path.exists('/System/Library/Frameworks/vecLib.framework/'):
                if intel:
                    args.extend(['-msse3'])
                else:
                    args.extend(['-faltivec'])
                args.extend([
                    '-I/System/Library/Frameworks/vecLib.framework/Headers'])
                link_args.extend(['-Wl,-framework','-Wl,vecLib'])
            if args:
                self.set_info(extra_compile_args=args,
                              extra_link_args=link_args,
                              define_macros=[('NO_ATLAS_INFO',3)])
                return

Do you have any idea why it didn't find accelerate?

@cdeil

On my machine:

In [10]: sys.platform=='darwin' and not os.environ.get('ATLAS',None)
Out[10]: True
In [11]: os.path.exists('/System/Library/Frameworks/Accelerate.framework/')
Out[11]: True

and thus there will be something in args and self.set_info will be executed at the end.

The code says: "if Accelerate is there, set NO_ATLAS_INFO to 3".
Is that what it should do?

@amueller
scikit-learn member

oh sorry. I misread the code. You are completely right.

@cdeil

Note that at the end of line 1341 there is also: -Wl,-framework -Wl,Accelerate

Without a closer look I don't know which cblas (Macports or Accelerate) is then actually chosen by the linker:

$ find /opt/local/lib -name '*cblas*'
/opt/local/lib/libcblas.a
/opt/local/lib/libgslcblas.0.dylib
/opt/local/lib/libgslcblas.a
/opt/local/lib/libgslcblas.dylib
/opt/local/lib/libgslcblas.la
/opt/local/lib/libptcblas.a
$ find /System -name '*cblas*'
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/Headers/cblas.h
/System/Library/Frameworks/vecLib.framework/Versions/A/Headers/cblas.h

@amueller
scikit-learn member

I guess we should get rid of -L/opt/local/lib then?
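
To make the suspicion concrete, here is a toy model of how `-lcblas` could be resolved. It assumes the linker walks the `-L` directories in order and, in each one, looks for `libcblas.dylib` and then `libcblas.a`. That matches my understanding of the default behaviour of Apple's ld, but treat it as an assumption. The model ignores the system default directories and frameworks that the real linker also considers, and `resolve_library` is a made-up helper.

```python
def resolve_library(name, search_dirs, files_by_dir):
    """Return the first lib<name>.dylib or lib<name>.a found, directory by directory."""
    for directory in search_dirs:
        present = files_by_dir.get(directory, ())
        for suffix in (".dylib", ".a"):
            candidate = "lib" + name + suffix
            if candidate in present:
                return directory + "/" + candidate
    return None


# Directory contents taken from the `find /opt/local/lib` listing above.
files_by_dir = {
    "/opt/local/lib": {
        "libcblas.a",
        "libgslcblas.0.dylib",
        "libgslcblas.a",
        "libgslcblas.dylib",
        "libgslcblas.la",
        "libptcblas.a",
    },
}

with_macports = ["/opt/local/lib", "build/temp.macosx-10.8-x86_64-2.7"]
without_macports = ["build/temp.macosx-10.8-x86_64-2.7"]

print(resolve_library("cblas", with_macports, files_by_dir))
print(resolve_library("cblas", without_macports, files_by_dir))
```

Under this model, `-L/opt/local/lib` makes `-lcblas` pick up the static `/opt/local/lib/libcblas.a`. A static ATLAS cblas pulled into the extension would be consistent with `_cblas_ddot` being defined inside `_k_means.so` while `_ATL_ddot` stays undefined.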

@amueller
scikit-learn member

I think you have LIBRARY_PATH='/opt/local/lib' in your environment variables. (line 718)
That confuses the linker, I would guess. Could you try to set it empty?

@cdeil

When building sklearn outside Macports, I didn't have $LD_LIBRARY_PATH and $DYLD_LIBRARY_PATH set. The -L/opt/local/lib addition must come from python or numpy. For the Macports build the user environment is irrelevant, I have no control there.

Removing -L/opt/local/lib by hand from the linker command gets rid of the undefined symbol _ATL_ddot, but now _cblas_ddot is undefined:

$ nm build/temp.macosx-10.8-x86_64-2.7/sklearn/cluster/_k_means.o | grep ddot
                 U _cblas_ddot

$ /usr/bin/clang -bundle -undefined dynamic_lookup -L/opt/local/lib build/temp.macosx-10.8-x86_64-2.7/sklearn/cluster/_k_means.o -Lbuild/temp.macosx-10.8-x86_64-2.7 -lcblas -lm -o build/lib.macosx-10.8-x86_64-2.7/sklearn/cluster/_k_means.so -Wl,-framework -Wl,Accelerate

$ nm build/lib.macosx-10.8-x86_64-2.7/sklearn/cluster/_k_means.so | grep ddot
                 U _ATL_ddot
0000000000011940 T _cblas_ddot

$ /usr/bin/clang -bundle -undefined dynamic_lookup build/temp.macosx-10.8-x86_64-2.7/sklearn/cluster/_k_means.o -Lbuild/temp.macosx-10.8-x86_64-2.7 -lcblas -lm -o build/lib.macosx-10.8-x86_64-2.7/sklearn/cluster/_k_means.so -Wl,-framework -Wl,Accelerate

$ nm build/lib.macosx-10.8-x86_64-2.7/sklearn/cluster/_k_means.so | grep ddot
                 U _cblas_ddot

Can one of you try to reproduce the issue?
This should do it:

sudo port install py27-scikits-learn
# wait a bit until Macports installs gfortran, python, numpy, scipy, ...
export PYTHONPATH=/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
python -c 'import sklearn.cluster'

@amueller
scikit-learn member

Sorry, no OS X here....

@ChrisBeaumont

I've hit the same issue (building scikit-learn from source). Any movement on this?

@ChrisBeaumont

Ok, I tried re-running all of the link commands, but removing all instances of -L/opt/local/lib. This seems to have coaxed the linker into using the system BLAS, and allows things to be imported.

My flavor of the issue:

python -c "import sklearn.cluster"

ImportError: dlopen(/Users/beaumont/Library/Python/2.7/lib/python/site-packages/sklearn/linear_model/cd_fast.so, 2): Symbol not found: _ATL_daxpy
  Referenced from: /Users/beaumont/Library/Python/2.7/lib/python/site-packages/sklearn/linear_model/cd_fast.so
  Expected in: flat namespace
 in /Users/beaumont/Library/Python/2.7/lib/python/site-packages/sklearn/linear_model/cd_fast.so
beaumont@beaumont-3:~$ otool -L  /Users/beaumont/Library/Python/2.7/lib/python/site-packages/sklearn/linear_model/cd_fast.so
/Users/beaumont/Library/Python/2.7/lib/python/site-packages/sklearn/linear_model/cd_fast.so:
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0)
    /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
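
A quick way to tell this failure apart from other import errors is to pull the symbol name out of the message. The sketch below handles the dlopen wording shown above ("Symbol not found: ...") and the Linux wording ("undefined symbol: ..."). The helper names are hypothetical.

```python
import importlib
import re

_PATTERNS = (
    re.compile(r"Symbol not found: (\S+)"),  # OS X dlopen wording
    re.compile(r"undefined symbol: (\S+)"),  # Linux dynamic loader wording
)


def missing_symbol(message):
    """Return the symbol named in an ImportError message, or None."""
    for pattern in _PATTERNS:
        match = pattern.search(message)
        if match:
            return match.group(1)
    return None


def looks_like_atlas(symbol):
    """True if the symbol carries the ATLAS prefix, with or without a leading underscore."""
    return symbol is not None and symbol.lstrip("_").startswith("ATL_")


def check_import(module_name):
    """Import a module; return the missing symbol if the import fails on one."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return missing_symbol(str(exc))
    return None


message = (
    "dlopen(/Users/beaumont/Library/Python/2.7/lib/python/site-packages/"
    "sklearn/linear_model/cd_fast.so, 2): Symbol not found: _ATL_daxpy\n"
    "  Referenced from: /Users/beaumont/Library/Python/2.7/lib/python/"
    "site-packages/sklearn/linear_model/cd_fast.so"
)
print(missing_symbol(message), looks_like_atlas(missing_symbol(message)))
```

Calling `check_import("sklearn.cluster")` on an affected install should then return the ATLAS symbol name rather than None.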

And the workaround:

cd scikit-learn
python workaround.py

cd
python -c "import sklearn.cluster" #ok
otool -L /Users/beaumont/Library/Python/2.7/lib/python/site-packages/sklearn/linear_model/cd_fast.so
/Users/beaumont/Library/Python/2.7/lib/python/site-packages/sklearn/linear_model/cd_fast.so:
    /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0)
    /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)

The contents of workaround.py are at https://gist.github.com/4498773
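
For readers who cannot open the gist, the idea can be sketched as follows. This is not the content of workaround.py, only an illustration of stripping the Macports library directory from a link command before re-running it. `drop_library_dir` is a hypothetical helper.

```python
import shlex


def drop_library_dir(command, unwanted="/opt/local/lib"):
    """Return the command as an argument list without -L<unwanted> entries."""
    args = shlex.split(command)
    cleaned = []
    skip_next = False
    for i, arg in enumerate(args):
        if skip_next:
            # This is the directory that followed a bare "-L".
            skip_next = False
            continue
        if arg == "-L" + unwanted:
            continue
        if arg == "-L" and i + 1 < len(args) and args[i + 1] == unwanted:
            skip_next = True
            continue
        cleaned.append(arg)
    return cleaned


link_command = (
    "/usr/bin/clang -bundle -undefined dynamic_lookup -L/opt/local/lib "
    "build/temp.macosx-10.8-x86_64-2.7/sklearn/cluster/_k_means.o "
    "-Lbuild/temp.macosx-10.8-x86_64-2.7 -lcblas -lm "
    "-o build/lib.macosx-10.8-x86_64-2.7/sklearn/cluster/_k_means.so "
    "-Wl,-framework -Wl,Accelerate"
)

print(" ".join(drop_library_dir(link_command)))
```

The resulting list can be passed to `subprocess.check_call` from the source tree to relink one extension. Other `-L` entries, such as the build directory, are left untouched.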

@vene
scikit-learn member

I can reproduce this. I made two changes to my setup at the same time: 1) switch from installer-python to macports python, 2) switch from installer-scipy to the current git head (i.e., built instead of binary release). I suppose the second one is at fault, but still the issue was with scikit-learn, and I needed @ChrisBeaumont 's workaround.

I will update when I understand it better.

@vene
scikit-learn member

Uninstalling macports atlas fixes this. I suppose we should find out where the -L/opt/local/lib comes from and remove it unless that's the atlas path found by config.

@gerigk

UPDATE:

I solved the issue; the NO_ATLAS_INFO, -1 setting put me on the right track.

Apparently I had a second blas/lapack installed via ubuntu. I deleted those and reinstalled numpy/scipy/sklearn and now everything works like a charm.

I have the same issue, but on ubuntu.
Is there any known workaround on ubuntu?

----> 3 from sklearn import svm

/usr/local/lib/python2.7/dist-packages/sklearn/svm/__init__.py in <module>()
     11 # License: New BSD, (C) INRIA 2010
     12 
---> 13 from .classes import SVC, NuSVC, SVR, NuSVR, OneClassSVM, LinearSVC
     14 from .bounds import l1_min_c
     15 from . import sparse, libsvm, liblinear, libsvm_sparse

/usr/local/lib/python2.7/dist-packages/sklearn/svm/classes.py in <module>()
      1 from .base import BaseLibLinear, BaseSVC, BaseLibSVM
      2 from ..base import RegressorMixin
----> 3 from ..linear_model.base import LinearClassifierMixin
      4 from ..feature_selection.selector_mixin import SelectorMixin
      5 

/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/__init__.py in <module>()
     10 # complete documentation.
     11 
---> 12 from .base import LinearRegression
     13 
     14 from .bayes import BayesianRidge, ARDRegression

/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/base.py in <module>()
     27 from ..utils.sparsefuncs import (csc_mean_variance_axis0,
     28                                  inplace_csc_column_scale)
---> 29 from .cd_fast import sparse_std
     30 
     31 

ImportError: /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/cd_fast.so: undefined symbol: ATL_dcopy

The output of the build process:

building 'sklearn.linear_model.cd_fast' extension
compiling C sources
C compiler: x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC

creating build/temp.linux-x86_64-2.7/sklearn/linear_model
compile options: '-DNO_ATLAS_INFO=-1 -Isklearn/src/cblas -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/local/atlas/include -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c'
x86_64-linux-gnu-gcc: sklearn/linear_model/cd_fast.c
In file included from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarraytypes.h:1728:0,
                 from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarrayobject.h:17,
                 from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/arrayobject.h:15,
                 from sklearn/linear_model/cd_fast.c:257:
/usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: warning: #warning "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
In file included from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ufuncobject.h:311:0,
                 from sklearn/linear_model/cd_fast.c:258:
/usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/__ufunc_api.h:236:1: warning: ‘_import_umath’ defined but not used [-Wunused-function]
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/sklearn/linear_model/cd_fast.o -L/usr/local/atlas/lib -Lbuild/temp.linux-x86_64-2.7 -lcblas -lm -o build/lib.linux-x86_64-2.7/sklearn/linear_model/cd_fast.so

and

Setting PTATLAS=ATLAS
  FOUND:
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas/lib']
    language = c
    define_macros = [('NO_ATLAS_INFO', -1)]
    include_dirs = ['/usr/local/atlas/include']
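
One detail stands out when the two excerpts are put side by side: the detected configuration lists `ptf77blas`, `ptcblas` and `atlas`, while the link line only passes `-lcblas -lm`. Whether that mismatch is the cause here is a guess, but it is easy to check mechanically. The sketch below uses hypothetical helper names and an abbreviated copy of the link command from the log (the compiler and hardening flags are left out).

```python
import shlex


def linked_names(link_command):
    """Collect the names passed as -l<name> in a link command."""
    return [
        arg[2:]
        for arg in shlex.split(link_command)
        if arg.startswith("-l") and len(arg) > 2
    ]


def missing_from_link(config_libraries, link_command):
    """Return configured libraries that the link command never mentions."""
    present = set(linked_names(link_command))
    return [lib for lib in config_libraries if lib not in present]


# Abbreviated from the build log above.
link_command = (
    "x86_64-linux-gnu-gcc -pthread -shared "
    "build/temp.linux-x86_64-2.7/sklearn/linear_model/cd_fast.o "
    "-L/usr/local/atlas/lib -Lbuild/temp.linux-x86_64-2.7 -lcblas -lm "
    "-o build/lib.linux-x86_64-2.7/sklearn/linear_model/cd_fast.so"
)
config_libraries = ["ptf77blas", "ptcblas", "atlas"]

print(linked_names(link_command))
print(missing_from_link(config_libraries, link_command))
```

If `libcblas` in `/usr/local/atlas/lib` is a static archive that calls into `libatlas`, a link line without `-latlas` would leave `ATL_*` symbols such as `ATL_dcopy` unresolved.
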
@amueller amueller modified the milestone: 0.15.1, 0.14 Jul 18, 2014
@amueller amueller modified the milestone: 0.16, 0.17 Sep 11, 2015
@amueller amueller removed this from the 0.17 milestone Sep 20, 2015