Install fails with undefined symbols apparently from LAPACK #15

Closed
simleb opened this Issue Nov 22, 2012 · 43 comments

@simleb
simleb commented Nov 22, 2012

I'm trying to install julia with brew install --HEAD -v julia.

I ran into two related problems:

  1. The installation fails. The linker first warns:
    ld: warning: directory not found for option '-L-lopenblas'
    and then fails with
Undefined symbols for architecture x86_64:
  "_dgemm_", referenced from:
      _cholmod_super_numeric in libcholmod.a(cholmod_super_numeric.o)
      _cholmod_super_lsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_super_ltsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_l_super_numeric in libcholmod.a(cholmod_l_super_numeric.o)
      _cholmod_l_super_lsolve in libcholmod.a(cholmod_l_super_solve.o)
      _cholmod_l_super_ltsolve in libcholmod.a(cholmod_l_super_solve.o)
  "_dgemv_", referenced from:
      _cholmod_super_lsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_super_ltsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_l_super_lsolve in libcholmod.a(cholmod_l_super_solve.o)
      _cholmod_l_super_ltsolve in libcholmod.a(cholmod_l_super_solve.o)
  "_dpotrf_", referenced from:
      _cholmod_super_numeric in libcholmod.a(cholmod_super_numeric.o)
      _cholmod_l_super_numeric in libcholmod.a(cholmod_l_super_numeric.o)
  "_dsyrk_", referenced from:
      _cholmod_super_numeric in libcholmod.a(cholmod_super_numeric.o)
      _cholmod_l_super_numeric in libcholmod.a(cholmod_l_super_numeric.o)
  "_dtrsm_", referenced from:
      _cholmod_super_numeric in libcholmod.a(cholmod_super_numeric.o)
      _cholmod_super_lsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_super_ltsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_l_super_numeric in libcholmod.a(cholmod_l_super_numeric.o)
      _cholmod_l_super_lsolve in libcholmod.a(cholmod_l_super_solve.o)
      _cholmod_l_super_ltsolve in libcholmod.a(cholmod_l_super_solve.o)
  "_dtrsv_", referenced from:
      _cholmod_super_lsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_super_ltsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_l_super_lsolve in libcholmod.a(cholmod_l_super_solve.o)
      _cholmod_l_super_ltsolve in libcholmod.a(cholmod_l_super_solve.o)
  "_zgemm_", referenced from:
      _cholmod_super_numeric in libcholmod.a(cholmod_super_numeric.o)
      _cholmod_super_lsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_super_ltsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_l_super_numeric in libcholmod.a(cholmod_l_super_numeric.o)
      _cholmod_l_super_lsolve in libcholmod.a(cholmod_l_super_solve.o)
      _cholmod_l_super_ltsolve in libcholmod.a(cholmod_l_super_solve.o)
  "_zgemv_", referenced from:
      _cholmod_super_lsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_super_ltsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_l_super_lsolve in libcholmod.a(cholmod_l_super_solve.o)
      _cholmod_l_super_ltsolve in libcholmod.a(cholmod_l_super_solve.o)
  "_zherk_", referenced from:
      _cholmod_super_numeric in libcholmod.a(cholmod_super_numeric.o)
      _cholmod_l_super_numeric in libcholmod.a(cholmod_l_super_numeric.o)
  "_zpotrf_", referenced from:
      _cholmod_super_numeric in libcholmod.a(cholmod_super_numeric.o)
      _cholmod_l_super_numeric in libcholmod.a(cholmod_l_super_numeric.o)
  "_ztrsm_", referenced from:
      _cholmod_super_numeric in libcholmod.a(cholmod_super_numeric.o)
      _cholmod_super_lsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_super_ltsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_l_super_numeric in libcholmod.a(cholmod_l_super_numeric.o)
      _cholmod_l_super_lsolve in libcholmod.a(cholmod_l_super_solve.o)
      _cholmod_l_super_ltsolve in libcholmod.a(cholmod_l_super_solve.o)
  "_ztrsv_", referenced from:
      _cholmod_super_lsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_super_ltsolve in libcholmod.a(cholmod_super_solve.o)
      _cholmod_l_super_lsolve in libcholmod.a(cholmod_l_super_solve.o)
      _cholmod_l_super_ltsolve in libcholmod.a(cholmod_l_super_solve.o)

I traced the problem to one of the patches, which assumes that the environment variable USRLIB is defined and sets LIBBLAS to -L$(USRLIB) -lopenblas.

Since this variable is undefined by default, the flags expand to -L -lopenblas. The linker then takes -lopenblas as the directory argument of -L, so OpenBLAS is never linked.

  2. When installing with USRLIB=/usr/local/Cellar/openblas/0.2.4/lib brew install --HEAD -v julia, openblas is found this time, but there are still missing symbols:
    Undefined symbols for architecture x86_64:
      "_dlarf_", referenced from:
          void spqr_private_apply1<double>(long, long, long, double*, double, double*, double*, cholmod_common_struct*) in libspqr.a(spqr_front.o)
      "_dlarfb_", referenced from:
          void spqr_larftb<double>(int, long, long, long, long, long, double*, double*, double*, double*, cholmod_common_struct*) in libspqr.a(spqr_larftb.o)
      "_dlarfg_", referenced from:
          double spqr_private_house<double>(long, double*, cholmod_common_struct*) in libspqr.a(spqr_front.o)
      "_dlarft_", referenced from:
          void spqr_larftb<double>(int, long, long, long, long, long, double*, double*, double*, double*, cholmod_common_struct*) in libspqr.a(spqr_larftb.o)
      "_zlarf_", referenced from:
          void spqr_private_apply1<std::complex<double> >(long, long, long, std::complex<double>*, std::complex<double>, std::complex<double>*, std::complex<double>*, cholmod_common_struct*) in libspqr.a(spqr_front.o)
      "_zlarfb_", referenced from:
          void spqr_larftb<std::complex<double> >(int, long, long, long, long, long, std::complex<double>*, std::complex<double>*, std::complex<double>*, std::complex<double>*, cholmod_common_struct*) in libspqr.a(spqr_larftb.o)
      "_zlarfg_", referenced from:
          std::complex<double> spqr_private_house<std::complex<double> >(long, std::complex<double>*, cholmod_common_struct*) in libspqr.a(spqr_front.o)
      "_zlarft_", referenced from:
          void spqr_larftb<std::complex<double> >(int, long, long, long, long, long, std::complex<double>*, std::complex<double>*, std::complex<double>*, std::complex<double>*, cholmod_common_struct*) in libspqr.a(spqr_larftb.o)
    Googling these symbols makes me think LAPACK should be linked. But I'm stuck here.
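
The first problem (the empty USRLIB) can be reproduced outside the build. This is only a minimal sketch, assuming the flags are assembled by plain string concatenation; the helper name blas_flags is hypothetical and is not taken from the Julia Makefiles or the formula.

```shell
#!/bin/sh
# Hypothetical helper: build the BLAS linker flags from an optional
# library directory. Not part of the Julia build system.
blas_flags() {
    usrlib="$1"
    if [ -n "$usrlib" ]; then
        # Directory known: pass it with -L.
        printf '%s\n' "-L$usrlib -lopenblas"
    else
        # Directory unknown: omit -L entirely, so the linker cannot
        # mistake "-lopenblas" for the argument of a bare -L.
        printf '%s\n' "-lopenblas"
    fi
}

# What unguarded concatenation yields when USRLIB is empty,
# next to the guarded variant:
USRLIB=""
echo "unguarded: -L$USRLIB -lopenblas"
echo "guarded:   $(blas_flags "$USRLIB")"
```

With USRLIB empty, the unguarded line shows the bare -L followed by -lopenblas that produces the warning quoted in the first problem.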

I'm running Mountain Lion and the doctor tells me I'm good to go. Any idea on how to fix this?

@staticfloat
Owner

Wow, thanks for the incredibly detailed bug report! This is due to a recent change in the way Julia handles her libraries. I've updated the formula, please brew update and try again!

@simleb
simleb commented Nov 22, 2012

I reinstalled openblas and tried to reinstall julia. It didn't fix it. I still have the exact same error.

@staticfloat
Owner

Ah, I found another problem: you will have to wait for JuliaLang/julia#1590 to be merged before this will work; otherwise running julia will fail with "system image not found".

Also, my previous statement about it being fixed applies only to your first problem. However, I believe your second problem may get fixed automatically once the first is fixed properly. OpenBLAS builds a custom LAPACK, so if OpenBLAS is linked in properly, you should get the LAPACK symbols.

@simleb
simleb commented Nov 22, 2012

Ok, great! I will closely follow this merge then.

@staticfloat
Owner

Now that's interesting, you're not finding the LAPACK symbols inside the OpenBLAS libraries like I'd expect you to be.

Can you run the following? You should get an output similar to mine:

$ nm $(brew --prefix)/opt/openblas/lib/libopenblas.a | grep _dlarfb_
                 U _dlarfb_
                 U _dlarfb_
                 U _dlarfb_
                 U _dlarfb_
                 U _dlarfb_
                 U _dlarfb_
0000000000000000 T _dlarfb_
                 U _dlarfb_
                 U _dlarfb_
                 U _dlarfb_
                 U _dlarfb_
                 U _dlarfb_
                 U _dlarfb_
                 U _dlarfb_
                 U _dlarfb_
                 U _dlarfb_
                 U _dlarfb_
                 U _LAPACKE_dlarfb_work
/Users/sabae/.homebrew/opt/openblas/lib/libopenblas.a(lapacke_dlarfb_work.o):
0000000000000000 T _LAPACKE_dlarfb_work
0000000000000658 S _LAPACKE_dlarfb_work.eh
                 U _dlarfb_

@staticfloat
Owner

The merge happened (Those Julia people are fast!), but that won't affect your compilation not finding the LAPACK symbols. Can you run the above and see if you can find that LAPACK routine in your libopenblas.a? Thanks!

@simleb
simleb commented Nov 22, 2012

I get the same output as you. That's very strange!

I tried each of the missing symbols and they all seem to be present in the lib.

@simleb
simleb commented Nov 22, 2012

I narrowed it down. The error comes from this command:

clang++ -mmacosx-version-min=10.6 -shared -Xlinker -all_load libsuitesparseconfig.a libspqr.a   -o /private/tmp/julia-FTSg/usr/lib/libspqr.dylib -L/usr/local/opt/openblas/lib -L/usr/local/opt/readline/lib -L/usr/local/lib -F/usr/local/Frameworks -L/opt/X11/lib -L/private/tmp/julia-FTSg/usr/lib -lcholmod -lcolamd -lamd -L/private/tmp/julia-FTSg/usr/lib -lopenblas

The -shared flag made me think that -lopenblas is picking up libopenblas.dylib instead of the static version.

Indeed, replacing -lopenblas with the full path to libopenblas.dylib gives exactly the same missing symbols, whereas replacing it with the full path to libopenblas.a gives

ld: warning: directory not found for option '-F/usr/local/Frameworks'
Undefined symbols for architecture x86_64:
  "__gfortran_compare_string", referenced from:
      _ilaenv_ in libopenblas.a(ilaenv.o)
  "__gfortran_concat_string", referenced from:
      _sgesvd_ in libopenblas.a(sgesvd.o)
      _shseqr_ in libopenblas.a(shseqr.o)
      _sormbr_ in libopenblas.a(sormbr.o)
      _sormhr_ in libopenblas.a(sormhr.o)
      _sormlq_ in libopenblas.a(sormlq.o)
      _sormql_ in libopenblas.a(sormql.o)
      _sormqr_ in libopenblas.a(sormqr.o)
      ...
  "__gfortran_pow_i4_i4", referenced from:
      _slalsa_ in libopenblas.a(slalsa.o)
      _dlalsa_ in libopenblas.a(dlalsa.o)
      _claed0_ in libopenblas.a(claed0.o)
      _claed7_ in libopenblas.a(claed7.o)
      _clalsa_ in libopenblas.a(clalsa.o)
      _cstedc_ in libopenblas.a(cstedc.o)
      _zlaed0_ in libopenblas.a(zlaed0.o)
      ...
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
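
The guess that -lopenblas resolves to libopenblas.dylib rather than libopenblas.a matches how Apple's linker normally treats a directory that holds both files: the .dylib is tried first. The following is a rough model of that lookup, not the linker's actual code. The function name resolve_lib is hypothetical, and the exact order across several -L directories depends on the linker version and options.

```shell
#!/bin/sh
# Hypothetical model of how -l<name> is resolved: walk the -L directories
# in order and, inside each one, try the shared library before the archive.
resolve_lib() {
    name="$1"
    shift
    for dir in "$@"; do
        for ext in dylib a; do
            if [ -f "$dir/lib$name.$ext" ]; then
                printf '%s\n' "$dir/lib$name.$ext"
                return 0
            fi
        done
    done
    # Nothing found in any directory.
    return 1
}

# Demo with empty placeholder files standing in for the real libraries.
demo=$(mktemp -d)
touch "$demo/libopenblas.a" "$demo/libopenblas.dylib"
resolve_lib openblas "$demo"    # prints the path ending in libopenblas.dylib
```

Passing the full path to a library file, as done above, bypasses this lookup entirely, which is why the two experiments behave differently.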

The most surprising thing is that nm $(brew --prefix)/opt/openblas/lib/libopenblas.dylib | grep _dlarfb_ finds something:

0000000000a2b960 T _LAPACKE_dlarfb_work
000000000065a6e0 t _dlarfb_

so I guess it means that the symbols are present in the shared library too.

New ideas?
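
One detail in the libopenblas.dylib output is worth noting. nm prints an uppercase type letter for external symbols and a lowercase one for symbols local to the library, and a local symbol cannot satisfy an undefined reference from another object. The lowercase t in front of _dlarfb_ might therefore explain why the link fails even though the symbol is listed, although nothing in this thread confirms that. A small sketch for reading the type letters follows; the function name classify_symbol is hypothetical.

```shell
#!/bin/sh
# Classify nm output lines for one symbol by their type letter.
# Reads nm output on stdin; the symbol name is the first argument.
classify_symbol() {
    awk -v sym="$1" '
        NF >= 2 && $NF == sym {
            type = $(NF - 1)
            if (type == "U") {
                print "undefined"
            } else if (type == toupper(type)) {
                print "external " type
            } else {
                print "local " type
            }
        }'
}

# Sample lines copied from the libopenblas.dylib output in this thread.
printf '%s\n' \
    '0000000000a2b960 T _LAPACKE_dlarfb_work' \
    '000000000065a6e0 t _dlarfb_' | classify_symbol _dlarfb_
```

On a real library the input would come from nm itself, for example nm libopenblas.dylib | classify_symbol _dlarfb_.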

@staticfloat
Owner

Right, we actually link against the shared library on purpose, but in the end it shouldn't matter much.

What version of gfortran are you using? Did you install it from Homebrew or somewhere else? Also, can you run the following:

$ gfortran --print-file-name libgfortran.a
/Users/sabae/.homebrew/Cellar/gfortran/4.7.2/gfortran/lib/gcc/x86_64-apple-darwin12.2.0/4.7.2/../../../libgfortran.a

@simleb
simleb commented Nov 22, 2012

I use Homebrew's 4.7.2:

% gfortran --print-file-name libgfortran.a
/usr/local/Cellar/gfortran/4.7.2/gfortran/lib/gcc/x86_64-apple-darwin12.2.0/4.7.2/../../../libgfortran.a

@staticfloat
Owner

I updated the formula with an explicit link path to libgfortran. This might help with your gfortran symbol errors above, but honestly I have no idea why linking against libopenblas.dylib would give those errors.

@samueljohn @nolta, do you two have any idea what's going on here?
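
An explicit libgfortran link path could be derived from the gfortran output shown earlier in this thread. This is only a sketch under that assumption, not the code in the formula; the helper name gfortran_link_flags is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn the path printed by
# `gfortran --print-file-name libgfortran.a` into linker flags.
gfortran_link_flags() {
    libfile="$1"
    # dirname works on the text of the path only, so any ../ components
    # in the reported path are kept as they are.
    printf '%s\n' "-L$(dirname "$libfile") -lgfortran"
}

# Path as reported earlier in this thread.
gfortran_link_flags \
    "/usr/local/Cellar/gfortran/4.7.2/gfortran/lib/gcc/x86_64-apple-darwin12.2.0/4.7.2/../../../libgfortran.a"
```

On a machine with gfortran installed, the argument would be "$(gfortran --print-file-name libgfortran.a)" instead of a hard-coded path.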

@samueljohn
Contributor

I updated the gfortran formula to build 4.7.2 from source (and Homebrew provides bottles) instead of the older 4.2.
Not sure ...

@samueljohn
Contributor

Same issue for me: https://gist.github.com/4132849.

@staticfloat
Owner

@samueljohn, those TBB symbols show up because your SuiteSparse was compiled with TBB support linked in. You need to uninstall suitesparse and reinstall it. (TBB is now disabled by default unless you explicitly pass --with-tbb, but since I don't yet support adding TBB to the julia builds, linking will fail if you do.)

I'm actually having trouble reinstalling my entire toolchain right now; I suspect there has been an environment change in Homebrew. I'll get back to you all once I've built everything from the bottom up.

@samueljohn
Contributor

@staticfloat thanks. Let me know if I can help. I know a bit about homebrew now.

@staticfloat
Owner

I just rebuilt staticfloat/julia/{openblas,suite-sparse,arpack-ng,julia}, and everything works fine. Could this perhaps be a problem with bottled vs. unbottled gfortran? Nothing I download gets bottled because I have a non-standard prefix....

Note that my openblas installation wasn't working until I removed the explicit CFLAGS being passed to make. I'm not sure why, but it seems that I was overriding the default CFLAGS. I've updated the openblas formula to not do that anymore, and I believe that with superenv, it wasn't necessary in the first place.

@samueljohn
Contributor

I believe that with superenv, it wasn't necessary in the first place.

Yes, very likely.

I'll have to rebuild my deps, too. I can do that next week. Moving this weekend.

@simleb
simleb commented Nov 24, 2012

Reinstalling gfortran unbottled and reinstalling {openblas,suite-sparse,arpack-ng,julia} didn't work for me.

@staticfloat
Owner

@simleb, because I can't reproduce and I can't find anything wrong with your environment from here, I'm going to wait to see if @samueljohn can reproduce the issue on his machine.

@simleb
simleb commented Nov 29, 2012

I managed to build julia but with some manual fiddling: see this gist.

I hope it will help understand the root of the problem.

@staticfloat
Owner

@simleb, sorry for the late response, I've been extremely busy these past few days. I'm glad you got it working!

Can you run brew test -v julia to make sure all tests pass? This is a very strange error; do you have any other openblas.{a,dylib} files lying around somewhere on your system that could be getting linked against first?

@samueljohn
Contributor

I wiped out everything and ran brew install -v --HEAD julia; I get this failure.

@staticfloat
Owner

I will wipe everything out tonight and try to fix this. There must be something funky with my installation that allows me to install properly.

@samueljohn
Contributor

Strangely, homebrew seems to have installed openblas 0.2.4 and not yours.

brew info openblas
openblas: stable 0.2.4, HEAD
http://xianyi.github.com/OpenBLAS/

This formula is keg-only.
Mac OS X already provides this software and installing another version in
parallel can cause all kinds of trouble.

/homebrew/Cellar/openblas/0.2.4 (13 files, 31M)
https://github.com/homebrew/homebrew-science/commits/master/openblas.rb

@simleb
simleb commented Dec 3, 2012

% brew test -v julia
Testing julia
==> /usr/local/Cellar/julia/HEAD/bin/julia runtests.jl all
/usr/local/Cellar/julia/HEAD/bin/julia runtests.jl all
     * all
     * core
     * numbers
     * strings
     * unicode
     * corelib
Warning: Possible conflict in library symbol dsyrk_
Warning: Possible conflict in library symbol dgemm_
     * hashing
     * remote
     * arrayops
     * linalg
Warning: Possible conflict in library symbol dcopy_
Warning: Possible conflict in library symbol dpotrf_
Warning: Possible conflict in library symbol dgesdd_
LLVM ERROR: Program used external function 'dgesdd_' which could not be resolved!
Assertion failed: (!isAlreadyCodeGenerating && "Error: Recursive compilation detected!"), function runJITOnFunctionUnlocked, file JIT.cpp, line 617.
Stack dump:
0.  Running pass 'X86 Machine Code Emitter' on function '@"julia_gesdd!"'
Error: julia: failed

Not good, right?

@staticfloat
Owner

@samueljohn You must have another openblas formula lying around somewhere, right? Or you've got a fork of my staticfloat-julia tap that has an older openblas?

@simleb no, that's not good. I believe there's something funky going on with your openblas installation. I will clear everything out and install from scratch tonight, and see if I can't figure out what's going on. You're not the only one with homebrew problems right now, so it'll be a good opportunity for me to run through the whole installation process again.

@simleb
simleb commented Dec 3, 2012

And I don't have any other OpenBLAS installed but the one from Homebrew:

% find / -iname "libopenblas.*"
/usr/local/Cellar/openblas/0.2.4/lib/libopenblas.a
/usr/local/Cellar/openblas/0.2.4/lib/libopenblas.dylib

I have the same brew info openblas output as @samueljohn

@samueljohn
Contributor

My problems (and perhaps some others) may be related to only two formulae from your repo being tapped:

/homebrew ❯ brew untap staticfloat/julia
Untapped 2 formula

/homebrew ❯ brew tap staticfloat/julia
Cloning into '/homebrew/Library/Taps/staticfloat-julia'...
remote: Counting objects: 214, done.
remote: Compressing objects: 100% (126/126), done.
remote: Total 214 (delta 122), reused 177 (delta 87)
Receiving objects: 100% (214/214), 32.21 KiB, done.
Resolving deltas: 100% (122/122), done.
Warning: Could not tap staticfloat/julia/openblas over homebrew/science/openblas
Warning: Could not tap staticfloat/julia/suite-sparse over mxcl/master/suite-sparse
Tapped 2 formula

It seems I need to install staticfloat/julia/openblas and staticfloat/julia/suite-sparse manually first, because brew install -v --HEAD uses the ones from homebrew/science, probably because I tapped those first.

@staticfloat I have commit rights on homebrew/science, and I can make the necessary changes to openblas and suite-sparse such that Julia and you are happy. Just ping me whenever you need to get something in there quickly.

@samueljohn
Contributor

Now doin' brew install -v staticfloat/julia/openblas and brew install -v staticfloat/julia/suite-sparse.... waiting ... 🚶

@staticfloat
Owner

Ah. This is so great to find out. Thank you, @samueljohn! This is definitely a bug in Homebrew, as I have

depends_on "staticfloat/julia/suite-sparse"
depends_on "staticfloat/julia/openblas"

in julia.rb. I'll go open an issue on Homebrew-main about this.

I will submit a pull request to Homebrew-science once Homebrew/legacy-homebrew#14456 lands, so that we can switch between Accelerate and OpenBLAS easily; one of the reasons I have the arpack-ng and suite-sparse formulae in here is because I patch them to use OpenBLAS by default, which is the opposite of Homebrew-science, I believe.

@samueljohn
Contributor

@staticfloat I see, I see ... well, yes homebrew-science (and me) like to have Accelerate as the default and an option if one wants to use openblas. So we are both eagerly awaiting Homebrew/legacy-homebrew#14456 ...

Go @jacknagel, go :-)

@samueljohn
Contributor

It's worse. The obvious workaround is not working.
Installing your staticfloat/arpack-ng triggers a reinstall of homebrew-science/openblas, even though I had run brew install staticfloat/julia/openblas before!

It's clear to me that this is a Homebrew bug that causes your explicit depends_on to be ignored.

@staticfloat
Owner

Bug reported here, because the issue seemed to match with this problem pretty well.
Homebrew/legacy-homebrew#14089

@jacknagel
Contributor

;) I feel kinda bad that it has taken this long, since my original spike was prepared over only a few days. But there are some parts of it that are sketchy and I want to rework before merging. I should have some time over the holidays to wrap it up though.

@staticfloat
Owner

I have implemented a new build process, which should dodge this problem completely. @simleb, @samueljohn, if you two still care about this, you can test it out after brew update.

@samueljohn
Contributor

brew install julia
Error: No available formula for openblas-julia

??? I'll try to untap and tap your repo again.

@samueljohn
Contributor

yep, that was the problem ... testing now...

@samueljohn
Contributor

I still dislike that Julia clones on-demand some deps that homebrew already provides:

Cloning into 'deps/nginx'
Cloning into 'deps/libuv'
Cloning into 'deps/Rmath'

Finally, I get this error: https://gist.github.com/samueljohn/4773136
Shall I open a new issue?

@staticfloat
Owner

@samueljohn: Jack works quickly: Homebrew/legacy-homebrew#17783 (comment). This should allow me to manually patch out submodules as I see fit.

Yes, please open a new issue, that should not be happening, as suite-sparse-julia shouldn't be......ah, I think I know what the problem is. You have suite-sparse installed with TBB, and Julia is finding that first. I need to change the order in which I search library paths. I'll submit a fix soon.

@samueljohn
Contributor

Oh yes probably you are right...

As said earlier, once you have reached a stable state with julia, I'd love to put some work into integrating the changes into suite-sparse, arpack-ng etc. that would allow us to build julia and avoid suite-sparse-julia and related.

@staticfloat
Owner

I'd love to integrate the two as well, the only problem I see is the Accelerate/OpenBLAS problem; Julia doesn't work when compiled against Accelerate anymore, and most other users of homebrew-science won't necessarily want OpenBLAS, right? Also, I don't think Accelerate supports the 64-bit interface that all the "64" versions of the julia dependencies support, so there is no point in trying to integrate e.g. suite-sparse64-julia and arpack64-julia back into homebrew-science unless they can depend on OpenBLAS.

@samueljohn
Contributor

On the 64-bit issue: If we name the resulting dylibs differently from the standard ones, it would be possible to integrate them into the main formulae. But then we would have to teach suite-sparse, arpack and julia to use those other dylib names (i.e. suite-sparse64.dylib). Not sure if that is possible.

On Accelerate vs. OpenBLAS: I have to think and play around a bit more. We might end up having suite-sparse-ob.rb, arpack-ob.rb etc. ... but not sure yet.

@staticfloat
Owner

I'm going to close this issue; please open another one if you have further issues.
