Segmentation fault with armadillo v5 and up #431

Closed
johnlees opened this issue Apr 14, 2015 · 10 comments
@johnlees

As detailed here:
http://arma.sourceforge.net/docs.html#uword

Armadillo uses a 64-bit word length by default when built with a C++11-capable compiler. When used with mlpack, this leads to errors such as:

error: arma::memory::acquire(): out of memory

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

I've tried uncommenting the relevant line in include/armadillo_bits/config.hpp, and passing -DARMA_64BIT_WORD when running cmake for mlpack. However, when I link against this version of mlpack, my application immediately segfaults:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff791f905 in long double boost::math::lanczos::lanczos17m64::lanczos_sum_expG_scaled<long double>(long double const&) ()
from /nfs/users/nfs_j/jl11/installations/pangwas/mlpack/build_64/lib/libmlpack.so.1
(gdb) bt
#0  0x00007ffff791f905 in long double boost::math::lanczos::lanczos17m64::lanczos_sum_expG_scaled<long double>(long double const&) ()
from /nfs/users/nfs_j/jl11/installations/pangwas/mlpack/build_64/lib/libmlpack.so.1
#1  0x00007ffff791133f in _GLOBAL__sub_I_discrete_distribution.cpp () from /nfs/users/nfs_j/jl11/installations/pangwas/mlpack/build_64/lib/libmlpack.so.1
#2  0x00007ffff7de9306 in ?? () from /lib64/ld-linux-x86-64.so.2

Could you provide advice on how to compile and link mlpack correctly when using the newer versions of Armadillo?

Thanks!

@rcurtin rcurtin self-assigned this Apr 14, 2015
@rcurtin
Member

rcurtin commented Apr 14, 2015

First things first, I disabled the CMake warning for ARMA_64BIT_WORD in ea0d81f.

Can you give me more information on what you've done?

My own investigation uncovered a bug in Armadillo 5.000.0, where the symbol arma_rng_cxx11_instance isn't properly being compiled into libarmadillo.so. I can compile libmlpack.so successfully, but trying to compile anything against libmlpack.so that requires arma_rng_cxx11_instance (so, anything that uses random numbers from Armadillo) gives me something of the form

/tmp/ccBf4Ouc.o: In function `TLS wrapper function for arma::arma_rng_cxx11_instance':
test.cpp:(.text._ZTWN4arma23arma_rng_cxx11_instanceE[_ZTWN4arma23arma_rng_cxx11_instanceE]+0x5): undefined reference to `TLS init function for arma::arma_rng_cxx11_instance'
test.cpp:(.text._ZTWN4arma23arma_rng_cxx11_instanceE[_ZTWN4arma23arma_rng_cxx11_instanceE]+0x15): undefined reference to `arma::arma_rng_cxx11_instance'
collect2: error: ld returned 1 exit status

I reported the problem upstream, and there will probably be a fix in the next day or so (5.000.1 probably?)

I'd expect you to be encountering the same problem too, but the fact that you aren't leads me to suspect that you are compiling against Armadillo 5.000.0 but linking against an older version of libarmadillo.so (which does not have ARMA_64BIT_WORD enabled), and thus as soon as your code calls something internal to libarmadillo.so, stack mangling and other assorted disasters occur. But... the backtrace you provide is from Boost, not from Armadillo, so... could I get more information? What OS is this on?
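
To illustrate why compiling against one word size and linking against a library built with another is dangerous, here is a minimal sketch. MatHeader and its fields are hypothetical stand-ins chosen for illustration; this is not Armadillo's real class layout. The point is only that two builds which disagree on the word type also disagree on where every field lives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a matrix header. This is NOT Armadillo's real
// class layout; it only mimics "a few size fields followed by a pointer".
template <typename Word>
struct MatHeader {
    Word n_rows;
    Word n_cols;
    Word n_elem;
    double* mem;
};

// What code built with a 32-bit word type would expect...
using Header32 = MatHeader<std::uint32_t>;
// ...and what code built with a 64-bit word type would expect.
using Header64 = MatHeader<std::uint64_t>;

// Byte offset of n_cols under each layout.
inline std::size_t n_cols_offset_32() { return offsetof(Header32, n_cols); }
inline std::size_t n_cols_offset_64() { return offsetof(Header64, n_cols); }

// True when the two builds disagree about object size or field position.
inline bool layouts_disagree() {
    return sizeof(Header32) != sizeof(Header64) ||
           n_cols_offset_32() != n_cols_offset_64();
}

// Sanity check for the sketch: the layouts really do differ.
inline void check_layouts() {
    assert(n_cols_offset_32() == 4);
    assert(n_cols_offset_64() == 8);
    assert(layouts_disagree());
}
```

If an object laid out one way is passed to code compiled for the other layout, that code reads sizes and pointers from the wrong offsets, which is consistent with the kind of memory corruption described above.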

@dongli

dongli commented Apr 15, 2015

I also get a segmentation fault when running allknn as follows:

$ allknn -r grids.csv -q point.csv -n neighbors.csv -d distances.csv -k 1 -v
[INFO ] Loading 'grids.csv' as CSV data.  Size is 612942 x 3.
[INFO ] Loaded reference data from 'grids.csv' (612942 x 3).
[INFO ] Loading 'point.csv' as CSV data.  Size is 1 x 3.
[INFO ] Loaded query data from 'point.csv' (1 x 3).
[INFO ] Building reference tree...
[INFO ] Loaded query data from 'point.csv' (1 x 3).
[INFO ] Building query tree...
[INFO ] Tree built.
[INFO ] Computing 1 nearest neighbors...
Segmentation fault: 11

I just updated Armadillo to 5.000.0. I don't know if this is relevant. I will try to downgrade Armadillo to 4.650.4 later and test again.

Edit: This is not related to the Armadillo version. I used lldb to find the error location:

Process 59011 stopped
* thread #1: tid = 0x15ffa5, 0x000000010001c9e7 allknn`double mlpack::bound::HRectBound<2, true>::MinDistance<arma::subview_col<double> >(arma::subview_col<double> const&, boost::enable_if<IsVector<arma::subview_col<double> >, void>*) const + 135, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x7fff5fc84000)
    frame #0: 0x000000010001c9e7 allknn`double mlpack::bound::HRectBound<2, true>::MinDistance<arma::subview_col<double> >(arma::subview_col<double> const&, boost::enable_if<IsVector<arma::subview_col<double> >, void>*) const + 135
allknn`double mlpack::bound::HRectBound<2, true>::MinDistance<arma::subview_col<double> >:
->  0x10001c9e7 <+135>: movsd  (%rdx,%rdi,8), %xmm3
    0x10001c9ec <+140>: subsd  %xmm3, %xmm2
    0x10001c9f0 <+144>: subsd  (%rcx), %xmm3
    0x10001c9f4 <+148>: movapd %xmm2, %xmm4

@johnlees
Author

OK, here's some more information about what I've done.
First, I removed all existing versions of Armadillo and mlpack from the system (which is Ubuntu 12.04).

I installed Armadillo 5.000.0 with

cmake -DCMAKE_CXX_COMPILER_ID=Intel -DCMAKE_CXX_COMPILER=icpc -DCMAKE_CXX_FLAGS=-O3 .
make
make install DESTDIR=~/software

I used your update to CMakeLists.txt as referenced above, then installed mlpack with

cmake -DCMAKE_INSTALL_PREFIX:PATH=~/software -DBOOST_ROOT=~/software ../
make
make install

This did not warn about 64-bit words not being used.

I then compile my application with

g++ -Wall -g -O0 -std=c++11 -I${HOME}/software/include -I/usr/include/libxml2 -c -o app.o app.cpp

and link with

g++ -Wall -g -O0 -std=c++11 -I${HOME}/software/include -I/usr/include/libxml2 app.o -L${HOME}/software/lib -lmlpack -larmadillo -lboost_program_options -lblas -llapack -lm -o app

However, I still get a conflict due to word length:

error: arma::memory::acquire(): out of memory

terminate called after throwing an instance of 'std::bad_alloc'
what():  std::bad_alloc

Program received signal SIGABRT, Aborted.
0x00007ffff57300d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6

(gdb) bt
#0  0x00007ffff57300d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff573383b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff602b2d5 in __gnu_cxx::__verbose_terminate_handler () at ../../.././libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007ffff6029336 in __cxxabiv1::__terminate (handler=<optimised out>) at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x00007ffff6029381 in std::terminate () at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:57
#5  0x00007ffff6029598 in __cxxabiv1::__cxa_throw (obj=0x7fffec000940, tinfo=0x6bf600 <typeinfo for std::bad_alloc@@GLIBCXX_3.4>,
dest=0x416260 <std::bad_alloc::~bad_alloc()@plt>) at ../../.././libstdc++-v3/libsupc++/eh_throw.cc:87
#6  0x0000000000418ca4 in arma::arma_stop_bad_alloc<char [39]> (x=...) at ~/software/include/armadillo_bits/debug.hpp:138
#7  0x000000000041a829 in arma::arma_check_bad_alloc<char [39]> (state=true, x=...) at ~/software/include/armadillo_bits/debug.hpp:378
#8  0x000000000041a63d in arma::memory::acquire<double> (n_elem=8589934596) at ~/software/include/armadillo_bits/memory.hpp:94
#9  0x000000000041a525 in arma::Mat<double>::init_warm (this=0x7fffffffab20, in_n_rows=4294967298, in_n_cols=2)
at ~/software/include/armadillo_bits/Mat_meat.hpp:311
#10 0x000000000042d4bf in arma::Mat<double>::set_size (this=0x7fffffffab20, in_rows=4294967298, in_cols=2)
at ~/software/include/armadillo_bits/Mat_meat.hpp:5638
#11 0x000000000047503d in mlpack::optimization::L_BFGS<mlpack::regression::LogisticRegressionFunction>::L_BFGS (this=0x7fffffffab10, function=...,
numBasis=5, maxIterations=0, armijoConstant=0.0001, wolfe=0.90000000000000002, minGradientNorm=1e-10, maxLineSearchTrials=50,
minStep=9.9999999999999995e-21, maxStep=1e+20) at ~/software/include/mlpack/core/optimizers/lbfgs/lbfgs_impl.hpp:63
#12 0x00000000004744a2 in mlpack::regression::LogisticRegression<mlpack::optimization::L_BFGS>::LogisticRegression (this=0x7fffffffb330, predictors=...,
responses=..., lambda=0) at ~/software/include/mlpack/methods/logistic_regression/logistic_regression_impl.hpp:33

This happens regardless of whether #define ARMA_64BIT_WORD in armadillo_bits/config.hpp is commented or uncommented.
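
The sizes in frames #8 to #10 can be decoded with a little arithmetic. The sketch below splits the reported in_n_rows value into 32-bit halves; split_words and Halves are hypothetical helper names introduced here. Reading the result as "two adjacent 32-bit values interpreted as one 64-bit value" is an assumption about what happened, not something confirmed in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Low and high 32-bit halves of a 64-bit value.
struct Halves {
    std::uint32_t low;
    std::uint32_t high;
};

// Split a 64-bit value into its low and high 32-bit halves.
inline Halves split_words(std::uint64_t value) {
    return Halves{static_cast<std::uint32_t>(value & 0xFFFFFFFFu),
                  static_cast<std::uint32_t>(value >> 32)};
}

// Check the numbers copied from the backtrace above.
inline void check_backtrace_sizes() {
    const std::uint64_t n_rows = 4294967298ULL;  // in_n_rows in frame #9
    const std::uint64_t n_cols = 2ULL;           // in_n_cols in frame #9
    const std::uint64_t n_elem = 8589934596ULL;  // n_elem in frame #8

    // The requested element count is simply rows times columns.
    assert(n_rows * n_cols == n_elem);

    // 4294967298 is 2^32 + 2: a low half of 2 and a high half of 1.
    const Halves h = split_words(n_rows);
    assert(h.low == 2u && h.high == 1u);
}
```

A row count of 2^32 + 2 is not a plausible size for a 200x3 problem, which fits the suspicion of a 32-bit versus 64-bit word mismatch rather than a genuine lack of memory.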

I should note that the identical process worked fine with Armadillo 4.650.

Inspecting the libraries

ldd app

libmlpack.so.1 => ~/software/lib/libmlpack.so.1 (0x00007f3ff4a18000)
libarmadillo.so.5 => ~/software/lib/libarmadillo.so.5 (0x00007f3ff4813000)
libboost_program_options.so.1.57.0 => ~/software/lib/libboost_program_options.so.1.57.0 (0x00007f3ff45a3000)
libblas.so.3gf => /usr/lib/libblas.so.3gf (0x00007f3ff403a000)
liblapack.so.3gf => /usr/lib/liblapack.so.3gf (0x00007f3ff341d000)
libstdc++.so.6 => /software/gcc-4.9.1/lib64/libstdc++.so.6 (0x00007f3ff3112000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3ff2e16000)
libgcc_s.so.1 => /software/gcc-4.9.1/lib64/libgcc_s.so.1 (0x00007f3ff2c00000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3ff2840000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3ff2623000)
libboost_unit_test_framework.so.1.57.0 => ~/software/lib/libboost_unit_test_framework.so.1.57.0 (0x00007f3ff237c000)
libboost_random.so.1.57.0 => ~/software/lib/libboost_random.so.1.57.0 (0x00007f3ff2176000)
libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f3ff1e1a000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f3ff1c12000)
libmkl_rt.so => /software/intel-tools-2015/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_rt.so (0x00007f3ff169b000)
libimf.so => /software/intel-tools-2015/composer_xe_2015.0.090/compiler/lib/intel64/libimf.so (0x00007f3ff11e1000)
libsvml.so => /software/intel-tools-2015/composer_xe_2015.0.090/compiler/lib/intel64/libsvml.so (0x00007f3ff0592000)
libirng.so => /software/intel-tools-2015/composer_xe_2015.0.090/compiler/lib/intel64/libirng.so (0x00007f3ff038a000)
libintlc.so.5 => /software/intel-tools-2015/composer_xe_2015.0.090/compiler/lib/intel64/libintlc.so.5 (0x00007f3ff0130000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3feff2c000)
libgfortran.so.3 => /software/gcc-4.9.1/lib64/libgfortran.so.3 (0x00007f3fefc0f000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3ff4f40000)
libboost_system.so.1.57.0 => ~/software/lib/libboost_system.so.1.57.0 (0x00007f3fefa0c000)
libquadmath.so.0 => /software/gcc-4.9.1/lib/../lib64/libquadmath.so.0 (0x00007f3fef7ce000)

I think the ld error I reported above was caused by compiling mlpack against a different Boost from the one that was then loaded at runtime (which I've now fixed).

@johnlees
Author

If I use the mlpack binary ~/software/bin/logistic_regression, however, it seems to work just fine (inspecting with ldd, the libraries are the same as for my app, except that mine also links libblas.so.3gf, libgfortran.so.3, liblapack.so.3gf, and libquadmath.so.0).

One more piece of information: the include directives in my .cpp file are

#include <mlpack/core.hpp> // this includes armadillo
#include <mlpack/methods/logistic_regression/logistic_regression.hpp>

@johnlees
Author

Ah, I've been able to fix it!

I needed to uncomment the line in armadillo_bits/config.hpp before compiling mlpack. Not sure if this is an issue specific to me, or a problem with the cmake configuration of mlpack?

@rcurtin
Member

rcurtin commented Apr 15, 2015

I'm assuming that you mean that for the fix, you had to uncomment #define ARMA_64BIT_WORD in config.hpp?

Either way, something is not completely adding up for me here. Can you give me more information on your application and how I can reproduce what is going on?

@rcurtin
Member

rcurtin commented Apr 15, 2015

Also, Li, would you like to open a new issue for the problem you're encountering? Thanks.

@johnlees
Author

Yes, that's how I fixed it: by uncommenting that line in Armadillo's config.hpp before running cmake/make/make install for mlpack.
The code that goes wrong in the application is a call to:

mlpack::regression::LogisticRegression<> fit(x_train.t(), y_train);

where x_train is a 200x3 arma::mat and y_train is a 200x1 arma::vec.

Otherwise, I think I've put all the information I can above. I think it's an installation issue rather than a problem with the application. Perhaps the state of ARMA_64BIT_WORD is incorrectly detected.

Installing as above, followed by such a call should reproduce the error. If it doesn't, or it is difficult to follow the exact same steps, I'm happy to close this as solved/fixed!

@rcurtin
Member

rcurtin commented Apr 15, 2015

The only way I can reproduce this is by failing to set LD_LIBRARY_PATH, so the runtime linker tries to use a version of libmlpack.so which was compiled with 32-bit uword; then it fails. But compiling out of the box as you suggested, I can't make a libarmadillo.so with 64-bit uword and a libmlpack.so with 32-bit uword.

So is it possible that there is a libmlpack.so hanging around somewhere still that the runtime linker is picking up on? (Your output of ldd suggests that this is not the case, since ld seems to be finding the correct libmlpack.so.)
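
One way to double-check, from inside the running process, which shared libraries were actually loaded is sketched below. It assumes Linux with glibc (or another libc that provides dl_iterate_phdr in <link.h>); the helper names loaded_libraries and loaded_libraries_matching are hypothetical. Unlike ldd, this reflects the environment the program really ran in, including whatever LD_LIBRARY_PATH was set at that time.

```cpp
#include <cassert>
#include <cstddef>
#include <link.h>  // dl_iterate_phdr; Linux/glibc specific (assumption)
#include <string>
#include <vector>

// Callback for dl_iterate_phdr: record the path of each loaded object.
static int collect_name(struct dl_phdr_info* info, std::size_t, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    // The main executable is usually reported with an empty name; skip it.
    if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
        names->push_back(info->dlpi_name);
    }
    return 0;  // returning zero tells dl_iterate_phdr to keep iterating
}

// Paths of all shared objects currently loaded into this process.
inline std::vector<std::string> loaded_libraries() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_name, &names);
    return names;
}

// Only those loaded objects whose path contains the given substring.
inline std::vector<std::string> loaded_libraries_matching(
        const std::string& needle) {
    std::vector<std::string> matches;
    for (const std::string& name : loaded_libraries()) {
        if (name.find(needle) != std::string::npos) {
            matches.push_back(name);
        }
    }
    return matches;
}
```

Calling loaded_libraries_matching("libmlpack") or loaded_libraries_matching("libarmadillo") early in the application's main function and printing the result would show the exact paths in use.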

The only other possibility I have not dug further into is that you've used Intel's compiler for Armadillo but gcc for mlpack. If you use Intel's compiler for mlpack, does this change anything?

@johnlees
Author

Hmm, strange. I'm pretty sure I got rid of mlpack completely, and ldd would seem to agree. I can't figure out why we're seeing different results, but perhaps it is the compiler.
I've tried compiling mlpack with icpc, but it runs a lot slower and unfortunately doesn't solve this problem.

Perhaps my installation of armadillo was incorrect too. Anyway, thanks for your help!
