New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SoftmaxRegression] munmap_chunk(): invalid pointer during smr.Train() #1358

Closed
bqv opened this Issue Apr 7, 2018 · 12 comments

Comments

Projects
None yet
4 participants
@bqv

bqv commented Apr 7, 2018

See https://gist.github.com/bqv/ada8359fe79da1ede98012d9e5579712 for a minimal reproducible example.

Backtrace:

#1  0x00007f8759677cf7 in __GI_abort () at abort.c:90
#2  0x00007f87596b8f87 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f87597bec38 "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007f87596bf27a in malloc_printerr (str=str@entry=0x7f87597c04f8 "munmap_chunk(): invalid pointer") at malloc.c:5354
#4  0x00007f87596bf72c in munmap_chunk (p=<optimized out>) at malloc.c:2845
#5  0x00000000004155c8 in arma::memory::release<double const> (mem=0x7fffffffe180) at /usr/include/armadillo_bits/memory.hpp:145
#6  arma::Mat<double>::~Mat (this=0x7fffffffd830) at /usr/include/armadillo_bits/Mat_meat.hpp:29
#7  0x0000000000419b05 in mlpack::regression::SoftmaxRegressionFunction::~SoftmaxRegressionFunction (this=0x7fffffffd6b0) at /usr/include/mlpack/methods/softmax_regression/softmax_regression_function.hpp:21
#8  0x0000000000414fbf in mlpack::regression::SoftmaxRegression::Train<mlpack::optimization::GradientDescent> (this=0x7fffffffde10, data=..., labels=..., numClasses=4, optimizer=...)
    at /usr/include/mlpack/methods/softmax_regression/softmax_regression_impl.hpp:64
#9  0x000000000040add6 in mlpackMain () at mlpack.cpp:33
#10 0x000000000040aa2a in main (argc=2, argv=0x7fffffffe448) at /usr/include/mlpack/core/util/mlpack_main.hpp:68```
@bqv

This comment has been minimized.

bqv commented Apr 7, 2018

Further digging shows this error can be changed to Mat::init(): requested size is too large by changing numClasses in the initialization call of smr to 0

@bqv

This comment has been minimized.

bqv commented Apr 8, 2018

Further yet digging shows uninitialized memory regions in valgrind when run on the following code with a much smaller dataset:

#include <mlpack/core.hpp>
#include <mlpack/core/util/cli.hpp>
#define BINDING_TYPE BINDING_TYPE_CLI
#include <mlpack/core/util/mlpack_main.hpp>
#include <mlpack/methods/softmax_regression/softmax_regression.hpp>

using SoftmaxRegression = mlpack::regression::SoftmaxRegression;

static void mlpackMain()
{
    arma::mat dataset = arma::mat("0.0 1.0 2.0 3.0 4.0;"
                                  "1.0 1.0 2.0 3.0 3.0;"
                                  "2.0 1.0 2.0 3.0 2.0;"
                                  "3.0 1.0 2.0 3.0 1.0;"
                                  "4.0 1.0 2.0 3.0 0.0;");
    size_t data_dim = dataset.n_cols;
    arma::mat data = dataset.rows(0, data_dim - 2);
    arma::rowvec labelsRow = dataset.row(data_dim - 1);

    arma::Row<size_t> labels = arma::conv_to<arma::Row<size_t>>::from(labelsRow);
    SoftmaxRegression smr(data, labels, 5, 0.5);
}

The code compiles and runs with error "Mat::init(): requested size is too large" and when run with valgrind, shows a move depending on uninitialized values arising from double mlpack::regression::SoftmaxRegression::Train<mlpack::optimization::L_BFGS>(arma::Mat<double> const&, arma::Row<unsigned long> const&, unsigned long, mlpack::optimization::L_BFGS) (softmax_regression_impl.hpp:53)

@bqv

This comment has been minimized.

bqv commented Apr 8, 2018

More succinct example of this behaviour:

#include <mlpack/core.hpp>
#include <mlpack/core/util/cli.hpp>
#define BINDING_TYPE BINDING_TYPE_CLI
#include <mlpack/core/util/mlpack_main.hpp>
#include <mlpack/methods/softmax_regression/softmax_regression.hpp>

static void mlpackMain()
{
    arma::mat data;
    data.randu(5, 5);
    arma::Row<size_t> labels("0 1 2 3 4");
    mlpack::regression::SoftmaxRegressionFunction regressor(data, labels, 5, 0.5, false);
    arma::mat parameters = regressor.GetInitialPoint();
}
@rickllyxu

This comment has been minimized.

rickllyxu commented Apr 8, 2018

Different from your result, I runed the succinct version and got many undefined reference error. Can you tell me your compiling command?

@manish7294

This comment has been minimized.

Member

manish7294 commented Apr 8, 2018

@bqv I tried all your examples with the dataset provided by you and got all of them working smoothly. I think it might be a dependency issue.

@bqv

This comment has been minimized.

bqv commented Apr 8, 2018

I'm compiling with clang++ -std=c++11 -g -O0 -c -o minimal.o minimal.cpp and clang++ -g -o minimal minimal.o -lmlpack -lboost_program_options -larmadillo.

@bqv

This comment has been minimized.

bqv commented Apr 8, 2018

In case it is a dependency isssue, I'm running this on debian unstable with the packages:

libarmadillo8/testing,unstable,now 1:8.400.0+dfsg-2 amd64
libmlpack-dev/unstable,now 3.0.0-2 amd64
libmlpack3/unstable,now 3.0.0-2 amd64
@rcurtin

This comment has been minimized.

Member

rcurtin commented Apr 11, 2018

I reproduced this in a debian:sid Docker container. It's late now but I'll keep digging tomorrow.

@rcurtin

This comment has been minimized.

Member

rcurtin commented Apr 24, 2018

Ok, so I have done quite some digging on this and I am not to the bottom of it yet, but I have both a workaround and the information I need to keep digging.

The problem can be worked around (technically in an unsafe manner because it does not solve the actual problem) by moving the definition of arma::sp_mat groundTruth to the last member of SoftmaxRegressionFunction.

The smoking gun is this. When I run std::cout << "sizeof(arma::sp_mat): " << sizeof(arma::sp_mat) inside SoftmaxRegressionFunction::SoftmaxRegressionFunction() (i.e. in softmax_regression_function.cpp), I get:

sizeof(arma::sp_mat): 160

but when I run the same from softmax_regression_impl.hpp, I get:

sizeof(arma::sp_mat): 208

So the definitions of the class are different during compilation of mlpack and of the test program. And I haven't yet figured out exactly why that is. I'll keep you updated as I get deeper into it.

@rcurtin

This comment has been minimized.

Member

rcurtin commented Apr 25, 2018

Ok, the real issue here is that when mlpack was compiled, it was compiled with OpenMP. But when you compile with g++, the -fopenmp option isn't specified. This happens to change the size of the Armadillo sparse matrix (which behaves differently with and without OpenMP). So, if you compile with -fopenmp, this should resolve the issue.

I'm now working on a patch that will at least issue a warning during compilation if mlpack was compiled with -fopenmp and the program linking against mlpack is not using OpenMP.

@rcurtin

This comment has been minimized.

Member

rcurtin commented Apr 26, 2018

I opened #1382, which will issue a clear warning in situations like this. Once we merge that, I'll go ahead and release mlpack 3.0.1 and close this issue. Please do feel free to comment there (or here) if you have any input about the patch.

@rcurtin

This comment has been minimized.

Member

rcurtin commented Apr 30, 2018

Fixed with the merge of #1382. I'll go ahead and release mlpack 3.0.1 as a bugfix with this incorporated shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment