Segmentation Fault on OS X #23

Open
peterswang opened this issue Aug 4, 2015 · 4 comments

@peterswang

On OS X 10.10.3, with Python 2.7.10 and boost and boost-python 1.58.0, I built using:

cmake .. -DCMAKE_BUILD_TYPE=Release -DDATA_DIR=~/test_images/coco-master/images/val2014 -DUSE_PYTHON=2
make -j9

Saw some warnings only, such as:
In file included from /Users/peterwang/CPP_Resources/lpo-release/lib/crf/crf.cpp:31:
/Users/peterwang/CPP_Resources/lpo-release/external/ibfs/ibfs.h:161:2: warning: 'Node' defined as a class here but previously declared as a struct
[-Wmismatched-tags]
class Node
^
/Users/peterwang/CPP_Resources/lpo-release/external/ibfs/ibfs.h:150:2: note: did you mean class here?
struct Node;
^~~~~~
class

Tried:

python train_lpo.py -f0 0.2 ../models/lpo_VOC_0.2.dat
and got: Segmentation fault: 11
This appears to have crashed on "from python.lpo import *" in lpo.py.

Crash report contained:
...
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 ??? 000000000000000000 0 + 0
1 org.python.python 0x0000000103d150dd PyEval_GetGlobals + 23
2 org.python.python 0x0000000103d2462b PyImport_Import + 137
3 org.python.python 0x0000000103d22d27 PyImport_ImportModule + 31
4 lpo.so 0x00000001033f45a3 init_numpy() + 19
5 lpo.so 0x00000001033f4779 defineUtil() + 25
6 lpo.so 0x00000001033f4499 init_module_lpo() + 9
7 libboost_python-mt.dylib 0x0000000103c36391 boost::python::handle_exception_impl(boost::function0<void>) + 81
8 libboost_python-mt.dylib 0x0000000103c373b9 boost::python::detail::init_module(char const*, void (*)()) + 121
9 org.python.python 0x0000000101836327 _PyImport_LoadDynamicModule + 140
...

Saw the note in external/boost/readme.txt:
"In order to use a non-system boost library copy the "boost" and "libs" directory of a recent boost release (eg 1.57) here."

And in build/lib/python/CMakeFiles/lpo.dir/depend.make:
...
lib/python/CMakeFiles/lpo.dir/boost.cpp.o: /usr/local/include/boost/array.hpp
lib/python/CMakeFiles/lpo.dir/boost.cpp.o: /usr/local/include/boost/assert.hpp
lib/python/CMakeFiles/lpo.dir/boost.cpp.o: /usr/local/include/boost/bind.hpp
...

Do these suggest the segfault is due to a boost version mismatch?

Is it sufficient to just do:

ln -s /usr/local/Cellar/boost/1.58.0 external/boost/
ln -s /usr/local/Cellar/boost/1.58.0/lib external/boost/libs

Or something else?
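Before symlinking anything, it may be worth checking which Python and Boost libraries the built module actually links against. The following is a minimal sketch, assuming the macOS `otool -L` tool is available and that the built module is at a path like `build/lib/python/lpo.so` (that path is an assumption, not taken from the build output). The helper names are hypothetical.

```python
import subprocess


def parse_otool_libraries(otool_output):
    """Return the library paths listed in the text printed by `otool -L`."""
    libraries = []
    # The first line is the name of the inspected file, so skip it.
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        # Entries look like: "<path> (compatibility version ..., current version ...)"
        libraries.append(line.split(" (compatibility", 1)[0])
    return libraries


def python_and_boost_only(libraries):
    """Keep only the entries that mention Python or Boost."""
    return [lib for lib in libraries
            if "python" in lib.lower() or "boost" in lib.lower()]


def linked_libraries(module_path):
    """Run `otool -L` (macOS only) on a compiled module and parse its output."""
    output = subprocess.check_output(["otool", "-L", module_path])
    return parse_otool_libraries(output.decode("utf-8"))


# Example use (the path is an assumption):
#   for lib in python_and_boost_only(linked_libraries("build/lib/python/lpo.so")):
#       print(lib)
```

If the Python library listed there is not the one belonging to the interpreter that runs train_lpo.py, that mismatch would be a candidate cause to rule out, in addition to the boost version.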

BTW, boost and boost-python were installed as part of setting up Caffe. The Caffe ImageNet model ran successfully when invoked from a Python test app.

Thanks for any light you could help shed.

@philkr
Owner

philkr commented Aug 7, 2015

I don't think that the struct / class thing causes the segfault. Can you try to build it using cmake -DCMAKE_BUILD_TYPE=Debug, and then run either gdb or lldb on it?

@peterswang
Author

Rebuilt with cmake -DCMAKE_BUILD_TYPE=Debug and ran it under lldb:

lldb -- python train_lpo.py -f0 0.2 ../models/lpo_VOC_0.2.dat
(lldb) target create "python"
Current executable set to 'python' (x86_64).
(lldb) settings set -- target.run-args "train_lpo.py" "-f0" "0.2" "../models/lpo_VOC_0.2.dat"
(lldb) breakpoint set -f boost.cpp -l 27
Breakpoint 1: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
(lldb) r
Process 91895 launched: '/usr/local/bin/python' (x86_64)
Process 91895 stopped

* thread #1: tid = 0x67a797, 0x00007fff5fc01000 dyld`_dyld_start, stop reason = exec
    frame #0: 0x00007fff5fc01000 dyld`_dyld_start
dyld`_dyld_start:
-> 0x7fff5fc01000 <+0>: popq %rdi
   0x7fff5fc01001 <+1>: pushq $0x0
   0x7fff5fc01003 <+3>: movq %rsp, %rbp
   0x7fff5fc01006 <+6>: andq $-0x10, %rsp
(lldb) br list
Current breakpoints:
1: file = 'boost.cpp', line = 27, locations = 0 (pending)

(lldb) br set -f contour.cpp -l 27
Breakpoint 2: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
(lldb) br set -n BOOST_PYTHON_MODULE
Breakpoint 3: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
(lldb) c
Process 91895 resuming
1 location added to breakpoint 2
Process 91895 stopped

* thread #1: tid = 0x67a797, 0x0000000000000000, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000000000000
error: memory read failed for 0x0

I've not debugged a Python wrapper with lldb or gdb before. Trying to set breakpoints directly in the C++ files while running the Python wrapper didn't work.

Could you advise on how to track this down in lldb?
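Independently of lldb, it may help to record which interpreter is actually running the script, since lldb reports launching '/usr/local/bin/python'. The thread does not establish whether that is the same Python the module was compiled against. A minimal sketch, using only `sys`, `sysconfig`, and (if installed) `numpy`; the function name `interpreter_info` is hypothetical:

```python
import sys
import sysconfig


def interpreter_info():
    """Collect the paths and versions of the interpreter running this code."""
    info = {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": "%d.%d.%d" % sys.version_info[:3],
        # Directory holding Python.h for this interpreter.
        "include_dir": sysconfig.get_paths()["include"],
    }
    try:
        import numpy
        info["numpy_version"] = numpy.__version__
        # Directory holding the NumPy C headers (e.g. __multiarray_api.h).
        info["numpy_include"] = numpy.get_include()
    except ImportError:
        info["numpy_version"] = None
        info["numpy_include"] = None
    return info


# Example use:
#   for key, value in sorted(interpreter_info().items()):
#       print("%s: %s" % (key, value))
```

These values can then be compared by hand with the include and library paths that cmake picked up for the build.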

@philkr
Owner

philkr commented Aug 9, 2015

I think it added the breakpoints once lpo was loaded in python. See "1 location added to breakpoint 2". I think you're almost there. What is the backtrace for the EXC_BAD_ACCESS?

@peterswang
Author

Thanks for pointing out the lldb message. Here's the debug log and backtrace:

$ lldb -- python train_lpo.py -f0 0.2 ../models/lpo_VOC_0.2.dat
(lldb) target create "python"
Current executable set to 'python' (x86_64).
(lldb) settings set -- target.run-args "train_lpo.py" "-f0" "0.2" "../models/lpo_VOC_0.2.dat"
(lldb) br set -f lpo.cpp -l 38
Breakpoint 1: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
(lldb) r
Process 7734 launched: '/usr/local/bin/python' (x86_64)
Process 7734 stopped

* thread #1: tid = 0x2ab3c, 0x00007fff5fc01000 dyld`_dyld_start, stop reason = exec
    frame #0: 0x00007fff5fc01000 dyld`_dyld_start
dyld`_dyld_start:
-> 0x7fff5fc01000 <+0>: popq %rdi
   0x7fff5fc01001 <+1>: pushq $0x0
   0x7fff5fc01003 <+3>: movq %rsp, %rbp
   0x7fff5fc01006 <+6>: andq $-0x10, %rsp
(lldb) c
Process 7734 resuming
3 locations added to breakpoint 1
Process 7734 stopped
* thread #1: tid = 0x2ab3c, 0x0000000103165a14 lpo.so`init_module_lpo() + 4 at lpo.cpp:38, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000103165a14 lpo.so`init_module_lpo() + 4 at lpo.cpp:38
   35
   36 BOOST_PYTHON_MODULE(lpo) {
   37 /************ Util *************/
-> 38 defineUtil();
   39 #ifdef USE_DATASET
   40 /************ Dataset *************/
   41 defineDataset();
(lldb) s
Process 7734 stopped
* thread #1: tid = 0x2ab3c, 0x0000000103166524 lpo.so`defineUtil() + 4 at util.cpp:258, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x0000000103166524 lpo.so`defineUtil() + 4 at util.cpp:258
   255 BOOST_PYTHON_FUNCTION_OVERLOADS(rasterize2,rasterize,1,2)
   256 void defineUtil() {
   257 // NOTE: This file has a ton of macros and templates, so it's going to take a while to compile ...
-> 258 init_numpy();
   259 boost::python::numeric::array::set_module_and_type("numpy", "ndarray");
   260
   261 register_exception_translator(&translateAssertException);
(lldb) s
Process 7734 stopped
* thread #1: tid = 0x2ab3c, 0x00000001031661e4 lpo.so`init_numpy() + 4 at util.cpp:250, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001031661e4 lpo.so`init_numpy() + 4 at util.cpp:250
   247 }
   248 #else
   249 void init_numpy() {
-> 250 import_array();
   251 }
   252 #endif
   253
(lldb) s
Process 7734 stopped
* thread #1: tid = 0x2ab3c, 0x000000010316622f lpo.so`_import_array() + 15 at __multiarray_api.h:1632, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x000000010316622f lpo.so`_import_array() + 15 at __multiarray_api.h:1632
   1629 _import_array(void)
   1630 {
   1631 int st;
-> 1632 PyObject *numpy = PyImport_ImportModule("numpy.core.multiarray");
   1633 PyObject *c_api = NULL;
   1634
   1635 if (numpy == NULL) {
(lldb) s
Process 7734 stopped
* thread #1: tid = 0x2ab3c, 0x0000000000000000, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000000000000
error: memory read failed for 0x0
(lldb) bt
* thread #1: tid = 0x1d5d9, 0x0000000000000000, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)

So, it crashed when executing:
PyObject *numpy = PyImport_ImportModule("numpy.core.multiarray");

The Python 2.7.10 reference shows:
PyObject* PyImport_ImportModule(const char *name)

Does this mean there's a bug in numpy/core/include/numpy/__multiarray_api.h?
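One way to narrow this down might be to import the same module name directly from the interpreter, without lpo involved. If that import succeeds, the header by itself looks less likely to be at fault, though this check alone does not prove where the problem lies. A minimal sketch; the helper name `try_import` is hypothetical:

```python
import importlib
import traceback


def try_import(module_name):
    """Try to import a module by name; return (succeeded, traceback_text_or_None)."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Ordinary Python-level failures end up here; a segfault would
        # still kill the process, which is informative in its own way.
        return False, traceback.format_exc()
    return True, None


# Example use, with the module name taken from the lldb listing:
#   ok, error = try_import("numpy.core.multiarray")
#   print("imported" if ok else error)
```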
