Add support for human-readable stacktraces #785

Merged
merged 16 commits into from
Oct 29, 2020

ohm314 (Member) commented Oct 20, 2020

This PR (see issue #733) uses the backward-cpp library to print rich backtrace information on segfaults and other terminating signals.

To get a stack trace with full source information on a segfault (on Linux), NEURON has to be built with the feature enabled and with paths to libbfd (part of binutils) provided:

cmake .. \
 -DNRN_ENABLE_INTERVIEWS=OFF \
 -DNRN_ENABLE_MPI=OFF \
 -DNRN_ENABLE_RX3D=OFF \
 -DNRN_ENABLE_BACKTRACE=ON \
 -DBACKWARD_HAS_BFD=ON -DLIBBFD_LIBRARY=$BFDBASE/lib/libbfd.so -DLIBBFD_INCLUDE_DIR=$BFDBASE/include

Here is what a backtrace looks like without backward-cpp:

NEURON -- VERSION 8.0.dev-186-g623bdc4+ enh/733 (623bdc4+) 2020-10-20
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2019
See http://neuron.yale.edu/neuron/credits

Segmentation violation
Backtrace:
        /lib64/libc.so.6 : ()+0x36340
        /lib64/libm.so.6 : ()+0x5b1e5
        /lib64/libm.so.6 : exp()+0x13
        /gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/build/lib/libnrniv.so : ()+0x1cffe8
        /gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/build/lib/libnrniv.so : nonvint()+0x72
        /gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/build/lib/libnrniv.so : ()+0x15cab6
        /gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/build/lib/libnrniv.so : ()+0x15cba8
        /gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/build/lib/libnrniv.so : nrn_fixed_step()+0xf4
        /gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/build/lib/libnrniv.so : ncs2nrn_integrate()+0x85
        /gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/build/lib/libnrniv.so : BBS::netpar_solve(double)+0x44
../../build/bin/nrniv: Aborting.
 in ring.hoc near line 116
 {pc.psolve(tstop)}
                   ^
        ParallelContext[0].psolve(1e+08)

and here is what it looks like with backward-cpp:

NEURON -- VERSION 8.0.dev-187-ge6ea6cf+ enh/733 (e6ea6cf+) 2020-10-20
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2019
See http://neuron.yale.edu/neuron/credits

Segmentation violation
Stack trace (most recent call last):
#12   Source "/gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/src/nrnoc/../oc/hoc_oop.c", line 684, in hoc_call_ob_proc
        681:            hoc_pushstr(s);
        682:    }else{
        683:            double x;
      > 684:            x = (*(sym->u.u_proc->defn.pfd_vp))(ob->u.this_pointer);
        685:            pop_frame();
        686:            hoc_pushx(x);
        687:    }
#11   Source "/gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/src/nrniv/../parallel/ocbbs.cpp", line 676, in psolve(void*)
        673:            nrncore_psolve(tstop);
        674:    }else if (enabled == 0) {
        675:            // Classic case
      > 676:            bbs->netpar_solve(tstop);
        677:    }
        678:    return double(enabled);
        679: }
#10   Source "/gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/src/nrniv/netpar.cpp", line 1299, in BBS::netpar_solve(double)
       1296:    };
       1297: //printf("%d netpar_solve exit t=%g tstop=%g mindelay_=%g\n",nrnmpi_myid, t, tstop, mindelay_);
       1298: #else // not NRNMPI
      >1299:    ncs2nrn_integrate(tstop);
       1300: #endif
       1301:    tstopunset;
       1302: }
#9    Source "/gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/src/nrniv/../nrncvode/netcvode.cpp", line 3725, in ncs2nrn_integrate
       3722:            ts = tstop - .5*dt;
       3723:            while (nt_t < ts) {
       3724: #endif
      >3725:                    nrn_fixed_step();
       3726:                    if (stoprun) {break;}
       3727:            }
       3728:        }
#8    Source "/gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/src/nrnoc/fadvance.c", line 328, in nrn_fixed_step
        325:                    }
        326:            //}
        327:    }else{
      > 328:            nrn_multithread_job(nrn_fixed_step_thread);
        329: /* if there is no nrnthread_v_transfer then there cannot be
        330:    a nrnmpi_v_transfer and lastpart
        331:    will be done in above call.
#7  | Source "/gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/src/nrnoc/multicore.c", line 1079, in nrn_multithread_job
    |  1077: }
    |  1078:
    | >1079: void nrn_multithread_job(void*(*job)(NrnThread*)) {
    |  1080:    int i;
    |  1081: #if USE_PTHREAD
      Source "/gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/src/nrnoc/multicore.c", line 1099, in nrn_multithread_job
       1096: #endif
       1097:            for (i=1; i < nrn_nthread; ++i) {
       1098:                    BENCHBEGIN(i)
      >1099:                    (*job)(nrn_threads + i);
       1100:                    BENCHADD(i+nrn_nthread)
       1101:            }
       1102:            BENCHBEGIN(0)
#6    Source "/gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/src/nrnoc/fadvance.c", line 427, in nrn_fixed_step_thread
        424:    nth->_t += .5 * nth->_dt;
        425: #endif
        426:    fixed_play_continuous(nth);
      > 427:    setup_tree_matrix(nth);
        428:    nrn_solve(nth);
        429:    second_order_cur(nth);
        430:    update(nth);
#5    Source "/gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/src/nrnoc/treeset.c", line 595, in setup_tree_matrix
        593: /* for the fixed step method */
        594: void* setup_tree_matrix(NrnThread* _nt){
      > 595:    nrn_rhs(_nt);
        596:    nrn_lhs(_nt);
        597:    nrn_nonvint_block_current(_nt->end, _nt->_actual_rhs, _nt->id);
        598:    nrn_nonvint_block_conductance(_nt->end, _nt->_actual_d, _nt->id);
#4    Source "/gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/src/nrnoc/treeset.c", line 392, in nrn_rhs
        389:    for (tml = _nt->tml; tml; tml = tml->next) if (memb_func[tml->index].current) {
        390:            Pvmi s = memb_func[tml->index].current;
        391:            if (measure) { w = nrnmpi_wtime(); }
      > 392:            (*s)(_nt, tml->ml, tml->index);
        393:            if (measure) { nrn_mech_wtime_[tml->index] += nrnmpi_wtime() - w; }
        394:            if (errno) {
        395:                    if (nrn_errno_check(tml->index)) {
#3    Source "/gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/src/nrnoc/hh.c", line 639, in _nrn_cur__hh
        636:   _dik = ik;
        637:  _rhs = _nrn_current(_p, _ppvar, _thread, _nt, _v);
        638:   _ion_dinadv += (_dina - ina)/.001 ;
      > 639:   _ion_dikdv += (_dik - ik)/.001 ;
        640:    }
        641:  _g = (_g - _rhs)/.001;
        642:   _ion_ina += ina ;
#2    Object "/lib64/libc.so.6", at 0x7fffeb54033f, in killpg
#1    Source "/gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/src/oc/hoc.c", line 819, in sigsegvcatch
        816: RETSIGTYPE sigsegvcatch(int sig) /* segmentation violation probably due to arg type error */
        817: {
        818:    Fprintf(stderr, "Segmentation violation\n");
      > 819:    print_bt();
        820:    /*ARGSUSED*/
        821:    if (coredump) {
        822:            abort();
#0    Source "/gpfs/bbp.cscs.ch/home/awile/projects/cellular/nrn/src/nrniv/backtrace_utils.cpp", line 52, in backward_wrapper
         50: void backward_wrapper() {
         51: #ifdef USE_BACKWARD
      >  52:     backward::StackTrace st; st.load_here(32);
         53:     backward::Printer p; p.print(st);
         54: #endif
         55: }
../../build/bin/nrniv: Aborting.
 in ring.hoc near line 116
 {pc.psolve(tstop)}
                   ^

On some platforms libc provides execinfo.h, which allows printing
the stack. We can use this feature to add stack trace information to
the signal handlers for fatal errors such as SIGSEGV or SIGBUS.
This only works on macOS for now. The regex will have to be adjusted for
Linux.
Using backward-cpp we get proper backtraces on macOS and backtraces with
source on Linux. This fulfills #733 (but adds a dependency on a new submodule).
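
The execinfo.h approach mentioned in these commit messages can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names print_frames, fatal_signal_handler, and install_fatal_handlers are hypothetical, and the sketch assumes a platform whose libc ships execinfo.h (glibc or macOS). Note that backtrace() is not formally async-signal-safe, so output from a handler like this is best effort.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc / macOS libc)
#include <signal.h>    // sigaction, signal, raise
#include <unistd.h>    // write, STDERR_FILENO
#include <cassert>

// Hypothetical helper: write up to max_frames raw frames to a descriptor.
// Returns the number of frames captured.
int print_frames(int fd, int max_frames = 32) {
    assert(fd >= 0);
    constexpr int kCapacity = 64;
    void* frames[kCapacity];
    if (max_frames > kCapacity) max_frames = kCapacity;
    if (max_frames < 1) max_frames = 1;
    const int n = backtrace(frames, max_frames);
    // backtrace_symbols_fd writes directly to fd and does not call malloc,
    // unlike backtrace_symbols.
    backtrace_symbols_fd(frames, n, fd);
    return n;
}

// Hypothetical handler: print the stack, then let the default action run.
extern "C" void fatal_signal_handler(int sig) {
    static const char msg[] = "Fatal signal, backtrace:\n";
    const ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    print_frames(STDERR_FILENO);
    signal(sig, SIG_DFL);  // restore the default disposition
    raise(sig);            // re-raise so the process terminates as usual
}

// Hypothetical installer for the two signals named in the commit message.
bool install_fatal_handlers() {
    struct sigaction sa = {};
    sa.sa_handler = fatal_signal_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0 &&
           sigaction(SIGBUS, &sa, nullptr) == 0;
}
```

Calling install_fatal_handlers() once at startup would be enough for this sketch. The raw lines it prints carry module names and offsets only, which is the gap that backward-cpp with libbfd fills in the output shown earlier.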
ohm314 (Member, Author) commented Oct 20, 2020

(I realize that I am still missing an #ifdef somewhere, so this might still need a tiny bit of work, which I'll do tonight.)

ohm314 and others added 5 commits October 20, 2020 19:44
If backward is not desired or available, let's fall back to our own
(simple) implementation. Still WIP since the Linux backtrace format needs to
be understood
and minor improvements
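
The fallback described in this commit message amounts to a compile-time choice between backends. Below is a minimal sketch of that idea, assuming the USE_BACKWARD macro seen in backward_wrapper is defined by the build when backward-cpp is enabled. The function name print_stack_trace and the HAVE_EXECINFO macro are hypothetical and are not taken from the PR.

```cpp
#include <cassert>
#include <cstdio>

#if defined(USE_BACKWARD)
#include <backward.hpp>  // backward-cpp, only when the build enables it
#elif defined(__has_include)
#if __has_include(<execinfo.h>)
#include <execinfo.h>
#define HAVE_EXECINFO 1  // hypothetical macro, local to this sketch
#endif
#endif

// Prints a stack trace with the best backend available at compile time
// and returns the name of the backend that was used.
const char* print_stack_trace() {
#if defined(USE_BACKWARD)
    // Same calls as in backward_wrapper from this PR.
    backward::StackTrace st;
    st.load_here(32);
    backward::Printer p;
    p.print(st);
    return "backward-cpp";
#elif defined(HAVE_EXECINFO)
    // Simple fallback: raw frames from libc, no source information.
    void* frames[32];
    const int n = backtrace(frames, 32);
    assert(n >= 0);
    std::fputs("Backtrace:\n", stderr);
    backtrace_symbols_fd(frames, n, 2);  // 2 is stderr
    return "execinfo";
#else
    std::fputs("Backtrace: not available on this platform\n", stderr);
    return "none";
#endif
}
```

Keeping the choice inside one function means the signal handlers only ever call print_stack_trace() and do not need their own #ifdef blocks.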
pramodk (Member) left a comment

I have some comments/questions.

Resolved review threads:
CMakeLists.txt (outdated)
src/nrniv/CMakeLists.txt (outdated)
src/nrniv/CMakeLists.txt (outdated)
src/oc/hoc.c (outdated)
src/oc/hoc.c (outdated)
src/nrniv/backtrace_utils.cpp (outdated)
src/nrniv/backtrace_utils.cpp
pramodk (Member) left a comment

I tested this PR locally and it seems to be working fine!

Comment on lines +146 to +152
set(options CXX)
cmake_parse_arguments(nrn_check_include_files "${options}" "" "" ${ARGN})
if(${nrn_check_include_files_CXX})
  check_include_file_cxx(${filename} ${variable})
else()
  check_include_files(${filename} ${variable})
endif()

To clarify: this is currently not called with the CXX option, only for C, right?

src/oc/hoc.c Outdated
Comment on lines 767 to 779
char* symbol = malloc(sizeof(char)*funcname_size);
char* offset = malloc(sizeof(char)*10);
char* funcname = malloc(sizeof(char)*funcname_size);
void* addr = NULL;
// get void*'s for maximum last 16 entries on the stack
size = backtrace(frames, nframes);

// print out all the frames to stderr
Fprintf(stderr, "Backtrace:\n");
bt_strings = backtrace_symbols(frames, size);
if (bt_strings) {
    size_t i;
    for(i = 2; i < size; ++i) {

Some clarification here about offset=10, i=2, etc. would be helpful.
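
As a point of reference for this question, the sketch below shows one way such a loop can be written with the magic numbers named and commented. It is an illustration under assumptions, not the code under review. It assumes the glibc backtrace_symbols line format `module(symbol+0xoffset) [address]`, and the names ParsedFrame, parse_frame, demangle, format_frames, and kSkipFrames are hypothetical. Skipping the first two frames is assumed here to drop the signal handler and the printing function themselves; the PR does not state the reason.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC / Clang)
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Assumption: the first two frames belong to the handler and the printer.
constexpr std::size_t kSkipFrames = 2;

struct ParsedFrame {
    std::string module;  // e.g. "/lib64/libm.so.6"
    std::string symbol;  // demangled when possible, may be empty
    std::string offset;  // e.g. "0x13", empty if absent
};

// Demangle a C++ symbol; plain C names are returned unchanged.
std::string demangle(const std::string& name) {
    if (name.compare(0, 2, "_Z") != 0) return name;  // not a mangled name
    int status = 0;
    char* out = abi::__cxa_demangle(name.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return name;
    std::string result(out);
    std::free(out);  // __cxa_demangle allocates with malloc
    return result;
}

// Parse "module(symbol+0xoffset) [address]"; false if there is no "(...)".
bool parse_frame(const std::string& line, ParsedFrame& out) {
    const std::size_t open = line.find('(');
    if (open == std::string::npos) return false;
    const std::size_t close = line.find(')', open);
    if (close == std::string::npos) return false;
    assert(close > open);
    const std::string inner = line.substr(open + 1, close - open - 1);
    const std::size_t plus = inner.rfind('+');
    out.module = line.substr(0, open);
    if (plus == std::string::npos) {
        out.symbol = demangle(inner);
        out.offset.clear();
    } else {
        out.symbol = demangle(inner.substr(0, plus));
        out.offset = inner.substr(plus + 1);
    }
    return true;
}

// Format every frame after the skipped ones as "module : symbol+offset".
std::vector<std::string> format_frames(const std::vector<std::string>& lines) {
    std::vector<std::string> out;
    for (std::size_t i = kSkipFrames; i < lines.size(); ++i) {
        ParsedFrame f;
        if (parse_frame(lines[i], f)) {
            out.push_back(f.module + " : " + f.symbol + "+" + f.offset);
        } else {
            out.push_back(lines[i]);  // unknown format: keep the line verbatim
        }
    }
    return out;
}
```

Using std::string here avoids fixed-size buffers for the symbol and offset, so no buffer length constants are needed; a C implementation would instead document each buffer size next to its allocation.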

pramodk (Member) commented Oct 28, 2020

Just for the record, this is how I have tested this on BB5:

cmake ..  -DNRN_ENABLE_INTERVIEWS=OFF  -DNRN_ENABLE_MPI=OFF  -DNRN_ENABLE_RX3D=OFF  -DNRN_ENABLE_BACKTRACE=ON
make -j
./bin/nrniv ../test/ringtest/ring.hoc

and with binutils support as:

cmake .. \
 -DNRN_ENABLE_INTERVIEWS=OFF \
 -DNRN_ENABLE_MPI=OFF \
 -DNRN_ENABLE_RX3D=OFF \
 -DNRN_ENABLE_BACKTRACE=ON \
 -DBACKWARD_HAS_BFD=ON \
 -DLIBBFD_LIBRARY=/gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/tools/2020-02-01/linux-rhel7-x86_64/gcc-8.3.0/binutils-2.32-xy5oedscyd/lib/libbfd.so \
 -DLIBBFD_INCLUDE_DIR=/gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/tools/2020-02-01/linux-rhel7-x86_64/gcc-8.3.0/binutils-2.32-xy5oedscyd/include/
make -j
./bin/nrniv ../test/ringtest/ring.hoc

With Python I wasn't able to get nice tracebacks because of issue #797, but if I disable restore_signals everything works fine.

#797 can be handled separately.

olupton added a commit that referenced this pull request Dec 7, 2022
* Tweak NVHPC warning suppressions.
* Emit nrn_pragma_{acc,omp}(...) macros. (#780)
* Use cnrn_target_ wrappers instead of acc_ API.
* GPU code generation improvements (#782)
* Fix NVHPC + OpenMP ~ OpenACC compilation (#784)
* Add EIGEN_DEVICE_FUNC to header to fix a compilation warning.
* Fudge partialPivLu<N> for NVHPC + OpenMP without OpenACC.
* Transfer ml only if cell is not artificial. (#785)
* Update Eigen to include OpenMP fixes. (#787, #789)

Co-authored-by: Nicolas Cornu <nicolas.cornu@epfl.ch>
Co-authored-by: Pramod Kumbhar <pramod.kumbhar@epfl.ch>