Assertion failure on shutdown #6

Closed
shane-kerr opened this Issue Feb 9, 2018 · 4 comments

shane-kerr commented Feb 9, 2018

We get an assertion failure on shutdown:

mintcoind: /usr/include/boost/thread/pthread/recursive_mutex.hpp:113: void boost::recursive_mutex::lock(): Assertion `!pthread_mutex_lock(&m)' failed.

This happens on the Qt (GUI) version too.

One possible approach to fixing this would be to enable core dumps and then use gdb to look at the stack that raises this assertion. This may not help, because this sort of error can surface a long way in time and space from where it was caused. 😢

shane-kerr commented Feb 11, 2018

Good news, everyone!

https://www.geek.com/wp-content/uploads/2016/03/GoodNewsEveryone-625x350.jpg

I was able to reproduce the assertion failure and get a stack trace:

Core was generated by `./mintcoind'.
Program terminated with signal SIGABRT, Aborted.
#0  0xb7766cf9 in __kernel_vsyscall ()
[Current thread is 1 (Thread 0xb2cfeb40 (LWP 28803))]
(gdb) bt
#0  0xb7766cf9 in __kernel_vsyscall ()
#1  0xb6dc1dc0 in __libc_signal_restore_set (set=0xb2cfde10)
    at ../sysdeps/unix/sysv/linux/nptl-signals.h:79
#2  __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0xb6dc3287 in __GI_abort () at abort.c:89
#4  0xb6dbaa17 in __assert_fail_base (
    fmt=0xb6ef66ac "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x622bd1 "!pthread_mutex_lock(&m)", 
    file=0x6227a0 "/usr/include/boost/thread/pthread/recursive_mutex.hpp", line=113, 
    function=0x657480 <_ZZN5boost15recursive_mutex4lockEvE19__PRETTY_FUNCTION__> "void boost::recursive_mutex::lock()") at assert.c:92
#5  0xb6dbaa9b in __GI___assert_fail (assertion=0x622bd1 "!pthread_mutex_lock(&m)", 
    file=0x6227a0 "/usr/include/boost/thread/pthread/recursive_mutex.hpp", line=113, 
    function=0x657480 <_ZZN5boost15recursive_mutex4lockEvE19__PRETTY_FUNCTION__> "void boost::recursive_mutex::lock()") at assert.c:101
#6  0x004f7dfb in boost::recursive_mutex::lock (this=0x702ce8 <cs_vOneShots>)
    at /usr/include/boost/thread/pthread/recursive_mutex.hpp:113
#7  boost::unique_lock<boost::recursive_mutex>::lock (this=0xb2cfe10c)
    at /usr/include/boost/thread/lock_types.hpp:346
#8  CMutexLock<boost::recursive_mutex>::Enter (pszName=<optimized out>, 
    pszFile=<optimized out>, nLine=<optimized out>, this=0xb2cfe10c) at sync.h:52
#9  CMutexLock<boost::recursive_mutex>::CMutexLock (this=this@entry=0xb2cfe10c, 
    mutexIn=..., fTry=fTry@entry=false, nLine=<optimized out>, pszFile=<optimized out>, 
    pszName=<optimized out>) at sync.h:85
#10 0x004fb709 in ProcessOneShot () at net.cpp:1425
#11 0x004fe01d in ThreadOpenConnections2 (parg=0x0) at net.cpp:1489
#12 0x004fec2f in ThreadOpenConnections (parg=0x0) at net.cpp:1408
#13 0xb7675819 in ?? () from /usr/lib/i386-linux-gnu/libboost_thread.so.1.62.0
#14 0xb714127a in start_thread (arg=0xb2cfeb40) at pthread_create.c:333
#15 0xb6e7db56 in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:110

shane-kerr commented Feb 11, 2018

Okay, it looks like this was an issue in Bitcoin Core, which Novacoin was forked from, which in turn MintCoin was forked from:

bitcoin/bitcoin#2704

We don't know if the change mentioned there (switching to boost::asio deadline timers) actually fixed the problem, since the reporter didn't reply and the issue is still open, but since we haven't seen any follow-up, my guess is that it probably did.

Sadly that change is probably beyond the scope of what we want to do with the existing wallet code. ☹️

shane-kerr commented Feb 11, 2018

As often happens, the lovely Stack Overflow site gave me a clue:

https://stackoverflow.com/questions/6858574/assertion-on-mutex-when-using-multiple-threads-and-mutexes

What seems to be happening is that we are exiting the program before the connection thread exits. Most likely exit() destroys global objects while that thread is still running, and the thread then tries to lock a mutex that no longer exists (the stack trace shows it locking the global cs_vOneShots). We can see this if we look at the debug.log output from a run that had the assertion failure:

StopNode()
SetBestChain: new best=a8e685e9fede097c03936d9132e5a8b501dadbb0823726c406e99e7446f0b80f  height=1181030  trust=000000000000000000000000000000000000000000000000000d20ffe17281bd  date=01/30/15 12:55:52
Stake checkpoint: a2541238
ProcessBlock: ACCEPTED
ThreadMessageHandler exited
ThreadStakeMinter exiting, 0 threads remaining
Flushed 12239 addresses to peers.dat  177ms
Flush(true)
DBFlush(true) ended               0ms
MintCoin exited

connection timeout

Here we see the connection timeout message after the MintCoin exited message. That means that the connection thread is running after the main program has exited, which is probably why we are having the assertion failure.

The cause of this is that the code deliberately reduces the thread count before trying to connect:

    vnThreadsRunning[THREAD_OPENCONNECTIONS]--;
    CNode* pnode = ConnectNode(addrConnect, strDest);
    vnThreadsRunning[THREAD_OPENCONNECTIONS]++;

This was probably added because the default timeout for connecting to a node is 5 seconds, and someone decided that this was too long to wait at shutdown.

The solution to this is simple: remove the lines of code which adjust the thread count before and after the connection attempt. The drawback is that we may have to wait 5 seconds for the program to end if it is trying to connect. I think this is a fair trade-off.
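
To illustrate why the counter adjustment matters, here is a minimal standalone sketch, not the wallet code. It assumes the shutdown path simply waits for a running-thread counter to reach zero. The names nThreadsRunning, SlowConnect, Worker and WorkerFinishedBeforeShutdown are hypothetical stand-ins for vnThreadsRunning[THREAD_OPENCONNECTIONS], ConnectNode() and the surrounding logic, and the 200 ms sleep stands in for the 5 second connect timeout.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-ins; names and timings are illustrative only.
static std::atomic<int> nThreadsRunning{0};
static std::atomic<bool> fWorkerDone{false};

// Simulates a blocking connection attempt.
static void SlowConnect() {
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
}

// fHideFromShutdown == true mimics the original code, which decrements the
// counter before the blocking call and increments it again afterwards.
static void Worker(bool fHideFromShutdown) {
    if (fHideFromShutdown) nThreadsRunning--;
    SlowConnect();
    if (fHideFromShutdown) nThreadsRunning++;
    fWorkerDone = true;
    nThreadsRunning--;  // thread is really finished now
}

// Runs one worker and a counter-based shutdown wait. Returns true if the
// worker had finished by the time the shutdown wait ended.
static bool WorkerFinishedBeforeShutdown(bool fHideFromShutdown) {
    nThreadsRunning = 1;  // count the worker before starting it
    fWorkerDone = false;
    std::thread t(Worker, fHideFromShutdown);
    // Give the worker time to enter SlowConnect().
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    // Assumed shutdown logic: wait until no threads are counted as running.
    while (nThreadsRunning > 0)
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    bool fDone = fWorkerDone;
    t.join();  // the sketch joins so that it stays well-defined
    return fDone;
}
```

With fHideFromShutdown set to true the shutdown wait ends while the worker is still inside SlowConnect(), which is the situation the debug.log above shows. With it set to false the wait only ends after the worker is done, at the cost of waiting out the connection attempt.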

Note that this is also why changing to boost::asio fixed the problem for Bitcoin Core. The ASIO library - being asynchronous - does not block at any time, so the program can exit immediately no matter what network I/O is in progress. No trickery involved. It was a good decision. 😉
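
The idea of a wait that shutdown can cut short can be sketched with the standard library alone. This is not boost::asio and not what Bitcoin Core does; it swaps in a condition variable to show the same property on a small scale. CancellableWait and MeasureCancelledWait are hypothetical names made up for this example.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: a timeout that a shutdown request can cut short.
class CancellableWait {
public:
    // Returns true if the full timeout elapsed, false if Cancel() was called.
    bool WaitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        return !cv_.wait_for(lock, timeout, [this] { return cancelled_; });
    }

    // Wakes any waiter immediately; later waits return at once.
    void Cancel() {
        {
            std::lock_guard<std::mutex> lock(m_);
            cancelled_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool cancelled_ = false;
};

// Starts a 5 second wait on another thread, cancels it shortly afterwards,
// and reports how long the whole thing took.
std::chrono::milliseconds MeasureCancelledWait() {
    CancellableWait wait;
    auto start = std::chrono::steady_clock::now();
    std::thread t([&wait] { wait.WaitFor(std::chrono::seconds(5)); });
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    wait.Cancel();  // the "shutdown request"
    t.join();       // returns promptly instead of after 5 seconds
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
}
```

Because the waiting thread wakes as soon as Cancel() is called, the main thread can join it before exiting without paying for the full timeout, so there is no need to hide the thread from the shutdown logic.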

shane-kerr commented Feb 11, 2018

shane-kerr self-assigned this Feb 16, 2018

shane-kerr added the bug label Feb 16, 2018

shane-kerr closed this Mar 12, 2018
