Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LMDB transaction time tracker #1947

Merged
merged 13 commits into from May 9, 2019

Conversation

@wezrule
Copy link
Collaborator

commented May 1, 2019

LMDB uses a multi-version concurrency control (MVCC) system which allows multiple threads to coordinate simultaneous access to a database. Each read is not affected by any writes (due to keeping multiple versions)` but it does mean that the memory cannot be reused until the read transaction has finished. LMDB database never gets smaller (unless copied and compacted), it only allows previous space to be reused.

This is a debug tool to help investigate ledger bloat. There is a new RPC call database_lock_tracker which takes a min_time and displays any currently open mdb read/write transactions which have been open for or longer than min_time. Use 0 to show all currently open transactions. When a transaction is opened for longer than the minimum in the txn_tracking config section various information is written to the log file (false by default).

New RPC command (Debug/MSVC)

{
  "action": "database_txn_tracker",
  "min_read_time" : "1000"
  "min_write_time" : "0"
}
{
    "txn_tracking": [
        {
            "thread": "Blck processing",  // What thread held the transaction
            "time_held_open": "2",          // Seconds the transaction has currently been held open for
            "write": "true",                       // If true it is a write lock, otherwise false.
            "stacktrace": [
                ...
                {
                    "name": "nano::mdb_store::tx_begin_write",
                    "address": "00007FF7142C5F86",
                    "source_file": "c:\\users\\wesley\\documents\\raiblocks\\nano\\node\\lmdb.cpp",
                    "source_line": "825"
                },
                {
                    "name": "nano::block_processor::process_batch",
                    "address": "00007FF714121EEA",
                    "source_file": "c:\\users\\wesley\\documents\\raiblocks\\nano\\node\\blockprocessor.cpp",
                    "source_line": "243"
                },
                {
                    "name": "nano::block_processor::process_blocks",
                    "address": "00007FF71411F8A6",
                    "source_file": "c:\\users\\wesley\\documents\\raiblocks\\nano\\node\\blockprocessor.cpp",
                    "source_line": "103"
                },
                {
                    "name": "<lambda_60e93c1383202c7d5b79d5d064c37649>::operator()",
                    "address": "00007FF71403BA2E",
                    "source_file": "c:\\users\\wesley\\documents\\raiblocks\\nano\\node\\node.cpp",
                    "source_line": "1048"
                },
                {
                    "name": "boost::detail::thread_data<<lambda_60e93c1383202c7d5b79d5d064c37649> >::run",
                    "address": "00007FF7140D3781",
                    "source_file": "c:\\boost\\include\\boost-1_69\\boost\\thread\\detail\\thread.hpp",
                    "source_line": "118"
                },
                {
                    "name": "boost::`anonymous namespace'::thread_start_function",
                    "address": "00007FF7147EF567",
                    "source_file": "c:\\users\\wesley\\downloads\\boost_1_69_0\\boost_1_69_0\\libs\\thread\\src\\win32\\thread.cpp",
                    "source_line": "296"
                }
                ...
            ]
        }
    ]
}

Example log output (Debug/MSVC):

[2019-05-01 18:27:56.093544]: 7s read  lock held for on thread Blck processing
 0# boost::stacktrace::basic_stacktrace<std::allocator<boost::stacktrace::frame> >::init at c:\boost\include\boost-1_69\boost\stacktrace\stacktrace.hpp:75
 1# boost::stacktrace::basic_stacktrace<std::allocator<boost::stacktrace::frame> >::basic_stacktrace<std::allocator<boost::stacktrace::frame> > at c:\boost\include\boost-1_69\boost\stacktrace\stacktrace.hpp:126
 2# std::_Ref_count_obj<boost::stacktrace::basic_stacktrace<std::allocator<boost::stacktrace::frame> > >::_Ref_count_obj<boost::stacktrace::basic_stacktrace<std::allocator<boost::stacktrace::frame> > ><> at c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\memory:1802
 3# std::make_shared<boost::stacktrace::basic_stacktrace<std::allocator<boost::stacktrace::frame> > > at c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\memory:1866
 4# nano::mdb_txn_stats::mdb_txn_stats at c:\users\wesley\documents\raiblocks\nano\node\lmdb_txn_tracker.cpp:130
 5# std::_Default_allocator_traits<std::allocator<nano::mdb_txn_stats> >::construct<nano::mdb_txn_stats,nano::transaction_impl * &,bool &> at c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\xmemory0:882
 6# std::vector<nano::mdb_txn_stats,std::allocator<nano::mdb_txn_stats> >::_Emplace_back_with_unused_capacity<nano::transaction_impl * &,bool &> at c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\vector:903
 7# std::vector<nano::mdb_txn_stats,std::allocator<nano::mdb_txn_stats> >::emplace_back<nano::transaction_impl * &,bool &> at c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\vector:919
 8# nano::mdb_txn_tracker::add at c:\users\wesley\documents\raiblocks\nano\node\lmdb_txn_tracker.cpp:109
 9# nano::mdb_store::tx_begin at c:\users\wesley\documents\raiblocks\nano\node\lmdb.cpp:843
10# nano::mdb_store::tx_begin_read at c:\users\wesley\documents\raiblocks\nano\node\lmdb.cpp:830
11# nano::block_processor::process_batch at c:\users\wesley\documents\raiblocks\nano\node\blockprocessor.cpp:236
12# nano::block_processor::process_blocks at c:\users\wesley\documents\raiblocks\nano\node\blockprocessor.cpp:103
13# <lambda_60e93c1383202c7d5b79d5d064c37649>::operator() at c:\users\wesley\documents\raiblocks\nano\node\node.cpp:1048
14# boost::detail::thread_data<<lambda_60e93c1383202c7d5b79d5d064c37649> >::run at c:\boost\include\boost-1_69\boost\thread\detail\thread.hpp:118
15# boost::`anonymous namespace'::thread_start_function at c:\users\wesley\downloads\boost_1_69_0\boost_1_69_0\libs\thread\src\win32\thread.cpp:296
16# register_onexit_function in ucrtbased
17# BaseThreadInitThunk in KERNEL32
18# RtlUserThreadStart in ntdll

The output shown varies with debug options/compilers, as is shown below with a RelWithDebInfo build on Ubuntu:

{
  "action": "database_txn_tracker",
  "min_read_time" : "0"
}
{
    "json": [
        {
            "thread": "Blck processing",
            "time_held_open": "0",
            "write": "true",
            "stacktrace": [
                {
                    "name": "",
                    "address": "0x56224fc9b859",
                    "source_file": "",
                    "source_line": "0"
                },
                {
                    "name": "",
                    "address": "0x56224fc9b8f2",
                    "source_file": "",
                    "source_line": "0"
                },
                {
                    "name": "clone",
                    "address": "0x7f74ea1a4b1f",
                    "source_file": "",
                    "source_line": "0"
                }
            ]
        }
    ]
}

Log file:

[2019-05-01 18:15:17.891103]: 6s read  lock held for on thread Vote processing
 0# 0x000055F27A2DA859 in ./nano_node
 1# 0x000055F27A2DA8F2 in ./nano_node
 6# 0x00007F672593B182 in /lib/x86_64-linux-gnu/libpthread.so.0
 7# clone in /lib/x86_64-linux-gnu/libc.so.6

Which seems pretty useless, but if the process is still running you can compare against the load address in /proc/<pid of nano_node>/maps to find the offsets (or if #1921 is merged that can be used as well) and use add2line/gdb to find something a bit more useful . The thread name is shown which should help regardless.

The new test takes 30 seconds on Debug/MSVC to generate the stacktraces, but only 2 seconds on travis so I haven't moved it to slow_tests.

When greping the log files output is like so:

[2019-05-01 13:51:23.262439]: 5s write lock held for on thread: Blck processing
[2019-05-01 13:51:34.987488]: 5s write lock held for on thread: Blck processing
[2019-05-01 13:53:06.155944]: 5s read  lock held for on thread: Vote processing
[2019-05-01 13:53:13.258375]: 6s read  lock held for on thread: Vote processing
[2019-05-01 13:54:22.728551]: 5s write lock held for on thread: Blck processing
[2019-05-01 13:54:22.729473]: 5s read  lock held for on thread: Conf height
[2019-05-01 13:54:29.013000]: 5s write lock held for on thread: Blck processing

@wezrule wezrule added this to the V19.0 milestone May 1, 2019

@wezrule wezrule requested review from SergiySW and cryptocode May 1, 2019

@wezrule wezrule self-assigned this May 1, 2019

@zhyatt zhyatt added this to During RC in V19 May 1, 2019

void nano::mdb_txn_tracker::add (const nano::transaction_impl * transaction_impl)
{
std::lock_guard<std::mutex> guard (mutex);
stats.emplace_back (transaction_impl);

This comment has been minimized.

Copy link
@cryptocode

cryptocode May 4, 2019

Collaborator

This shows up in the profiler, taking ~50% of the time spent in starting a read tx. I suspect it's the instantiation of the stack trace object induced by the emplace. Maybe a flag to enable/disable the tracking feature is needed.

This comment has been minimized.

Copy link
@wezrule

wezrule May 5, 2019

Author Collaborator

Made a new config node "diagnostics" which enables turning this on/off. Have made read/write minimum time open for tracking options configurable using this as well (as opposed to be just a hardcoded single value for both). While this may appear more suited to a CLI debug command as it is most useful to developers, I think a config node option makes more sense as we are likely going to want this enabled for all of our runs (rather than having to remember to set it every single time) as we can't retroactively find out what went wrong afterwards when there is ledger bloat. I imagine we will find more uses for this config in the future as well.

@cryptocode cryptocode self-requested a review May 4, 2019

Pressed the wrong button, was a request for comment :)

@wezrule wezrule merged commit 68aacfc into nanocurrency:master May 9, 2019

0 of 2 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details

@wezrule wezrule deleted the wezrule:mdb_lock_time_tracker branch May 9, 2019

@zhyatt zhyatt moved this from During RC to RC 3 (TBD) in V19 May 12, 2019

argakiig added a commit to argakiig/raiblocks that referenced this pull request May 22, 2019

LMDB transaction time tracker (nanocurrency#1947)
* Add MDB lock time tracker

* Update test and add conditional logging

* Remove unnecessary changes

* Modify the minimum time for a lock held before outputting in the log file to 5 seconds (from 10)

* Deference stacktrace not pass pointer...

* Change lock to be txn

* Formatting

* Increase granularity (use milliseconds instead of seconds)

* Update test to use milliseconds in comparison

* Allow erase to be called without error

* Add diagnostics config option & allow separate read/write minimum transaction open time tracking
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.