
Concurrent RPC requests against the ClaimTrie blocked critical section lock #126

Closed
tiger5226 opened this issue Apr 24, 2018 · 24 comments
Labels
level: 4 (Intimate knowledge of the existing code is recommended), type: improvement (Existing (or partially existing) functionality needs to be changed)

Comments

@tiger5226 commented Apr 24, 2018

Summary

While syncing the ClaimTrie in lbrycrd with Chainquery, a bottleneck was identified: a mutex lock on the main critical section (cs_main) blocks concurrent RPC connections from accessing the claimtrie through the same RPC call. getclaimsforname is one example. The lock is taken in several places:

Affected Code

getclaimsintrie
https://github.com/lbryio/lbrycrd/blob/master/src/rpc/claimtrie.cpp#L36

getclaimtrie
https://github.com/lbryio/lbrycrd/blob/master/src/rpc/claimtrie.cpp#L108

getvalueforname
https://github.com/lbryio/lbrycrd/blob/master/src/rpc/claimtrie.cpp#L182

getclaimsforname
https://github.com/lbryio/lbrycrd/blob/master/src/rpc/claimtrie.cpp#L308

getclaimbyid
https://github.com/lbryio/lbrycrd/blob/master/src/rpc/claimtrie.cpp#L373

gettotalclaimednames
https://github.com/lbryio/lbrycrd/blob/master/src/rpc/claimtrie.cpp#L414

gettotalclaims
https://github.com/lbryio/lbrycrd/blob/master/src/rpc/claimtrie.cpp#L434

gettotalvalueofclaims
https://github.com/lbryio/lbrycrd/blob/master/src/rpc/claimtrie.cpp#L456

getclaimsfortx
https://github.com/lbryio/lbrycrd/blob/master/src/rpc/claimtrie.cpp#L495

getnameproof
https://github.com/lbryio/lbrycrd/blob/master/src/rpc/claimtrie.cpp#L720

Proposal

These locks take a critical section (cs_main) that is used in many areas of Bitcoin. However, since we are only dealing with the claimtrie here, we should not need an exclusive critical section lock. We should implement a shared mutex that allows exclusive access for writers and shared access for readers.

Processing a block should require exclusive access to the claimtrie because it can add, delete, or update claimtrie nodes. Since the RPC calls only read the trie, they should take shared access, allowing concurrent reads of the claimtrie.

It appears this lock was simply copied from other bitcoin rpc calls.
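A minimal sketch of what this could look like, assuming a dedicated reader/writer mutex guarding the claimtrie (cs_claimtrie and both functions below are illustrative names, not existing lbrycrd code):

#include <boost/thread/locks.hpp>
#include <boost/thread/shared_mutex.hpp>

static boost::shared_mutex cs_claimtrie; // hypothetical claimtrie-specific lock

// RPC handlers such as getclaimsforname only read the trie,
// so any number of them can hold the lock at the same time.
void readOnlyClaimQuery()
{
    boost::shared_lock<boost::shared_mutex> lock(cs_claimtrie);
    // ... read pclaimTrie and build the JSON result ...
}

// Block processing adds, deletes, or updates claimtrie nodes
// and therefore still needs exclusive access.
void applyBlockToClaimTrie()
{
    boost::unique_lock<boost::shared_mutex> lock(cs_claimtrie);
    // ... mutate pclaimTrie ...
}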

Use Case

The use case is processing claim-specific information: you can get the claims in the trie with getclaimsintrie, but subsequent calls to getclaimsforname are required to get certain fields such as effective_amount or valid_at_height.

Reproduction 1 - Chainquery

chainquery.zip (linux)
run: ./chainquery serve
branch: https://github.com/lbryio/chainquery/tree/lbrycrdissue
code: https://github.com/lbryio/chainquery/blob/lbrycrdissue/daemon/jobs/claimtriesync.go#L21

It outputs the time per call while iterating through the claim names and getting the claims for each name from lbrycrd.

Reproduction 2 - Apache Benchmark

@kauffj (Member) commented Apr 24, 2018

This is a great ticket. Thank you @tiger5226 !

@lyoshenka added the type: improvement (Existing (or partially existing) functionality needs to be changed), help wanted, good first issue, and level: 2 (Some knowledge of the existing code is recommended) labels and removed the needs: triage label on Apr 24, 2018
@BrannonKing (Member)

I'm planning to work on this in the upcoming week.

@tiger5226 (Author) commented May 24, 2018

Excellent! Let me know if you need any assistance with the reproduction! Happy to help!

@shyba also had a reproduction that did not make it into the issue; I wanted to get it from him for you, in case you would like to use an Apache Benchmark for the reproduction:

[screenshot: Apache Benchmark reproduction output, 2018-05-24]

@BrannonKing (Member) commented May 26, 2018

In looking at this, I see three approaches (in order of difficulty):

  1. Modify the cs_main lock to be shareable (reader/writer).
  2. Separate out cs_main into multiple locks, with a shareable one for pclaimTrie.
  3. Change the data structures to be immutable.

I'm planning to go with option 1. This will add overhead to each of those lock calls; it assumes that the majority of our calls are reader operations and that each lock protects hundreds to thousands of CPU instructions. Option 2 would require some restructuring, and I'm not sure it would perform any better than option 1. Option 3 would require more effort than I can commit to at this time.

As sub-options of option 1:

  1. Use the boost shareable/upgradable mutex.
  2. Use the C++14 shareable mutex.
  3. Pull the fast shareable mutex code from Bloomberg BSL.

Again, I'm planning sub-option 1, given the code precedent and my time constraints. Let me know if you have thoughts to the contrary or other ideas. Thanks.
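For concreteness, a rough sketch of what sub-option 1 could look like with Boost's shared/upgradable mutex (the lock name and functions below are illustrative, not code from any branch): readers take shared ownership, and a writer takes an upgrade lock and promotes it to exclusive only when it actually needs to modify state.

#include <boost/thread/locks.hpp>
#include <boost/thread/shared_mutex.hpp>

boost::shared_mutex cs_shared; // stand-in for a shareable cs_main

void readerPath()
{
    boost::shared_lock<boost::shared_mutex> lock(cs_shared); // many readers at once
    // ... read-only work ...
}

void writerPath()
{
    boost::upgrade_lock<boost::shared_mutex> read(cs_shared);        // coexists with shared readers
    // ... decide whether anything needs to change ...
    boost::upgrade_to_unique_lock<boost::shared_mutex> write(read);  // promote to exclusive access
    // ... modify state ...
}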

@tiger5226 (Author)

We probably want @lyoshenka and @kaykurokawa to chime in here, but I would think we want a separate lock for the claimtrie (option 2). I am not sure of the implications of option 1, since cs_main is used for the blockchain (blocks, transactions, etc.) RPC calls.

@BrannonKing (Member)

Other reasons I was leaning towards modifying cs_main: first, there are many places in main.cpp that could benefit from the reader/writer approach; it appears that most operations there merely read the present state, and very few places actually modify it. Hence, we would benefit at that level in addition to the RPC level. When the miner is running it does request the exclusive lock much more frequently, but that situation would be no worse than it is now. Second, many of the RPC methods use both pcoinsTip and pclaimTrie, while main.cpp uses a combination of pcoinsTip and chainActive. I'm obviously new to this codebase and don't have a firm grip on those relationships yet, but I was hesitant to add an additional lock to main.cpp. Nested locks always increase the deadlock risk.

@tiger5226 (Author)

Excellent points. It sounds riskier to add a lock than to extend the current lock's features. I am not familiar enough with main.cpp to make a call. @lyoshenka and @kaykurokawa will respond soon; I'll assign it to them so it shows up in their queues.

@lyoshenka (Member)

@kaykurokawa @lbrynaut @shyba can you guys please take a look at this and give Brannon some guidance? thanks :-)

@kaykurokawa

I'm going to rule out 1) Modify the cs_main lock to be shareable (reader/writer).

We don't want to make big changes to any components that we inherited from Bitcoin Core unless it's absolutely necessary. This is because (a) Bitcoin Core has far more eyes on it than we have, and (b) it will make it easier for us to pull upstream changes from Bitcoin Core.

I think 2) is a much better option. I'll have to think a bit about how best to approach it, but I just wanted to chime in with this thought for now.

@BrannonKing (Member)

I discovered that approach 1) is more difficult than I first thought: the recursive lock mechanism doesn't work out of the box for shared locks. I've come to despise recursive locks (even though I've used them for years in C#). I've seen some recursive shared locks on the web; I think I'd have to massage one of those before I trusted it. The other angle would be to map out all the LOCK(cs_main) calls and eliminate the recursive usage; that's fairly invasive and should have been done upstream years ago.

As for approach 2), I started on that here: https://github.com/BrannonKing/lbrycrd/tree/shared_locks_claimtrie . It still has several deadlocks in it. It struggles with the lack of a recursive shared lock as well; I think I can map out the call tree for that, though, as there are far fewer calls. I've been pondering using a different type of critical-section protection for that as well.
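To illustrate why recursion is the sticking point, here is a rough sketch (not the code in either branch) of a shared mutex made recursive for readers by counting a per-thread nesting depth. It assumes a single global instance (as cs_main is), and it still does nothing for a thread that already holds a shared lock and then requests the exclusive lock, which is exactly the kind of case that needs massaging:

#include <boost/thread/shared_mutex.hpp>

class CRecursiveSharedMutex
{
    boost::shared_mutex inner;
    static thread_local int sharedDepth; // this thread's shared-lock nesting level

public:
    void lock_shared()
    {
        if (sharedDepth++ == 0)     // only the outermost call touches the real mutex
            inner.lock_shared();
    }
    void unlock_shared()
    {
        if (--sharedDepth == 0)
            inner.unlock_shared();
    }
    // Exclusive ownership is not made recursive here, and calling lock()
    // while this thread already holds a shared lock will deadlock.
    void lock()   { inner.lock(); }
    void unlock() { inner.unlock(); }
};

thread_local int CRecursiveSharedMutex::sharedDepth = 0;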

@lbrynaut (Contributor)

@BrannonKing I didn't review in detail, but this approach looks like a dead-end. Boost upgrade/shared locks are the way to go, but I strongly agree with @kaykurokawa that there are many subtleties to playing with the cs_main lock, and it's not a quick fix to replace/modify the usage of it. Adding more locks in the meantime is not beneficial.

@tiger5226 (Author)

@kaykurokawa ruled out 1 (upgrade/shared locks), and it appears you are suggesting that while agreeing it is a big lift. So is the suggestion that this is not beneficial to solve? Just want to make sure @BrannonKing has the guidance he needs.

@lbrynaut (Contributor)

@tiger5226 Compared to the implementation attempt of 2), I was commenting that upgrade locks are preferable to additional locks/critical sections.

I'm new to this specific issue, so I'll throw out some thoughts that may have been discussed before. I believe the existing subtleties of the locking are so nuanced that there's a very good reason they haven't been changed upstream -- it's not laziness. Personally, I don't have my head wrapped around this anymore, though I tried in the past and realized it was really nasty (admittedly years ago, so it may have improved ... but probably not). When one has a full understanding of the implications of the locking, it's nearly trivial to decide on what solution makes the most sense. Perhaps it's even what's there. But I would estimate that's a difficult understanding to achieve in the first place (not impossible, just a trade-off of resources).

Since most claimtrie accesses are made under existing locks and avoid races, it seems to me that adding additional locks won't serve much purpose. If the goal instead is to remove/replace existing locks, see the above thoughts. So maybe I just don't understand what the issue is, or what is being attempted?

But if I do, my vote is to wait until upstream resolves this a bit more and instead explore alternative approaches in the meantime (like load-balancing across multiple daemons, caching, or whatever makes sense for what's really needed).

@tiger5226 (Author) commented May 31, 2018

All clear! Thanks a ton for the added thoughts, very helpful. I was able to work around this bottleneck rather easily by making a fraction of the calls. I am not sure anyone else is suffering from this issue, hence I don't believe there is any urgency.

@lyoshenka This makes sense and we probably don't want to take on the risk. Thoughts?

@BrannonKing (Member)

I like the suggestion for load balancing daemons. I think a Docker Swarm or Kubernetes approach would work very well. Also, for the situation where you need to make multiple RPC calls for a single piece of data -- it makes a lot of sense to make a new RPC call that returns all the data that you need in a single pull. I think those are both great workarounds.

I did want to understand how recursive the cs_main mutex really is. I modified it to not be recursive and added recursion checks at the individual locations. You can see my work here: https://github.com/BrannonKing/lbrycrd/tree/testNonRecursiveCsMain . I didn't finish the exercise; the daemon does not run, but it's close. Most of the tests run. The conclusion: almost all lock calls are recursive. As an aside, there is a unit test failure that seems legitimate; I left a comment in checkqueue.h about that. Also, see my fix in claimtrie_tests.cpp.

With that exercise I discovered that most of the recursive lock calls come from the unit test scenarios rather than from the actual code. At the same time, this library has no definitive public API. If any method is an entry point, who is responsible for the necessary preconditioning of each method?
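For what it's worth, the usual convention in Bitcoin-derived code for that preconditioning question is that entry points acquire cs_main themselves while internal helpers only assert that it is already held (via the existing AssertLockHeld macro from sync.h). A rough sketch with illustrative function names:

#include "main.h" // cs_main, pclaimTrie
#include "sync.h" // LOCK, AssertLockHeld
#include <string>

// Internal helper: documents and checks its precondition instead of locking.
static bool ReadClaimInternal(const std::string& name)
{
    AssertLockHeld(cs_main); // caller must already own cs_main
    // ... read from pclaimTrie ...
    return true;
}

// Entry point: acquires the lock exactly once, so no recursion is needed.
bool GetClaimRecord(const std::string& name)
{
    LOCK(cs_main);
    return ReadClaimInternal(name);
}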

I still intend to run an exercise using a shared recursive lock.

@BrannonKing (Member) commented Jun 5, 2018

I did make a branch with a recursive shared mutex for cs_main: https://github.com/BrannonKing/lbrycrd/tree/use_recursive_shared_lock. It compiles and runs successfully using some code I acquired and extended from CodeReview. I've been trying to benchmark it using the Apache Benchmark (ab) call shown above. I'm not sure how much data I would have to put in to make the trie lookup take any significant time. Can I just run the CLI generate command for that? I've added 10k, but the lookup still takes hardly any time. Or does somebody with a lot of data want to run my branch?

@kaykurokawa commented Jun 6, 2018

I'm not sure we should merge this, but I believe @tiger5226 should at least try this branch (https://github.com/BrannonKing/lbrycrd/tree/use_recursive_shared_lock) out and let us know if he sees any improvements and whether it works properly for his chainquery application.

@tiger5226 (Author)

@BrannonKing I would not mind testing it out with the chainquery branch. However, there is another issue blocking me from building lbrycrd on my Mac Pro. If you can get me the binary, I can test it out.

@kauffj (Member) commented Jun 7, 2018

@kaykurokawa maybe it's time to sort out that mac build issue?

@bvbfan (Collaborator) commented Jun 17, 2018

From my side, the code written by @BrannonKing looks good; however, a recursive shared lock is a mess as a design. I would protect only the shared resources, e.g. like this:

template <typename T, typename M = CCriticalSection>
class CCProtector {
    T *data;
    M *mutex;
    struct CCProxy {
        T *data;
        M *mutex;
        CCProxy(T *d, M *m) : data(d), mutex(m) 
        {
            mutex->lock();
        }
        ~CCProxy()
        {
            mutex->unlock();
        }
        T* operator->()
        {
            return data;
        }
    };
public:
    CCProtector(T *d, M *m) : data(d), mutex(m) {}
    CCProxy operator->()
    {
        return CCProxy(data, mutex);
    }
};
// usage: each -> access locks the mutex for the duration of the call
CCProtector<CClaimTrie> claim_guard(pclaimTrie, &cs_main);
claim_guard->flattenTrie(); // mutex->lock(); call flattenTrie(); mutex->unlock();

@BrannonKing (Member)

@bvbfan, if you look through the code, particularly in the main header, you'll see that many of the mutexes protect multiple data structures. That's quite unfortunate -- it makes it impossible to do as you suggest and make the protection explicit in code. I would prefer to take things a step further: use concurrent or immutable data structures. In other words, fully encapsulate the mutexes into the structures that rely on them. However, it seems that the current goal is to keep the codebase close to the original bitcoin fork so that newer bitcoin codebase fixes can be merged in. (Hence, we're free to use better structures in the LBRY portion of the code, and we should work to keep that as untangled from the bitcoin codebase as possible.)
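A rough sketch of that kind of encapsulation, assuming a LBRY-specific structure that owns its own shared mutex (ClaimIndex and its members are illustrative, not existing lbrycrd types):

#include <boost/thread/locks.hpp>
#include <boost/thread/shared_mutex.hpp>
#include <cstdint>
#include <map>
#include <string>

class ClaimIndex
{
    mutable boost::shared_mutex mtx;         // protects only this object's data
    std::map<std::string, int64_t> amounts;  // name -> effective amount

public:
    void set(const std::string& name, int64_t amount)
    {
        boost::unique_lock<boost::shared_mutex> lock(mtx); // writer: exclusive
        amounts[name] = amount;
    }
    bool get(const std::string& name, int64_t& out) const
    {
        boost::shared_lock<boost::shared_mutex> lock(mtx); // readers: shared
        std::map<std::string, int64_t>::const_iterator it = amounts.find(name);
        if (it == amounts.end()) return false;
        out = it->second;
        return true;
    }
};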

@bvbfan (Collaborator) commented Jun 19, 2018

The problem with readers/writers is that you can't be sure whether a given thread is a reader or a writer. For example, CCoinsViewCache::AccessCoins, through CCoinsViewCache::FetchCoins, can potentially write to cacheCoins; if that happens while another thread reads it, that's a data race, right?
The right approach should be: access the structure through a mutex, and access elements by copying.
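A minimal sketch of that "lock the structure, copy the element out" pattern, using the existing LOCK macro and the claimtrie name lookup (getClaimCopy is an illustrative name, and the exact getInfoForName signature should be treated as an assumption):

#include "claimtrie.h" // CClaimTrie, CClaimValue
#include "main.h"      // cs_main, pclaimTrie

bool getClaimCopy(const std::string& name, CClaimValue& out)
{
    LOCK(cs_main);                // or a claimtrie-specific mutex
    CClaimValue claim;
    if (!pclaimTrie->getInfoForName(name, claim))
        return false;
    out = claim;                  // the caller works on its own private copy
    return true;
}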

@lyoshenka removed their assignment Jun 21, 2018
@BrannonKing (Member)

Waiting on #174

@BrannonKing self-assigned this Aug 1, 2018
@BrannonKing added the level: 4 (Intimate knowledge of the existing code is recommended) label and removed the level: 2 (Some knowledge of the existing code is recommended) label on Sep 26, 2019
@BrannonKing (Member)

While it's true that using a custom read/write recursive mutex was workable, it was a fairly large enhancement on top of the existing code, and it wasn't clear that it was a maintainable path going forward. With much of the data now easily accessible via SQLite, this seems less valuable. It could be revisited if and when upstream no longer uses a recursive mutex for cs_main.
