
History store row allocator requires concurrency guard. #150

Closed
sv-91 opened this issue Oct 13, 2017 · 14 comments
sv-91 commented Oct 13, 2017

Look at the address 1BrT827NCgxjctnEBdLiuDzukupwWHP1i2.
It appears in transaction e521a32cc35519164f6fd74e3b682071e3880c1b5099e5ae863ef340eb916097 as both an input and an output (https://blockexplorer.com/tx/e521a32cc35519164f6fd74e3b682071e3880c1b5099e5ae863ef340eb916097).
But the history database has no record of the output.
This can be seen, for example, with transaction f26104fed45e7aaedfddcdeb713d12c1d2369330387b316f1797562c58f0be78:
Bitcoin Explorer (bx) cannot match the input of that transaction to its output.

./bx fetch-history -c ./bx.cfg 1BrT827NCgxjctnEBdLiuDzukupwWHP1i2

transfer
{
    spent
    {
        hash b0abd44258e15418f4807021f53e462ace308fac23ce1b493f7dcbed79e4b75c
        height 476410
        index 5
    }
    value 18446744073709551615
}
transfer
{
    spent
    {
        hash f26104fed45e7aaedfddcdeb713d12c1d2369330387b316f1797562c58f0be78
        height 473014
        index 7
    }
    value 18446744073709551615
}

Is it possible to solve this problem?


sv-91 commented Oct 13, 2017

And against the local database, the losses are even greater:
./bx fetch-history -c ./bx.cfg 1BrT827NCgxjctnEBdLiuDzukupwWHP1i2 | grep -c 18446744073709551615
43

@evoskuil

What is meant by "local database" above?


sv-91 commented Oct 13, 2017

This is when I run a libbitcoin server and point the explorer config at the local URL.
But, as I said above, the problem also appears with the default config.


evoskuil commented Oct 13, 2017

This is what I get when running against the default community server:

history.txt

Note that the last two spends have no correlated outputs (receipts). This implies that the outputs are not indexed to the address 1BrT827NCgxjctnEBdLiuDzukupwWHP1i2, which can be caused by nonstandard scripts (including segwit for more recent txs).

It is also possible that spend correlation is being impacted by correlation ID hash collision. If the server is fully-indexed and the outputs are standard (allowing for address parsing) then this is the only possible cause short of a code error.

The use of a 64 bit hash to store input-output correlations in the spend table is a longstanding weakness of libbitcoin-server. It does not impact validation as the spend table only supports server queries. In v4 the spend table will be eliminated altogether and these queries will rely on a relational surrogate key index, resolving this issue and significantly reducing server storage.

History correlation occurs in the client library. It should be straightforward to modify the client to report the collision, and rebuild bx to verify that in this case collision is the problem.

@evoskuil evoskuil added the bug label Oct 13, 2017
@evoskuil evoskuil added this to the 4.0 milestone Oct 13, 2017

sv-91 commented Oct 14, 2017

As far as I remember, the transactions there are standard, not multisig, etc. You can check to make sure.
I did not quite understand the point about collisions.
This query goes to the history database, and the key is the address (or a derivative of it). Are you saying that at one point in time the key is indexed into one cell, and at another time into a different cell?
The correlation does not occur in the client library. The client library reproduces exactly what the history database returns. I checked.


sv-91 commented Oct 15, 2017

It really is in the history database. I wrote this instrumentation code:

history_database::list history_database::get(const short_hash& key,
    size_t limit, size_t from_height) const
{
    list result;
    payment_record payment;
    const auto start = rows_multimap_.lookup(key);
    const auto records = record_multimap_iterable(rows_list_, start);

    int ii = 0;
    for (const auto index: records)
    {
        if (limit > 0 && result.size() >= limit)
            break;

        const auto record = rows_list_.get(index);
        auto deserial = make_unsafe_deserializer(REMAP_ADDRESS(record));

        // Failed reads are conflated with skipped returns.
        if (payment.from_data(deserial, from_height)) {
            result.push_back(payment);
        } else {
            std::cout << "Incorrect load data " << encode_base16(key) << std::endl;
        }
        ii++;
    }

    std::cout << "count " << ii  << std::endl;
    
    return result;
}

void history_database::store(const short_hash& key,
    const payment_record& payment)
{
    const auto write = [&](byte_serializer& serial)
    {
        payment.to_data(serial, false);
    };

    rows_multimap_.add_row(key, write);
    if (encode_base16(key) == "770b71d77f7c90522be3a050a90545d6167d7629") {
        get(key, 0, 0);
        std::cout << std::endl;
    }
}

I loaded the database and got this output:

count 36
count 37
count 38
count 38
count 39

That is, one count is repeated twice: the history db loses the previous row between two inserts.

Could you look at the nearby code — is there a race condition or some other error?

@evoskuil

Thanks for following up. Your read from within the write may make assumptions that don't hold. add_row is atomic and should not have any concurrency problem. I'm on my phone for another week, so it will be hard for me to review.


sv-91 commented Oct 17, 2017

I put a mutex around all methods of the history db and got rid of these errors!

In general, I looked at a few places and did not understand how the locks there work. For example:

void record_multimap<KeyType>::add_to_list(memory_ptr start_info,
    write_function write)
{
    const auto address = REMAP_ADDRESS(start_info);

    // Critical Section
    ///////////////////////////////////////////////////////////////////////////
    mutex_.lock_shared();
    const auto old_begin = from_little_endian_unsafe<array_index>(address);
    mutex_.unlock_shared();
    ///////////////////////////////////////////////////////////////////////////

    const auto new_begin = records_.insert(old_begin);
    const auto memory = records_.get(new_begin);
    const auto data = REMAP_ADDRESS(memory);
    auto serial_record = make_unsafe_serializer(data);
    serial_record.write_delegated(write);

    // The records_ and start_info remap safe pointers are in distinct files.
    auto serial_link = make_unsafe_serializer(address);

    // Critical Section
    ///////////////////////////////////////////////////////////////////////////
    unique_lock lock(mutex_);
    serial_link.template write_little_endian<array_index>(new_begin);
    ///////////////////////////////////////////////////////////////////////////
}

What happens if two threads read the same value of old_begin and both try to write it?

Other places in the code also do not inspire confidence.

@evoskuil

The above critical sections exist only to guarantee atomicity of the link value's read and write.

I believe you are correct that the list insertion has a flaw: concurrent inserts can result in the orphaning of an inserted element. This would affect the address history index only in the case of concurrent writes of the same row (address hash). [There would be no node/consensus impact.]

Expanding the critical sections beyond value atomicity may introduce deadlock risk, so I would need to look at it more closely to ensure a fix is both safe and optimal. Otherwise, expanding the critical section above to cover the full method with an upgraded lock should resolve the orphaning.

I don't understand your last comment — could you clarify?


sv-91 commented Oct 18, 2017

For example, this code also causes a race condition:

void record_multimap<KeyType>::add_row(const KeyType& key,
    write_function write)
{
    const auto start_info = map_.find(key);

    if (!start_info)
    {
        create_new(key, write);
        return;
    }

If two threads take start_info simultaneously and both see that it does not exist, both will call create_new.


evoskuil commented Oct 18, 2017

Yes, the row allocator needs to be protected in the same manner as the memory allocator. Thanks for your work on this. I can patch within a week.

@evoskuil evoskuil self-assigned this Oct 22, 2017
@evoskuil evoskuil changed the title loss output in history database History store row allocator requires concurrency guard. Oct 22, 2017
@evoskuil evoskuil modified the milestones: 4.0, 3.4 Oct 22, 2017
@evoskuil

It's taken a little longer than expected, but I hope to finish this up in the next couple of days.

@evoskuil

History records are safely written, read, and popped, with one exception: concurrently linking a new element is a race won by only one of the competing writes. The behavior is well-defined but loses a record whenever two records are linked concurrently against the same payment address. This becomes more likely for heavily-used addresses with multiple updates from transactions in the same block (the only case of concurrent writes to the same hash table entry).


evoskuil commented Oct 31, 2017

I've resynced my local store and verified the balance as well as correct history pairings for 1BrT827NCgxjctnEBdLiuDzukupwWHP1i2 and 1966U1pjj15tLxPXZ19U48c99EJDkdXeqb.
