Adopt low level byte serialization #88

Closed
dexX7 opened this issue Jun 23, 2015 · 4 comments

dexX7 commented Jun 23, 2015

As described in #85 (comment), one of the most time-consuming parts of the processing flow is the serialization and deserialization of smart properties.

This is caused by the conversions string -> JSON -> Entry and Entry -> JSON -> string, rather than by fetching data from or storing data in the database.
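
To illustrate what "byte serialization" means in contrast to the JSON round trip, here is a minimal, self-contained sketch. It is not the code used in Omni Core: `SimpleEntry`, `WriteRaw`, `ReadRaw`, `ToBytes` and `FromBytes` are hypothetical names, the struct is a heavily simplified stand-in for a smart property entry, and values are written in host byte order for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a smart property entry.
struct SimpleEntry {
    uint32_t propertyId;
    int64_t numTokens;
    std::string name;
};

// Appends the raw bytes of a fixed-width value (host byte order) to the buffer.
template <typename T>
void WriteRaw(std::vector<unsigned char>& out, const T& value)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    out.insert(out.end(), p, p + sizeof(T));
}

// Reads a fixed-width value at pos and advances pos past it.
template <typename T>
T ReadRaw(const std::vector<unsigned char>& in, std::size_t& pos)
{
    if (in.size() - pos < sizeof(T)) throw std::runtime_error("unexpected end of data");
    T value;
    std::memcpy(&value, &in[pos], sizeof(T));
    pos += sizeof(T);
    return value;
}

// Entry -> bytes: fixed-width fields, then a length-prefixed string.
std::vector<unsigned char> ToBytes(const SimpleEntry& e)
{
    std::vector<unsigned char> out;
    WriteRaw(out, e.propertyId);
    WriteRaw(out, e.numTokens);
    WriteRaw(out, static_cast<uint32_t>(e.name.size()));
    out.insert(out.end(), e.name.begin(), e.name.end());
    return out;
}

// Bytes -> entry: reads the fields back in the same order.
SimpleEntry FromBytes(const std::vector<unsigned char>& in)
{
    std::size_t pos = 0;
    SimpleEntry e;
    e.propertyId = ReadRaw<uint32_t>(in, pos);
    e.numTokens = ReadRaw<int64_t>(in, pos);
    const uint32_t len = ReadRaw<uint32_t>(in, pos);
    if (in.size() - pos < len) throw std::runtime_error("unexpected end of data");
    e.name.assign(in.begin() + pos, in.begin() + pos + len);
    return e;
}
```

No text is generated or parsed in either direction, which is the main reason this kind of encoding tends to be much cheaper than going through JSON.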

I was curious whether we can improve it by adopting low-level byte serialization, and the results are very impressive. The following data was captured on mainnet, averaged over 500 runs, and is based on this commit:

For SP 4 on mainnet:

entry -> json -> str:   0.040382 ms
str -> json -> entry:   0.027424 ms
entry -> bytes:         0.002574 ms (16x faster)
bytes -> entry:         0.002864 ms (10x faster)

serialized size:      247 byte (= 0.247 kB)

For SP 3 on mainnet:

entry -> json -> str:  16.309372 ms
str -> json -> entry:  71.348212 ms
entry -> bytes:         0.056452 ms (289x faster)
bytes -> entry:         0.094424 ms (756x faster)

serialized size:    19014 byte (= 19.014 kB)
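
Averages like the ones above can be collected with a small timing helper. The following is only a sketch under assumptions, not the harness used for these measurements: `AverageMillis` is a hypothetical name, and the function being timed is passed in as a callable.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs fn() the given number of times and returns
// the average wall-clock time per run in milliseconds.
template <typename Fn>
double AverageMillis(Fn fn, std::size_t runs)
{
    assert(runs > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < runs; ++i) {
        fn();
    }
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = end - start;
    return elapsed.count() / static_cast<double>(runs);
}
```

Calling it with a lambda that performs one conversion and `runs = 500` would yield a per-call average comparable in form to the figures listed here.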

As a proof of concept, I removed the toJSON() and fromJSON() methods of CMPSPInfo::Entry and replaced the related parts where entries are stored or retrieved.

I'm currently parsing mainnet transactions to get a feel for the speed improvements in practice. Given that CMPSPInfo::getSP() is used almost everywhere, the difference should be noticeable, especially for listtransactions_MP, which is slowed down primarily by the parsing of SP 3.

I really, really don't want to start yet another feature before the release, but we should seriously consider replacing the critical parts.

If it turns out, as I hope, that listtransactions_MP gets a speed boost of 50x or more, then I'm going to prepare a clean replacement for getSP(), putSP() and friends.

Otherwise, and for the other parts where serialization may play a role, I suggest adopting byte serialization after the release.

What do you think?

dexX7 commented Jun 23, 2015

D'oh, I should have collected more data before parsing from zero.. :)

Anyway, here is some data with byte serialization via 4ef1b9eb0e:

./src/bitcoin-cli listtransactions_MP "*" 1 0 0 999999
listtransactions_MP():     135.888 ms,  found:    1 txs,     1 txs seen,     127.934 ms populated,  135.888 ms/tx
listtransactions_MP():      12.887 ms,  found:    1 txs,     1 txs seen,       5.104 ms populated,   12.887 ms/tx
listtransactions_MP():      11.616 ms,  found:    1 txs,     1 txs seen,       5.039 ms populated,   11.616 ms/tx
listtransactions_MP():      13.200 ms,  found:    1 txs,     1 txs seen,       4.957 ms populated,   13.200 ms/tx

./src/bitcoin-cli listtransactions_MP "*" 10 0 0 999999
listtransactions_MP():     365.838 ms,  found:   10 txs,    10 txs seen,     359.305 ms populated,   36.584 ms/tx
listtransactions_MP():      60.197 ms,  found:   10 txs,    10 txs seen,      52.126 ms populated,    6.020 ms/tx
listtransactions_MP():      55.568 ms,  found:   10 txs,    10 txs seen,      47.824 ms populated,    5.557 ms/tx
listtransactions_MP():      59.676 ms,  found:   10 txs,    10 txs seen,      51.577 ms populated,    5.968 ms/tx

./src/bitcoin-cli listtransactions_MP "*" 100 0 0 999999
listtransactions_MP():    2013.492 ms,  found:  100 txs,   120 txs seen,    2004.196 ms populated,  20.13492 ms/tx
listtransactions_MP():     517.316 ms,  found:  100 txs,   120 txs seen,     508.285 ms populated,   5.17316 ms/tx
listtransactions_MP():     507.737 ms,  found:  100 txs,   120 txs seen,     498.137 ms populated,   5.07737 ms/tx

./src/bitcoin-cli listtransactions_MP "*" 999999 0 0 999999
listtransactions_MP():  243649.652 ms,  found: 5151 txs,  5343 txs seen,  243482.901 ms populated,   47.301 ms/tx
listtransactions_MP():   26720.799 ms,  found: 5151 txs,  5343 txs seen,   26567.702 ms populated,    5.187 ms/tx
listtransactions_MP():   27407.749 ms,  found: 5151 txs,  5343 txs seen,   27259.639 ms populated,    5.321 ms/tx

Note that the tests were run on top of #74, which explains the 4-10x speedup for the second and later calls: it comes from the input transaction cache in ParseTransaction().
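
The effect of such a cache can be pictured with a small memoizing lookup. This is a hypothetical sketch and not the implementation from #74: `InputCache`, the string-based keys and values, and the injected `fetch` callback are all assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical memoizing cache: maps a txid string to previously fetched data.
class InputCache
{
private:
    std::map<std::string, std::string> cache;
    std::function<std::string(const std::string&)> fetch; // the slow lookup
    unsigned int misses;

public:
    explicit InputCache(std::function<std::string(const std::string&)> fetchFn)
        : fetch(fetchFn), misses(0) {}

    // Returns the cached value, calling the slow lookup only on the first request.
    const std::string& Get(const std::string& txid)
    {
        std::map<std::string, std::string>::iterator it = cache.find(txid);
        if (it == cache.end()) {
            ++misses;
            it = cache.insert(std::make_pair(txid, fetch(txid))).first;
        }
        return it->second;
    }

    // Number of times the slow lookup was actually invoked.
    unsigned int Misses() const { return misses; }
};
```

The first call for a given key pays the full lookup cost, while repeated calls only pay for a map lookup, which matches the pattern of a slow first run followed by faster ones.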

It can be seen that the average time to list one transaction is roughly 47 ms uncached, and around 5.2 ms after the first call.

The first time, it took 4:04 min to list all 5151 Omni wallet transactions (of 5343 wallet transactions in total), and only about 27 seconds when called a second time.

Once the mainnet parsing has finished, I'm going to get some numbers for the JSON serialization to compare against.

dexX7 commented Jun 23, 2015

Here we go, without byte serialization:

./src/bitcoin-cli listtransactions_MP "*" 1 0 0 999999
listtransactions_MP():     137.132 ms,  found:    1 txs,     1 txs seen,     129.788 ms populated,  137.132 ms/tx
listtransactions_MP():     139.959 ms,  found:    1 txs,     1 txs seen,      98.398 ms populated,  139.959 ms/tx
listtransactions_MP():     134.041 ms,  found:    1 txs,     1 txs seen,     121.070 ms populated,  134.041 ms/tx
listtransactions_MP():     129.049 ms,  found:    1 txs,     1 txs seen,     117.385 ms populated,  129.049 ms/tx

./src/bitcoin-cli listtransactions_MP "*" 10 0 0 999999
listtransactions_MP():    1822.721 ms,  found:   10 txs,    10 txs seen,    1815.296 ms populated,  182.272 ms/tx
listtransactions_MP():     909.400 ms,  found:   10 txs,    10 txs seen,     901.604 ms populated,   90.940 ms/tx
listtransactions_MP():     860.536 ms,  found:   10 txs,    10 txs seen,     853.687 ms populated,   86.054 ms/tx
listtransactions_MP():     990.114 ms,  found:   10 txs,    10 txs seen,     981.632 ms populated,   99.011 ms/tx

./src/bitcoin-cli listtransactions_MP "*" 100 0 0 999999
listtransactions_MP():   16344.688 ms,  found:  100 txs,   120 txs seen,   16335.132 ms populated,  163.447 ms/tx
listtransactions_MP():    9397.616 ms,  found:  100 txs,   120 txs seen,    9388.876 ms populated,   93.976 ms/tx
listtransactions_MP():    9253.002 ms,  found:  100 txs,   120 txs seen,    9239.713 ms populated,   92.530 ms/tx
listtransactions_MP():    8890.666 ms,  found:  100 txs,   120 txs seen,    8881.084 ms populated,   88.907 ms/tx
listtransactions_MP():   10226.605 ms,  found:  100 txs,   120 txs seen,   10217.782 ms populated,  102.266 ms/tx

./src/bitcoin-cli listtransactions_MP "*" 999999 0 0 999999
listtransactions_MP():  651622.547 ms,  found: 5151 txs,  5343 txs seen,  651484.830 ms populated,  126.504 ms/tx
listtransactions_MP():  457356.142 ms,  found: 5151 txs,  5343 txs seen,  457234.876 ms populated,   88.790 ms/tx

So the average time for one uncached transaction is 127 ms, and around 89 ms for subsequent calls.

In total, it took 10:52 min (vs. 4:04 min) to list all 5151 wallet transactions on the first call, and 7:37 min (vs. 27 seconds) for cached queries.

That's around 17x slower for cached queries, and about 2.7x slower for the first, uncached call.
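
The ratio can be checked against the full-wallet totals from the two runs. The helper below is a trivial sketch with a hypothetical name (`Slowdown`); the input values are the total times in milliseconds reported in the logs above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: how many times slower slowMs is compared to fastMs.
double Slowdown(double slowMs, double fastMs)
{
    assert(fastMs > 0.0);
    return slowMs / fastMs;
}

// Totals for listing all 5151 transactions, taken from the logs:
//   cached:   457356.142 ms (JSON) vs.  26720.799 ms (bytes)
//   uncached: 651622.547 ms (JSON) vs. 243649.652 ms (bytes)
```

`Slowdown(457356.142, 26720.799)` evaluates to roughly 17.1 for the cached case, and `Slowdown(651622.547, 243649.652)` to roughly 2.7 for the first, uncached call.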

@zathras-crypto

This is EPIC - nice work mate!!!!!

dexX7 commented Jun 24, 2015

Given that these results are better than expected, we should probably convert the other parts as well.

And since we're using the serialization of Bitcoin Core, things could get a lot easier, because values can simply be wrapped into READWRITE(value) and whole classes are immediately serializable.
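
The idea behind wrapping values this way is that a single list of fields drives both directions. The following self-contained sketch imitates that pattern without using Bitcoin Core's headers; `Writer`, `Reader`, `Payment` and the `ReadWrite` member are hypothetical, and the sketch only handles fixed-width values in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical writer: "ReadWrite" appends the value's raw bytes.
struct Writer {
    std::vector<unsigned char> data;

    template <typename T>
    void ReadWrite(T& value)
    {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
        data.insert(data.end(), p, p + sizeof(T));
    }
};

// Hypothetical reader: "ReadWrite" fills the value from the buffer.
struct Reader {
    std::vector<unsigned char> data;
    std::size_t pos;

    explicit Reader(const std::vector<unsigned char>& d) : data(d), pos(0) {}

    template <typename T>
    void ReadWrite(T& value)
    {
        if (data.size() - pos < sizeof(T)) throw std::runtime_error("unexpected end of data");
        std::memcpy(&value, &data[pos], sizeof(T));
        pos += sizeof(T);
    }
};

struct Payment {
    uint32_t propertyId;
    int64_t amount;

    // One field list serves both serialization and deserialization;
    // the stream type decides which direction is taken.
    template <typename Stream>
    void SerializationOp(Stream& s)
    {
        s.ReadWrite(propertyId);
        s.ReadWrite(amount);
    }
};
```

Because the field order is written down only once, the reading and writing code cannot drift apart, which is the main maintenance benefit of this style.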

Say there is a class to represent recipients of STO transactions:

class CSerializableStoRecipients
{
private:
    uint256 hashTxid;
    uint256 hashBlock;
    std::string strSender;
    uint32_t propertyId;
    int64_t amountFee;
    std::set<std::pair<int64_t, std::string> > recipientsSet;

public:
    /** Creates a new recipients object */
    CSerializableStoRecipients()
    {
        // Set null or so
    }

    ADD_SERIALIZE_METHODS;

    template <typename Stream, typename Operation>
    inline void SerializationOp(Stream& s, Operation ser_action, int nType, int nVersion) {
        READWRITE(hashTxid);
        READWRITE(hashBlock);
        READWRITE(strSender);
        READWRITE(propertyId);
        READWRITE(amountFee);
        READWRITE(recipientsSet);
    }

    /** Transaction hash */
    uint256 getTxid() const { return hashTxid; }
    /** Block hash */
    uint256 getBlock() const { return hashBlock; }
    /** Sender */
    std::string getSender() const { return strSender; }
    /** Property identifier */
    uint32_t getPropertyId() const { return propertyId; }
    /** Amount paid as STO fee */
    int64_t getFeeAmount() const { return amountFee; }
    /** Recipients */
    std::set<std::pair<int64_t, std::string> > getRecipients() const { return recipientsSet; }
    /** Sum of all recipient values */
    int64_t getTotalAmount() const;
};
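
The class only declares getTotalAmount(). A possible body is sketched here as a free function so that it compiles on its own; `RecipientsSet` and `SumRecipientAmounts` are hypothetical names, and the sketch assumes that the first element of each pair is the amount and the second is the recipient address.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>

// Same container type as recipientsSet in the class above.
typedef std::set<std::pair<int64_t, std::string> > RecipientsSet;

// Sums the amounts of all recipients; overflow is not checked in this sketch.
int64_t SumRecipientAmounts(const RecipientsSet& recipients)
{
    int64_t total = 0;
    for (RecipientsSet::const_iterator it = recipients.begin(); it != recipients.end(); ++it) {
        total += it->first;
    }
    return total;
}
```

Since the set is ordered by the whole pair, two recipients receiving the same amount are kept as separate entries as long as their addresses differ.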

Then it could be stored or retrieved as follows:

bool CMPSTOList::StoreRecipients(const CSerializableStoRecipients& recipients)
{
    // DB key for entry
    CDataStream ssKey(SER_DISK, CLIENT_VERSION);
    ssKey << std::make_pair('t', recipients.getTxid());
    leveldb::Slice slKey(&ssKey[0], ssKey.size());

    // DB value for entry
    CDataStream ssValue(SER_DISK, CLIENT_VERSION);
    ssValue.reserve(ssValue.GetSerializeSize(recipients));
    ssValue << recipients;
    leveldb::Slice slValue(&ssValue[0], ssValue.size());

    // Persist entry
    leveldb::Status status = pdb->Put(writeoptions, slKey, slValue);

    return status.ok();
}

bool CMPSTOList::GetRecipients(const uint256& hashTxid, CSerializableStoRecipients& retRecipients) const
{
    // DB key for entry
    CDataStream ssKey(SER_DISK, CLIENT_VERSION);
    ssKey << std::make_pair('t', hashTxid);
    leveldb::Slice slKey(&ssKey[0], ssKey.size());

    // DB value for entry
    std::string strValue;
    leveldb::Status status = pdb->Get(readoptions, slKey, &strValue);
    if (!status.ok()) {
        PrintToLog("%s(): ERROR: %s\n", __func__, status.ToString());
        return false;
    }

    // Deserialize entry
    try {
        CDataStream ssValue(strValue.data(), strValue.data() + strValue.size(), SER_DISK, CLIENT_VERSION);
        ssValue >> retRecipients;
    } catch (const std::exception& e) {
        PrintToLog("%s(): ERROR: %s\n", __func__, e.what());
        return false;
    }

    return true;
}

No need to manually parse the list of recipients ... :)
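
Both functions derive the database key from the pair ('t', txid). Assuming the stream writes the prefix as a single byte followed by the 32 raw bytes of the hash, the resulting key layout can be pictured as follows. `MakeDbKey` is a hypothetical helper for illustration and uses a plain byte array in place of uint256.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical helper: builds a key of one prefix byte followed by the 32 hash bytes.
std::string MakeDbKey(char prefix, const std::array<unsigned char, 32>& hash)
{
    std::string key;
    key.reserve(1 + hash.size());
    key.push_back(prefix);
    key.append(reinterpret_cast<const char*>(hash.data()), hash.size());
    return key;
}
```

A one-byte prefix like this is a common way to keep different record types apart within a single key-value store.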
