
Database format does not allow pruning #82

Closed
ecdsa opened this issue Dec 18, 2016 · 17 comments

@ecdsa
Contributor

ecdsa commented Dec 18, 2016

History pruning is needed in order to be able to efficiently request and spend funds on addresses with long histories.

A pruned history does not contain the full set of transactions touching an address. It contains all the transactions that have UTXOs, but only a limited number of transactions with STXOs (spent transaction outputs). The transactions that spend pruned outputs are removed from the history as well, so pruning one spent output removes two items from the list.

The current database format stores histories in the exact form required by the get_history RPC: a list of (tx_num, height) pairs. This means the information required for pruning is lost: the server does not know which outgoing input spends which incoming output. In order to be able to prune histories, the server must retain this information.

electrum-server used to store spent histories as (txout, txin) pairs, where txin spends txout. This of course is likely to impact performance, but I think not being able to serve pruned histories is a regression. There might be ways of doing this more efficiently than how it is done in electrum-server.
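The (txout, txin) pairing described above can be sketched as follows. This is an illustrative model only, not electrum-server's actual code; the function and parameter names (`prune_history`, `max_stxo_pairs`) are hypothetical:

```python
# Hypothetical sketch: a history stored as (funding_tx, spending_tx) pairs
# plus the set of txs that still hold unspent outputs. Pruning drops the
# oldest spent pairs; dropping one pair removes two items from the flat
# history a client would otherwise receive.

def prune_history(pairs, utxos, max_stxo_pairs):
    """Keep every tx with an unspent output, but at most
    `max_stxo_pairs` of the oldest-spent (funding, spending) pairs.

    pairs : list of (funding_tx, spending_tx) tuples, oldest first
    utxos : set of tx numbers that still have unspent outputs
    """
    kept = pairs[-max_stxo_pairs:] if max_stxo_pairs else []
    # The flat history is the surviving pairs plus the UTXO-bearing txs.
    history = sorted(utxos | {tx for pair in kept for tx in pair})
    return kept, history

pairs = [(1, 2), (3, 5), (4, 6)]   # tx numbers: output spent by input
utxos = {7, 8}                     # txs with still-unspent outputs
kept, history = prune_history(pairs, utxos, max_stxo_pairs=1)
print(kept)      # [(4, 6)]
print(history)   # [4, 6, 7, 8]
```

Note how txs 1, 2, 3 and 5 all disappear from the served history once their pairs are pruned, while the UTXO-bearing txs 7 and 8 always survive.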

@kyuupichan
Owner

This is what I understood. I believe it's simple with the info currently in the DB, but I might be missing a detail. Can you perhaps give an example of an address and an example of it being pruned by e-s?

@kyuupichan
Owner

I understand the goal here now.

@kyuupichan
Owner

kyuupichan commented Dec 18, 2016

@ecdsa incidentally, I think this is perhaps not a good approach. It pushes expensive processing onto the server, which is dangerous; it belongs on the client. If someone requests the history of an address with a long history and can't handle it because they're on mobile, the answer perhaps should be: don't do that, then.
We can serve large histories smoothly with some new protocol based on height. I'm not sure we should be catering to people who want to download abbreviated histories of their overused addresses and push the responsibility for the filtering onto the server. The right outcome is for them to bear their own expense. So I'm not very sympathetic to this use case at all.

@ecdsa
Contributor Author

ecdsa commented Jan 8, 2017

Following our discussion on IRC: I think processing in electrum-server is expensive because spent histories are sorted by the height of the incoming transaction. If we change the chronology and use the height of the outgoing (spending) transaction instead, then adding new transaction pairs becomes an append-only operation. I believe such a database would be no more expensive than what ElectrumX is currently doing. It would actually be lighter than the current database, because it would remove the redundancy between the UTXO set and histories.
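The append-only property being claimed can be sketched like this (an assumed data model, not ElectrumX code; `record_spend` and `spent_log` are illustrative names):

```python
# Sketch: spent-history entries keyed by the height of the *spending*
# transaction. Block heights only increase as the chain grows, so new
# entries always land at the tip and nothing earlier is ever rewritten.

spent_log = []  # (spend_height, funding_tx, spending_tx), append-only

def record_spend(spend_height, funding_tx, spending_tx):
    # The log stays sorted by spend_height without ever re-sorting,
    # because each new block's spends have height >= all previous ones.
    assert not spent_log or spend_height >= spent_log[-1][0]
    spent_log.append((spend_height, funding_tx, spending_tx))

record_spend(100, funding_tx=1, spending_tx=2)
record_spend(105, funding_tx=3, spending_tx=4)
record_spend(105, funding_tx=5, spending_tx=6)

# Serving "the most recent N spends" is then a simple tail slice:
print(spent_log[-2:])  # [(105, 3, 4), (105, 5, 6)]
```

By contrast, ordering by the height of the *funding* transaction would force inserts into the middle of the log whenever an old output is spent, which is the expensive case being avoided.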

@ecdsa
Contributor Author

ecdsa commented Jan 8, 2017

Also, a new protocol that serves histories incrementally should be designed around pruning, because a pruned history has the same set of UTXOs as the final result. This means the wallet can be used before the whole history is downloaded.

@kyuupichan
Owner

It would (asymptotically) double the history size because you would be storing tx pairs, no?

@kyuupichan
Owner

In fact it can be worse: consider 3 receive txs with 3 payments each to the address, and 3 spend txs each spending one output from each receive tx. At present this is 6 txs in the history; storing pairs, it would be 9 pairs, for 3x the space cost. I realize this is not common, but there do exist many large txs in the DB with large numbers of payments to the same address in one tx.

@kyuupichan
Owner

There may be room to compress the pair data though. Any changes here should be post 1.0 I think.

@ecdsa
Contributor Author

ecdsa commented Jan 8, 2017

No, storing pairs does not by itself double the size:
[(a,b),(c,d)] is the same size as [a,b,c,d]

note: it would indeed increase the size in your second example.
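A toy count makes both size claims concrete (illustrative only; tx names are placeholders):

```python
# Case 1: each receive tx is spent by exactly one spend tx.
# The flat history and the pair representation reference the same
# number of transactions, so there is no doubling.
flat  = ["a", "b", "c", "d"]        # 4 history entries
pairs = [("a", "b"), ("c", "d")]    # 2 pairs = 4 tx references
assert sum(len(p) for p in pairs) == len(flat)

# Case 2 (the pathological one): 3 receive txs paying the address
# 3 times each, spent by 3 txs that each take one output from every
# receive tx. The flat history has 6 entries, but every
# (receive, spend) combination becomes its own pair: 9 pairs.
receives = ["r1", "r2", "r3"]
spends   = ["s1", "s2", "s3"]
history  = receives + spends                           # 6 entries
pair_db  = [(r, s) for s in spends for r in receives]  # 9 pairs
print(len(history), len(pair_db))  # 6 9
```

So the blow-up only appears when a single spending tx consumes outputs from many distinct receiving txs (or vice versa); the common one-in/one-out case costs the same either way.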

@kyuupichan
Owner

True, so perhaps this might not be so bad.

How do you see the protocol regarding incremental history? Starting from the most recent spend, and working backwards? If we serve history in pairs, the same receiving tx could be sent multiple times that way, which may not be an issue.

If we just serve tx hashes and heights like now, it could be expensive to sort it properly.

@ecdsa
Contributor Author

ecdsa commented Jan 8, 2017

If we change the protocol, we do not need to sort histories before we serve them. The reason the server sorts histories is to compute the same hash as the client; this hash was defined before pruning was introduced in electrum-server, and has not been updated since.

In a new protocol, I would like to have two hashes per address: one for the UTXO set, one for the spent history. Each of them would be computed as the merkle root of a linear tree. That way, the hash does not depend on the length of the history that is served (or on the number of UTXOs served, if we make that part incremental too).
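One way to read "merkle root of a linear tree" is a running chained hash, where each item is folded into the previous root; this is a hedged sketch of that reading, not a defined protocol (`chain_root` is a hypothetical name):

```python
# Sketch: a linear (chained) root. Appending item n+1 never changes the
# intermediate root after item n, so a server can store a checkpoint and
# extend the history incrementally without recomputing from scratch.
import hashlib

def chain_root(items, prev_root=b"\x00" * 32):
    for item in items:
        prev_root = hashlib.sha256(prev_root + item).digest()
    return prev_root

history = [b"tx1", b"tx2", b"tx3"]
full = chain_root(history)

# From a stored intermediate root after the first two items, folding in
# only the last item reaches the same final root:
partial = chain_root(history[2:], prev_root=chain_root(history[:2]))
assert partial == full
```

The useful property is that a server holding only a suffix of the history, plus the checkpoint root that precedes it, can still produce the same hash as a server holding everything.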

@ecdsa
Contributor Author

ecdsa commented Jan 8, 2017

note: the merkle root hash also allows a server to prune its database if it wants to, and still serve the same hash as other servers.

@ecdsa
Contributor Author

ecdsa commented Jan 8, 2017

note: this new protocol idea would actually require a lot more space, because the database would need to store all these hashes (unless they are computed when the history is served, but I guess that is what we want to avoid).

@ecdsa
Contributor Author

ecdsa commented Jan 8, 2017

note 2: to mitigate that, we could store one hash per chunk of 100 items.
In that case the server would serve history ranges that are multiples of 100.
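The chunked-checkpoint idea can be sketched like this (assumed design; `checkpoint_roots` is an illustrative name, and the chained-hash construction is one possible choice):

```python
# Sketch: store one checkpoint hash per full chunk of 100 history items.
# The server then only serves history ranges aligned to 100-item
# boundaries, so every served range ends at a stored checkpoint.
import hashlib

CHUNK = 100

def checkpoint_roots(items):
    roots, running = [], b"\x00" * 32
    for i, item in enumerate(items, 1):
        running = hashlib.sha256(running + item).digest()
        if i % CHUNK == 0:
            roots.append(running)  # one stored hash per full chunk
    return roots

items = [b"%d" % i for i in range(250)]
roots = checkpoint_roots(items)
print(len(roots))  # 2 checkpoints, after items 100 and 200
```

This trades a 100x reduction in stored hashes for some granularity: the tail of the history past the last full chunk has no checkpoint and must be hashed on the fly.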

@kyuupichan
Owner

I think computing the merkle tree when serving is fine; we already do that for each tx.

@kyuupichan kyuupichan self-assigned this Jan 9, 2017
@kyuupichan kyuupichan added this to the 1.1 milestone Jan 9, 2017
@erasmospunk
Contributor

erasmospunk commented Feb 24, 2017

@ecdsa I think that if we allow querying by block height ranges, it would reduce bandwidth usage, which is what matters most for mobile users. It is fine if the server never supports pruning, as the historical data is useful in many cases.

@kyuupichan when DB.get_history is executed, does self.hist_db.iterator(prefix=hashX) return the transactions in order of height, or in arbitrary order?

@kyuupichan kyuupichan removed this from the 1.1 milestone Mar 30, 2017
@kyuupichan
Owner

@ecdsa I don't think I want to implement pruning. I suspect the vast majority of transactions and DB space is occupied by addresses with minimal re-use, and that this will only become more true going forwards.

Each little-used address is more expensive for the server: it requires a key and small value in the history DB, whereas a heavily-used address requires only a handful of keys and uses many larger values.

I think the reason electrum-server implemented pruning was that it stored all the history in a single value, and the read-append-write cycle was inefficient. ElectrumX doesn't have this issue.

Also I have plans in the not-too-distant future to improve ElectrumX's history handling by fixing #348 and #185.

So I see this as ultimately a non-issue: the cost of supporting big histories will be small compared to the cost of supporting histories for all the little-used addresses. Please re-open if you disagree.

3 participants