-
Notifications
You must be signed in to change notification settings - Fork 733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Database format does not allow pruning #82
Comments
This is what I understood. I believe it's simple with the info currently in the DB, but I might be missing a detail. Can you perhaps give an example of an address and an example of it being pruned by e-s? |
I understand the goal here now. |
@ecdsa incidentally, I think this is perhaps not a good approach. It's pushing expensive processing to the server, which is dangerous. It belongs on the client. If someone wants to request histories for addresses with long histories, and they can't handle it because they're on mobile, the response perhaps should be to not do that then. |
Following our discussion on IRC: I think processing in electrum-server is expensive because spent histories are sorted by the height of the incoming transaction. If we change the chronology and use the height of the outgoing (spending) transaction instead, then adding new transaction pairs becomes a append-only operation. I believe that such a database would not be more expensive than what ElectrumX is currently doing. It would actually be lighter than the current database, because it would remove the redundancy between UTXO and histories. |
Also, a new protocol that serves histories in an incremental way should be designed around pruning, because pruned histories have the same set of UTXOS as the final result. This means the wallet can be used before the whole history is downloaded. |
It would (asymptotically) double the history size because you would be storing tx pairs, no? |
In fact it can be worse; consider 3 receive txs with 3 payments in each, and 3 spend txs spending one from each receive TX. At present this is 6 txs in the history; storing pairs it would be 9 pairs, for 3x the space cost. I realize this is not common, but there do exist many large txs in the DB with large numbers of payments to the same address in one tx |
There may be room to compress the pair data though. Any changes here should be post 1.0 I think. |
No it would not double the size because you are storing pairs. note: it would indeed increase the size in your second example. |
True, so perhaps this might not be so bad. How do you see the protocol regarding incremental history? Starting from the most recent spend, and working backwards? If we serve history in pairs, the same receiving tx could be sent multiple times that way, which may not be an issue. If we just serve tx hashes and heights like now, it could be expensive to sort it properly. |
If we change the protocol, we do not need to sort histories before we serve them. The reason why the server sorts histories is in order to compute the same hash as the client ; this hash was defined before pruning was introduced in electrum-server, and has not been updated since. In a new protocol, I would like to have two hashes per address: one for the utxo set, one for the spent history. each of them would be computed as the merkle root of a linear tree. That way, the hash does not depend on the length of history that is served. (or on the amount of utxos served, if we make that part incremental too). |
note: the merkle root hash also allows a server to prune its database if they want to, and serve the same hash as other servers. |
note: this new protocol idea would actually require a lot more space, because the database would need to store all these hashes (unless they are computed when the history is served, but I guess this is what we want to avoid) |
note 2: to mitigate that, we could store one hash per chunk of 100 items. |
I think computing the merkle tree when serving is fine; we do that already for each tx |
@ecdsa I think that if we allow querying by block height ranges, it would reduce the bandwidth usage that is the most important for mobile users. It is fine if the server never supports pruning as the historical data is useful in many cases. @kyuupichan when the |
@ecdsa I don't think I want to implement pruning. I suspect the vast majority of transactions and DB space is occupied by addresses with minimal re-use, and that this will only become more true going forwards. Each little-used address is more expensive for the server: it requires a key and small value in the history DB, whereas a heavily-used address requires only a handful of keys and uses many larger values. I think the reason electrum-server implemented pruning was that it stored all the history in a single value, and the read-append-write cycle was inefficient. ElectrumX doesn't have this issue. Also I have plans in the not-too-distant future to improve ElectrumXs history handling by fixing #348 and #185. So I see this as ultimately a non-issue - the cost of supporting big histories will be small compared to the cost of supporting histories for all small addresses. Please re-open if you disagree. |
History pruning is needed in order to be able to efficiently request and spend funds on addresses with long histories.
A pruned history does not contain the full set of transactions touching an address. It contains all the transactions that have UTXOs, but only a limited number of transaction that have STXOs (spent transaction outputs). The transactions that spend pruned inputs are similarly removed from the history, so that pruning one input removes two items from the list.
The current database format stores histories in the exact form that is required by the get_history RPC: A list of (tx_num, height). This means that the information required for pruning is lost: the server does not know which outgoing input spends which incoming output. In order to be able to prune histories, this information must be retained by the server.
electrum-server used to store spent histories as (txout, txin) pairs, where txin spends txout. This of course is likely to impact performance, but I think not being able to serve pruned histories is a regression. There might be ways of doing this more efficiently than how it is done in electrum-server.
The text was updated successfully, but these errors were encountered: