Deprecate Data Entries #221
See #199 for some discussion.
The data entries API doesn't really make sense for most use cases; in most of them, the person is better served doing the thing off-chain.
The only documented use case I could find with a cursory search is https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/stellar-dev/mzIP9OuegyI/bnPP0LVcAAAJ.
Pinning Stellar.toml is a nice idea, but the approach has so many practical issues that something much simpler makes more sense: issuers provide time-windowed certificates which guarantee a period during which the toml is up to date, and if you issue a contract with a mintime/maxtime matching that period you should expect it to work (though this is not guaranteed).
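To make the time-window idea a bit more concrete, here is a minimal sketch in Python. It assumes a certificate boils down to a pair of Unix timestamps and that a contract transaction carries its own mintime/maxtime. `TimeWindow` and `tx_within_certificate` are hypothetical names for illustration, not part of any Stellar SDK. Verifying the issuer's signature on the certificate is out of scope, and any special "unbounded" time values are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeWindow:
    """A closed interval of Unix timestamps (hypothetical helper type)."""
    min_time: int  # inclusive lower bound, seconds
    max_time: int  # inclusive upper bound, seconds


def tx_within_certificate(tx: TimeWindow, cert: TimeWindow) -> bool:
    """Return True when the transaction window lies fully inside the
    window during which the issuer certifies the toml as up to date."""
    # Reject malformed (inverted) windows outright.
    if tx.min_time > tx.max_time or cert.min_time > cert.max_time:
        return False
    return cert.min_time <= tx.min_time and tx.max_time <= cert.max_time
```

A contract whose window passes this check is one you would expect to work; as noted above, that is still an expectation rather than a guarantee.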
I propose that we soft-deprecate the data entries by removing data entries from the public facing APIs while keeping them in consensus should someone be relying on the functionality currently. We should also strongly communicate the deprecation, and collect community feedback for a period of one year.
After the one year feedback period, a policy layer (read: the software that nominators use to select transactions, which validators do not otherwise enforce) should block transactions with data entry modifications.
Then, after an additional period, we can explore significantly increasing the reserve for Data Entries -- this enables contracts relying on Data Entries to still be processed if the person sends additional funds to the account to cover the unplanned excess.
I'm not sure I'd ever advocate full removal of data entries, given that a user may already have a compatibility issue, but perhaps with a long enough horizon for community input we can fully remove the feature.
Alternatively, I propose that we, in the next update, fully remove data entries (transactions with that operation are invalid). If there is strong motivation for the feature in the future, I propose that we could then add back an identical feature such that those transactions would again become valid, or a new type of data entry could be added to better match the needs of users. This is semantically somewhat equivalent to dramatically increasing the minfee for a data entry operation (from the perspective of an engineer implementing a system to handle data entries).
I have a lot of data about data entries and their usefulness; I'll try to go to
Why would you store data on a decentralized Ledger?
The same reason that you're posting any operation: you want authenticated
Decentralized databases are actually going to solve a lot of security issues that
What are the possible use-cases?
Advanced one (dApp based on a side-chain):
How does it fit in Stellar?
This feature is needed so we extend account configuration without having to
In the future, we can expect external services to introduce vendor-specific
By simplifying the process of creating custom network, Stellar got its chances
Can this feature be abused?
In terms of spam, this feature is more expensive than other operations like
In terms of message sending potential, the cost of one base fee per 64 bytes is
Can we prevent arbitrary data publishing on the ledger?
There are many ways to publish arbitrary data on the ledger, some of them
Thanks for taking the time to respond.
So there are two claims here: the data is authenticated and the data is publicly available.
Authenticated data can be handled by bundling the data with signatures from the account in question. Public availability can be handled by a separate DHT, perhaps with some notion of version control.
Notably, stellar-core has no requirement (afaict) to store more than 32-bytes per data entry -- the implementation can discard the data immediately and keep a 32 byte (even salted!) hash of the key, without breaking any consensus rules.
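As a rough illustration of that claim, the sketch below tracks which data entries exist while keeping only a 32-byte salted hash per entry and discarding the value. This is a toy model in Python, not stellar-core code: `salted_key_digest` and `HashedEntryIndex` are hypothetical names, and whether consensus rules really permit this is the claim made above, not something the sketch establishes.

```python
import hashlib
import os
from typing import Optional


def salted_key_digest(salt: bytes, account_id: str, data_name: str) -> bytes:
    """SHA-256 over length-prefixed (salt, account, name); always 32 bytes."""
    h = hashlib.sha256()
    for part in (salt, account_id.encode(), data_name.encode()):
        # Length prefixes keep ("ab", "c") distinct from ("a", "bc").
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


class HashedEntryIndex:
    """Remembers that an entry exists without storing its name or value."""

    def __init__(self, salt: Optional[bytes] = None) -> None:
        self._salt = salt if salt is not None else os.urandom(16)
        self._present = set()

    def put(self, account_id: str, data_name: str, value: bytes) -> None:
        # The value is deliberately dropped; only the digest is kept.
        self._present.add(salted_key_digest(self._salt, account_id, data_name))

    def exists(self, account_id: str, data_name: str) -> bool:
        return salted_key_digest(self._salt, account_id, data_name) in self._present

    def remove(self, account_id: str, data_name: str) -> bool:
        """Return True if the entry existed and was removed."""
        digest = salted_key_digest(self._salt, account_id, data_name)
        if digest in self._present:
            self._present.remove(digest)
            return True
        return False

    def count(self) -> int:
        return len(self._present)
```

Such an index can still answer "does this entry exist?" and count entries, but it cannot serve the stored value back to anyone.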
Is this a different one than mentioned?
Again, there is no storage requirement for data entries (beyond hash) for a stellar-core node.
A system which tracks the long-term queryable state should be built as an entirely separate 'image' off of the ledger set. This points to including data as a type of memo or something in a transaction, but not storing account state.
How does it fit in Stellar?
In both of these protocols, the homedomain serving a stellar.toml is sufficient for the data to be served to the world. Furthermore, the homedomain isn't strictly needed as a separate DHT could include attestations for various accounts as to what their metadata is (incl routing to a homedomain).
Storing them in the ledger, maybe, but in the account state, perhaps not.
It would be nice to have at least a solid rationale for deprecating data entries before continuing to push that proposal on other threads.
The statement that data is better off-chain is actually not true: on-chain data is a common requirement for dApps, otherwise they wouldn't be called decentralized.
Several services relying on data entries already exist and this feature is actually valuable, so unless there is a critical flaw to fix, this operation must be maintained.
I think the main points I'm interested in are:
With respect to the argument that it's decentralized, that's a miscategorization. Data entries are actually more centralized than plain "information", which can be served by anyone with no central consistency. Getting Stellar data requires either running a validator or trusting someone else's node. Most of the protocols I've seen would be better off with a signed serialized XDR blob.
It's unfortunate that there are existing services using Data Entries, but I'm happy to help them figure out how to use a more pragmatic approach if they are incapable of doing so themselves.
3 of those points are opinions about how one should design one's software.
Now that's clearly not a consensus opinion. You have the right to not like that feature, but that's not a good reason to prevent everybody else from using it.
The point about data value not being enforced by the protocol is indeed an issue. That won't be solved by removing this op, though.
On-chain data storage has its use cases. This is a known fact, and that's why blockchains implementing smart contracts also offer data storage capabilities. I don't think we can go forward without you acknowledging this fact.
Yes, hence I've proposed deprecating but not removing. As stewards of the protocol, there's an obligation to encourage good software engineering.
I agree there is a need to store data on chain -- I'm not sure where you ever got the idea I don't believe that. However, I think that on-chain data storage should be minimally consumed, and I have yet to see a use case which truly benefited from using data entries specifically.
Well, the multisig coordination bootstrapping SEP is an excellent example. It is fairly well designed, by the way, so I'm not even saying that protocol writers shouldn't encourage good design. What I'm saying is that your personal opinion about what is a good design is not necessarily shared by everybody, and that there are good designers out there who are using account entries for good reasons.
Now we can play with words, but practically, if the feature is removed from Horizon before a better alternative is available, it will break things badly, and this is not a good practice.
I'd love to work towards an objective framework for smart contract data so that use cases and designs can be evaluated appropriately.
I'm trying in good faith to be as unopinionated on this issue as possible and to objectively evaluate current data entry practices, but I'm fallible, so sorry if my objectivity hasn't come across. I very much want people to build easy to use, inexpensive, robust, etc. software and pave a forward looking path for the stellar network -- that's my only goal. Data Entries did not, as far as I know, kill my grandfather or something.
I think there are a couple of things we can think about w.r.t. data functionality that help us work towards a more concrete evaluation of the relative merits of approaches:
Are there other properties you care about? Naturally, I've selected, with my own bias, a set of issues where I think the benefits of off-chain data win (the causality is reversed though; I think off-chain data wins because of these factors), but if there are categories I haven't considered please share and then we can work towards a more rigorous evaluation.
Of course! My point was not to throw doubt on your intentions but to insist that there are other people, equally motivated & competent, who think that data entries are needed. In fact, the discussion about the multisig coordination bootstrap SEP proved it true.
I'd also underline the fact that blockchains are mostly unknown territory, and nobody knows beforehand which kind of mighty invention will come out of this or that functionality. But we definitely know that account data entries expose a unique set of properties that can't be found off-chain:
They are really two different solutions, and there's no such thing as "on-chain is better" or "off-chain is better".
Ok, so if I can synthesize between our posts:
So the 12 properties cover all the benefits you think that Data Entries bear?
For each of those topics I believe off chain data has preferable properties. It's not an exhaustive analysis, happy to go into more detail if you disagree on an individual category.
Off-chain data may be signed by the current signers on an account as well as by third party authenticators (e.g., if a field specifies that a service provider is to be used, that service provider may sign to corroborate that the account is a customer). If using the homedomain for hashed content, the data is hash authenticated which is useful for accounts without signers.
On-chain data was signed by the current or prior signers on an account. Current signers may be unaware of the data entries set. There is a race condition when unsetting data entries after changing keys.
Off chain data can be revoked by using a version number scheme (signing a statement that it is valid up to sequence number current + 1 billion, and using a sequence bump to speed up invalidation), a TTL, or by rotating keys. It can be made irrevocable (in some sense) using hash authentication for accounts with no signers.
On chain data can be revoked by removing it from the account in question.
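The off-chain revocation scheme above could look roughly like the following sketch, assuming the signed blob carries a "valid up to sequence number" field and an optional expiry time. `SignedBlobMeta` and `is_still_valid` are hypothetical names, and signature verification is assumed to have happened elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SignedBlobMeta:
    """Validity fields of an off-chain blob (hypothetical layout)."""
    valid_up_to_sequence: int         # e.g. sequence at signing + 1 billion
    expires_at: Optional[int] = None  # optional TTL as a Unix timestamp


def is_still_valid(meta: SignedBlobMeta, current_sequence: int, now: int) -> bool:
    """A blob is revoked once the account's sequence number has moved past
    its bound, or once its TTL (if any) has elapsed."""
    if current_sequence > meta.valid_up_to_sequence:
        return False  # revoked, e.g. by a sequence bump
    if meta.expires_at is not None and now >= meta.expires_at:
        return False  # TTL elapsed
    return True
```

Under this model, the account owner revokes early by bumping the sequence number past `valid_up_to_sequence`, and readers only need the account's current sequence number to decide whether a cached blob is stale.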
Off chain data consistency guarantees are strong given the account owners want consistency, e.g., they properly use version numbers when modifying data.
On chain data consistency guarantees are weak -- data is guaranteed to be consistent within an SCP quorum, but are not guaranteed to be consistent with what a transaction signer anticipated them to be when they created the transaction due to interleavings.
Off chain data has no inherent availability guarantees. However, with proper mirroring infrastructure, off chain data can be made more available than the Stellar network itself, since it can be served via static file servers/edge caching infrastructure and a reader only needs to reach one node which claims to have the data. Classic DHT literature applies...
On chain data is not guaranteed to be externally visible in the protocol. This may be amended, but is not the case now. The data is currently as available as a horizon node on the network. Without talking to a trusted set of horizon nodes, the data is fully unauthenticated (e.g. if served from caches), so we do not include this as a potential for availability.
Off-chain data supports arbitrary data structures.
On-chain is currently limited to (char, char) tuples.
Cost to Write
Off chain: the cost to write is very cheap and likely decreases over time.
On-chain: the cost to write is expensive, and likely to increase in price over time.
Cost to Read
Off-chain: the cost to read is very cheap and likely decreases over time
On-chain: the cost of reading trustlessly is maintaining full consensus with the Stellar network, which is likely to increase over time.
Off-chain: scales relatively well, see classic DHT literature and availability of global CDNs.
On-chain: central bottleneck. Engenders a trade off between the number of accounts/users and the amount of data per account. Potential DoS vector.
Developer Ease of use
Off-chain: No need to know about fees and reserves, can have a one-click interface (once privkeys are loaded in) to deploy new metadata globally instantly. Data files could be queried by (account_id, key) pairs or by a homedomain specified server.
On-Chain: Need to know about fees and reserves and have enough for an account to add the relevant data entries. More restrictions on data formats. Supporting many different protocols makes merging accounts harder potentially. When critical updates need to be applied to many data entries (e.g., because of a hacked service) then there is a flood of transactions writing data entries which raises fee rates forcing accounts to wait with stale data.
Off-chain: fine to support legacy features/versions forever as well as serve multiple versions for compatibility.
On-chain: protocols must be in step with SCP changes. E.g., if reserves increase, protocols must account for the increase. If lengths change, protocols must adapt. Etc. It is difficult for multiple competing standards for things like namespaces to coexist.
Off-chain: Queries can go through a data provider of one's choice (perhaps your own as well). Data files could be stored encrypted to a set of keys for those who should be able to decrypt them, or served from 'authenticated' data servers for groups wanting to share metadata privately in-group.
On-Chain: Every query must go through horizon. Data entries in general can't be stored on the network encrypted. All writes are observed by all.
Off-Chain: as censorship resistant as the internet at large.
On-Chain: as censorship resistant as the stellar network ( < internet).
Once again, on-chain vs off-chain is a limiting view, because both solutions have their own sets of properties that can fit different use cases: this is why you couldn't find a proper off-chain alternative for the multisig bootstrapping SEP on the mailing list.
Your analysis is interesting, but if at this point you're still trying to prove that account data entries are useless, it means you did not take into account the peer input you asked for in the first place.
I'm not sure what you mean by that. The formal process I'm trying to emulate here is something like:
I'm not trying to prove that data entries are useless, I'm trying to show that they don't meet the requirements/properties we'd like to see out of a data solution.
I'm unclear what it insinuates that I haven't taken into account peer input in this discussion. I used peer input to form my requirements and properties proposed in step 2 and the overall goal in step 1.
If you disagree with my conclusion in step 5 that off-chain solutions offer stronger benefits, then please extend the analysis in step 4.
If you don't have more to add at step 4, then I (or someone else who cares about this issue) will begin to draft something for step 6. The reason you haven't seen someone propose a proper alternative using off-chain data is that it hasn't been described yet -- protocol development takes time and there are limited engineering resources in general.
Dec 17, 2018
When someone decides to put himself in the lead of solving an issue, I'd expect him
In particular, Paul & I took the time to explain and demonstrate that account
Also, as I repeatedly said, the present analysis has the bias of opposing two
For example step 4 is only right in case you want to publish data without using
Then there are use cases where privacy/cost to read/cost to
As the analysis misses any real-case scenario, it is easy to overlook elements
We also find a bias toward off-chain conclusion in the fact that what is
I sincerely welcome your effort to get us toward a better understanding and
To summarize, the problematic points I see on the proposed analysis are:
I'd like those points to be addressed this way:
Seems like a popular use case of Stellar is to attach IPFS hashes to data entries: https://galactictalk.org/d/433-stellar-should-have-a-big-memo-or-data
Note: some in the thread are arguing that we should go even further with data entries.