Deprecate Data Entries #221

Open
JeremyRubin opened this Issue Dec 2, 2018 · 16 comments

JeremyRubin commented Dec 2, 2018

See #199 for some discussion.

The data entries API doesn't really make sense for most use cases; in most of them, the person is better served by doing the thing off-chain.

The only documented use case I could find with a cursory search is https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/stellar-dev/mzIP9OuegyI/bnPP0LVcAAAJ.

Pinning Stellar.toml is a nice idea, but there are so many practical issues with the approach that something much simpler makes more sense: issuers provide time-windowed certificates which guarantee a period where the toml is up to date, and if you issue a contract with a mintime/maxtime matching it, you should expect (but not be guaranteed) that it will work.
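
A minimal sketch of how a client might check such a time-windowed certificate against a transaction's time bounds. The `TomlCertificate` type, its field names, and the `covers` helper are hypothetical, and verification of the issuer's signature over the certificate is deliberately left out.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TomlCertificate:
    """Hypothetical issuer statement: this toml hash is current in [not_before, not_after]."""
    toml_sha256: str
    not_before: int  # Unix seconds
    not_after: int   # Unix seconds


def covers(cert: TomlCertificate, toml_bytes: bytes, min_time: int, max_time: int) -> bool:
    """True if the toml matches the certificate and the transaction's
    [min_time, max_time] window lies entirely inside the certified window."""
    if hashlib.sha256(toml_bytes).hexdigest() != cert.toml_sha256:
        return False
    return cert.not_before <= min_time <= max_time <= cert.not_after


toml = b'VERSION="2.0.0"\n'
cert = TomlCertificate(hashlib.sha256(toml).hexdigest(), 1_543_708_800, 1_543_795_200)
inside = covers(cert, toml, 1_543_712_400, 1_543_716_000)
outside = covers(cert, toml, 1_543_712_400, 1_543_800_000)
```

Here `inside` is true and `outside` is false, because the second window ends after the certified period.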

Compatible Removal

I propose that we soft-deprecate the data entries by removing data entries from the public facing APIs while keeping them in consensus should someone be relying on the functionality currently. We should also strongly communicate the deprecation, and collect community feedback for a period of one year.

After the one year feedback period, a policy layer (read -- the software that nominators use to select transactions, but is not enforced by validators otherwise) should block transactions with data entry modifications.

Then, after an additional period, we can explore significantly increasing the reserve for Data Entries -- this enables contracts relying on Data Entries to still be processed if the person sends additional funds to the account to cover the unplanned excess.

I'm not sure I'd ever advocate full removal of data entries, given that a user may already have a compatibility issue, but perhaps with a long enough horizon for community input we can fully remove the feature.

Repairable Removal

Alternatively, I propose that we, in the next update, fully remove data entries (transactions with that operation are invalid). If there is strong motivation for the feature in the future, I propose that we could then add back an identical feature such that those transactions would again become valid, or a new type of data entry could be added to better match the needs of users. This is semantically somewhat equivalent to dramatically increasing the minfee for a data entry operation (from the perspective of an engineer implementing a system to handle data entries).

MisterTicot commented Dec 2, 2018

I have a lot of data about data entries and their usefulness; I'll try to get to
the point.

Why would you store data on a decentralized Ledger?

The same reason that you're posting any operation: you want authenticated
information to be publicly available in a resilient manner.

Decentralized databases are actually going to solve a lot of security issues that
were a pain to deal with so far.

What are the possible use-cases?

Simple ones:

  • Storing your public key
  • Storing your wallet address
  • Storing dApps configuration
  • Storing hash of software/files/source-code

Advanced ones (dApps based on a side-chain):

  • Blog / forum / decentralized social media
  • Decentralized DNS

How does it fit in Stellar?

This feature is needed so we extend account configuration without having to
modify the protocol. It's what is used in "Bootstrapping Multisig Coordination",
and it could have been used for "homeDomain" and "inflationDest" as well.

In the future, we can expect external services to introduce vendor-specific
configuration as well. We can also expect new use cases to emerge from this
feature; for example, I'm promoting the idea of storing webapp signatures on the
ledger in the web standard milieu.

Anything else?

By simplifying the process of creating custom networks, Stellar has a chance of
becoming the MySQL 3.0. The consequences of such an achievement would be
absolutely huge for the project.

Can this feature be abused?

Yes, but...

In terms of spam, this feature is more expensive than other operations, like
sending a payment of one stroop with a URL as memo or spamming tiny trades.

In terms of message-sending potential, the cost of one base fee per 64 bytes is
already a significant filter on what one would be willing to send over the
ledger.
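
As a rough illustration of that filter, the fee side of the cost can be estimated as below. The base fee of 100 stroops per operation is an assumption (check the network's current setting), the helper name is made up, and the reserve that each stored entry locks up is not counted.

```python
import math

BASE_FEE_STROOPS = 100  # assumed base fee per operation
BYTES_PER_ENTRY = 64    # payload carried by one manageData operation


def fee_to_publish(num_bytes: int, base_fee: int = BASE_FEE_STROOPS) -> int:
    """Fee in stroops to push num_bytes through the ledger, one operation per 64 bytes."""
    operations = math.ceil(num_bytes / BYTES_PER_ENTRY)
    return operations * base_fee


print(fee_to_publish(64))
print(fee_to_publish(1024))
```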

Can we prevent arbitrary data publishing on the ledger?

No.

There are many ways to publish arbitrary data on the ledger, some of them
being cheaper than manageData. Removing the manageData operation will
only prevent legit uses.

JeremyRubin commented Dec 3, 2018

Thanks for taking the time to respond.

> The same reason that you're posting any operation: you want authenticated
> information to be publicly available in a resilient manner.

So there are two claims here: the data is authenticated and the data is publicly available.

Authenticated data can be handled by bundling the data with signatures from the account in question. Public availability can be handled by a separate DHT, perhaps with some notion of version control.

Notably, stellar-core has no requirement (afaict) to store more than 32 bytes per data entry -- the implementation can discard the data immediately and keep a 32 byte (even salted!) hash of the key, without breaking any consensus rules.
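
To make the storage claim concrete, here is a sketch of a node that keeps only a salted 32-byte digest per entry. This illustrates the argument and is not how stellar-core is implemented; hashing the name together with the value, the salt handling, and the function names are all assumptions.

```python
import hashlib
import hmac
import os

NODE_SALT = os.urandom(16)  # hypothetical per-node secret salt


def retained_digest(name: bytes, value: bytes, salt: bytes = NODE_SALT) -> bytes:
    """32-byte digest a node could keep after discarding the entry itself."""
    h = hashlib.sha256()
    h.update(salt)
    h.update(len(name).to_bytes(1, "big"))  # length prefix keeps (name, value) unambiguous
    h.update(name)
    h.update(value)
    return h.digest()


def matches(digest: bytes, name: bytes, value: bytes, salt: bytes = NODE_SALT) -> bool:
    """Check a claimed (name, value) pair against the retained digest."""
    return hmac.compare_digest(digest, retained_digest(name, value, salt))


kept = retained_digest(b"config.url", b"https://example.org")
```

A node holding only `kept` can confirm a value someone presents to it, but cannot serve the value back, which is the availability gap being pointed at.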

> What are the possible use-cases?
>
> Simple ones:
>
> Storing your public key

Is this a different one than mentioned?

> Storing your wallets address
> Storing dApps configuration
> Storing hash of software/files/source-code
>
> Advanced one (dApp based on a side-chain):
>
> Blog / forum / decentralized social media
> Decentralized DNS

Again, there is no storage requirement for data entries (beyond hash) for a stellar-core node.

A system which tracks the long-term queryable state should be built as an entirely separate 'image' off of the ledger set. This points to including data as a type of memo or something in a transaction, but not storing account state.

> How does it fit in Stellar?
>
> This feature is needed so we extend account configuration without having to
> modify the protocol. It's what is used in "Bootstrapping Multisig Coordination",
> and it could have been used for "homeDomain" and "inflationDest" as well.

In both of these protocols, the homedomain serving a stellar.toml is sufficient for the data to be served to the world. Furthermore, the homedomain isn't strictly needed as a separate DHT could include attestations for various accounts as to what their metadata is (incl routing to a homedomain).

> In the future, we can expect external services to introduce vendor-specific
> configuration as well. We can also expect new use cases to emerge from this
> feature; for example, I'm promoting the idea of storing webapp signatures on the
> ledger in the web standard milieu.

Storing them in the ledger, maybe, but in the account state, perhaps not.

MisterTicot commented Dec 11, 2018

It would be nice to have at least a solid rationale for deprecating data entries before continuing to push that proposal on other threads.

The statement that data is better off-chain is actually not true: on-chain data is a common requirement for dApps; otherwise they wouldn't be called decentralized.

Several services relying on data entries already exist, and this feature is actually valuable, so unless there's a critical flaw to fix, this operation must be maintained.

JeremyRubin commented Dec 11, 2018

I think the main points I'm interested in are:

  1. Storing them on chain is expensive, and needs to/will become more expensive over time. We may even explore making reserves quadratic with respect to number of entries (I'm not in favor for various reasons, but it's on the table).
  2. The protocol doesn't guarantee availability of the data (no validation state is value dependent)
  3. Off-chain data is sufficient, and in many cases, better (no size restrictions, no weird formatting requirements, etc) than on-chain data.
  4. Desire to store metadata about accounts which you don't have control over
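
To give a feel for point 1, the sketch below compares a linear reserve rule with one possible quadratic rule. The 0.5 XLM base reserve and the linear formula are assumptions about current network settings, and the quadratic formula is purely hypothetical; nothing here is a concrete proposal.

```python
BASE_RESERVE_XLM = 0.5  # assumed base reserve; check current network settings


def linear_min_balance(entries: int) -> float:
    """Linear rule: two base reserves plus one per ledger entry."""
    return (2 + entries) * BASE_RESERVE_XLM


def quadratic_min_balance(entries: int) -> float:
    """Hypothetical rule: the per-entry part grows with the square of the entry count."""
    return (2 + entries ** 2) * BASE_RESERVE_XLM


for n in (1, 10, 100):
    print(n, linear_min_balance(n), quadratic_min_balance(n))
```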

With respect to the argument that it's decentralized: that's a miscategorization. Data entries are actually more centralized than plain "information", which can be served by anyone with no central consistency. Getting Stellar data requires either running a validator or trust. Most of the protocols I've seen would be better off with a signed serialized XDR blob.

It's unfortunate that there are existing services using Data Entries, but I'm happy to help them figure out how to use a more pragmatic approach if they are incapable of doing so themselves.

MisterTicot commented Dec 11, 2018

Three of those points are opinions about how one should design one's software.

Now, that's clearly not a consensus opinion. You have the right to dislike that feature, but that's not a good reason to prevent everybody else from using it.

The point about data value not being enforced by the protocol is indeed an issue. That won't be solved by removing this op, though.

On-chain data storage has its use cases -- this is a known fact, and that's why blockchains implementing smart contracts also offer data storage capabilities. I don't think we can go forward without you acknowledging this fact.

JeremyRubin commented Dec 11, 2018

Yes, hence I've proposed deprecating but not removing. As stewards of the protocol, there's an obligation to encourage good software engineering.

I agree there is a need to store data on chain -- I'm not sure where you ever got the idea I don't believe that. However, I think that on-chain data storage should be minimally consumed, and I have yet to see a use case which truly benefited from using data entries specifically.

MisterTicot commented Dec 11, 2018

Well, the multisig coordination bootstrapping SEP is an excellent example. It is fairly well designed, by the way -- so I'm not even saying that protocol writers shouldn't encourage good design. What I'm saying is that your personal opinion about what is a good design is not necessarily shared by everybody, and that there are good designers out there who are using account entries for good reasons.

Now, we can play with words, but practically, if the feature is removed from Horizon before a better alternative is available, it will break things badly, and this is not good practice.

JeremyRubin commented Dec 13, 2018

I'd love to work towards an objective framework for smart contract data so that use cases and designs can be evaluated appropriately.

I'm trying in good faith to be as unopinionated on this issue as possible and to objectively evaluate current data entry practices, but I'm fallible, so sorry if you haven't felt that my objectivity is transparent. I very much want people to build easy-to-use, inexpensive, robust, etc. software and pave a forward-looking path for the Stellar network -- that's my only goal. Data Entries did not, as far as I know, kill my grandfather or something.

I think there are a couple of things we can think about w.r.t. data functionality that help us work towards a more concrete evaluation of the relative merits of approaches:

  1. Authentication
  2. Revocation
  3. Consistency
  4. Availability
  5. Data Structure
  6. Cost to Write
  7. Cost to Read
  8. Scalability
  9. Developer Ease of use
  10. API Stability
  11. Privacy
  12. Censorship Resistance

Are there other properties you care about? Naturally, I've selected, with bias, a set of issues where I think the benefits of off-chain data win (the causality is reversed, though: I think off-chain data wins because of these factors), but if there are categories I haven't considered, please share, and then we can work towards a more rigorous evaluation.

MisterTicot commented Dec 14, 2018

Of course! My point was not to cast doubt on your intentions but to insist that there are other people, equally motivated and competent, who think that data entries are needed. In fact, the discussion about the multisig coordination bootstrap SEP proved it.

I'd also underline the fact that blockchains are mostly unknown territory, and nobody knows beforehand which kind of mighty invention will come out of this or that functionality. But we definitely know that account data entries expose a unique set of properties that can't be found off-chain:

  • It respects the signers setup
  • It is public
  • It is immutable
  • Its history is immutable
  • It doesn't depend on external/centralized services
  • So it is trustless

These are really two different solutions, and there's no such thing as "on-chain is better" or "off-chain is better".

JeremyRubin commented Dec 16, 2018

Ok, so if I can synthesize between our posts:

  1. It respects the signers setup == Authentication
  2. It is public == Availability
  3. It is immutable == Authentication + Consistency
  4. Its history is immutable == Availability
  5. It doesn't depend on external/centralized services == Censorship Resistance + Developer Ease of Use
  6. So it is trustless == Censorship Resistance

So the 12 properties cover all the benefits you think that Data Entries bear?

MisterTicot commented Dec 16, 2018

I suppose, yes.

JeremyRubin commented Dec 17, 2018

For each of those topics I believe off chain data has preferable properties. It's not an exhaustive analysis, happy to go into more detail if you disagree on an individual category.

Authentication

Off-chain data may be signed by the current signers on an account as well as by third party authenticators (e.g., if a field specifies that a service provider is to be used, that service provider may sign to corroborate that the account is a customer). If using the homedomain for hashed content, the data is hash authenticated which is useful for accounts without signers.

On-chain data was signed by the current or prior signers on an account. Current signers may be unaware of the data entries set. There is a race condition to unset data entries after changing keys.

Revocation

Off chain data can be revoked by using a version number scheme (signing a statement that it's valid up to sequence number current + 1 billion and using sequence bump to speed up invalidation), TTL, or by rotating keys. It can be made irrevocable (in some sense) using hash authentication for accounts with no signers.

On chain data can be revoked by removing it from the account in question.
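
A sketch of the sequence-number scheme for off-chain revocation, with signatures omitted. The `Statement` type and helper names are hypothetical; only the "current + 1 billion" window comes from the description above.

```python
from dataclasses import dataclass

VALIDITY_WINDOW = 1_000_000_000  # "current + 1 billion"


@dataclass(frozen=True)
class Statement:
    payload: bytes
    valid_up_to_seq: int  # last account sequence number at which the statement is honored


def issue(payload: bytes, current_seq: int) -> Statement:
    """Create a statement that stays valid until the account sequence passes the window."""
    return Statement(payload, current_seq + VALIDITY_WINDOW)


def is_live(stmt: Statement, account_seq: int) -> bool:
    """A reader compares the statement against the account's current sequence number."""
    return account_seq <= stmt.valid_up_to_seq


stmt = issue(b"home_domain=example.org", current_seq=42)
live_now = is_live(stmt, account_seq=43)
# Revocation: the owner bumps the account sequence past the window.
live_after_bump = is_live(stmt, account_seq=stmt.valid_up_to_seq + 1)
```

Note that a reader still needs the account's current sequence number from the ledger, so this scheme leans on account state even though the data itself lives off-chain.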

Consistency

Off chain data consistency guarantees are strong given the account owners want consistency, e.g., they properly use version numbers when modifying data.

On chain data consistency guarantees are weak -- data is guaranteed to be consistent within an SCP quorum, but is not guaranteed to be consistent with what a transaction signer anticipated it to be when they created the transaction, due to interleavings.

Availability

Off chain data has no inherent availability guarantees. However, with proper mirroring infrastructure, off chain data can be made more available than the Stellar network itself, since it can be served via static file servers/edge caching infrastructure and requires reaching only one node which claims to have the data. Classic DHT literature applies...

On chain data is not guaranteed to be externally visible in the protocol. This may be amended, but is not the case now. The data is currently as available as a horizon node on the network. Without talking to a trusted set of horizon nodes, the data is fully unauthenticated (e.g. if served from caches), so we do not count this as a potential source of availability.

Data Structure

Off-chain data supports arbitrary data structures.

On-chain is currently limited to (char[64], char[64]) tuples.
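
For comparison, fitting an arbitrary blob into (char[64], char[64]) tuples means chunking it by hand. A sketch, with a made-up `prefix.NNNN` key naming scheme that is not part of any Stellar convention:

```python
MAX_LEN = 64  # both name and value are capped at 64 bytes


def to_entries(prefix: str, blob: bytes) -> list[tuple[str, bytes]]:
    """Split blob into (name, value) pairs that each fit the 64-byte limits."""
    entries = []
    for offset in range(0, len(blob), MAX_LEN):
        name = f"{prefix}.{offset // MAX_LEN:04d}"
        if len(name.encode()) > MAX_LEN:
            raise ValueError("entry name exceeds 64 bytes")
        entries.append((name, blob[offset:offset + MAX_LEN]))
    return entries


def from_entries(entries: list[tuple[str, bytes]]) -> bytes:
    """Reassemble the blob; zero-padded indices make name order equal chunk order."""
    return b"".join(value for _, value in sorted(entries))


blob = bytes(range(150))
entries = to_entries("doc", blob)
```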

Cost to Write

Off chain: the cost to write is very cheap and likely decreases over time.

On-chain: the cost to write is expensive, and likely to increase in price over time.

Cost to Read

Off-chain: the cost to read is very cheap and likely decreases over time

On-chain: the cost of reading trustlessly is maintaining full consensus with the Stellar network, which is likely to increase over time.

Scalability

Off-chain: scales relatively well, see classic DHT literature and availability of global CDNs.

On-chain: central bottleneck. Engenders a trade off between the number of accounts/users and the amount of data per account. Potential DoS vector.

Developer Ease of use

Off-chain: No need to know about fees and reserves, can have a one-click interface (once privkeys are loaded in) to deploy new metadata globally instantly. Data files could be queried by (account_id, key) pairs or by a homedomain specified server.

On-Chain: Need to know about fees and reserves and have enough for an account to add the relevant data entries. More restrictions on data formats. Supporting many different protocols makes merging accounts harder potentially. When critical updates need to be applied to many data entries (e.g., because of a hacked service) then there is a flood of transactions writing data entries which raises fee rates forcing accounts to wait with stale data.

API Stability

Off-chain: fine to support legacy features/versions forever as well as serve multiple versions for compatibility.

On-chain: protocols must be in step with SCP changes. E.g., if reserves increase protocols must increase it. If lengths change protocols must adapt. Etc. Difficult for multiple competing standards for things like namespaces to coexist.

Privacy

Off-chain: Queries can go through a data provider of one's choice (perhaps your own as well). Data files could be stored encrypted to a set of keys for those who should be able to decrypt/have 'authenticated' data servers for groups wanting to share metadata privately in-group.

On-Chain: Every query must go through horizon. Data entries in general can't be stored on the network encrypted. All writes are observed by all.

Censorship Resistance

Off-Chain: as censorship resistant as the internet at large.

On-Chain: as censorship resistant as the stellar network ( < internet).

MisterTicot commented Dec 17, 2018

Once again, on-chain vs off-chain is a limiting view because both solutions have their own sets of properties that can fit different use cases: this is why you couldn't find a proper off-chain alternative for the multisig bootstrapping SEP on the mailing list.

Your analysis is interesting, but if at this point you're still trying to prove that account data entries are useless, it means you did not take into account the peer input you asked for in the first place.

JeremyRubin commented Dec 17, 2018

I'm not sure what you mean by that. The formal process I'm trying to emulate here is something like:

  1. Statement of goal
  2. Proposed requirements/relevant properties
  3. Agreement on requirements and relevant properties
  4. Analysis of families of solution based on the requirements and properties
  5. Agreement on which famil(ies) of solutions offers the best trade offs
  6. Proposal of concrete solution in the agreed on families
  7. Agreement that proposed solution meets goal.

I'm not trying to prove that data entries are useless, I'm trying to show that they don't meet the requirements/properties we'd like to see out of a data solution.

I'm unclear on what suggests that I haven't taken peer input into account in this discussion. I used peer input to form the requirements and properties proposed in step 2 and the overall goal in step 1.

If you disagree with my conclusion in step 5 that off-chain solutions offer stronger benefits, then please extend the analysis in step 4.

If you don't have more to add at step 4, then I (or someone else who cares about this issue) will begin to draft something for step 6. The reason you haven't seen someone propose a proper alternative using off-chain data is that it hasn't been described yet -- protocol development takes time and there are limited engineering resources in general.

MisterTicot commented Dec 18, 2018

When someone decides to take the lead in solving an issue, I'd expect them
to understand, summarize and include peer input in their analysis. I hope you're
not intentionally leaving out arguments that don't go your way, but
unfortunately that's my impression.

In particular, Paul & I took the time to explain and demonstrate that account
data entries have a set of required properties that we are currently unable to
reproduce off-chain. This fact must figure in the issue analysis.

Also, as I repeatedly said, the present analysis has the bias of opposing two
complementary solutions. The angle is to tell which is better between on-chain
and off-chain solutions - with a clear intention of ruling on-chain out. That
leads to overlooking the actual complexity of the subject. Depending on the
situation, an application designer could choose one option or the other - or
both. The issue analysis must account for that possibility.

For example, step 4 is only right if you want to publish data without using
any other Stellar functionality. If you're building a dApp, you will use the
Stellar API anyway, so an off-chain solution comes with the burden of having
to use an external service on top of that. So that:

  • Hurts availability: availability of Stellar API is higher than availability of
    Stellar API + an off-chain solution.
  • Hurts API stability: you have to handle two APIs instead of one.
  • Hurts Developer ease of use: for the same reason.
  • Hurts consistency: you have to maintain consistency between two services.

Then there are use cases where privacy/cost to read/cost to
write/revocability/data size are not relevant. That's how you end up having some
kinds of applications that would be better handled with on-chain data.

As the analysis lacks any real-world scenario, it is easy to overlook elements
that would prove the need for an on-chain data solution. In fact, trustlessly
publishing off-chain data in relation to an account or a transaction requires
linking it somehow from the Ledger. Extending account configuration requires
on-chain data. And so on...
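
The linking described above can be as small as one digest per document. A sketch, assuming SHA-256 and an entry value that holds the raw 32-byte digest (both choices are assumptions, as are the function names):

```python
import hashlib
import hmac


def anchor_value(document: bytes) -> bytes:
    """Value to publish on the ledger: a 32-byte digest, well under the 64-byte limit."""
    return hashlib.sha256(document).digest()


def verify(document: bytes, on_chain_value: bytes) -> bool:
    """Check a document fetched from any off-chain source against the ledger entry."""
    return hmac.compare_digest(hashlib.sha256(document).digest(), on_chain_value)


document = b'{"name": "example dApp config", "version": 3}'
entry_value = anchor_value(document)
```

The document itself can then be served from anywhere, since a tampered copy fails `verify`.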

We also find a bias toward the off-chain conclusion in the fact that what is
compared is an ideal theoretical off-chain solution vs. an admittedly flawed
on-chain solution. The analysis must either compare an actual off-chain solution
with an actual on-chain solution, or an ideal on-chain solution with an ideal
off-chain solution.

I sincerely welcome your effort to get us toward a better understanding and
handling of ledger-related data publishing. However, the general impression I'm
getting from this first attempt is that it started from the conclusion (we must
remove data entries), and that the analysis has been written according to
that goal.

To summarize, the problematic points I see in the proposed analysis are:

  • It doesn't take into account existing practical use cases that demonstrate the
    need for data entries.
  • It opposes two options that are in fact complementary.
  • The off-chain solution that would have all the mentioned properties doesn't
    actually exist.
  • It doesn't mention that linking off-chain data from the ledger requires
    on-chain data anyway.
  • It is not so clear if the goal is to make a point or to provide an objective
    analysis of the issue.

I'd like those points to be addressed this way:

  • Taking into account that a form of account-linked data entry is required.
  • Re-orienting the analysis toward an evaluation of each solution's unique
    properties.
  • Linking the analysis to actual use cases to avoid getting lost in
    abstraction.
  • Clarifying which kind of off-chain solution we're talking about.
  • Clarifying how off-chain data is referred to from the ledger.
  • Clarifying whether the goal of this work is to find a solution that accounts for
    all use cases or to remove data entries no matter what.

pselden commented Dec 20, 2018

Seems like a popular use case of Stellar is to attach IPFS hashes to data entries: https://galactictalk.org/d/433-stellar-should-have-a-big-memo-or-data

Note: some in the thread are arguing that we should go even further with data entries.
