ERC-1900: Decentralized Type System for EVM #1882

Closed
loredanacirstea opened this issue Mar 28, 2019 · 25 comments
@loredanacirstea
Contributor

loredanacirstea commented Mar 28, 2019

The current draft can be found at https://eips.ethereum.org/EIPS/eip-1900

In-work implementation: https://github.com/pipeos-one/dType, along with a list of all related EIPs, articles and demo videos.

@loredanacirstea loredanacirstea changed the title ERC-xxxx Decentralized Type System for EVM ERC-1882 Decentralized Type System for EVM Mar 28, 2019
@OFRBG

OFRBG commented Mar 29, 2019

Kinda related: https://github.com/ewasm/design. I don't think it's desired to add overhead to L1 and L0. This would work better as a "TypeScript" for Solidity. Remix already includes some static analysis, which could be extended for an extended-type Solidity.

@loredanacirstea
Contributor Author

I don't think it's desired to add overhead to L1 and L0.

What is the overhead that you see? (so, I can properly answer)

@OFRBG

OFRBG commented Mar 29, 2019

Adding extra lines of code of structs and memory allocation. IMHO Solidity code should be as short as possible, and higher level abstractions, such as types and pseudo-HOF should be used in a different dialect that finally compiles to native Solidity.

@loredanacirstea
Contributor Author

@OFRBG ,
Regarding overhead: Solidity itself is an overhead over the bytecode. The question is not if an overhead exists, but whether it is justified or not. So, what did you understand as being the benefit of having a decentralized type system and why doesn't it justify this overhead? (I just published https://medium.com/@loredana.cirstea/a-vision-of-a-system-registry-for-the-world-computer-be1dc2da7cae if you want to read more about the vision).

In order to achieve functional programming, you need higher order functions (HOFs) in Solidity. You can have dType itself without the overhead of adding HOFs, but if multiple projects need the same libraries, it is an overhead to not standardize.

@kjekac

kjekac commented Mar 30, 2019

I really really like the general idea of this. Having a type system decoupled from the contract language would make language interoperability (and hence language experimentation) much easier, and generally just make all the data stored on-chain much easier to compute with.

However, I would personally want to detach this from anything C-like (struct, ...) and go straight to proper type theory, which would make things less language-bound, as well as giving us nice properties for free. For example, it would be relatively straightforward to find a normal form of a particular type, and thus be able to automatically convert data of one type to another, even though they were defined differently syntactically. It would also generally be easier to prove things about contracts, given that you would have fewer cases and very clear semantics.

I previously worked on a project called Typedefs and one of the long-term ideas there is to put it on e.g. IPFS as a kind of "global type system" that any data in any computing system can reference, to tell people/programs how it can be deconstructed and used. The whole project is built on an extremely simple core, it doesn't give you any primitive types except for Unit and Void (both in the functional programming sense, so not the C-like void), but using these together with combinators and recursion, you can build up types that are isomorphic to basically any type you could wish for. For example, in pseudocode:

Boolean := Unit + Unit
Char8 := Boolean × Boolean × Boolean × Boolean × Boolean × Boolean × Boolean × Boolean
String := (Char8 × String) + Unit

Of course, this is horribly inefficient in itself, but when writing a backend for a specific language/platform, you can define specializations to utilize the primitives and/or standard library types that you have available, as long as you can provide encode/decode functions between these and this minimal/universal representation. This makes the type system itself as language agnostic as possible, while still maintaining the full power of any modern type system. Building a backend for Solidity itself should be fairly trivial, the interesting thing would be how to make it "blockchain-aware", so to speak. I don't have time at the moment but will try to get back with a few thoughts on how to go about that in the coming week.
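
As a rough illustration of the idea above (not Typedefs' actual API), here is a minimal Python sketch. The names `Unit`, `Sum`, `Product`, `inhabits`, and the `encode_*`/`decode_*` helpers are all hypothetical, as is the choice of representing universal values as `()` for Unit, `("L" | "R", payload)` for a sum, and a tuple for a product. It shows a backend "specialization" in which the native `bool` and `int` stand in for `Boolean` and `Char8`, with encode/decode functions mapping to and from the minimal representation. The recursive `String` type is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Unit:
    """The type with exactly one value, written () below."""


@dataclass(frozen=True)
class Sum:
    """Co-product (choice): a value is either a left or a right."""
    left: Any
    right: Any


@dataclass(frozen=True)
class Product:
    """Product (tuple): a value has one component per part."""
    parts: Tuple[Any, ...]


# Boolean := Unit + Unit
BOOLEAN = Sum(Unit(), Unit())
# Char8 := Boolean x Boolean x ... (8 times)
CHAR8 = Product((BOOLEAN,) * 8)


def inhabits(value: Any, ty: Any) -> bool:
    """Check that a universal value is well-formed for the given type."""
    if isinstance(ty, Unit):
        return value == ()
    if isinstance(ty, Sum):
        if not (isinstance(value, tuple) and len(value) == 2):
            return False
        tag, payload = value
        if tag == "L":
            return inhabits(payload, ty.left)
        if tag == "R":
            return inhabits(payload, ty.right)
        return False
    if isinstance(ty, Product):
        return (
            isinstance(value, tuple)
            and len(value) == len(ty.parts)
            and all(inhabits(v, t) for v, t in zip(value, ty.parts))
        )
    return False


# Specialization: native bool <-> Boolean (assumed convention: False is left, True is right)
def encode_bool(b: bool) -> tuple:
    return ("R", ()) if b else ("L", ())


def decode_bool(v: tuple) -> bool:
    tag, _payload = v
    return tag == "R"


# Specialization: native int in [0, 255] <-> Char8, most significant bit first
def encode_char8(n: int) -> tuple:
    if not 0 <= n < 256:
        raise ValueError("Char8 holds values 0..255")
    return tuple(encode_bool(bool((n >> i) & 1)) for i in range(7, -1, -1))


def decode_char8(v: tuple) -> int:
    n = 0
    for bit in v:
        n = (n << 1) | int(decode_bool(bit))
    return n
```

A backend would work with the native `int` everywhere and only fall back to the universal form when it needs to compare or convert type definitions.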

@wires

wires commented Mar 31, 2019

Great comment by @kjekac, really well explained!

I see the comments regarding overhead at the lower levels, but let me try to twist that kind of thinking a bit.

Something as fundamental as the types of the inputs and outputs to functions can have a huge impact on the complexity at the higher levels. A well-chosen type theory can constrain behaviour and keep complexity in check. One unfortunate flaw in the design, and you feel it all over the place (https://developers.slashdot.org/story/09/03/03/1459209/Null-References-the-Billion-Dollar-Mistake).

Let me know if you have any questions, I second that it is a good idea and that typedefs can be applied to this.

@loredanacirstea
Contributor Author

@kjekac ,

For dType, our focus was:

  • full Ethereum compatibility
  • extensibility to other languages
  • concept-oriented rather than type-oriented
  • global consensus on types and data formats
  • easily determining the type of a transaction side effect or constant function value

There are some type theory features that dType does not have, due to Solidity's limitations:

  • converting a dtype into another dtype with different structure
  • choice operator
  • void and unit types
  • direct type inheritance (you have composability instead)
  • recursivity of type definition

However, I would personally want to detach this from anything C-like (struct, ...) and go straight to proper type theory, which would make things less language-bound, as well as giving us nice properties for free.

I understand why this would be great.
But for dType, we actually wanted it to be fully Ethereum compatible, so you can do things programmatically on-chain (e.g. our functional programming video example: https://youtu.be/pcqi4yWBDuQ). dType stores the minimum usable ABI definition information.

We also wanted the type to be contained in the ABI description. Meaning, for example, that anyone should be able to determine the type (& metadata, implementation libraries etc.) of a return value given that value, ABI definition & Type Registry.
This can be achieved with structs, but I am not sure how it could be achieved (and with what effort) with non-native types, that are transpiled to Solidity. Maybe you can help with an example.

For example, it would be relatively straightforward to find a normal form of a particular type, and thus be able to automatically convert data of one type to another, even though they were defined differently syntactically.

Yes, with dType you cannot convert from one type to another (only dtype <-> bytes). However, our approach is more concept-oriented than type-oriented. In a way, we want to incentivize consensus on concept definitions across projects, and convertibility was not a focus.

I previously worked on a project called Typedefs and one of the long-term ideas there is to put it on e.g. IPFS as a kind of "global type system" that any data in any computing system can reference, to tell people/programs how it can be deconstructed and used.

We thought of the same thing, extending the system to any language. Our current implementation has an additional lang descriptor for each type: https://github.com/ctzurcanu/dType/blob/5e71ee683a167bd1b796f6ea07c41407be54aa0f/contracts/contracts/dTypeLib.sol#L6-L9. The idea was to also provide type checking against the blockchain (e.g. isType(type_name) & destructure(type_value) are some initial tools that we have implemented).

And we have bytes32 source, which is the Swarm hash for the type's source code - either Solidity libraries and contracts or source code in other languages.

Boolean := Unit + Unit
Char8 := Boolean × Boolean × Boolean × Boolean × Boolean × Boolean × Boolean × Boolean
String := (Char8 × String) + Unit

While we have x (product/tuple) - structs are essentially packed tuples, we don't support + (co-product/choice operator), so we do not have optional type components for example - and I do not see an easy/efficient way of supporting this. If you have a solution (for Solidity), we would like to discuss it.

the interesting thing would be how to make it "blockchain-aware", so to speak. I don't have time at the moment but will try to get back with a few thoughts on how to go about that in the coming week.

I would be interested in this, thank you.

@loredanacirstea
Contributor Author

A draft has been submitted as a PR: #1900
An extension to this proposal can be found at #1921
I published a new blog post dType — Decentralized Type System & Functional Programming on Ethereum, explaining some of the concepts.

@loredanacirstea loredanacirstea changed the title ERC-1882 Decentralized Type System for EVM ERC-1900: Decentralized Type System for EVM Apr 11, 2019
@Arachnid
Contributor

A few thoughts:

  • The spec could use a clear definition of what a type is, and what it's composed of. As it's written we're left to infer that from the data structures.
  • It's not clear to me why you'd want a key/value datastore for each type you define (the 'Storage Contract'). What's the motivation here?
  • One interface can have many implementations, but there doesn't seem to be any distinction made between the two here.
  • Given that it seems likely that the database will mostly be read not by other contracts but by offchain tools, it would make sense to use a compact encoding scheme such as CBOR or Protocol Buffers to encode type information, for efficient storage.
  • You should define the purpose of the fields in the structs where they're first encountered, not down the bottom.
  • Making the type registry mutable introduces a large amount of additional complexity. Why not make it immutable, so that a type identifier can be guaranteed to be a stable reference to a type?
  • What is a type library for, and why does it have to serialize and deserialize types?
  • Requiring each type to be accompanied by a 'type root contract' seems like a lot of overhead, that will discourage defining new types.
  • The 'source' field needs more definition. What language is it? Where can it be found?
  • "Initially single word names will be disallowed, to avoid name squatting" - are human-readable names going to be primary identifiers here? Can you specify that explicitly somewhere? Why is it necessary that a human-readable type name be unique (or even part of the canonical definition of the type)?

loredanacirstea added a commit to loredanacirstea/EIPs that referenced this issue Jun 27, 2019
Updating ERC-1900 after suggestions from ethereum#1882 (comment):
- clearer explanation of what a type is and what it's composed of
- defining the purpose of the struct fields when they are first encountered
- mentioning type immutability and the possibility of removing the dType `remove` function
- adding more information about the `source` field
- mentioning human readable names as a primary identifier for types

Additionally:
- removed the TypeStorage contract description, postponing it for a future EIP, due to multiple storage patterns being researched
@loredanacirstea
Contributor Author

@Arachnid, thank you for the feedback!

I updated the draft and restructured it, to solve the following points from #1882 (comment):

  • (1) clearer explanation of what a type is and what it's composed of
  • (2) I removed the TypeStorage contract description, postponing it for a future ERC (currently researching multiple storage patterns)
  • (5) defining the purpose of the struct fields when they are first encountered
  • (6) mentioning type immutability and the possibility of removing the dType remove function
  • (7) clearer explanation regarding the type library and structureBytes/destructureBytes
  • (8) TypeRootContract will be the type library address in the context of this ERC.
  • (9) adding more information about the source field
  • (10) mentioning human-readable names (+ version number) as a primary identifier for types

Replying to your other points:

(2) It's not clear to me why you'd want a key/value datastore for each type you define (the 'Storage Contract'). What's the motivation here?

We would like to make Solidity more functional and that means that the data should be harmonized (well formatted) and kept in the same place.
We are thinking about other storage patterns as well - instead of one smart contract, distributed among many user contracts.

(3) One interface can have many implementations, but there doesn't seem to be any distinction made between the two here.

Are you referring to the dType registry interface and implementation or to the type library?

(4) Given that it seems likely that the database will mostly be read not by other contracts but by offchain tools, it would make sense to use a compact encoding scheme such as CBOR or Protocol Buffers to encode type information, for efficient storage.

We expect most of the data to be used on-chain as well.
There will be exceptions where a type field will be bytes, using a compact encoding scheme: when we are defining a type for another language than Solidity. In this case, that type will be fully defined (in an un-encoded way, with all sub-fields) in the Swarm source file referenced in source.

(6) Making the type registry mutable introduces a large amount of additional complexity. Why not make it immutable, so that a type identifier can be guaranteed to be a stable reference to a type?

That is a good idea.
Immutability will follow once it becomes precompiled. It may also save a significant amount of gas. But that is part of subsequent ERCs.

(8) Requiring each type to be accompanied by a 'type root contract' seems like a lot of overhead, that will discourage defining new types.

There is a lot of overhead involved, but it also makes things very well defined in case the community votes for the inclusion of the new types in precompiles. Afterwards, the overhead shrinks considerably. Nevertheless, we are defaulting the TypeRootContract address to the type library address for this ERC and moving this discussion to a future ERC detailing storage patterns for type data.

(9) The 'source' field needs more definition. What language is it? Where can it be found?

language and source fields were to be treated in detail in a future ERC. language = 0 for Solidity and it will be an enum. source is a pointer to the source code in the set language on Swarm or another distributed file system.

(10) "Initially single word names will be disallowed, to avoid name squatting" - are human-readable names going to be primary identifiers here? Can you specify that explicitly somewhere? Why is it necessary that a human-readable type name be unique (or even part of the canonical definition of the type)?

For naming, we should probably adopt a capitalized camelback standard. The id is calculated as keccak(language, name), so that we are prepared for future adoption of data bridges between types in different languages that have the same name. We would like to treat declarations in Solidity of the form:

TypeName varOfType = <instance data>;

A search on the dType registry would be:

// language 0 is Solidity
find({name: "TypeName", language: 0})

This search will return the address of the library and the id of the type, and set the correct data structure. Potentially a precompile could be run by a new opcode to do the same thing (not covered in this ERC).
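
To make the lookup described above concrete, here is a minimal off-chain sketch in Python. It is only an illustration under stated assumptions: `TypeEntry`, `Registry`, `type_id`, and the `library_address` field are hypothetical names, the byte layout hashed for the id is invented, and `hashlib.sha3_256` is used as a stand-in because Python's standard library has no keccak256 (the two differ in padding, so the digests will not match on-chain values).

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional

SOLIDITY = 0  # language 0 is Solidity, as in the search example above


def type_id(language: int, name: str) -> bytes:
    """Derive a 32-byte id from (language, name).

    ASSUMPTION: sha3_256 stands in for Ethereum's keccak256, and the
    one-byte language prefix is an invented encoding for this sketch.
    """
    payload = language.to_bytes(1, "big") + name.encode("utf-8")
    return hashlib.sha3_256(payload).digest()


@dataclass(frozen=True)
class TypeEntry:
    name: str
    language: int
    library_address: str  # hypothetical: address of the type library


class Registry:
    """In-memory stand-in for the dType registry, keyed by type id."""

    def __init__(self) -> None:
        self._by_id: Dict[bytes, TypeEntry] = {}

    def insert(self, entry: TypeEntry) -> bytes:
        key = type_id(entry.language, entry.name)
        if key in self._by_id:
            raise KeyError(f"type {entry.name!r} already registered")
        self._by_id[key] = entry
        return key

    def find(self, name: str, language: int) -> Optional[TypeEntry]:
        """Return the entry for (name, language), or None if unknown."""
        return self._by_id.get(type_id(language, name))
```

Because the language is part of the hashed payload, the same name registered under two languages yields two distinct ids, which is what would leave room for the cross-language bridges mentioned above.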

@Arachnid
Contributor

We would like to make Solidity more functional and that means that the data should be harmonized (well formatted) and kept in the same place.
We are thinking about other storage patterns as well - instead of one smart contract, distributed among many user contracts.

What does this mean? Can you give an example use-case?

(3) One interface can have many implementations, but there doesn't seem to be any distinction made between the two here.

Are you referring to the dType registry interface and implementation or to the type library?

No, I'm talking about the distinction between a type definition and an implementation. For example, the ERC20 standard defines some types, which are implemented by a large number of different contracts. Capturing the ability for a type to have many implementations seems like a pretty basic feature to support.

We expect most of the data to be used on-chain as well.

Can you give an example use-case for using type data onchain?

Immutability will follow once it becomes precompiled. It may also save a significant amount of gas. But that is part of subsequent ERCs

But why support mutable types at all? Once it changes, it stops being the same type; the type's fields etc are fundamental to what it is. Any change also likely breaks compatibility with anything using it.

language and source fields were to be treated in detail in a future ERC. language = 0 for Solidity and it will be an enum. source is a pointer to the source code in the set language on Swarm or another distributed file system.

If you don't define how these are used in the ERC where they're declared, it seems likely that you'll never be able to use them coherently, because there will be no universal expectation over the content stored in them.

For naming, we should probably adopt capitalized camelback standard. The id is calculated as a keccak(language, name) so we prepare for future adoption of data bridges between types in different languages that have the same name. We would like to treat declarations in Solidity of the form:

It's not clear to me why you need human names as primary identifiers at all. Why not use the typehash, just like Solidity does for function signatures? This addresses both mutability and name collisions. If you must have a human readable identifier, you could use ENS, for instance, to point a human readable name to a typehash.

@loredanacirstea
Contributor Author

@Arachnid ,

What does this mean? Can you give an example use-case?

Published an in-work draft #2158 for the storage extension, to clarify motivation. Discussions at #2157.

@loredanacirstea
Contributor Author

@Arachnid ,

No, I'm talking about the distinction between a type definition and an implementation. For example, the ERC20 standard defines some types, which are implemented by a large number of different contracts. Capturing the ability for a type to have many implementations seems like a pretty basic feature to support.

Devs have the freedom to implement type helper functions, as long as the required ones are implemented (to be discussed). As for the definition, I am open to other proposals that are not necessarily based on structs. I am actually trying to have optional subtypes, that can be stored with map. dType registry will be extended with an additional optionals field in the dType struct & a standardized way to define optionals in the Type Library, but this is not ready yet. Other than this, what more can I do to give more freedom to the implementation?

Can you give an example use-case for using type data onchain?

  • the registry must check that a type's subtypes are already part of the registry, so it must contain references to them.
  • on-chain calculation of a type's signature, given the type identifier (especially for functions), is more secure and should be the go-to reference
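
The two points above can be sketched off-chain in Python as follows. This is an illustration only, not the dType contract code: `TypeRegistry`, `PRIMITIVES`, and the tuple-style signature format are assumptions made for the example. Insertion rejects a type whose subtypes are not yet registered, and the signature is computed from the type identifier by expanding subtypes recursively.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: a small set of base types treated as always available.
PRIMITIVES = {"uint256", "address", "string", "bytes32", "bool"}

Field = Tuple[str, str]  # (type name, label)


class TypeRegistry:
    """Toy registry that enforces 'subtypes must already be registered'."""

    def __init__(self) -> None:
        self._types: Dict[str, List[Field]] = {}

    def insert(self, name: str, fields: List[Field]) -> None:
        if name in PRIMITIVES or name in self._types:
            raise ValueError(f"type {name!r} already exists")
        for field_type, _label in fields:
            # Each subtype must be a primitive or a previously inserted type.
            if field_type not in PRIMITIVES and field_type not in self._types:
                raise ValueError(f"unknown subtype {field_type!r}")
        self._types[name] = list(fields)

    def signature(self, name: str) -> str:
        """Expand a type name into a tuple-style signature of primitives."""
        if name in PRIMITIVES:
            return name
        parts = [self.signature(field_type) for field_type, _ in self._types[name]]
        return "(" + ",".join(parts) + ")"


registry = TypeRegistry()
registry.insert("Point", [("uint256", "x"), ("uint256", "y")])
registry.insert("Segment", [("Point", "start"), ("Point", "end")])
print(registry.signature("Segment"))  # ((uint256,uint256),(uint256,uint256))
```

Since every subtype is checked at insertion time, `signature` never meets a dangling reference for a registered type, which is the property the first bullet relies on.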

I have some in-work examples in the dType repo with on-chain permissions control based on the dType registry identifiers. E.g. fine-grained function permissions, that can also be used in the storage contracts. And in-work patterns for functional programming, which use function dType identifiers to mimic more complex HOFs.

But why support mutable types at all? Once it changes, it stops being the same type; the type's fields etc are fundamental to what it is. Any change also likely breaks compatibility with anything using it.

I removed the update function because it was an artifact from a previous version. Indeed, types should not be modified. I left the remove function in place, for discussion. Is this ok from your side?

If you don't define how these are used in the ERC where they're declared, it seems likely that you'll never be able to use them coherently because there will be no universal expectation over the content stored in them.
(language and source fields)

I started a draft at https://github.com/loredanacirstea/EIPs/blob/d6fbbff5f1a1ecfa1eee6f8efa4ca3d896303e38/EIPS/eip-dtype_language.md with details. I will make a PR soon. We wanted to separate the ERCs because some devs may agree with dType core but not with the language extension. The source field has a purpose by itself in ERC-1900, as specified. Let us know what you think.

It's not clear to me why you need human names as primary identifiers at all. Why not use the typehash, just like Solidity does for function signatures? This addresses both mutability and name collisions. If you must have a human readable identifier, you could use ENS, for instance, to point a human readable name to a typehash.

Our initial version was using keccak256(language, name, types) as an identifier. The drawbacks of this:

  • the same type might have some differences when used cross-language (see language extension draft, at "Encoded Types for Language-specific Use")
  • devs might want to get the type definition (& ABI) by name. I can see this used in editor plugins, that can have a small cache for type names without storing the entire data. Also, when searching for a type, duplicate names can be confusing - names should be properly defined after their purpose & content + a version number (specs TBD).

@Arachnid
Contributor

Arachnid commented Jul 1, 2019

Devs have the freedom to implement type helper functions, as long as the required ones are implemented (to be discussed). As for the definition, I am open to other proposals that not necessarily based on structs. I am actually trying to have optional subtypes, that can be stored with map. dType registry will be extended with an additional optionals field in the dType struct & a standardized way to define optionals in the Type Library, but this is not ready yet. Other than this, what can I do more to give more freedom to the implementation?

I still think you're not understanding the difference between an interface and an implementation. An interface describes an API for other code to interact with, but not how it's implemented. What you're proposing here seems to be more along the lines of a directory of library code.

the registry must check that a type's subtypes are already part of the registry, so it must contain references to them.

If a type's ID is based on a serialization of its fields, then types are immutable and this doesn't matter; types can be inserted into the registry in any order.
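
A minimal Python sketch of this content-addressed approach, under stated assumptions: the `label:ref;` serialization, the `type_hash` helper, and the `ImmutableRegistry` class are all hypothetical, and `hashlib.sha3_256` stands in for keccak256 (the digests differ from Ethereum's). Because a subtype is referenced by its hash, a composite type's id can be computed before the subtype is ever inserted.

```python
import hashlib
from typing import Dict, List, Tuple

# (label, reference): the reference is a primitive name or another type's hex id
Field = Tuple[str, str]


def type_hash(fields: List[Field]) -> str:
    """Id of a type = hash of a serialization of its fields.

    ASSUMPTION: sha3_256 stands in for keccak256, and the 'label:ref;'
    format is a toy encoding (it does not escape ':' or ';' in labels).
    """
    serialized = "".join(f"{label}:{ref};" for label, ref in fields)
    return hashlib.sha3_256(serialized.encode("utf-8")).hexdigest()


class ImmutableRegistry:
    """Append-only registry: an id always maps to the same definition."""

    def __init__(self) -> None:
        self._types: Dict[str, Tuple[Field, ...]] = {}

    def insert(self, fields: List[Field]) -> str:
        tid = type_hash(fields)
        # Re-inserting an identical definition is a no-op; there is no update or remove.
        self._types.setdefault(tid, tuple(fields))
        return tid

    def ids(self) -> set:
        return set(self._types)


point = [("x", "uint256"), ("y", "uint256")]
point_id = type_hash(point)  # known before any insertion
segment = [("start", point_id), ("end", point_id)]

first = ImmutableRegistry()
first.insert(point)
first.insert(segment)

second = ImmutableRegistry()
second.insert(segment)  # subtype not inserted yet, still fine
second.insert(point)

print(first.ids() == second.ids())  # True
```

With this design no permission model is needed for updates, since changing any field produces a different id rather than mutating an existing entry.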

on-chain calculation of a type's signature, given the type identifier (especially for functions), is more secure and should be the go-to reference

Can you qualify 'more secure'? What's your threat model?

I removed the update function because it was an artifact from a previous version. Indeed, types should not be modified. I left the remove function in place, for discussion. Is this ok from your side?

Removing a type seems like it would cause chaos if it's in use somewhere, and still requires you to maintain a permission model.

The source field has a purpose by itself in ERC-1900, as specified. Let us know what you think.

EIP 1900 currently says:

source - a bytes32 Swarm hash where the source code of the type library and contracts can be found; in future EIPs, where dType will be extended to support other languages (e.g. JavaScript, Rust), the file identified by the Swarm hash will contain the type definitions in that language.

I see several problems with this:

  • It ties the system indelibly to Swarm, and to Keccak256 hashing.
  • It doesn't indicate a content-type, or any other metadata.
  • There's no clear mechanism for someone to determine what language the code is in, or what format the file is.

I'm also not sure what the motivation here is; why would anyone need to fetch the source in this context?

It seems to me that this spec is a long way away from being a simple distributed type registry. It has a lot of unnecessary complexity, and doesn't make a clear distinction between interfaces and implementations. I wish you luck, but I don't plan to offer further technical feedback.

@loredanacirstea
Contributor Author

loredanacirstea commented Jul 2, 2019

@Arachnid ,

I still think you're not understanding the difference between an interface and an implementation. An interface describes an API for other code to interact with, but not how it's implemented. What you're proposing here seems to be more along the lines of a directory of library code.

I asked you before: "Are you referring to the dType registry interface and implementation or to the type library?", to which you answered with "No, I'm talking about the distinction between a type definition and an implementation". I suggest you pay attention to how clearly you phrase your questions before being unsatisfied with the answer.
I thought it was clear that this ERC aims to both:

So aside from your destructive (as opposed to constructive) and your inexact criticism, what can I do? Is ERC not the correct category? EIP-1 definition is "ERC - application-level standards and conventions", so it seems right.

Can you qualify 'more secure'? What's your threat model?

On-chain calculation is the standard. Any off-chain calculation may have a faulty implementation. You need a standard to compare off-chain implementations - why is this unclear? The threat model is the same as for blockchain vs. off-chain data & behavior.

Removing a type seems like it would cause chaos if it's in use somewhere, and still requires you to maintain a permission model.

Correct. I am ok with removing the remove function, but I want more debate on it, from multiple parties.

It ties the system indelibly to Swarm, and to Keccak256 hashing.

It's a bytes32 identifier. Any storage solution that is compatible with bytes32 will do. If multiple storages are used, then we need (at least) another field to describe which one is used. I wanted to keep it simple for now and open to suggestions.

  • It doesn't indicate a content-type, or any other metadata.
  • There's no clear mechanism for someone to determine what language the code is in, or what format the file is.

But tell me a way for the EVM to check something that is off-chain. You know very well it cannot and this is not constructive criticism. We can, however, replace the bytes32 source hash with an EthPM package identifier that contains the bytecode and a source metadata hash (containing the file extension).
For a unique registry, like we are proposing, we still need governance on top to verify correctness and consensus.

I'm also not sure what the motivation here is; why would anyone need to fetch the source in this context?

Non-centralized source code verification (see EthPM).

It seems to me that this spec is a long way away from being a simple distributed type registry. It has a lot of unnecessary complexity and doesn't make a clear distinction between interfaces and implementations.

To summarize, I understand the "unnecessary complexity" as making the type ABI computable on chain, as opposed to blackboxing it to a packed encoding that the EVM cannot decode. In this case, the complexity is beneficial and worth the overhead (if used by many, the overhead decreases).
Otherwise, please define "unnecessary complexity" in the context of the ERC.

I wish you luck, but I don't plan to offer further technical feedback.

This is fine, I thank you for the effort and time. However, you do not present a way forward.
As per EIP-1 (https://github.com/ethereum/EIPs/blob/27ea3a138b8d54b9657d518cc4fac7d8fe8b3dfc/EIPS/eip-1.md#eip-editors), you are an editor with the following responsibilities:

  • "Read the EIP to check if it is ready: sound and complete. The ideas must make technical sense, even if they don't seem likely to get to final status."
  • "The editors don't pass judgment on EIPs. We merely do the administrative & editorial part."

You did not say that this ERC does not make technical sense. Your opposition is currently "unnecessary complexity", which you did not define properly. I am (and have been) open to improving anything that is not technically sound.

Therefore, I do not see a reason to deny merging of this ERC Draft to master. It is work in progress, as defined in EIP-1: "Once the first draft has been merged, you may submit follow-up pull requests with further changes to your draft until such point as you believe the EIP to be mature and ready to proceed to the next status."

If you do not want to approve and merge due to technical reasons, please clearly list what they are and what editor to ping when I solve them.
If you do not want to approve and merge due to non-technical reasons, conflicts of interest etc., please ping/appoint another editor for this job.

However, if you are not in your editorial capacity (and you need to specify that, as you are on the editor's list), we welcome debate and new ideas from a technical person, like yourself and others.

This ERC is important for Ethereum. Saying that you refuse to give further feedback without explaining why, goes against the ethos of the community: collaboration, effort decentralization and evolution of computing.
I have no commercial interest in this. I am a volunteer working for these values.

loredanacirstea added a commit to loredanacirstea/EIPs that referenced this issue Jul 2, 2019
Updating ERC-1900 after suggestions from ethereum#1882 (comment):
- clearer explanation of what a type is and what it's composed of
- defining the purpose of the struct fields when they are first encountered
- mentioning type immutability and the possibility of removing the dType `remove` function
- adding more information about the `source` field
- mentioning human readable names as a primary identifier for types

Additionally:
- removed the TypeStorage contract description, postponing it for a future EIP, due to multiple storage patterns being researched
@Arachnid
Contributor

Arachnid commented Jul 10, 2019

@loredanacirstea I was offering technical feedback as an individual contributor, not as an editor. As you've decided to ignore nearly all of my feedback, wasting time on giving more of it seems pointless.

I never suggested I was acting as an editor, or gating merging the draft based on my technical critique.

@loredanacirstea
Contributor Author

@Arachnid, you are on the EIP editor list, you are by default in an editor capacity, you should specify when you are not, when commenting.

To recap:

  • I did a substantial rewording in 1adf3de, based on your suggestions.
  • I answered all your questions and provided explanations where our views differed.
  • I published an additional ERC that I had previously wanted to postpone, just because I wanted to answer your questions better: the Storage Extension (Add ERC - dType Storage Extension - Decentralized Type System for EVM #2158), even though I removed mentions of it from the current ERC.
  • I reworded ERC-1900 and started a draft of another ERC that I had previously wanted to postpone, the Language Extension (ERC-1900: Decentralized Type System for EVM #1882 (comment)), just because you said: "If you don't define how these are used in the ERC where they're declared, it seems likely that you'll never be able to use them coherently because there will be no universal expectation over the content stored in them."

I think my effort was more than enough to demonstrate that I did not ignore your feedback, but appreciated it.

You say that you wasted your time because I did not necessarily agree with all of your suggestions, while I did provide arguments.
Still, you did not find the time to review this ERC as an editor. An ERC that you already read, never said that it was a bad idea, was abiding by EIP-1 rules and that could be improved and discussed after merging it as a Draft and getting an identifier.

These are facts.

@Arachnid
Contributor

you are on the EIP editor list, you are by default in an editor capacity, you should specify when you are not, when commenting.

As I've said before in other PRs, editors are not primarily technical reviewers of EIPs, because that's not scalable. When I'm making requests for changes in a PR in order to merge it, I'm acting as an editor. When I'm discussing technical proposals in general, I'm clearly not - because that's not part of an editor's job.

Still, you did not find the time to review this ERC as an editor. An ERC that you already read, never said that it was a bad idea, was abiding by EIP-1 rules and that could be improved and discussed after merging it as a Draft and getting an identifier.

I only saw this issue because you drew my attention to it via Twitter. I didn't notice the associated PR, or I would have merged it as draft as soon as it met the typographical requirements.

I'll make one more attempt at clarifying a couple of misunderstandings:

I asked you before: "Are you referring to the dType registry interface and implementation or to the type library?", to which you answered with "No, I'm talking about the distinction between a type definition and an implementation". I suggest you pay attention to how clearly you phrase your questions before being unsatisfied with the answer.

This is a type:

interface IFoo {
  // Interface functions must be declared external in Solidity.
  function doFoo(uint x) external view returns (uint);
}

This is an implementation:

// Solidity expresses "implements" with the `is` keyword.
contract Foo is IFoo {
  function doFoo(uint x) external view returns (uint) {
    return x * 2;
  }
}

It's possible for one type to have many implementations - for example, ERC20.

A type library would specify common interfaces that implementations can conform to, and allow consumers to know what implementations support those interfaces.

This EIP claims to be a 'decentralised type system', but it lacks any distinction between a type and implementations of that type. Based on our back-and-forth, it seems like what you're really trying to build is a repository of library code. Which is fine - but then you should rename the EIP and reword accordingly.

Can you qualify 'more secure'? What's your threat model?

On-chain calculation is the standard. Any off-chain calculation may have a faulty implementation. You need a standard to compare off-chain implementations - why is this unclear? The threat model is the same as for blockchain vs. off-chain data & behavior.

I don't see how this is a security issue. If you write a standard for how to generate a hash from a type specification, you can provide standard test vectors, and anyone can implement that. Changing how you store data onchain so that you can do that inside Solidity buys you some convenience, but at the cost of a lot of additional storage overhead and gas costs.
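To make that concrete, here is a minimal off-chain sketch in Python of what such a standard could look like. Everything in it is an assumption for illustration: the canonical string format, the helper names `canonical_signature` and `type_hash`, and the example `Point` type are hypothetical and not part of this EIP. Note also that `hashlib.sha3_256` is NIST SHA3-256, which is not the Keccak-256 used by the EVM; it stands in here only to keep the sketch dependency-free.

```python
import hashlib


def canonical_signature(name, fields):
    """Render a type spec as one canonical string, e.g. 'Point(int32 x,int32 y)'."""
    inner = ",".join(f"{type_name} {label}" for type_name, label in fields)
    return f"{name}({inner})"


def type_hash(name, fields):
    """Hash the canonical string of a type spec and return the digest as hex."""
    # ASSUMPTION: SHA3-256 is a stand-in. A real standard for the EVM would
    # most likely specify Keccak-256, which the Python stdlib does not provide.
    data = canonical_signature(name, fields).encode("utf-8")
    return hashlib.sha3_256(data).hexdigest()


# Hypothetical example type; field order is part of the type's identity.
point = [("int32", "x"), ("int32", "y")]
print(canonical_signature("Point", point))
print(type_hash("Point", point))
```

Published (spec, hash) pairs produced this way would serve as the test vectors: any off-chain implementation can be checked against them without touching contract storage.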

It's a bytes32 identifier. Any storage solution that is compatible with bytes32 will do. If multiple storages are used, then we need (at least) another field to describe which one is used. I wanted to keep it simple for now and open to suggestions.

Assuming the goal is for a consumer to be able to fetch the content associated with this field, the data you have here is insufficient for that, unless you specify that it must, for example, refer to a Swarm manifest content hash. If you leave this unspecified, people will use it inconsistently, and nobody will be able to write code to fetch it with confidence that it will actually work.

If you do not want to specify it here, you should leave it out - anything else will result in useless overhead, because it is underspecified.

@chriseth
Contributor

chriseth commented Sep 3, 2019

I think Ethereum is in need of more coupling between the on-chain and the off-chain world, especially, but not limited to, the connection between the high- or source-level concepts of a smart contract and the on-chain bytecode part. Being able to directly refer to the type of a contract (or a struct being defined in that contract) residing at a certain address would be really nice. Due to the metadata hash, this should in principle be possible, but we still do not have an easy to use storage solution.

I have the feeling that this proposal could be slimmed down a little, or at least could be specified in different "feature stages" and I'm not sure whether such data should be stored in storage, but it is certainly going in the right direction.

Also, I see the point that a decentralized type system is only half the fun without support from compilers. The problem is that we drew the boundary for the Solidity compiler at bytecode generation, i.e. excluding deployment and excluding any kind of networking access. The benefit here is that it makes the compiler more stable and the builds reproducible, but it makes features like on-chain type discovery or type-auto-registration impossible as a central language feature. Maybe we could discuss how type registries could be used in Solidity through "compiler drivers" by supplying the necessary data to the still offline and deterministic compiler.

@loredanacirstea
Contributor Author

loredanacirstea commented Sep 10, 2019

@Arachnid, in response to your comment:

All I am asking for is a practical end to end example of how someone would use DType - how you define a type, implement it, and consume it.

Note: these proposals are not optimized in terms of storage or gas costs.

1) EIP-1900:

Define/implement the types in a library:

library GeopointLib {

    struct Longitude {
        int32 longitude;
    }
    
    struct Latitude {
        int32 latitude;
    }

    struct Geopoint {
        Longitude longitude;
        Latitude latitude;
        bytes32 identifier;
        // other fields
    }
    
    // type rules for each type, if needed
    // other helper functions e.g. insert(self, Geopoint memory geopoint) ...
}

Register the type with dType, by sending this info to the type registry:

{
      "name": "Longitude",
      "types": [
          {"name": "int32", "label": "longitude", "relation": 0, "dimensions":[]}
      ],
      "lang": 0,
      "typeChoice": 0,
      "contractAddress": "0x0000000000000000000000000000000000000001",
      "source": "0x0000000000000000000000000000000000000000000000000000000000000001"
}
{
      "name": "Latitude",
      "types": [
          {"name": "int32", "label": "latitude", "relation": 0, "dimensions":[]}
      ],
      "lang": 0,
      "typeChoice": 0,
      "contractAddress": "0x0000000000000000000000000000000000000001",
      "source": "0x0000000000000000000000000000000000000000000000000000000000000001"
}
{
      "name": "Geopoint",
      "types": [
          {"name": "Longitude", "label": "longitude", "relation": 0, "dimensions":[]},
          {"name": "Latitude", "label": "latitude", "relation": 0, "dimensions":[]},
          {"name": "bytes32", "label": "identifier", "relation": 0, "dimensions":[]}
      ],
      "lang": 0,
      "typeChoice": 0,
      "contractAddress": "0x0000000000000000000000000000000000000001",
      "source": "0x0000000000000000000000000000000000000000000000000000000000000001"
}

Any dev can now use the type by importing the Geopoint type library:

import 'GeopointLib.sol';

contract DevContract {
    using GeopointLib for GeopointLib.Longitude;
    using GeopointLib for GeopointLib.Latitude;
    using GeopointLib for GeopointLib.Geopoint;
}

2) EIP-1921 - Functions Extension

Implement functions that can handle the type, e.g.:

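library GeopointUtils {
    // Struct types are qualified with the library that declares them.
    function calculateDistance(
        GeopointLib.Geopoint memory geopoint1,
        GeopointLib.Geopoint memory geopoint2
    )
        public
        pure
        returns (GeoMathLib.Distance memory distance)
    {
        // ...
    }
}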

, where Distance is another type, e.g.:

library GeoMathLib {
    struct Distance {
        uint256 meters;
    }
}

The calculateDistance function is registered with dType:

{
      "name": "calculateDistance",
      "types": [
          {"name": "Geopoint", "label": "geopoint1", "relation": 0, "dimensions":[]},
          {"name": "Geopoint", "label": "geopoint2", "relation": 0, "dimensions":[]}
      ],
      "outputs": [
          {"name": "Distance", "label": "distance", "relation": 0, "dimensions":[]}
      ],
      "lang": 0,
      "typeChoice": 0,
      "contractAddress": "0x0000000000000000000000000000000000000002",
      "source": "0x0000000000000000000000000000000000000000000000000000000000000002"
}

Devs can use it:

import 'GeopointUtils.sol';

contract DevContract {
    using GeopointLib for GeopointLib.Geopoint;
    using GeoMathLib for GeoMathLib.Distance;
    using GeopointUtils for GeopointLib.Geopoint;

    // in a function
        geo1.calculateDistance(geo2);
}

3) EIP-2157 - Storage Extension

Optionally, devs can define a storage contract for Geopoint, referenced in the dType registry, in contractAddress. E.g.:

contract GeopointStorage is StorageBase {
    using GeopointLib for GeopointLib.Geopoint;

    mapping(bytes32 => Type) public typeStruct;

    struct Type {
        GeopointLib.Geopoint data;
        uint256 index;
    }
    
    function insert(GeopointLib.Geopoint memory data) public returns (bytes32 hash) {}
    function update(bytes32 hashi, GeopointLib.Geopoint memory data) public returns(bytes32 hash) {}
    function remove(bytes32 hash) public returns(uint256 index) {}
    function isStored(bytes32 hash) public view returns(bool isIndeed) {}
    function getByHash(bytes32 hash) public view returns(GeopointLib.Geopoint memory data) {}
    
}

The ABI for this contract is deterministic - we know the Geopoint type's ABI from dType and the function names and arguments are standardized.

Any dev can reuse this contract for storing data.
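As a rough illustration of why the ABI is deterministic, the canonical function signatures of such a storage contract can be derived from nothing but the stored type's ABI tuple. The sketch below is off-chain Python, not part of any dType tooling; the helper name `storage_signatures` is hypothetical, and the function names are simply the ones from the GeopointStorage contract above.

```python
def storage_signatures(type_tuple):
    """Canonical ABI signatures of the standardized storage functions.

    `type_tuple` is the ABI tuple notation of the stored type; structs are
    written as parenthesized lists of their member types.
    """
    return [
        f"insert({type_tuple})",
        f"update(bytes32,{type_tuple})",
        "remove(bytes32)",
        "isStored(bytes32)",
        "getByHash(bytes32)",
    ]


# Geopoint = { Longitude{int32}, Latitude{int32}, bytes32 identifier }
GEOPOINT_TUPLE = "((int32),(int32),bytes32)"

for signature in storage_signatures(GEOPOINT_TUPLE):
    print(signature)
```

Each 4-byte function selector would then be the first four bytes of the Keccak-256 hash of the corresponding signature string; that step is left out here because the Python standard library has no Keccak-256.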

4) EIP-2193 - Alias

With EIP-1900 & EIP-2157, you can have human readable identifiers for each data item. E.g. Geopoint.Berlin, Bob@Geopoint.

Regarding the comment:

It's not clear to me why you need human names as primary identifiers at all. Why not use the typehash, just like Solidity does for function signatures? This addresses both mutability and name collisions. If you must have a human readable identifier, you could use ENS, for instance, to point a human readable name to a typehash.

We need a way to retrieve and cache the types for use in Solidity editors. Using ENS for resolution & reverse resolution for each type adds complexity.

This makes ENS obsolete, at least for our purposes and for the purpose of having addressability on fine-grained data items.

Conclusion:

A type library is produced by developers and consumed by developers.

The only thing that is consumed by a non-dev is storage data. Storage data can be consumed by other projects and even end-users (see https://youtu.be/zcq2di8QIUE?t=143).

@loredanacirstea
Contributor Author

To expand on what the dType registry could mean for a developer:

  1. Anyone can cache & index the content of the registry and provide it as an editor plugin for Solidity files, for autocompletion.
  2. A typed Solidity -> Solidity transpiler can be built (we are working on a PoC). This is how it could work:

The ABI of the type itself & the functions can be recomposed entirely from the chain, from the dType registry. The transpiler could compose the structs needed for the typed code to work.

For Geopoint, it is even enough to declare structs with the same subtypes:

struct Lng {int32 lng;}
struct Lat {int32 lat;}
struct Geo {
    Lng lng;
    Lat lat;
    bytes32 identifier;
}

And this Geo type can be used with any libraries that know how to handle Geopoint, without importing the type definition's library.
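A minimal sketch of that recomposition, written in Python as it might run inside an editor plugin or transpiler. It is an illustration under assumptions, not the PoC itself: `REGISTRY` is a hypothetical in-memory cache that mirrors the "name" and "types" fields of the registry records shown earlier, `abi_type` is a hypothetical helper, and the "relation" and "dimensions" fields are ignored.

```python
# Hypothetical off-chain cache of dType registry records:
# type name -> list of (subtype name, label) pairs.
REGISTRY = {
    "Longitude": [("int32", "longitude")],
    "Latitude": [("int32", "latitude")],
    "Geopoint": [
        ("Longitude", "longitude"),
        ("Latitude", "latitude"),
        ("bytes32", "identifier"),
    ],
}


def abi_type(name, registry):
    """Recursively expand a registered type into ABI tuple notation."""
    fields = registry.get(name)
    if fields is None:
        # Not registered as a composite type: assumed to be a base type.
        return name
    inner = ",".join(abi_type(field_type, registry) for field_type, _label in fields)
    return f"({inner})"


print(abi_type("Geopoint", REGISTRY))  # ((int32),(int32),bytes32)
```

Because only the member types matter, not the struct or field names, the Geo struct above expands to the same tuple as Geopoint, which is why the two are interchangeable at the ABI level.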

This works even without importing other function libraries. For example, a call to calculateDistance can be transformed into a delegatecall / staticcall, based on the ABI of calculateDistance, which is also on chain. Alternatively, an interface can be deterministically created in the background to expose this function’s ABI: interface GeopointUtils { function calculateDistance….}.

Devs would only see nice code & human-readable names.

As to why devs should walk the extra mile and use dType:

  • they save on deployment gas
  • they can build more complex contracts within the bytecode size limit
  • they can quickly find & reuse libraries -> interoperability & battle-tested code
  • they can extend functionality for a type & contribute to the ecosystem
  • they can consume data produced by others, for that type
  • they benefit from higher-order functions developed for that type
  • they can present their source code in a public, unified manner

@Arachnid
Contributor

@loredanacirstea Your examples involve writing types and pushing them to the registry, but not actually querying the registry. What purpose does the registry actually serve here? Can you give an example of how a developer would consume data from the registry?

{
      "name": "Longitude",
      "types": [
          {"name": "int32", "label": "longitude", "relation": 0, "dimensions":[]}
      ],
      "lang": 0,
      "typeChoice": 0,
      "contractAddress": "0x0000000000000000000000000000000000000001",
      "source": "0x0000000000000000000000000000000000000000000000000000000000000001"
}

What are the lang, typeChoice, contractAddress and source fields supposed to hold? There's no deployed contract for this type, so it's difficult to see how the contractAddress and source fields would be useful. The types information conveys the ABI, which contains all the necessary information about this type.

@loredanacirstea
Contributor Author

@Arachnid, these short videos show how smart contract developers could consume the dType registry data by writing dTyped Solidity with the transpiler mentioned above: https://youtu.be/pBsual6FogE (type definitions), https://youtu.be/dpIVOYlAWrY (type definitions + function types).

@github-actions

There has been no activity on this issue for two months. It will be closed in a week if no further activity occurs. If you would like to move this EIP forward, please respond to any outstanding feedback or add a comment indicating that you have addressed all required feedback and are ready for a review.

@github-actions github-actions bot added the stale label Nov 20, 2021
@github-actions

github-actions bot commented Dec 4, 2021

This issue was closed due to inactivity. If you are still pursuing it, feel free to reopen it and respond to any feedback or request a review in a comment.

@github-actions github-actions bot closed this as completed Dec 4, 2021