ERC-1900: Decentralized Type System for EVM #1882

Open
loredanacirstea opened this issue Mar 28, 2019 · 18 comments

@loredanacirstea
Contributor

commented Mar 28, 2019

Draft can be found at #1900

eip: 1900
title: dType - Decentralized Type System for EVM
author: Loredana Cirstea (@loredanacirstea), Christian Tzurcanu (@ctzurcanu)
discussions-to: https://github.com/ethereum/EIPs/issues/1882
status: Draft
type: Standards Track
category: ERC
created: 2019-03-28

Simple Summary

The EVM and related languages such as Solidity need consensus on an extensible Type System in order to further evolve into the Singleton Operating System (The World Computer).

Abstract

We are proposing a decentralized Type System for Ethereum, to introduce data definition (and therefore ABI) consistency. This ERC focuses on defining an on-chain Type Registry (named dType) and a common interface for creating types, based on structs.

Motivation

In order to build a network of interoperable protocols on Ethereum, we need data standardization, to ensure a smooth flow of on-chain information. Off-chain, the Type Registry will allow a better analysis of blockchain data (e.g. for blockchain explorers) and creation of smart contract development tools for easily using existing types in a new smart contract.

However, this is only the first phase. As defined in this document and in the future proposals that will be based on this one, we are proposing something more: a decentralized Type System with Data Storage. This means each Type can have a Storage Contract that stores data entries that are of that type. These data entries can originate from different protocols or developers and are aggregated within the same smart contract. In addition, developers can create libraries of pure functions that know how to interact and modify the data entries. This will effectively create the base for a general functional programming system on Ethereum, where developers can use previously created building blocks.

To summarize:

  • We would like a good decentralized medium for integrating all Ethereum data and the relationships between the different types of data, as well as a way to address the behavior related to each data type.
  • Functional programming becomes easier. Functions like map, reduce, and filter are implemented by each type library.
  • Solidity development tools could be transparently extended to include the created types (for example, in IDEs like Remix). At a later point, the EVM itself can have precompiled support for these types.
  • The system can be easily extended to types pertaining to other languages, with the type definitions kept in source code stored on Swarm in the respective language.
  • The dType database should be part of the System Registry for the Operating System of The World Computer.

Specification

The Type Registry can have a governance protocol for its CRUD operations. However, this, and other permission guards are not covered in this proposal.

Type Definition and Metadata

The dType registry should support the registration of Solidity's elementary and complex types. In addition, it should also support contract event definitions. In this EIP, the focus will be on describing the minimal on-chain type definition and metadata needed for registering Solidity user-defined types.

Type Definition: TypeLibrary

A type definition consists of a type library containing:

  • the nominal struct used to define the type
  • additional functions used to:
    • check whether a given variable is an instance of the defined type
    • enforce other type-specific rules
    • provide higher-order functions (HOFs) such as map, filter, reduce
    • provide type structuring and destructuring. This can be useful for low-level calls or assembly code, when importing contract interfaces is not an efficient option.

A simple example is:

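pragma solidity ^0.5.0;
pragma experimental ABIEncoderV2;

// Signatures only; function bodies are omitted for brevity.
library myBalanceLib {

    // The nominal struct that defines the type
    struct myBalance {
        string accountName;
        uint256 amount;
    }

    // Builds a myBalance instance from its bytes encoding
    function structureBytes(bytes memory data) public pure returns (myBalance memory balance);

    // Encodes a myBalance instance as bytes
    function destructureBytes(myBalance memory balance) public pure returns (bytes memory data);

    // Applies the callback identified by (callbackAddr, callbackSig) to each element
    function map(
        address callbackAddr,
        bytes4 callbackSig,
        myBalance[] memory balanceArr
    )
        internal
        view
        returns (myBalance[] memory result);
}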

Types can also use existing types in their composition. The resulting type dependencies always form a directed acyclic graph.

library myTokenLib {
    using myBalanceLib for myBalanceLib.myBalance;

    struct myToken {
        address token;
        // Composed member; the label matches the "balance" entry in the registry metadata
        myBalanceLib.myBalance balance;
    }
}

Type Metadata: dType Registry

Type metadata will be registered on-chain, in the dType registry contract. This consists of:

  • name - the type's name, as it would be used in Solidity; it can be stored as a string or encoded as bytes. The name can have a human-readable part and a version number.
  • typeChoice - used for storing additional ABI data that differentiate how types are handled on and off chain. It is defined as an enum with the following options: BaseType, PayableFunction, StateFunction, ViewFunction, PureFunction, Event
  • contractAddress - the Ethereum address of the TypeRootContract. For this proposal, we can consider the Type Library address as the TypeRootContract. Future EIPs will propose additional TypeStorage contracts that will modify the scope of contractAddress.
  • source - a bytes32 Swarm hash where the source code of the type library and contracts can be found; in future EIPs, where dType will be extended to support other languages (e.g. JavaScript, Rust), the file identified by the Swarm hash will contain the type definitions in that language.
  • types - metadata for subtypes: the first depth level internal components. This is an array of objects (structs), with the following fields:
    • name - the subtype name, of type string, similar to the above name definition
    • label - the subtype label
    • dimensions - string[] used for storing array dimensions. E.g.:
      • [] -> TypeA
      • [""] -> TypeA[]
      • ["2"] -> TypeA[2]
      • ["",""] -> TypeA[][]
      • ["2","3"] -> TypeA[2][3]
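The dimensions encoding listed above can be reproduced off-chain with a few lines. This is a minimal Python sketch, not part of the proposal; the helper name format_type is hypothetical, and it assumes an empty string marks a dynamic dimension, as in the examples.

```python
from typing import List


def format_type(name: str, dimensions: List[str]) -> str:
    """Append one bracket pair per dimension; "" marks a dynamic dimension."""
    return name + "".join(f"[{dim}]" for dim in dimensions)


print(format_type("TypeA", []))          # TypeA
print(format_type("TypeA", [""]))        # TypeA[]
print(format_type("TypeA", ["2", "3"]))  # TypeA[2][3]
```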

Examples of metadata, for simple, value types:

{
  "contractAddress": "0x0000000000000000000000000000000000000000",
  "typeChoice": 0,
  "source": "0x0000000000000000000000000000000000000000000000000000000000000000",
  "name": "uint256",
  "types": []
}

{
  "contractAddress": "0x0000000000000000000000000000000000000000",
  "typeChoice": 0,
  "source": "0x0000000000000000000000000000000000000000000000000000000000000000",
  "name": "string",
  "types": []
}

Composed types can be defined as:

{
  "contractAddress": "0x105631C6CdDBa84D12Fa916f0045B1F97eC9C268",
  "typeChoice": 0,
  "source": <a SWARM hash for type source code files>,
  "name": "myBalance",
  "types": [
    {"name": "string", "label": "accountName", "dimensions": []},
    {"name": "uint256", "label": "amount", "dimensions": []}
  ]
}

Composed types can be further composed:

{
  "contractAddress": "0x91E3737f15e9b182EdD44D45d943cF248b3a3BF9",
  "typeChoice": 0,
  "source": <a SWARM hash for type source code files>,
  "name": "myToken",
  "types": [
    {"name": "address", "label": "token", "dimensions": []},
    {"name": "myBalance", "label": "balance", "dimensions": []}
  ]
}

The myToken type will have the final data format (address,(string,uint256)) and the labeled format (address token, (string accountName, uint256 amount)).
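An off-chain tool could derive both formats by walking the subtype metadata recursively. The following Python sketch illustrates the idea under stated assumptions: REGISTRY is a hypothetical in-memory stand-in for the on-chain registry, the function name signature is invented for this example, and labels are attached only to base-type members so that the output matches the labeled format shown above.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the dType registry: name -> subtypes.
# Base types such as "address" have an empty subtype list.
REGISTRY: Dict[str, List[dict]] = {
    "address": [],
    "string": [],
    "uint256": [],
    "myBalance": [
        {"name": "string", "label": "accountName", "dimensions": []},
        {"name": "uint256", "label": "amount", "dimensions": []},
    ],
    "myToken": [
        {"name": "address", "label": "token", "dimensions": []},
        {"name": "myBalance", "label": "balance", "dimensions": []},
    ],
}


def signature(name: str, labeled: bool = False) -> str:
    """Expand a registered type name into its tuple format."""
    subtypes = REGISTRY[name]
    if not subtypes:
        return name  # base type: nothing to expand
    parts = []
    for sub in subtypes:
        text = signature(sub["name"], labeled)
        text += "".join(f"[{dim}]" for dim in sub["dimensions"])
        # Only base-type members carry their label in the labeled format.
        if labeled and not REGISTRY[sub["name"]]:
            text += " " + sub["label"]
        parts.append(text)
    separator = ", " if labeled else ","
    return "(" + separator.join(parts) + ")"


print(signature("myToken"))                # (address,(string,uint256))
print(signature("myToken", labeled=True))  # (address token, (string accountName, uint256 amount))
```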

dType Registry Data Structures and Interface

To store this metadata, the dType registry will have the following data structures:

enum TypeChoices {
    BaseType,
    PayableFunction,
    StateFunction,
    ViewFunction,
    PureFunction,
    Event
}

struct dTypes {
    string name;
    string label;
    string[] dimensions;
}

struct dType {
    TypeChoices typeChoice;
    address contractAddress;
    bytes32 source;
    string name;
    dTypes[] types;
}

For storage, we propose a pattern which isolates the type metadata from additional storage-specific data and allows CRUD operations on records. This pattern has been described in detail here: https://medium.com/robhitchens/solidity-crud-part-1-824ffa69509a.

// key: typeHash
mapping(bytes32 => Type) public typeStruct;

// array of typeHashes
bytes32[] public typeIndex;

struct Type {
  dType data;
  uint256 index;
}
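The mapping-plus-index layout above can be modeled off-chain to see how records are added and removed. This Python sketch is an illustration of the general pattern, not the authors' implementation: the class and method names are hypothetical, removal uses the swap-with-last technique commonly paired with this layout, and SHA3-256 from the standard library is only a stand-in for Ethereum's keccak256 (the two produce different digests).

```python
import hashlib
from typing import Dict, List


def type_hash(name: str) -> bytes:
    # Stand-in only: hashlib's SHA3-256 is NOT Ethereum's keccak256.
    return hashlib.sha3_256(name.encode()).digest()


class TypeRegistry:
    def __init__(self) -> None:
        self.type_struct: Dict[bytes, dict] = {}  # typeHash -> {"data", "index"}
        self.type_index: List[bytes] = []         # array of typeHashes

    def is_type(self, h: bytes) -> bool:
        # A Solidity mapping has no membership test; there the stored index
        # would be checked against the typeIndex array instead.
        return h in self.type_struct

    def insert(self, data: dict) -> bytes:
        h = type_hash(data["name"])
        if self.is_type(h):
            raise ValueError("type already registered")
        self.type_struct[h] = {"data": data, "index": len(self.type_index)}
        self.type_index.append(h)
        return h

    def remove(self, h: bytes) -> int:
        if not self.is_type(h):
            raise KeyError("unknown type")
        row = self.type_struct[h]["index"]
        last = self.type_index[-1]
        # Move the last hash into the freed slot, then shrink the array.
        self.type_index[row] = last
        self.type_struct[last]["index"] = row
        self.type_index.pop()
        del self.type_struct[h]
        return row

    def count(self) -> int:
        return len(self.type_index)
```

Keeping the row index next to each record makes removal constant-time, at the cost of one extra storage slot per type.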

Note that we propose defining the type's primary identifier, typeHash, as keccak256(abi.encodePacked(name)). If the system is extended to other programming languages, typeHash can be defined as keccak256(abi.encodePacked(language, name)).
Initially, single-word English names can be disallowed to avoid name squatting.

The dType registry interface is:

interface dType {
    event LogNew(bytes32 indexed hash, uint256 indexed index);
    event LogUpdate(bytes32 indexed hash, uint256 indexed index);
    event LogRemove(bytes32 indexed hash, uint256 indexed index);

    function insert(dTypeLib.dType memory data) external returns (bytes32 dataHash);

    function remove(bytes32 typeHash) external returns (uint256 index);

    function count() external view returns (uint256 counter);

    function getTypeHash(string memory name) external pure returns (bytes32 typeHash);

    function getByHash(bytes32 typeHash) external view returns (Type memory dtype);

    function get(string memory name) external view returns (Type memory dtype);

    function isType(bytes32 typeHash) external view returns (bool isIndeed);
}

Notes:

To ensure backward compatibility, we suggest that updating types should not be supported.

The remove function can also be removed from the interface, to ensure immutability. One reason for keeping it would be clearing up storage for types that are not in use or have been made obsolete. However, this can have undesired effects and should be accompanied by a solid permissions system, testing and governance process. This part will be updated when enough feedback has been received.

Rationale

The Type Registry must store the minimum amount of information for rebuilding the type ABI definition. This allows us to:

  • support on-chain interoperability
  • decode blockchain side effects off-chain (useful for block explorers)
  • allow off-chain tools to cache and search through the collection (e.g. editor plugin for writing typed smart contracts)

There is one advantage that has become clear with the emergence of global operating systems, like Ethereum: we can have a global type system through which the system’s parts can interoperate. Projects should agree on standardizing types and a type registry, continuously working on improving them, instead of creating encapsulated projects, each with their own types.

The effort of having consensus on new types being added or removing unused ones is left to the governance system.

After the basis of such a system is specified, we can move forward to building a static type checking system at compile time, based on the type definitions and rules stored in the dType registry.

The Type Library must express only the behavior strictly pertinent to its defined type. Storage patterns for type instances will be covered in a future ERC and must abide by the same rule. Additional behavior required by the business logic of various projects can be added later, through libraries containing functions that handle the respective type. These can also be registered in dType, but will be detailed in a future ERC.

This is an approach that will separate definitions from stored data and behavior, allowing for easier and more secure fine-grained upgrades.

Backwards Compatibility

This proposal does not affect extant Ethereum standards or implementations. It uses the present experimental version of ABIEncoderV2.

Test Cases

Will be added.

Implementation

An in-work implementation can be found at https://github.com/pipeos-one/dType/tree/master/contracts/contracts.
This proposal will be updated with an appropriate implementation when consensus is reached on the specifications.

A video demo of the current implementation (a more extended version of this proposal) can be seen at https://youtu.be/pcqi4yWBDuQ.

Copyright

Copyright and related rights waived via CC0.

@loredanacirstea loredanacirstea changed the title ERC-xxxx Decentralized Type System for EVM ERC-1882 Decentralized Type System for EVM Mar 28, 2019

@OFRBG

commented Mar 29, 2019

Kinda related: https://github.com/ewasm/design. I don't think it's desired to add overhead to L1 and L0. This would work better as a "TypeScript" for Solidity. Remix already includes some static analysis, which could be extended for an extended-type Solidity.

@loredanacirstea

Contributor Author

commented Mar 29, 2019

> I don't think it's desired to add overhead to L1 and L0.

What is the overhead that you see? (so, I can properly answer)

@OFRBG

commented Mar 29, 2019

Adding extra lines of code of structs and memory allocation. IMHO Solidity code should be as short as possible, and higher level abstractions, such as types and pseudo-HOF should be used in a different dialect that finally compiles to native Solidity.

@loredanacirstea

Contributor Author

commented Mar 30, 2019

@OFRBG ,
Regarding overhead: Solidity itself is an overhead over the bytecode. The question is not if an overhead exists, but whether it is justified or not. So, what did you understand as being the benefit of having a decentralized type system and why doesn't it justify this overhead? (I just published https://medium.com/@loredana.cirstea/a-vision-of-a-system-registry-for-the-world-computer-be1dc2da7cae if you want to read more about the vision).

In order to achieve functional programming, you need higher order functions (HOFs) in Solidity. You can have dType itself without the overhead of adding HOFs, but if multiple projects need the same libraries, it is an overhead to not standardize.

@kjekac

commented Mar 30, 2019

I really really like the general idea of this. Having a type system decoupled from the contract language would make language interoperability (and hence language experimentation) much easier, and generally just make all the data stored on-chain much easier to compute with.

However, I would personally want to detach this from anything C-like (struct, ...) and go straight to proper type theory, which would make things less language-bound, as well as giving us nice properties for free. For example, it would be relatively straightforward to find a normal form of a particular type, and thus be able to automatically convert data of one type to another, even though they were defined differently syntactically. It would also generally be easier to prove things about contracts, given that you would have fewer cases and very clear semantics.

I previously worked on a project called Typedefs and one of the long-term ideas there is to put it on e.g. IPFS as a kind of "global type system" that any data in any computing system can reference, to tell people/programs how it can be deconstructed and used. The whole project is built on an extremely simple core, it doesn't give you any primitive types except for Unit and Void (both in the functional programming sense, so not the C-like void), but using these together with combinators and recursion, you can build up types that are isomorphic to basically any type you could wish for. For example, in pseudocode:

Boolean := Unit + Unit
Char8 := Boolean × Boolean × Boolean × Boolean × Boolean × Boolean × Boolean × Boolean
String := (Char8 × String) + Unit

Of course, this is horribly inefficient in itself, but when writing a backend for a specific language/platform, you can define specializations to utilize the primitives and/or standard library types that you have available, as long as you can provide encode/decode functions between these and this minimal/universal representation. This makes the type system itself as language agnostic as possible, while still maintaining the full power of any modern type system. Building a backend for Solidity itself should be fairly trivial; the interesting thing would be how to make it "blockchain-aware", so to speak. I don't have time at the moment but will try to get back with a few thoughts on how to go about that in the coming week.
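To make the pseudocode above concrete, the non-recursive definitions can be modeled as a tiny algebra and checked by counting inhabitants. This Python sketch is only an illustration and is not the Typedefs API; all class and function names are hypothetical. The recursive String definition is left out because it has infinitely many values.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Unit:
    """Type with exactly one value."""


@dataclass(frozen=True)
class Void:
    """Type with no values."""


@dataclass(frozen=True)
class Sum:
    """Choice between two types (the + operator)."""
    left: "Type"
    right: "Type"


@dataclass(frozen=True)
class Product:
    """Pair of two types (the × operator)."""
    left: "Type"
    right: "Type"


Type = Union[Unit, Void, Sum, Product]


def cardinality(t: Type) -> int:
    """Count the distinct values a non-recursive type can hold."""
    if isinstance(t, Unit):
        return 1
    if isinstance(t, Void):
        return 0
    if isinstance(t, Sum):
        return cardinality(t.left) + cardinality(t.right)
    if isinstance(t, Product):
        return cardinality(t.left) * cardinality(t.right)
    raise TypeError(f"not a type expression: {t!r}")


Boolean = Sum(Unit(), Unit())

# Char8 := Boolean × Boolean × ... (eight factors in total)
Char8: Type = Boolean
for _ in range(7):
    Char8 = Product(Boolean, Char8)

print(cardinality(Boolean))  # 2
print(cardinality(Char8))    # 256
```

Two types with the same cardinality are isomorphic as finite sets, which is why Char8 built this way can stand in for a platform's native byte type.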

@wires

commented Mar 31, 2019

Great comment by @kjekac, really well explained!

I see the comments regarding overhead at the lower levels, but let me try to twist that kind of thinking a bit.

Something as fundamental as the types of the inputs and outputs to functions can have a huge impact on the complexity at the higher levels. A well-chosen type theory can constrain behaviour and keep complexity in check. One unfortunate flaw in the design and you feel it all over the place (https://developers.slashdot.org/story/09/03/03/1459209/Null-References-the-Billion-Dollar-Mistake).

Let me know if you have any questions, I second that it is a good idea and that typedefs can be applied to this.

@loredanacirstea

Contributor Author

commented Apr 1, 2019

@kjekac ,

For dType, our focus was:

  • full Ethereum compatibility
  • extensibility to other languages
  • concept-oriented rather than type-oriented
  • global consensus on types and data formats
  • easily determining the type of a transaction side effect or constant function value

There are some type theory features that dType does not have, due to Solidity's limitations:

  • converting a dtype into another dtype with different structure
  • choice operator
  • void and unit types
  • direct type inheritance (you have composability instead)
  • recursivity of type definition

> However, I would personally want to detach this from anything C-like (struct, ...) and go straight to proper type theory, which would make things less language-bound, as well as giving us nice properties for free.

I understand why this would be great.
But for dType, we actually wanted it to be fully Ethereum compatible, so you can do things programmatically on-chain (e.g. our functional programming video example: https://youtu.be/pcqi4yWBDuQ). dType stores the minimum usable ABI definition information.

We also wanted the type to be contained in the ABI description. Meaning, for example, that anyone should be able to determine the type (& metadata, implementation libraries etc.) of a return value given that value, ABI definition & Type Registry.
This can be achieved with structs, but I am not sure how it could be achieved (and with what effort) with non-native types, that are transpiled to Solidity. Maybe you can help with an example.

> For example, it would be relatively straightforward to find a normal form of a particular type, and thus be able to automatically convert data of one type to another, even though they were defined differently syntactically.

Yes, with dType you cannot convert from one type to another (only dtype <-> bytes). However, our approach is more concept-oriented, even more than type-oriented. In a way, we want to incentivize consensus on concept definitions across projects, and convertibility was not a focus.

> I previously worked on a project called Typedefs and one of the long-term ideas there is to put it on e.g. IPFS as a kind of "global type system" that any data in any computing system can reference, to tell people/programs how it can be deconstructed and used.

We thought of the same thing, extending the system to any language. Our current implementation has an additional lang descriptor for each type: https://github.com/ctzurcanu/dType/blob/5e71ee683a167bd1b796f6ea07c41407be54aa0f/contracts/contracts/dTypeLib.sol#L6-L9. The idea was to also provide type checking against the blockchain (e.g. isType(type_name) & destructure(type_value) are some initial tools that we have implemented).

And we have bytes32 source, which is the Swarm hash for the type's source code - either Solidity libraries and contracts or source code in other languages.

> Boolean := Unit + Unit
> Char8 := Boolean × Boolean × Boolean × Boolean × Boolean × Boolean × Boolean × Boolean
> String := (Char8 × String) + Unit

While we have × (product/tuple), since structs are essentially packed tuples, we don't support + (co-product/choice operator), so we do not have optional type components, for example, and I do not see an easy/efficient way of supporting this. If you have a solution (for Solidity), we would like to discuss it.

> the interesting thing would be how to make it "blockchain-aware", so to speak. I don't have time at the moment but will try to get back with a few thoughts on how to go about that in the coming week.

I would be interested in this, thank you.

@loredanacirstea

Contributor Author

commented Apr 6, 2019

A draft has been submitted as a PR: #1900
An extension to this proposal can be found at #1921
I published a new blog post dType — Decentralized Type System & Functional Programming on Ethereum, explaining some of the concepts.

@loredanacirstea loredanacirstea changed the title ERC-1882 Decentralized Type System for EVM ERC-1900: Decentralized Type System for EVM Apr 11, 2019

@Arachnid

Collaborator

commented Jun 26, 2019

A few thoughts:

  • The spec could use a clear definition of what a type is, and what it's composed of. As it's written we're left to infer that from the data structures.
  • It's not clear to me why you'd want a key/value datastore for each type you define (the 'Storage Contract'). What's the motivation here?
  • One interface can have many implementations, but there doesn't seem to be any distinction made between the two here.
  • Given that it seems likely that the database will mostly be read not by other contracts but by offchain tools, it would make sense to use a compact encoding scheme such as CBOR or Protocol Buffers to encode type information, for efficient storage.
  • You should define the purpose of the fields in the structs where they're first encountered, not down the bottom.
  • Making the type registry mutable introduces a large amount of additional complexity. Why not make it immutable, so that a type identifier can be guaranteed to be a stable reference to a type?
  • What is a type library for, and why does it have to serialize and deserialize types?
  • Requiring each type to be accompanied by a 'type root contract' seems like a lot of overhead, that will discourage defining new types.
  • The 'source' field needs more definition. What language is it? Where can it be found?
  • "Initially single word names will be disallowed, to avoid name squatting" - are human-readable names going to be primary identifiers here? Can you specify that explicitly somewhere? Why is it necessary that a human-readable type name be unique (or even part of the canonical definition of the type)?

loredanacirstea added a commit to loredanacirstea/EIPs that referenced this issue Jun 27, 2019

Update ERC-1900 after comment suggestions
Updating ERC-1900 after suggestions from ethereum#1882 (comment):
- clearer explanation of what a type is and what it's composed of
- defining the purpose of the struct fields when they are first encountered
- mentioning type immutability and the possibility of removing the dType `remove` function
- adding more information about the `source` field
- mentioning human readable names as a primary identifier for types

Additionally:
- removed the TypeStorage contract description, postponing it for a future EIP, due to multiple storage patterns being researched

@loredanacirstea

Contributor Author

commented Jun 27, 2019

@Arachnid, thank you for the feedback!

I updated the draft and restructured it, to solve the following points from #1882 (comment):

  • (1) clearer explanation of what a type is and what it's composed of
  • (2) I removed the TypeStorage contract description, postponing it for a future ERC (currently researching multiple storage patterns)
  • (5) defining the purpose of the struct fields when they are first encountered
  • (6) mentioning type immutability and the possibility of removing the dType remove function
  • (7) clearer explanation regarding the type library and structureBytes/destructureBytes
  • (8) TypeRootContract will be the type library address in the context of this ERC.
  • (9) adding more information about the source field
  • (10) mentioning human-readable names (+ version number) as a primary identifier for types

Replying to your other points:

> (2) It's not clear to me why you'd want a key/value datastore for each type you define (the 'Storage Contract'). What's the motivation here?

We would like to make Solidity more functional and that means that the data should be harmonized (well formatted) and kept in the same place.
We are thinking about other storage patterns as well - instead of one smart contract, distributed among many user contracts.

> (3) One interface can have many implementations, but there doesn't seem to be any distinction made between the two here.

Are you referring to the dType registry interface and implementation or to the type library?

> (4) Given that it seems likely that the database will mostly be read not by other contracts but by offchain tools, it would make sense to use a compact encoding scheme such as CBOR or Protocol Buffers to encode type information, for efficient storage.

We expect most of the data to be used on-chain as well.
There will be exceptions where a type field will be bytes, using a compact encoding scheme: when we are defining a type for another language than Solidity. In this case, that type will be fully defined (in an un-encoded way, with all sub-fields) in the Swarm source file referenced in source.

> (6) Making the type registry mutable introduces a large amount of additional complexity. Why not make it immutable, so that a type identifier can be guaranteed to be a stable reference to a type?

That is a good idea.
Immutability will follow once it becomes precompiled. It may also save a significant amount of gas. But that is part of subsequent ERCs.

> (8) Requiring each type to be accompanied by a 'type root contract' seems like a lot of overhead, that will discourage defining new types.

There is a lot of overhead involved, but it also makes things very well defined in case the community votes to include the new types in precompiles. Afterwards, the overhead shrinks considerably. Nevertheless, we are defaulting the TypeRootContract address to the type library address for this ERC and moving this discussion to a future ERC detailing storage patterns for type data.

> (9) The 'source' field needs more definition. What language is it? Where can it be found?

language and source fields were to be treated in detail in a future ERC. language = 0 for Solidity and it will be an enum. source is a pointer to the source code in the set language on Swarm or another distributed file system.

> (10) "Initially single word names will be disallowed, to avoid name squatting" - are human-readable names going to be primary identifiers here? Can you specify that explicitly somewhere? Why is it necessary that a human-readable type name be unique (or even part of the canonical definition of the type)?

For naming, we should probably adopt a capitalized camel-case standard. The id is calculated as keccak(language, name), which prepares for the future adoption of data bridges between types in different languages that have the same name. We would like to treat declarations in Solidity of the form:

TypeName varOfType = <instance data>;

A search on the dType registry would be:

// language 0 is Solidity
find({name = "TypeName", language = 0})

This search will return the address of the library and the id of the type, and set the correct data structure. Potentially, a precompile could be run by a new opcode to do the same thing (not covered in this ERC).

@Arachnid

Collaborator

commented Jun 27, 2019

> We would like to make Solidity more functional and that means that the data should be harmonized (well formatted) and kept in the same place.
> We are thinking about other storage patterns as well - instead of one smart contract, distributed among many user contracts.

What does this mean? Can you give an example use-case?

> > (3) One interface can have many implementations, but there doesn't seem to be any distinction made between the two here.

> Are you referring to the dType registry interface and implementation or to the type library?

No, I'm talking about the distinction between a type definition and an implementation. For example, the ERC20 standard defines some types, which are implemented by a large number of different contracts. Capturing the ability for a type to have many implementations seems like a pretty basic feature to support.

> We expect most of the data to be used on-chain as well.

Can you give an example use-case for using type data onchain?

> Immutability will follow once it becomes precompiled. It may also save a significant amount of gas. But that is part of subsequent ERCs

But why support mutable types at all? Once it changes, it stops being the same type; the type's fields etc are fundamental to what it is. Any change also likely breaks compatibility with anything using it.

> language and source fields were to be treated in detail in a future ERC. language = 0 for Solidity and it will be an enum. source is a pointer to the source code in the set language on Swarm or another distributed file system.

If you don't define how these are used in the ERC where they're declared, it seems likely that you'll never be able to use them coherently, because there will be no universal expectation over the content stored in them.

> For naming, we should probably adopt capitalized camelback standard. The id is calculated as a keccak(language, name) so we prepare for future adoption of data bridges between types in different languages that have the same name. We would like to treat declarations in Solidity of the form:

It's not clear to me why you need human names as primary identifiers at all. Why not use the typehash, just like Solidity does for function signatures? This addresses both mutability and name collisions. If you must have a human readable identifier, you could use ENS, for instance, to point a human readable name to a typehash.

@loredanacirstea

Contributor Author

commented Jun 29, 2019

@Arachnid ,

> What does this mean? Can you give an example use-case?

I published a work-in-progress draft, #2158, for the storage extension to clarify the motivation. Discussions are at #2157.

@loredanacirstea

Contributor Author

commented Jul 1, 2019

@Arachnid ,

No, I'm talking about the distinction between a type definition and an implementation. For example, the ERC20 standard defines some types, which are implemented by a large number of different contracts. Capturing the ability for a type to have many implementations seems like a pretty basic feature to support.

Devs have the freedom to implement type helper functions, as long as the required ones are implemented (to be discussed). As for the definition, I am open to other proposals that are not necessarily based on structs. I am actually trying to have optional subtypes, that can be stored with map. The dType registry will be extended with an additional optionals field in the dType struct and a standardized way to define optionals in the Type Library, but this is not ready yet. Other than this, what more can I do to give more freedom to the implementation?

Can you give an example use-case for using type data onchain?

  • the registry must check that a type's subtypes are already part of the registry, so it must contain references to them.
  • on-chain calculation of a type's signature, given the type identifier (especially for functions), is more secure and should be the go-to reference

I have some work-in-progress examples in the dType repo with on-chain permission control based on the dType registry identifiers, e.g. fine-grained function permissions that can also be used in the storage contracts. There are also work-in-progress patterns for functional programming, which use function dType identifiers to mimic more complex higher-order functions (HOFs).

But why support mutable types at all? Once it changes, it stops being the same type; the type's fields etc are fundamental to what it is. Any change also likely breaks compatibility with anything using it.

I removed the update function because it was an artifact from a previous version. Indeed, types should not be modified. I left the remove function in place, for discussion. Is this ok from your side?

If you don't define how these are used in the ERC where they're declared, it seems likely that you'll never be able to use them coherently because there will be no universal expectation over the content stored in them.
(language and source fields)

I started a draft at https://github.com/loredanacirstea/EIPs/blob/d6fbbff5f1a1ecfa1eee6f8efa4ca3d896303e38/EIPS/eip-dtype_language.md with details. I will make a PR soon. We wanted to separate the ERCs because some devs may agree with dType core but not with the language extension. The source field has a purpose by itself in ERC-1900, as specified. Let us know what you think.

It's not clear to me why you need human names as primary identifiers at all. Why not use the typehash, just like Solidity does for function signatures? This addresses both mutability and name collisions. If you must have a human readable identifier, you could use ENS, for instance, to point a human readable name to a typehash.

Our initial version used keccak256(language, name, types) as an identifier. The drawbacks of this:

  • the same type might have some differences when used cross-language (see language extension draft, at "Encoded Types for Language-specific Use")
  • devs might want to get the type definition (& ABI) by name. I can see this being used in editor plugins, which can keep a small cache of type names without storing the entire data. Also, when searching for a type, duplicate names can be confusing: names should be chosen according to the type's purpose & content, plus a version number (specs TBD).

@Arachnid

Collaborator

commented Jul 1, 2019

Devs have the freedom to implement type helper functions, as long as the required ones are implemented (to be discussed). As for the definition, I am open to other proposals that are not necessarily based on structs. I am actually trying to have optional subtypes, that can be stored with map. The dType registry will be extended with an additional optionals field in the dType struct and a standardized way to define optionals in the Type Library, but this is not ready yet. Other than this, what more can I do to give more freedom to the implementation?

I still think you're not understanding the difference between an interface and an implementation. An interface describes an API for other code to interact with, but not how it's implemented. What you're proposing here seems to be more along the lines of a directory of library code.

the registry must check that a type's subtypes are already part of the registry, so it must contain references to them.

If a type's ID is based on a serialization of its fields, then types are immutable and this doesn't matter; types can be inserted into the registry in any order.
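A minimal sketch of that point, assuming a hypothetical registry keyed by the hash of a type's serialized definition. The contract name `ContentAddressedTypes` and the opaque `serialized` byte format are assumptions for illustration and do not come from ERC-1900. Because the identifier is derived from the content, there is no update path, and a definition may reference subtype identifiers that have not been registered yet.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

/// Hypothetical sketch: a registry whose keys are hashes of the stored content.
contract ContentAddressedTypes {
    // id => serialized type definition (the format itself is left unspecified).
    mapping(bytes32 => bytes) private definitions;

    /// Register a definition. The id depends only on the bytes supplied, so
    /// re-inserting the same definition is a no-op and nothing can be
    /// overwritten with different content under the same id.
    function insert(bytes calldata serialized) external returns (bytes32 id) {
        require(serialized.length > 0, "empty definition");
        id = keccak256(serialized);
        if (definitions[id].length == 0) {
            definitions[id] = serialized;
        }
    }

    /// Returns the stored definition, or empty bytes if the id is unknown.
    function get(bytes32 id) external view returns (bytes memory) {
        return definitions[id];
    }
}
```

Note that `insert` never checks whether referenced subtypes exist, which is what allows types to be added in any order; a consumer that needs the full tree would resolve the references when reading.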

on-chain calculation of a type's signature, given the type identifier (especially for functions), is more secure and should be the go-to reference

Can you qualify 'more secure'? What's your threat model?

I removed the update function because it was an artifact from a previous version. Indeed, types should not be modified. I left the remove function in place, for discussion. Is this ok from your side?

Removing a type seems like it would cause chaos if it's in use somewhere, and still requires you to maintain a permission model.

The source field has a purpose by itself in ERC-1900, as specified. Let us know what you think.

EIP 1900 currently says:

source - a bytes32 Swarm hash where the source code of the type library and contracts can be found; in future EIPs, where dType will be extended to support other languages (e.g. JavaScript, Rust), the file identified by the Swarm hash will contain the type definitions in that language.

I see several problems with this:

  • It ties the system indelibly to Swarm, and to Keccak256 hashing.
  • It doesn't indicate a content-type, or any other metadata.
  • There's no clear mechanism for someone to determine what language the code is in, or what format the file is.

I'm also not sure what the motivation here is; why would anyone need to fetch the source in this context?

It seems to me that this spec is a long way away from being a simple distributed type registry. It has a lot of unnecessary complexity, and doesn't make a clear distinction between interfaces and implementations. I wish you luck, but I don't plan to offer further technical feedback.

@loredanacirstea

Contributor Author

commented Jul 2, 2019

@Arachnid ,

I still think you're not understanding the difference between an interface and an implementation. An interface describes an API for other code to interact with, but not how it's implemented. What you're proposing here seems to be more along the lines of a directory of library code.

I asked you before: "Are you referring to the dType registry interface and implementation or to the type library?", to which you answered with "No, I'm talking about the distinction between a type definition and an implementation". I suggest you pay attention to how clearly you phrase your questions before being unsatisfied with the answer.
I thought it was clear that this ERC aims to both:

So aside from your destructive (as opposed to constructive) and your inexact criticism, what can I do? Is ERC not the correct category? EIP-1 definition is "ERC - application-level standards and conventions", so it seems right.

Can you qualify 'more secure'? What's your threat model?

On-chain calculation is the standard. Any off-chain calculation may have a faulty implementation. You need a standard to compare off-chain implementations - why is this unclear? The threat model is the same as for blockchain vs. off-chain data & behavior.

Removing a type seems like it would cause chaos if it's in use somewhere, and still requires you to maintain a permission model.

Correct. I am ok with removing the remove function, but I want more debate on it, from multiple parties.

It ties the system indelibly to Swarm, and to Keccak256 hashing.

It's a bytes32 identifier. Any storage solution that is compatible with bytes32 will do. If multiple storages are used, then we need (at least) another field to describe which one is used. I wanted to keep it simple for now and am open to suggestions.

  • It doesn't indicate a content-type, or any other metadata.
  • There's no clear mechanism for someone to determine what language the code is in, or what format the file is.

But tell me a way for the EVM to check something that is off-chain. You know very well it cannot and this is not constructive criticism. We can, however, replace the bytes32 source hash with an EthPM package identifier, that contains bytecode and source metadata hash (containing the file extension).
For a unique registry, like we are proposing, we still need governance on top to verify correctness and consensus.

I'm also not sure what the motivation here is; why would anyone need to fetch the source in this context?

Non-centralized source code verification (see EthPM).

It seems to me that this spec is a long way away from being a simple distributed type registry. It has a lot of unnecessary complexity and doesn't make a clear distinction between interfaces and implementations.

To summarize, I understand the "unnecessary complexity" as making the type ABI computable on chain, as opposed to blackboxing it to a packed encoding that the EVM cannot decode. In this case, the complexity is beneficial and worth the overhead (if used by many, the overhead decreases).
Otherwise, please define "unnecessary complexity" in the context of the ERC.

I wish you luck, but I don't plan to offer further technical feedback.

This is fine, I thank you for the effort and time. However, you do not present a way forward.
As per EIP-1 (https://github.com/ethereum/EIPs/blob/27ea3a138b8d54b9657d518cc4fac7d8fe8b3dfc/EIPS/eip-1.md#eip-editors), you are an editor with the following responsibilities:

  • "Read the EIP to check if it is ready: sound and complete. The ideas must make technical sense, even if they don't seem likely to get to final status."
  • "The editors don't pass judgment on EIPs. We merely do the administrative & editorial part."

You did not say that this ERC does not make technical sense. Your opposition is currently "unnecessary complexity", which you did not define properly. I am (and have been) open to improving anything that is not technically sound.

Therefore, I do not see a reason to deny merging of this ERC Draft to master. It is work in progress, as defined in EIP-1: "Once the first draft has been merged, you may submit follow-up pull requests with further changes to your draft until such point as you believe the EIP to be mature and ready to proceed to the next status."

If you do not want to approve and merge due to technical reasons, please clearly list what they are and what editor to ping when I solve them.
If you do not want to approve and merge due to non-technical reasons, conflicts of interest etc., please ping/appoint another editor for this job.

However, if you are not in your editorial capacity (and you need to specify that, as you are on the editor's list), we welcome debate and new ideas from a technical person, like yourself and others.

This ERC is important for Ethereum. Refusing to give further feedback without explaining why goes against the ethos of the community: collaboration, effort decentralization and evolution of computing.
I have no commercial interest in this. I am a volunteer working for these values.

loredanacirstea added a commit to loredanacirstea/EIPs that referenced this issue Jul 2, 2019

Update ERC-1900 after comment suggestions
Updating ERC-1900 after suggestions from ethereum#1882 (comment):
- clearer explanation of what a type is and what it's composed of
- defining the purpose of the struct fields when they are first encountered
- mentioning type immutability and the possibility of removing the dType `remove` function
- adding more information about the `source` field
- mentioning human readable names as a primary identifier for types

Additionally:
- removed the TypeStorage contract description, postponing it for a future EIP, due to multiple storage patterns being researched

@Arachnid

Collaborator

commented Jul 10, 2019

@loredanacirstea I was offering technical feedback as an individual contributor, not as an editor. As you've decided to ignore nearly all of my feedback, wasting time on giving more of it seems pointless.

I never suggested I was acting as an editor, or gating merging the draft based on my technical critique.

@loredanacirstea

Contributor Author

commented Jul 10, 2019

@Arachnid, you are on the EIP editor list, so you are by default acting in an editor capacity; when you are not, you should say so when commenting.

To recap:

  • I did a substantial rewording in 1adf3de, based on your suggestions.
  • I answered all your questions and provided explanations when our views differed.
  • I published an additional ERC that I previously wanted to postpone, just because I wanted to answer your questions better (Storage Extension, #2158), even though I removed mentions of it from the current ERC.
  • I reworded ERC-1900 and started a draft on another ERC that I previously wanted to postpone (#1882 (comment)) - Language Extension, just because you said "If you don't define how these are used in the ERC where they're declared, it seems likely that you'll never be able to use them coherently because there will be no universal expectation over the content stored in them."

I think my effort was more than enough to demonstrate that I did not ignore your feedback, but appreciated it.

You say that you wasted your time because I did not necessarily agree with all of your suggestions, while I did provide arguments.
Still, you did not find the time to review this ERC as an editor: an ERC that you already read and never called a bad idea, that abides by EIP-1 rules, and that could be improved and discussed after being merged as a Draft and getting an identifier.

These are facts.

@Arachnid

Collaborator

commented Jul 11, 2019

you are on the EIP editor list, so you are by default acting in an editor capacity; when you are not, you should say so when commenting.

As I've said before in other PRs, editors are not primarily technical reviewers of EIPs, because that's not scalable. When I'm making requests for changes in a PR in order to merge it, I'm acting as an editor. When I'm discussing technical proposals in general, I'm clearly not - because that's not part of an editor's job.

Still, you did not find the time to review this ERC as an editor: an ERC that you already read and never called a bad idea, that abides by EIP-1 rules, and that could be improved and discussed after being merged as a Draft and getting an identifier.

I only saw this issue because you drew my attention to it via Twitter. I didn't notice the associated PR, or I would have merged it as draft as soon as it met the typographical requirements.

I'll make one more attempt at clarifying a couple of misunderstandings:

I asked you before: "Are you referring to the dType registry interface and implementation or to the type library?", to which you answered with "No, I'm talking about the distinction between a type definition and an implementation". I suggest you pay attention to how clearly you phrase your questions before being unsatisfied with the answer.

This is a type:

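// The type: only the function signature, no behaviour.
interface IFoo {
  function doFoo(uint x) external view returns (uint);
}

This is an implementation:

// One of possibly many contracts that conform to IFoo.
contract Foo is IFoo {
  function doFoo(uint x) external view override returns (uint) {
    return x * 2;
  }
}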

It's possible for one type to have many implementations - for example, ERC20.

A type library would specify common interfaces that implementations can conform to, and allow consumers to know what implementations support those interfaces.

This EIP claims to be a 'decentralised type system', but it lacks any distinction between a type and implementations of that type. Based on our back-and-forth, it seems like what you're really trying to build is a repository of library code. Which is fine - but then you should rename the EIP and reword accordingly.

Can you qualify 'more secure'? What's your threat model?

On-chain calculation is the standard. Any off-chain calculation may have a faulty implementation. You need a standard to compare off-chain implementations - why is this unclear? The threat model is the same as for blockchain vs. off-chain data & behavior.

I don't see how this is a security issue. If you write a standard for how to generate a hash from a type specification, you can provide standard test vectors, and anyone can implement that. Changing how you store data onchain so that you can do that inside Solidity buys you some convenience, but at the cost of a lot of additional storage overhead and gas costs.

It's a bytes32 identifier. Any storage solution that is compatible with bytes32 will do. If multiple storages are used, then we need (at least) another field to describe which one is used. I wanted to keep it simple for now and am open to suggestions.

Assuming the goal is for a consumer to be able to fetch the content associated with this field, the data you have here is insufficient for that, unless you specify that it must, for example, refer to a Swarm manifest content hash. If you leave this unspecified, people will use it inconsistently, and nobody will be able to write code to fetch it with confidence that it will actually work.

If you do not want to specify it here, you should leave it out - anything else will result in useless overhead, because it is underspecified.
