
Proposal: Data Modelling v2 #347

Closed
alvrs opened this issue Jan 16, 2023 · 40 comments
Labels
discussion A discussion (without clear action/outcome)

Comments

@alvrs
Member

alvrs commented Jan 16, 2023

Note: this proposal contains a lot of pseudocode, and some of the core aspects of the proposal are contained in the code comments, so don't skip over them

Abstract

This proposal addresses a couple of issues that come with MUD’s current approach to on-chain data modelling and state management.

We propose to move away from individual component contracts to store application state, and instead create a core spec and library for on-chain data modelling and storage. The core library can be used in any contract to store state in a MUD compatible way and emit MUD compatible events for general-purpose indexers. (The core library doesn’t implement access control.)

We then use this core library to create a framework to add storage access control and the ability for third party developers to register new data pools and mount new contracts (similar to the current World contract).

Issues with the previous approach

  • Currently, state is organised into separate components to manage access control and implement functions with typed parameters and typed return values (since Solidity doesn’t support generic types)

    • The component contracts call a function on a central World contract to register their update
    • The components use Solidity’s abi.encode under the hood, which leads to unnecessarily high gas costs (because abi.encode reserves one 32-byte word per struct field)
  • Currently, developers have to opt-in to MUD’s entire framework to benefit from conceptually independent features like general purpose indexers, instead of being able to upgrade their existing codebases

  • Currently, developers have to manually define their components' schemas using a DSL, which is not intuitive for Solidity developers and leads to easy-to-miss bugs (when the defined schema doesn’t match the abi.encoded value)

  • Currently, developers using MUD have to implement a lot of “boilerplate” code to read or write component values compared to setting vanilla Solidity storage variables

    • Current MUD:

      PositionComponent position = PositionComponent(getAddressById(components, PositionId));
      position.set(0x00, Position(1,2));
    • Vanilla Solidity:

      positions[0x00] = Position(1,2);
  • Currently, MUD is limited to the ECS pattern (Entity-Component-System), requiring every piece of data to be associated with a single uint256 id. This makes some data modelling harder than desired, for example using a composite key (consisting of two values)

    • The current workaround is to create composite entity ids by using keccak(entity1, entity2), but this approach obfuscates the underlying entity ids and is cumbersome to work with
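The problem with the hashed-composite workaround can be sketched in a few lines of Python. This is an illustration only: `sha3_256` stands in for the EVM's keccak256 (which is not in the Python standard library), and `composite_key` is a hypothetical helper mirroring the `keccak(entity1, entity2)` pattern described above.

```python
import hashlib

def composite_key(entity1: bytes, entity2: bytes) -> bytes:
    # Current workaround: hash two entity ids into one opaque 32-byte key.
    # (sha3_256 is a stand-in for the EVM's keccak256.)
    return hashlib.sha3_256(entity1 + entity2).digest()

a = (1).to_bytes(32, "big")  # entity id 1
b = (2).to_bytes(32, "big")  # entity id 2

hashed = composite_key(a, b)
# `hashed` is 32 opaque bytes: the preimage (a, b) cannot be recovered from it,
# so an indexer can't tell which entities a stored value belongs to.

# The proposal below instead indexes with the tuple directly and emits it in
# the update event, so the individual keys stay visible to indexers.
tuple_index = (a, b)
assert len(hashed) == 32 and tuple_index == (a, b)
```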

The following is a proposal to address all of the issues above and more.

Design goals

  • A way to store introspectable structured data on-chain
    • Introspectable = data schema can be retrieved on-chain, so default general-purpose indexers are possible
  • General-purpose indexers by default
    • Events notifying indexers about every state change
    • On-chain schema, so indexers know how to interpret state changes
    • SQL-compatible data modelling on-chain, so indexers can benefit from decades of SQL research
  • Dynamic schemas / ability to add more schemas after the core contract has been deployed
    • This is important to enable “autonomous worlds” where third party developers can add data packets and systems to an application
  • As little gas overhead as possible compared to the most efficient custom way of storing data on-chain
  • As little “third party code managing core state” as possible. As much as possible should be done by the core library
  • The best developer experience possible (at least as good as working with native Solidity structs/mappings)
  • Splitting up the core storage problem from the framework problem
    • This allows more people to develop tools integrating with the core storage method, without having to opt-in to the framework

Core storage management library

  • Implements logic to store and access data based on registered schemas
  • Implements update events
  • “Untyped” - uses bytes everywhere - typing is responsibility of wrapping libraries (see below)
  • Any contract can implement the IMudStore interface / extend the MudStore base contract to become compatible with a large chunk of MUD’s toolchain, like general-purpose indexers
  • Data is organised by table and index into the table, where the index can be a tuple of multiple bytes32 keys
    • This is a superset of ECS: In ECS, “Components” correspond to tables, and “Entities” are indices into the table. In this proposal, we allow tuples to be used as the index into a table, allowing more complex relationships to be modelled (and making the data model more similar to a relational database). However, single keys are still possible, so ECS is still possible.
    • The tuple of keys used to index a table is emitted as part of the event, so it can be picked up by indexers and we don’t have to rely on hacks like hashed composite entities anymore.

Illustration of data model

// Illustration of data model:
// Assume we want to index with two keys: A, B

keys: (A, B)
valueSchema: (X, Y, Z)

conceptually:
{
	[A1]: {
		[B1]: {
			X1,
			Y1,
			Z1
		},
		[B2]: {
			X2,
			Y2,
			Z2
		}
	},
	[A2]: { ... }
}

-> translates into relational database:
| A  | B  | X  | Y  | Z  |
| -- | -- | -- | -- | -- |
| A1 | B1 | X1 | Y1 | Z1 |
| A1 | B2 | X2 | Y2 | Z2 |
| ...

-> translates to on-chain:
mapping(
	keccak(A1,B1) => {X1, Y1, Z1},
	keccak(A1,B2) => {X2, Y2, Z2}
)
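The three views above describe the same data. A quick Python sketch of the translation (plain tuples stand in for the keccak-hashed key tuples used on-chain, and the concrete values are made up for illustration):

```python
# Nested conceptual model: keys (A, B) -> value schema (X, Y, Z)
conceptual = {
    "A1": {"B1": (1, 2, 3), "B2": (4, 5, 6)},
    "A2": {"B1": (7, 8, 9)},
}

# Relational view: one row per key tuple, value fields become columns
rows = [(a, b, *value) for a, inner in conceptual.items() for b, value in inner.items()]
assert rows[0] == ("A1", "B1", 1, 2, 3)

# On-chain view: one mapping entry per key tuple
# (the tuple itself is the dict key here; on-chain it would be keccak(A, B))
storage = {(a, b): value for a, inner in conceptual.items() for b, value in inner.items()}
assert storage[("A1", "B2")] == (4, 5, 6)
```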

Pseudo-code implementation with more details

// Solidity-like pseudo code
// omitting some language features for readability
// eg using keccak(a,b,c) for keccak256(abi.encode(a,b,c))
// or omitting memory, public, pure etc

enum SchemaType {
	UINT8,
	..., // in steps of 8, so 32 total
	UINT256,
	INT8,
	..., // in steps of 8, so 32 total
	INT256,
	BYTES1,
	..., // in steps of 1, so 32 total
	BYTES32,
	BOOL,
	ADDRESS,
	BYTES,
	STRING, // until here we have 100 types
	BIT, // we could add a native "bitpacking" type using the same approach described below
	<T>_ARRAY, // everything above as an array - until here we have 202 types
	// 54 more slots to define more types and keep SchemaType a uint8
}

// A table schema can have up to 32 keys, so it fits into a single evm word.
// (Schemas above 32 keys are definitely an anti-pattern anyway)
// Working with unnamed schemas makes the core library simpler; naming keys is the job of wrapping libraries
type Schema = SchemaType[32];

// Interface to turn any contract into a MudStore
interface IMudStore {
	event StoreUpdate(bytes32 table, bytes32[] index, uint8 schemaIndex, bytes[] data);
	function registerSchema(bytes32 table, SchemaType[] schema);
	function setData(bytes32 table, bytes32[] index, bytes[] data);
	function setData(bytes32 table, bytes32[] index, uint8 schemaIndex, bytes data);
	function getData(bytes32 table, bytes32[] index) returns (bytes[] data);
	function getDataAtIndex(bytes32 table, bytes32[] index, uint8 schemaIndex) returns (bytes data);
	function isMudStore() returns (bool); // Checking for the existence of this function is sufficient for consumers to check whether the caller is a MUD store (this could potentially be turned into EIP-165 in the future)
}

library MudStoreCore {
	// Note: the preimage of the tuple of keys used to index is part of the event, so it can be used by indexers
	event StoreUpdate(bytes32 table, bytes32[] index, uint8 schemaIndex, bytes[] data);
	bytes32 constant _slot = keccak("mud.store");
	bytes32 constant _schemaTable = keccak("mud.store.table.schema");

	// Register a new schema
	// Stores the schema in the default "schema table", indexed by table id
	function registerSchema(bytes32 table, SchemaType[] schema) {
		// Optional: verify the schema only has one dynamic type at the last slot, see note 1 below
		setData(_schemaTable, table, Convert.encode(schema));
	}

	// Return the schema of a table
	function getSchema(bytes32 table) returns (SchemaType[] schema) {
		bytes value = getData(_schemaTable, table);
		return Convert.decodeUint8Array(value);
	}

	// Check whether a schema exists for a given table
	function hasTable(bytes32 table) returns (bool) {
		return getData(_schemaTable, table).length > 0;
	}

	// Update full data
	function setData(bytes32 table, bytes32[] index, bytes[] data) {
		// Optional: verify the value has the correct length for the table (based on the table's schema)
		// (Tradeoff, slightly higher cost due to additional sload, but higher security - library could also provide both options)

		// Store the provided value in storage
		bytes32 location = _getLocation(table, index);
		assembly {
			// loop over data and sstore it, starting at `location`
		}

		// Emit event to notify indexers
		emit StoreUpdate(table, index, 0, data);
	}

	// Update partial data (minimize sstore if full data wraps multiple evm words)
	function setData(bytes32 table, bytes32[] index, uint8 schemaIndex, bytes data) {
		// Get schema for this table to compute storage offset
		SchemaType[] schema = getSchema(table);

		// Compute storage location for given table, index and schemaIndex
		bytes32 location = _getLocation(table, index);
		uint256 offset = _getByteOffsetToSchemaIndex(schema, schemaIndex); // Simple helper function
		assembly {
			// set data at the computed location (location + offset)
		}
	
		// Emit event to notify indexers
		emit StoreUpdate(table, index, schemaIndex, [data]);
	}

	// Get full data
	function getData(bytes32 table, bytes32[] index) returns (bytes[] data) {
		// Get schema for this table
		// Compute length of the full schema
		// Load the data from storage using assembly
		// Split up data into bytes[] based on schema
		// Return the data as bytes[]
	}

	// Get partial data based on schema key
	// (Only access the minimum required number of storage slots)
	function getDataAtIndex(bytes32 table, bytes32[] index, uint8 schemaIndex) returns (bytes data) {
		// Get schema for this table
		// Compute offset and length of this schema index
		// Load the data for this schema index from storage using assembly
		// Return the data as bytes
	}

	// Compute the storage location based on table id and index tuple
	// (Library could provide different overloads for single index and some fixed length array indices for better devex)
	function _getLocation(bytes32 table, bytes32[] index) returns (bytes32) {
		return keccak(_slot, table, index);
	}

	// Simple helper function to compute the byte offset to the given schema index based in the given schema
	function _getByteOffsetToSchemaIndex(SchemaType[] schema, uint8 schemaIndex) returns (uint256) {
		// Sum `getByteLength(schemaType)` for every schema index before the given index
	}

	// Simple helper function to return the byte length for each schema type
	// (Because Solidity doesn't support constant arrays)
	function _getByteLength(SchemaType schemaType) returns (uint8) {
		// Binary tree using if/else to return the byte length for each type of schema
	}
}

// A helper library to convert any primitive type (+ arrays) into bytes and back
library Convert {
	// Overloads for all possible base types and array types
	// Encode dynamic arrays in such a way that the first 2 bytes are reserved for the array length = max arr length 2**16 (to help decoding)
	function encode(uint256 input) returns (bytes);

	// Decoder functions for all possible base types and array types
	function decodeUint8Array(bytes input) returns (uint8[]);

	...
}
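`_getByteLength` and `_getByteOffsetToSchemaIndex` are only sketched in the pseudocode above. Here is a minimal Python model of the same computation; the enum values and the subset of types shown are illustrative assumptions, not the final numbering:

```python
from enum import IntEnum

# Tiny illustrative subset of the SchemaType enum from the proposal
class SchemaType(IntEnum):
    UINT8 = 0      # uints in steps of 8 bits: uint8=0 ... uint256=31
    UINT32 = 3
    UINT256 = 31
    BOOL = 96      # placement after the 96 uint/int/bytesN slots is assumed
    ADDRESS = 97

def byte_length(t: SchemaType) -> int:
    # Mirrors _getByteLength: static byte size per schema type
    if SchemaType.UINT8 <= t <= SchemaType.UINT256:
        return t - SchemaType.UINT8 + 1  # uint8 -> 1 byte ... uint256 -> 32 bytes
    if t == SchemaType.BOOL:
        return 1
    if t == SchemaType.ADDRESS:
        return 20
    raise ValueError("dynamic or unsupported type")

def byte_offset(schema: list, schema_index: int) -> int:
    # Mirrors _getByteOffsetToSchemaIndex: sum of byte lengths before the index
    return sum(byte_length(t) for t in schema[:schema_index])

# Schema (uint32, uint32, address): the address field starts at byte 8
schema = [SchemaType.UINT32, SchemaType.UINT32, SchemaType.ADDRESS]
assert byte_offset(schema, 2) == 8
```

This is where the gas savings for partial updates come from: the offset is computed from the schema alone, so a single field can be written without touching the other fields' storage slots.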

Notes

  1. If we only allow one dynamic array type per table schema, encoding/decoding/storing partial data gets much simpler and cheaper (the dynamic array type always has to come last in the schema)
    • Cheaper because only one storage access is needed to get the schema, instead of an additional storage access to get the length of each dynamic array. Also, a dynamic array type anywhere but in the last schema slot would shift all remaining schema values (even non-dynamic ones) when it grows, so modifying partial data would be much more expensive (worst case as expensive as modifying the full data). We could save developers from having to think about this in their model by restricting schemas to one dynamic type that has to come last.
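Note 1's cost argument can be made concrete with a small Python sketch (the per-field byte sizes are illustrative):

```python
def static_offsets(field_sizes):
    """Byte offset of each field, given per-field byte sizes."""
    offsets, pos = [], 0
    for size in field_sizes:
        offsets.append(pos)
        pos += size
    return offsets

# Dynamic array last: static fields keep fixed offsets no matter how the array grows
before = [4, 4, 2 * 32]   # uint32, uint32, trailing dynamic array (2 words)
after  = [4, 4, 5 * 32]   # the trailing array grew to 5 words
assert static_offsets(before)[:2] == static_offsets(after)[:2] == [0, 4]

# Dynamic array in the middle: growing it shifts every later field,
# so even a "partial" update of the last field must move data
before = [4, 2 * 32, 4]
after  = [4, 5 * 32, 4]
assert static_offsets(before)[2] != static_offsets(after)[2]  # 68 vs 164
```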

Wrapping typed libraries

  • While Solidity doesn’t support generic types, we can autogenerate libraries to set/get typed values based on user defined schemas to emulate the experience of working with a generically typed core library.
  • The libraries encode typed values to raw bytes and vice versa to improve developer experience (in theory devs could call the core functions manually but devex would suck)
  • The library detects whether the call comes from within a MudStore (eg if the contract using the library is called via delegatecall from a MudStore) or if the msg.sender is a MudStore (eg if the contract using the library is called via call from a MudStore) and automatically switches between writing to own storage using the core library and calling the respective access controlled methods on the calling MudStore.

Pseudo-code implementation with more details

// Solidity-like pseudo code
// omitting some language features for readability
// eg using keccak(a,b,c) for keccak256(abi.encode(a,b,c))
// or omitting memory, public, pure etc

// ----- Example of an auto-generated typed library for a Position table -----

// -- User defined schema and id --

bytes32 constant id = keccak("mud.store.table.position");

struct Schema {
	uint32 x;
	uint32 y;
}

// -- Autogenerated schema and library --

library PositionTable {

	// Detect whether the call to the system was done via delegatecall or a regular call
	// to switch between writing to own storage and using access controlled external storage functions
	// (see note 1. below)
	function isDelegateCall() internal returns (bool) {
		(bool success, bytes memory data) = address(this).call(
			abi.encodeWithSignature("isMudStore()")
		);

		return success && abi.decode(data, (bool));
    }

	// Register the table's schema
	// (used to compute data length when returning values from core lib and for input validation)
	function registerSchema() {
		// Autogenerated schema based on schema struct definition
		SchemaType[2] schema = [SchemaType.UINT32, SchemaType.UINT32];

		// Call core lib or wrapper contract to register schema
		if(isDelegateCall()) {
			MudStoreCore.registerSchema(id, schema);
		} else {
			MudStore(msg.sender).registerSchema(id, schema);
		}
	}

	// Set the full position value
	function set(uint256 entity, uint32 x, uint32 y) {
		bytes[] data = [
			Convert.encode(x),
			Convert.encode(y)
		];

		// Set the data via core lib or wrapper contract
		if(isDelegateCall()) {
			MudStoreCore.setData(id, entity, data);
		} else {
			MudStore(msg.sender).setData(id, entity, data);
		}
	}

	// Offer both syntax for convenience
	function set(uint256 entity, Schema data) {
		set(entity, data.x, data.y);
	}

	// Set partial schema values
	function setX(uint256 entity, uint32 x) {
		// Set the data via core lib or wrapper contract
		if(isDelegateCall()) {
			MudStoreCore.setData(id, entity, 0, Convert.encode(x));
		} else {
			MudStore(msg.sender).setData(id, entity, 0, Convert.encode(x));
		}
	}

	function setY(uint256 entity, uint32 y) {
		// Set the data via core lib or wrapper contract
		if(isDelegateCall()) {
			MudStoreCore.setData(id, entity, 1, Convert.encode(y));
		} else {
			MudStore(msg.sender).setData(id, entity, 1, Convert.encode(y));
		}
	}

	// Get the full position value
	function get(uint256 entity) returns (Schema) {
		// Get data via core lib or wrapper contract
		bytes[] data = isDelegateCall() 
			? MudStoreCore.getData(id, entity)
			: MudStore(msg.sender).getData(id, entity);
			
		return Schema(
			Convert.decodeUint32(data[0]),
			Convert.decodeUint32(data[1])
		);
	}

	// Get partial schema values
	function getX(uint256 entity) returns (uint32) {
		// Get data via core lib or wrapper contract
		bytes data = isDelegateCall()
			? MudStoreCore.getDataAtIndex(id, entity, 0)
			: MudStore(msg.sender).getDataAtIndex(id, entity, 0);

		return Convert.decodeUint32(data);
	}

	function getY(uint256 entity) returns (uint32) {
		bytes data = isDelegateCall()
			? MudStoreCore.getDataAtIndex(id, entity, 1)
			: MudStore(msg.sender).getDataAtIndex(id, entity, 1);
		return Convert.decodeUint32(data);
	}
}

Usage examples

// Usage examples from within System:
PositionTable.set(0x01, 1, 2);

PositionTable.set(0x01, {x: 1, y: 2});

PositionTable.set({entity: 0x01, x: 1, y: 2});

PositionTable.setX(0x01, 1);

Schema position = PositionTable.get(0x01);

uint32 x = PositionTable.getX(0x01);
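The gas argument from "Issues with the previous approach" (abi.encode reserving a full 32-byte word per field) can be illustrated in Python. The tight packing shown here is an assumption about what a `Convert`-style encoder would do based on the schema byte lengths; the proposal does not pin down the exact encoding:

```python
import struct

# Tight packing of (uint32 x, uint32 y): each field uses exactly its
# static byte length, as implied by the schema byte lengths
def pack_position(x: int, y: int) -> bytes:
    return struct.pack(">II", x, y)  # two big-endian uint32s

# abi.encode-style padding: every value occupies a full 32-byte evm word
def abi_encode_position(x: int, y: int) -> bytes:
    return x.to_bytes(32, "big") + y.to_bytes(32, "big")

tight = pack_position(1, 2)
padded = abi_encode_position(1, 2)
assert len(tight) == 8     # both fields fit in a fraction of one storage slot
assert len(padded) == 64   # two full evm words -> two storage slots
```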

Notes

  1. We want to be able to detect delegatecall in the storage library called in the system
    • If the system is called via delegatecall, it means it can write to storage using MudStoreCore directly without having to call functions with access control on a MudStore contract. This saves (700 call base gas + x calldata gas + y access control check gas) per storage operation
    • To detect delegatecall inside of a library, we can check if this has the isMudStore() function
      • since systems don’t implement their own isMudStore function, if this supports isMudStore, it means the current context is a MudStore and we can use libraries directly (this could be turned into something like ERC165’s supportsInterface)
      • This approach is cheaper than alternatives like setting a temporary storage variable (5k gas to temp store, 2.1k to read from the system)

Framework (aka World)

Edit: the original proposal included a section on the World framework. Since then we reworked the World framework concept and moved the discussion about it to a new issue (#393). For reference, the original section is included below.
  • Using the MudStoreCore library, any contract can become compatible with MUD’s toolchain
  • To further improve developer experience, we create a framework around MudStoreCore (like the current World contract and conventions)
    • Common patterns for modularising code (into modular systems)
    • Common patterns for approvals akin to ERC20-like approvals, used for:
      • system-to-system calls
      • session wallets
      • atomic contract interactions (akin to ERC20 swaps)
    • Replacing dynamic contract addresses with known and human-readable function names inside the framework
  • The framework has similarities to the well known diamond pattern, but implements facets differently to support an “autonomous mode”, where third party developers can register new tables and new systems on the core World contract.
    • Systems (akin to facets) can be registered as DELEGATE systems, meaning they are called via delegatecall from the World contract
      • DELEGATE systems have full access to all storage, so they can only be registered and upgraded by the World’s owner
      • The World can be made “autonomous” by setting its owner to address(0)
        • This means no more DELEGATE systems can be registered and the existing DELEGATE systems can not be upgraded anymore
    • Systems can be registered as AUTONOMOUS systems, meaning they are called via call from the World contract
      • AUTONOMOUS systems set state via the World’s access controlled setData method
        • They can read from all tables, but can only write data to tables they have write access to
      • Anyone can register a new AUTONOMOUS system
      • The owner of an AUTONOMOUS system can upgrade the system (by overwriting the existing entry in the SystemTable)
  • All systems are called via the World’s fallback method
    • Why?
      • The central World contract can implement logic like access control, approval pattern, system-to-system calls, account abstraction
      • This central logic can be upgraded by the World owner (which can be a DAO)
        • Access control bugs can be fixed and new features can be added for the entire World instead of each system separately
      • Neither do Systems need a reference to “their World” in storage, nor does the World address need to be passed as a parameter
        • Instead systems can trust the msg.sender to be the World contract (if called via call) and therefore read and write data via World’s access controlled methods, or have write access to the delegated storage directly (if called via delegatecall). All of this can be abstracted into the autogenerated libraries per table.
        • This also enables systems to be deployed once and then be registered in and called from multiple different World contracts (akin to diamond's facets).
      • Same developer and user experience independent of working in “diamond mode” with mostly DELEGATE systems or in “autonomous mode” with AUTONOMOUS systems.
    • How?
      • When registering a new system, the World computes a new function selector based on the system’s name and function signature
        • Example: Registering a CombatSystem’s attack function:
          • Register via call to world.registerSystem(<contractAddr>, "Combat", "attack(bytes32)")
          • Now the system can be called via world.Combat_attack(bytes32) (the call will be forwarded to CombatSystem.attack(bytes32))
      • Since systems are called via the World contract, msg.sender is either the external msg.sender (if the system is called via delegatecall) or the World contract (if the system is called via call).
        • Therefore all systems’ functions need to have address _from as their first parameter, which will be populated by the World contract with the external msg.sender, or other addresses based on some approval pattern (see discussion in Proposal: General approval pattern (for modular systems and session wallets) #327)
          • Great benefit of this approach: access control, account abstraction, etc can all be implemented (and upgraded) at the central World contract instead of separately in each system (see notes on “Why” above)

Pseudo-code implementation with more details

// Solidity-like pseudo code
// omitting some language features for readability
// eg using keccak(a,b,c) for keccak256(abi.encode(a,b,c))
// or omitting memory, public, pure etc

// `MudStore` base contract implements all view functions from IMudStore (getData, ...)
// that don't require access control checks.
// World contract extends `MudStore` and implements access control for write methods (`setData`) 
contract World is MudStore {
	error World_TableExists();

	function registerSchema(bytes32 table, SchemaType[] schema) {
		// Require unique table ids
		if(MudStoreCore.hasTable(table)) revert World_TableExists();
		
		// Register schema
		MudStoreCore.registerSchema(table, schema);

		// Set table's owner in owner tab
		// (OwnerTable uses auto-generated typed helper table like `PositionTable` described above)
		OwnerTable.set({ index: table, owner: msg.sender });
	}

	function setData(bytes32 table, bytes32[] index, bytes[] data) {
		// TODO: Require caller to have permission to modify table
		//       (access control details tbd)

		// Set data
		MudStoreCore.setData(table, index, data);
	}

	// Register a new system
	// -> Anyone can call this method, but only World owner can pass DELEGATE mode
	//    - DELEGATE systems are called via delegatecall and have access to all storage
	//    - AUTONOMOUS systems are called via call and modify storage via access controlled `setData` method
	function registerSystem(
		address contractAddress,
		string contractName,
		string functionSig,
		ExecutionMode mode) {
		// TODO: if mode is DELEGATE, require msg.sender to be World's owner
		
		// TODO: check if contract name is already registered
		//       - if so, require msg.sender to be owner
		//       - else, register contract name and set msg.sender as owner

		// TODO: check if the function signature already exists for the given contract
		//       - if so, this is an upgrade
		//           - require msg.sender to be system's owner
		//           - and if the given system is a DELEGATE system, require World's owner to be system's owner
		//             (to prevent upgrades to DELEGATE systems in fully autonomous mode)
		
		// Compute the selector to use to call this system via the fallback() entry point
		// using the format <contractName>_<functionSig>()
		// NOTE: this is slightly simplified - in reality we have to remove the `address _from` parameter
		//       from the function signature because it will be automatically populated by the World based on `msg.sender` (see notes above)
		bytes4 worldSelector = bytes4(keccak(abi.encodePacked(contractName, "_", functionSig)));
		
		// Register World selector with contract address
		SystemTable.set({
			index: bytes32(worldSelector),
			addr: contractAddress,
			selector: bytes4(keccak(functionSig)),
			mode: mode
		});
	}

	// TODO: Set approval (see general approval pattern discussion in mud#327)
	function approve( ... ) { ... }

	// The fallback function is used for consumers to call system functions
	// with proper types. We can generate an ABI for the World contract based
	// on registered systems.
	// The function selector is generated in `registerSystem` (see above)
	fallback() external payable {
		// Find system based on function selector
		SystemTableEntry system = SystemTable.get(msg.sig);
		
		if(system.mode == ExecutionMode.DELEGATE) {
			// TODO: If system is DELEGATE system, populate the _from parameter with msg.sender,
			//       forward the call via `delegatecall`, and return any value.
			//       This is almost equivalent to EIP2535 (diamond pattern), except from
			//       using `_from` instead of `msg.sender`
		} else {
			// TODO: If system is an AUTONOMOUS system, populate the _from parameter with msg.sender,
			//       forward the call via `call` and return any value.
			//       The called system will use access controlled `setData` methods of this contract.
		}
	}
}

Usage example

// ----- Example of a move system -----

contract MoveSystem {
	// System can trust the `move` function will only be called via a `MudStore` contract (in our case World)
	// and must therefore use the _from parameter instead of msg.sender. (Note: this requires something like the "general access pattern" (#327) to be in place)
	// Since system doesn't have any internal state, it doesn't have to check whether the call actually comes from a `MudStore`
	// (because state will always be modified in the calling contract and the call fails if it doesn't come from a MudStore)
	function move(address _from, bytes32 _entity, Position _position) public {
		// Check if the `_from` address owns the given entity
		require(OwnerTable.get(_entity) == _from, "only owner can move entity");
		
		// Set the new entity's new position value
		PositionTable.set(_entity, _position);
	}
}

Further work / extensions

Table migrations

  • For a persistent world it is plausible that table schemas need to be upgraded from time to time. How could this be implemented in this proposal?
    • We could add additional overloads of setData and getData that include a uint16 version parameter
    • MudStoreCore._getLocation includes the version to get the storage location hash
    • If the version parameter is omitted, it is set to 0 by default
    • To increase a table’s version, a “migration” has to be specified (how to interpret the original data with the new schema). This migration is used to generate a typed access library using the new schema, which calls setData with an incremented version value and the new schema, and implements the migration in the getter functions.
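The versioned-location idea can be sketched in Python. Everything here is an assumption for illustration: `sha3_256` stands in for keccak256, and the exact encoding of the hash input (byte concatenation, 2-byte version) is not specified by the proposal:

```python
import hashlib

def get_location(slot: bytes, table: bytes, index: tuple, version: int = 0) -> bytes:
    # Sketch of a versioned _getLocation: hash(slot, table, index..., version).
    # If the version parameter is omitted, it defaults to 0, matching the
    # behaviour described above.
    payload = slot + table + b"".join(index) + version.to_bytes(2, "big")
    return hashlib.sha3_256(payload).digest()

slot = hashlib.sha3_256(b"mud.store").digest()
table = hashlib.sha3_256(b"mud.store.table.position").digest()
entity = (42).to_bytes(32, "big")

v0 = get_location(slot, table, (entity,))             # version omitted -> 0
v1 = get_location(slot, table, (entity,), version=1)  # after a migration

# Same table and index tuple, different version -> disjoint storage locations,
# so migrated data never collides with the original table's data
assert v0 != v1
```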

Acknowledgements

  • This proposal is based on many internal discussions with and ideas by @ludns, @holic, @Kooshaba and @authcall
  • Generating libraries to improve developer experience and allow typed access to tables is based on an idea by @FlynnSC
  • Registering contracts as “facets” and calling them via a fallback method, as well as using delegated storage is based on Nick Mudge, "EIP-2535: Diamonds, Multi-Facet Proxy," Ethereum Improvement Proposals, no. 2535, February 2020. [Online serial]. Available: https://eips.ethereum.org/EIPS/eip-2535.
  • Using diamond storage to improve the developer experience and gas efficiency of MUD is based on ideas by @cha0sg0d, @0xhank and @dk1a
@alvrs alvrs added the discussion A discussion (without clear action/outcome) label Jan 16, 2023
@alvrs alvrs pinned this issue Jan 16, 2023
@frolic
Member

frolic commented Jan 16, 2023

isMudStore() ... (this could be turned into something like ERC165’s supportsInterface)
If there are no downsides to just using supportsInterface, should we just adapt the proposal to use that instead?

Not sure if this is already demonstrated in the code above (wasn't clear how to call the MoveSystem's move via the World), but I wonder if it makes sense to prefix all fallback methods with something to avoid colliding with potential built-ins like supportsInterface etc. Basically preserving the "root" function sig namespace for us in the future. Imagining we upgrade the world over time and add new functions that collide with folks' existing systems, it would be a pain to ask them all to rename things.

There's also some potential for function sig collisions between systems, so maybe it makes sense to prefix each with the contract name or some other identifier, e.g. World.MoveSystem_move()

@frolic
Member

frolic commented Jan 16, 2023

Before we land on anything, it'd be nice to turn some of this pseudocode into real code (and manually write the helpers that will be auto generated) and run some tests against it to see 1) how it feels to use in practice and 2) how much more gas efficient it is compared to MUD's current data model/approach.

@frolic
Member

frolic commented Jan 16, 2023

Is event StoreUpdate(...) replacing ComponentValueSet and ComponentValueRemoved? If so, we may need to distinguish between "removed"/"unset" and just set to the Solidity-native "uninitialized" value like 0. For example, removing a Health component (not a living entity/attackable) is a different operation than setting Health to 0 (dead).

@dk1a
Contributor

dk1a commented Jan 16, 2023

If we only allow one dynamic array type per table schema, encoding/decoding/storing partial data gets much simpler and cheaper (the dynamic array type always has to come last in the schema)

I think "always has to come last" already solves the efficiency problem, "only allow one dynamic" may be an excessive restriction

To detect delegatecall inside of a library, we can check if this has the isMudStore() function

(I assume you are putting off 165 because of interface volatility)
You can use ERC165 right away, and wrap the hardcoded check in a free isMudStore utility function. That would make the transition seamless later.
When the interface changes, you change the check within isMudStore and users are unaffected.

When registering a new system, the World computes a new function selector based on the system’s name and function signature

Have you thought about name squatting?

fallback

I'm not sure fallback(<args>) fits autonomous world as well as it does diamonds.
@holic already mentioned most issues here, I'll just present an alternative (not well thought out).
With something like callSystem(<deployer_guid>, <name>, <args>)

  • <deployer_guid> addresses inter-user collisions/squatting
  • <name> addresses namespacing (you can autoprefix) and function name collisions (you can hash as bytes32 instead of bytes4)
  • <args> system args, same as fallback

Devex can probably be salvaged with some helpers, it's gonna be on the client at that point anyways (except for internal systems).

Therefore all systems’s functions need to have address _from as their first parameter

Looks like a non-gsn use case for ERC2771. Tho it will hardly be more convenient - instead of 1 mandated argument, you get 1 mandated interface to inherit + 1 internal func (_msgSender) to think about

@alvrs
Member Author

alvrs commented Jan 17, 2023

Re: supportsInterface

If there are no downsides to just using supportsInterface, should we just adapt the proposal to use that instead?
- #347 (comment) by @holic

(I assume you are putting off 165 because of interface volatility)
You can use ERC165 right away, and wrap the hardcoded check in a free isMudStore utility function. That would make the transition seamless later.
When the interface changes, you change the check within isMudStore and users are unaffected.
- #347 (comment) by @dk1a

I agree, we can use supportsInterface right away. If the interface is upgraded in a way that doesn't break consumers (eg. adding a new method), supportsInterface could just return true for both the old and the new interface.


Re: calling systems via fallback()

Not sure if this is already demonstrated in the code above (wasn't clear how to call the MoveSystem's move via the World), but I wonder if it makes sense to prefix all fallback methods with something to avoid colliding with potential built-ins like supportsInterface etc. Basically preserving the "root" function sig namespace for us in the future. Imagining we upgrade the world over time and add new functions that collide with folks' existing systems, it would be a pain to ask them all to rename things.
There's also some potential for function sig collisions between systems, so maybe it makes sense to prefix each with the contract name or some other identifier, e.g. World.MoveSystem_move()
- #347 (comment) by @holic

I agree and came to the same conclusion as you, see pseudocode for registerSystem above:

bytes4 worldSelector = bytes4(keccak256(abi.encodePacked(contractName, "_", functionSig)));

The idea would be to create a new function selector in the World contract based on a unique system prefix (ie the system name) and the system's function selector. Registering the CombatSystem's attack function would become world.Combat_attack(...args) in the World. This requires the function signature to be passed to the registerSystem function as string pre-image instead of the already hashed 4 bytes function selector (from which we couldn't infer the function signature and therefore couldn't construct a new signature for calling it from within the world).

Have you thought about name squatting?
- #347 (comment) by @dk1a

Haha, interesting point. I don't think there is much prestige associated with function name prefixes, but if a World gets popular enough for this to be a problem, and gas prices are not high enough to disincentivise squatting, there could always be some small ETH amount required to register a new system / prefix to avoid excessive registration of systems / prefixes.

I'm not sure fallback(<args>) fits autonomous world as well as it does diamonds.
@holic already mentioned most issues here, I'll just present an alternative (not well thought out).
With something like callSystem(<deployer_guid>, <name>, <args>)

  • <deployer_guid> addresses inter-user collisions/squatting
  • <name> addresses namespacing (you can autoprefix) and function name collisions (you can hash as bytes32 instead of bytes4)
  • <args> system args, same as fallback
    Devex can probably be salvaged with some helpers, it's gonna be on the client at that point anyways (except for internal systems).

- #347 (comment) by @dk1a

The big advantage of using the fallback function is we can easily generate an ABI for the World that includes all registered systems, and thereby make it possible to call system functions with typed arguments instead of raw bytes. I agree that on the client we could just have helpers adding types and encoding the parameters automatically, but I would like to encourage on-chain composability (ie creating new, more complex systems by plugging together existing simpler ones), and the developer experience of doing that by manually encoding raw bytes is much worse in my opinion. (We could generate libraries to add types to system calls similar to how we'd have to do it for storing values in tables/components, but if we can get away with native Solidity features I'd prefer that over more code generation.)


Re: implementing prototype to benchmark and see how it feels

Before we land on anything, it'd be nice to turn some of this pseudocode into real code (and manually write the helpers that will be auto generated) and run some tests against it to see 1) how it feels to use in practice and 2) how much more gas efficient is it compared to MUD's current data model/approach.
- #347 (comment) by @holic

Very much agree with this! Will start implementing a prototype version today to create some benchmarks and share the results here.


Re: update events

Is event StoreUpdate(...) replacing ComponentValueSet and ComponentValueRemoved? If so, we may need to distinguish between "removed"/"unset" and just set to the Solidity-native "uninitialized" value like 0. For example, removing a Health component (not a living entity/attackable) is a different operation than setting Health to 0 (dead).
- #347 (comment) by @holic

The final name of the event is tbd but if we decide to move away from ComponentValueSet/Removed to something more inclusive of non-ECS use cases, we should probably still have a transition period in which we emit the deprecated old event to keep compatibility with tools until they are upgraded.

About "unset": I agree, this is definitely relevant for indexers (on-chain and off-chain). Btw, speaking of indexers, this didn't make it into the proposal above, but I think all tables/components should be "bare" by default (no reverse mapping) and instead we should have a built-in onUpdate callback or something that owners of a table can use to register on-chain indexers, which can keep track of a reverse mapping if needed. (Then we can add support for more advanced indexers too.)


Re: restricting table schema to limit dynamic types

I think "always has to come last" already solves the efficiency problem, "only allow one dynamic" may be an excessive restriction
- #347 (comment) by @dk1a

Yeah this is a fair point. We could just allow multiple dynamic types per schema and make it the users' responsibility to be aware that values may get shifted in storage if they're placed behind a dynamic length type.


Re: call forwarding

Looks like a non-gsn use case for ERC2771. Tho it will hardly be more convenient - instead of 1 mandated argument, you get 1 mandated interface to inherit + 1 internal func (_msgSender) to think about
- #347 (comment) by @dk1a

True, conceptually it has a lot of similarities to ERC2771 meta transactions. I'm not sure it would be worth implementing ERC2771 in full, though: since systems would only accept transactions from a MudStore anyway, there may be no additional benefit from supporting general meta-transactions. We could of course use ERC2771's approach if we think it provides the better developer experience. One minor thing I dislike about ERC2771's _msgData() is that it's raw bytes again - as mentioned above, I feel like working with typed function params would lead to a better developer experience, so I'd prefer that if it's feasible. You could also argue that requiring all systems to have an address _from parameter even if it's unused is not optimal developer experience either, so happy to discuss further.

@dk1a
Contributor

dk1a commented Jan 17, 2023

Re: Re: call forwarding

One minor thing I dislike with ERC2771's _msgData() is that it's raw bytes again - as mentioned above I feel like working with typed function params would lead to a better developer experience so I'd prefer that if it's feasible.

Oh you wouldn't even use _msgData in most cases. The idea of ERC2771-like approach is that _msgSender is stored at the end of calldata via assembly tricks. It may make more sense with an example:

// The extra complexity is the mandatory ERC2771Context that you need to inherit, and _msgSender it provides.
// Arguments are typed as before, that's unchanged.
contract MoveSystem is ERC2771Context {
  function move(bytes32 _entity, Position _position) public {
    // Check if the sender owns the given entity
    require(OwnerTable.get(_entity) == _msgSender(), "only owner can move entity");

    // Set the entity's new position value
    PositionTable.set(_entity, _position);
  }
}
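To make the calldata trick concrete: the forwarder appends the original sender's 20-byte address to the end of the calldata, and the recipient's `_msgSender()` reads it back from the tail. A minimal Python model for illustration (the helper names are made up; this is not the actual ERC2771 implementation):

```python
# Illustrative model of ERC2771-style sender forwarding via a calldata suffix.

def forward_call(selector: bytes, args: bytes, sender: bytes) -> bytes:
    """What a trusted forwarder sends: abi.encodePacked(selector, args, sender)."""
    assert len(selector) == 4 and len(sender) == 20
    return selector + args + sender

def msg_sender(calldata: bytes) -> bytes:
    """What _msgSender() does on the recipient side: read the last 20 bytes."""
    return calldata[-20:]

sender = bytes.fromhex("aa" * 20)  # dummy 20-byte address
calldata = forward_call(b"\x12\x34\x56\x78", b"\x00" * 64, sender)

assert msg_sender(calldata) == sender
assert len(calldata) == 4 + 64 + 20
```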

@alvrs
Member Author

alvrs commented Jan 17, 2023

Oh you wouldn't even use _msgData in most cases. The idea of ERC2771-like approach is that _msgSender is stored at the end of calldata via assembly tricks. It may make more sense with an example:

// The extra complexity is the mandatory ERC2771Context that you need to inherit, and _msgSender it provides.
// Arguments are typed as before, that's unchanged.
contract MoveSystem is ERC2771Context {
  function move(bytes32 _entity, Position _position) public {
    // Check if the sender owns the given entity
    require(OwnerTable.get(_entity) == _msgSender(), "only owner can move entity");

    // Set the entity's new position value
    PositionTable.set(_entity, _position);
  }
}

- #347 (comment) by @dk1a

Ohh I wasn't aware of that, thanks for pointing this out. In that case I think the ERC2771 approach provides a much better dev ex, since it doesn't require a potentially stale default param.

@ludns
Member

ludns commented Jan 17, 2023

Instead of detecting the presence of the store in contract storage via a call, I recommend doing it via a storage read. Reads get cheaper when accessed multiple times (if a system makes multiple store writes, for example); they are also cheaper than calls if optimised (a call has to go through an ABI encode).
We can of course benchmark, but I encourage exploring this mechanism.

@dk1a
Contributor

dk1a commented Jan 17, 2023

@alvrs if u're working on this rn and wanna use some of my methods:
https://gist.github.com/dk1a/7293cdd1648aea2800e79307d3d572c7
It's not remotely finished and likely has bugs, but assembly store logic is mostly there, and SchemaType globals may be interesting.

@alvrs
Member Author

alvrs commented Jan 17, 2023

Very cool! I'm working on a prototype on this branch: https://github.com/latticexyz/mud/tree/alvrs/datamodel (also very WIP still)

@dk1a
Contributor

dk1a commented Jan 18, 2023

@alvrs for Bytes.sol you might be interested in Slice
e.g. you can getSubslice + toBytes32 to avoid intermediate memory allocation for slices

@dk1a
Contributor

dk1a commented Jan 19, 2023

Started a branch too: https://github.com/dk1a/mud/tree/dk1a/datamodel
It's a different approach to StoreCore and encoding, should get some tests and gas metrics going later today

@dk1a
Contributor

dk1a commented Jan 19, 2023

Some preliminary gas metrics (both branches aren't particularly optimized):
(@alvrs | @dk1a)

StoreCoreTest::testRegisterAndGetSchema()
gas used (register): 24293 | 24275
gas used (get schema, warm): 3562 | 2009
this is null-terminated vs bitpacked length

StoreCoreTest::testSetAndGetAndSplitData()
gas used (set): 26247 | 26284
gas used (get, warm): 978 | 988
gas used (split): 4686 | 5310
gas used (concat): 147 | gas used (packing): 5685
gas used (compute schema length): 1255 | gas used (packedSize): 2021
this is static vs dynamic encoding

@alvrs
Member Author

alvrs commented Jan 19, 2023

Thanks @dk1a! Your branch has a lot of useful logic for dynamic length schemas, will port some of that over. In general I think we should first agree on an "interface" (how to register new tables and systems and how to interact with new tables and systems), and then go deep into gas optimization for the "reference implementation". For the gas optimization step it would be really useful to have a set of tests for every low level function, plus a GitHub Action that runs the tests and compares the gas results against the previous run, so we have an easy way to compare.

For now I created a draft PR for the prototype implementation: #352

The "end to end" test is in World.t.sol - showing how to register a new system, register a new table, use the table from within the system, and interact with the system (https://github.com/latticexyz/mud/pull/352/files#diff-bd77f4b8f4df46b4520bc7ea67fe42dadf8877ab9989be2cb4fb22f09ebeb562R11-R25)

@dk1a
Contributor

dk1a commented Jan 19, 2023

@alvrs dynamic stuff (bytes, arrays) and storage functions are pretty bug-filled atm, should be ready tomorrow with a thorough TestTable

Haven't gotten around to it yet, but something like this might be cleaner for World:

target.call(
  abi.encodePacked(
    // not encodeWithSelector because arguments are already encoded
    selector,
    arguments,
    // forward msg.sender as last 20 bytes
    msgSender
  )
);

// place any return value into memory at position 0
returndatacopy(0, 0, returndatasize())

returndatasize can be longer than 0x40, which'd violate memory safety; you can copy it to the unallocated space instead

@frolic
Member

frolic commented Jan 19, 2023

In general I think we should first agree on an "interface"

FWIW, I think it's a useful exercise to try some of the gas optimizing steps while designing the interfaces, because they can greatly inform one another. That's how I ultimately came to EthFS using a File struct as an abstraction (and file reading behavior as a function on that struct) because it was the most gas optimal approach to getting what I wanted. All the interfaces before that I had tried weren't getting the best gas.

But I also wanna be mindful that one particular interface/design isn't gonna constrain us in the future if we want to expand wider or go deeper in terms of features, gas optimizing, etc.

@alvrs
Member Author

alvrs commented Jan 20, 2023

returndatasize can be longer than 0x40, which'd violate memory safety; you can copy it to the unallocated space instead
- #347 (comment) by @dk1a

good point - this part of the code was taken from EIP-2535 Diamonds, ooc do you know how they handle returndata longer than 0x40?

@dk1a
Contributor

dk1a commented Jan 20, 2023

Added TestTable, and the full assortment of SchemaTypes seems to work now: https://github.com/dk1a/mud/tree/dk1a/datamodel

The only unimplemented types are BYTES_ARRAY and STRING_ARRAY. They need some special logic. Also, do we even want those? (I assume yes, they seem useful)

@alvrs now that I'm done with slice packing, I imagine I should wait for you to update ur branch and then fork it to avoid 2 parallel ones.
Also wouldn't wanna do duplicate work, do you want help with anything particular, like gas optimizations, autogen etc?

@dk1a
Contributor

dk1a commented Jan 20, 2023

good point - this part of the code was taken from EIP-2535 Diamonds, ooc do you know how they handle returndata longer than 0x40?

(untested pseudocode just to show the general idea):

assembly {
  let unallocPtr := mload(0x40)
  returndatacopy(unallocPtr, 0, returndatasize())
}

The idea is to use the unallocated memory as scratch space

https://github.com/solidstate-network/solidstate-solidity/blob/master/contracts/proxy/Proxy.sol
Well, memory just isn't safe there and it's all assembly that ignores solidity's memory model. I suppose you just have to be aware of it and mention in comments, if that's how you choose to do it

@alvrs
Member Author

alvrs commented Jan 20, 2023

I'm still on the fence about how to best support dynamic length schema types. If we allow multiple dynamic length schema types and store them contiguously in storage, modifying one of them has the potential of exploding gas costs, since we'd have to shift all storage behind it. So I'm thinking it would be better to store each dynamic schema type at a different location in storage: we could store the lengths of all dynamic schema types in a single deterministic slot per table, and then store each compressed array at a different location. That would allow us to support things like pushing/popping to/from arrays, and would prevent shifting all subsequent data.

Since static data and dynamic data would be handled differently, I think it would be best to have different methods for setting/modifying them. Also, we might want to add another field to the event to allow emitting updates of individual dynamic array entries without having to read the entire array from storage.

// data index to update individual items of dynamic data types
event StoreUpdate(bytes32 table, bytes32[] key, bytes data, uint8 schemaIndex, uint8 dataIndex);

// Set full static data
// Approx gas cost: 20k * ceil(data.length / 32) for sstore + 2.1k for sload (verifying schema)
function setStaticData(bytes32 table, bytes32[] key, bytes memory data) {
	// Verify data matches schema length
	// Set data at keccak(table, key)
	// Emit StoreUpdate(table, key, data, 0, 0)
}

// Set an individual static data column
// Approx gas cost: 20k * ceil(data.length / 32) for sstore + 2.1k for sload (verifying schema)
function setStaticDataColumn(bytes32 table, bytes32[] key, bytes memory data, uint256 schemaIndex) {
	// Verify data matches schema length
	// Set data at keccak(table, key)
	// Emit StoreUpdate(table, key, data, schemaIndex, 0)
}

// Set an individual dynamic data column
// Approx gas cost: 20k * ceil((data.length + 2) / 32) for sstore
function setDynamicDataColumn(bytes32 table, bytes32[] key, bytes memory data, uint256 schemaIndex) {
	// Set length in slot keccak(table, key, DYNAMIC_DATA_LENGTH), at byte offset 2 * schemaIndex
	// -> length of all dynamic indices is stored in the same slot, so writing to multiple dynamic data schema indices per tx gets cheaper
	// Set data at keccak(table, key, schemaIndex)
	// Emit StoreUpdate(table, key, data, schemaIndex, 0)
}

// Set an individual item of a dynamic data column
// Approx gas cost: 20k * ceil((data.length + 2) / 32) for sstore + 2.1k for sload (verifying schema)
function setDynamicDataColumnItem(bytes32 table, bytes32[] key, bytes memory data, uint256 schemaIndex, uint256 dataIndex) {
	// Verify dataItem fits in one item of schema type at schemaIndex
	// If dataIndex >= length: increase data length to dataIndex + 1
	// Set data at keccak(table, key, schemaIndex) + dataIndex
	// Emit StoreUpdate(table, key, data, schemaIndex, dataIndex)
}

// Push an individual item onto a dynamic data array column
function pushDynamicDataColumnItem(bytes32 table, bytes32[] key, bytes memory dataItem, uint256 schemaIndex) {
	// Like setDynamicDataColumnItem with dataIndex = length
}

// Pop an individual item from a dynamic data array column
function popDynamicDataColumnItem(bytes32 table, bytes32[] key, uint256 schemaIndex) {
	// Return item at index length - 1
	// Decrease length by 1
}

On the other hand this makes the core interface much more convoluted than before. I'm still in the process of thinking this through and am happy for your input too. But for now it would probably be good to hold back on further implementations to avoid double work.

@dk1a
Contributor

dk1a commented Jan 20, 2023

So I'm thinking it would be better to store each dynamic schema type in a different location in storage; We could store the lengths of each dynamic schema type in a single deterministic slot per table, and then store each compressed array at a different location

It's actually pretty easy to change my implementation to account for this. Some parts would become even simpler than they are now (no need to compute slot offset for dynamic values).

It's probably better to pack lengths together with the data they refer to (the way I do it currently, in a 2-byte prefix; for arrays like uint32[] this also warms up the slot with a bunch of initial elements).
Packing lengths with each other seems like more complexity and an unlikely efficiency gain.

I'm still in the process of thinking this through and am happy for your input too. But for now it would probably be good to hold back on further implementations to avoid double work.

I'd say the more functions the better. Start with generic setters (like what we have now), then add more granular stuff. If you end up using my StoreCore I can then augment it with whatever granularity we decide upon

@dk1a
Contributor

dk1a commented Jan 21, 2023

Added dynamic/static methods to my branch, see StoreCore, and TestTable for how it comes together

@alvrs
Member Author

alvrs commented Jan 31, 2023

Updated store core spec

To move forward with this change and unblock follow-up projects building on top of the core storage library / IStore interface, here is an updated proposal for the core interface, data layout, and some other thoughts.

Motivation

See #347 (comment) (initial comment) - the motivation and design goals stayed the same.

Explanation of terms

  • Table: A data container with columns and rows (just like you’d expect from a table). Previously we called this “Component”, a data packet attached to an “Entity” (in ECS), which corresponds to a table in which each row is addressed by a single unique id (= Entity). In the new model we allow composite keys (= rows being addressed not by a single unique id but by a unique combination of multiple columns), and renaming “Components” to tables makes this more explicit. You can also think of a table as a nested key-value store.

    key: (A, B)
    value schema: (X, Y, Z)
    
    conceptually:
    {
    	[A1]: {
    		[B1]: {
    			X1,
    			Y1,
    			Z1
    		},
    		[B2]: {
    			X2,
    			Y2,
    			Z2
    		}
    	},
    	[A2] { ... }
    }
    
    -> translates into table
    | A  | B  | X  | Y  | Z  |
    | -- | -- | -- | -- | -- |
    | A1 | B1 | X1 | Y1 | Z1 |
    | A1 | B2 | X2 | Y2 | Z2 |
    | ...
  • Schema: Each table has a schema, which corresponds to the data types of the table’s columns, or the value type when thinking about it as a key-value store (X, Y, Z in the example above).

  • Record: A complete row/entry in the table. Uniquely identified by the key tuple.

  • Field: A single column of a row/entry in the table (= a single cell of the table).

Store Interface

// SPDX-License-Identifier: MIT
pragma solidity >=0.8.0;

import { SchemaType } from "./Types.sol";
import { Schema } from "./Schema.sol";

interface IStore {
	/* Event emitted when a full record is updated.
	 * table: unique identifier of the table
	 * key: tuple uniquely identifying a single row/record in the table
	 * data: packed bytes of the full record (see below for packing layout)
	 */
  event StoreSetRecord(bytes32 table, bytes32[] key, bytes data);

	/* Event emitted when a single field is updated.
	 * table: unique identifier of the table
	 * key: tuple uniquely identifying a single row/record in the table
	 * schemaIndex: index in the schema (= column) that is being updated
	 * data: packed bytes of the field being updated
	 */
  event StoreSetField(bytes32 table, bytes32[] key, uint8 schemaIndex, bytes data);
  
	/* Event emitted when a record is being removed (as opposed to just setting a value to 0)
	 * table: unique identifier of the table
	 * key: tuple uniquely identifying a single row/record in the table
	 */
	event StoreDeleteRecord(bytes32 table, bytes32[] key);

	/* Called to create a new table with the given schema.
	 * The schema is stored into a `SchemaTable`, so a `StoreSetRecord` event is emitted
	 * when a new table is registered. This event can be used by indexers to create
	 * a new database table corresponding to the given schema.
	 */
  function registerSchema(bytes32 table, Schema schema) external;

	/* Return the registered schema for the given table */
  function getSchema(bytes32 table) external view returns (Schema schema);

  /* Set a full record (including values for the entire schema / all columns)
	 * table: unique identifier of the table
	 * key: tuple uniquely identifying a single row/record in the table
	 * data: packed bytes of the full record (see below for packing layout)
	 */
  function setRecord(
    bytes32 table,
    bytes32[] memory key,
    bytes memory data
  ) external;

  /* Update a single field
	 * table: unique identifier of the table
	 * key: tuple uniquely identifying a single row/record in the table
	 * schemaIndex: index in the schema (= column) that is being updated
	 * data: packed bytes of the field being updated
	 */
  function setField(
    bytes32 table,
    bytes32[] memory key,
    uint8 schemaIndex,
    bytes memory data
  ) external;

  /* Return the full record identified by the given table/key as packed bytes
	 * (see below for packing layout)
	 */
  function getRecord(bytes32 table, bytes32[] memory key) external view returns (bytes memory data);

  /* Return a single field of the record identified by the given table/key as packed bytes
	 */
  function getField(
    bytes32 table,
    bytes32[] memory key,
    uint8 schemaIndex
  ) external view returns (bytes memory data);

	/* Register a callback to be called when a record is updated */
  function registerOnUpdateHook(bytes32 table, IOnUpdateHook onUpdateHook) external;
}

interface IOnUpdateHook {
	/* Function to call when setRecord is called */
  function onUpdateRecord(
    bytes32 table,
    bytes32[] memory key,
    bytes memory data
  ) external;

	/* Function to call when setField is called */
  function onUpdateField(
    bytes32 table,
    bytes32[] memory key,
    uint8 schemaIndex,
    bytes memory data
  ) external;
}

Data layout

Data passed to and from storage library

Data sent to the store when setting a record, and returned by the store when reading a record, needs to be a single blob of bytes (in order to have a single interface for arbitrary data types).

To minimise gas costs, the size of this blob should be as small as possible (ie not 32 bytes for each array element like in native Solidity). In storage, however, we need to store dynamic length data types (strings, arrays) at different storage locations to avoid subsequent data being shifted when the length of a single dynamic field changes.

Therefore the data blob passed to the store should be packed and also include the sizes of all dynamic length data types.

I propose the following packing:

| static data (tightly packed according to schema)
| 32 bytes of encoded lengths of the dynamic data
| dynamic data (tightly packed according to schema)

The static data (all data types except string, arrays and bytes) can simply be packed tightly (without padding between the pieces of data). The individual fields can be extracted from the packed bytes blob by using the schema.

Following the static data is a 32-byte word encoding the lengths of all dynamic data fields. The lengths are encoded as follows: 4 bytes for the total length of the dynamic data, then 2 bytes per dynamic field length (= up to 14 dynamic length data types per schema).

If the schema doesn’t include any dynamic length data types, these 32 bytes of encoded lengths can be omitted.

If the schema includes dynamic length data types, the dynamic data is appended after the encoded lengths. It is tightly packed, using the minimal number of bytes per element required by the data type (eg 4 bytes per element for uint32 arrays).
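For illustration, here is a Python sketch of this packing (`pack_record` is a hypothetical helper; it assumes big-endian lengths and omits the lengths word when there is no dynamic data, per the layout above):

```python
def pack_record(static_fields: list[bytes], dynamic_fields: list[bytes]) -> bytes:
    """Pack a record per the proposed layout:
    static data | 32-byte encoded-lengths word | dynamic data."""
    blob = b"".join(static_fields)                   # tightly packed static data
    if dynamic_fields:  # lengths word is omitted when there is no dynamic data
        total = sum(len(d) for d in dynamic_fields)
        lengths = total.to_bytes(4, "big")           # 4 bytes: total dynamic length
        for d in dynamic_fields:
            lengths += len(d).to_bytes(2, "big")     # 2 bytes per dynamic field
        blob += lengths.ljust(32, b"\x00")           # pad the lengths word to 32 bytes
        blob += b"".join(dynamic_fields)             # tightly packed dynamic data
    return blob

# Schema (uint32, uint128, uint32[], string), with 3 array elements and "hello":
static = [(1).to_bytes(4, "big"), (2).to_bytes(16, "big")]
dynamic = [b"".join(i.to_bytes(4, "big") for i in (7, 8, 9)), b"hello"]
blob = pack_record(static, dynamic)

assert len(blob) == 20 + 32 + 17               # static + lengths word + dynamic
assert blob[20:24] == (17).to_bytes(4, "big")  # total dynamic length
assert blob[24:26] == (12).to_bytes(2, "big")  # length of the uint32[3] field
assert blob[26:28] == (5).to_bytes(2, "big")   # length of "hello"
```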

Data stored in storage

How the data is represented in storage is an implementation detail and can be handled differently by different store implementations, as long as they keep the interface and the data packing described above for data passed into and out of the store.

In the reference implementation, all static length fields are stored tightly packed in a single storage location (spanning as many words as necessary). The encoded lengths of all dynamic length fields are stored in a single storage slot instead of a separate slot for each dynamic length field. This limits us to 14 dynamic length fields per schema, but saves a storage access per dynamic length field and allows us to store data more cheaply than native Solidity. Lastly, every dynamic length field is stored at a separate storage location, so it can grow or shrink in size without shifting subsequent data.

Typed libraries

To improve the developer experience of working with the store, we automatically generate libraries that wrap the core storage library and handle encoding and decoding of the data.

Example interface of a Vector2 library:

// SPDX-License-Identifier: MIT
pragma solidity >=0.8.0;
import { Schema } from "../Schema.sol";

// -- User defined schema and id --

bytes32 constant id = keccak256("mud.store.table.vector2");

struct Vector2 {
  uint32 x;
  uint32 y;
}

// -- Autogenerated schema and library --

library Vector2Table {
  /** Get the table's schema */
  function getSchema() internal pure returns (Schema schema);

  /** Register the table's schema */
  function registerSchema() internal;

  /** Set the table's data */
  function set(bytes32 key, Vector2 memory vector) internal;

  function setX(bytes32 key, uint32 x) internal;

  function setY(bytes32 key, uint32 y) internal;

  /** Get the table's data */
  function get(bytes32 key) internal view returns (Vector2 memory vec2);

	function getX(bytes32 key) internal view returns (uint32);

	function getY(bytes32 key) internal view returns (uint32);

  function decode(bytes memory blob) internal pure returns (Vector2 memory vec2);
}

Example implementation of a Vector2 library:

(See also https://github.com/latticexyz/mud/blob/alvrs/datamodel/packages/store/src/schemas/Vector2.sol)

// SPDX-License-Identifier: MIT
pragma solidity >=0.8.0;

import { Store } from "../Store.sol";
import { SchemaType } from "../Types.sol";
import { Bytes } from "../Bytes.sol";
import { Schema, Schema_ } from "../Schema.sol";

// -- User defined schema and id --

bytes32 constant id = keccak256("mud.store.table.vector2");

struct Vector2 {
  uint32 x;
  uint32 y;
}

// -- Autogenerated schema and library --

library Vector2Table {
  /** Get the table's schema */
  function getSchema() internal pure returns (Schema schema) {
    schema = Schema_.encode(SchemaType.Uint32, SchemaType.Uint32);
  }

  /** Register the table's schema */
  function registerSchema() internal {
    Store.registerSchema(id, getSchema());
  }

  /** Set the table's data */
  function set(
    bytes32 key,
    uint32 x,
    uint32 y
  ) internal {
    bytes memory data = bytes.concat(bytes4(x), bytes4(y));
    bytes32[] memory keyTuple = new bytes32[](1);
    keyTuple[0] = key;
    StoreSwitch.setRecord(id, keyTuple, data);
  }

  function setX(bytes32 key, uint32 x) internal {
    bytes32[] memory keyTuple = new bytes32[](1);
    keyTuple[0] = key;
    StoreSwitch.setField(id, keyTuple, 0, bytes.concat(bytes4(x)));
  }

  function setY(bytes32 key, uint32 y) internal {
    bytes32[] memory keyTuple = new bytes32[](1);
    keyTuple[0] = key;
    StoreSwitch.setField(id, keyTuple, 1, bytes.concat(bytes4(y)));
  }

  /** Get the table's data */
  function get(bytes32 key) internal view returns (Vector2 memory vec2) {
    bytes32[] memory keyTuple = new bytes32[](1);
    keyTuple[0] = key;
    bytes memory blob = StoreSwitch.getRecord(id, keyTuple);
    return decode(blob);
  }

  function decode(bytes memory blob) internal pure returns (Vector2 memory vec2) {
    return Vector2({ x: uint32(Bytes.slice4(blob, 0)), y: uint32(Bytes.slice4(blob, 4)) });
  }
}
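For an off-chain consumer such as an indexer, the same 8-byte blob can be decoded symmetrically. A minimal TypeScript sketch (the function names are illustrative, not part of MUD's API):

```typescript
// Decode an 8-byte Vector2 blob (two big-endian uint32 values),
// mirroring Vector2Table.decode on the Solidity side.
function decodeVector2(blob: Uint8Array): { x: number; y: number } {
  if (blob.length !== 8) throw new Error("expected 8 bytes");
  const view = new DataView(blob.buffer, blob.byteOffset, blob.byteLength);
  return { x: view.getUint32(0), y: view.getUint32(4) };
}

// Encode, mirroring the bytes.concat(bytes4(x), bytes4(y)) packing.
function encodeVector2(x: number, y: number): Uint8Array {
  const out = new Uint8Array(8);
  const view = new DataView(out.buffer);
  view.setUint32(0, x);
  view.setUint32(4, y);
  return out;
}

const blob = encodeVector2(1, 2);
const vec = decodeVector2(blob);
```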

Indexers / OnUpdateHook

  • Stores expose a registerOnUpdateHook function, which registers a contract implementing the IOnUpdateHook interface for a given table. All IOnUpdateHook contracts registered for a table are called whenever setRecord or setField is called on it.
  • Access control for the registerOnUpdateHook function is the responsibility of the contract implementing the IStore interface. (For example, it is conceivable that only the creator of a table is allowed to register update hooks for the table).
  • This update hook functionality can be used to build arbitrary types of on-chain indexers, among other things.
  • In the previous version of MUD, all Components implemented a mapping from value to set of entities with this value by default. Experience has shown that for most components, this reverse mapping was not used and just caused unnecessary gas overhead, which is why BareComponent was added. In this new approach, all tables are “bare” by default, and the OnUpdateHook functionality can be used to implement a reverse mapping as before where it is needed.
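As an illustration of the reverse-mapping use case, here is a hypothetical TypeScript sketch of the bookkeeping such an indexer performs, modelled off-chain for readability (the RecordUpdate shape and class name are assumptions, not MUD APIs):

```typescript
// Hypothetical shape of a record update, mirroring setRecord's arguments.
type RecordUpdate = { table: string; key: string; value: string };

// Reverse mapping: encoded value -> set of keys currently holding it.
class ReverseMappingIndexer {
  private valueToKeys = new Map<string, Set<string>>();
  private keyToValue = new Map<string, string>();

  // Called on every update, analogous to an IOnUpdateHook implementation.
  onUpdate({ key, value }: RecordUpdate): void {
    const prev = this.keyToValue.get(key);
    if (prev !== undefined) this.valueToKeys.get(prev)?.delete(key);
    this.keyToValue.set(key, value);
    if (!this.valueToKeys.has(value)) this.valueToKeys.set(value, new Set());
    this.valueToKeys.get(value)!.add(key);
  }

  getKeysWithValue(value: string): string[] {
    return [...(this.valueToKeys.get(value) ?? [])];
  }
}

const indexer = new ReverseMappingIndexer();
indexer.onUpdate({ table: "Position", key: "0x01", value: "A" });
indexer.onUpdate({ table: "Position", key: "0x02", value: "A" });
indexer.onUpdate({ table: "Position", key: "0x01", value: "B" });
```

Since the mapping is only maintained for tables that register the hook, tables that never need reverse lookups pay no overhead.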

FAQ

Why use a custom encoding/decoding instead of abi.encode/decode?

While abi.encode and abi.decode are very convenient to use because of their support for “generic types” (the return type of abi.decode corresponds to the type passed in as an argument), the bytes memory blobs they produce are extremely bloated.

Example: Encoding the following struct using abi.encode and a custom encoding for comparison:

Mixed memory mixed = Mixed({ u32: 1, u128: 2, a32: new uint32[](3), s: "hello" });

bytes memory abiEncoded = abi.encode(mixed); // length: 352 bytes
bytes memory customEncoded = customEncode(mixed); // length: 37 bytes
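The size difference can be reproduced by hand: abi.encode pads every item to a 32-byte word and adds offset and length words, while a tight custom encoding only spends each field's actual byte length. A back-of-the-envelope calculation in TypeScript (field sizes taken from the Mixed struct above):

```typescript
// abi.encode(Mixed): top-level offset + tuple head + array/string tails,
// with every item padded to a full 32-byte word.
const abiEncodedLength =
  32 +          // offset to the dynamic struct tuple
  32 + 32 +     // u32 and u128, each padded to a full word
  32 + 32 +     // offsets to a32 and s within the tuple head
  32 + 3 * 32 + // a32: length word + 3 uint32 elements, one word each
  32 + 32;      // s: length word + "hello" right-padded to one word

// Custom encoding: every field tightly packed.
const customEncodedLength =
  4 +     // u32
  16 +    // u128
  3 * 4 + // a32: three uint32 elements
  5;      // s: "hello"
```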

For saving data to storage, it is clear that we should access as few storage slots per storage operation as possible, so we can’t use a bloated encoding like abi.encode. But what about using abi.encode for passing data to the core library and abi.decode for decoding the returned data, while using efficient packing methods under the hood of the store core library?

The issue with this approach arises once we want to use the store library via an external function call (like when modifying/querying table data via an access-controlled World framework). The cost of external function calls scales with the amount of data transferred with the call; the impact can be seen in the gas report below (last two rows). Passing the 352 bytes of abi.encoded data to a function costs 6.5k gas, while passing the 37 bytes of custom-encoded data only costs 1.3k gas.

(gas report screenshot)

Why store array lengths packed in a single storage slot instead of with the respective array?

  • We save one storage access for every array in the schema beyond the first. If the schema only contains a single array, the number of storage accesses equals that of storing the length with the array.
  • Alternative approaches like packing the array’s length into the first two bytes of the array’s data lead to array elements larger than 2 bytes not being aligned with storage slots, so modifying a single array element might require writing to two storage slots, leading to unexpected gas costs.
  • Storing the total length of a record’s dynamic length fields in a single slot allows us to allocate the required memory space for loading the record at the beginning of the function, instead of having to access every dynamic field separately to compute the total required memory space.
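To make the idea concrete, here is an illustrative TypeScript sketch that packs the lengths of a record's dynamic fields into a single 32-byte word, a total length followed by a fixed-width slot per field. The layout is an assumption for illustration, not necessarily the exact encoding used by the reference implementation:

```typescript
// Pack up to 14 dynamic-field lengths into one 32-byte word:
// bytes 0-3 hold the total length, then 2 bytes per field length.
// (Illustrative layout only; the actual encoding may differ.)
function packLengths(lengths: number[]): Uint8Array {
  if (lengths.length > 14) throw new Error("too many dynamic fields");
  const word = new Uint8Array(32);
  const view = new DataView(word.buffer);
  const total = lengths.reduce((a, b) => a + b, 0);
  view.setUint32(0, total);
  lengths.forEach((len, i) => view.setUint16(4 + 2 * i, len));
  return word;
}

function unpackLengths(word: Uint8Array, count: number) {
  const view = new DataView(word.buffer, word.byteOffset);
  const total = view.getUint32(0);
  const lengths = Array.from({ length: count }, (_, i) =>
    view.getUint16(4 + 2 * i)
  );
  return { total, lengths };
}

// One read of this word yields the total size to allocate up front,
// plus every individual field length.
const packed = packLengths([12, 5]);
const { total, lengths } = unpackLengths(packed, 2);
```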

Next steps

  • World framework: hinted at in the initial proposal, the next step is to create a “framework” for building complex and permissionlessly extendable Ethereum apps using the core storage library (similar to the current World contract). To keep this issue focused on the core storage library, we’ll move the discussion of the new World framework to a new issue.
  • Gas optimizing core libraries: The current implementation in feat(store): add foundation for v2 data modelling #352 is intended to provide a foundation to start working on dependent projects (eg the World framework), but is not gas optimized yet. After agreeing on the interface, we can work on gas optimizing all core libraries in parallel to working on projects building on top.
  • Auto-generating typed libraries: The current typed libraries that handle packing/unpacking data for interaction with the core library are “handcrafted”, but they only contain logic that can be directly inferred from the user-defined table schema. We should create CLI functionality to generate these libraries automatically based on a user-defined schema.

@dk1a
Contributor

dk1a commented Jan 31, 2023

In this new approach, all tables are “bare” by default, and the OnUpdateHook functionality can be used to implement a reverse mapping as before where it is needed.

I imagine it can be eventually added to codegen as some optional thing to avoid manual boilerplate, since it's often useful in my experience.


Why store array lengths packed in a single storage slot instead of with the respective array?

I agree, came to the same conclusion eventually, having tried the 2-bytes-prefix approach


uint8 schemaIndex may be better off as uint256 schemaIndex, it needs a bounds-check either way, and uint256 is always the cheapest (and often most convenient).


A little tangent: Schema_StaticTypeAfterDynamicType or Schema__StaticTypeAfterDynamicType?
I tend to prefer 2 _ due to solidstate; 1 is also fine of course, just wanna establish it for consistency going forward

@alvrs
Member Author

alvrs commented Feb 1, 2023

I imagine it can be eventually added to codegen as some optional thing to avoid manual boilerplate, since it's often useful in my experience.

100% agree! We should make it easy in table definitions to extend them with on-chain indexers and provide a couple out of the box (eg reverse mapping)

uint8 schemaIndex may be better off as uint256 schemaIndex, it needs a bounds-check either way, and uint256 is always the cheapest (and often most convenient).

Good point!

A little tangent: Schema_StaticTypeAfterDynamicType or Schema__StaticTypeAfterDynamicType?
I tend to prefer 2 _ due to solidstate; 1 is also fine of course, just wanna establish it for consistency going forward

Yeah, in general I'm still a little undecided on the best way to trigger meaningful errors for clients to work with - custom errors have the downside that the client needs the ABI to make sense of them. @holic proposed having a set of standard MUD errors similar to HTTP error codes. On the topic of naming, do you know what's solidstate's reason to use two __ instead of one?

@dk1a
Contributor

dk1a commented Feb 1, 2023

custom errors have the downside that the client needs the ABI to make sense of them.

Systems already require abi with something like mud types; isn't it more convenient to make an aggregated cli thing that does all the abi parsing then, and use it for everything we need?
A file with all the errors could be generated very similarly to system types.

do you know what's solidstate's reason to use two __ instead of one?

Nope, my take on it:
__ is for special stuff like namespaces
_ is for readability in very long names, like how it's used for function names in https://book.getfoundry.sh/tutorials/best-practices

@dk1a
Contributor

dk1a commented Feb 6, 2023

(screenshot)

@dk1a
Contributor

dk1a commented Feb 7, 2023

@alvrs wdyt about full-caps SchemaType props (UINT256), or would you rather I keep them camelCase (Uint256)?

@alvrs
Member Author

alvrs commented Feb 7, 2023

No strong preference, happy with all caps!

@Boffee
Contributor

Boffee commented Feb 9, 2023

Make sure to also check if worldSelector exists when the system being registered is not an upgrade. (Didn't see it in the TODOs)

bytes4 is pretty small, so it should be feasible to find alternative pre-images, which would allow people to inject arbitrary systems.

@dk1a
Contributor

dk1a commented Feb 10, 2023

Started refactoring SchemaType, and I think we don't need it. It's only used for getStaticByteLength right now. Arrays just ignore it. Autogen won't be querying on-chain schemas to get the types either.
@alvrs wdyt about using uint8 staticByteLength directly, instead of SchemaType?
For arrays it can be either ignored (as it is right now), or used as elementByteLength, or nesting depth or some other flag.

To clarify about the frontend - I'm thinking autogen can generate both solidity tables and typescript table types, bypassing the concept of getSchema

@alvrs alvrs unpinned this issue Feb 10, 2023
@alvrs
Member Author

alvrs commented Feb 11, 2023

We need to store the table's schema on-chain to allow indexers and other integrations to dynamically decode data from any table that is registered. Even if the SchemaType enum is not used on-chain, I think it would be a good source of truth for which uint8 represents which type.

@alvrs
Member Author

alvrs commented Feb 11, 2023

@Boffee good point! The section about the World framework above is slightly out-of-date, we published an updated proposal for that in #393. (I just edited the original post in this thread to mention that.) Very happy for your thoughts on the updated proposal!

@dk1a
Contributor

dk1a commented Feb 11, 2023

We need to store the table's schema on-chain to allow indexers and other integrations to dynamically decode data from any table that is registered.

I agree, it does seem necessary for 3rd party integrations to work well. Was a bit too focused on 1st party aspects

Even if the SchemaType enum is not used on-chain, I think it would be a good source of truth for which uint8 represents which type.

I guess we haven't really decided whether to actually do onchain validation. I don't think it would even be all that expensive. Anyways can be added later down the line

@dk1a
Contributor

dk1a commented Feb 17, 2023

The current data layout in v2 branch:
(storage layout diagram)

@norswap

norswap commented Feb 20, 2023

This is great! My single comment is on data packing:

In the reference implementation, all static length fields are stored tightly
packed in a single storage location (spanning as many words as necessary).

I'm assuming we still want to guarantee that fields won't span multiple words if
they are of length < 32 bytes? Not doing so means data access & writes becomes
considerably costlier, and the dev might need to insert dummy fields to avoid
fields straddling multiple words.

Edit: Actually, another remark.

All this seems rather cumbersome for "top-level non-datastructure fields". For instance, this could be the address of another contract that is upgradeable, or a version number, or some kind of running total. In practice, you can group them into a table that will contain a single row. Is better UX for that particular common use case warranted?

@alvrs
Member Author

alvrs commented Mar 2, 2023

I'm assuming we still want to guarantee that fields won't span multiple words if
they are of length < 32 bytes? Not doing so means data access & writes becomes
considerably costlier, and the dev might need to insert dummy fields to avoid
fields straddling multiple words.

StoreCore is intended as a pretty low level library and probably never used directly, but instead interacted with via auto-generated wrapper libraries that handle packing and unpacking of the data. (We have a prototype of that here: https://github.com/latticexyz/mud/blob/main/packages/cli/src/commands/tablegen.ts)

It turns out that dynamically splitting up the data such that fields are guaranteed to never wrap words is more gas-costly and complex than leaving this optimisation to the wrapper libraries, which can automatically insert the dummy fields / optimise the order of elements (statically once at the time of their generation instead of dynamically in StoreCore).

All this seems rather cumbersome for "top-level non-datastructure fields". For instance, this could be the address of another contract that is upgradeable, or a version number, or some kind of running total.

I'm not sure I can follow, could you elaborate what this refers to in For instance, >this< could be the address [...]?

In practice, you can group them into a table that will contain a single row. Is better UX for that particular common use case warranted?

Do you mean the use case of grouping multiple unrelated fields together in a single word? If so: we have set/getField in the low level StoreCore and support for "singleton tables" (= just a single row, no need to specify a key in getter/setter) in the autogenerated wrapper libraries, so you could eg. create a table with schema { field1: uint128, field2: uint64, field3: uint64 } and interact with it via RandomDataTable.getField1() etc. (Similar to how you can define structs to group data in a single storage slot).
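The grouping trick works because the static byte lengths of the example schema's fields sum to exactly one 32-byte storage slot. A quick sanity check in TypeScript (byte lengths are the standard Solidity type sizes):

```typescript
// Static byte lengths of the example singleton-table schema's fields.
const fieldByteLengths = {
  field1: 16, // uint128
  field2: 8,  // uint64
  field3: 8,  // uint64
};

// The fields pack into a single slot only if their total fits in 32 bytes.
const totalBytes = Object.values(fieldByteLengths).reduce((a, b) => a + b, 0);
const fitsInOneSlot = totalBytes <= 32;
```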

@ludns
Member

ludns commented Mar 2, 2023 via email

@norswap

norswap commented Mar 9, 2023

All this seems rather cumbersome for "top-level non-datastructure fields". For instance, this could be the address of another contract that is upgradeable, or a version number, or some kind of running total.

I'm not sure I can follow, could you elaborate what this refers to in For instance, >this< could be the address [...]?

this == top-level non-datastructure field. Just imagine a contract that has a MyOtherContract public myOtherContract storage field.

But both you and @ludns gave a good explanation on how to do this conveniently, sounds good 👍

@alvrs
Member Author

alvrs commented Jul 10, 2023

Closing this since it got implemented as part of v2
