Consider addition RTTI to words #103

Closed
fomkin opened this issue Jun 4, 2018 · 14 comments
Labels: architecture, enhancement

Comments

@fomkin
Contributor

fomkin commented Jun 4, 2018

Overview

The need for RTTI (runtime type information) has become obvious. It's required for .NET transaction support and for debugging purposes such as stack and heap printing and transaction introspection. It will also remove the need for type-bound arithmetic instructions. I see two approaches to implementation.

  1. Use BSON or MessagePack. Pros: they are well-known standards with stable third-party implementations. Cons: they are serialization formats; neither supports references, and both are hard to mutate in place.
  2. A more domain-specific format. Pros: we can use the advantages of our VM without compromises. Cons: we need to write a lot of code, bugs will come, and NIH is a shame.

I'm inclined to the second approach.
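
To illustrate the point about type-bound arithmetic instructions, here is a minimal Python sketch of a single `add` that dispatches on the type tag carried by each word. It uses the int8/int16/int32 tags from the proposal; the big-endian layout, the same-type rule and the function names are assumptions for illustration only.

```python
import struct

# Type tags taken from the proposal's tag table.
INT8, INT16, INT32 = 0x01, 0x02, 0x03

# Assumed layout: one tag byte followed by a big-endian signed integer.
_FMT = {INT8: ">b", INT16: ">h", INT32: ">i"}


def encode(tag: int, value: int) -> bytes:
    """Build a tagged word; struct.error is raised if value does not fit."""
    return bytes([tag]) + struct.pack(_FMT[tag], value)


def decode(word: bytes):
    """Return (tag, value) for a tagged word."""
    tag = word[0]
    return tag, struct.unpack_from(_FMT[tag], word, 1)[0]


def add(a: bytes, b: bytes) -> bytes:
    """One generic add: the operand tags select the arithmetic width."""
    tag_a, val_a = decode(a)
    tag_b, val_b = decode(b)
    if tag_a != tag_b:
        # Assumed policy: no implicit widening between different types.
        raise TypeError("operand types differ")
    return encode(tag_a, val_a + val_b)
```

With tags on the operands, the VM would need one `add` opcode rather than one per integer width; how overflow and mixed types should behave is left open here.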

Proposal

This is a pseudo-BNF specification of the word format.

length := 0b00<6 bits of data>
       | 0b01<6 bits length>
       | 0b10<14 bits length>
       | 0b11<22 bits length>

bytes := length byte[&length]

int8    := 0x01
int16   := 0x02
int32   := 0x03
bigint  := 0x04
uint8   := 0x05
uint16  := 0x06
uint32  := 0x07
double  := 0x08
boolean := 0x09
ref     := 0x0A
utf8    := 0x0B
array   := 0x0C
struct  := 0x0D

primitive_type := int8
               | int16
               | int32
               | bigint
               | uint8
               | uint16
               | uint256
               | double
               | boolean
               | ref

type := primitive_type
      | struct
      | array
      | utf8

int8_data    := bytes~1
int16_data   := bytes~2
int32_data   := bytes~4
bigint_data  := length byte[&length]
uint8_data   := bytes~1
uint16_data  := bytes~2
uint32_data  := bytes~4
double_data  := bytes~8 # strict IEEE-754 floating point number
ref_data     := bytes~4
boolean_data := bytes~1

primitive_data := int8 int8_data
                | int16 int16_data
                | int32 int32_data
                | bigint bigint_data
                | uint8 uint8_data
                | uint16 uint16_data
                | uint32 uint32_data
                | double double_data
                | ref ref_data
                | boolean boolean_data

data := primitive_data
      | array primitive_type length data(primitive_type)[&length]
      | struct bytes length (bytes, primitive_data)[&length] # struct_name, [(field, field_value)]
      | utf8 bytes

smth[num] means that the smth structure is repeated num times; e.g. byte[8] means 8 bytes.
bytes~num means a bytes value (whose length is otherwise dynamic) that is expected to hold exactly num bytes.

&length refers to the given length field and means the integer value of that field.

(a, b) means a pair type, i.e. two values of a and b are written consecutively.

data(primitive_type) means the corresponding *_data structure for primitive_type.
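
To make the `length` and `bytes` rules concrete, here is a minimal Python sketch of a reader and writer. It assumes several things the grammar does not state: the 2-bit tag sits in the most significant bits of the first byte, multi-byte fields are big-endian, and the 0b00 form is read the same way as the 6-bit 0b01 form. All function names are hypothetical.

```python
from typing import Tuple

# Total field width in bytes for each 2-bit tag (6, 6, 14 and 22 payload bits).
_WIDTH = {0b00: 1, 0b01: 1, 0b10: 2, 0b11: 3}


def read_length(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_pos) for the length field at buf[pos]."""
    tag = buf[pos] >> 6                      # assumed: tag in the top two bits
    width = _WIDTH[tag]
    if pos + width > len(buf):
        raise ValueError("truncated length field")
    raw = int.from_bytes(buf[pos:pos + width], "big")  # assumed: big-endian
    # Mask off the two tag bits, keeping only the payload.
    return raw & ((1 << (8 * width - 2)) - 1), pos + width


def write_length(n: int) -> bytes:
    """Encode n using the shortest of the 0b01, 0b10 and 0b11 forms."""
    for tag, bits in ((0b01, 6), (0b10, 14), (0b11, 22)):
        if 0 <= n < (1 << bits):
            return ((tag << bits) | n).to_bytes((bits + 2) // 8, "big")
    raise ValueError("length out of range for a 22-bit field")


def read_bytes(buf: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Decode `bytes := length byte[&length]`; return (payload, next_pos)."""
    n, pos = read_length(buf, pos)
    if pos + n > len(buf):
        raise ValueError("truncated payload")
    return buf[pos:pos + n], pos + n


def write_bytes(payload: bytes) -> bytes:
    """Encode a payload as a length field followed by the raw bytes."""
    return write_length(len(payload)) + payload
```

Under these assumptions a payload of up to 63 bytes costs one byte of framing, and the largest representable length is 2^22 - 1.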

@pankratov
Contributor

pankratov commented Jun 4, 2018

both don't support references

MessagePack supports extension types, so you can define domain-specific types with it. However, its spec is not versioned (see msgpack/msgpack#165 and msgpack/msgpack#195).

@fomkin
Contributor Author

fomkin commented Jun 4, 2018

@pankratov MessagePack extension types look good. But the more I look at it, the less I like this format. The lack of a format version is a big concern. Also, I think we are trying to use tools (BSON/MP) that look suitable but actually are not.

@vovapolu
Contributor

vovapolu commented Jun 4, 2018

I don't think it would be hard to implement our own protocol. It seems quite simple, and it's an essential part of our VM.

@pankratov
Contributor

pankratov commented Jun 4, 2018

struct length (utf8, primitive_data)[&length]

It seems like you meant data here, not primitive_data, i.e. struct length (utf8_data, data)[&length]. However, maybe I'm wrong. Nevertheless, data looks good here, IMHO.

@fomkin Do you consider binary data (a hash, for example) to be an array of uint8?
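
For reference, here is a hedged Python sketch of what a hash would look like if it were stored under the proposal's `array primitive_type length data(primitive_type)[&length]` rule with uint8 elements. It assumes each uint8 element occupies exactly one raw byte and that the length field uses the tag-in-top-bits, big-endian layout; neither is confirmed in the thread, and the helper names are hypothetical.

```python
ARRAY, UINT8 = 0x0C, 0x05  # type tags from the proposal's tag table


def write_length(n: int) -> bytes:
    """Assumed length encoding: 2-bit tag in the top bits, big-endian."""
    for tag, bits in ((0b01, 6), (0b10, 14), (0b11, 22)):
        if 0 <= n < (1 << bits):
            return ((tag << bits) | n).to_bytes((bits + 2) // 8, "big")
    raise ValueError("length out of range for a 22-bit field")


def encode_uint8_array(payload: bytes) -> bytes:
    """Emit: array tag, element type tag, length, then one byte per element."""
    return bytes([ARRAY, UINT8]) + write_length(len(payload)) + payload


def framing_overhead(payload: bytes) -> int:
    """Number of bytes added on top of the raw payload."""
    return len(encode_uint8_array(payload)) - len(payload)
```

Under these assumptions a 32-byte hash costs three extra bytes (two tags plus a one-byte length), which may help when weighing a dedicated binary type against reusing the array rule.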

@pankratov
Contributor

pankratov commented Jun 4, 2018

I think tuples (i.e. tuple length (type data(type))[&length]) might also be useful.

@pankratov
Contributor

pankratov commented Jun 4, 2018

And here

array primitive_type length data(primitive_type)[&length]

Why primitive? How can I define an array of arrays, for example? Should I use refs?

@pankratov
Contributor

pankratov commented Jun 4, 2018

Domain-specific types (address or code, for example) could be defined as separate types.

@sherzodv
Contributor

sherzodv commented Jun 5, 2018

What about the name of the struct itself? I mean, how are we going to distinguish different struct types that have the same structure?

Another concern here is an array of variable-length data, e.g. array utf8 .... We will be able neither to give its length nor to have constant-time indexed access to values.

If the intent is that only refs are used for variable-length data in arrays, I think we need to state it in the spec.
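
The indexing concern can be sketched in Python as follows. For brevity this assumes big-endian int32 elements and a plain one-byte length prefix per string (not the proposal's `length` encoding); the function names are hypothetical.

```python
import struct


def int32_at(elements: bytes, i: int) -> int:
    """O(1): element i of a packed int32 array starts at byte 4 * i."""
    return struct.unpack_from(">i", elements, 4 * i)[0]


def utf8_at(elements: bytes, i: int) -> str:
    """O(i): every earlier length-prefixed string must be skipped first."""
    pos = 0
    for _ in range(i):
        pos += 1 + elements[pos]      # skip the prefix byte and its payload
    n = elements[pos]
    return elements[pos + 1:pos + 1 + n].decode("utf-8")
```

With fixed-width elements the offset is pure arithmetic, while inline variable-length elements force a scan; storing 4-byte refs in the array would restore the fixed stride.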

This is complementary to Vasilii's question.

The ref_data size looks very modest to me. Although it will reduce the size of a program significantly, we will have very little room for future changes.

@pankratov
Contributor

I mean how are we going to separate different struct types being of the same structure?

@sherzodv are we going to distinguish them? I thought it's something like a dictionary or map.

@sherzodv
Contributor

sherzodv commented Jun 5, 2018

@sherzodv are we going to distinguish them? I thought it's something like a dictionary or map.

An external mapping is OK too if we can guarantee that our data never moves. Otherwise, things like dynamic casts will be in trouble.

@fomkin
Contributor Author

fomkin commented Jun 5, 2018

Why primitive? How can I define an array of arrays, for example? Should I use refs?

@pankratov Because I want to mutate without copying. Yes, you need to use refs.
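
A small Python sketch of how an array of arrays could be modelled with refs, and why it allows mutation without copying. It assumes a ref is a 4-byte big-endian offset into a flat heap (the proposal only fixes the size of ref_data) and omits the tag and length headers for brevity; all names are hypothetical.

```python
import struct

REF = ">I"     # assumed: a ref is a 4-byte big-endian offset into the heap
INT32 = ">i"   # assumed: int32 elements are big-endian


def store_int32s(heap: bytearray, addr: int, values) -> None:
    """Write fixed-width int32 elements starting at addr."""
    for i, value in enumerate(values):
        struct.pack_into(INT32, heap, addr + 4 * i, value)


def store_refs(heap: bytearray, addr: int, refs) -> None:
    """Write the outer array: one fixed-width ref per inner array."""
    for i, ref in enumerate(refs):
        struct.pack_into(REF, heap, addr + 4 * i, ref)


def get_cell(heap: bytearray, outer: int, row: int, col: int) -> int:
    (inner,) = struct.unpack_from(REF, heap, outer + 4 * row)
    return struct.unpack_from(INT32, heap, inner + 4 * col)[0]


def set_cell(heap: bytearray, outer: int, row: int, col: int, value: int) -> None:
    """Overwrite one element in place; the outer array is never rewritten."""
    (inner,) = struct.unpack_from(REF, heap, outer + 4 * row)
    struct.pack_into(INT32, heap, inner + 4 * col, value)


heap = bytearray(64)
store_int32s(heap, 16, [1, 2, 3])   # first inner array
store_int32s(heap, 32, [4, 5, 6])   # second inner array
store_refs(heap, 0, [16, 32])       # outer array holds only refs
set_cell(heap, 0, 1, 2, 99)         # mutate row 1, column 2 in place
```

Because every element of the outer array has the same width, both the lookup and the update are constant-time and touch only four bytes of the inner array.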

What about the name of the struct itself? I mean, how are we going to distinguish different struct types that have the same structure?

@sherzodv +1 great idea.

Another concern here is an array of variable-length data, e.g. array utf8 .... We will be able neither to give its length nor to have constant-time indexed access to values.

@sherzodv thanks, I'll fix it.

The ref_data size looks very modest to me. Although it will reduce the size of a program significantly, we will have very little room for future changes.

@sherzodv OK, you're right. Let it be 32 bits.

@fomkin
Contributor Author

fomkin commented Jun 7, 2018

  1. Changed decimal to double (strict IEEE-754 floating point number)
  2. Changed int64 and uint64 to int256 and uint256

@vovapolu
Contributor

vovapolu commented Jun 19, 2018

We need to somehow encode the type signature of a word. It's needed in the meta method signature and will definitely be needed in future meta information. So I suggest introducing a new signature structure:

signature := primitive_type
           | struct bytes length (bytes, primitive_type)[&length] 
           | array primitive_type
           | utf8

In struct, the first bytes is the name of the structure and the second bytes is the name of the corresponding field.
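
As a rough illustration of the signature structure, here is a Python sketch that encodes struct and array signatures. It assumes names are UTF-8, that `bytes` uses the tag-in-top-bits, big-endian length layout, and it uses only a subset of the tags from the proposal; the function names and the example struct are hypothetical.

```python
# Subset of the type tags from the proposal's tag table.
PRIMITIVE = {"int8": 0x01, "int16": 0x02, "int32": 0x03,
             "uint8": 0x05, "boolean": 0x09, "ref": 0x0A}
ARRAY, STRUCT = 0x0C, 0x0D


def write_length(n: int) -> bytes:
    """Assumed length encoding: 2-bit tag in the top bits, big-endian."""
    for tag, bits in ((0b01, 6), (0b10, 14), (0b11, 22)):
        if 0 <= n < (1 << bits):
            return ((tag << bits) | n).to_bytes((bits + 2) // 8, "big")
    raise ValueError("length out of range for a 22-bit field")


def write_name(name: str) -> bytes:
    """Encode a name as `bytes`: length field plus UTF-8 payload (assumed)."""
    raw = name.encode("utf-8")
    return write_length(len(raw)) + raw


def struct_signature(name: str, fields) -> bytes:
    """struct bytes length (bytes, primitive_type)[&length]"""
    out = bytes([STRUCT]) + write_name(name) + write_length(len(fields))
    for field_name, type_name in fields:
        if type_name not in PRIMITIVE:
            raise ValueError("field type must be primitive: " + type_name)
        out += write_name(field_name) + bytes([PRIMITIVE[type_name]])
    return out


def array_signature(element_type: str) -> bytes:
    """array primitive_type"""
    return bytes([ARRAY, PRIMITIVE[element_type]])
```

For example, `struct_signature("Point", [("x", "int32"), ("y", "int32")])` yields a 14-byte signature under these assumptions, and two structs with identical fields but different names produce different signatures.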

Probably we're also going to add something for class representation (e.g. for methods in struct). What do you think?

@fomkin
Contributor Author

fomkin commented Jun 19, 2018

@vovapolu looks good.


4 participants