Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove validation code #348

Merged
merged 23 commits into from Feb 10, 2018
Merged

Conversation

mhchia
Copy link
Contributor

@mhchia mhchia commented Feb 4, 2018

What was wrong?

As the item#3 in ethereum/sharding#37 described, we remove the validation code logics from VMC, to reduce gas consumption.

How was it fixed?

  • Remove validation code construction
  • Modify add_header to accept parameters directly, instead of the rlp encoded bytes.
  • Verify msg.sender in add_header and withdraw, instead of using validation code.

Cute Animal Picture

put a cute animal picture link inside the parentheses

Copy link
Contributor

@hwwhww hwwhww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A great downsizing for VMC. 👍

@@ -46,8 +41,8 @@
# Gas limit of the signature validation code
sig_gas_limit: num
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's useless now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing out. Will remove it

new_head_in_num = 0
raw_log(
[
sha3("CollationAdded(int128,bytes4096,bool,int128)"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these data types matching the actual results?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, actually it matches log signature CollationAdded(indexed int128 shard, bytes4096 collationHeader, bool isNewHead, int128 score).
The reason why I use int128(i.e. num in vyper) is it is more friendly for calculation. And the reason why I use raw_log instead of log.CollationAdded() is previously vyper doesn't support bytes type whose length > 32. Actually there is a TODO comment in line 67

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the collationHeader argument from CollationAdded be unpacked?

So instead of using CollationAdded(indexed int128 shard, bytes4096 collationHeader, bool isNewHead, int128 score)
Use CollationAdded(indexed int128 shard, int128 expectedPeriodNumber, bytes32 periodStartPrevHash, bytes32 _parentCollationHash, bytes32 txListRoot, address collationCoinbase, bytes32 postStateRoot, bytes32 receiptRoot, int128 collationNumber, bool isNewHead, int128 score)

The benefits would be: 1 int less for the log: shard; easier to use with the ABI.

Copy link
Contributor Author

@mhchia mhchia Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good point. I will see if there are specific reasons for using CollationAdded(indexed int128 shard, bytes4096 collationHeader, bool isNewHead, int128 score). If not, we can change to that.
Thank you for the suggestion.

gas_price=None):
"""deposit(validation_code_addr: address, return_addr: address) -> num
def deposit(self, gas=None, gas_price=None):
"""deposit() -> num
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you mean the deposit function in VMC returns the index of validators mapping, but this docstring looks meaningless here. IMO it may be more helpful to describe the action of this function itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change it. Thank you!

def withdraw(self, validator_index, sig, gas=None, gas_price=None):
"""withdraw(validator_index: num, sig: bytes <= 1000) -> bool
def withdraw(self, validator_index, gas=None, gas_price=None):
"""withdraw(validator_index: num) -> bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

receipt_root,
collation_number,
gas=None,
gas_price=None):
"""add_header(header: bytes <= 4096) -> bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

)
sig = sign(sighash, privkey)
return rlp.encode([
header_tuple = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using evm.rlp.headers.CollationHeader to build this object?

Copy link
Contributor Author

@mhchia mhchia Feb 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing it out!
I think it's a good idea to use evm.rlp.headers.CollationHeader to build the collation header from logs. I will change it.
But actually the result of mk_testing_colhdr is designed to be passed to VMC.add_header. How do you think in this condition?
Personally I prefer to keep this behavior here.
Will it help if I change the function name to mk_testing_colhdr_params?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking when the collator calls VMC.add_header, the collator would have built a real collation before. So there may be a property in CollationHeader:

@property
def add_header_params:
    # returns header fields except `sig`
    ...

Copy link
Contributor

@hwwhww hwwhww Feb 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But actually the result of mk_testing_colhdr is designed to be passed to VMC.add_header. How do you think in this condition?
Personally I prefer to keep this behavior here.
Will it help if I change the function name to mk_testing_colhdr_params?

The second output (header_hash) of mk_testing_colhdr is not the input of VMC.add_header but the input of next mk_testing_colhdr call.

header_hash can be another property of CollationHeader. In this way, this mk_testing_colhdr function could just return CollationHeader object, and these two properties could be accessed in test_vmc_contract_calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

header_hash can be another property of CollationHeader. In this way, this mk_testing_colhdr function could just return CollationHeader object, and these two properties could be accessed in test_vmc_contract_calls.

It makes sense. I will change it. Thank you!

Copy link
Contributor Author

@mhchia mhchia Feb 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold on. So in this case should I add another header_hash, which simply returns keccak(to_bytes32(shard_id) ++ to_bytes32(expected_period_number) ++ ...), is that correct?
Since the header_hash is no more sha3(rlp_list([fields]).

"""
Deploy validation code of and with the privkey, and do deposit
Deposit a validator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More precisely, "do deposit in VMC to be a validator"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok no problem

@hwwhww
Copy link
Contributor

hwwhww commented Feb 4, 2018

I think this PR also closes #325. ✌🏻

)
header_hash = keccak(
b''.join(
(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace nitpick

This should be a bit easier to read if you write this as:

b''.join((
    thing_a,
    thing_b,
))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I will change it.

@@ -44,27 +46,35 @@ class UnknownShard(Exception):
pass


@to_dict
def deserialize_header_bytes(header_bytes):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function feels like something that should be a classmethod on the CollationHeader class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this great suggestion! I will change it.

start_index = idx * obj_size
field_bytes = header_bytes[start_index:(start_index + obj_size)]
if field_type == rlp.sedes.big_endian_int:
# remove the leading zeros, to avoid `not minimal length` error in deserialization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a bug in the rlp library? If so can you open an issue for it in that repository?

Copy link
Contributor Author

@mhchia mhchia Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. It just requires the big_endian_int to be in minimal length (e.g. b'\x00\x01' should be b'\x01')
The error is like this

self = <rlp.sedes.big_endian_int.BigEndianInt object at 0x111d6bb38>
serial = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x06'

    def deserialize(self, serial):
        if self.l is not None and len(serial) != self.l:
            raise DeserializationError('Invalid serialization (wrong size)',
                                       serial)
        if self.l is None and len(serial) > 0 and serial[0:1] == ascii_chr(0):
            raise DeserializationError('Invalid serialization (not minimal '
>                                      'length)', serial)
E           rlp.exceptions.DeserializationError: Invalid serialization (not minimal length)

/home/xxx/.pyenv/versions/py36env/lib/python3.6/site-packages/rlp/sedes/big_endian_int.py:40: DeserializationError

Will dig more into it to see if it is an issue.
Thank you:)

@@ -158,6 +168,9 @@ def __init__(self, *args, default_privkey, **kwargs):

super().__init__(*args, **kwargs)

def get_default_sender_address(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function doesn't appear to be used anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I used them in test_vmc_handler.py, to get the default_sender_address for doing transactions.
Should I remove them? I can replace them by pass the address in use as argument in those testing functions in test_vmc_handler.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must have missed it. disregard.

receipt_root,
as_bytes32(collation_number),
)
entire_header_hash = sha3(header_bytes)
assert entire_header_hash != as_bytes32(0)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert is not needed anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Thank you for pointing out.

def mk_initiating_transactions(sender_privkey,
sender_starting_nonce,
TransactionClass,
gas_price):
"""Make transactions of createing initial contracts
Including rlp_decoder, sighasher and validator_manager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No rlp_decoder and sighasher now, so the docstring needs to be updated.
Also createing -> creating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. Thank you for discovering it.

],
gas=gas,
gas_price=gas_price,
)
return tx_hash

def add_header(self, header, gas=None, gas_price=None):
"""add_header(header: bytes <= 4096) -> bool
def add_header(self,
Copy link
Contributor

@hwwhww hwwhww Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VMC.add_header takes a lot of parameters and combines add_header inputs and gas setting in the same level. What do you think about passing CollationHeader to this function and moving logic of add_header_send_tx to here so that the params would be clearer:

def add_header(self, collation_header, gas=None, gas_price=None)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I will change it. Thank you.

@mhchia mhchia force-pushed the sharding_vmc_modify_add_header branch 2 times, most recently from 255fb42 to 6eefd26 Compare February 6, 2018 14:12
@mhchia
Copy link
Contributor Author

mhchia commented Feb 6, 2018

@enriquefynn I have changed the signature of log CollationAdded. Please check it out if you have spare time.
Thank you for the great suggestion!

@@ -196,15 +207,57 @@ def __repr__(self):

@property
def hash(self):
return keccak(rlp.encode(self))
int_to_bytes32 = compose(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be good with this living in evm.utils.numeric. I suspect there are other places in the code where we'd use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved it to evm.utils.numeric, and added two tests for it.
Hope that would help.

header_sedes = cls.get_sedes()
# assume all fields are padded to 32 bytes
obj_size = 32
assert len(header_bytes) == obj_size * len(header_sedes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions are generally bad for runtime code as they can be easily disabled. This should probably raise a ValidationError

Copy link
Contributor Author

@mhchia mhchia Feb 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for that. I have changed it. Thank you!

@@ -158,6 +168,9 @@ def __init__(self, *args, default_privkey, **kwargs):

super().__init__(*args, **kwargs)

def get_default_sender_address(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must have missed it. disregard.

@mhchia
Copy link
Contributor Author

mhchia commented Feb 7, 2018

@pipermerriam Could you help me to review the newly added commits if you have spare time? Since I added the tests for int_to_bytes32, not sure if they are okay for merging.
Really appreciate it. :)

assert len(header_bytes) == obj_size * len(header_sedes)
if len(header_bytes) != obj_size * len(cls.fields):
raise ValidationError(
"Length of header bytes should be equal to obj_size * len(cls.fields)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually be very explicit here as it's frustrating to have to go dig when there is code mis-behaving.

raise Validation Error(
    "Expected header bytes to be of length: {0}.  Got length {1} instead.\n- {2}".format(
        obj_size * len(cls.fields), len(header_bytes), encode_hex(header_bytes),
    )
)

Good error messages are soooo valuable when things start going pear shaped.

Copy link
Contributor Author

@mhchia mhchia Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you sooo much! I didn't think of this. :)
I have changed it and really learned a lesson.
I will keep reminding myself with this suggestion.

pad32,
)
if len(value_bytes) > 32:
raise ValueError("value is too large")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing post validation, how about just adding a pre-condition check that it's a valid uint256 and raising a ValueError if it isn't. (hint, I don't think we should use evm.validation.validate_uint256 here since it's outside of evm.utils. Instead, just inline the same validation code.

Btw, I suggest ValueError for the same reason, not importing from outside of evm.utils from within evm.utils.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for suggesting this.
Actually I'm curious how to know when to import one function from outside directory or just inline it?
I can guess in this case the validation is simple so the cost(DRY) is not expensive. But if it is in other case, is it not good to import function in another layer?

@mhchia mhchia force-pushed the sharding_vmc_modify_add_header branch from 5daaa31 to dd4c8d5 Compare February 8, 2018 12:49
From `header <= 4096` to separate unpacked parameters
Besides, remove the call to `get_eriod_prevhash`
- Remove unneeded `sig_gas_limit` in VMC
- Modify the docstrings in VMC functions
- Modify `do_deposit`, return `vmc.default_sender_address` instead. And
change its docstring
- Add getter for `default_sender_address` in VMC
- Change `genesis_colhdr_hash` to upper case
For the consistency with `add_header_constant_call`
- Fix coding style
- Move `deserialize_header_bytes` and add `from_bytes` to
`CollationHeader`
- Add test for `CollationHeader.from_bytes`
- Remove unneeded assert in VMC
Remove unused imports
`collation_header` changes from `bytes` to respective parameters
Besides, add log for `Deposit` and `Withdraw`.
TODO:
- Determine the exact log signature of `Deposit` and `Withdraw
- Determine the log signature of `TxToShard`
- Move `int_to_bytes32` to evm.utils.numeric
- Raise `ValidationError` instead of AssertionError in
`CollationHeader.from_bytes`
- Add test for invalid length for `from_bytes`
- Change the error message, adding the value of variables
- Copied verification code from `evm.validation.validate_uint256`
@mhchia mhchia force-pushed the sharding_vmc_modify_add_header branch from 4d61d30 to ee0deb5 Compare February 10, 2018 06:58
@mhchia mhchia merged commit d919d2d into ethereum:sharding Feb 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants