-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Network encoding default now CBOR with envelope code optimization #1132
Network encoding default now CBOR with envelope code optimization #1132
Conversation
Please note that this PR has also been tested here [1] with localnet at 100 TPS for 600 seconds. Also please note that the pre-this-PR plumbing for CBOR was previously merged in this [2] previous PR, and has lay dormant until this PR :-) [1] https://www.notion.so/dapperlabs/Change-network-message-encoding-to-CBOR-PR-52d29225c82e4369ba03ff326cf06b40 |
@@ -182,26 +178,18 @@ func switchv2what(v interface{}) (string, error) { | |||
return what, nil | |||
} | |||
|
|||
func v2envEncode(v interface{}, via string) ([]byte, uint8, error) { | |||
func v2envelopeCode(v interface{}) (string, uint8, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mind add some comments for this function. Am not 100% sure what is it doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vishalchangrani thanks for the comment. As requested I added extra comments. Please let me know if it's clearer now :-)
network/codec/cbor/codec.go
Outdated
err = encoder.Encode(v) | ||
//binstat.LeaveVal(bs2, int64(data.Len())) | ||
if err != nil { | ||
return nil, fmt.Errorf("could not encode cbor payload of type %s: %w", what, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will the what
here dump the whole payload? should we instead dump just code
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vishalchangrani thanks for the comment. what
is actually just the human readable text name for code
. However, I updated the error message to contain both so it is clearer. So the payload is never dumped. Is the new error message clearer now?
network/codec/cbor/decoder.go
Outdated
err := d.dec.Decode(data) | ||
//binstat.LeaveVal(bs, int64(len(data))) | ||
//binstat.LeaveVal(bs1, int64(len(data))) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err != nil { | |
if err != nil || len(data) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vishalchangrani thanks for the comment. I added the extra len(data) == 0
. Thanks for spotting this :-)
@@ -130,7 +130,7 @@ func (fnb *FlowNodeBuilder) BaseFlags() { | |||
func (fnb *FlowNodeBuilder) EnqueueNetworkInit(ctx context.Context) { | |||
fnb.Component("network", func(builder NodeBuilder, node *NodeConfig) (module.ReadyDoneAware, error) { | |||
|
|||
codec := jsoncodec.NewCodec() | |||
codec := cborcodec.NewCodec() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change will also be needed here https://github.com/onflow/flow-go/blob/master/cmd/access/node_builder/access_node_builder.go#L644
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vishalchangrani thanks for the comment!
Please see my response to @huitseeker below regarding testing. I'm wondering how I managed to run a 100 TPS load on localnet successfully for 600 seconds without the above change to access_node_builder.go
? localnet does have an access
node, so how could it have worked if it has a different codec?
Or is the link you kindly provided to the access node JSON
configuration for something else, e.g. just for integration tests, or just for the access node communicating with the outside world, or something else not to do with network messages with the other nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah! I can see where this confusion can arise from. So, the EnqueueNetworkInit
here is not your regular Network init but the init of the unstaked
network optionally used by the the access node if it supports an unstaked network. Currently, by default, the access node will not bring up this network.
However, things have changed since I made the comment 😄 and we will actually be removing/modifying this significantly since we decided we will not have two separate networks. So, you can either just make the change for consistency or leave it as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vishalchangrani Oh, thank you very much for explaining how the confusion occurred.
I think I'd prefer to just leave it as-is because... I don't currently have a good way to test it unless there is a localnet option to bring up and use the unstaked
network somehow? Or a different way to test it? And it sounds like that code is both not being currently used, and there will be no separate network in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #1147 to record this for follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm..left u some comments.
thanks for doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issues on the code, but can you tell me more about the testing / deployment plan for this feature?
Things I'd love to see (include, but are not limited to):
- property testing of
v2envelopeCode
,envelopeCode2v
, or similar exhaustive generation of the message types we expect, - bench/testnet-only options for dumping any problematic payload to a file to replay in integration tests later,
- monitoring of deserialization errors,
code := data[len(data)-1] // only last byte | ||
//binstat.LeaveVal(bs, int64(len(data))) | ||
//bs1 := binstat.EnterTime(binstat.BinNet + ":wire>3(cbor)payload2envelope") | ||
code := data[0] // only first byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an in-code documentation of this per-first-byte type discrimination? I expect this will eventually be part of public specs, but code comments would be a start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huitseeker thanks for the comment! FYI I added some more in-code comments for @vishalchangrani and they also mention code
. Please let me know if these are sufficient, or if you have any more suggestions and/or wishes.
Codecov Report
@@ Coverage Diff @@
## master #1132 +/- ##
==========================================
- Coverage 53.11% 53.08% -0.04%
==========================================
Files 322 322
Lines 21830 21830
==========================================
- Hits 11596 11588 -8
- Misses 8659 8670 +11
+ Partials 1575 1572 -3
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
@huitseeker thanks for the comment!
There are 4 levels at which the feature was tested:
It could be that there are other message types not exercised by the localnet load test, and I'm hoping that if this is the case then these would be exercised at the latest in the following step: Lastly, in addition -- and long before this code would get to mainnet -- I'm assuming this code would be tested further as part of pre-mainnet tests (e.g. on canary and benchnet), and that if there is any remaining issue then it would be found at the latest then. Other assumptions that I have made about testing
Background: I just copy and pasted most of this code from the old Do you think the things loved to be seen should be in the scope of this PR, or could they be captured as new issues for later enhancement to testing and maintenance? [1] https://github.com/onflow/flow-go/blob/simonhf/5610-optimize-network-message-serialization/network/codec/roundTripHeader_test.go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a bunch for those comments on current and future testing approaches. You're right: while we have good runtime experience at testing those changes, there's currently not a lot in the direction of feeding inputs to the serialization statically. It would be fantastic if you could issue up a couple of those items to tackle later.
@@ -130,7 +130,7 @@ func (fnb *FlowNodeBuilder) BaseFlags() { | |||
func (fnb *FlowNodeBuilder) EnqueueNetworkInit(ctx context.Context) { | |||
fnb.Component("network", func(builder NodeBuilder, node *NodeConfig) (module.ReadyDoneAware, error) { | |||
|
|||
codec := jsoncodec.NewCodec() | |||
codec := cborcodec.NewCodec() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #1147 to record this for follow-up.
Use CBOR with optimized envelope code for network message encoding; > 4x faster on localnet at 100 TPS.