-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simonhf/5589 binstat efficient stats in bins #891
Simonhf/5589 binstat efficient stats in bins #891
Conversation
Codecov Report
@@ Coverage Diff @@
## master #891 +/- ##
==========================================
- Coverage 53.35% 53.30% -0.05%
==========================================
Files 318 318
Lines 21506 21511 +5
==========================================
- Hits 11474 11467 -7
- Misses 8463 8477 +14
+ Partials 1569 1567 -2
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
@simonhf I think this tool is amazing! I also think it should be part of a separate repo (maybe under your own public repo) which we can then import into flow. That way you can have other folks use it more easily, contribute to it and spread the love ❤️ |
@vishalchangrani thanks :-) Yeah, I put it here initially under the guidance from @turbolent (who I asked to review this as well) but am happy to move it if there is common consensus, and maybe it's something to discuss here now, or later? :-) Also, putting it here initially probably also makes it easier to do reviews, etc, while it's in its early formative stages of development? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
I mainly just looked at the Go code, I'm not familiar with the business logic
binstat/binstat.go
Outdated
// functions BEFORE go fmt v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v | ||
|
||
/* | ||
func New (what string, callerParams string ) *BinStat { return newGeneric(what, callerParams, false, 0 , sizeNotUsed, true ) } | ||
func newInternal (what string ) *BinStat { return newGeneric(what, "" , false, 0 , sizeNotUsed, false) } | ||
func NewTime (what string, callerParams string ) *BinStat { return newGeneric(what, callerParams, true , 0 , sizeNotUsed, true ) } | ||
func newTimeInternal (what string ) *BinStat { return newGeneric(what, "" , true , 0 , sizeNotUsed, false) } | ||
func NewTimeVal (what string, callerParams string, callerSize int64) *BinStat { return newGeneric(what, callerParams, true , callerSize, sizeAtEnter, true ) } | ||
func newTimeValInternal(what string, callerSize int64) *BinStat { return newGeneric(what, "" , true , callerSize, sizeAtEnter, false) } | ||
|
||
func Pnt (p *BinStat, pointUnique string ) time.Duration { return pntGeneric(p , pointUnique , 0 , sizeNotUsed, true ) } | ||
func pntInternal (p *BinStat, pointUnique string ) time.Duration { return pntGeneric(p , pointUnique , 0 , sizeNotUsed, false) } | ||
func End (p *BinStat ) time.Duration { return pntGeneric(p , "Leave" , 0 , sizeNotUsed, true ) } | ||
func endInternal (p *BinStat ) time.Duration { return pntGeneric(p , "Leave" , 0 , sizeNotUsed, false) } | ||
func EndVal (p *BinStat, callerSize int64) time.Duration { return pntGeneric(p , "Leave" , callerSize, sizeAtLeave, true ) } | ||
func endValInternal (p *BinStat, callerSize int64) time.Duration { return pntGeneric(p , "Leave" , callerSize, sizeAtLeave, false) } | ||
|
||
func Dbg (p *BinStat, debugText string ) { dbgGeneric(p , debugText , true ) } | ||
*/ | ||
|
||
// functions W/&W/O go fmt ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just remove this, the common convention is to just commit the result of go fmt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment! The reason why I left in the comment is for code comprehension, because there is a 'pattern' to the arguments etc which can be easily seen with a glance when looking at the white space aligned comment, but is not obvious even with a longer look at the go fmt
version of the functions below the comment. I tried researching for a better alternative -- because I'm not super happy about this either -- but I couldn't find one :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@turbolent another idea might be to export the comment to a separate file in the same directory, e.g. binstat.txt
?
Another idea might be to put the comment into a gist and just have a comment pointing to the gist?
I dislike the last suggestion the most, the separate file suggestion the next, and the as-is comment the least. But all options are annoying... do you have any more ideas apart from just deleting the comment? :-)
@turbolent thanks! And thanks very much for the comments which I will fix shortly, except for the couple I commented on for discussion. If you're interested in the business logic then I included a README file with biz logic explanations, and also one of the tests shows a comparison between Also, @vishalchangrani suggested (above) moving the package to a private repo. I don't mind either way. But it might be easier to leave it here -- as you kindly originally suggested -- for now for easier reviews, etc? We can always move it out later...? |
Trying to replicate the unit test failure on local machine. Ran the
Running a 3rd time... :-) And it worked the 3rd time :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonhf I appreciate your time and efforts in applying the comments. Although the code looks logically sound, I still have the same concerns regarding its maintainability, encapsulation, safety, and readability. I tried to shed more light on my comments by adding more details/replies.
network/p2p/middleware.go
Outdated
data, err := msg.Marshal() | ||
var data []byte | ||
var err error | ||
bs := binstat.EnterTime("~3net:wire<4message2protobuf") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still recommend having these string arguments as constants similar to what we did here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yhassanzadeh13 thanks for the comment! I have now changed the root of the string to a constant because the roots are used often. It's also easy to grep the constants and this is the recommended way to use the new binstat
tool to uncomment / unhide instrumentation for a particular bin root.
module/mempool/stdmap/queues.go
Outdated
@@ -53,7 +54,11 @@ func (b *Queues) Get(queueID flow.Identifier) (*queue.Queue, bool) { | |||
} | |||
|
|||
func (b *Queues) Run(f func(backdata *QueuesBackdata) error) error { | |||
p1 := binstat.EnterTime("~4lock:w:Backend.Run(Queues)", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonhf thanks for your reply, I think having them either on const
or using string modifiers is still safer than inline strings like what we have here.
binstat.Fin() | ||
|
||
// cat and sort binstat stats file | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest encapsulating the similar parts into a separate test function and invoke it at both places with proper parameters.
command := "ls -al ./binstat.test.pid-*.binstat.txt ; cat ./binstat.test.pid-*.binstat.txt | sort --version-sort" | ||
out, err := exec.Command("bash", "-c", command).Output() | ||
require.NoError(t, err) | ||
zlog.Debug().Msgf("test: output of command: %s\n%s", command, out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think going with the suggested pattern is more readable and maintainable over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A follow-up on my review, since this PR already has gone through lots of reviews by other team members and got enough approvals for merging, I change my feedback from "Request Changes" to "Comment" (i.e., submitting general feedback without explicit approval). In this way, my comments are not a blocker for merging, and you consider them at your discretion or may move forward with the merge.
@yhassanzadeh13 thank you so much for all your valuable comments. I feel that I have learned so much through this process. The PR is now up-to-date and all comments should have been addressed unless I forgot one which I don't think I did :-) Also, the tests are passing and the latest master is merged too! Looking forward to any more comments! Thanks! |
@yhassanzadeh13 added some minor helper functions which ended up being useful while analyzing compression on network CBOR messages, and might be useful for similar future work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great 👏 , I just added a few comments about code styling, which appreciate applying them before merge.
// DKG | ||
CodeDKGMessage | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this codec is still not in action, but it is a good practice always to append codes at the end of this enum, otherwise it may interfere with existing codes in action.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yhassanzadeh13 thanks for the comment!
FYI I believe CodeDKGMessage
got merged yesterday into master [1] by Jordan! I merged master into this PR branch later yesterday and that caused integration issues for me. Why? Because there is a new function which looks up the human readable name for the code and it didn't exist yet for master or this branch. But I fixed that yesterday evening.
[1] https://github.com/onflow/flow-go/blob/master/network/codec/json/envelope.go#L56
model/flow/service_event_test.go
Outdated
options.Time = cbor.TimeRFC3339Nano // option needed for wanted time format | ||
encMode, err := options.EncMode() | ||
if err != nil { | ||
panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have this same code snipped replaced several places. In order to be DRY (Don't Repeat Yourself), I suggest the following workaround:
- Moving this function as an exported function to
network/codec/cbor/encode
. In general, we need to bubble up the errors and let the caller decide on what to do with error handling case instead of panic upon an error:func EncMode() (cbor.EncMode, error) { options := cbor.CoreDetEncOptions() // CBOR deterministic options // default: "2021-07-06 21:20:00 +0000 UTC" <- unwanted // option : "2021-07-06 21:20:00.820603 +0000 UTC" <- wanted options.Time = cbor.TimeRFC3339Nano // option needed for wanted time format encMode, err := options.EncMode() if err != nil { return nil, fmt.Errorf("could not extract encoding mode: %w", err) } return encMode, nil
}
- In our tests, we use `require` method of `testify` to terminate the test upon an error:
import "github.com/stretchr/testify/require"
encMode, err := EncMode()
require.NoError(t, err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yhassanzadeh13 thanks for the comment!
I refactored the code so that the block of code only exists once instead of three times. This was a useful exercise because the first two refactors failed due to Golang circular dependencies, which was my first tangle with this particular error :-)
model/flow/service_event_test.go
Outdated
err = msgpack.Unmarshal(b, outer) | ||
t.Logf("- debug: outer=%+v <-- after .Unmarshal()\n", outer) | ||
require.Nil(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
require.NoError(t, err)
is a better alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yhassanzadeh13 thanks for the comment!
Yep, I just copy and pasted the code already there for the existing tests for the other serialization formats. Now, I mercilessly refactored the whole file to use require.NoError()
instead of require.Nil()
:-)
return fmt.Errorf("invalid message envelope code: %d", code) | ||
} | ||
|
||
data = append(data, code) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking don't we need any delimiter to demarcate the data from code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yhassanzadeh13 thanks for the comment!
FYI in theory then no delimiter is needed. Why? Because it's just an array of bytes and the code byte is always the last byte. That means upon decoding then the last byte is always the code byte and the CBOR bytes to become unmarshalled are all the other bytes. This is an optimization for CBOR and possible because everything is just an array of bytes. Whereas, the JSON code does an expensive second marshal and second unmarshal to wrap the code in an 'envelope'.
As a side note, it turns out that appending the code is not for free either, although much faster than doing the second marshal with the envelope technique. However, I have discovered a yet faster way -- documented in Notion -- to include the code and intend to push that optimization with a future PR which actually enables CBOR.
const ( | ||
BinNet = "~3net" | ||
BinStdmap = "~6stdmap" | ||
BinMakeID = "~6MakeID" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
defer b.RUnlock() | ||
has := b.Backdata.Has(entityID) | ||
return has | ||
} | ||
|
||
// Add adds the given item to the pool. | ||
func (b *Backend) Add(entity flow.Entity) bool { | ||
//bs0 := binstat.EnterTime(binstat.BinStdmap + ".<<lock.(Backend)Add") | ||
entityID := entity.ID() // this expensive operation done OUTSIDE of lock :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
The final message before merging: So all the CI tests passed. Then I merged master into this branch and all the CI tests passed again :-) Then I ran a load test on my benchnet dev box but came up against this recent loader bug [1]. I did the workaround suggested in the issue [1] and the loader started! But I saw many new unexpected node warnings and errors (I added them to the issue [1]). And sometimes the nodes just stop after a fatal regarding epochs. I added that to the issue [1] too. Finally out of paranoia -- too rule out that this PR has anything to do with the issue [1] and the extra warnings, errors, and fatals -- I run the loader against the latest master without this PR and it did all the same stuff. So finally I'm hitting 'Merge pull request'... :-) |
Implement WIP MVP binstat.
Implement binstat instrumentation for net, stdmap, and MakeID.
Optimize stdmap .Add() function.
Prep CBOR for network messages (but JSON still default for now).