-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: use atx ids with pointer semantics #6058
Conversation
15e840e
to
abc8fb7
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #6058 +/- ##
=======================================
Coverage 81.4% 81.5%
=======================================
Files 299 299
Lines 32172 32239 +67
=======================================
+ Hits 26205 26281 +76
+ Misses 4252 4246 -6
+ Partials 1715 1712 -3 ☔ View full report in Codecov by Sentry. |
bors try |
tryBuild failed: |
bors try |
tryBuild failed: |
bors try |
tryBuild succeeded: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if changing "the whole" codebase in one go is a good idea. The PR is huge and it's not confirmed if it actually improves performance. It could be that some places get slower.
You wrote:
The codebase uses value semantics over ATX IDs and this puts unnecessary stress on the GC.
I'm not sure about this. Using values might cause copying, but it shouldn't cause GC pressure as the values live on the stack. Using pointers could increase the risk of escaping to the heap and more GC pressure. Wdyt?
@@ -111,7 +111,7 @@ type Builder struct { | |||
type positioningAtxFinder struct { | |||
finding sync.Mutex | |||
found *struct { | |||
id types.ATXID | |||
id *types.ATXID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that absolutely every use of ATXID
must be via a pointer. IMHO, it should be judged by the need. It makes sense in arguments to functions on ~hot path or in big collections - to allow reusing the same memory.
For example, here I think it's fine to have id types.ATXID
. It simplifies the code and avoids confusion of having two values for "none/empty". Does nil
here means not found, or found but types.EmptyATXID
? The nil
is not even a valid value for this structure (it must never be nil
) so why open a possibility for breaking this invariant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I find this specific block to be not a good example of doing things - using anonymous structs in such cases is something I'd generally like to avoid. I found a lot of occurences where code was written such that the assumption that an uninitialized variable/field defaults to an empty ATX ID. I personally object to this sort of working paradigm, preferring the use of constructors with arguments that set field values, and concrete types. It reduces the mental taxation of reading code and understanding what it's doing. Having a default value doesn't really help from my perspective, I'd rather see the field being set explicitly always.
@@ -86,7 +86,7 @@ func (d *Data) AddFromAtx(atx *types.ActivationTx, malicious bool) *ATX { | |||
|
|||
// Add adds ATX data to the store. | |||
// Returns whether the ATX was added to the store. | |||
func (d *Data) AddAtx(target types.EpochID, id types.ATXID, atx *ATX) bool { | |||
func (d *Data) AddAtx(target types.EpochID, id *types.ATXID, atx *ATX) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make a noticeable difference here? This function ends up dereferencing the pointer anyway on the more common "happy path".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done for the sake of API uniformity and to reduce the amount of dereferencing we do "on the edges" (the points where we must use the ATXs by value).
Ideally we shouldn't have to do this sort of dereferencing but I've done this more out of a time pressure rather than from a style preference for this. Ideally the map should have a string key and we could just use the string method of the ATX id to return the string. I think that would be favorable, but it is something we can also do later. I was just a bit afraid to introduce changes on that level, we can do them later on a per-package basis.
@@ -78,7 +101,7 @@ func (t *ATXID) UnmarshalText(buf []byte) error { | |||
} | |||
|
|||
// EmptyATXID is a canonical empty ATXID. | |||
var EmptyATXID = ATXID{} | |||
var EmptyATXID = &ATXID{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should stay being a value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we switch to pointer semantics, comparison against this value become useless since you actually need to deref the pointer to check against it, however, when using pointer semantics you actually can't compare against this because you're basically just comparing memory addresses. If we want to use pointer semantics I would actually scratch this empty value and use nil
's wherever necessary. It isn't really clear what does an "Empty ATX ID" even mean. I'd personally use a nil
there (I know about the billion dollar mistake
, however, we don't have Rust's typesystem here unfortunately).
} | ||
|
||
func (t *ATXID) Empty() bool { | ||
return t == nil || len(t[:]) == 0 || bytes.Equal(t[:], EmptyATXID[:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can len(t[:]) == 0
be ever true? The ATXID is a byte array - it always has length equal 32.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ATXID is a byte array but sometimes fields on structs aren't initialized so I had to introduce this ugly thing as a catch-all for those cases. This is definitely not how it should be. But also, something we could improve on easily. Also t == nil
is a bit bizarre and shouldn't ever be. The fields should always be initialized (and methods never called on nil
values)
@@ -194,7 +194,7 @@ func NotMalicious(data *ATX, _ lockGuard) bool { | |||
// IterateInEpoch calls `fn` for every ATX in epoch. | |||
// If filters are provided, only atxs that pass all filters are returned. | |||
// SAFETY: The returned pointer MUST NOT be modified. | |||
func (d *Data) IterateInEpoch(epoch types.EpochID, fn func(types.ATXID, *ATX), filters ...AtxFilter) { | |||
func (d *Data) IterateInEpoch(epoch types.EpochID, fn func(*types.ATXID, *ATX), filters ...AtxFilter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you certain that in cases like this it actually improves the performance? Passing by pointer could cause more GC pressure as opposed to passing by value, in which case the atxid would leave on the stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing a value to a method doesn't necessarily mean that it won't escape to the heap. We'd have to read the escape analysis for the whole codebase to figure that out. If you pass a value it might end up on the heap anyway depending on what you do with it. So my consideration is ideally: "end up with one allocation of value" vs. "end up with possible multiple allocations".
The issue that triggered this optimization was increased memory consumption because of logging of For me it would be enough to only change the This would a) improve memory consumption of logging these values b) all incorrect uses of the Wdyt? |
So I close the PR because I think it just won't go through the pipeline in it's current form, even if we do settle on the approach. Some major pain-points that have to do with pointer-value semantics aren't addressed here (for example, the passing around of millions of ATX ID slices that come out of the p2p messages). I will try to run this PR nevertheless to see what and if there's any performance gain (and if it even works on a real node). |
Motivation
The codebase uses value semantics over ATX IDs and this puts unnecessary stress on the GC. This PR tries to mitigate most of this usage (apart from places where it is absolutely necessary/difficult to change).
Description
The PR changes the value semantics to pointer semantics wherever possible. Unfortunately this is not possible everywhere because some of the protocol messages encode the ATX IDs as values, not pointers to values and changing this will break protocol specs. To overcome this I've introduced a couple of helper methods that help the conversions wherever necessary, between a slice to values and a slice to pointers (these are generics and can be reused, however they might cause inefficiencies with memory if abused).
Another point which cannot be changed is that
map
s that usetypes.ATXID
as a key cannot be changed to bemap[*ATXID]
because the map key cannot use pointers to values (the pointer will not be dereferenced for the key, meaning that the same value pointed to by different pointers will result in different map entries).Due to the switch to pointer semantics, several helper methods had to be introduced, for example to check whether an ATX ID is empty or to compare it against another value. This also cleans up the API and will allow future modularity with changes to these types, if necessary.
One visible thing that was noticed in tests was that some previously uninitialized fields on structs for example (that pointed to
types.ATXID
) were falling back to an empty ATX ID whenever not explicitly set. This, combined with a recurring pattern of initializing complex types without the use of a constructor but through a literal, sometimes makes refactoring difficult and ends up with a lot of magic. Constructor use is therefore encouraged.Test Plan
TODO