Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: use atx ids with pointer semantics #6058

Closed
wants to merge 3 commits into from
Closed

Conversation

acud
Copy link
Contributor

@acud acud commented Jun 20, 2024

Motivation

The codebase uses value semantics over ATX IDs and this puts unnecessary stress on the GC. This PR tries to mitigate most of this usage (apart from places where it is absolutely necessary/difficult to change).

Description

The PR changes the value semantics to pointer semantics wherever possible. Unfortunately this is not possible everywhere because some of the protocol messages encode the ATX IDs as values, not pointers to values and changing this will break protocol specs. To overcome this I've introduced a couple of helper methods that help the conversions wherever necessary, between a slice to values and a slice to pointers (these are generics and can be reused, however they might cause inefficiencies with memory if abused).

Another point which cannot be changed is that maps that use types.ATXID as a key cannot be changed to be map[*ATXID] because the map key cannot use pointers to values (the pointer will not be dereferenced for the key, meaning that the same value pointed to by different pointers will result in different map entries).

Due to the switch to pointer semantics, several helper methods had to be introduced, for example to check whether an ATX ID is empty or to compare it against another value. This also cleans up the API and will allow future modularity with changes to these types, if necessary.

One visible thing that was noticed in tests was that some previously uninitialized fields on structs for example (that pointed to types.ATXID) were falling back to an empty ATX ID whenever not explicitly set. This, combined with a recurring pattern of initializing complex types without the use of a constructor but through a literal, sometimes makes refactoring difficult and ends up with a lot of magic. Constructor use is therefore encouraged.

Test Plan

  • fix remaining tests
  • run on a node to see that there's no panicks

TODO

  • Explain motivation or link existing issue(s)
  • Test changes and document test plan
  • Update documentation as needed
  • Update changelog as needed

@acud acud self-assigned this Jun 20, 2024
@acud acud marked this pull request as draft June 20, 2024 17:33
@acud acud force-pushed the atx-id-pointer branch 2 times, most recently from 15e840e to abc8fb7 Compare June 20, 2024 19:12
Copy link

codecov bot commented Jun 20, 2024

Codecov Report

Attention: Patch coverage is 92.43902% with 31 lines in your changes missing coverage. Please review.

Project coverage is 81.5%. Comparing base (a608922) to head (0d3563e).

Files Patch % Lines
node/node.go 36.3% 6 Missing and 1 partial ⚠️
sql/atxs/atxs.go 92.3% 6 Missing ⚠️
activation/handler_v1.go 94.4% 0 Missing and 2 partials ⚠️
activation/validation.go 92.3% 1 Missing and 1 partial ⚠️
datastore/store.go 60.0% 1 Missing and 1 partial ⚠️
miner/active_set_generator.go 88.2% 1 Missing and 1 partial ⚠️
syncer/atxsync/syncer.go 66.6% 1 Missing and 1 partial ⚠️
activation/activation.go 95.0% 1 Missing ⚠️
activation/certifier.go 50.0% 1 Missing ⚠️
activation/handler_v2.go 97.3% 0 Missing and 1 partial ⚠️
... and 5 more
Additional details and impacted files
@@           Coverage Diff           @@
##           develop   #6058   +/-   ##
=======================================
  Coverage     81.4%   81.5%           
=======================================
  Files          299     299           
  Lines        32172   32239   +67     
=======================================
+ Hits         26205   26281   +76     
+ Misses        4252    4246    -6     
+ Partials      1715    1712    -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@acud
Copy link
Contributor Author

acud commented Jun 20, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Jun 20, 2024
@spacemesh-bors
Copy link

try

Build failed:

@acud
Copy link
Contributor Author

acud commented Jun 20, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Jun 20, 2024
@spacemesh-bors
Copy link

try

Build failed:

@acud
Copy link
Contributor Author

acud commented Jun 20, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Jun 20, 2024
@spacemesh-bors
Copy link

try

Build succeeded:

@acud acud marked this pull request as ready for review June 21, 2024 03:51
Copy link
Contributor

@poszu poszu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if changing "the whole" codebase in one go is a good idea. The PR is huge and it's not confirmed if it actually improves performance. It could be that some places get slower.

You wrote:

The codebase uses value semantics over ATX IDs and this puts unnecessary stress on the GC.

I'm not sure about this. Using values might cause copying, but it shouldn't cause GC pressure as the values live on the stack. Using pointers could increase the risk of escaping to the heap and more GC pressure. Wdyt?

@@ -111,7 +111,7 @@ type Builder struct {
type positioningAtxFinder struct {
finding sync.Mutex
found *struct {
id types.ATXID
id *types.ATXID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that absolutely every use of ATXID must be via a pointer. IMHO, it should be judged by the need. It makes sense in arguments to functions on ~hot path or in big collections - to allow reusing the same memory.

For example, here I think it's fine to have id types.ATXID. It simplifies the code and avoids confusion of having two values for "none/empty". Does nil here means not found, or found but types.EmptyATXID? The nil is not even a valid value for this structure (it must never be nil) so why open a possibility for breaking this invariant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I find this specific block to be not a good example of doing things - using anonymous structs in such cases is something I'd generally like to avoid. I found a lot of occurences where code was written such that the assumption that an uninitialized variable/field defaults to an empty ATX ID. I personally object to this sort of working paradigm, preferring the use of constructors with arguments that set field values, and concrete types. It reduces the mental taxation of reading code and understanding what it's doing. Having a default value doesn't really help from my perspective, I'd rather see the field being set explicitly always.

@@ -86,7 +86,7 @@ func (d *Data) AddFromAtx(atx *types.ActivationTx, malicious bool) *ATX {

// Add adds ATX data to the store.
// Returns whether the ATX was added to the store.
func (d *Data) AddAtx(target types.EpochID, id types.ATXID, atx *ATX) bool {
func (d *Data) AddAtx(target types.EpochID, id *types.ATXID, atx *ATX) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make a noticeable difference here? This function ends up dereferencing the pointer anyway on the more common "happy path".

Copy link
Contributor Author

@acud acud Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done for the sake of API uniformity and to reduce the amount of dereferencing we do "on the edges" (the points where we must use the ATXs by value).

Ideally we shouldn't have to do this sort of dereferencing but I've done this more out of a time pressure rather than from a style preference for this. Ideally the map should have a string key and we could just use the string method of the ATX id to return the string. I think that would be favorable, but it is something we can also do later. I was just a bit afraid to introduce changes on that level, we can do them later on a per-package basis.

@@ -78,7 +101,7 @@ func (t *ATXID) UnmarshalText(buf []byte) error {
}

// EmptyATXID is a canonical empty ATXID.
var EmptyATXID = ATXID{}
var EmptyATXID = &ATXID{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should stay being a value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we switch to pointer semantics, comparison against this value become useless since you actually need to deref the pointer to check against it, however, when using pointer semantics you actually can't compare against this because you're basically just comparing memory addresses. If we want to use pointer semantics I would actually scratch this empty value and use nil's wherever necessary. It isn't really clear what does an "Empty ATX ID" even mean. I'd personally use a nil there (I know about the billion dollar mistake, however, we don't have Rust's typesystem here unfortunately).

}

func (t *ATXID) Empty() bool {
return t == nil || len(t[:]) == 0 || bytes.Equal(t[:], EmptyATXID[:])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can len(t[:]) == 0 be ever true? The ATXID is a byte array - it always has length equal 32.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATXID is a byte array but sometimes fields on structs aren't initialized so I had to introduce this ugly thing as a catch-all for those cases. This is definitely not how it should be. But also, something we could improve on easily. Also t == nil is a bit bizarre and shouldn't ever be. The fields should always be initialized (and methods never called on nil values)

@@ -194,7 +194,7 @@ func NotMalicious(data *ATX, _ lockGuard) bool {
// IterateInEpoch calls `fn` for every ATX in epoch.
// If filters are provided, only atxs that pass all filters are returned.
// SAFETY: The returned pointer MUST NOT be modified.
func (d *Data) IterateInEpoch(epoch types.EpochID, fn func(types.ATXID, *ATX), filters ...AtxFilter) {
func (d *Data) IterateInEpoch(epoch types.EpochID, fn func(*types.ATXID, *ATX), filters ...AtxFilter) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you certain that in cases like this it actually improves the performance? Passing by pointer could cause more GC pressure as opposed to passing by value, in which case the atxid would leave on the stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing a value to a method doesn't necessarily mean that it won't escape to the heap. We'd have to read the escape analysis for the whole codebase to figure that out. If you pass a value it might end up on the heap anyway depending on what you do with it. So my consideration is ideally: "end up with one allocation of value" vs. "end up with possible multiple allocations".

@fasmat
Copy link
Member

fasmat commented Jun 21, 2024

The issue that triggered this optimization was increased memory consumption because of logging of ATXIDs and NodeIDs because every log statement (if printed or not) needs to copy this value without optimization.

For me it would be enough to only change the String() method of ATXID/NodeID to be defined on the pointer instead of on the value of the respective type. In all other circumstances I'm with @poszu and do it on a case by case basis if a pointer or the value should be used.

This would a) improve memory consumption of logging these values b) all incorrect uses of the String() method should be easy to fix (the compiler will warn) and we won't overlook any incorrect uses by accident and c) we don't make code worse in places where we didn't check if value or pointer would be better.

Wdyt?

@acud acud closed this Jun 21, 2024
@acud
Copy link
Contributor Author

acud commented Jun 21, 2024

So I close the PR because I think it just won't go through the pipeline in it's current form, even if we do settle on the approach. Some major pain-points that have to do with pointer-value semantics aren't addressed here (for example, the passing around of millions of ATX ID slices that come out of the p2p messages). I will try to run this PR nevertheless to see what and if there's any performance gain (and if it even works on a real node).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants