perf: use atx ids with pointer semantics #6058

acud · 2024-06-20T17:33:02Z

Motivation

The codebase uses value semantics over ATX IDs and this puts unnecessary stress on the GC. This PR tries to mitigate most of this usage (apart from places where it is absolutely necessary/difficult to change).

Description

The PR changes the value semantics to pointer semantics wherever possible. Unfortunately this is not possible everywhere because some of the protocol messages encode the ATX IDs as values, not pointers to values and changing this will break protocol specs. To overcome this I've introduced a couple of helper methods that help the conversions wherever necessary, between a slice to values and a slice to pointers (these are generics and can be reused, however they might cause inefficiencies with memory if abused).

Another point which cannot be changed is that maps that use types.ATXID as a key cannot be changed to be map[*ATXID] because the map key cannot use pointers to values (the pointer will not be dereferenced for the key, meaning that the same value pointed to by different pointers will result in different map entries).

Due to the switch to pointer semantics, several helper methods had to be introduced, for example to check whether an ATX ID is empty or to compare it against another value. This also cleans up the API and will allow future modularity with changes to these types, if necessary.

One visible thing that was noticed in tests was that some previously uninitialized fields on structs for example (that pointed to types.ATXID) were falling back to an empty ATX ID whenever not explicitly set. This, combined with a recurring pattern of initializing complex types without the use of a constructor but through a literal, sometimes makes refactoring difficult and ends up with a lot of magic. Constructor use is therefore encouraged.

Test Plan

fix remaining tests
run on a node to see that there's no panicks

TODO

Explain motivation or link existing issue(s)
Test changes and document test plan
Update documentation as needed
Update changelog as needed

codecov · 2024-06-20T21:56:16Z

Codecov Report

Attention: Patch coverage is 92.43902% with 31 lines in your changes missing coverage. Please review.

Project coverage is 81.5%. Comparing base (a608922) to head (0d3563e).

Files	Patch %	Lines
node/node.go	36.3%	6 Missing and 1 partial ⚠️
sql/atxs/atxs.go	92.3%	6 Missing ⚠️
activation/handler_v1.go	94.4%	0 Missing and 2 partials ⚠️
activation/validation.go	92.3%	1 Missing and 1 partial ⚠️
datastore/store.go	60.0%	1 Missing and 1 partial ⚠️
miner/active_set_generator.go	88.2%	1 Missing and 1 partial ⚠️
syncer/atxsync/syncer.go	66.6%	1 Missing and 1 partial ⚠️
activation/activation.go	95.0%	1 Missing ⚠️
activation/certifier.go	50.0%	1 Missing ⚠️
activation/handler_v2.go	97.3%	0 Missing and 1 partial ⚠️
... and 5 more

Additional details and impacted files

@@           Coverage Diff           @@
##           develop   #6058   +/-   ##
=======================================
  Coverage     81.4%   81.5%           
=======================================
  Files          299     299           
  Lines        32172   32239   +67     
=======================================
+ Hits         26205   26281   +76     
+ Misses        4252    4246    -6     
+ Partials      1715    1712    -3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

acud · 2024-06-20T21:57:24Z

bors try

spacemesh-bors · 2024-06-20T22:00:40Z

try

Build failed:

systest-status

acud · 2024-06-20T22:14:21Z

bors try

spacemesh-bors · 2024-06-20T22:16:20Z

try

Build failed:

systest-status

acud · 2024-06-20T22:56:48Z

bors try

spacemesh-bors · 2024-06-20T23:40:51Z

try

Build succeeded:

poszu

I'm not sure if changing "the whole" codebase in one go is a good idea. The PR is huge and it's not confirmed if it actually improves performance. It could be that some places get slower.

You wrote:

The codebase uses value semantics over ATX IDs and this puts unnecessary stress on the GC.

I'm not sure about this. Using values might cause copying, but it shouldn't cause GC pressure as the values live on the stack. Using pointers could increase the risk of escaping to the heap and more GC pressure. Wdyt?

poszu · 2024-06-21T06:19:42Z

activation/activation.go

@@ -111,7 +111,7 @@ type Builder struct {
 type positioningAtxFinder struct {
 	finding sync.Mutex
 	found   *struct {
-		id         types.ATXID
+		id         *types.ATXID


I don't think that absolutely every use of ATXID must be via a pointer. IMHO, it should be judged by the need. It makes sense in arguments to functions on ~hot path or in big collections - to allow reusing the same memory.

For example, here I think it's fine to have id types.ATXID. It simplifies the code and avoids confusion of having two values for "none/empty". Does nil here means not found, or found but types.EmptyATXID? The nil is not even a valid value for this structure (it must never be nil) so why open a possibility for breaking this invariant?

Personally I find this specific block to be not a good example of doing things - using anonymous structs in such cases is something I'd generally like to avoid. I found a lot of occurences where code was written such that the assumption that an uninitialized variable/field defaults to an empty ATX ID. I personally object to this sort of working paradigm, preferring the use of constructors with arguments that set field values, and concrete types. It reduces the mental taxation of reading code and understanding what it's doing. Having a default value doesn't really help from my perspective, I'd rather see the field being set explicitly always.

poszu · 2024-06-21T06:23:09Z

atxsdata/data.go

@@ -86,7 +86,7 @@ func (d *Data) AddFromAtx(atx *types.ActivationTx, malicious bool) *ATX {

 // Add adds ATX data to the store.
 // Returns whether the ATX was added to the store.
-func (d *Data) AddAtx(target types.EpochID, id types.ATXID, atx *ATX) bool {
+func (d *Data) AddAtx(target types.EpochID, id *types.ATXID, atx *ATX) bool {


Does it make a noticeable difference here? This function ends up dereferencing the pointer anyway on the more common "happy path".

This is done for the sake of API uniformity and to reduce the amount of dereferencing we do "on the edges" (the points where we must use the ATXs by value).

Ideally we shouldn't have to do this sort of dereferencing but I've done this more out of a time pressure rather than from a style preference for this. Ideally the map should have a string key and we could just use the string method of the ATX id to return the string. I think that would be favorable, but it is something we can also do later. I was just a bit afraid to introduce changes on that level, we can do them later on a per-package basis.

poszu · 2024-06-21T06:25:39Z

common/types/activation.go

@@ -78,7 +101,7 @@ func (t *ATXID) UnmarshalText(buf []byte) error {
 }

 // EmptyATXID is a canonical empty ATXID.
-var EmptyATXID = ATXID{}
+var EmptyATXID = &ATXID{}


I think it should stay being a value

When we switch to pointer semantics, comparison against this value become useless since you actually need to deref the pointer to check against it, however, when using pointer semantics you actually can't compare against this because you're basically just comparing memory addresses. If we want to use pointer semantics I would actually scratch this empty value and use nil's wherever necessary. It isn't really clear what does an "Empty ATX ID" even mean. I'd personally use a nil there (I know about the billion dollar mistake, however, we don't have Rust's typesystem here unfortunately).

poszu · 2024-06-21T06:28:11Z

common/types/activation.go

+}
+
+func (t *ATXID) Empty() bool {
+	return t == nil || len(t[:]) == 0 || bytes.Equal(t[:], EmptyATXID[:])


Can len(t[:]) == 0 be ever true? The ATXID is a byte array - it always has length equal 32.

ATXID is a byte array but sometimes fields on structs aren't initialized so I had to introduce this ugly thing as a catch-all for those cases. This is definitely not how it should be. But also, something we could improve on easily. Also t == nil is a bit bizarre and shouldn't ever be. The fields should always be initialized (and methods never called on nil values)

poszu · 2024-06-21T07:02:48Z

atxsdata/data.go

@@ -194,7 +194,7 @@ func NotMalicious(data *ATX, _ lockGuard) bool {
 // IterateInEpoch calls `fn` for every ATX in epoch.
 // If filters are provided, only atxs that pass all filters are returned.
 // SAFETY: The returned pointer MUST NOT be modified.
-func (d *Data) IterateInEpoch(epoch types.EpochID, fn func(types.ATXID, *ATX), filters ...AtxFilter) {
+func (d *Data) IterateInEpoch(epoch types.EpochID, fn func(*types.ATXID, *ATX), filters ...AtxFilter) {


Are you certain that in cases like this it actually improves the performance? Passing by pointer could cause more GC pressure as opposed to passing by value, in which case the atxid would leave on the stack.

Passing a value to a method doesn't necessarily mean that it won't escape to the heap. We'd have to read the escape analysis for the whole codebase to figure that out. If you pass a value it might end up on the heap anyway depending on what you do with it. So my consideration is ideally: "end up with one allocation of value" vs. "end up with possible multiple allocations".

fasmat · 2024-06-21T16:16:48Z

The issue that triggered this optimization was increased memory consumption because of logging of ATXIDs and NodeIDs because every log statement (if printed or not) needs to copy this value without optimization.

For me it would be enough to only change the String() method of ATXID/NodeID to be defined on the pointer instead of on the value of the respective type. In all other circumstances I'm with @poszu and do it on a case by case basis if a pointer or the value should be used.

This would a) improve memory consumption of logging these values b) all incorrect uses of the String() method should be easy to fix (the compiler will warn) and we won't overlook any incorrect uses by accident and c) we don't make code worse in places where we didn't check if value or pointer would be better.

Wdyt?

acud · 2024-06-21T19:40:35Z

So I close the PR because I think it just won't go through the pipeline in it's current form, even if we do settle on the approach. Some major pain-points that have to do with pointer-value semantics aren't addressed here (for example, the passing around of millions of ATX ID slices that come out of the p2p messages). I will try to run this PR nevertheless to see what and if there's any performance gain (and if it even works on a real node).

acud self-assigned this Jun 20, 2024

acud requested review from dshulyak, fasmat, poszu and ivan4th as code owners June 20, 2024 17:33

acud marked this pull request as draft June 20, 2024 17:33

acud force-pushed the atx-id-pointer branch 2 times, most recently from 15e840e to abc8fb7 Compare June 20, 2024 19:12

perf: use atx ids with pointer semantics

739d70e

acud force-pushed the atx-id-pointer branch from abc8fb7 to 739d70e Compare June 20, 2024 19:17

acud added 2 commits June 20, 2024 14:12

wiptests

72994aa

gix e2e test

0d3563e

spacemesh-bors bot added a commit that referenced this pull request Jun 20, 2024

Try #6058:

ad91889

spacemesh-bors bot added a commit that referenced this pull request Jun 20, 2024

Try #6058:

4d3ace6

spacemesh-bors bot added a commit that referenced this pull request Jun 20, 2024

Try #6058:

36bad97

acud marked this pull request as ready for review June 21, 2024 03:51

poszu reviewed Jun 21, 2024

View reviewed changes

acud closed this Jun 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: use atx ids with pointer semantics #6058

perf: use atx ids with pointer semantics #6058

acud commented Jun 20, 2024

codecov bot commented Jun 20, 2024

acud commented Jun 20, 2024

spacemesh-bors bot commented Jun 20, 2024

acud commented Jun 20, 2024

spacemesh-bors bot commented Jun 20, 2024

acud commented Jun 20, 2024

spacemesh-bors bot commented Jun 20, 2024

poszu left a comment

poszu Jun 21, 2024

acud Jun 21, 2024

poszu Jun 21, 2024

acud Jun 21, 2024 •

edited

Loading

poszu Jun 21, 2024

acud Jun 21, 2024

poszu Jun 21, 2024

acud Jun 21, 2024

poszu Jun 21, 2024

acud Jun 21, 2024

fasmat commented Jun 21, 2024 •

edited

Loading

acud commented Jun 21, 2024

perf: use atx ids with pointer semantics #6058

perf: use atx ids with pointer semantics #6058

Conversation

acud commented Jun 20, 2024

Motivation

Description

Test Plan

TODO

codecov bot commented Jun 20, 2024

Codecov Report

acud commented Jun 20, 2024

spacemesh-bors bot commented Jun 20, 2024

try

acud commented Jun 20, 2024

spacemesh-bors bot commented Jun 20, 2024

try

acud commented Jun 20, 2024

spacemesh-bors bot commented Jun 20, 2024

try

poszu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

acud Jun 21, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fasmat commented Jun 21, 2024 • edited Loading

acud commented Jun 21, 2024

acud Jun 21, 2024 •

edited

Loading

fasmat commented Jun 21, 2024 •

edited

Loading