Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(StargateQueries): use a sync pool when unmarshalling responses of protobuf objects #7346

Merged
merged 7 commits into from
Jan 20, 2024

Conversation

testinginprod
Copy link
Contributor

@testinginprod testinginprod commented Jan 19, 2024

Closes: #XXX

What is the purpose of the change

This PR uses a sync pool to unmarshal responses of protobuf objects in stargate queries.

We were previously utilizing pointers, which under heavy load can result in nondeterminism.

Testing and Verifying

This code was backported to v21 and tested against mainnet.

Previously, we were able to app hash nodes within 10 minutes of spam. With this change, the node has been running for 1 hour with no issues.

if err != nil {
return nil, err
}
// no matter what happens after this point, but we must return
// the response type to the pool.
defer returnStargateResponseToPool(request.Path, protoResponseType)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to return the sync.Pool object, so it does not leak

// The query is multi-threaded so we're using a sync.Pool
// to manage the allocation and de-allocation of newly created
// pb objects.
var stargateResponsePools map[string]*sync.Pool
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use a sync.Pool of multiple proto responses type so we do not allocate every time, this should provide relief to the GC in moments of high traffic.

@@ -184,34 +185,48 @@ func init() {
setWhitelistedQuery("/osmosis.concentratedliquidity.v1beta1.Query/CFMMPoolIdLinkFromConcentratedPoolId", &concentratedliquidityquery.CFMMPoolIdLinkFromConcentratedPoolIdResponse{})
}

// GetWhitelistedQuery returns the whitelisted query at the provided path.
// IsWhitelistedQuery returns if the query is not whitelisted.
func IsWhitelistedQuery(queryPath string) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposed this method in place of getWhitelistedQuery to avoid unexported usage of a function that can leak memory if not used properly

codec.ProtoMarshaler
}

func setWhitelistedQuery[T any, PT protoTypeG[T]](queryPath string, _ PT) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this creates a sync.Pool for the given protobuf object, we use generics so we can properly instantiate an object that queryPath expects as response.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment in the code with this context please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment here 801eae8

}

func setWhitelistedQuery(queryPath string, protoType codec.ProtoMarshaler) {
stargateWhitelist.Store(queryPath, protoType)
func returnStargateResponseToPool(queryPath string, pb codec.ProtoMarshaler) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this returns the protobuf object to its appropriate pool (based on the queryPath)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transferring this into a comment would also be helpful IMO 🙏

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment here 2192082

@czarcas7ic czarcas7ic added V:state/compatible/backport State machine compatible PR, should be backported A:backport/v22.x backport patches to v22.x branch A:backport/v21.x backport patches to v21.x branch labels Jan 19, 2024
if !ok {
return nil, wasmvmtypes.Unknown{}
return nil, fmt.Errorf("failed to assert type to codec.ProtoMarshaler")
Copy link
Member

@czarcas7ic czarcas7ic Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to use a wasmvmtypes error here?

Looking at the caller this seems fine, but wanted to flag to ensure I wasn't sneaking this in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke with Roman offline, looks fine.

Copy link
Member

@czarcas7ic czarcas7ic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM! I added a type assertion that doesn't return a wasmtype error, would like someone to ACK that this is okay.

@czarcas7ic czarcas7ic marked this pull request as ready for review January 19, 2024 20:27
Copy link
Member

@p0mvn p0mvn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Requesting additional comments and clarifications

codec.ProtoMarshaler
}

func setWhitelistedQuery[T any, PT protoTypeG[T]](queryPath string, _ PT) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment in the code with this context please?

}

func setWhitelistedQuery(queryPath string, protoType codec.ProtoMarshaler) {
stargateWhitelist.Store(queryPath, protoType)
func returnStargateResponseToPool(queryPath string, pb codec.ProtoMarshaler) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transferring this into a comment would also be helpful IMO 🙏

Comment on lines -41 to +44
// The query can be multi-thread, so we have to use
// thread safe sync.Map.
var stargateWhitelist sync.Map
// The query is multi-threaded so we're using a sync.Pool
// to manage the allocation and de-allocation of newly created
// pb objects.
var stargateResponsePools = make(map[string]*sync.Pool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand - is the main reason sync.Pool works and sync.Map doesn't is that the former allocates new objects for concurrent requests?

Copy link
Contributor Author

@testinginprod testinginprod Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically the sync.Map was keeping safe the map, which was not needed since after init the map is readonly.

the value of the map was a pointer to a protobuf object.

So we had a map of Map[K, *V]

and simulate + delivertx (which shared the same string request path) where all editing the same pointer, meaning they were editing the same variable underneath, concurrently (not the map but the value associated with the key in that map).

So what were we using that value for? To unmarshal a stargate query into a protobuf object that then we marshal back as JSON for CosmWasm contracts.

So what this map of sync pool does is that it provides a way to create new objects matching to a specific gRPC query response type (creation=allocation), and when we're done with them we put them it the pool so we can use them again without allocating anymore. sync.Pool takes care of de-allocating them when they're not needed anymore so we do not have to worry.

hope this clarifies the issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is yes, the later requires a pointer and shares the same struct to unmarshal into, whereas this creates a new object for each request. Utilizing sync.Pool allows us to reallocate the object once completed though so it doesn't have a large impact on performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly! We could have used a map of map[string]func() codec.ProtoMarshaler, where string is the request path and func() codec.ProtoMarshaler is a function that returns a newly and freshly created protobuf object to be used as target for response unmarshalling, but this could cause GC over-head in concurrent scenarios as the object is created and needs to immediately be GC'd (eg: during a lot of concurrent sims).

So sync.Pool simply allows us to recycle unused objects instead of immediately forcing the GC to de-allocate them immediately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testinginprod's explanation is much more thorough, thanks.

Comment on lines -194 to +205
protoResponseType, ok := protoResponseAny.(codec.ProtoMarshaler)
protoMarshaler, ok := protoResponseAny.Get().(codec.ProtoMarshaler)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether this cast is the primary source of the issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unreleated: #7346 (comment)

@czarcas7ic czarcas7ic merged commit 2caa5c6 into osmosis-labs:main Jan 20, 2024
1 check passed
mergify bot pushed a commit that referenced this pull request Jan 20, 2024
… protobuf objects (#7346)

* use a sync pool when unmarshalling responses of protobuf objects in StargateQueries

* fix uninitted pool

* type assertion and lints

* changelog

* add comment for returnStargateResponseToPool

* add setWhitelistedQuery comment

* lint

---------

Co-authored-by: unknown unknown <unknown@unknown>
Co-authored-by: Adam Tucker <adamleetucker@outlook.com>
(cherry picked from commit 2caa5c6)

# Conflicts:
#	CHANGELOG.md
mergify bot pushed a commit that referenced this pull request Jan 20, 2024
… protobuf objects (#7346)

* use a sync pool when unmarshalling responses of protobuf objects in StargateQueries

* fix uninitted pool

* type assertion and lints

* changelog

* add comment for returnStargateResponseToPool

* add setWhitelistedQuery comment

* lint

---------

Co-authored-by: unknown unknown <unknown@unknown>
Co-authored-by: Adam Tucker <adamleetucker@outlook.com>
(cherry picked from commit 2caa5c6)

# Conflicts:
#	CHANGELOG.md
@czarcas7ic czarcas7ic removed the A:backport/v21.x backport patches to v21.x branch label Jan 20, 2024
czarcas7ic added a commit that referenced this pull request Jan 20, 2024
… protobuf objects (backport #7346) (#7349)

* fix(StargateQueries): use a sync pool when unmarshalling responses of protobuf objects  (#7346)

* use a sync pool when unmarshalling responses of protobuf objects in StargateQueries

* fix uninitted pool

* type assertion and lints

* changelog

* add comment for returnStargateResponseToPool

* add setWhitelistedQuery comment

* lint

---------

Co-authored-by: unknown unknown <unknown@unknown>
Co-authored-by: Adam Tucker <adamleetucker@outlook.com>
(cherry picked from commit 2caa5c6)

# Conflicts:
#	CHANGELOG.md

* changelog

---------

Co-authored-by: testinginprod <98415576+testinginprod@users.noreply.github.com>
Co-authored-by: Adam Tucker <adamleetucker@outlook.com>
Copy link
Contributor

@nicolaslara nicolaslara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix looks really good (and better than the naive solution of copying and creating a new allocation on each use). Great find too! GG @czarcas7ic and @testinginprod

@github-actions github-actions bot mentioned this pull request Mar 15, 2024
@mattverse mattverse added A:backport/v19.x backport patches to v19.x branch A:backport/v20.x backport patches to v20.x branch A:backport/v21.x backport patches to v21.x branch labels Apr 29, 2024
mergify bot pushed a commit that referenced this pull request Apr 29, 2024
… protobuf objects (#7346)

* use a sync pool when unmarshalling responses of protobuf objects in StargateQueries

* fix uninitted pool

* type assertion and lints

* changelog

* add comment for returnStargateResponseToPool

* add setWhitelistedQuery comment

* lint

---------

Co-authored-by: unknown unknown <unknown@unknown>
Co-authored-by: Adam Tucker <adamleetucker@outlook.com>
(cherry picked from commit 2caa5c6)

# Conflicts:
#	CHANGELOG.md
@github-actions github-actions bot mentioned this pull request May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A:backport/v19.x backport patches to v19.x branch A:backport/v20.x backport patches to v20.x branch A:backport/v21.x backport patches to v21.x branch A:backport/v22.x backport patches to v22.x branch V:state/compatible/backport State machine compatible PR, should be backported
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants