Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/p2p: Improve seed node performance #5456

Merged
merged 2 commits into from
Nov 21, 2023

Conversation

peternose
Copy link
Contributor

@peternose peternose commented Nov 20, 2023

Some nodes cannot bootstrap because the seed node has too many open connections.

{"attempted":1,"caller":"scope.go:584","current":77,"direction":"Inbound","edge":"system","error":"system: cannot reserve inbound connection: resource limit exceeded","level":"debug","limit":77,"module":"libp2p:rcmgr","msg":"blocked connection from constraining edge","scope":"conn-9516","stat":{"NumStreamsInbound":0,"NumStreamsOutbound":0,"NumConnsInbound":77,"NumConnsOutbound":0,"NumFD":77,"Memory":0},"ts":"2023-11-16T21:09:27.902582862Z","usefd":true}
{"caller":"listener.go:99","error":"conn-9516: system: cannot reserve inbound connection: resource limit exceeded","level":"debug","module":"libp2p:upgrader","msg":"resource manager blocked accept of new connection","ts":"2023-11-16T21:09:27.902778381Z"}

The problem lies in the default settings of the resource manager, which are set too low. Consequently, the connection manager never trims connections down to the low watermark (set at 100 by default) because the high watermark (set at 130 by default) is never attained. Increasing the number of inbound connections for seed nodes to 128+ (extra connections come from autoscaling), should solve the problem.

@peternose peternose force-pushed the peternose/bugfix/improve-seed-node-performance branch from 2735dae to cbf6506 Compare November 20, 2023 11:21
@@ -133,6 +133,13 @@ func (c *client) Advertise(ctx context.Context, ns string, _ ...discovery.Option

pf.RecordSuccess()

// Close connections after every call because requests to the seed node are infrequent.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also do the same on the server side to handle clients that don't close connections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, but its a bit uglier because currently the server only handles streams. And I'm not sure if the caller will receive the data if the connection is closed immediately after the stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kostko Tested locally. If connection is closed after the stream, the caller may receive an error indicating that the stream was reset. Therefore, if the server intends to close connections, it should wait a few seconds before doing so.

Connection handling is the responsibility of the connection manager. To speed up connection closing, we can set the PeerGracePeriod configuration for seed nodes to 2 seconds (default 20 seconds), for instance. Optionally, we might consider introducing default configurations for various roles.

Copy link
Member

@kostko kostko Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connection handling is the responsibility of the connection manager. To speed up connection closing, we can set the PeerGracePeriod configuration for seed nodes to 2 seconds (default 20 seconds), for instance.

Sounds like the cleanest solution to me. But do test that this actually causes the connection to be closed if idle after 2 sec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, connection is not closed after 2 secs. It is just out of the grace period and if the number of connections is 130+ (equal or above high watermark) it can be closed.

If we want to keep the number of connections low, we would need to set low watermark to 1 and high to 2. This way, all connections but 1 will be closed after 2 seconds. Currently, we cannot set this in the config as we hard coded difference between low and high (30), so changes to the config are needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's good as well (e.g. that connections are only closed under stress, as long as this results in new connections succeeding).

go/p2p/host.go Outdated
// Tweak limits for seed nodes.
defaultLimits := rcmgr.DefaultLimits
defaultLimits.SystemBaseLimit.ConnsInbound = rcmgr.DefaultLimits.SystemBaseLimit.ConnsOutbound
defaultLimits.SystemLimitIncrease.ConnsInbound = rcmgr.DefaultLimits.SystemLimitIncrease.ConnsOutbound
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should StreamsInbound also be updated, as it is smaller compared to StreamsOutbound?

go/p2p/host.go Outdated

// Tweak limits for seed nodes.
defaultLimits := rcmgr.DefaultLimits
defaultLimits.SystemBaseLimit.ConnsInbound = rcmgr.DefaultLimits.SystemBaseLimit.ConnsOutbound
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment about what these values are (e.g. 64 -> 128). Imo, it might actually be nicer to specify values here (with some comments) instead of reusing variables from the default limits.

Btw do we expect these to be enough? On mainnet there's a lot more nodes (vs testnet) that will be trying to connect to the seed node. I guess it should be fine assuming the connections will get closed quickly and clients will be retrying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they should be enough. As stated in the description, the connection manager will trim connections if resource manager's limits are larger than the high watermark.

@peternose peternose force-pushed the peternose/bugfix/improve-seed-node-performance branch 3 times, most recently from 8805ca3 to 7cf5660 Compare November 20, 2023 12:09
Copy link

codecov bot commented Nov 20, 2023

Codecov Report

Attention: 12 lines in your changes are missing coverage. Please review.

Comparison is base (0039ebf) 66.84% compared to head (ecb0be7) 66.51%.
Report is 1 commits behind head on master.

Files Patch % Lines
go/p2p/discovery/bootstrap/client.go 0.00% 6 Missing and 2 partials ⚠️
go/p2p/host.go 87.50% 1 Missing and 1 partial ⚠️
go/p2p/rpc/nop.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5456      +/-   ##
==========================================
- Coverage   66.84%   66.51%   -0.34%     
==========================================
  Files         533      533              
  Lines       56390    56422      +32     
==========================================
- Hits        37694    37527     -167     
- Misses      14289    14490     +201     
+ Partials     4407     4405       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@peternose peternose marked this pull request as ready for review November 20, 2023 13:35
Bootstrap client, which is responsible for peer discovery and advertisement,
now terminates connection to the seed node after every request. This action
should free up recourses (e.g. inbound/outbound connections) on both sides
without affecting performance since discovered peers are cached (see retention
period) and advertisement is done infrequently (see TTL).
@peternose peternose force-pushed the peternose/bugfix/improve-seed-node-performance branch from 7cf5660 to ecb0be7 Compare November 21, 2023 09:10
@peternose peternose merged commit adf3314 into master Nov 21, 2023
4 of 6 checks passed
@peternose peternose deleted the peternose/bugfix/improve-seed-node-performance branch November 21, 2023 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants