go/p2p: Improve seed node performance #5456
Conversation
Force-pushed from 2735dae to cbf6506.
@@ -133,6 +133,13 @@ func (c *client) Advertise(ctx context.Context, ns string, _ ...discovery.Option
	pf.RecordSuccess()

	// Close connections after every call because requests to the seed node are infrequent.
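For context, a minimal sketch of the client-side idea, assuming a libp2p host.Host `h` and the seed node's peer.ID `seedID` (illustrative names and wiring, not the exact PR code):

```go
package main

import (
	"log"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/peer"
)

// closeSeedConn drops all connections to the seed node once a request has
// completed, freeing connection slots on both sides; the next request simply
// dials the seed node again.
func closeSeedConn(h host.Host, seedID peer.ID) {
	if err := h.Network().ClosePeer(seedID); err != nil {
		// Closing is best effort; a failure here is not fatal for the client.
		log.Printf("failed to close connection to seed node: %v", err)
	}
}
```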
We should also do the same on the server side to handle clients that don't close connections.
Agree, but it's a bit uglier because currently the server only handles streams. And I'm not sure whether the caller will receive the data if the connection is closed immediately after the stream.
@kostko Tested locally. If the connection is closed right after the stream, the caller may receive an error indicating that the stream was reset. Therefore, if the server intends to close connections, it should wait a few seconds before doing so.
Connection handling is the responsibility of the connection manager. To speed up connection closing, we can set the PeerGracePeriod configuration for seed nodes to 2 seconds (the default is 20 seconds), for instance. Optionally, we might consider introducing default configurations for various roles.
Connection handling is the responsibility of the connection manager. To speed up connection closing, we can set the PeerGracePeriod configuration for seed nodes to 2 seconds (default 20 seconds), for instance.
Sounds like the cleanest solution to me. But do test that this actually causes the connection to be closed if idle after 2 sec.
No, the connection is not closed after 2 seconds. It merely falls out of the grace period; it can only be closed once the number of connections reaches 130+ (at or above the high watermark).
If we want to keep the number of connections low, we would need to set the low watermark to 1 and the high watermark to 2. That way, all connections but one will be closed after 2 seconds. Currently, we cannot set this in the config because the difference between the low and high watermarks is hard-coded (30), so changes to the config are needed.
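A rough sketch of what such a 1/2 watermark setup with a 2-second grace period could look like using go-libp2p's connection manager (illustrative only; as noted above, exposing the watermarks through the config is what is currently missing):

```go
package main

import (
	"time"

	"github.com/libp2p/go-libp2p/p2p/net/connmgr"
)

// newSeedConnManager keeps almost no idle connections on the seed node:
// whenever more than 2 connections are open, everything outside the 2-second
// grace period is trimmed back down to a single connection.
func newSeedConnManager() (*connmgr.BasicConnMgr, error) {
	return connmgr.NewConnManager(
		1, // low watermark: trim down to one connection
		2, // high watermark: trimming kicks in once this is exceeded
		connmgr.WithGracePeriod(2*time.Second), // corresponds to PeerGracePeriod
	)
}
```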
Ok, that's good as well (e.g. that connections are only closed under stress, as long as this results in new connections succeeding).
go/p2p/host.go
Outdated
// Tweak limits for seed nodes.
defaultLimits := rcmgr.DefaultLimits
defaultLimits.SystemBaseLimit.ConnsInbound = rcmgr.DefaultLimits.SystemBaseLimit.ConnsOutbound
defaultLimits.SystemLimitIncrease.ConnsInbound = rcmgr.DefaultLimits.SystemLimitIncrease.ConnsOutbound
Should StreamsInbound also be updated, as it is smaller compared to StreamsOutbound?
go/p2p/host.go
Outdated
// Tweak limits for seed nodes.
defaultLimits := rcmgr.DefaultLimits
defaultLimits.SystemBaseLimit.ConnsInbound = rcmgr.DefaultLimits.SystemBaseLimit.ConnsOutbound
Maybe add a comment about what these values are (e.g. 64 -> 128). Imo, it might actually be nicer to specify values here (with some comments) instead of reusing variables from the default limits.
Btw do we expect these to be enough? On mainnet there's a lot more nodes (vs testnet) that will be trying to connect to the seed node. I guess it should be fine assuming the connections will get closed quickly and clients will be retrying.
Yes, they should be enough. As stated in the description, the connection manager will trim connections if the resource manager's limits are larger than the high watermark.
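As an illustration of the suggestion above, a hedged sketch that spells the values out explicitly; the literals assume the go-libp2p defaults referenced in this thread (64 inbound vs. 128 outbound), and this is not necessarily the code that was ultimately committed:

```go
package main

import (
	rcmgr "github.com/libp2p/go-libp2p/p2p/host/resource-manager"
)

// seedNodeLimits returns resource-manager limits for seed nodes with the
// inbound connection values written out explicitly instead of reusing the
// outbound defaults.
func seedNodeLimits() rcmgr.ConcreteLimitConfig {
	limits := rcmgr.DefaultLimits
	// Default inbound base limit is 64; allow 128, matching the outbound default.
	limits.SystemBaseLimit.ConnsInbound = 128
	// Default inbound increase is 64 per GiB of memory; use 128, matching outbound.
	limits.SystemLimitIncrease.ConnsInbound = 128
	// Scale the limits with the available memory and file descriptors.
	return limits.AutoScale()
}
```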
Force-pushed from 8805ca3 to 7cf5660.
Codecov Report

@@            Coverage Diff             @@
##           master    #5456      +/-   ##
==========================================
- Coverage   66.84%   66.51%   -0.34%
==========================================
  Files         533      533
  Lines       56390    56422      +32
==========================================
- Hits        37694    37527     -167
- Misses      14289    14490     +201
+ Partials     4407     4405       -2

View full report in Codecov by Sentry.
The bootstrap client, which is responsible for peer discovery and advertisement, now terminates the connection to the seed node after every request. This should free up resources (e.g. inbound/outbound connections) on both sides without affecting performance, since discovered peers are cached (see retention period) and advertisement is done infrequently (see TTL).
Force-pushed from 7cf5660 to ecb0be7.
Some nodes cannot bootstrap because the seed node has too many open connections.
The problem lies in the default settings of the resource manager, which are set too low. Consequently, the connection manager never trims connections down to the low watermark (set at 100 by default) because the high watermark (set at 130 by default) is never reached. Increasing the number of inbound connections for seed nodes to 128+ (extra connections come from autoscaling) should solve the problem.
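For illustration, a rough sketch of how the resource manager limits and the connection manager watermarks interact on a seed node (assumed names and the default values mentioned above; the actual wiring lives in go/p2p/host.go): the resource manager has to admit more inbound connections than the high watermark, otherwise trimming never starts.

```go
package main

import (
	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/core/host"
	rcmgr "github.com/libp2p/go-libp2p/p2p/host/resource-manager"
	"github.com/libp2p/go-libp2p/p2p/net/connmgr"
)

// newSeedNodeHost raises the resource manager's inbound connection limits so
// that the connection manager's high watermark (130) can actually be reached,
// at which point idle connections are trimmed back to the low watermark (100).
func newSeedNodeHost() (host.Host, error) {
	// Allow as many inbound connections as outbound ones (128 base, plus 128
	// per GiB of memory through autoscaling).
	limits := rcmgr.DefaultLimits
	limits.SystemBaseLimit.ConnsInbound = rcmgr.DefaultLimits.SystemBaseLimit.ConnsOutbound
	limits.SystemLimitIncrease.ConnsInbound = rcmgr.DefaultLimits.SystemLimitIncrease.ConnsOutbound

	rm, err := rcmgr.NewResourceManager(rcmgr.NewFixedLimiter(limits.AutoScale()))
	if err != nil {
		return nil, err
	}

	// Watermarks as described above: once 130 connections are exceeded,
	// trim down to 100.
	cm, err := connmgr.NewConnManager(100, 130)
	if err != nil {
		return nil, err
	}

	return libp2p.New(
		libp2p.ResourceManager(rm),
		libp2p.ConnectionManager(cm),
	)
}
```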