
[stateless validation]: Improve network connection performance on burst traffic #68

Closed · 3 tasks · Tracked by #46
saketh-are opened this issue Apr 17, 2024 · 3 comments

@saketh-are

When a chunk validator needs to distribute a large state witness, an otherwise idle network connection may suddenly need to transmit a significant amount of data. Testing shows that our existing network stack does not handle this situation well.

We explored and benchmarked multiple options for addressing this issue, including tuning existing TCP-based connections and switching to QUIC or UDP.

Based on the investigation, TCP tuning along with breaking the state witness into parts should deliver the performance we need in the least intrusive manner. For our p2p TCP connections, we need to:

  • Set the appropriate socket options from nearcore where possible (SO_SNDBUF, SO_RCVBUF); see the sketch after this list
  • Convince node operators to set system-wide options as needed (rmem_max, wmem_max, tcp_rmem, tcp_wmem, tcp_slow_start_after_idle)
  • See how we can drive TIER1 adoption to completion so that communication can happen over direct connections
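
As a concrete illustration of the first bullet, here is a minimal sketch (not nearcore's actual networking code) of setting the per-socket buffers from Rust using the socket2 crate. The 2 MB value mirrors the SO_SNDBUF/SO_RCVBUF figure mentioned later in this thread, and the effective sizes remain capped by the system-wide rmem_max/wmem_max tunables from the second bullet.

```rust
// Minimal sketch of opening an outbound p2p TCP connection with enlarged
// per-socket buffers, using the `socket2` crate. Not nearcore's actual code.
use socket2::{Domain, Protocol, Socket, Type};
use std::net::{SocketAddr, TcpStream};

fn connect_with_buffers(addr: SocketAddr, buf_bytes: usize) -> std::io::Result<TcpStream> {
    let socket = Socket::new(Domain::for_address(addr), Type::STREAM, Some(Protocol::TCP))?;
    // SO_SNDBUF / SO_RCVBUF: request larger per-connection buffers before
    // connecting. The kernel caps the effective values based on
    // net.core.wmem_max / net.core.rmem_max, hence the system tunables above.
    socket.set_send_buffer_size(buf_bytes)?;
    socket.set_recv_buffer_size(buf_bytes)?;
    socket.connect(&addr.into())?;
    Ok(socket.into())
}
```
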
@walnut-the-cat

walnut-the-cat commented Apr 18, 2024

@saketh-are 's note (link)

Summarizing the reasoning behind the estimated state witness size we can support:

  • Suppose N = 50 chunk validators will participate in distributing the state witness
  • We will break the state witness into N parts of equal size, each of which will be sent to one CV to forward to all others
  • Erasure coding adds 50% overhead to part sizes
  • The 2-hop distribution strategy with 50% redundancy performs acceptably in mainnet for chunk part delivery
    With the modifications shared above, our P2P connections can handle >2 MB bursts of traffic on each connection and deliver them at ping latency. Assume for now that the part size is 2 MB
  • 50 participants * 2 MB parts * 2/3 (removing the erasure coding overhead) = ~66 MB of data distributed at ping latency

Overall each participant uploads exactly the state witness size + 50%.
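
As a rough sanity check of the arithmetic above, a small sketch (the N = 50, 2 MB part size, and 1.5x erasure coding factor are the figures assumed in the list):

```rust
// Back-of-the-envelope check of the sizing argument above.
fn main() {
    let n_validators = 50.0_f64;  // N chunk validators
    let part_size_mb = 2.0;       // burst size each connection handles at ping latency
    let erasure_factor = 1.5;     // erasure coding adds 50%

    // N encoded parts of 2 MB each correspond to an original witness of at most:
    let max_witness_mb = n_validators * part_size_mb / erasure_factor;
    println!("max state witness: ~{max_witness_mb:.1} MB"); // ~66.7 MB

    // Each participant forwards its 2 MB part to the other N-1 validators,
    // i.e. it uploads roughly the witness size plus the 50% overhead.
    let upload_per_participant_mb = part_size_mb * (n_validators - 1.0);
    println!("upload per participant: ~{upload_per_participant_mb:.0} MB"); // ~98 MB
}
```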

We need to differentiate committed rate vs burstable rate here.

Taking an 8 MB state witness as an example, each node will want to upload 12 MB as quickly as possible. At a 1 Gbps burst rate that adds 12 MB / 1 Gbps ≈ 0.1 s of latency from the connection speed. A 100 Mbps peak rate would be unusable (+~1 s of latency).
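
For reference, those latency figures can be reproduced as follows (a sketch; rates are treated as decimal megabits per second):

```rust
// Added latency from uploading the erasure-coded witness at a given burst rate.
fn upload_seconds(upload_mb: f64, rate_mbps: f64) -> f64 {
    upload_mb * 8.0 / rate_mbps // megabytes -> megabits, then divide by Mb/s
}

fn main() {
    let upload_mb = 8.0 * 1.5; // 8 MB witness + 50% erasure overhead = 12 MB
    println!("{:.2} s at 1 Gbps", upload_seconds(upload_mb, 1000.0)); // ~0.10 s
    println!("{:.2} s at 100 Mbps", upload_seconds(upload_mb, 100.0)); // ~0.96 s
}
```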

I think what will happen is that configuring a low committed rate and a high burstable rate will make the most sense economically for chunk validators.

Worth noting that this bandwidth discussion is fundamentally a flow problem. The only overhead coming from our distribution strategy is the constant factor +50%. If we want to support slower connections we need to reduce the state witness size.

Most ISPs bill based on five-minute samples and the 95th percentile of usage. This allows usage to exceed a specified threshold for brief periods of time without the financial penalty of purchasing a higher committed information rate.

The funny thing is that, assuming a typical state witness size of 4 MB and a burst capacity of at least 1 Gbps, each chunk validator should spend (4 MB * 1.5) / (1 Gbps) / (1.3 seconds) = 3.69% of their time uploading the state witness, which is less than 5%. That means that, assuming a reasonable burstable rate, the state witness is irrelevant when it comes to deciding the committed rate on the connection.
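
The duty-cycle estimate works out the same way (a sketch; the 1.3 s divisor is taken from the calculation above and is presumably the block interval):

```rust
// Fraction of time a chunk validator spends uploading the state witness.
fn main() {
    let witness_mb = 4.0;                            // typical state witness size
    let upload_s = witness_mb * 1.5 * 8.0 / 1000.0;  // +50% overhead, at 1 Gbps: 0.048 s
    let duty_cycle = upload_s / 1.3;                 // per 1.3 s interval
    println!("{:.2}% of time uploading", duty_cycle * 100.0); // ~3.69%
}
```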

@saketh-are (Author)

Current status: we have implemented the TCP configuration changes described in this issue in forknet, where we own the instances and can freely set the system tunables. There we are using SO_SNDBUF = SO_RCVBUF = 2000000.

After we also deploy #67 there, we should observe the impact on the performance of state witness distribution and tune the TCP buffer sizes appropriately. Although setting larger buffers improves the network layer's performance, it also increases the RAM usage of the node.

Once we decide which buffer sizes to configure, we will need to communicate to node operators the need to set the system tunables described in this issue. Since we are able to detect from within neard whether the node is configured appropriately, we can take that opportunity to print an error or warning containing details of how to set the tunables.
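
To illustrate that last point, here is a minimal sketch of what such a startup check might look like; this is not neard's actual code, and the 2 MB threshold and warning wording are assumptions based on the values discussed in this issue.

```rust
// Sketch of a startup check: read the kernel tunables from /proc/sys and warn
// the operator if they look too small for the buffer sizes we want.
use std::fs;

const WANTED_BUF_BYTES: u64 = 2_000_000; // matches SO_SNDBUF = SO_RCVBUF = 2000000

fn read_u64(path: &str) -> Option<u64> {
    fs::read_to_string(path).ok()?.trim().parse().ok()
}

fn check_tcp_tunables() {
    for path in ["/proc/sys/net/core/rmem_max", "/proc/sys/net/core/wmem_max"] {
        match read_u64(path) {
            Some(v) if v >= WANTED_BUF_BYTES => {}
            Some(v) => eprintln!(
                "WARN: {path} = {v}; consider raising it to at least {WANTED_BUF_BYTES} \
                 (e.g. via sysctl) so that larger TCP buffers can take effect"
            ),
            None => eprintln!("WARN: could not read {path}; skipping TCP tunable check"),
        }
    }
    // tcp_rmem, tcp_wmem and tcp_slow_start_after_idle could be checked similarly.
}
```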

@walnut-the-cat

@staffik shared analysis showing that TCP optimization improves performance in ForkNet with MainNet traffic: link. We need to confirm whether this is 'enough' improvement, along with @shreyan-gupta's work.
