Socket write queue fixes and improvements #4202
Merged
The socket asynchronous write loop, first introduced in #1938, was broken by #2918. That PR attempted to streamline the logic, apparently on the assumption that a strand alone is enough to guarantee correct ordering of async writes; however, this assumption is incorrect. A strand only ensures that handlers do not run concurrently, not that whole async operations complete in the order they were started, so an unfortunate call sequence such as
async_write_1 > async_write_2 > async_write_1_complete > async_write_2_complete ...
is still possible. Here is a brief discussion from the Boost ASIO mailing list that mentions this problem: https://boost-users.boost.narkive.com/MrmFQST2/multiple-async-operations-on-a-socket

To correctly implement a multi-writer async write loop, both a strand and a write queue of some sort are required. Failing to do so can result in data from multiple write requests being interleaved, which is exactly the issue observed during high-load local tests. This could also explain why a drop in connected nodes is observed each time live network activity spikes.

A related improvement is that the newly introduced socket send queue is further split into multiple subqueues, one per traffic type. This allows better prioritization of inter-node communication when network throughput is limited, e.g. ensuring that bootstrap traffic won't preempt voting when the network is experiencing a period of heightened activity. This should eliminate the need for aggressive throttling of the ascending bootstrapper rate, as both client and server should now automatically throttle their rate in response to network back pressure (the socket send queue filling up). This could be extended further to prioritize live voting / voting requests / voting responses / block publishing differently, which would give the network additional resiliency, but this is a TODO.