```rust
self.bytes_sharded += bytes;
```

```rust
// Route each row to the right shard(s). send_one is a buffered write
// so this loop does no I/O, no concurrency needed here.
```
That's not true. send_one does a buffered write only while there is space in the buffer. Once that space runs out, it flushes the data to the socket.
With copy, which writes a lot of data, this happens very frequently. This part definitely needs to be parallelized.
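A minimal sketch of the buffering behavior described above, using `std::io::BufWriter` as a stand-in for send_one's internal buffer (the capacity and the `Vec<u8>` "socket" are illustrative, not the PR's actual code): writes are deferred only while they fit, and a write that overflows the buffer triggers a flush to the underlying writer.

```rust
use std::io::{BufWriter, Write};

fn main() {
    // Stand-in for the socket: a Vec<u8> records everything flushed to it.
    let sink: Vec<u8> = Vec::new();
    let mut writer = BufWriter::with_capacity(8, sink);

    // 4 bytes fit in the 8-byte buffer: nothing reaches the sink yet,
    // so this write really is "no I/O".
    writer.write_all(b"1234").unwrap();
    assert_eq!(writer.get_ref().len(), 0);

    // This write does not fit: BufWriter flushes to the underlying
    // writer, i.e. a "buffered write" just did I/O.
    writer.write_all(b"56789abc").unwrap();
    assert!(writer.get_ref().len() > 0);

    writer.flush().unwrap();
    assert_eq!(writer.get_ref().as_slice(), &b"123456789abc"[..]);
    println!("sink holds {} bytes", writer.get_ref().len());
}
```

With a copy-sized stream the buffer overflows on nearly every iteration, so in the steady state the loop is effectively doing a socket write per batch.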
You are right about send_one.
But in the end it is the same send_one in the listener that will block the listener loop when the I/O buffer is full. So by using ParallelConnection we are adding another buffer (the channel) on top of that, and there is a third buffer above it inside copy_data.
Offloading to a task can help avoid blocking the read from the source while writing to the destination, but the computational cost should be the same (higher, in fact, due to the extra machinery).
My bet is that the buffer size is the important part: with a large enough buffer we can overlap reading and writing. Where that buffer is defined is less important.
So, either way, we need to test and benchmark it.