Optimizing (*sentPacketHistory).FirstOutstanding #3462
Comments
We call Generating a packet number happens for every single packet, no matter the loss rate. Am I right to assume that in low loss conditions, we do more iterations of the loop? Maybe the solution here is to have two separate lists, one that has all packets The only problem is that you'd have to move packets between the two lists whenever you want to declare a packet lost, and that's a |
I think you mean in high loss conditions? |
Can you elaborate on why it would be Also, I'm not sure how to implement
or must it behave as it's still one list? |
Yes, that's what I meant.
Correct. Although the
Order does matter, since it allows the caller to abort early. One way to implement this is to have a |
Might be a dumb question but what would happen if we simply remove all I tried to look around the code but didn't find anything obvious. |
My understanding is that, since ACKed packets are immediately removed from the list, but lost packets are only cleared after |
And if that's the case,
The N would still be pretty large, right? |
Good question. What about this scenario: We declare a packet lost, then we receive an ACK for it later. If we haven't had the chance to retransmit the frames, we don't need to do that any more. We can also use it to generate a RTT sample (this is very valuable in this situation).
Not if you traverse the list from the end. You'll only encounter skipped and MTU packets. |
I see. I'll work on an implementation using 2 lists. |
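For readers following the discussion, here is a minimal sketch of the two-list idea. It is not the implementation from this thread or from quic-go: the names `twoListHistory`, `trackedPacket`, `declareLost`, and `iterate` are hypothetical, and the packet type is reduced to a packet number. Outstanding and lost packets live in separate lists, so the first outstanding packet is simply the front of one list. In-order iteration with early abort is recovered by merging the two lists by packet number.

```go
package main

import "container/list"

// trackedPacket is a hypothetical, heavily simplified packet record.
type trackedPacket struct{ number uint64 }

// entry remembers which list a packet currently lives in.
type entry struct {
	el   *list.Element
	lost bool
}

// twoListHistory keeps outstanding and lost packets in separate lists,
// both sorted by packet number.
type twoListHistory struct {
	outstanding *list.List
	lost        *list.List
	index       map[uint64]entry
}

func newTwoListHistory() *twoListHistory {
	return &twoListHistory{outstanding: list.New(), lost: list.New(), index: map[uint64]entry{}}
}

// sent records a new packet; packet numbers are assumed to be increasing.
func (h *twoListHistory) sent(pn uint64) {
	h.index[pn] = entry{el: h.outstanding.PushBack(&trackedPacket{number: pn})}
}

// declareLost moves a packet from the outstanding list to the lost list,
// keeping the lost list sorted by walking backwards from its tail.
func (h *twoListHistory) declareLost(pn uint64) {
	e, ok := h.index[pn]
	if !ok || e.lost {
		return
	}
	p := h.outstanding.Remove(e.el).(*trackedPacket)
	at := h.lost.Back()
	for at != nil && at.Value.(*trackedPacket).number > pn {
		at = at.Prev()
	}
	var el *list.Element
	if at == nil {
		el = h.lost.PushFront(p)
	} else {
		el = h.lost.InsertAfter(p, at)
	}
	h.index[pn] = entry{el: el, lost: true}
}

// acked drops a packet from whichever list currently holds it.
func (h *twoListHistory) acked(pn uint64) {
	e, ok := h.index[pn]
	if !ok {
		return
	}
	if e.lost {
		h.lost.Remove(e.el)
	} else {
		h.outstanding.Remove(e.el)
	}
	delete(h.index, pn)
}

// firstOutstanding is O(1): it is the front of the outstanding list.
func (h *twoListHistory) firstOutstanding() *trackedPacket {
	if el := h.outstanding.Front(); el != nil {
		return el.Value.(*trackedPacket)
	}
	return nil
}

// iterate visits all packets in packet number order by merging both lists.
// The callback returns false to abort early and must not modify the history.
func (h *twoListHistory) iterate(cb func(p *trackedPacket, lost bool) bool) {
	o, l := h.outstanding.Front(), h.lost.Front()
	for o != nil || l != nil {
		takeLost := o == nil ||
			(l != nil && l.Value.(*trackedPacket).number < o.Value.(*trackedPacket).number)
		if takeLost {
			if !cb(l.Value.(*trackedPacket), true) {
				return
			}
			l = l.Next()
		} else {
			if !cb(o.Value.(*trackedPacket), false) {
				return
			}
			o = o.Next()
		}
	}
}
```

The trade-off in this sketch is that declaring a packet lost is no longer a flag flip: it costs a list removal plus a sorted insert, which is cheap when losses are declared in roughly increasing packet number order.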
Hi @marten-seemann , here's the pprof result after this optimization. |
Why does it require generics? |
I was hoping that we'd be able to find a well-tested, type-safe generic tree implementation. |
Could this be useful? https://github.com/tidwall/btree |
I don't think there's any library that is readily available. Even if they support custom comparators, they only support comparisons between the same types - which doesn't fit our "find gap by offset" use case. |
Hi @marten-seemann,
quic-go currently spends a lot of CPU time on (*sentPacketHistory).FirstOutstanding when transmitting at high speed over a high packet loss connection. (Of course, this usually can't happen at all, as the built-in CUBIC congestion control slows the connection down to a crawl when the packet loss rate is high. I have replaced it with a custom constant-speed CC.)

The following call graph was generated by a transmission test between 2 servers ($5 AWS Lightsail) at 200 Mbps over a simulated 30 ms, 4% packet loss link.

FirstOutstanding was taking up nearly 30% of the runtime and was a major CPU bottleneck. The program occupied an entire CPU core and couldn't get faster than a mere 100 Mbps because of it. It gets worse as I crank up the loss rate. Switching to servers with better CPU performance does improve the situation: I was able to hit ~700 Mbps between two Ryzen 7 machines over the same link.

Do you have any thoughts on what could be done to optimize this part of the code? Perhaps we can add some kind of cache, or pre-calculate the first outstanding packet every time the linked list is modified, instead of iterating it every call?
pprof sample attached:
pprof.hy_linux_amd64.samples.cpu.002.pb.gz
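To make the "pre-calculate the first outstanding packet" suggestion concrete, here is a minimal sketch under stated assumptions. It does not mirror quic-go's actual `sentPacketHistory`: the `cachedHistory` and `histPacket` types and their methods are hypothetical, and a packet is reduced to a number and a `declaredLost` flag. The idea is to keep a cached pointer to the first packet that has not been declared lost and to move it forward only when that packet is removed or declared lost.

```go
package main

import "container/list"

// histPacket is a hypothetical stand-in for the real packet type.
type histPacket struct {
	number       uint64
	declaredLost bool
}

// cachedHistory stores packets in send order and caches the first
// packet that has not been declared lost.
type cachedHistory struct {
	packets          *list.List               // all tracked packets, oldest first
	byNumber         map[uint64]*list.Element // lookup by packet number
	firstOutstanding *list.Element            // cached; nil if there is none
}

func newCachedHistory() *cachedHistory {
	return &cachedHistory{packets: list.New(), byNumber: map[uint64]*list.Element{}}
}

// sent appends a packet. It becomes the cached element only if
// nothing else is currently outstanding.
func (h *cachedHistory) sent(pn uint64) {
	el := h.packets.PushBack(&histPacket{number: pn})
	h.byNumber[pn] = el
	if h.firstOutstanding == nil {
		h.firstOutstanding = el
	}
}

// advance moves the cache forward past packets already declared lost.
func (h *cachedHistory) advance() {
	el := h.firstOutstanding
	for el != nil && el.Value.(*histPacket).declaredLost {
		el = el.Next()
	}
	h.firstOutstanding = el
}

// declareLost flags a packet as lost and repairs the cache if needed.
func (h *cachedHistory) declareLost(pn uint64) {
	el, ok := h.byNumber[pn]
	if !ok {
		return
	}
	el.Value.(*histPacket).declaredLost = true
	if el == h.firstOutstanding {
		h.advance()
	}
}

// remove deletes a packet (for example after an ACK) and repairs the cache.
func (h *cachedHistory) remove(pn uint64) {
	el, ok := h.byNumber[pn]
	if !ok {
		return
	}
	if el == h.firstOutstanding {
		// Read Next() before Remove, which clears the element's links.
		h.firstOutstanding = el.Next()
		h.advance()
	}
	h.packets.Remove(el)
	delete(h.byNumber, pn)
}

// FirstOutstanding returns the cached packet without scanning the list.
func (h *cachedHistory) FirstOutstanding() *histPacket {
	if h.firstOutstanding == nil {
		return nil
	}
	return h.firstOutstanding.Value.(*histPacket)
}
```

Because the cached pointer only ever moves toward the tail of the list, the scanning work in `advance` is amortized over all packets instead of being repeated on every `FirstOutstanding` call.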