Tracking issue: Bad RTT during connection startup #2224
Labels
c-iroh-net
metrics
extracting quantified measurements from iroh
perf
performance related issues
tracking
an overview issue that tracks completion of a project
At the beginning of a connection we have observed that it takes a while before the round-trip time (RTT) becomes reasonable and stable. We need to get better at this:
Connections initially have a bad RTT, take too long to settle down. #2176 is the main issue in which we've been exploring how to improve this.
connections keep switching away from direct connection and switching back #2169 is closely related: it results in too many connection type changes.
We should have metrics that show the RTT of a long-lived connection. Because of how metric collection works, these would not reveal much about the problem at the start of a connection, but they would still be worthwhile to expose.
We should be able to properly quantify this, ideally integrated into the CI perf tests so that we do not regress. Maybe add an option to keep a 30s window of RTT values, aggregated into small (e.g. 200ms) time windows, for connections in a magic socket, and expose it via the doctor and/or the perf binary. We could also make this a sliding window of the last 30s, but that is likely too expensive to keep for every connection in a production setup.
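A minimal sketch of what such a bucketed RTT window could look like, to make the cost concrete: 30s at 200ms per bucket is at most 150 small entries per connection. Everything here is hypothetical (`RttWindow`, `RttBucket`, `record`, `snapshot` are not existing iroh APIs), and it assumes samples arrive with a non-decreasing elapsed time and that the non-sliding variant covers the first 30s of the connection.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Aggregate of all RTT samples that fell into one time bucket (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RttBucket {
    pub min: Duration,
    pub max: Duration,
    sum: Duration,
    count: u32,
}

impl RttBucket {
    fn new(rtt: Duration) -> Self {
        Self { min: rtt, max: rtt, sum: rtt, count: 1 }
    }

    fn add(&mut self, rtt: Duration) {
        self.min = self.min.min(rtt);
        self.max = self.max.max(rtt);
        self.sum += rtt;
        self.count += 1;
    }

    /// Mean RTT of the samples in this bucket.
    pub fn mean(&self) -> Duration {
        self.sum / self.count
    }
}

/// Keeps RTT samples for a bounded time span, aggregated into fixed-width buckets.
pub struct RttWindow {
    bucket_width: Duration,
    capacity: u64,
    sliding: bool,
    /// (bucket index since connection start, aggregate), oldest first.
    buckets: VecDeque<(u64, RttBucket)>,
}

impl RttWindow {
    /// `sliding = false` keeps only the first `span` of the connection,
    /// `sliding = true` keeps the most recent `span`.
    pub fn new(span: Duration, bucket_width: Duration, sliding: bool) -> Self {
        assert!(!bucket_width.is_zero(), "bucket width must be non-zero");
        let capacity = (span.as_nanos() / bucket_width.as_nanos()) as u64;
        Self { bucket_width, capacity, sliding, buckets: VecDeque::new() }
    }

    /// Records one RTT sample taken `elapsed` after the connection started.
    /// Assumes `elapsed` never decreases between calls.
    pub fn record(&mut self, elapsed: Duration, rtt: Duration) {
        let idx = (elapsed.as_nanos() / self.bucket_width.as_nanos()) as u64;
        if !self.sliding && idx >= self.capacity {
            return; // startup-only mode: ignore everything after the span
        }
        let same_bucket = self.buckets.back().map_or(false, |(last, _)| *last == idx);
        if same_bucket {
            if let Some((_, bucket)) = self.buckets.back_mut() {
                bucket.add(rtt);
            }
        } else {
            self.buckets.push_back((idx, RttBucket::new(rtt)));
        }
        if self.sliding {
            // Evict buckets that have fallen out of the most recent span.
            while let Some(&(first, _)) = self.buckets.front() {
                if idx.saturating_sub(first) >= self.capacity {
                    self.buckets.pop_front();
                } else {
                    break;
                }
            }
        }
    }

    /// Returns (bucket start offset, aggregate) pairs, oldest first.
    pub fn snapshot(&self) -> Vec<(Duration, RttBucket)> {
        self.buckets
            .iter()
            .map(|(idx, bucket)| (self.bucket_width * (*idx as u32), *bucket))
            .collect()
    }
}
```

With `RttWindow::new(Duration::from_secs(30), Duration::from_millis(200), false)` the recorder stops accepting samples after 30s, so its cost is bounded and paid only at startup. The sliding variant does the same bookkeeping for the whole lifetime of the connection, which is the expense the paragraph above is concerned about.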
We need a way of ensuring this does not regress in CI
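One possible shape for such a CI check, as a sketch only: reduce an RTT trace to a single "settle time" and assert that it stays under a budget. The definition used here is an assumption, not something this issue has agreed on: the baseline is the median RTT of the second half of the trace, and the connection counts as settled from the first sample after which no later sample exceeds `tolerance` times that baseline. The function name `settle_time` and its parameters are hypothetical.

```rust
use std::time::Duration;

/// Returns the elapsed time from which the RTT stays within
/// `tolerance * baseline`, or `None` if the trace is empty or never settles.
///
/// `samples` are `(elapsed since connection start, rtt)` pairs, oldest first.
/// The baseline is the median RTT of the second half of the trace, which is
/// assumed (not guaranteed) to represent the steady state.
pub fn settle_time(samples: &[(Duration, Duration)], tolerance: f64) -> Option<Duration> {
    let mut tail: Vec<Duration> = samples[samples.len() / 2..]
        .iter()
        .map(|sample| sample.1)
        .collect();
    if tail.is_empty() {
        return None;
    }
    tail.sort();
    let baseline = tail[tail.len() / 2];
    let limit = baseline.mul_f64(tolerance);

    // Find the last sample that is still above the limit; the connection
    // is considered settled from the sample right after it.
    match samples.iter().rposition(|sample| sample.1 > limit) {
        None => Some(samples[0].0),
        Some(i) if i + 1 < samples.len() => Some(samples[i + 1].0),
        Some(_) => None,
    }
}
```

A perf test could then fail the build when `settle_time(&trace, 1.5)` is `None` or exceeds an agreed budget. Both the tolerance and the budget would need to be picked from real measurements.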