fix(headless-client/windows): improve Client startup times on Windows #5375
thomaseizinger merged 93 commits into main
…zone into fix/ipc-service-log-filter
… fix/windows-slow-startup
e.g. `#[tracing::instrument]` will now print the time a function spent busy and idle without having to add a timer crate or manually capture `Instant`s
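For contrast, the manual `Instant` capture that this commit message says is no longer needed might look like the sketch below. It uses only the standard library; `timed` and `sign_in_step` are hypothetical names for illustration, not functions from the Firezone code base, and the sleep is a placeholder workload.

```rust
use std::time::{Duration, Instant};

/// Runs `f` and returns its result together with the wall-clock time it took.
/// `timed` is a hypothetical helper, not part of the Firezone code base.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

/// Stand-in for a slow startup step; the sleep is a placeholder workload.
fn sign_in_step() -> u32 {
    std::thread::sleep(Duration::from_millis(20));
    42
}
```

Calling `timed(sign_in_step)` yields the step's return value and its elapsed time. Unlike the span timings described in the commit message, this measures only total wall-clock time and does not separate busy time from idle time.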
… fix/windows-slow-startup
…ug-commands' into fix/windows-slow-startup
thomaseizinger
left a comment
Great stuff! I left some comments, nothing blocking.
Regarding the PR description: I appreciate the thorough benchmarking and write-up. Can we move that to a comment on the PR instead? With our squash-merging policy, the PR description turns into a commit message on main. When bisecting, for example, it is nice to be able to read the commit message as short prose explaining why we ended up making certain changes, without intermediate results. It is a balancing act, obviously :)
    dns_control::deactivate()?;
    impl<'a> Handler<'a> {
        async fn new(server: &mut IpcServer, dns_controller: &'a mut DnsController) -> Result<Self> {
            dns_controller.deactivate()?;
Given this, is the default of may_be_active: true redundant?
No, maybe "active" is the wrong word for it. It's more like a dirty flag in a cache: it means we need to deactivate our control if we should not be in control.
If I split up the I/O benchmarking from the deterministic parts as I wrote in the other thread, I might be able to remove this completely.
There are 3 reasons to deactivate DNS control:
1. The IPC service or Headless Client just started, and we need to recover from a possible Firezone crash or system crash
2. Connlib initiated a disconnect and we need to bail out of this iteration of the IPC service's main loop
3. The GUI initiated a sign-out and we need to stop controlling DNS
Number 1 should always happen. Numbers 2 and 3 can afford to be slow since Firezone is shutting down anyway. So I think this optimization made sense a few days ago when it removed the defensive "deactivate during sign-in" but now it may be unnecessary.
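The "dirty flag" idea described above can be sketched as follows. This is a simplified model under stated assumptions, not the PR's actual code: the `DnsController` here is a hypothetical stand-in, the `deactivations` counter exists only to make the effect observable, and the real `deactivate` returns a `Result` and performs system calls.

```rust
/// Hypothetical stand-in for the real `DnsController`; only the flag logic is modelled.
struct DnsController {
    /// "Dirty" flag: true when system DNS might still be under our control.
    may_be_active: bool,
    /// Counts the expensive system calls so the effect of the flag is observable.
    deactivations: u32,
}

impl DnsController {
    /// Starts dirty, because a previous run may have crashed while in control.
    fn new() -> Self {
        Self {
            may_be_active: true,
            deactivations: 0,
        }
    }

    /// Marks the controller dirty; the real code would configure system DNS here.
    fn activate(&mut self) {
        self.may_be_active = true;
    }

    /// Reverts DNS control, skipping the expensive work when the flag is clean.
    fn deactivate(&mut self) {
        if !self.may_be_active {
            return;
        }
        // The real code would run the slow system commands here.
        self.deactivations += 1;
        self.may_be_active = false;
    }
}
```

With this shape, the first `deactivate()` after startup always does the recovery work (reason 1), while repeated calls without an intervening `activate()` are free.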
> The IPC service or Headless Client just started, and we need to recover from a possible Firezone crash or system crash

Can we move this concern to the `main` of the respective binary?
(Old benchmarks were removed)
All benchmarks rounded to the nearest 0.1 seconds
Benchmark 2-
Ubuntu 20.04 UTM VM:
Windows 11 Parallels VM:
Benchmark 3
Windows 11 Parallels VM:
Why is
Benchmark 4
Windows 11 Parallels VM:
Now this PR is faster again. I must have made some human error in the testing.
Benchmark 5
Windows 11 Parallels VM:
Benchmark 6
Windows 11 Parallels VM:
Benchmark 7
Main with measurement added,
Main with speedups added,
It's about twice as fast on median.
Closes #5026
Closes #5879
On the resource-constrained Windows Server 2022 test VM, the median sign-in time dropped from 5.0 seconds to 2.2 seconds.
Changes
- Use `ipconfig` instead of Powershell to flush DNS faster
- Remove the `Set-DnsClientServerAddress` step from activating DNS control
- Remove the `Remove-NetRoute` powershell cmdlet that seems to do nothing
Benchmark 7
Main with measurement added, `c1c99197e` from #5864
Main with speedups added, `2128329f9` from #5375, this PR
Hypothetical further optimizations
- `netsh` subprocess in `set_ips`
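
As a rough illustration of flushing DNS through `ipconfig` rather than a PowerShell session, a minimal sketch is shown below. `flush_dns_command` and `flush_dns` are hypothetical names, and the PR's actual implementation may differ; the only assumption about Windows is the standard `ipconfig /flushdns` invocation.

```rust
use std::process::Command;

/// Builds (but does not run) the command that flushes the Windows DNS resolver cache.
/// Hypothetical helper; kept separate so the command line can be inspected.
fn flush_dns_command() -> Command {
    let mut cmd = Command::new("ipconfig");
    cmd.arg("/flushdns");
    cmd
}

/// Runs the flush and maps a non-zero exit status to an error.
/// Only meaningful on Windows, where `ipconfig` is available.
fn flush_dns() -> std::io::Result<()> {
    let status = flush_dns_command().status()?;
    if status.success() {
        Ok(())
    } else {
        Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("`ipconfig /flushdns` exited with {status}"),
        ))
    }
}
```

Spawning a small native executable directly avoids paying PowerShell's startup cost for a single command, which is presumably why this change helps on a resource-constrained VM.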