fix(connlib): discard timer once it fired #7288

Merged
thomaseizinger merged 4 commits into main from fix/reset-timer-no-timeout on Nov 8, 2024
@thomaseizinger
Member

Within connlib, we have many nested state machines. Many of them have internal timers in the form of timestamps that indicate when they'd like to be "woken" to perform time-related processing. For example, the Allocation state machine indicates, with a timestamp 5 minutes after an allocation is created, that it needs to be woken again in order to send the refresh message to the relay.

When we reset our network connections, we pretty much discard all state within connlib and, together with that, all of these timers. Thus, the poll_timeout function returns None, indicating that our state machines are not waiting for anything.
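As a rough illustration of the pattern described above, here is a minimal sketch of a state machine that exposes its timer as a timestamp. The struct layout, `handle_timeout`, `refreshes_sent` and the rescheduling after a refresh are assumptions made for the example and are not taken from connlib's actual code; only the `poll_timeout` name and the 5-minute interval come from the description.

```rust
use std::time::{Duration, Instant};

/// Interval after which an allocation wants to be refreshed (5 minutes, as described above).
const REFRESH_INTERVAL: Duration = Duration::from_secs(5 * 60);

/// Hypothetical stand-in for a state machine such as `Allocation`.
struct Allocation {
    /// When the next refresh is due; `None` means there is nothing to wait for.
    refresh_at: Option<Instant>,
    /// Counts refreshes; a stand-in for actually sending a message to the relay.
    refreshes_sent: u32,
}

impl Allocation {
    fn new(now: Instant) -> Self {
        Self {
            refresh_at: Some(now + REFRESH_INTERVAL),
            refreshes_sent: 0,
        }
    }

    /// Tells the caller when this state machine would like to be woken.
    fn poll_timeout(&self) -> Option<Instant> {
        self.refresh_at
    }

    /// Called by the owner with the current time; does nothing if the deadline has not passed.
    fn handle_timeout(&mut self, now: Instant) {
        match self.refresh_at {
            Some(deadline) if now >= deadline => {
                self.refreshes_sent += 1;
                // Assumption: schedule the next refresh one interval from now.
                self.refresh_at = Some(now + REFRESH_INTERVAL);
            }
            _ => {}
        }
    }

    /// Discards all state, including the pending timer.
    fn reset(&mut self) {
        self.refresh_at = None;
    }
}
```

After `reset`, `poll_timeout` returns `None` until new state (and with it a new deadline) is created, which is the situation the rest of this description is about.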

Within the event loop, the outermost state machine, i.e. ClientState, is paired with an Io component that actually implements the timer by scheduling a wake-up at the earliest deadline across all state machines.
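The aggregation could look roughly like the sketch below. The two fields on `ClientState` and the `earliest` helper are hypothetical and only serve to show how nested timeouts collapse into a single wake-up; the real ClientState is considerably more involved.

```rust
use std::time::Instant;

/// Returns the earliest of the given deadlines, skipping state machines without a pending timer.
fn earliest(deadlines: impl IntoIterator<Item = Option<Instant>>) -> Option<Instant> {
    deadlines.into_iter().flatten().min()
}

/// Hypothetical outermost state machine holding the timeouts of two nested ones.
struct ClientState {
    allocation_timeout: Option<Instant>,
    connection_timeout: Option<Instant>,
}

impl ClientState {
    /// The single deadline handed to the `Io` component.
    fn poll_timeout(&self) -> Option<Instant> {
        earliest([self.allocation_timeout, self.connection_timeout])
    }
}
```

If no nested state machine has a deadline, the aggregate is `None` as well.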

In order to not fire the same timer multiple times in a row, we already intended to reset the timer once it fired. It turns out that this never worked and the timer still lingered around.

When we call reset, poll_timeout - which feeds this timer - returns None, and the timer doesn't get updated until poll_timeout finally returns Some with an Instant again. Because the previous timer didn't get cleared when it fired, it kept firing, which caused connlib to busy-loop and prevented some(?) other parts of it from progressing, resulting in us never being able to reconnect to the portal. Yet, because the event loop itself was still operating, we could still resolve DNS queries and such.
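The intended behaviour can be sketched with a simplified, synchronous timer. This is not the actual Io implementation; `Timer`, `update` and `poll` are hypothetical names chosen for the example. The relevant detail is the `take()` in `poll`, which discards the deadline once it has fired.

```rust
use std::time::Instant;

/// Hypothetical, synchronous stand-in for the timer inside the `Io` component.
#[derive(Default)]
struct Timer {
    deadline: Option<Instant>,
}

impl Timer {
    /// Feeds the result of `poll_timeout` into the timer.
    /// `None` leaves the timer untouched, mirroring the behaviour described above.
    fn update(&mut self, timeout: Option<Instant>) {
        if let Some(deadline) = timeout {
            self.deadline = Some(deadline);
        }
    }

    /// Returns the deadline if it has elapsed and discards it so that it cannot fire again.
    fn poll(&mut self, now: Instant) -> Option<Instant> {
        let deadline = self.deadline?;
        if now < deadline {
            return None;
        }
        // Clearing the deadline here is what prevents the busy loop.
        self.deadline.take()
    }
}
```

Without the `take()`, every later call to `poll` would report the same elapsed deadline for as long as `update` only receives `None`, which matches the busy loop described above.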

Resolves: #7254.

@vercel

vercel Bot commented Nov 8, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| firezone | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Nov 8, 2024 0:05am |

@thomaseizinger added this pull request to the merge queue on Nov 8, 2024
Merged via the queue into main with commit 8653146 on Nov 8, 2024
@thomaseizinger deleted the fix/reset-timer-no-timeout branch on November 8, 2024 12:33
Development

Successfully merging this pull request may close these issues.

Windows 1.3.10 fails to establish connections to CIDR and DNS resources
