@thomaseizinger thomaseizinger commented Oct 28, 2024

During normal operation, we should never lose connectivity to the set of assigned relays in a client or gateway. Under odd network conditions and partitions, however, it is possible that we disconnect from a relay that is in fact only temporarily unavailable. Without an explicit mechanism to retrieve new relays, this means that both clients and gateways can end up with no relays at all. For clients, this can be fixed by either roaming or signing out and in again. For gateways, this can only be fixed by a restart!

Without connected relays, no connections can be established. With #7163, we will at least be able to still establish direct connections. Yet, that isn't good enough and we need a mechanism for restoring full connectivity in such a case.

When creating a new connection, we already sample one of our relays and assign it to this particular connection. This ensures that we don't create an excessive number of candidates for each individual connection. Currently, this selection is allowed to fail silently. With this PR, we make this a hard error and bubble it up all the way to the client's and gateway's event loop. There, we initiate a reconnect to the portal as a compensating action. Reconnecting to the portal means we will receive another init message that allows us to reconnect to the relays.

Due to the nature of this implementation, this fix may only apply with a certain delay from when we actually lost connectivity to the last relay. However, this design has the advantage that we don't have to introduce an additional state within snownet: Connections now simply fail to establish and the next one soon after should succeed again because we will have received a new init message.
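
To illustrate the flow described above, here is a minimal sketch. It is not the actual snownet code: `Node`, `NoRelays`, `Action`, `sample_relay`, `apply_init` and `handle_connection_result` are hypothetical names made up for this example, and the `seed` parameter stands in for whatever randomness the real sampling uses.

```rust
use std::collections::BTreeMap;

// Hypothetical identifier types; the real ones will differ.
type RelayId = u32;
type ConnectionId = u32;

/// Hypothetical error: there is no relay left to assign to a new connection.
#[derive(Debug, PartialEq)]
struct NoRelays;

/// Hypothetical compensating action chosen by the event loop.
#[derive(Debug, PartialEq)]
enum Action {
    None,
    ReconnectPortal,
}

#[derive(Default)]
struct Node {
    relays: BTreeMap<RelayId, String>,            // relay id -> address
    connections: BTreeMap<ConnectionId, RelayId>, // connection -> assigned relay
}

impl Node {
    /// Samples one relay. Having none is a hard error instead of a silent `None`.
    fn sample_relay(&self, seed: usize) -> Result<RelayId, NoRelays> {
        if self.relays.is_empty() {
            return Err(NoRelays);
        }
        let index = seed % self.relays.len();
        self.relays.keys().nth(index).copied().ok_or(NoRelays)
    }

    /// Creating a connection fails if no relay can be assigned to it.
    fn new_connection(&mut self, id: ConnectionId, seed: usize) -> Result<RelayId, NoRelays> {
        let relay = self.sample_relay(seed)?;
        self.connections.insert(id, relay);
        Ok(relay)
    }

    /// Stand-in for handling a fresh init message: replace the relay set.
    fn apply_init(&mut self, relays: Vec<(RelayId, String)>) {
        self.relays = relays.into_iter().collect();
    }
}

/// Event-loop side: a `NoRelays` error triggers a reconnect to the portal.
fn handle_connection_result(result: Result<RelayId, NoRelays>) -> Action {
    match result {
        Ok(_) => Action::None,
        Err(NoRelays) => Action::ReconnectPortal,
    }
}
```

In this sketch, the first connection attempt after losing all relays fails and yields `Action::ReconnectPortal`. Once `apply_init` has run with the relays from the new init message, the next attempt succeeds, which mirrors the delay mentioned above.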

Resolves: #7162.

@thomaseizinger thomaseizinger self-assigned this Oct 28, 2024


@thomaseizinger thomaseizinger Oct 28, 2024

In these tests, we don't add any relays and thus they can't succeed. The only thing they really test is timeouts, which I couldn't be bothered to make work 😅


@conectado conectado left a comment

LGTM

@thomaseizinger thomaseizinger force-pushed the fix/reconnect-portal-no-relays branch from 843680b to 7b093b0 Compare October 29, 2024 00:48
@thomaseizinger thomaseizinger added this pull request to the merge queue Oct 29, 2024
Merged via the queue into main with commit f7a3883 Oct 29, 2024
@thomaseizinger thomaseizinger deleted the fix/reconnect-portal-no-relays branch October 29, 2024 01:17
Development

Successfully merging this pull request may close these issues.

Disconnecting from relay; no response received. Is STUN blocked?
