Skip to content

feat(connlib): remember recently connected gateways#6361

Merged
thomaseizinger merged 5 commits intomainfrom
feat/buffer-recently-used-gateways
Aug 21, 2024
Merged

feat(connlib): remember recently connected gateways#6361
thomaseizinger merged 5 commits intomainfrom
feat/buffer-recently-used-gateways

Conversation

@thomaseizinger
Copy link
Member

@thomaseizinger thomaseizinger commented Aug 20, 2024

Previously, connlib would only send the currently connected gateways to the portal upon a new connection intent. With our introduced idle connection timeout, this could result in the portal choosing a different gateway upon reconnecting to the resource.

To fix this, we introduce an LRU cache with at most 100 entries. Iteration over the LRU cache happens in MRU order, meaning a recently connected gateway will be at the front of the list.

We assume that this list is processed in order and thus still prefer gateways that we are still connected to.

Related: #6347.

@vercel
Copy link

vercel bot commented Aug 20, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
firezone ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 21, 2024 6:38am

@jamilbk
Copy link
Member

jamilbk commented Aug 20, 2024

Does this require portal changes to work?

@thomaseizinger
Copy link
Member Author

Does this require portal changes to work?

No, we use the existing field in connection intents to send these gateways. It would be good to get a confirmation from @AndrewDryga that the list of "connected gateways" is indeed processed in order.

Copy link
Member

@jamilbk jamilbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I checked the portal code and I think this will have the behavior we're intending.

However, we may want to think about edge cases here. This effectively removes any load balancing logic for the duration of a Client's session. If the user travels on a plane for example while leaving Firezone signed in, the connections will never be re-routed to a closer Gateway. Currently the lat, lon pair is used to do that with each prepare_connection message.

I do think having sticky Gateways makes sense, but maybe we should clear this cache when we reset connection state to prevent the scenario above. I.e. when we roam.

@jamilbk
Copy link
Member

jamilbk commented Aug 20, 2024

See here for where the load_balance function is called:

https://github.com/firezone/firezone/blob/main/elixir/apps/api/lib/api/client/channel.ex#L498

And here for its implementation where we will use a connected Gateway preferred by the connlib:

https://github.com/firezone/firezone/blob/main/elixir/apps/domain/lib/domain/gateways.ex#L361

@thomaseizinger
Copy link
Member Author

So I checked the portal code and I think this will have the behavior we're intending.

However, we may want to think about edge cases here. This effectively removes any load balancing logic for the duration of a Client's session. If the user travels on a plane for example while leaving Firezone signed in, the connections will never be re-routed to a closer Gateway. Currently the lat, lon pair is used to do that with each prepare_connection message.

I do think having sticky Gateways makes sense, but maybe we should clear this cache when we reset connection state to prevent the scenario above. I.e. when we roam.

That is a good idea.

Copy link
Member

@jamilbk jamilbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good - I think this should fix possible unforeseen issues with multi-site routing that might crop up due to Gateway selection not being deterministic (namely poorly-behaved load balancers or session management that behave differently for different regions / IPs)

@thomaseizinger thomaseizinger added this pull request to the merge queue Aug 20, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 20, 2024
@jamilbk jamilbk added this pull request to the merge queue Aug 20, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 20, 2024
@jamilbk
Copy link
Member

jamilbk commented Aug 20, 2024

Tested on macOS - seems to be working ok.

@thomaseizinger thomaseizinger added this pull request to the merge queue Aug 21, 2024
Merged via the queue into main with commit 16da501 Aug 21, 2024
@thomaseizinger thomaseizinger deleted the feat/buffer-recently-used-gateways branch August 21, 2024 07:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants