feat(connlib): remember recently connected gateways#6361
thomaseizinger merged 5 commits into main
Does this require portal changes to work?
No, we use the existing field in connection intents to send these gateways. It would be good to get confirmation from @AndrewDryga that the list of "connected gateways" is indeed processed in order.
395adde to
1677fa7
So I checked the portal code and I think this will have the behavior we're intending.
However, we may want to think about edge cases here. This effectively removes any load balancing logic for the duration of a Client's session. If the user travels on a plane for example while leaving Firezone signed in, the connections will never be re-routed to a closer Gateway. Currently the lat, lon pair is used to do that with each prepare_connection message.
I do think having sticky Gateways makes sense, but maybe we should clear this cache when we reset connection state to prevent the scenario above. I.e. when we roam.
See here for where the load_balance function is called: https://github.com/firezone/firezone/blob/main/elixir/apps/api/lib/api/client/channel.ex#L498
And here for its implementation, where we will use a connected Gateway preferred by connlib: https://github.com/firezone/firezone/blob/main/elixir/apps/domain/lib/domain/gateways.ex#L361
That is a good idea.
1677fa7 to
82fda15
jamilbk left a comment
This is good - I think this should fix possible unforeseen issues with multi-site routing that might crop up due to Gateway selection not being deterministic (namely poorly-behaved load balancers or session management that behave differently for different regions / IPs)
Tested on macOS - seems to be working ok.
82fda15 to
bf9996d
Previously, `connlib` would only send the currently connected gateways to the portal upon a new connection intent. With the idle connection timeout we introduced, this could result in the portal choosing a different gateway upon reconnecting to the resource.

To fix this, we introduce an LRU cache with at most 100 entries. Iteration over the LRU cache happens in MRU order, meaning a recently connected gateway will be at the front of the list.
We assume that the portal processes this list in order, so gateways that we are still connected to remain preferred.
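The ordering described above can be sketched roughly as follows. This is only an illustration using the standard library, not the code from this PR: `GatewayId`, `RecentGateways`, `touch`, and `preferred_gateways` are hypothetical names, a `VecDeque` stands in for a real LRU cache, and placing the currently connected gateways ahead of the remembered ones is an assumption about how the list is assembled. The `clear` method corresponds to the review suggestion of dropping the cache when connection state is reset.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a gateway identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GatewayId(pub u32);

/// Bounded list of remembered gateways; the front is the most recently used.
pub struct RecentGateways {
    capacity: usize,
    entries: VecDeque<GatewayId>,
}

impl RecentGateways {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            entries: VecDeque::with_capacity(capacity),
        }
    }

    /// Record a connection: move the gateway to the front and evict the
    /// least recently used entry if the capacity is exceeded.
    /// The linear scan is acceptable for a cache of about 100 entries.
    pub fn touch(&mut self, id: GatewayId) {
        self.entries.retain(|e| *e != id);
        self.entries.push_front(id);
        self.entries.truncate(self.capacity);
    }

    /// Forget all gateways, e.g. when connection state is reset on roaming.
    pub fn clear(&mut self) {
        self.entries.clear();
    }

    /// Iterate in MRU order (most recently used first).
    pub fn iter(&self) -> impl Iterator<Item = GatewayId> + '_ {
        self.entries.iter().copied()
    }
}

/// Assumed list layout: currently connected gateways first, then the
/// remembered ones in MRU order, without duplicates.
pub fn preferred_gateways(connected: &[GatewayId], recent: &RecentGateways) -> Vec<GatewayId> {
    let mut out = connected.to_vec();
    for id in recent.iter() {
        if !out.contains(&id) {
            out.push(id);
        }
    }
    out
}
```

With a capacity of 100, calling `touch` on every successful connection keeps the most recently connected gateway at the front, so a receiver that processes the list in order would try still-connected gateways first and recently used ones next.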
Related: #6347.