feat(connlib): implement idempotent control protocol for gateway#6941
feat(connlib): implement idempotent control protocol for gateway#6941thomaseizinger merged 4 commits intomainfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
908c7b7 to
a04d4bc
Compare
a04d4bc to
5cebba8
Compare
d211416 to
cbab854
Compare
cbab854 to
f313946
Compare
f313946 to
d5bd2c4
Compare
|
|
||
| def psk do | ||
| random_token(@wg_psk_length, encoder: :base64) | ||
| |> String.slice(0, @wg_psk_length) |
There was a problem hiding this comment.
@AndrewDryga I had to remove this otherwise the string isn't valid base64 and we can't decode it. If we are receiving this as a string, the encoding needs to be well-defined.
926b84c to
9a61f39
Compare
|
CI is green! Ready for review @conectado ! |
9a61f39 to
b48a3a0
Compare
b48a3a0 to
a67a416
Compare
25db33c to
f24158e
Compare
81787d4 to
f24158e
Compare
| }; | ||
|
|
||
| if !peer.is_allowed(req.resource) { | ||
| tracing::debug!(cid = %peer.id(), resource = %req.resource, "Received `AssignedIpsEvent` for resource that is not allowed"); |
There was a problem hiding this comment.
This shouldn't happen during normal operation right? Client would wait until the client receive the ok response for the resource before sending this assigned ip event. So I think we should bump this log since it means something weird is going on?
| return Some(packet); | ||
| } | ||
|
|
||
| // TODO: Should we throttle concurrent events for the same domain? |
There was a problem hiding this comment.
When we address this TODO, should we just not process an event for a domain if there's another being processed in the upper layer.
This log should only ever happen if clients are buggy or someone is using a custom client. Thus worth a `warn`. Follow-up from #6941.
Building on top of the gateway PR (#6941), this PR transitions the clients to the new control protocol. Clients are **not** backwards-compatible with old gateways. As a result, a certain customer environment MUST have at least one gateway with the above PR running in order for clients to be able to establish connections. With this transition, Clients send explicit events to Gateways whenever they assign IPs to a DNS resource name. The actual assignment only happens once and the IPs then remain stable for the duration of the client session. When the Gateway receives such an event, it will perform a DNS resolution of the requested domain name and set up the NAT between the assigned proxy IPs and the IPs the domain actually resolves to. In order to support self-healing of any problems that happen during this process, the client will send an "Assigned IPs" event every time it receives a DNS query for a particular domain. This in turn will trigger another DNS resolution on the Gateway. Effectively, this means that DNS queries for DNS resources propagate to the Gateway, triggering a DNS resolution there. In case the domain resolves to the same set of IPs, no state is changed to ensure existing connections are not interrupted. With this new functionality in place, we can delete the old logic around detecting "expired" IPs. This is considered a bugfix as this logic isn't currently working as intended. It has been observed multiple times that the Gateway can loop on this behaviour and resolving the same domain over and over again. The only theoretical "incompatibility" here is that pre-1.4.0 clients won't have access to this functionality of triggering DNS refreshes on a Gateway 1.4.2+ Gateway. However, as soon as this PR merges, we expect all admins to have already upgraded to a 1.4.0+ Gateway anyway which already mandates clients to be on 1.4.0+. Resolves: #7391. Resolves: #6828.
This PR implements the new idempotent control protocol for the gateway. We retain backwards-compatibility with old clients to allow admins to perform a disruption-free update to the latest version.
With this new control protocol, we are moving the responsibility of exchanging the proxy IPs we assigned to DNS resources to a p2p protocol between client and gateway. As a result, wildcard DNS resources only get authorized on the first access. Accessing a new domain within the same resource will thus no longer require a roundtrip to the portal.
Overall, users will see a greatly decreased connection setup latency. On top of that, the new protocol will allow us to more easily implement packet buffering which will be another UX boost for Firezone.