Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(connlib): clients make use of DNS mangling on gateways #5049

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

conectado
Copy link
Collaborator

@conectado conectado commented May 20, 2024

This PR is the "client-side" of things for #4994. Up until now, when a user wanted to connect to a DNS resource, we would establish a connection to the gateway and pass along the domain we are trying to access. The gateway would resolve that domain and send the response back to the client, allowing them to finally send a DNS response.

Now, we instantly assign and respond with 4x A and 4x AAAA records to any query for one of our DNS resources. Upon the first IP packet for one of these "proxy IPs", we select a gateway, establish a connection and send our proxy IPs along. The gateway then performs the necessary mangling and NATing of all packets. See #5354 for details.

Copy link

vercel bot commented May 20, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
firezone ⬜️ Ignored (Inspect) Visit Preview Jun 20, 2024 5:40am

@github-actions github-actions bot added the kind/refactor Code refactoring label May 20, 2024
Copy link

github-actions bot commented May 20, 2024

Terraform Cloud Plan Output

Plan: 15 to add, 23 to change, 15 to destroy.

Terraform Cloud Plan

Copy link

github-actions bot commented May 20, 2024

Performance Test Results

TCP

Test Name Received/s Sent/s Retransmits
direct-tcp-client2server 232.6 MiB (-1%) 235.3 MiB (-0%) 270 (-22%)
direct-tcp-server2client 240.2 MiB (+2%) 241.7 MiB (+2%) 324 (-7%)
relayed-tcp-client2server 236.6 MiB (-3%) 237.4 MiB (-3%) 408 (+29%)
relayed-tcp-server2client 245.0 MiB (+3%) 246.2 MiB (+2%) 842 (+36%)

UDP

Test Name Total/s Jitter Lost
direct-udp-client2server 500.0 MiB (-0%) 0.04ms (+58%) 44.99% (+2%)
direct-udp-server2client 500.0 MiB (+0%) 0.01ms (-46%) 24.84% (-10%)
relayed-udp-client2server 500.0 MiB (+0%) 0.03ms (-24%) 54.05% (+1%)
relayed-udp-server2client 500.0 MiB (-0%) 0.02ms (-66%) 36.32% (+5%)

@conectado conectado force-pushed the refactor/dns-translation-on-gateway branch from 279f273 to 885d0c2 Compare May 20, 2024 21:16
@conectado conectado force-pushed the refactor/dns-translation-on-gateway branch from 051462f to 5fb602f Compare May 28, 2024 04:16
github-merge-queue bot pushed a commit that referenced this pull request May 30, 2024
…way` (#5160)

To encode that clients always have both ipv4 and ipv6 and they are the
only allowed source ips for any given client, into the type, we split
those into their specific fields in the `ClientOnGateway` struct and
update tests accordingly.

Furthermore, these will be used for the DNS refactor for ipv6-in-ipv4
and ipv4-in-ipv6 to set the source ip of outgoing packets, without
having to do additional routing or mappings. There will be more notes on
this on the corresponding PR #5049 .

---------

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
@conectado conectado force-pushed the refactor/dns-translation-on-gateway branch from 03e0c32 to f5df55f Compare May 31, 2024 04:58
@jamilbk
Copy link
Member

jamilbk commented Jun 2, 2024

Leaving a note here to delay merging/releasing this until 2 weeks after announcement is made to current users. Planning to send that out this week.

Copy link
Member

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did an early pass. Looks pretty good already! Curious to hear whether my new tests found any bugs!

rust/connlib/shared/src/messages.rs Outdated Show resolved Hide resolved
rust/connlib/tunnel/src/gateway.rs Outdated Show resolved Hide resolved
rust/connlib/tunnel/src/lib.rs Outdated Show resolved Hide resolved
rust/connlib/tunnel/src/peer.rs Outdated Show resolved Hide resolved
rust/connlib/tunnel/src/peer.rs Outdated Show resolved Hide resolved
rust/connlib/tunnel/src/client.rs Outdated Show resolved Hide resolved
rust/connlib/tunnel/src/client.rs Outdated Show resolved Hide resolved
rust/connlib/tunnel/src/client.rs Show resolved Hide resolved
rust/connlib/tunnel/src/client.rs Outdated Show resolved Hide resolved
rust/connlib/tunnel/src/client.rs Outdated Show resolved Hide resolved
@thomaseizinger
Copy link
Member

Leaving a note here to delay merging/releasing this until 2 weeks after announcement is made to current users. Planning to send that out this week.

@jamilbk We discussed an idea today that would allow us to make a comparatively easy, backwards-compatible change to the gateway that allows old and new clients to connect at the same time. We can roll this out early, and tell everyone to upgrade. After a week or two, we can then merge the client changes which then require a new gateway.

This should make deployment easier because nothing has to be upgraded in lockstep.

@jamilbk
Copy link
Member

jamilbk commented Jun 5, 2024

Leaving a note here to delay merging/releasing this until 2 weeks after announcement is made to current users. Planning to send that out this week.

@jamilbk We discussed an idea today that would allow us to make a comparatively easy, backwards-compatible change to the gateway that allows old and new clients to connect at the same time. We can roll this out early, and tell everyone to upgrade. After a week or two, we can then merge the client changes which then require a new gateway.

This should make deployment easier because nothing has to be upgraded in lockstep.

This would be amazing. Let's opt for that if we can to avoid production disruptions.

@thomaseizinger
Copy link
Member

Leaving a note here to delay merging/releasing this until 2 weeks after announcement is made to current users. Planning to send that out this week.

@jamilbk We discussed an idea today that would allow us to make a comparatively easy, backwards-compatible change to the gateway that allows old and new clients to connect at the same time. We can roll this out early, and tell everyone to upgrade. After a week or two, we can then merge the client changes which then require a new gateway.
This should make deployment easier because nothing has to be upgraded in lockstep.

This would be amazing. Let's opt for that if we can to avoid production disruptions.

The plan is:

  1. Make the deserialisation of the message in the gateway handle both messages, new and old (i.e. with IPs and without)
  2. In case of old, default to 0 IPs
  3. Make the behaviour in the gateway such that no addresses means no translation, i.e. the current behaviour

What I would like to avoid is having a state of main that doesn't allow us to release. Hence, we would split this PR into two:

  1. Only build the new packet translation logic in the gateway with the above behaviour to allow old clients to function. Compatibility tests and manual testing should give us high-confidence that this indeed works.
  2. Make a 2nd, stacked PR that includes all the client changes. CI in that PR should tell us whether the new behaviour works as intended.

We shouldn't merge PR (1) until CI in PR (2) is green. In fact, I think we should continue to develop the new feature in this PR until we know it is working. Then we can see if a PR only changing the gateway also passes CI.

The window inbetween merging these two PRs is essentially the timeframe that customers have to upgrade their gateways (assuming we make a release soon-ish after each PR merges).

github-merge-queue bot pushed a commit that referenced this pull request Jun 5, 2024
Currently, the transition for sending ICMP packets does not explicitly
state whether we want to send an IPv4 or an IPv6 packet. Being explicit
about this makes things a bit easier to understand.

It may also simplify the adaption of the tests for #5049.
@jamilbk
Copy link
Member

jamilbk commented Jun 5, 2024

The window inbetween merging these two PRs is essentially the timeframe that customers have to upgrade their gateways (assuming we make a release soon-ish after each PR merges).

Got it. Aiming for 2 weeks for this one, so ideally the first PR will land end of this week / early next (Gateway 1.1), we'll make the announcement, and the 2nd PR would sometime end of June (Clients 1.1).

That means I probably need to refactor CI to split off the client releases from elixir/gateway releases. #4397

cc @conectado

github-merge-queue bot pushed a commit that referenced this pull request Jun 5, 2024
With #5049, connlib will start mangling and translating ICMP requests
which means we can no longer rely on the ICMP request emitted by the
gateway to have the same sequence number and identifier as originally
generated by the client. In the end, that is actually also not a
property we care about.

What we do care about is that an ICMP request results in an ICMP reply
and that _those_ have a matching sequence number and identifier.
Additionally, every ICMP request arriving at the gateway should target
the correct resource. For CIDR resources, we already now which IP that
should be. For DNS resources, it has to be one of the resolved IPs for
the domain.
github-merge-queue bot pushed a commit that referenced this pull request Jun 5, 2024
With #5049, we will allocate a fixed set of 4 IPs per DNS resource on
the client. In order to ensure that this always works correctly,
increase the number of resolved IPs to at most 6.
@conectado conectado force-pushed the refactor/dns-translation-on-gateway branch from 25a4532 to aa47f9b Compare June 6, 2024 02:58
@conectado conectado force-pushed the refactor/dns-translation-on-gateway branch 2 times, most recently from 92864ca to 3425f6a Compare June 10, 2024 21:55
@conectado conectado force-pushed the refactor/dns-translation-on-gateway branch from dc58fc1 to 3b3594e Compare June 12, 2024 23:19
@thomaseizinger thomaseizinger force-pushed the refactor/dns-translation-on-gateway branch from d239369 to 340fc2a Compare June 13, 2024 02:44
@thomaseizinger thomaseizinger force-pushed the refactor/dns-translation-on-gateway branch 2 times, most recently from 9b76ed1 to 6b73955 Compare June 14, 2024 05:56
@thomaseizinger thomaseizinger changed the base branch from main to refactor/gateway-only-support-dns-translation June 14, 2024 05:56
@thomaseizinger thomaseizinger changed the title refactor(connlib): dns translation on gateway feat(connlib): clients make use of DNS mangling on gateways Jun 14, 2024
@conectado conectado force-pushed the refactor/gateway-only-support-dns-translation branch from deac45a to ab202c9 Compare June 15, 2024 02:00
@conectado conectado force-pushed the refactor/dns-translation-on-gateway branch from 6b73955 to 1f054d5 Compare June 15, 2024 04:30
@thomaseizinger thomaseizinger force-pushed the refactor/gateway-only-support-dns-translation branch from ab202c9 to 6aa74b1 Compare June 17, 2024 00:54
@thomaseizinger thomaseizinger force-pushed the refactor/dns-translation-on-gateway branch 2 times, most recently from 94b9ca3 to 043b684 Compare June 17, 2024 02:00
@conectado conectado force-pushed the refactor/gateway-only-support-dns-translation branch from 1fa85b5 to 3528a69 Compare June 19, 2024 00:52
Base automatically changed from refactor/gateway-only-support-dns-translation to main June 19, 2024 23:30
@thomaseizinger thomaseizinger force-pushed the refactor/dns-translation-on-gateway branch from cba6435 to 56eeaeb Compare June 19, 2024 23:41
Comment on lines +563 to +564
self.peers.insert(
GatewayOnClient::new(gateway_id, &ips, HashSet::from([resource_id])),
&[],
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@conectado Because we need access to awaiting_connection_details (which we remove in this function now) in order to create the GatewayOnClient in peers, I had to move the creation of the peer into this function.

@thomaseizinger thomaseizinger marked this pull request as ready for review June 20, 2024 01:27
@thomaseizinger thomaseizinger force-pushed the refactor/dns-translation-on-gateway branch from 56eeaeb to 98f567a Compare June 20, 2024 01:30
conectado and others added 8 commits June 20, 2024 15:38
This is not a working commit, though it does compiles.

With this refactor, each time a DNS query for one of our resources is
seen we return a unique set of ips for it. Might be the case we have
seen the resource before, that means we return forever the same one.

Now we don't initiate a connection when we see the query but we defer
that until a later time since mangling will be done in the gateway.

remove dns_internal_ips

send intent based on ip

refactor(connlib): remove translation on client peers

refactor(connlib): keep track of the fqdn and ips in the awaiting connection details and send that up to the gateway

chore(connlib): remove unused function

refactor(connlib): have the gateway translate from the proxy ips to the real ip addresses for DNS

refactor(connlib): do nat propperly with source port and translate destination ip

refactor(connlib): expire nat table after not seeing packet for 1 min

fix(connlib): delete the correct table entry when expirying nat

feat(connlib): add initial ipv4-to-ipv6 translation methods

feat(connlib): add ipv6 to ipv4 header translation for non-fragmented packets

chore(connlib): remove unused functions

fix(connlib): set translation ips in gateways for reuse connections

refactor(connlib): add padding for packets to be able to convert them

feat(connlib): implement ipv6 to ipv4 conversion with icmpv6 to icmpv4 still missing

fix(connlib): icmp conversion

feat(connlib): set ipv4 for ipv6 translation if ipv6 is empty and vice-versa

chore(connlib): split allowed_ips into ipv4 and ipv6 in `ClientOnGateway`

To encode into the types that clients always have both ipv4 and ipv6 and
they are the only allowed source ips for any given client, we split
those into their specific fields in the `ClientOnGateway` struct and
update tests accordingly.

Furthermore, these will be used for the DNS refactor for ipv6-in-ipv4
and ipv4-in-ipv6 to set the source ip of outgoing packets, without
having to do additional routing or mappings. There will be more notes on
this on the corresponding PR.

fix(connlib): don't shorten packets after decapsulation

fix(connlib): poll_read to account for additional 20 bytes

fix(connlib): payload_mut for convertable ip packets

fix(connlib): correctly set source instead of destination for translate_source

fix(connlib): ipv4 to ipv6 payload length calculation

fix(connlib): icmp echo and reply

fix(rust): to_owned for convertable packets

chore(connlib): add todo

chore(connlib): remove stray dbgs

fix(connlib): always ensure source before transforming packets

fix(connlib): obtain proxy ips

fix(connlib): send proxy ips for reusing resource

feat(connlib): refresh mapping after 30 seconds of no incoming traffic

fix semantic conflicts

fix(connlib) get proptests to compile

encode new test logic

fix(connlib): don't generate gaps for proxy-ips

remove unused dependency

fix(connlib): fix rust warnings

fix(connlib): clippy lints

rename DnsMap to StubDns

move dns_resources within stub_dns

rename peer transform functions to keep same scheme

fix typo

rename StubDns to StubResolver

simplify ip mapping

fix gateway compilation on windows

fix warning message

unify decapsulate functions in the gateway

unify decapsulate functions in the gateway

enable deserialization of old message format

add backward compatibility test

make gateway handle backward compatible messages

Fix compile errors

Fix itertools dep

Merge `pick_dns_resource` into `dns_resource_by_domain`

Use `sorted_by_key`

apply pr feedback

address PR feedback

simplify nat port assign

Construct `outside` only once

Use `inspect_err` instead of `match`

Update NAT logs

Only log if NAT entry was present

Fix typos

Fix warning

Minor updates to tests

Make errors around unsupported IP protocols more obvious

Rename functions for consistency

Ad some more docs

Add some basic proptests for NAT table

Fix test
@thomaseizinger thomaseizinger force-pushed the refactor/dns-translation-on-gateway branch from 98f567a to fe6cced Compare June 20, 2024 05:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/refactor Code refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants