Skip to content

Conversation

@FelixMcFelix
Copy link
Collaborator

@FelixMcFelix FelixMcFelix commented Nov 14, 2023

This PR focusses on adding support for floating IP addresses to ports. Broadly speaking the intended behaviour is that:

  • a host will listen to all inbound traffic on all its external IPs (floating and ephemeral),
  • outbound traffic will choose an external IP in order of priority from floating > ephemeral > SNat.
  • outbound flows using a floating IP will choose between several (if supplied) by hashing the inner flow ID.

This includes the necessary changes such that UFTs created by inbound flows will now store their original destination address in their ActionDesc, preserving flow affinity. Similarly, ActionDescs can now be optionally preserved (and reverified) after flowtable resets to prevent the loss of prior choices of external IP which remain valid.

Will close #196.

We can now pass floating IPs in, not that they're being used in any
meaningful way. There will be some troublesome issues given that we want
to attach and detach these far more dynamically: namely, that Nat
overrides SNat and both are defined at creation of a port as rules. The
actual in/outbound logic will be quite simple (we just need to start
looking at flowids for 'external IP' and then validate against our own
set), but rules themselves need to move to `set_rules` like firewalls
allow.
Inbound NAT has been redesigned so that it will generate an action_desc
bound to the IP which traffic was received on, rather than the
designated ephemeral IP. This will provide the required flow affinity
when there are multiple floating + ephemeral IPs assigned.

Outbound NAT now stores a reference to all available outbound addresses
(on a per-rule basis). In the event that several are available at
the same rule priority, we split them using the CRC of the flow hash.

The main TODO after that is making this dynamic (i.e., adding a new
ioctl for setting network config). Broadly, this should just require
rebuilding the ruleset so shouldn't really need deep-reaching Arcs +
KMutexes.

We are, however, going to run into a tough problem around UFT
invalidation. On any epoch change, currently all UFTs are discarded. The
problem here is that we need to maintain UFTs so that flows don't
reassign themselves to the wrong external IP, even when seemingly
unrelated rulesets (ie., firewall) are changed. There should be some way
to revalidate them against the new layer/rule/action set to preserve
entries where possible.
Probably more important to make this distinction now that floating IPs
are also (a different class of) external IPs.
There are a few things here in an early form: we want the engine's
definition of VpcCfg to be its own thing rather than relying on
oxide_vpc::api, to simplify storing locks etc. in VpcCfg for more
dynamic state. We also want a shared reference/resource type so that the
central VpcCfg and any built actions can easily update their behaviour
(and any memoized outputs).

While this is *less* necessary for external IP configuration, this work
will better suit changing other aspects to be dynamic such as DHCP, in a
better manner than attempt number 1.

This is very messy and is just enough to get an ioctl working in a clean
way.
These have unearthed some issues; most pressing is the in->out flow
affinity.
From testing, it now seems that epoch bumps via e.g. set_rules on other
layers don't break NAT: they flush their own internal state but the UFT
is already easily rebuilt from the othe flow table entries.

The exception is set_rules on the NAT layer itself: I believe this is
the last issue, and specifically where we need to revalidate all LFT
entries.
There are still some issues I'm aware of, but this meets the criteria
for getting affinity-preservation on external IP add/remove working. The
most obvious is that this *isn't* yet accurately handling most classes
of action which a rule can output, the second is that we perform action
matching effectively twice on a dirty entry to make sure that stats are
working fine.
///
/// See [FlowTable::mark_dirty] for the performance and correctness
/// implications.
pub fn set_rules_soft(
Copy link
Collaborator Author

@FelixMcFelix FelixMcFelix Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, with the current design, just revalidate all actions at the time set_rules/_soft is called. The lazy approach was partly designed around possibly needing access to packet metadata and body for action validation down the line.

Heapless, crc32fast were mstakenly being pulled in by omicron.
We can now see the semantic differences a bit better, and this helps
with my testing too.
@FelixMcFelix FelixMcFelix marked this pull request as ready for review November 28, 2023 19:50
Copy link
Contributor

@luqmana luqmana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this together Kyle!

Still looking through things but one overall thing: The whole ephemeral vs floating distinction feels like it should live in just oxide-vpc/omicron vs opte proper.

@FelixMcFelix
Copy link
Collaborator Author

Thanks for getting this together Kyle!

Still looking through things but one overall thing: The whole ephemeral vs floating distinction feels like it should live in just oxide-vpc/omicron vs opte proper.

Thanks, Luqman. I agree -- I've generally tried to structure this so that OPTE itself still only knows about NAT rules wrt individual external IPs, plus mechanisms for NetworkImpl providers to re-validate rules and preserve affinity (Inbound vs Outbound NAT actions). oxide-vpc is responsible for knowing about and enforcing the (eph + float IPs)<->(rules + priorities) mapping.

@luqmana
Copy link
Contributor

luqmana commented Nov 28, 2023

Thanks for getting this together Kyle!
Still looking through things but one overall thing: The whole ephemeral vs floating distinction feels like it should live in just oxide-vpc/omicron vs opte proper.

Thanks, Luqman. I agree -- I've generally tried to structure this so that OPTE itself still only knows about NAT rules wrt individual external IPs, plus mechanisms for NetworkImpl providers to re-validate rules and preserve affinity (Inbound vs Outbound NAT actions). oxide-vpc is responsible for knowing about and enforcing the (eph + float IPs)<->(rules + priorities) mapping.

Yup, looked through the rest of the PR some more now and looks good. The only confusing bit was opteadm referring to ephemeral v.s. floating, but it's already oxide-vpc specific (maybe we should rename it at some point)

Copy link
Contributor

@luqmana luqmana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@FelixMcFelix FelixMcFelix merged commit 24ceba1 into master Dec 1, 2023
@FelixMcFelix FelixMcFelix deleted the floating-ip branch December 1, 2023 10:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Want support for multiple external IP addresses for a Port

3 participants