
Stateful nat flow handling across reconfigs#1414

Merged
Fredi-raspall merged 25 commits intomainfrom
pr/fredi/stateful_nat_fixes
Apr 23, 2026

@Fredi-raspall
Contributor

@Fredi-raspall Fredi-raspall commented Mar 27, 2026

Fixes https://github.com/githedgehog/internal/issues/342

Further changes are needed. I'll open follow-up PRs.

@Fredi-raspall Fredi-raspall requested a review from qmonnet March 27, 2026 11:54
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/stateful_nat_fixes branch from 1583ca3 to 091d9de Compare March 27, 2026 11:55
Member

@qmonnet qmonnet left a comment


Thanks a lot for this! The logic looks good; I have mostly trivial comments.

You commented about “freezing the current allocator”. One concern is indeed how we ensure that no new flow is created in the flow table while we are busy validating the existing ones.

We probably also need some unit tests that mock a config change and check that we process the flows and allocations as expected.

Comment thread dataplane/src/packet_processor/mod.rs
Comment thread nat/src/stateful/apalloc/mod.rs
Comment thread nat/src/stateful/allocator_writer.rs Outdated
Comment thread nat/src/stateful/flows.rs
Comment thread nat/src/stateful/flows.rs Outdated
Comment thread nat/src/stateful/flows.rs Outdated
Comment thread nat/src/stateful/flows.rs Outdated
Comment thread nat/src/stateful/mod.rs
Comment thread nat/src/stateful/apalloc/setup.rs
Comment thread nat/src/stateful/allocator_writer.rs Outdated
Contributor

Copilot AI left a comment


Pull request overview

This PR updates stateful NAT to better handle existing flows across configuration re-applies by validating, upgrading, or invalidating NAT-related flow entries during allocator updates, rather than updating flow genids opportunistically on the packet hot path.

Changes:

  • Introduces stateful NAT flow validation utilities and wires allocator updates to re-reserve existing masquerade allocations (or invalidate flows) during reconfig.
  • Refactors stateful NAT allocator configuration/update API (StatefulNatConfig, update_nat_allocator) and propagates the FlowTable into mgmt/dataplane so reconfigs can act on live flows.
  • Reworks allocator randomness plumbing (randomize vs deterministic tests) and removes some previously fallible allocator build paths.

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 5 comments.

File: Description
nat/src/stateful/test.rs: Updates tests to use StatefulNatConfig + update_nat_allocator with a FlowTable.
nat/src/stateful/natip.rs: Simplifies NatIp trait by removing redundant conversion helpers.
nat/src/stateful/mod.rs: Adds flows module and re-exports StatefulNatConfig.
nat/src/stateful/flows.rs: New flow-table scan helpers to invalidate/upgrade/validate stateful NAT flows across reconfigs.
nat/src/stateful/apalloc/test_alloc.rs: Adjusts allocator test helpers for new config API and randomness handling.
nat/src/stateful/apalloc/setup.rs: Refactors allocator build-from-config and pool construction; threads randomize flag into allocators.
nat/src/stateful/apalloc/port_alloc.rs: Adds randomize toggle to port allocator initialization.
nat/src/stateful/apalloc/natip_with_bitmap.rs: Adapts to map_address API change.
nat/src/stateful/apalloc/mod.rs: Moves randomness to runtime field, adds re-reservation APIs, updates allocation call chain.
nat/src/stateful/apalloc/display.rs: Updates allocator display output for new randomness flag.
nat/src/stateful/apalloc/alloc.rs: Threads randomness through allocators and changes IPv6 mapping helpers to be infallible (now panicking).
nat/src/stateful/allocator_writer.rs: Replaces allocator update API with update_nat_allocator(config, flow_table) and adds genid-aware flow handling.
mgmt/src/tests/mgmt.rs: Adds FlowTable to config processor test harness.
mgmt/src/processor/proc.rs: Passes FlowTable + genid into stateful NAT apply path; marks it “infallible”.
mgmt/Cargo.toml: Adds flow-entry dependency.
flow-entry/src/flow_table/table.rs: Adds lock_read() accessor for read-locking the underlying table.
dataplane/src/packet_processor/mod.rs: Stores flow_table in setup and plumbs it through pipeline builder.
dataplane/src/main.rs: Passes flow_table into mgmt config processor params.
config/src/external/overlay/vpc.rs: Adds stateful_nat_peerings() iterator helper.
Cargo.lock: Updates lockfile for new mgmt dependency graph.

Comment thread nat/src/stateful/allocator_writer.rs Outdated
Comment thread flow-entry/src/flow_table/table.rs Outdated
Comment thread nat/src/stateful/apalloc/alloc.rs
Comment thread nat/src/stateful/flows.rs Outdated
Comment thread nat/src/stateful/allocator_writer.rs Outdated
@qmonnet qmonnet added the area/nat Related to Network Address Translation (NAT) label Mar 27, 2026
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/stateful_nat_fixes branch 4 times, most recently from 2d72214 to 48f0185 Compare April 2, 2026 17:30
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/stateful_nat_fixes branch from 48f0185 to f370d28 Compare April 10, 2026 19:27
@qmonnet
Member

qmonnet commented Apr 13, 2026

Note: Compilation is broken due to changes in flow.rs between feat(stateful-nat): add logic to invalidate / renew flows and feat(flow-table): let lock_read() panic.

@qmonnet
Member

qmonnet commented Apr 13, 2026

I rebased the work in this branch on top of #1454 in https://github.com/githedgehog/dataplane/tree/pr/qmonnet/fredi-rebased-nat-fixes

@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/stateful_nat_fixes branch from f370d28 to 62f2ba1 Compare April 16, 2026 16:13
@Fredi-raspall Fredi-raspall added the ci:+vlab Enable VLAB tests label Apr 16, 2026
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/stateful_nat_fixes branch from 9062e15 to 1353c78 Compare April 17, 2026 21:09
@Fredi-raspall Fredi-raspall marked this pull request as ready for review April 17, 2026 21:12
@Fredi-raspall Fredi-raspall requested a review from a team as a code owner April 17, 2026 21:12
@Fredi-raspall Fredi-raspall requested review from qmonnet and removed request for a team April 17, 2026 21:12
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/stateful_nat_fixes branch from 1353c78 to 6b0693a Compare April 17, 2026 21:13
@Fredi-raspall Fredi-raspall requested a review from Copilot April 17, 2026 21:13
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 29 out of 30 changed files in this pull request and generated 4 comments.

Comment thread nat/src/stateful/allocator_writer.rs
Comment thread nat/src/stateful/flows.rs
Comment thread flow-entry/src/flow_table/table.rs
Comment thread nat/src/stateful/flows.rs
@Fredi-raspall Fredi-raspall changed the title Stateful nat flow handling accross reconfigs Stateful nat flow handling across reconfigs Apr 20, 2026
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/stateful_nat_fixes branch from 965421a to 72cc70b Compare April 20, 2026 12:14
Add the logic to handle config changes with masquerading.
This includes:
  - invalidating flows whose masquerading state is incompatible
    with the new configuration.
  - upgrading the flows that should continue to be serviced.
  - reserving, in the new NAT allocator, the IPs and ports used
    by flows that should continue after a config change.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Quentin Monnet <qmo@qmon.net>
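A minimal sketch of the three steps above, under a heavily simplified model. `MasqState`, `Flow`, `FlowStatus`, `NewAllocator` and `reconcile` are hypothetical names for illustration, not the crate's actual types or API. Here a flow is kept only if its original source is still masqueraded under the new config and its IP/port can be re-reserved in the new allocator; kept flows are upgraded to the new genid and all others are invalidated.

```rust
use std::collections::HashSet;
use std::net::Ipv4Addr;

// Hypothetical, simplified masquerade state stored in a flow.
#[derive(Debug, Clone, PartialEq)]
struct MasqState {
    orig_src: Ipv4Addr, // pre-NAT source address
    nat_ip: Ipv4Addr,   // masquerade address handed out by the allocator
    nat_port: u16,      // masquerade port handed out by the allocator
}

#[derive(Debug, Clone, PartialEq)]
enum FlowStatus {
    Active,
    Invalid,
}

#[derive(Debug)]
struct Flow {
    masq: MasqState,
    genid: u64,
    status: FlowStatus,
}

// Hypothetical new allocator: NAT IPs allowed by the new config plus
// the (ip, port) pairs already taken.
struct NewAllocator {
    pool: HashSet<Ipv4Addr>,
    reserved: HashSet<(Ipv4Addr, u16)>,
}

impl NewAllocator {
    // Re-reserve an existing (ip, port); fails if the IP left the pool
    // or the pair is already taken.
    fn reserve(&mut self, ip: Ipv4Addr, port: u16) -> bool {
        self.pool.contains(&ip) && self.reserved.insert((ip, port))
    }
}

// Keep (upgrade + re-reserve) compatible flows, invalidate the rest.
fn reconcile(
    flows: &mut [Flow],
    alloc: &mut NewAllocator,
    allowed_src: &HashSet<Ipv4Addr>,
    new_genid: u64,
) {
    for flow in flows.iter_mut() {
        let keep = allowed_src.contains(&flow.masq.orig_src)
            && alloc.reserve(flow.masq.nat_ip, flow.masq.nat_port);
        if keep {
            flow.genid = new_genid; // upgrade to the new generation
        } else {
            flow.status = FlowStatus::Invalid;
        }
    }
}
```

Rejecting a flow whose IP/port cannot be re-reserved keeps the new allocator consistent with the flow table: no surviving flow holds a port the new allocator could hand out again.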
Call the new method to update the stateful NAT allocator.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Augment StatefulNatConfig with a boolean to enable / disable
randomization.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Quentin Monnet <qmo@qmon.net>
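A rough sketch of what a randomize toggle in a port allocator could look like. `PortAlloc` and its fields are hypothetical, and seeding from std's `RandomState` is an assumption made here to avoid extra crates (it is not cryptographically strong). With `randomize = false` the allocation sequence is fully predictable, which is what lets tests run against the normal code path.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

// Hypothetical port allocator cycling over [base, base + size).
struct PortAlloc {
    base: u16,
    size: u16,
    cursor: u16, // offset of the next port to hand out
}

impl PortAlloc {
    fn new(base: u16, size: u16, randomize: bool) -> Self {
        assert!(size > 0);
        assert!(u32::from(base) + u32::from(size) <= 65_536);
        let cursor = if randomize {
            // Random starting offset from std's randomly keyed hasher.
            (RandomState::new().build_hasher().finish() % u64::from(size)) as u16
        } else {
            0 // deterministic: tests can predict every allocation
        };
        Self { base, size, cursor }
    }

    fn next_port(&mut self) -> u16 {
        let port = self.base + self.cursor;
        self.cursor = (self.cursor + 1) % self.size;
        port
    }
}
```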
The goal of this commit was to update the tests to use the new
update method that checks the flow table. To do that, the existing
code had to be refactored in several ways. Specifically, disabling
randomization is no longer a test-only feature but is now part of
the configuration. This significantly simplifies the code.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Quentin Monnet <qmo@qmon.net>
When processing a packet in stateful NAT, if the packet refers to
a flow, blindly obey the state in the flow and do not attempt to
upgrade it, since upgrading is done whenever a configuration is
applied.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Quentin Monnet <qmo@qmon.net>
The current methods and functions to create an allocator return
Result<> to check for failures. However, such failures would mostly
stem from data that was not sanitized / validated, or from issues
elsewhere. Handling those errors makes the code significantly more
complex. Because those error conditions should either never occur
or be caught before we apply a NAT config (e.g. during validation),
this commit makes those methods infallible.
Further work should check whether those error conditions can happen
and, if so, prevent them from occurring when the config is applied.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Quentin Monnet <qmo@qmon.net>
... make genid private, and properly construct the configuration.

A manual PartialEq implementation is needed so that the genid is
not taken into account when comparing two configs.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Quentin Monnet <qmo@qmon.net>
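A minimal sketch of the manual `PartialEq` idea, using a hypothetical `NatConfig` stand-in (the real `StatefulNatConfig` fields may differ): equality compares only the NAT content and ignores `genid`, so re-applying an identical config under a new generation compares equal.

```rust
// Hypothetical stand-in for a stateful NAT config.
#[derive(Debug, Clone)]
pub struct NatConfig {
    genid: u64,             // private: identifies which apply this is
    randomize: bool,        // NAT content
    peerings: Vec<String>,  // NAT content
}

impl NatConfig {
    pub fn new(genid: u64, randomize: bool, peerings: Vec<String>) -> Self {
        Self { genid, randomize, peerings }
    }

    pub fn genid(&self) -> u64 {
        self.genid
    }
}

// Manual impl instead of #[derive(PartialEq)]: genid is deliberately skipped.
impl PartialEq for NatConfig {
    fn eq(&self, other: &Self) -> bool {
        self.randomize == other.randomize && self.peerings == other.peerings
    }
}
```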
Let method lock_read() panic, as all the other methods already do;
callers would unwrap() anyway.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
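A sketch of the pattern, assuming the table sits behind a `std::sync::RwLock` (the `FlowTable` below is a hypothetical stand-in, not the crate's type): `lock_read()` returns the guard directly and panics on a poisoned lock instead of returning a `Result`.

```rust
use std::collections::HashMap;
use std::sync::{RwLock, RwLockReadGuard};

// Hypothetical flow table: flow id -> description.
struct FlowTable {
    table: RwLock<HashMap<u32, String>>,
}

impl FlowTable {
    fn new() -> Self {
        Self { table: RwLock::new(HashMap::new()) }
    }

    fn insert(&self, id: u32, flow: String) {
        self.table.write().expect("flow table lock poisoned").insert(id, flow);
    }

    // Panics on a poisoned lock, like insert(); no Result for callers to unwrap().
    fn lock_read(&self) -> RwLockReadGuard<'_, HashMap<u32, String>> {
        self.table.read().expect("flow table lock poisoned")
    }
}
```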
- Remove duplicate methods used to invalidate flows
- Add one-line formatting for flow keys and flow info to simplify
  logging.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
When the stateful NAT config changes, we need to replace the
NAT allocator with a new one. While we build the new allocator
and "transfer" the reservations from the flow table, new flows
may still get allocations from the old allocator. Such
allocations should be reflected in the new allocator; otherwise,
the allocated ports could later be re-allocated by the new
allocator. There are multiple ways to solve this:

1) updating the allocator (instead of replacing it). This is
   complex with the current implementation.
2) locking the allocator to defer allocations until the new
   allocator is installed. This is problematic since we don't
   want to penalize the data path.
3) making the existing allocator invisible by pulling it out
   of the data path. This is the simplest option and the one
   chosen for now.
   Allocation attempts during the time window when the new
   allocator is being prepared will fail, but that's okay since
   we cannot know whether such allocations would be permitted
   under the new configuration anyway.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Quentin Monnet <qmo@qmon.net>
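A minimal sketch of option 3, assuming the data path reaches the allocator through a shared `Arc<Mutex<Option<_>>>` slot; `PortAllocator` and `AllocatorSlot` are hypothetical. The mutex is only held for the swap itself, not while the new allocator is built, and between `take()` and `install()` the slot is empty, so data-path allocations fail fast instead of racing with the re-reservation step.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

// Hypothetical port allocator over [next, end).
struct PortAllocator {
    next: u16,
    end: u16,
    used: HashSet<u16>,
}

impl PortAllocator {
    fn new(start: u16, end: u16) -> Self {
        Self { next: start, end, used: HashSet::new() }
    }

    // Hand out the next port that is not already reserved.
    fn allocate(&mut self) -> Option<u16> {
        while self.next < self.end {
            let p = self.next;
            self.next += 1;
            if self.used.insert(p) {
                return Some(p);
            }
        }
        None
    }

    // Mark a port used by a surviving flow as taken.
    fn reserve(&mut self, port: u16) -> bool {
        self.used.insert(port)
    }
}

// Slot shared with the data path. `None` = no allocator installed.
#[derive(Clone)]
struct AllocatorSlot(Arc<Mutex<Option<PortAllocator>>>);

impl AllocatorSlot {
    fn allocate(&self) -> Option<u16> {
        let mut guard = self.0.lock().unwrap();
        guard.as_mut().and_then(|a| a.allocate())
    }

    // Pull the old allocator out of the data path.
    fn take(&self) -> Option<PortAllocator> {
        self.0.lock().unwrap().take()
    }

    // Make the fully prepared new allocator visible.
    fn install(&self, alloc: PortAllocator) {
        *self.0.lock().unwrap() = Some(alloc);
    }
}
```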
Adds methods that allow iterating over the flow table
and executing a function or closure for each of the flows.
These methods are added to avoid exposing a lock on the entire
table. Some of the methods are not used yet and must be refined
before we use them.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
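A sketch of the closure-based approach, assuming a `RwLock`-protected map (the `FlowTable`, `for_each_flow` and `for_each_flow_mut` names here are hypothetical): the lock guard never leaves the method, and callers only see one flow at a time.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical flow table: flow id -> genid.
struct FlowTable {
    flows: RwLock<HashMap<u32, u64>>,
}

impl FlowTable {
    // Run `f` on every flow under the read lock.
    fn for_each_flow<F: FnMut(u32, u64)>(&self, mut f: F) {
        let guard = self.flows.read().expect("flow table lock poisoned");
        for (&id, &genid) in guard.iter() {
            f(id, genid);
        }
    }

    // Mutable variant, e.g. to bump each flow's genid.
    fn for_each_flow_mut<F: FnMut(u32, &mut u64)>(&self, mut f: F) {
        let mut guard = self.flows.write().expect("flow table lock poisoned");
        for (&id, genid) in guard.iter_mut() {
            f(id, genid);
        }
    }
}
```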
... write the NAT methods to check the flow table using the
indirect helpers.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
We have several iterators that iterate over all VPCs and their
peerings. When we iterate over all VPCs, there's no point in
checking both the local and remote exposes of each peering, since
we would otherwise visit them twice.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
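A small illustration of the double-visit problem, using a hypothetical `Peering`/`Vpc` model in which each peering appears once in each of its two VPCs with local and remote swapped. Checking only the local side counts each masquerading expose once; also checking the remote side counts it twice.

```rust
// Hypothetical model: a peering as seen from one VPC.
struct Peering {
    local_masquerade: bool,  // local expose uses masquerading
    remote_masquerade: bool, // remote expose uses masquerading
}

struct Vpc {
    peerings: Vec<Peering>,
}

// Checks only the local side, so every expose is counted exactly once.
fn count_masquerading_exposes(vpcs: &[Vpc]) -> usize {
    vpcs.iter()
        .flat_map(|vpc| vpc.peerings.iter())
        .filter(|p| p.local_masquerade)
        .count()
}

// For contrast: checking both sides double-counts mirrored peerings.
fn count_both_sides(vpcs: &[Vpc]) -> usize {
    vpcs.iter()
        .flat_map(|vpc| vpc.peerings.iter())
        .map(|p| p.local_masquerade as usize + p.remote_masquerade as usize)
        .sum()
}
```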
Avoid calling expect() on a bug condition.
Ideally, such conditions would be detected at validation time,
but this requires significant refactoring at the moment.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Rename num_masquerading_peerings to has_masquerading_peerings and
return a bool instead of a count.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
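A sketch of the count-to-bool change over a hypothetical `Vpc`/`Peering` model (the real signature of `has_masquerading_peerings` may differ): `Iterator::any` stops at the first match instead of counting every peering.

```rust
// Hypothetical peering view: whether the local expose masquerades.
struct Peering {
    local_masquerade: bool,
}

struct Vpc {
    peerings: Vec<Peering>,
}

// Returns true as soon as one masquerading peering is found.
fn has_masquerading_peerings(vpcs: &[Vpc]) -> bool {
    vpcs.iter()
        .flat_map(|vpc| vpc.peerings.iter())
        .any(|p| p.local_masquerade)
}
```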
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 2 comments.

Comment thread net/src/packet/mod.rs
Comment thread nat/src/stateful/allocator_writer.rs
Member

@qmonnet qmonnet left a comment


Looks good, but please make sure to follow up with the tests and fixes for the potential race in the flow table + allocator update, as we discussed. Thanks a lot!

@Fredi-raspall Fredi-raspall added this pull request to the merge queue Apr 23, 2026
Merged via the queue into main with commit 040a012 Apr 23, 2026
34 checks passed
@Fredi-raspall Fredi-raspall deleted the pr/fredi/stateful_nat_fixes branch April 23, 2026 10:16

Labels

area/nat Related to Network Address Translation (NAT) ci:+release Enable VLAB release tests ci:+vlab Enable VLAB tests


3 participants