Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nat to nat #146

Merged
merged 7 commits into from
Apr 21, 2021
Merged

Nat to nat #146

merged 7 commits into from
Apr 21, 2021

Conversation

JulienVdG
Copy link
Contributor

My approach is highly inspired from @squat comments in #109, with a difference on the annotation usage:
The annotation is note used for a node to present it's own endpoint detected by other, but the other way around: a node published the NAT endpoints it detected.

This has 2 advantages:

  • the node detecting the endpoint will only update it's own annotation (and can use the existing code for this)
  • if several nodes detect different values, they won't toggle the same annotation but instead publish each their own view.

Then I updated the topology code to resolve the detected endpoint so that the local node will use :

  • the endpoint it detected itself (to avoid changing a working configuration)
  • the endpoint detected by a node in the topology (looking segments and nodes in sorted order to get a stable output)

I tested this on a cluster with public nodes and 2 NAT regions and the NAT regions could then see each-other.

The code probably still needs polishing.

@JulienVdG
Copy link
Contributor Author

@remyleone if you want to have a look.

pkg/wireguard/conf.go Outdated Show resolved Hide resolved
Copy link
Owner

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JulienVdG 😍 thank you for taking a look at this issue. I definitely love the approach. I think that keeping a map of discovered endpoints in each node is great and will indeed help maintain stability in the cluster!

Also, thank you for adding the tests! I know these are very verbose and daunting.

One question: don't we need to remove the requirement in node.Ready for the node to have an endpoint?
Edit: as commented in #109 (comment), we don't need to modify node.Ready

pkg/wireguard/conf.go Outdated Show resolved Hide resolved
pkg/mesh/topology.go Outdated Show resolved Hide resolved
pkg/mesh/topology.go Show resolved Hide resolved
pkg/mesh/mesh.go Outdated Show resolved Hide resolved
pkg/mesh/mesh.go Outdated Show resolved Hide resolved
pkg/mesh/mesh.go Outdated
updateNATEndpoints(nodes, peers, oldConf)
natEndpoints := updateNATEndpoints(nodes, peers, oldConf, m.logger)
nodes[m.hostname].DiscoveredEndpoints = natEndpoints
m.nodes[m.hostname].DiscoveredEndpoints = natEndpoints
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodes is a map of pointers to Nodes, so I think updating the nodes map should suffice and we should not need to update m.nodes as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The applyTopology function starts by doing shallow copies of the Ready Nodes ...
Updating nodes will provide the updated endpoint to NewTopology.
Updating m.nodes ensure that next call to checkIn will publish the DiscoveredEndpoints to k8s.

There are others way to do it, but this looked easy enough.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I definitely agree that we want both nodes[m.hostname] and m.nodes[m.hostname] to include the updated endpoints so that the checkIn method can update the endpoints annotation.

What I don't see is why we need to explicitly set .DiscoveredEndpoints = natEndpoints for both of them. Since nodes is a shallow copy, updating nodes[m.hostname] will also update m.nodes[m.hostname], no?

e.g. https://play.golang.com/p/JXCE1jnUN-c

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the copy is not a pointer copy but a shallow copy so more like:
https://play.golang.com/p/istIdrj__0_P

for it to work I need to make the endpoint list a pointer so that the copy points to the same map
eg: https://play.golang.com/p/M2uunAQXRyI

I can do that if you think it's better that way.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're totally right 🤦

// Make a shallow copy of the node.

This lgtm then

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this some more, it occurs to me that the only reason we really had to do the shallow copies was to prevent the endpoints of the nodes to be overwritten by the updateNATEndpoints func, which used to mutate the parameters. Now that the func no longer mutates data we could probably switch from shallow copies to just copying the pointers. We can do this in a follow up however

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

n.Endpoint = peer.Endpoint
level.Debug(logger).Log("msg", "WireGuard Update NAT Endpoint", "node", n.Name, "endpoint", peer.Endpoint, "former-endpoint", n.Endpoint, "same", n.Endpoint.Equal(peer.Endpoint))
// Only public ip ? (should check location leader but only in topology ... or have topology handle that list)
// Better check wg latest-handshake
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea is to only set the endpoint, if the last handshake is not older than some constant, instead of how it is now?

So nodes that have an outdated endpoint will eventually forget about it and get the updated endpoint from another nodes' annotation?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a great feature to add eventually

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bare minimum is when latest-handshake is 0 wireguard didn't see any connection yet, so better not advertise it (anyway in that cas it's still equal to n.Endpoint). Then yes we could detect outdated endpoint.

Also in topology instead of a static ordering we could sort by latest-handshake, the fact that we give priority to the current detected endpoint already ensure stability, but having more noise would allow to try the different endpoints one node could have (in case of really complicated network)

The cost however in another call to wg (ie wg show kilo0 latest-handshakes) as the wg showconf call will not return the latest-handshake

Makefile Outdated
@@ -17,6 +17,10 @@ PROJECT := kilo
PKG := github.com/squat/$(PROJECT)
REGISTRY ?= index.docker.io
IMAGE ?= squat/$(PROJECT)
ifneq ($(REGISTRY),index.docker.io)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind splitting this up into a new PR so we can keep each chunk of work separate and still squash the commits related to NAT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done #147

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed the drop from this PR, sorry...

Now that updateNATEndpoints was updated to discoverNATEndpoints and that
the endpoints are overridden by topology instead of mutating the nodes and
peers object, we can safely drop this copy.
@JulienVdG
Copy link
Contributor Author

This one should be fine now, do you want me to squash the commits ?

Copy link
Owner

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Copy link
Owner

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to squash in the PR, I'll squash now while merging 🛸✈️🚀🚢

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants