Nat to nat #146
Conversation
@remyleone if you want to have a look.
@JulienVdG 😍 thank you for taking a look at this issue. I definitely love the approach. I think that keeping a map of discovered endpoints in each node is great and will indeed help maintain stability in the cluster!
Also, thank you for adding the tests! I know these are very verbose and daunting.
One question: don't we need to remove the requirement in `node.Ready` for the node to have an endpoint?
Edit: as commented in #109 (comment), we don't need to modify `node.Ready`.
pkg/mesh/mesh.go
Outdated
-	updateNATEndpoints(nodes, peers, oldConf)
+	natEndpoints := updateNATEndpoints(nodes, peers, oldConf, m.logger)
+	nodes[m.hostname].DiscoveredEndpoints = natEndpoints
+	m.nodes[m.hostname].DiscoveredEndpoints = natEndpoints
`nodes` is a map of pointers to `Node`s, so I think updating the `nodes` map should suffice and we should not need to update `m.nodes` as well.
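Since Go copies map entries of pointer type by copying the pointer, both maps end up referring to the same `Node` values. A minimal sketch of that semantics (the `Node` type and its field here are illustrative stand-ins, not Kilo's actual structs):

```go
package main

import "fmt"

// Node is an illustrative stand-in for Kilo's mesh Node type.
type Node struct {
	Endpoint string
}

func main() {
	// m.nodes is a map of *pointers* to Nodes.
	mNodes := map[string]*Node{"host-a": {Endpoint: "10.0.0.1:51820"}}

	// Copying the map entries copies the pointers, not the Nodes.
	nodes := map[string]*Node{}
	for k, v := range mNodes {
		nodes[k] = v
	}

	// Mutating through the copied map is visible through the original map,
	// because both entries point at the same Node.
	nodes["host-a"].Endpoint = "192.0.2.7:51820"
	fmt.Println(mNodes["host-a"].Endpoint) // 192.0.2.7:51820
}
```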
The `applyTopology` function starts by doing shallow copies of the Ready Nodes...
Updating `nodes` will provide the updated endpoint to `NewTopology`.
Updating `m.nodes` ensures that the next call to `checkIn` will publish the `DiscoveredEndpoints` to k8s.
There are other ways to do it, but this looked easy enough.
Yes, I definitely agree that we want both `nodes[m.hostname]` and `m.nodes[m.hostname]` to include the updated endpoints so that the `checkIn` method can update the endpoints annotation.
What I don't see is why we need to explicitly set `.DiscoveredEndpoints = natEndpoints` for both of them. Since `nodes` is a shallow copy, updating `nodes[m.hostname]` will also update `m.nodes[m.hostname]`, no?
Well, the copy is not a pointer copy but a shallow copy, so it's more like this:
https://play.golang.com/p/istIdrj__0_P
For it to work, I would need to make the endpoint list a pointer so that the copy points to the same map, e.g.:
https://play.golang.com/p/M2uunAQXRyI
I can do that if you think it's better that way.
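The distinction being discussed can be sketched as follows: a shallow struct copy does not share value fields, but it does share reference-type fields such as maps. The struct below is an illustrative stand-in, not Kilo's actual `Node` type:

```go
package main

import "fmt"

// Node is illustrative only: one value field and one map field
// (maps are reference types in Go, so shallow copies share them).
type Node struct {
	Endpoint            string
	DiscoveredEndpoints map[string]string
}

func main() {
	orig := &Node{Endpoint: "a:51820", DiscoveredEndpoints: map[string]string{}}

	copied := *orig // shallow copy of the struct value

	// A value field set on the copy does NOT reach the original...
	copied.Endpoint = "b:51820"
	fmt.Println(orig.Endpoint) // a:51820

	// ...but writes through the shared map field DO.
	copied.DiscoveredEndpoints["peer"] = "c:51820"
	fmt.Println(orig.DiscoveredEndpoints["peer"]) // c:51820
}
```

This is why making the endpoint list a map (or a pointer) would let the shallow copy and the original observe the same updates.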
Thinking about this some more, it occurs to me that the only reason we really had to do the shallow copies was to prevent the endpoints of the nodes from being overwritten by the `updateNATEndpoints` func, which used to mutate the parameters. Now that the func no longer mutates data, we could probably switch from shallow copies to just copying the pointers. We can do this in a follow-up, however.
Done
n.Endpoint = peer.Endpoint
level.Debug(logger).Log("msg", "WireGuard Update NAT Endpoint", "node", n.Name, "endpoint", peer.Endpoint, "former-endpoint", n.Endpoint, "same", n.Endpoint.Equal(peer.Endpoint))
// Only public ip? (should check location leader but only in topology ... or have topology handle that list)
// Better check wg latest-handshake
So the idea is to only set the endpoint if the last handshake is not older than some constant, instead of how it is now?
So nodes that have an outdated endpoint will eventually forget about it and get the updated endpoint from another node's annotation?
That would be a great feature to add eventually
The bare minimum is: when `latest-handshake` is 0, WireGuard hasn't seen any connection yet, so better not to advertise it (in that case it's still equal to `n.Endpoint` anyway). Then yes, we could detect outdated endpoints.
Also, in the topology, instead of a static ordering we could sort by `latest-handshake`. The fact that we give priority to the currently detected endpoint already ensures stability, but adding more noise would allow trying the different endpoints one node could have (in case of a really complicated network).
The cost, however, is another call to `wg` (i.e. `wg show kilo0 latest-handshakes`), as the `wg showconf` call will not return the latest handshake.
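A hedged sketch of how that extra call's output could be consumed, assuming the standard `wg show <iface> latest-handshakes` format of one `<public key>\t<unix seconds>` line per peer; the function name and the decision logic are illustrative, not Kilo's code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLatestHandshakes parses the output of
// `wg show <iface> latest-handshakes`: one line per peer in the form
// "<base64 public key>\t<unix seconds>". A timestamp of 0 means
// WireGuard has not completed a handshake with that peer yet.
func parseLatestHandshakes(out string) map[string]int64 {
	handshakes := map[string]int64{}
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue
		}
		ts, err := strconv.ParseInt(fields[1], 10, 64)
		if err != nil {
			continue
		}
		handshakes[fields[0]] = ts
	}
	return handshakes
}

func main() {
	// In practice this output would come from something like
	// exec.Command("wg", "show", "kilo0", "latest-handshakes").
	sample := "keyOne=\t1600000000\nkeyTwo=\t0\n"
	for key, ts := range parseLatestHandshakes(sample) {
		if ts == 0 {
			fmt.Printf("peer %s: no handshake yet, skip advertising\n", key)
		} else {
			fmt.Printf("peer %s: last handshake at %d\n", key, ts)
		}
	}
}
```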
Makefile
Outdated
@@ -17,6 +17,10 @@ PROJECT := kilo
PKG := github.com/squat/$(PROJECT)
REGISTRY ?= index.docker.io
IMAGE ?= squat/$(PROJECT)
ifneq ($(REGISTRY),index.docker.io)
Would you mind splitting this up into a new PR so we can keep each chunk of work separate and still squash the commits related to NAT?
done #147
I missed the drop from this PR, sorry...
Compare IP first by default and compare DNS name first when we know the Endpoint was resolved.
Now that updateNATEndpoints was renamed to discoverNATEndpoints and the endpoints are overridden by the topology instead of mutating the nodes and peers objects, we can safely drop this copy.
This one should be fine now, do you want me to squash the commits?
Great work!
No need to squash in the PR, I'll squash now while merging 🛸
My approach is highly inspired by @squat's comments in #109, with a difference in the annotation usage:
The annotation is not used for a node to present its own endpoint as detected by others, but the other way around: a node publishes the NAT endpoints it detected.
This has 2 advantages:
Then I updated the topology code to resolve the detected endpoint so that the local node will use:
I tested this on a cluster with public nodes and 2 NAT regions, and the NAT regions could then see each other.
The code probably still needs polishing.
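The publishing direction described above could be sketched roughly as follows. Apart from `DiscoveredEndpoints`, which appears in the diff, all names and the lookup logic here are illustrative assumptions, not Kilo's actual API:

```go
package main

import "fmt"

// Node is an illustrative stand-in for a mesh node.
type Node struct {
	Name string
	// DiscoveredEndpoints holds the endpoints this node observed for its
	// NAT'd peers, keyed here by an illustrative peer identifier. The idea
	// is that the node publishes this map via a k8s annotation on check-in,
	// rather than writing onto the peers' own objects.
	DiscoveredEndpoints map[string]string
}

// endpointFor prefers the peer's own configured endpoint for stability,
// and only falls back to what some other node discovered for it.
func endpointFor(peerKey, configured string, observers []*Node) string {
	if configured != "" {
		return configured
	}
	for _, o := range observers {
		if ep, ok := o.DiscoveredEndpoints[peerKey]; ok {
			return ep
		}
	}
	return ""
}

func main() {
	publicNode := &Node{
		Name:                "public",
		DiscoveredEndpoints: map[string]string{"nat-peer": "203.0.113.9:51820"},
	}
	// A NAT'd peer with no configured endpoint gets the discovered one.
	fmt.Println(endpointFor("nat-peer", "", []*Node{publicNode}))
}
```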