Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rehandshaking #838

Merged
merged 11 commits into from May 4, 2023
Merged

Rehandshaking #838

merged 11 commits into from May 4, 2023

Conversation

nbrownus
Copy link
Collaborator

@nbrownus nbrownus commented Mar 31, 2023

Nebula can soft-reload its host certificate on a HUP signal. When it reloads a certificate, any new peer tunnels created will use the updated certificate. However, any previously established tunnels will not be updated with this new certificate information.

With this change, Nebula will detect when an older certificate is in use on an existing tunnel, and rebuild the tunnel with the newly updated host certificate. Once hosts have established a new tunnel, packet traversal is migrated to use the new tunnel, and the old tunnel is torn down when its lack of use is detected.

This PR also includes migrating relay information from the previously established tunnel to the new tunnel. A relay established over an older tunnel will be migrated over to the new tunnel, ensuring that the old tunnel can be torn down from lack of use.

@nbrownus nbrownus force-pushed the rehandshake branch 3 times, most recently from 119b60c to a4fcdfe Compare April 4, 2023 18:45
@nbrownus nbrownus added this to the v1.7.0 milestone Apr 10, 2023
@wadey wadey self-assigned this Apr 10, 2023
@nbrownus nbrownus marked this pull request as ready for review April 11, 2023 15:58
@jasikpark
Copy link
Collaborator

how is age of the cert determined?

@salesforce-cla
Copy link

Thanks for the contribution! Before we can merge this, we need @brad-defined to sign the Salesforce.com Contributor License Agreement.

@wadey wadey added needs-slack-review Needs review from a Slack team member needs-defined-net-review Review needed from a Defined Networking team member labels Apr 24, 2023
connection_manager.go Outdated Show resolved Hide resolved
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
Copy link
Member

@wadey wadey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM and working live


switch decision {
case deleteTunnel:
n.hostMap.DeleteHostInfo(hostinfo)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use n.intf.closeTunnel(hostinfo) here which also does f.lightHouse.DeleteVpnIp ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't want to use that one specifically but I could see a good argument for dropping it from the lighthouse cache. The only time we hit this case today is when we aren't receiving traffic even after sending a test packet and waiting some time.

I can spin another PR pretty quick to take care of that after merging this.

Comment on lines -281 to +283
f.SendMessageToVpnIp(header.Control, 0, target, msg, make([]byte, 12), make([]byte, mtu))
f.SendMessageToHostInfo(header.Control, 0, peer, msg, make([]byte, 12), make([]byte, mtu))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this a bug that it was target before?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was mostly not a bug? idk. But I think it's better this way.

'peer' is a HostInfo object, and 'target' is the VPN IP address of that HostInfo, so conceptually it's sending the message to the same target. Note the function call also changed from SendMessageToVpnIp to SendMessageToHostInfo. The difference is that SendMessageToVpnIp will always send a message to the top HostInfo in its list, while SendMessageToHostInfo will send using a specific HostInfo that may or may not be the top one in the list.

We now have multiple HostInfos for the same VPN IP in the main hostmap, and we want to be 100% sure that we're sending the CreateRelayRequest over the tunnel associated with the HostInfo on which we're storing relay state (in the AddRelay() call up in line 264).

If a new HostInfo were to successfully handshake and move to the front of the list after AddRelay() but before SendMessageToVpnIP(), then using SendMessageToVpnIp (which always sends to the top HostInfo in the list) we'd send the CreateRelayRequest to the peer's HostInfo for which our corresponding HostInfo lacked the Relay state. Oof, hope that makes sense.

@wadey wadey removed the needs-slack-review Needs review from a Slack team member label May 4, 2023
@nbrownus nbrownus merged commit 03e4a7f into master May 4, 2023
6 checks passed
@nbrownus nbrownus deleted the rehandshake branch May 4, 2023 20:16
gabos31 pushed a commit to oneclick-ag/nebula that referenced this pull request Jul 4, 2023
Co-authored-by: Brad Higgins <brad@defined.net>
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:signed needs-defined-net-review Review needed from a Defined Networking team member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants