Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix relay #827

Merged
merged 10 commits into from Mar 30, 2023
Merged

Fix relay #827

merged 10 commits into from Mar 30, 2023

Conversation

brad-defined
Copy link
Collaborator

Relay creation may race, meaning a host that has sent a CreateRelayRequest may receive a CreateRelayRequest message for the exact same tunnel, before receiving a CreateRelayResponse. When this happens, a relay's locally stored remote index will be 0. When a corresponding CreateRelayRequest arrives, it'll have a non-zero value for that remote index. The current code on Master detects this mismatch as a broken relay, and tears down the relay state. Destroying this relay state breaks all possible connectivity for the tunnel.

To work in all cases, once a relay's index is created and advertised (in a CreateRelayRequest), it should never be torn down until the associated HostInfo is destroyed. This PR updates the handling of CreateRelayRequests to avoid tearing down relay state in these detected mismatch cases.

I also added some synchronization around the Relay struct, using an atomic pointer.

I edited an existing test that could reproduce errors on Master, but which runs correctly with these updates. (Nate made the test.)

@brad-defined
Copy link
Collaborator Author

Pushed some updates:

(1) Sometimes a HostInfo that is not the main HostInfo sees traffic, which prevents it from being deleted by ConnectionManager. Since no more outgoing traffic is sent over this non-main HostInfo, it never cycles back through ConnectionManager. PR updates ConnectionManager to place a HostInfo back onto the watch list if traffic is seen as received during the deletion phase.
(2) When tearing down all tunnels, we must iterate all Indexes, not all Hosts (which will not return non-main HostInfos).
(3) Included a test that @nbrownus wrote that sussed out some races through relays
(4) Sped up some deletion timers during tests to 4 seconds, in order to make tests that wait for hostinfos to be removed to run faster.
(5) Included some locks around hostmap access during tests, which occasionally raced with Nebula code accessing the same main host maps.
(6) Modified HostInfo teardown code to take multiple host infos into account. Only removes Relay state if the HostInfo being deleted is the final HostInfo for that vpnIp in the hostmap.
(7) Updated tests to not return a copy of RelayState, as this could copy a locked mutex and deadlock the tests.

hostmap.go Outdated Show resolved Hide resolved
outside.go Outdated Show resolved Hide resolved
outside.go Outdated Show resolved Hide resolved
relay_manager.go Outdated Show resolved Hide resolved
@@ -88,17 +88,14 @@ func AddRelay(l *logrus.Logger, relayHostInfo *HostInfo, hm *HostMap, vpnIp iput

// EstablishRelay updates a Requested Relay to become an Established Relay, which can pass traffic.
func (rm *relayManager) EstablishRelay(relayHostInfo *HostInfo, m *NebulaControl) (*Relay, error) {
relay, ok := relayHostInfo.relayState.QueryRelayForByIdx(m.InitiatorRelayIndex)
relay, ok := relayHostInfo.relayState.CompleteRelayByIdx(m.InitiatorRelayIndex, m.ResponderRelayIndex)
if !ok {
rm.l.WithFields(logrus.Fields{"relayHostInfo": relayHostInfo.vpnIp,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think relayHostInfo -> relayVpnIp.

Probably would be good to do a separate pass to make sure all logs are using the same field names.

@nbrownus nbrownus merged commit 2801fb2 into slackhq:master Mar 30, 2023
6 checks passed
@wadey wadey added this to the v1.7.0 milestone Apr 5, 2023
gabos31 pushed a commit to oneclick-ag/nebula that referenced this pull request Jul 4, 2023
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants