Fix re-entrant GetOrHandshake issues #1044

Merged
nbrownus merged 5 commits into master from reentrant-getorhandshake on Dec 19, 2023

Collaborator

@nbrownus nbrownus commented Dec 17, 2023

@wadey hit a deadlock with v1.8.0 and was able to pull the stack trace. HandshakeManager.GetOrHandshake can be re-entered via HandshakeManager.StartHandshake through a call to hm.lightHouse.QueryServer(), so the read lock on the main hostmap is taken twice on the same call path. A double read lock is undesirable on its own, but here the ConnectionManager goroutine had fired between the first and second calls to HandshakeManager.GetOrHandshake and was waiting on a write lock for the main hostmap. A pending write lock blocks any future read locks, so the second read lock could never be acquired and the first was never released.
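The locking behavior described above can be reproduced in isolation. The following is a minimal sketch, not nebula code: the mutex stands in for the main hostmap lock, the spawned goroutine stands in for ConnectionManager, and `nestedReadLockWouldBlock` is a hypothetical name. It uses `TryRLock` (Go 1.18+) in place of a real nested `RLock` so the demonstration reports the problem instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// nestedReadLockWouldBlock reports whether a second read lock, attempted
// while the first is still held, is refused once a writer is queued.
// A plain RLock in that position would block forever.
func nestedReadLockWouldBlock() bool {
	var mu sync.RWMutex // stands in for the main hostmap lock

	mu.RLock() // first read lock (the outer GetOrHandshake call in the analogy)

	writerDone := make(chan struct{})
	go func() {
		defer close(writerDone)
		mu.Lock() // writer queues behind the active reader
		mu.Unlock()
	}()

	// Poll until the writer is queued. While no writer is pending,
	// TryRLock succeeds and we release that extra read lock right away.
	blocked := false
	deadline := time.Now().Add(2 * time.Second)
	for time.Now().Before(deadline) {
		if !mu.TryRLock() {
			blocked = true // a nested RLock here would deadlock
			break
		}
		mu.RUnlock()
		time.Sleep(time.Millisecond)
	}

	mu.RUnlock() // release the first read lock so the writer can finish
	<-writerDone
	return blocked
}

func main() {
	fmt.Println("nested read lock would block:", nestedReadLockWouldBlock())
}
```

The `sync.RWMutex` documentation prohibits recursive read locking for exactly this reason: once a writer is waiting, new readers are held back until the existing readers release.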

This is fixed by adding a channel and handling the actual lighthouse queries in a goroutine. It should also speed up the hot path when many handshakes are occurring.

There is also a double read lock in ConnectionManager, which occurs when a tunnel is being tested and is using a relay.

This is fixed by turning the test packet into a traffic decision result and handling it outside of the read lock.
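The general pattern, deciding under the read lock and acting after it is released, might look like the sketch below. All names here (`trafficDecision`, `tunnelState`, `manager`, `decide`, `check`) and the decision rules are hypothetical simplifications, not the actual ConnectionManager logic.

```go
package main

import (
	"fmt"
	"sync"
)

// trafficDecision is a hypothetical result type: what to do with a tunnel.
type trafficDecision int

const (
	doNothing trafficDecision = iota
	sendTestPacket
	deleteTunnel
)

type tunnelState struct {
	sawTraffic  bool
	pendingTest bool
}

type manager struct {
	mu      sync.RWMutex
	tunnels map[uint32]tunnelState
}

// decide only reads state under the read lock and returns a decision.
// It performs no action that could need the lock again.
func (m *manager) decide(id uint32) trafficDecision {
	m.mu.RLock()
	defer m.mu.RUnlock()
	t, ok := m.tunnels[id]
	switch {
	case !ok, t.sawTraffic:
		return doNothing
	case t.pendingTest:
		return deleteTunnel
	default:
		return sendTestPacket
	}
}

// check acts on the decision after the read lock has been released,
// so any locking done by the action is never nested inside it.
func (m *manager) check(id uint32) string {
	switch m.decide(id) {
	case sendTestPacket:
		m.mu.Lock()
		t := m.tunnels[id]
		t.pendingTest = true
		m.tunnels[id] = t
		m.mu.Unlock()
		return "sent test packet"
	case deleteTunnel:
		m.mu.Lock()
		delete(m.tunnels, id)
		m.mu.Unlock()
		return "deleted tunnel"
	default:
		return "nothing to do"
	}
}

func main() {
	m := &manager{tunnels: map[uint32]tunnelState{1: {}}}
	for i := 0; i < 3; i++ {
		fmt.Println(m.check(1))
	}
}
```

Returning a value instead of acting in place keeps the locked section short and makes it obvious which code runs with the lock held.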

My primary concern is handling the QueryServer writes on a nonblocking buffered channel. I think we will want to block when the channel is full, but I am leaving it nonblocking for now, for review.
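For reference, the two options under discussion differ only in how the send is written. The sketch below uses hypothetical helper names (`tryEnqueue`, `enqueue`) and a plain `uint32` in place of a real vpn IP type.

```go
package main

import "fmt"

// tryEnqueue is the nonblocking variant: when the buffer is full the
// request is dropped and the caller is told so.
func tryEnqueue(ch chan<- uint32, ip uint32) bool {
	select {
	case ch <- ip:
		return true
	default:
		return false
	}
}

// enqueue is the blocking variant: when the buffer is full the caller
// waits until the consumer makes room.
func enqueue(ch chan<- uint32, ip uint32) {
	ch <- ip
}

func main() {
	ch := make(chan uint32, 2)
	for ip := uint32(1); ip <= 3; ip++ {
		// The third send finds the buffer full and is dropped.
		fmt.Println(ip, tryEnqueue(ch, ip))
	}
	<-ch           // the consumer frees one slot
	enqueue(ch, 4) // returns immediately because there is room again
	fmt.Println(len(ch))
}
```

The nonblocking form never stalls the caller but can silently lose requests; the blocking form applies backpressure at the cost of stalling the caller while the consumer is behind.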

@nbrownus nbrownus changed the title from "Dont hold a read lock on the main hostmap when starting a handshake" to "Fix re-entrant GetOrHandshake issues" Dec 18, 2023
wadey
wadey previously approved these changes Dec 18, 2023
Member

@wadey wadey left a comment

approved with a small comment

lighthouse.go Outdated
Comment on lines 473 to 475
if lh.l.Level >= logrus.DebugLevel {
	lh.l.WithField("vpnIp", ip).Debug("Lighthouse query buffer was full, dropping request")
}
Member

I wonder if this should be higher than debug, since without debug logs on it would be hard to tell this is happening and that you need to increase the buffer.

Collaborator Author

I think it might be better to just make this a blocking write to a buffered channel.

wadey
wadey previously approved these changes Dec 18, 2023
@wadey wadey added this to the v1.8.1 milestone Dec 18, 2023
Collaborator

@brad-defined brad-defined left a comment

I might have misread it, but I think the implementation may block when the channel is full.

@nbrownus nbrownus merged commit 072edd5 into master Dec 19, 2023
7 checks passed
@nbrownus nbrownus deleted the reentrant-getorhandshake branch December 19, 2023 17:58
@wadey wadey self-assigned this Apr 11, 2024