refactor(connlib): delay initialization of Sockets until we have a tokio runtime #4286

Merged (7 commits) on Mar 25, 2024


@jamilbk (Member) commented Mar 24, 2024

Our sockets need to be initialized within a tokio runtime context. To achieve this, Sockets::new no longer initializes anything. Instead, we call rebind within the constructor of Tunnel, which already runs in a tokio context.

Fixes: #4282
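A minimal sketch of the deferred-initialization pattern described above, under simplifying assumptions: it uses blocking `std::net::UdpSocket` in place of connlib's tokio-backed sockets, and the `Sockets` and `Tunnel` types here are hypothetical stand-ins, not the real connlib definitions. The point it shows is that `Sockets::new()` performs no I/O, so it can be called anywhere, while binding only happens in `rebind()`, which `Tunnel::new` calls once it is running inside the runtime.

```rust
use std::io;
use std::net::{Ipv4Addr, Ipv6Addr, SocketAddr, UdpSocket};

/// Hypothetical, simplified stand-in for connlib's `Sockets`.
#[derive(Default)]
pub struct Sockets {
    socket_v4: Option<UdpSocket>,
    socket_v6: Option<UdpSocket>,
}

impl Sockets {
    /// Performs no I/O, so it is safe to call before any runtime exists.
    pub fn new() -> Self {
        Self::default()
    }

    /// Binds fresh sockets on OS-assigned ports (port 0), replacing old ones.
    pub fn rebind(&mut self) -> io::Result<()> {
        self.socket_v4 = Some(UdpSocket::bind((Ipv4Addr::UNSPECIFIED, 0))?);
        // IPv6 may be unavailable on some hosts; treat that as "no v6 socket".
        self.socket_v6 = UdpSocket::bind((Ipv6Addr::UNSPECIFIED, 0)).ok();
        Ok(())
    }

    pub fn local_addr_v4(&self) -> Option<SocketAddr> {
        self.socket_v4.as_ref().and_then(|s| s.local_addr().ok())
    }
}

/// Hypothetical `Tunnel`: in connlib its constructor already runs inside a
/// tokio runtime, so this is where the deferred initialization is triggered.
pub struct Tunnel {
    sockets: Sockets,
}

impl Tunnel {
    pub fn new(mut sockets: Sockets) -> io::Result<Self> {
        sockets.rebind()?;
        Ok(Self { sockets })
    }
}
```

With this split, constructing `Sockets::new()` early (for example while setting up the `Session`) is harmless; only `Tunnel::new` touches the network.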

vercel bot commented Mar 24, 2024

1 ignored deployment: firezone (status: Ignored), updated Mar 25, 2024 10:41pm (UTC)

github-actions bot commented Mar 24, 2024

Terraform Cloud Plan Output

Plan: 9 to add, 8 to change, 9 to destroy.

github-actions bot commented Mar 24, 2024

Performance Test Results

TCP

| Test Name | Received/s | Sent/s | Retransmits |
|---|---|---|---|
| direct-tcp-client2server | 221.0 MiB (-2%) | 222.4 MiB (-2%) | 169 (-18%) |
| direct-tcp-server2client | 227.4 MiB (+0%) | 229.2 MiB (+0%) | 216 (-49%) |
| relayed-tcp-client2server | 145.0 MiB (-4%) | 145.6 MiB (-4%) | 122 (-36%) |
| relayed-tcp-server2client | 154.0 MiB (+2%) | 154.4 MiB (+2%) | 175 (+1%) |

UDP

| Test Name | Total/s | Jitter | Lost |
|---|---|---|---|
| direct-udp-client2server | 50.0 MiB (-0%) | 0.05ms (-85%) | 0.00% (NaN%) |
| direct-udp-server2client | 50.0 MiB (+0%) | 0.01ms (-6%) | 0.00% (NaN%) |
| relayed-udp-client2server | 50.0 MiB (-0%) | 0.17ms (+92%) | 0.00% (NaN%) |
| relayed-udp-server2client | 50.0 MiB (-0%) | 0.05ms (-2%) | 0.00% (NaN%) |

@jamilbk (Member, Author) commented Mar 24, 2024

@thomaseizinger I think you'd be much faster at wrapping this up, so I'll assign it to you. I went down the road of adding a new with_runtime closure to Session like you have with with_protect but got stuck. Not sure if that's the right approach.

I think Gabi pulled 2c2c617 out because it was causing CI failures.

We need to call this within a tokio runtime so we can't call it as
part of setting up the `Session` but need to delay it to `Tunnel::new`.
@thomaseizinger thomaseizinger changed the title refactor(connlib): Hide protect callback inside Sockets and ensure Sockets::new() is called within a tokio runtime refactor(connlib): delay initialization of Sockets until we have a tokio runtime Mar 24, 2024
@thomaseizinger (Member) commented:

@jamilbk I hope the patch I pushed fixes it! It also removes some duplication from the previous approach, which is nice :)

pub fn new() -> io::Result<Self> {
#[cfg(unix)]
pub fn with_protect(
protect: impl Fn(std::os::fd::RawFd) -> io::Result<()> + Send + 'static,
Review comment (Member):

@conectado As part of #4159, I want to move the initialization of the TUN device to the upper layers, i.e. outside of Tunnel. At that point, FIREZONE_MARK can be used entirely within the clients, and the Linux client can also use this callback to call set_mark on the socket, which unifies this behaviour across all platforms.
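To illustrate the callback-based approach discussed in this thread, here is a minimal, Unix-only sketch. It assumes a `with_protect` constructor shaped like the signature quoted above and again uses `std::net::UdpSocket` instead of the real tokio socket; the types are illustrative, not connlib's actual code. Every freshly bound socket's raw fd is handed to the callback before the socket is stored. On Android that callback would call `protect`; on Linux it could, hypothetically, set the firewall mark.

```rust
use std::io;
use std::net::{Ipv4Addr, UdpSocket};
use std::os::fd::{AsRawFd, RawFd};

/// Hypothetical callback type mirroring the `protect` field from the diff.
type Protect = Box<dyn Fn(RawFd) -> io::Result<()> + Send + 'static>;

pub struct Sockets {
    socket_v4: Option<UdpSocket>,
    protect: Protect,
}

impl Sockets {
    /// Stores the platform callback; no socket is created yet.
    pub fn with_protect(protect: impl Fn(RawFd) -> io::Result<()> + Send + 'static) -> Self {
        Self {
            socket_v4: None,
            protect: Box::new(protect),
        }
    }

    /// Binds a new socket and runs the platform callback on its fd.
    /// If the callback fails, the new socket is dropped and not stored.
    pub fn rebind(&mut self) -> io::Result<()> {
        let socket = UdpSocket::bind((Ipv4Addr::UNSPECIFIED, 0))?;
        (self.protect)(socket.as_raw_fd())?;
        self.socket_v4 = Some(socket);
        Ok(())
    }
}
```

Keeping the callback inside `Sockets` means every rebind goes through it automatically, so callers cannot forget to protect a newly created socket.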

Reply (Collaborator):

I'm in favor of this, in fact, I think making connlib completely platform-independent would be very nice.

Reply (Collaborator):

As long as it doesn't make the FFI / callbacks more complex just to do something that connlib already knows it needs to do.

@thomaseizinger thomaseizinger marked this pull request as ready for review March 24, 2024 12:38
@jamilbk (Member, Author) left a comment:

Thanks! I'll slot this into my PR #4133 and let the others review this week.

@jamilbk (Member, Author) commented Mar 24, 2024

Tested on Android (emulator 10), macOS, and iOS and this seems to fix the issue.

@@ -13,10 +13,47 @@ use crate::Result;
pub struct Sockets {
socket_v4: Option<Socket>,
socket_v6: Option<Socket>,

#[cfg(unix)]
protect: Box<dyn Fn(std::os::fd::RawFd) -> io::Result<()> + Send + 'static>,
@conectado (Collaborator) commented Mar 25, 2024:

I'm not that sure about this platform-dependent callback.

What if, instead of a protect callback, we had a create_socket callback that returns a socket each platform creates however it wants?

Another idea, which perhaps works better, is to receive a socket2::UdpSocket instead of a tokio::Socket: the client just sends the socket, and we create the Sockets and Socket when it is received.
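The create_socket alternative suggested above could look roughly like the following sketch. It is a hypothetical design, not something this PR implements: `MakeSocket` and `Sockets::new(make_socket)` are illustrative names, and `std::net::UdpSocket` stands in for whatever socket type connlib would actually accept. connlib keeps deciding when to (re)create sockets, while the platform-provided factory decides how.

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddr, SocketAddrV4, UdpSocket};

/// Hypothetical factory supplied by each platform: it creates (and, if
/// needed, protects or marks) a socket bound to the requested address.
pub type MakeSocket = Box<dyn Fn(SocketAddr) -> io::Result<UdpSocket> + Send + 'static>;

pub struct Sockets {
    make_socket: MakeSocket,
    socket_v4: Option<UdpSocket>,
}

impl Sockets {
    /// Only stores the factory; nothing is created yet.
    pub fn new(make_socket: MakeSocket) -> Self {
        Self {
            make_socket,
            socket_v4: None,
        }
    }

    /// connlib decides when to (re)create sockets; the factory decides how.
    pub fn rebind(&mut self) -> io::Result<()> {
        let addr = SocketAddr::V4(SocketAddrV4::new(Ipv4Addr::UNSPECIFIED, 0));
        self.socket_v4 = Some((self.make_socket)(addr)?);
        Ok(())
    }

    pub fn local_addr_v4(&self) -> Option<SocketAddr> {
        self.socket_v4.as_ref().and_then(|s| s.local_addr().ok())
    }
}
```

Compared with a protect callback, the factory moves all platform-specific socket setup out of connlib, at the cost of each client implementing socket creation itself.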

Reply (Member, Author):

@conectado Would it make sense to save this refactor for another PR, just to get these fixes into main? I believe @thomaseizinger is out this week and you are out next week, so maybe sometime after that.

Reply (Collaborator):

Yeah, I'm in favor of this (sorry, I missed this comment); fixing main ASAP seems more important :)

Reply (Member):

I think that is a great idea, @conectado! Despite some duplication across the clients, I think it is much cleaner to let them choose how to initialize their sockets. There is actually no reason why connlib should dictate the use of a random port; it should be up to the client to pick the port. That would make it trivial to, e.g., implement a PORT env variable on the gateway.
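As a rough illustration of the gateway picking its own port, here is a sketch of how a client binary might derive its listen port from a `PORT` environment variable before handing the socket to connlib. The variable name comes from the comment above; the helpers `port_from` and `bind_gateway_socket` are hypothetical. A missing or invalid value falls back to port 0, which keeps the current random-port behaviour.

```rust
use std::env;
use std::io;
use std::net::{Ipv4Addr, SocketAddrV4, UdpSocket};

/// Hypothetical helper: turn an optional `PORT` value into a port number.
/// Missing, non-numeric, or out-of-range values fall back to 0, which lets
/// the OS pick a random port.
pub fn port_from(value: Option<&str>) -> u16 {
    value
        .and_then(|v| v.trim().parse::<u16>().ok())
        .unwrap_or(0)
}

/// Hypothetical gateway-side setup: the client, not connlib, decides which
/// port to bind; the resulting socket would then be passed to connlib.
pub fn bind_gateway_socket() -> io::Result<UdpSocket> {
    let port = port_from(env::var("PORT").ok().as_deref());
    UdpSocket::bind(SocketAddrV4::new(Ipv4Addr::UNSPECIFIED, port))
}
```

Keeping the parsing in a pure function makes the fallback rules easy to test without touching the process environment.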

Reply (Member, Author):

Very cool 😎

@ReactorScram (Collaborator) left a comment:

Makes sense. Some of the last month's refactors have definitely been fighting each other, though: I just had to add that Sockets::new() to session.reconnect() in the GUI client on another PR branch, and now it's already going away again lol

rust/connlib/tunnel/src/io.rs (resolved)
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
@jamilbk jamilbk added this pull request to the merge queue Mar 25, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 25, 2024
@conectado conectado added this pull request to the merge queue Mar 25, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 25, 2024
@jamilbk (Member, Author) commented Mar 25, 2024

@ReactorScram I think this is failing to get merged because #4198 was merged first.

@jamilbk (Member, Author) commented Mar 25, 2024

Updating from main, just need to fix the semantic errors.

@AndrewDryga We finally hit a good example of the merge queue preventing two in-flight PRs from both landing in main and causing a semantic error.

@ReactorScram (Collaborator) commented:

Yeah, #4198 is the one I was thinking of, where I had to do session.reconnect(Sockets::new()).

@jamilbk jamilbk enabled auto-merge March 25, 2024 22:46
@jamilbk jamilbk added this pull request to the merge queue Mar 25, 2024
Merged via the queue into main with commit 2283898 Mar 25, 2024
138 checks passed
@jamilbk jamilbk deleted the fix/sockets-inside-tokio-runtime branch March 25, 2024 23:02
@thomaseizinger (Member) commented:

> Makes sense. Some of the last month of refactors have definitely been fighting each other though, I just had to add that Sockets::new() to session.reconnect() in the GUI client on another PR branch, and now it's already going away again lol

Yeah, sorry about that. What was merged in #4263 was actually an in-between design; see #4263 (comment).


Successfully merging this pull request may close these issues.

Sockets::new() crashes on Apple and Android
4 participants