
feat: Faster UDP/IO on Apple platforms #1993

Merged
djc merged 1 commit into quinn-rs:main from larseggert:feat-apple-datapath on Oct 25, 2024

Conversation

@larseggert
Contributor

@larseggert larseggert commented Sep 20, 2024

This uses Apple's private sendmsg_x and recvmsg_x system calls for multi-packet UDP I/O.

CC @mxinden

@Ralith
Collaborator

Ralith commented Sep 20, 2024

Is there interest in seeing TX support via sendmsg_x?

We found that sendmmsg-style batching offered little performance benefit and was considerably difficult to take advantage of. IIRC the _x functions on macOS have more to offer than that, though. Will this unblock segmentation offload or other incidental optimizations?

Comment thread quinn-udp/build.rs Outdated
@larseggert
Contributor Author

Bench on main:

test large_data_10_streams  ... bench:  27,558,791 ns/iter (+/- 13,459,810) = 380 MB/s
test large_data_1_stream    ... bench:  24,324,266 ns/iter (+/- 19,219,937) = 43 MB/s
test small_data_100_streams ... bench:  19,437,900 ns/iter (+/- 20,065,941)
test small_data_1_stream    ... bench:  11,465,128 ns/iter (+/- 8,699,934)

Bench with this PR:

test large_data_10_streams  ... bench:  28,829,216 ns/iter (+/- 15,924,956) = 363 MB/s
test large_data_1_stream    ... bench:  14,354,999 ns/iter (+/- 20,039,122) = 73 MB/s
test small_data_100_streams ... bench:  14,061,741 ns/iter (+/- 17,311,517)
test small_data_1_stream    ... bench:  19,194,441 ns/iter (+/- 5,012,070)

Surprised that large_data_10_streams and small_data_1_stream are slower...

@Ralith
Collaborator

Ralith commented Sep 23, 2024

Those tests tend to be extremely noisy, as the huge variance suggests. A targeted quinn-udp benchmark might be more useful.

@larseggert
Contributor Author

We've also found on neqo that multi-packet RX without multi-packet TX has limited benefits, since the RX batch size will be very small.

@larseggert
Contributor Author

I added sendmsg_x support, mostly to see what the performance difference would be. But it seems that none of the benches or tests call send with a Transmit struct where segment_size is not None?
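For readers unfamiliar with `segment_size`, here is a minimal sketch of what it implies for a multi-message send. The `SketchTransmit` type and the `datagrams` helper are hypothetical simplifications for illustration, not quinn-udp's actual types or code. The assumption is that `Some(n)` means the buffer holds several back-to-back datagrams of `n` bytes each (the last one possibly shorter), while `None` means the whole buffer is a single datagram.

```rust
/// Hypothetical, simplified stand-in for a transmit request; not quinn-udp's real type.
struct SketchTransmit<'a> {
    contents: &'a [u8],
    /// `Some(n)`: `contents` holds several datagrams of `n` bytes each (the last may be shorter).
    segment_size: Option<usize>,
}

/// Splits a transmit into the individual datagrams a multi-message send would submit.
fn datagrams<'a>(t: &SketchTransmit<'a>) -> Vec<&'a [u8]> {
    // Without a segment size, the whole buffer is one datagram.
    // `max(1)` avoids the panic that `chunks(0)` would cause for an empty buffer.
    let size = t.segment_size.unwrap_or(t.contents.len()).max(1);
    t.contents.chunks(size).collect()
}
```

With a 2500-byte buffer and `segment_size: Some(1200)`, this yields three datagrams of 1200, 1200, and 100 bytes. A benchmark that only ever passes `None` would therefore hand a multi-message call a single datagram at a time.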

@larseggert larseggert marked this pull request as ready for review September 23, 2024 09:10
@mxinden
Collaborator

mxinden commented Sep 23, 2024

> A targeted quinn-udp benchmark might be more useful.

How about using the throughput.rs benchmark @larseggert?

https://github.com/quinn-rs/quinn/blob/main/quinn-udp/benches/throughput.rs

@larseggert
Contributor Author

larseggert commented Sep 23, 2024

With @mxinden's benchmark. Baseline:

gso_true/throughput     time:   [58.076 ms 58.230 ms 58.387 ms]
                        thrpt:  [171.27 MiB/s 171.73 MiB/s 172.19 MiB/s]

Only sendmsg_x:

gso_true/throughput     time:   [15.143 ms 15.189 ms 15.236 ms]
                        thrpt:  [656.35 MiB/s 658.36 MiB/s 660.37 MiB/s]
                 change:
                        time:   [-74.028% -73.915% -73.808%] (p = 0.00 < 0.05)
                        thrpt:  [+281.80% +283.36% +285.04%]
                        Performance has improved.

Both sendmsg_x and recvmsg_x:

gso_true/throughput     time:   [12.632 ms 12.682 ms 12.731 ms]
                        thrpt:  [785.46 MiB/s 788.53 MiB/s 791.61 MiB/s]
                 change:
                        time:   [-78.321% -78.221% -78.112%] (p = 0.00 < 0.05)
                        thrpt:  [+356.88% +359.16% +361.27%]
                        Performance has improved.

Both sendmsg_x and recvmsg_x with BATCH_SIZE of 64:

gso_true/throughput     time:   [11.640 ms 11.682 ms 11.725 ms]
                        thrpt:  [852.85 MiB/s 856.00 MiB/s 859.07 MiB/s]
                 change:
                        time:   [-80.030% -79.938% -79.844%] (p = 0.00 < 0.05)
                        thrpt:  [+396.13% +398.45% +400.75%]
                        Performance has improved.
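As a sanity check on the figures above, the reported times and throughputs are mutually consistent if each benchmark iteration moves roughly 10 MiB. That payload size is inferred from the numbers (58.230 ms × 171.73 MiB/s ≈ 10 MiB), not stated in the thread. The helper names below are made up for this sketch.

```rust
/// Throughput in MiB/s for `payload_mib` MiB moved in `time_ms` milliseconds.
fn throughput_mib_s(payload_mib: f64, time_ms: f64) -> f64 {
    payload_mib / (time_ms / 1000.0)
}

/// Relative change of `new` over `base`, in percent.
fn percent_change(base: f64, new: f64) -> f64 {
    (new / base - 1.0) * 100.0
}
```

Plugging in the median times, `throughput_mib_s(10.0, 58.230)` and `throughput_mib_s(10.0, 11.682)` land within rounding of the reported 171.73 MiB/s and 856.00 MiB/s, and `percent_change` between them is close to the reported +398.45%. All three "change" figures therefore appear to be relative to the same baseline run.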

Member

@djc djc left a comment

LGTM.

Please squash all of the changes into a single commit?

Comment thread quinn-udp/benches/throughput.rs Outdated
Comment thread quinn-udp/src/unix.rs Outdated
Comment thread quinn-udp/src/unix.rs Outdated
Comment thread quinn-udp/src/unix.rs Outdated
Collaborator

@mxinden mxinden left a comment

Impressive results. Great to see the macOS _x syscalls work for QUIC UDP I/O.

Comment thread quinn-udp/src/unix.rs Outdated
Comment thread quinn-udp/src/unix.rs Outdated
Comment thread quinn-udp/src/unix.rs Outdated
Ralith previously requested changes Sep 23, 2024
Collaborator

@Ralith Ralith left a comment

Can we enable real GSO/GRO using these interfaces?

Comment thread quinn-udp/benches/throughput.rs Outdated
@larseggert
Contributor Author

No. They are the equivalent of the mmsg Linux calls. AFAIK Apple doesn't have GSO/GRO via the socket interface.

Comment thread quinn-udp/src/cmsg/unix.rs Outdated
Comment thread quinn-udp/src/unix.rs Outdated
Comment thread quinn-udp/src/unix.rs
Comment thread quinn-udp/src/unix.rs Outdated
Comment thread quinn-udp/src/unix.rs Outdated
Comment thread quinn-udp/src/unix.rs Outdated
Comment thread quinn-udp/benches/throughput.rs Outdated
Comment thread quinn-udp/benches/throughput.rs Outdated
@larseggert
Contributor Author

Are you waiting on anything from me on this?

Collaborator

@mxinden mxinden left a comment

quinn-udp/benches/throughput.rs will need more changes to still support non-Apple platforms. @larseggert I believe we will need to either run it multi-threaded or use some kind of executor, e.g. tokio. I can prepare a commit in the next couple of days. Sorry for missing this in earlier reviews.

The changes themselves look good to me.

Comment thread quinn-udp/benches/throughput.rs Outdated
Comment thread quinn-udp/benches/throughput.rs Outdated
Comment thread quinn-udp/src/unix.rs Outdated
Comment thread quinn-udp/src/unix.rs Outdated
Comment thread quinn-udp/src/unix.rs
@mxinden mxinden mentioned this pull request Oct 8, 2024
Comment thread quinn-udp/src/unix.rs Outdated
Comment thread quinn-udp/src/unix.rs Outdated
@djc
Member

djc commented Oct 9, 2024

@Ralith can you do another round on this one?

@larseggert
Contributor Author

larseggert commented Oct 10, 2024

Once @mxinden's fix to the bench is in, I will rebase and squash this PR.

@AndrewDryga

AndrewDryga commented Oct 10, 2024

Hey guys 👋, is there any chance Apple will reject apps that use those private syscalls from the App Store? They are notorious for rejecting, and even de-listing, apps that use anything "undocumented". See 2.5.1 here: https://developer.apple.com/app-store/review/guidelines/

See one such case here: https://9to5mac.com/2019/11/04/electron-app-rejections/

How will they find out? Apple employs automated tools to scan apps for the usage of private APIs. If sendmsg_x and recvmsg_x are detected, the app is at risk of being flagged.

@larseggert
Contributor Author

The use of the private syscalls is now behind a non-default feature.
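A rough sketch of what such opt-in gating can look like in Rust follows. The feature name `fast-apple-datapath` comes from this thread; the `selected_datapath` helper and the exact `cfg` condition are assumptions made for illustration and may differ from what quinn-udp actually does.

```rust
/// Hypothetical helper (not part of quinn-udp): names the datapath a build would select.
fn selected_datapath() -> &'static str {
    // The private calls are only considered when the opt-in feature is enabled
    // *and* the target is an Apple platform; everything else takes the portable path.
    if cfg!(all(feature = "fast-apple-datapath", target_vendor = "apple")) {
        "sendmsg_x/recvmsg_x"
    } else {
        "sendmsg/recvmsg"
    }
}
```

Because the feature is non-default, a build that does not explicitly enable it takes the `else` branch, so downstream users who ship through the App Store are unaffected unless they opt in.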

@AndrewDryga

AndrewDryga commented Oct 10, 2024

@larseggert should we add a big fat warning saying that enabling this flag violates Apple's ToS, so it should only be enabled if the app is not distributed via the App Store (or notarized for the EU)?

@larseggert
Contributor Author

Shouldn't the features CI workflow have included fast-apple-datapath? I don't see it in the logs. (Or am I missing something?)

@mxinden
Collaborator

mxinden commented Oct 24, 2024

Should be fine I believe:

running `cargo check --no-default-features --features direct-log,fast-apple-datapath,log,tracing` on quinn-udp (1754/1772)
running `cargo check --no-default-features --features default` on quinn-udp (1755/1772)
running `cargo check --no-default-features --features direct-log` on quinn-udp (1756/1772)
running `cargo check --no-default-features --features default,direct-log` on quinn-udp (1757/1772)
running `cargo check --no-default-features --features fast-apple-datapath` on quinn-udp (1758/1772)
running `cargo check --no-default-features --features default,fast-apple-datapath` on quinn-udp (1759/1772)
running `cargo check --no-default-features --features direct-log,fast-apple-datapath` on quinn-udp (1760/1772)
running `cargo check --no-default-features --features default,direct-log,fast-apple-datapath` on quinn-udp (1761/1772)
running `cargo check --no-default-features --features log` on quinn-udp (1762/1772)
running `cargo check --no-default-features --features direct-log,log` on quinn-udp (1763/1772)
running `cargo check --no-default-features --features fast-apple-datapath,log` on quinn-udp (1764/1772)
running `cargo check --no-default-features --features direct-log,fast-apple-datapath,log` on quinn-udp (1765/1772)
running `cargo check --no-default-features --features tracing` on quinn-udp (1766/1772)
running `cargo check --no-default-features --features direct-log,tracing` on quinn-udp (1767/1772)
running `cargo check --no-default-features --features fast-apple-datapath,tracing` on quinn-udp (1768/1772)
running `cargo check --no-default-features --features direct-log,fast-apple-datapath,tracing` on quinn-udp (1769/1772)
running `cargo check --no-default-features --features log,tracing` on quinn-udp (1770/1772)
running `cargo check --no-default-features --features direct-log,log,tracing` on quinn-udp (1771/1772)
running `cargo check --no-default-features --features fast-apple-datapath,log,tracing` on quinn-udp (1772/1772)

@djc djc added this pull request to the merge queue Oct 25, 2024
Merged via the queue into quinn-rs:main with commit adc4a06 Oct 25, 2024
@larseggert larseggert deleted the feat-apple-datapath branch October 25, 2024 07:29
mxinden added a commit to mxinden/neqo that referenced this pull request Oct 30, 2024
Currently we use `quinn-udp` `v0.5.4`.

`quinn-udp` `v0.5.5` fixes [`recvmmsg` calls on Android x86](quinn-rs/quinn#1966).

`quinn-udp` `v0.5.6` adds [experimental multi-message support on Apple
platforms](quinn-rs/quinn#1993) and [fixes an
unnecessary `windows-sys` version
restriction](quinn-rs/quinn#2021).

While not strictly necessary, given that our current version specification (i.e.
`version = "0.5.4"`) already allows users to use Neqo with `quinn-udp` `v0.5.6`,
this commit updates to `quinn-udp` `v0.5.6` anyways, thus making sure CI tests
with latest version.

In case mozilla#2208 lands, future compatible
version updates would touch the `Cargo.lock` file, not `Cargo.toml`.
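The claim that `version = "0.5.4"` already admits `v0.5.6` follows from Cargo's default "caret" requirement semantics. The sketch below models that rule on plain `(major, minor, patch)` triples. It ignores pre-release tags and build metadata, and `caret_allows` is a name invented for this illustration, not a Cargo or `semver` API.

```rust
/// A (major, minor, patch) version triple.
type Version = (u64, u64, u64);

/// Sketch of Cargo's default ("caret") requirement: is `candidate` acceptable for `req`?
fn caret_allows(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false; // never go below the stated version
    }
    match req {
        // 0.0.z: only that exact patch version is compatible
        (0, 0, _) => candidate == req,
        // 0.y.z: the minor version is the compatibility boundary
        (0, minor, _) => candidate.0 == 0 && candidate.1 == minor,
        // x.y.z with x > 0: the major version is the boundary
        (major, _, _) => candidate.0 == major,
    }
}
```

Under this rule `caret_allows((0, 5, 4), (0, 5, 6))` is true while `caret_allows((0, 5, 4), (0, 6, 0))` is false, which is why bumping the manifest is optional here and mainly affects which version CI resolves.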
github-merge-queue bot pushed a commit to mozilla/neqo that referenced this pull request Oct 30, 2024
psumbera added a commit to psumbera/quinn that referenced this pull request Nov 5, 2024
djc pushed a commit that referenced this pull request Nov 5, 2024
Comment thread quinn-udp/src/unix.rs
Comment on lines +374 to +376
io::ErrorKind::Interrupted => {
    // Retry the transmission
}
Collaborator

Given that this is not executed in a loop, it doesn't actually retry, but instead drops the packet to be sent.

We do retry on Linux:

quinn/quinn-udp/src/unix.rs

Lines 311 to 314 in 2edf192

io::ErrorKind::Interrupted => {
    // Retry the transmission
    continue;
}

But not on the BSDs or the slow Apple:

quinn/quinn-udp/src/unix.rs

Lines 433 to 435 in 2edf192

io::ErrorKind::Interrupted => {
    // Retry the transmission
}

I don't recall whether we discussed this. @larseggert was this by design?

Contributor Author

I think this was an oversight. It would be good to have identical behavior.

Collaborator

Agreed. @larseggert I will follow up with a pull request.
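To illustrate the difference discussed in this thread: a match arm that only carries a "retry" comment falls through and gives up on the packet, whereas an actual retry needs a surrounding loop. A minimal sketch of such a loop is below. `retry_on_interrupt` is a hypothetical helper written for this illustration, not code from quinn-udp, and it deliberately treats every error other than `Interrupted` as final.

```rust
use std::io;

/// Hypothetical helper: runs `op` until it returns anything other than `Interrupted`.
fn retry_on_interrupt<T>(mut op: impl FnMut() -> io::Result<T>) -> io::Result<T> {
    loop {
        match op() {
            // EINTR: the call was interrupted before completing, so issue it again.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            // Success or any other error is handed back to the caller.
            other => return other,
        }
    }
}
```

Wrapping the platform send call in a helper like this would give the same behavior on every platform, at the cost of one extra closure per call site.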

mxinden added a commit to mxinden/quinn that referenced this pull request Nov 29, 2024
With quinn-rs#2017, the concrete `send`
implementations per platform are supposed to propagate `io::Error`s. Those
errors are then either logged and dropped in `UdpSocketState::send` or further
propagated in `UdpSocketState::try_send`.

The `fast_apple` `send` implementation added in
quinn-rs#1993 does not follow this pattern.

This commit adjusts the `fast_apple` implementation accordingly.
github-merge-queue bot pushed a commit that referenced this pull request Nov 29, 2024
@mxinden
Collaborator

mxinden commented Mar 9, 2025

For those interested in using the fast-apple-datapath feature:

  • All macOS Firefox Nightly users are now using it.
  • We see ~11% performance improvement on our localhost CPU bound throughput benchmark.
  • Every other (50th percentile) QUIC UDP recvmsg_x call reads 2 datagrams.
  • We are soon rolling it out to all Firefox Beta users. Once that succeeds it will make it into the Firefox release channel.

https://glam.telemetry.mozilla.org/fog/probe/networking_http_3_udp_datagram_segments_received/explore?aggType=avg&os=Darwin&timeHorizon=WEEK&visiblePercentiles=%5B99.9%2C99%2C95%2C75%2C50%2C25%2C5%5D
