core/time: avoid divisions in Duration::new #120308

utkarshgupta137 · 2024-01-24T14:10:49Z

In our (decently large) code base, we call SystemTime::UNIX_EPOCH.elapsed() in a lot of places, often in a loop or on the hot path. On Unix at least, it seems the platform code already does calculations beforehand to ensure that nanos is within the valid range, yet Duration::new() still checks it again, using two divisions (a quotient and a remainder). Adding a branch appears to make this function about 33% faster on ARM64 when nanos is already in the valid range, and seems to have no effect in the other case.
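
For context, the observable behaviour of Duration::new that any fast path has to preserve can be checked against the standard library directly. A minimal sketch using only std::time::Duration; the helper name carry_examples is made up for illustration and is not part of the PR:

```rust
use std::time::Duration;

/// Hypothetical helper: pairs of values that `Duration::new` must treat as equal.
fn carry_examples() -> Vec<(Duration, Duration)> {
    vec![
        // nanos already in range: stored as-is, no carry needed.
        (Duration::new(5, 999_999_999), Duration::from_nanos(5_999_999_999)),
        // nanos == 1_000_000_000: carries exactly one second.
        (Duration::new(5, 1_000_000_000), Duration::new(6, 0)),
        // nanos above one second: whole seconds carry, the remainder stays.
        (Duration::new(1, 2_500_000_000), Duration::new(3, 500_000_000)),
    ]
}
```

Only the first case is what the proposed branch short-circuits; the other two still go through the division.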

Benchmarks:
M1 Pro (14-inch base model):

duration/current/checked
                        time:   [1.5945 ns 1.6167 ns 1.6407 ns]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
duration/current/unchecked
                        time:   [1.5941 ns 1.6051 ns 1.6179 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

duration/branched/checked
                        time:   [1.1997 ns 1.2048 ns 1.2104 ns]
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
duration/branched/unchecked
                        time:   [1.5881 ns 1.5957 ns 1.6039 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

EC2 c7gd.16xlarge (Graviton 3):

duration/current/checked
                        time:   [2.7996 ns 2.8000 ns 2.8003 ns]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
duration/current/unchecked
                        time:   [2.9922 ns 2.9925 ns 2.9928 ns]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild

duration/branched/checked
                        time:   [2.0830 ns 2.0843 ns 2.0857 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
duration/branched/unchecked
                        time:   [2.9879 ns 2.9886 ns 2.9893 ns]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) low severe
  2 (2.00%) low mild

EC2 r7iz.16xlarge (Intel Xeon Scalable-based (Sapphire Rapids)):

duration/current/checked
                        time:   [980.60 ps 980.79 ps 980.99 ps]
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe
duration/current/unchecked
                        time:   [979.53 ps 979.74 ps 979.96 ps]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

duration/branched/checked
                        time:   [938.72 ps 938.96 ps 939.22 ps]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe
duration/branched/unchecked
                        time:   [1.0103 ns 1.0110 ns 1.0118 ns]
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  7 (7.00%) high mild
  1 (1.00%) high severe

Bench code (run with stable 1.75.0 and the latest criterion, 0.5.1):
I couldn't find any benches for Duration in this repo, so I copied the relevant types and recreated the constructor.

use criterion::{black_box, criterion_group, criterion_main, Criterion};

pub fn duration_bench(c: &mut Criterion) {
    const NANOS_PER_SEC: u32 = 1_000_000_000;

    #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
    #[repr(transparent)]
    struct Nanoseconds(u32);

    impl Default for Nanoseconds {
        #[inline]
        fn default() -> Self {
            // SAFETY: 0 is within the valid range
            unsafe { Nanoseconds(0) }
        }
    }

    #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Default)]
    pub struct Duration {
        secs: u64,
        nanos: Nanoseconds, // Always 0 <= nanos < NANOS_PER_SEC
    }

    impl Duration {
        #[inline]
        pub const fn new_current(secs: u64, nanos: u32) -> Duration {
            let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) {
                Some(secs) => secs,
                None => panic!("overflow in Duration::new"),
            };
            let nanos = nanos % NANOS_PER_SEC;
            // SAFETY: nanos % NANOS_PER_SEC < NANOS_PER_SEC, therefore nanos is within the valid range
            Duration { secs, nanos: unsafe { Nanoseconds(nanos) } }
        }

        #[inline]
        pub const fn new_branched(secs: u64, nanos: u32) -> Duration {
            if nanos < NANOS_PER_SEC {
                // SAFETY: nanos < NANOS_PER_SEC, therefore nanos is within the valid range
                Duration { secs, nanos: unsafe { Nanoseconds(nanos) } }
            } else {
                let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) {
                    Some(secs) => secs,
                    None => panic!("overflow in Duration::new"),
                };
                let nanos = nanos % NANOS_PER_SEC;
                // SAFETY: nanos % NANOS_PER_SEC < NANOS_PER_SEC, therefore nanos is within the valid range
                Duration { secs, nanos: unsafe { Nanoseconds(nanos) } }
            }
        }
    }

    let mut group = c.benchmark_group("duration/current");
    group.bench_function("checked", |b| {
        b.iter(|| black_box(Duration::new_current(black_box(1_000_000_000), black_box(1_000_000))));
    });
    group.bench_function("unchecked", |b| {
        b.iter(|| {
            black_box(Duration::new_current(black_box(1_000_000_000), black_box(2_000_000_000)))
        });
    });
    drop(group);
    let mut group = c.benchmark_group("duration/branched");
    group.bench_function("checked", |b| {
        b.iter(|| {
            black_box(Duration::new_branched(black_box(1_000_000_000), black_box(1_000_000)))
        });
    });
    group.bench_function("unchecked", |b| {
        b.iter(|| {
            black_box(Duration::new_branched(black_box(1_000_000_000), black_box(2_000_000_000)))
        });
    });
}

criterion_group!(duration_benches, duration_bench);
criterion_main!(duration_benches);
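
Since the two constructors are meant to be interchangeable, a quick check that they agree at the boundaries may be worth keeping next to the benchmark. The sketch below uses simplified stand-ins (a plain tuple instead of Duration, Option instead of panicking); the names new_div, new_fast and versions_agree are illustrative and not taken from the PR:

```rust
const NANOS_PER_SEC: u32 = 1_000_000_000;

/// Simplified stand-in for the current version: always divides.
fn new_div(secs: u64, nanos: u32) -> Option<(u64, u32)> {
    let secs = secs.checked_add((nanos / NANOS_PER_SEC) as u64)?;
    Some((secs, nanos % NANOS_PER_SEC))
}

/// Simplified stand-in for the branched version: skips the division when in range.
fn new_fast(secs: u64, nanos: u32) -> Option<(u64, u32)> {
    if nanos < NANOS_PER_SEC {
        Some((secs, nanos))
    } else {
        new_div(secs, nanos)
    }
}

/// Returns true if both versions agree on every boundary input tried.
fn versions_agree() -> bool {
    let secs_cases = [0, 1, u64::MAX - 4, u64::MAX];
    let nanos_cases = [0, 1, NANOS_PER_SEC - 1, NANOS_PER_SEC, NANOS_PER_SEC + 1, u32::MAX];
    secs_cases
        .iter()
        .all(|&s| nanos_cases.iter().all(|&n| new_div(s, n) == new_fast(s, n)))
}
```

The overflow case is covered as well: both stand-ins return None when the carried seconds would exceed u64::MAX, which corresponds to the panic in the real constructor.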

rustbot · 2024-01-24T14:10:57Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @m-ou-se (or someone else) soon.

Please see the contribution instructions for more information. Namely, to keep review lag to a minimum, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

@rustbot author: the review is finished, PR author should check the comments and take action accordingly
@rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

quaternic · 2024-01-24T19:25:08Z

It looks like your benchmarks are testing with fixed input values. Since this change adds a fast-path branch to the function, branch prediction is very relevant to how it performs. You should definitely also benchmark it with inputs for which the newly added branch is unpredictable.
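
One way to act on this without extra dependencies is to feed both versions a pseudo-random mix of in-range and out-of-range nanos. A rough sketch, assuming a hand-rolled xorshift generator and plain Instant timing rather than criterion; mixed_inputs, time_it, the tuple return type and the roughly even mix are illustrative choices, not something from the PR:

```rust
use std::hint::black_box;
use std::time::Instant;

const NANOS_PER_SEC: u32 = 1_000_000_000;

/// Always-divide version, returning (secs, nanos) for simplicity.
fn new_current(secs: u64, nanos: u32) -> (u64, u32) {
    let secs = secs.checked_add((nanos / NANOS_PER_SEC) as u64).expect("overflow");
    (secs, nanos % NANOS_PER_SEC)
}

/// Branched version with the fast path.
fn new_branched(secs: u64, nanos: u32) -> (u64, u32) {
    if nanos < NANOS_PER_SEC { (secs, nanos) } else { new_current(secs, nanos) }
}

/// Deterministic xorshift64 stream (seed must be non-zero).
/// One pseudo-random bit decides whether a value is pushed out of range.
fn mixed_inputs(len: usize, mut seed: u64) -> Vec<u32> {
    (0..len)
        .map(|_| {
            seed ^= seed << 13;
            seed ^= seed >> 7;
            seed ^= seed << 17;
            let base = (seed >> 32) as u32 % NANOS_PER_SEC;
            if seed & 1 == 0 { base } else { base + NANOS_PER_SEC }
        })
        .collect()
}

/// Times `f` over all inputs and returns (elapsed nanoseconds, checksum).
fn time_it(inputs: &[u32], f: fn(u64, u32) -> (u64, u32)) -> (u128, u64) {
    let start = Instant::now();
    let mut sum = 0u64;
    for &n in inputs {
        let (s, ns) = f(black_box(1_000_000_000), black_box(n));
        sum = sum.wrapping_add(s ^ ns as u64);
    }
    (start.elapsed().as_nanos(), black_box(sum))
}
```

Calling time_it(&inputs, new_current) and time_it(&inputs, new_branched) on the same inputs gives a crude comparison, and the two checksums should match. The call through a fn pointer prevents inlining, so the timings are only a rough indication; a criterion benchmark iterating over the same input vector would be the more careful measurement.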

m-ou-se · 2024-02-08T10:33:25Z

The vast majority of use cases will have nanos < NANOS_PER_SEC, so this makes sense to me.

@bors r+

bors · 2024-02-08T10:33:28Z

📌 Commit 8a850cd has been approved by m-ou-se

It is now in the queue for this repository.

Nadrieril · 2024-02-08T14:53:26Z

@bors rollup

utkarshgupta137 · 2024-02-08T15:18:36Z

Should I also add likely() to the branch, or is that overkill?

m-ou-se · 2024-02-09T16:47:08Z

> Should I also add likely() to the branch, or is that overkill?

I'm guessing that is probably overkill.
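
For reference, a stable-Rust alternative to a likely() hint is to move the slow path into a #[cold] function, which nudges the compiler to treat the out-of-range case as the unlikely one. This is only a sketch of that general technique, not what the PR does; new_with_cold_path and normalize_slow are hypothetical names, and a tuple stands in for Duration:

```rust
const NANOS_PER_SEC: u32 = 1_000_000_000;

/// Slow path: only reached when nanos >= NANOS_PER_SEC.
#[cold]
#[inline(never)]
fn normalize_slow(secs: u64, nanos: u32) -> (u64, u32) {
    let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) {
        Some(secs) => secs,
        None => panic!("overflow in Duration::new"),
    };
    (secs, nanos % NANOS_PER_SEC)
}

/// Fast path stays small and inlinable; the call to a cold function
/// hints that the else branch is rarely taken.
#[inline]
fn new_with_cold_path(secs: u64, nanos: u32) -> (u64, u32) {
    if nanos < NANOS_PER_SEC {
        (secs, nanos)
    } else {
        normalize_slow(secs, nanos)
    }
}
```

Whether this changes the generated code or the timings at all would need to be measured; the plain branch may already be compiled the same way.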

std/time: avoid divisions in Duration::new

8a850cd

rustbot assigned m-ou-se Jan 24, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 24, 2024

utkarshgupta137 changed the title ~~std/time: avoid divisions in Duration::new~~ core/time: avoid divisions in Duration::new Jan 24, 2024

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 8, 2024

matthiaskrgr mentioned this pull request Feb 9, 2024

Rollup of 9 pull requests #120824

Closed

matthiaskrgr mentioned this pull request Feb 9, 2024

Rollup of 7 pull requests #120829

Closed

matthiaskrgr mentioned this pull request Feb 9, 2024

Rollup of 9 pull requests #120837

Closed

matthiaskrgr mentioned this pull request Feb 9, 2024

Rollup of 8 pull requests #120843

Merged

bors merged commit 8b8adfd into rust-lang:master Feb 9, 2024
11 checks passed

rustbot added this to the 1.78.0 milestone Feb 9, 2024
