Enable software prefetching on MSVC and clang-cl x86 and x86_64 #13238

Merged: nojb merged 2 commits into ocaml:trunk from MisterDA:windows-software-prefetching on Jun 17, 2024
@MisterDA MisterDA commented Jun 14, 2024

Chasing the last bits of the MSVC/clang-cl restore, this PR re-enables the software prefetching hints that help the garbage collector, on both MSVC and clang-cl.

Both GCC and MSVC emit the prefetcht0 instruction (check on godbolt).
With clang-cl, prefer clang's builtin over the intrinsic; I find it a bit cleaner and it's one fewer header.
With MSVC, call the intrinsic directly to avoid pulling in winnt.h via windows.h (for PreFetchCacheLine). I'm using the same definition as PreFetchCacheLine.
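
To illustrate the selection logic described above, here is a minimal sketch rather than the code from this PR. The names example_prefetch and sum_with_prefetch are hypothetical, and so are the choices of a read prefetch with maximum temporal locality and a lookahead of 8 elements. The sketch uses the builtin where the compiler advertises it, and falls back to the _mm_prefetch intrinsic from intrin.h on MSVC x86/x86_64.

```c
#include <stddef.h>

/* MSVC does not provide __has_builtin; give it a fallback that expands
   to 0 so the test below stays well-formed (and quiet under -Wundef). */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_prefetch) || defined(__GNUC__)
/* GCC, clang and clang-cl: use the builtin, no extra header required.
   Arguments: address, 0 = prefetch for read, 3 = keep in all cache levels. */
#define example_prefetch(p) __builtin_prefetch((p), 0, 3)
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
/* MSVC on x86/x86_64: call the intrinsic directly instead of going
   through PreFetchCacheLine, which would require windows.h. */
#include <intrin.h>
#define example_prefetch(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
/* Unknown compiler or architecture: the hint becomes a no-op. */
#define example_prefetch(p) ((void)(p))
#endif

/* Hypothetical user of the hint: sum an array while asking the CPU to
   start loading the element 8 positions ahead of the current one. */
long sum_with_prefetch(const long *a, size_t n)
{
  long total = 0;
  for (size_t i = 0; i < n; i++) {
    if (i + 8 < n)
      example_prefetch(&a[i + 8]);
    total += a[i];
  }
  return total;
}
```

A prefetch is only a hint, so the function returns the same result whether or not the hint is honoured; only the timing can change.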

An ad-hoc benchmark using hyperfine's defaults on Stephen Dolan's markbench regression test for software prefetching gives:

$ hyperfine test-trunk.exe
Benchmark 1: test-trunk.exe
  Time (mean ± σ):     32.399 s ±  0.629 s    [User: 31.762 s, System: 0.331 s]
  Range (min … max):   30.834 s … 33.000 s    10 runs
$ hyperfine test-prefetch.exe
Benchmark 1: test-prefetch.exe
  Time (mean ± σ):     13.267 s ±  0.670 s    [User: 12.629 s, System: 0.301 s]
  Range (min … max):   12.664 s … 14.605 s    10 runs

Half the time, twice the fun!

The first commit aligns compiler builtin detection with compiler attribute detection, and slightly reorders the macro tests, since clang-cl doesn't define __GNUC__ and warns about it when -Wundef is enabled. As a side effect, the built-in functions that perform arithmetic with overflow checking are now enabled under clang-cl too.
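
As a rough sketch of what such detection can look like (not the runtime's actual macros), the example below tests for the builtin through __has_builtin first and only compares __GNUC__ after checking that it is defined. EXAMPLE_HAS_MUL_OVERFLOW and checked_mul are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Compilers without __has_builtin get a fallback that expands to 0. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Ask for the builtin first; only look at the value of __GNUC__ once we
   know it is defined, since clang-cl leaves it undefined. */
#if __has_builtin(__builtin_mul_overflow) || \
    (defined(__GNUC__) && __GNUC__ >= 5)
#define EXAMPLE_HAS_MUL_OVERFLOW 1
#else
#define EXAMPLE_HAS_MUL_OVERFLOW 0
#endif

/* Multiply two sizes; returns nonzero if the product overflowed.
   On success the product is stored in *res. */
static int checked_mul(size_t a, size_t b, size_t *res)
{
#if EXAMPLE_HAS_MUL_OVERFLOW
  return __builtin_mul_overflow(a, b, res);
#else
  /* Portable fallback: detect overflow with a division before multiplying. */
  if (b != 0 && a > SIZE_MAX / b)
    return 1;
  *res = a * b;
  return 0;
#endif
}
```

Both branches report overflow the same way, so callers do not need to know which one was compiled in.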

@MisterDA MisterDA force-pushed the windows-software-prefetching branch from 32a7fa6 to 9d4a838 Compare June 17, 2024 06:52
@MisterDA MisterDA force-pushed the windows-software-prefetching branch from 9d4a838 to cc1a463 Compare June 17, 2024 07:15
@MisterDA MisterDA force-pushed the windows-software-prefetching branch from cc1a463 to 0e151d6 Compare June 17, 2024 08:19
@nojb nojb left a comment

LGTM (on behalf of @dustanddreams)

@nojb nojb added the merge-me label Jun 17, 2024
MisterDA added 2 commits June 17, 2024 11:10

Reorders a bit the macros tests as clang-cl doesn't define __GNUC__
and warns if -Wundef is enabled. It has the side-effect of enabling
the built-in functions to perform arithmetic with overflow checking
under clang-cl too.

Both GCC and MSVC emit the prefetcht0 instruction.
Prefer clang's builtin instead of the intrinsic.
With MSVC, call the intrinsic directly in order not to have to pull
winnt.h via windows.h for PreFetchCacheLine.

@MisterDA MisterDA force-pushed the windows-software-prefetching branch from 0e151d6 to 99f2e84 Compare June 17, 2024 09:10
@nojb nojb merged commit a7e2967 into ocaml:trunk Jun 17, 2024
@MisterDA MisterDA deleted the windows-software-prefetching branch June 17, 2024 11:54