Closed
Proposed by @data-man
Examples:
- https://gist.github.com/osimola/7917568
- In-depth discussion on Stack Overflow.
- Linus Torvalds' thoughts:
Software prefetching is moronic.
It's a great way to generate almost-optimal behavior on one particular CPU with one particular cache setup and memory subsystem (and one particular load), but then it falls flat on its face whenever there is some other micro-architecture or cache layout, or when you have other things going on on that same machine.
I am still not sure where software prefetching is advantageous. The main use case would be tensor iteration, but if the tensor is contiguous, the hardware prefetcher will most likely have pulled the next memory locations into cache already.
If the tensor is not contiguous, what is the difference between prefetching an address and just plainly loading the data at that location?
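
To make the question concrete, here is a minimal sketch in C, assuming GCC or Clang (`__builtin_prefetch` is their builtin, not something from this project). The names `gather_sum` and `PREFETCH_DISTANCE` are made up for illustration. The practical difference from a plain load is that a prefetch is only a hint: it has no destination register, nothing depends on its result, and it does not fault if the address turns out to be invalid, so it can be issued several iterations before the value is actually needed. Whether that beats what the out-of-order core and hardware prefetcher already do is exactly the hardware-dependent part, so treat the distance as an untuned guess.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many iterations ahead to prefetch.
   The right value depends on the CPU and memory latency. */
#define PREFETCH_DISTANCE 8

/* Sum data[idx[i]] for i in [0, n): a non-contiguous (gather) access
   pattern that a stride-based hardware prefetcher cannot predict. */
double gather_sum(const double *data, const size_t *idx, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Hint only: rw = 0 (read), locality = 1 (low temporal reuse).
               Nothing waits on this; the real load happens later below. */
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 1);
        }
        sum += data[idx[i]]; /* the actual, blocking load */
    }
    return sum;
}
```

Benchmarking this against the same loop without the `__builtin_prefetch` call, on more than one machine, would show whether the hint helps or just adds instructions.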