Description
| | |
| --- | --- |
| Bugzilla Link | 45873 |
| Version | trunk |
| OS | Windows NT |
| CC | @adibiagio, @legrosbuffle, @jrmuizel, @LebedevRI, @RKSimon |
Extended Description
This was suggested by Andy in https://lists.llvm.org/pipermail/llvm-dev/2020-May/141487.html
The idea is to add a DelayCycles vector to SchedWriteRes to specify the relative start cycle of each reserved resource. This would effectively allow dependent uOps to be modeled.
At the moment, it is not possible to delay the consumption of specific hardware resources: resource consumption is always expected to start at relative cycle #0 (i.e. relative to the instruction issue cycle).
A DelayCycles vector (if present) would contain unsigned integer values (ideally one per processor resource consumed by a write), and each value would be an offset in cycles relative to the issue cycle.
The absence of a DelayCycles vector would be semantically equivalent to an all-zeroes DelayCycles vector.
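A minimal C++ sketch of these semantics follows. `WriteResSketch`, `BusyWindow` and `computeBusyWindows` are hypothetical names, not existing LLVM code. Each consumed resource gets a half-open window [delay, delay + cycles) relative to the issue cycle. Treating a DelayCycles vector that is shorter than the resource list as zero-padded is an assumption made here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified write descriptor: one entry per processor
// resource consumed by the write. Not a real LLVM data structure.
struct WriteResSketch {
  std::vector<unsigned> ResourceCycles; // cycles each resource is held
  std::vector<unsigned> DelayCycles;    // optional start offsets (may be empty)
};

// Half-open interval [Start, End) of cycles, relative to the issue cycle,
// during which a resource is consumed.
struct BusyWindow {
  unsigned Start;
  unsigned End;
};

// Computes the busy window of every resource consumed by W. Missing
// DelayCycles entries are treated as 0 (assumption), so an absent vector
// behaves exactly like an all-zeroes vector, i.e. like today's model.
std::vector<BusyWindow> computeBusyWindows(const WriteResSketch &W) {
  // At most one delay value per consumed resource.
  assert(W.DelayCycles.size() <= W.ResourceCycles.size() &&
         "more delay values than consumed resources");
  std::vector<BusyWindow> Windows;
  Windows.reserve(W.ResourceCycles.size());
  for (std::size_t I = 0; I < W.ResourceCycles.size(); ++I) {
    unsigned Delay = I < W.DelayCycles.size() ? W.DelayCycles[I] : 0;
    Windows.push_back({Delay, Delay + W.ResourceCycles[I]});
  }
  return Windows;
}
```

With this sketch, a write with ResourceCycles {2, 1} and no DelayCycles yields the same windows as one with DelayCycles {0, 0}, while DelayCycles {0, 1} moves the second resource's window to [1, 2).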
This would require a mostly mechanical change in tablegen to teach it how to parse and semantically analyze this new concept. The subtarget-emitter would eventually generate information about those delay cycles in a table.
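As a rough illustration of what such a table could look like, the sketch below flattens each write's resources into one array of entries that carry a delay field, and records a [Begin, End) range per write. All type and field names (`ProcResEntrySketch`, `WriteEntryRange`, `ParsedWrite`, `emitWrite`) are hypothetical; the actual subtarget-emitter output format would be up to the patch that implements this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical table entry: a (resource, cycles) pair extended with a
// delay field. Names are illustrative only.
struct ProcResEntrySketch {
  uint16_t ProcResourceIdx;
  uint16_t Cycles;
  uint16_t DelayCycles; // start offset relative to the issue cycle
};

// Hypothetical per-write record pointing into the flat table.
struct WriteEntryRange {
  std::size_t Begin; // index of the first entry
  std::size_t End;   // one past the last entry
};

// What a parsed SchedWriteRes might contribute to the emitter (hypothetical).
struct ParsedWrite {
  std::vector<uint16_t> Resources;
  std::vector<uint16_t> ResourceCycles;
  std::vector<uint16_t> DelayCycles; // may be empty, meaning all zeroes
};

// Appends the entries of W to Table and returns the range they occupy.
// Missing delay values are emitted as 0.
WriteEntryRange emitWrite(const ParsedWrite &W,
                          std::vector<ProcResEntrySketch> &Table) {
  assert(W.ResourceCycles.size() == W.Resources.size());
  assert(W.DelayCycles.size() <= W.Resources.size());
  WriteEntryRange R{Table.size(), Table.size()};
  for (std::size_t I = 0; I < W.Resources.size(); ++I) {
    uint16_t Delay =
        I < W.DelayCycles.size() ? W.DelayCycles[I] : static_cast<uint16_t>(0);
    Table.push_back({W.Resources[I], W.ResourceCycles[I], Delay});
  }
  R.End = Table.size();
  return R;
}
```

Keeping the delay next to the cycle count in each entry means consumers that read the table only need one extra field, and writes without DelayCycles simply get zeroes.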
A more complicated change would be needed in the bookkeeping logic of mca (HardwareUnits/ResourceManager.cpp).
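One way to picture the extra bookkeeping is a plain reservation table, sketched below under simplifying assumptions: every resource has a single unit, there are no resource groups, and reservations are stored per absolute cycle. This is a swapped-in textbook technique, not how llvm-mca's ResourceManager is implemented; `ResourceUse` and `ReservationTable` are hypothetical names. The point it illustrates is that availability must be checked at IssueCycle + DelayCycles rather than at IssueCycle.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// One resource use of a write: which single-unit resource, for how many
// cycles, and how many cycles after issue it starts. Hypothetical type.
struct ResourceUse {
  unsigned ResourceIdx;
  unsigned Cycles;
  unsigned DelayCycles;
};

// Minimal reservation table: for each resource, the set of absolute cycles
// on which it is already reserved. Not llvm-mca's actual data structure.
class ReservationTable {
public:
  explicit ReservationTable(std::size_t NumResources) : Busy(NumResources) {}

  // Returns true if no cycle requested by Uses (when issued at IssueCycle)
  // is already reserved.
  bool canIssue(const std::vector<ResourceUse> &Uses,
                unsigned IssueCycle) const {
    for (const ResourceUse &U : Uses) {
      unsigned Start = IssueCycle + U.DelayCycles;
      for (unsigned C = Start; C < Start + U.Cycles; ++C)
        if (Busy[U.ResourceIdx].count(C))
          return false;
    }
    return true;
  }

  // Reserves every cycle requested by Uses. Callers check canIssue first.
  void issue(const std::vector<ResourceUse> &Uses, unsigned IssueCycle) {
    assert(canIssue(Uses, IssueCycle) && "resource conflict");
    for (const ResourceUse &U : Uses) {
      unsigned Start = IssueCycle + U.DelayCycles;
      for (unsigned C = Start; C < Start + U.Cycles; ++C)
        Busy[U.ResourceIdx].insert(C);
    }
  }

private:
  std::vector<std::set<unsigned>> Busy;
};
```

For example, after issuing at cycle 0 a write that uses resource 0 only at relative cycle #1, an independent write can still take resource 0 at cycle 0, but not at cycle 1.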
Most x86 processor models would probably benefit from this change. Among the SchedWrite definitions that might benefit are the writes for horizontal operations. On most x86 processors, a horizontal add/sub is decoded into a pair of shuffle uOPs followed by a single (data-dependent) vector ADD uOP.
The ADD uOP doesn't execute immediately because it has to wait for the two shuffle uOPs. So the ALU pipe is still available at relative cycle #0, and the horizontal operation only consumes it starting from relative cycle #1.
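The horizontal-add case can be worked through with a tiny model of the ALU pipe alone. The sketch assumes, as described above, that the ADD uOP holds the ALU pipe for one cycle starting at relative cycle `AddDelay`; the shuffle resources are left out for brevity, and all names (`PipeReservations`, `reserve`, `firstFreeCycle`, `independentAddCycle`) are hypothetical. It reports the first cycle at which an independent vector ADD could use the pipe when the horizontal add is issued at cycle 0.

```cpp
#include <cassert>
#include <vector>

// Per-cycle reservation flags for a single-unit pipe, starting at cycle 0.
using PipeReservations = std::vector<bool>;

// Reserves Cycles consecutive cycles starting at IssueCycle + DelayCycles.
void reserve(PipeReservations &Pipe, unsigned IssueCycle, unsigned Cycles,
             unsigned DelayCycles) {
  unsigned Start = IssueCycle + DelayCycles;
  if (Pipe.size() < Start + Cycles)
    Pipe.resize(Start + Cycles, false);
  for (unsigned C = Start; C < Start + Cycles; ++C)
    Pipe[C] = true;
}

// Returns the first cycle at which a one-cycle, undelayed uOP can use Pipe.
unsigned firstFreeCycle(const PipeReservations &Pipe) {
  unsigned C = 0;
  while (C < Pipe.size() && Pipe[C])
    ++C;
  return C;
}

// Horizontal add issued at cycle 0: its ADD uOP holds the ALU pipe for one
// cycle starting at relative cycle AddDelay. Returns the first cycle at
// which an independent vector ADD could use the same pipe.
unsigned independentAddCycle(unsigned AddDelay) {
  PipeReservations ALU;
  reserve(ALU, /*IssueCycle=*/0, /*Cycles=*/1, AddDelay);
  return firstFreeCycle(ALU);
}
```

With `AddDelay = 0` (the current model) the independent ADD has to wait until cycle 1, whereas with `AddDelay = 1` it can use the ALU pipe at cycle 0.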
This was just an example. There are probably various write descriptors (not just writes for microcoded instructions) which would benefit from this change.
This would also solve a number of known problems with the descriptors in Haswell/Broadwell. Last but not least, it would allow us to simplify the bookkeeping logic in llvm-mca and get rid of the not-so-nice "reserved" bit for processor resource groups. More details about those two issues can be found in the llvm-dev thread mentioned above.