-
Notifications
You must be signed in to change notification settings - Fork 10.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[X86] Remove SlowDivide tuning from GRTTuning #84676
Conversation
The DIV32/64 throughput was improved since Goldmont in the Atom architecture. The Alder Lake-E shows similar number too. So we shouldn't add such tunings to Gracemont and later architectures.
@llvm/pr-subscribers-backend-x86 Author: Phoebe Wang (phoebewang) ChangesThe DIV32/64 throughput was improved since Goldmont in the Atom architecture. The Alder Lake-E shows similar number too. So we shouldn't add such tunings to Gracemont and later products. Checked from Agner Fog's table and uops.info. Full diff: https://github.com/llvm/llvm-project/pull/84676.diff 1 Files Affected:
diff --git a/llvm/lib/Target/X86/X86.td b/llvm/lib/Target/X86/X86.td
index a2a65ce75d6b9a..8367f938c0ddfa 100644
--- a/llvm/lib/Target/X86/X86.td
+++ b/llvm/lib/Target/X86/X86.td
@@ -1237,8 +1237,6 @@ def ProcessorFeatures {
// Gracemont
list<SubtargetFeature> GRTTuning = [TuningMacroFusion,
TuningSlow3OpsLEA,
- TuningSlowDivide32,
- TuningSlowDivide64,
TuningFastScalarFSQRT,
TuningFastVectorFSQRT,
TuningFast15ByteNOP,
|
@@ -1237,8 +1237,6 @@ def ProcessorFeatures { | |||
// Gracemont | |||
list<SubtargetFeature> GRTTuning = [TuningMacroFusion, | |||
TuningSlow3OpsLEA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LEA also seems to be reasonably fast on GRT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No sure. We have TuningSlow3OpsLEA
for AlderLake and uops.info shows it has longer latency on it.
I don't have any information about newer targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2c latency, but other slow lea is 3c. Bit of a middle ground.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, the latency was reduced to 1 since IceLake but was grown back to 2 on AlderLake.
I think we can change it for IceLake, but AlderLake etc. need more date provements.
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you improve the cpu test coverage for bypass-slow-division-tune.ll please?
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - cheers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The DIV32/64 throughput was improved since Goldmont in the Atom architecture. The Alder Lake-E shows similar number too. So we shouldn't add such tunings to Gracemont and later products.
Checked from Agner Fog's table and uops.info.