[X86] Attempt to use VPMADD52L/VPMULUDQ instead of VPMULLQ on slow VPMULLQ targets (or when VPMULLQ is unavailable) #158854

Some Intel targets have notoriously slow VPMULLQ instructions. Whenever possible, they should use alternatives such as VPMULUDQ or, if IFMA52 is available, VPMADD52L with the accumulator set to zero.
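For reference, a scalar sketch of what each instruction computes per 64-bit lane, and why the alternatives can stand in for VPMULLQ. The function names are made up for illustration and are not LLVM APIs; the VPMADD52L model assumes the low-half form (VPMADD52LUQ), which adds the low 52 bits of a 52x52-bit product to the accumulator.

```cpp
#include <cstdint>

constexpr uint64_t Mask52 = (1ULL << 52) - 1;

// VPMULLQ lane: low 64 bits of the 64x64-bit product.
constexpr uint64_t pmullq(uint64_t A, uint64_t B) { return A * B; }

// VPMULUDQ lane: full 64-bit product of the low 32 bits of each operand.
constexpr uint64_t pmuludq(uint64_t A, uint64_t B) {
  return (A & 0xFFFFFFFFULL) * (B & 0xFFFFFFFFULL);
}

// VPMADD52LUQ lane: accumulator plus the low 52 bits of the product of the
// low 52 bits of each operand. The low 52 bits of the product depend only on
// the low 52 bits of the operands, so a wrapping 64-bit multiply is enough.
constexpr uint64_t pmadd52luq(uint64_t Acc, uint64_t A, uint64_t B) {
  return Acc + (((A & Mask52) * (B & Mask52)) & Mask52);
}

// Generic 64-bit multiply built from three VPMULUDQ-style products:
//   A * B = LoA*LoB + ((HiA*LoB + LoA*HiB) << 32)   (mod 2^64)
constexpr uint64_t mulViaPmuludq(uint64_t A, uint64_t B) {
  uint64_t LoLo = pmuludq(A, B);
  uint64_t HiLo = pmuludq(A >> 32, B);
  uint64_t LoHi = pmuludq(A, B >> 32);
  return LoLo + ((HiLo + LoHi) << 32);
}
```

With a zero accumulator, `pmadd52luq(0, A, B)` matches `pmullq(A, B)` only when both operands and the product fit in 52 bits; otherwise the upper bits are dropped.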

- [ ] Confirm which AVX512 targets have slower VPMULLQ than VPMULUDQ/VPMADD52L and add a new TuningSlowPMULLQ tuning flag for those targets - I think it's just Intel targets since Cannonlake?
- [ ] In LowerMUL - on IFMA (AVX/AVX512) capable targets, attempt to use a single VPMADD52L instruction instead of a sequence of multiple VPMULUDQ ops, although a single VPMULUDQ is still the best option. VPMADD52L requires the input operands and the multiplication result to have zero bits in the upper 12 bits (see #156714 for details). We can refactor the existing vXi64 knownbits analysis in LowerMUL to handle this.
- [ ] On TuningSlowPMULLQ targets, attempt to lower to VPMADD52L if the upper 12 bits are all known zero (or VPMULUDQ) - this might be possible as an isel tablegen pattern, or performed in combineMul, or we could set vXi64 ISD::MUL to Custom for TuningSlowPMULLQ targets and handle it in the same LowerMUL logic.
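
The preference order in the tasks above could be sketched roughly as follows. This is a standalone illustration, not LLVM code: `MulTuning`, `MulLowering` and `chooseMulLowering` are hypothetical names, and the inputs stand in for the known leading-zero counts that a knownbits query would provide. Two things here are assumptions rather than part of the issue: the check `LZA + LZB >= 76` is just one conservative way to prove that both operands and the product fit in 52 bits, and falling back to VPMULLQ on a slow target when nothing is known about the operands.

```cpp
#include <cstdint>

enum class MulLowering { SinglePMULUDQ, PMADD52L, PMULLQ, MultiPMULUDQ };

// Hypothetical feature/tuning summary for the target.
struct MulTuning {
  bool HasPMULLQ;  // VPMULLQ is available
  bool HasIFMA;    // VPMADD52L is available
  bool SlowPMULLQ; // stands in for the proposed TuningSlowPMULLQ flag
};

// LZA/LZB: minimum number of known-zero leading bits in each vXi64 operand.
inline MulLowering chooseMulLowering(const MulTuning &T, unsigned LZA,
                                     unsigned LZB) {
  // Upper 32 bits of both operands are zero: one VPMULUDQ is exact and is
  // still the best option.
  if (LZA >= 32 && LZB >= 32)
    return MulLowering::SinglePMULUDQ;

  // A * B < 2^(128 - LZA - LZB), so LZA + LZB >= 76 bounds the product below
  // 2^52 and also implies that each operand has at least 12 leading zeros.
  bool Fits52 = LZA + LZB >= 76;
  if (T.HasIFMA && Fits52 && (!T.HasPMULLQ || T.SlowPMULLQ))
    return MulLowering::PMADD52L;

  // Assumption: keep VPMULLQ when it exists and no cheaper form is provable.
  if (T.HasPMULLQ)
    return MulLowering::PMULLQ;

  // No VPMULLQ: generic expansion into multiple VPMULUDQ ops.
  return MulLowering::MultiPMULUDQ;
}
```

In-tree, the decision would more likely be driven by `KnownBits` on the operands and on the multiply result, which can prove the 52-bit bound in cases this leading-zero sum misses.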
