_mm_shuffle_ps / _mm256_shuffle_ps / _mm512_shuffle_ps
_mm_mask_shuffle_ps / _mm256_mask_shuffle_ps / _mm512_mask_shuffle_ps
_mm_maskz_shuffle_ps / _mm256_maskz_shuffle_ps / _mm512_maskz_shuffle_ps
_mm_shuffle_pd / _mm256_shuffle_pd / _mm512_shuffle_pd
_mm_mask_shuffle_pd / _mm256_mask_shuffle_pd / _mm512_mask_shuffle_pd
_mm_maskz_shuffle_pd / _mm256_maskz_shuffle_pd / _mm512_maskz_shuffle_pd
Handle the underlying __builtin_ia32_shufps/pd builtins and add test coverage.
Consult the Intel Intrinsics Guide to understand the nuances of the SHUFPS/SHUFPD shuffles, including how the selection repeats across 128-bit lanes and which half of each lane is taken from the LHS and which from the RHS operand. The expansion in CodeGenFunction::EmitX86BuiltinExpr should help as well.
Ideally this can be done relatively generically, to simplify adding other shuffles in the future.
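
For reference, here is a rough standalone model of the element selection, as I read it from the Intrinsics Guide pseudocode. It is only a sketch: `shufpsSource`, `shufpdSource` and `applyShuffle` are hypothetical names and not existing Clang helpers. The point is that each shuffle reduces to a small "destination index + immediate -> (source operand, source index)" mapping, which is the part a generic helper could take as a callback.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical reference model, not Clang code.
// Each mapping returns {operand, index}: operand 0 is the LHS vector (a),
// operand 1 is the RHS vector (b); index is the element within that vector.

// SHUFPS: four floats per 128-bit lane. The low two results of each lane come
// from a, the high two from b. The same 8-bit immediate is reused in every lane.
constexpr std::pair<unsigned, unsigned> shufpsSource(unsigned DstIdx,
                                                     unsigned Imm) {
  unsigned Lane = DstIdx / 4;              // which 128-bit lane
  unsigned Pos = DstIdx % 4;               // position inside the lane
  unsigned Sel = (Imm >> (2 * Pos)) & 0x3; // 2-bit selector for this position
  return {Pos < 2 ? 0u : 1u, Lane * 4 + Sel};
}

// SHUFPD: two doubles per 128-bit lane. The even result comes from a, the odd
// result from b. Unlike SHUFPS, each result element has its own immediate bit,
// so the immediate is not repeated across lanes.
constexpr std::pair<unsigned, unsigned> shufpdSource(unsigned DstIdx,
                                                     unsigned Imm) {
  unsigned Lane = DstIdx / 2;
  unsigned Pos = DstIdx % 2;
  unsigned Sel = (Imm >> DstIdx) & 0x1;    // one selector bit per element
  return {Pos == 0 ? 0u : 1u, Lane * 2 + Sel};
}

// Generic driver: builds the result from any per-element mapping.
template <typename T, std::size_t N, typename MapFn>
std::array<T, N> applyShuffle(const std::array<T, N> &A,
                              const std::array<T, N> &B, unsigned Imm,
                              MapFn Map) {
  std::array<T, N> R{};
  for (std::size_t I = 0; I < N; ++I) {
    auto [Src, Idx] = Map(static_cast<unsigned>(I), Imm);
    assert(Idx < N && "mapping must stay inside the vector");
    R[I] = (Src == 0) ? A[Idx] : B[Idx];
  }
  return R;
}
```

With this shape, supporting another two-source shuffle would mostly mean writing one more mapping function and checking it against the Intrinsics Guide; the driver loop stays the same.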