FSHL/FSHR instructions are legal on VBMI2 targets for v32i16 / v16i32 / v8i64 types
FSHL/FSHR instructions are legal on VBMI2 + VLX targets for v8i16 / v16i16 / v4i32 / v8i32 / v2i64 / v4i64 types
There's no need to go through LowerFunnelShift for these cases. Instead, we should make them legal and add a combineFunnelShift that converts funnel shifts with uniform constant shift amounts to X86ISD::VSHLD/X86ISD::VSHRD.
We'll still need LowerFunnelShift to handle widening to 512 bits for VBMI2-only targets (no VLX), but the constant handling can be moved entirely to the combine.
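
For reference, here is a rough standalone model of the check the proposed combine would need, written as plain C++ rather than against the SelectionDAG API. The names fshl16, fshr16 and uniformShiftImm are hypothetical and exist only for this illustration; the real combine would inspect the shift-amount operand of the node instead of a std::vector. It assumes the usual funnel shift semantics, where the shift amount is taken modulo the element width.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar model of a 16-bit funnel shift left: concatenate Hi:Lo, shift left
// by (Amt mod 16) and keep the upper 16 bits of the result.
inline uint16_t fshl16(uint16_t Hi, uint16_t Lo, unsigned Amt) {
  Amt %= 16;
  if (Amt == 0)
    return Hi; // Avoid shifting Lo right by the full element width.
  return static_cast<uint16_t>((Hi << Amt) | (Lo >> (16 - Amt)));
}

// Scalar model of a 16-bit funnel shift right: concatenate Hi:Lo, shift right
// by (Amt mod 16) and keep the lower 16 bits of the result.
inline uint16_t fshr16(uint16_t Hi, uint16_t Lo, unsigned Amt) {
  Amt %= 16;
  if (Amt == 0)
    return Lo; // Avoid shifting Hi left by the full element width.
  return static_cast<uint16_t>((Hi << (16 - Amt)) | (Lo >> Amt));
}

// Hypothetical helper: returns true if every lane holds the same constant
// shift amount, and writes that amount reduced modulo EltBits to Imm. A
// non-uniform amount vector returns false and would stay a variable shift.
inline bool uniformShiftImm(const std::vector<uint64_t> &Amts,
                            unsigned EltBits, unsigned &Imm) {
  assert(EltBits != 0 && "element width must be non-zero");
  if (Amts.empty())
    return false;
  for (uint64_t A : Amts)
    if (A != Amts[0])
      return false;
  Imm = static_cast<unsigned>(Amts[0] % EltBits);
  return true;
}
```

In this model a v4i16 funnel shift with amounts {20, 20, 20, 20} qualifies and yields an immediate of 4, while {1, 2, 1, 1} does not and would be left as a variable funnel shift.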