This can be implemented using xegpu.load_nd %x {transpose = {1, 0}} for B tiles.
We should support both patterns:
- Explicit transpose in the op: linalg.matmul_transpose_b.
- Transpose before the matmul (this is what MLIR from OV will look like):
%b_tr = linalg.transpose %b ...
%res = linalg.matmul %a, %b_tr, ...
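
For reference, the two patterns could be spelled out roughly as follows. This is only a sketch: the shapes (M=32, N=16, K=64), the f16 element type, the function names, and the %acc / %empty values are assumptions made for illustration and are not taken from the OV output. It also assumes an MLIR version that still provides the linalg.matmul_transpose_b named op.

```mlir
// Pattern 1: the transpose is part of the named op.
// B is stored as N x K (16 x 64) and is read transposed by the op itself.
func.func @explicit_transpose(%a: tensor<32x64xf16>, %b: tensor<16x64xf16>,
                              %acc: tensor<32x16xf16>) -> tensor<32x16xf16> {
  %res = linalg.matmul_transpose_b
           ins(%a, %b : tensor<32x64xf16>, tensor<16x64xf16>)
           outs(%acc : tensor<32x16xf16>) -> tensor<32x16xf16>
  return %res : tensor<32x16xf16>
}

// Pattern 2: a separate linalg.transpose produces a K x N tensor
// that feeds a plain linalg.matmul.
func.func @transpose_before_matmul(%a: tensor<32x64xf16>, %b: tensor<16x64xf16>,
                                   %acc: tensor<32x16xf16>) -> tensor<32x16xf16> {
  // Destination for the transposed B (hypothetical; any 64x16 init works).
  %empty = tensor.empty() : tensor<64x16xf16>
  %b_tr = linalg.transpose
            ins(%b : tensor<16x64xf16>)
            outs(%empty : tensor<64x16xf16>)
            permutation = [1, 0]
  %res = linalg.matmul
           ins(%a, %b_tr : tensor<32x64xf16>, tensor<64x16xf16>)
           outs(%acc : tensor<32x16xf16>) -> tensor<32x16xf16>
  return %res : tensor<32x16xf16>
}
```

Both functions compute A x B^T. In the second form the lowering would presumably need to recognize that the B operand of the matmul is produced by a linalg.transpose with permutation [1, 0] and fold it into the transposed load, rather than materializing %b_tr.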
This functionality is required for OV integration (#207).