s390x: use simd_shuffle! macro
#1965
// NOTE: `vflls` and `vldeb` are equivalent; our disassembler prefers vflls.
#[cfg_attr(
    all(test, target_feature = "vector-enhancements-1"),
    assert_instr(vflls)
)]
Just checking: is that correct? This implementation produces `vldeb` on Godbolt, so I really think it's the disassembler that is picking `vflls` here.
Yes, the two are equivalent. In fact the binary machine code is identical; they are just different assembler mnemonics for the same instruction. Both `vldeb v1, v2` and `vflls v1, v2` are simply extended mnemonics for the "base" instruction `vfll v1, v2, 0, 0` (see chapter 24 of the Principles of Operation (PoP) under VECTOR FP LOAD LENGTHENED). It's a bit unfortunate to have two mnemonics for the same thing, but that is due to historical reasons (`vflls` is the more recent one and should probably be preferred).
- simd_shuffle(
-     truncated,
-     truncated,
-     const { u32x4::from_array([0, 0, 1, 1]) },
- )
+ simd_shuffle!(truncated, truncated, [0, 0, 1, 1])
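For readers less familiar with shuffle indices, here is a rough scalar sketch of what a two-input shuffle with indices `[0, 0, 1, 1]` computes. `shuffle_model` is a hypothetical helper on plain arrays, not the `simd_shuffle!` macro itself, and the two-lane `f32` input is only an assumption about the shape of `truncated`.

```rust
/// Scalar model of a two-input lane shuffle (hypothetical helper):
/// an index `i < N` selects `a[i]`, an index `i >= N` selects `b[i - N]`.
fn shuffle_model<T: Copy, const N: usize, const M: usize>(
    a: [T; N],
    b: [T; N],
    idx: [usize; M],
) -> [T; M] {
    idx.map(|i| if i < N { a[i] } else { b[i - N] })
}

fn main() {
    // Assumed shape: two narrowed values that need to be spread over four lanes.
    let truncated = [1.5_f32, -2.0];
    let widened: [f32; 4] = shuffle_model(truncated, truncated, [0, 0, 1, 1]);
    // Each input lane is duplicated, so the odd output lanes repeat their even neighbour.
    println!("{:?}", widened);
}
```

In this model the odd lanes end up holding copies of the even lanes, which is one defined choice for elements the intrinsic leaves unspecified.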
And here, `vec_floate` does not specify what happens to the odd elements. Clang and LLVM end up using poison values, but Rust does not have those. So, is there anything that can be done?
This is not all that important, of course, but it's a bit of a wart.
Yes, it's deliberately left unspecified what happens to the odd elements. This matches the behavior of the underlying VECTOR FP LOAD ROUNDED instruction, which also states: "The data in the odd elements of the first operand is unpredictable." You could mask them to some defined value, but at extra runtime cost, of course.
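To make the masking option concrete, a minimal scalar sketch follows. `define_odd_lanes` is a hypothetical helper on a plain `[f32; 4]`, not part of the PR; on the real target this would presumably be an extra vector AND or select, which is the runtime cost mentioned above.

```rust
/// Hypothetical helper: keep the even lanes (0 and 2) and force the
/// odd lanes (1 and 3) to a caller-chosen, well-defined value.
fn define_odd_lanes(v: [f32; 4], fill: f32) -> [f32; 4] {
    let mut out = v;
    for lane in (1..4).step_by(2) {
        out[lane] = fill;
    }
    out
}

fn main() {
    // The 9.0 values stand in for whatever the instruction left in the odd lanes.
    let raw = [1.5_f32, 9.0, -2.0, 9.0];
    let defined = define_odd_lanes(raw, 0.0);
    println!("{:?}", defined);
}
```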
Right, well there is nothing we can do then. There is always inline assembly if someone really needs to emit the exact instruction.
No description provided.