Format f16 values correctly in intrinsic-test
#1968
0ecf65e to f2e124b
r? @folkertdev Just lmk before merging, will need to remove the last commit
cc @tgross35 was there some caveat to this approach?
Not sure about the rest but just regarding formatting, these were added at a time that we didn't have |
Right just double-checking that there is not some edge case where we'd get incorrect outputs by upcasting to |
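For reference on the upcasting question, here is a minimal sketch of widening an IEEE 754 binary16 bit pattern to a wider float. It assumes f32 as the wider type and uses hypothetical helper names (`pow2`, `f16_bits_to_f32`); it is not the code in this PR. It only illustrates that every finite f16 value, including subnormals, has an exact f32 counterpart. It says nothing about how the two types print.

```rust
/// 2^e as an f32, built directly from the bit pattern (valid for -126 <= e <= 127).
fn pow2(e: i32) -> f32 {
    f32::from_bits(((e + 127) as u32) << 23)
}

/// Hypothetical helper: widen a binary16 bit pattern to f32.
/// The sign bit of NaN inputs is not preserved in this sketch.
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = if bits & 0x8000 != 0 { -1.0f32 } else { 1.0f32 };
    let exp = ((bits >> 10) & 0x1f) as i32; // 5-bit biased exponent
    let frac = (bits & 0x03ff) as f32; // 10-bit fraction
    match exp {
        // Zero and subnormals: frac * 2^-10 * 2^-14
        0 => sign * frac * pow2(-24),
        // Infinity and NaN
        0x1f => {
            if frac == 0.0 {
                sign * f32::INFINITY
            } else {
                f32::NAN
            }
        }
        // Normal numbers: (1 + frac / 2^10) * 2^(exp - 15)
        _ => sign * (1.0 + frac / 1024.0) * pow2(exp - 15),
    }
}

fn main() {
    // One, minus two, largest finite, smallest subnormal, +infinity
    for bits in [0x3C00u16, 0xC000, 0x7BFF, 0x0001, 0x7C00] {
        println!("{:#06x} -> {:?}", bits, f16_bits_to_f32(bits));
    }
}
```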
folkertdev
left a comment
This should be good to go then with the most recent commit removed
f2e124b to 5536f1c
_mm512_castsi256_si512
# _mm512_conj_pch

# Clang bug
Do we document what clang bug(s) these are?
yeah I was thinking about putting up some clang PRs to correct these. In summary:
- _mm{256}_extract_epi{8,16} does (int) (unsigned {char,short}), but afaik C does sign-extension whenever the target type is signed, so this is getting sign-extended, whereas Intel specifies that it should zero-extend.
- _mm512_mask_reduce_{max,min}_{ps,pd} uses {f32,f64}::{MIN,MAX} in place of masked-out values, but Intel specifies it should use {f32,f64}::{NEG_INFINITY,INFINITY}
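To make the two behaviours concrete, here is a scalar Rust sketch. The function names are hypothetical models, not stdarch or clang code. The first pair shows how zero-extension and sign-extension of an 8-bit lane differ. The last function models a masked max-reduction so that the effect of the filler value for masked-out lanes can be compared.

```rust
/// Zero-extension: treat the lane as unsigned before widening to i32.
fn extract_zero_extended(lane: i8) -> i32 {
    lane as u8 as i32
}

/// Sign-extension: widen the signed lane directly.
fn extract_sign_extended(lane: i8) -> i32 {
    lane as i32
}

/// Scalar model of a masked max-reduction over at most 16 lanes.
/// Lanes whose mask bit is clear are replaced by `filler`.
fn masked_reduce_max(values: &[f32], mask: u16, filler: f32) -> f32 {
    values
        .iter()
        .enumerate()
        .map(|(i, &v)| if (mask >> i) & 1 == 1 { v } else { filler })
        .fold(filler, f32::max)
}

fn main() {
    // A lane with all bits set (0xFF).
    println!("zero-extended: {}", extract_zero_extended(-1));
    println!("sign-extended: {}", extract_sign_extended(-1));

    // Only lane 0 is active, and it holds negative infinity.
    let values = [f32::NEG_INFINITY, 1.0];
    println!("filler = NEG_INFINITY: {}", masked_reduce_max(&values, 0b01, f32::NEG_INFINITY));
    println!("filler = MIN:          {}", masked_reduce_max(&values, 0b01, f32::MIN));
}
```

With f32::MIN as the filler, the reduction can never return negative infinity, even when every active lane holds it. With f32::NEG_INFINITY it can, because negative infinity is the identity for max.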
Fix some x86 intrinsics too
- _mm_{u}comineq_sh (these should be unordered, i.e. should return true if either operand is nan)
- _mm_mask_cvt{epi16_epi8, pd_ps, pd_epi32} (top 64 bits should be 0)
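A scalar sketch of the two semantics named above, under stated assumptions: the function names are hypothetical, f32 stands in for f16 (which is not available on stable Rust), and plain arrays stand in for the 128-bit vectors. The second function models only the epi16-to-epi8 variant, using truncation and a writemask that copies inactive lanes from `src`.

```rust
/// Unordered "not equal": true if the operands differ or either one is NaN.
/// Rust's `!=` on floats already behaves this way; the NaN checks are spelled
/// out only to make the intent explicit.
fn comineq_model(a: f32, b: f32) -> bool {
    a.is_nan() || b.is_nan() || a != b
}

/// Model of a masked 16-bit to 8-bit narrowing into a 128-bit result.
/// Lanes 0..8 hold the truncated value when the mask bit is set, otherwise
/// the lane from `src`. Lanes 8..16 (the top 64 bits) are always zero.
fn mask_cvtepi16_epi8_model(src: [i8; 16], k: u8, a: [i16; 8]) -> [i8; 16] {
    let mut dst = [0i8; 16];
    for i in 0..8 {
        dst[i] = if (k >> i) & 1 == 1 { a[i] as i8 } else { src[i] };
    }
    dst
}

fn main() {
    println!("{}", comineq_model(f32::NAN, 1.0));
    println!("{}", comineq_model(1.0, 1.0));

    let src = [7i8; 16];
    let a = [0x0101, 2, 0x01FF, 4, 5, 6, 7, 8];
    println!("{:?}", mask_cvtepi16_epi8_model(src, 0b0000_0101, a));
}
```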