Water - HOST Tensor AVX2 Support and Vectorized HIP support #126
@r-abishek Please note
src/modules/cpu/kernel/water.hpp
Outdated
{
pSrcY = _mm_fmadd_ps(pWaterParams[1], pCosFactor, pDstY);
pSrcX = _mm_fmadd_ps(pWaterParams[0], pSinFactor, pDstX);
pDstX = _mm_add_ps(pDstX, xmm_p4);
Are you sure compute_water_src_loc_sse() and compute_water_src_loc() are doing the same thing? The sse seems to have an extra add_ps.
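One possible reading of the question above, not confirmed anywhere in this thread, is that the extra `_mm_add_ps(pDstX, xmm_p4)` only advances the four destination columns to the next group of four, which is work the scalar path leaves to its loop counter. The sketch below models that assumption in portable C++ with a 4-lane array instead of intrinsics. The `WaterParams` struct, its field names, and the `rowScalar`/`rowChunked` helpers are hypothetical and are not taken from `water.hpp`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical parameter bundle; field names are assumptions for illustration.
struct WaterParams { float amplX, amplY, freqX, freqY, phaseX, phaseY; };

struct SrcLoc { float x, y; };

// Source location for one destination pixel (scalar form of the two fmadd lines).
SrcLoc computeSrcLoc(float dstX, float dstY, const WaterParams &p)
{
    float sinFactor = std::sin(p.freqX * dstY + p.phaseX);
    float cosFactor = std::cos(p.freqY * dstX + p.phaseY);
    return { dstX + p.amplX * sinFactor, dstY + p.amplY * cosFactor };
}

// Reference path: one column per iteration, the loop counter supplies dstX.
std::vector<SrcLoc> rowScalar(int row, int width, const WaterParams &p)
{
    assert(width >= 0);
    std::vector<SrcLoc> out;
    for (int col = 0; col < width; col++)
        out.push_back(computeSrcLoc((float)col, (float)row, p));
    return out;
}

// Model of the 4-wide path: dstX lives in lanes and is bumped by 4 after each step.
std::vector<SrcLoc> rowChunked(int row, int width, const WaterParams &p)
{
    assert(width >= 0);
    std::vector<SrcLoc> out;
    float dstX[4] = { 0.0f, 1.0f, 2.0f, 3.0f };
    int col = 0;
    for (; col + 4 <= width; col += 4)
    {
        for (int lane = 0; lane < 4; lane++)
            out.push_back(computeSrcLoc(dstX[lane], (float)row, p));
        for (int lane = 0; lane < 4; lane++)
            dstX[lane] += 4.0f;   // assumed counterpart of _mm_add_ps(pDstX, xmm_p4)
    }
    for (; col < width; col++)    // leftover columns handled one at a time
        out.push_back(computeSrcLoc((float)col, (float)row, p));
    return out;
}
```

Under this assumption both paths produce the same source locations, so comparing `rowScalar` and `rowChunked` for a few rows is a cheap way to check whether the extra add changes results or only moves the column index.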
minor changes in PLN variant load functions
…d to add store functions for completion
removed commented code
updated i8 pln1 load as per the optimized u8 pln1 load
@sampath1117 Is this ready now?
Hi @r-abishek
src/include/cpu/rpp_cpu_simd.hpp
Outdated
p128[1] = _mm256_extractf128_ps(p[1], 0);
p128[2] = _mm256_extractf128_ps(p[2], 0);
_MM_TRANSPOSE4_PS(p128[0], p128[1], p128[2], p128[3]);
p128[0] = _mm256_castps256_ps128(pRow[0]);
Please add some inline comments to make the vectorized code more readable.
Done
src/include/cpu/rpp_cpu_simd.hpp
Outdated
__m128i p128[2];
const __m128i maskR1 = _mm_setr_epi8(0, 3, 6, 9, 12, 15, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80);
const __m128i maskG1 = _mm_setr_epi8(1, 4, 7, 10, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80);
const __m128i maskB1 = _mm_setr_epi8(2, 5, 8, 11, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80);
Same comment here: please add inline comments so the reader can understand the masks, e.g. why maskR1 has 6 Rs and maskR2 has 2 Rs, etc.
Done
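To illustrate the kind of explanation requested for the masks, here is a minimal scalar model of a byte shuffle in the style of `_mm_shuffle_epi8`, applied to the three mask values quoted in the diff. It assumes byte 0 of the 16-byte load is an R sample of packed RGB data, so R sits at offsets 0, 3, ..., 15 (6 samples) while G and B each have 5. The names `Bytes16`, `shuffleBytes`, `selectedLanes`, and the `kMask*` constants are hypothetical helpers for this sketch; `maskR2` is not shown in the excerpt, so it is not modelled here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Mask values copied from the diff excerpt; 0x80 marks a lane that is zeroed.
const Bytes16 kMaskR1 = {0, 3, 6, 9, 12, 15, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80};
const Bytes16 kMaskG1 = {1, 4, 7, 10, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80};
const Bytes16 kMaskB1 = {2, 5, 8, 11, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80};

// Scalar model of a 128-bit byte shuffle: output byte i copies src[mask[i] & 0x0F],
// or becomes 0 when mask[i] has its high bit set.
Bytes16 shuffleBytes(const Bytes16 &src, const Bytes16 &mask)
{
    Bytes16 dst{};
    for (std::size_t i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    return dst;
}

// Counts how many lanes of a mask actually select a source byte.
int selectedLanes(const Bytes16 &mask)
{
    int count = 0;
    for (std::size_t i = 0; i < 16; i++)
        if ((mask[i] & 0x80) == 0)
            count++;
    return count;
}
```

With these helpers, `selectedLanes(kMaskR1)` is 6 and the G and B masks each select 5 lanes, which matches the 6 + 5 + 5 split of one 16-byte packed load. An inline comment stating that split next to each mask would likely answer the reviewer's question.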
src/modules/cpu/kernel/water.hpp
Outdated
Rpp32f dstX, dstY, sinFactor;
__m256 pDstX, pDstY, pSinFactor;
dstY = (Rpp32f)i;
sinFactor= std::sin((freqX * dstY) + phaseX);
space before =
Done
Looks okay, but again, do we have CSVs here for passing QA tests with/without tolerance?
resolved spacing issues and added comments for AVX codes for better understanding
made changes to handle cases where QA Tests are not supported
@r-abishek
lgtm
* added for U8, F32, I8, F16 variants
* added for PKD3, PLN3, PLN1 with toggle
* added test case for water in new test suite
* added golden outputs for water