Water - HOST Tensor AVX2 Support and Vectorized HIP support #126

sampath1117 · 2023-05-04T13:39:58Z

* Added for U8, F32, I8, F16 variants
* Added for PKD3, PLN3, PLN1 with toggle
* Added test case for water in the new test suite
* Added golden outputs for water

sampath1117 · 2023-05-04T13:40:59Z

@r-abishek
For water, the QA tests are currently failing. I need to debug this further.

Please note

src/include/cpu/rpp_cpu_simd.hpp

src/modules/cpu/kernel/water.hpp

r-abishek · 2023-05-05T02:44:51Z

src/modules/cpu/kernel/water.hpp

+{
+    pSrcY = _mm_fmadd_ps(pWaterParams[1], pCosFactor, pDstY);
+    pSrcX = _mm_fmadd_ps(pWaterParams[0], pSinFactor, pDstX);
+    pDstX = _mm_add_ps(pDstX, xmm_p4);


Are you sure compute_water_src_loc_sse() and compute_water_src_loc() are doing the same thing? The sse seems to have an extra add_ps.
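
One way to check the question above is to run a scalar path and a 4-wide path side by side. The sketch below is not the code from this PR: it models the four SSE lanes with a plain `std::array` so that it compiles without any intrinsics, it assumes `pWaterParams[0]` and `pWaterParams[1]` hold the X and Y amplitudes, and it assumes the cosine factor varies per column. The names `srcLocScalar`, `srcLocLanes` and `lanesMatchScalar` are hypothetical. Under those assumptions the two multiply-adds match the scalar formula lane by lane, and the extra add only advances `dstX` to the next four columns after the source locations have been computed.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Lanes = std::array<float, 4>;   // stand-in for one __m128 register

// Scalar reference: source location for a single destination pixel.
// ASSUMPTION: amplX / amplY correspond to pWaterParams[0] / pWaterParams[1].
void srcLocScalar(float dstX, float dstY, float amplX, float amplY,
                  float sinFactor, float cosFactor, float &srcX, float &srcY)
{
    srcY = amplY * cosFactor + dstY;
    srcX = amplX * sinFactor + dstX;
}

// 4-lane version mirroring the three vector operations quoted in the diff.
void srcLocLanes(Lanes &dstX, float dstY, float amplX, float amplY,
                 float sinFactor, const Lanes &cosFactor, Lanes &srcX, Lanes &srcY)
{
    for (int k = 0; k < 4; k++)
    {
        srcY[k] = amplY * cosFactor[k] + dstY;   // models _mm_fmadd_ps(params[1], cos, dstY)
        srcX[k] = amplX * sinFactor + dstX[k];   // models _mm_fmadd_ps(params[0], sin, dstX)
    }
    for (int k = 0; k < 4; k++)
        dstX[k] += 4.0f;                         // models the extra _mm_add_ps(dstX, 4)
}

// Runs both paths on four columns of one row and reports whether they agree.
bool lanesMatchScalar()
{
    const float amplX = 2.0f, amplY = 5.0f, freqX = 0.05f, freqY = 0.08f;   // arbitrary values
    const float dstY = 3.0f;
    const float sinFactor = std::sin(freqX * dstY);   // row-invariant factor
    Lanes dstX = {0.0f, 1.0f, 2.0f, 3.0f};
    Lanes cosFactor{}, srcX{}, srcY{};
    for (int k = 0; k < 4; k++)
        cosFactor[k] = std::cos(freqY * dstX[k]);     // ASSUMPTION: column-dependent factor
    const Lanes dstXBefore = dstX;

    srcLocLanes(dstX, dstY, amplX, amplY, sinFactor, cosFactor, srcX, srcY);

    bool ok = true;
    for (int k = 0; k < 4; k++)
    {
        float refX, refY;
        srcLocScalar(dstXBefore[k], dstY, amplX, amplY, sinFactor, cosFactor[k], refX, refY);
        ok = ok && std::fabs(refX - srcX[k]) < 1e-4f && std::fabs(refY - srcY[k]) < 1e-4f;
        ok = ok && dstX[k] == dstXBefore[k] + 4.0f;   // only the column index moved on
    }
    return ok;
}
```

Calling `lanesMatchScalar()` from a small test is enough to confirm the two paths agree for these inputs. If the scalar function is called once per pixel by a loop that increments the column itself, the increment lives in the caller rather than in the function, which would explain why only the vector version shows the add.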


r-abishek · 2023-07-13T16:45:02Z

@sampath1117 Is this ready now?

sampath1117 · 2023-07-13T16:48:17Z

Hi @r-abishek
I have made all the changes, and the PR is ready for internal review.

r-abishek · 2023-07-14T00:49:24Z

src/include/cpu/rpp_cpu_simd.hpp

-    p128[1] = _mm256_extractf128_ps(p[1], 0);
-    p128[2] = _mm256_extractf128_ps(p[2], 0);
-    _MM_TRANSPOSE4_PS(p128[0], p128[1], p128[2], p128[3]);
+    p128[0] = _mm256_castps256_ps128(pRow[0]);


Please add some inline comments to make the vectorized code more readable.

r-abishek · 2023-07-14T00:52:08Z

src/include/cpu/rpp_cpu_simd.hpp

+    __m128i p128[2];
+    const __m128i maskR1 = _mm_setr_epi8(0, 3, 6, 9, 12, 15, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80);
+    const __m128i maskG1 = _mm_setr_epi8(1, 4, 7, 10, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80);
+    const __m128i maskB1 = _mm_setr_epi8(2, 5, 8, 11, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80);


Same comment here: add comments so the reader can understand the masks, e.g. why maskR1 has 6 Rs and maskR2 has 2 Rs, etc.
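
The kind of comment being asked for can be illustrated with a portable model of the byte shuffle. This is a sketch under stated assumptions, not the PR's code: `shuffleBytes` imitates the documented behaviour of `_mm_shuffle_epi8` on a plain array, `gatherR` is a hypothetical helper, and the `maskR2` shown here is a guess at a layout that fits "2 Rs" (the real `maskR2` is not quoted in this thread). It assumes 8 packed RGB pixels, i.e. 24 bytes, are processed per step. In the first 16 bytes R sits at offsets 0, 3, 6, 9, 12 and 15, which gives 6 Rs, while G and B have only 5 entries each; the remaining 2 Rs fall at bytes 18 and 21.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Reg = std::array<uint8_t, 16>;   // stand-in for one __m128i register

// Portable model of _mm_shuffle_epi8: a mask byte with its high bit set (0x80)
// writes 0 to that lane; otherwise the low 4 bits select a source byte.
Reg shuffleBytes(const Reg &src, const Reg &mask)
{
    Reg dst{};
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    return dst;
}

// Gathers the 8 R bytes of 8 packed RGB pixels (24 bytes) into lanes 0..7.
Reg gatherR(const uint8_t *pkd)
{
    assert(pkd != nullptr);
    Reg lo{}, hi{};
    for (int i = 0; i < 16; i++) lo[i] = pkd[i];        // bytes 0..15
    for (int i = 0; i < 8; i++)  hi[i] = pkd[16 + i];   // bytes 16..23, rest left as 0

    // Bytes 0..15: R is every third byte starting at 0 -> 0,3,6,9,12,15 (6 Rs).
    const Reg maskR1 = {0, 3, 6, 9, 12, 15, 0x80, 0x80,
                        0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80};
    // Bytes 16..23: the last two Rs are at 18 and 21, i.e. local offsets 2 and 5.
    // HYPOTHETICAL mask: they are placed in lanes 6 and 7 so both results can be OR-ed.
    const Reg maskR2 = {0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 2, 5,
                        0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80};

    const Reg r1 = shuffleBytes(lo, maskR1);
    const Reg r2 = shuffleBytes(hi, maskR2);
    Reg out{};
    for (int i = 0; i < 16; i++)
        out[i] = r1[i] | r2[i];                         // models _mm_or_si128
    return out;
}
```

With a buffer where pixel `p` has R value `10 + p`, `gatherR` returns 10 through 17 in lanes 0 to 7 and zeros elsewhere. The same reasoning, shifted by one and two bytes, would apply to the G and B masks.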

r-abishek · 2023-07-14T00:53:26Z

src/modules/cpu/kernel/water.hpp

+                Rpp32f dstX, dstY, sinFactor;
+                __m256 pDstX, pDstY, pSinFactor;
+                dstY = (Rpp32f)i;
+                sinFactor= std::sin((freqX * dstY) + phaseX);


Add a space before `=`.

r-abishek · 2023-07-14T00:56:11Z

Looks okay, but again, do we have CSVs here for passing QA tests with/without tolerance?


sampath1117 · 2023-07-14T12:15:06Z

> Looks okay, but again, do we have CSVs here for passing QA tests with/without tolerance?

@r-abishek
I have regenerated the golden outputs with the latest code, and the QA tests are now passing for both HOST and HIP.
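
For readers unfamiliar with "with/without tolerance", the idea can be sketched as an element-wise comparison against the golden output. This is only an illustration, not the comparison used by the test suite in this repository: `countMismatches` is a hypothetical name, and the U8 buffers and integer tolerance are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Counts elements whose absolute difference from the golden value exceeds `tolerance`.
// A tolerance of 0 is an exact comparison.
std::size_t countMismatches(const std::vector<uint8_t> &output,
                            const std::vector<uint8_t> &golden,
                            int tolerance)
{
    assert(output.size() == golden.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < output.size(); i++)
    {
        // Widen to int first so the subtraction cannot wrap around.
        const int diff = std::abs(static_cast<int>(output[i]) - static_cast<int>(golden[i]));
        if (diff > tolerance)
            mismatches++;
    }
    return mismatches;
}
```

A test would pass when `countMismatches(output, golden, tolerance)` returns 0. A small non-zero tolerance can absorb rounding differences between backends, while regenerating the golden outputs, as done here, changes the reference itself.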


r-abishek

lgtm

sampath1117 added 3 commits May 4, 2023 12:32

added water HOST and HIP codes

52b83c6

added water case in test suite

8dafad2

added golden outputs for water

890437a

r-abishek reviewed May 5, 2023


sampath1117 and others added 18 commits May 22, 2023 13:52

added omp thread changes for water augmentation

643281c

experimental changes

336188d

fixed output issue with AVX2 instructions

87ab19f

added AVX2 support for PKD3 load function

94c8340

minor changes in PLN variant load functions

nwc commit - added avx2 changes for u8 layout toggle variants but need to add store functions for completion

56190d8

Add Avx2 implementation for F32 and U8 toggle variants

3b18a58

Add AVX2 support for u8 pkd3-pln3 and i8 pkd3-pln3 for water augmentation

754e353

change F32 load and store logic

c4f69a9

optimized the store function for F32 PLN3-PKD3

e9e74a2

Merge branch 'master' into water_avx_exp

6c8fa57

reverted back irrelevant changes

fb1fdb4

minor change

4ffe9f6

optimized load and store functions for water U8 and F32 variants in host

6e0756a

removed commented code

merge with master

0cf2626

removed golden outputs for water

81553d3

minor changes

a5567e6

renamed few functions and removed unused functions

89380a5

updated i8 pln1 load as per the optimized u8 pln1 load

fixed bug in i8 load function

27b318b

r-abishek reviewed Jul 14, 2023


r-abishek assigned sampath1117 Jul 14, 2023

r-abishek added the enhancement New feature or request label Jul 14, 2023

r-abishek added this to the sow8ms4 milestone Jul 14, 2023

sampath1117 added 3 commits July 14, 2023 10:01

changed cast to c++ style

d3943b5

resolved spacing issues and added comments for AVX codes for better understanding made changes to handle cases where QA Tests are not supported

added golden outputs for water

31d1624

updated golden outputs with latest changes

2729b1e

sampath1117 added 2 commits July 14, 2023 13:24

modified the u8, i8 pkd3-pln3 function and added comments for the vectorized code

8b763ad

fixed minor bug in I8 variants

b418e24

r-abishek changed the base branch from master to ar/opt_water July 18, 2023 03:14

r-abishek approved these changes Jul 18, 2023


r-abishek merged commit 9da6be1 into r-abishek:ar/opt_water Jul 18, 2023
