Shortening #-of Vector Lanes #16

itzmeanjan · 2021-12-27T16:04:49Z

Previously I represented the Rescue Prime hash state using a vector with 16 lanes, where each lane was a 64-bit unsigned integer (i.e. sycl::ulong16). Motivated by the discussion in rust-lang/portable-simd#215, I decided to rewrite the Rescue Prime hash routines to represent the hash state as sycl::ulong4[3], which is exactly 12 field elements wide, so the 4 lanes that previously went unused are no longer wasted.
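To illustrate the layout change, here is a minimal host-side sketch in plain C++. It is not the code from this PR: `ulong4_t` is a hypothetical stand-in for sycl::ulong4, and `state_t`, `pack_state`, `element_at` and `unused_lanes` are names made up for this example. It only shows that 12 field elements fill three 4-lane vectors exactly, while a 16-lane vector leaves 4 lanes unused.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for sycl::ulong4: four 64-bit unsigned lanes.
struct ulong4_t {
  std::uint64_t lane[4];
};

constexpr std::size_t STATE_WIDTH = 12;   // field elements in the hash state
constexpr std::size_t LANES_PER_VEC = 4;  // lanes in one ulong4-like vector
constexpr std::size_t VEC_COUNT = STATE_WIDTH / LANES_PER_VEC;  // 3 vectors

// Hash state as three 4-lane vectors, mirroring sycl::ulong4[3].
using state_t = std::array<ulong4_t, VEC_COUNT>;

static_assert(STATE_WIDTH % LANES_PER_VEC == 0,
              "state must fill whole vectors");
static_assert(sizeof(state_t) == STATE_WIDTH * sizeof(std::uint64_t),
              "no lane is left unused in this layout");

// Number of lanes left unused when the state lives in one `lanes`-wide vector.
constexpr std::size_t unused_lanes(std::size_t lanes) {
  return lanes - STATE_WIDTH;
}

// Copy 12 field elements into the 3 x 4 layout, element i -> vector i / 4, lane i % 4.
inline state_t pack_state(const std::array<std::uint64_t, STATE_WIDTH>& in) {
  state_t st{};
  for (std::size_t i = 0; i < STATE_WIDTH; i++) {
    st[i / LANES_PER_VEC].lane[i % LANES_PER_VEC] = in[i];
  }
  return st;
}

// Read back the i-th field element from the 3 x 4 layout.
inline std::uint64_t element_at(const state_t& st, std::size_t i) {
  assert(i < STATE_WIDTH);
  return st[i / LANES_PER_VEC].lane[i % LANES_PER_VEC];
}
```

With this indexing, `unused_lanes(16)` evaluates to 4, which is the space the older sycl::ulong16 representation left idle for every hash state.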

See how the performance of Rescue Prime hash and Merklization improves after this change, benchmarked on an Nvidia Tesla V100

And on Intel CPU, the performance improvements due to vector lane shortening are visible here

Rescue Prime Merge

Platform	#-of Work Items	Ops/sec
Intel CPU	4096 x 4096	1.41329e+06
Nvidia GPU	4096 x 4096	2.02718e+07

Merklization using Rescue Prime Merge

Platform	#-of Work Items	Total Time
Intel CPU	8388608	5974.54 ms
Nvidia GPU	8388608	431.217 ms

itzmeanjan added 14 commits December 27, 2021 03:46

[port-to-sycl::ulong4] changing vector lanes used for representing rescue prime hash state [wip]

2361d27

[port-to-sycl::ulong4] ported vector state addition functions [wip]

f4f3fda

[port-to-sycl::ulong4] ported apply_sbox and round key constant application functions

854cf21

[port-to-sycl::ulong4] ported apply mds matrix related functions

41284c7

[port-to-sycl::ulong4] ported inverse sbox application related functions ( instead of exponentiating hash state, it takes help of cheaper multiplication )

5e2dc33

[port-to-sycl::ulong4] ported rescue permutation related functions

7b4922c

[port-to-sycl::ulong4] made necessary changes to rescue constant prepare functions

0b7ec65

[port-to-sycl::ulong4] ported hash_elements and merge function

6651654

[port-to-sycl::ulong4] made some corrections in routine implementations

fe7f3ed

- there were some logical errors I made, discovered when running test cases

updated test cases

702bf31

made all necessary changes in dependent routines

3145c58

updated benchmark results of rescue prime/ merkle tree functions on CUDA backend

0cc0c25

updated documentation/ benchmark results etc.

4300bca

swapped merklization approach 1 and 2 identifiers

8e48057

itzmeanjan merged commit a3152cd into main Dec 29, 2021

itzmeanjan deleted the shorter-vector-lanes branch December 29, 2021 05:34
