
VLIW5 speedup from vectorizing

magnumripper edited this page Dec 22, 2014 · 2 revisions

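These figures are from an AMD Juniper GPU (e.g. Radeon HD 5750/5770), a VLIW5 part. Each ratio is the speed of the vectorized kernel divided by the speed of the scalar kernel for the same format, so anything above 1 means vectorizing helped.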

Natural vector size versus scalar

Ratio:  1.15406 real, 1.17261 virtual   PBKDF2-HMAC-SHA1-opencl:Raw
Ratio:  1.17748 real, 1.22909 virtual   RAKP-opencl, IPMI 2.0 RAKP (RMCP+):Many salts
Ratio:  1.15708 real, 1.17223 virtual   RAKP-opencl, IPMI 2.0 RAKP (RMCP+):Only one salt
Ratio:  1.10558 real, 1.11278 virtual   encfs-opencl, EncFS:Raw
Ratio:  1.12377 real, 1.11362 virtual   krb5pa-sha1-opencl, Kerberos 5 AS-REQ Pre-Auth etype 17/18:Raw
Ratio:  2.30797 real, 2.27428 virtual   ntlmv2-opencl, NTLMv2 C/R:Many salts
Ratio:  1.55751 real, 1.43351 virtual   ntlmv2-opencl, NTLMv2 C/R:Only one salt
Ratio:  1.76623 real, 1.75933 virtual   office2007-opencl, MS Office 2007 (50,000 iterations):Raw
Ratio:  1.77146 real, 1.74888 virtual   office2010-opencl, MS Office 2010 (100,000 iterations):Raw
Ratio:  0.06893 real, 0.06890 virtual   office2013-opencl, MS Office 2013 (100,000 iterations):Raw
Ratio:  1.11096 real, 1.10500 virtual   sha1crypt-opencl, (NetBSD):Raw
Ratio:  1.11199 real, 1.11693 virtual   wpapsk-opencl, WPA/WPA2 PSK:Raw
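To scan a table like this one for formats where vectorizing is a loss, a small parser is enough. This is a minimal sketch in Python, assuming the "Ratio: <real> real, <virtual> virtual <format>" line layout shown above; it reads only the real column and uses a few lines copied from the table as sample input. The names `parse_ratios` and `regressions` are hypothetical helpers, not part of John the Ripper.

```python
import re

# A few lines copied from the table above (natural vector size vs scalar).
RESULTS = """\
Ratio:  1.15406 real, 1.17261 virtual   PBKDF2-HMAC-SHA1-opencl:Raw
Ratio:  2.30797 real, 2.27428 virtual   ntlmv2-opencl, NTLMv2 C/R:Many salts
Ratio:  0.06893 real, 0.06890 virtual   office2013-opencl, MS Office 2013 (100,000 iterations):Raw
Ratio:  1.11199 real, 1.11693 virtual   wpapsk-opencl, WPA/WPA2 PSK:Raw
"""

# "Ratio: <real> real, <virtual> virtual <format description>";
# \s+ accepts both the space-separated and the tab-separated layout.
LINE_RE = re.compile(r"Ratio:\s+([\d.]+) real,\s+([\d.]+) virtual\s+(.+)")


def parse_ratios(text):
    """Return a list of (format, real_ratio) tuples, ignoring other lines."""
    out = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            out.append((m.group(3).strip(), float(m.group(1))))
    return out


def regressions(ratios):
    """Formats whose vectorized kernel was slower than scalar (ratio < 1)."""
    return [fmt for fmt, r in ratios if r < 1.0]


if __name__ == "__main__":
    ratios = parse_ratios(RESULTS)
    for fmt, r in ratios:
        print(f"{r:8.3f}x  {fmt}")
    print("Slower than scalar:", regressions(ratios))
```

On this sample only the Office 2013 line is reported as a regression, matching the 0.069 ratio in the table.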

After tuning some formats to use 2x vectors, and turning vectorizing off completely for Office 2013:

Ratio:	1.42069 real, 1.43795 virtual	PBKDF2-HMAC-SHA1-opencl:Raw
Ratio:	1.37714 real, 1.38273 virtual	RAKP-opencl, IPMI 2.0 RAKP (RMCP+):Many salts
Ratio:	1.17866 real, 1.14452 virtual	RAKP-opencl, IPMI 2.0 RAKP (RMCP+):Only one salt
Ratio:	1.32869 real, 1.32331 virtual	encfs-opencl, EncFS:Raw
Ratio:	1.43004 real, 1.39547 virtual	krb5pa-sha1-opencl, Kerberos 5 AS-REQ Pre-Auth etype 17/18:Raw
Ratio:	2.30797 real, 2.21851 virtual	ntlmv2-opencl, NTLMv2 C/R:Many salts
Ratio:	1.54382 real, 1.42823 virtual	ntlmv2-opencl, NTLMv2 C/R:Only one salt
Ratio:	1.76929 real, 1.74862 virtual	office2007-opencl, MS Office 2007 (50,000 iterations):Raw
Ratio:	1.77418 real, 1.76354 virtual	office2010-opencl, MS Office 2010 (100,000 iterations):Raw
Ratio:	1.00000 real, 0.97561 virtual	office2013-opencl, MS Office 2013 (100,000 iterations):Raw
Ratio:	1.33333 real, 1.34906 virtual	sha1crypt-opencl, (NetBSD):Raw
Ratio:	1.42564 real, 1.41961 virtual	wpapsk-opencl, WPA/WPA2 PSK:Raw
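Both tables are ratios against a scalar baseline, so, assuming the scalar runs behind the two tables are equivalent, dividing a tuned ratio by the natural-vector ratio shows how much the tuning itself gained. A minimal sketch over a few rows of the real column; the dictionary keys are shortened format names chosen for illustration.

```python
# Real-column ratios (vectorized / scalar) copied from the two tables above.
NATURAL = {
    "PBKDF2-HMAC-SHA1-opencl": 1.15406,
    "encfs-opencl": 1.10558,
    "ntlmv2-opencl:Only one salt": 1.55751,
    "office2013-opencl": 0.06893,
    "wpapsk-opencl": 1.11199,
}
TUNED = {
    "PBKDF2-HMAC-SHA1-opencl": 1.42069,
    "encfs-opencl": 1.32869,
    "ntlmv2-opencl:Only one salt": 1.54382,
    "office2013-opencl": 1.00000,
    "wpapsk-opencl": 1.42564,
}


def tuning_gain(natural, tuned):
    """Speedup from tuning alone: tuned ratio / natural-vector ratio.

    Valid only if both ratios share the same scalar baseline (assumed here).
    """
    return {fmt: tuned[fmt] / natural[fmt] for fmt in natural if fmt in tuned}


if __name__ == "__main__":
    gains = tuning_gain(NATURAL, TUNED)
    for fmt, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
        print(f"{gain:7.3f}x  {fmt}")
```

Office 2013 dominates (about 14.5x, simply from no longer running the slow vectorized kernel), while NTLMv2 with one salt comes out marginally below 1, which is within run-to-run noise for a format that was left unchanged.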

So even a register-starved device can get over 2x the speed from vectorizing (NTLMv2 with many salts reaches 2.31x). Only Office 2013, which is built on 64-bit SHA-512 operations, needed to run scalar for best performance. It too can run about 21% faster than scalar using 2x vectors, but the kernel duration then got too long in wall-clock time.
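The Office 2013 decision amounts to picking the fastest vector width whose kernel still finishes within a time limit. The sketch below uses hypothetical kernel durations and a hypothetical 200 ms limit; only the roughly 21% speedup of 2x over scalar comes from the text. `Candidate` and `pick_width` are illustrative names, not John the Ripper's actual auto-tuning code.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    width: int        # vector width: 1 = scalar, 2 = 2x, ...
    speed: float      # throughput relative to scalar
    kernel_ms: float  # wall-clock duration of one kernel call (hypothetical)


def pick_width(candidates, max_kernel_ms):
    """Return the fastest candidate whose kernel duration is within the limit."""
    ok = [c for c in candidates if c.kernel_ms <= max_kernel_ms]
    if not ok:
        raise ValueError("no candidate fits the kernel duration limit")
    return max(ok, key=lambda c: c.speed)


# Hypothetical Office 2013 figures: 2x is ~21% faster than scalar (from the
# text above), but both durations are made up for illustration.
OFFICE2013 = [
    Candidate(width=1, speed=1.00, kernel_ms=150.0),
    Candidate(width=2, speed=1.21, kernel_ms=250.0),
]

if __name__ == "__main__":
    best = pick_width(OFFICE2013, max_kernel_ms=200.0)
    print("chosen width:", best.width)
```

With the 200 ms limit this picks scalar; raising the limit to 300 ms in this toy setup would select width 2, which is the throughput-versus-kernel-duration trade-off described above.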