
Add support for cracking SolarWinds Orion hashes #3449

Merged: 1 commit into openwall:bleeding-jumbo on Nov 4, 2018

Conversation

kholia (Member) commented Nov 3, 2018

Current speed,

$ ../run/john --format=solarwinds-opencl --test
Device 6: GeForce GTX TITAN X
Benchmarking: solarwinds-opencl, SolarWinds Orion [PBKDF2-SHA1 OpenCL]... DONE
Raw:	45936 c/s real, 46369 c/s virtual, GPU util: 99%

CPU speed,

$ ../run/john --format=solarwinds --test
Will run 32 OpenMP threads
Benchmarking: solarwinds, SolarWinds Orion [PBKDF2-SHA1 128/128 AVX 4x]... (32xOMP) DONE
Raw:	3200 c/s real, 103 c/s virtual

The GPU utilization percentage is decent.
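For reference, the scheme being benchmarked here is plain PBKDF2-HMAC-SHA1 with a large derived key. A minimal Python sketch of such a derivation, assuming 1000 iterations and a 1024-byte output (matching the -DITERATIONS/-DOUTLEN build options visible in the -v:5 log later in this thread); the password and salt below are placeholders, and the real format's salt handling and hash encoding follow the solarwinds format code in JtR, not this sketch:

```python
import base64
import hashlib

# Hypothetical inputs for illustration only.
password = b"password"
salt = b"example-salt"  # placeholder, not the actual Orion salt scheme

# PBKDF2-HMAC-SHA1, 1000 iterations, 1024-byte derived key.
dk = hashlib.pbkdf2_hmac("sha1", password, salt, 1000, dklen=1024)

print(len(dk))                    # 1024
print(base64.b64encode(dk)[:16])  # what a base64-encoded hash prefix looks like
```

The 1024-byte derived key is what makes this format unusually heavy per candidate compared to typical PBKDF2-SHA1 uses, which ask for 20 to 64 bytes.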

* be chosen for a kernel duration of not more than 200 ms
*/
#define HASH_LOOPS (3 * 271)
#define ITERATIONS 100000 /* Just for auto tune */
Member:

Drop this #define ITERATIONS line, it's an artefact from something else. Or maybe better: Set it to 1000 if that is the fixed value, and pass it to the kernel (in line 193) for possibly better optimization by the runtime.

kholia (Member Author):

This seems to be used in line 215 (autotune_run(self, 2 * (ITERATIONS - 1) + 4, 0, 200)).

Member:

OK, but then it has to be correct or auto-tune will end up off by two orders of magnitude. That comment "Just for auto tune" makes it sound unimportant, but it is important!

So set it to 1000 and, while at it, add it to build_opts as well; as noted above, that may let the runtime produce a better optimized loop kernel.

Member:

Did you watch an auto-tune with -v:5? Do so and ensure it looks correct.

Member:

BTW, where did you get 3 * 271 from? It doesn't look like a good figure for splitting 1000 iterations into several calls. Something like 333 would match the comment above it.

kholia (Member Author):

It came from a copy-paste operation ;(. I need to sit down and read this auto-tune stuff one day.

kholia (Member Author):

All these changes should be in place now. Thanks!

Member:

I think HASH_LOOPS should be just 333 (a.k.a. 3 * 3 * 37) as opposed to 3 * 333. The current code is effectively a non-split kernel: it only runs the loop kernel once (with 333 it would run three times). The current figures run fine on these high-end cards, but on a weaker device they will put a limit on reaching the optimal GWS.
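To make the split arithmetic concrete, here is a small sketch of how a split kernel covers the PBKDF2 loop in HASH_LOOPS-sized chunks. It assumes the loop kernel handles 999 of the 1000 iterations (consistent with -DHASH_LOOPS=999 in the build options below), with the remaining one done in the init kernel; kernel_calls is a hypothetical helper for illustration:

```python
import math

LOOP_ROUNDS = 999  # iterations handled by the loop kernel (1000 total, 1 in init)

def kernel_calls(hash_loops):
    # Number of loop-kernel invocations needed to cover LOOP_ROUNDS when
    # each invocation advances at most hash_loops iterations.
    return math.ceil(LOOP_ROUNDS / hash_loops)

print(kernel_calls(3 * 333))  # 1 -> effectively a non-split kernel
print(kernel_calls(333))      # 3 -> loop kernel runs three times per crypt_all()
print(kernel_calls(3 * 271))  # 2 -> 813 doesn't divide 999 evenly
```

More, shorter invocations keep each kernel under the duration budget (the "not more than 200 ms" comment above HASH_LOOPS), which is what lets weaker devices still scale GWS upward.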

kholia (Member Author):

Ack. I have pushed a new commit now.

SALT_ALIGN,
MIN_KEYS_PER_CRYPT,
MAX_KEYS_PER_CRYPT,
FMT_CASE | FMT_8_BIT | FMT_HUGE_INPUT,
Member:

Is this really FMT_HUGE_INPUT?

kholia (Member Author):

Nope! Fixed.

SALT_ALIGN,
MIN_KEYS_PER_CRYPT,
MAX_KEYS_PER_CRYPT,
FMT_CASE | FMT_8_BIT | FMT_OMP | FMT_HUGE_INPUT,
Member:

FMT_HUGE_INPUT

kholia (Member Author):

Fixed.

kholia (Member Author) commented Nov 4, 2018

On a local box,

$ ../run/john --format=solarwinds-opencl --test -v:5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: GeForce GTX 1050 Ti
Benchmarking: solarwinds-opencl, SolarWinds Orion [PBKDF2-SHA1 OpenCL]... Loaded 2 hashes with 1 different salts to test db from test vectors
Options used: -I ..../JtR/run/kernels -cl-mad-enable -DSM_MAJOR=6 -DSM_MINOR=1 -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=524306 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=390 -DDEV_VER_MINOR=87 -D_OPENCL_COMPILER -DHASH_LOOPS=999 -DOUTLEN=1024 -DPLAINTEXT_LENGTH=28 -DITERATIONS=999 $JOHN/kernels/solarwinds_kernel.cl
Calculating best GWS for LWS=32; max. 100ms single kernel invocation.
Raw speed figures including buffer transfers:
P xfer: 1.696us, init: 22.528us, loop: 1x2.723ms, final: 784.384us, res xfer: 7.008us(null)1.088us
gws:       128	  36097c/s    72266194 rounds/s   3.545ms per crypt_all()!
P xfer: 2.048us, init: 32.288us, loop: 1x2.878ms, final: 20.384us, res xfer: 6.880us(null)2.912us*
gws:       256	  86816c/s   173805632 rounds/s   2.948ms per crypt_all()!
P xfer: 3.072us, init: 31.744us, loop: 1x2.477ms, final: 20.480us, res xfer: 7.648us(null)2.880us*
gws:       512	 200954c/s   402309908 rounds/s   2.547ms per crypt_all()!
P xfer: 6.144us, init: 33.408us, loop: 1x2.172ms, final: 20.480us, res xfer: 5.120us(null)1.664us
gws:      1024	 456308c/s   913528616 rounds/s   2.244ms per crypt_all()!
P xfer: 11.104us, init: 34.432us, loop: 1x2.674ms, final: 26.528us, res xfer: 9.088us(null)2.720us
gws:      2048	 740978c/s  1483437956 rounds/s   2.763ms per crypt_all()+
P xfer: 29.472us, init: 49.952us, loop: 1x5.105ms, final: 46.080us, res xfer: 16.384us(null)4.640us
gws:      4096	 778350c/s  1558256700 rounds/s   5.262ms per crypt_all()+
P xfer: 49.280us, init: 73.728us, loop: 1x10.061ms, final: 72.704us, res xfer: 39.936us(null)8.128us
gws:      8192	 793356c/s  1588298712 rounds/s  10.325ms per crypt_all()+
P xfer: 80.512us, init: 126.784us, loop: 1x19.396ms, final: 121.856us, res xfer: 60.416us(null)15.872us
gws:     16384	 825769c/s  1653189538 rounds/s  19.840ms per crypt_all()+
P xfer: 237.088us, init: 221.056us, loop: 1x37.786ms, final: 196.608us, res xfer: 95.232us(null)30.688us
gws:     32768	 847968c/s  1697631936 rounds/s  38.642ms per crypt_all()+
P xfer: 592.448us, init: 399.872us, loop: 1x72.976ms, final: 354.976us, res xfer: 176.128us(null)60.736us
gws:     65536	 877244c/s  1756242488 rounds/s  74.706ms per crypt_all()+
P xfer: 1.656ms, init: 750.080us, loop: 1x143.868ms (exceeds 100ms)
Calculating best LWS for GWS=65536
Testing LWS=32 GWS=65536 ... 145.445ms+
Testing LWS=64 GWS=65536 ... 146.535ms
Testing LWS=128 GWS=65536 ... 144.673ms+
Testing LWS=256 GWS=65536 ... 143.154ms+
Testing LWS=512 GWS=65536 ... 139.944ms+
Calculating best GWS for LWS=512; max. 200ms single kernel invocation.
Raw speed figures including buffer transfers:
P xfer: 8.864us, init: 40.736us, loop: 1x3.151ms, final: 28.128us, res xfer: 10.240us(null)2.112us
gws:      1536	 472868c/s   946681736 rounds/s   3.248ms per crypt_all()!
P xfer: 15.872us, init: 43.008us, loop: 1x3.147ms, final: 33.792us, res xfer: 15.360us(null)3.520us
gws:      3072	 940706c/s  1883293412 rounds/s   3.265ms per crypt_all()+
P xfer: 39.488us, init: 58.880us, loop: 1x6.291ms, final: 50.816us, res xfer: 30.304us(null)6.240us
gws:      6144	 946719c/s  1895331438 rounds/s   6.489ms per crypt_all()
P xfer: 69.408us, init: 98.816us, loop: 1x12.782ms, final: 98.304us, res xfer: 145.408us(null)12us
gws:     12288	 928649c/s  1859155298 rounds/s  13.232ms per crypt_all()
P xfer: 167.232us, init: 181.888us, loop: 1x26.032ms, final: 169.984us, res xfer: 274.432us(null)23.232us
gws:     24576	 913571c/s  1828969142 rounds/s  26.901ms per crypt_all()
P xfer: 519.872us, init: 333.312us, loop: 1x51.469ms, final: 314.368us, res xfer: 542.720us(null)45.568us
gws:     49152	 921688c/s  1845219376 rounds/s  53.328ms per crypt_all()
P xfer: 1.010ms, init: 632.320us, loop: 1x101.181ms, final: 599.744us, res xfer: 999.424us(null)94.144us
gws:     98304	 938730c/s  1879337460 rounds/s 104.720ms per crypt_all()
P xfer: 2.549ms, init: 1.271ms, loop: 1x205.826ms (exceeds 200ms)

Local worksize (LWS) 512, global worksize (GWS) 3072
DONE
Raw:	18432 c/s real, 18432 c/s virtual, GPU util: 100%

This seems to be OK?
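One way to sanity-check the log above: the rounds/s column should be the c/s figure multiplied by the cost passed to autotune_run(), i.e. 2 * (ITERATIONS - 1) + 4 with ITERATIONS set to 1000 as discussed earlier. A quick check against two rows of the GWS calibration output (the figures below are copied from the log):

```python
ITERATIONS = 1000
cost = 2 * (ITERATIONS - 1) + 4  # the per-candidate cost passed to autotune_run()

# (c/s, rounds/s) pairs taken from the gws: 128 and gws: 256 rows above.
rows = [(36097, 72266194), (86816, 173805632)]
for cps, rounds in rows:
    print(cps * cost == rounds)  # True for both rows
```

Both rows match exactly, which indicates the auto-tune cost figure is now consistent with the fixed 1000-iteration count, rather than off by two orders of magnitude as it would have been with the original ITERATIONS of 100000.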

@kholia kholia merged commit 9c715c6 into openwall:bleeding-jumbo Nov 4, 2018