LM problem w/ -fork #758

frank-dittrich · 2014-09-25T15:35:38Z

./jtrts.pl -noprelims -passthru '--fork=3'
...
form=pwdump_lm                    guesses: 1122 -show=2757 0:00:00:00 DONE : Expected count(s) (986)(-show2760)  [!!!FAILED!!!]
.pot CHK:pwdump_lm                guesses:  983 -show=2757 0:00:00:00 DONE : Expected count(s) (986)  [!!!FAILED!!!]
...
Some tests had Errors. Performed 280 tests.1 errors  1 errors reprocessing the .POT files
Time used was 465 seconds

This is with latest jtrts commit (19ca304106ebebb5d0d9b717adc2cb4626cc9808) and latest bleeding-jumbo commit (4454406)

$ ./john --list=build-info 
Version: 1.8.0.2-bleeding-jumbo
Build: linux-gnu 64-bit AVX-autoconf
Arch: 64-bit LE
$JOHN is ./
Format interface version: 12
Max. number of reported tunable costs: 3
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
Max. Markov mode level: 400
Max. Markov mode password length: 30
Compiler version: 4.2.1 Compatible Clang 3.4 (tags/RELEASE_34/final)
gcc version: 4.2.1
clang version: 3.4 (tags/RELEASE_34/final)
OpenSSL library version: 01000105f
OpenSSL 1.0.1e-fips 11 Feb 2013
GMP library version: 5.1.2
NSS library version: 3.17 Basic ECC (loaded: 3.17 Extended ECC)
NSPR library version: 4.10.7
Kerberos version 5 support enabled
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's

The text was updated successfully, but these errors were encountered:

frank-dittrich · 2014-09-25T15:45:57Z

I get

form=pwdump_lm                    guesses: 1125 0:00:00:00 DONE  [PASSED]
.pot CHK:pwdump_lm                guesses:  981 -show=2734 0:00:00:00 DONE : Expected count(s) (986)  [!!!FAILED!!!]

with a GCC build (OMP enabled) when I pass --fork=3.

 $ ./john --list=build-info 
Version: 1.8.0.2-bleeding-jumbo OMP
Build: linux-gnu 64-bit AVX-autoconf
Arch: 64-bit LE
$JOHN is ./
Format interface version: 12
Max. number of reported tunable costs: 3
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
Max. Markov mode level: 400
Max. Markov mode password length: 30
Compiler version: 4.8.3 20140911 (Red Hat 4.8.3-7)
gcc version: 4.8.3
OpenSSL library version: 01000105f
OpenSSL 1.0.1e-fips 11 Feb 2013
GMP library version: 5.1.2
NSS library version: 3.17 Basic ECC (loaded: 3.17 Extended ECC)
NSPR library version: 4.10.7
Kerberos version 5 support enabled
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's

jfoug · 2014-09-25T16:35:06Z

I get lm fails on GCC build, OMP -fork=3

JtR/bleed/test$ ./jtrts.pl -noprelims -passthru '--fork=3' -q
.......
form=lm guesses: 1493 -show=2992 0:00:00:00 DONE : Expected count(s) (1500)(-show3000) [!!!FAILED!!!]
.pot CHK:lm guesses: 1487 -show=2987 0:00:00:00 DONE : Expected count(s) (1500) [!!!FAILED!!!]

NOTE, if I just run against lm, I get LM failures, then LM working, then lm_pwdump working sometimes and failing others.

The runs are inconsistant. I do not think this is a problem with TS, it is something between LM and fork mode.

magnumripper · 2014-09-25T16:51:26Z

I can't reproduce (on x86-64 OSX). Note that the exact number for "guesses" is supposed to vary "randomly" depending on how many dupe cracks you happened to get, but the -show figure should not vary at all.

In general, issues like this "can't" really be due to -fork. Basically a forked process is no different than a -node process - there is virtually no fork specific code anywhere in john after the cracking has started. But -fork may reveal a bug that happened to hide without it.

jfoug · 2014-09-25T17:21:18Z

Run this over and over again, and see if you get failures at times:

./jtrts.pl -noprelims -passthru '--fork=3' lm

On my ubuntu-64 (OMP build), sometimes it works properly, sometimes not. Yes, I understand that guesses will vary, but show should not, but I am seeing that at times jtr is NOT cracking all the hashes. I have not looked any deeper than showing that I see the exact same thing Frank is seeing.

magnumripper · 2014-09-25T18:30:22Z

Nope, but I made a for loop with increasing fork count and found I can trigger it with sufficiently high number of forks.

$ ./jtrts.pl -noprelims -passthru '--fork=9' lm --stoponerror
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.16, Sept 24, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo_omp [darwin13.4.0 64-bit AVX-autoconf]
--------------------------------------------------------------------------------

John Jumbo build detected.

form=lm                           guesses: 1503 -show=2997 0:00:00:00 DONE : Expected count(s) (1500)(-show3000)  [!!!FAILED!!!]
Exiting on error.  The pot file ./tst.pot contains the found data
The command used to run this test was:
../run/john -ses=./tst --fork=9  -pot=./tst.pot LM_tst.in --wordlist=pw.dic -form=lm

$ ../run/john -pot=./tst.pot LM_tst.in -form:lm -show=left
u540-LM:2:$LM$8756e789c3e08314
u441-LM:2:$LM$34acd11f14f1a3ec
u108-LM:2:$LM$11028e91b87b43a5

1497 password hashes cracked, 3 left

Indeed, three hashes are not cracked. So this is some problem in JtR.

I can't seem to trigger the problem when using -node=1/9 instead of -fork=9... and I did try 2/9 and so on too.

magnumripper · 2014-09-25T18:34:35Z

I have verified that this problem exist in versions prior to our recently added self-tests. So they are not causing it.

magnumripper · 2014-09-25T18:35:51Z

What scares me the most is that this is not likely a problem with LM at all. It might be wordlist mode vs. node/fork, or perhaps pot sync.

frank-dittrich · 2014-09-25T19:16:24Z

Yes, I also get a varying number of guesses when I repeatedly run

for i in `seq 2 40`; do echo $i; ./jtrts.pl -noprelims -q -passthru "--fork=$i" lm --stoponerror; done

1st run:

2
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.15, Sept 24, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo_omp [linux-gnu 64-bit AVX-autoconf]
--------------------------------------------------------------------------------
form=pwdump_lm                    guesses:  982 -show=2756 0:00:00:00 DONE : Expected count(s) (986)(-show2760)  [!!!FAILED!!!]
...
40
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.15, Sept 24, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo_omp [linux-gnu 64-bit AVX-autoconf]
--------------------------------------------------------------------------------
form=lm                           guesses: 1481 -show=2980 0:00:00:00 DONE : Expected count(s) (1500)(-show3000)  [!!!FAILED!!!]

2nd run:

2
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.15, Sept 24, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo_omp [linux-gnu 64-bit AVX-autoconf]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 2 tests.  Time used was 1 seconds
...
40
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.15, Sept 24, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo_omp [linux-gnu 64-bit AVX-autoconf]
--------------------------------------------------------------------------------
form=lm                           guesses: 1511 -show=2995 0:00:00:00 DONE : Expected count(s) (1500)(-show3000)  [!!!FAILED!!!]

magnumripper · 2014-09-25T19:38:54Z

This is too hard to debug using gdb or the like. We should probably use git-bisect.

magnumripper · 2014-09-25T19:49:04Z

Core John does not seem to have the issue (but TS has an issue openwall/john-tests#17 when trying)

magnumripper · 2014-09-25T20:35:33Z

git-bisect is a piece of work. Here's the offending commit: 1f827c7

Ouch! This doesn't make it easy.

magnumripper · 2014-09-25T20:51:05Z

OK so it happens during -fork with a shared memory mapped buffer but per-node indexes (of each node's words in that shared buffer). But how can it be intermittent!?

frank-dittrich · 2014-09-25T21:01:16Z

And what causes only LM tests to fail? For --fork=2 ... --fork=14, I only see lm and pwdump_lm tests fail.

magnumripper · 2014-09-25T21:05:23Z

Not sure if that's worth contemplating yet.

I tried disabling mmap() and the bug is still there. So it's not about the shared memory map but the per-node index buffer. That's a relief.

magnumripper · 2014-09-25T21:22:31Z

I was wrong - we don't use any index buffer unless rules are involved. So what we have is a shared memory buffer, and mgetl() instead of fgetl(). But I tried disabling mmap() and bug still shows, which means we are just using fgetl() and no magic. We practically do not use anything that was touched in that commit. So what the heck is wrong!? And how come it was introduced with that patch?

magnumripper · 2014-09-25T21:33:56Z

OK, it's not that commit! If I check out the commit before it, and add -mem=1 to disable memory buffering, the bug shows up!

./jtrts.pl -noprelims -passthru '--fork=9 -mem=1' lm
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.16, Sept 24, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo_omp [macosx-x86-64]
--------------------------------------------------------------------------------

John Jumbo build detected.

form=lm                           guesses: 1506 -show=2334 0:00:00:00 DONE : Expected count(s) (1500)(-show3000)  [!!!FAILED!!!]
.pot CHK:lm                       guesses:  834 -show=2334 0:00:00:00 DONE : Expected count(s) (1500)  [!!!FAILED!!!]

form=pwdump_lm                    guesses: 1276 -show=1571 0:00:00:00 DONE : Expected count(s) (986)(-show2760)  [!!!FAILED!!!]
.pot CHK:pwdump_lm                guesses:  318 -show=1571 0:00:00:00 DONE : Expected count(s) (986)  [!!!FAILED!!!]

Some tests had Errors. Performed 2 tests.2 errors  2 errors reprocessing the .POT files
Time used was 2 seconds

Back to bisecting. But this was a relief!!

magnumripper · 2014-09-25T21:51:14Z

0b870fa is bad. The problem is I do not have any good commit because 0b870fa is the very merge from core that implemented -fork. So we can't bisect this. I need to re-test core.

magnumripper · 2014-09-25T22:10:22Z

I really can't trigger the bug in core.

magnumripper · 2014-09-25T22:18:47Z

OK I'm puzzled. I now tested bleeding-jumbo but with a wordlist.c from core John, just modified a tad for autoconf so it builds.

$ ./jtrts.pl -noprelims -passthru '--fork=9' lm 
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.16, Sept 24, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo_omp [darwin13.4.0 64-bit AVX-autoconf]
--------------------------------------------------------------------------------

John Jumbo build detected.

form=lm                           guesses: 1504 -show=2998 0:00:00:00 DONE : Expected count(s) (1500)(-show3000)  [!!!FAILED!!!]
.pot CHK:lm                       guesses: 1498 -show=2998 0:00:00:00 DONE : Expected count(s) (1500)  [!!!FAILED!!!]

form=pwdump_lm                    guesses: 1372 0:00:00:00 DONE  [PASSED]
.pot CHK:pwdump_lm                guesses:  984 -show=2758 0:00:00:00 DONE : Expected count(s) (986)  [!!!FAILED!!!]

Some tests had Errors. Performed 2 tests.1 errors  2 errors reprocessing the .POT files
Time used was 2 seconds

Whatever the bug is, it's not in wordlist.c

magnumripper · 2014-09-25T23:03:28Z

bd10920 is the latest merge from core. That's the version to focus on - what differences do we have that can have this result?

LM_fmt.c has no differences against core
wordlist.c has differences, but using a wordlist.c from core still show the bug

magnumripper · 2014-09-25T23:16:32Z

wordlist.c has differences, but using a wordlist.c from core still show the bug

Not true! If I use wordlist.c from core, in a bleeding-jumbo of bd10920, bug goes away. To use it I had to insert this line:

diff --git a/src/wordlist.c b/src/wordlist.c
index 064b33d..9a4b855 100644
--- a/src/wordlist.c
+++ b/src/wordlist.c
@@ -19,6 +19,7 @@
 #include "params.h"
 #include "common.h"
 #include "path.h"
+#include "os.h"
 #include "signals.h"
 #include "loader.h"
 #include "logger.h"

Weirdest thing, the same operation does not make the bug go away in HEAD of bleeding-jumbo.

magnumripper · 2014-09-25T23:54:35Z

I officially give up here, for now.

magnumripper · 2014-09-26T00:05:39Z

~~Heureka, I found a way to trigger it without using fork.~~

No, that just showed I'm too tired. I will give up now.

frank-dittrich · 2014-09-26T11:46:22Z

IMHO its not an issue with locking the pot file.
I ran

$ ./john --format=lm ../test/LM_tst.in --wordlist=../test/pw.dic --fork=11 > john.stdout ; sort -u john.pot |wc -l
Warning: OpenMP was disabled due to --fork; a non-OpenMP build may be faster
Node numbers 1-11 of 11 (fork)
2 138g 0:00:00:00 DONE (2014-09-26 13:40) 6900g/s 26650p/s 26650c/s 39975KC/s SKIPPIN..R1234B5
Press 'q' or Ctrl-C to abort, almost any other key for status
10 136g 0:00:00:00 DONE (2014-09-26 13:40) 4533g/s 17766p/s 17766c/s 24198KC/s B123456..Z123456
7 135g 0:00:00:00 DONE (2014-09-26 13:40) 4500g/s 17766p/s 17766c/s 24198KC/s Z123456..W123456
8 134g 0:00:00:00 DONE (2014-09-26 13:40) 4466g/s 17766p/s 17766c/s 24198KC/s LATE*NI..X123456
5 136g 0:00:00:00 DONE (2014-09-26 13:40) 4533g/s 17766p/s 17766c/s 24198KC/s 1234567..U123456
9 136g 0:00:00:00 DONE (2014-09-26 13:40) 4533g/s 17766p/s 17766c/s 24198KC/s ..Y123456
6 136g 0:00:00:00 DONE (2014-09-26 13:40) 4533g/s 17766p/s 17766c/s 24198KC/s BERT*ER..V123456
3 136g 0:00:00:00 DONE (2014-09-26 13:40) 4533g/s 17766p/s 17766c/s 24198KC/s BERT$ER..S12345C
4 135g 0:00:00:00 DONE (2014-09-26 13:40) 4500g/s 17766p/s 17766c/s 24198KC/s BERT$ER..T123456
11 137g 0:00:00:00 DONE (2014-09-26 13:40) 4566g/s 17766p/s 17766c/s 24198KC/s  ..0A23456
1 127g 0:00:00:00 DONE (2014-09-26 13:40) 3175g/s 13325p/s 13325c/s 2052KC/s SKIPPIN..Q123A45
Waiting for 10 children to terminate
Warning: passwords printed above might be partial and not be all those cracked
Use the "--show" option to display all of the cracked passwords reliably
Session completed
1486

and

$ ./john --format=lm ../test/LM_tst.in --wordlist=../test/pw.dic > john.stdout2 ; sort -u john.pot |wc -l
Warning: poor OpenMP scalability for this hash type, consider --fork=8
Will run 8 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
14g 0:00:00:00 DONE (2014-09-26 13:41) 350.0g/s 146575p/s 146575c/s 2052KC/s SKIPPIN..0A23456
Warning: passwords printed above might be partial and not be all those cracked
Use the "--show" option to display all of the cracked passwords reliably
Session completed
1500

Let's see what has not been cracked in the --fork=11 run:

$ cat john.stdout2 
Loaded 1500 password hashes with no different salts (LM [DES 128/128 AVX-16])
Remaining 14 password hashes with no different salts
AB1266           (u1266-LM:2)
AB1244           (u1244-LM:2)
AB1086           (u1086-LM:2)
AB936            (u936-LM:2)
AB760            (u760-LM:2)
AB755            (u755-LM:2)
AB723            (u723-LM:2)
AB716            (u716-LM:2)
AB540            (u540-LM:2)
AB441            (u441-LM:2)
AB268            (u268-LM:2)
AB265            (u265-LM:2)
AB232            (u232-LM:2)
AB78             (u78-LM:2)

$ grep AB1266 john.stdout*
john.stdout2:AB1266           (u1266-LM:2)

$ grep u1244-LM:2 john.stdout*
john.stdout2:AB1244           (u1244-LM:2)

frank-dittrich · 2014-09-26T13:56:59Z

I can reproduce the bug with Markov mode.

$ cut -d: -f 2- john.pot > lm.txt
$ ./calc_stat lm.txt lm-stats
zero -10*log proba1[65] (1499) / 1500 converted to 1
zero -10*log proba2[65*256+66] (1499) / 1499, converted to 1 to prevent infinite length candidates
zero -10*log proba2[75*256+73] (1) / 1, converted to 1 to prevent infinite length candidates
zero -10*log proba2[83*256+75] (1) / 1, converted to 1 to prevent infinite length candidates
$ ./mkvcalcproba lm-stats lm.txt |cut -f 3|sort -n|uniq -c
reading stats ... [0x1ac5010]
119 lines parsed [0x1ac5010]
scanning password file ...
freeing stuff ...
charsetsize = 17
      1 10
     13 28
      5 41
      5 46
     27 50
     93 51
      5 59
      9 63
     46 64
     24 68
     76 69
     27 72
    402 73
    650 74
      1 75
      5 77
     24 81
     86 82
      1 99

Then I created a Markov mode that should crack 1498 out of those 1500 passwords:

 [Markov:lm]
Statsfile = $JOHN/lm-stats
MkvLvl = 82
MkvMaxLen = 7
MkvMinLvl = 28
; MkvMinLen = 0

I verified that this Markov mode works as expected:

$ ./john --format=lm ../test/LM_tst.in --markov=lm > /dev/null ; sort -u john.pot |wc -l;     
Warning: poor OpenMP scalability for this hash type, consider --fork=8
Will run 8 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
MKV start (stats=$JOHN/lm-stats, lvl=28-82 len=7 pwd=1538)
1498g 0:00:00:00 DONE (2014-09-26 15:51) 49933g/s 51133p/s 51133c/s 76700KC/s AB1000..S
Warning: passwords printed above might be partial and not be all those cracked
Use the "--show" option to display all of the cracked passwords reliably
Session completed
1498

Then I tried --fork:

$ for i in `seq 2 40`; do echo $i; rm john.pot ; ./john --format=lm ../test/LM_tst.in --markov=lm --fork=$i > /dev/null ; sort -u john.pot |wc -l; done
...
4
Will run 2 OpenMP threads per process (8 total across 4 processes)
Node numbers 1-4 of 4 (fork)
2 367g 0:00:00:00 DONE (2014-09-26 15:53) 12233g/s 13100p/s 13100c/s 19650KC/s AB1341..AB322
Press 'q' or Ctrl-C to abort, almost any other key for status
MKV start (stats=$JOHN/lm-stats, lvl=28-82 len=7 pwd=1538 split over nodes)
3 383g 0:00:00:00 DONE (2014-09-26 15:53) 12766g/s 13133p/s 13133c/s 14906KC/s AB314..AB669
1 375g 0:00:00:00 DONE (2014-09-26 15:53) 9375g/s 9850p/s 9850c/s 3851KC/s AB1000..AB1350
Waiting for 3 children to terminate
4 373g 0:00:00:00 DONE (2014-09-26 15:53) 9325g/s 9600p/s 9600c/s 10896KC/s AB660..S
Warning: passwords printed above might be partial and not be all those cracked
Use the "--show" option to display all of the cracked passwords reliably
Session completed
1486
5
Warning: OpenMP was disabled due to --fork; a non-OpenMP build may be faster
Node numbers 1-5 of 5 (fork)
Press 'q' or Ctrl-C to abort, almost any other key for status
MKV start (stats=$JOHN/lm-stats, lvl=28-82 len=7 pwd=1538 split over nodes)
4 317g 0:00:00:00 DONE (2014-09-26 15:53) 15850g/s 15850p/s 15850c/s 23775KC/s AB452..AB737
2 296g 0:00:00:00 DONE (2014-09-26 15:53) 14800g/s 15850p/s 15850c/s 23775KC/s AB1272..AB189
3 310g 0:00:00:00 DONE (2014-09-26 15:53) 15500g/s 15800p/s 15800c/s 23700KC/s AB1814..AB460
5 303g 0:00:00:00 DONE (2014-09-26 15:53) 15150g/s 15400p/s 15400c/s 23100KC/s AB729..S
1 312g 0:00:00:00 DONE (2014-09-26 15:53) 15600g/s 15850p/s 15850c/s 23775KC/s AB1000..AB1281
Waiting for 4 children to terminate
Warning: passwords printed above might be partial and not be all those cracked
Use the "--show" option to display all of the cracked passwords reliably
Session completed
1498
6
Warning: OpenMP was disabled due to --fork; a non-OpenMP build may be faster
Node numbers 1-6 of 6 (fork)
Press 'q' or Ctrl-C to abort, almost any other key for status
MKV start (stats=$JOHN/lm-stats, lvl=28-82 len=7 pwd=1538 split over nodes)
2 266g 0:00:00:00 DONE (2014-09-26 15:53) 13300g/s 13300p/s 13300c/s 19950KC/s AB1226..AB1465
3 239g 0:00:00:00 DONE (2014-09-26 15:53) 11950g/s 13250p/s 13250c/s 19875KC/s AB1457..AB322
4 266g 0:00:00:00 DONE (2014-09-26 15:53) 13300g/s 13300p/s 13300c/s 19950KC/s AB314..AB553
1 248g 0:00:00:00 DONE (2014-09-26 15:53) 12400g/s 13300p/s 13300c/s 3511KC/s AB1000..AB1235
Waiting for 5 children to terminate
6 251g 0:00:00:00 DONE (2014-09-26 15:53) 25100g/s 25600p/s 25600c/s 38400KC/s AB776..S
5 266g 0:00:00:00 DONE (2014-09-26 15:53) 13300g/s 13300p/s 13300c/s 19950KC/s AB545..AB784
Warning: passwords printed above might be partial and not be all those cracked
Use the "--show" option to display all of the cracked passwords reliably
Session completed
1496
...

As with wordlist mode, the number of cracked passwords varies when I run the same command repeatedly.

jfoug · 2014-09-26T14:07:46Z

Should we focus on the obvious? LM built with OMP has some issues (somehow) with -fork

frank-dittrich · 2014-09-26T14:15:32Z

With --single=none and user name = password,

AB251:$LM$0004c62411e4e2a4
...

I am not able to reproduce the error.
I guess that's due to the fact that in this case it's not the list of password candidates that gets split across "nodes".

When I change 1499 user names into AB and use this as single mode

[List.Rules:lm]
:
$[0-9]
Az"[1-90][0-9]"
Az"[1-90][0-9][0-9]"
Az"1[0-4][0-9][0-9]"

I can reproduce the error using single mode:

$ for i in `seq 2 40`; do echo -n $i; echo -n " "; rm john.pot ; ./john --format=lm lm.hashes --single:lm --fork=$i > /dev/null 2> /dev/null ; sort -u john.pot |wc -l; done
2 1496
3 1494
4 1488
5 1484
6 1491
7 1495
8 1493
9 1491
10 1486
11 1474
12 1492
13 1479
14 1488
15 1478
16 1492
17 1490
18 1495
19 1491
20 1494
21 1496
22 1490
23 1496
24 1495
25 1489
26 1498
27 1491
28 1493
29 1492
30 1497
31 1480
32 1493
33 1477
34 1494
35 1489
36 1495
37 1485
38 1479
39 1479
40 1497

frank-dittrich · 2014-09-26T14:22:26Z

On 09/26/2014 04:07 PM, JimF wrote:

Should we focus on the obvious? LM built with OMP has some issues
(somehow) with -fork?

Since I reproduced this with clang (which doesn't support OMP), it is
not related to OMP.

And so far we couldn't reproduce it in core.
So, LM triggers it either because it is extremely fast (which helps to
trigger race conditions) or may be because hashes are split into 2 halves.
If it is because hashes are split, we should be able do reproduce this
with bigcrypt (descrypt) hashes. But I doubt that is the issue.

frank-dittrich · 2014-10-11T08:35:04Z

Then may be you did something wrong.
Even with ReloadAtCrack = Y, your latest commit 3b4c29f fixes that problem for me.
But when I checkout 479a6ea, the bug still exists.

frank-dittrich · 2014-10-11T09:55:24Z

This bug seems to be related to #798.
With latest bleeding-jumbo, I (rarely) get the problem with the number of forked processes close to the number when bug #798 occurs:

 $ (for i in `seq 650 999`; do echo -n "$i "; rm john.pot john.log; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} > /dev/null 2> /dev/null; LC_ALL=C sort -u john.pot | wc -l; done); echo $?
650 1500
651 1500
652 1500
653 1500
654 1500
655 1500
656 1500
657 1500
658 1500
659 1500
660 1500
661 1500
662 1500
663 1499
664 1500
665 1500
666 1500
667 1500
668 1500
669 User defined signal 2
140

In this case, --fork=663 cracked just 1499 hashes.

frank-dittrich · 2014-10-11T10:07:16Z

At least, john now behaves in a consistent way:

$ (for i in `seq 1 99`; do rm john.pot john.log; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=663 > /dev/null 2> /dev/null; echo -n " `LC_ALL=C sort -u john.pot | wc -l`"; done); echo $?
 1499 1499 1499 1499 1499 1499 1499 1499 1499^C^C

Which reminds me of http://www.despair.com/consistency.html

frank-dittrich · 2014-10-12T17:28:05Z

With this john.local.conf

[Local:Options]
ReloadAtCrack = N
ReloadAtSave = N

I manage to get less than 1500 unique john.pot lines only if I hit the ulimit of (in my case currently 1024) processes per user.
That looks like this:

[fd@f20x32 run]$ (for i in `seq 1020 1024`; do echo -n -e "$i\t"; echo $i > i.txt; /bin/rm john.pot; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} --session=t-$i > stdout-$i.txt 2> stderr-$i.txt; LC_ALL=C sort -u john.pot | wc -l; done |LC_ALL=C grep -v 1500 ); echo $?
/bin/rm: cannot remove ‘john.pot’: No such file or directory
bash: fork: retry: Resource temporarily unavailable
1020    174
bash: fork: retry: Resource temporarily unavailable
1021    32
1022    53
1023    15
1024    16
0
[fd@f20x32 run]$ grep error t-102*.log
t-1020.log:1 0:00:00:00 Terminating on error, john.c:496
t-1020.log:1 0:00:00:00 Terminating on error, john.c:496
t-1021.log:1 0:00:00:00 Terminating on error, john.c:496
t-1021.log:1 0:00:00:00 Terminating on error, john.c:496
t-1022.log:1 0:00:00:00 Terminating on error, john.c:496
t-1022.log:1 0:00:00:00 Terminating on error, john.c:496
t-1023.log:1 0:00:00:00 Terminating on error, john.c:496
t-1023.log:1 0:00:00:00 Terminating on error, john.c:496
t-1024.log:1 0:00:00:00 Terminating on error, john.c:496
t-1024.log:1 0:00:00:00 Terminating on error, john.c:496

So I guess I'll kep that john.local.conf ans issue a

killall -s USR2 john

Whenever I want a pot file sync.

magnumripper · 2014-10-12T22:30:21Z

I think we should fix (or at least consider) openwall/john-tests#20 before going on with this.

magnumripper · 2014-10-12T22:32:44Z

Anyway, I can not reproduce this bug on OSX anymore, even with default john.conf. Except when I go past ulimits.

magnumripper · 2014-10-12T22:39:28Z

However, I can reproduce it using 9d9c271 (and -mem=1) which was before there was any pot reload or USR2 signalling at all.

magnumripper · 2014-11-07T23:51:13Z

Can you reproduce this with current code?

frank-dittrich · 2014-11-08T10:40:24Z

Before Monday I'll not have access to the system where I want to do the re-test.

frank-dittrich · 2014-11-09T21:56:47Z

$ (for i in `seq 2 1024`; do echo -n -e "$i\t"; echo $i > i.txt; /bin/rm john.pot; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} --session=t-$i > stdout-$i.txt 2> stderr-$i.txt ; echo -n -e "$?\t"; LC_ALL=C sort -u john.pot | wc -l; done | LC_ALL=C grep -v 1500 ); echo $?
323 0   1499
325 0   1499
331 0   1499
342 0   1499
346 0   1499
350 0   1499
352 0   1499
356 0   1499
358 0   1499
373 0   1499
374 0   1499
376 0   1499
377 0   1499
379 
0
$ sort -u john.pot | wc -l
1499
$ grep -i error t-3??.log
$ grep -c Cracked t-377.log 
1518
$ grep Cracked t-377.log |cut -d+ -f 2- |sort -u|wc -l 
1499
$ wc -l stdout-377.txt 
1519 stdout-377.txt
$ head stdout-377.txt 
Loaded 1500 password hashes with no different salts (LM [DES 128/128 SSE2])
AB208            (u208-LM:2)
AB585            (u585-LM:2)
AB962            (u962-LM:2)
AB1339           (u1339-LM:2)
AB207            (u207-LM:2)
AB584            (u584-LM:2)
AB961            (u961-LM:2)
AB1338           (u1338-LM:2)
AB216            (u216-LM:2)
$ head -n 4 stderr-377.txt 
Warning: OpenMP was disabled due to --fork; a non-OpenMP build may be faster
Node numbers 1-377 of 377 (fork)
6 4g 0:00:00:00 DONE (2014-11-09 22:42) 12.12g/s 48.48p/s 48.48c/s 72727C/s BERT*ER..IZE__12
3 4g 0:00:00:00 DONE (2014-11-09 22:42) 6.779g/s 27.11p/s 27.11c/s 40677C/s BERT$ER..L3__12
$ grep -c " DONE " stderr-377.txt 
377
$ grep -v -n " DONE " stderr-377.txt 
1:Warning: OpenMP was disabled due to --fork; a non-OpenMP build may be faster
2:Node numbers 1-377 of 377 (fork)
30:Press 'q' or Ctrl-C to abort, almost any other key for status
195:Waiting for 376 children to terminate
382:Warning: passwords printed above might be partial and not be all those cracked
383:Use the "--show" option to display all of the cracked passwords reliably
384:Session completed

So, I need a much higher number of forked processes to reproduce this.
But the number of forked processes (> 300, but < 400 on an otherwise idle system)is still way lower than the ulimit -u (1024).
I get no indication of errors in log files, in stderr or stdout output or exit code.

magnumripper · 2014-11-09T22:38:52Z

Trying the same on OSX, it works fine up to 572, and from that point the problem can be seen in the stderr file: "fork: Resource temporarily unavailable". So I can't reproduce any problem with John. That's with an ulimit of 709 max user processes.

frank-dittrich · 2014-11-09T23:01:31Z

When I change Save = 60 to Save = 62 in john.conf, the error occurs much later:

606 0   1499
621 0   1499
626 0   1499
675 0   1499
688 0   1499
690 0   1499
692 0   1499
698 0   1499
706 0   1499
707 0   1499
708 0   1497
716 0   1499
718 0   1499
723 0   1499
726

Still no indication of an error in *.log stderr-*.txt stdout-*.txt.

Can you try schedtool -a 0x1 -e ./john ... instead of ./john?

magnumripper · 2014-11-09T23:06:57Z

I do not have anything named schedtool.

magnumripper · 2014-11-09T23:14:17Z

I can't see why the Save timer would affect this. No session runs for that long anyway, right?

magnumripper · 2014-11-09T23:43:11Z

I can't see why the Save timer would affect this

OK, it's because OS_TIMER counts backwards.

frank-dittrich · 2014-11-10T01:00:16Z

With Save = 2 I get

$ (for i in `seq 2 1024`; do echo -n -e "$i\t"; echo $i > i.txt; /bin/rm john.pot; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} --session=t-$i > stdout-$i.txt 2> stderr-$i.txt ; echo -n -e "$?\t"; LC_ALL=C sort -u john.pot | wc -l; done | LC_ALL=C grep -v 1500 ); echo $?
268 0   1499
282 0   1499
293 0   1499
304 0   1499
318 0   1499
319 0   1499
325 0   1499
332 0   1499
333 0   1499
335 0   1497
340 0   1497
...
957 0   1499
958 0   1499
959 0   1499
960 0   1498
961 0   1496
bash: fork: retry: Resource temporarily unavailable
962 1   79
963 1   23
964 1   2
965 1   97
966 1   2
bash: fork: retry: Resource temporarily unavailable
967 1   0
968 1   0
969 1   41
970 1   7
971 1   0
972 1   0
973 1   0
974 1   4
975 1   0
976 1   5
977 1   29
978 1   0
979 1   0
bash: fork: retry: Resource temporarily unavailable
980 1   0
981 1   53
bash: fork: retry: Resource temporarily unavailable
982 1   12
983 1   25
984 1   31
985 1   0
986 1   0
987 1   0
988 1   0
989 1   7
bash: fork: retry: Resource temporarily unavailable
990 1   2
991 1   0
992 1   2
993 1   32
994 1   0
995 1   41
996 1   5
997 1   0
998 1   9
999 1   25
1000    1   15
1001    1   5
1002    1   33
1003    1   0
bash: fork: retry: Resource temporarily unavailable
1004    1   3
1005    1   13
1006    1   2
1007    1   5
1008    1   0
1009    1   2
1010    1   0
1011    1   28
1012    1   44
1013    1   18
1014    1   4
bash: fork: retry: Resource temporarily unavailable
1015    1   2
1016    1   10
1017    1   7
1018    1   2
1019    1   0
1020    1   25
1021    1   9
1022    1   19
1023    1   9
1024    1   1
0

$ grep -i error t-96[12].log
t-962.log:1 0:00:00:00 Terminating on error, john.c:489

frank-dittrich · 2014-11-10T01:23:22Z

On 11/09/2014 11:38 PM, magnum wrote:

Trying the same on OSX, it works fine up to 572, and from that point
the problem can be seen in the stderr file: "fork: Resource
temporarily unavailable". So I can't reproduce any problem with John.

I bet with less than 572 forks, john finishes in less than 1 second on
you OSX system, so you don't get the signals when 59 seconds of the 60
are left.

frank-dittrich · 2014-11-10T01:37:25Z

On my 64bit Linux system with an Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz (quad core, no hyperthreading) I manage to get the john killed with SUGUSR2 at 761 forked processes (with Idle=Y in john.conf, when I run 4 other john processes (with Idle = N in john.conf) trying (not) to crack the rar test hashes: ./john --incremental --min-length=6 hashes.rar --session=rar4 ...

30670 fd        20   0  542700  48620   9224 R  92.8  0.6  18:16.65 ./john --session=rar1 --incremental --min-length=6 hashes.rar                                           
27686 fd        20   0  542704  45892   9188 R  90.8  0.6  14:33.13 ./john --incremental --min-length=6 hashes.rar --session=rar5                                           
20452 fd        20   0  542700  48624   9216 R  85.8  0.6  20:27.23 ./john --incremental --min-length=6 hashes.rar --session=rar3                                           
20484 fd        20   0  542700  48592   9188 R  85.8  0.6  20:00.28 ./john --incremental --min-length=6 hashes.rar --session=rar4                                           
10178 fd        20   0   93968  10644   6844 R   4.0  0.1   0:00.12 ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=510 --session=t-510

And some tests with a smaller number of forked processes exited with $? = 0, but less than 1500 unique hashes cracked.

$ (for i in `seq 500 1024`; do echo -n -e "$i\t"; echo $i > i.txt; /bin/rm john.pot; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} --session=t-$i > stdout-$i.txt 2> stderr-$i.txt ; echo -n -e "$?\t"; LC_ALL=C sort -u john.pot | wc -l; done | LC_ALL=C grep -v 1500 ); echo $?
631 0   1499
726 0   1499
737 0   1499
743 0   1499
755 0   1498
761 
0

frank-dittrich · 2014-11-10T17:49:15Z

As root, with ulimits -u = 7794 (compared to 1024 as a regular user), nothing changes.

# (for i in `seq 2 1024`; do echo -n -e "$i\t"; echo $i > i.txt; /bin/rm john.pot; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} --session=t-$i > stdout-$i.txt 2> stderr-$i.txt ; echo -n -e "$?\t"; LC_ALL=C sort -u john.pot | wc -l; done | LC_ALL=C grep -v 1500 ); echo $?
/bin/rm: cannot remove ‘john.pot’: No such file or directory
350 0   1499
355 0   1499
356 0   1499
358 0   1499
373 0   1499
374 0   1499
380 0   1497
389 0   1498
391 0   1499
394 
0
# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 7794
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 7794
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

With some more changes in ulimits

# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15588
max locked memory       (kbytes, -l) 256
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 1638400
real-time priority              (-r) 0
stack size              (kbytes, -s) 16384
cpu time               (seconds, -t) unlimited
max user processes              (-u) 7794
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I get

# (for i in `seq 2 1024`; do echo -n -e "$i\t"; echo $i > i.txt; /bin/rm john.pot; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} --session=t-$i > stdout-$i.txt 2> stderr-$i.txt ; echo -n -e "$?\t"; LC_ALL=C sort -u john.pot | wc -l; done | LC_ALL=C grep -v 1500 ); echo $?
/bin/rm: cannot remove ‘john.pot’: No such file or directory
202 0   1499
207 0   1499
238 0   1498
266 0   1499
268 0   1499
271 
0

So, this is most likely not related to ulimit.

magnumripper · 2014-11-10T19:46:54Z

I bet #798 (comment) would make it much better, (at least provided you don't use a Save interval that is a multiple of 3).

frank-dittrich · 2014-11-10T21:10:07Z

I bet #798 (comment) would make it much better, (at least provided you don't use a Save interval that is a multiple of 3).

I tested with the default Save interval of 60, which is a multiple of 3. Why do you think that matters here? All these test runs are much faster than 60 seconds, even on my old 32bit system.

(for i in `seq 300 1024`; do echo -n -e "$i\t"; echo $i > i.txt; /bin/rm john.pot; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} --session=t-$i > stdout-$i.txt 2> stderr-$i.txt ; echo -n -e "$?\t"; LC_ALL=C sort -u john.pot | wc -l; done |grep -v 1500 ); echo $?
rm: cannot remove ‘t-*.rec’: No such file or directory
603 0   1499
608 0   1499
663 0   1499
675 0   1499
696 0   1499
700 0   1497
711 0   1498
714 0   1498
717 
0

No indication of problems in log file, stderr or stdout output.
So, the improvement is similar to what I got when I use Save = 60 without that patch.

frank-dittrich · 2014-11-10T21:19:06Z

BTW, the --fork=717 test that got killed produced a john.pot file with 1500 unique hashes.

magnumripper · 2014-11-10T23:18:48Z

I tested with the default Save interval of 60, which is a multiple of 3. Why do you think that matters here? All these test runs are much faster than 60 seconds, even on my old 32bit system.

Because, like you found out, without that patch and with OS_TIMER, some things would happen after 0 seconds instead of after three seconds. Maybe that is unrelated.

frank-dittrich · 2014-11-10T23:34:35Z

Because, like you found out, without that patch and with OS_TIMER, some things would happen after 0 seconds instead of after three seconds. Maybe that is unrelated.

Yes, but that was without the patch.
With Save =60, after one second, you'd get the SIGUSR2 signals, because 59 & 3 == 3.

With your patch, you'd get the SIGUSR2 signals 2 seconds later, no matter if you have Save = 60 or Save = 62.
(60 - 57) & 3 == 3 and (62 - 59) & 3 == 3.

magnumripper · 2014-11-11T00:19:09Z

I had the vague idea that under this crazy over-booking of resources there could be a difference. But now that I think about it, we should already be "protected" against USR2 anyway: The real key is calling sig_init() and sig_init_child() early enough and this doesn't change that. I can't think of any way to do those earlier than we do now iirc.

magnumripper · 2014-11-11T00:23:45Z

I can't think of any way to do those earlier than we do now iirc.

On the other hand, if we indeed introduce yet another counter instead of using ((timer_save_interval - timer_save_value) & 3) == 3), we could also opt to never ever send a USR2 during the first minute, or whatever we choose.

magnumripper · 2014-11-11T00:25:21Z

Anyway we put it, I think this issue and #798 are purely academical. If we can find solutions for them, fine. If we can't, I will not lose any sleep over it.

magnumripper · 2015-03-19T21:16:29Z

I'm closing this. If some similar problem can still be triggered, please open a brand new issue. This issue is too clobbered anyway.

magnumripper changed the title ~~./jtrts fails with 64bit linux clang build and -passthru '--fork=3'~~ LM problem w/ -fork Sep 25, 2014

magnumripper added the bug label Sep 25, 2014

magnumripper mentioned this issue Oct 12, 2014

john.local.conf is a footgun openwall/john-tests#19

Closed

magnumripper closed this as completed Mar 19, 2015

LM problem w/ -fork #758

LM problem w/ -fork #758

Comments

frank-dittrich commented Sep 25, 2014

frank-dittrich commented Sep 25, 2014

jfoug commented Sep 25, 2014

magnumripper commented Sep 25, 2014

jfoug commented Sep 25, 2014

magnumripper commented Sep 25, 2014

magnumripper commented Sep 25, 2014

magnumripper commented Sep 25, 2014

frank-dittrich commented Sep 25, 2014

magnumripper commented Sep 25, 2014

magnumripper commented Sep 25, 2014

magnumripper commented Sep 25, 2014

magnumripper commented Sep 25, 2014

frank-dittrich commented Sep 25, 2014

magnumripper commented Sep 25, 2014

magnumripper commented Sep 25, 2014

magnumripper commented Sep 25, 2014

magnumripper commented Sep 25, 2014

magnumripper commented Sep 25, 2014

magnumripper commented Sep 25, 2014

magnumripper commented Sep 25, 2014

magnumripper commented Sep 25, 2014

magnumripper commented Sep 25, 2014

magnumripper commented Sep 26, 2014

frank-dittrich commented Sep 26, 2014

frank-dittrich commented Sep 26, 2014

jfoug commented Sep 26, 2014

frank-dittrich commented Sep 26, 2014

frank-dittrich commented Sep 26, 2014

frank-dittrich commented Oct 11, 2014

frank-dittrich commented Oct 11, 2014

frank-dittrich commented Oct 11, 2014

frank-dittrich commented Oct 12, 2014

magnumripper commented Oct 12, 2014

magnumripper commented Oct 12, 2014

magnumripper commented Oct 12, 2014

magnumripper commented Nov 7, 2014

frank-dittrich commented Nov 8, 2014

frank-dittrich commented Nov 9, 2014

magnumripper commented Nov 9, 2014

frank-dittrich commented Nov 9, 2014

magnumripper commented Nov 9, 2014

magnumripper commented Nov 9, 2014

magnumripper commented Nov 9, 2014

frank-dittrich commented Nov 10, 2014

frank-dittrich commented Nov 10, 2014

frank-dittrich commented Nov 10, 2014

frank-dittrich commented Nov 10, 2014

magnumripper commented Nov 10, 2014

frank-dittrich commented Nov 10, 2014

frank-dittrich commented Nov 10, 2014

magnumripper commented Nov 10, 2014

frank-dittrich commented Nov 10, 2014

magnumripper commented Nov 11, 2014

magnumripper commented Nov 11, 2014

magnumripper commented Nov 11, 2014

magnumripper commented Mar 19, 2015