LM problem w/ -fork #758
I get
with a GCC build (OMP enabled) when I pass --fork=3.
I get lm failures on a GCC build (OMP enabled) with -fork=3:
JtR/bleed/test$ ./jtrts.pl -noprelims -passthru '--fork=3' -q
NOTE: if I just run against lm, I get LM failures, then LM working, then lm_pwdump working sometimes and failing others. The runs are inconsistent. I do not think this is a problem with TS; it is something between LM and fork mode.
I can't reproduce (on x86-64 OSX). Note that the exact number for "guesses" is supposed to vary "randomly" depending on how many dupe cracks you happened to get, but the -show figure should not vary at all. In general, issues like this "can't" really be due to -fork: a forked process is basically no different from a -node process, and there is virtually no fork-specific code anywhere in john after the cracking has started. But -fork may reveal a bug that happened to hide without it.
Run this over and over again, and see if you get failures at times:
./jtrts.pl -noprelims -passthru '--fork=3' lm
On my ubuntu-64 (OMP build), sometimes it works properly, sometimes not. Yes, I understand that guesses will vary but -show should not; yet I am seeing that at times jtr is NOT cracking all the hashes. I have not looked any deeper than confirming that I see the exact same thing Frank is seeing.
Nope, but I made a for loop with an increasing fork count and found I can trigger it with a sufficiently high number of forks.
Indeed, three hashes are not cracked. So this is some problem in JtR. I can't seem to trigger the problem when using
I have verified that this problem exists in versions prior to our recently added self-tests, so they are not causing it.
What scares me the most is that this is not likely a problem with LM at all. It might be wordlist mode vs. node/fork, or perhaps pot sync.
Yes, I also get a varying number of guesses when I repeatedly run
1st run:
2nd run:
This is too hard to debug using gdb or the like. We should probably use git-bisect.
Core John does not seem to have the issue (but TS has an issue, openwall/john-tests#17, when trying).
git-bisect is a piece of work. Here's the offending commit: 1f827c7 Ouch! This doesn't make it easy.
OK, so it happens during -fork with a shared memory-mapped buffer but per-node indexes (of each node's words in that shared buffer). But how can it be intermittent!?
And what causes only LM tests to fail? For --fork=2 ... --fork=14, I only see lm and pwdump_lm tests fail. |
Not sure if that's worth contemplating yet. I tried disabling mmap() and the bug is still there. So it's not about the shared memory map but the per-node index buffer. That's a relief.
I was wrong - we don't use any index buffer unless rules are involved. So what we have is a shared memory buffer, and mgetl() instead of fgetl(). But I tried disabling mmap() and the bug still shows, which means we are just using fgetl() and no magic. We practically do not use anything that was touched in that commit. So what the heck is wrong!? And how come it was introduced with that patch?
OK, it's not that commit! If I check out the commit before it, and add -mem=1 to disable memory buffering, the bug shows up!
Back to bisecting. But this was a relief!!
I really can't trigger the bug in core.
OK I'm puzzled. I now tested bleeding-jumbo but with a wordlist.c from core John, just modified a tad for autoconf so it builds.
Whatever the bug is, it's not in wordlist.c.
bd10920 is the latest merge from core. That's the version to focus on - what differences do we have that can have this result?
Not true! If I use wordlist.c from core in a bleeding-jumbo of bd10920, the bug goes away. To use it I had to insert this line:
Weirdest thing: the same operation does not make the bug go away in HEAD of bleeding-jumbo.
I officially give up here, for now. |
No, that just showed I'm too tired. I will give up now. |
IMHO it's not an issue with locking the pot file.
and
Let's see what has not been cracked in the --fork=11 run:
I can reproduce the bug with Markov mode.
Then I created a Markov mode that should crack 1498 out of those 1500 passwords:
I verified that this Markov mode works as expected:
Then I tried --fork:
As with wordlist mode, the number of cracked passwords varies when I run the same command repeatedly.
Should we focus on the obvious? LM built with OMP has some issues (somehow) with -fork.
With --single=none and user name = password,
I am not able to reproduce the error. When I change 1499 user names into AB and use this as single mode
I can reproduce the error using single mode:
On 09/26/2014 04:07 PM, JimF wrote:
Since I reproduced this with clang (which doesn't support OMP), it is not an OMP issue. And so far we couldn't reproduce it in core.
This bug seems to be related to #798.
In this case, --fork=663 cracked just 1499 hashes.
At least, john now behaves in a consistent way:
Which reminds me of http://www.despair.com/consistency.html
With this john.local.conf
I manage to get fewer than 1500 unique john.pot lines only if I hit the ulimit on processes per user (in my case currently 1024).
So I guess I'll keep that john.local.conf and issue a
whenever I want a pot file sync.
I think we should fix (or at least consider) openwall/john-tests#20 before going on with this. |
Anyway, I cannot reproduce this bug on OSX anymore, even with the default john.conf, except when I go past ulimits.
However, I can reproduce it using 9d9c271 (and -mem=1), which was before there was any pot reload or USR2 signalling at all.
Can you reproduce this with current code? |
I won't have access to the system where I want to do the re-test before Monday.
So, I need a much higher number of forked processes to reproduce this.
Trying the same on OSX: it works fine up to 572 forks, and from that point the problem can be seen in the stderr file: "fork: Resource temporarily unavailable". So I can't reproduce any problem with John itself. That's with a ulimit of 709 max user processes.
When I change
Still no indication of an error in
Can you try
I do not have anything named schedtool.
I can't see why the Save timer would affect this. No session runs for that long anyway, right?
OK, it's because OS_TIMER counts backwards.
With
On 11/09/2014 11:38 PM, magnum wrote:
I bet with less than 572 forks, john finishes in less than 1 second on
On my 64-bit Linux system with an Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz (quad core, no hyperthreading), I manage to get john killed with SIGUSR2 at 761 forked processes (with Idle = Y in john.conf), when I run 4 other john processes (with Idle = N in john.conf) trying (not) to crack the rar test hashes:
And some tests with a smaller number of forked processes exited with $? = 0, but less than 1500 unique hashes cracked.
As
With some more changes in ulimits
I get
So, this is most likely not related to ulimit.
I bet #798 (comment) would make it much better (at least provided you don't use a Save interval that is a multiple of 3).
I tested with the default Save interval of 60, which is a multiple of 3. Why do you think that matters here? All these test runs are much faster than 60 seconds, even on my old 32-bit system.
No indication of problems in log file, stderr or stdout output.
BTW, the
Because, like you found out, without that patch and with OS_TIMER, some things would happen after 0 seconds instead of after three seconds. Maybe that is unrelated.
Yes, but that was without the patch. With your patch, you'd get the SIGUSR2 signals 2 seconds later, no matter if you have
I had the vague idea that under this crazy over-booking of resources there could be a difference. But now that I think about it, we should already be "protected" against USR2 anyway: the real key is calling sig_init() and sig_init_child() early enough, and this doesn't change that. I can't think of any way to do those earlier than we do now, iirc.
On the other hand, if we indeed introduce yet another counter instead of using
Any way we put it, I think this issue and #798 are purely academic. If we can find solutions for them, fine. If we can't, I will not lose any sleep over it.
I'm closing this. If some similar problem can still be triggered, please open a brand new issue. This issue is too clobbered anyway.
This is with latest jtrts commit (19ca304106ebebb5d0d9b717adc2cb4626cc9808) and latest bleeding-jumbo commit (4454406)