Bus Error on Mac OSX Yosemite with Clang #1025

Closed
shellster opened this issue Jan 14, 2015 · 54 comments

Comments

@shellster

Hi,

I have been trying to get the latest copy to compile and run under OS X Yosemite. I finally got it to compile using clang. I tried GNU gcc but kept running into syntax errors in the OpenCL headers:

$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin14.0.0
Thread model: posix

I get the following warnings when running make, but otherwise everything seems to work:

./configure --enable-mpi && make clean && make -s
gpg2john.c:470:1: warning: variable 'SYM_ALGS' is not needed and will not be emitted [-Wunneeded-internal-declaration]
SYM_ALGS[] = {
^
gpg2john.c:1057:1: warning: variable 'TAG' is not needed and will not be emitted [-Wunneeded-internal-declaration]
TAG[] = {
^
gpg2john.c:1266:1: warning: variable 'SIGSUB' is not needed and will not be emitted [-Wunneeded-internal-declaration]
SIGSUB[] = {
^
3 warnings generated.

However, when I actually try running john I get the following error:

./john ./hashes.txt -w:yahoo.txt --format=ntlmv2-opencl

[test-lab:42105] *** Process received signal ***
[test-lab:42105] Signal: Bus error: 10 (10)
[test-lab:42105] Signal code: Non-existant physical address (2)
[test-lab:42105] Failing at address: 0x14d19f000
[test-lab:42105] [ 0] 0   libsystem_platform.dylib            0x00007fff8f9a2f1a _sigtramp + 26
[test-lab:42105] [ 1] 0   ???                                 0x00007fbd5143e690 0x0 + 140451088950928
[test-lab:42105] [ 2] 0   ???                                 0x696e690000007972 0x0 + 7597125070141553010
[test-lab:42105] *** End of error message ***
Bus error: 10

The error is intermittent but happens about 90% of the time. Usually it is fatal, but sometimes the cracker keeps running. Yesterday it spat out the error, and then the entire Mac crashed and rebooted.

@magnumripper
Member

I run JtR on Yosemite all the time, with CUDA and OpenCL, with no problems except the header bugs described below.

First, the gpg2john.c warnings are benign (actually sort of false positives) and hard to mute.

I tried GNU gcc but keep running into syntax errors in the opencl libraries:

Yosemite comes with header errors (that was a first for me); maybe that is the problem you describe? The headers only work with clang or Objective-C, since they use Apple's blocks (^) syntax. I fixed it with the patch below:

--- /usr/include/dispatch/object.h.orig       2014-09-09 22:53:42.000000000 +0200
+++ /usr/include/dispatch/object.h    2014-10-22 08:39:39.000000000 +0200
@@ -140,7 +140,11 @@
  * Instead, the block literal must be copied to the heap with the Block_copy()
  * function or by sending it a -[copy] message.
  */
+#if __clang__ || OS_OBJECT_USE_OBJC
 typedef void (^dispatch_block_t)(void);
+#else
+typedef void (dispatch_block_t)(void);
+#endif

 __BEGIN_DECLS

--- /usr/include/dispatch/queue.h.orig        2014-09-09 22:53:42.000000000 +0200
+++ /usr/include/dispatch/queue.h     2014-10-22 08:44:08.000000000 +0200
@@ -360,6 +360,9 @@

 typedef long dispatch_queue_priority_t;

+#ifndef __has_include
+#define __has_include(x) 0
+#endif
 /*!
  * @typedef dispatch_qos_class_t
  * Alias for qos_class_t type.

This may or may not be the cause of your segfaults too. After applying the above you should be able to build with gcc too. Please apply and report back.
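For readers unfamiliar with the header problem: the ^ in typedef void (^dispatch_block_t)(void); is Apple's blocks extension, which clang supports and GNU gcc does not. A minimal sketch of the same kind of preprocessor guard in a standalone C file, assuming a hypothetical type name my_block_t and keying on clang's __BLOCKS__ macro (defined when -fblocks is in effect) instead of on __clang__ as the patch does. The fallback here is a plain function pointer, not the bare function type the patch uses:

```c
#include <stddef.h>

/*
 * Hypothetical type name; same guard idea as the object.h hunk above,
 * but keyed on __BLOCKS__ and falling back to a function pointer.
 */
#if defined(__BLOCKS__)
typedef void (^my_block_t)(void);   /* Apple blocks extension (clang -fblocks) */
#else
typedef void (*my_block_t)(void);   /* portable C fallback for gcc */
#endif

/* Reports which branch of the guard this compiler took. */
const char *my_block_flavor(void)
{
#if defined(__BLOCKS__)
    return "block";
#else
    return "function pointer";
#endif
}
```

Either way the typedef names something pointer-sized, so code that only stores or passes the callback compiles under both compilers.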

magnumripper added a commit that referenced this issue Jan 14, 2015
@magnumripper
Member

The latest commit includes that patch, plus instructions in doc/INSTALL.

@shellster
Author

Thanks for the quick reply. The patch worked for me. However I still couldn't build with gcc initially:

./configure --enable-mpi CC=/usr/local/bin/gcc-4.9 CXX=/usr/local/bin/g++-4.9

As I would get the following error:

/var/folders/j6/wy95tjc950n5352c_z9mxqm40000gn/T//cc1wG4p1.s:398:no such instruction: `andn %edi, %r8d,%eax'

I was able to work around this error by removing the following wherever it appeared in the Makefile:

-march=native

After that, I was finally able to build the entire thing with GNU gcc 4.9, but alas, I'm still getting the same error.

5 0g 0:00:00:49 15.84% (ETA: 02:23:06) 0g/s 106454p/s 532272c/s 532272C/s 7preallue..thehope3
[test-lab:52959] *** Process received signal ***
[test-lab:52959] Signal: Bus error: 10 (10)
[test-lab:52959] Signal code: Non-existant physical address (2)
[test-lab:52959] Failing at address: 0x181ab9000
[test-lab:52959] [ 0] 0   libsystem_platform.dylib            0x00007fff8f9a2f1a _sigtramp + 26
[test-lab:52959] [ 1] 0   ???                                 0x0000000000000050 0x0 + 80
[test-lab:52959] *** End of error message ***

@magnumripper
Member

That -march=native problem should be fixed if you follow doc/INSTALL. The native 'as' lacks support for assembling AVX (and newer) instructions, so you need to copy osx_as_wrapper.sh to 'as' somewhere in your PATH before /usr/bin (presumably /usr/local/bin).

@shellster
Author

I don't believe this issue should be closed. I have followed the doc/INSTALL directions completely and tried again. With the doc/INSTALL directions there is no need to remove the native flag, as you said. The Yosemite patch you provided is also not needed and actually broke things; I was getting build errors until I removed it.

The underlying problem still remains, however: I'm still getting the bus error, and it is preventing me from using the latest John release. Should I open a new bug just for that error?

@magnumripper
Member

What is the output of "which as"?

@magnumripper magnumripper reopened this Jan 14, 2015
@magnumripper
Member

Did you run "make -s clean" after patching the system headers?

@magnumripper
Member

Thanks for the quick reply. The patch worked for me.

The Yosemite patch you provided is also not needed and actually broke things

Please decide. Which is it? If it broke things, what did it break?

@shellster
Author

Sorry, I was not being very clear, and I think I may have sent us on a wild goose chase. Before you told me about the OS X directions, I was running configure and manually passing it a CC= pointing at a gcc from MacPorts. That version was giving me the errors which your Yosemite patch "fixed".

After reading your OS X directions on the install page, I installed gcc 4.9 from brew as the directions indicated and set up osx_as_wrapper.sh. When I then tried to build, I got a new error about not being able to find 'dispatch_block_t'. On a hunch, I removed the Yosemite patch and ran make clean and make again. This built just fine. So the patch was unneeded and breaks things. I apologize for not following the build directions sooner.

After building john and running it again with the same format as before, I'm seeing numerous rounds of the Bus Error / non-existent physical address error. This concerns me, though I haven't seen another complete crash of john or the system, so it may not actually be breaking anything; I can't tell.

@Bodo-von-Greif

I use brew and get an "Illegal instruction: 4"
brew info john-jumbo
john-jumbo: stable 1.8.0 (bottled)
OSX Yosemite, 10.10.1

@DomT4

DomT4 commented Jan 14, 2015

^ What exactly triggers that Illegal Instruction error?

@magnumripper
Member

@Bodo-von-Greif for problems with Homebrew bottles, please report to them, not us. Although I will try to help them out if I can. This is probably not related to this issue.

@magnumripper
Member

@shellster this is what I get with gcc and without that system header patch:

In file included from /usr/include/dispatch/dispatch.h:51:0,
                 from /System/Library/Frameworks/OpenCL.framework/Headers/gcl.h:23,
                 from /System/Library/Frameworks/OpenCL.framework/Headers/opencl.h:16,
                 from common-opencl.h:19,
                 from common_opencl_pbkdf2.h:11,
                 from common_opencl_pbkdf2_plug.c:13:
/usr/include/dispatch/object.h:143:15: error: expected identifier or '(' before '^' token
 typedef void (^dispatch_block_t)(void);
               ^
/usr/include/dispatch/object.h:362:3: error: unknown type name 'dispatch_block_t'
   dispatch_block_t notification_block);
   ^
make[1]: *** [common_opencl_pbkdf2_plug.o] Error 1

Applying it and retrying:

$ sudo patch -b -N -p0 <unused/Yosemite.patch 
patching file /usr/include/dispatch/object.h
patching file /usr/include/dispatch/queue.h

$ make -sj8
ar: creating archive aes.a

Make process completed.

@magnumripper
Member

Here are the MD5 sums of the original files:

$ md5 /usr/include/dispatch/{object,queue}.h.orig
MD5 (/usr/include/dispatch/object.h.orig) = ff02d8108b7bb88a76a1026c12c75c58
MD5 (/usr/include/dispatch/queue.h.orig) = ac7652c772575cc33d7abebbda881742

And here they are for the patched files:

$ md5 /usr/include/dispatch/{object,queue}.h
MD5 (/usr/include/dispatch/object.h) = 69be3cf9cbb5aaa5588f7d360ebb3f75
MD5 (/usr/include/dispatch/queue.h) = a6cb47be112d54889add0ae5e4a038b8

EDIT: here they are when using the latest patch as updated above:

$ md5 /usr/include/dispatch/{object,queue}.h
MD5 (/usr/include/dispatch/object.h) = 5c903b0e784ac9205cdf6aa507daefbc
MD5 (/usr/include/dispatch/queue.h) = a6cb47be112d54889add0ae5e4a038b8

@magnumripper
Member

What exactly triggers that Illegal Instruction error?

That usually means, for example, that you built for AVX and then try to run it on a system lacking AVX.

@magnumripper
Member

This is OT but anyway: @Bodo-von-Greif @DomT4 I just tested the Homebrew bottle and it looks like they built it for AVX. This means it will only run on an AVX-capable CPU or else it will fail with "illegal instruction" just like I thought. A proper build for binary distributions would have automatic fallbacks to john-sse2. Or at least it should be configured for SSE2 only (all x86-64 CPUs have SSE2), but that would be suboptimal for people with better CPUs.
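A rough sketch of the runtime fallback idea, assuming a recent GCC or clang on x86 (both provide __builtin_cpu_supports). The binary name john-avx is hypothetical, and this is not how John's own build handles fallbacks; it only shows that a launcher can test for AVX before choosing which binary to run, instead of dying with "illegal instruction":

```c
#include <stddef.h>

/*
 * Sketch only: pick a binary at run time instead of crashing with SIGILL.
 * "john-avx" is a hypothetical name; john-sse2 is the baseline build.
 */
const char *pick_john_binary(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* GCC and clang both provide this builtin on x86 (clang defines __GNUC__ too). */
    if (__builtin_cpu_supports("avx"))
        return "john-avx";
#endif
    return "john-sse2";   /* baseline: every x86-64 CPU has SSE2 */
}
```

A launcher would then exec() the chosen binary; an SSE2-only build skips the check entirely, at the cost of speed on newer CPUs.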

@DomT4

DomT4 commented Jan 14, 2015

This means it will only run on an AVX-capable CPU or else it will fail with "illegal instruction" just like I thought.

The bottles are aimed at the widest user base possible usually. Feel free to raise an issue at Homebrew/Homebrew with that information though; Mike who handles the bottles may be able to do something about it :).

@magnumripper
Member

@DomT4

DomT4 commented Jan 14, 2015

Thanks Magnum, Appreciate that.

@magnumripper
Member

@shellster I will close this as user error. If you still think the patch is causing problems rather than fixing them, please reopen and elaborate.

@magnumripper
Member

FWIW 3c28891 updates that patch (I also updated the copy above) in order to avoid recent problems when not using gcc but native clang. So with the new patch you can use "real" gcc or not, and there really shouldn't be any side effects.

@shellster maybe this was what you were trying to describe all along? Re-reading the above, you were pretty vague about what issue you actually had. You said you got errors with the (old) patch applied, which indicates you weren't actually using your newly installed gcc-4.9 but were still using clang for one reason or another (usually a PATH issue, or perhaps you did not re-run configure and make clean; YMMV).

Oh and BTW that as-wrapper shell script was dropped in a91cd2c in favor of a more elegant solution but in the end there's no difference.

@shellster
Author

Just pulled the latest build. I was able to build it with no issues. It still crashes with the error at the bottom of my first post whenever I try to crack netntlm hashes. It did not crash when trying to crack kerberos tickets. I don't know what the issue is, but john 1.8.0-jumbo is unusable on my Yosemite box for netntlm hashes.

The only non-standard thing that I am still doing when compiling is adding --enable-mpi to the configure call. Every time I build, I run make clean first; I have never built without it.

@magnumripper magnumripper reopened this Jan 28, 2015
@magnumripper
Member

OK, thanks. And this is still with a clang build, right? You mention netntlm but your pasted problem was with ntlmv2-opencl - does this mean you have this problem with CPU format(s) too or is it only with the OpenCL version(s)? If it's OpenCL only, you could try using --device=cpu for testing whether the problem is bound to the GPU device or not. And please post the output of john --list=build-info as well as john --list=opencl-devices

@magnumripper
Member

The only non-standard thing that I am still doing when compiling is adding --enable-mpi to the configure call

Using Homebrew's OpenMPI this will force use of clang - gcc won't be used even if available. This explains much of the initial confusion.

@shellster
Author

Running john with --device=cpu by itself did not produce the error. However, when I tried running it with MPI I got the error again. As you mentioned, I am using the HomeBrew OpenMPI for the build.

mpirun -np 8 john -w:/uniq.txt --device=cpu ./hashes.txt
Warning: detected hash type "netntlm", but the string is also recognized as "netntlm-naive"
Use the "--format=netntlm-naive" option to force loading these as that type instead
Loaded 3 password hashes with 3 different salts (netntlm, NTLMv1 C/R [MD4 DES (ESS MD5) 128/128 AVX 16x])
Node numbers 1-8 of 8 (MPI)
Send SIGUSR1 to mpirun for status
[test:01509] *** Process received signal ***
[test:01509] Signal: Bus error: 10 (10)
[test:01509] Signal code: Non-existant physical address (2)
[test:01509] Failing at address: 0x147a60000
[test:01509] [ 0] 0   libsystem_platform.dylib            0x00007fff97e76f1a _sigtramp + 26
[test:01509] [ 1] 0   ???                                 0x1b9f392474a1bf3d 0x0 + 1990372389059411773 

This error is inconsistent, however: I did multiple runs without the error, and then it popped up again. This makes it exceedingly difficult to test.

@shellster
Author

As I said previously I built with default options except --enable-mpi. However, I still have the "as" hack from the OSX build instructions. Should I remove that? I also have the modified system headers:
/usr/include/dispatch/object.h
/usr/include/dispatch/queue.h

I (stupidly) didn't make backups before modifying them. If you have the originals and can email them to me, I'll put them back and recompile.

@magnumripper
Member

I'll try and run some tests actually utilizing MPI.

If you run the latest Git version, you can remove the 'as' wrapper script from /usr/local/bin, but you don't need to rebuild: the new version does exactly the same thing without needing the script, so the resulting binaries will be the same anyway.

If you give me your email I can hand you the pristine system headers but you should be able to just run the patch backwards (provided you use the same version of the patch that was used before) by adding -R to patch. Or just edit them manually, it's just a few lines.

Do you actually need MPI, eg. for running sessions across several hosts over the network? If you just want multiple local processes you should use --fork instead but maybe you knew that.

@magnumripper
Member

I still can't reproduce any problem. We need to rule out stuff in order to nail this:

  • OpenCL is ruled out already, the netntlm format is not an OpenCL format.
  • Does it ever happen without using MPI? I think you said it does.
  • So does it ever happen with a non-MPI build?
  • Does it happen with other modes than wordlist?
  • Does it only happen with very large wordlist files (or very small, etc)?

You could try "brew doctor", "brew update" and "brew upgrade" just in case this is a Homebrew problem that was fixed recently.

@shellster
Author

I was not aware of the Fork option. I will use that and see if the problem goes away. I'll report back once I can confirm.

@shellster
Author

I had a chance to crack some more netntlm hashes using john with the fork option, and I got the same errors. I was not using the OpenCL option. My command looked like this: john --fork=8 -w:/path/to/wordlist --rules /path/to/hashes.txt. It ran fine for quite a while, then crashed. I believe this indicates that neither OpenCL nor OpenMPI is the issue.

I have not been able to rule out the size of the wordlist. It has only ever happened with large wordlists, but I don't know whether it is time related or size related. It might be neither, since it is so inconsistent. I'll keep reporting back as I find more information. Hopefully someone else will contribute to this thread if they experience the same thing.

In the meantime, I've gone back to the last john jumbo build based on v1.7.x. It has never crashed with the error in this thread.

@magnumripper
Member

If we could get a backtrace from the debugger it would give us lots of clues. Are you familiar with gdb? Basically you'd just prepend your command line with gdb --args, eg:

$ gdb --args ./john --fork=8 -w:/path/to/wordlist --rules /path/to/hashes.txt

Once inside the debugger (under OS X it will initially spew out lots of "warnings" that you can ignore), type r (for run) and John will start running as usual. If it crashes, gdb will tell you exactly where the crash occurred (source file and line number). You'd then type bt for a backtrace and copy-paste the gdb error and the backtrace here.

You'd need to use a build of john made with just make as opposed to make install since the latter strips debug information from the binary.

@shellster
Author

Update: I ran it again after recompiling the latest john without running make install.
After figuring out how to get gdb going, I ran it through gdb. The first thing I noticed is that when running the process through gdb, I get the error almost immediately, however it isn't a critical error when running through gdb, so I can't back trace it. I'm going to try running without forking and see if that's why I'm not getting a chance to backtrace it:

$ gdb --args ./john --fork=8 -w:./wordlist.txt --rules /tmp/hashes.txt
GNU gdb (GDB) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin14.0.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./john...done.
(gdb) r
Starting program: ./john --fork=8 -w:./wordlist.txt --rules /tmp/hashes.txt
Warning: detected hash type "netntlmv2", but the string is also recognized as "ntlmv2-opencl"
Use the "--format=ntlmv2-opencl" option to force loading these as that type instead
Loaded 45 password hashes with 45 different salts (netntlmv2, NTLMv2 C/R [MD4 HMAC-MD5 32/64])
Node numbers 1-8 of 8 (fork)
Press 'q' or Ctrl-C to abort, almost any other key for status
[pc:12919] *** Process received signal ***
[pc:12919] Signal: Bus error: 10 (10)
[pc:12919] Signal code: Non-existant physical address (2)
[pc:12919] Failing at address: 0x11d706000
[pc:12919] [ 0] 0   libsystem_platform.dylib            0x00007fff8e38ff1a _sigtramp + 26
[pc:12919] *** End of error message ***
1 0g 0:00:00:10 0.00% (ETA: 2015-03-07 13:56) 0g/s 14227p/s 640255c/s 640255C/s !el0!e$
6 0g 0:00:00:10 8.83% (ETA: 18:40:32) 0g/s 12505p/s 562748c/s 562748C/s Eri1
7 0g 0:00:00:10 10.59% (ETA: 18:40:13) 0g/s 12471p/s 561233c/s 561233C/s carolicaroli
4 0g 0:00:00:10 5.32% (ETA: 18:41:46) 0g/s 12499p/s 562462c/s 562462C/s nounos
5 0g 0:00:00:10 7.08% (ETA: 18:41:00) 0g/s 12492p/s 562159c/s 562159C/s caroline1
3 0g 0:00:00:11 3.57% (ETA: 18:43:46) 0g/s 12702p/s 571634c/s 571634C/s Luty7
8 0g 0:00:00:10 12.34% (ETA: 18:40:00) 0g/s 12592p/s 566653c/s 566653C/s iesram

@shellster
Author

All right, it took much longer when I wasn't running a forked process, but this time I got a proper break, and did a backtrace:

[New Thread 0x140b of process 13037]

Program received signal SIGBUS, Bus error.
0x00000001001b07f0 in mgetl (res=<optimized out>) at wordlist.c:179
179     while (map_pos < map_scan_end &&
(gdb) bt
#0  0x00000001001b07f0 in mgetl (res=<optimized out>) at wordlist.c:179
#1  do_wordlist_crack (db=0x100cea1d8, name=<optimized out>, rules=<optimized out>) at wordlist.c:1186
#2  0x0000000000000000 in ?? ()

@magnumripper
Member

Very interesting. Perhaps you could also supply some lines from a log file of a session with this problem? The interesting lines are like

0:00:00:00 Proceeding with wordlist mode
0:00:00:00 - Rules: Wordlist
0:00:00:00 - Wordlist file: ../run/password.lst
0:00:00:00 - memory mapping wordlist (26325 bytes)
0:00:00:00 - loading wordfile $JOHN/password.lst into memory (26325 bytes, max_size=5000000)
0:00:00:00 - wordfile had 3559 lines and required 28472 bytes for index.
0:00:00:00 - suppressed 13 duplicate lines and/or comments from wordlist.
0:00:00:00 - 57 preprocessed word mangling rules

...essentially all lines that have a dash between the timestamp and the message, except lines like - Rule #x accepted as y

@shellster
Author

Here's the entire john.log until I got the previous crash:

0:00:00:00 Starting a new session
0:00:00:00 Loaded a total of 45 password hashes with 45 different salts
0:00:00:00 Sorting salts, for performance
0:00:00:00 - Hash type: netntlmv2, NTLMv2 C/R (lengths up to 125)
0:00:00:00 - Algorithm: MD4 HMAC-MD5 32/64
0:00:00:00 - Configured to use otherwise idle processor cycles only
0:00:00:00 Proceeding with wordlist mode
0:00:00:00 - Rules: Wordlist
0:00:00:00 - Wordlist file: /tmp/uniq.txt
0:00:00:00 - memory mapping wordlist (2260819010 bytes)
0:00:00:00 - 57 preprocessed word mangling rules
0:00:00:00 - Rule #1: ':' accepted as ''

I don't know if it is relevant, but the wordlist I am using is a modified dazzlepod uniq word list. Mine is currently 241584732 words (lines) long, so it is fairly large.

magnumripper added a commit that referenced this issue Feb 26, 2015
@magnumripper
Member

Many thanks for persisting in trying to nail this! I truly hope you did now. This is an edge case bug, very unlikely to happen. I'm really glad we found it. I think (and hope) that 433982e fixes the problem for good. Please report back.

I am pretty sure this depends on the exact size of wordlist and exact lengths of words so if you run the same command using the same wordlist you should be able to reproduce the problem without the patch, and verify it's gone with it.

@shellster
Author

Thank you for your support and for working on a possible patch. Initial testing suggests the issue may be fixed for john with no forking or OpenMPI, with forking, and with OpenMPI. However, I instantly get the original bus error message when I try --format=ntlmv2-opencl with --fork=8. I think we may have found a bug, but I'm not sure it's the same bug...

Since the issue can take a long time to manifest, and I can't debug it when running via fork or openmpi, I will try running a single thread all night and see if I can get either issue to pop up.

@magnumripper
Member

It seems pretty unlikely to me that you would experience two separate bugs that no one else has reported. Maybe the bug was somehow not 100% fixed. I will try to stage a wordlist that triggers the bug (without the patch) and work from there.

@magnumripper
Member

BTW, what is the very last word of that wordlist? Or, more importantly, what length is that word?

@shellster
Author

Here's the last "word" and length:

$ tail -n 1 /tmp/uniq.txt


$ tail -n 1 /tmp/uniq.txt | wc -c
      21

@magnumripper
Member

Perhaps that very last word of your list also lacks a linefeed? That is supposed to be supported, but I seem to be able to trigger the bug (without the patch) only when the last word lacks a linefeed.

@magnumripper
Member

Your tail output indicates it does have a LF though. The length is 20 without a LF.
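tail -n 1 | wc -c counts the trailing LF, which is why 21 bytes means a 20-character word. A small C sketch that checks the final byte of a file directly, and so also answers the question when the last line has no LF (ends_with_newline is a hypothetical helper name):

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if the file's last byte is '\n',
 * 0 if it is not (or the file is empty), -1 on any I/O error.
 */
int ends_with_newline(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    int result = -1;
    if (fseek(f, 0L, SEEK_END) == 0) {
        long size = ftell(f);
        if (size == 0) {
            result = 0;                          /* empty file */
        } else if (size > 0 && fseek(f, -1L, SEEK_END) == 0) {
            int c = fgetc(f);                    /* read the very last byte */
            if (c != EOF)
                result = (c == '\n');
        }
    }
    fclose(f);
    return result;
}
```

ftell() returns a long, which is 64 bits on 64-bit OS X and Linux, so a 2.2 GB wordlist is fine there; where long is 32 bits, fseeko()/ftello() would be needed.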

@shellster
Author

I piped it through xxd just to be sure, and there is a \n on the end.
My single thread test, with the patch, no OpenCL, has been running for over an hour and a half without any crash. I'll report back when it breaks or finishes.

@magnumripper
Member

On another note:

I get the error almost immediately, however it isn't a critical error when running through gdb, so I can't back trace it.

[pc:12919] *** Process received signal ***
[pc:12919] Signal: Bus error: 10 (10)
[pc:12919] Signal code: Non-existant physical address (2)
[pc:12919] Failing at address: 0x11d706000
[pc:12919] [ 0] 0   libsystem_platform.dylib            0x00007fff8e38ff1a _sigtramp + 26
[pc:12919] *** End of error message ***
1 0g 0:00:00:10 0.00% (ETA: 2015-03-07 13:56) 0g/s 14227p/s 640255c/s 640255C/s !el0!e$
6 0g 0:00:00:10 8.83% (ETA: 18:40:32) 0g/s 12505p/s 562748c/s 562748C/s Eri1
7 0g 0:00:00:10 10.59% (ETA: 18:40:13) 0g/s 12471p/s 561233c/s 561233C/s carolicaroli
4 0g 0:00:00:10 5.32% (ETA: 18:41:46) 0g/s 12499p/s 562462c/s 562462C/s nounos
5 0g 0:00:00:10 7.08% (ETA: 18:41:00) 0g/s 12492p/s 562159c/s 562159C/s caroline1
3 0g 0:00:00:11 3.57% (ETA: 18:43:46) 0g/s 12702p/s 571634c/s 571634C/s Luty7
8 0g 0:00:00:10 12.34% (ETA: 18:40:00) 0g/s 12592p/s 566653c/s 566653C/s iesram

I think what happens is process 2 (with pid 12919) crashes (note the lack of status line for it). In that situation I think you should be able to halt the debugger with ctrl-c and then attach 12919 (as the Bus error was listed for 12919) and then bt or something like that. At any rate, this looks like a totally different crash (although they might be related).

EDIT https://sourceware.org/gdb/onlinedocs/gdb/Forks.html indicates the child processes run detached from the debugger so when the above crash happens, it's too late.
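Since each --fork node is a separate child process, whatever is supervising the children, rather than a debugger attached to the parent, is what gets to see a node die. A generic POSIX sketch of how a parent can learn that a forked child was killed and by which signal, using waitpid() and WTERMSIG(). This is not a description of how john --fork itself reports node deaths; run_in_child, crash_with_sigbus and finish_normally are hypothetical names:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run body() in a forked child and wait for it.
 * Returns the signal number that killed the child, 0 if it exited
 * normally, or -1 if fork() or waitpid() failed.
 */
int run_in_child(void (*body)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {          /* child */
        body();
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Hypothetical stand-in for a node that hits a bus error. */
static void crash_with_sigbus(void)
{
    signal(SIGBUS, SIG_DFL);   /* ensure the default action (terminate) applies */
    raise(SIGBUS);
}

/* Hypothetical stand-in for a node that finishes cleanly. */
static void finish_normally(void)
{
}
```

The gdb page linked above also describes set follow-fork-mode child and set detach-on-fork off, which change the default of staying with the parent on platforms that support fork following.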

@magnumripper
Member

My single thread test, with the patch, no OpenCL, has been running for over an hour and a half without any crash. I'll report back when it breaks or finishes.

I think as soon as it starts processing rule #2, you will not see any crash related to that bug. Either you get it when it processes the last word for the first time (at rule #1), or it won't trigger at all.

Also, the number of hashes or hash type, or using rules or not, does not matter for the bug that was fixed. I test it with a single NT hash and no rules. I use a ~2 GB wordlist and just vary the length of the very last word. At lengths over 8 and lacking a LF, I can trigger the crash (without the patch). For some reason I have yet to trigger it when the last word has a linefeed.
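The actual fix is in 433982e and wordlist.c is not shown in this thread, so what follows is only a generic sketch of the principle involved, not John's code. POSIX specifies SIGBUS for references to whole pages of a mapping that lie beyond the end of the underlying file, and the page-aligned "Failing at address" values reported here (e.g. 0x11d706000) are consistent with, though not proof of, a read running off the end of the mapped wordlist. A scanner over a mapped buffer therefore has to stop strictly before the end pointer and accept a final line with no LF. next_line is a hypothetical helper:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the next line from [*pos, end) into out,
 * truncated to out_size - 1 bytes and NUL-terminated (out_size must be >= 1).
 * Never reads at or past `end`, and accepts a last line with no LF.
 * Returns 1 if a line was produced, 0 when the buffer is exhausted.
 */
int next_line(const char **pos, const char *end, char *out, size_t out_size)
{
    const char *p = *pos;
    if (p >= end)
        return 0;

    size_t avail = (size_t)(end - p);
    /* memchr() is bounded by avail, so it cannot run past the buffer. */
    const char *nl = memchr(p, '\n', avail);
    size_t len = nl ? (size_t)(nl - p) : avail;   /* no LF: line runs to end */
    size_t copy = len < out_size - 1 ? len : out_size - 1;

    memcpy(out, p, copy);
    out[copy] = '\0';
    *pos = nl ? nl + 1 : end;
    return 1;
}
```

In a real scanner, the buffer would come from mmap() of the wordlist, with end set to the mapping base plus the file size.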

@magnumripper
Member

./john --fork=8 -w:./wordlist.txt --rules /tmp/hashes.txt

Oh by the way: If the above always crashes at the same point, in node 2, you should try replacing --fork=8 with -node=2/8 and run that instead. This will run just node 2 of it, in almost the same conditions so chances are it will crash under the debugger.

@shellster
Author

After several partial single-process runs, I couldn't get a crash; I never got the wordlist crash again. I can get multiple, almost immediate crashes as soon as I fork and use the ntlmv2-opencl format. I'll try the node option and see if that gets me somewhere. I tried attaching to the crashing process via a second instance of gdb. Here's what I got, though it doesn't look that helpful to me (I probably don't know what I'm doing):

gdb attach 40540
GNU gdb (GDB) 7.7.1
...
attach: No such file or directory.
Attaching to process 40540
[New Thread 0x1203 of process 40540]
[New Thread 0x1303 of process 40540]
Reading symbols from /tmp/john_test/run/john...done.
0x00007fff8e38ff00 in ?? ()
(gdb) bt
#0  0x00007fff8e38ff00 in ?? ()
#1  0x00007fff931d1ec3 in ?? ()
#2  0x0000000101133a00 in ?? ()
#3  0x0000000000003c00 in ?? ()
#4  0x0000000000000001 in ?? ()
#5  0x00000000000000de in ?? ()
#6  0x000000013a06e000 in ?? ()
#7  0x0000000000000030 in ?? ()
#8  0x0000000100ac5508 in ?? ()
#9  0x000000010211c76f in ?? ()
#10 0x0000000000000000 in ?? ()
(gdb) bt 0
(More stack frames follow...)
(gdb) bt 1
#0  0x00007fff8e38ff00 in ?? ()
(More stack frames follow...)
(gdb) bt 2
#0  0x00007fff8e38ff00 in ?? ()
#1  0x00007fff931d1ec3 in ?? ()
(More stack frames follow...)
(gdb) bt 3
#0  0x00007fff8e38ff00 in ?? ()
#1  0x00007fff931d1ec3 in ?? ()
#2  0x0000000101133a00 in ?? ()
(More stack frames follow...)
(gdb) bt 4
#0  0x00007fff8e38ff00 in ?? ()
#1  0x00007fff931d1ec3 in ?? ()
#2  0x0000000101133a00 in ?? ()
#3  0x0000000000003c00 in ?? ()
(More stack frames follow...)
(gdb) ^CQuit
(gdb) q
A debugging session is active.

    Inferior 1 [process 40540] will be detached.

Quit anyway? (y or n) y

@shellster
Author

I don't know if the node argument is broken, or if I'm not doing it right:

./john -w:/tmp/uniq.txt --format=ntlmv2-opencl --fork=8 --node=2/8 --rule /tmp/hashes.txt

Invalid node specification: 2/8: node numbers can't exceed node count

@magnumripper
Member

You need to drop the -fork=8 and say eg. -node=2/8 instead. If you ran with -fork=n and node m crashed, you'd say -node=m/n.

@magnumripper
Member

Here's what I got, though it doesn't look that helpful to me (I probably don't know what I'm doing)

The problem is the crash happens down in a system library where we lack debugging symbols. But normally you'd get some info from a backtrace.

Ideally you should try to find a way for me/us to reproduce the crash with some test hash(es) and some exact wordlist/rules/whatever and exact command line. But it might not be very easy to find out. Especially since (provided this still has anything to do with memory mapping) it depends on how much memory you've got and how taxed that memory is with other things. So you might not even have a reproducible case yourself.

@magnumripper
Member

BTW the OpenCL formats are a lot harder to get good debug info from (because a lot more shared libs lacking debug info are involved). It would be better if you can catch a crash in netntlmv2 CPU format.

@magnumripper
Member

Oh, and another aspect is that it might not make a lot of sense to run -fork=8 with a GPU, especially not when trying to nail a bug. If you can only get crashes in OpenCL now, I'm tempted to put it down to Apple driver problems. The drivers are pretty good now; 2.5 years ago they were terrible. Really really terrible. Almost no OpenCL format could run at all.

Having said that, I just ran some ntlmv2-opencl on my Macbook w/ GT650M, even using fork=8 and a crazy big wordlist with rules. Doesn't crash here (yet).

4 participants