-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segfault in dynamic formats self test with OMP_NUM_THREADS=3 #825
Comments
What was the command line triggering this? Using memdbg is likely to help nailing it with ease. It looks like we are calling free() on some memory that was allocated by mem_alloc_tiny(). Then at exit, all "tiny memory" is freed in cleanup_tiny_memory() and the above is a fact. |
For example, a --test=0
|
Including OpenCL but not CUDA? Or does it happen with just |
I just passed a full --test=0 including CUDA and OpenCL formats.
|
I know that it happens only in two machines. The problem is that both has Ubuntu LTS, nothing bad should be happening. |
So could it be in some "optional" code? I got the following:
Build bot has this
Another possibility is that something in john.conf or john.local.conf differs, that triggers the problem. |
Again, try enabling memdbg on one of those hosts, and get its verdict.
then a |
To mem_dbg things seems ok. I disabled OpenMP and errors gone. |
The problem is that OMP_NUM_THREADS has to be multiple of 4. If you try |
BTW: I tested using And here the number of threads is 6. |
The workaround |
Since the error still occurs with OMP_NUM_THREADS=3, this bug shouldn't have been closed, even if you have a workaround. This is on well with latest bleeding-jumbo:
|
|
On my local 64bit linux system (CPU-only), I get these problems with OMP_NUM_THREADS 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31. |
git bisect points to this commit as causing the breakage:
|
Apparently, 339052e just exposed the bug for dynamic_0 self test.
|
So, only a few dynamic formats are affected. (This test was with commit 339052e reverted.) BTW, I guess I'll create a new jtrts issue. I guess |
Unfortunately @jfoug might not be able to have a look at this immediately. But it's good we tracked it down. |
This bug fix (339052e) was for the 64 bit formats where hash(hash(x).hash(y)) caused a full 256 bytes of usage. Whirlpool, sha512, etc. This was the problem were we were gettin OMP failures. If this is reverted, then those failures will come back (87-88 107-108 are formats that push the buffer to extreme limit). I will see if I can find the problem. One big problem, is it works perfectly on cygwin64 (sorry linux guys) I will have to see if I can figure this one out on the VM |
nope, got cygwin to core on OMP=5 |
I would think so, I get segfaults on OSX 64-bit. Should not be arch dependant other than SSE2.
Even memdbg itself cored, that's funny! |
The dynamic_fmt.c change for 339052e is obvious a cause of extensive overflows. BUT it was removed due to causing other issues. I just have to find those issues. It may have been in TS failures. |
MEMDBG_HDR and MEMDBG_HDR2 might be to small to detect typical memory corruption on sse2/avx builds, and we neede to (optionally) increase their size. |
That's wrong.
|
This is dyna buffers (inputs ) being overwritten. So heap after them is smashed. What it is will depend on what linker put there. But do not waste time tracking down any more issues. You will simply be chasing ur tail until the memory overflow is fixed. |
I think I have a work around for this. What I only need to do, is to drop the 'top' var down, IF it is larger than the entire count of max values. THAT is where the overflow is. I need to keep the code in there (i.e. the bad bugfix), where we do nothing if top > m_count, UNLESS it is larger than the entire count size of the buffer. What we are getting is the max count is not even divisible. I also need to look at that logic. It may be that the computation of max array size is busted. |
Please test with 8a8dcec It should handle both issues. Issue 1 was dealt with by 339052e BUT it caused this issue. I think 8a8dcec gets both of them. NOTE, leave this open. I do not like this correction. It 'may' be what is kept in the end, but I want to look for other avenues, and see if there is a better way to kill these problems. |
Native (6xOMP)
|
I can confirm that both problems (issues #825 and #690) are fixed with commit 8a8dcec on 64bit linux (AC build) and 32bit linux (legacy linux-x86-any build with OMP enabled): With 64bit Linux, I tested (bleeding-jumbo)run $ for i in `seq 1 33`; do OMP_NUM_THREADS=$i ./john --test=0 --format=dynamic; done |grep -v PASS and (master)test $ for i in `seq 1 9`; do OMP_NUM_THREADS=$i ./jtrts.pl -noprelims -q -type dynamic; echo $i; done With 32bit Linux, I just tested $ for i in `seq 1 33`; do OMP_NUM_THREADS=$i ./john --test=0 --format=dynamic; done |grep -v PASS The test suite (jtrts.pl) tests are still running on my 32bit system. |
@frank-dittrich I want to thank you a lot for the time you put into running this issue down. It saved me a TON of time, being pointed to right where the problem was. |
@jfoug No problem, just let me know when your final version of the fix is ready, and I'll repeat my tests. |
The original code is what I am going with. I did change comment and scrubbed original buggy code (commented out). I also changed the helper code comment showing what was going on. BUT the actual code itself is the same as the prior tested fix. The cosmetic patch is 1d03123 |
Meanwhile, the following 32bit linux test finished without problems: for i in `seq 1 17`; do OMP_NUM_THREADS=$i ./jtrts.pl -noprelims -q -type dynamic; echo $i; done |
JtR fails, but not in any machine I tried it (and only a few formats).
The text was updated successfully, but these errors were encountered: