diff --git a/doc/CHANGES b/doc/CHANGES index 9866e67cef..2d35332d29 100644 --- a/doc/CHANGES +++ b/doc/CHANGES @@ -1,5 +1,6 @@ The following changes have been made between John 1.7.8 and 1.7.9: +* Added optional parallelization of the MD5-based crypt(3) code with OpenMP. * Added optional parallelization of the bitslice DES code with OpenMP. * Replaced the bitslice DES key setup algorithm with a faster one, which significantly improves performance at LM hashes, as well as at DES-based @@ -268,4 +269,4 @@ Mac OS X (PowerPC and x86), SCO, BeOS. * Bug and portability fixes, and new bugs. * Bonus: "Strip" cracker included in the default john.conf (john.ini). -$Owl: Owl/packages/john/john/doc/CHANGES,v 1.69 2011/11/20 05:20:33 solar Exp $ +$Owl: Owl/packages/john/john/doc/CHANGES,v 1.70 2011/11/21 02:36:55 solar Exp $ diff --git a/doc/FAQ b/doc/FAQ index 8ba3c065ab..b373d77e78 100644 --- a/doc/FAQ +++ b/doc/FAQ @@ -190,34 +190,36 @@ and refuses to work if an error occurs. If you need to test all of the low-level routines at once, use "--test". Q: Does John support multi-processing or distributed processing? -A: There's currently built-in parallel processing support (to make use -of multiple CPUs and/or CPU cores on a single system) for DES-based -crypt(3) hashes (traditional, bigcrypt, and BSDI-style), OpenBSD-style -Blowfish-based crypt(3) (bcrypt) hashes (with John's own optimized code) -and for the underlying system's thread-safe password hashing function -(crypt_r(3) on Linux or crypt(3C) on Solaris). The latter is only -reasonable to use for crypt(3) hash types not yet supported by John -natively (that is, for glibc 2.7+ SHA-crypt hashes as used by recent -versions of Fedora and Ubuntu, and for SunMD5 hashes). To use this -limited OpenMP support, you need to make an OpenMP-enabled build of John -by uncommenting one of the OMPFLAGS lines near the beginning of the -Makefile. This requires GCC 4.2+ or another OpenMP-capable C compiler. 
-For other hash types and/or to distribute the workload between multiple -machines, other approaches need to be used. For a small number of nodes -(CPUs, CPU cores, and/or machines), it is reasonable to use a manual -approach. One of those approaches is to have your nodes try different -password lengths. This is easily accomplished with "incremental" mode's -"MinLen" and "MaxLen" settings (see CONFIG). Typically, you would not -really need to split the workload for "single crack" and wordlist modes -since these are relatively quick, although you may dedicate one node to -those initially. You may safely run multiple instances of John in the -same working directory, all writing to the same "pot file" (this is a -feature). You do, however, need to assign each of them a unique session -name, with "--session". Other approaches, such as splitting password -files naively (without regard to salts), are typically less efficient -(in some cases to the extent where there's no speedup from using -multiple nodes at all). Some advanced and automated approaches are -listed on the wiki at: +A: There's currently built-in parallel processing support using OpenMP +(to make use of multiple CPUs and/or CPU cores in a single system) for +all crypt(3) hash flavors (DES-, MD5-, and Blowfish-based) supported by +John natively, as well as for LM hashes and, when running on Linux or +Solaris, also for the underlying system's thread-safe password hashing +function. The latter is only reasonable to use for crypt(3) hash types +not yet supported by John natively (that is, for glibc 2.7+ SHA-crypt +hashes as used by recent versions of Fedora and Ubuntu, and for SunMD5 +hashes, which may optionally be enabled on Solaris). In "community +enhanced" -jumbo versions, parallelization with OpenMP is also supported +for many (but not all) of the hash types added in those versions. 
To +use John's OpenMP support, you need to make an OpenMP-enabled build by +uncommenting one of the OMPFLAGS lines near the beginning of the +Makefile. This requires GCC 4.2 or newer, or another OpenMP-capable C +compiler. For other hash types and/or to distribute the workload +between multiple machines, other approaches need to be used. For a +small number of nodes (CPUs, CPU cores, and/or machines), it is +reasonable to use a manual approach. One of those approaches is to have +your nodes try different password lengths. This is easily accomplished +with "incremental" mode's "MinLen" and "MaxLen" settings (see CONFIG). +Typically, you would not really need to split the workload for "single +crack" and wordlist modes since these are relatively quick, although you +may dedicate one node to those initially. You may safely run multiple +instances of John in the same working directory, all writing to the same +"pot file" (this is a feature). You do, however, need to assign each of +them a unique session name, with "--session". Other approaches, such as +splitting password files naively (without regard to salts), are +typically less efficient (in some cases to the extent where there's no +speedup from using multiple nodes at all). Some advanced and automated +approaches are listed on the wiki at: http://openwall.info/wiki/john/parallelization Q: What is the format of the crash recovery files ("john.rec", other @@ -232,4 +234,4 @@ trivial to explain, it is not possible to reasonably describe some others without going into great detail on John internals. If you really need to know, read the source code. 
-$Owl: Owl/packages/john/john/doc/FAQ,v 1.26 2011/10/24 01:44:46 solar Exp $ +$Owl: Owl/packages/john/john/doc/FAQ,v 1.27 2011/11/21 02:36:55 solar Exp $ diff --git a/src/MD5_fmt.c b/src/MD5_fmt.c index 5cf4d7a5cf..7b0799f147 100644 --- a/src/MD5_fmt.c +++ b/src/MD5_fmt.c @@ -42,7 +42,23 @@ static struct fmt_tests tests[] = { {NULL} }; -static char saved_key[MD5_N][PLAINTEXT_LENGTH + 1]; +static char (*saved_key)[PLAINTEXT_LENGTH + 1]; + +struct fmt_main fmt_MD5; + +static void init(void) +{ + MD5_std_init(); + +#if MD5_std_mt + fmt_MD5.params.min_keys_per_crypt = MD5_std_min_kpc; + fmt_MD5.params.max_keys_per_crypt = MD5_std_max_kpc; +#endif + + saved_key = mem_alloc_tiny( + sizeof(*saved_key) * fmt_MD5.params.max_keys_per_crypt, + MEM_ALIGN_CACHE); +} static int valid(char *ciphertext) { @@ -103,36 +119,43 @@ static int binary_hash_6(void *binary) static int get_hash_0(int index) { + init_t(); return MD5_out[index][0] & 0xF; } static int get_hash_1(int index) { + init_t(); return MD5_out[index][0] & 0xFF; } static int get_hash_2(int index) { + init_t(); return MD5_out[index][0] & 0xFFF; } static int get_hash_3(int index) { + init_t(); return MD5_out[index][0] & 0xFFFF; } static int get_hash_4(int index) { + init_t(); return MD5_out[index][0] & 0xFFFFF; } static int get_hash_5(int index) { + init_t(); return MD5_out[index][0] & 0xFFFFFF; } static int get_hash_6(int index) { + init_t(); return MD5_out[index][0] & 0x7FFFFFF; } @@ -172,21 +195,31 @@ static char *get_key(int index) static int cmp_all(void *binary, int count) { +#if MD5_std_mt + int t, n = (count + (MD5_N - 1)) / MD5_N; +#endif + for_each_t(n) { #if MD5_X2 - return *(MD5_word *)binary == MD5_out[0][0] || - *(MD5_word *)binary == MD5_out[1][0]; + if (*(MD5_word *)binary == MD5_out[0][0] || + *(MD5_word *)binary == MD5_out[1][0]) + return 1; #else - return *(MD5_word *)binary == MD5_out[0][0]; + if (*(MD5_word *)binary == MD5_out[0][0]) + return 1; #endif + } + return 0; } static int cmp_one(void 
*binary, int index) { + init_t(); return *(MD5_word *)binary == MD5_out[index][0]; } static int cmp_exact(char *source, int index) { + init_t(); return !memcmp(MD5_std_get_binary(source), MD5_out[index], sizeof(MD5_binary)); } @@ -203,10 +236,13 @@ struct fmt_main fmt_MD5 = { SALT_SIZE, MIN_KEYS_PER_CRYPT, MAX_KEYS_PER_CRYPT, +#if MD5_std_mt + FMT_OMP | +#endif FMT_CASE | FMT_8_BIT, tests }, { - MD5_std_init, + init, valid, fmt_default_split, (void *(*)(char *))MD5_std_get_binary, @@ -225,7 +261,7 @@ struct fmt_main fmt_MD5 = { set_key, get_key, fmt_default_clear_keys, - (void (*)(int))MD5_std_crypt, + MD5_std_crypt, { get_hash_0, get_hash_1, diff --git a/src/MD5_std.c b/src/MD5_std.c index b6b66b84c2..019f9d519f 100644 --- a/src/MD5_std.c +++ b/src/MD5_std.c @@ -13,7 +13,16 @@ #include "common.h" #include "MD5_std.h" +#if MD5_std_mt +#include <omp.h> +int MD5_std_min_kpc, MD5_std_max_kpc; +int MD5_std_nt; +MD5_std_combined *MD5_std_all_p = NULL; +static char saved_salt[9]; +static int salt_changed; +#else MD5_std_combined CC_CACHE_ALIGN MD5_std_all; +#endif #if !MD5_IMM static MD5_data MD5_data_init = { @@ -265,12 +274,25 @@ static MD5_data MD5_data_init = { ROTATE_LEFT ((a), (s)); \ (a) += (b); +#if MD5_std_mt +#if MD5_X2 +static void MD5_body_for_thread(int t, MD5_word x0[15], MD5_word x1[15], + MD5_word out0[4], MD5_word out1[4]); +#define MD5_body(x0, x1, out0, out1) \ + MD5_body_for_thread(t, x0, x1, out0, out1) +#else +static void MD5_body_for_thread(int t, MD5_word x[15], MD5_word out[4]); +#define MD5_body(x, out) \ + MD5_body_for_thread(t, x, out) +#endif +#else #if MD5_X2 static void MD5_body(MD5_word x0[15], MD5_word x1[15], MD5_word out0[4], MD5_word out1[4]); #else static void MD5_body(MD5_word x[15], MD5_word out[4]); #endif +#endif #else @@ -310,47 +332,80 @@ static void MD5_swap(MD5_word *x, MD5_word *y, int count) #define prefix MD5_std_all.prefix #define prelen MD5_std_all.prelen -static void init_line(int line, int index, MD5_block *even, MD5_block 
*odd) -{ - order[line][index].even = even; - order[line][index].odd = odd; -} - void MD5_std_init(void) { int index; MD5_pool *current; +#if MD5_std_mt + int t, n; + + if (!MD5_std_all_p) { + n = omp_get_max_threads(); + if (n < 1) + n = 1; + if (n > MD5_std_mt_max) + n = MD5_std_mt_max; + MD5_std_min_kpc = n * MD5_N; + { + int max = n * MD5_std_cpt; + while (max > MD5_std_mt_max) + max -= n; + n = max; + } + MD5_std_max_kpc = n * MD5_N; +/* + * The array of MD5_std_all's is not exactly tiny, but we use mem_alloc_tiny() + * for its alignment support and error checking. We do not need to free() this + * memory anyway. + */ + MD5_std_all_p = mem_alloc_tiny(n * MD5_std_all_size, + MEM_ALIGN_PAGE); + MD5_std_nt = n; + } +#endif + for_each_t(MD5_std_nt) { #if !MD5_IMM - MD5_std_all.data = MD5_data_init; -#endif - - for (index = 0, current = pool; index < MD5_N; index++, current++) { - init_line(0, index, &current->e.p, &current->o.psp); - init_line(1, index, &current->e.spp, &current->o.pp); - init_line(2, index, &current->e.spp, &current->o.psp); - init_line(3, index, &current->e.pp, &current->o.ps); - init_line(4, index, &current->e.spp, &current->o.pp); - init_line(5, index, &current->e.spp, &current->o.psp); - init_line(6, index, &current->e.pp, &current->o.psp); - init_line(7, index, &current->e.sp, &current->o.pp); - init_line(8, index, &current->e.spp, &current->o.psp); - init_line(9, index, &current->e.pp, &current->o.psp); - init_line(10, index, &current->e.spp, &current->o.p); - init_line(11, index, &current->e.spp, &current->o.psp); - init_line(12, index, &current->e.pp, &current->o.psp); - init_line(13, index, &current->e.spp, &current->o.pp); - init_line(14, index, &current->e.sp, &current->o.psp); - init_line(15, index, &current->e.pp, &current->o.psp); - init_line(16, index, &current->e.spp, &current->o.pp); - init_line(17, index, &current->e.spp, &current->o.ps); - init_line(18, index, &current->e.pp, &current->o.psp); - init_line(19, index, &current->e.spp, &current->o.pp); - init_line(20, index, &current->e.spp, &current->o.psp); + MD5_std_all.data = MD5_data_init; +#endif + + current = pool; + for (index = 0; index < MD5_N; index++) { +#define init_line(line, init_even, init_odd) \ + 
order[line][index].even = init_even; \ + order[line][index].odd = init_odd; + init_line(0, &current->e.p, &current->o.psp); + init_line(1, &current->e.spp, &current->o.pp); + init_line(2, &current->e.spp, &current->o.psp); + init_line(3, &current->e.pp, &current->o.ps); + init_line(4, &current->e.spp, &current->o.pp); + init_line(5, &current->e.spp, &current->o.psp); + init_line(6, &current->e.pp, &current->o.psp); + init_line(7, &current->e.sp, &current->o.pp); + init_line(8, &current->e.spp, &current->o.psp); + init_line(9, &current->e.pp, &current->o.psp); + init_line(10, &current->e.spp, &current->o.p); + init_line(11, &current->e.spp, &current->o.psp); + init_line(12, &current->e.pp, &current->o.psp); + init_line(13, &current->e.spp, &current->o.pp); + init_line(14, &current->e.sp, &current->o.psp); + init_line(15, &current->e.pp, &current->o.psp); + init_line(16, &current->e.spp, &current->o.pp); + init_line(17, &current->e.spp, &current->o.ps); + init_line(18, &current->e.pp, &current->o.psp); + init_line(19, &current->e.spp, &current->o.pp); + init_line(20, &current->e.spp, &current->o.psp); +#undef init_line + current++; + } } } +#if MD5_std_mt +static MAYBE_INLINE void MD5_std_set_salt_for_thread(int t, char *salt) +#else void MD5_std_set_salt(char *salt) +#endif { int length; @@ -370,11 +425,21 @@ void MD5_std_set_salt(char *salt) } } +#if MD5_std_mt +void MD5_std_set_salt(char *salt) +{ + memcpy(saved_salt, salt, sizeof(saved_salt)); + salt_changed = 1; +} +#endif + void MD5_std_set_key(char *key, int index) { int length; MD5_pool *current; + init_t(); + for (length = 0; key[length] && length < 15; length++); current = &pool[index]; @@ -409,7 +474,11 @@ void MD5_std_set_key(char *key, int index) order[19][index].length = current->l.pp; } -void MD5_std_crypt(void) +#if MD5_std_mt +static MAYBE_INLINE void MD5_std_crypt_for_thread(int t) +#else +void MD5_std_crypt(int count) +#endif { int length, index, mask; MD5_pattern *line; @@ -591,11 +660,40 @@ void MD5_std_crypt(void) #endif } +#if MD5_std_mt +void MD5_std_crypt(int count) +{ +#if MD5_std_mt + int t, n = (count + (MD5_N - 1)) / MD5_N; +#endif + +#ifdef _OPENMP +#pragma omp parallel for default(none) private(t) shared(n, salt_changed, saved_salt) 
+#endif + for_each_t(n) { +/* + * We could move the salt_changed check out of the parallel region (and have + * two specialized parallel regions instead), but MD5_std_crypt_for_thread() + * does so much work that the salt_changed check is negligible. + */ + if (salt_changed) + MD5_std_set_salt_for_thread(t, saved_salt); + MD5_std_crypt_for_thread(t); + } + + salt_changed = 0; +} +#endif + #if !MD5_ASM #if !MD5_X2 +#if MD5_std_mt +static void MD5_body_for_thread(int t, MD5_word x[15], MD5_word out[4]) +#else static void MD5_body(MD5_word x[15], MD5_word out[4]) +#endif { MD5_word a, b = Cb, c = Cc, d; @@ -687,8 +785,13 @@ static void MD5_body(MD5_word x[15], MD5_word out[4]) #else +#if MD5_std_mt +static void MD5_body_for_thread(int t, MD5_word x0[15], MD5_word x1[15], + MD5_word out0[4], MD5_word out1[4]) +#else static void MD5_body(MD5_word x0[15], MD5_word x1[15], MD5_word out0[4], MD5_word out1[4]) +#endif { MD5_word a0, b0 = Cb, c0 = Cc, d0; MD5_word a1, b1, c1, d1; diff --git a/src/MD5_std.h b/src/MD5_std.h index 0d305d0c74..2c00e3cc25 100644 --- a/src/MD5_std.h +++ b/src/MD5_std.h @@ -65,7 +65,6 @@ typedef struct { typedef struct { #if !MD5_IMM MD5_data data; - double dummy; #endif MD5_binary out[MD5_N]; @@ -77,7 +76,30 @@ typedef struct { int prelen; } MD5_std_combined; +#if defined(_OPENMP) && !MD5_ASM +#define MD5_std_mt 1 +#define MD5_std_cpt 128 +#define MD5_std_mt_max (MD5_std_cpt * 24) +extern MD5_std_combined *MD5_std_all_p; +extern int MD5_std_min_kpc, MD5_std_max_kpc; +extern int MD5_std_nt; +#define MD5_std_all_align 64 +#define MD5_std_all_size \ + ((sizeof(MD5_std_combined) + (MD5_std_all_align - 1)) & \ + ~(MD5_std_all_align - 1)) +#define MD5_std_all \ + (*(MD5_std_combined *)((char *)MD5_std_all_p + t)) +#define for_each_t(n) \ + for (t = 0; t < (n) * MD5_std_all_size; t += MD5_std_all_size) +#define init_t() \ + int t = (unsigned int)index / MD5_N * MD5_std_all_size; \ + index = (unsigned int)index % MD5_N; +#else +#define MD5_std_mt 0 extern 
MD5_std_combined MD5_std_all; +#define for_each_t(n) +#define init_t() +#endif /* * MD5_std_crypt() output buffer. @@ -107,9 +129,9 @@ extern void MD5_std_set_salt(char *salt); extern void MD5_std_set_key(char *key, int index); /* - * Main encryption routine, sets MD5_out. + * Main hashing routine, sets MD5_out. */ -extern void MD5_std_crypt(void); +extern void MD5_std_crypt(int count); /* * Returns the salt for MD5_std_set_salt(). diff --git a/src/x86-any.h b/src/x86-any.h index 82183c1b63..63584fcb53 100644 --- a/src/x86-any.h +++ b/src/x86-any.h @@ -1,6 +1,6 @@ /* * This file is part of John the Ripper password cracker, - * Copyright (c) 1996-2001,2008,2010 by Solar Designer + * Copyright (c) 1996-2001,2008,2010,2011 by Solar Designer */ /* @@ -42,7 +42,11 @@ #define DES_BS_VECTOR 0 #define DES_BS_EXPAND 0 +#ifdef _OPENMP +#define MD5_ASM 0 +#else #define MD5_ASM 1 +#endif #define MD5_X2 0 #define MD5_IMM 1 diff --git a/src/x86-mmx.h b/src/x86-mmx.h index 63c59c6539..a112a02397 100644 --- a/src/x86-mmx.h +++ b/src/x86-mmx.h @@ -63,7 +63,11 @@ #define DES_BS 1 #define DES_BS_EXPAND 1 +#ifdef _OPENMP +#define MD5_ASM 0 +#else #define MD5_ASM 1 +#endif #define MD5_X2 0 #define MD5_IMM 1 diff --git a/src/x86-sse.h b/src/x86-sse.h index 82f9bd9d04..add0589ec5 100644 --- a/src/x86-sse.h +++ b/src/x86-sse.h @@ -94,7 +94,11 @@ #endif #define DES_BS_EXPAND 1 +#ifdef _OPENMP +#define MD5_ASM 0 +#else #define MD5_ASM 1 +#endif #define MD5_X2 0 #define MD5_IMM 1
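
Note on the sizing and indexing arithmetic in this patch. The sketch below is a standalone illustration, not code from John: it restates the block-count computation from MD5_std_init() and the index split done by the init_t() macro as plain functions so the numbers can be checked by hand. The names kpc_limits, size_for_threads, key_slot and locate_key are hypothetical, "threads" stands in for omp_get_max_threads(), "md5_n" for MD5_N, and "block_size" for MD5_std_all_size. The byte offset is held in a size_t here, whereas the patch uses an int.

```c
#include <stddef.h>

/* Constants copied from the MD5_std.h hunk of the patch. */
#define MD5_std_cpt 128
#define MD5_std_mt_max (MD5_std_cpt * 24)

/* Hypothetical result type for the sizing logic (not part of John). */
struct kpc_limits {
	int blocks;  /* number of MD5_std_combined blocks to allocate */
	int min_kpc; /* value assigned to min_keys_per_crypt */
	int max_kpc; /* value assigned to max_keys_per_crypt */
};

/*
 * Restates the arithmetic in MD5_std_init(): clamp the thread count,
 * then pick the largest multiple of it that does not exceed
 * MD5_std_mt_max, starting from MD5_std_cpt blocks per thread.
 */
static struct kpc_limits size_for_threads(int threads, int md5_n)
{
	struct kpc_limits r;
	int n = threads, max;

	if (n < 1)
		n = 1;
	if (n > MD5_std_mt_max)
		n = MD5_std_mt_max;
	r.min_kpc = n * md5_n;

	max = n * MD5_std_cpt;
	while (max > MD5_std_mt_max)
		max -= n; /* stays a multiple of the thread count */
	r.blocks = max;
	r.max_kpc = max * md5_n;
	return r;
}

/* Hypothetical result type for the index split done by init_t(). */
struct key_slot {
	size_t t;  /* byte offset of the block that holds the key */
	int index; /* position of the key inside that block */
};

/* Restates init_t(): global key index -> (block offset, local index). */
static struct key_slot locate_key(int index, int md5_n, size_t block_size)
{
	struct key_slot s;

	s.t = (size_t)((unsigned int)index / (unsigned int)md5_n) * block_size;
	s.index = (int)((unsigned int)index % (unsigned int)md5_n);
	return s;
}
```

Under these assumptions, size_for_threads(4, 1) gives 512 blocks, while size_for_threads(25, 1) gives 3050 blocks, because 25 * 128 = 3200 exceeds the cap of 3072 and is reduced in steps of 25. locate_key(5, 2, 64) yields offset 128 and local index 1, which is the same split that for_each_t() walks through block by block.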