Enhance dynamic so it can 'compile' a self-contained thin format from an expression #1463
That would be really awesome, but it might be a QC nightmare. We could add all sorts of stuff to TS, but the user may find untested variants that end up with latent bugs. Having said that, we should support username salts ($u, which shouldn't pose a problem) and perhaps constant strings (which may or may not need hex notation) for salt/pepper/whatnot. If nothing else, it will be a cool experiment.
Doing some stuff by hand (using the lexer/parser output from pass_gen.pl). Here is a by-hand run-through (just thinking out loud, and writing it down so I do not forget).
I would think the ONLY things supported will end up being the flat_buffer functions. The flat buffer methods ARE more limiting, since the only crypt operation reads from an input buffer and writes to an input buffer (appending or overwriting). So things like that common sub-expression may not be possible. The reason I did it this way is that I have 2 input and 2 output buffers, and I built the flat buffer code like this because it greatly reduces the complexity of the code. I 'could' change that to work more like the interleaved code, but I would really have to think that one through.
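To make that constraint concrete, here is a toy model of the flat interface as described above, not dynamic's actual code: two input buffers, and a single crypt primitive that hashes one input buffer and writes its hex result into an input buffer, overwriting or appending. `toy_hash()` is a stand-in 32-bit FNV-1a, not MD5, and every name here is hypothetical; only the data flow matters. It evaluates H(H($p).$s) using nothing but those primitives.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BUF_SZ 256

/* Toy model: two input buffers (input1 = in[0], input2 = in[1]). */
static char in[2][BUF_SZ];

/* Stand-in hash (32-bit FNV-1a), NOT md5: only the data flow matters. */
static uint32_t toy_hash(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Append a string (key, salt, ...) to input buffer dst. */
static void append_str(int dst, const char *s)
{
    assert(strlen(in[dst]) + strlen(s) < BUF_SZ);
    strcat(in[dst], s);
}

/* The only crypt primitive: hash in[src] and write the 8-char hex
 * result into in[dst], either overwriting it or appending to it. */
static void crypt_in_to_in(int src, int dst, int append)
{
    char hex[9];
    snprintf(hex, sizeof hex, "%08x", (unsigned)toy_hash(in[src]));
    if (!append)
        in[dst][0] = '\0';
    append_str(dst, hex);
}

/* H(H($p).$s) expressed with only the primitives above. */
static const char *eval_flat(const char *p, const char *s)
{
    in[0][0] = in[1][0] = '\0';
    append_str(0, p);          /* input1 = $p          */
    crypt_in_to_in(0, 1, 0);   /* input2 = H($p)       */
    append_str(1, s);          /* input2 = H($p).$s    */
    crypt_in_to_in(1, 0, 0);   /* input1 = H(H($p).$s) */
    return in[0];
}
```

In this model, each nested hash needs its argument assembled in a buffer that is not already holding a pending partial result, which is one way to see why deeper or wider expressions can run out of buffers.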
I only plan to support all usable hash types. I do not plan to support $f1..$f9 or $s2. Flags would not be 'settable', but I think I can find several optimizations (for example, if all salts are hash($s), then I can use MGF_SALT_AS_HEX), and there are other optimizations that can be auto-found, such as the common expressions here.
Ok, I have this done:
I did get things running (not compiling a script 'yet').
The 'new' format is the @dynamic= line. At least it works. I also have most of the 'linkage' functions done that manage everything EXCEPT the compile of the script. The compile should be easy: I will simply add constant code that can handle a couple of 'static' hashes. I think I will allow md5($s.$p), sha1($p) and sha512($p) at first (and then simply strcmp the expression to return the proper script). Yes, it is not a compiler, 'but' it will allow me to make sure that everything else is working. Once I get to that point, I simply have to get the parser/emitter done (big task). But this may be more than just a pipe dream now.
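A minimal sketch of that stop-gap "compiler", assuming it is nothing more than an exact-match lookup: the expression string is compared with strcmp() against a small table, and the matching hand-written script is returned. The table layout and `fake_compile()` are hypothetical, and the script bodies are placeholders rather than real dynamic scripts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One canned entry: an exact expression and its hand-written script. */
struct canned_script {
    const char *expr;
    const char *script;   /* placeholder text, not a real dynamic script */
};

static const struct canned_script canned[] = {
    { "md5($s.$p)", "/* hand-written script for md5($s.$p) */" },
    { "sha1($p)",   "/* hand-written script for sha1($p) */" },
    { "sha512($p)", "/* hand-written script for sha512($p) */" },
};

/* Return the canned script for an exact expression match, or NULL. */
static const char *fake_compile(const char *expr)
{
    size_t i;

    assert(expr != NULL);
    for (i = 0; i < sizeof canned / sizeof canned[0]; i++)
        if (strcmp(expr, canned[i].expr) == 0)
            return canned[i].script;
    return NULL;
}
```

Because matching is exact, `sha1( $p )` with spaces would not be recognized; that is acceptable for a placeholder whose only job is to exercise the linkage code until the real parser exists.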
Getting close. Currently the compile only 'handles' these 4 exact signatures. Now I just have to get the real parser working (for real).
Here is a fun one :) The 'script' which the expression compiler will have to produce (I did it by hand for now):
Generation of some data (using pass_gen.pl)
And running
And the .pot file
And a -show command
Does it matter what exactly you specify as --format on the command line?
There are 2 lines in the split function which are commented out. Uncomment them and the format will behave like you request. Yes, this is still very early. I am not ready to announce its behavior to the user base.
Ok, one benefit of the current logic is that if the prefixes do not get added, then hashes are removed as they are found upon a re-run, no matter what expression was used. Say a list of raw hashes is obtained, and someone runs md5(md5(md5($p))) and finds a bunch; if md5(sha1($p)) is run later, none of the items found on the first run will be looked at again. On a raw run, it does not matter all that much. But if these were salted, it really matters (or if doing re-gen-salts).
Ok, here is an example. I created a file called inp with 20 hashes each, made with sha1($p), sha1($p.$p) and sha1(md5($p)) (each of those is currently 'accepted' by our fake compiler). I changed the format to output the expressions to the front part of the hash. Here are the results:
So, as shown, the already-found hashes are not removed, and are re-tested. Note: the first run found 40 instead of 20 because of rules (a double-the-password rule). I changed the format back to not adding the @dynamic=... prefix onto the hashes, reran the input file, and here is what was seen.
It can be seen here that the hashes are properly removed from subsequent runs, regardless of which expression found them before. Now, I ALSO like adding the expression to the found hash, SO THAT you really know what algorithm created (and can crack) this hash. BUT if we do that, then I will have to find some way to make sure that the proper reducing behavior is seen.
This is definitely a problem, because it means --show will use them too. Let's say you cracked the password "magnum" with sha1($p.$p). It will be stored in john.pot like
Then you use sha1($p) and --show. It will say the password is "magnum" although it's really "magnummagnum" [for that algo]. Similar problems may exist for … For any raw hash (of non-UTF16 data), we can live without tags in john.pot, but anything else needs tags!
But another aspect of what you said is true: if we have a million unknown 128-bit hashes (of mixed origin), we store them untagged and try e.g. some MD4 algos on them. Many are cracked. Then we try some MD5 algos. Now we would not want the already-cracked MD4 ones to be included. Fixing the problem I describe will make this hard to accomplish.
Yes, this is NOT an easy fix, I think. It all depends on where the hash is coming from, and how it is expected to be used. You are 100% right about --show: it needs the tags. But if tags are there, then we need to know whether we are 'running' (cracking). If so, when re-reading the .pot file, we need to strip the tags for the 'check', and also strip them from the input (just for the check). We would likely only want this behavior IF we are cracking. Hmmm.
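One way to express the "strip tags only for the check" idea, as a sketch: assume tagged lines look like `@dynamic=EXPR@HASH`. When cracking, the comparison key is the bare HASH, so a hit from any expression removes it; for --show, the full tagged string is the key, so a password is only reported for the expression that produced it. `pot_key()` and `enum pot_use` are hypothetical names, and the tag parsing assumes the expression itself never contains '@'.

```c
#include <assert.h>
#include <string.h>

/* How the loaded line is going to be used (hypothetical). */
enum pot_use { POT_CRACKING, POT_SHOW };

/* Return the part of a pot/input line that should be compared.
 * Cracking: "@dynamic=EXPR@HASH" compares as "HASH".
 * --show:   the full tagged line is kept, so the expression must match too. */
static const char *pot_key(const char *line, enum pot_use use)
{
    const char *end;

    assert(line != NULL);
    if (use == POT_SHOW || strncmp(line, "@dynamic=", 9) != 0)
        return line;                 /* untagged, or tags must be kept */
    end = strchr(line + 9, '@');     /* '@' that closes the tag */
    return end ? end + 1 : line;
}
```

With this split, the MD4-then-MD5 scenario above keeps working while cracking, and --show can no longer report "magnum" for sha1($p) when it was found with sha1($p.$p).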
This is not new with @dynamic@ though. Here's another way to look at it.
With clever handling (and perhaps a revision of the john.pot format), any of the formats would be able to use any one of the lines to produce a correct --show figure from an input hash. For example, you load the hash
Yet another way to look at it is that all prepend/append-salt raw formats like …

Maybe this should be an "issue" on its own, for possible enhancements.
I am 'working' on a kludge. If I get that working, then we can use its logic to figure out how to do this the right way. You are right, this might be something that we can do in a generic way, and get this behavior for other formats (specifically other dynas). This would be 'almost' as good as alias.
Ok, here is the kludge (for cracking). First, comment out the #if HAVE_CRYPT stuff around ldr_in_pot in loader.c. My build (VC) does not have CRYPT, BUT I need that flag set. I know of no reason to set that flag ONLY if the build has the crypt lib. Then I changed the split in the thin format to this:

```c
static char *our_split(char *ciphertext, int index, struct fmt_main *self)
{
    extern int ldr_in_pot;

    /* Untagged hash: convert it back into the @dynamic@ signature form. */
    if (strncmp(ciphertext, "@dynamic=", 9) &&
        strncmp(ciphertext, dyna_signature, dyna_sig_len)) {
        static char Buf[512];
        snprintf(Buf, sizeof(Buf), "%s%s", dyna_signature, ciphertext);
        ciphertext = Buf;
    }
    /* Reading john.pot: replace an @dynamic=EXPR@ tag with this format's signature. */
    if (ldr_in_pot == 1 && !strncmp(ciphertext, "@dynamic=", 9)) {
        static char Buf2[512];
        char *cp = strchr(&ciphertext[1], '@');
        if (cp) {
            snprintf(Buf2, sizeof(Buf2), "%s%s", dyna_signature, &cp[1]);
            ciphertext = Buf2;
        }
    }
    return ciphertext;
}
```

What this does: when loading from the .pot file, if it sees any @dynamic= hashes, it strips that tag off and inserts the proper signature in its place. NOTE: for --show, this would NOT be the right way; it would be as bad as having the raw hashes in the DB. Again, that gets back to my point: we not only need to know where the data is coming from, BUT how it is being used.
I'll open a new issue for the generic brainstorming.
I have the get_token, lexer (with FULL expression validation, which points out the exact spot of any error), and pcode compiler parts done. Now all I need is to get this optimized, and the pcode converted into a dynamic script (an emitter module). I think I am also going to extend the dynamic language by adding a few temp variables. One use I can see is pre-processing into temp vars: if I have a hash like md5(md5($p).$s.$p.md5($p)), I can put md5($p) into a temp var. This would end up being a flag to dyna, so that when new keys are added, the temp var gets filled in. I can see other usages for these vars.
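The temp-var idea starts with spotting repeated sub-expressions. A rough sketch of that step, working directly on the expression text rather than on the pcode (so it illustrates the idea, it is not the compiler's pass): collect every `name(...)` call, then count identical ones. The function names and size limits are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_SUB 32
#define MAX_LEN 128

/* Collect the text of every "name(...)" call in expr. */
static int collect_calls(const char *expr, char subs[][MAX_LEN], int max)
{
    int n = 0;
    size_t i, j, depth, len = strlen(expr);

    for (i = 0; i < len; i++) {
        /* a call name starts with a letter that does not continue an
           identifier or a $p/$s-style variable */
        if (!isalpha((unsigned char)expr[i]) ||
            (i > 0 && (isalnum((unsigned char)expr[i - 1]) || expr[i - 1] == '$')))
            continue;
        for (j = i; j < len && isalnum((unsigned char)expr[j]); j++)
            ;
        if (j >= len || expr[j] != '(')
            continue;
        /* walk to the matching ')' */
        for (depth = 0; j < len; j++) {
            if (expr[j] == '(')
                depth++;
            else if (expr[j] == ')' && --depth == 0)
                break;
        }
        if (j < len && n < max && j - i + 1 < MAX_LEN) {
            memcpy(subs[n], expr + i, j - i + 1);
            subs[n][j - i + 1] = '\0';
            n++;
        }
    }
    return n;
}

/* Return how often the most repeated call occurs (0 if nothing repeats)
   and copy that call into out. */
static int find_common_subexpr(const char *expr, char *out, size_t outsz)
{
    char subs[MAX_SUB][MAX_LEN];
    int n, i, j, best = 1;

    assert(expr != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    n = collect_calls(expr, subs, MAX_SUB);
    for (i = 0; i < n; i++) {
        int count = 1;
        for (j = i + 1; j < n; j++)
            if (strcmp(subs[i], subs[j]) == 0)
                count++;
        if (count > best) {
            best = count;
            snprintf(out, outsz, "%s", subs[i]);
        }
    }
    return best > 1 ? best : 0;
}
```

For md5(md5($p).$s.$p.md5($p)) this reports md5($p) twice, which is the value that would go into a temp var. Textual matching misses sub-expressions that are equal but spelled differently; a real pass would compare parse trees or pcode.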
I now have a partly working script generator. But now I have to be able to properly build Test= string(s). I will probably have to use dynamic's crypt_all in some way to do this, but I am sort of in that proverbial chicken-and-egg conundrum. However, things are getting closer. I would LOVE to be able to drop dynamic's arbitrary numbering system, which results in needing 'alias' logic to tell that dyna_0 and dyna_2000 are the same. It would be so much better to simply use dynamic=md5($p) and lose the ambiguity driven by implementation details.
I guess you should always use OpenSSL for building test vectors. |
I forced NoFlat and ran dyna-compiler-test.sh, and here is what I got for the first group:
I would bet most of the failures are due to an improper MaxLength setting. Dyna1 works, except that I reduced the salt from -32 to -16, and all salts in the test data were 32, so the new format loaded no test hashes. If I set ,saltlen=32, then that format works fine.
After a little more work, I am here:
Getting closer, but I think I am going to turn in for the night and work on this later.
Ok, I think I am getting closer. I am able to handle this type of format now:

```
../run/john -test=0 -form=dynamic='sha1($s.sha1($s.sha1($s.$p)).$s)'
```

Before, I was failing with this one, which is dynamic_38:

```
../run/john -test=0 -form=dynamic='sha1($s.sha1($s.sha1($p)))'
```

Now, I cannot do this one (and I am not 'sure' that I can do it with flat dynamic at all):

```
./run/john -test=0 -form=dynamic='sha1($s.sha1($s.sha1($s.$p)).$s,sha1($p.$s))'
```

I believe that this hash would require 3 input buffers for flat dyna to do. It could have been done with 4 'data' fields (prior to going to the flat interface): we had 2 input and 2 output buffers, and we 'could' have done this one by using an output buffer as a temp variable. Here is the 'script' that is being generated; I will highlight where the problem is:
However, with a bit of creativity (much more than I think I would EVER be able to put into the compiling engine), I 'think' I can do this format by hand. I will try and see. Ok, I was able to actually do this format in flat dyna.
But again, I just do not see the code generator being made smart enough to 'find' these kinds of methods.
I may not fully grasp this, but how about doing the very first step in mixed mode, and the rest in flat? Is it already possible? Doable? No-go zone?
That usually does not help much. The conversion loses the gain, and then you are in flat mode anyway. By hand, I have gone the other way (for expressions with several to many crypts): if the password is only in one small part of the expression, I have done that part flat, so that I am not limited to 1 limb, and then switched over to mixed. This is pretty much what we have done with the PBKDF2 code: 1 oSSL hash, and from then on, single-limb hashes (the 2nd limb of a 2-limb crypt) can easily be done in mixed format. I will work to get mixed SIMD working as well as possible for md[45], but there will likely be some area where it simply returns a self-test failure for some expressions, and then you will have to live with flat expressions. Even flat expressions will have a point where they do not generate working code. If I add the 'variable' fields, that point could be greatly extended, but there will STILL be a limit.
I am going to first expand the large-hash interface. Currently it only allows this:
This was done because it WAS all hand-coded and maintained crap before. But now, with the code being auto-generated, I think I will add more functions. Maintenance is nothing: I simply have to get the function 'macros' done properly once, and then they are ALL done. Once I do this, it will double the available variables. This is as good as it gets on the mixed SIMD code. To do this, I only need to add 4 functions. I do already have those (for md[45] and some for sha1) in mixed mode. The code would be pretty much the same, but without the mixed-mode stuff: only flat (multi-buffer) and oSSL. I think I can get this done EASILY and see how far it takes me. This should add almost no complexity to the 'code' of dynamic. It will add a LOT of boilerplate functions, but they will all 'work' right out of the box.
Hmm, I thought I was going to be able to use all the existing manipulation functions of the mixed SIMD code. I do not think so now. The only code there that uses the 'output' data works only for md[45], 16 bytes. That is why sha1 is hooked into the mixed SIMD code only in a very limited way. I still think I can do this easily, but it will not be as simple as I had hoped (leveraging existing code). It may actually be better in the end to write all new code for this and scrap the legacy stuff, which is a hodgepodge of half-finished ideas. At least so far, the FLAT interface has maintained some reasonable design thought.
@magnumripper, @frank-dittrich: how big is 'too' big for a .c source file? dynamic_big_crypt.c could get pretty large. I 'could' split the file, but then the functions in dynamic_big_crypt_header.cin would either have to become extern globals, or be static and replicated. Since it is auto-generated code, replication would never cause code-maintenance problems, but it is extra 'bloat'. On the other hand, it also localizes the code and puts functions closer to each other (if that matters at all anymore). I think all the manipulation functions will be singletons (I do not need them for each hash type), but the in1->out1, etc. functions will all have to be specialized for each hash type. The output buffers will be shared between all types (so they will be arrays of 128-byte blocks). There will also be some other data for each output buffer (I may go with 4 buffers instead of 2): the width of the hash currently in the buffer, whether it needs to be swapped for output, etc. These output buffers could then be used as data for input-conversion functions (which would know the amount of data and how to treat it), as intermediate results (as in #saltfail, PBKDF2, etc.!!!), or as 'raw' input. Not quite the 'generic' buffer I wanted, but halfway there.
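A sketch of one possible shape for those shared output buffers, under stated assumptions: each of 4 buffers holds one 128-byte slot per candidate (enough room for any supported hash, including a 64-byte SHA-512 digest), plus the metadata mentioned above (width of the stored hash, whether it needs a byte swap for output). `NUM_OUTPUTS`, `MAX_KEYS`, `big_hash_output` and `store_output()` are all hypothetical, and the real layout would also have to account for SIMD interleaving.

```c
#include <assert.h>
#include <string.h>

#define NUM_OUTPUTS 4     /* output1 .. output4 (assumption) */
#define MAX_KEYS    8     /* candidates per crypt_all (assumption) */
#define SLOT_BYTES  128   /* one slot fits any supported hash */

/* One shared, self-describing output buffer. */
typedef struct {
    unsigned char data[MAX_KEYS][SLOT_BYTES]; /* one slot per key */
    unsigned hash_bytes;  /* width of the hash currently stored */
    int needs_swap;       /* nonzero if words must be swapped for output */
} big_hash_output;

static big_hash_output outputs[NUM_OUTPUTS];

/* Store a finished hash for one key into output buffer idx,
 * recording what kind of data the buffer now holds. */
static void store_output(int idx, int key, const unsigned char *hash,
                         unsigned len, int needs_swap)
{
    assert(idx >= 0 && idx < NUM_OUTPUTS);
    assert(key >= 0 && key < MAX_KEYS);
    assert(len <= SLOT_BYTES);
    memcpy(outputs[idx].data[key], hash, len);
    outputs[idx].hash_bytes = len;
    outputs[idx].needs_swap = needs_swap;
}
```

Keeping the width and swap state next to the data is what lets a later conversion function treat any buffer as input without knowing which hash type filled it.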
Here is my big hash output variable. What do you think, before I get too far into this?
I think it looks workable; it should be writable by oSSL and by SIMD. It does not do MD5_body (x2), but I am not sure I am supporting that at this time. Yes, it did help some non-Intel systems, but it was a LOT more ugly code to maintain. I may put it in at some later time, but not right now. Flat has always been pure CTX or SIMD logic only.
I love the new way I did dynamic! I just added 280 extern global and 70 'static' dynamic functions in an hour. Right now they only 'work' for CTX, but all I have to do is one function (which would be 35 more static functions), and it works for CTX and SIMD. The 8 new 'extern' functions are very thin: they simply handle the DYNA_OMP_PARAMS and then call the single function, passing in 1 or 2 for the input, and 1, 2, 3 or 4 for the output. Once I get them working (for both SIMD and CTX), I will then make the data manipulators. The nice thing about those functions is that they can live in the dynamic_big_crypt_header.cin file. They only have to be a single function, not something tied to the crypt type.
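The thin-wrapper pattern can be sketched with token pasting: one static "common" function per hash takes the input number (1 or 2) and output number (1 to 4), and a macro stamps out the 8 entry points. All names here are hypothetical, and the DYNA_OMP_PARAMS handling is reduced to a (first, last) index range; the common function only records its arguments where real code would pick the CTX or SIMD path.

```c
#include <assert.h>

/* Recorded arguments, so the sketch can be checked. */
static int last_first, last_last, last_in, last_out;

/* The single "common" worker for one hash type (hypothetical). */
static void sha1_common(int first, int last, int in, int out)
{
    assert(in == 1 || in == 2);
    assert(out >= 1 && out <= 4);
    /* real code would crypt input `in` into output `out` via CTX or SIMD */
    last_first = first;
    last_last = last;
    last_in = in;
    last_out = out;
}

/* Stamp out one thin entry point per (input, output) pair. */
#define DEF_THIN(hash, in, out)                                        \
    void hash##_crypt_input##in##_to_output##out(int first, int last)  \
    { hash##_common(first, last, in, out); }

DEF_THIN(sha1, 1, 1) DEF_THIN(sha1, 1, 2) DEF_THIN(sha1, 1, 3) DEF_THIN(sha1, 1, 4)
DEF_THIN(sha1, 2, 1) DEF_THIN(sha1, 2, 2) DEF_THIN(sha1, 2, 3) DEF_THIN(sha1, 2, 4)
```

Adding another hash type is then one more common function plus one more line of DEF_THIN invocations, which is why generated code of this kind stays cheap to maintain.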
@jfoug for generated code I wouldn't care if the file size increases. IMHO there's no reason to add extra bloat just to split that file into smaller ones.
The .o file is 1.4 MB now (but that includes lots of debug info). It builds fine right now. The 8 new external globals for each type (that number will increase) are pretty thin. Those functions do very little: they just set things up for a static inline 'common' function, which is told which input and which output to use, and which then calls either the CTX or the SIMD function (again, 2 new inlines for each hash). Later, I will likely have to add functions that can crypt from one of these outputs into an input, or possibly into another output (including crypt_out_4_to_out_4 for an in-place crypt). Adding these functions, and getting them right, is trivial. One thing I have thought of is that the output is not 'packed'. I should probably rethink how I have my structure, and try to pack the data. I have added new code to the SIMD, and I bet it would be better if I did not, and instead tried to use packed data, and possibly even SIMD mixed data, for these buffers. Mixed might be optional (it would need another flag in the structure). If I know that I will crypt into one of these buffers and then do multiple crypts (or even just one more) directly from it, then storing mixed, to reload mixed, 'might' be fastest.
If I use SSEi_FLAT_OUT, can I resume, i.e. use multiple buffers?
Nope! SSEi_FLAT_OUT cannot be resumed. Here is a test:

```c
#include "arch.h"
#include "simd-intrinsics.h"
#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <string.h>   /* strcpy() */
#include <openssl/md5.h>

ARCH_WORD_32 __attribute__((__aligned__(16))) Buffer[5000];
ARCH_WORD_32 __attribute__((__aligned__(16))) input[5000];

void dump(ARCH_WORD_32 *, int);

int main() {
    MD5_CTX ctx;

    /* 70-byte key plus the 0x80 pad byte; bit length in word 14 of the 2nd block */
    strcpy((char*)input, "1234567890123456789012345678901234567890123456789012345678901234567890\x80");
    input[30] = 70*8;
    dump(input, 128);

    /* two-block run, second block reloads the state */
    SIMDmd5body(input, Buffer, Buffer, SSEi_FLAT_IN|SSEi_2BUF_INPUT_FIRST_BLK);
    SIMDmd5body(&input[16], Buffer, Buffer, SSEi_FLAT_IN|SSEi_2BUF_INPUT|SSEi_RELOAD);
    dump(Buffer, 16);

    /* same two blocks, but with SSEi_FLAT_OUT on both calls */
    SIMDmd5body(input, Buffer, Buffer, SSEi_FLAT_IN|SSEi_2BUF_INPUT_FIRST_BLK|SSEi_FLAT_OUT);
    SIMDmd5body(&input[16], Buffer, Buffer, SSEi_FLAT_IN|SSEi_2BUF_INPUT|SSEi_RELOAD|SSEi_FLAT_OUT);
    dump(Buffer, 16);

    /* repeat the first run as a control */
    SIMDmd5body(input, Buffer, Buffer, SSEi_FLAT_IN|SSEi_2BUF_INPUT_FIRST_BLK);
    SIMDmd5body(&input[16], Buffer, Buffer, SSEi_FLAT_IN|SSEi_2BUF_INPUT|SSEi_RELOAD);
    dump(Buffer, 16);

    /* reference results from OpenSSL and from md5sum */
    MD5_Init(&ctx);
    MD5_Update(&ctx, "1234567890123456789012345678901234567890123456789012345678901234567890", 70);
    MD5_Final((unsigned char *)Buffer, &ctx);
    dump(Buffer, 16);
    system("echo -n 1234567890123456789012345678901234567890123456789012345678901234567890 | md5sum");
    return 0;
}

void dump(ARCH_WORD_32 *_p, int len) {
    int i;
    unsigned char *p = (unsigned char*)_p;
    for (i = 0; i < len; ++i) {
        printf("%02x", p[i]);
        if ((i & 3) == 3)   /* space after every 4 bytes */
            printf(" ");
    }
    printf("\n");
}
```

Output is:

The 0297.... line is from the SSEi_FLAT_OUT run.
We already have a SSEi_FLAT_RELOAD_SWAPLAST flag, but we have to flat_reload. Is there any way I can talk you into building code for a new flag: SSEi_RELOAD_FLAT = 0x2000 | SSEi_RELOAD? Now, I do find this in sybase-ase:

```c
SIMDSHA256body(prep_key[index/MAX_KEYS_PER_CRYPT], crypt32, NULL, SSEi_FLAT_IN|SSEi_FLAT_RELOAD_SWAPLAST);
SIMDSHA256body(&(prep_key[index/MAX_KEYS_PER_CRYPT][1]), crypt32, crypt32, SSEi_FLAT_IN|SSEi_RELOAD|SSEi_FLAT_RELOAD_SWAPLAST);
SIMDSHA256body(NULL_LIMB, crypt32, crypt32, SSEi_FLAT_IN|SSEi_RELOAD);
SIMDSHA256body(NULL_LIMB, crypt32, crypt32, SSEi_FLAT_IN|SSEi_RELOAD);
SIMDSHA256body(NULL_LIMB, crypt32, crypt32, SSEi_FLAT_IN|SSEi_RELOAD);
SIMDSHA256body(NULL_LIMB, crypt32, crypt32, SSEi_FLAT_IN|SSEi_RELOAD);
SIMDSHA256body(NULL_LIMB, crypt32, crypt32, SSEi_FLAT_IN|SSEi_RELOAD);
SIMDSHA256body(&(prep_key[index/MAX_KEYS_PER_CRYPT][2]), crypt32, crypt32, SSEi_FLAT_IN|SSEi_RELOAD|SSEi_FLAT_RELOAD_SWAPLAST);
// Last one with FLAT_OUT
SIMDSHA256body(&(prep_key[index/MAX_KEYS_PER_CRYPT][3]), crypt_out[index], crypt32, SSEi_FLAT_IN|SSEi_RELOAD|SSEi_FLAT_OUT);
```

But I may (almost certainly will) need any intermediate values to be in flat format, yet still be able to re-run the hash. I guess I could byte-swap if I need to run the next limb, but doing so on the CPU side is not simple, and then I have to address the hash type, the PARA, etc.
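A sketch of that CPU-side fix-up, under two assumptions that would need checking against the SIMD code: the flat SHA-256 result is in digest (big-endian) byte order, and the resumable state is interleaved as word w of lane j at `state[w * coef + j]`. The real layout also depends on SIMD_PARA, which this ignores, and `flat_digest_to_state()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild the 8 SHA-256 state words of one lane from a flat,
 * big-endian 32-byte digest, writing them in an (assumed)
 * word-interleaved layout: word w of lane j at state[w * coef + j]. */
static void flat_digest_to_state(const unsigned char *digest,
                                 uint32_t *state, unsigned coef,
                                 unsigned lane)
{
    unsigned w;

    assert(lane < coef);
    for (w = 0; w < 8; w++)
        state[w * coef + lane] = (uint32_t)digest[4 * w] << 24 |
                                 (uint32_t)digest[4 * w + 1] << 16 |
                                 (uint32_t)digest[4 * w + 2] << 8 |
                                 (uint32_t)digest[4 * w + 3];
}
```

Building each word with shifts keeps the conversion independent of host endianness; the part that does not generalize is the lane layout, which is exactly the hash-type and PARA bookkeeping mentioned above.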
To me it can be a gigabyte as long as it's within a certain topic.
I'm not sure I understand your need yet, but if there's a combination we don't support that would gain speed or functionality, we should just add it.
I hope I can get by without it. However, I did have to add this, and it was easy to add: SSEi_OUTPUT_AS_2BUF_INP_FMT = 0x2000 | SSEi_OUTPUT_AS_INP_FMT. This was added because the buffers I have added (output 1 to output 4) are arrays of 128-byte buffers. They can hold 'any' hash type, and are 'self-describing', at least about what data is in them. So for the 32-bit formats, I had to add a 2-buffer input format, since these buffers are in input style but offset by 128 bytes. I also added that flag to the SIMD64 sha512 formats, but it is not used (yet). With this change to the 64-bit ones, they could crypt directly into the 'real' input buffer array, which is 256 bytes (or 2 input buffers). So doing something like sha512raw(sha512raw(sha512raw(sha512raw($p)))) would be considerably faster, since there would be no data marshaling or data manipulation: just do the first crypt, set the buffer up properly, call the crypt as many times as needed, then byte-swap the final result (only 16 bytes) and put the results into the 'crypt' buffer. But let me work on it and see what I can do without the reload of a flat crypt.
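The iterated sha512raw case can skip marshaling because every inner message is exactly one 64-byte digest, so the padding of the single 128-byte SHA-512 block never changes: byte 64 holds the 0x80 terminator, and the last 16 bytes hold the message length in bits (512) as a big-endian 128-bit number, per the SHA-512 padding rule. A sketch in flat byte order; the SIMD buffers store 64-bit words, so on a little-endian host these bytes would be swapped within each word. The helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Pad a 128-byte SHA-512 block for a message of exactly 64 bytes.
 * Only needs doing once; bytes 64..127 never change between iterations. */
static void sha512_pad_64byte_msg(unsigned char block[128])
{
    memset(block + 64, 0, 64);
    block[64] = 0x80;    /* terminator right after the message */
    block[126] = 0x02;   /* bit length 512 = 0x0200, big-endian */
    block[127] = 0x00;
}

/* Load the next digest as the message, leaving the padding intact. */
static void sha512_load_digest(unsigned char block[128],
                               const unsigned char digest[64])
{
    assert(block != NULL && digest != NULL);
    memcpy(block, digest, 64);
}
```

Each iteration then only overwrites bytes 0..63 with the previous result before calling the crypt again.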
This Fedora cross-compiled exe not running in 'real' Windows is really disheartening. What sucks is that john can test a single format just fine:

```
for f in `(./john -list=formats -format=dynamic-all && ./john -list=formats) | sed s/,//g` ; do ./john -test=0 -form=$f 2>/dev/null ; done
```

This simply runs john testing all formats, but 1 at a time. This works like a champ, as does a real test (not just -test=0). But simply running -test or -test=0 with this build, without specifying a format, crashes on most formats. Ugg...
Getting 'close' on regen salts.
It does not find them all. It is possible that the file is invalid. I will have to check to make sure.
Ok, re-running the original file (with the …

This will be the signature needed:

I have to make sure things ARE working properly, get some code cleaned up, and adjust the documentation. Also, I will be away from the computer for a few days, so it will have to wait a bit, lol.
Example: my_wordlist.txt test: And nothing! Why doesn't it find anything?
@kernelr00ter You probably won't achieve anything useful, at least beyond what I wrote below, by posting what looks like an advanced user support request as a mostly off-topic comment on a closed issue on GitHub. That said, FWIW, your hash is a
@kernelr00ter OK, you did achieve a bit more. This works:
Whether it's a good solution to your actual (unspecified) problem or not is a separate question, not for here.
I entered the hash incorrectly. We don't know in advance which hash is salted. With md5($s.$p), salt and pass, in the file hash_dump.txt, I run john --regen-lost-salts and get: *** Process received signal *** In this case, the utility correctly detects the hash result, but does not show the salt, and the report file remains empty. But if you swap the hashes, everything works perfectly. If you try to run the same file with examples in the raw-md5 format … Or does it just write the salt and password this way? I don't know if this is a mistake or not. Many thanks to Alexander, the creator of JtR, for his work.
Wow, looks like it actually finds "the salt", but with it being NULL it then dies, LOL. I'll open an issue for that (we abuse the issue tracker by writing here). Thank you for reporting!
RFC:
This one is somewhat of a pie in the sky wish. A command like this
../run/john -format="dynamic=md5($s.sha1($p.$s).$s)" input.txt
Now, input.txt could contain either 'raw' hashes (32-byte hex, appended with $salt), OR lines in the format that would be written to the .pot file. This format will NOT contain a $dynamic_##$ signature. The signature must be something that triggers the code compiler. I think the format label can be any length (it is a char*), so I think we could use this as the .pot line:
If this is done, then I expect to use an expression language very similar to what is being used within pass_gen.pl
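A sketch of how a loader might accept both input forms described in the RFC: a raw line `HEX32$salt` (salt optional), or the proposed .pot form `@dynamic=EXPR@HEX32$salt`. The tag syntax is only the proposal from this thread, not a settled file format, and `parse_line()` and `struct parsed_line` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

struct parsed_line {
    char expr[128];    /* expression from the tag, empty if untagged */
    char hash[33];     /* 32 hex chars + NUL */
    const char *salt;  /* points into the input line, NULL if absent */
};

/* Returns 1 on success, 0 if the line matches neither form. */
static int parse_line(const char *line, struct parsed_line *out)
{
    int i;

    assert(line != NULL && out != NULL);
    out->expr[0] = '\0';
    out->salt = NULL;
    if (strncmp(line, "@dynamic=", 9) == 0) {
        const char *end = strchr(line + 9, '@');   /* closes the tag */
        size_t n;
        if (!end)
            return 0;
        n = (size_t)(end - (line + 9));
        if (n >= sizeof out->expr)
            return 0;
        memcpy(out->expr, line + 9, n);
        out->expr[n] = '\0';
        line = end + 1;
    }
    /* exactly 32 hex digits (stops early at the terminating NUL) */
    for (i = 0; i < 32; i++)
        if (!isxdigit((unsigned char)line[i]))
            return 0;
    memcpy(out->hash, line, 32);
    out->hash[32] = '\0';
    if (line[32] == '$')
        out->salt = line + 33;
    else if (line[32] != '\0')
        return 0;
    return 1;
}
```

An empty `expr` would tell the caller to fall back to the expression given with --format, while a tagged line carries its own expression for the compiler.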
ISSUES: