Cracking NetNTLMv1/v2 using NT hashes #2607
Conversation
|
Got very confused by the only-one-DES optimization in the 5510 sxx kernels, but worked it out. I have kept the optimization for -a 0 but removed it for -a 3, as it does not make sense in that instance. |
Correction: the optimized -a 3 5510 kernel does not work yet
|
Hello Cablethief, any chance you would add the same for the Kerberoasting attack (-m 13100)? I could probably add it myself, but why not ask, since you're contributing this. I also assume you don't need to add both the pure and optimized modules for this attack, since it should require only one of them, and the other will never be used. I think only the "a0_optimized" module should be needed. |
|
Since I need this attack right now, I've just removed the following lines from m13100_a0-optimized.cl : And it just works (with --self-test-disable --hex-wordlist). So, if you would like me to properly contribute this to hashcat, please tell me, or you can contribute it yourself in this PR, so that both the Kerberoasting and NetNTLMv1/v2 algorithms are added in the same PR. |
I will be pushing out kernels for all of the other Windows hashes that can be shucked in this manner after this is merged. Kerberos, DCC2, etc. No need to worry :) |
|
Hey, sorry for being late. I don't want to ruin this party, but there's a lot to do in this mode. There's a lot of potential being missed, but more importantly there seems to be some general misunderstanding of the concept. I will separate the different issues.

We need to provide the wordlist hex encoded, otherwise we will not be able to process entries which contain 0x0a. Hashcat actually supports reading data containing zero bytes from stdin, but the CR is the problem. Hex encoding results in an input that is always exactly a 32-byte string for both mode 27000 and 27100. We can add two functions to both modules and register them later in the module (see the sketch after this comment). The modes will work only if the user specifies --hex-wordlist. We can't automatically enable that command line switch from the module, but we can add a warning in case the user does not use it. I suggest adding OPTS_TYPE_PT_ALWAYS_HEXIFY to OPTS_TYPE in both modules. So far there's no warning, but I will add it later to the core code, because other modules "suffer" from the same problem, for instance 9710.

Since the binary password length which arrives in the kernel is always 16 bytes, we can save ourselves the double maintenance of pure and optimized kernels. We just need to decide which is the better base. Typically we only provide pure kernels, but if the password length is not so relevant we can exploit this to improve performance on user request by using an optimized kernel with limited password-length support. This kind of optimization only has an impact, though, if we first work on a different bottleneck, and that is the PCI-e transfer time. Hashcat works around this bottleneck by using an "amplifier", but in these modes there is no amplifier, because there's no rule, no combination and no mask. So I made a test just to confirm that password length has no impact: pure: optimized: That's exactly what I expected: there is no difference in performance between the pure and optimized kernels. That's great, because it means we can drop the optimized kernels entirely to reduce code and maintenance.

I also want to bring your attention to the following. Since hash mode 27000 is so fast to process, the PCI-e bottleneck is so intense that all the parallelization power of our GPUs can't create any benefit. If you run the same attack but limit it to the CPU: the speed is slightly faster on CPU than on GPU! Of course that's only the case for super fast hashes like 27000 and 27100. If you later add other hash modes using the same hash-as-input technique that are slower to process, you may see an advantage (or not) depending on where the NTLM calculation sits in the entire code chain. At this point I should also add that there is no performance difference between 27000 and 27100, even though 27100 is so much more compute-intensive (I think 8 times or more the instructions because of the additional HMAC). The PCI-e bottleneck is so intense that even 8 times as much computation overhead vanishes entirely. Now think of multi-GPU systems: they often use PCI riser cards and suffer from effects such as limited PCI-e lanes. These effects can have a serious impact on the PCI-e transfer depending on how much you spend on the hardware, so for multi-GPU setups the bottleneck will be even more relevant.

The benchmark for this mode is not applicable at all. I believe the -a 3 kernels were only added so that hashcat can run the -b mode, but that's not fair. I think there should be no -a 3 kernel.
I think for the same reason you did not add the -a 1 kernel, and that is OK! However, hashcat lacks an option to disable specific attack modes at the developer's request, so the hashcat core needs a change to support this. I could make the changes to the core, but then I thought again about what the actual problem is. The real problem is that these hash modes were implemented as fast hashes. Since there is no amplifier, they should have been implemented as slow hashes (even if they are fast to compute). If they were slow hashes, there would be no need for -a 1 and -a 3 kernels, and therefore no need for special hashcat core changes. As a bonus, the -b value would be the correct one. So I think the refactoring into a slow-hash kernel is the first step.

There's a good chance that performance will actually increase compared to the fast-hash implementation (as crazy as it sounds). That is because no registers are lost to the rule-engine amplifier, since we have no rules. Additionally, we can add some special optimizations like hard-coding pw_len to 16. This allows the JIT to zero the w[4] - w[15] elements, and because they are zero it will realize it can remove all the bitwise operations on these elements from the hash computation.

The unit tests are missing. Basically all you need to do is copy the tools/test_modules/ entries and remove the MD4 part. Please add this, because it improves the overall quality of hashcat. |
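A minimal sketch of those two module functions, assuming hashcat's standard module API from the plugin development guide; the 32-character limit matches a hex-encoded NT hash, and the flag comment reflects the OPTS_TYPE_PT_ALWAYS_HEXIFY suggestion above (which was later revised further down the thread). The exact bodies in the PR may differ.

```c
// Sketch only: per-module password-length constraints pinning the
// candidate length to the 32 hex characters of an NT hash.

u32 module_pw_min (MAYBE_UNUSED const hashconfig_t *hashconfig, MAYBE_UNUSED const user_options_t *user_options, MAYBE_UNUSED const user_options_extra_t *user_options_extra)
{
  const u32 pw_min = 32; // hex-encoded 16-byte NT hash

  return pw_min;
}

u32 module_pw_max (MAYBE_UNUSED const hashconfig_t *hashconfig, MAYBE_UNUSED const user_options_t *user_options, MAYBE_UNUSED const user_options_extra_t *user_options_extra)
{
  const u32 pw_max = 32;

  return pw_max;
}

// registered later in module_init ():
//   module_ctx->module_pw_min = module_pw_min;
//   module_ctx->module_pw_max = module_pw_max;
//
// and the suggested flag, OR'ed into the module's OPTS_TYPE:
//   ... | OPTS_TYPE_PT_ALWAYS_HEXIFY
```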
|
Sorry, I have not had the time; I will get on this shortly. I would just comment that the -a 3 kernel was not a mistake, as people have been doing these sorts of brute-force attacks before (https://hashcat.net/forum/thread-5912.html), and so I thought it would be cool to make it a bit more convenient. I do have the length restrictions, however it's 16 not 32, as that seems to work with --hex-wordlist. Thank you for such a comprehensive analysis as well; it's my first time making a module and the feedback is really nice. I will work out those test modules as well, and try to get it into a slow-hash format. |
|
I have finally, hopefully, progressed some of the changes requested. The main thing I think I have an issue with currently is the test scripts, as I am unsure how to make the inputs give hex without making changes to the main test.sh, where maybe we have an if switch for hash types that require hex inputs. So before that I wanted to check if that would be correct. I did have some issues with the number of iterations needed to get _loop called; looking at the plugin development guide, that appeared to be via salt_iter. I am very skeptical I did that correctly. There are length limits in place for length 16 in both modules, and I added ALWAYS_HEXIFY. |
|
I can see lots of movement here. Let me know when you want me to do a new review, because I can't tell whether there are more changes coming or not. |
|
Will do. I am just asking on the side for some advice, as I am not certain: if you have only a single loop, should you just ignore the _loop kernel and do everything in _comp? Or do I set salt_iter to 1? The second way is how it's currently done, so if that is the way then I am braced for review. |
|
It's good that you started to update the PR. I can see huge improvements already. There are a few things that we can improve even further.
I think it is better to stick to 32. The idea is for the user to enter the NTLM hash as hex-encoded input. It is really much easier for the user to work with the hex-encoded format than with the binary format. There is also the problem of 0x0a in binary input, which is automatically solved by using hex-encoded input.

A good example of how "binary" data encoded as hex input from a wordlist is handled is mode 22001. It follows the same idea as we want for 27x00. Take a look at the module; here are some relevant parts, but keep in mind that that mode handles a 256-bit key. In 27x00 we have a 128-bit key, so all strings and configurations are half the size. The hash can be in hex-encoded format: Now this function makes more sense: Now the length in the constraint-checking functions is exactly the length of ST_PASS and the length of the input strings that we expect from the user. That feels much more natural.

Please also take a look at this function of 22001: In order to run the min/max length check of the mask processor, we have to "update" the mask. This is important for the benchmark to be correct. You must also provide this function and remove some of the "x" characters to match a length of 32.

To understand how hashcat handles decoding the hex-encoded wordlist entry into binary form, place the code here in the _init kernel (see the sketch below). Here again a snippet from 22001: You can ignore the swap(); it is only there because 22001 processes SHA1, which is not the case in 27x00. So the idea is to decode the hex string into binary on the compute device. Why ask the host to do this when the GPU can do it :) Since there are no iterations, this section from the _loop kernel is not needed:
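The 22001 snippets referenced above are not reproduced here; as a rough stand-in, the _init-kernel decoding could look something like the following. hex_to_u32 () is the helper hashcat's OpenCL code provides in inc_common.cl; the buffer access and variable names are simplified for illustration.

```c
// Sketch: decode the 32 ASCII hex characters of the candidate into
// the 16-byte binary NT hash, on the device, inside the _init kernel.

u8 hex[32];

for (int i = 0; i < 32; i++)
{
  hex[i] = ((GLOBAL_AS const u8 *) pws[gid].i)[i];
}

u32 nt[4];

nt[0] = hex_to_u32 (hex +  0);
nt[1] = hex_to_u32 (hex +  8);
nt[2] = hex_to_u32 (hex + 16);
nt[3] = hex_to_u32 (hex + 24);

// hex_to_u32 () maps the first hex pair to the low byte, which fits
// the little-endian word layout MD4/NTLM uses, so unlike 22001's
// SHA1 there is no swap () step here.
```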
Do not worry about it. We assume that the user who uses test.pl functionality knows how this hash mode works. We expect the user to enter hex encoded input. If the input is anything else, we just process it knowing that whatever comes out is junk.
The way you've implemented it now, with salt_iter set to 1, works. However, I wouldn't recommend it, not for technical reasons but for maintenance reasons. It's just easier to read when the code is packaged in the kernel, as with any function in general. The split within the slow-hash kernel is only there to help the compute API reduce the kernel's return time, but in our case that is not relevant. In this particular situation you can set salt_iter to 0; you would then move all of the code that you now have in the _init and _loop kernels into the _comp kernel. But this is not a must. And remember, the hex-to-binary decoding remains in the _init kernel. The idea is really just to separate the tasks to make the code easier to read (see the skeleton after this comment).

If you execute this: Based on your code from the unit test you will receive something like this: If you run this command now it won't crack, but it should. The reason is this: I'm not sure if there are any other issues in the unit test, as this needs to be fixed first. The min/max you had set to 16 filtered out that special NTLM string "48950386887195732126164388762791" because it has length 32.

On the contrary, if you execute this: I get some ugly warnings. Not sure if they should work yet.

Good job so far. Please make the changes and we will end up with a high-quality plugin that users can benefit from. |
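Referring back to the salt_iter discussion, the overall shape being suggested is roughly this. Macro and type names (KERNEL_FQ, KERN_ATTR_TMPS, netntlm_tmp_t, GID_CNT) follow hashcat's slow-hash kernel conventions, but the bodies here are placeholders, not the PR's actual code.

```c
// Illustrative skeleton of the slow-hash kernel split for 27000.

KERNEL_FQ void m27000_init (KERN_ATTR_TMPS (netntlm_tmp_t))
{
  const u64 gid = get_global_id (0);

  if (gid >= GID_CNT) return;

  // decode the 32-char hex NT hash into 16 binary bytes in tmps[gid]
  // (see the hex_to_u32 () sketch earlier in the thread)
}

KERNEL_FQ void m27000_loop (KERN_ATTR_TMPS (netntlm_tmp_t))
{
  // intentionally empty: with salt_iter set to 0 there is no
  // iterated work for the loop kernel to perform
}

KERNEL_FQ void m27000_comp (KERN_ATTR_TMPS (netntlm_tmp_t))
{
  const u64 gid = get_global_id (0);

  if (gid >= GID_CNT) return;

  // derive the three DES keys from the NT hash, encrypt the server
  // challenge, and compare the 24-byte response against the target
}
```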
|
Thank you so much once again for the really nice feedback; I will get on this hopefully much faster than I did previously. For the unit test, this is what I was worried about: it won't crack because of what it is expecting. I didn't realize the single test, as I was testing with the default; will get on that. |
|
Just doing some more tests on the tests. Otherwise nearly ready. And I'm super happy we take the NT hash in and then convert; much nicer. It did mean I had to remove ALWAYS_HEXIFY, as it was outputting the hex of the hex. |
|
Right, I believe I have implemented all the changes you wanted from the last comment. Defaulting to hex input is 10x better than needing --hex-wordlist or HEX[]. |
|
Some ugly stuff left: Works fine with 27000 |
|
OK, after (locally) fixing the above issues, I get some good results: 27100: |
|
I don't get the same errors for the 27100 test, which is very weird. But maybe my test setup is a little outdated, as I have to run it: Let me try to recreate. |
|
I don't get the same ugliness when doing the tests. I don't see how the domain is uninitialised for you. Hmmmm |
|
@jsteube do you get the same test errors for m05600? It should be near-identical to m27100. |
|
I think fixing the constraints section also fixed the warnings, but I did not test that. |
|
Looking good: |
|
I am still a bit wary of the constraints section; I couldn't find an explanation for exactly which bits are which. But I am very happy it's working correctly. |
|
https://github.com/hashcat/hashcat/blob/master/docs/hashcat-plugin-development-guide.md |
|
Ah, thank you. I am just blind. Seems like it's correct now, thanks to you pointing out the fix earlier. |
|
OK, doing some additional tests; otherwise I will merge this PR tomorrow. |
|
Awesome! Hopefully I managed to carry through all the optimizations and things that were possible from the previous 5500 and 5600. Thanks so much for the really detailed feedback once again; it was very nice. |
|
Good work, many thanks! |
|
Hi, thank you all for this PR. I would like to try this functionality. Where can I read about cracking using hashcat and a wordlist of NT hashes? I need to know things like: for which modes is this functionality supported? Which additional flags, if any, do I need to use in this scenario? |
Currently the only modes enabled are 27000 (NetNTLMv1) and 27100 (NetNTLMv2). As far as how to use these modes, it's pretty simple: a basic -a 0 wordlist attack, but the wordlist is, literally, a list of NTLM hashes (see the example below). Generally I think most people use NTLM hashes gathered from previous engagements and/or the HaveIBeenPwned NTLM list. Unfortunately they don't provide the NTLM list as a straight download anymore; you have to use the API to get it. But really, it is just that simple. If you suspect you have very weak passwords on an engagement and have NetNTLMv1 or v2 hashes to crack from spoofing or whatever, this can be a great way to quickly crack some low-hanging fruit.

As an aside, I have also had great luck lately using rockyou.txt and a custom ruleset that I produced by combining a crapton of different rulesets we had lying around, deduplicating it, then running duprules on it, which checks whether different rules will produce the same candidate and deduplicates those as well. I originally had about 345,000 rules in the file before duprules, which reduced it to around 285,000, a 17% drop. I tested the before- and after-duprules files on 15,000 NTLM hashes: both cracked the exact same amount and the exact same hashes. So that's another low-hanging-fruit attack, much the same as 27000 and 27100.
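A concrete invocation, assuming hex input is the default as implemented in this PR; both filenames are placeholders (a file of captured NetNTLMv2 hashes, and a wordlist of 32-hex-character NT hashes):

```
hashcat -m 27100 -a 0 captured_netntlmv2.txt nt_hashes_wordlist.txt
```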
Given the ability to pass-the-hash (PTH) with NT hashes, as well as the availability of hash sources such as https://haveibeenpwned.com/Passwords, where not all passwords may be cracked into wordlists, the ability to use NT hashes as a wordlist can be useful.
This also allows using other sources of NT hashes where not all of them might be cracked but they are still usable.
Most of the code is just the original 5500 and 5600, but with the MD4 portion removed (see the sketch below for why that works).
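For background on why dropping MD4 is all it takes: in NetNTLMv1 the 16-byte NT hash (normally MD4 of the UTF-16LE password) is zero-padded to 21 bytes and split into three 7-byte DES keys, so feeding the NT hash in directly just skips the MD4 step. A standalone sketch of the standard 7-to-8-byte DES key expansion (Samba-style bit layout, independent of hashcat's own kernels):

```c
#include <stdio.h>
#include <stdint.h>

/* Expand 7 key bytes (56 bits) into 8 DES key bytes: 7 key bits per
   output byte, with the lowest bit reserved for DES parity (left
   unset here; DES ignores it anyway). */
static void des_key_from_7 (const uint8_t s[7], uint8_t k[8])
{
  k[0] =   s[0] >> 1;
  k[1] = ((s[0] & 0x01) << 6) | (s[1] >> 2);
  k[2] = ((s[1] & 0x03) << 5) | (s[2] >> 3);
  k[3] = ((s[2] & 0x07) << 4) | (s[3] >> 4);
  k[4] = ((s[3] & 0x0f) << 3) | (s[4] >> 5);
  k[5] = ((s[4] & 0x1f) << 2) | (s[5] >> 6);
  k[6] = ((s[5] & 0x3f) << 1) | (s[6] >> 7);
  k[7] =   s[6] & 0x7f;

  for (int i = 0; i < 8; i++) k[i] = (uint8_t) (k[i] << 1);
}

int main (void)
{
  /* NT hash of "password", zero-padded to 21 bytes as NTLMv1 does;
     in mode 27000 these 16 bytes arrive as 32 hex characters */
  uint8_t nt[21] = { 0x88, 0x46, 0xf7, 0xea, 0xee, 0x8f, 0xb1, 0x17,
                     0xad, 0x06, 0xbd, 0xd8, 0x30, 0xb7, 0x58, 0x6c };

  uint8_t k1[8], k2[8], k3[8];

  des_key_from_7 (nt +  0, k1);
  des_key_from_7 (nt +  7, k2);
  des_key_from_7 (nt + 14, k3);

  /* each key DES-encrypts the 8-byte server challenge; the three
     ciphertexts concatenated form the 24-byte NetNTLMv1 response */
  for (int i = 0; i < 8; i++) printf ("%02x", k1[i]);
  puts ("");

  return 0;
}
```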
Thanks to my hashathon teammates for the collaboration :)
Darryn Cull (@R4g3D_)
Aurélien Chalot (@Defte_)
Charly Njoume