Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DJB2 Algorithm #1925

Closed
waleedyaseen opened this issue Feb 20, 2019 · 16 comments
Closed

DJB2 Algorithm #1925

waleedyaseen opened this issue Feb 20, 2019 · 16 comments

Comments

@waleedyaseen
Copy link

waleedyaseen commented Feb 20, 2019

Description

I would like the DJB2 algothim implemented.

Usage

This algorithm is not widely used, it is rarely seen however, the Java programming language uses
it for it's String.hashCode() function, and it is used to check if strings are equal without exposing the
String content which is usually short names.

Implementation

Example implementation in the C language:

int hash(char *text) {
    int name32 = 0;
    int count = strlen(text);
    for (int index = 0; index < count; index++) {
        name32 = (name32 << 5) - name32 + text[index];
    }
    return name32;
}

Or the JVM implementation of it:

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

or another simpler implementation in Java:

public static int hash(CharSequence text) {
    int count = text.length();
    int name32 = 0;
    for (int index = 0; index < count; index++) {
        name32 = (name32 << 5) - name32 + text.charAt(index);
    }
    return name32;
}

Example

Input Output (Decimal) Output (Hexadecimal)
iceberg 1629187779 611b6ec3
cheese -1361525545 aed8c4d7
hard 3195115 30c0eb
chip 3052620 2e944c
@jsteube
Copy link
Member

jsteube commented Feb 20, 2019

I think it's also used in perl, not just Java. However the results are not matching. For example with the C code I do get 2163000881 (Decimal) as result for iceberg. Something is wrong, please verify everything again.

@jsteube
Copy link
Member

jsteube commented Feb 20, 2019

I found the reason. Your C code does not match the Java code. For example, the Java code does not have that initialization value 5381. Also the multiplier is 33, not 31.

@waleedyaseen
Copy link
Author

waleedyaseen commented Feb 20, 2019

I apologize, I've updated the C implementation to match the one in the Java function, they both output identical results now.

@jsteube
Copy link
Member

jsteube commented Feb 20, 2019

What's the reason to have this in hashcat? It should be pretty easy to collide the hash. Please explain.

@waleedyaseen
Copy link
Author

waleedyaseen commented Feb 20, 2019

This algorithm is widely used to hide the string content in a obfuscated code, and when reverse engineering, it could take huge amount of time to reverse a single string on the CPU, and by implementing it in hashcat, it makes the process go quite faster to retrieve the string content in that code.

The obfuscated code usually comes with many strings (could be in thousands), so using the CPU to crack all of those is really painful and time consuming.

@jsteube
Copy link
Member

jsteube commented Feb 20, 2019

Wait, there's no way to reverse back the original string.

For example the strings:

BA

and

A`

they both produce the same hash 0x0000083f. There's no way to tell which is the correct string.

@waleedyaseen
Copy link
Author

waleedyaseen commented Feb 20, 2019

That's correct, you could collide them pretty easily, but it's unlikely when dealing with file names or other names, which they use to hide in the most of the cases.

However, you could tell which one is the correct/wanted one by telling which one makes more sense, let's say debug.txt and OwinxTU collide for the sake of giving an example, it would be obviously for the person who's trying to crack it that the first result is the one we are looking for.

@jsteube
Copy link
Member

jsteube commented Feb 20, 2019

OK, but then there are many collisions for the same hash possible. Do you have any plans on controlling the keyspace low enough so that the number of collisions is low enough to work with the result?

Can you please give me a real life example where this obfuscation was used?

@waleedyaseen
Copy link
Author

waleedyaseen commented Feb 20, 2019

The collision can be minimised to very little or none by using a related words list for a dictionary or a hybird attack/cracking.

Regarding the examples, there is plenty of Java applications that do this to hide their file names, such as the RuneScape file storage system, you can click here to see an example of reversing some of these names.

@jsteube
Copy link
Member

jsteube commented Feb 21, 2019

How about 64 bit version? How important would that be?

@jsteube
Copy link
Member

jsteube commented Feb 21, 2019

There's also a large number of possible optimizations. For example the last character can be detected automatically. How to handle that?

@waleedyaseen
Copy link
Author

As for the 64-bit version of this algorithm, I would say it is really unnecessary to implement as it is very uncommon to come across, don't see a need for it as of now.

However, as for the optimisations, I am quite unaware on how to handle that properly.

@jsteube
Copy link
Member

jsteube commented Feb 21, 2019

So I've pushed support for DJB (32 bit output) with commit 51eb9eb

If you do not want to build from source you can use the beta on https://hashcat.net/beta/

Simply because it was so easy to add. Anyway, this implementation does not include any optimizations such as the last letter can be reversed etc. I hope it does what you want to do with it.

You maybe want to use the --keep-guessing flag to see all collisions if there are any.

If you want some fun, try: ./hashcat -m 18700 -O -w 3 ffffffff -a 3 ?l?l?l?l?l?l?l?l --keep-guessing

@jsteube jsteube closed this as completed Feb 21, 2019
@jsteube
Copy link
Member

jsteube commented Feb 22, 2019

Please note that I'll rename this algorithm to "Java Object hashCode()". The original DJB algorithm uses * 33 and not *31 plus it has an initialization value. Just to avoid confusion.

@Plazmaz
Copy link

Plazmaz commented Jun 17, 2020

This naming convention is pretty unclear. Java's Object.hashCode function varies wildly based on JVM, object, and implementation. Many objects explicitly override it. Also, the example hash given is super confusing, as Object.hashCode only ever returns an int value, so hex encoded is probably not how you're going to see it in the wild. I am a little confused here, because the default Object.hashCode guarantees no uniqueness. I would imagine there will be a ton of collisions here, and most of those would not be the value you're looking for.

@immibis
Copy link

immibis commented Dec 19, 2020

This naming convention is pretty unclear. Java's Object.hashCode function varies wildly based on JVM, object, and implementation. Many objects explicitly override it. Also, the example hash given is super confusing, as Object.hashCode only ever returns an int value, so hex encoded is probably not how you're going to see it in the wild. I am a little confused here, because the default Object.hashCode guarantees no uniqueness. I would imagine there will be a ton of collisions here, and most of those would not be the value you're looking for.

FYI: the Java String hashCode algorithm is documented: https://docs.oracle.com/javase/7/docs/api/java/lang/String.html - any JVM which doesn't implement this is a bug.

Object.hashCode is the implementation specific one, but default object hashcodes are pretty useless, and not something you can crack, because they are basically the address of the object. I agree it should be called "Java String hashCode" instead of "Java Object hashCode"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants