# Using the Password as the Salt

We had a great question in class today about what could happen if the password was hashed, and the resulting digest was used as the salt. Let's explore this idea!

In [20]:
%pip install pycryptodome



## First Steps

Let's start by implementing a simple version of this algorithm. We'll use `pycryptodome` as usual.

In [21]:
from Crypto.Hash import SHA256
from Crypto.Protocol.KDF import scrypt


def format_bytes(data: bytes):
    return " ".join(f"{byte:02x}" for byte in data)


def hash_password(password: str):
    # Generate the salt by hashing the password
    salt = SHA256.new(password.encode()).digest()

    # Hash the password
    hashed_password = scrypt(password.encode(), salt, 32, N=2**14, r=8, p=1)

    return hashed_password

## Digging Deeper

Let's try to understand what's happening here and why it's not a good idea to derive the salt from the password like this. First of all, what happens if two different users have the same password?

In [22]:
user_password_1 = "password123"
user_password_2 = "password123"

hashed_password_1 = hash_password(user_password_1)
hashed_password_2 = hash_password(user_password_2)

print("User 1's hashed password:", format_bytes(hashed_password_1))
print("User 2's hashed password:", format_bytes(hashed_password_2))

print("User 1's hashed password == User 2's hashed password:", hashed_password_1 == hashed_password_2)

User 1's hashed password: 0b 10 01 3d 3a e7 a3 c8 c4 c8 1f e5 b8 1b d2 43 68 c2 70 50 33 ab 18 cd a1 98 90 a3 d9 c9 f7 fb
User 2's hashed password: 0b 10 01 3d 3a e7 a3 c8 c4 c8 1f e5 b8 1b d2 43 68 c2 70 50 33 ab 18 cd a1 98 90 a3 d9 c9 f7 fb
User 1's hashed password == User 2's hashed password: True


So the same password always produces the same hash. That's a problem: if an attacker gets access to the database, they can immediately see which users share a password, and cracking one of those hashes gives them access to every account that uses it!

This makes sense when you think about it. The password is the only input to both the salt and the hash, so nothing distinguishes one user from another. This is why we need a unique salt for each user.
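To see the difference a unique salt makes, here is a minimal sketch that hashes the same password twice with a random salt each time. It makes a few assumptions that are not part of the code above: it uses the standard library's `hashlib.scrypt` instead of `pycryptodome` so it runs without extra packages, the salt is 16 bytes from `os.urandom`, and the helper name `hash_password_random_salt` is made up for this example.

```python
import hashlib
import os


def hash_password_random_salt(password: str) -> tuple[bytes, bytes]:
    # Assumption: a 16-byte salt from the OS random source (the length is our choice)
    salt = os.urandom(16)

    # Same scrypt parameters as hash_password above, via hashlib instead of pycryptodome
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1, dklen=32)

    # The salt must be returned so it can be stored next to the hash
    return salt, digest


salt_1, hashed_1 = hash_password_random_salt("password123")
salt_2, hashed_2 = hash_password_random_salt("password123")

print("Salts differ: ", salt_1 != salt_2)
print("Hashes differ:", hashed_1 != hashed_2)
```

With a random salt, two users who pick the same password should no longer end up with matching hashes in the database.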

## Pre-Computed Dictionary Attack

Because the same password always produces the same hash, an attacker can pre-compute a dictionary of hashes for common passwords once, before ever seeing the database, and then reuse it against every user.

Let's see how this pre-computation works in practice...

In [24]:
import requests


def fetch_passwords():
    # There are many other password lists available here, feel free to try them out!
    # I picked this one because it's small and easy to read... and doesn't contain anything offensive.
    # The same can't be said for the others...
    url = "https://raw.githubusercontent.com/danielmiessler/SecLists/master/Passwords/Common-Credentials/top-20-common-SSH-passwords.txt"
    # Use a timeout so a stalled connection doesn't hang the notebook
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        content = response.text
        passwords = content.strip().split("\n")
        return passwords
    else:
        raise RuntimeError(f"Failed to fetch the URL: status code {response.status_code}")


passwords = fetch_passwords()
max_len = max(len(password) for password in passwords)
print("Password".ljust(max_len), "Hashed Password")
for password in passwords:
    hashed_password = hash_password(password)
    print(password.ljust(max_len), format_bytes(hashed_password))

Password      Hashed Password
root          7e 37 55 89 4d d0 d3 f3 ae 5a 59 e8 57 be 81 d6 b8 17 66 a4 59 9e 43 cc b9 7c 33 c1 37 d0 ae 78
toor          88 64 c4 20 7c 0c 8f b9 16 ae 82 36 04 9f 56 a4 46 c1 0e 22 6a 20 49 7f 42 c4 51 0e 00 b2 19 0e
raspberry     b8 52 78 3d bd df 31 7a 9f 51 78 56 2c 95 12 94 41 b7 e1 14 68 1d 5f 08 31 b9 b6 be 58 b2 00 bb
dietpi        e5 03 49 61 23 ed 90 ec 92 fe d1 1d de 54 10 5f 96 c5 c3 cb cc 11 63 d0 d8 d3 86 ff 3e 81 24 4a
test          15 4a 6b da 2b e3 2a 74 57 ad f2 6f b3 be 36 fe e5 c8 c9 db d3 55 32 f5 91 11 ed 63 db 6c 3c 8e
uploader      a9 d8 c2 4d 9c 21 db 51 70 c9 d9 31 d0 2d f7 83 64 09 50 f9 81 b7 51 e3 fd a0 16 73 af 3a fb a4
password      a7 64 79 a0 fd 27 b7 d5 e2 34 d5 b4 41 b1 3e 83 d6 9f 86 46 6d ef f9 30 ed 8c b6 14 13 21 c8 96
admin         96 41 20 e1 ce ac 2f d0 08 2e f8 11 8a 3e 93 3d dc 47 ac 9a 7a 09 97 4c 9a 0e 97 2a 8b 49 49 ea
administrator 85 3b 14 e3 0f a0 fa 3d a6 fd 53 6d 2b ed 13 01 89 ac 82 09 7a 48 6e a4 24 5

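A pre-computed dictionary attack is viable against this scheme because the salt is derived from the password, so the attacker can compute it too. The scrypt work factor still makes each guess expensive, but the attacker only pays that cost once per candidate password instead of once per user. In fact, the derived salt doesn't add any security at all! The scheme is equivalent to not using a salt.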
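The attacker's side of this can be sketched as a simple reverse lookup. This is only an illustration under a few assumptions: it re-implements the scheme with the standard library's `hashlib` instead of `pycryptodome` so it is self-contained, the candidate list is a handful of entries from the output above, and the "leaked" hash is simulated rather than taken from a real database.

```python
import hashlib


def hash_password(password: str) -> bytes:
    # Same scheme as above: the salt is the SHA-256 digest of the password
    salt = hashlib.sha256(password.encode()).digest()
    return hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1, dklen=32)


# Attacker side, done once and ahead of time: map each hash back to its password
candidates = ["root", "toor", "raspberry", "test", "password", "admin"]
lookup = {hash_password(candidate): candidate for candidate in candidates}

# Hypothetical leaked database entry (simulated here by hashing a known password)
leaked_hash = hash_password("raspberry")

# Reversing the hash is now a dictionary lookup, with no scrypt work at attack time
recovered = lookup.get(leaked_hash)
print("Recovered password:", recovered)
```

The expensive hashing happens once while building `lookup`. After that, every leaked hash produced by this scheme can be checked against the table at almost no cost.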

## Conclusion

Stick to the standard way of hashing passwords. Generate a random salt for each user, store it in the database alongside the hash, and never derive it from the password itself. Don't forget to generate a fresh salt each time a user changes their password too.
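As a rough sketch of what that could look like, the example below stores the salt next to the hash and re-derives the hash at login. Several things here are assumptions for illustration only: the record is a plain dict with made-up field names `salt` and `hash`, the helper names `create_record` and `verify_password` are hypothetical, the salt is 16 bytes, and it uses the standard library's `hashlib.scrypt` instead of `pycryptodome` so it runs without extra packages.

```python
import hashlib
import hmac
import os


def derive(password: str, salt: bytes) -> bytes:
    # Same scrypt parameters as the rest of this notebook
    return hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1, dklen=32)


def create_record(password: str) -> dict:
    # A fresh random salt for every user and every password change
    salt = os.urandom(16)
    return {"salt": salt, "hash": derive(password, salt)}


def verify_password(password: str, record: dict) -> bool:
    # Re-derive with the stored salt and compare in constant time
    candidate = derive(password, record["salt"])
    return hmac.compare_digest(candidate, record["hash"])


record = create_record("password123")
print("Correct password accepted:", verify_password("password123", record))
print("Wrong password accepted:  ", verify_password("password124", record))

# Changing the password creates a whole new record, including a new salt
new_record = create_record("a-much-better-password")
print("Salt changed:", new_record["salt"] != record["salt"])
```

The salt is not a secret, so storing it in the same row as the hash is fine. Its job is only to make every user's hash unique.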