# Chapter 23: Secrets, Tokens, and Password Hashing

Generating cryptographically secure random values is critical for passwords, session
tokens, API keys, and other security-sensitive data. This notebook covers Python's
`secrets` module for secure randomness and `hashlib.pbkdf2_hmac()` for password hashing.

## Topics Covered
- **secrets vs random**: When to use which module
- **Token generation**: `token_hex()`, `token_bytes()`, `token_urlsafe()`
- **Secure passwords**: `secrets.choice()` for password generation
- **Session tokens and API keys**: Generation patterns
- **Password hashing**: Salt, iteration, key derivation with `pbkdf2_hmac()`
- **Password verification**: Storing and checking hashed passwords
- **Secure random numbers**: `secrets.randbelow()`, `secrets.SystemRandom`

## secrets vs random: Choosing the Right Module

Python's standard library has two modules for randomness, with very different purposes:

- **`random`**: Uses a Mersenne Twister PRNG. Fast and reproducible with seeds, but
  **NOT cryptographically secure**. Use for simulations, games, and statistical sampling.
- **`secrets`**: Draws from the operating system's CSPRNG through `os.urandom()` (for
  example `getrandom()` or `/dev/urandom` on Linux, and `BCryptGenRandom` on Windows
  since Python 3.11). Slower but **unpredictable**. Use for passwords, tokens, and keys.

**Rule of thumb**: If the value must be unguessable, use `secrets`. If it just needs to
be "random enough" for non-security purposes, use `random`.
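
The difference can be observed from Python itself. The sketch below (variable names are illustrative only) snapshots a `random.Random` generator, replays its output exactly, and then shows that `secrets.SystemRandom` exposes no state to snapshot:

```python
import random
import secrets

# A Mersenne Twister generator exposes its full internal state.
rng = random.Random()
snapshot = rng.getstate()
first_run = [rng.getrandbits(32) for _ in range(3)]

# Restoring the snapshot replays exactly the same "random" values.
rng.setstate(snapshot)
replayed = [rng.getrandbits(32) for _ in range(3)]
print(f"Replay matches original: {first_run == replayed}")

# SystemRandom has no user-visible state to save or restore.
try:
    secrets.SystemRandom().getstate()
    state_exposed = True
except NotImplementedError:
    state_exposed = False
print(f"SystemRandom exposes state: {state_exposed}")
```

Anyone who can capture or reconstruct the Mersenne Twister state can predict every later value, which is why `random` is unsuitable for secrets.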

In [None]:
import random
import secrets

# random module: reproducible with a seed (NOT secure)
random.seed(42)
print("random module (seeded with 42):")
print(f"  random.randint(0, 100): {random.randint(0, 100)}")
print(f"  random.randint(0, 100): {random.randint(0, 100)}")

# Re-seeding produces the same sequence -- predictable!
random.seed(42)
print(f"  After re-seed(42):      {random.randint(0, 100)}")

# secrets module: always unpredictable, no seed
print("\nsecrets module (cryptographically secure):")
print(f"  secrets.randbelow(100): {secrets.randbelow(100)}")
print(f"  secrets.randbelow(100): {secrets.randbelow(100)}")
print(f"  secrets.randbelow(100): {secrets.randbelow(100)}")

# Summarize the danger: random's internal state can be recovered from its output
print("\nKey difference:")
print("  random:  deterministic PRNG -- state can be reconstructed from output")
print("  secrets: OS CSPRNG -- no seed to set, no state exposed to the program")

## Token Generation: token_hex, token_bytes, token_urlsafe

The `secrets` module provides three convenient functions for generating random tokens:

- `token_bytes(n)`: Returns `n` random bytes
- `token_hex(n)`: Returns `2n` hex characters (representing `n` random bytes)
- `token_urlsafe(n)`: Returns a URL-safe Base64 string encoding `n` random bytes
  (roughly 1.3 characters per byte)

The default `n` is chosen to provide reasonable security (currently 32 bytes = 256 bits).
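
The relationship between `n` and the output length can be checked directly. This is a minimal sketch: `expected_lengths` is a hypothetical helper, and the `token_urlsafe` formula assumes CPython's current behaviour of stripping the Base64 `=` padding.

```python
import secrets


def expected_lengths(nbytes: int) -> dict[str, int]:
    """Predict the output length of each token function for nbytes of entropy."""
    return {
        "token_bytes": nbytes,                   # raw bytes
        "token_hex": 2 * nbytes,                 # two hex digits per byte
        "token_urlsafe": (4 * nbytes + 2) // 3,  # Base64 without '=' padding
    }


for nbytes in (16, 32):
    predicted = expected_lengths(nbytes)
    actual = {
        "token_bytes": len(secrets.token_bytes(nbytes)),
        "token_hex": len(secrets.token_hex(nbytes)),
        "token_urlsafe": len(secrets.token_urlsafe(nbytes)),
    }
    print(f"{nbytes} bytes -> predicted {predicted}, actual {actual}")
```

The encoding changes only the representation: all three functions carry the same `8 * n` bits of randomness.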

In [None]:
import secrets

# token_bytes: raw random bytes
raw_token: bytes = secrets.token_bytes(16)
print(f"token_bytes(16): {raw_token}")
print(f"  Type:  {type(raw_token).__name__}")
print(f"  Len:   {len(raw_token)} bytes")

# token_hex: hex-encoded string (2 chars per byte)
hex_token: str = secrets.token_hex(16)
print(f"\ntoken_hex(16):   {hex_token}")
print(f"  Type:  {type(hex_token).__name__}")
print(f"  Len:   {len(hex_token)} characters")

# token_urlsafe: base64-encoded, safe for URLs and filenames
safe_token: str = secrets.token_urlsafe(16)
print(f"\ntoken_urlsafe(16): {safe_token}")
print(f"  Type:  {type(safe_token).__name__}")
print(f"  Len:   {len(safe_token)} characters")

# Default size (no argument) uses a reasonable default
default_token: str = secrets.token_hex()
print(f"\ntoken_hex() default: {default_token}")
print(f"  Len: {len(default_token)} characters ({len(default_token) // 2} bytes)")

# Each call produces a unique value
print("\nUniqueness check:")
tokens: list[str] = [secrets.token_hex(8) for _ in range(5)]
for i, t in enumerate(tokens):
    print(f"  Token {i}: {t}")
print(f"  All unique: {len(set(tokens)) == len(tokens)}")

## Generating Secure Passwords with secrets.choice()

`secrets.choice()` selects a random element from a sequence using the CSPRNG.
Combined with a character alphabet, it can generate strong passwords. Unlike
`random.choice()`, the selection is cryptographically unpredictable.
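
How strong such a password is depends on the alphabet size and the length. As a rough sketch (the helper name `entropy_bits` is an assumption, not part of `secrets`), the entropy of `length` independent uniform picks is `length * log2(alphabet_size)`:

```python
import math
import string


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy for `length` independent uniform picks from an alphabet."""
    return length * math.log2(alphabet_size)


full_alphabet = string.ascii_letters + string.digits + string.punctuation
print(f"Alphabet size: {len(full_alphabet)}")
print(f"16 chars, full alphabet:  {entropy_bits(len(full_alphabet), 16):.1f} bits")
print(f"12 chars, letters+digits: {entropy_bits(62, 12):.1f} bits")
```

Rejecting candidates that lack a required character class lowers the real figure slightly, so treat this estimate as an upper bound.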

In [None]:
import secrets
import string


def generate_password(length: int = 16,
                      use_uppercase: bool = True,
                      use_digits: bool = True,
                      use_punctuation: bool = True) -> str:
    """Generate a cryptographically secure random password.

    Args:
        length: Number of characters in the password.
        use_uppercase: Include uppercase letters.
        use_digits: Include digits.
        use_punctuation: Include punctuation characters.

    Returns:
        A random password string.
    """
    alphabet: str = string.ascii_lowercase
    if use_uppercase:
        alphabet += string.ascii_uppercase
    if use_digits:
        alphabet += string.digits
    if use_punctuation:
        alphabet += string.punctuation

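    # Each enabled category needs at least one slot; without this check the
    # loop below would never terminate for very short lengths.
    required: int = 1 + use_uppercase + use_digits + use_punctuation
    if length < required:
        raise ValueError(f"length must be at least {required}")

    # Redraw until the password contains a character from every enabled category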
    password: str = ""
    while True:
        password = "".join(secrets.choice(alphabet) for _ in range(length))
        # Verify the password meets complexity requirements
        has_lower: bool = any(c in string.ascii_lowercase for c in password)
        has_upper: bool = not use_uppercase or any(c in string.ascii_uppercase for c in password)
        has_digit: bool = not use_digits or any(c in string.digits for c in password)
        has_punct: bool = not use_punctuation or any(c in string.punctuation for c in password)
        if has_lower and has_upper and has_digit and has_punct:
            break

    return password


# Generate various passwords
print("Generated passwords:")
for i in range(5):
    pwd: str = generate_password(length=20)
    print(f"  {i + 1}. {pwd}")

# Simpler password (letters and digits only)
simple: str = generate_password(length=12, use_punctuation=False)
print(f"\nSimple (no punctuation): {simple}")

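# Passphrase generation (word lists are often more memorable).
# NOTE: this 24-word list is for demonstration only. Four words drawn from it
# give only about 18 bits of entropy; real generators use thousands of words.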
words: list[str] = [
    "correct", "horse", "battery", "staple", "cloud", "river",
    "mountain", "forest", "bridge", "castle", "thunder", "crystal",
    "garden", "lantern", "voyage", "anchor", "falcon", "marble",
    "willow", "canyon", "ember", "harbor", "meadow", "summit",
]
passphrase: str = "-".join(secrets.choice(words) for _ in range(4))
print(f"\nPassphrase: {passphrase}")

## Session Tokens and API Key Generation Patterns

Tokens and API keys need to be unique, unguessable, and appropriately sized.
Here are practical patterns for generating them.
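
Generation is only half of the pattern: when only a hash of the token is stored, verification has to hash whatever the client submits and compare the two digests. A minimal sketch, using the hypothetical helper names `issue_token` and `check_token`:

```python
import hashlib
import hmac
import secrets


def issue_token() -> tuple[str, str]:
    """Return (raw_token, token_hash); only the hash should be persisted."""
    raw = secrets.token_urlsafe(32)
    return raw, hashlib.sha256(raw.encode("utf-8")).hexdigest()


def check_token(submitted: str, stored_hash: str) -> bool:
    """Hash the submitted token and compare it to the stored hash in constant time."""
    submitted_hash = hashlib.sha256(submitted.encode("utf-8")).hexdigest()
    return hmac.compare_digest(submitted_hash, stored_hash)


raw_token, stored_hash = issue_token()
print(f"Correct token accepted: {check_token(raw_token, stored_hash)}")
print(f"Guessed token accepted: {check_token('not-the-token', stored_hash)}")
```

With this layout, a leaked database reveals only digests, which cannot be replayed as tokens.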

In [None]:
import secrets
import hashlib
from datetime import datetime, timezone


def generate_session_token() -> str:
    """Generate a session token (URL-safe, 32 bytes of entropy)."""
    return secrets.token_urlsafe(32)


def generate_api_key(prefix: str = "sk") -> str:
    """Generate an API key with a prefix for easy identification.

    Format: prefix_hex (e.g., sk_a1b2c3d4...)
    The prefix helps identify the key type without revealing the secret.
    """
    token: str = secrets.token_hex(24)  # 48 hex chars = 192 bits
    return f"{prefix}_{token}"


def generate_reset_token(user_id: int) -> dict[str, str]:
    """Generate a password reset token with metadata.

    Returns the token and its hash. Store the HASH in the database,
    send the TOKEN to the user. On verification, hash the submitted
    token and compare it to the stored hash.
    """
    token: str = secrets.token_urlsafe(32)
    token_hash: str = hashlib.sha256(token.encode()).hexdigest()

    return {
        "token": token,          # Send this to the user (via email)
        "token_hash": token_hash, # Store this in the database
        "user_id": str(user_id),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


# Generate examples
print("Session token:")
session: str = generate_session_token()
print(f"  {session}")
print(f"  Length: {len(session)} chars")

print("\nAPI keys:")
for key_type in ["sk", "pk", "test"]:
    api_key: str = generate_api_key(prefix=key_type)
    print(f"  {api_key}")

print("\nPassword reset token:")
reset_info: dict[str, str] = generate_reset_token(user_id=42)
print(f"  Token (for user): {reset_info['token'][:32]}...")
print(f"  Hash (for DB):    {reset_info['token_hash'][:32]}...")
print(f"  User ID:          {reset_info['user_id']}")
print(f"  Created at:       {reset_info['created_at']}")

## Password Hashing: Salt, Iteration, and Key Derivation

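Passwords must **never** be stored as plain text or as a single fast, unsalted hash. An
attacker who obtains the database can reverse unsalted hashes with precomputed (rainbow)
tables, and fast general-purpose hashes such as SHA-256 can be brute-forced at very high
guess rates on GPU hardware.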

**Key Derivation Functions** (KDFs) like PBKDF2 are designed for password hashing:
- **Salt**: A random value unique to each password, preventing rainbow table attacks
- **Iterations**: Deliberate slowness (hundreds of thousands of rounds) to make brute
  force expensive
- **Key stretching**: Transforms a weak password into a strong derived key

`hashlib.pbkdf2_hmac()` implements PBKDF2 with HMAC as the pseudorandom function.
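
The cost of a derivation grows roughly in proportion to the iteration count, which is what makes large-scale guessing expensive. The sketch below times a single derivation at a few counts. The helper `time_pbkdf2` and the sample password are illustrative, and absolute timings will vary by machine:

```python
import hashlib
import os
import time


def time_pbkdf2(iterations: int) -> float:
    """Return the seconds taken by one PBKDF2-HMAC-SHA256 derivation."""
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"example-password", salt, iterations, dklen=32)
    return time.perf_counter() - start


for count in (1_000, 10_000, 100_000):
    print(f"{count:>7,} iterations: {time_pbkdf2(count) * 1000:.2f} ms")
```

A legitimate login pays this cost once, while an attacker pays it for every guess against every salted record.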

In [None]:
import hashlib
import os
import secrets

# Why plain hashing is dangerous
password: str = "mysecretpassword"
plain_hash: str = hashlib.sha256(password.encode()).hexdigest()
print("BAD: Plain SHA-256 hash of password")
print(f"  Hash: {plain_hash}")
print("  Problems: no salt, too fast, vulnerable to rainbow tables\n")

# Same password always produces the same hash (bad!)
plain_hash2: str = hashlib.sha256(password.encode()).hexdigest()
print(f"  Same password, same hash: {plain_hash == plain_hash2}")

# GOOD: PBKDF2 with salt and iterations
salt: bytes = os.urandom(32)  # 32 bytes of random salt
iterations: int = 600_000     # OWASP recommended minimum for PBKDF2-SHA256

derived_key: bytes = hashlib.pbkdf2_hmac(
    hash_name="sha256",
    password=password.encode("utf-8"),
    salt=salt,
    iterations=iterations,
    dklen=32,  # Derived key length in bytes
)

print("\nGOOD: PBKDF2-HMAC-SHA256")
print(f"  Salt (hex):       {salt.hex()}")
print(f"  Iterations:       {iterations:,}")
print(f"  Derived key (hex): {derived_key.hex()}")

# Same password, different salt = different derived key
salt2: bytes = os.urandom(32)
derived_key2: bytes = hashlib.pbkdf2_hmac(
    "sha256", password.encode(), salt2, iterations, dklen=32
)
print(f"\n  Different salt, same password:")
print(f"  Key 1: {derived_key.hex()[:32]}...")
print(f"  Key 2: {derived_key2.hex()[:32]}...")
print(f"  Same?  {derived_key == derived_key2}")

## Password Verification Workflow

A complete password hashing system needs two operations:
1. **Hash**: When creating/changing a password, generate a salt, hash the password,
   and store the salt + hash together
2. **Verify**: When authenticating, retrieve the stored salt, re-hash the submitted
   password with that salt, and compare
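
One way to store the salt and hash together is a single delimited string that also records the algorithm and iteration count. Recording the count lets a system notice records hashed under an older, weaker setting and re-hash them at the next successful login. A minimal sketch, assuming an `algorithm$iterations$salt_hex$key_hex` layout. The helper `needs_rehash` and the dummy salt and key values are hypothetical:

```python
def needs_rehash(stored_hash: str, min_iterations: int = 600_000) -> bool:
    """Return True if a stored record used fewer iterations than now required."""
    _algorithm, iterations_str, _salt_hex, _key_hex = stored_hash.split("$")
    return int(iterations_str) < min_iterations


# Dummy records: the salt and key fields are placeholders, not real hashes.
old_record = "sha256$100000$" + "00" * 16 + "$" + "ff" * 32
new_record = "sha256$600000$" + "00" * 16 + "$" + "ff" * 32
print(f"Old record needs rehash: {needs_rehash(old_record)}")
print(f"New record needs rehash: {needs_rehash(new_record)}")
```

Re-hashing can only happen during a successful login, because that is the only moment the plain-text password is available.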

In [None]:
import hashlib
import hmac
import os

# Configuration
HASH_ALGORITHM: str = "sha256"
ITERATIONS: int = 600_000
SALT_LENGTH: int = 32
KEY_LENGTH: int = 32


def hash_password(password: str) -> str:
    """Hash a password for storage.

    Returns a string in the format: algorithm$iterations$salt_hex$key_hex
    This format stores everything needed to verify the password later.
    """
    salt: bytes = os.urandom(SALT_LENGTH)
    key: bytes = hashlib.pbkdf2_hmac(
        HASH_ALGORITHM,
        password.encode("utf-8"),
        salt,
        ITERATIONS,
        dklen=KEY_LENGTH,
    )
    return f"{HASH_ALGORITHM}${ITERATIONS}${salt.hex()}${key.hex()}"


def verify_password(password: str, stored_hash: str) -> bool:
    """Verify a password against a stored hash.

    Uses constant-time comparison to prevent timing attacks.
    """
    algorithm, iterations_str, salt_hex, key_hex = stored_hash.split("$")
    salt: bytes = bytes.fromhex(salt_hex)
    stored_key: bytes = bytes.fromhex(key_hex)
    iterations: int = int(iterations_str)

    computed_key: bytes = hashlib.pbkdf2_hmac(
        algorithm,
        password.encode("utf-8"),
        salt,
        iterations,
        dklen=len(stored_key),
    )

    # Constant-time comparison to prevent timing attacks
    return hmac.compare_digest(computed_key, stored_key)


# Simulate user registration
print("User registration:")
user_password: str = "correct-horse-battery-staple"
stored: str = hash_password(user_password)
print(f"  Password:    {user_password}")
print(f"  Stored hash: {stored[:60]}...")
print(f"  Components:  algorithm$iterations$salt$key")

# Simulate login attempt (correct password)
print("\nLogin (correct password):")
is_valid: bool = verify_password("correct-horse-battery-staple", stored)
print(f"  Verified: {is_valid}")

# Simulate login attempt (wrong password)
print("\nLogin (wrong password):")
is_valid_wrong: bool = verify_password("wrong-password", stored)
print(f"  Verified: {is_valid_wrong}")

# Show that same password produces different stored hashes (due to salt)
stored2: str = hash_password(user_password)
print(f"\nSame password, different stored hashes:")
print(f"  Hash 1: {stored[:48]}...")
print(f"  Hash 2: {stored2[:48]}...")
print(f"  Same?   {stored == stored2}")
print(f"  Both verify: {verify_password(user_password, stored) and verify_password(user_password, stored2)}")

## Secure Random Numbers: randbelow() and SystemRandom

`secrets.randbelow(n)` returns a secure random integer in the range `[0, n)`. For more
complex random operations (shuffling, sampling), `secrets.SystemRandom` exposes the
`random.Random` interface on top of the OS CSPRNG. It is not seedable: `seed()` has no
effect, and `getstate()` and `setstate()` raise `NotImplementedError`.
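
Because `randbelow(n)` always starts at zero, an inclusive range `[low, high]` needs a small shift. A minimal sketch, where `secure_randint` is a hypothetical helper rather than part of the `secrets` module:

```python
import secrets


def secure_randint(low: int, high: int) -> int:
    """Return a CSPRNG-backed integer N with low <= N <= high."""
    if low > high:
        raise ValueError("low must not exceed high")
    # randbelow(k) yields 0..k-1, so shift the result up by `low`.
    return low + secrets.randbelow(high - low + 1)


pin_digits = [secure_randint(0, 9) for _ in range(4)]
print(f"Four secure digits: {pin_digits}")
print(f"Secure roll of a d20: {secure_randint(1, 20)}")
```

`secrets.SystemRandom().randint(low, high)` gives the same inclusive range if a generator object is already at hand.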

In [None]:
import secrets

# randbelow(n): secure random integer in [0, n)
print("secrets.randbelow():")
for _ in range(5):
    value: int = secrets.randbelow(100)
    print(f"  randbelow(100) = {value}")

# Simulating a fair coin flip
flips: list[str] = [
    "heads" if secrets.randbelow(2) == 0 else "tails"
    for _ in range(10)
]
print(f"\nSecure coin flips: {flips}")

# Simulating a fair die roll
rolls: list[int] = [secrets.randbelow(6) + 1 for _ in range(10)]
print(f"Secure die rolls:  {rolls}")

# secrets.choice() -- pick a random element
colors: list[str] = ["red", "green", "blue", "yellow", "purple"]
print(f"\nSecure choice: {secrets.choice(colors)}")

# SystemRandom: a full Random interface backed by the CSPRNG
secure_rng = secrets.SystemRandom()

print("\nsecrets.SystemRandom() methods:")
print(f"  randint(1, 100):  {secure_rng.randint(1, 100)}")
print(f"  random():         {secure_rng.random():.6f}")
print(f"  uniform(1, 10):   {secure_rng.uniform(1, 10):.4f}")

# Secure shuffle (in-place)
deck: list[str] = ["A", "K", "Q", "J", "10", "9", "8", "7"]
secure_rng.shuffle(deck)
print(f"  Shuffled deck:    {deck}")

# Secure sample (without replacement)
sample: list[str] = secure_rng.sample(colors, k=3)
print(f"  Sample 3 colors:  {sample}")

In [None]:
import secrets
import string


def generate_otp(length: int = 6) -> str:
    """Generate a numeric one-time password (OTP).

    Uses secrets.choice for cryptographic security rather than
    random.randint which is predictable.
    """
    return "".join(secrets.choice(string.digits) for _ in range(length))


def generate_verification_code(length: int = 8) -> str:
    """Generate an alphanumeric verification code (uppercase + digits).

    Excludes ambiguous characters (0/O, 1/I/l) for readability.
    """
    safe_chars: str = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
    return "".join(secrets.choice(safe_chars) for _ in range(length))


# Generate OTPs
print("One-Time Passwords (6-digit):")
for i in range(5):
    print(f"  OTP {i + 1}: {generate_otp()}")

# Generate verification codes
print("\nVerification codes (no ambiguous chars):")
for i in range(5):
    code: str = generate_verification_code()
    # Format with dashes for readability: ABCD-EFGH
    formatted: str = f"{code[:4]}-{code[4:]}"
    print(f"  Code {i + 1}: {formatted}")

## Practical: Complete Token Management System

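This example demonstrates a token management pattern that generates tokens, stores only
their hashes, and verifies them on presentation. The same pattern applies to API keys,
password reset links, and email verification tokens. A single SHA-256 pass is adequate
here, unlike for passwords, because each token already carries 256 bits of randomness
and cannot realistically be found by dictionary or brute-force search.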

In [None]:
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class StoredToken:
    """A token as stored in the database (hashed, never plain text)."""
    token_hash: str
    user_id: int
    purpose: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    expires_at: datetime | None = None
    used: bool = False


class TokenManager:
    """Manages secure token generation, storage, and verification."""

    def __init__(self) -> None:
        # In a real app, this would be a database
        self._store: dict[str, StoredToken] = {}

    @staticmethod
    def _hash_token(token: str) -> str:
        """Hash a token for storage. We store hashes, not raw tokens."""
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def create_token(self, user_id: int, purpose: str,
                     ttl_hours: int = 24) -> str:
        """Create a new token and store its hash.

        Returns the raw token (only shown once to the user).
        """
        raw_token: str = secrets.token_urlsafe(32)
        token_hash: str = self._hash_token(raw_token)
        now: datetime = datetime.now(timezone.utc)

        self._store[token_hash] = StoredToken(
            token_hash=token_hash,
            user_id=user_id,
            purpose=purpose,
            created_at=now,
            expires_at=now + timedelta(hours=ttl_hours),
        )

        return raw_token

    def verify_token(self, raw_token: str, purpose: str) -> int | None:
        """Verify a token and return the user_id if valid, None otherwise."""
        token_hash: str = self._hash_token(raw_token)

        stored = self._store.get(token_hash)
        if stored is None:
            return None  # Token not found

        if stored.used:
            return None  # Token already used

        if stored.purpose != purpose:
            return None  # Wrong purpose

        now: datetime = datetime.now(timezone.utc)
        if stored.expires_at and now > stored.expires_at:
            return None  # Token expired

        # Mark as used (one-time tokens)
        stored.used = True
        return stored.user_id


# Demonstrate the token lifecycle
manager = TokenManager()

# Create a password reset token
reset_token: str = manager.create_token(user_id=42, purpose="password_reset", ttl_hours=1)
print(f"Reset token (shown to user once): {reset_token[:24]}...")
print(f"Tokens in store: {len(manager._store)}")

# Verify with the correct token and purpose
user_id = manager.verify_token(reset_token, purpose="password_reset")
print(f"\nVerify (correct): user_id = {user_id}")

# Try to reuse the same token (should fail -- already used)
user_id_reuse = manager.verify_token(reset_token, purpose="password_reset")
print(f"Verify (reuse):   user_id = {user_id_reuse}")

# Try a fake token
fake_user = manager.verify_token("fake-token-abc123", purpose="password_reset")
print(f"Verify (fake):    user_id = {fake_user}")

# Create and verify an email verification token
email_token: str = manager.create_token(user_id=99, purpose="email_verify", ttl_hours=48)
wrong_purpose = manager.verify_token(email_token, purpose="password_reset")
print(f"\nWrong purpose:    user_id = {wrong_purpose}")
# Note: the purpose check fails before `used` is set, so the wrong-purpose
# attempt above did not consume the token and it can still be redeemed once.
right_purpose = manager.verify_token(email_token, purpose="email_verify")
print(f"Right purpose:    user_id = {right_purpose}")

## Summary

### Key Takeaways

| Concept | Tool | Purpose |
|---------|------|---------|
| **Secure randomness** | `secrets` module | Generate unpredictable tokens, passwords, keys |
| **Token generation** | `token_hex()`, `token_urlsafe()` | Create session tokens, API keys |
| **Secure choice** | `secrets.choice()` | Pick random elements securely |
| **Random integers** | `secrets.randbelow(n)` | Secure alternative to `random.randrange(n)` |
| **Full CSPRNG** | `secrets.SystemRandom` | Drop-in replacement for `random.Random` |
| **Password hashing** | `hashlib.pbkdf2_hmac()` | Derive keys from passwords with salt + iterations |
| **Password storage** | `algorithm$iterations$salt$key` | Store all parameters needed for verification |

### Best Practices
- Always use `secrets` (not `random`) for security-sensitive values
- Never store passwords as plain text or simple hashes
- Use `pbkdf2_hmac()` with SHA-256 and at least 600,000 iterations for password hashing
- Generate a unique random salt for every password
- Store token hashes in the database, never the raw tokens
- Use `hmac.compare_digest()` when comparing secret values such as derived keys, digests, or tokens
- Set expiration times on all tokens
- Mark tokens as used after verification to prevent replay attacks