# ln._hash - Stable Hashing for Complex Data Structures

**hash_dict()** generates stable hash values for complex, nested data structures.

**Core Features:**
- **Order-Independent**: Dicts and sets hash consistently regardless of key/element order
- **Deep Hashing**: Recursive handling of nested structures (lists, dicts, tuples, sets)
- **Pydantic Support**: Seamless hashing of Pydantic BaseModel instances
- **Type-Aware**: Distinguishes between similar structures (list vs tuple, set vs frozenset)
- **Fallback Handling**: Graceful handling of unhashable objects
- **Mutation Safety**: Optional `strict` mode with deepcopy protection

In [1]:
from pydantic import BaseModel

from lionpride.ln._hash import hash_dict

## 1. Basic Hashing - Primitives

Primitives (str, int, float, bool, None) are hashed directly. The resulting values are stable within a single Python process.

In [2]:
# Primitive types hash consistently
h1 = hash_dict("hello")
h2 = hash_dict("hello")
print(f"String hash: {h1}")
print(f"Consistent: {h1 == h2}")

# Different primitives
print(f"\nInt: {hash_dict(42)}")
print(f"Float: {hash_dict(3.14)}")
print(f"Bool: {hash_dict(True)}")
print(f"None: {hash_dict(None)}")

String hash: -7506617582977799438
Consistent: True

Int: 42
Float: 322818021289917443
Bool: 1
None: 269495894
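The int, bool, and float values above appear to match Python's built-in `hash()`, and CPython salts string hashes per process unless `PYTHONHASHSEED` is fixed. The sketch below uses only the standard library (not `hash_dict`) to show what that implies for string hashes across interpreter runs. The helper name `builtin_hash_in_subprocess` is made up for this illustration.

```python
import os
import subprocess
import sys


def builtin_hash_in_subprocess(expr: str, seed: int) -> int:
    """Evaluate hash(<expr>) in a fresh interpreter started with the given PYTHONHASHSEED."""
    env = {**os.environ, "PYTHONHASHSEED": str(seed)}
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Small integers hash to themselves under any seed
int_hashes = {builtin_hash_in_subprocess("42", seed) for seed in (1, 2)}
# String hashes are salted, so different seeds are expected to give different values
str_hashes = {builtin_hash_in_subprocess("'hello'", seed) for seed in (1, 2)}

print(f"int hashes: {int_hashes}")
print(f"distinct str hashes: {len(str_hashes)}")
```

If this carries over to `hash_dict`, any value that contains strings should be treated as comparable only within the process that produced it.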


## 2. Order-Independent Dict Hashing

Dictionaries with the same key-value pairs hash identically regardless of insertion order.

In [3]:
# Same keys/values, different order
dict1 = {"name": "Alice", "age": 30, "city": "NYC"}
dict2 = {"city": "NYC", "name": "Alice", "age": 30}
dict3 = {"age": 30, "city": "NYC", "name": "Alice"}

h1 = hash_dict(dict1)
h2 = hash_dict(dict2)
h3 = hash_dict(dict3)

print(f"dict1 hash: {h1}")
print(f"dict2 hash: {h2}")
print(f"dict3 hash: {h3}")
print(f"\nAll equal: {h1 == h2 == h3}")

dict1 hash: -5945735942057831323
dict2 hash: -5945735942057831323
dict3 hash: -5945735942057831323

All equal: True


In [4]:
# Different content → different hash
dict_a = {"x": 1, "y": 2}
dict_b = {"x": 1, "y": 3}  # Different value
dict_c = {"x": 1, "z": 2}  # Different key

print(f"dict_a: {hash_dict(dict_a)}")
print(f"dict_b: {hash_dict(dict_b)}")
print(f"dict_c: {hash_dict(dict_c)}")
print(f"\nAll different: {len({hash_dict(dict_a), hash_dict(dict_b), hash_dict(dict_c)}) == 3}")

dict_a: 7495512859820443861
dict_b: -6851677270114544846
dict_c: 6276072163849107435

All different: True
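One common way to get order independence is to sort the items into a fixed order before hashing. The sketch below illustrates that idea for a flat dict with the built-in `hash()`. It is not the `hash_dict` implementation, and `sketch_hash_mapping` is a hypothetical name.

```python
def sketch_hash_mapping(mapping: dict) -> int:
    """Hash a flat dict so that insertion order does not affect the result."""
    # Sort by the string form of the key so the traversal order is fixed
    items = sorted(mapping.items(), key=lambda kv: str(kv[0]))
    # The leading marker keeps a dict distinct from a bare tuple of pairs
    return hash(("dict", tuple(items)))


a = {"name": "Alice", "age": 30, "city": "NYC"}
b = {"city": "NYC", "name": "Alice", "age": 30}
changed = {**a, "age": 31}

print(sketch_hash_mapping(a) == sketch_hash_mapping(b))  # True
print(sketch_hash_mapping(a) == sketch_hash_mapping(changed))
```

This version assumes the values are themselves hashable; nested containers need a recursive step.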


## 3. Nested Structure Hashing

Recursive hashing handles arbitrarily nested structures.

In [5]:
# Deeply nested structure
nested = {
    "user": {"name": "Bob", "roles": ["admin", "editor"]},
    "settings": {"theme": "dark", "notifications": {"email": True, "sms": False}},
    "tags": {"python", "ai", "ml"},
}

# Same structure, different dict order
nested_reordered = {
    "tags": {"ml", "python", "ai"},  # Set order doesn't matter
    "user": {"roles": ["admin", "editor"], "name": "Bob"},  # Dict order doesn't matter
    "settings": {"notifications": {"sms": False, "email": True}, "theme": "dark"},
}

h1 = hash_dict(nested)
h2 = hash_dict(nested_reordered)

print(f"Nested hash: {h1}")
print(f"Reordered hash: {h2}")
print(f"Equal: {h1 == h2}")

Nested hash: 6058915463097909490
Reordered hash: 6058915463097909490
Equal: True
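Recursive hashing of this kind can be sketched as a conversion to a canonical, hashable form: unordered containers are sorted, ordered ones keep their order, and every container carries its type name. The `canonical` function below is an assumed illustration of the technique, not the library's code.

```python
def canonical(obj):
    """Turn nested dicts, lists, tuples, and sets into nested tuples with a fixed order."""
    if isinstance(obj, dict):
        pairs = [(canonical(k), canonical(v)) for k, v in obj.items()]
        # repr() gives a deterministic sort key even when values are not comparable
        return ("dict", tuple(sorted(pairs, key=repr)))
    if isinstance(obj, (set, frozenset)):
        members = [canonical(x) for x in obj]
        return (type(obj).__name__, tuple(sorted(members, key=repr)))
    if isinstance(obj, (list, tuple)):
        # Sequences keep their order; only the elements are canonicalized
        return (type(obj).__name__, tuple(canonical(x) for x in obj))
    return obj  # primitives are hashable as they are


nested = {"user": {"name": "Bob", "roles": ["admin", "editor"]}, "tags": {"python", "ai"}}
reordered = {"tags": {"ai", "python"}, "user": {"roles": ["admin", "editor"], "name": "Bob"}}

print(canonical(nested) == canonical(reordered))              # True
print(hash(canonical(nested)) == hash(canonical(reordered)))  # True
```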


## 4. Lists and Tuples - Order Matters

Lists and tuples are order-sensitive: the same elements in a different order produce a different hash.

In [6]:
# Lists with same elements, different order
list1 = [1, 2, 3]
list2 = [3, 2, 1]

print(f"list1 [1,2,3]: {hash_dict(list1)}")
print(f"list2 [3,2,1]: {hash_dict(list2)}")
print(f"Different: {hash_dict(list1) != hash_dict(list2)}")

list1 [1,2,3]: -5362326865511912790
list2 [3,2,1]: 2986389586904134016
Different: True


In [7]:
# Lists vs tuples - type matters
as_list = [1, 2, 3]
as_tuple = (1, 2, 3)

print(f"List [1,2,3]: {hash_dict(as_list)}")
print(f"Tuple (1,2,3): {hash_dict(as_tuple)}")
print(f"Different types → different hash: {hash_dict(as_list) != hash_dict(as_tuple)}")

List [1,2,3]: -5362326865511912790
Tuple (1,2,3): 4487821786452475262
Different types → different hash: True
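A list cannot be passed to the built-in `hash()`, and converting it to a tuple would erase the difference between the two types. One plausible way to keep them apart is to pair the contents with a type tag. The `tagged` helper below is a hypothetical sketch of that idea.

```python
def tagged(seq):
    """Pair a sequence's type name with its contents as a tuple."""
    return (type(seq).__name__, tuple(seq))


as_list = [1, 2, 3]
as_tuple = (1, 2, 3)

# Without a tag, both collapse to the same tuple and would hash identically
print(tuple(as_list) == as_tuple)           # True
# With a tag, the two stay distinguishable
print(tagged(as_list) == tagged(as_tuple))  # False
print(tagged(as_list))                      # ('list', (1, 2, 3))
```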


In [8]:
# Nested lists preserve structure
nested_list1 = [[1, 2], [3, 4]]
nested_list2 = [[3, 4], [1, 2]]  # Different order of sublists
nested_list3 = [[1, 2], [3, 4]]  # Same as nested_list1

print(f"[[1,2], [3,4]]: {hash_dict(nested_list1)}")
print(f"[[3,4], [1,2]]: {hash_dict(nested_list2)}")
print(f"[[1,2], [3,4]]: {hash_dict(nested_list3)}")
print(f"\nnested_list1 == nested_list3: {hash_dict(nested_list1) == hash_dict(nested_list3)}")
print(f"nested_list1 != nested_list2: {hash_dict(nested_list1) != hash_dict(nested_list2)}")

[[1,2], [3,4]]: 4462741929570363007
[[3,4], [1,2]]: -3879024036069339729
[[1,2], [3,4]]: 4462741929570363007

nested_list1 == nested_list3: True
nested_list1 != nested_list2: True


## 5. Sets and Frozensets - Order-Independent

Sets hash consistently regardless of iteration order.

In [9]:
# Sets with same elements hash identically
set1 = {"apple", "banana", "cherry"}
set2 = {"cherry", "apple", "banana"}
set3 = {"banana", "cherry", "apple"}

h1 = hash_dict(set1)
h2 = hash_dict(set2)
h3 = hash_dict(set3)

print(f"set1: {h1}")
print(f"set2: {h2}")
print(f"set3: {h3}")
print(f"All equal: {h1 == h2 == h3}")

set1: -4276244137905233084
set2: -4276244137905233084
set3: -4276244137905233084
All equal: True


In [10]:
# Sets vs frozensets - type matters
as_set = {1, 2, 3}
as_frozenset = frozenset({1, 2, 3})

print(f"set{{1,2,3}}: {hash_dict(as_set)}")
print(f"frozenset{{1,2,3}}: {hash_dict(as_frozenset)}")
print(f"Different types → different hash: {hash_dict(as_set) != hash_dict(as_frozenset)}")

set{1,2,3}: 2050763939712229569
frozenset{1,2,3}: -6545831482032933995
Different types → different hash: True


In [11]:
# Sets that mix types which cannot be compared with each other (int, str, float)
mixed_set1 = {2, "a", 3.14, "b"}
mixed_set2 = {"b", 3.14, "a", 2}  # Same elements, different Python iteration order

h1 = hash_dict(mixed_set1)
h2 = hash_dict(mixed_set2)

print(f"Mixed set 1: {h1}")
print(f"Mixed set 2: {h2}")
print(f"Stable despite unorderable types: {h1 == h2}")

# Note: True == 1 and hash(True) == hash(1), so a set keeps only one of them.
# Avoid mixing them in sets if you need predictable hashing.

Mixed set 1: 2727135982102922500
Mixed set 2: 2727135982102922500
Stable despite unorderable types: True
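Mixed-type sets are awkward because `sorted()` cannot compare a str with a number. A sort key built from the type name and `repr()` is one way around that. The sketch below shows this with plain Python; `order_key` is an illustrative name, and the ordering `hash_dict` actually uses may differ.

```python
mixed = {2, "a", 3.14, "b"}

# A plain sort fails because int/float and str are not mutually comparable
try:
    sorted(mixed)
except TypeError as exc:
    print(f"Plain sort fails: {type(exc).__name__}")


def order_key(value):
    """Sort key that works for any mix of types: (type name, repr)."""
    return (type(value).__name__, repr(value))


stable = sorted(mixed, key=order_key)
print(stable)  # [3.14, 2, 'a', 'b']

# True and 1 are equal and share a hash, so a set stores only one of them
print(len({True, 1}))  # 1
```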


## 6. Pydantic Model Hashing

Pydantic BaseModel instances are hashed from their dumped dict representation, combined with a type marker, so a model and a plain dict holding the same data hash differently.

In [12]:
# Define Pydantic models
class User(BaseModel):
    name: str
    age: int
    active: bool = True


# Create instances
user1 = User(name="Alice", age=30)
user2 = User(name="Alice", age=30, active=True)  # Explicit default
user3 = User(name="Bob", age=30)

print(f"user1 hash: {hash_dict(user1)}")
print(f"user2 hash: {hash_dict(user2)}")
print(f"user3 hash: {hash_dict(user3)}")
print(f"\nuser1 == user2 (same data): {hash_dict(user1) == hash_dict(user2)}")
print(f"user1 != user3 (different name): {hash_dict(user1) != hash_dict(user3)}")

user1 hash: -4455910259271887788
user2 hash: -4455910259271887788
user3 hash: 7091098181612230392

user1 == user2 (same data): True
user1 != user3 (different name): True


In [13]:
# Nested Pydantic models
class Address(BaseModel):
    city: str
    country: str


class Person(BaseModel):
    name: str
    address: Address
    tags: set[str]


person1 = Person(name="Charlie", address=Address(city="NYC", country="USA"), tags={"dev", "python"})

person2 = Person(
    name="Charlie",
    address=Address(city="NYC", country="USA"),
    tags={"python", "dev"},  # Different set order
)

print(f"person1: {hash_dict(person1)}")
print(f"person2: {hash_dict(person2)}")
print(f"Equal (set order doesn't matter): {hash_dict(person1) == hash_dict(person2)}")

person1: -2632654013857238376
person2: -2632654013857238376
Equal (set order doesn't matter): True


In [14]:
# Pydantic vs plain dict - different hashes
user_model = User(name="Alice", age=30, active=True)
user_dict = {"name": "Alice", "age": 30, "active": True}

print(f"Pydantic model: {hash_dict(user_model)}")
print(f"Plain dict: {hash_dict(user_dict)}")
print(
    f"Different (type marker distinguishes them): {hash_dict(user_model) != hash_dict(user_dict)}"
)

Pydantic model: -4455910259271887788
Plain dict: 6343866846145241152
Different (type marker distinguishes them): True


## 7. Strict Mode - Mutation Safety

Pass `strict=True` to deepcopy the data before hashing, so that hashing works on a private copy rather than on the original object.

In [15]:
# Mutable data example
data = {"items": [1, 2, 3], "config": {"mode": "test"}}

# Hash in strict mode (deepcopies data)
h1 = hash_dict(data, strict=True)
print(f"Original hash: {h1}")

# Modify original
data["items"].append(4)
data["config"]["mode"] = "prod"

# Hash changed data
h2 = hash_dict(data, strict=True)
print(f"Modified hash: {h2}")
print(f"Different after mutation: {h1 != h2}")

Original hash: 2739951112027621232
Modified hash: 7950998311431398351
Different after mutation: True
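The protection that a deep copy gives can be seen with `copy.deepcopy` alone. The sketch below does not call `hash_dict`; it only shows that a deep copy is unaffected by later changes to the original, which is the property strict mode is described as relying on.

```python
import copy

data = {"items": [1, 2, 3], "config": {"mode": "test"}}

# A deep copy duplicates the nested list and dict, not just the outer dict
snapshot = copy.deepcopy(data)

# Mutate the original after the snapshot was taken
data["items"].append(4)
data["config"]["mode"] = "prod"

print(snapshot["items"])           # [1, 2, 3]
print(snapshot["config"]["mode"])  # test
print(data["items"])               # [1, 2, 3, 4]
```

The copy costs time and memory proportional to the size of the data, which is the trade-off against the default mode.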


In [16]:
# Non-strict mode (the default) does not guard against concurrent mutation,
# but it skips the deepcopy, so it is faster for data that will not change
immutable_config = {"timeout": 30, "retries": 3, "endpoint": "https://api.example.com"}

# Non-strict is sufficient for data you won't mutate
h1 = hash_dict(immutable_config, strict=False)
h2 = hash_dict(immutable_config, strict=False)
print(f"Consistent: {h1 == h2}")
print(f"Hash: {h1}")

Consistent: True
Hash: 8965087722079522303


## 8. Stability and Use Cases

Hashes are stable within a single Python process, which suits in-memory caching, deduplication, and equality checks. They are not guaranteed to match across processes, so avoid persisting them.

In [17]:
# Deduplication example
configs = [
    {"db": "postgres", "host": "localhost", "port": 5432},
    {"host": "localhost", "port": 5432, "db": "postgres"},  # Duplicate (different order)
    {"db": "mysql", "host": "localhost", "port": 3306},
    {"db": "postgres", "host": "localhost", "port": 5432},  # Duplicate
]

# Deduplicate using hash_dict
unique_hashes = {}
for config in configs:
    h = hash_dict(config)
    if h not in unique_hashes:
        unique_hashes[h] = config

print(f"Original configs: {len(configs)}")
print(f"Unique configs: {len(unique_hashes)}")
print("\nUnique configurations:")
for config in unique_hashes.values():
    print(f"  {config}")

Original configs: 4
Unique configs: 2

Unique configurations:
  {'db': 'postgres', 'host': 'localhost', 'port': 5432}
  {'db': 'mysql', 'host': 'localhost', 'port': 3306}
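A matching hash is strong evidence of equal content, not proof, so a cautious deduplication loop can confirm each hash hit with `==`. The sketch below assumes a stand-in `toy_hasher` so that it runs without the library; `hash_dict` could be passed as `hasher` instead. Both `dedupe` and `toy_hasher` are hypothetical names.

```python
from collections import defaultdict


def toy_hasher(config: dict) -> int:
    """Stand-in order-independent hasher for flat dicts with string keys."""
    return hash(tuple(sorted(config.items())))


def dedupe(items, hasher):
    """Keep the first occurrence of each item, confirming hash matches with ==."""
    buckets = defaultdict(list)  # hash value -> items already seen with that hash
    unique = []
    for item in items:
        bucket = buckets[hasher(item)]
        if not any(item == seen for seen in bucket):
            bucket.append(item)
            unique.append(item)
    return unique


configs = [
    {"db": "postgres", "host": "localhost", "port": 5432},
    {"host": "localhost", "port": 5432, "db": "postgres"},
    {"db": "mysql", "host": "localhost", "port": 3306},
]
unique_configs = dedupe(configs, toy_hasher)
print(len(unique_configs))  # 2
```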


In [18]:
# Caching example
cache = {}


def expensive_computation(config: dict) -> str:
    """Simulate expensive operation with caching."""
    cache_key = hash_dict(config)

    if cache_key in cache:
        print(f"  Cache hit for {config}")
        return cache[cache_key]

    print(f"  Computing for {config}...")
    result = f"Result for {config['task']}"
    cache[cache_key] = result
    return result


# First call - computes
result1 = expensive_computation({"task": "analyze", "depth": 5})

# Same config, different key order - cached
result2 = expensive_computation({"depth": 5, "task": "analyze"})

# Different config - computes
result3 = expensive_computation({"task": "summarize", "depth": 3})

print(f"\nCache size: {len(cache)}")

  Computing for {'task': 'analyze', 'depth': 5}...
  Cache hit for {'depth': 5, 'task': 'analyze'}
  Computing for {'task': 'summarize', 'depth': 3}...

Cache size: 2
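An in-memory cache like the one above only needs keys that are stable for the life of the process. If a cache key has to survive a restart or be shared between processes, a different technique is needed, such as a cryptographic digest of a canonical JSON encoding. The sketch below shows that alternative with the standard library. `persistent_key` is a hypothetical helper, it is unrelated to how `hash_dict` works, and it only accepts JSON-serializable data (no sets, and tuples are encoded as lists).

```python
import hashlib
import json


def persistent_key(config: dict) -> str:
    """Return a SHA-256 hex digest of a JSON-serializable config, independent of key order."""
    # sort_keys fixes the key order; compact separators avoid whitespace differences
    payload = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = persistent_key({"task": "analyze", "depth": 5})
key_b = persistent_key({"depth": 5, "task": "analyze"})

print(key_a == key_b)  # True
print(len(key_a))      # 64
```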


In [19]:
# Equality checking for complex structures
config_a = {
    "services": ["api", "worker", "scheduler"],
    "environment": {"vars": {"DEBUG": "false", "PORT": "8000"}, "timezone": "UTC"},
    "features": {"auth", "logging", "metrics"},
}

config_b = {
    "features": {"metrics", "auth", "logging"},  # Set, different order
    "environment": {
        "timezone": "UTC",
        "vars": {"PORT": "8000", "DEBUG": "false"},
    },  # Dict, different order
    "services": ["api", "worker", "scheduler"],  # List, same order
}

config_c = {
    **config_a,
    "features": {"auth", "logging"},  # Missing 'metrics'
}


def configs_equal(c1, c2):
    return hash_dict(c1) == hash_dict(c2)


print(f"config_a == config_b: {configs_equal(config_a, config_b)}")
print(f"config_a == config_c: {configs_equal(config_a, config_c)}")

config_a == config_b: True
config_a == config_c: False


## 9. Edge Cases and Fallbacks

This section covers objects that `hash_dict` has no dedicated handling for, as well as empty collections.

In [20]:
# Custom objects without __str__ or __repr__
class CustomObject:
    def __init__(self, value):
        self.value = value


obj = CustomObject(42)

# hash_dict uses str() fallback
h = hash_dict(obj)
print(f"Custom object hash: {h}")
print(f"Hashable: {isinstance(h, int)}")

Custom object hash: -4849535241955129001
Hashable: True
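If the fallback is `str()`, as the comment in the cell above states, then a class that defines neither `__str__` nor `__repr__` is represented by Python's default text, which includes the object's memory address. The sketch below shows the consequence with plain `str()`: two instances holding the same value get different strings unless the class defines its own representation. `WithRepr` is an illustrative class, not part of the library.

```python
class CustomObject:
    def __init__(self, value):
        self.value = value


class WithRepr(CustomObject):
    def __repr__(self):
        # str() falls back to __repr__ when __str__ is not defined
        return f"WithRepr({self.value!r})"


a = CustomObject(42)
b = CustomObject(42)
c = WithRepr(42)
d = WithRepr(42)

# The default text embeds the address, so equal values still look different
print(str(a) == str(b))  # False
# A value-based __repr__ makes equal values produce equal text
print(str(c) == str(d))  # True
print(str(c))            # WithRepr(42)
```

For value-based hashing of custom classes, defining `__str__` or `__repr__` in terms of the object's data is therefore the safer choice.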


In [21]:
# Objects with __str__
# Note: this plain class reuses the name Person and shadows the Pydantic model from In [13]
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __str__(self):
        return f"Person({self.name}, {self.age})"


p1 = Person("Alice", 30)
p2 = Person("Alice", 30)
p3 = Person("Bob", 30)

print(f"Person 1: {hash_dict(p1)}")
print(f"Person 2: {hash_dict(p2)}")
print(f"Person 3: {hash_dict(p3)}")
print(f"\nSame __str__ → same hash: {hash_dict(p1) == hash_dict(p2)}")
print(f"Different __str__ → different hash: {hash_dict(p1) != hash_dict(p3)}")

Person 1: -4561938631942677923
Person 2: -4561938631942677923
Person 3: 1648699945411145425

Same __str__ → same hash: True
Different __str__ → different hash: True


In [22]:
# Empty collections
print(f"Empty dict: {hash_dict({})}")
print(f"Empty list: {hash_dict([])}")
print(f"Empty set: {hash_dict(set())}")
print(f"Empty tuple: {hash_dict(())}")

# All different due to type markers
empty_hashes = {hash_dict({}), hash_dict([]), hash_dict(set()), hash_dict(())}
print(f"\nAll distinct: {len(empty_hashes) == 4}")

Empty dict: -5190988587195632635
Empty list: 2500886146856312502
Empty set: -8532767121629096755
Empty tuple: 7952677415836332426

All distinct: True


## Summary Checklist

**Core Functionality:**
- ✅ `hash_dict(data)` generates a stable integer hash
- ✅ Order-independent hashing for dicts and sets
- ✅ Order-preserving hashing for lists and tuples
- ✅ Recursive handling of deeply nested structures
- ✅ Type-aware hashing (list ≠ tuple, set ≠ frozenset)

**Pydantic Support:**
- ✅ Seamless hashing of BaseModel instances
- ✅ Lazy initialization (no import overhead)
- ✅ Type marker distinguishes Pydantic models from plain dicts

**Advanced Features:**
- ✅ `strict=True` mode for mutation safety (deepcopy)
- ✅ Graceful fallback for unhashable objects (str/repr)
- ✅ Stable hashing for mixed, unorderable types in sets

**Use Cases:**
- ✅ Deduplication of complex configurations
- ✅ Caching with nested dict/list keys
- ✅ Equality checking for order-independent structures
- ✅ Identity generation for objects

**Guarantees:**
- ✅ Identical data → identical hash (within process)
- ✅ Different data → (almost certainly) different hash
- ✅ No mutation side effects in strict mode
- ✅ `TypeError` is raised only if the final representation is unhashable