# HashableModel - Content-Based Hashable Pydantic Model

`HashableModel` hashes and compares instances by their field values, unlike `Element`, which hashes and compares by ID. Use it when two instances with identical field values should be treated as the same value.

**Core Features:**
- **Content-Based Equality**: Identical fields → same hash (vs Element's ID-based)
- **Immutable by Default**: Frozen to prevent hash corruption in sets/dicts
- **Deduplication**: Safe for `to_list(unique=True)` and set operations
- **Cache Keys**: Identical configs produce same hash for caching
- **Multi-Mode Serialization**: `python`/`json` modes for different contexts

In [1]:
from lionherd_core.base import Element
from lionherd_core.ln import to_list
from lionherd_core.types.model import HashableModel

## 1. Basic Construction

Define custom models by subclassing HashableModel. All fields contribute to the hash.

In [2]:
class Config(HashableModel):
    """Example configuration object."""

    name: str
    temperature: float = 0.7
    max_tokens: int = 1000


# Create instances
config1 = Config(name="gpt-4", temperature=0.7)
config2 = Config(name="gpt-4", temperature=0.7)  # Identical values
config3 = Config(name="gpt-4", temperature=0.9)  # Different temperature

print(f"config1: {config1}")
print(f"config2: {config2}")
print(f"config3: {config3}")

config1: name='gpt-4' temperature=0.7 max_tokens=1000
config2: name='gpt-4' temperature=0.7 max_tokens=1000
config3: name='gpt-4' temperature=0.9 max_tokens=1000


## 2. Content-Based Equality vs Element's ID-Based

**HashableModel**: Equality based on field values (content)  
**Element**: Equality based on ID (identity)

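Two `HashableModel` instances with the same field values are equal and share a hash. Two `Element` instances are equal only if they share an ID, whatever their other fields contain.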

In [3]:
# HashableModel - content equality
print("=== HashableModel (Content-Based) ===")
print(f"config1 == config2 (same values): {config1 == config2}")
print(f"config1 == config3 (different values): {config1 == config3}")
print(f"hash(config1) == hash(config2): {hash(config1) == hash(config2)}")
print(f"hash(config1) == hash(config3): {hash(config1) == hash(config3)}")

print("\n=== Element (ID-Based) ===")
elem1 = Element(metadata={"value": 1})
elem2 = Element(id=elem1.id, metadata={"value": 999})  # Same ID, different metadata
elem3 = Element(metadata={"value": 1})  # Different ID, same metadata

print(f"elem1 == elem2 (same ID, different metadata): {elem1 == elem2}")
print(f"elem1 == elem3 (different ID, same metadata): {elem1 == elem3}")
print(f"hash(elem1) == hash(elem2): {hash(elem1) == hash(elem2)}")
print(f"hash(elem1) == hash(elem3): {hash(elem1) == hash(elem3)}")

=== HashableModel (Content-Based) ===
config1 == config2 (same values): True
config1 == config3 (different values): False
hash(config1) == hash(config2): True
hash(config1) == hash(config3): False

=== Element (ID-Based) ===
elem1 == elem2 (same ID, different metadata): True
elem1 == elem3 (different ID, same metadata): False
hash(elem1) == hash(elem2): True
hash(elem1) == hash(elem3): False
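
The same contrast can be reproduced with the standard library alone. This is a minimal sketch, not the lionherd_core implementation: `ValueRecord` and `IdRecord` are hypothetical stand-ins, and it assumes only that content-based models hash their fields while ID-based entities hash a UUID.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass(frozen=True)
class ValueRecord:
    """Stand-in for a content-based model: generated __eq__/__hash__ use the fields."""

    name: str
    temperature: float


@dataclass(eq=False)
class IdRecord:
    """Stand-in for an ID-based entity: only `id` takes part in equality and hashing."""

    payload: str
    id: UUID = field(default_factory=uuid4)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, IdRecord) and self.id == other.id

    def __hash__(self) -> int:
        return hash(self.id)


same_values = ValueRecord("gpt-4", 0.7) == ValueRecord("gpt-4", 0.7)

first = IdRecord(payload="a")
same_id = IdRecord(payload="b", id=first.id)  # shared ID, different payload
same_payload = IdRecord(payload="a")  # fresh ID, same payload

print(same_values)  # content decides equality
print(first == same_id)  # the shared ID decides, payload is ignored
print(first == same_payload)  # different IDs, so not equal
```

The three prints mirror the `True` / `True` / `False` pattern in the output above: values decide for the content-based class, IDs decide for the ID-based one.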


## 3. Immutability - Frozen by Default

HashableModel is frozen so that its hash cannot change after creation. If an object's hash changed while it was stored in a set or used as a dict key, lookups for that object would fail.

In [4]:
config = Config(name="gpt-4", temperature=0.7)

try:
    config.temperature = 0.9  # Attempt to modify
    print("❌ Should not reach here")
except Exception as e:
    print(f"✓ Cannot modify frozen model: {type(e).__name__}")
    print(f"  {e}")

# Why this matters:
config_set = {config}
print(f"\n✓ Safe in sets: {config in config_set}")
# If we could modify, the hash would change and lookup would fail!

✓ Cannot modify frozen model: ValidationError
  1 validation error for Config
temperature
  Instance is frozen [type=frozen_instance, input_value=0.9, input_type=float]
    For further information visit https://errors.pydantic.dev/2.12/v/frozen_instance

✓ Safe in sets: True
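
To see what freezing guards against, here is a small sketch of the failure mode using a plain Python class. `MutableKey` is a hypothetical, deliberately unfrozen class and is not part of lionherd_core.

```python
class MutableKey:
    """Hypothetical unfrozen class whose hash depends on a mutable field."""

    def __init__(self, value: int) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        return isinstance(other, MutableKey) and self.value == other.value

    def __hash__(self) -> int:
        return hash(self.value)


key = MutableKey(1)
bucket = {key}
found_before = key in bucket

key.value = 2  # the set still files the entry under the old hash
found_after = key in bucket
still_stored = any(item is key for item in bucket)

print(found_before, found_after, still_stored)  # True False True
```

The object is still physically in the set, but membership tests can no longer find it. A frozen model rules this out by rejecting the assignment.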


## 4. Cache Key Usage

Identical configurations are equal and produce the same hash, so they can serve as cache keys for LLM calls or other expensive computations.

In [5]:
from functools import lru_cache


class LLMConfig(HashableModel):
    model: str
    temperature: float
    max_tokens: int


@lru_cache(maxsize=128)
def expensive_llm_call(config: LLMConfig, prompt: str) -> str:
    """Simulated LLM call - cached by config + prompt."""
    return f"Response for '{prompt}' with {config.model} @ temp={config.temperature}"


# Same config instances
cfg1 = LLMConfig(model="gpt-4", temperature=0.7, max_tokens=1000)
cfg2 = LLMConfig(model="gpt-4", temperature=0.7, max_tokens=1000)

# First call
result1 = expensive_llm_call(cfg1, "Hello")
print(f"Call 1: {result1}")
print(f"Cache info: {expensive_llm_call.cache_info()}")

# Second call with identical config - cache hit!
result2 = expensive_llm_call(cfg2, "Hello")
print(f"\nCall 2: {result2}")
print(f"Cache info: {expensive_llm_call.cache_info()}")
print("✓ Cache hit! (hits=1, misses=1)")

Call 1: Response for 'Hello' with gpt-4 @ temp=0.7
Cache info: CacheInfo(hits=0, misses=1, maxsize=128, currsize=1)

Call 2: Response for 'Hello' with gpt-4 @ temp=0.7
Cache info: CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
✓ Cache hit! (hits=1, misses=1)
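
`functools.lru_cache` builds its key from the hash and equality of the arguments, so any change to a field value yields a separate cache entry. The sketch below illustrates this with a frozen dataclass as a stand-in; `StandInConfig` and `render` are hypothetical names, and the assumption is that a `HashableModel` behaves the same way as a cache key.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class StandInConfig:
    """Hypothetical stand-in for a content-hashed config."""

    model: str
    temperature: float


@lru_cache(maxsize=None)
def render(config: StandInConfig, prompt: str) -> str:
    return f"{prompt} via {config.model} @ {config.temperature}"


render(StandInConfig("gpt-4", 0.7), "Hello")  # miss: first time this key is seen
render(StandInConfig("gpt-4", 0.7), "Hello")  # hit: equal config, equal prompt
render(StandInConfig("gpt-4", 0.9), "Hello")  # miss: temperature differs
render(StandInConfig("gpt-4", 0.7), "Bye")  # miss: prompt differs

info = render.cache_info()
print(info.hits, info.misses)  # 1 3
```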


## 5. Set and Dict Deduplication

HashableModel instances deduplicate based on content when used in sets or as dict keys.

In [6]:
# Create multiple configs with duplicate values
configs = [
    Config(name="gpt-4", temperature=0.7),
    Config(name="gpt-4", temperature=0.7),  # Duplicate
    Config(name="gpt-4", temperature=0.9),
    Config(name="claude", temperature=0.7),
    Config(name="gpt-4", temperature=0.7),  # Another duplicate
]

print(f"Original list: {len(configs)} configs")
for i, cfg in enumerate(configs):
    print(f"  {i}: {cfg}")

# Deduplicate with set
unique_configs = set(configs)
print(f"\nUnique configs: {len(unique_configs)} configs")
for cfg in unique_configs:
    print(f"  {cfg}")

# Use as dict keys
config_usage = {}
for cfg in configs:
    config_usage[cfg] = config_usage.get(cfg, 0) + 1

print("\nUsage counts:")
for cfg, count in config_usage.items():
    print(f"  {cfg}: used {count} times")

Original list: 5 configs
  0: name='gpt-4' temperature=0.7 max_tokens=1000
  1: name='gpt-4' temperature=0.7 max_tokens=1000
  2: name='gpt-4' temperature=0.9 max_tokens=1000
  3: name='claude' temperature=0.7 max_tokens=1000
  4: name='gpt-4' temperature=0.7 max_tokens=1000

Unique configs: 3 configs
  name='gpt-4' temperature=0.7 max_tokens=1000
  name='gpt-4' temperature=0.9 max_tokens=1000
  name='claude' temperature=0.7 max_tokens=1000

Usage counts:
  name='gpt-4' temperature=0.7 max_tokens=1000: used 3 times
  name='gpt-4' temperature=0.9 max_tokens=1000: used 1 times
  name='claude' temperature=0.7 max_tokens=1000: used 1 times


## 6. Integration with to_list(unique=True)

HashableModel works seamlessly with `to_list(unique=True)` for deduplicating structured LLM outputs.

In [7]:
class Task(HashableModel):
    """Structured task output from LLM."""

    title: str
    priority: str
    estimate_hours: int


# Simulate LLM returning duplicate tasks
llm_outputs = [
    Task(title="Write docs", priority="high", estimate_hours=4),
    Task(title="Fix bug", priority="medium", estimate_hours=2),
    Task(title="Write docs", priority="high", estimate_hours=4),  # Duplicate
    Task(title="Add tests", priority="high", estimate_hours=3),
    Task(title="Fix bug", priority="medium", estimate_hours=2),  # Duplicate
]

print(f"Raw LLM outputs: {len(llm_outputs)} tasks")
for task in llm_outputs:
    print(f"  - {task.title} ({task.priority}, {task.estimate_hours}h)")

# Deduplicate with to_list
unique_tasks = to_list(llm_outputs, flatten=True, unique=True)
print(f"\nDedup with to_list(flatten=True, unique=True): {len(unique_tasks)} tasks")
for task in unique_tasks:
    print(f"  - {task.title} ({task.priority}, {task.estimate_hours}h)")

Raw LLM outputs: 5 tasks
  - Write docs (high, 4h)
  - Fix bug (medium, 2h)
  - Write docs (high, 4h)
  - Add tests (high, 3h)
  - Fix bug (medium, 2h)

Dedup with to_list(flatten=True, unique=True): 3 tasks
  - Write docs (high, 4h)
  - Fix bug (medium, 2h)
  - Add tests (high, 3h)
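
The internals of `to_list` are not shown here, but order-preserving deduplication of hashable items can be sketched with `dict.fromkeys`. `StandInTask` and `dedupe_keep_order` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandInTask:
    """Hypothetical stand-in for a content-hashed task."""

    title: str
    priority: str


def dedupe_keep_order(items):
    """Drop later duplicates while keeping the first occurrence of each item."""
    return list(dict.fromkeys(items))


tasks = [
    StandInTask("Write docs", "high"),
    StandInTask("Fix bug", "medium"),
    StandInTask("Write docs", "high"),  # duplicate by value
    StandInTask("Add tests", "high"),
    StandInTask("Fix bug", "medium"),  # duplicate by value
]

unique = dedupe_keep_order(tasks)
print([task.title for task in unique])  # ['Write docs', 'Fix bug', 'Add tests']
```

This only works because equal values hash equally. With identity-based hashing, all five items would survive.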


## 7. Serialization Modes

Two modes: `python` (native Python types) and `json` (JSON-safe types). In the output below, `json` mode also returns keys in sorted order.

In [8]:
config = Config(name="gpt-4", temperature=0.7, max_tokens=1500)

# Python mode - native types
python_dict = config.to_dict(mode="python")
print("Python mode:")
print(f"  {python_dict}")
print(f"  Types: {[(k, type(v).__name__) for k, v in python_dict.items()]}")

# JSON mode - JSON-safe types
json_dict = config.to_dict(mode="json")
print("\nJSON mode:")
print(f"  {json_dict}")
print(f"  Types: {[(k, type(v).__name__) for k, v in json_dict.items()]}")

# to_json - deterministic JSON string (sorted keys for stable hashing)
json_str = config.to_json()
print(f"\nJSON string: {json_str}")
print(f"  Type: {type(json_str).__name__}")

Python mode:
  {'name': 'gpt-4', 'temperature': 0.7, 'max_tokens': 1500}
  Types: [('name', 'str'), ('temperature', 'float'), ('max_tokens', 'int')]

JSON mode:
  {'max_tokens': 1500, 'name': 'gpt-4', 'temperature': 0.7}
  Types: [('max_tokens', 'int'), ('name', 'str'), ('temperature', 'float')]

JSON string: {"max_tokens":1500,"name":"gpt-4","temperature":0.7}
  Type: str
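
The `Config` fields above are already JSON-safe, so both modes return the same value types. The distinction matters for fields such as timestamps. The sketch below shows the general idea with the standard library only; `Event`, `dump_python`, `dump_json_safe`, and `load_json_safe` are hypothetical helpers, and the conversion rules `HashableModel` actually applies are not shown in this notebook.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    name: str
    created_at: datetime


def dump_python(event: Event) -> dict:
    """Keep native Python objects (the datetime stays a datetime)."""
    return asdict(event)


def dump_json_safe(event: Event) -> dict:
    """Convert values that json.dumps cannot encode into strings."""
    data = asdict(event)
    data["created_at"] = data["created_at"].isoformat()
    return data


def load_json_safe(data: dict) -> Event:
    return Event(name=data["name"], created_at=datetime.fromisoformat(data["created_at"]))


event = Event("deploy", datetime(2024, 1, 1, tzinfo=timezone.utc))
python_data = dump_python(event)
json_data = dump_json_safe(event)

try:
    json.dumps(python_data)
    python_mode_encodable = True
except TypeError:
    python_mode_encodable = False  # datetime is not JSON serializable

wire = json.dumps(json_data, sort_keys=True)
restored = load_json_safe(json.loads(wire))
print(python_mode_encodable, restored == event)  # False True
```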


## 8. Roundtrip Fidelity

Serialization and deserialization preserve values across both modes.

In [9]:
original = Config(name="gpt-4", temperature=0.85, max_tokens=2000)

# Python mode roundtrip
python_data = original.to_dict(mode="python")
python_restored = Config.from_dict(python_data, mode="python")
print("Python roundtrip:")
print(f"  Original:  {original}")
print(f"  Restored:  {python_restored}")
print(f"  Equal: {original == python_restored}")
print(f"  Same hash: {hash(original) == hash(python_restored)}")

# JSON mode roundtrip
json_data = original.to_dict(mode="json")
json_restored = Config.from_dict(json_data, mode="json")
print("\nJSON roundtrip:")
print(f"  Original:  {original}")
print(f"  Restored:  {json_restored}")
print(f"  Equal: {original == json_restored}")
print(f"  Same hash: {hash(original) == hash(json_restored)}")

# from_json convenience method
json_str = original.to_json()
json_method_restored = Config.from_json(json_str)
print("\nfrom_json roundtrip:")
print(f"  Equal: {original == json_method_restored}")
print(f"  Same hash: {hash(original) == hash(json_method_restored)}")

Python roundtrip:
  Original:  name='gpt-4' temperature=0.85 max_tokens=2000
  Restored:  name='gpt-4' temperature=0.85 max_tokens=2000
  Equal: True
  Same hash: True

JSON roundtrip:
  Original:  name='gpt-4' temperature=0.85 max_tokens=2000
  Restored:  name='gpt-4' temperature=0.85 max_tokens=2000
  Equal: True
  Same hash: True

from_json roundtrip:
  Equal: True
  Same hash: True


## 9. Hash Stability and Determinism

Hashes are derived from a JSON serialization with sorted keys, so instances with the same field values produce the same hash.

In [10]:
# Create same config multiple times
configs = [Config(name="gpt-4", temperature=0.7, max_tokens=1000) for _ in range(5)]

# All should have identical hashes
hashes = [hash(cfg) for cfg in configs]
print("Hash values:")
for i, h in enumerate(hashes):
    print(f"  config{i}: {h}")

print(f"\n✓ All hashes identical: {len(set(hashes)) == 1}")

# JSON determinism
json_strings = [cfg.to_json() for cfg in configs]
print(f"✓ All JSON strings identical: {len(set(json_strings)) == 1}")
print(f"\nJSON: {json_strings[0]}")

Hash values:
  config0: 3753372333651053024
  config1: 3753372333651053024
  config2: 3753372333651053024
  config3: 3753372333651053024
  config4: 3753372333651053024

✓ All hashes identical: True
✓ All JSON strings identical: True

JSON: {"max_tokens":1000,"name":"gpt-4","temperature":0.7}
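
Sorting keys is what makes the serialized form independent of field order. The sketch below shows the idea with `json.dumps`; `canonical_json` and `stable_digest` are hypothetical helpers, not the library's implementation.

```python
import hashlib
import json


def canonical_json(data: dict) -> str:
    """Serialize with sorted keys and compact separators."""
    return json.dumps(data, sort_keys=True, separators=(",", ":"))


def stable_digest(data: dict) -> str:
    """SHA-256 of the canonical form; identical across interpreter runs."""
    return hashlib.sha256(canonical_json(data).encode("utf-8")).hexdigest()


a = {"name": "gpt-4", "temperature": 0.7, "max_tokens": 1000}
b = {"max_tokens": 1000, "temperature": 0.7, "name": "gpt-4"}  # different key order

print(canonical_json(a))  # {"max_tokens":1000,"name":"gpt-4","temperature":0.7}
print(canonical_json(a) == canonical_json(b))  # True
print(stable_digest(a) == stable_digest(b))  # True
```

One caveat: Python's built-in `hash()` of a `str` is salted per process unless `PYTHONHASHSEED` is fixed, so any hash derived from it can differ between interpreter runs. For keys that must survive a restart (an on-disk cache, for example), a digest such as the one above is the safer choice.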


## 10. When to Use HashableModel vs Element

**Use HashableModel when:**
- Value equality matters (identical configs should compare equal)
- Caching based on configuration
- Deduplicating structured LLM outputs
- Immutable configuration objects
- Set operations based on content

**Use Element when:**
- Identity matters (workflow entities)
- Entities mutate over time (ID remains stable)
- Tracking state changes
- Building workflows/graphs

In [11]:
# HashableModel: Value equality for configs
class APIConfig(HashableModel):
    endpoint: str
    timeout: int
    retries: int


cfg1 = APIConfig(endpoint="/api/v1", timeout=30, retries=3)
cfg2 = APIConfig(endpoint="/api/v1", timeout=30, retries=3)
print("=== HashableModel (Value Equality) ===")
print(f"Same values → equal: {cfg1 == cfg2}")
print(f"Can deduplicate in sets: {len({cfg1, cfg2})} unique")

# Element: Identity-based for workflow entities
from lionherd_core.base import Node

task1 = Node(content={"task": "Write docs"})
task2 = Node(content={"task": "Write docs"})  # Same content, different entity
print("\n=== Element (Identity-Based) ===")
print(f"Same content → different entities: {task1 == task2}")
print(f"Both tracked separately: {len({task1, task2})} unique")
print("\nTask 1 can evolve:")
original_id = task1.id  # Capture the ID before mutating
task1.metadata["status"] = "complete"
print(f"  task1.metadata: {task1.metadata}")
print(f"  task2.metadata: {task2.metadata}")
print(f"  Still same ID-based identity: {task1.id == original_id}")

=== HashableModel (Value Equality) ===
Same values → equal: True
Can deduplicate in sets: 1 unique

=== Element (Identity-Based) ===
Same content → different entities: False
Both tracked separately: 2 unique

Task 1 can evolve:
  task1.metadata: {'status': 'complete'}
  task2.metadata: {}
  Still same ID-based identity: True


## Summary Checklist

**HashableModel Essentials:**
- ✅ Content-based equality (identical fields → same hash)
- ✅ Immutable by default (frozen to prevent hash corruption)
- ✅ Safe for sets, dicts, and `to_list(unique=True)`
- ✅ Perfect for cache keys (identical configs → same hash)
- ✅ Deterministic JSON serialization (sorted keys)
- ✅ Two serialization modes: `python` and `json`
- ✅ Lossless roundtrip through serialization

**Key Difference from Element:**
- HashableModel: Value equality (same fields = equal)
- Element: Identity equality (same ID = equal)

**Use Cases:**
- Configuration objects for caching
- Structured LLM output deduplication
- Set operations based on content
- Immutable data structures

**Next Steps:**
- See `Element` for identity-based entities
- See `Node` for mutable workflow entities
- See `to_list` for collection utilities